| Summary | Large IMSP books/groups slow every page request |
| Queue | Turba |
| Queue Version | HEAD |
| Type | Enhancement |
| State | Resolved |
| Priority | 1. Low |
| Owners | Michael Rubinsky <mrubinsk (at) horde (dot) org> |
| Requester | noah (at) lsit (dot) ucsb (dot) edu |
| Created | 04/01/2008 (40 days ago) |
| Due | |
| Updated | 04/11/2008 (30 days ago) |
| Assigned | 04/03/2008 (38 days ago) |
| Resolved | 04/11/2008 (30 days ago) |
| Attachments | parse-members.patch ![]() |
| Milestone | |
| Patch | 1 |
> When you say that it is "absolutely not accurate for at least two
> IMSP clients," do you mean that it produces inaccurate results, or
> that it does not produce results for those clients? We are using a
> combination of Turba and Mulberry as IMSP clients, and the improved
> regex catches a good number of groups created by Mulberry.
What I meant was that SEARCHADDRESS does not necessarily return an email address at all as your statement:
> The email string returned by SEARCHADDRESS contains the full names
> needed to construct the member list (form: "name <email@domain>").
alluded to. But, looking at your patch, I see that you probably were not referring to SEARCHADDRESS responses, but to the list of email addresses contained in the email field...in which case your statement was correct.
To answer your more general questions first: In general I would recommend separate patches for Framework packages and applications. When they are related, you can but a link to the related patch in the comments.
For your patches, specifically, please open separate enhancement requests for these patches. I do have some comments off the top though - first of all, adding new methods to Net_IMSP that Turba 2.2 relies on means we need to check for the existence of that method in Turba before it's used, and fall back to some other method if it's not available, since Turba 2.2 does not *require* Horde 3.2, only 3.x. However, I think it would be worth the changes seeing how much of an improvement you are seeing.
Glad to see someone still making use of IMSP ;)
> FWIW, this is only 100% accurate for entries added by Turba...and is
> absolutely not accurate for at least two other IMSP clients that I
> have used - although it has been some time since I've looked at it.
It seems you are right. At least part of that is my inflexible regex. Changing the regex gives some better results:
/(?:^(.*)\s*<)?(\w[-._\w]*\w@\w[-._\w]*\w\.\w{2,3})/ is better than
/(?:^(.*)\W<)?(\w[-._\w]*\w@\w[-._\w]*\w\.\w{2,3})/
It seems many clients don't include the space between the name and the <email>.
When you say that it is "absolutely not accurate for at least two IMSP clients," do you mean that it produces inaccurate results, or that it does not produce results for those clients? We are using a combination of Turba and Mulberry as IMSP clients, and the improved regex catches a good number of groups created by Mulberry.
I've seen it miss on various inputs, but I've not seen it produce actual inaccurate results.
> I fixed this by only hitting that code when we actually *want* to
> populate the __members property. This should also help in other
> places as well, like when browsing the address book.
>
> I don't currently have an IMSP server handy to test this, so please
> let me know how this works out.
Your alterations work fine. Performance was enhanced somewhat. It's definitely a good approach.
I have a couple other patches that drastically increase performance by adding a getEntries function to the framework Net/IMSP/Book.php file. It does one FETCHADDRESS command per some maximum batch size (I'm using 220). This reduced request times for browsing address books with over 600 entries from minutes (or timeout) to ~10 seconds. I had to also patch Turba's lib/Driver/imsp.php file w/ a _readBatch function that's highly similar to _read, and probably could be folded into it by someone with a tiny bit more time and turba knowledge (I changed _read to call _readBatch if more than a certain number of names are passed to it).
My questions are, is it worth submitting my patches, and what's the protocol for submitting bugs and patches against multiple projects (in this case Framework & Turba)? Would I open separate bugs, one for the IMSP lib and one for Turba? If you think they're worth submitting, should I add them to this bug, or open a fresh one?
Thanks.
State ⇒ Feedback
> The email string returned by SEARCHADDRESS contains the full names
> needed to construct the member list (form: "name <email@domain>").
FWIW, this is only 100% accurate for entries added by Turba...and is absolutely not accurate for at least two other IMSP clients that I have used - although it has been some time since I've looked at it.
> Included is a patch against the newest cvs code for lib/Driver/imsp.php.
I fixed this by only hitting that code when we actually *want* to populate the __members property. This should also help in other places as well, like when browsing the address book.
I don't currently have an IMSP server handy to test this, so please let me know how this works out.
State ⇒ Assigned
Assigned to Michael Rubinsky
New Attachment: parse-members.patch
Patch ⇒ parse-members.patch
Milestone ⇒
Queue ⇒ Turba
Summary ⇒ Large IMSP books/groups slow every page request
Type ⇒ Enhancement
Priority ⇒ 1. Low
State ⇒ New
The IMSP backend is still really slow with large address books. One of the major problems is that populating the "Add to a Contact list" menu takes a long time as it does a FETCHADDRESS for every email returned by SEARCHADDRESS (every email not in another group already that is), in order to construct the list of members.
I don't believe the list of members is even used in this context, but I could not figure out a good way to not compute the members
The email string returned by SEARCHADDRESS contains the full names needed to construct the member list (form: "name <email@domain>"). I therefore propose trying to parse the names from this string first, then falling back on an additional IMSP call for entries where a name is not provided.
Included is a patch against the newest cvs code for lib/Driver/imsp.php.
I've been running this for a few days, it doesn't solve all the performance issues, but it does tie page load times to the size of the address book being loaded. Before, loading the webmail address book would still take a long time (minutes), because the "Add to" list still had to be very slowly generated.
Please verify that my regex is correct for grabbing names and emails: "/(?:^(.*)\W<)?(\w[-._\w]*\w@\w[-._\w]*\w\.\w{2,3})/"