Summary | RFC822 parsing library very inefficient |
Queue | Horde Framework Packages |
Queue Version | Git master |
Type | Enhancement |
State | Resolved |
Priority | 2. Medium |
Owners | slusarz (at) horde (dot) org |
Requester | slusarz (at) horde (dot) org |
Created | 01/31/2012 (4901 days ago) |
Due | |
Updated | 02/01/2012 (4900 days ago) |
Assigned | |
Resolved | 02/01/2012 (4900 days ago) |
Milestone | |
Patch | No |
commit 90462df31f5ec31200a1ed9144f6569e3a203d33
Author: Michael M Slusarz <slusarz@horde.org>
Date: Sun Jan 29 21:04:06 2012 -0700
[mms] Improved parser for e-mail addresses (Request #10949).
The previous parsing method involved splitting at "important" RFC 822
characters (e.g. < . :) and then brute-forcing to see if this was the
correct decision. New code goes through the string linearly, checking
the grammar against the ABNF contained in the RFC.
Performance statistics: On a message with 50 e-mail addresses,
performance was 20x faster. Within the script itself, total cumulative
time within the parseAddressList() method went from 92% -> 7%.
Real performance numbers are probably not quite this great. The new
library substantially reduces recursion. Methods relying on recursion
are artificially slowed down by xdebug: each function call carries a
more significant performance penalty than under regular PHP because
xdebug needs to record data about each call. Even factoring this in,
the new code is a substantial performance improvement - for messages
containing substantial numbers of e-mail addresses (> 30), the limiting
bottleneck was previously Rfc822 address parsing. This should no longer
be the case.
framework/Mail/lib/Horde/Mail/Rfc822.php | 1026 +++++++++-------------
framework/Mail/lib/Horde/Mail/Rfc822/Address.php | 74 ++
framework/Mail/lib/Horde/Mail/Rfc822/Group.php | 40 +
framework/Mail/package.xml | 24 +-
framework/Mail/test/Horde/Mail/ParseTest.php | 59 +-
5 files changed, 601 insertions(+), 622 deletions(-)
http://git.horde.org/horde-git/-/commit/90462df31f5ec31200a1ed9144f6569e3a203d33
Priority ⇒ 2. Medium
Patch ⇒ No
Milestone ⇒
Assigned to Michael Slusarz
Queue ⇒ Horde Framework Packages
Summary ⇒ RFC822 parsing library very inefficient
Type ⇒ Enhancement
State ⇒ Assigned
recipient e-mail addresses, Horde_Mail_Rfc822->parseAddressList is
taking 94.91% of cumulative run time. More specifically, the
_hasUnclosedQuotes() component is being called 9,185 times and taking
91.88% of run time.
Fix: use a tokenizer/linear parsing approach. Probably will port
Timo's parsing approach from dovecot (no PHP solution currently
available is appropriate).
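As a rough illustration of the linear approach proposed here, the sketch below splits an address list in a single left-to-right pass, carrying the quote and comment state along instead of re-checking for unclosed quotes after every candidate split. It is written in Python only to keep it short and language-neutral; it is not the Horde PHP code and not the dovecot parser. The function name `split_address_list` is hypothetical, and the sketch assumes a simplified grammar: it ignores group syntax (`name: a, b;`) and route addresses and only handles quoted strings, backslash escapes, and nested comments.

```python
def split_address_list(header):
    """Split a comma-separated address list in one pass over the string.

    Simplified sketch (assumption): handles quoted strings, backslash
    escapes and nested comments only; no group or route syntax.
    """
    addresses = []
    current = []
    in_quotes = False    # currently inside a "quoted string"
    escaped = False      # previous character was a backslash escape
    comment_depth = 0    # nesting level of (comments)

    for ch in header:
        if escaped:
            # Character after a backslash is always taken literally.
            current.append(ch)
            escaped = False
        elif ch == "\\" and (in_quotes or comment_depth):
            current.append(ch)
            escaped = True
        elif in_quotes:
            current.append(ch)
            if ch == '"':
                in_quotes = False
        elif ch == '"' and comment_depth == 0:
            current.append(ch)
            in_quotes = True
        elif ch == "(":
            current.append(ch)
            comment_depth += 1
        elif ch == ")" and comment_depth > 0:
            current.append(ch)
            comment_depth -= 1
        elif ch == "," and comment_depth == 0:
            # A comma outside quotes and comments ends one address.
            addresses.append("".join(current).strip())
            current = []
        else:
            current.append(ch)

    if in_quotes or comment_depth:
        raise ValueError("unterminated quoted string or comment")

    addresses.append("".join(current).strip())
    # Drop empty entries produced by stray or trailing commas.
    return [a for a in addresses if a]
```

Each character is examined once, so the cost grows linearly with the length of the header rather than with the number of split attempts.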
Bumping priority since this library is used on almost every page in
IMP - performance advantages here will be seen throughout IMP.