6.0.0-alpha14
7/2/25

[#10949] RFC822 parsing library very inefficient
Summary RFC822 parsing library very inefficient
Queue Horde Framework Packages
Queue Version Git master
Type Enhancement
State Resolved
Priority 2. Medium
Owners slusarz (at) horde (dot) org
Requester slusarz (at) horde (dot) org
Created 01/31/2012 (4901 days ago)
Due
Updated 02/01/2012 (4900 days ago)
Assigned
Resolved 02/01/2012 (4900 days ago)
Milestone
Patch No

History
02/01/2012 09:21:38 AM Git Commit Comment #3
Changes have been made in Git (refs/heads/develop):

commit 90462df31f5ec31200a1ed9144f6569e3a203d33
Author: Michael M Slusarz <slusarz@horde.org>
Date:   Sun Jan 29 21:04:06 2012 -0700

     [mms] Improved parser for e-mail addresses (Request #10949).

     The previous parsing method involved splitting at "important" RFC 822
     characters (e.g. < . :) and then brute-forcing to see if this was the
     correct decision.  New code goes through the string linearly, checking
     the grammar against the ABNF contained in the RFC.

     Performance statistics: on a message with 50 e-mail addresses, the
     new code was 20x faster.  Within the script itself, total cumulative
     time spent in the parseAddressList() method went from 92% to 7%.

     Real-world performance gains are probably not quite this large. The
     new library substantially reduces recursion, and recursive methods
     are artificially slowed down under xdebug: xdebug records data about
     every function call, so each call carries a larger penalty than it
     does under regular PHP. Even factoring this in, the new code is a
     substantial performance improvement. For messages containing many
     e-mail addresses (> 30), RFC 822 address parsing was previously the
     limiting bottleneck; this should no longer be the case.

  framework/Mail/lib/Horde/Mail/Rfc822.php         | 1026 +++++++++-------------
  framework/Mail/lib/Horde/Mail/Rfc822/Address.php |   74 ++
  framework/Mail/lib/Horde/Mail/Rfc822/Group.php   |   40 +
  framework/Mail/package.xml                       |   24 +-
  framework/Mail/test/Horde/Mail/ParseTest.php     |   59 +-
  5 files changed, 601 insertions(+), 622 deletions(-)

http://git.horde.org/horde-git/-/commit/90462df31f5ec31200a1ed9144f6569e3a203d33
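The commit message describes the new parser only in outline: one linear pass over the string, checked against the RFC's ABNF. As a rough illustration of that idea, here is a minimal sketch in Python rather than PHP. It is not a port of the Horde code: the names AddressListParser, parse_list(), Address and Group are hypothetical and do not match the Horde_Mail_Rfc822 API, and the grammar is an assumed, simplified RFC 822 subset (no comments, source routes, domain literals or obsolete syntax; requires Python 3.8+). What it shows is that after reading a run of words, the next special character ("<", "@" or ":") already decides which production applies, so nothing has to be split and then re-verified.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Union

@dataclass
class Address:  # hypothetical result type, not Horde's API
    personal: Optional[str]
    mailbox: str
    host: Optional[str]

@dataclass
class Group:  # hypothetical result type, not Horde's API
    name: str
    addresses: List[Address] = field(default_factory=list)

SPECIALS = set('()<>@,;:\\".[]')  # RFC 822 "specials"

def dotted(toks: List[str], what: str) -> str:
    # local-part / domain: words separated by ".", never two words in a row
    if not toks or any(a != "." and b != "." for a, b in zip(toks, toks[1:])):
        raise ValueError(f"malformed {what}: {toks!r}")
    return "".join(toks)

class AddressListParser:  # one left-to-right pass, simplified RFC 822 subset
    def __init__(self, text: str) -> None:
        self.s, self.i = text, 0

    def peek(self) -> str:
        # Skip whitespace, then return the next character ("" at end of input).
        while self.s[self.i:self.i + 1].isspace():
            self.i += 1
        return self.s[self.i:self.i + 1]

    def word(self) -> str:
        # word = atom / quoted-string (quotes and backslash escapes removed)
        s, start = self.s, self.i
        if s[start] != '"':
            while self.i < len(s) and not s[self.i].isspace() and s[self.i] not in SPECIALS:
                self.i += 1
            return s[start:self.i]
        out, self.i = [], start + 1
        while self.i < len(s) and s[self.i] != '"':
            if s[self.i] == "\\":
                self.i += 1  # quoted-pair: take the next character literally
            out.append(s[self.i:self.i + 1])
            self.i += 1
        if self.i >= len(s):
            raise ValueError("unterminated quoted-string")
        self.i += 1
        return "".join(out)

    def words(self) -> List[str]:
        # Collect words and "." tokens; the special that follows picks the production.
        toks: List[str] = []
        while (c := self.peek()) and (c in '."' or c not in SPECIALS):
            if c == ".":
                toks.append(c)
                self.i += 1
            else:
                toks.append(self.word())
        return toks

    def mailbox(self, toks: List[str]) -> Address:
        # mailbox = addr-spec / [phrase] "<" addr-spec ">"  (no source routes)
        angle = self.peek() == "<"
        personal = (" ".join(toks) or None) if angle else None
        if angle:
            self.i += 1
            toks = self.words()
        local, host = dotted(toks, "local-part"), None
        if self.peek() == "@":
            self.i += 1
            host = dotted(self.words(), "domain")
        if angle:
            if self.peek() != ">":
                raise ValueError(f"expected '>' at offset {self.i}")
            self.i += 1
        return Address(personal, local, host)

    def address(self, in_group: bool) -> Union[Address, Group]:
        toks = self.words()
        if in_group or self.peek() != ":":
            return self.mailbox(toks)
        self.i += 1  # group = phrase ":" [#mailbox] ";"
        group = Group(" ".join(toks), self.parse_list(";"))
        self.i += 1  # consume the ";" that ended parse_list
        return group

    def parse_list(self, end: str = "") -> List[Union[Address, Group]]:
        # #address, ended by end of input (end="") or, inside a group, by ";"
        items: List[Union[Address, Group]] = []
        while (c := self.peek()) != end:
            if not c:
                raise ValueError("unterminated group")
            items.append(self.address(in_group=end == ";"))
            if self.peek() == ",":
                self.i += 1
            elif self.peek() != end:
                raise ValueError(f"unexpected {self.peek()!r} at offset {self.i}")
        return items
```

For example, `AddressListParser('"Smith, Jane" <jane@example.org>, team: a@x.org;').parse_list()` keeps the comma inside the quoted display name intact and returns one Address followed by one Group. Every method only moves the cursor forward, so each character is examined a bounded number of times; the trade-off is that the sketch rejects (rather than guesses at) anything outside its simplified grammar.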
02/01/2012 08:09:04 AM Michael Slusarz State ⇒ Resolved
 
02/01/2012 08:08:53 AM Git Commit Comment #2
Changes have been made in Git (refs/heads/master):

commit 90462df31f5ec31200a1ed9144f6569e3a203d33

http://git.horde.org/horde-git/-/commit/90462df31f5ec31200a1ed9144f6569e3a203d33
01/31/2012 06:35:46 PM Michael Slusarz Comment #1
Priority ⇒ 2. Medium
Patch ⇒ No
Milestone ⇒
Assigned to Michael Slusarz
Queue ⇒ Horde Framework Packages
Summary ⇒ RFC822 parsing library very inefficient
Type ⇒ Enhancement
State ⇒ Assigned
According to xdebug, when viewing a mail message with approximately 50 
recipient e-mail addresses, Horde_Mail_Rfc822->parseAddressList() is 
taking 94.91% of cumulative run time.  More specifically, the 
_hasUnclosedQuotes() component is being called 9,185 times and taking 
91.88% of run time.
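The report does not show the _hasUnclosedQuotes() code, so the following is only a hypothetical reconstruction of one way a quote check can dominate a profile like this: if the check re-scans the string from the beginning at every candidate split point, total work grows roughly with (number of split points) × (string length). The Python sketch below uses hypothetical names (has_unclosed_quotes() merely echoes the method named in the report) and contrasts that pattern with a single pass that carries the quote state forward. Characters examined serve as a rough proxy for cost, and splitting only at commas is a deliberate simplification of real address parsing.

```python
from typing import List, Tuple

def has_unclosed_quotes(s: str) -> bool:
    # Hypothetical prefix check: does s end inside a quoted-string?
    inside, i = False, 0
    while i < len(s):
        if s[i] == "\\":
            i += 2  # skip a backslash-escaped character
            continue
        if s[i] == '"':
            inside = not inside
        i += 1
    return inside

def split_rescan(s: str) -> Tuple[List[str], int]:
    # Split at commas, re-scanning the whole prefix at each comma (quadratic).
    parts, start, work = [], 0, 0
    for i, c in enumerate(s):
        if c == ",":
            work += i  # has_unclosed_quotes() walks up to i characters here
            if not has_unclosed_quotes(s[:i]):
                parts.append(s[start:i].strip())
                start = i + 1
    parts.append(s[start:].strip())
    return parts, work

def split_linear(s: str) -> Tuple[List[str], int]:
    # Split at commas, carrying quote state forward in a single pass (linear).
    parts, start, inside, escaped, work = [], 0, False, False, 0
    for i, c in enumerate(s):
        work += 1
        if escaped:
            escaped = False
        elif c == "\\":
            escaped = True
        elif c == '"':
            inside = not inside
        elif c == "," and not inside:
            parts.append(s[start:i].strip())
            start = i + 1
    parts.append(s[start:].strip())
    return parts, work
```

On a list of 50 addresses whose quoted display names each contain a comma (for example `"User 7, Jr." <u7@example.com>`), both functions return the same 50 entries. The re-scanning variant examines tens of thousands of characters, while the single pass examines each character of the roughly 1,700-character header once.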

Fix: use a tokenizer/linear parsing approach.  I will probably port 
Timo's parsing approach from Dovecot (none of the currently available 
PHP solutions is appropriate).

Bumping priority, since this library is used on almost every page in 
IMP; performance improvements here will be felt throughout IMP.
