Tickets :: [#1621] non-ASCII 7-bit message headers not RFC2047-encoded

3/26/26

Summary	non-ASCII 7-bit message headers not RFC2047-encoded
Queue	IMP
Queue Version	HEAD
Type	Bug
State	Resolved
Priority	2. Medium
Owners	slusarz (at) horde (dot) org
Requester	windhamg (at) email (dot) arizona (dot) edu
Created	3/25/05 (7671 days ago)
Due
Updated	10/6/08 (6380 days ago)
Assigned	9/30/08 (6386 days ago)
Resolved	10/6/08 (6380 days ago)
Github Issue Link
Github Pull Request
Milestone
Patch	No

10/06/2008 04:09:31 PM	Michael Slusarz	Comment #18 State ⇒ Resolved
That's a weird check, and I have no idea what it is checking for, but other software apps do the same thing so I will take it on faith that it is doing what it is supposed to. Fixed in Horde 3.3.1 and HEAD.

10/06/2008 04:08:25 PM	CVS Commit	Comment #17
Changes have been made in CVS for this ticket: http://cvs.horde.org/diff.php/framework/MIME/MIME.php?r1=1.139.4.43&r2=1.139.4.44&ty=u

10/06/2008 04:07:25 PM	CVS Commit	Comment #16
Changes have been made in CVS for this ticket: http://cvs.horde.org/diff.php/framework/MIME/MIME.php?r1=1.207&r2=1.208&ty=u

09/30/2008 05:07:09 AM	Chuck Hagenbuch	State ⇒ Assigned

09/30/2008 02:23:46 AM	hiromi (at) tac (dot) tsukuba (dot) ac (dot) jp	Comment #15
The last change to MIME.php has a side effect: when $charset is iso-2022-jp, it encodes us-ascii strings too. Here is a sample (the User-Agent is "Internet Messaging Program (IMP) H3 (4.3)"):
---
Content-Type: text/plain; charset=ISO-2022-JP; DelSp="iso-2022-jp''Yes"; format="iso-2022-jp''flowed"
User-Agent: =?iso-2022-jp?b?SW50ZXJuZXQg?= =?iso-2022-jp?b?TWVzc2FnaW5nIA==?= =?iso-2022-jp?b?UHJvZ3JhbSA=?= =?iso-2022-jp?b?KElNUCkg?= =?iso-2022-jp?b?SDMg?= =?iso-2022-jp?b?KDQuMyk=?=
---
Please check the contents of the string as well, like this: ((stristr('iso-2022-jp', $charset) && strstr($string, "\x1b\$B"))
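The check suggested above can be sketched in Python (an illustration only, not the Horde PHP code). It decodes the sample User-Agent header to show that the encoded words carry plain ASCII, then applies a test that asks for encoding only when the charset is ISO-2022-JP and the escape sequence ESC $ B is actually present. The names `decode_b_words` and `needs_iso2022jp_encoding` are hypothetical.

```python
import base64
import re

# Matches one RFC 2047 "B" (base64) encoded-word: =?charset?b?payload?=
ENCODED_WORD = re.compile(r"=\?([^?]+)\?[bB]\?([^?]*)\?=")

def decode_b_words(header_value: str) -> str:
    """Decode every base64 encoded-word in a header value and join the results."""
    parts = []
    for charset, payload in ENCODED_WORD.findall(header_value):
        parts.append(base64.b64decode(payload).decode(charset))
    return "".join(parts)

def needs_iso2022jp_encoding(text_bytes: bytes, charset: str) -> bool:
    """Hypothetical check: encode only if ESC $ B occurs in ISO-2022-JP text."""
    return charset.lower() == "iso-2022-jp" and b"\x1b$B" in text_bytes

user_agent = (
    "=?iso-2022-jp?b?SW50ZXJuZXQg?= =?iso-2022-jp?b?TWVzc2FnaW5nIA==?= "
    "=?iso-2022-jp?b?UHJvZ3JhbSA=?= =?iso-2022-jp?b?KElNUCkg?= "
    "=?iso-2022-jp?b?SDMg?= =?iso-2022-jp?b?KDQuMyk=?="
)

decoded = decode_b_words(user_agent)
print(decoded)
# Plain ASCII text: no escape sequence, so no encoding should be needed.
print(needs_iso2022jp_encoding(decoded.encode("iso-2022-jp"), "iso-2022-jp"))
# Japanese text: the escape sequence is present, so encoding is needed.
print(needs_iso2022jp_encoding("日本語".encode("iso-2022-jp"), "ISO-2022-JP"))
```

This sketch only looks for ESC $ B, as the comment proposes; other ISO-2022-JP escape sequences are not covered.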

07/26/2008 11:43:38 PM	Michael Slusarz	Comment #14 State ⇒ Resolved
My changes seem to work well so going to mark this as resolved.

07/26/2008 11:40:01 PM	CVS Commit	Comment #13
Changes have been made in CVS for this ticket: http://cvs.horde.org/diff.php/framework/MIME/MIME.php?r1=1.139.4.40&r2=1.139.4.41&ty=u http://cvs.horde.org/diff.php/framework/MIME/MIME/Message.php?r1=1.76.10.17&r2=1.76.10.18&ty=u

07/25/2008 10:30:03 PM	Michael Slusarz	Comment #12
This patch also: http://lists.horde.org/archives/cvs/Week-of-Mon-20080721/081444.html

07/25/2008 10:28:19 PM	CVS Commit	Comment #11
Changes have been made in CVS for this ticket: http://cvs.horde.org/diff.php/framework/MIME/MIME.php?r1=1.200&r2=1.201&ty=u http://cvs.horde.org/diff.php/framework/MIME/MIME/Message.php?r1=1.100&r2=1.101&ty=u

07/25/2008 10:05:52 PM	windhamg (at) email (dot) arizona (dot) edu	Comment #10
Wow...3 1/2 years is a long time. :) I've moved on to a different role in our organization, and don't work with Horde/IMP any longer; also, I believe our existing Horde environment is horribly out-of-date...so I don't think we'll be able to test this patch. Thanks anyways!

07/25/2008 09:35:25 PM	Michael Slusarz	Comment #9 State ⇒ Feedback
Reviving from the dead... how does this patch look/work?

07/25/2008 09:32:52 PM	CVS Commit	Comment #8
Changes have been made in CVS for this ticket: http://cvs.horde.org/diff.php/framework/MIME/MIME.php?r1=1.198&r2=1.199&ty=u

06/14/2005 12:24:36 AM	Michael Slusarz	State ⇒ Stalled

03/30/2005 06:53:27 PM	Michael Slusarz	Comment #7
So, my reply, which will attempt to battle yours for ignorance :)
I do understand that ISO-2022-JP is a 7-bit charset in that every individual byte is in the range 00-7f (hex). However, the charset uses an escape character to indicate that consecutive bytes need to be combined to form a character. Therefore, it is my understanding that the mb_ereg_*() functions should somehow be able to return a multibyte character where the non-charset-aware preg_*() functions will not.
Example: String: ESCAPE_CHARACTER MB_CHAR_1 MB_CHAR_2
This string has three bytes, all in the range 00-7f. A preg_*() match will therefore see it as three 7-bit characters, and so is8bit() will return false. To mb_ereg(), however, this string should be interpreted as a single-character, two-byte string. A search for 00-7f should therefore fail, since the character is actually something more like 2e3f (hex). Even though the underlying string is entirely 7-bit, mb_ereg() should be applying the regex to the "actual" representation of the string.
All of this tells me that the problem is probably an error in the regex that keeps the multibyte character from being recognized. I would think a regex like "/.{1}/" would match "ESCAPE_CHARACTER" for preg and "Japanese character" for ereg(). However, I haven't yet figured out a way to do this in a single regex. If anyone with ereg()-style regex experience could chime in, it would be appreciated.
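The distinction drawn above, between matching on raw bytes and matching on the characters those bytes represent, can be sketched in Python (an illustration only; it makes no claim about how mb_ereg() behaves). The helper name `has_non_ascii_chars` is hypothetical.

```python
import re

NOT_ASCII_BYTES = re.compile(rb"[^\x00-\x7f]")  # applied to raw bytes
NOT_ASCII_CHARS = re.compile(r"[^\x00-\x7f]")   # applied to decoded characters

def has_non_ascii_chars(data: bytes, charset: str) -> bool:
    """Decode with the given charset first, then look for non-ASCII characters."""
    return NOT_ASCII_CHARS.search(data.decode(charset)) is not None

raw = "日本語".encode("iso-2022-jp")

# Byte view: every byte is 7-bit, so the byte-level pattern finds nothing.
print(NOT_ASCII_BYTES.search(raw))

# Character view: after decoding, the text is three non-ASCII characters.
print(len(raw.decode("iso-2022-jp")), has_non_ascii_chars(raw, "iso-2022-jp"))
```

Decoding before matching is one way to get the "actual" representation; it assumes the charset label is correct and the bytes are valid in that charset.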

03/26/2005 12:00:53 AM	windhamg (at) email (dot) arizona (dot) edu	Comment #6
Well, I tried the '[^\x00-\x7f]' regex pattern in is8bit(), but no dice. I may be speaking ignorantly (in fact, it's very likely), but even though we are using a multibyte-aware regex function, this character set (ISO-2022-JP) is still a 7-bit character set. How are we going to find byte values in the range [\x80-\xff] in a character set made up of 7-bit bytes? I'm starting to think this is a lost cause... I placed some diagnostic output in the String::regexMatch function and see that, even though the $charset being passed in is "ISO-2022-JP", the resultant mb_regex_encoding() is "EUC-JP".
IMHO, the root of this problem is that the MIME::encode function claims to "Encode a string containing non-ASCII characters according to RFC 2047", while it actually only encodes strings containing 8-bit characters. Since the absence of 8-bit characters does not always imply ASCII, we need to find a good test of "ASCII-ness". I can test for ISO-2022-JP using a regex like '\x1b[\(\$]', but it would be nicer to have a more general test (if one exists) for non-ASCII 7-bit encodings.
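The two patterns discussed here can be compared in a short Python sketch (an illustration only, not the PHP in MIME.php). At the byte level, '[^\x00-\x7f]' is the same test as "has an 8-bit byte", so it finds nothing in ISO-2022-JP data, while the escape-sequence pattern does. The name `looks_non_ascii` is hypothetical.

```python
import re

HIGH_BIT = re.compile(rb"[^\x00-\x7f]")     # any byte outside the 7-bit range
ISO2022_ESCAPE = re.compile(rb"\x1b[($]")   # ESC followed by '(' or '$'

def looks_non_ascii(data: bytes) -> bool:
    """Hypothetical test: 8-bit bytes or an ISO-2022 style escape sequence."""
    return bool(HIGH_BIT.search(data) or ISO2022_ESCAPE.search(data))

japanese = "日本語".encode("iso-2022-jp")
latin1 = "café".encode("latin-1")
plain = b"Internet Messaging Program"

for sample in (japanese, latin1, plain):
    print(bool(HIGH_BIT.search(sample)),
          bool(ISO2022_ESCAPE.search(sample)),
          looks_non_ascii(sample))
```

This covers only encodings that announce themselves with ESC ( or ESC $; it is not the general "ASCII-ness" test the comment asks for.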

03/25/2005 08:26:17 PM	Michael Slusarz	Comment #5
A couple of issues with your patch:
1) We shouldn't be dealing with mb_* functions in MIME; these should live exclusively in String:: or elsewhere.
2) Any multibyte check should be done in MIME::is8bit() instead of MIME::encode().
3) The code seems to indicate that any string that is autodetected as not 'ASCII' needs to be encoded. However, what if the string is autodetected as 'UTF-8'? If the UTF-8 characters are all in the ASCII range, then no encoding is required.
4) Multibyte characters will not be returned as 7-bit ASCII text from the mb_ereg_*() functions. Since these functions are multibyte-aware, they will know to combine consecutive bytes of a multibyte sequence to form the character.
I think the issue is that we are only looking for 8-bit characters in the regex. We are not looking for 7-bit characters or multibyte characters. Therefore, we should probably just change the regex to search for "not 7-bit ASCII characters" instead of searching for "8-bit characters". Could you try changing the regex in MIME::is8bit() to "[^\x00-\x7f]" and see if that fixes things?

03/25/2005 07:27:12 PM	windhamg (at) email (dot) arizona (dot) edu	Comment #4 New Attachment: MIME.php.diff
Here's a patch for framework/MIME/MIME.php that seems to have fixed the problem on my system. I'm not 100% sure that it doesn't introduce any side effects, but I tested it with several character sets, and it appears to do the "right thing".

03/25/2005 06:18:23 AM	windhamg (at) email (dot) arizona (dot) edu	Comment #3
I applied this patch (the revised version that Chuck committed a few minutes ago), but it did not fix my problem. Although ISO-2022-JP is a multibyte character set, it consists of only 7-bit bytes, so the String::regexMatch() call returns an empty array, the is8bit() check subsequently returns FALSE, and the RFC2047 encoding is not performed.

03/25/2005 12:42:14 AM	Michael Slusarz	Assigned to Michael Slusarz

03/25/2005 12:42:00 AM	Michael Slusarz	Comment #2 State ⇒ Feedback
Does this patch fix the problem: http://cvs.horde.org/diff.php/framework/MIME/MIME.php?r1=1.143&r2=1.144&ty=u

03/25/2005 12:16:17 AM	windhamg (at) email (dot) arizona (dot) edu	Comment #1 Priority ⇒ 2. Medium State ⇒ Unconfirmed Queue ⇒ IMP Type ⇒ Bug Summary ⇒ non-ASCII 7-bit message headers not RFC2047-encoded
When using a non-ASCII, 7-bit "sending charset" (such as ISO-2022-JP), the message headers are not being properly encoded per RFC2047. The MIME::encode() function appears to use only the "is8bit" check when deciding whether to encode the text, regardless of whether or not it's ASCII. As a result, the mail headers end up being displayed as "raw" ISO-2022-JP text, which is "gibberish" to the user.
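The situation reported here can be reproduced in a few lines of Python (an illustration only, not Horde's PHP). The sketch shows that ISO-2022-JP text contains no 8-bit bytes, so a high-bit check alone never triggers encoding, and what an RFC 2047 encoded-word for the same text looks like. The names `is_8bit` and `rfc2047_b_encode` are hypothetical, and the sketch ignores the RFC 2047 limit on encoded-word length.

```python
import base64

def is_8bit(data: bytes) -> bool:
    """Return True if any byte has the high bit set."""
    return any(b >= 0x80 for b in data)

def rfc2047_b_encode(text: str, charset: str) -> str:
    """Build a single RFC 2047 'B' encoded-word (no length splitting; sketch only)."""
    raw = text.encode(charset)
    return "=?%s?B?%s?=" % (charset, base64.b64encode(raw).decode("ascii"))

subject = "日本語"
raw = subject.encode("iso-2022-jp")

# All bytes are 7-bit, so an 8-bit check reports nothing to encode.
print(is_8bit(raw))

# The header value that should have been sent instead of the raw bytes.
print(rfc2047_b_encode(subject, "iso-2022-jp"))
```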