[#1621] non-ASCII 7-bit message headers not RFC2047-encoded
Summary non-ASCII 7-bit message headers not RFC2047-encoded
Queue IMP
Queue Version HEAD
Type Bug
State Resolved
Priority 2. Medium
Owners slusarz (at) horde (dot) org
Requester windhamg (at) email (dot) arizona (dot) edu
Created 2005-03-25 (5143 days ago)
Updated 2008-10-06 (3852 days ago)
Assigned 2008-09-30 (3858 days ago)
Resolved 2008-10-06 (3852 days ago)
Patch No

2008-10-06 16:09:31 Michael Slusarz Comment #18
State ⇒ Resolved
That's a weird check, and I have no idea what it is checking for, but 
other software apps do the same thing so I will take it on faith that 
it is doing what it is supposed to.  Fixed in Horde 3.3.1 and HEAD.
2008-10-06 16:08:25 CVS Commit Comment #17
2008-10-06 16:07:25 CVS Commit Comment #16
2008-09-30 05:07:09 Chuck Hagenbuch State ⇒ Assigned
2008-09-30 02:23:46 hiromi (at) tac (dot) tsukuba (dot) ac (dot) jp Comment #15
The last change to MIME.php has a side effect when $charset is 
iso-2022-jp (as in the sample below): it encodes us-ascii strings too.

This is a sample (User-Agent is Internet Messaging Program (IMP) H3 (4.3)).


Content-Type: text/plain;

User-Agent: =?iso-2022-jp?b?SW50ZXJuZXQg?=
         =?iso-2022-jp?b?TWVzc2FnaW5nIA==?= =?iso-2022-jp?b?UHJvZ3JhbSA=?=
         =?iso-2022-jp?b?KElNUCkg?= =?iso-2022-jp?b?SDMg?=

Please check the contents of the string, like this:

    ((stristr('iso-2022-jp', $charset) && strstr($string, "\x1b\$B"))
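
To illustrate the side effect reported above, here is a minimal sketch in Python (not Horde's PHP). It decodes the Base64 payloads from the User-Agent sample and applies a check modelled on the suggested condition. The helper names `decode_payloads` and `should_encode` are hypothetical, and the charset test uses a plain case-insensitive comparison rather than the substring test in the snippet above.

```python
import base64

# Base64 payloads of the encoded-words in the User-Agent sample above.
PAYLOADS = ["SW50ZXJuZXQg", "TWVzc2FnaW5nIA==", "UHJvZ3JhbSA=", "KElNUCkg", "SDMg"]

# ESC $ B, the byte sequence tested for in the suggested check.
ESC_DOLLAR_B = b"\x1b$B"


def decode_payloads(payloads):
    """Concatenate the raw bytes carried by each encoded-word payload."""
    return b"".join(base64.b64decode(p) for p in payloads)


def should_encode(raw: bytes, charset: str) -> bool:
    """Hypothetical helper: encode only when the escape sequence is present."""
    return charset.lower() == "iso-2022-jp" and ESC_DOLLAR_B in raw


user_agent = decode_payloads(PAYLOADS)
print(user_agent)                                # the payloads hold only ASCII text
print(should_encode(user_agent, "iso-2022-jp"))  # False: no escape sequence
```

Under this check a plain ASCII header value would be left alone, while text that actually switches into the double-byte set would still be encoded.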

2008-07-26 23:43:38 Michael Slusarz Comment #14
State ⇒ Resolved
My changes seem to work well so going to mark this as resolved.
2008-07-25 22:05:52 windhamg (at) email (dot) arizona (dot) edu Comment #10
Wow...3 1/2 years is a long time. :)

I've moved on to a different role in our organization, and don't work 
with Horde/IMP any longer; also, I believe our existing Horde 
environment is horribly out-of-date...so I don't think we'll be able 
to test this patch.

Thanks anyways!
2008-07-25 21:35:25 Michael Slusarz Comment #9
State ⇒ Feedback
Reviving from the dead... how does this patch look/work?
2008-07-25 21:32:52 CVS Commit Comment #8
2005-06-14 00:24:36 Michael Slusarz State ⇒ Stalled
2005-03-30 18:53:27 Michael Slusarz Comment #7
So my reply, which will attempt to battle yours for ignorance :)

I do understand that ISO-2022-JP is a 7-bit charset in that any 
individual byte is in the range 00-7f (hex).  However, obviously, the 
charset uses the presence of an escape character to indicate that 
consecutive bytes need to be combined to properly form the character.

Therefore, it is my understanding that the mb_ereg_*() functions 
_should_ somehow be able to return a multibyte character when the 
non-charset preg_*() functions will not.  Example:


This string has three bytes.  All three bytes are in the range 00-7f.   
Therefore, doing a preg_*() match will result in this string appearing 
to be 3 7bit characters - thus, is8bit() will return false.

However, to mb_ereg()  this string should be interpreted as a single 
character, two byte string. Therefore a search for 00-7f *should* fail 
since the character is actually something more like 2e3f (hex).  Even 
though the underlying string is entirely 7bit, mb_ereg() should be 
applying the regex to the "actual" representation of the string.

All of this goes to tell me that it is probably an error with the 
regex which is causing the multibyte character to not be recognized.   
I would think a regex like "/.{1}/" would match "ESCAPE_CHARACTER" for 
preg and "japanese character" for ereg().  However, I haven't yet 
figured out a way to do this in a single regex.  Anyone with ereg() 
style regex experience that could chime in would be appreciated.
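
The byte-versus-character distinction discussed above can be sketched in Python (an illustration only; it says nothing about how mb_ereg() itself behaves). The kanji used here is an arbitrary example, not the string from the original comment.

```python
import re

# One kanji encoded as ISO-2022-JP: escape in, two data bytes, escape out.
raw = "日".encode("iso2022_jp")
text = raw.decode("iso2022_jp")

# Byte view: every byte is below 0x80, so an "8-bit byte" regex finds nothing.
bytes_are_7bit = all(b < 0x80 for b in raw)
byte_match = re.search(rb"[\x80-\xff]", raw)

# Character view: the decoded code point lies outside ASCII, so this matches.
char_match = re.search(r"[^\x00-\x7f]", text)

print(bytes_are_7bit, byte_match is None, char_match is not None)
```

The match only succeeds once the bytes have been decoded with the right charset, which suggests the regex engine must really be operating in ISO-2022-JP for a "not 7-bit ASCII" pattern to help.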
2005-03-26 00:00:53 windhamg (at) email (dot) arizona (dot) edu Comment #6
Well, I tried the '[^\x00-\x7f]' regex pattern in is8bit(), but no 
dice.  I may be speaking ignorantly (in fact, it's very likely) but, 
even though we are using a multibyte-aware regex function, this 
character set (ISO-2022-JP) *is still* a 7-bit character set.  How are 
we going to find byte values in the range [\x80-\xff] in a 7-bit-byte 
character set?

I'm starting to think this is a lost cause...I placed some diagnostic 
output in the String::regexMatch function and see that, even though 
the $charset being passed in is "ISO-2022-JP", the resultant 
mb_regex_encoding() is "EUC-JP".

IMHO, the root of this problem is that the MIME::encode function 
claims to "Encode a string containing non-ASCII characters according 
to RFC 2047", while it actually only encodes strings containing 
8-bit characters.  Since non-8bit does not always imply ASCII, we 
need to find a good test of "ASCII-ness".  I can test for ISO-2022-JP 
using a regex like '\x1b[\(\$]', but it would be nicer to have a more 
general test (if one exists) for non-ASCII 7-bit encodings.
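
A minimal sketch of such a combined test, in Python rather than PHP: it treats a header value as needing RFC 2047 encoding if it contains either an 8-bit byte or the ISO-2022-JP escape pattern quoted above. The name `needs_rfc2047` is hypothetical, and this covers only ISO-2022-style escapes, not every 7-bit non-ASCII encoding.

```python
import re

# The escape pattern from the comment above: ESC followed by "(" or "$".
ISO2022_ESCAPE = re.compile(rb"\x1b[($]")
# Any byte with the high bit set.
EIGHT_BIT = re.compile(rb"[\x80-\xff]")


def needs_rfc2047(raw: bytes) -> bool:
    """Hypothetical check: True if raw is not plain 7-bit ASCII text."""
    return bool(EIGHT_BIT.search(raw) or ISO2022_ESCAPE.search(raw))


print(needs_rfc2047(b"Plain subject"))                # False
print(needs_rfc2047("テスト".encode("iso2022_jp")))    # True: escape sequence
print(needs_rfc2047("café".encode("latin-1")))        # True: 8-bit byte
```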
2005-03-25 20:26:17 Michael Slusarz Comment #5
A couple issues with your patch:

1) We shouldn't be dealing with mb_* functions in MIME - these should 
be exclusively in String:: or elsewhere.

2) Any multibyte check should be done in MIME::is8bit() instead of 

3) The code seems to indicate that any string that is autodetected as 
not 'ASCII'  needs to be encoded.  However, what if the string is 
autodetected as 'UTF-8'?  If the UTF-8 characters are all in the ASCII 
range, then no encoding is required.

4) Multibyte characters will *not* be returned as 7-bit ASCII text 
from the mb_ereg_*() functions.  Since this function is multibyte aware, 
it will know to combine consecutive multibyte bytes together to form 
the character.  I think the issue is that we are only looking for the 
8-bit  characters in the Regex.  We are not looking for 7-bit 
characters **or multibyte characters**.  Therefore, we should probably 
just change the regex to search for "Not 7-bit ASCII characters" 
instead of searching for "8-bit characters".

Could you try changing the regex in MIME::is8bit() to "[^\x00-\x7f]" 
and see if that fixes things?
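
Point 3 can be illustrated with a short Python sketch (not Horde code): a string labelled UTF-8 whose characters are all in the ASCII range produces exactly the same bytes as us-ascii, so the charset label alone is no reason to encode. The helper `is_ascii_only` and the sample strings are hypothetical.

```python
def is_ascii_only(text: str) -> bool:
    """True when every character is in the range U+0000..U+007F."""
    return all(ord(ch) <= 0x7F for ch in text)


subject = "Weekly report"

# Same bytes under both charsets, so no RFC 2047 encoding is required.
same_bytes = subject.encode("utf-8") == subject.encode("ascii")

print(is_ascii_only(subject), same_bytes)  # True True
print(is_ascii_only("Wöchentlich"))        # False: contains a non-ASCII character
```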
2005-03-25 19:27:12 windhamg (at) email (dot) arizona (dot) edu Comment #4
New Attachment: MIME.php.diff
Here's a patch for framework/MIME/MIME.php that seems to have fixed 
the problem on my system.  I'm not 100% sure that it doesn't introduce 
any side effects, but I tested it with several character sets, and it 
appears to do the "right thing".
2005-03-25 06:18:23 windhamg (at) email (dot) arizona (dot) edu Comment #3
I applied this patch (the revised version that Chuck committed a few 
minutes ago), but it did not fix my problem.  Although ISO-2022-JP is 
a multibyte character set, it consists of only 7-bit bytes--so the 
String::regexMatch() call returns an empty array, the is8bit() check 
subsequently returns FALSE, and the RFC2047 encoding is not performed.
2005-03-25 00:42:14 Michael Slusarz Assigned to Michael Slusarz
2005-03-25 00:42:00 Michael Slusarz Comment #2
State ⇒ Feedback
2005-03-25 00:16:17 windhamg (at) email (dot) arizona (dot) edu Comment #1
Type ⇒ Bug
State ⇒ Unconfirmed
Priority ⇒ 2. Medium
Summary ⇒ non-ASCII 7-bit message headers not RFC2047-encoded
Queue ⇒ IMP
When using a non-ASCII, 7bit "sending charset" (such as ISO-2022-JP) 
the message headers are not being properly encoded, per RFC2047.  The 
MIME::encode() function appears to be using only the "is8bit" check in 
deciding to encode the text, regardless of whether or not it's ASCII.

The result of this is that the resulting mail headers end up being 
displayed as "raw" ISO-2022-JP text, which is "gibberish" to the user.
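
For reference, here is a rough Python sketch of what an RFC 2047 encoded-word for ISO-2022-JP text looks like. It is an illustration, not the MIME::encode() implementation: `encode_word` is a hypothetical helper that wraps the text in a single Base64 ("B") encoded-word and ignores the 75-character limit per encoded-word, so it only suits short values.

```python
import base64
from email.header import decode_header


def encode_word(text: str, charset: str = "iso-2022-jp") -> str:
    """Hypothetical helper: wrap text in one RFC 2047 'B' encoded-word."""
    raw = text.encode(charset)                       # 7-bit bytes with escapes
    payload = base64.b64encode(raw).decode("ascii")  # header-safe ASCII
    return f"=?{charset}?B?{payload}?="


encoded = encode_word("日本語")
print(encoded)

# Round trip with the standard library to confirm the header is decodable.
raw_bytes, charset = decode_header(encoded)[0]
decoded = raw_bytes.decode(charset)
print(decoded)
```

A mail client that sees the encoded-word knows which charset to apply, whereas the raw ISO-2022-JP bytes in a header carry no such label.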
