6.0.0-beta1
7/8/25

[#1621] non-ASCII 7-bit message headers not RFC2047-encoded
Summary non-ASCII 7-bit message headers not RFC2047-encoded
Queue IMP
Queue Version HEAD
Type Bug
State Resolved
Priority 2. Medium
Owners slusarz (at) horde (dot) org
Requester windhamg (at) email (dot) arizona (dot) edu
Created 03/25/2005
Due
Updated 10/06/2008
Assigned 09/30/2008
Resolved 10/06/2008
Github Issue Link
Github Pull Request
Milestone
Patch No

History
10/06/2008 04:09:31 PM Michael Slusarz Comment #18
State ⇒ Resolved
That's a weird check, and I have no idea what it is checking for, but 
other software apps do the same thing so I will take it on faith that 
it is doing what it is supposed to.  Fixed in Horde 3.3.1 and HEAD.
10/06/2008 04:08:25 PM CVS Commit Comment #17
10/06/2008 04:07:25 PM CVS Commit Comment #16
09/30/2008 05:07:09 AM Chuck Hagenbuch State ⇒ Assigned
 
09/30/2008 02:23:46 AM hiromi (at) tac (dot) tsukuba (dot) ac (dot) jp Comment #15
The last change to MIME.php has a side effect when $charset is
iso-2022-jp: it encodes US-ASCII strings too.

Here is a sample; the User-Agent value is the plain-ASCII string
"Internet Messaging Program (IMP) H3 (4.3)" (sent by IMP H3 (4.3)):

---

Content-Type: text/plain;
         charset=ISO-2022-JP;
         DelSp*="iso-2022-jp''Yes";
         format*="iso-2022-jp''flowed"
User-Agent: =?iso-2022-jp?b?SW50ZXJuZXQg?=
         =?iso-2022-jp?b?TWVzc2FnaW5nIA==?= =?iso-2022-jp?b?UHJvZ3JhbSA=?=
         =?iso-2022-jp?b?KElNUCkg?= =?iso-2022-jp?b?SDMg?=
         =?iso-2022-jp?b?KDQuMyk=?=

---
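Decoding the base64 payloads of the User-Agent encoded words in the sample confirms that they carry nothing but ASCII. This is a minimal sketch, not Horde's decoder: the helper `decodeBWord()` is hypothetical and assumes each word is a well-formed `=?charset?b?...?=` token.

```php
<?php
// Encoded words copied from the User-Agent header in the sample.
$words = [
    '=?iso-2022-jp?b?SW50ZXJuZXQg?=',
    '=?iso-2022-jp?b?TWVzc2FnaW5nIA==?=',
    '=?iso-2022-jp?b?UHJvZ3JhbSA=?=',
    '=?iso-2022-jp?b?KElNUCkg?=',
    '=?iso-2022-jp?b?SDMg?=',
    '=?iso-2022-jp?b?KDQuMyk=?=',
];

// Hypothetical helper: extract the payload of a "B"-encoded word and
// base64-decode it (strict mode, so malformed input is rejected).
function decodeBWord(string $word): string
{
    if (!preg_match('/^=\?[^?]+\?[bB]\?([^?]*)\?=$/', $word, $m)) {
        throw new InvalidArgumentException("Not a B-encoded word: $word");
    }
    return base64_decode($m[1], true);
}

// RFC 2047 ignores whitespace between adjacent encoded words,
// so the decoded pieces are simply concatenated.
$decoded = implode('', array_map('decodeBWord', $words));

echo $decoded, PHP_EOL; // Internet Messaging Program (IMP) H3 (4.3)

// No byte >= 0x80 and no ESC (0x1B): the value never needed encoding.
var_dump(preg_match('/[^\x00-\x7f]|\x1b/', $decoded) === 0); // bool(true)
```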



Please check the contents of the string as well, like this:

    (stristr('iso-2022-jp', $charset) && strstr($string, "\x1b\$B"))


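The suggested condition can be wrapped as a small predicate to see how it behaves on the sample header and on real Japanese text. This is a sketch under stated assumptions: `iso2022jpNeedsEncoding()` is a hypothetical name, not part of Horde's MIME API, and the test string is "日本" written as ISO-2022-JP bytes (ESC $ B, 46 7C 4B 5C, ESC ( B).

```php
<?php
// Hypothetical predicate modelled on the condition above: for ISO-2022-JP,
// encode only if the string contains ESC $ B, the sequence that switches
// into two-byte JIS X 0208 characters.
function iso2022jpNeedsEncoding(string $string, string $charset): bool
{
    // Same argument order as the suggestion: stristr() searches for
    // $charset inside the literal 'iso-2022-jp'.
    return stristr('iso-2022-jp', $charset) !== false
        && strpos($string, "\x1b\$B") !== false;
}

$userAgent = 'Internet Messaging Program (IMP) H3 (4.3)';
$japanese  = hex2bin('1b2442467c4b5c1b2842'); // "日本" in ISO-2022-JP

var_dump(iso2022jpNeedsEncoding($userAgent, 'ISO-2022-JP')); // bool(false): leave as is
var_dump(iso2022jpNeedsEncoding($japanese, 'ISO-2022-JP'));  // bool(true): RFC 2047-encode
```

Because of the argument order, any substring of 'iso-2022-jp' (for example 'jp') would also pass the charset half of the test; `strcasecmp($charset, 'iso-2022-jp') === 0` would express an exact match directly.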
07/26/2008 11:43:38 PM Michael Slusarz Comment #14
State ⇒ Resolved
My changes seem to work well, so I'm marking this as resolved.
07/25/2008 10:05:52 PM windhamg (at) email (dot) arizona (dot) edu Comment #10
Wow... 3 1/2 years is a long time. :)

I've moved on to a different role in our organization and no longer
work with Horde/IMP; also, I believe our existing Horde environment
is horribly out of date, so I don't think we'll be able to test this
patch.

Thanks anyway!
07/25/2008 09:35:25 PM Michael Slusarz Comment #9
State ⇒ Feedback
Reviving from the dead... how does this patch look/work?
07/25/2008 09:32:52 PM CVS Commit Comment #8
06/14/2005 12:24:36 AM Michael Slusarz State ⇒ Stalled
 
03/30/2005 06:53:27 PM Michael Slusarz Comment #7
So my reply, which will attempt to battle yours for ignorance :)

I do understand that ISO-2022-JP is a 7-bit charset in that every
individual byte is in the range 00-7f (hex). However, the charset
obviously uses the presence of an escape character to indicate that
consecutive bytes need to be combined to form a character.

Therefore, it is my understanding that the mb_ereg_*() functions
_should_ somehow be able to return a multibyte character where the
charset-unaware preg_*() functions will not. Example:

String: ESCAPE_CHARACTER MB_CHAR_1 MB_CHAR_2

This string has three bytes, all in the range 00-7f. Therefore, a
preg_*() match will see this string as three 7-bit characters, and
is8bit() will return false.

However, mb_ereg() should interpret this string as a single character
encoded in two bytes. Therefore a search for 00-7f *should* fail,
since the character is actually something more like 2e3f (hex). Even
though the underlying string is entirely 7-bit, mb_ereg() should be
applying the regex to the "actual" representation of the string.

All of this tells me that the problem is probably in the regex, which
is causing the multibyte character not to be recognized. I would
think a regex like "/.{1}/" would match "ESCAPE_CHARACTER" for preg
and the Japanese character for ereg(). However, I haven't yet figured
out a way to do this in a single regex. Input from anyone with
ereg()-style regex experience would be appreciated.
03/26/2005 12:00:53 AM windhamg (at) email (dot) arizona (dot) edu Comment #6
Well, I tried the '[^\x00-\x7f]' regex pattern in is8bit(), but no
dice. I may be speaking ignorantly (in fact, it's very likely), but
even though we are using a multibyte-aware regex function, this
character set (ISO-2022-JP) *is still* a 7-bit character set. How are
we going to find byte values in the range [\x80-\xff] in a 7-bit
character set?

I'm starting to think this is a lost cause... I placed some diagnostic
output in the String::regexMatch() function and see that, even though
the $charset being passed in is "ISO-2022-JP", the resulting
mb_regex_encoding() is "EUC-JP".

IMHO, the root of this problem is that the MIME::encode() function
claims to "Encode a string containing non-ASCII characters according
to RFC 2047", while it actually encodes only strings containing 8-bit
characters. Since 7-bit does not always imply ASCII, we need a good
test of "ASCII-ness". I can test for ISO-2022-JP using a regex like
'\x1b[\(\$]', but it would be nicer to have a more general test (if
one exists) for non-ASCII 7-bit encodings.
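The escape-sequence test proposed here can be sketched as follows. Assumptions: `hasIso2022Escape()` is a hypothetical helper, and "日本" is written out as raw bytes in ISO-2022-JP and EUC-JP. A string that matches contains no byte >= 0x80, yet it carries non-ASCII text.

```php
<?php
// Hypothetical helper: detect an ISO-2022 designation escape, i.e. ESC
// followed by "(" or "$", using the pattern proposed above.
function hasIso2022Escape(string $s): bool
{
    return preg_match('/\x1b[\(\$]/', $s) === 1;
}

$ascii     = 'Internet Messaging Program (IMP) H3 (4.3)';
$iso2022jp = hex2bin('1b2442467c4b5c1b2842'); // "日本": ESC $ B ... ESC ( B
$eucjp     = hex2bin('c6fccbdc');             // "日本" in EUC-JP (8-bit, no ESC)

var_dump(hasIso2022Escape($ascii));     // bool(false)
var_dump(hasIso2022Escape($iso2022jp)); // bool(true)
var_dump(hasIso2022Escape($eucjp));     // bool(false): an 8-bit test catches this one
```

This test only covers escape-based 7-bit encodings; it says nothing about other 7-bit schemes, which is why the comment asks for a more general test.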
03/25/2005 08:26:17 PM Michael Slusarz Comment #5
A couple of issues with your patch:

1) We shouldn't be dealing with mb_* functions in MIME; these should
be exclusively in String:: or elsewhere.

2) Any multibyte check should be done in MIME::is8bit() instead of
MIME::encode().

3) The code seems to indicate that any string that is autodetected as
not 'ASCII' needs to be encoded. However, what if the string is
autodetected as 'UTF-8'? If the UTF-8 characters are all in the ASCII
range, then no encoding is required.

4) Multibyte characters will *not* be returned as 7-bit ASCII text
from the mb_ereg_*() functions. Since these functions are multibyte
aware, they will know to combine consecutive multibyte bytes to form
the character. I think the issue is that we are only looking for
8-bit characters in the regex. We are not looking for 7-bit
characters **or multibyte characters**. Therefore, we should probably
just change the regex to search for "not 7-bit ASCII characters"
instead of "8-bit characters".

Could you try changing the regex in MIME::is8bit() to "[^\x00-\x7f]"
and see if that fixes things?
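At the byte level the two character classes are equivalent, which is consistent with the "no dice" result reported in Comment #6. The sketch below uses preg_match() without the /u modifier (byte mode); it does not reproduce String::regexMatch() or mb_ereg() running under an EUC-JP regex encoding.

```php
<?php
// Compare the two candidate character classes on every possible byte value.
$differences = [];
for ($b = 0; $b <= 255; $b++) {
    $byte        = chr($b);
    $eightBit    = preg_match('/[\x80-\xff]/', $byte) === 1;
    $notSevenBit = preg_match('/[^\x00-\x7f]/', $byte) === 1;
    if ($eightBit !== $notSevenBit) {
        $differences[] = $b;
    }
}
echo count($differences), PHP_EOL; // 0: both classes match exactly bytes 0x80-0xFF

// Consequently neither pattern fires on ISO-2022-JP text, which is all 7-bit.
$iso2022jp = hex2bin('1b2442467c4b5c1b2842'); // "日本" in ISO-2022-JP
var_dump(preg_match('/[^\x00-\x7f]/', $iso2022jp)); // int(0)
```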
03/25/2005 07:27:12 PM windhamg (at) email (dot) arizona (dot) edu Comment #4
New Attachment: MIME.php.diff
Here's a patch for framework/MIME/MIME.php that seems to have fixed 
the problem on my system.  I'm not 100% sure that it doesn't introduce 
any side effects, but I tested it with several character sets, and it 
appears to do the "right thing".
03/25/2005 06:18:23 AM windhamg (at) email (dot) arizona (dot) edu Comment #3
I applied this patch (the revised version that Chuck committed a few
minutes ago), but it did not fix my problem. Although ISO-2022-JP is
a multibyte character set, it consists only of 7-bit bytes, so the
String::regexMatch() call returns an empty array, the is8bit() check
subsequently returns FALSE, and the RFC 2047 encoding is not performed.
03/25/2005 12:42:14 AM Michael Slusarz Assigned to Michael Slusarz
 
03/25/2005 12:42:00 AM Michael Slusarz Comment #2
State ⇒ Feedback
03/25/2005 12:16:17 AM windhamg (at) email (dot) arizona (dot) edu Comment #1
Priority ⇒ 2. Medium
State ⇒ Unconfirmed
Queue ⇒ IMP
Type ⇒ Bug
Summary ⇒ non-ASCII 7-bit message headers not RFC2047-encoded
When using a non-ASCII, 7-bit "sending charset" (such as ISO-2022-JP),
the message headers are not being encoded per RFC 2047. The
MIME::encode() function appears to use only the is8bit() check when
deciding whether to encode the text, regardless of whether or not the
text is ASCII.

As a result, the mail headers are displayed as "raw" ISO-2022-JP
text, which is gibberish to the user.
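To make the report concrete, here is a minimal sketch of a byte-oriented 8-bit check applied to "日本" in two Japanese encodings. The `is8bit()` below is a hypothetical reimplementation modelled on the check described in this bug, not Horde's actual MIME::is8bit(), and the byte values are written out in hex.

```php
<?php
// Hypothetical byte-oriented 8-bit check: true if any byte is >= 0x80.
function is8bit(string $s): bool
{
    return preg_match('/[\x80-\xff]/', $s) === 1;
}

$iso2022jp = hex2bin('1b2442467c4b5c1b2842'); // ESC $ B, 46 7C 4B 5C, ESC ( B
$eucjp     = hex2bin('c6fccbdc');             // C6 FC CB DC

var_dump(is8bit($eucjp));     // bool(true): would be RFC 2047-encoded
var_dump(is8bit($iso2022jp)); // bool(false): left as raw ISO-2022-JP text
```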
