Summary | non-ASCII 7-bit message headers not RFC2047-encoded |
Queue | IMP |
Queue Version | HEAD |
Type | Bug |
State | Resolved |
Priority | 2. Medium |
Owners | slusarz (at) horde (dot) org |
Requester | windhamg (at) email (dot) arizona (dot) edu |
Created | 03/25/2005 (7410 days ago) |
Due | |
Updated | 10/06/2008 (6119 days ago) |
Assigned | 09/30/2008 (6125 days ago) |
Resolved | 10/06/2008 (6119 days ago) |
Github Issue Link | |
Github Pull Request | |
Milestone | |
Patch | No |
State ⇒ Resolved
other software apps do the same thing so I will take it on faith that
it is doing what it is supposed to. Fixed in Horde 3.3.1 and HEAD.
http://cvs.horde.org/diff.php/framework/MIME/MIME.php?r1=1.139.4.43&r2=1.139.4.44&ty=u
http://cvs.horde.org/diff.php/framework/MIME/MIME.php?r1=1.207&r2=1.208&ty=u
iso-2022-jp,
it encodes us-ascii strings too.
This is a sample (User-Agent is Internet Messaging Program (IMP) H3 (4.3)).
---
Content-Type: text/plain;
charset=ISO-2022-JP;
DelSp*="iso-2022-jp''Yes";
format*="iso-2022-jp''flowed"
User-Agent: =?iso-2022-jp?b?SW50ZXJuZXQg?=
=?iso-2022-jp?b?TWVzc2FnaW5nIA==?= =?iso-2022-jp?b?UHJvZ3JhbSA=?=
=?iso-2022-jp?b?KElNUCkg?= =?iso-2022-jp?b?SDMg?=
=?iso-2022-jp?b?KDQuMyk=?=
---
Please check the contents of the string, like this:
(stristr('iso-2022-jp', $charset) && strstr($string, "\x1b\$B"))
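A minimal sketch of that check, for illustration only. The helper name hasJisKanjiEscape() is hypothetical and not part of Horde, and it assumes that finding one of the JIS X 0208 escape sequences (ESC $ B or ESC $ @) is enough to decide that an iso-2022-jp string really carries Japanese text:

```php
<?php
// Hypothetical helper, not part of Horde's MIME class.
function hasJisKanjiEscape(string $string, string $charset): bool
{
    if (strcasecmp($charset, 'iso-2022-jp') !== 0) {
        return false;
    }
    // ESC $ B selects JIS X 0208-1983, ESC $ @ selects JIS X 0208-1978.
    return strpos($string, "\x1b\$B") !== false
        || strpos($string, "\x1b\$@") !== false;
}

// The first encoded-word of the sample User-Agent decodes to plain ASCII.
$decoded = base64_decode('SW50ZXJuZXQg');
var_dump($decoded);                                   // string(9) "Internet "
var_dump(hasJisKanjiEscape($decoded, 'iso-2022-jp')); // bool(false)

// ISO-2022-JP bytes that do switch to JIS X 0208.
var_dump(hasJisKanjiEscape("\x1b\$BF|K\\8l\x1b(B", 'ISO-2022-JP')); // bool(true)
```

With a check along these lines, the User-Agent header in the sample would be left unencoded, since it never leaves the ASCII state.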
State ⇒ Resolved
http://cvs.horde.org/diff.php/framework/MIME/MIME.php?r1=1.139.4.40&r2=1.139.4.41&ty=u
http://cvs.horde.org/diff.php/framework/MIME/MIME/Message.php?r1=1.76.10.17&r2=1.76.10.18&ty=u
http://lists.horde.org/archives/cvs/Week-of-Mon-20080721/081444.html
http://cvs.horde.org/diff.php/framework/MIME/MIME.php?r1=1.200&r2=1.201&ty=u
http://cvs.horde.org/diff.php/framework/MIME/MIME/Message.php?r1=1.100&r2=1.101&ty=u
I've moved on to a different role in our organization, and don't work
with Horde/IMP any longer; also, I believe our existing Horde
environment is horribly out-of-date...so I don't think we'll be able
to test this patch.
Thanks anyways!
State ⇒ Feedback
http://cvs.horde.org/diff.php/framework/MIME/MIME.php?r1=1.198&r2=1.199&ty=u
I do understand that ISO-2022-JP is a 7-bit charset in that any
individual byte is in the range 00-7f (hex). However, obviously, the
charset uses the presence of an escape character to indicate that
consecutive bytes need to be combined to properly form the character.
Therefore, it is my understanding that the mb_ereg_*() functions
_should_ somehow be able to return a multibyte character when the
non-charset preg_*() functions will not. Example:
String: ESCAPE_CHARACTER MB_CHAR_1 MB_CHAR_2
This string has three bytes. All three bytes are in the range 00-7f.
Therefore, doing a preg_*() match will result in this string appearing
to be 3 7bit characters - thus, is8bit() will return false.
However, to mb_ereg() this string should be interpreted as a
single-character, two-byte string. Therefore a search for 00-7f *should* fail
since the character is actually something more like 2e3f (hex). Even
though the underlying string is entirely 7bit, mb_ereg() should be
applying the regex to the "actual" representation of the string.
All of this goes to tell me that it is probably an error with the
regex which is causing the multibyte character to not be recognized.
I would think a regex like "/.{1}/" would match "ESCAPE_CHARACTER" for
preg and "Japanese character" for ereg(). However, I haven't yet
figured out a way to do this in a single regex. Anyone with ereg()
style regex experience that could chime in would be appreciated.
dice. I may be speaking ignorantly (in fact, it's very likely) but,
even though we are using a multibyte-aware regex function, this
character set (ISO-2022-JP) *is still* a 7-bit character set. How are
we going to find byte values in the range [\x80-\xff] in a 7-bit-byte
character set?
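To illustrate the point, here is a small sketch using plain preg_match(), not the String::regexMatch() wrapper discussed in this ticket. The byte string is ISO-2022-JP for three kanji, written out by hand. When matching byte by byte, neither the "8-bit" class nor the "not 7-bit" class finds anything in it:

```php
<?php
// ISO-2022-JP bytes: ESC $ B, three two-byte JIS X 0208 characters, ESC ( B.
$jis = "\x1b\$BF|K\\8l\x1b(B";
// The same three characters in UTF-8, for comparison.
$utf8 = "\u{65E5}\u{672C}\u{8A9E}";

// Without the /u modifier, preg_match() matches byte by byte.
var_dump(preg_match('/[\x80-\xff]/', $jis));   // int(0)
var_dump(preg_match('/[^\x00-\x7f]/', $jis));  // int(0)
var_dump(preg_match('/[\x80-\xff]/', $utf8));  // int(1)
```

On raw bytes the two character classes are equivalent, so swapping one for the other cannot detect ISO-2022-JP text by itself.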
I'm starting to think this is a lost cause...I placed some diagnostic
output in the String::regexMatch function and see that, even though
the $charset being passed in is "ISO-2022-JP", the resultant
mb_regex_encoding() is "EUC-JP".
IMHO, the root of this problem is that the MIME::encode function
claims to "Encode a string containing non-ASCII characters according
to RFC 2047", while it actually only encodes strings containing
8-bit characters. Since 7-bit does not always imply ASCII, we
need to find a good test of "ASCII-ness". I can test for ISO-2022-JP
using a regex like '\x1b[\(\$]', but it would be nicer to have a more
general test (if one exists) for non-ASCII 7-bit encodings.
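One possible general test, sketched under assumptions: convert the string from its declared charset to UTF-8 and then look for 8-bit bytes, since in UTF-8 every non-ASCII character uses bytes of 0x80 or above. The helper name hasNonAsciiCharacters() is hypothetical, the mbstring extension is assumed to be available, and this is not necessarily what the committed fix does:

```php
<?php
// Hypothetical helper, not part of Horde. Requires the mbstring extension.
function hasNonAsciiCharacters(string $string, string $charset): bool
{
    // Decode from the declared charset; escape sequences are consumed here.
    $utf8 = mb_convert_encoding($string, 'UTF-8', $charset);
    // In UTF-8, only non-ASCII characters produce bytes >= 0x80.
    return preg_match('/[\x80-\xff]/', $utf8) === 1;
}

// Pure ASCII text labelled as ISO-2022-JP: no encoding needed.
var_dump(hasNonAsciiCharacters('Internet Messaging Program', 'ISO-2022-JP')); // bool(false)

// Real ISO-2022-JP text made only of 7-bit bytes: encoding needed.
var_dump(hasNonAsciiCharacters("\x1b\$BF|K\\8l\x1b(B", 'ISO-2022-JP'));       // bool(true)
```

This trades the charset-specific escape-sequence regex for a conversion step, so it only works for charsets that mbstring knows how to decode.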
1) We shouldn't be dealing with mb_* functions in MIME - these should
be exclusively in String:: or elsewhere.
2) Any multibyte check should be done in MIME::is8bit() instead of
MIME::encode()
3) The code seems to indicate that any string that is autodetected as
not 'ASCII' needs to be encoded. However, what if the string is
autodetected as 'UTF-8'? If the UTF-8 characters are all in the ASCII
range, then no encoding is required.
4) Multibyte characters will *not* be returned as 7-bit ASCII text
from the mb_ereg_*() functions. Since these functions are multibyte aware,
they will know to combine consecutive multibyte bytes together to form
the character. I think the issue is that we are only looking for the
8-bit characters in the Regex. We are not looking for 7-bit
characters **or multibyte characters**. Therefore, we should probably
just change the regex to search for "Not 7-bit ASCII characters"
instead of searching for "8-bit characters".
Could you try changing the regex in MIME::is8bit() to "[^\x00-\x7f]"
and see if that fixes things?
New Attachment: MIME.php.diff
the problem on my system. I'm not 100% sure that it doesn't introduce
any side effects, but I tested it with several character sets, and it
appears to do the "right thing".
minutes ago), but it did not fix my problem. Although ISO-2022-JP is
a multibyte character set, it consists of only 7-bit bytes--so the
String::regexMatch() call returns an empty array, the is8bit() check
subsequently returns FALSE, and the RFC2047 encoding is not performed.
State ⇒ Feedback
http://cvs.horde.org/diff.php/framework/MIME/MIME.php?r1=1.143&r2=1.144&ty=u
Priority ⇒ 2. Medium
State ⇒ Unconfirmed
Queue ⇒ IMP
Type ⇒ Bug
Summary ⇒ non-ASCII 7-bit message headers not RFC2047-encoded
the message headers are not being properly encoded, per RFC2047. The
MIME::encode() function appears to be using only the "is8bit" check in
deciding to encode the text, regardless of whether or not it's ASCII.
The result of this is that the resulting mail headers end up being
displayed as "raw" ISO-2022-JP text, which is "gibberish" to the user.
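For reference, a rough sketch of the RFC 2047 form such a header value would be expected to take. The helper encodedWord() is hypothetical, not Horde's MIME::encode(), and it ignores the 75-character limit on encoded-words and any line folding:

```php
<?php
// Hypothetical helper; builds a single base64 ("B") encoded-word and
// ignores the 75-character limit that RFC 2047 places on encoded-words.
function encodedWord(string $bytes, string $charset): string
{
    return '=?' . $charset . '?B?' . base64_encode($bytes) . '?=';
}

// Raw ISO-2022-JP bytes (all 7-bit) for three kanji.
$raw = "\x1b\$BF|K\\8l\x1b(B";

echo encodedWord($raw, 'ISO-2022-JP'), "\n";
// prints: =?ISO-2022-JP?B?GyRCRnxLXDhsGyhC?=
```

A mail client that understands RFC 2047 can decode this back to the original characters, whereas the raw bytes contain escape sequences that show up as gibberish.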