6.0.0-alpha14
6/24/25

[#12127] utf-16 encoded text files MIME part header broken
Summary utf-16 encoded text files MIME part header broken
Queue IMP
Queue Version 6.0.4
Type Bug
State Resolved
Priority 2. Medium
Owners slusarz (at) horde (dot) org
Requester janne.peltonen (at) helsinki (dot) fi
Created 03/18/2013 (4481 days ago)
Due
Updated 03/28/2013 (4471 days ago)
Assigned 03/27/2013 (4472 days ago)
Resolved 03/28/2013 (4471 days ago)
Github Issue Link
Github Pull Request
Milestone
Patch No

History
03/28/2013 07:54:41 AM vmkari (at) cc (dot) helsinki (dot) fi Comment #11
This bug is fixed.
Many thanks for solving this problem.
03/28/2013 02:53:40 AM Michael Slusarz Comment #10
State ⇒ Resolved
This bug is fixed. (I had sort of forgotten that null characters WERE
valid in RFC 822, at least in a quoted string, but were removed in
RFC 2822. So I should go and fix that up also.)

All of this should be in Horde_Mime 2.1.0.
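A minimal Python sketch of the quoted-string rules mentioned above, transcribed from the RFC 822 and RFC 2822 qtext grammars. The function names are hypothetical and are not Horde_Mime API; the point is only that NUL (0x00) passes the RFC 822 rule but fails the RFC 2822 one.

```python
def is_qtext_822(b: int) -> bool:
    """RFC 822: qtext = any CHAR (0-127) except '"', '\\' and CR.

    NUL (0x00) is a CHAR, so it is allowed here.
    """
    return 0 <= b <= 127 and b not in (0x22, 0x5C, 0x0D)


def is_qtext_2822(b: int) -> bool:
    """RFC 2822: qtext = NO-WS-CTL / %d33 / %d35-91 / %d93-126.

    NO-WS-CTL = %d1-8 / %d11 / %d12 / %d14-31 / %d127, so NUL is excluded.
    """
    no_ws_ctl = 1 <= b <= 8 or b in (11, 12) or 14 <= b <= 31 or b == 127
    return no_ws_ctl or b == 33 or 35 <= b <= 91 or 93 <= b <= 126


value = "a\x00b".encode("ascii")
print(all(is_qtext_822(b) for b in value))   # True
print(all(is_qtext_2822(b) for b in value))  # False
```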
03/28/2013 02:46:27 AM Git Commit Comment #9
Changes have been made in Git (master):

commit da81a37bbd0c14338746efcafc5d3670d27e80ef
Author: Michael M Slusarz <slusarz@horde.org>
Date:   Wed Mar 27 20:43:21 2013 -0600

     [mms] More accurate/comprehensive Horde_Mime::is8bit() check (Bug #12127).

  framework/Mime/lib/Horde/Mime.php |   25 +++++++------------------
  framework/Mime/package.xml        |    2 ++
  2 files changed, 9 insertions(+), 18 deletions(-)

http://git.horde.org/horde-git/-/commit/da81a37bbd0c14338746efcafc5d3670d27e80ef
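To illustrate why an 8-bit check needs to be more than a high-bit test: ASCII text encoded as UTF-16LE contains no bytes above 0x7F at all, only interleaved NULs. A minimal sketch with hypothetical function names (this is not the actual Horde_Mime::is8bit() code, which is not shown in this ticket):

```python
def has_high_bit(data: bytes) -> bool:
    """Naive 8-bit test: True only if some byte is >= 0x80."""
    return any(b >= 0x80 for b in data)


def needs_encoding(data: bytes) -> bool:
    """Stricter test: True if data is not plain 7-bit header text.

    Flags bytes >= 0x80, DEL (0x7F), and every C0 control except HTAB,
    so NUL, CR and LF are all caught.
    """
    return any(b >= 0x7F or (b < 0x20 and b != 0x09) for b in data)


name = "maalampo1b.txt".encode("utf-16-le")
print(has_high_bit(name))    # False: every byte is below 0x80
print(needs_encoding(name))  # True: the 0x00 bytes are caught
```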
03/28/2013 01:58:47 AM Git Commit Comment #8
Changes have been made in Git (master):

commit 537347aa29977d7337d8afbe597f663d70277a30
Author: Michael M Slusarz <slusarz@horde.org>
Date:   Wed Mar 27 19:54:26 2013 -0600

     [mms] Need to always add charset information to MIME encoded 
parameters if they are not displayable in pure US-ASCII (Bug #12127).

  framework/Mime/lib/Horde/Mime.php           |   10 ++++++----
  framework/Mime/package.xml                  |    1 +
  framework/Mime/test/Horde/Mime/MimeTest.php |   19 +++++++++++++++++++
  3 files changed, 26 insertions(+), 4 deletions(-)

http://git.horde.org/horde-git/-/commit/537347aa29977d7337d8afbe597f663d70277a30
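One standard way to carry the charset with a parameter value that is not plain US-ASCII is the RFC 2231 extended syntax (name*=charset'language'percent-encoded-value). A minimal sketch using Python's email.utils.encode_rfc2231(); the format_param() helper is hypothetical and only illustrates the rule stated in the commit message, not Horde's implementation.

```python
from email.utils import encode_rfc2231


def format_param(name: str, value: str, charset: str = "utf-8") -> str:
    """Render one MIME parameter as it would appear in a header.

    Printable US-ASCII values are emitted as a quoted-string with '\\' and
    '"' escaped. Anything else (non-ASCII text, NULs, other controls) uses
    the RFC 2231 extended form, which always carries the charset label.
    """
    if value.isascii() and value.isprintable():
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        return f'{name}="{escaped}"'
    # encode_rfc2231(s, charset) returns "charset''percent-encoded-bytes"
    return f"{name}*={encode_rfc2231(value, charset)}"


print(format_param("filename", "maalampo1b.txt"))
# filename="maalampo1b.txt"
print(format_param("filename", "maalämpö.txt"))
# filename*=utf-8''maal%C3%A4mp%C3%B6.txt
```

The sketch ignores RFC 2231 continuations (name*0*=, name*1*=, ...), which a real encoder would need for values too long to fit on one header line.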
03/28/2013 01:58:40 AM Git Commit Comment #7
Changes have been made in Git (master):

commit 50f1505733f3455acd897142186abe5f8f84788d
Author: Michael M Slusarz <slusarz@horde.org>
Date:   Wed Mar 27 19:52:57 2013 -0600

     [mms] Correctly quote forbidden characters in MIME parameter data 
(Bug #12127).

  framework/Mime/doc/Horde/Mime/UPGRADING     |   11 ++++++++++
  framework/Mime/lib/Horde/Mime.php           |   29 +++++++++++++++++---------
  framework/Mime/package.xml                  |   11 ++++++---
  framework/Mime/test/Horde/Mime/MimeTest.php |    8 +++++++
  4 files changed, 45 insertions(+), 14 deletions(-)

http://git.horde.org/horde-git/-/commit/50f1505733f3455acd897142186abe5f8f84788d
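For reference, RFC 2045 only lets a parameter value appear bare if it is a token; a value containing SPACE, control characters, or tspecials must be sent as a quoted-string, with '\' and '"' backslash-escaped. A minimal sketch with hypothetical helper names, assuming the value is already printable US-ASCII:

```python
# RFC 2045 tspecials: characters that force a value into a quoted-string.
TSPECIALS = set('()<>@,;:\\"/[]?=')


def is_token(value: str) -> bool:
    """True if value is a non-empty RFC 2045 token.

    A token is printable US-ASCII (0x21-0x7E) with no tspecials; this
    excludes SPACE and all control characters, including NUL.
    """
    return bool(value) and all(
        0x21 <= ord(c) <= 0x7E and c not in TSPECIALS for c in value
    )


def quote_param_value(value: str) -> str:
    """Return value bare if it is a token, else as an escaped quoted-string.

    Assumes printable US-ASCII input; other data needs RFC 2231 encoding.
    """
    if is_token(value):
        return value
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


print(quote_param_value("utf-8"))        # utf-8
print(quote_param_value("my file.txt"))  # "my file.txt"
print(quote_param_value('say "hi"'))     # "say \"hi\""
```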
03/27/2013 09:26:58 PM Michael Slusarz Comment #6
Assigned to Michael Slusarz
State ⇒ Assigned
OK... starting to make sense. We encode MIME parameters in the header
using the same charset as the text body. That's why the null
characters are in the header.

But MIME encoding should embed the charset in these headers. So
there is something wrong with that.
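A short Python illustration of the mechanism described here, assuming the body charset is UTF-16LE and using the attachment name from the original report: converting an ASCII parameter value to that charset puts a NUL after every character, which is the pattern seen in the broken header.

```python
# Hypothetical reproduction: an ASCII parameter value converted to the
# body charset (assumed UTF-16LE) before it is written into the header.
name = "maalampo1b.txt"
raw = name.encode("utf-16-le")

print(raw[:6])                        # b'm\x00a\x00a\x00'
print(raw.count(b"\x00"))             # 14: one NUL per character
print(raw[1::2] == bytes(len(name)))  # True: every odd-position byte is 0x00
```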
03/27/2013 07:27:00 AM vmkari (at) cc (dot) helsinki (dot) fi Comment #5
New Attachment: liite16.txt.gz
> At a minimum, I need an example file that triggers this behavior.
> Might want to zip/tgz before adding to the ticket to prevent the
> data from being munged.
Here's one UTF-16LE text file with a BOM, gzipped for safety. We 
appreciate your help. Thanks.

03/27/2013 06:17:59 AM Michael Slusarz Comment #4
> Those headers should never ever contain NUL bytes nor 8-bit data of
> any kind. That is the key problem. (This bug is not about the
> difficulties detecting the correct charset).
But this could still be an issue similar to what I described.
For example, the browser may send UTF-16 as the charset, but the PHP
conversion methods mishandle the text because that is NOT the charset
it really is. And/or PHP string conversion methods, such as iconv() on
certain systems, are known to be buggy with UTF-16LE.
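A minimal illustration of why a charset mismatch can go unnoticed. It uses Python rather than PHP/iconv(), so it only demonstrates the general effect: UTF-16LE bytes for ASCII text decode without error as ISO-8859-1 or UTF-8, because NUL is a valid character in both, and the NULs silently survive as text.

```python
# Text that really is UTF-16LE (what the uploaded file contains).
data = "Hello".encode("utf-16-le")

# Mislabelled as a single-byte charset or as UTF-8: neither decode raises,
# because every byte is < 0x80 and NUL is a valid character in both.
as_latin1 = data.decode("iso-8859-1")
as_utf8 = data.decode("utf-8")
print(as_latin1 == as_utf8)   # True
print(as_utf8.count("\x00"))  # 5: the NULs survive as characters

# Only the correct charset recovers the original text.
print(data.decode("utf-16-le"))  # Hello
```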

At a minimum, I need an example file that triggers this behavior.   
Might want to zip/tgz before adding to the ticket to prevent the data 
from being munged.

03/19/2013 02:10:10 PM vmkari (at) cc (dot) helsinki (dot) fi Comment #3

Many thanks, Michael, for your prompt response, but unfortunately the
reply misses the main point of the original bug report. Horde/IMP
wrongly encodes the metadata about the UTF-16LE encoded text file
attachment, i.e. the MIME headers describing the attachment end up
containing UTF-16LE data. Those headers should never ever contain NUL
bytes or 8-bit data of any kind. That is the key problem. (This bug is
not about the difficulties of detecting the correct charset.)

03/18/2013 07:48:11 PM Michael Slusarz Comment #2
State ⇒ Feedback
IIRC, this is a known limitation of uploading text data in browsers.
Browsers rarely (if ever?) include charset information in the
Content-Type of an uploaded file. So we pretty much have to assume
that the uploaded file is in the same charset as the browser (which
is assumed to be the same charset the underlying OS is using).

See, e.g., 
http://stackoverflow.com/questions/6459741/how-to-determine-if-uploaded-file-is-in-utf-8-or-utf-16

BOMs could potentially be used to fix this some of the time. But
there's no guarantee they will exist, or that they won't be
stripped/munged by some intermediate process.
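A minimal sketch of BOM sniffing with Python's codecs constants, subject to the caveats above: it only helps when a BOM is actually present, and sniff_bom() is a hypothetical helper. A None result means falling back to the browser-charset assumption.

```python
import codecs
from typing import Optional, Tuple

# Longer BOMs first: the UTF-32LE BOM (FF FE 00 00) begins with the
# UTF-16LE BOM (FF FE). A UTF-16LE file whose first character is U+0000
# would therefore be misread as UTF-32LE; that ambiguity is accepted here.
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_bom(data: bytes) -> Optional[Tuple[str, int]]:
    """Return (charset, bom_length) if data starts with a BOM, else None."""
    for bom, charset in BOMS:
        if data.startswith(bom):
            return charset, len(bom)
    return None


data = b"\xff\xfeh\x00i\x00"
charset, bom_len = sniff_bom(data)
print(charset, data[bom_len:].decode(charset))  # utf-16-le hi
print(sniff_bom(b"plain ASCII"))                # None
```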
03/18/2013 02:04:05 PM janne (dot) peltonen (at) helsinki (dot) fi Comment #1
Priority ⇒ 2. Medium
Type ⇒ Bug
Summary ⇒ utf-16 encoded text files MIME part header broken
Queue ⇒ IMP
Milestone ⇒
Patch ⇒ No
State ⇒ Unconfirmed
Hi! If I try to send a UTF-16 encoded text file (little endian, with
the byte order mark FF FE at the beginning) as an attachment, and I've
enabled saving attachments to the Sent folder with the message, IMP
creates a very strange header for the attachment's MIME part: all the
values in the header end up encoded as UTF-16 (except that the
Content-Transfer-Encoding value loses the final NUL byte). Like this:

--clip--
--=_pRVOz2N9C9oTNFcAA1MfBg1
Content-Type: text/plain; charset=u^@t^@f^@-^@1^@6^@l^@e^@;
  name=m^@a^@a^@l^@a^@m^@p^@o^@1^@b^@.^@t^@x^@t
Content-Disposition: attachment; 
filename=m^@a^@a^@l^@a^@m^@p^@o^@1^@b^@.^@t^@x^@t^@;
  size=146736
Content-Transfer-Encoding: q^@u^@o^@t^@e^@d^@-^@p^@r^@i^@n^@t^@a^@b^@l^@e
--clip--

Here, ^@ stands for the null byte. MIME part headers should be 7-bit
clean. Sendmail, at least, apparently converts this to ASCII while
sending, so the recipient gets the message as they should, but IMAP
servers handle the saved copy in various ways. Cyrus, for instance,
complains that the message contains NUL bytes and doesn't save it to
the Sent folder at all, except for a buggy version I happen to be
running...
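The ^@ notation is caret notation for control characters (0x00 shows as ^@, 0x01 as ^A, and so on, with DEL as ^?). A small hypothetical Python helper that renders raw header bytes this way reproduces the charset value shown in the header above:

```python
def caret_notation(data: bytes) -> str:
    """Render bytes with C0 controls as ^@..^_ and DEL as ^?.

    Every control is converted, including TAB (^I) and LF (^J).
    Bytes >= 0x80 are shown as \\xNN for simplicity.
    """
    out = []
    for b in data:
        if b < 0x20:
            out.append("^" + chr(b + 0x40))
        elif b == 0x7F:
            out.append("^?")
        elif b >= 0x80:
            out.append(f"\\x{b:02x}")
        else:
            out.append(chr(b))
    return "".join(out)


print(caret_notation("utf-16le".encode("utf-16-le")))
# u^@t^@f^@-^@1^@6^@l^@e^@
```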
