7/4/25

[#14618] Attachments with special file names (RFC 2231)
Summary: Attachments with special file names (RFC 2231)
Queue: IMP
Queue Version: 6.2.18
Type: Bug
State: Resolved
Priority: 2. Medium
Owners: jan (at) horde (dot) org
Requester: wahnes (at) uni-koeln (dot) de
Created: 04/18/2017 (2999 days ago)
Due:
Updated: 10/20/2017 (2814 days ago)
Assigned: 04/27/2017 (2990 days ago)
Resolved: 04/28/2017 (2989 days ago)
Github Issue Link:
Github Pull Request:
Milestone:
Patch: No

History
10/20/2017 08:33:46 PM Git Commit Comment #10
Changes have been made in Git (FRAMEWORK_5_2):

commit db0c29770f694135bd118803bafa32fa50e6a107
Author: Jan Schneider <jan@horde.org>
Date:   Fri, 28 Apr 2017 18:00:17 +0200

[jan] Fix filename charset of certain attachments (Bug #14618).

  M docs/CHANGES
  M package.xml

https://github.com/horde/imp/commit/db0c29770f694135bd118803bafa32fa50e6a107
10/20/2017 08:33:46 PM Git Commit Comment #9
Changes have been made in Git (FRAMEWORK_5_2):

commit e82bd3c0818f25beb0cfd86bc1735fe4c39086d8
Author: Jan Schneider <jan@horde.org>
Date:   Fri, 28 Apr 2017 17:59:38 +0200

The header charset for attachments is always UTF-8 (Bug #14618).

  M lib/Compose.php

https://github.com/horde/imp/commit/e82bd3c0818f25beb0cfd86bc1735fe4c39086d8
05/03/2017 04:32:51 PM wahnes (at) uni-koeln (dot) de Comment #8
Many thanks for this bugfix.

After sending test e-mails with IMP 6.2.19, it turns out that few 
receiving mail programs actually profit from this fix, mutt being one 
of the notable exceptions. Others, such as Thunderbird and 
Open-Xchange, already accepted "us-ascii" in lieu of "utf-8", and 
Outlook does not recognize filename encoding according to RFC 2231 at 
all. Sadly, the same goes for GMX's web interface: no RFC 2231 support 
there either.
05/03/2017 09:42:10 AM Git Commit Comment #7
Changes have been made in Git (master):

commit 91590bd39d92d614442afb56c48bb5c17ca1b8cb
Author: Jan Schneider <jan@horde.org>
Date:   Fri Apr 28 18:00:17 2017 +0200

     [jan] Fix filename charset of certain attachments (Bug #14618).

  imp/package.xml | 1 +
  1 file changed, 1 insertion(+)

http://github.com/horde/horde/commit/91590bd39d92d614442afb56c48bb5c17ca1b8cb
04/28/2017 04:03:26 PM Jan Schneider State ⇒ Resolved
Assigned to Jan Schneider
 
04/28/2017 04:00:29 PM Git Commit Comment #6
Changes have been made in Git (FRAMEWORK_5_2):

commit 65e2461a1f7fcc5a29080f37f90d84e0431bf0fa
Author: Jan Schneider <jan@horde.org>
Date:   Fri Apr 28 18:00:17 2017 +0200

     [jan] Fix filename charset of certain attachments (Bug #14618).

  imp/docs/CHANGES | 1 +
  imp/package.xml  | 2 ++
  2 files changed, 3 insertions(+)

http://github.com/horde/horde/commit/65e2461a1f7fcc5a29080f37f90d84e0431bf0fa
04/28/2017 04:00:28 PM Git Commit Comment #5
Changes have been made in Git (FRAMEWORK_5_2):

commit d121cc674c814ccc5d820c5e3d9e86d7028bfba1
Author: Jan Schneider <jan@horde.org>
Date:   Fri Apr 28 17:57:22 2017 +0200

     The header charset for attachments is always UTF-8 (Bug #14618).

  imp/lib/Compose.php | 3 +--
  1 file changed, 1 insertion(+), 2 deletions(-)

http://github.com/horde/horde/commit/d121cc674c814ccc5d820c5e3d9e86d7028bfba1
04/28/2017 03:57:49 PM Git Commit Comment #4
Changes have been made in Git (master):

commit 8a852241683e302f740b40c718d0140d8bb00ab5
Author: Jan Schneider <jan@horde.org>
Date:   Fri Apr 28 17:57:22 2017 +0200

     The header charset for attachments is always UTF-8 (Bug #14618).

  imp/lib/Compose.php | 3 +--
  1 file changed, 1 insertion(+), 2 deletions(-)

http://github.com/horde/horde/commit/8a852241683e302f740b40c718d0140d8bb00ab5
04/28/2017 02:06:53 PM wahnes (at) uni-koeln (dot) de Comment #3
New Attachment: testfiles-with-strange-names.tar
Here's another try with a tar file containing the said files. It opens 
OK on my Linux box with UTF-8 locale, but I don't know if there are 
any standards for the charset in a .tar file, given it's such an old 
format.

The zip file might be using UTF-16 or such, I don't know.
04/27/2017 07:35:08 PM Jan Schneider Comment #2
State ⇒ Feedback
Well, at least the file names that you had in the archive have no 
valid charset at all. This may be due to packaging, but they looked 
like double-encoded UTF-8.
04/18/2017 11:03:58 AM wahnes (at) uni-koeln (dot) de Comment #1
Patch ⇒ No
State ⇒ Unconfirmed
New Attachment: horde attachment examples rfc 2231.zip
Milestone ⇒
Queue ⇒ IMP
Summary ⇒ Attachments with special file names (RFC 2231)
Type ⇒ Bug
Priority ⇒ 2. Medium
In some cases, the RFC 2231 encoding of the file name for attached 
files is wrong, causing trouble on the receiving side of email with 
such attachments. This seems to be happening when the *content* of an 
attachment is pure ASCII but the *filename* contains non-ASCII 
characters.

Example:
Take a file named "File with a long name coñtaïning strånge 
characters but pure ASCII content.txt" that, as the name implies, 
contains only ASCII characters and is therefore sent with 
"Content-Type: text/plain". When this file is attached, the file name 
is encoded like this:

name*0*=us-ascii''File%20with%20a%20long%20name%20co%F1ta%EFning%20str%E5ng;
name*1*=e%20characters%20but%20pure%20ASCII%20content.txt

Note that the charset declared for the filename (given before the 
first single-quote character in the "name*0*" line) is "us-ascii" in 
this case. Clearly, this cannot be right, as ASCII does not contain 
the character "ñ". In fact, ASCII does not contain any character with 
a code above 0x7F, so a value that contains the octet "F1" while 
declaring "us-ascii" must be wrong. The actual charset would be 
ISO-8859-1 or similar, as it contains the "n" with tilde at position 
0xF1 (241 decimal).
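
The mismatch can be checked mechanically. The following Python sketch (not Horde code; the helper name `decodes_as` is made up for illustration) percent-decodes the first segment shown above and tests which charsets the resulting bytes are valid in.

```python
from urllib.parse import unquote_to_bytes

# Value of the name*0* segment above, without the "us-ascii''" prefix.
segment = "File%20with%20a%20long%20name%20co%F1ta%EFning%20str%E5ng"
raw = unquote_to_bytes(segment)


def decodes_as(data: bytes, charset: str) -> bool:
    """Hypothetical helper: True if the bytes are valid in the charset."""
    try:
        data.decode(charset)
        return True
    except UnicodeDecodeError:
        return False


print(decodes_as(raw, "us-ascii"))    # the charset that was declared
print(decodes_as(raw, "utf-8"))       # the charset used in the working case
print(decodes_as(raw, "iso-8859-1"))  # the charset the bytes actually fit
print(raw.decode("iso-8859-1"))
```

Only the ISO-8859-1 check succeeds, which is consistent with the observation that the bytes are Latin-1 but labelled as ASCII.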

This error does not happen, however, when the attachment's *content* 
has non-ASCII characters in it. When attaching a file that has both 
non-ASCII content and a non-ASCII name, the encoding generated by 
Horde is fine. For example, a file named "Example file with 
name coñtaïning strånge characters which has non-ASCII content 
too.txt" that does in fact contain non-ASCII content (e.g. the string 
"Hallo Bärbel") is encoded correctly. In this case, the encoding 
generated is

name*0*=utf-8''Example%20file%20with%20name%20co%C3%B1ta%C3%AFning%20str;
name*1*=%C3%A5nge%20characters%20which%20has%20non-ASCII%20content%20too.tx;
name*2*=t

which is perfectly right. For instance, the "ñ" is encoded here as two 
bytes in UTF-8, 0xC3 and 0xB1, which is correct.
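
The same result can be reproduced with Python's standard library, as a rough cross-check. This is a sketch only: it uses a shortened sample name and does not split the value into numbered continuation segments (name*0*, name*1*, ...) as the headers above do.

```python
from email.utils import encode_rfc2231
from urllib.parse import unquote

# Shortened sample taken from the file name above.
name = "coñtaïning strånge"

# The "ñ" becomes two bytes in UTF-8.
utf8_bytes = "ñ".encode("utf-8")
print(utf8_bytes.hex())

# Percent-encode the UTF-8 bytes and prefix charset and (empty) language.
encoded = encode_rfc2231(name, charset="utf-8")
print(encoded)

# Round trip: split off charset and language, then percent-decode.
charset, _language, value = encoded.split("'", 2)
decoded = unquote(value, encoding=charset)
print(decoded == name)
```

Because the declared charset and the percent-encoded bytes agree, a receiving client can decode the name without guessing.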

The root cause of the problem seems to be that Horde uses the charset 
of the attachment's *content* to encode the attachment's filename. 
This is wrong because the filename can use a different encoding than 
the content. The issue also shows up in the opposite case, when an 
attachment contains non-ASCII characters but the filename is pure 
ASCII: the filename will be labelled as "UTF-8". This does not 
cause real problems because any ASCII text is also valid UTF-8 text, 
but it supports my assumption that Horde wrongly uses the content's 
charset in the place where the filename's charset should be used.
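
One way to avoid the coupling is to choose the filename charset without looking at the content at all. The Python sketch below illustrates that idea under my own assumptions; it is not Horde's implementation, and the function name `name_parameter` and its signature are hypothetical.

```python
from email.utils import encode_rfc2231


def name_parameter(filename: str, content_charset: str) -> str:
    """Hypothetical helper: build an RFC 2231 name*= parameter.

    content_charset is accepted only to make the point that it is
    deliberately ignored; the filename is always encoded as UTF-8.
    """
    return "name*=" + encode_rfc2231(filename, charset="utf-8")


# ASCII content, non-ASCII filename (the failing case in this report).
ascii_case = name_parameter("coñtaïning.txt", content_charset="us-ascii")
# Non-ASCII content, same filename.
utf8_case = name_parameter("coñtaïning.txt", content_charset="utf-8")

print(ascii_case)
print(ascii_case == utf8_case)
```

With this approach the header is identical for both attachments, since only the filename determines how it is encoded.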

I will attach a zip file with four files that can be used to 
illustrate the problem:
1. File with ASCII content and ASCII filename --> OK.
2. File with ASCII content and non-ASCII filename --> wrong.
3. File with non-ASCII content and ASCII filename --> OK in a way 
because UTF-8 is a superset of ASCII.
4. File with non-ASCII content and non-ASCII filename --> OK.

I hope the file names will be preserved correctly in the zip file. 
This zip file was generated using Microsoft Windows's built-in ZIP 
functionality, so the file names might not be recognized as they 
should everywhere. If you are unable to read them, I will try some 
other way to send them.
