Summary | Attachments with special file names (RFC 2231) |
Queue | IMP |
Queue Version | 6.2.18 |
Type | Bug |
State | Resolved |
Priority | 2. Medium |
Owners | jan (at) horde (dot) org |
Requester | wahnes (at) uni-koeln (dot) de |
Created | 04/18/2017 (2999 days ago) |
Due | |
Updated | 10/20/2017 (2814 days ago) |
Assigned | 04/27/2017 (2990 days ago) |
Resolved | 04/28/2017 (2989 days ago) |
Github Issue Link | |
Github Pull Request | |
Milestone | |
Patch | No |
commit db0c29770f694135bd118803bafa32fa50e6a107
Author: Jan Schneider <jan@horde.org>
Date: Fri, 28 Apr 2017 18:00:17 +0200
[jan] Fix filename charset of certain attachments (Bug #14618).
M docs/CHANGES
M package.xml
https://github.com/horde/imp/commit/db0c29770f694135bd118803bafa32fa50e6a107
commit e82bd3c0818f25beb0cfd86bc1735fe4c39086d8
Author: Jan Schneider <jan@horde.org>
Date: Fri, 28 Apr 2017 17:59:38 +0200
The header charset for attachments is always UTF-8 (Bug #14618).
M lib/Compose.php
https://github.com/horde/imp/commit/e82bd3c0818f25beb0cfd86bc1735fe4c39086d8
As it turns out after sending test e-mails with IMP 6.2.19, few
receiving mail programs actually profit from this fix, mutt being one
of the notable exceptions. Others, such as Thunderbird and
Open-Xchange, already accepted "us-ascii" in lieu of "utf-8", and
Outlook does not recognize filename encoding according to RFC 2231 at
all. Sadly, the same goes for GMX's web interface: no RFC 2231 support
there either.
commit 91590bd39d92d614442afb56c48bb5c17ca1b8cb
Author: Jan Schneider <jan@horde.org>
Date: Fri Apr 28 18:00:17 2017 +0200
[jan] Fix filename charset of certain attachments (Bug #14618).
imp/package.xml | 1 +
1 file changed, 1 insertion(+)
http://github.com/horde/horde/commit/91590bd39d92d614442afb56c48bb5c17ca1b8cb
Assigned to Jan Schneider
commit 65e2461a1f7fcc5a29080f37f90d84e0431bf0fa
Author: Jan Schneider <jan@horde.org>
Date: Fri Apr 28 18:00:17 2017 +0200
[jan] Fix filename charset of certain attachments (Bug #14618).
imp/docs/CHANGES | 1 +
imp/package.xml | 2 ++
2 files changed, 3 insertions(+)
http://github.com/horde/horde/commit/65e2461a1f7fcc5a29080f37f90d84e0431bf0fa
commit d121cc674c814ccc5d820c5e3d9e86d7028bfba1
Author: Jan Schneider <jan@horde.org>
Date: Fri Apr 28 17:57:22 2017 +0200
The header charset for attachments is always UTF-8 (Bug #14618).
imp/lib/Compose.php | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
http://github.com/horde/horde/commit/d121cc674c814ccc5d820c5e3d9e86d7028bfba1
commit 8a852241683e302f740b40c718d0140d8bb00ab5
Author: Jan Schneider <jan@horde.org>
Date: Fri Apr 28 17:57:22 2017 +0200
The header charset for attachments is always UTF-8 (Bug #14618).
imp/lib/Compose.php | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
http://github.com/horde/horde/commit/8a852241683e302f740b40c718d0140d8bb00ab5
New Attachment: testfiles-with-strange-names.tar
OK on my Linux box with UTF-8 locale, but I don't know if there are
any standards for the charset in a .tar file, given it's such an old
format.
The zip file might be using UTF-16 or such, I don't know.
State ⇒ Feedback
charset at all. This may be due to packaging though, but they looked
like double encoded UTF-8.
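For readers unfamiliar with the term, here is a minimal Python sketch of what "double encoded UTF-8" usually means. It assumes the common failure mode in which UTF-8 bytes are misread as ISO-8859-1 and then encoded as UTF-8 a second time. Whether that is what happened to the names in the archive is not known. The sample string is only an illustration.

```python
# Illustration only: how a name becomes "double encoded UTF-8".
original = "coñtaïning"

# Step 1: the correct UTF-8 encoding.
utf8_bytes = original.encode("utf-8")

# Step 2 (the mistake): the bytes are misread as ISO-8859-1 and then
# encoded as UTF-8 again.
double_encoded = utf8_bytes.decode("iso-8859-1").encode("utf-8")

# What a UTF-8 aware viewer shows for the double-encoded bytes.
garbled = double_encoded.decode("utf-8")

# Reversing the mistake recovers the original name.
repaired = garbled.encode("iso-8859-1").decode("utf-8")

print(garbled)
print(repaired)
```

Each non-ASCII character turns into two characters in the garbled form, which is the typical visual hint for this kind of damage.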
Patch ⇒ No
State ⇒ Unconfirmed
New Attachment: horde attachment examples rfc 2231.zip
Milestone ⇒
Queue ⇒ IMP
Summary ⇒ Attachments with special file names (RFC 2231)
Type ⇒ Bug
Priority ⇒ 2. Medium
files is wrong, causing trouble on the receiving side of email with
such attachments. This seems to be happening when the *content* of an
attachment is pure ASCII but the *filename* contains non-ASCII
characters.
Example:
Take a file named "File with a long name coñtaïning strånge
characters but pure ASCII content.txt" that, as the name implies,
contains only ASCII characters and is therefore sent with
"Content-Type: text/plain". When this file is attached, the file name
is encoded like this:
name*0*=us-ascii''File%20with%20a%20long%20name%20co%F1ta%EFning%20str%E5ng;
name*1*=e%20characters%20but%20pure%20ASCII%20content.txt
Note that the charset used for the encoding of the filename (given
before the first single-quote character in the "name*0*" line) is
"us-ascii" in this case. Clearly, this cannot be the case as ASCII does
not contain the character "ñ". In fact, ASCII does not contain any
character with hex code above 0x7F, so an encoding that uses a hex
code "F1" with "us-ascii" must be wrong. The actual charset would be
ISO-8859-1 or similar, as it contains the "n" with tilde at position
0xF1 (241 decimal).
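The mismatch can be reproduced with a few lines of Python. This is only a sketch: it joins the two continuation segments quoted above, splits off the charset label, and tries to decode the percent-decoded bytes with the declared charset and then with ISO-8859-1. The variable names are made up for the illustration.

```python
from urllib.parse import unquote_to_bytes

# The two continuation segments from the header, without the trailing ";".
segments = [
    "us-ascii''File%20with%20a%20long%20name%20co%F1ta%EFning%20str%E5ng",
    "e%20characters%20but%20pure%20ASCII%20content.txt",
]

# RFC 2231 extended value: charset'language'percent-encoded-bytes
charset, language, encoded = "".join(segments).split("'", 2)
raw = unquote_to_bytes(encoded)

# Decoding with the declared charset fails because of bytes above 0x7F.
try:
    declared = raw.decode(charset)
except UnicodeDecodeError:
    declared = None

# ISO-8859-1 maps 0xF1 to "ñ", so it yields the intended name.
latin1 = raw.decode("iso-8859-1")

print(charset, declared)
print(latin1)
```

A strict receiver that trusts the "us-ascii" label has no valid way to interpret the bytes 0xF1, 0xEF and 0xE5.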
This error does not happen, however, when the attachment's *content*
has non-ASCII characters in it. When attaching a file that has got
both non-ASCII content and a non-ASCII name, the encoding generated by
Horde is fine. For example, a file by the name of "Example file with
name coñtaïning strånge characters which has non-ASCII content
too.txt" that does in fact contain non-ASCII content (e.g. the string
"Hallo Bärbel") is encoded correctly. In this case, the encoding
generated would be
name*0*=utf-8''Example%20file%20with%20name%20co%C3%B1ta%C3%AFning%20str;
name*1*=%C3%A5nge%20characters%20which%20has%20non-ASCII%20content%20too.tx;
name*2*=t
which is perfectly right. For instance, the "ñ" is encoded here as two
bytes in UTF-8, 0xC3 and 0xB1, which is correct.
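As a cross-check, Python's standard library produces the same percent-encoding for a UTF-8 labelled value. The sketch below uses email.utils.encode_rfc2231 on a short excerpt of the file name. It does not split the result into name*0*, name*1* continuations, and it says nothing about how Horde builds the header internally.

```python
from email.utils import encode_rfc2231

# Short excerpt of the file name from the example above.
excerpt = "coñtaïning strånge"

# Produces charset'language'percent-encoded-bytes with an empty language.
encoded = encode_rfc2231(excerpt, charset="utf-8")

# The UTF-8 encoding of "ñ" is the two bytes 0xC3 0xB1.
n_tilde_bytes = "ñ".encode("utf-8")

print(encoded)
print(n_tilde_bytes.hex())
```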
The root cause of the problem seems to be that Horde uses the charset
of the attachment's *content* to encode the attachment's filename.
This is wrong because the filename can use a different encoding than
the content. This issue manifests itself as well when there is an
attachment that contains non-ASCII characters but the filename uses
pure ASCII: the filename will be encoded as "UTF-8". This does not
cause real problems because any ASCII text is also valid UTF-8 text,
but it adds to my assumption that Horde wrongfully uses the content's
charset in the place where the filename's charset should be used.
I will attach a zip file with four files that can be used to
illustrate the problem:
1. File with ASCII content and ASCII filename --> OK.
2. File with ASCII content and non-ASCII filename --> wrong.
3. File with non-ASCII content and ASCII filename --> OK in a way
because UTF-8 is a superset of ASCII.
4. File with non-ASCII content and non-ASCII filename --> OK.
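The four combinations above can be checked mechanically. The following Python sketch models the suspected behaviour, namely that the file name is labelled with the charset of the content, and tests whether each file name can be represented in that charset at all. The helper name and the short sample file names are hypothetical; they are not the names in the zip file.

```python
def label_is_consistent(filename: str, charset_label: str) -> bool:
    """Return True if the file name is representable in the labelled charset."""
    try:
        filename.encode(charset_label)
        return True
    except UnicodeEncodeError:
        return False

# (file name, charset derived from the attachment's content)
cases = [
    ("plain.txt", "us-ascii"),       # 1. ASCII content, ASCII name
    ("coñtaïning.txt", "us-ascii"),  # 2. ASCII content, non-ASCII name
    ("plain.txt", "utf-8"),          # 3. non-ASCII content, ASCII name
    ("coñtaïning.txt", "utf-8"),     # 4. non-ASCII content, non-ASCII name
]

results = [label_is_consistent(name, label) for name, label in cases]
print(results)
```

Only the second case fails, which matches the observation that the problem shows up exactly when ASCII content is combined with a non-ASCII file name.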
I hope the file names will be preserved correctly in the zip file.
This zip file was generated using Microsoft Windows's built-in ZIP
functionality, so the file names might not be recognized as they
should everywhere. If you are unable to read them, I will try some
other way to send them.