Summary | Attachment modification (newline structure changes) |
Queue | IMP |
Queue Version | HEAD |
Type | Bug |
State | Not A Bug |
Priority | 2. Medium |
Owners | slusarz (at) horde (dot) org |
Requester | ag (at) netside (dot) de |
Created | 03/06/2006 (7051 days ago) |
Due | |
Updated | 01/12/2007 (6739 days ago) |
Assigned | 03/06/2006 (7051 days ago) |
Resolved | 03/08/2006 (7049 days ago) |
Github Issue Link | |
Github Pull Request | |
Milestone | |
Patch | No |
http://lists.horde.org/archives/imp/Week-of-Mon-20070108/046674.html
For the record. There is no reason to disable mime type in a browser.
I only disabled mime_magic and fileinfo modules in PHP because they
don't correctly return mime types for word and excel files, and let
Horde mime magic library do the job. Everything is back to normal again.
Thanks to all who helped!
However one last question remains:
Why is exactly the same attachment *sometimes* (about 5 times out of
100) sent to IMP as text/plain (even with an Office application
installed, and with different browsers)? Could it be an AJAX issue (as
it is asynchronous) or an Apache issue? Can anyone confirm this?
Maybe you are running into this issue:
http://lists.horde.org/archives/imp/Week-of-Mon-20070108/046674.html
However one last question remains:
Why is exactly the same attachment *sometimes* (about 5 times out of
100) sent to IMP as text/plain (even with an Office application
installed, and with different browsers)? Could it be an AJAX issue (as
it is asynchronous) or an Apache issue? Can anyone confirm this?
into text/plain (see DefaultType option).
I think this issue should be mentioned on the FAQ at least.
and FF and IE attach .doc files as application/octet-stream (exactly
like it is supposed to).
that your server must be precisely configured to give Horde the right
mime type.
nauseam. Your requested changes are not going to happen. We may
consider a patch implementing a configuration option, or a patch
allowing MIME magic tests to be skipped, but we will always send
messages by default in the least altered format possible, especially
when we are EXPLICITLY TOLD the data is text.
There are two parts about this.
1. If a user doesn't have, for example, an Office application installed
and tries to send a document attachment, it gets sent as text/plain.
This is normal because the browser doesn't know the file type and sends
application/octet-stream. IE6, IE7, FF1.5 and FF2 behave exactly this
way. Only Opera 9.1 has built-in MIME detection for the Word
document type I tested. This means that you can't rely on the user's
client/OS.
2. If you can't rely on the user's client/OS configuration, then
your server must be precisely configured to give Horde the right
mime type. However, I tested 3 different installations: Fedora Core
5, RedHat Enterprise 4 and RedHat Enterprise 5 beta 2. None of them
detected the correct mime type with a default installation. Even with
the fileinfo PECL module installed they still reported text/plain. Only
after some hacking through config files did I find this ->
http://mail-archives.apache.org/mod_mbox/httpd-cvs/199807.mbox/%3C19980718113554.10097.qmail@hyperreal.org%3E
It seems that most of the office document types have been disabled by
default in vanilla apache since 1998! This means that the very large
base of UNIX-type OSes that use vanilla apache is most probably
affected by the problem too.
Now back to the point. I know that it is faster to send plain text
attachments as plain text, and a wiser use of space, and I agree with
you that having the possibility to read those attachments in console
programs is good. BUT, for the sake of large user base without office
installed and for the sake of large admin base with vanilla apache
configuration, PLEASE revert IMP to encode all text/plain attachments
with base64 again.
quoted-printable encoded 3 times and base64'ed the fourth.
I'm using MIME magic library of type detection, but it was not updated
or changed in any way (nor PHP). I just upgraded Horde and it began to
happen.
Could it be that IMP misses MIME magic library somehow and how do I
check for that?
Quoting "Daniel A. Ramaley" <daniel.ramaley@DRAKE.EDU>:
This is the same discussion as appeared on the bug report and,
unfortunately, this discussion is still incorrect.
As mentioned in the bug report - this is not a Horde/IMP issue. This
is an issue with quoted-printable not being able to handle binary data
UNLESS IT IS EXPLICITLY TOLD IT IS BEING GIVEN BINARY DATA. More
important, this issue has *nothing* to do with EOL characters - or,
more correctly, messing with EOL characters is *absolutely* the wrong
way to look at this issue.
Maybe a simple example will be in order. Say I have the following
text/plain file:
-----
Line one.CR
Line two.
-----
And I send it in quoted-printable. It will be sent as the following:
-----
Line one.CRLF
Line two.
-----
As can be seen, pursuant to RFCs, all end of line characters are
converted to CRLF. Most important, no matter what OS the message is
read on, that OS can convert the CRLF string to whatever EOL
convention that OS uses - this is part of the decoding of an RFC
message on the receiving end. So the message appears with the same
line breaks no matter what OS is used to read the message. What is
important to realize is that this text message *WILL BE DIFFERENT*
depending on the OS used. On unix, the message will look like the
following:
-----
Line one.LF
Line two.
-----
On windows the message will look like the following:
-----
Line one.CRLF
Line two.
-----
As can be seen, the file length of the former file is 19. The file
length of the latter file is 20. *Ack*! What is going on? The
answer is nothing - as explained several times in the bug reports this
is exactly what the RFCs allow. Horde/IMP isn't broken. Since it is
text data, the difference in file sizes doesn't matter,
since with textual data we only care about the *display*.
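The size difference described above can be reproduced with a few lines of Python. This is only an illustrative sketch, not Horde/IMP code: `to_wire` and `to_native` are hypothetical helper names, and the byte strings are the two-line example from this comment.

```python
def to_wire(text: bytes) -> bytes:
    """Canonicalize any CR, LF, or CRLF line ending to CRLF for transport."""
    normalized = text.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
    return normalized.replace(b"\n", b"\r\n")


def to_native(wire: bytes, eol: bytes) -> bytes:
    """Convert canonical CRLF line endings to the reader's local convention."""
    return wire.replace(b"\r\n", eol)


wire = to_wire(b"Line one.\rLine two.")
unix_copy = to_native(wire, b"\n")        # what a unix reader saves
windows_copy = to_native(wire, b"\r\n")   # what a windows reader saves

print(len(unix_copy), len(windows_copy))  # 19 20
```

Both copies display the same two lines; only the byte counts differ, which is harmless for text and fatal for anything that is really binary.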
But, exactly like the RFCs warn us, the problem occurs when we try to
use quoted-printable to send BINARY data. Using the same example as
above, let's assume that this message is not text data but is binary
data instead. Let's assume it is a windows based program that parses
this data, and this program delimits lines by CRLF. Let's assume
Horde/IMP is running on a UNIX machine. We go to attach our message
using IMP. So far so good since the message will be canonicalized
when sending to:
-----
Line one.CRLF
Line two.
-----
Which just fortuitously happens to be in the format we need. Now
imagine this message is received on an IMP installation on a UNIX
machine. We go to download the file. The file is downloaded as such:
-----
Line one.LF
Line two.
-----
And, no surprise, the file is in the wrong format. The windows program
can't read the file. People incorrectly point the finger at Horde/IMP.
So how could this latter situation happen? Because the file is
reported to IMP at the time of sending as a text file. As adequately
demonstrated above, the RFCs clearly indicate that EOL formatting is
not guaranteed when using quoted-printable encoding of text data.
Thus, there is *nothing* broken. There is either an issue with the
browser incorrectly identifying the file as text to IMP when
attaching, or there is an issue with MIME magic detection of the file.
We don't support Q-P encoding of binary data. It defeats the whole
purpose of Q-P in the first place - Q-P is intended to provide a non
MIME-compliant reader (e.g. simple mail user agent, a user looking at
the raw text of the message) a way to understand the gist of the text
message without having to do any further processing.
We are not going to send all messages in base64 since
#1 it would result in *all* messages being approximately 33% larger
than they should and
#2 it does not provide the ability to quickly look at a mail message
without specialized software and still be able to
understand most (if not all) of the message
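The 33% figure follows from base64 mapping every 3 input bytes to 4 output characters. A quick check in Python, using an arbitrary 768-byte payload as a stand-in for an attachment (the extra line breaks that MIME inserts every 76 characters are ignored here, so the real overhead is slightly higher):

```python
import base64

# 768 arbitrary bytes standing in for an attachment (an assumption for the demo)
payload = bytes(range(256)) * 3
encoded = base64.b64encode(payload)

overhead = len(encoded) / len(payload) - 1
print(len(payload), len(encoded), f"{overhead:.0%}")  # 768 1024 33%
```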
Just FYI, to correctly Q-P binary data, the message above would have
to be sent as follows:
-----
Line one.=0D=0ALine two.
-----
But if we know the message is binary data, we are just going to base64
encode it anyway since it is a more efficient way of sending binary
data (33% more efficient if the entire message is binary data) and if
a message is binary data, we don't need the feature of being able to
look at the message (e.g. Q-P) without specialized software since the
data is going to be indecipherable anyway.
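For anyone who wants to see what binary-safe Q-P looks like, here is a minimal Python sketch. CR is byte 0x0D and LF is byte 0x0A, so they come out as =0D and =0A. `qp_encode_binary` is a hypothetical helper written for this example, not a Horde or Python library function, and it deliberately skips the soft line breaks a real encoder needs for long lines. The standard `quopri` module is used only to confirm the round trip.

```python
import quopri


def qp_encode_binary(data: bytes) -> bytes:
    """Quoted-printable encode bytes as binary: CR and LF become =0D and =0A.

    Simplification: no soft line breaks are inserted, so this is only
    suitable for short inputs like the example here.
    """
    out = bytearray()
    for i, byte in enumerate(data):
        is_last = i == len(data) - 1
        # Printable ASCII except "=" is literal; space/tab too, unless trailing.
        literal = (33 <= byte <= 126 and byte != ord("=")) or (
            byte in (ord(" "), ord("\t")) and not is_last
        )
        if literal:
            out.append(byte)
        else:
            out += b"=%02X" % byte
    return bytes(out)


original = b"Line one.\r\nLine two."
encoded = qp_encode_binary(original)
print(encoded)  # b'Line one.=0D=0ALine two.'

# The decoder restores the exact bytes, CRLF included.
assert quopri.decodestring(encoded) == original
```

Because the line break is carried as encoded octets rather than as a hard line break, no receiving system has any room to rewrite it.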
So if binary data is reported as text at the time of attachment then
there can be no expectation that the message will be transmitted
through RFC-compliant mail without alteration. As mentioned
previously, there may be two reasons why binary data is attached as
text:
1.) browser reports data as text/*
This is a browser issue.
SOLUTION: Fix your browser. Or hack Horde/IMP to send all messages in
base64. But this will become neither an option nor the standard in our
codebase.
2.) MIME magic reports application/octet-stream data as text/*
This may or may not be a Horde issue. This is only a Horde issue if
our internal MIME magic detection is used. But this is the *third*
option and is only used if both the PECL fileinfo module is not
installed and the PHP mime_magic extension is not available. If either
of these modules is used, then the issue is with their mime magic
algorithms, which is out of our control.
In conclusion, there is absolutely nothing wrong with the way we send
Q-P data since we only Q-P encode a message if we are dealing with
text data. This is why Bug 3565 was correctly marked Bogus.
Namely that different MUAs can decode the file correctly and yet get
different results.
Linux and Windows with these files and compared the results. On Linux
the downloaded attachment had Unix-style linebreaks and on Windows it
had Windows-style linebreaks. Yes, the resulting files differ, but that
is what I expected, and I was wondering whether horde/imp should do the
same. If someone reads his/her mail from a linux-browser these kind of
attachments are ok if saved and viewed with a linux editor and if
someone reads his/her mail from a windows-browser these kind of
attachments have unix-style linebreaks and are not quite readable with
a cheap editor that doesn't recognize these linebreaks.
It's a fact that quoted-printable doesn't (need to) encode linebreaks
on text-files. Thus these linebreaks transform into unix-style
linebreaks on mail transport. Why not let the decoder of
quoted-printable text-files substitute linebreaks with the
linebreak-style of the browser's OS?
Namely that different MUAs can decode the file correctly and yet get
different results.
RFC essentially says is that Quoted-Printable (by default) is not able
to reliably handle binary data. This is because the Q-P algorithm
differs depending on whether you are feeding it binary vs. non-binary
(e.g. text) data. Namely, the encoding of line ending characters
differ depending on whether the data is binary or not.
What is important for our purposes is that there is absolutely no way
to determine whether a given data stream is binary or not if it
contains only 7bit characters. The only way to tell if data is binary
is via outside information. If IMP determines the data is binary, we
encode in base64. If it is text we encode in quoted-printable.
In this case, IMP is determining the binary-ness of the data via the
mime type the browser is reporting when uploading the file (or via a
MIME magic call if the data is uploaded via application/octet-stream).
In your case, the browser is reporting this data as a text/* part.
By definition of the RFCs, a text/* part is NOT BINARY DATA. Thus, we
are encoding the file correctly. The issue is either with your
browser or the mime magic regexes. In either case, there is nothing
broken with the way Horde/IMP is encoding/decoding the data.
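The decision described in this comment can be sketched roughly as follows. This is a Python illustration of the logic only, not the actual IMP code (which is PHP): `choose_transfer_encoding` and the `sniff` callback are hypothetical names, with `sniff` standing in for whatever MIME magic lookup is available.

```python
def choose_transfer_encoding(reported_type, sniff=None, data=b""):
    """Pick a Content-Transfer-Encoding from the reported MIME type.

    `sniff` is a hypothetical stand-in for a MIME magic lookup. It is
    consulted only when the upload arrives as application/octet-stream.
    """
    mime_type = reported_type.lower()
    if mime_type == "application/octet-stream" and sniff is not None:
        mime_type = sniff(data).lower()
    # text/* is by definition not binary, so it goes out quoted-printable.
    return "quoted-printable" if mime_type.startswith("text/") else "base64"


print(choose_transfer_encoding("text/plain"))          # quoted-printable
print(choose_transfer_encoding("application/msword"))  # base64
```

The sketch makes the failure mode visible: if either the browser or the magic lookup answers text/*, the data is treated as text, whatever it actually contains.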
If a sender recognizes an attachment as text/plain it doesn't have to
encode CR or LF if they're part of a CRLF line break.
It is known that plain text gets altered if passing between systems
with differing newline conventions.
Since this happens very likely to a windows mail on a linux system the
quoted-printable part is already altered before horde opens it.
And my solution to this is to alter MIME/Part.php to use
MIME_PART_RFC_EOL instead of MIME_PART_EOL as default $_eol.
Could this be an enhancement to horde? (restricted to Windows-Browsers).
<quot>
Any octet, except a CR or
LF that is part of a CRLF line break of the canonical
(standard) form of the data being encoded, may be
represented by an "=" followed by a two digit
hexadecimal representation of the octet's value.
</quot>
same quoted-printable encodings so in my opinion the encoding has to
be right. And there are 4 clients that decode absolutely correct and 2
clients that don't.
New Attachment: Auftg320004.dat
some results that you will find interesting.
First of all: It's not only an Outlook quoted-printable bug but also an
IMP quoted-printable bug.
You said line-breaks like 0x0d 0x0a need to be encoded in
quoted-printable like =0d=0a, and since outlook fails here you're not
responsible for modifications in attachments.
Hmm... but why is a mail sent by horde with such a file attached
encoded the same *wrong* way like outlook did?
My testing involved several outlook versions (express, 10, 11),
thunderbird and three horde/imp versions on the client-side and
postfix, sendmail and exchange on server sides. All clients were used
under Windows XP - horde/imp on Firefox and IE.
Thunderbird and the oldest horde/imp installation sent the file
base64-encoded and all other clients chose quoted-printable for
encoding the attached file. Since base64 encoding is not a problem at
all no server-client pair had any problems with these mails.
No server had any problems at all with any of the other mails, and
when connected via outlook or thunderbird the files downloaded from
those mails were absolutely fine.
One of the three horde/imp clients had no problem either on receiving
the quoted-printable-attachments, but the two newer horde/imp
installations corrupted the quoted-printable-attachments of any mail
received.
OK : Horde 2.2.5 / Imp 3.2.3
Error : Horde 3.0.6 / Imp H3 (4.0.4)
Error : Horde 3.2-cvs / Imp H3 (4.2-cvs)
Now I'm sure that this is really of interest for you since every
tested combination works fine except the two newer horde/imp
installations. Even mails with quoted-printable-attachments sent by
one of the newer horde/imp installations can't be properly saved by
either of them - but the older horde/imp installation works fine with
those emails.
Since there's a major version jump between the older and the two newer
versions, somewhere in between this "bug" must have been introduced.
I'm attaching a test-file - please remove if not needed anymore!
State ⇒ Not A Bug
broken. The text taken directly from RFC 2045 [6.7]:
WARNING TO IMPLEMENTORS: If binary data is encoded in quoted-
printable, care must be taken to encode CR and LF characters as "=0D"
and "=0A", respectively. In particular, a CRLF sequence in binary
data should be encoded as "=0D=0A". Otherwise, if CRLF were
represented as a hard line break, it might be incorrectly decoded on
platforms with different line break conventions.
As is plainly obvious from your message, none of the 'CR' characters
are encoded (i.e. there are no '=0D' strings in that message). The
important part of that text is "it might be incorrectly decoded on
platforms with different line break conventions". This is why
quoted-printable messages may be saved/viewed correctly on some
systems (e.g. a mac, windows) but are displayed slightly differently
using Horde/IMP (e.g. on unix). But we are doing nothing wrong in
displaying the message, as we are following the RFC (and, in fact,
there is no way we could guess the correct linebreak usage anyway).
Pursuant to your request, I have deleted your sample messages from
this ticket.
State ⇒ Assigned
I attached to this ticket is removed after solving (hopefully) it.
Priority ⇒ 2. Medium
State ⇒ Unconfirmed
New Attachment: mail-examples.zip
Queue ⇒ IMP
Summary ⇒ Attachment modification (newline structure changes)
Type ⇒ Bug
Scenario: An automated ordering system is receiving email-orders with
defined orderfiles from customers. All customers use an offline
ordering software that generates the orderfile, which then has to be
(manually) sent to the address of the ordering system.
I got complaints about modified orderfiles from certain customers after
switching the mailserver over to horde/imp. So I took a closer look at
those emails since not all order-emails had corrupted attachments. It
clearly depends on the sender of the email - or to be more specific:
the mailer used to send the email seems to be the factor.
I grabbed two example orders from that queue (see attached zip)
What happens: The attachment is a textfile with 0x0d 0x0a as
newline-sequence. Saving the attachment and examining it reveals no
change if the attachment was base64 encoded. If the mailer had put
this attachment as quoted-printable these newline-sequences are
changed into 0x0a 0x0a.
To verify the source of the problem I asked one of these
"quoted-printable"-senders to send to my private account (no horde/imp
- sorry). I found no changes on the saved attachment. So the sending
mailer does not change the attachment's newline-sequence, but
horde/imp certainly seems to.
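A comparison like this is easier to repeat with a small script than with a hex editor. The following Python sketch counts the line-ending sequences in raw file bytes. `count_line_endings` is a hypothetical helper name, and the sample byte strings are made up for the demonstration, not taken from the attached orderfiles.

```python
import re
from collections import Counter


def count_line_endings(data: bytes) -> Counter:
    """Count CRLF, bare LF, and bare CR sequences in raw file bytes."""
    names = {b"\r\n": "CRLF", b"\n": "LF", b"\r": "CR"}
    # CRLF is listed first so it is matched as one unit, not as CR then LF.
    return Counter(names[m] for m in re.findall(rb"\r\n|\r|\n", data))


as_sent = b"order 1\r\norder 2\r\n"   # made-up sample with 0x0d 0x0a
as_saved = b"order 1\n\norder 2\n\n"  # made-up sample with 0x0a 0x0a

print(count_line_endings(as_sent))   # Counter({'CRLF': 2})
print(count_line_endings(as_saved))  # Counter({'LF': 4})
```

To use it on real files, read them in binary mode (`open(path, "rb").read()`) so that Python itself does not translate the newlines.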
I'm pretty sure "quoted-printable" is the source of this problem, but
maybe it's a good idea to change IMP's behavior when an attachment is
marked as "Content-type: application/octet-stream"