
[#3565] Attachment modification (newline structure changes)
Summary Attachment modification (newline structure changes)
Queue IMP
Queue Version HEAD
Type Bug
State Not A Bug
Priority 2. Medium
Owners slusarz (at) horde (dot) org
Requester ag (at) netside (dot) de
Created 2006-03-06 (5031 days ago)
Due
Updated 2007-01-12 (4719 days ago)
Assigned 2006-03-06 (5031 days ago)
Resolved 2006-03-08 (5029 days ago)
Milestone
Patch No

History
2007-01-12 09:01:31 vilius (at) lnk (dot) lt Comment #19
Bingo!



For the record: there is no reason to disable the MIME type in the
browser. I only disabled the mime_magic and fileinfo modules in PHP,
because they don't return correct MIME types for Word and Excel files,
and let the Horde MIME magic library do the job. Everything is back to
normal again.



Thanks to all who helped!
2007-01-12 05:50:25 Michael Slusarz Comment #18
> OK. Never mind. I solved most of the problems one way or another.
> However one last question remains:
>
> Why is it that *sometimes* (about 5 times out of 100) exactly the same
> attachment is sent to IMP as text/plain (even with an Office application
> installed, and with different browsers)? Could it be an AJAX issue (as
> it is asynchronous) or an Apache issue? Can anyone confirm this?

We don't use AJAX for uploading in IMP.  So that's not it.



Maybe you are running into this issue:

http://lists.horde.org/archives/imp/Week-of-Mon-20070108/046674.html
2007-01-11 13:58:56 vilius (at) lnk (dot) lt Comment #17
OK. Never mind. I solved most of the problems one way or another.
However one last question remains:

Why is it that *sometimes* (about 5 times out of 100) exactly the same
attachment is sent to IMP as text/plain (even with an Office application
installed, and with different browsers)? Could it be an AJAX issue (as
it is asynchronous) or an Apache issue? Can anyone confirm this?
2007-01-08 08:09:07 vilius (at) lnk (dot) lt Comment #16
Yes, but apache 2.0.x and newer transforms application/octet-stream 
into text/plain (see DefaultType option).



I think this issue should be mentioned on the FAQ at least.
2007-01-08 08:01:27 Michael Slusarz Comment #15
This is an incorrect statement.  I don't have Office on my computer,
and FF and IE attach .doc files as application/octet-stream (exactly
as they are supposed to).

> 2. If you can't rely on user client/OS configuration then it means
> that your server must be precisely configured to give Horde the right
> mime type.

As mentioned above, this is incorrect.

This has already been discussed in this ticket (and elsewhere) ad
nauseam.  Your requested changes are not going to happen.  We may
consider a patch implementing a configuration option, or a patch
allowing MIME magic tests to be skipped, but we will always send
messages by default in the least altered format possible, especially
when we are EXPLICITLY TOLD the data is text.
2007-01-05 17:23:16 vilius (at) lnk (dot) lt Comment #14
I finally had time to do a little research into the problem.
There are two parts to this.

1. If a user doesn't have, for example, an Office application installed
and tries to send a document attachment, it gets sent as text/plain.
This is normal, because the browser doesn't know the file type and sends
application/octet-stream. IE6, IE7, FF1.5 and FF2 behave exactly this
way. Only Opera 9.1 has built-in MIME detection for the Word document
type I tested. This means that you can't rely on the user's client/OS.



2. If you can't rely on the user's client/OS configuration, then your
server must be precisely configured to give Horde the right MIME type.
However, I tested 3 different installations of Fedora Core 5, RedHat
Enterprise 4 and RedHat Enterprise 5 beta 2. None of them detected the
correct MIME type with a default installation. Even with the fileinfo
PECL module installed they still reported text/plain. Only after some
digging through config files did I find this:

http://mail-archives.apache.org/mod_mbox/httpd-cvs/199807.mbox/%3C19980718113554.10097.qmail@hyperreal.org%3E

It seems that most of the Office document types have been disabled by
default in vanilla Apache since 1998! This means that a very large base
of UNIX-type OSes that use vanilla Apache is most probably affected by
the problem too.



Now back to the point. I know that sending plain-text attachments as
plain text is faster and uses space more wisely, and I agree with you
that being able to read those attachments in console programs is good.
BUT, for the sake of the large user base without Office installed and
the large admin base with a vanilla Apache configuration, PLEASE revert
IMP to encoding all text/plain attachments with base64 again.
2007-01-04 16:09:35 vilius (at) lnk (dot) lt Comment #13
OK. But this still doesn't explain why exactly the same document gets 
quoted-printable encoded 3 times and base64'ed the fourth.



I'm using the MIME magic library for type detection, but it was not
updated or changed in any way (nor was PHP). I just upgraded Horde and
it began to happen.

Could it be that IMP somehow fails to find the MIME magic library, and
how do I check for that?
2006-04-26 17:34:59 Michael Slusarz Comment #12
From imp@lists:



Quoting "Daniel A. Ramaley" <daniel.ramaley@DRAKE.EDU>:

[snip]



This is the same discussion as appeared on the bug report and, 
unfortunately, this discussion is still incorrect.



As mentioned in the bug report - this is not a Horde/IMP issue.  This
is an issue with quoted-printable not being able to handle binary data
UNLESS IT IS EXPLICITLY TOLD IT IS BEING GIVEN BINARY DATA.  More
important, this issue has *nothing* to do with EOL characters - or,
more correctly, messing with EOL characters is *absolutely* the wrong
way to look at this issue.



Maybe a simple example will be in order.  Say I have the following 
text/plain file:



-----
Line one.CR
Line two.
-----



And I send it in quoted-printable.  It will be sent as the following:

-----
Line one.CRLF
Line two.
-----



As can be seen, pursuant to RFCs, all end of line characters are 
converted to CRLF.  Most important, no matter what OS the message is 
read on, that OS can convert the CRLF string to whatever EOL 
convention that OS uses - this is part of the decoding of an RFC 
message on the receiving end.  So the message appears with the same 
line breaks no matter what OS is used to read the message.  What is 
important to realize is that this text message *WILL BE DIFFERENT* 
depending on the OS used.  On unix, the message will look like the 
following:



-----
Line one.LF
Line two.
-----



On windows the message will look like the following:



-----
Line one.CRLF
Line two.
-----



As can be seen, the file length of the former file is 19.  The file
length of the latter file is 20.  *Ack*!  What is going on?  The
answer is nothing - as explained several times in the bug reports, this
is exactly what the RFCs allow.  Horde/IMP isn't broken.  Since it is
text data, the difference in file sizes doesn't matter, since with
textual data we only care about the *display*.
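
As an illustration only (this is not Horde/IMP code), the text case above can be sketched in Python. The helper names `to_wire` and `to_local` are hypothetical; the sketch assumes the sender canonicalizes every line break to CRLF and the receiver converts CRLF to its own convention.

```python
import re

def to_wire(text: bytes) -> bytes:
    """Canonicalize any EOL convention (CRLF, CR or LF) to CRLF for transport."""
    return re.sub(rb"\r\n|\r|\n", b"\r\n", text)

def to_local(wire: bytes, eol: bytes) -> bytes:
    """Convert canonical CRLF line breaks to the receiving system's EOL."""
    return wire.replace(b"\r\n", eol)

original = b"Line one.\rLine two."     # the sender's file uses a bare CR
wire = to_wire(original)               # what travels in the message

on_unix = to_local(wire, b"\n")        # saved on a unix system
on_windows = to_local(wire, b"\r\n")   # saved on a windows system

print(len(on_unix), len(on_windows))   # 19 20
```

Both results display as the same two lines, which is all that matters for text, even though the byte counts differ.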



But, exactly like the RFCs warn us, the problem occurs when we try to
use quoted-printable to send BINARY data.  Using the same example as
above, let's assume that this message is not text data but is binary
data instead.  Let's assume it is a windows based program that parses
this data, and this program delimits lines by CRLF.  Let's assume
Horde/IMP is running on a UNIX machine.  We go to attach our message
using IMP.  So far so good, since the message will be canonicalized
when sending to:

-----
Line one.CRLF
Line two.
-----



Which just fortuitously happens to be in the format we need.  Now 
imagine this message is received on an IMP installation on a UNIX 
machine.  We go to download the file.  The file is downloaded as such:



-----
Line one.LF
Line two.
-----



And, no surprise, the file is in the wrong format.  The windows program
can't read the file.  People incorrectly point the finger at Horde/IMP.
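
The failure in this binary scenario can be reproduced with a few lines of Python. This is a simplified sketch, not IMP's actual download code: it assumes the receiving unix system rewrites every CRLF hard line break as LF when saving a part it believes to be text.

```python
# A record format that a windows program parses by splitting on CRLF.
payload = b"Line one.\r\nLine two."

# The part was labelled text, so its line breaks are treated as hard line
# breaks and converted to the local (unix) convention when saved.
saved_on_unix = payload.replace(b"\r\n", b"\n")

print(saved_on_unix == payload)             # False: the bytes changed
print(len(payload.split(b"\r\n")))          # 2 records in the original
print(len(saved_on_unix.split(b"\r\n")))    # 1: the CRLF parser finds no delimiter
```

Nothing was decoded incorrectly here; the data was simply handled as text because that is what it was declared to be.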



So how could this latter situation happen?  Because the file is 
reported to IMP at the time of sending as a text file.  As adequately 
demonstrated above, the RFCs clearly indicate that EOL formatting is 
not guaranteed when using quoted-printable encoding of text data.   
Thus, there is *nothing* broken.  There is either an issue with the 
browser incorrectly identifying the file as text to IMP when 
attaching, or there is an issue with MIME magic detection of the file.



We don't support Q-P encoding of binary data.  It defeats the whole 
purpose of Q-P in the first place - Q-P is intended to provide a non 
MIME-compliant reader (e.g. simple mail user agent, a user looking at 
the raw text of the message) a way to understand the gist of the text 
message without having to do any further processing.



We are not going to send all messages in base64 since #1 it would
result in *all* messages being approximately 33% larger than they
should be and #2 it does not provide the ability to quickly look at a
mail message without specialized software and still be able to
understand most (if not all) of the message.
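
The size figure can be checked quickly. A minimal Python sketch using the standard `base64` module; the sample data is arbitrary, and real MIME bodies add a little more overhead for the line breaks inserted into the encoded output.

```python
import base64

raw = bytes(range(256)) * 3          # 768 bytes of arbitrary data
encoded = base64.b64encode(raw)      # 4 output characters per 3 input bytes

overhead = len(encoded) / len(raw) - 1
print(len(raw), len(encoded), f"{overhead:.0%}")   # 768 1024 33%
```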



Just FYI, to correctly Q-P binary data, the message above would have
to be sent as follows:

-----
Line one.=0D=0ALine two.
-----



But if we know the message is binary data, we are just going to base64 
encode it anyway since it is a more efficient way of sending binary 
data (33% more efficient if the entire message is binary data) and if 
a message is binary data, we don't need the feature of being able to 
look at the message (e.g. Q-P) without specialized software since the 
data is going to be indecipherable anyway.
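
The difference between Q-P encoding text and Q-P encoding binary data can be seen with Python's standard `binascii` module, whose `b2a_qp` function takes an `istext` flag. This is only an illustration of the RFC behaviour, not of the Horde MIME library.

```python
import binascii

data = b"Line one.\r\nLine two."

# Text mode: CRLF is treated as a hard line break and left unencoded.
as_text = binascii.b2a_qp(data, istext=True)

# Binary mode: CR and LF are ordinary octets and must be escaped.
as_binary = binascii.b2a_qp(data, istext=False)

print(as_text)      # b'Line one.\r\nLine two.'
print(as_binary)    # b'Line one.=0D=0ALine two.'

# Only the binary form is guaranteed to decode to the exact original bytes,
# regardless of how line breaks are rewritten in transit.
print(binascii.a2b_qp(as_binary) == data)   # True
```

The encoder cannot pick the right mode by looking at the bytes; it has to be told, which is why the reported MIME type matters.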



So if binary data is reported as text at the time of attachment then 
there can be no expectation that the message will be transmitted 
through RFC-compliant mail without alteration.  As mentioned 
previously, there may be two reasons why binary data is attached as 
text:

1.) browser reports data as text/*

This is a browser issue.

SOLUTION: Fix your browser.  Or hack Horde/IMP to send all messages in
base64.  But this will become neither an option nor the standard in our
codebase.

2.) MIME magic reports application/octet-stream data as text/*

This may or may not be a Horde issue.  This is only a Horde issue if
our internal MIME magic detection is used.  But this is the *third*
option and is only used if both the PECL fileinfo module is not
installed and the PHP mime_magic extension is not available.  If either
of these modules is used, then the issue is with their MIME magic
algorithms, which is out of our control.



In conclusion, there is absolutely nothing wrong with the way we send 
Q-P data since we only Q-P encode a message if we are dealing with 
text data.  This is why Bug 3565 was correctly marked Bogus.
2006-03-29 07:31:31 ag (at) netside (dot) de Comment #11
> This is exactly what is expected (and what the RFC is warning about).
> Namely that different MUAs can decode the file correctly and yet get
> different results.

Different results depending on their OS ... I tested Thunderbird on
Linux and Windows with these files and compared the results. On Linux
the downloaded attachment had Unix-style line breaks and on Windows it
had Windows-style line breaks. Yes, the resulting files differ, but that
is what I expected, and I was wondering whether Horde/IMP should do the
same. If someone reads their mail from a Linux browser, these
attachments are fine when saved and viewed with a Linux editor; if
someone reads their mail from a Windows browser, these attachments have
Unix-style line breaks and are not quite readable with a cheap editor
that doesn't recognize them.

It's a fact that quoted-printable doesn't (need to) encode line breaks
in text files. Thus these line breaks are transformed into Unix-style
line breaks in mail transport. Why not let the decoder of
quoted-printable text files convert line breaks to the line-break style
of the browser's OS?
2006-03-28 20:56:44 Michael Slusarz Comment #10
This is exactly what is expected (and what the RFC is warning about).   
Namely that different MUAs can decode the file correctly and yet get 
different results.
2006-03-28 20:53:11 Michael Slusarz Comment #9
It sounds like you are still not understanding the issue.  What the
RFC essentially says is that Quoted-Printable (by default) is not able
to reliably handle binary data.  This is because the Q-P algorithm
differs depending on whether you are feeding it binary vs. non-binary
(e.g. text) data.  Namely, the encoding of line ending characters
differs depending on whether the data is binary or not.



What is important for our purposes is that there is absolutely no way 
to determine whether a given data stream is binary or not if it 
contains only 7bit characters.  The only way to tell if data is binary 
is via outside information.  If IMP determines the data is binary, we 
encode in base64.  If it is text we encode in quoted-printable.



In this case, IMP is determining the binary-ness of the data via the
MIME type the browser reports when uploading the file (or via a
MIME magic call if the data is uploaded as application/octet-stream).
In your case, the browser is reporting this data as a text/* part.
By definition of the RFCs, a text/* part is NOT BINARY DATA.  Thus, we
are encoding the file correctly.  The issue is either with your
browser or with the MIME magic regexes.  In either case, there is
nothing broken with the way Horde/IMP is encoding/decoding the data.
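
The decision described in this comment boils down to a rule like the one below. This is a hypothetical Python sketch, not the actual IMP code: the function name `choose_transfer_encoding` is invented for illustration, and the real logic may handle more cases.

```python
def choose_transfer_encoding(mime_type: str) -> str:
    """Pick a transfer encoding from the reported MIME type alone.

    Hypothetical simplification of the rule in this comment:
    text/* is treated as text, everything else as binary.
    """
    primary = mime_type.split("/", 1)[0].strip().lower()
    return "quoted-printable" if primary == "text" else "base64"

print(choose_transfer_encoding("text/plain"))                 # quoted-printable
print(choose_transfer_encoding("application/octet-stream"))   # base64
print(choose_transfer_encoding("application/msword"))         # base64
```

The content of the file never enters into it, so a binary file reported as text/plain ends up on the quoted-printable path.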
2006-03-28 20:40:19 Michael Slusarz Comment #8
> Since this happens very likely to a windows mail on a linux system
> the quoted-printable part is already altered before horde opens it.
>
> And my solution to this is to alter MIME/Part.php to use
> MIME_PART_RFC_EOL instead of MIME_PART_EOL as default $_eol.
>
> Could this be an enhancement to horde? (restricted to Windows-Browsers).

No.  The OS the browser is running on has absolutely no bearing on this issue.
2006-03-21 10:23:01 ag (at) netside (dot) de Comment #7
Okay, I understand that there's a definition gap in RFC 2045.

If a sender recognizes an attachment as text/plain it doesn't have to 
encode CR or LF if they're part of a CRLF line break.

It is known that plain text gets altered if passing between systems 
with differing newline conventions.

Since this happens very likely to a windows mail on a linux system the 
quoted-printable part is already altered before horde opens it.



And my solution to this is to alter MIME/Part.php to use 
MIME_PART_RFC_EOL instead of MIME_PART_EOL as default $_eol.



Could this be an enhancement to horde? (restricted to Windows-Browsers).
2006-03-21 09:38:03 ag (at) netside (dot) de Comment #6
Reading RFC 2045, I found the following sentence (in section 6.7, rule (1)):

<quot>
Any octet, except a CR or
LF that is part of a CRLF line break of the canonical
(standard) form of the data being encoded, may be
represented by an "=" followed by a two digit
hexadecimal representation of the octet's value.
</quot>


2006-03-20 16:32:00 ag (at) netside (dot) de Comment #5
To clarify: I've got 5 clients on 3 servers producing 15 identical
quoted-printable encodings, so in my opinion the encoding has to be
right. And there are 4 clients that decode absolutely correctly and 2
clients that don't.


2006-03-20 15:40:03 ag (at) netside (dot) de Comment #4
New Attachment: Auftg320004.dat
I'd like to reopen this bug, because i did a lot of testing and have 
some results that you will find interesting.



First of all: it's not only an Outlook quoted-printable bug but also an
IMP quoted-printable bug.

You said line breaks like 0x0d 0x0a need to be encoded in
quoted-printable as =0d=0a, and since Outlook fails here, you're not
responsible for modifications in attachments.

Hmm... but why is a mail sent by Horde with such a file attached
encoded in the same *wrong* way as Outlook did?



My testing involved several Outlook versions (Express, 10, 11),
Thunderbird and three Horde/IMP versions on the client side, and
Postfix, Sendmail and Exchange on the server side. All clients were
used under Windows XP - Horde/IMP on Firefox and IE.



Thunderbird and the oldest horde/imp installation sent the file 
base64-encoded and all other clients chose quoted-printable for 
encoding the attached file. Since base64 encoding is not a problem at 
all no server-client pair had any problems with these mails.

No server had any problems at all with any of the other mails, and
when connected via Outlook or Thunderbird the files downloaded from
those mails were absolutely fine.

One of the three horde/imp clients had no problem either on receiving 
the quoted-printable-attachments, but the two newer horde/imp 
installations corrupted the quoted-printable-attachments of any mail 
received.



OK : Horde 2.2.5 / Imp 3.2.3



Error : Horde 3.0.6 / Imp H3 (4.0.4)

Error : Horde 3.2-cvs / Imp H3 (4.2-cvs)



Now I'm sure that this is really of interest for you, since every
tested combination works fine except the two newer Horde/IMP
installations. Even mails with quoted-printable attachments sent by
one of the newer Horde/IMP installations can't be properly saved by
either of them - but the older Horde/IMP installation works fine with
those emails.

Since there's a major version jump between the older and the two newer
versions, this "bug" must have been introduced somewhere in between.



I'm attaching a test-file - please remove if not needed anymore!
2006-03-08 06:26:06 Michael Slusarz Deleted Original Message
 
2006-03-08 06:25:59 Michael Slusarz Comment #3
State ⇒ Not A Bug
This one is easy enough - the quoted-printable message you provided is 
broken.  The text taken directly from RFC 2045 [6.7]:



    WARNING TO IMPLEMENTORS:  If binary data is encoded in quoted-
    printable, care must be taken to encode CR and LF characters as "=0D"
    and "=0A", respectively.  In particular, a CRLF sequence in binary
    data should be encoded as "=0D=0A".  Otherwise, if CRLF were
    represented as a hard line break, it might be incorrectly decoded on
    platforms with different line break conventions.



As is plainly obvious from your message, none of the 'CR' characters
are encoded (i.e. there are no '=0D' strings in that message).  The
important part of that text is "it might be incorrectly decoded on
platforms with different line break conventions".  This is why that
quoted-printable message may be saved/viewed correctly on some
systems (e.g. a Mac, Windows) but is displayed slightly differently
using Horde/IMP (e.g. on unix).  But we are doing nothing wrong in
displaying the message, as we are following the RFC (and, in fact,
there is no way we could guess the correct linebreak usage anyway).



Pursuant to your request, I have deleted your sample messages from 
this ticket.
2006-03-06 14:24:48 Jan Schneider Assigned to Michael Slusarz
State ⇒ Assigned
 
2006-03-06 11:54:28 ag (at) netside (dot) de Comment #2
Err... forgot to mention .... i would really appreciate that the file 
i attached to this ticket is removed after solving (hopefully) it.
2006-03-06 11:51:36 ag (at) netside (dot) de Comment #1
Type ⇒ Bug
State ⇒ Unconfirmed
Priority ⇒ 2. Medium
Summary ⇒ Attachment modification (newline structure changes)
Queue ⇒ IMP
New Attachment: mail-examples.zip
Notice: imp/test.php says version H3 (4.2-cvs)



Scenario: An automated ordering system is receiving email orders with
defined orderfiles from customers. All customers use an offline
ordering software that generates the orderfile, which then has to be
(manually) sent to the address of the ordering system.



I got complaints about modified orderfiles from certain customers after
switching the mailserver over to Horde/IMP. So I took a closer look at
those emails, since not all order emails had corrupted attachments. It
clearly depends on the sender of the email - or to be more specific:
the mailer used to send the email seems to be the factor.



I grabbed two example orders from that queue (see attached zip)



What happens: The attachment is a textfile with 0x0d 0x0a as 
newline-sequence. Saving the attachment and examining it reveals no 
change if the attachment was base64 encoded. If the mailer had put 
this attachment as quoted-printable these newline-sequences are 
changed into 0x0a 0x0a.
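
One way to check which newline sequences a saved attachment really contains is to count them at the byte level. A minimal Python sketch; the function name `count_line_endings` and the sample bytes are illustrative only.

```python
def count_line_endings(data: bytes) -> dict:
    """Count CRLF pairs, bare LF and bare CR bytes in a byte string."""
    crlf = data.count(b"\r\n")
    return {
        "CRLF": crlf,
        "LF": data.count(b"\n") - crlf,   # LF not preceded by CR
        "CR": data.count(b"\r") - crlf,   # CR not followed by LF
    }

sample = b"first\r\nsecond\nthird\rfourth"
print(count_line_endings(sample))   # {'CRLF': 1, 'LF': 1, 'CR': 1}
```

To inspect a real file, read it in binary mode (for example `open(path, "rb").read()`) so that Python does not translate the line endings itself.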



To verify the source of the problem I asked one of these
"quoted-printable" senders to send to my private account (no Horde/IMP
- sorry). I found no changes in the saved attachment. So the sending
mailer does not change the attachment's newline sequence, but Horde/IMP
certainly seems to.



I'm pretty sure "quoted-printable" is the source of this problem, but
maybe it's a good idea to change IMP's behavior when an attachment is
marked as "Content-type: application/octet-stream"

