Summary | MIME.php wrapHeaders corrupting filenames |
Queue | Horde Framework Packages |
Type | Bug |
State | Resolved |
Priority | 2. Medium |
Owners | slusarz (at) horde (dot) org |
Requester | slusarz (at) horde (dot) org |
Created | 09/29/2004 (7601 days ago) |
Due | |
Updated | 10/26/2004 (7574 days ago) |
Assigned | 09/29/2004 (7601 days ago) |
Resolved | 10/26/2004 (7574 days ago) |
Github Issue Link | |
Github Pull Request | |
Milestone | |
Patch | No |
State ⇒ Resolved
in this report. I will leave this bug report open for a while to make
sure this has been fixed correctly.
I've just started to look at this, but here is a quick comment on the
proposed solution.
Although this is the correct way to break these lines according to RFC 2231,
there is a *boatload* of mailers that don't support this. So implementing it
this way is out of the question, at least for right now (IMP, for example,
supports decoding RFC 2231 encoded strings, but we have to do it in an
extremely hackish way as c-client/PHP doesn't even support this format).
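For reference, the reassembly step that an RFC 2231-aware mailer has to perform can be sketched roughly as below. This is only an illustration: joinRfc2231Continuations() is a hypothetical helper, not code from IMP, Horde or c-client, and it assumes the parameters have already been parsed into a name => value array with the surrounding quotes removed. Extended segments carrying a charset or percent-encoding (names ending in '*') are left out.

```php
<?php
/**
 * Hypothetical helper (not part of Horde or c-client): merge RFC 2231
 * continuation parameters such as filename*0, filename*1 into a single
 * 'filename' value. Extended (charset/percent-encoded) segments, marked
 * by a trailing '*', are deliberately not handled in this sketch.
 */
function joinRfc2231Continuations(array $params)
{
    $plain = array();
    $parts = array();
    foreach ($params as $name => $value) {
        if (preg_match('/^([^*]+)\*(\d+)$/', $name, $m)) {
            // Collect numbered segments under their base name.
            $parts[$m[1]][(int)$m[2]] = $value;
        } else {
            $plain[$name] = $value;
        }
    }
    foreach ($parts as $name => $segments) {
        // Segments may arrive in any order; sort by index before joining.
        ksort($segments);
        $plain[$name] = implode('', $segments);
    }
    return $plain;
}
```

A mailer without this step sees only unknown parameters named filename*0 and filename*1, which is why the attachment ends up without a usable name there.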
Now that I have been thinking about this for a few minutes... isn't this the
same problem and/or potential solution I discussed here:
http://marc.theaimsgroup.com/?l=horde-dev&m=108334367512331&w=2
If this solution doesn't work, we will most likely just have to let the
line be longer than 78 characters, since that is the only way I can see
right now that would work with most/all mailers.
State ⇒ Assigned
Priority ⇒ 2. Medium
Type ⇒ Bug
Summary ⇒ MIME.php wrapHeaders corrupting filenames
Queue ⇒ Horde Framework Packages
Assigned to Michael Slusarz
circumstances taking long filenames which have spaces in them and
replacing a space in the filename with a tab:
function wrapHeaders($header, $text, $eol = "\r\n")
{
    /* Remove any existing linebreaks. */
    $text = preg_replace("/\r?\n\s?/", ' ', $text);

    /* Wrap the line. */
    $line = wordwrap(rtrim($header) . ': ' . rtrim($text), 75,
                     $eol . "\t");

    /* Make sure there are no empty lines. */
    $line = preg_replace("/" . $eol . "\t\s*" . $eol . "\t/", "/"
                         . $eol . "\t/", $line);

    return substr($line, strlen($header) + 2);
}
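A minimal way to see the effect, assuming the sample Content-Disposition value used in this report, is to run the same wordwrap() call on its own. The variable names are only for illustration:

```php
<?php
// Sketch: reproduce the folding step of wrapHeaders() on a sample header.
$eol = "\r\n";
$header = 'Content-Disposition';
$text = 'attachment; filename="Mid-Pgm Assessment Form000000000000000.doc"';

// Same wordwrap() call as in wrapHeaders(): fold at 75 columns, using
// EOL + tab as the break string.
$line = wordwrap(rtrim($header) . ': ' . rtrim($text), 75, $eol . "\t");

// wordwrap() breaks at the last space that fits, which here is a space
// inside the quoted filename; that space is replaced by EOL + tab.
$folded = substr($line, strlen($header) + 2);

// Print the result with the control characters made visible.
echo addcslashes($folded, "\r\n\t"), "\n";
```

The space between "Assessment" and "Form" is gone from the output and a line break plus tab sits in its place, so a mailer that unfolds by simply dropping the CRLF ends up with a tab in the filename.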
Example:
Horde:
Content-Type: application/msword; name="Mid-Pgm Assessment
Form000000000000000.doc"
Content-Disposition: attachment; filename="Mid-Pgm Assessment
Form000000000000000.doc"
Content-Transfer-Encoding: base64
Horde with filename > 78 and no spaces:
Content-Type: application/msword;
name="Mid-Pgm_Assessment_Form0000000000000_this_is_a_test_and_this_is_another_test_and_this_is_a_third_test_and_just_one_more_for_kicks.doc"
Content-Disposition: attachment;
filename="Mid-Pgm_Assessment_Form0000000000000_this_is_a_test_and_this_is_another_test_and_this_is_a_third_test_and_just_one_more_for_kicks.doc"
Content-Transfer-Encoding: base64
Here are some examples of how other mailers construct this:
Pine:
Content-Type: APPLICATION/msword; name="Mid-Pgm Assessment
Form000000000000000.doc"
Content-Transfer-Encoding: BASE64
Content-Disposition: attachment; filename="Mid-Pgm Assessment
Form000000000000000.doc"
Pine with a filename > 78:
Content-Type: APPLICATION/msword; name*0="Mid-Pgm Assessment
Form000000000000000 this is a test and this is another test and th";
name*1="is is a third test and just one more for kicks.doc"
Content-Transfer-Encoding: BASE64
Content-Disposition: attachment; filename*0="Mid-Pgm Assessment
Form000000000000000 this is a test and this is another test and th";
filename*1="is is a third test and just one more for kicks.doc"
Pine with a filename > 78 and no spaces:
Content-Type: APPLICATION/msword;
name*0=Mid-Pgm_Assessment_Form0000000000000_this_is_a_test_and_this_is_another_test_and_;
name*1="this_is_a_third_test_and_just_one_more_for_kicks.doc"
Content-Transfer-Encoding: BASE64
Content-Disposition: attachment;
filename*0=Mid-Pgm_Assessment_Form0000000000000_this_is_a_test_and_this_is_another_test_and_;
filename*1="this_is_a_third_test_and_just_one_more_for_kicks.doc"
Mulberry:
Content-Type: application/msword;
name="Mid-Pgm Assessment Form000000000000000.doc"
Content-Transfer-Encoding: base64
Content-Disposition: attachment;
filename="Mid-Pgm Assessment Form000000000000000.doc"; size=25088
Mulberry with a filename > 78:
Content-Type: application/msword;
name="Mid-Pgm Assessment Form000000000000000 this is a test and this
is another test and this is a third test and just one more for
kicks.doc"
Content-Transfer-Encoding: base64
Content-Disposition: attachment;
filename="Mid-Pgm Assessment Form000000000000000 this is a test and
this is another test and this is a third test and just one more for
kicks.doc";
size=24064
Mulberry with a filename > 78 and no spaces:
Content-Type: application/msword;
name="Mid-Pgm_Assessment_Form0000000000000_this_is_a_test_and_this_is_another_test_and_this_is_a_third_test_and_just_one_more_for_kicks.doc"
Content-Transfer-Encoding: base64
Content-Disposition: attachment;
filename="Mid-Pgm_Assessment_Form0000000000000_this_is_a_test_and_this_is_another_test_and_this_is_a_third_test_and_just_one_more_for_kicks.doc";
size=24064
The following patch replaces the tab character with a space. That at least
avoids embedding a funky character in the quoted attachment filename, which
some mailers cannot make sense of and therefore include in the filename,
but it does not deal with a long filename made up of only alphanumeric
characters:
diff -r1.132 MIME.php
809c809
< $line = wordwrap(rtrim($header) . ': ' . rtrim($text), 75,
$eol . "\t");
---
> $line = wordwrap(rtrim($header) . ': ' . rtrim($text), 75,
$eol . " ");
< $line = preg_replace("/" . $eol . "\t\s*" . $eol . "\t/",
"/" . $eol . "\t/", $line);
---
. $eol . " /", $line);
characters in a line. Each line of characters MUST be no more than
998 characters, and SHOULD be no more than 78 characters, excluding
the CRLF.
The 998 character limit is due to limitations in many implementations
which send, receive, or store Internet Message Format messages that
simply cannot handle more than 998 characters on a line. Receiving
implementations would do well to handle an arbitrarily large number
of characters in a line for robustness sake. However, there are so
many implementations which (in compliance with the transport
requirements of [RFC2821]) do not accept messages containing more
than 1000 character including the CR and LF per line, it is important
for implementations not to create such messages.
The more conservative 78 character recommendation is to accommodate
the many implementations of user interfaces that display these
messages which may truncate, or disastrously wrap, the display of
more than 78 characters per line, in spite of the fact that such
implementations are non-conformant to the intent of this
specification (and that of [RFC2821] if they actually cause
information to be lost). Again, even though this limitation is put on
messages, it is encumbant upon implementations which display messages
I think that since the character limit is "MUST be no more than 998" and
"SHOULD be no more than 78", there are the following options:
- use spaces instead of tabs to indent continuation lines on MIME part
  headers
- start a new continuation line each time a semicolon is encountered
  outside of a quoted-string, unless it is the trailing character
- limit each of these lines to 998 or 78 characters:
  - either truncate the value portion of the header attribute so that the
    overall length of the line is less than 998
  - or use the attribute_key*<n> syntax to break up quoted-strings so that
    no line exceeds 78 characters
I was thinking of replacing the function with something like the following
(this has only been sketched out, not tested against real mailers):
function wrapHeaders($header, $text, $eol = "\r\n")
{
    /* Remove any existing linebreaks. */
    $text = trim(preg_replace("/\r?\n\s?/", ' ', $text));
    $header = trim($header);

    /* Short headers fit on one line and need no folding. */
    if ((strlen($header) + 2 + strlen($text)) < 75) {
        return $text;
    }

    /* need a more accurate separator regex here (this one also splits
       on semicolons inside quoted strings) but this is just for
       demonstrative purposes */
    $attrs = array_map('trim', preg_split('/;/', $text, -1,
                                          PREG_SPLIT_NO_EMPTY));
    $lines = array();
    for ($i = 0; $i < count($attrs); $i++) {
        /* the first line has to account for the length of the header
           addition, later lines only for the single whitespace indent */
        $offset = ($i == 0) ? strlen($header) + 2 : 1;
        if (($offset + strlen($attrs[$i])) < 75) {
            $lines[] = $attrs[$i];
            continue;
        }
        $attrItems = explode('=', $attrs[$i], 2);
        if (count($attrItems) < 2) {
            /* if the separator isn't found in the attribute then
             * the value should probably not be folded.
             * just make sure it doesn't exceed 995
             */
            $lines[] = substr($attrs[$i], 0, 995 - $offset);
            continue;
        }
        $attrName = $attrItems[0];
        $attrValue = trim($attrItems[1], '"');
        /* break the value into name*0="...", name*1="...", ... pieces;
           the 6 leaves room for the *N="" decoration and the semicolon */
        $chunks = str_split($attrValue,
                            max(1, 75 - ($offset + strlen($attrName) + 6)));
        for ($c = 0; $c < count($chunks); $c++) {
            $lines[] = "$attrName*$c=" . '"' . $chunks[$c] . '"';
        }
    }

    /* one attribute per line, indented with a single space; like the
       current function, return the value without the header name */
    return implode(';' . $eol . ' ', $lines);
}
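The separator mentioned in the comment above could, as one possibility, be handled with a small scanner instead of a regex, so that a semicolon inside a quoted filename is not treated as a break point. splitHeaderParams() is a hypothetical name and the quoting rules are simplified to double quotes with backslash escapes:

```php
<?php
/**
 * Hypothetical helper: split a header value on semicolons that sit
 * outside double-quoted strings. Backslash escapes inside quotes are
 * kept as part of the value.
 */
function splitHeaderParams($text)
{
    $parts = array();
    $current = '';
    $inQuotes = false;
    $len = strlen($text);
    for ($i = 0; $i < $len; $i++) {
        $ch = $text[$i];
        if ($inQuotes && $ch === '\\' && $i + 1 < $len) {
            // Keep an escaped character (e.g. \") together with its backslash.
            $current .= $ch . $text[++$i];
        } elseif ($ch === '"') {
            $inQuotes = !$inQuotes;
            $current .= $ch;
        } elseif ($ch === ';' && !$inQuotes) {
            // A semicolon outside quotes ends the current attribute.
            $parts[] = trim($current);
            $current = '';
        } else {
            $current .= $ch;
        }
    }
    if (trim($current) !== '') {
        $parts[] = trim($current);
    }
    // Drop empty pieces, e.g. those left by a doubled semicolon.
    return array_values(array_filter($parts, 'strlen'));
}
```

The returned array could stand in for the preg_split() result in the function above; the trailing semicolon case falls out naturally because an empty final piece is never added.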
I think there should also be some code in place to deal with displaying
these long filenames at the top of the message in HTML. I think the link
text should be truncated to a certain number of characters, with the full
string added as a title attribute on the anchor tag.
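Something along these lines might do for the display side. It is only a sketch: attachmentLink() is a hypothetical helper, the 40-character limit is an arbitrary assumption, and it uses the title attribute because anchors do not take alt.

```php
<?php
/**
 * Hypothetical helper: build a download link whose visible text is
 * truncated while the full filename stays available in the title
 * attribute. The $url parameter and the default limit are assumptions.
 */
function attachmentLink($url, $filename, $max = 40)
{
    $label = $filename;
    if (strlen($filename) > $max) {
        // Byte-based truncation; multibyte names would need mb_substr().
        $label = substr($filename, 0, $max - 3) . '...';
    }
    return '<a href="' . htmlspecialchars($url, ENT_QUOTES) . '" title="'
        . htmlspecialchars($filename, ENT_QUOTES) . '">'
        . htmlspecialchars($label, ENT_QUOTES) . '</a>';
}
```

Escaping the filename matters here as much as truncating it, since the name comes straight from the message headers.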
Comments?
--
Sam Nicolary