2019-03-20

[#10148] Incorrect message charset in replies with reply_headers
Summary Incorrect message charset in replies with reply_headers
Queue IMP
Queue Version Git master
Type Bug
State Resolved
Priority 2. Medium
Owners slusarz (at) horde (dot) org
Requester jan (at) horde (dot) org
Created 2011-05-27 (2854 days ago)
Due
Updated 2017-09-06 (560 days ago)
Assigned
Resolved 2011-07-28 (2792 days ago)
Milestone
Patch No

History
2017-09-06 08:18:13 Git Commit Comment #18
Changes have been made in Git (master):

commit d9ecd6a34a5e8c46d881c2c221b09082b12e57bf
Author: bertrand Gugger <toggg@php.net>
Date:   Tue Mar 13 11:18:17 2007 +0000

     #10148 Simplify the row regexp to work with new PCRE, thanks Mark Wiesemann for the testing and support

     git-svn-id: https://svn.php.net/repository/pear/packages/Text_Wiki/trunk@231777 c90b9560-bf6c-de11-be94-00142212c4b1

  Text/Wiki/Parse/Mediawiki/Table.php | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

http://github.com/horde/horde/commit/d9ecd6a34a5e8c46d881c2c221b09082b12e57bf
2012-02-13 12:07:02 Michael Slusarz Comment #17
> I'd like to point out a situation where replying with the original
> message encoding breaks the message:
Messages are sent in UTF-8 by default in Horde 4.1.
2012-02-13 09:47:14 teplavoda (at) gmail (dot) com Comment #16
I'd like to point out a situation where replying with the original 
message encoding breaks the message:

I've received an email encoded in ISO-8859-1. It looks good because it 
contains no accented characters (many people in my country write 
without accented characters because of various reasons).

I reply to this email using Horde. There are accented characters in my
name, so they should appear both in my identity (the From header in
the message) and in my signature. These characters are sent as
question marks, and my message looks as if my UA is broken (which, in
my opinion, it is).

The other thing is that I always use accented characters because they
are a part of my mother tongue. When I reply to a message in an
encoding that doesn't support accented characters, my message will be
unreadable by the recipient.

I think I'm not the only one willing to compose emails in my mother 
tongue and there should be a way to make it possible. I would prefer a 
configuration option to force all outgoing messages to UTF-8. I would 
have definitely enabled it in our installation if it was possible.

It's certainly true that clients with broken UTF-8 support exist
somewhere on the internet and that they wouldn't be able to read our
messages. But I would rather tell my users and the recipients of my
messages that the problem lies on the receiving side and not in our
webmail. With the current Horde, my users notice that replying to an
email with the same message text works in e.g. Thunderbird, while in
Horde it doesn't.
2011-09-01 19:21:50 je (at) ktf (dot) rtu (dot) lv Comment #15
Yes, you are right, but only for part of the problem.

Today another round of the recurring encoding war among Latvian IT
specialists ended. It began as usual: one person sent a UTF8-encoded
message, and the reply was made with a client that, for some damned
legacy reasons, analyses the message and decides what encoding to use.
Because neither the Subject nor the reply body text contained chars
with vowels, the client demoted the encoding to ISO8859-1 ("the best
one everyone can read"). To that message I replied using chars with
vowels. The result was the usual unreadable email in ISO8859-1 with a
lot of question marks (as Horde keeps the encoding from the original),
followed by an investigation into "who is ass among us". Therefore I
made the correction in Compose.php to have UTF8 ALWAYS ON, assuming
that the time of UTF8-incapable clients is long gone and such clients
should not be honoured.

On the care for the mental health of those who do not know what
encoding means: I suggest going one step further and removing any
mention of encoding from the preferences, so there is no confusion at
all.

By the way, even my mom (71) knows what encoding is; at least, what
effect it has on e-mails. So the problem lies in basic computer
literacy and not in the encodings. Even browsers offer a lot of
encodings, and users - at least in the Baltic countries - know how to
use them to make a problematic page readable.
> Conversely, whether or not I know about charsets is important because
> I, as a user, **must proactively make a decision** using this
> information at compose time.  And if I use my non-existent/incorrect
> knowledge, there is a good chance I could break the outgoing e-mail
> since the chosen charset may not support the characters I have
> placed in the body.
IMP4 had such an option, and I do not remember protests from Horde
deployers on the horde/imp mailing list about users going insane on
seeing the encoding option. Yes, using that control it was possible to
break outgoing email, but without it you give the user no possibility
to correct the encoding when it is desperately necessary. What
solution do you offer for such cases? Start writing a new letter and
copy/paste the text one is replying to, after receiving a letter with
"What that means?!!!" as the main text of the body?

Janis
2011-09-01 16:39:32 Michael Slusarz Comment #14
> it is clear that only half of the problem is solved - It is very
> likely that the headers contain US-ASCII characters only (and - as a
> result - the reply is in 8859-1), while the reply message BODY
> requires UTF8.
And this is completely expected.  And this has already been discussed 
(not sure if it was this thread).  Again: for replies, we send the 
message in the charset that the original sender sent (as long as the 
original sender sent in non-ASCII).  That is because this is the only 
charset we know for sure the sender can handle.

Unless/until we decide to send all messages in UTF-8, this is correct 
behavior and not a bug.
2011-09-01 16:34:23 Michael Slusarz Comment #13
> forgot to notice that the issue is not at all resolved as far as the
> latest stable IMP is concerned.
Works fine here, given the example you previously provided.
2011-09-01 16:33:11 Michael Slusarz Comment #12
Your assumption is that everybody can read UTF-8 messages.  While I 
wish that was the case, I guarantee there are legacy clients that 
don't handle UTF-8 (or have buggy handling).  So changing our default 
to always send in UTF-8 is a decision NOT to be taken lightly.

FYI: Gmail doesn't send all messages in UTF-8 either.
>> But the code would be so much easier to maintain if we just convert
>> everything to UTF-8 and consistently stick with that internally.
> do you have any internal plans when this "will hit the ground"?
There are none.  This would require a complete refactor of both IMP 
and various framework code.

This is a terrible analogy.  Whether or not I know how an airplane is 
flying is irrelevant because when do I have to use that information?   
And if I don't have that knowledge, or have the wrong knowledge, it 
doesn't affect anything.

Conversely, whether or not I know about charsets is important because 
I, as a user, **must proactively make a decision** using this 
information at compose time.  And if I use my non-existent/incorrect 
knowledge, there is a good chance I could break the outgoing e-mail 
since the chosen charset may not support the characters I have placed 
in the body.
2011-09-01 12:55:40 je (at) ktf (dot) rtu (dot) lv Comment #11
In my installation I did the following:
//   if (($msg_text['charset'] == 'us-ascii') &&
//           (Horde_Mime::is8bit($msg_pre, 'UTF-8') ||
//             Horde_Mime::is8bit($msg_post, 'UTF-8'))) {
             $msg_text['charset'] = 'UTF-8';
//       }

I may be wrong, but now no one argues about receiving unreadable e-mails.
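
For readers following the patch above: the commented-out guard only promoted the charset when the message was US-ASCII and the text in `$msg_pre` or `$msg_post` contained 8-bit characters. With the guard removed, every reply goes out as UTF-8. A minimal PHP sketch of the two behaviours follows. It is not IMP's code: `has8bit()` is a hypothetical stand-in for `Horde_Mime::is8bit()`, and the function names and sample strings are made up for illustration.

```php
<?php
// Hypothetical stand-in for Horde_Mime::is8bit(): true if any byte is >= 0x80.
function has8bit(string $text): bool
{
    return preg_match('/[\x80-\xFF]/', $text) === 1;
}

// Guarded behaviour: promote US-ASCII to UTF-8 only when the text added
// before or after the quoted message contains 8-bit characters.
function guardedCharset(string $charset, string $pre, string $post): string
{
    if (strtolower($charset) === 'us-ascii' && (has8bit($pre) || has8bit($post))) {
        return 'UTF-8';
    }
    return $charset;
}

// Patched behaviour from the comment above: always UTF-8.
function forcedCharset(string $charset, string $pre, string $post): string
{
    return 'UTF-8';
}

$pre = "Quoting J\u{0101}nis:\n"; // attribution line with a non-ASCII character

echo guardedCharset('us-ascii', $pre, ''), "\n";   // UTF-8
echo guardedCharset('ISO-8859-1', $pre, ''), "\n"; // ISO-8859-1 (guard does not apply)
echo forcedCharset('ISO-8859-1', $pre, ''), "\n";  // UTF-8
```

The second call shows the case the thread keeps returning to: a non-ASCII original charset is kept even though the added text cannot be represented in it.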
2011-09-01 12:34:08 je (at) ktf (dot) rtu (dot) lv Comment #10
I am sorry, but from what I see in Git version control:

Ensure correct message charset is use if forward/reply headers contain 
non US-ASCII characters (Bug #10148)

it is clear that only half of the problem is solved - It is very
likely that the headers contain US-ASCII characters only (and - as a
result - the reply is in 8859-1), while the reply message BODY
requires UTF8.
2011-09-01 12:24:59 je (at) ktf (dot) rtu (dot) lv Comment #9
forgot to notice that the issue is not at all resolved as far as the 
latest stable IMP is concerned.
2011-09-01 12:18:21 je (at) ktf (dot) rtu (dot) lv Comment #8
> at least one of his MUA's) in that charset.  And generally people
> aren't doing something like responding to a Norwegian message in
> Mandarin Chinese, so this charset hint is useful in almost all cases.
Actually, that is quite a short-sighted approach to the problem,
especially since there still exist mail clients that do not stick to
the original encoding but reduce it to the minimum. For example, it
is a real situation that the reply to a UTF8 message is demoted to
ISO8859-1 just because the text of the reply does not contain letters
with vowels.

As for my specific case - it is nothing extraordinary to have two or
three languages in a reply - Latvian, English and Russian, for which
only UTF8 offers full coverage.

The problem already starts with the lack of automatic promotion from
transliterated Latvian (iso8859-1, using "aa" instead of "ā" etc) to
the correct one - iso8859-13 or UTF8.
> But the code would be so much easier to maintain if we just convert
> everything to UTF-8 and consistently stick with that internally.
do you have any internal plans when this "will hit the ground"?
> Having the user pick the charset is simply out of the question.
> Nobody, outside of maybe programmers and computer scientists, has
> any clue what charsets mean anyway, so giving them an option to
> change is completely pointless.
So, if one does not understand how an airplane flies, let's delete it
from the list of available means of transportation, even for those
who know how to steer one? Do you really think that
"stupidification" of software is the way to go?
2011-07-28 22:34:01 Michael Slusarz State ⇒ Resolved
Priority ⇒ 2. Medium
 
2011-07-28 22:33:37 Git Commit Comment #7
Changes have been made in Git for this ticket:

Bug #10148: Forward/Reply Headers Charset fix
Ensure correct message charset is use if forward/reply headers contain
non US-ASCII characters.

  3 files changed, 24 insertions(+), 4 deletions(-)
http://git.horde.org/horde-git/-/commit/a3a90d924835f9958b371a313c9fc2d99e28c271
2011-07-28 22:17:51 Michael Slusarz Comment #6
> I tend to always use UTF-8. Can we try this out without ditching all
> conversion code completely and see for a few releases if we get any
> negative feedback?
Definitely not discounting that this would be a proper goal, but 
doesn't seem like something appropriate for adding in the middle of 
minor releases.
2011-06-29 15:40:18 Jan Schneider Comment #5
Unfortunately the underlying conversion methods don't return any 
information whether the conversion worked completely, so this won't 
work.
I tend to always use UTF-8. Can we try this out without ditching all 
conversion code completely and see for a few releases if we get any 
negative feedback?
2011-06-07 05:56:57 Michael Slusarz Comment #4
> If the reply_headers preference is set,
> IMP_Compose#replyMessageText() is inserting the decoded From: header
> into the message text. This header might contain non-ASCII
> characters. When determining the message's charset further down,
> only the original message's charset is considered though. This is a
> problem if the original message matches the email charset of the
> current language (so that we don't use UTF-8 for the reply message),
> but the From: header can not be converted to that charset.
I'm thinking we should a) stop auto-determining the charset of the
outgoing reply message based on the original message and b) do away
with the sending_charset preference.  In other words, we should send
everything in UTF-8.  Are there really mail readers out there in 2011
that still don't support UTF-8?

I am more inclined to keep a) simply because it gives us hard 
information that the sender can at least read the e-mail message (on 
at least one of his MUA's) in that charset.  And generally people 
aren't doing something like responding to a Norwegian message in 
Mandarin Chinese, so this charset hint is useful in almost all cases.   
But the code would be so much easier to maintain if we just convert 
everything to UTF-8 and consistently stick with that internally.

a) would work better if we had some way of telling that the conversion 
from UTF-8 -> other charset was unsuccessful (e.g. there is no 
codepoint mapping for a certain character).  But I don't think our 
conversion methods give us this kind of feedback so that is not helpful.

Having the user pick the charset is simply out of the question.   
Nobody, outside of maybe programmers and computer scientists, has any 
clue what charsets mean anyway, so giving them an option to change is 
completely pointless.

Thoughts?
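
The feedback wished for in point a) can be approximated without help from the conversion routine itself, by converting to the target charset and back and comparing with the input. The following is a minimal PHP sketch of that round-trip check, assuming the mbstring extension. `convertsLosslessly()` is a hypothetical helper, not part of IMP or the Horde framework.

```php
<?php
// Hypothetical helper (not part of Horde): reports whether $utf8Text survives
// a conversion to $charset unchanged. Requires the mbstring extension.
function convertsLosslessly(string $utf8Text, string $charset): bool
{
    // Characters without a mapping in $charset are substituted or dropped here...
    $converted = mb_convert_encoding($utf8Text, $charset, 'UTF-8');
    // ...so converting back no longer reproduces the original text.
    $roundTrip = mb_convert_encoding($converted, 'UTF-8', $charset);
    return $roundTrip === $utf8Text;
}

var_dump(convertsLosslessly('Plain question?', 'ISO-8859-1')); // bool(true)
var_dump(convertsLosslessly("Caf\u{00E9}", 'ISO-8859-1'));     // bool(true)
var_dump(convertsLosslessly("J\u{0101}nis", 'ISO-8859-1'));    // bool(false)
```

Comparing the round trip rather than searching the output for "?" means a literal question mark in the text is not mistaken for a failed conversion. The cost is two conversions per check.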
2011-06-02 07:11:52 je (at) ktf (dot) rtu (dot) lv Comment #3
New Attachment: example.eml
I'd like to extend the ticket to a broader encoding issue:

As the possibility to set the letter encoding by hand disappeared in
IMP5, and the mail clients in use are not uniform, I often run into
the following problem:

When replying, the reply is sent in the encoding set in the original
letter. So if I receive a letter from another Latvian guy whose
encoding is set to ISO8859-1 or US-ASCII (for example, by the mailer,
because no extended letters are used in the text) and reply to it
using the full Latvian alphabet (ISO8859-13 or UTF8), my reply can be
guessed at best, if it is not totally unreadable.

Maybe it is possible to promote the encoding if the reply uses extended charsets?

(info not related to the issue stripped from the example)
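
The promotion asked for here could look roughly like the following PHP sketch: keep the original message's charset when the whole reply fits into it, otherwise fall back to UTF-8. It assumes the mbstring extension. `chooseReplyCharset()` and the sample strings are hypothetical and do not reflect how IMP selects the charset.

```php
<?php
// Hypothetical helper: keep the sender's charset if the complete reply text
// can be represented in it, otherwise promote the reply to UTF-8.
// Requires the mbstring extension.
function chooseReplyCharset(string $replyUtf8, string $origCharset): string
{
    $encoded = mb_convert_encoding($replyUtf8, $origCharset, 'UTF-8');
    $decoded = mb_convert_encoding($encoded, 'UTF-8', $origCharset);
    // A lossless round trip means every character has a mapping.
    return $decoded === $replyUtf8 ? $origCharset : 'UTF-8';
}

// Transliterated reply: fits into the original charset.
echo chooseReplyCharset('Paldies, Jaani!', 'ISO-8859-1'), "\n";        // ISO-8859-1
// Reply using the full Latvian alphabet: promoted.
echo chooseReplyCharset("Paldies, J\u{0101}ni!", 'ISO-8859-1'), "\n";  // UTF-8
```

This keeps the "charset the sender is known to handle" hint where it is harmless and only overrides it when the reply would otherwise arrive with question marks.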
2011-05-27 12:55:17 Jan Schneider Comment #2
New Attachment: bug10148.eml
This is such a message from the IMP mailing list.
2011-05-27 12:54:15 Jan Schneider Comment #1
Type ⇒ Bug
State ⇒ Assigned
Priority ⇒ 1. Low
Summary ⇒ Incorrect message charset in replies with reply_headers
Queue ⇒ IMP
Assigned to Michael Slusarz
Milestone ⇒
Patch ⇒ No
If the reply_headers preference is set, IMP_Compose#replyMessageText() 
is inserting the decoded From: header into the message text. This 
header might contain non-ASCII characters. When determining the 
message's charset further down, only the original message's charset is 
considered though. This is a problem if the original message matches 
the email charset of the current language (so that we don't use UTF-8 
for the reply message), but the From: header can not be converted to 
that charset.
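
The failure described above can be reproduced in isolation. The following is a minimal PHP sketch, assuming the mbstring extension. The sender name, address, and variable names are made up for illustration, and none of this is IMP's actual code.

```php
<?php
// Illustrative values only; requires the mbstring extension.
mb_substitute_character(0x3F); // show unmappable characters as "?"

// Decoded From: header of the original message (UTF-8 internally).
$from   = "J\u{0101}nis B\u{0113}rzi\u{0146}\u{0161} <janis@example.com>";
$header = "Quoting $from:\n";      // attribution line inserted into the reply text
$body   = "> See you tomorrow.\n"; // quoted original, plain ASCII

// Only the original message's charset is considered for the reply.
$origCharset = 'ISO-8859-1';
$encoded = mb_convert_encoding($header . $body, $origCharset, 'UTF-8');

// Decode again to see what the recipient would read.
$seen = mb_convert_encoding($encoded, 'UTF-8', $origCharset);
echo $seen;
```

The four accented letters in the name have no ISO-8859-1 mapping, so the attribution line comes out as `Quoting J?nis B?rzi?? <janis@example.com>:` even though the quoted body itself is intact.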
