Incorrect message charset in replies with reply_headers

Fri, 27 May 2011 12:54:15 +0000

If the reply_headers preference is set, IMP_Compose#replyMessageText() inserts the decoded From: header into the message text. This header might contain non-ASCII characters. When the message's charset is determined further down, however, only the original message's charset is considered. This is a problem if the original message's charset matches the email charset of the current language (so that we don't use UTF-8 for the reply message) but the From: header cannot be converted to that charset.
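To make the failure mode concrete, here is a minimal sketch of a charset choice that looks at every piece of text that ends up in the reply, not only at the original message. It is not IMP's actual code: fitsCharset() and chooseReplyCharset() are hypothetical helpers, the sample name is made up, and the check relies on PHP's mbstring extension (a round trip through the target charset) rather than on Horde's own conversion methods.

```php
<?php
// Hypothetical helper, not part of IMP or Horde: returns true if $utf8Text
// survives a UTF-8 -> $charset -> UTF-8 round trip unchanged. mbstring
// substitutes unmappable characters, so a lossy conversion changes the text.
function fitsCharset(string $utf8Text, string $charset): bool
{
    $encoded = mb_convert_encoding($utf8Text, $charset, 'UTF-8');
    return mb_convert_encoding($encoded, 'UTF-8', $charset) === $utf8Text;
}

// Hypothetical policy: keep the original message's charset only if every
// text part of the reply fits into it; otherwise fall back to UTF-8.
function chooseReplyCharset(string $originalCharset, array $utf8Parts): string
{
    foreach ($utf8Parts as $part) {
        if (!fitsCharset($part, $originalCharset)) {
            return 'UTF-8';
        }
    }
    return $originalCharset;
}

// Made-up sample data: a decoded From: header inserted into the reply body.
$header = 'Jānis Bērziņš wrote:';
$body   = 'Thanks, see you tomorrow.';

echo chooseReplyCharset('ISO-8859-1', [$header, $body]), "\n"; // prints UTF-8
echo chooseReplyCharset('ISO-8859-1', [$body]), "\n";          // prints ISO-8859-1
```

The round trip costs two conversions per part, which is why it is shown here only as an illustration of the check, not as a recommendation for the compose code.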

Fri, 27 May 2011 12:55:17 +0000

This is such a message from the IMP mailing list.

Thu, 02 Jun 2011 07:11:52 +0000

I'd like to extend the ticket to a broader encoding issue. Since the option to set the message encoding by hand disappeared in IMP5, and because mail clients do not behave uniformly, I often run into the following problem: when replying, the reply is sent in the encoding set in the original message. So if I receive a message from another Latvian whose encoding is set to ISO8859-1 or US-ASCII (for example, by the mailer, because no extended letters are used in the text) and I reply to it using the full Latvian alphabet (ISO8859-13 or UTF8), my reply can at best be guessed at, if it is not totally unreadable. Would it be possible to promote the encoding if the reply uses characters that the original encoding cannot represent? (Information not related to the issue has been stripped from the example.)

Tue, 07 Jun 2011 05:56:57 +0000

> If the reply_headers preference is set,
> IMP_Compose#replyMessageText() inserts the decoded From: header
> into the message text. This header might contain non-ASCII
> characters. When the message's charset is determined further down,
> however, only the original message's charset is considered. This is a
> problem if the original message's charset matches the email charset
> of the current language (so that we don't use UTF-8 for the reply
> message) but the From: header cannot be converted to that charset.

I'm thinking we should a) junk auto-determining the charset of the outgoing reply message based on the original message and b) do away with the sending_charset preference. In other words, we should send everything in UTF-8. Are there really mail readers out there in 2011 that still don't support UTF-8?

I am more inclined to keep a) simply because it gives us hard information that the sender can at least read the e-mail message (on at least one of his MUA's) in that charset. And generally people aren't doing something like responding to a Norwegian message in Mandarin Chinese, so this charset hint is useful in almost all cases.

But the code would be so much easier to maintain if we just convert everything to UTF-8 and consistently stick with that internally.

a) would work better if we had some way of telling that the conversion from UTF-8 -> other charset was unsuccessful (e.g. there is no codepoint mapping for a certain character). But I don't think our conversion methods give us this kind of feedback, so that is not helpful.

Having the user pick the charset is simply out of the question. Nobody, outside of maybe programmers and computer scientists, has any clue what charsets mean anyway, so giving them an option to change is completely pointless.

Thoughts?

Wed, 29 Jun 2011 15:40:18 +0000

Unfortunately the underlying conversion methods don't return any information about whether the conversion worked completely, so this won't work. I lean toward always using UTF-8. Can we try this out without ditching all the conversion code completely, and see over a few releases whether we get any negative feedback?

Thu, 28 Jul 2011 22:17:51 +0000

> I lean toward always using UTF-8. Can we try this out without
> ditching all the conversion code completely, and see over a few
> releases whether we get any negative feedback?

Definitely not discounting that this would be a proper goal, but it doesn't seem like something appropriate to add in the middle of minor releases.

Thu, 28 Jul 2011 22:33:37 +0000

Changes have been made in Git for this ticket:

Bug #10148: Forward/Reply Headers Charset fix

Ensure correct message charset is use if forward/reply headers contain non US-ASCII characters.

3 files changed, 24 insertions(+), 4 deletions(-)
http://git.horde.org/horde-git/-/commit/a3a90d924835f9958b371a313c9fc2d99e28c271

Thu, 01 Sep 2011 12:18:21 +0000

> at least one of his MUA's) in that charset. And generally people
> aren't doing something like responding to a Norwegian message in
> Mandarin Chinese, so this charset hint is useful in almost all cases.

Actually, that is quite a short-sighted approach to the problem, especially since there still exist mail clients that do not stick to the original encoding but reduce it to the minimum. For example, it is a real situation that the reply to a UTF8 message is demoted to ISO8859-1 just because the text of the reply does not contain accented letters.

As for my specific case, it is nothing extraordinary to have two or three languages in a reply - Latvian, English and Russian - for which only UTF8 offers full coverage.

The problem already starts with the lack of automatic promotion from transliterated Latvian (iso8859-1, using "aa" instead of "ā" etc.) to the correct encoding, iso8859-13 or UTF8.

> But the code would be so much easier to maintain if we just convert
> everything to UTF-8 and consistently stick with that internally.

Do you have any internal plans for when this "will hit the ground"?

> Having the user pick the charset is simply out of the question.
> Nobody, outside of maybe programmers and computer scientists, has any
> clue what charsets mean anyway, so giving them an option to change is
> completely pointless.

So, if someone does not understand how an airplane flies, let's delete it from the list of available means of transportation, even for those who know how to fly one? Do you really think that "stupidification" of software is the way to go?

Thu, 01 Sep 2011 12:24:59 +0000

I forgot to note that the issue is not at all resolved as far as the latest stable IMP is concerned.

Thu, 01 Sep 2011 12:34:08 +0000

I am sorry, but from what I see in Git version control:

Ensure correct message charset is use if forward/reply headers contain non US-ASCII characters (Bug #10148)

it is clear that only half of the problem is solved. It is very likely that the headers contain only US-ASCII characters (and, as a result, the reply is sent in 8859-1), while the reply message BODY requires UTF8.

Thu, 01 Sep 2011 12:55:40 +0000

In my installation I did the following:

    // if (($msg_text['charset'] == 'us-ascii') &&
    //     (Horde_Mime::is8bit($msg_pre, 'UTF-8') ||
    //      Horde_Mime::is8bit($msg_post, 'UTF-8'))) {
        $msg_text['charset'] = 'UTF-8';
    // }

I may be wrong, but now no one complains about receiving unreadable e-mails.
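For readers who want to see what the commented-out condition does before it is disabled, the following is a rough, standalone sketch of it. It is an illustration only: has8bit() is a hypothetical stand-in for Horde_Mime::is8bit() (its real behavior may differ), and promoteAscii() is an invented wrapper, not a function in Compose.php.

```php
<?php
// Hypothetical stand-in for Horde_Mime::is8bit(): true if the string
// contains at least one byte outside the 7-bit ASCII range.
function has8bit(string $text): bool
{
    return preg_match('/[\x80-\xFF]/', $text) === 1;
}

// Invented wrapper around the condition: promote a us-ascii reply to UTF-8
// when the text placed before or after the quoted message is not pure ASCII.
// Any other charset is returned unchanged.
function promoteAscii(string $charset, string $pre, string $post): string
{
    if (strtolower($charset) === 'us-ascii'
        && (has8bit($pre) || has8bit($post))) {
        return 'UTF-8';
    }
    return $charset;
}

// "J\xC4\x81nis" is "Jānis" written as raw UTF-8 bytes (made-up sample).
echo promoteAscii('us-ascii', "J\xC4\x81nis wrote:", ''), "\n";   // prints UTF-8
echo promoteAscii('ISO-8859-1', "J\xC4\x81nis wrote:", ''), "\n"; // prints ISO-8859-1
```

The second call shows the gap described in this ticket: with the condition in place, an original message in ISO-8859-1 is never promoted, which is why the change above removes the condition and always sets UTF-8.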

Thu, 01 Sep 2011 16:33:11 +0000

>> at least one of his MUA's) in that charset. And generally people
>> aren't doing something like responding to a Norwegian message in
>> Mandarin Chinese, so this charset hint is useful in almost all cases.
>
> Actually, that is quite a short-sighted approach to the problem,
> especially since there still exist mail clients that do not stick to
> the original encoding but reduce it to the minimum. For example, it
> is a real situation that the reply to a UTF8 message is demoted to
> ISO8859-1 just because the text of the reply does not contain
> accented letters.
>
> As for my specific case, it is nothing extraordinary to have two or
> three languages in a reply - Latvian, English and Russian - for which
> only UTF8 offers full coverage.
>
> The problem already starts with the lack of automatic promotion from
> transliterated Latvian (iso8859-1, using "aa" instead of "ā" etc.) to
> the correct encoding, iso8859-13 or UTF8.

Your assumption is that everybody can read UTF-8 messages. While I wish that was the case, I guarantee there are legacy clients that don't handle UTF-8 (or have buggy handling). So changing our default to always send in UTF-8 is a decision NOT to be taken lightly. FYI: Gmail doesn't send all messages in UTF-8 either.

>> But the code would be so much easier to maintain if we just convert
>> everything to UTF-8 and consistently stick with that internally.
>
> Do you have any internal plans for when this "will hit the ground"?

There are none. This would require a complete refactor of both IMP and various framework code.

>> Having the user pick the charset is simply out of the question.
>> Nobody, outside of maybe programmers and computer scientists, has any
>> clue what charsets mean anyway, so giving them an option to change is
>> completely pointless.
>
> So, if someone does not understand how an airplane flies, let's
> delete it from the list of available means of transportation, even
> for those who know how to fly one? Do you really think that
> "stupidification" of software is the way to go?

This is a terrible analogy. Whether or not I know how an airplane flies is irrelevant, because when do I have to use that information? And if I don't have that knowledge, or have the wrong knowledge, it doesn't affect anything.

Conversely, whether or not I know about charsets is important because I, as a user, **must proactively make a decision** using this information at compose time. And if I use my non-existent/incorrect knowledge, there is a good chance I could break the outgoing e-mail, since the chosen charset may not support the characters I have placed in the body.

Thu, 01 Sep 2011 16:34:23 +0000

> I forgot to note that the issue is not at all resolved as far as the
> latest stable IMP is concerned.

Works fine here, given the example you previously provided.

Thu, 01 Sep 2011 16:39:32 +0000

> it is clear that only half of the problem is solved. It is very
> likely that the headers contain only US-ASCII characters (and, as a
> result, the reply is sent in 8859-1), while the reply message BODY
> requires UTF8.

And this is completely expected. And this has already been discussed (not sure if it was in this thread). Again: for replies, we send the message in the charset that the original sender used (as long as the original sender sent in non-ASCII). That is because this is the only charset we know for sure the sender can handle. Unless/until we decide to send all messages in UTF-8, this is correct behavior and not a bug.

Thu, 01 Sep 2011 19:21:50 +0000

>> it is clear that only half of the problem is solved. It is very
>> likely that the headers contain only US-ASCII characters (and, as a
>> result, the reply is sent in 8859-1), while the reply message BODY
>> requires UTF8.
>
> And this is completely expected. And this has already been discussed
> (not sure if it was in this thread). Again: for replies, we send the
> message in the charset that the original sender used (as long as the
> original sender sent in non-ASCII). That is because this is the only
> charset we know for sure the sender can handle.

Yes, you are right, but only for part of the problem. Today another round of the encoding war among Latvian IT specialists ended. It began as usual: someone sent a UTF8-encoded message, and the reply to it was made with a client which, for some damned legacy reason, analyses the message and decides what encoding to use. Because neither the Subject nor the reply body contained accented characters, the client demoted the encoding to ISO8859-1 ("the best one everyone can read"). I replied to that message using accented characters. The result was the usual unreadable email in iso8859-1 with a lot of question marks (as Horde keeps the encoding of the original), followed by an investigation into "who is the ass among us".

Therefore I made the correction in Compose.php to have UTF8 ALWAYS ON, assuming that the time of UTF8-incapable clients is long gone and that such clients should not be honoured.

As for caring for the mental health of those who do not know what an encoding is, I suggest going one step further: remove any mention of encoding from the preferences so that there is no confusion at all. By the way, even my mom (71) knows what an encoding is, or at least what effect it has on e-mails. So the problem lies in basic computer literacy and not in the encodings. Even browsers offer a lot of encodings, and users, at least in the Baltic countries, know how to use them to make a problematic page readable.

> Conversely, whether or not I know about charsets is important because
> I, as a user, **must proactively make a decision** using this
> information at compose time. And if I use my non-existent/incorrect
> knowledge, there is a good chance I could break the outgoing e-mail,
> since the chosen charset may not support the characters I have placed
> in the body.

IMP4 had such an option, and I do not remember protests from Horde deployers on the horde/imp mailing lists about users going insane at the sight of the encoding option. Yes, with that control it was possible to break outgoing email, but without it you give the user no possibility to correct the encoding when it is desperately necessary. What solution do you offer for such cases? Starting a new message and copy/pasting the text being replied to, after receiving a message with "What does that mean?!!!" as the main text of the body?

Janis

Mon, 13 Feb 2012 09:47:14 +0000

I'd like to point out a situation where replying with the original message encoding breaks the message. I've received an email encoded in ISO-8859-1. It looks good because it contains no accented characters (many people in my country write without accented characters for various reasons). I reply to this email using Horde. There are accented characters in my name, which means they should appear both in my identity (the From header of the message) and in my signature. These characters are sent as question marks, and my message looks as if my UA were broken (which, in my opinion, is true).

The other thing is that I always use accented characters because they are part of my mother tongue. When I reply to a message in an encoding which doesn't support accented characters, my message will be unreadable for the recipient. I think I'm not the only one who wants to compose emails in my mother tongue, and there should be a way to make that possible.

I would prefer a configuration option to force all outgoing messages to UTF-8. I would definitely have enabled it in our installation if it were possible. It's certainly true that clients with broken UTF-8 support exist somewhere on the internet and that they wouldn't be able to read our messages. But I prefer to tell my users and the recipients of my messages that the problem lies on the receiving side and not in our webmail. With the current Horde, my users notice that replying to an email with the same message text works in e.g. Thunderbird, while in Horde it doesn't.

Mon, 13 Feb 2012 12:07:02 +0000

> I'd like to point out a situation where replying with the original
> message encoding breaks the message:

Messages are sent in UTF-8 by default in Horde 4.1.

Wed, 06 Sep 2017 08:18:13 +0000

Changes have been made in Git (master):

commit d9ecd6a34a5e8c46d881c2c221b09082b12e57bf
Author: bertrand Gugger
Date: Tue Mar 13 11:18:17 2007 +0000

#10148 Simplify the row regexp to work with new PCRE, thanks Mark Wiesemann for the testing and support

git-svn-id: https://svn.php.net/repository/pear/packages/Text_Wiki/trunk@231777 c90b9560-bf6c-de11-be94-00142212c4b1

Text/Wiki/Parse/Mediawiki/Table.php | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
http://github.com/horde/horde/commit/d9ecd6a34a5e8c46d881c2c221b09082b12e57bf