Summary | Incorrect message charset in replies with reply_headers |
Queue | IMP |
Queue Version | Git master |
Type | Bug |
State | Resolved |
Priority | 2. Medium |
Owners | slusarz (at) horde (dot) org |
Requester | jan (at) horde (dot) org |
Created | 05/27/2011 (5157 days ago) |
Due | |
Updated | 09/06/2017 (2863 days ago) |
Assigned | |
Resolved | 07/28/2011 (5095 days ago) |
Github Issue Link | |
Github Pull Request | |
Milestone | |
Patch | No |
commit d9ecd6a34a5e8c46d881c2c221b09082b12e57bf
Author: bertrand Gugger <toggg@php.net>
Date: Tue Mar 13 11:18:17 2007 +0000
#10148 Simplify the row regexp to work with new PCRE, thanks Mark
Wiesemann for the testing and support
git-svn-id:
https://svn.php.net/repository/pear/packages/Text_Wiki/trunk@231777
c90b9560-bf6c-de11-be94-00142212c4b1
Text/Wiki/Parse/Mediawiki/Table.php | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
http://github.com/horde/horde/commit/d9ecd6a34a5e8c46d881c2c221b09082b12e57bf
message encoding breaks the message:
I've received an email encoded in ISO-8859-1. It looks good because it
contains no accented characters (many people in my country write
without accented characters for various reasons).
I reply to this email using Horde. There are accented characters in my
name and that means they should be both in my identity (From header in
the message) and in my signature. These characters are sent as
question marks and my message looks like my UA is broken (and that is
true, in my opinion).
The other thing is that I always use accented characters because they
are a part of my mother tongue. When I reply to a message in an encoding
which doesn't support accented characters, my message will be
unreadable by the recipient.
I think I'm not the only one willing to compose emails in my mother
tongue and there should be a way to make it possible. I would prefer a
configuration option to force all outgoing messages to UTF-8. I would
have definitely enabled it in our installation if it was possible.
It's certainly true that clients with broken UTF-8 support exist
somewhere on the internet and they wouldn't be able to read our
messages. But I prefer to inform my users and recipients of my
messages that the problem lies on the receiving side and not in our
webmail. With current Horde my users realize that replying to an
email with the same message text in e.g. Thunderbird works while in
Horde it doesn't work.
Today the recurring round of the encoding war among Latvian IT
specialists ended. It began as usual: one person sent a UTF8 encoded
message, and the reply was made using a client which, for some damned
legacy reasons, analyses the message and decides what encoding to use.
Because neither the Subject nor the reply body text contained chars
with vowels, the client demoted the encoding to ISO8859-1 ("the best
one everyone can read"). I replied to that message using chars with
vowels. The result was the usual unreadable email in ISO8859-1 with a
lot of question marks (as Horde keeps the encoding from the original),
followed by an investigation into "who is the ass among us". Therefore
I made the correction in Compose.php to have UTF8 ALWAYS ON, assuming
that the time of UTF8-incapable clients is long gone and that such
clients should not be honoured.
As for caring for the mental health of those who do not know what
encoding means, I suggest going one step further: remove any mention
of encoding from the preferences so that there is no confusion at all.
By the way, even my mom (71) knows what the encoding is; at least -
what effect it has on e-mails. So the problem lies within basic
computer literacy and not the encodings. Even browsers have a lot of
available encodings and users - at least in Baltic countries - know
how to use them to get the problematic page readable.
I, as a user, **must proactively make a decision** using this
information at compose time. And if I use my non-existent/incorrect
knowledge, there is a good chance I could break the outgoing e-mail
since the chosen charset may not support the characters I have
placed in the body.
deployers on the horde/imp mailing list about users going insane seeing
the encoding option. Yes, using that control it was possible to break
outgoing email, but without it you do not give the user the possibility
to make a correction if it is desperately necessary.
What solution do you offer for such cases? Start to write a new letter
and copy/paste the text one is replying to, after receiving a letter
with "What that means?!!!" as the main text of the body?
Janis
likely that the headers contain US-ASCII characters only (and, as a
result, the reply goes out in 8859-1), while the reply message BODY
requires UTF8.
(not sure if it was this thread). Again: for replies, we send the
message in the charset that the original sender sent (as long as the
original sender sent in non-ASCII). That is because this is the only
charset we know for sure the sender can handle.
Unless/until we decide to send all messages in UTF-8, this is correct
behavior and not a bug.
latest stable IMP is concerned.
wish that was the case, I guarantee there are legacy clients that
don't handle UTF-8 (or have buggy handling). So changing our default
to always send in UTF-8 is a decision NOT to be taken lightly.
FYI: Gmail doesn't send all messages in UTF-8 either.
everything to UTF-8 and consistently stick with that internally.
and various framework code.
flying is irrelevant because when do I have to use that information?
And if I don't have that knowledge, or have the wrong knowledge, it
doesn't affect anything.
Conversely, whether or not I know about charsets is important because
I, as a user, **must proactively make a decision** using this
information at compose time. And if I use my non-existent/incorrect
knowledge, there is a good chance I could break the outgoing e-mail
since the chosen charset may not support the characters I have placed
in the body.
// if (($msg_text['charset'] == 'us-ascii') &&
// (Horde_Mime::is8bit($msg_pre, 'UTF-8') ||
// Horde_Mime::is8bit($msg_post, 'UTF-8'))) {
$msg_text['charset'] = 'UTF-8';
// }
I may be wrong, but now no one complains about receiving unreadable e-mails.
Ensure correct message charset is used if forward/reply headers contain
non US-ASCII characters (Bug #10148)
It is clear that only half of the problem is solved. It is very
likely that the headers contain US-ASCII characters only (and, as a
result, the reply goes out in 8859-1), while the reply message BODY
requires UTF8.
latest stable IMP is concerned.
aren't doing something like responding to a Norwegian message in
Mandarin Chinese, so this charset hint is useful in almost all cases.
especially if there still exist mail clients that do not stick to the
original encoding but reduce it to the minimum. For example, it is a
real situation that the reply to a UTF8 message is demoted to
ISO8859-1 just because the text of the reply does not contain letters
with vowels.
As for my specific case: it is nothing extraordinary to have two or
three languages in a reply (Latvian, English and Russian), for which
only UTF8 offers full coverage.
the problem already starts with the lack of automatic promotion from
transliterated Latvian (iso8859-1, using "aa" instead of "ā" etc) to
the correct one - iso8859-13 or UTF8
everything to UTF-8 and consistently stick with that internally.
Nobody, outside of maybe programmers and computer scientists, has
any clue what charsets mean anyway, so giving them an option to
change is completely pointless.
it from the list of available transportation means including for
those, who know even how to steer them? Do you really think that
"stupidification" of software is the way to go?
Priority ⇒ 2. Medium
Bug #10148: Forward/Reply Headers Charset fix
Ensure correct message charset is used if forward/reply headers contain
non US-ASCII characters.
3 files changed, 24 insertions(+), 4 deletions(-)
http://git.horde.org/horde-git/-/commit/a3a90d924835f9958b371a313c9fc2d99e28c271
conversion code completely and see for a few releases if we get any
negative feedback?
doesn't seem like something appropriate for adding in the middle of
minor releases.
information whether the conversion worked completely, so this won't
work.
I tend to always use UTF-8. Can we try this out without ditching all
conversion code completely and see for a few releases if we get any
negative feedback?
IMP_Compose#replyMessageText() is inserting the decoded From: header
into the message text. This header might contain non-ASCII
characters. When determining the message's charset further down,
only the original message's charset is considered though. This is a
problem if the original message matches the email charset of the
current language (so that we don't use UTF-8 for the reply message),
but the From: header can not be converted to that charset.
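The failure mode described above can be sketched as follows. This is a minimal illustration in Python, not IMP's PHP code: the function names, the attribution line, and the fallback to UTF-8 are all assumptions made for the example. The point is that the charset decision has to look at everything that ends up in the reply text, including the inserted header line, and not only at the original message's charset.

```python
def fits(text: str, charset: str) -> bool:
    """Return True if every character of text can be encoded in charset."""
    try:
        text.encode(charset)
    except (UnicodeEncodeError, LookupError):
        # LookupError: Python does not know the charset name at all.
        return False
    return True


def reply_charset(original_charset: str, attribution: str, body: str) -> str:
    """Pick the charset for a reply (hypothetical helper, not IMP's API).

    Keep the original message's charset only if both the inserted
    attribution line (built from the decoded From: header) and the
    reply body fit in it; otherwise fall back to UTF-8.
    """
    if fits(attribution, original_charset) and fits(body, original_charset):
        return original_charset
    return "UTF-8"


# Hypothetical sender name with characters that ISO-8859-1 cannot represent.
attribution = "Quoting J\u0101nis B\u0113rzi\u0146\u0161 <janis@example.com>:"
body = "Thanks, see you tomorrow."
print(reply_charset("iso-8859-1", attribution, body))
```

Checking only `body` here would keep ISO-8859-1 and turn the name in the attribution line into question marks, which matches the symptom reported in this ticket.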
reply message based on the original message and b) doing away with the
sending_charset preference. In other words, we should send everything
in UTF-8. Are there really mail readers out there in 2011 that still
don't support UTF-8?
I am more inclined to keep a) simply because it gives us hard
information that the sender can at least read the e-mail message (on
at least one of his MUAs) in that charset. And generally people
aren't doing something like responding to a Norwegian message in
Mandarin Chinese, so this charset hint is useful in almost all cases.
But the code would be so much easier to maintain if we just convert
everything to UTF-8 and consistently stick with that internally.
a) would work better if we had some way of telling that the conversion
from UTF-8 -> other charset was unsuccessful (e.g. there is no
codepoint mapping for a certain character). But I don't think our
conversion methods give us this kind of feedback so that is not helpful.
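For illustration only, this is the kind of feedback that paragraph asks for, sketched in Python and not in the Horde conversion code (the helper name `convert_with_feedback` is hypothetical). A lossy conversion silently substitutes question marks, while a per-character strict check reports which characters have no mapping in the target charset.

```python
def convert_with_feedback(text: str, charset: str):
    """Convert text to charset and report what could not be mapped.

    Returns (encoded_bytes, unmappable), where unmappable lists the
    characters that have no code point in the target charset. The
    encoded bytes use '?' in place of those characters.
    """
    unmappable = []
    for ch in text:
        try:
            ch.encode(charset)
        except UnicodeEncodeError:
            unmappable.append(ch)
    # errors="replace" is the lossy behaviour: unmappable characters
    # become '?' instead of raising an error.
    encoded = text.encode(charset, errors="replace")
    return encoded, unmappable


encoded, unmappable = convert_with_feedback("Labr\u012bt", "iso-8859-1")
if unmappable:
    print("conversion is lossy, cannot map:", unmappable)
```

A caller with access to such a list could keep the original sender's charset when it is empty and switch to UTF-8 otherwise.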
Having the user pick the charset is simply out of the question.
Nobody, outside of maybe programmers and computer scientists, has any
clue what charsets mean anyway, so giving them an option to change is
completely pointless.
Thoughts?
New Attachment: example.eml
as the possibility to set the letter encoding by hand disappeared in
IMP5, due to the non-uniformity of mail clients used I often run into
the following problem:
while replying, the reply is sent in the encoding set in the original
letter, so if I receive a letter from another Latvian guy, the encoding
of which is set to ISO8859-1 or US-ASCII (for example, by the mailer
because no extended letters are used in the text) and reply to it using
the full Latvian alphabet (ISO8859-13 or UTF8), my reply can be guessed
at best, if not totally unreadable.
Maybe it is possible to promote the encoding if the reply uses extended
charsets?
(info not related to the issue stripped from example)
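The promotion requested here could look roughly like the sketch below. It is written in Python as an illustration and is not how IMP works: the candidate ladder, its order, and the name `promote_charset` are assumptions chosen for the Latvian case described in this comment.

```python
# Assumed ladder, from most restrictive to most capable.
LADDER = ("us-ascii", "iso-8859-1", "iso-8859-13", "utf-8")


def promote_charset(original: str, text: str, ladder=LADDER) -> str:
    """Return the first charset, starting at the original one, that can
    represent the whole reply text.

    If the original charset is not on the ladder, UTF-8 is used directly.
    """
    names = [name.lower() for name in ladder]
    key = original.lower()
    start = names.index(key) if key in names else len(names) - 1
    for name in names[start:]:
        try:
            text.encode(name)
        except UnicodeEncodeError:
            continue  # this charset is too small, try the next one
        return name
    return "utf-8"


print(promote_charset("us-ascii", "Sveiks, J\u0101ni!"))
```

With this ladder a plain ASCII reply stays in the original charset, a reply with Latvian letters is promoted to ISO-8859-13, and a reply that also contains Russian text ends up in UTF-8.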
New Attachment: bug10148.eml
Priority ⇒ 1. Low
Type ⇒ Bug
Summary ⇒ Incorrect message charset in replies with reply_headers
Queue ⇒ IMP
Assigned to Michael Slusarz
Milestone ⇒
Patch ⇒ No
State ⇒ Assigned
IMP_Compose#replyMessageText() is inserting the decoded From: header
into the message text. This
header might contain non-ASCII characters. When determining the
message's charset further down, only the original message's charset is
considered though. This is a problem if the original message matches
the email charset of the current language (so that we don't use UTF-8
for the reply message), but the From: header can not be converted to
that charset.