Summary | compose html2text charset |
Queue | IMP |
Queue Version | Git master |
Type | Bug |
State | Resolved |
Priority | 1. Low |
Owners | slusarz (at) horde (dot) org |
Requester | rsalmon (at) mbpgroup (dot) com |
Created | 08/19/2010 (5435 days ago) |
Due | |
Updated | 08/25/2010 (5429 days ago) |
Assigned | 08/19/2010 (5435 days ago) |
Resolved | 08/20/2010 (5434 days ago) |
Github Issue Link | |
Github Pull Request | |
Milestone | |
Patch | Yes |
doesn't exist in ISO-8859-1. See:
http://en.wikipedia.org/wiki/Windows-1252
As Jan mentioned, it looks like something (Outlook?) is attempting to
pass off windows-1252 as ISO-8859-1. So there is nothing technically
wrong with what we are doing (i.e. there is no bug).
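To make the Windows-1252 versus ISO-8859-1 mismatch concrete, here is a minimal sketch of how the euro sign used in this ticket's test messages encodes under each charset. It is written in Python rather than PHP and uses only Python's standard codecs; the helper name `euro_bytes` is hypothetical and is not part of Horde.

```python
# Illustrative sketch only: uses Python's built-in codecs, not Horde_String.
EURO = "\u20ac"  # the euro sign

def euro_bytes(charset):
    """Return the euro sign's bytes in `charset`, or None if it cannot be encoded."""
    try:
        return EURO.encode(charset)
    except UnicodeEncodeError:
        # The charset has no code point for the euro sign.
        return None

for charset in ("utf-8", "iso-8859-15", "windows-1252", "iso-8859-1"):
    print(charset, euro_bytes(charset))
```

The UTF-8 and ISO-8859-15 results correspond to the quoted-printable sources `=E2=82=AC` and `=A4` reported in this ticket, while ISO-8859-1 has no encoding for the character at all.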
That being said... it may be useful to somehow catch iso-8859-1 text
that looks like windows-1252 and convert as such. Moving to Ticket
#9201.
compose new text message (setting charset to UTF-8)
- set subject and body to "azerty €"
-> send and open
- subject ok
- body ok : source is "azerty =E2=82=AC", but the euro sign is
displayed just fine in FF.
-> reply (compose_html enabled)
- subject ok
- body displayed ok
-> click html2text
- subject ok
- body *nok* (euro sign: ?)
-> reply (compose_html disabled)
- subject ok
- body displayed ok
compose new text message (setting charset to ISO-8859-15)
- set subject and body to "azerty €"
-> send and open
- subject ok
- body ok
-> reply (compose_html enabled). Charset has automatically switched
to UTF-8.
- subject ok
- body displayed ok
-> click html2text
- subject ok
- body *nok* (euro sign: ?)
-> reply (compose_html disabled). Charset has automatically
switched to UTF-8.
- subject ok
- body displayed ok
compose new HTML message (setting charset to UTF-8)
- set subject and body to "azerty €"
-> send and open
- subject ok
- body text part *nok* : azerty ?
- body html part *nok* : displayed azerty ?, but source is "azerty
=E2=82=AC"
compose new HTML message (setting charset to ISO-8859-15)
- set subject and body to "azerty €"
-> send and open
- subject ok
- body text part *nok* : euro sign converted to 'EUR'
- body html part *nok* : source is "azerty =A4" but "EUR" is
displayed in FF.
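The quoted-printable sources reported in the steps above only turn back into a euro sign when they are decoded with the charset the message declares. The following is a minimal sketch in Python (not IMP's actual decoding path); `decode_body` is a hypothetical helper built on the standard `quopri` module.

```python
import quopri

def decode_body(qp_bytes, charset):
    """Undo quoted-printable encoding, then decode the raw bytes as `charset`."""
    return quopri.decodestring(qp_bytes).decode(charset)

# Sources taken from the test steps in this ticket.
utf8_body = decode_body(b"azerty =E2=82=AC", "utf-8")
latin9_body = decode_body(b"azerty =A4", "iso-8859-15")

# The same =A4 byte read as ISO-8859-1 is the generic currency sign, not the euro.
latin1_body = decode_body(b"azerty =A4", "iso-8859-1")

print(utf8_body, latin9_body, latin1_body)
```

Under these assumptions, a wrong charset label does not raise an error; it silently yields a different character.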
New Attachment: email[1].eml
emails. So if this happens for the single quote, it will happen for
other characters too, I guess.
Another example of a character that doesn't survive conversion
between charsets: € (euro sign).
See the newly attached message.
- open the email: € is transformed to "EUR"!
- reply to the email (pref compose_html enabled): the euro sign is now a
question mark.
- click on html to text: all the text is gone.
State ⇒ Resolved
text from MS-Word or anything similar.
character may display correctly but there can be no guarantee. As
suspected, there is nothing left to do in this ticket.
really a charset, let alone 8859-1.
html to text,
I get : "préparer à vendre d?août ;"
I expect : "préparer à vendre d’août ;"
The single quote becomes a '?' !
ISO-8859-1 -> UTF-8 -> ISO-8859-1 loses that character. I don't know if
that's a PHP bug or an issue with Horde_String, but there is nothing
theoretically wrong with that conversion code.
Do note - that weird quote character (it is NOT the standard single
quote character from US-ASCII) doesn't display in ANY of the messages
I receive. It always appears as bytecode [0092] on my FF screen, for
example.
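A minimal sketch of why that byte shows up as [0092], and of the kind of fallback suggested in this ticket for text that is labelled ISO-8859-1 but looks like Windows-1252. It is written in Python with standard codecs only; `decode_labelled_latin1` is a hypothetical helper, not Horde code, and the heuristic itself is an assumption about how such a check could work.

```python
# Bytes 0x80-0x9F are C1 control codes in ISO-8859-1 but mostly printable
# characters (curly quotes, euro sign, ...) in Windows-1252.
C1_RANGE = range(0x80, 0xA0)

def decode_labelled_latin1(raw):
    """Decode bytes labelled ISO-8859-1, preferring Windows-1252 when C1 bytes appear."""
    if any(b in C1_RANGE for b in raw):
        try:
            return raw.decode("cp1252")
        except UnicodeDecodeError:
            # A few bytes (e.g. 0x81) are undefined in Python's cp1252 codec.
            pass
    return raw.decode("latin-1")

raw = b"d\x92ao\xfbt"
strict = raw.decode("latin-1")          # contains the control character U+0092
lenient = decode_labelled_latin1(raw)   # contains U+2019, the curly apostrophe
print(repr(strict), repr(lenient))
```

Text without any byte in the 0x80-0x9F range decodes identically either way, so the fallback only changes messages that would otherwise contain control characters.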
html to text,
I get : "préparer à vendre d?août ;"
I expect : "préparer à vendre d’août ;"
The single quote becomes a '?' !
/*-secure-{"response":{"text":"ronan@maison.com a
\u00e9crit\u00a0:\n\n> Bonjour,\n>\n> \u00a0\n>\n> pr\u00e9parer
\u00e0 vendre d?ao\u00fbt\u00a0;\n"}}*/
- here is the POST
text <p> ronan@maison.com a écrit :</p> <blockquote
style="background-color: rgb(240, 240, 240); border-left: 1px solid
blue; padding-left: 1em;" type="cite"> <div class="Section1"> <p
class="MsoNormal"> <font face="Arial" size="3"><span
style="font-size: 12pt; font-family:
Arial;">Bonjour,</span></font></p> <p class="MsoNormal"> <font
face="Arial" size="3"><span style="font-size: 12pt; font-family:
Arial;"> </span></font></p> <p class="MsoNormal"> <font
face="Arial" size="3"><span style="font-size: 12pt; font-family:
Arial;">préparer à vendre d’août ;</span></font></p> </div>
</blockquote>
to text,
I get : "préparer à vendre d?août ;"
I expect : "préparer à vendre d’août ;"
The single quote becomes a '?' !
Bug #9187: Fix charset issues when doing Html2text compose conversion.
http://git.horde.org/diff.php/imp/lib/Ui/Compose.php?rt=horde-git&r1=b371414ef2533f1b57355c545afc8b4901c76bfb&r2=cd03906a381a67d4c1c67972e047a875d77eac9d
Bug #9187: Add callback parameter to the Html2text filter.
http://git.horde.org/diff.php/framework/Text_Filter/lib/Horde/Text/Filter/Html2text.php?rt=horde-git&r1=e1e160c088651055d80024107b4acd679a07265c&r2=bcb3f3b1070dc6ddb9125d619b860e2bb071ee64
http://git.horde.org/diff.php/framework/Text_Filter/package.xml?rt=horde-git&r1=890fb2cab418a76e1722a789b9a924c4e3c1cd5b&r2=bcb3f3b1070dc6ddb9125d619b860e2bb071ee64
Assigned to Michael Slusarz
State ⇒ Feedback
accents get garbled. It appears to come from DOMDocument not being
able to detect the charset properly.
The following fix does the job for me (inspired from "User
Contributed Notes"
http://www.php.net/manual/en/domdocument.loadhtml.php)
available (it is not required for Horde).
Try my patch. It has been working with the XSS filter for a bit now
and seems to do the right thing.
Bug #9187: Use same DOM loading technique that we use for XSS filter
http://git.horde.org/diff.php/framework/Text_Filter/lib/Horde/Text/Filter/Html2text.php?rt=horde-git&r1=69d3938fec6ae969f8a5eb8d2402c0b9a653e731&r2=0a4cf5ca50dc225931682195cd0fd36cd6ec9e62
Bug #9187: Fix variable typo.
http://git.horde.org/diff.php/imp/lib/Ui/Compose.php?rt=horde-git&r1=02d755c4f13fcc243bfebbff5a9e5f18187d496f&r2=b371414ef2533f1b57355c545afc8b4901c76bfb
Priority ⇒ 1. Low
State ⇒ Unconfirmed
New Attachment: email.eml
Patch ⇒ Yes
Milestone ⇒
Queue ⇒ IMP
Summary ⇒ compose html2text charset
Type ⇒ Bug
$mime_drivers['html']['inline'] => true,
php-5.3.2
Mozilla/5.0 (X11; U; Linux i686; fr; rv:1.9.2.8) Gecko/20100722 Firefox/3.6.8
When clicking on "Switch to plain text composition", I get no text.
Fix :
imp/lib/Ui/Compose.php:384
-- return $msg . "\n" . $sig;
++ return $data . "\n" . $sig;
I have a second issue: when switching from HTML to text composition,
accents get garbled. It appears to come from DOMDocument not being able
to detect the charset properly.
The following fix does the job for me (inspired from "User Contributed
Notes" http://www.php.net/manual/en/domdocument.loadhtml.php)
--- Html2text.php 2010-07-27 10:20:23.000000000 +0200
+++ /var/www/html/horde/libs/Horde/Text/Filter/Html2text.php 2010-08-19 12:38:34.000000000 +0200
@@ -102,16 +102,22 @@
     public function postProcess($text)
     {
         if (extension_loaded('dom')) {
-            $text = Horde_String::convertCharset($text, $this->_params['charset'], 'UTF-8');
+            if ($this->_params['charset'] != 'UTF-8') {
+                $text = Horde_String::convertCharset($text, $this->_params['charset'], 'UTF-8');
+            }
             $old_error = libxml_use_internal_errors(true);
             $doc = new DOMDocument();
-            $doc->loadHTML('<?xml encoding="UTF-8">' . $text);
+            $doc->loadHTML('<?xml encoding="UTF-8">' . mb_convert_encoding($text, 'HTML-ENTITIES', "UTF-8"));
             if ($old_error) {
                 libxml_use_internal_errors(false);
             }
-            $text = Horde_String::convertCharset($this->_node($doc, $doc), 'UTF-8', $this->_params['charset']);
+            if ($this->_params['charset'] != 'UTF-8') {
+                $text = Horde_String::convertCharset($this->_node($doc, $doc), 'UTF-8', $this->_params['charset']);
+            } else {
+                $text = $this->_node($doc, $doc);
+            }
         }
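The core idea of the patch above is to turn every non-ASCII character into an HTML numeric entity before handing the markup to the parser, so that the parser's charset detection no longer matters. A rough Python analogue of that step is sketched below; it illustrates the technique only and is not the PHP `mb_convert_encoding()` call itself. The helper name `to_numeric_entities` is hypothetical.

```python
import html

def to_numeric_entities(text):
    """Replace each non-ASCII character with a decimal numeric entity (&#NNN;)."""
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")

original = "pr\u00e9parer \u00e0 vendre d\u2019ao\u00fbt"
escaped = to_numeric_entities(original)   # pure ASCII, safe under any charset guess
restored = html.unescape(escaped)         # what an HTML parser would give back

print(escaped)
print(restored == original)
```

Because the escaped string contains only ASCII bytes, it reads the same whether the parser assumes UTF-8, ISO-8859-1, or Windows-1252.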