6.0.0-alpha12
6/7/25

[#9567] charset pb replying to message
Summary charset pb replying to message
Queue IMP
Queue Version Git master
Type Bug
State Resolved
Priority 1. Low
Owners slusarz (at) horde (dot) org
Requester rsalmon (at) mbpgroup (dot) com
Created 02/09/2011 (5232 days ago)
Due
Updated 03/18/2011 (5195 days ago)
Assigned 03/04/2011 (5209 days ago)
Resolved 03/18/2011 (5195 days ago)
Github Issue Link
Github Pull Request
Milestone
Patch No

History
03/18/2011 03:50:12 PM Michael Slusarz State ⇒ Resolved
 
03/18/2011 09:30:17 AM rsalmon (at) mbpgroup (dot) com Comment #44
>> I ran the new test case, and it fails. See attached log file.
> I add a problem pulling up to date code from git, and I didn't 
> notice until now.
I meant, I *had* a problem! Everything looks good now.
03/18/2011 09:27:34 AM rsalmon (at) mbpgroup (dot) com Comment #43
> I ran the new test case, and it fails. See attached log file.
I add a problem pulling up to date code from git, and I didn't notice 
until now.

The test passes OK.
03/18/2011 08:21:43 AM rsalmon (at) mbpgroup (dot) com Comment #42
New Attachment: XssTest.log
> Rewrote HTML document loading to better match what the URL below 
> describes.  See if this helps.
Replying seems to be working just fine. I tried with a few messages 
and they all looked OK.

I ran the new test case, and it fails. See attached log file.



03/17/2011 09:29:08 PM Git Commit Comment #41
Changes have been made in Git for this ticket:

Bug #9567: XML encoding tag may not appear at beginning of output

  1 files changed, 6 insertions(+), 3 deletions(-)
http://git.horde.org/horde-git/-/commit/ba922deff269a9a6a35427610ddb6eb3adf9282f
03/17/2011 06:35:07 PM Michael Slusarz Comment #40
Rewrote HTML document loading to better match what the URL below 
describes.  See if this helps.
03/17/2011 06:34:32 PM Git Commit Comment #39
Changes have been made in Git for this ticket:

Bug #9567: Improve loading of HTML documents

  6 files changed, 123 insertions(+), 46 deletions(-)
http://git.horde.org/horde-git/-/commit/cf8cb46f13765b88342477735f7bbb473727fffd
03/17/2011 05:26:56 PM Michael Slusarz Comment #38
> Typo: it is
> content="text/html; charset=iso-8859-1" instead of
> content="text/html; charset="iso8859-1"
This shouldn't (and doesn't) make a difference.
03/17/2011 05:24:19 PM Michael Slusarz Comment #37
> See attached file, I think that this is how the test case in 
> Text_Filter should work.
Nope.  Works perfectly for me.
03/17/2011 05:19:19 PM rsalmon (at) mbpgroup (dot) com Comment #36
> Changes have been made in Git for this ticket:
>
> Add another test for Bug #9567
Typo: it is
content="text/html; charset=iso-8859-1" instead of
content="text/html; charset="iso8859-1"



03/17/2011 05:08:41 PM Git Commit Comment #35
Changes have been made in Git for this ticket:

Add another test for Bug #9567

  1 files changed, 0 insertions(+), 0 deletions(-)
http://git.horde.org/horde-git/-/commit/423b29e9ae9e6b4227f7d8870aa8a2b6d823f120
03/17/2011 08:29:11 AM rsalmon (at) mbpgroup (dot) com Comment #34
New Attachment: testBug9567.patch
I still can't reproduce the behaviour I was having using IMP, but I 
found a way to reproduce using the test case in Text_Filter you 
originally created.

See attached file, I think that this is how the test case in 
Text_Filter should work.

The test passes OK with the patch from comment #31.

What do you think?



03/16/2011 05:28:45 PM rsalmon (at) mbpgroup (dot) com Comment #33
This is driving me mad.

For the last 20 minutes I haven't been able to reproduce this issue, 
and I don't know why; I don't know what I did or changed :-(

Anyway, since I can't reproduce this issue any more, I guess there is 
no need to keep this ticket open.

But I think the patch from comment #31 still applies.

Thanks for your patience.

03/16/2011 04:53:29 PM Jan Schneider Comment #32
> As I think that the small patch above is right, I wouldn't mind if some 
> of the other devs could try replying to the message 'email_charset.eml' 
> to see if I'm really alone on this one.
I didn't follow the complete ticket history, but I don't see anything 
wrong when replying to this message in traditional mode, using the 
HTML editor.
03/16/2011 04:41:57 PM rsalmon (at) mbpgroup (dot) com Comment #31

If we assume that the information at 
http://devzone.zend.com/article/8855 is correct, specifically section 5: 
"DOMDocument::saveXML($node) method is always performed in UTF-8"

Then, no matter what $doc->encoding is set to, the following code will 
*always* return a UTF-8 encoded string:
  if ($body && $body->hasChildNodes()) {
         foreach ($body->childNodes as $child) {
                 $text .= $dom->dom->saveXML($child);
         }
  }

So, I think that Horde_Text_Filter_Xss::postProcess should be patched 
like this:

- return Horde_String::convertCharset($text, $dom->encoding, $this->_params['charset']);
+ return Horde_String::convertCharset($text, 'UTF-8', $this->_params['charset']);
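A minimal standalone sketch of what the proposed change does, assuming (as the cited article states) that every string produced by `saveXML($child)` is UTF-8. It targets a current PHP with the iconv extension; `iconv()` stands in for `Horde_String::convertCharset()`, and `convert_serialized_body()` is a hypothetical name, not a Horde API.

```php
<?php
// Hypothetical helper illustrating the proposed fix: the serialized body is
// assumed to be UTF-8 (what saveXML($node) produces), so it is converted
// from 'UTF-8', not from $dom->encoding, into the caller's charset.
function convert_serialized_body(string $utf8Text, string $targetCharset): string
{
    if (strcasecmp($targetCharset, 'UTF-8') === 0) {
        return $utf8Text; // already in the requested charset
    }
    // iconv() stands in for Horde_String::convertCharset() in this sketch;
    // @ hides the notice iconv() raises before returning false.
    $converted = @iconv('UTF-8', $targetCharset, $utf8Text);
    if ($converted === false) {
        throw new RuntimeException("Cannot convert text to $targetCharset");
    }
    return $converted;
}

// Simulated saveXML() output: "préparer" as UTF-8 (é is the byte pair C3 A9).
$serialized = "<p>pr\xC3\xA9parer</p>";

// For an ISO-8859-1 caller, é becomes the single byte E9.
echo bin2hex(convert_serialized_body($serialized, 'ISO-8859-1')), "\n";
```

Converting from a fixed 'UTF-8' rather than from `$dom->encoding` keeps the output in the caller's charset regardless of which charset the meta tag declared, provided the assumption about `saveXML()` holds.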


Now, as for why $dom->encoding is different on my machine than on yours, I 
don't have the answer (and I tried a lot of things). But according to 
http://devzone.zend.com/article/8855, Section 4, 
DOMDocument::loadHTML() should detect the 'charset' meta tag, and on my 
system it does (I guess). This would explain why 
$dom->encoding=iso-8859-1 (or whatever charset the meta tag is set to; 
see other comments).

As I think that the small patch above is right, I wouldn't mind if some 
of the other devs could try replying to the message 'email_charset.eml' 
to see if I'm really alone on this one.

Thanks.


03/16/2011 03:50:12 PM rsalmon (at) mbpgroup (dot) com Comment #30
>> I don't understand why this doesn't work here but works for you. What
>> versions of libxml2 and PHP are you using? I'm asking in case this is
>> related to the version I'm using: as I'm on the latest (or close
>> enough), I'll try to downgrade to whatever version you are using.
> PHP 5.3.4 (cli) (built: Feb 12 2011 00:26:56)
> libxml2 Version => 2.7.8
I recompiled both libxml2 and PHP, and I'm still having the same issue :-(



03/16/2011 07:50:26 AM rsalmon (at) mbpgroup (dot) com Comment #29
>> As to why $dom->encoding is 'ISO-8859-1', the answer is
>> in the message:
>> <!--a75c305b1c0a6022--><meta http-equiv=3DContent-Type 
>> content=3D"text/html; charset=3Diso-8859-1"=
> Maybe one problem is that this message is NOT iso-8859-1; it is windows-1252.
If I change <!--a75c305b1c0a6022--><meta http-equiv=3DContent-Type 
content=3D"text/html; charset=3Diso-8859-1"= to 
<!--a75c305b1c0a6022--><meta http-equiv=3DContent-Type 
content=3D"text/html; charset=3Dwindows-1252"=
I get $dom->encoding = windows-1252, which in my case is expected, 
and the same result as before: rubbish.


03/15/2011 11:01:39 PM Michael Slusarz Comment #28
> I don't understand why this doesn't work here but works for you.
> What versions of libxml2 and PHP are you using? I'm asking in case this
> is related to the version I'm using: as I'm on the latest (or close
> enough), I'll try to downgrade to whatever version you are using.
PHP 5.3.4 (cli) (built: Feb 12 2011 00:26:56)
libxml2 Version => 2.7.8
03/15/2011 10:41:22 PM Michael Slusarz Comment #27
> As to why $dom->encoding is 'ISO-8859-1', the answer is 
> in the message:
> <!--a75c305b1c0a6022--><meta http-equiv=3DContent-Type 
> content=3D"text/html; charset=3Diso-8859-1"=
Maybe one problem is that this message is NOT iso-8859-1; it is windows-1252.
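To illustrate the difference: Windows-1252 maps byte 0x92 to the right single quotation mark (’) used in "d’août", whereas ISO-8859-1 maps the same byte to an invisible C1 control character (U+0092). A small sketch for a current PHP with the iconv extension; the `decode_as()` helper is hypothetical.

```php
<?php
// Hypothetical helper: decode a byte string from $charset into UTF-8.
function decode_as(string $bytes, string $charset): string
{
    $utf8 = iconv($charset, 'UTF-8', $bytes);
    if ($utf8 === false) {
        throw new RuntimeException("Invalid $charset input");
    }
    return $utf8;
}

// "d’août" as it would be stored in Windows-1252: ’ = 0x92, û = 0xFB.
$raw = "d\x92ao\xFBt";

$asWindows1252 = decode_as($raw, 'WINDOWS-1252'); // 0x92 -> U+2019 (’)
$asLatin1      = decode_as($raw, 'ISO-8859-1');   // 0x92 -> U+0092 (control char)

// Print the UTF-8 bytes so the two results can be compared.
printf("%s\n%s\n", bin2hex($asWindows1252), bin2hex($asLatin1));
```

Both decodings agree on "û", so text that mostly uses Latin-1 letters can look correct under either label until a Windows-1252-only character such as ’ appears.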
03/15/2011 10:12:57 PM rsalmon (at) mbpgroup (dot) com New Attachment: email_charset.eml
 
03/15/2011 10:12:21 PM rsalmon (at) mbpgroup (dot) com Comment #26
> This isn't correct.  Xss filter needs to return text in whatever 
> charset it was provided in, which is why the convertCharset() call 
> is necessary.  The question is why $dom->encoding is 'ISO-8859-1' 
> for you and 'UTF-8' for *everybody* else.
Hmm, wait, weird: the message attached to this ticket looks wrong. 
Re-attaching the message.

As to why $dom->encoding is 'ISO-8859-1', the answer is 
in the message:
<!--a75c305b1c0a6022--><meta http-equiv=3DContent-Type 
content=3D"text/html; charset=3Diso-8859-1"=

If I change 'charset=3Diso-8859-1"=' to 'charset=3Diso-8859-15"=', 
then $dom->encoding = ISO-8859-15

If I remove the meta tag from the message, everything works fine.

I've checked other messages I'm having issues with, and they all have 
the same charset meta tag.

So this behaviour is expected according to 
http://devzone.zend.com/article/8855, Section 4.
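The `=3D` sequences are the quoted-printable encoding of `=` in the raw message source. A rough sketch of how the declared charset can be read back out of such a line; the regular expression is a simplification for illustration, not the detection logic libxml2 uses, and `meta_charset()` is a hypothetical helper.

```php
<?php
// Hypothetical helper: decode a quoted-printable fragment and extract the
// charset declared in an HTML Content-Type meta tag (rough heuristic).
function meta_charset(string $qpLine): ?string
{
    $html = quoted_printable_decode($qpLine); // "=3D" becomes "="
    if (preg_match('/charset\s*=\s*["\']?([A-Za-z0-9._:-]+)/i', $html, $m)) {
        return strtolower($m[1]);
    }
    return null; // no charset declared
}

$line = '<meta http-equiv=3DContent-Type content=3D"text/html; charset=3Diso-8859-1">';
var_dump(meta_charset($line)); // string(10) "iso-8859-1"
```

Changing `charset=3Diso-8859-1` to `charset=3Diso-8859-15` in the input changes the extracted value accordingly, mirroring the behaviour described above for `$dom->encoding`.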


I don't understand why this doesn't work here but works for you. What 
versions of libxml2 and PHP are you using? I'm asking in case this is 
related to the version I'm using: as I'm on the latest (or close enough), 
I'll try to downgrade to whatever version you are using.



03/15/2011 10:06:58 PM rsalmon (at) mbpgroup (dot) com Comment #25
I'm lost as well, and desperately trying to find a way out of this. 
Forget about this *patch*.

03/15/2011 05:12:46 PM Michael Slusarz Comment #24
This may be useful, specifically Section 4 about loading/encoding.   
This is the starting point in trying to figure this out.
03/15/2011 05:04:01 PM Michael Slusarz Comment #23
This isn't correct.  Xss filter needs to return text in whatever 
charset it was provided in, which is why the convertCharset() call is 
necessary.  The question is why $dom->encoding is 'ISO-8859-1' for you 
and 'UTF-8' for *everybody* else.
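The contract described here (the filter hands text back in whatever charset it was given) can be sketched as a round trip through UTF-8. In this sketch, written for PHP 7.4+, the `$transform` callback stands in for the DOM-based filtering and is assumed to return UTF-8; `filter_preserving_charset()` is a hypothetical name and `iconv()` replaces `Horde_String::convertCharset()`.

```php
<?php
// Sketch: run a UTF-8-only transformation on text supplied in any charset,
// then hand the result back in the caller's original charset.
function filter_preserving_charset(string $text, string $charset, callable $transform): string
{
    $utf8 = iconv($charset, 'UTF-8', $text);
    if ($utf8 === false) {
        throw new RuntimeException("Input is not valid $charset");
    }
    $filtered = $transform($utf8); // assumed to return UTF-8
    // @ hides the notice iconv() raises before returning false.
    $out = @iconv('UTF-8', $charset, $filtered);
    if ($out === false) {
        throw new RuntimeException("Result cannot be represented in $charset");
    }
    return $out;
}

// Example: a trivial "filter" over ISO-8859-1 input ("préparer", é = E9).
$result = filter_preserving_charset(
    "pr\xE9parer",
    'ISO-8859-1',
    fn (string $s): string => "<b>$s</b>"
);
var_dump($result === "<b>pr\xE9parer</b>"); // bool(true)
```

The key point is that the charset used for the final conversion must describe what the transformation actually returns, not what the input document declared.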
03/15/2011 04:56:30 PM Michael Slusarz Comment #22
>> --- Xss.php.org        2011-03-15 10:41:22.000000000 +0100
>> +++ Xss.php        2011-03-15 10:41:24.000000000 +0100
>> -        return Horde_String::convertCharset($text, $dom->encoding, $this->_params['charset']);
>> +        return $text;
> With this patch, testBug9567 fails.
> I've attached the output.
OK - I am *totally confused*.  The test runs successfully.  Why should 
we be patching?  Of course the test is going to fail - you are 
altering the output for a successful test.
03/15/2011 10:15:13 AM rsalmon (at) mbpgroup (dot) com Comment #21
New Attachment: phpunit2.log
> --- Xss.php.org        2011-03-15 10:41:22.000000000 +0100
> +++ Xss.php        2011-03-15 10:41:24.000000000 +0100
> -        return Horde_String::convertCharset($text, $dom->encoding, $this->_params['charset']);
> +        return $text;
With this patch, testBug9567 fails.
I've attached the output.
03/15/2011 09:54:46 AM rsalmon (at) mbpgroup (dot) com Comment #20
> A user had submitted this patch a while back.  Maybe this fixes 
> things for you?
Nope, it doesn't fix anything. I use the same OS (CentOS 5.4), but 
probably not the same libxml/PHP version.

Googling a bit, I ran into this article, 
http://devzone.zend.com/article/8855, section 5, "Save/dumping operations and 
encoding":

"Node or XML subtree dumping using the DOMDocument::saveXML($node) 
method is always performed in UTF-8."

This is the issue I'm having, $dom->encoding = iso-8859-1 and 
$dom->dom->saveXML($child) returns utf-8.

The following patch works for me for all messages (read, reply, 
forward...), at least for whatever I've tested so far:
--- Xss.php.org        2011-03-15 10:41:22.000000000 +0100
+++ Xss.php        2011-03-15 10:41:24.000000000 +0100
@@ -130,7 +130,7 @@
              }
          }

-        return Horde_String::convertCharset($text, $dom->encoding, $this->_params['charset']);
+        return $text;
      }

      /**




03/15/2011 03:06:51 AM Michael Slusarz Comment #19
03/15/2011 03:06:08 AM Michael Slusarz Comment #18
New Attachment: xml_charset.diff
>> I have added a test case in Text_Filter that passes for me.  See if
>> it passes for you.
> After installing PHPUnit (thanks to Remi's repo), I get 2 failures 
> running the test, but they're not related to this bug, I guess.
Yes - those are not related to this bug.

A user had submitted this patch a while back.  Maybe this fixes things for you?
03/12/2011 04:24:26 PM rsalmon (at) mbpgroup (dot) com Comment #17
New Attachment: phpunit.log
> I have added a test case in Text_Filter that passes for me.  See if 
> it passes for you.
After installing PHPUnit (thanks to Remi's repo), I get 2 failures 
running the test, but they're not related to this bug, I guess.

See attached log file.

I can provide you with another message as an example if you want, but I 
can't attach it to this ticket as it is a non-public message. I tried 
to remove private information, but no matter which editor I used, 
I always ended up altering the charset of the message when saving.


03/10/2011 08:26:10 PM Michael Slusarz Comment #16
I have added a test case in Text_Filter that passes for me.  See if it 
passes for you.

The easiest way to run it is to go to 
horde/framework/Text_Filter/test/Horde/Text/Filter and run 'php 
AllTests.php'.
03/10/2011 08:24:25 PM Git Commit Comment #15
Changes have been made in Git for this ticket:

Add test for Bug #9567

  1 files changed, 0 insertions(+), 0 deletions(-)
http://git.horde.org/horde-git/-/commit/9bc38e1452ccb11d2c709175b65d705243df1ad0
03/10/2011 07:50:23 PM Michael Slusarz Deleted Original Message
 
03/09/2011 08:46:06 AM rsalmon (at) mbpgroup (dot) com Comment #14
New Attachment: xss.patch
With the attached patch (and using the message originally attached to 
this ticket), here is the output of charset detection:

2011-03-09T09:33:34+01:00 INFO: HORDE [imp] 777777777777777777ASCII [pid 15410 on line 130 of "/var/www/html/hordetest/libs/Horde/Text/Filter/Xss.php"]
2011-03-09T09:33:34+01:00 INFO: HORDE [imp] 777777777777777777UTF-8 [pid 15410 on line 130 of "/var/www/html/hordetest/libs/Horde/Text/Filter/Xss.php"]
2011-03-09T09:33:34+01:00 INFO: HORDE [imp] 777777777777777777ASCII [pid 15410 on line 130 of "/var/www/html/hordetest/libs/Horde/Text/Filter/Xss.php"]
2011-03-09T09:33:34+01:00 INFO: HORDE [imp] 777777777777777777UTF-8 [pid 15410 on line 133 of "/var/www/html/hordetest/libs/Horde/Text/Filter/Xss.php"]
2011-03-09T09:33:34+01:00 INFO: HORDE [imp] 777777777777777777iso-8859-1 [pid 15410 on line 134 of "/var/www/html/hordetest/libs/Horde/Text/Filter/Xss.php"]


So this confirms that dom->saveXML returns UTF-8 characters, but 
$doc->encoding is iso-8859-1.

I'm having this issue not only with the attached message, but with pretty 
much all the messages in my inbox (as a matter of fact, all messages 
containing accents).

Just in case this was related to libxml, I've updated the library to libxml2-2.7.8.


03/07/2011 09:37:30 AM rsalmon (at) mbpgroup (dot) com Comment #13
New Attachment: Xss[1].tgz
I've attached the wrong file in the last comment.
03/07/2011 09:22:05 AM rsalmon (at) mbpgroup (dot) com Comment #12
New Attachment: Xss.tgz
> So what you are saying is that BEFORE line 83, $charset is utf-8 and 
> $doc->encoding is iso-8859-1?  If that is the case, I don't see why 
> this isn't working... we are converting $text to ISO-8859-1 (from 
> UTF-8) and then sending to loadHTML.  So things should be fine.
I was misled by my editors' charset handling. Depending on which one I was 
using (vi, nedit), I wasn't getting (seeing) the same output, and I only 
just realised that.

So this led me to framework/Text_Filter/lib/Horde/Text/Filter/Xss.php.

I've attached the logging patch and the Horde log file. The log file is a 
trace of replying to the message.

It looks like dom->saveXML returns UTF-8 characters.

If I change the last 'return' of function postProcess($text) like this:
-        return Horde_String::convertCharset($text, $dom->encoding, $this->_params['charset']);
+        return $text;

Then accents look OK!
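Why the final conversion garbles the accents can be reproduced without any DOM code, assuming the serialized text is UTF-8 while `$dom->encoding` reports ISO-8859-1 and the target charset is UTF-8. A sketch for a current PHP, with `iconv()` standing in for `Horde_String::convertCharset()`:

```php
<?php
// "préparer" as UTF-8, i.e. what saveXML($child) is reported to return.
$serialized = "pr\u{E9}parer";

// Buggy path: treat the UTF-8 bytes as ISO-8859-1 and convert to UTF-8.
// The pair C3 A9 is read as two characters, "Ã" and "©".
$doubleEncoded = iconv('ISO-8859-1', 'UTF-8', $serialized);

// Correct path: the source really is UTF-8, so nothing changes.
$correct = iconv('UTF-8', 'UTF-8', $serialized);

echo $doubleEncoded, "\n"; // prÃ©parer
echo $correct, "\n";       // préparer
```

In this sketch, dropping the conversion gives the right result only because the target charset is also UTF-8; for any other target charset the text would still need converting, from UTF-8.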

03/04/2011 05:59:13 PM Michael Slusarz Comment #11
Assigned to Michael Slusarz
State ⇒ Feedback
So what you are saying is that BEFORE line 83, $charset is utf-8 and 
$doc->encoding is iso-8859-1?  If that is the case, I don't see why 
this isn't working... we are converting $text to ISO-8859-1 (from 
UTF-8) and then sending to loadHTML.  So things should be fine.

Maybe check what the value of $doc->encoding is AFTER line 83?  Or try 
creating a new DOMDocument object, e.g.:

                 /* If libxml can't auto-detect encoding, convert to what it
                  * *thinks* the encoding should be. */
                 $doc = new DOMDocument();
                 $doc->loadHTML(Horde_String::convertCharset($text, $charset, $doc->encoding));
03/04/2011 04:37:35 PM rsalmon (at) mbpgroup (dot) com Comment #10
I got further.

The problem is coming from Domhtml.php. DOMDocument thinks that $text is 
iso-8859-1, but it is UTF-8, as it has been converted earlier on.

The message text gets garbled after the following call (line 83):
                 $doc->loadHTML(Horde_String::convertCharset($text, $charset, $doc->encoding));

$charset = utf-8
$doc->encoding = iso-8859-1
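Note that a UTF-8 to ISO-8859-1 conversion like the one in that call can only preserve characters that exist in ISO-8859-1. A sketch for a current PHP with the iconv extension (`to_charset()` is a hypothetical helper and `iconv()` stands in for `Horde_String::convertCharset()`) showing that "é" survives while the typographic apostrophe in "d’août" does not, although Windows-1252 can encode both:

```php
<?php
// Hypothetical helper: convert UTF-8 text to $charset, returning null when
// some character has no representation in that charset.
function to_charset(string $utf8, string $charset): ?string
{
    // @ hides the notice iconv() raises before returning false.
    $out = @iconv('UTF-8', $charset, $utf8);
    // Some iconv implementations substitute unmappable characters instead
    // of failing, so also require a lossless round trip.
    if ($out === false || iconv($charset, 'UTF-8', $out) !== $utf8) {
        return null;
    }
    return $out;
}

$text = "pr\u{E9}parer d\u{2019}ao\u{FB}t"; // "préparer d’août"

var_dump(to_charset("pr\u{E9}parer", 'ISO-8859-1') === "pr\xE9parer"); // bool(true)
var_dump(to_charset($text, 'ISO-8859-1'));                           // NULL: no ’ in ISO-8859-1
var_dump(to_charset($text, 'WINDOWS-1252') === "pr\xE9parer d\x92ao\xFBt"); // bool(true)
```

The round-trip comparison makes the check independent of whether the local iconv fails or silently substitutes a placeholder character.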

I don't know what to do from there. Any way I can help?

I use:
Mozilla/5.0 (X11; U; Linux i686; fr; rv:1.9.2.13) Gecko/20110103 
Fedora/3.6.13-1.fc14 Firefox/3.6.13
php-5.3.5-1.el5.remi.1


03/04/2011 04:09:30 PM rsalmon (at) mbpgroup (dot) com Comment #9
> Still working perfectly here.
The body (variable $msg) looks fine up to line 2504 in imp/lib/Compose.php:

         if ($mode == 'html') {
             $msg = $GLOBALS['injector']->getInstance('Horde_Core_Factory_TextFilter')->filter($msg, array('Cleanhtml', 'Xss'), array(array('body_only' => true), array('strip_styles' => true, 'strip_style_attributes' => false)));
         } elseif ($type == 'text/html') {
             $msg = $GLOBALS['injector']->getInstance('Horde_Core_Factory_TextFilter')->filter($msg, 'Html2text');
             $type = 'text/plain';
         }

but after line 2511, $msg looks wrong (accents are garbled).

If I change array('Cleanhtml', 'Xss') to array(), accents look OK (but 
the reply message is a bit screwed up :-)).


03/03/2011 11:00:04 PM Michael Slusarz Comment #8
New Attachment: output.png
Still working perfectly here.
02/28/2011 08:43:57 AM rsalmon (at) mbpgroup (dot) com Comment #7
New Attachment: screenshot.png
Restarting from scratch to explain the issue I'm having when replying 
to messages.

Using the message attached to this ticket, in traditional view mode:

- $_prefs['compose_html']['value'] = 0;
- $_prefs['reply_format']['value'] = 0;
=> reply Ok

- $_prefs['compose_html']['value'] = 1;
- $_prefs['reply_format']['value'] = 1;
=> reply *NOK*

See attached screenshot; accents look like rubbish.


02/11/2011 05:32:56 PM Michael Slusarz Comment #6
> A fatal error has occurred
> "Array" is not configured in the Horde Registry.
You did not update imp/config/portal.php for changes made last night.
02/11/2011 02:03:46 PM rsalmon (at) mbpgroup (dot) com Comment #5
>> But the first part is not a duplicate of Bug #9549. When I reply to
>> the attached message, accents are showing OK.
> So there is no longer an issue, correct?  I can't reproduce.
Arrgh, really bad week.  I meant "accents are not showing OK".

I can't debug today. I updated from git this morning and now I can't use 
dynamic IMP (see below). Either I have missed something or there's 
something wrong in the git repo. I'll wait until Monday...


A fatal error has occurred
"Array" is not configured in the Horde Registry.

1. require() /var/www/html/hordetest/imp/index.php:19
2. IMP_Dimp::header() /var/www/html/hordetest/imp/index-dimp.php:38
3. include() /var/www/html/hordetest/imp/lib/Dimp.php:78
4. include() /var/www/html/hordetest/imp/templates/common-header.inc:9
5. Horde_Registry->getInitialPage() /var/www/html/hordetest/imp/templates/dimp/javascript_defs.php:22


02/11/2011 09:12:13 AM Michael Slusarz Comment #4
>> Duplicate of Bug #9549.
> I agree that my first comment can be a bit confusing, and that the 
> part with "HTML composition" can be a duplicate of Bug #9549.
>
> But the first part is not a duplicate of Bug #9549. When I reply to 
> the attached message, accents are showing OK.
So there is no longer an issue, correct?  I can't reproduce.
02/10/2011 08:08:47 AM rsalmon (at) mbpgroup (dot) com Comment #3
> Duplicate of Bug #9549.
I agree that my first comment can be a bit confusing, and that the 
part with "HTML composition" can be a duplicate of Bug #9549.

But the first part is not a duplicate of Bug #9549. When I reply to 
the attached message, accents are showing OK.

02/09/2011 07:12:50 PM Michael Slusarz Comment #2
State ⇒ Duplicate
Duplicate of Bug #9549.
02/09/2011 02:47:50 PM rsalmon (at) mbpgroup (dot) com Comment #1
State ⇒ Unconfirmed
Priority ⇒ 1. Low
Type ⇒ Bug
Summary ⇒ charset pb replying to message
Queue ⇒ IMP
Milestone ⇒
Patch ⇒ No
New Attachment: email.eml
Using dynamic view, with
compose_html=1
reply_format=1

Attached is the message from tickets #9189 and #9190.

Replying to this message gives:
"préparer à vendre d’août ; "
Expected:
"préparer à vendre d'août ;"

The funny thing is (this is probably related to ticket #9549):
using dynamic view,
- select message and reply
- click on "HTML composition" (do not click or modify the body of the message)

Output:
<p><a href="mailto:ronan@maison.com">ronan@maison.com</a> a écrit 
:</p><blockquote type="cite" style="border-left:2px solid 
blue;margin-left:8px;padding-left:8px;">&gt; Bonjour,<br />
&gt;<br />
&gt;  <br />
&gt;<br />
&gt; préparer à vendre d?août ;<br />
</blockquote><br /><br />

The output is HTML source, but the text (accents) looks OK.

