Summary | charset pb replying to message |
Queue | IMP |
Queue Version | Git master |
Type | Bug |
State | Resolved |
Priority | 1. Low |
Owners | slusarz (at) horde (dot) org |
Requester | rsalmon (at) mbpgroup (dot) com |
Created | 02/09/2011 (5232 days ago) |
Due | |
Updated | 03/18/2011 (5195 days ago) |
Assigned | 03/04/2011 (5209 days ago) |
Resolved | 03/18/2011 (5195 days ago) |
Github Issue Link | |
Github Pull Request | |
Milestone | |
Patch | No |
notice until now.
The test passes OK.
New Attachment: XssTest.log
describes. See if this helps.
and they all looked OK.
I ran the new test case, and it fails. See attached log file.
Bug #9567: XML encoding tag may not appear at beginning of output
1 files changed, 6 insertions(+), 3 deletions(-)
http://git.horde.org/horde-git/-/commit/ba922deff269a9a6a35427610ddb6eb3adf9282f
Bug #9567: Improve loading of HTML documents
6 files changed, 123 insertions(+), 46 deletions(-)
http://git.horde.org/horde-git/-/commit/cf8cb46f13765b88342477735f7bbb473727fffd
content="text/html; charset=iso-8859-1" instead of
content="text/html; charset="iso8859-1"
Add another test for Bug #9567
1 files changed, 0 insertions(+), 0 deletions(-)
http://git.horde.org/horde-git/-/commit/423b29e9ae9e6b4227f7d8870aa8a2b6d823f120
New Attachment: testBug9567.patch
found a way to reproduce using the test case in Text_Filter you
originally created.
See attached file, I think that this is how the test case in
Text_Filter should work.
The test passes OK with the patch from comment #31.
What do you think?
For the last 20 minutes, I can't reproduce this issue any more.
But I don't know why, I don't know what I did or changed :-(
Anyway, since I can't reproduce this issue any more, I guess there is
no need to keep this ticket open.
But I think the patch from comment #31 still applies.
Thanks for your patience.
wrong when replying to this message in traditional mode, using the
HTML editor.
http://devzone.zend.com/article/8855 are right, specifically section 5
: "DOMDocument::saveXML($node) method is always performed in UTF-8"
Then, no matter what $doc->encoding is set to, the following code will
*always* return a UTF-8 encoded string :
if ($body && $body->hasChildNodes()) {
    foreach ($body->childNodes as $child) {
        $text .= $dom->dom->saveXML($child);
    }
}
So, I think that Horde_Text_Filter_Xss::postProcess should be patched
like this:
- return Horde_String::convertCharset($text, $dom->encoding, $this->_params['charset']);
+ return Horde_String::convertCharset($text, 'UTF-8', $this->_params['charset']);
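To illustrate why the source charset matters here, the following is a minimal sketch, not Horde code. It assumes a hypothetical helper convert_charset() built on iconv() as a stand-in for Horde_String::convertCharset(), and uses a hand-written UTF-8 sample string. It shows that declaring UTF-8 bytes as ISO-8859-1 encodes them a second time, which is the kind of garbled accents reported in this ticket.

```php
<?php
// Hypothetical stand-in for Horde_String::convertCharset($text, $from, $to),
// built on iconv() so the sketch runs without Horde installed.
function convert_charset(string $text, string $from, string $to): string
{
    return iconv($from, $to, $text);
}

// UTF-8 bytes, as saveXML($node) is said to return them ("é" = C3 A9).
$utf8 = "pr\xC3\xA9parer \xC3\xA0 vendre";

// Wrong source charset: each UTF-8 byte is treated as one ISO-8859-1
// character and encoded again, so "é" (C3 A9) becomes "Ã©" (C3 83 C2 A9).
$wrong = convert_charset($utf8, 'ISO-8859-1', 'UTF-8');

// Correct source charset: the bytes are already UTF-8, nothing changes.
$right = convert_charset($utf8, 'UTF-8', 'UTF-8');

echo bin2hex($wrong), "\n";
echo bin2hex($right), "\n";
```

Comparing the two hex dumps makes the double encoding visible without depending on the terminal's own charset.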
Now, as to why $dom->encoding is different on my machine than on yours, I
don't have the answer (and I tried a lot of things). But according to
http://devzone.zend.com/article/8855, Section 4,
DOMDocument::loadHTML() should detect the meta tag 'charset', and on my
system it does (I guess), which would explain why
$dom->encoding=iso-8859-1 (or whatever charset the meta tag is set to,
see other comments).
I think the small patch above is right, but I wouldn't mind if one
of the other devs tried replying to the message 'email_charset.eml'
to see whether I'm really alone on this one.
Thanks.
content=3D"text/html; charset=3Diso-8859-1"= to
<!--a75c305b1c0a6022--><meta http-equiv=3DContent-Type
content=3D"text/html; charset=3Dwindows-1252"=
I get $dom->encoding = windows-1252, which is expected in my case,
and the same result as before: rubbish.
What version of libxml2 and php are you using just in case this is
related to the version I'm using. As I'm using the latest (or close
enough), I'll try to downgrade to whatever version you are using.
libxml2 Version => 2.7.8
re-attaching the message.
As to why $dom->encoding is 'ISO-8859-1', the answer is
in the message :
<!--a75c305b1c0a6022--><meta http-equiv=3DContent-Type
content=3D"text/html; charset=3Diso-8859-1"=
If I change 'charset=3Diso-8859-1"=' to 'charset=3Diso-8859-15"=',
then $dom->encoding = ISO-8859-15
If I remove the meta tag from the message, everything works fine.
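For readers unfamiliar with the "=3D" sequences quoted above: they are the quoted-printable escape for "=" in the raw message source. The sketch below decodes a simplified copy of the quoted tag (the closing ">" is added here for illustration) and extracts the declared charset with a regular expression. The regex is an assumption for demonstration only; it is not how libxml detects the encoding.

```php
<?php
// Simplified copy of the raw (quoted-printable) meta tag quoted in this ticket.
// The closing ">" is added for illustration; the original line is truncated.
$raw = '<meta http-equiv=3DContent-Type content=3D"text/html; charset=3Diso-8859-1">';

// "=3D" is the quoted-printable escape for "=".
$html = quoted_printable_decode($raw);

// Illustrative only: pull the declared charset out of the decoded tag.
// libxml performs its own detection; this regex is not what it uses.
$declared = preg_match('/charset=["\']?([\w-]+)/i', $html, $m) ? $m[1] : null;

echo $html, "\n";
echo $declared, "\n";
```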
I've checked the other messages I'm having issues with, and they all have
the same charset meta tag.
So this behaviour is expected according to
http://devzone.zend.com/article/8855, Section 4.
I don't understand why this doesn't work here and works for you. What
version of libxml2 and php are you using just in case this is related
to the version I'm using. As I'm using the latest (or close enough),
I'll try to downgrade to whatever version you are using.
Forget about this *patch*
This is the starting point in trying to figure this out.
charset it was provided in, which is why the convertCharset() call is
necessary. The question is why $dom->encoding is 'ISO-8859-1' for you
and 'UTF-8' for *everybody* else.
we be patching? Of course the test is going to fail - you are
altering the output for a successful test.
New Attachment: phpunit2.log
+++ Xss.php 2011-03-15 10:41:24.000000000 +0100
- return Horde_String::convertCharset($text, $dom->encoding,
$this->_params['charset']);
+ return $text;
Bug9567 fails. I've attached the output.
probably not the same libxml/php version
Googling a bit, I ran into this article
http://devzone.zend.com/article/8855, 5. Save/dumping operations and
encoding :
"Node or XML subtree dumping using the DOMDocument::saveXML($node)
method is always performed in UTF-8."
This is the issue I'm having, $dom->encoding = iso-8859-1 and
$dom->dom->saveXML($child) returns utf-8.
The following patch works for me for all messages read, reply,
forward... (for whatever I've tested so far) :
--- Xss.php.org 2011-03-15 10:41:22.000000000 +0100
+++ Xss.php 2011-03-15 10:41:24.000000000 +0100
@@ -130,7 +130,7 @@
             }
         }
-        return Horde_String::convertCharset($text, $dom->encoding, $this->_params['charset']);
+        return $text;
     }
     /**
http://lists.horde.org/archives/dev/Week-of-Mon-20101115/025488.html
New Attachment: xml_charset.diff
it passes for you.
running the test, but not related to this bug I guess.
A user had submitted this patch awhile back. Maybe this fixes things for you?
New Attachment: phpunit.log
See attached log file.
I can provide you another message as an example if you want, but I
can't attach it to this ticket as it is a non public message. I tried
to remove private information, but no matter what editor I was using,
I always ended up altering the charset of the message when saving.
passes for you.
Easiest way to run is to go to
horde/framework/Text_Filter/test/Horde/Text/Filter and run 'php
AllTests.php'
Add test for Bug #9567
1 files changed, 0 insertions(+), 0 deletions(-)
http://git.horde.org/horde-git/-/commit/9bc38e1452ccb11d2c709175b65d705243df1ad0
New Attachment: xss.patch
this ticket), here is the output of charset detection :
2011-03-09T09:33:34+01:00 INFO: HORDE [imp] 777777777777777777ASCII
[pid 15410 on line 130 of
"/var/www/html/hordetest/libs/Horde/Text/Filter/Xss.php"]
2011-03-09T09:33:34+01:00 INFO: HORDE [imp] 777777777777777777UTF-8
[pid 15410 on line 130 of
"/var/www/html/hordetest/libs/Horde/Text/Filter/Xss.php"]
2011-03-09T09:33:34+01:00 INFO: HORDE [imp] 777777777777777777ASCII
[pid 15410 on line 130 of
"/var/www/html/hordetest/libs/Horde/Text/Filter/Xss.php"]
2011-03-09T09:33:34+01:00 INFO: HORDE [imp] 777777777777777777UTF-8
[pid 15410 on line 133 of
"/var/www/html/hordetest/libs/Horde/Text/Filter/Xss.php"]
2011-03-09T09:33:34+01:00 INFO: HORDE [imp]
777777777777777777iso-8859-1 [pid 15410 on line 134 of
"/var/www/html/hordetest/libs/Horde/Text/Filter/Xss.php"]
So, this confirms that dom->saveXML returns UTF-8 characters, but
$doc->encoding is iso-8859-1.
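The mismatch described above can be checked outside Horde with a small script. This is a sketch under assumptions: the HTML sample is made up, and the detected encoding and exact serialization may vary with the libxml2 and PHP versions, so it prints the values rather than asserting what they should be.

```php
<?php
// "é" encoded as ISO-8859-1 (single byte 0xE9), with a matching meta tag.
$html = '<html><head><meta http-equiv="Content-Type" '
      . 'content="text/html; charset=iso-8859-1"></head>'
      . "<body><p>pr\xE9parer</p></body></html>";

$doc = new DOMDocument();
// Suppress parser warnings; they are irrelevant for this sketch.
@$doc->loadHTML($html);

// Dump the children of <body> one node at a time, as postProcess() does.
$text = '';
$body = $doc->getElementsByTagName('body')->item(0);
foreach ($body->childNodes as $child) {
    $text .= $doc->saveXML($child);
}

// Encoding libxml detected for the document (iso-8859-1 in this ticket).
var_dump($doc->encoding);
// Whether the dumped string is well-formed UTF-8.
var_dump(preg_match('//u', $text) === 1);
// Hex of the dump, to inspect how "é" was serialized.
echo bin2hex($text), "\n";
```

If the first value is iso-8859-1 while the dump is valid UTF-8, converting the dump from $doc->encoding would treat UTF-8 bytes as ISO-8859-1.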
I'm having this issue not only with the attached message, but pretty
much with all messages in my inbox (as a matter of fact all messages
containing accents).
Just in case this was related to libxml, I've updated the lib to libxml2-2.7.8
New Attachment: Xss[1].tgz
New Attachment: Xss.tgz
(vi, nedit) I wasn't getting (seeing) the same output, and I just
realised that now.
So, this got me up to framework/Text_Filter/lib/Horde/Text/Filter/Xss.php
I've attached the log patch and horde log file. The log file is a trace
of replying to the message.
It looks like dom->saveXML returns UTF-8 characters.
If I change the last 'return' of function postProcess($text) like this
- return Horde_String::convertCharset($text, $dom->encoding,
$this->_params['charset']);
+ return $text;
Then accents look Ok!
Assigned to Michael Slusarz
State ⇒ Feedback
$doc->encoding is iso-8859-1? If that is the case, I don't see why
this isn't working... we are converting $text to ISO-8859-1 (from
UTF-8) and then sending to loadHTML. So things should be fine.
Maybe check what the value of $doc->encoding is AFTER line 83? Or try
creating a new DOMDocument object - e.g.:
/* If libxml can't auto-detect encoding, convert to what it
* *thinks* the encoding should be. */
$doc = new DOMDocument();
$doc->loadHTML(Horde_String::convertCharset($text,
$charset, $doc->encoding));
problem is coming from Domhtml.php. DOMDocument thinks that $text is
iso-8859-1, but it is UTF-8 as it has been converted earlier on.
The text message gets screwed after the following call (line 83) :
$doc->loadHTML(Horde_String::convertCharset($text,
$charset, $doc->encoding));
$charset = utf-8
$doc->encoding = iso-8859-1
I don't know what to do from there. Any way I can help?
I use:
Mozilla/5.0 (X11; U; Linux i686; fr; rv:1.9.2.13) Gecko/20110103
Fedora/3.6.13-1.fc14 Firefox/3.6.13
php-5.3.5-1.el5.remi.1
if ($mode == 'html') {
$msg =
$GLOBALS['injector']->getInstance('Horde_Core_Factory_TextFilter')->filter($msg, array('Cleanhtml', 'Xss'), array(array('body_only' => true), array('strip_styles' => true, 'strip_style_attributes' =>
false)));
} elseif ($type == 'text/html') {
$msg =
$GLOBALS['injector']->getInstance('Horde_Core_Factory_TextFilter')->filter($msg,
'Html2text');
$type = 'text/plain';
}
but after line 2511, msg looks wrong (accents are screwed up).
If I change array('Cleanhtml', 'Xss') to array(), accents look OK (but
the reply message is a bit screwed up :-)).
New Attachment: output.png
New Attachment: screenshot.png
to messages.
using the message attached to this ticket, traditional view mode,
- $_prefs['compose_html']['value'] = 0;
- $_prefs['reply_format']['value'] = 0;
=> reply Ok
- $_prefs['compose_html']['value'] = 1;
- $_prefs['reply_format']['value'] = 1;
=> reply *NOK*
see attached screenshot, accents look like rubbish.
"Array" is not configured in the Horde Registry.
I can't debug today. I updated from git this morning and I can't use
dynamic imp now (see below). I have either missed something or there's
something wrong in git repo. I'll wait monday...
A fatal error has occurred
"Array" is not configured in the Horde Registry.
1. require() /var/www/html/hordetest/imp/index.php:19
2. IMP_Dimp::header() /var/www/html/hordetest/imp/index-dimp.php:38
3. include() /var/www/html/hordetest/imp/lib/Dimp.php:78
4. include() /var/www/html/hordetest/imp/templates/common-header.inc:9
5. Horde_Registry->getInitialPage()
/var/www/html/hordetest/imp/templates/dimp/javascript_defs.php:22
Bug #9549.
part with "HTML composition" can be a duplicate of Bug #9549.
But the first part is not a duplicate of Bug #9549.
When I reply to the attached message, accents are showing OK.
State ⇒ Duplicate
Bug #9549.
State ⇒ Unconfirmed
Priority ⇒ 1. Low
Type ⇒ Bug
Summary ⇒ charset pb replying to message
Queue ⇒ IMP
Milestone ⇒
Patch ⇒ No
New Attachment: email.eml
compose_html=1
reply_format=1
Attached is the message from ticket #9189 and #9190.
Replying to this message gives:
"préparer à vendre dâaoût ; "
expected :
"préparer à vendre d'août ;"
Funny thing is (this is probably related to ticket #9549):
using dynamic view,
- select message and reply
- click on "HTML composition" (do not click or modify the body of the message)
output :
<p><a href="mailto:ronan@maison.com">ronan@maison.com</a> a écrit
:</p><blockquote type="cite" style="border-left:2px solid
blue;margin-left:8px;padding-left:8px;">> Bonjour,<br />
><br />
> <br />
><br />
> préparer à vendre d?août ;<br />
</blockquote><br /><br />
the output is html source, but the text (accents) looks ok.