<?xml version="1.0" encoding="UTF-8"?> 
<?xml-stylesheet href="https://dev.horde.org/themes/horde//default/feed-rss.xsl" type="text/xsl"?> 
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"> 
 <channel> 
  <title>db_migrate and incorrect charset handling</title> 
  <pubDate>Fri, 10 Apr 2026 17:52:42 +0000</pubDate> 
  <link>https://bugs.horde.org/ticket/9617</link> 
  <atom:link rel="self" type="application/rss+xml" title="db_migrate and incorrect charset handling" href="https://bugs.horde.org/ticket/9617/rss" /> 
  <description>db_migrate and incorrect charset handling</description> 
 
   
   
  <item> 
   <title>I have been testing data migration from framework3 to h4 using db</title> 
   <description>I have been testing data migration from framework3 to h4 using db_migrate.

Either MySQL is case-insensitive or the migration script is at fault, but it seems that you cannot add tags to rampage_tags if they differ only by case.

E.g., these tags are considered the same:
TYÖ
työ

If such tags exist in the old data, db_migrate fails with the error:
QUERY FAILED: Duplicate entry &#039;työ&#039; for key &#039;rampage_tags_tag_name&#039;
INSERT INTO `rampage_tags` (tag_name) VALUES (&#039;työ&#039;)</description> 
   <pubDate>Wed, 02 Mar 2011 12:02:40 +0000</pubDate> 
   <link>https://bugs.horde.org/ticket/9617#t61994</link> 
  </item> 
   
  <item> 
   <title>Changes have been made in Git for this ticket:

Use Horde_St</title> 
   <description>Changes have been made in Git for this ticket:

Use Horde_String::lower
Possibly fixes Bug: 9617

 1 files changed, 1 insertions(+), 1 deletions(-)
http://git.horde.org/horde-git/-/commit/ae5714cbb917c43e184f942db0e1b5f3197f679f</description> 
   <pubDate>Fri, 04 Mar 2011 14:37:52 +0000</pubDate> 
   <link>https://bugs.horde.org/ticket/9617#t62028</link> 
  </item> 
   
  <item> 
   <title>Can you try what I just committed? </title> 
   <description>Can you try what I just committed? </description> 
   <pubDate>Fri, 04 Mar 2011 14:38:29 +0000</pubDate> 
   <link>https://bugs.horde.org/ticket/9617#t62029</link> 
  </item> 
   
  <item> 
   <title>&gt; Can you try what I just committed?

I tried, but the script </title> 
   <description>&gt; Can you try what I just committed?

I tried, but the script fails with the same error.</description> 
   <pubDate>Fri, 04 Mar 2011 14:57:31 +0000</pubDate> 
   <link>https://bugs.horde.org/ticket/9617#t62030</link> 
  </item> 
   
  <item> 
   <title>Changes have been made in Git for this ticket:

Need to conv</title> 
   <description>Changes have been made in Git for this ticket:

Need to convert from database&#039;s charset before comparing
Bug: 9617

 1 files changed, 1 insertions(+), 1 deletions(-)
http://git.horde.org/horde-git/-/commit/0f36ff69b64c96e3ab91d6f479fa34cb15a451a9</description> 
   <pubDate>Fri, 04 Mar 2011 19:36:48 +0000</pubDate> 
   <link>https://bugs.horde.org/ticket/9617#t62047</link> 
  </item> 
   
  <item> 
   <title>&gt; Changes have been made in Git for this ticket:
&gt;
&gt; Need </title> 
   <description>&gt; Changes have been made in Git for this ticket:
&gt;
&gt; Need to convert from database&#039;s charset before comparing
&gt; Bug: 9617
&gt;
&gt;  1 files changed, 1 insertions(+), 1 deletions(-)
&gt; http://git.horde.org/horde-git/-/commit/0f36ff69b64c96e3ab91d6f479fa34cb15a451a9

Attached patch seemed to fix it for me.

Another bug has appeared. Output from debug log:
DEBUG: SQL SELECT user_id, user_name FROM `rampage_users` WHERE user_name IN (&#039;ntllt&#039;)
DEBUG: SQL QUERY FAILED: Duplicate entry &#039;ntllt&#039; for key rampage_users_user_name&#039; INSERT INTO `rampage_users` (user_name) VALUES (&#039;ntllt&#039;)
</description> 
   <pubDate>Fri, 04 Mar 2011 21:15:24 +0000</pubDate> 
   <link>https://bugs.horde.org/ticket/9617#t62048</link> 
  </item> 
   
  <item> 
   <title>Changes have been made in Git for this ticket:

Need to conv</title> 
   <description>Changes have been made in Git for this ticket:

Need to convert to utf-8 when reading the category before tagging.
Bug: 9617

 1 files changed, 2 insertions(+), 2 deletions(-)
http://git.horde.org/horde-git/-/commit/3baf0b99425470dfdd77e02de1da4f32bf4851ff</description> 
   <pubDate>Fri, 04 Mar 2011 21:49:52 +0000</pubDate> 
   <link>https://bugs.horde.org/ticket/9617#t62049</link> 
  </item> 
   
  <item> 
   <title>
&gt; Attached patch seemed to fix it for me.

It should be </title> 
   <description>
&gt; Attached patch seemed to fix it for me.

It should be sufficient to convert from the database charset to UTF-8 here. However, the migration script was missing a conversion from the database charset to UTF-8, which is probably what made the extra conversion in your patch necessary. This has been fixed.

&gt; Another bug has appeared. Output from debug log:
&gt; DEBUG: SQL SELECT user_id, user_name FROM `rampage_users` WHERE 
&gt; user_name IN (&#039;ntllt&#039;)
&gt; DEBUG: SQL QUERY FAILED: Duplicate entry &#039;ntllt&#039; for key 
&gt; rampage_users_user_name&#039; INSERT INTO `rampage_users` (user_name) 
&gt; VALUES (&#039;ntllt&#039;)


Yeah, it looks like a bunch of charset conversions are missing in content. I&#039;m working on it, and I have updated the title of the ticket to reflect the actual problem.</description> 
   <pubDate>Fri, 04 Mar 2011 21:52:55 +0000</pubDate> 
   <link>https://bugs.horde.org/ticket/9617#t62050</link> 
  </item> 
   
  <item> 
   <title>Changes have been made in Git for this ticket:

Fix charset </title> 
   <description>Changes have been made in Git for this ticket:

Fix charset handling in tagger
Bug: 9617

 3 files changed, 26 insertions(+), 8 deletions(-)
http://git.horde.org/horde-git/-/commit/5ac680c13a93b32df97fc5bb3c29cb3c4b8e4cbb</description> 
   <pubDate>Sat, 05 Mar 2011 07:45:59 +0000</pubDate> 
   <link>https://bugs.horde.org/ticket/9617#t62054</link> 
  </item> 
   
  <item> 
   <title>Changes have been made in Git for this ticket:

Bug #9617: F</title> 
   <description>Changes have been made in Git for this ticket:

Bug #9617: Fix property name.

 1 files changed, 2 insertions(+), 2 deletions(-)
http://git.horde.org/horde-git/-/commit/7d484c517ddde6a6818845ea1b33be3f20c36c89</description> 
   <pubDate>Mon, 07 Mar 2011 16:23:21 +0000</pubDate> 
   <link>https://bugs.horde.org/ticket/9617#t62103</link> 
  </item> 
   
  <item> 
   <title>&gt; Changes have been made in Git for this ticket:
&gt;
&gt; Bug #</title> 
   <description>&gt; Changes have been made in Git for this ticket:
&gt;
&gt; Bug #9617: Fix property name.
&gt;
&gt;  1 files changed, 2 insertions(+), 2 deletions(-)
&gt; http://git.horde.org/horde-git/-/commit/7d484c517ddde6a6818845ea1b33be3f20c36c89
</description> 
   <pubDate>Mon, 07 Mar 2011 21:02:50 +0000</pubDate> 
   <link>https://bugs.horde.org/ticket/9617#t62114</link> 
  </item> 
   
  <item> 
   <title>The original problem still remains. PHP&#039;s manual suggests tha</title> 
   <description>The original problem still remains. PHP&#039;s manual suggests that one should not assume that strtolower()/strtoupper() work correctly with a multibyte charset like UTF-8.

Should the code use mb_strtoupper()/mb_strtolower() or Horde_String instead of strtolower()/strtoupper()?</description> 
   <pubDate>Mon, 07 Mar 2011 21:07:44 +0000</pubDate> 
   <link>https://bugs.horde.org/ticket/9617#t62115</link> 
  </item> 
   
  <item> 
   <title>&gt; PHP&#039;s manual suggests that one should not assume that strto</title> 
   <description>&gt; PHP&#039;s manual suggests that one should not assume that strtolower()/strtoupper() work correctly with 
&gt; a multibyte charset like UTF-8.

Where does it say that? I don&#039;t see any such suggestions in the man pages.</description> 
   <pubDate>Mon, 07 Mar 2011 21:19:30 +0000</pubDate> 
   <link>https://bugs.horde.org/ticket/9617#t62118</link> 
  </item> 
   
  <item> 
   <title>&gt;&gt; PHP&#039;s manual suggests that one should not assume that 
&gt;&gt;</title> 
   <description>&gt;&gt; PHP&#039;s manual suggests that one should not assume that 
&gt;&gt; strtolower()/strtoupper() work correctly with
&gt;&gt; a multibyte charset like UTF-8.
&gt;
&gt; Where does it say that? I don&#039;t see any such suggestions in the man pages.

It does not say so in so many words, or at least it says so ambiguously: &quot;Note that &#039;alphabetic&#039; is determined by the current locale&quot;

But if we look at PHP&#039;s source code for strtoupper(), it works byte by byte, so it will not work correctly with UTF-8 encoded strings that contain non-ASCII characters. 

Excerpt from ext/standard/string.c:
char *php_strtoupper(char *s, size_t len)
{
        unsigned char *c, *e;
        
        c = (unsigned char *)s;
        e = (unsigned char *)c+len;

        while (c &lt; e) {
                *c = toupper(*c);
                c++;
        }
        return s;
}

Non-ASCII characters in UTF-8 are multibyte. Therefore, PHP&#039;s strtoupper()/strtolower() will not work correctly on UTF-8 encoded strings that contain non-ASCII characters.
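
To illustrate the difference, here is a minimal Python sketch (not PHP, and not the code Horde uses). It assumes that the ASCII-only bytes.upper() method in Python is a fair stand-in for the byte-wise toupper() loop above when it runs in the C locale. The names bytewise_upper and unicode_upper are hypothetical helpers made up for this example.

```python
# Sketch only: compares ASCII-only, byte-wise upper-casing with
# Unicode-aware upper-casing. bytes.upper() touches only the ASCII
# letters a-z, which is assumed here to mimic a byte-wise toupper()
# loop running in the "C" locale.


def bytewise_upper(text, encoding="utf-8"):
    """Upper-case the encoded bytes; only ASCII letters are changed."""
    return text.encode(encoding).upper().decode(encoding)


def unicode_upper(text):
    """Upper-case whole code points, as a multibyte-aware function would."""
    return text.upper()


tag = "työ"
# The byte-wise result keeps the lower-case "ö", so the two results differ.
print(bytewise_upper(tag) == unicode_upper(tag))  # prints False
```

With this sketch, the byte-wise variant leaves the two bytes of the encoded "ö" untouched, so a comparison that relies on it would still treat TYÖ and työ as different tags.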
</description> 
   <pubDate>Mon, 04 Apr 2011 14:24:39 +0000</pubDate> 
   <link>https://bugs.horde.org/ticket/9617#t63098</link> 
  </item> 
   
  <item> 
   <title>&gt;&gt;&gt; PHP&#039;s manual suggests that one should not assume that
&gt;&gt;</title> 
   <description>&gt;&gt;&gt; PHP&#039;s manual suggests that one should not assume that
&gt;&gt;&gt; strtolower()/strtoupper() work correctly with
&gt;&gt;&gt; a multibyte charset like UTF-8.
&gt;&gt;
&gt;&gt; Where does it say that? I don&#039;t see any such suggestions in the man pages.
&gt;
&gt; It does not say so in so many words, or at least it says so 
&gt; ambiguously: &quot;Note that &#039;alphabetic&#039; is determined by the current 
&gt; locale&quot;

Which is exactly what we want.

&gt; But if we look at PHP&#039;s source code for strtoupper(), it works 
&gt; byte by byte, so it will not work correctly with UTF-8 encoded 
&gt; strings that contain non-ASCII characters.

So the manual is plain wrong.

&gt; Excerpt from ext/standard/string.c:
&gt; char *php_strtoupper(char *s, size_t len)
&gt; {
&gt;         unsigned char *c, *e;
&gt;
&gt;         c = (unsigned char *)s;
&gt;         e = (unsigned char *)c+len;
&gt;
&gt;         while (c &lt; e) {
&gt;                 *c = toupper(*c);
&gt;                 c++;
&gt;         }
&gt;         return s;
&gt; }
&gt;
&gt; Non-ASCII characters in UTF-8 are multibyte. Therefore, 
&gt; PHP&#039;s strtoupper()/strtolower() will not work correctly on UTF-8 
&gt; encoded strings that contain non-ASCII characters.

Thanks for tracking this down so deep.</description> 
   <pubDate>Mon, 04 Apr 2011 14:33:13 +0000</pubDate> 
   <link>https://bugs.horde.org/ticket/9617#t63100</link> 
  </item> 
   
  <item> 
   <title>The test suite runs fine though. Can you provide a patch to </title> 
   <description>The test suite runs fine though. Can you provide a patch to TaggerTest.php that demonstrates the broken behavior?</description> 
   <pubDate>Mon, 04 Apr 2011 15:15:37 +0000</pubDate> 
   <link>https://bugs.horde.org/ticket/9617#t63103</link> 
  </item> 
   
  <item> 
   <title>Never mind, I found one. Just to get this straight, the idea </title> 
   <description>Never mind, I found one. Just to get this straight, the idea is that tags TYÖ and työ are considered equal, right?</description> 
   <pubDate>Mon, 04 Apr 2011 15:30:26 +0000</pubDate> 
   <link>https://bugs.horde.org/ticket/9617#t63105</link> 
  </item> 
   
  <item> 
   <title>&gt; Never mind, I found one. Just to get this straight, the ide</title> 
   <description>&gt; Never mind, I found one. Just to get this straight, the idea is that 
&gt; tags TYÖ and työ are considered equal, right?

That is right.</description> 
   <pubDate>Mon, 04 Apr 2011 15:37:38 +0000</pubDate> 
   <link>https://bugs.horde.org/ticket/9617#t63106</link> 
  </item> 
   
  <item> 
   <title>Okay, but this opens a completely new can of worms. The &quot;SELEC</title> 
   <description>Okay, but this opens a completely new can of worms. The &quot;SELECT ... WHERE tag_name IN (...)&quot; won&#039;t work in this case, because it is case-sensitive.
The correct solution would be to delegate the lowercasing to the database, but at least for SQLite this doesn&#039;t seem to work: &quot;SELECT LOWER(&#039;TYÖ&#039;)&quot; returns &quot;tyÖ&quot; there. It works fine in MySQL, though.</description> 
   <pubDate>Mon, 04 Apr 2011 16:46:58 +0000</pubDate> 
   <link>https://bugs.horde.org/ticket/9617#t63108</link> 
  </item> 
   
  <item> 
   <title>http://www.sqlite.org/faq.html#q18
This makes unit testing </title> 
   <description>http://www.sqlite.org/faq.html#q18
This makes unit testing this stuff a PITA.</description> 
   <pubDate>Mon, 04 Apr 2011 16:55:18 +0000</pubDate> 
   <link>https://bugs.horde.org/ticket/9617#t63109</link> 
  </item> 
   
  <item> 
   <title>Changes have been made in Git for this ticket:

Add failing </title> 
   <description>Changes have been made in Git for this ticket:

Add failing test for bug #9617.

 1 files changed, 5 insertions(+), 4 deletions(-)
http://git.horde.org/horde-git/-/commit/55802691eafbb6931b93e8c96ee8d2d4fd5b441b</description> 
   <pubDate>Mon, 04 Apr 2011 17:07:41 +0000</pubDate> 
   <link>https://bugs.horde.org/ticket/9617#t63110</link> 
  </item> 
   
  <item> 
   <title>Changes have been made in Git for this ticket:

Fix case-ins</title> 
   <description>Changes have been made in Git for this ticket:

Fix case-insensitive filtering of duplicate tags (Bug #9617).
This simplifies the _checkTags() method a lot too. Unfortunately it
doesn&#039;t work at all with SQLite, so unit tests are rather useless.

 3 files changed, 19 insertions(+), 26 deletions(-)
http://git.horde.org/horde-git/-/commit/a90c671771adbbb1aa08576a2b9d13e011ca6790</description> 
   <pubDate>Mon, 04 Apr 2011 17:07:45 +0000</pubDate> 
   <link>https://bugs.horde.org/ticket/9617#t63111</link> 
  </item> 
   
  <item> 
   <title>I don&#039;t see any short-term solution for that. We MUST expect</title> 
   <description>I don&#039;t see any short-term solution for that. We MUST expect the database to do case-insensitive searches; it&#039;s completely insane that this doesn&#039;t work by default with SQLite. I&#039;m surprised it hasn&#039;t broken anything else yet. This makes SQLite pretty useless for any real-world usage of Horde.</description> 
   <pubDate>Mon, 04 Apr 2011 17:09:53 +0000</pubDate> 
   <link>https://bugs.horde.org/ticket/9617#t63112</link> 
  </item> 
   
  <item> 
   <title>Well, SQLite isn&#039;t going to scale for a full Horde installat</title> 
   <description>Well, SQLite isn&#039;t going to scale for a full Horde installation in the real world anyway, so while it&#039;s a pain, I&#039;m not sure how much of a problem it is...</description> 
   <pubDate>Tue, 05 Apr 2011 14:45:48 +0000</pubDate> 
   <link>https://bugs.horde.org/ticket/9617#t63161</link> 
  </item> 
   
  <item> 
   <title>Original problem is fixed, resolving. Testing problems like </title> 
   <description>Original problem is fixed, resolving. Testing problems like this is impossible due to SQLite issues, but there is nothing we can do about that.</description> 
   <pubDate>Wed, 06 Apr 2011 18:38:52 +0000</pubDate> 
   <link>https://bugs.horde.org/ticket/9617#t63234</link> 
  </item> 
   
   
 
 </channel> 
</rss> 
