<?xml version="1.0" encoding="UTF-8"?> 
<?xml-stylesheet href="/h/themes/default/feed-rss.xsl" type="text/xsl"?> 
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"> 
 <channel> 
  <title>Contact data damaged when adding contact to address book</title> 
  <pubDate>Sat, 18 May 2013 23:38:46 +0000</pubDate> 
  <link>http://bugs.horde.org/ticket/11014</link> 
  <atom:link rel="self" type="application/rss+xml" title="Contact data damaged when adding contact to address book" href="http://bugs.horde.org/ticket/11014/rss" /> 
  <description>Contact data damaged when adding contact to address book</description> 
 
   
   
  <item> 
   <title>Summary: Surname or name of contact are possibly damaged whe</title> 
   <description>Summary: The surname or name of a contact may be damaged when the
contact is added to an address book and the data is UTF-8 encoded. Part of a
multibyte character in the surname or name is sometimes treated as
whitespace when sscanf() is called.

Analysis: sscanf() is used in /turba/lib/Driver.php:

$splitval = sscanf($val, $parse['format']);

$val can be a multibyte (UTF-8 encoded) string.

sscanf(), as a PHP String function, operates on bytes and doesn't handle multibyte encodings such as UTF-8.

So, instead of sscanf(), another way of computing composite fields should be used.
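
As a rough sketch only of what a multibyte-safe split could look like: the helper name splitComposite(), the pattern and the sample name below are hypothetical and are not Turba code. The pattern assumes a simple "firstname lastname" composite.

```php
&lt;?php
// Hypothetical helper, not part of Turba: splits a composite value with a
// UTF-8-aware regular expression instead of sscanf().
function splitComposite($val, $regexp)
{
    // With the /u modifier in $regexp, PCRE reads $val as UTF-8, so \s is
    // matched against whole characters rather than individual bytes.
    if (preg_match($regexp, $val, $matches) !== 1) {
        return null;
    }
    // Drop the full match and keep only the captured fields.
    return array_slice($matches, 1);
}

// Assumed pattern: two whitespace-separated fields.
$parts = splitComposite("Jiří Dvořák", '/^(\S+)\s+(\S+)$/u');
// $parts[0] holds the first field, $parts[1] the second.
```

Unlike sscanf(), this does not reuse the printf-style format string, so the pattern would have to be configured separately.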

Please note 1: The PHP manual says that sscanf() is not locale-aware. This is not true: PHP's sscanf() uses the C library function isspace(), which is locale-aware. Moreover, isspace() works as expected only for alphanumeric characters and symbols. The result of isspace() for any other unsigned char value is unpredictable in general. Therefore computing a composite name with sscanf() may or may not succeed, and success does not prove that the code is correct.

Please note 2: This was reported in Ticket #10956 and shown to slusarz@horde.org, who wasn't able to reproduce the issue. IMHO the ability to reproduce it depends on subtle operating system details (i.e. on the isspace() implementation).

Please note 3: So this could in fact be a Solaris (not Horde) issue; it doesn't happen under Ubuntu Linux, for instance. The Solaris isspace() is badly broken. However, sscanf() shouldn't be used there in any case, because the data can be multibyte. But there is no equivalent multibyte function in PHP. I think PHP needs something like swscanf(). Maybe the Multibyte String or intl functions could be used.

But in this case the problem is to find three substrings separated by space(s). explode() should be enough?</description> 
   <pubDate>Mon, 20 Feb 2012 19:45:54 +0000</pubDate> 
   <link>http://bugs.horde.org/ticket/11014#t70419</link> 
  </item> 
   
  <item> 
   <title>&gt; But in this case the problem is to find three substrings s</title> 
   <description>&gt; But in this case the problem is to find three substrings separated by 
&gt; space(s). explode() should be enough?
No, because that would take away flexibility in the parsing rules. We use the same strings for both formatting and parsing composite strings. The formatting rules can be specified arbitrarily, so whitespace splitting is not sufficient. sscanf() is the only available counterpart to printf formatting. And we probably don't want to implement sscanf() in PHP.</description> 
   <pubDate>Wed, 22 Feb 2012 15:12:59 +0000</pubDate> 
   <link>http://bugs.horde.org/ticket/11014#t70455</link> 
  </item> 
   
  <item> 
   <title>A solution might be to allow an optional parameter 'regexp' </title> 
   <description>A solution might be to allow an optional parameter 'regexp' inside the 'parse' settings that would be used instead of the 'format' string to parse out the composition fields.</description> 
   <pubDate>Wed, 22 Feb 2012 15:21:02 +0000</pubDate> 
   <link>http://bugs.horde.org/ticket/11014#t70457</link> 
  </item> 
   
  <item> 
   <title>&gt; A solution might be to allow an optional parameter 'regexp</title> 
   <description>&gt; A solution might be to allow an optional parameter 'regexp' inside 
&gt; the 'parse' settings that would be used instead of the 'format' 
&gt; string to parse out the composition fields.

Would that work for your situation?</description> 
   <pubDate>Wed, 30 Jan 2013 16:58:05 +0000</pubDate> 
   <link>http://bugs.horde.org/ticket/11014#t76489</link> 
  </item> 
   
   
 
 </channel> 
</rss> 
