Summary | Replace TSV parser with extended CSV parser |
Queue | Horde Framework Packages |
Queue Version | FRAMEWORK_3 |
Type | Enhancement |
State | Accepted |
Priority | 1. Low |
Owners | |
Requester | bklang (at) horde (dot) org |
Created | 08/25/2008 (6106 days ago) |
Due | |
Updated | 09/22/2008 (6078 days ago) |
Assigned | 08/27/2008 (6104 days ago) |
Resolved | |
Milestone | |
Patch | No |
State ⇒ Accepted
Priority ⇒ 1. Low
State ⇒ Feedback
in the drop down, but don't provide the delimiter field in the details
step, hardcoding it as a tab instead.
driver. And then splitting the pine and mulberry drivers into their
own simple drivers and putting those in Turba, since they're really
Turba-specific (similar to the ldif driver).
like the thing to do might be to deprecate the TSV driver since the
CSV driver can handle tab-delimited just fine and doesn't have the
same parsing problem. The only change I think necessary would be to
do something to allow a '<tab>' to be inserted in the delimiter field
on step 2 of the data import screen. I had to paste it from the
clipboard to get it into the field.
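
One way the delimiter field could accept a tab is to translate a typed
placeholder into a literal tab before handing it to the CSV parser. The
sketch below is in Python rather than Horde's PHP, purely to illustrate
the idea; the placeholder spellings and the function names
normalize_delimiter and parse are hypothetical, not part of Horde.

```python
import csv
import io

# Hypothetical placeholder spellings a user might type into the delimiter field.
_TAB_ALIASES = {"<tab>", "\\t", "tab"}


def normalize_delimiter(raw):
    """Map a typed placeholder to a literal tab; return anything else unchanged."""
    if raw.strip().lower() in _TAB_ALIASES:
        return "\t"
    return raw


def parse(text, raw_delimiter):
    """Parse delimited text using whatever the user typed as the delimiter."""
    reader = csv.reader(
        io.StringIO(text, newline=""),
        delimiter=normalize_delimiter(raw_delimiter),
    )
    return list(reader)


# A user types "<tab>" instead of pasting a real tab character.
rows = parse("name\tcity\nAda\tLondon\n", "<tab>")
print(rows)
```

A literal tab pasted from the clipboard still works, because anything
that is not a recognized placeholder is passed through untouched.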
Priority ⇒ 1. Low
Type ⇒ Bug
Summary ⇒ Outlook-generated CSV/TSV files parse errors
Queue ⇒ Horde Framework Packages
Milestone ⇒
Patch ⇒ No
State ⇒ Unconfirmed
Horde was calculating an incorrect number of rows. Closer inspection
of the exported data and Horde's Data/tsv.php showed that the parser
simply splits the files on line endings. The data I exported from
Outlook contained numerous fields that contained newlines embedded in
quotation marks. To make matters worse, Outlook did not consistently
quote each field. It appears to have quoted only fields which
contained quotes, the delimiter or a newline.
Example of a single field:
<tab>"""John's Barbeque""
Good food here."<tab>
Outlook intended for this to be the string
"John's Barbeque"\nGood food here.
but the Horde Framework parser sees the newline and assumes it's the
next record.
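
The difference between splitting on line endings and tracking quote
state can be seen in a small sketch. It uses Python's standard csv
module rather than Horde's PHP code, so it only illustrates the
behavior and is not a proposed patch. The surrounding fields ("Acme"
and "555-0100") are made-up filler around the field quoted above.

```python
import csv
import io

# One record with three tab-separated fields. The middle field is quoted,
# contains doubled quotes, and has an embedded newline, as in the example above.
sample = 'Acme\t"""John\'s Barbeque""\nGood food here."\t555-0100\n'

# Naive approach: split on line endings first, then on tabs.
# The embedded newline is mistaken for a record boundary.
naive_rows = [line.split("\t") for line in sample.splitlines()]

# Quote-aware approach: the parser tracks whether it is inside quotes,
# so the newline stays part of the field and doubled quotes are unescaped.
reader = csv.reader(io.StringIO(sample, newline=""), delimiter="\t", quotechar='"')
parsed_rows = list(reader)

print(len(naive_rows))         # 2 (one record wrongly split in two)
print(len(parsed_rows))        # 1
print(repr(parsed_rows[0][1])) # '"John\'s Barbeque"\nGood food here.'
```

The unquoted neighbors parse fine alongside the quoted field, which
matches Outlook's habit of quoting only the fields that need it.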
I've experimented with different ways of writing the parser to look
for newlines but each time I find new corner cases. Rather than spin
our wheels on this it might make sense to look at the PEAR library
(which seems only to operate on files rather than strings) or find a
reference implementation. I haven't yet read the RFC to determine
whether Outlook violates the standard or not by not consistently
quoting, but regardless, it's how the file was generated.