[#7237] Replace TSV parser with extended CSV parser
Summary Replace TSV parser with extended CSV parser
Queue Horde Framework Packages
Queue Version FRAMEWORK_3
Type Enhancement
State Accepted
Priority 1. Low
Requester bklang (at) horde (dot) org
Created 2008-08-25 (4854 days ago)
Updated 2008-09-22 (4826 days ago)
Assigned 2008-08-27 (4852 days ago)
Patch No

2008-09-22 11:39:56 Jan Schneider Summary ⇒ Replace TSV parser with extended CSV parser
2008-09-22 11:39:35 Jan Schneider Type ⇒ Enhancement
State ⇒ Accepted
Priority ⇒ 1. Low
2008-08-27 07:29:22 Jan Schneider Comment #4
State ⇒ Feedback
Agreed. To solve the UI problem, I would keep TSV as a separate value 
in the drop down, but not show the delimiter field in the details 
step, hardcoding it as a tab instead.
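
A minimal sketch of that UI rule, written in Python for illustration only (the Horde code itself is PHP). The names `FORMATS` and `delimiter_for` are hypothetical and do not come from the Horde code base; the default comma for CSV is also an assumption.

```python
# Hypothetical table of import formats. TSV stays a separate choice in the
# drop down, but its delimiter is fixed and the field is not shown.
FORMATS = {
    "csv": {"delimiter": None, "ask_delimiter": True},
    "tsv": {"delimiter": "\t", "ask_delimiter": False},
}


def delimiter_for(format_choice, user_delimiter=None):
    """Return the delimiter the shared CSV parser should use."""
    options = FORMATS[format_choice]
    if not options["ask_delimiter"]:
        # TSV: ignore anything the user typed, always use a tab.
        return options["delimiter"]
    # CSV: use the value from the details step, defaulting to a comma
    # (the default is an assumption made for this sketch).
    return user_delimiter or ","
```

With this layout both choices can feed the same parser, so only one parsing code path has to handle quoting correctly.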
2008-08-26 19:04:07 Matt Selsky Comment #3
I'd vote for deprecating the TSV driver and merging it with the CSV 
driver, and then splitting the pine and mulberry drivers into their 
own simple drivers and putting those in Turba, since they're really 
Turba specific (similar to the ldif driver).
2008-08-25 23:45:42 Ben Klang Comment #2
I tried again using the CSV driver and had much better results.  Looks 
like the thing to do might be to deprecate the TSV driver, since the 
CSV driver can handle tab-delimited data just fine and doesn't have 
the same parsing problem.  The only change I think is necessary would 
be some way to enter a '<tab>' in the delimiter field in step 2 of 
the data import screen.  I had to paste it from the clipboard to get 
it into the field.
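
One way to make a tab typeable in the delimiter field is to accept a visible placeholder and translate it before parsing. This is a sketch in Python for illustration only; `resolve_delimiter` and the accepted spellings are assumptions, not existing Horde behaviour.

```python
# Hypothetical placeholders a user could type instead of a literal tab.
TAB_ALIASES = {"<tab>", "\\t", "tab"}


def resolve_delimiter(field_value):
    """Translate the text of the delimiter field into the real delimiter."""
    if field_value.strip().lower() in TAB_ALIASES:
        return "\t"
    # Anything else is used as typed, including a pasted literal tab.
    return field_value
```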
2008-08-25 23:16:58 Ben Klang Comment #1
Type ⇒ Bug
State ⇒ Unconfirmed
Priority ⇒ 1. Low
Summary ⇒ Outlook-generated CSV/TSV files parse errors
Queue ⇒ Horde Framework Packages
Milestone ⇒
Patch ⇒ No
While trying to import a TSV file created by Outlook I found that 
Horde was calculating an incorrect number of rows.  Closer inspection 
of the exported data and Horde's Data/tsv.php showed that the parser 
simply splits the file on line endings.  The data I exported from 
Outlook had numerous fields with newlines embedded inside quotation 
marks.  To make matters worse, Outlook did not consistently quote 
each field.  It appears to have quoted only fields which contained 
quotes, the delimiter or a newline.

Example of a single field:

<tab>"""John's Barbeque""

Good food here."<tab>

Outlook intended for this to be the string

"John's Barbeque"\nGood food here.

but the Horde Framework parser sees the newline and assumes it's the 
next record.
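
The difference can be reproduced outside Horde. The sketch below uses Python's standard `csv` module purely as a stand-in for a quote-aware parser; it is not the Horde implementation, and the surrounding `Name` and `City` columns are made up for the example.

```python
import csv
import io

# A header plus ONE record. The second field is quoted, contains doubled
# quotes, and has a newline embedded inside the quotes.
DATA = (
    "Name\tNotes\tCity\r\n"
    'John\t"""John\'s Barbeque""\nGood food here."\tAustin\r\n'
)


def naive_rows(text):
    """Split on line endings first, the approach described for the TSV driver."""
    return [line.split("\t") for line in text.splitlines()]


def quoted_rows(text, delimiter="\t"):
    """Let a quote-aware reader decide where each record ends."""
    # newline="" keeps embedded newlines exactly as they appear in the data.
    return list(csv.reader(io.StringIO(text, newline=""), delimiter=delimiter))


print(len(naive_rows(DATA)))   # counts the embedded newline as a new record
print(len(quoted_rows(DATA)))  # header plus the single record
print(repr(quoted_rows(DATA)[1][1]))
```

The naive split reports three rows and leaves stray quote characters in the fields, while the quote-aware reader reports two rows and returns the field as the string Outlook intended.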

I've experimented with different ways of writing the parser to look 
for newlines, but each time I find new corner cases.  Rather than spin 
our wheels on this, it might make sense to look at the PEAR library 
(which seems to operate only on files rather than strings) or find a 
reference implementation.  I haven't yet read the RFC to determine 
whether Outlook violates the standard by not consistently quoting, 
but regardless, it's how the file was generated.
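
On the quoting question: RFC 4180, which describes the comma-separated form, allows each field to be quoted or unquoted and says fields containing line breaks, double quotes or the delimiter should be quoted, so quoting only when needed appears to be within the rules. Python's `csv` writer follows the same "minimal quoting" policy, which makes it a convenient way to generate test input that resembles the Outlook export. This is an illustrative sketch, not a claim about how Outlook itself produces the file; `write_row` is a hypothetical helper.

```python
import csv
import io


def write_row(fields, delimiter="\t"):
    """Serialize one record, quoting only the fields that need it."""
    buf = io.StringIO(newline="")
    writer = csv.writer(
        buf,
        delimiter=delimiter,
        quoting=csv.QUOTE_MINIMAL,  # quote only on delimiter, quote or newline
        lineterminator="\r\n",
    )
    writer.writerow(fields)
    return buf.getvalue()


# Only the middle field is quoted; embedded quotes are doubled.
print(repr(write_row(["John", "\"John's Barbeque\"\nGood food here.", "Austin"])))
```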
