Tickets :: [#7237] Replace TSV parser with extended CSV parser

2/27/26

Summary	Replace TSV parser with extended CSV parser
Queue	Horde Framework Packages
Queue Version	FRAMEWORK_3
Type	Enhancement
State	Accepted
Priority	1. Low
Owners
Requester	bklang (at) horde (dot) org
Created	08/25/2008 (6395 days ago)
Due
Updated	09/22/2008 (6367 days ago)
Assigned	08/27/2008 (6393 days ago)
Resolved
Milestone
Patch	No

09/22/2008 11:39:56 AM	Jan Schneider	Summary ⇒ Replace TSV parser with extended CSV parser

09/22/2008 11:39:35 AM	Jan Schneider	Type ⇒ Enhancement State ⇒ Accepted Priority ⇒ 1. Low

08/27/2008 07:29:22 AM	Jan Schneider	Comment #4 State ⇒ Feedback
Agreed. To solve the UI problem, I would keep TSV as a separate value in the drop-down but not provide the delimiter field in the details step, hardcoding the delimiter as a tab instead.
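
To illustrate the idea, here is a minimal Python sketch (not Horde code). The names `FORMAT_DELIMITERS`, `needs_delimiter_field`, and `parse` are hypothetical, and Python's standard `csv` module stands in for the CSV driver. TSV stays a separate choice, but it is only a CSV parse with a fixed tab delimiter, so the details step can skip the delimiter field for it.

```python
import csv
import io

# Hypothetical format choices for the import drop-down. None means the
# user supplies the delimiter in the details step; TSV hardcodes a tab.
FORMAT_DELIMITERS = {"csv": None, "tsv": "\t"}


def needs_delimiter_field(fmt):
    """Return True if the details step should show the delimiter field."""
    return FORMAT_DELIMITERS[fmt] is None


def parse(fmt, text, user_delimiter=","):
    """Parse text with one CSV parser, whatever format was chosen."""
    delimiter = FORMAT_DELIMITERS[fmt] or user_delimiter
    # newline="" leaves line endings untouched so the parser sees them as-is.
    source = io.StringIO(text, newline="")
    return list(csv.reader(source, delimiter=delimiter))
```

With this mapping, `parse("tsv", text)` never consults a user-supplied delimiter, while `parse("csv", text, ";")` uses whatever the user entered.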

08/26/2008 07:04:07 PM	Matt Selsky	Comment #3
I'd vote for deprecating the TSV driver and merging it with the CSV driver, then splitting the pine and mulberry drivers into their own simple drivers and putting those in Turba, since they're really Turba-specific (similar to the ldif driver).

08/25/2008 11:45:42 PM	Ben Klang	Comment #2
I tried again using the CSV driver and had much better results. It looks like the thing to do might be to deprecate the TSV driver, since the CSV driver can handle tab-delimited data just fine and doesn't have the same parsing problem. The only change I think is necessary would be some way to allow a '<tab>' to be entered in the delimiter field in step 2 of the data import screen. I had to paste it from the clipboard to get it into the field.
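
For illustration only, the same behaviour can be sketched with Python's standard `csv` module standing in for the CSV driver (this is not Horde code, and the two-record `sample` is made up to resemble the Outlook export): with the delimiter set to a tab, a quote-aware CSV parser copes with embedded newlines and with fields that are quoted only when necessary.

```python
import csv
import io

# Made-up tab-delimited data: only the field that contains quotes and a
# newline is quoted, as in the Outlook export described in this ticket.
sample = 'Joe\t"""John\'s Barbeque""\nGood food here."\tNYC\nAnn\tplain\tLA\n'

# newline="" leaves line endings untouched so that the csv module can
# tell record separators from newlines inside quoted fields.
reader = csv.reader(io.StringIO(sample, newline=""), delimiter="\t", quotechar='"')
rows = list(reader)

for row in rows:
    print(row)
```

This yields two records of three fields each, and the doubled quotes in the first record are unescaped to single quotes.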

08/25/2008 11:16:58 PM	Ben Klang	Comment #1 Priority ⇒ 1. Low Type ⇒ Bug Summary ⇒ Outlook-generated CSV/TSV files parse errors Queue ⇒ Horde Framework Packages Milestone ⇒ Patch ⇒ No State ⇒ Unconfirmed
While trying to import a TSV file created by Outlook, I found that Horde was calculating an incorrect number of rows. Closer inspection of the exported data and Horde's Data/tsv.php showed that the parser simply splits the file on line endings. The data I exported from Outlook contained numerous fields with newlines embedded inside quotation marks. To make matters worse, Outlook did not consistently quote each field. It appears to have quoted only fields that contained quotes, the delimiter, or a newline.

Example of a single field:
<tab>"""John's Barbeque""
Good food here."<tab>

Outlook intended this to be the string "John's Barbeque"\nGood food here. but the Horde Framework parser sees the newline and assumes the next record starts there.

I've experimented with different ways of writing the parser to look for newlines, but each time I find new corner cases. Rather than spin our wheels on this, it might make sense to look at the PEAR library (which seems to operate only on files rather than strings) or find a reference implementation. I haven't yet read the RFC to determine whether Outlook violates the standard by not consistently quoting, but regardless, that's how the file was generated.
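
The row miscount can be reproduced with a small Python sketch (illustrative only, not the code in Data/tsv.php; `split_records` and the two-record `sample` are hypothetical). It compares a plain split on line endings with a splitter that tracks whether it is inside double quotes. It assumes `\n` line endings and does not handle the other corner cases mentioned above.

```python
def split_records(text):
    """Split text into records on newlines that fall outside double quotes."""
    records = []
    current = []
    in_quotes = False
    for ch in text:
        if ch == '"':
            # An escaped quote ("") toggles twice, so the state is unchanged.
            in_quotes = not in_quotes
        if ch == "\n" and not in_quotes:
            records.append("".join(current))
            current = []
        else:
            current.append(ch)
    if current:
        records.append("".join(current))
    return records


# Made-up export with two records; the second field of the first record
# contains a newline inside quotes.
sample = 'Joe\t"""John\'s Barbeque""\nGood food here."\tNYC\nAnn\tplain\tLA\n'

naive = sample.splitlines()     # splits on every line ending
aware = split_records(sample)   # splits only outside quotes
print(len(naive), len(aware))   # prints "3 2"
```

The naive split reports three rows for two records, which matches the incorrect row count described above.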