FastCSV: 5X faster than encoding/csv

weberc2 · September 4, 2016, 4:05pm

https://bitbucket.org/weberc2/fastcsv This is an alpha release, mostly intended to be a proof of concept. Not ready for real use.

oleg578 · September 5, 2016, 8:54am

Explain, please, how is UTF-8 supported?

weberc2 · September 5, 2016, 11:52am

Commas, quotes, and newline characters are all single-byte ASCII characters, and in valid UTF-8 those byte values never occur inside a multi-byte sequence; they can only appear as themselves. Therefore I can iterate over bytes (much cheaper than decoding runes) and still safely parse UTF-8.
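
To illustrate the point, here is a minimal sketch (not FastCSV's actual code; `splitUnquoted` is a hypothetical helper that ignores quoting entirely) that splits a line at comma bytes and checks that every resulting field is still valid UTF-8:

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// splitUnquoted splits one CSV line at every comma byte.
// Hypothetical helper for illustration only: it does not handle quoted fields.
func splitUnquoted(line []byte) [][]byte {
	var fields [][]byte
	start := 0
	for i, b := range line { // ranging over a []byte yields bytes, not runes
		if b == ',' {
			fields = append(fields, line[start:i])
			start = i + 1
		}
	}
	return append(fields, line[start:])
}

func main() {
	// Multi-byte characters never contain the byte 0x2C (','),
	// so the split cannot land in the middle of a character.
	line := []byte("名前,Zürich,naïve ’quoted’")
	for _, f := range splitUnquoted(line) {
		fmt.Printf("%q valid=%v\n", f, utf8.Valid(f))
	}
}
```

Every byte of a multi-byte UTF-8 sequence has its high bit set, which is why comparing raw bytes against `','` is enough here.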

oleg578 · September 5, 2016, 12:22pm

Of course, this is not an obstacle, but not every file in the world is in English.
And even English CSV files can contain apostrophes other than the ASCII 0x27 one, for example.
What will your package do with, say, a different, non-English apostrophe?
Another UTF-8 apostrophe

encoding/csv is not perfect either: I ran into encoding problems with CSV files that Windows applications write with a BOM.
I had to parse them directly, without any external CSV packages.
I think the encoding side of CSV is a bigger problem than we think.
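
For the BOM case, one possible workaround is to drop the three-byte UTF-8 BOM before the data reaches the parser. The sketch below assumes a UTF-8 file; `skipBOM` and `readAllSkippingBOM` are hypothetical helpers, not part of encoding/csv or FastCSV. As far as I know, encoding/csv does not strip the BOM itself, so without this step it ends up in the first field.

```go
package main

import (
	"bufio"
	"bytes"
	"encoding/csv"
	"fmt"
	"io"
	"strings"
)

// utf8BOM is the UTF-8 encoding of U+FEFF.
var utf8BOM = []byte{0xEF, 0xBB, 0xBF}

// skipBOM returns a reader positioned after a leading UTF-8 BOM, if one is present.
// Hypothetical helper; UTF-16 BOMs are not handled.
func skipBOM(r io.Reader) io.Reader {
	br := bufio.NewReader(r)
	if head, err := br.Peek(len(utf8BOM)); err == nil && bytes.Equal(head, utf8BOM) {
		_, _ = br.Discard(len(utf8BOM))
	}
	return br
}

// readAllSkippingBOM parses input with encoding/csv after removing any BOM.
func readAllSkippingBOM(input string) ([][]string, error) {
	return csv.NewReader(skipBOM(strings.NewReader(input))).ReadAll()
}

func main() {
	records, err := readAllSkippingBOM("\xEF\xBB\xBFid,name\n1,Олег\n")
	if err != nil {
		panic(err)
	}
	fmt.Printf("%q\n", records)
}
```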

weberc2 · September 5, 2016, 12:28pm

There’s nothing English-specific about this parser; apostrophes aren’t a special character in CSV, so there is no concern. I’ll still need to look into BOM. Feel free to provide test cases you’re concerned about and I’ll look into them.
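
As a quick check of the apostrophe question, this sketch (my own illustration; `hasCSVSpecialByte` is a hypothetical helper, not a FastCSV function) prints the bytes of a few strings and reports whether any of them is one of the bytes a CSV parser treats as structure:

```go
package main

import "fmt"

// hasCSVSpecialByte reports whether s contains a comma, double quote,
// line feed, or carriage return byte.
func hasCSVSpecialByte(s string) bool {
	for i := 0; i < len(s); i++ {
		switch s[i] {
		case ',', '"', '\n', '\r':
			return true
		}
	}
	return false
}

func main() {
	// "\u2019" is the right single quotation mark, a common non-ASCII apostrophe.
	for _, s := range []string{"'", "\u2019", "it\u2019s", "a,b"} {
		fmt.Printf("%q bytes=% x special=%v\n", s, s, hasCSVSpecialByte(s))
	}
}
```

Neither the ASCII apostrophe nor the multi-byte one contains a structural byte, so both pass through as ordinary field data.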

oleg578 · September 7, 2016, 11:18am

I have carefully read RFC 4180 and I take my objection back.
UTF-8 encoded text can be considered part of TEXTDATA (see RFC 4180, section 2).
I also tested CSV files generated in languages other than English and got the same result: only the TEXTDATA is in the regional language.
The other symbols (such as the delimiter) are ASCII.
I wish you success.

system · December 6, 2016, 11:18am

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.