https://bitbucket.org/weberc2/fastcsv This is an alpha release, mostly intended to be a proof of concept. Not ready for real use.
Could you please explain how UTF-8 is supported?
Commas, quotes, and newline characters are all single-byte ASCII characters, and in valid UTF-8 those byte values never appear as part of a multi-byte sequence. Therefore I can iterate over bytes (much cheaper than decoding runes) and still parse UTF-8 safely.
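To illustrate the point, here is a minimal sketch of byte-wise splitting. It is not fastcsv's actual code: `splitFields` is a hypothetical helper, and it assumes a single record with no quoted fields.

```go
package main

import "fmt"

// splitFields is a hypothetical helper (not part of fastcsv). It splits one
// unquoted CSV record on commas by scanning bytes rather than runes.
// This is safe for valid UTF-8 because every byte of a multi-byte sequence
// is >= 0x80, so the comma byte 0x2C can only ever be a real comma.
func splitFields(line []byte) []string {
	var fields []string
	start := 0
	for i, b := range line {
		if b == ',' {
			fields = append(fields, string(line[start:i]))
			start = i + 1
		}
	}
	// The last field runs to the end of the record.
	return append(fields, string(line[start:]))
}

func main() {
	// The third field contains U+2019, a non-ASCII apostrophe.
	for _, f := range splitFields([]byte("naïve,日本語,O’Brien")) {
		fmt.Println(f)
	}
}
```

The multi-byte characters pass through untouched because the loop only ever compares against an ASCII byte value.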
Of course, this is not an obstacle, but not all files in the world are in English only.
And some English CSV files can contain apostrophes other than 0x27, for example.
What will your package do with, for example, a different, non-English apostrophe?
Another UTF-8 apostrophe
encoding/csv is not perfect either; I ran into encoding problems with CSV files that Windows applications write with a BOM.
I had to parse them directly, without any external CSV packages.
I think the encoding side of CSV is a bigger problem than we think.
There’s nothing English-specific about this parser; apostrophes aren’t a special character in CSV, so there is no concern. I’ll still need to look into BOM. Feel free to provide test cases you’re concerned about and I’ll look into them.
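For the BOM case, one common workaround is to drop the UTF-8 byte order mark before parsing. The sketch below is only an illustration using the standard library's encoding/csv, not something fastcsv does today; `stripBOM` and `utf8BOM` are hypothetical names, and it assumes the whole file is already in memory.

```go
package main

import (
	"bytes"
	"encoding/csv"
	"fmt"
)

// utf8BOM is the three-byte UTF-8 encoding of U+FEFF.
var utf8BOM = []byte{0xEF, 0xBB, 0xBF}

// stripBOM is a hypothetical helper: it returns data without a leading
// UTF-8 BOM, or data unchanged if no BOM is present.
func stripBOM(data []byte) []byte {
	return bytes.TrimPrefix(data, utf8BOM)
}

func main() {
	// Simulate a file saved by a Windows application with a BOM.
	raw := append([]byte{0xEF, 0xBB, 0xBF}, []byte("name,city\nZoë,Köln\n")...)

	records, err := csv.NewReader(bytes.NewReader(stripBOM(raw))).ReadAll()
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	// Without stripBOM, the first header would start with the BOM bytes.
	fmt.Printf("%q\n", records[0][0])
}
```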
I have read RFC 4180 carefully and take back what I said.
UTF-8 encoded text can be considered part of TEXTDATA (see RFC 4180, section 2).
I also tested CSV files generated in languages other than English and got the same result: only the TEXTDATA is in the regional language.
The other symbols (delimiters) are ASCII.
I wish you success