Hi, I’m building an application that needs to parse a lot of CSV data very quickly. The standard library’s encoding/csv appears to be quite slow (CPU profiling doesn’t work on my version of OS X, so I’m inferring this from some benchmarks and memory profiling), presumably because it does a lot of allocating and copying.
I was able to throw together an implementation that scans each line into a buffer and returns a [][]byte where each subslice points into the backing buffer (so no copying or allocating per field). My prototype improves my small application’s overall performance by about 50% (I haven’t yet benchmarked the two CSV implementations directly against each other).
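For what it’s worth, the core of the idea can be sketched like this (a minimal sketch with hypothetical names, not my actual code; it ignores quoting and escaping entirely):

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
	"strings"
)

// splitFields slices one line into comma-separated fields without
// copying: each returned []byte aliases the input buffer. The caller
// passes in a reusable fields slice to avoid per-line allocations.
func splitFields(line []byte, fields [][]byte) [][]byte {
	for {
		i := bytes.IndexByte(line, ',')
		if i < 0 {
			// Last field on the line.
			return append(fields, line)
		}
		fields = append(fields, line[:i])
		line = line[i+1:]
	}
}

func main() {
	sc := bufio.NewScanner(strings.NewReader("a,b,c\n1,2,3\n"))
	fields := make([][]byte, 0, 8) // reused across lines
	for sc.Scan() {
		// sc.Bytes() is only valid until the next Scan, so the
		// fields must be consumed (or copied) before then.
		fields = splitFields(sc.Bytes(), fields[:0])
		fmt.Printf("%q\n", fields)
	}
}
```

The catch with this approach is lifetime: the fields are invalidated as soon as the underlying buffer is reused, so the caller has to process each row before reading the next one.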
I was wondering if anyone else has looked into a similar solution, or if anyone knows of a third-party library that does something like this?
EDIT: I just benchmarked my prototype with a 1M CSV file against the standard library implementation; here are the results:
    BenchmarkStdCsv-4    50   39056038 ns/op   2506322 B/op   85012 allocs/op
    BenchmarkMyCsv-4    100   15135853 ns/op   3614197 B/op   10007 allocs/op