Streaming CSV Reader?

Hi, I’m building an application that needs to parse a lot of CSV data very quickly. The standard library implementation (encoding/csv) appears to be quite slow, presumably because it does a lot of allocating and copying. CPU profiling doesn’t work on my version of OS X, so I’m inferring this from some benchmarks and memory profiling.

I was able to throw together an implementation that scans each line into a buffer and returns a [][]byte in which each subslice points into that backing buffer, so individual fields are neither copied nor allocated. My prototype improves the performance of my small application by 50% (I haven’t benchmarked the two CSV implementations directly against each other yet).

I was wondering if anyone else has looked into a similar solution, or if they knew of a third party library that does something like this?

EDIT: I just benchmarked my prototype against the standard library implementation on a 1M CSV file; here are the results:

BenchmarkStdCsv-4             50          39056038 ns/op         2506322 B/op      85012 allocs/op
BenchmarkMyCsv-4             100          15135853 ns/op         3614197 B/op      10007 allocs/op
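In the results above, the prototype is roughly 2.6x faster per op with about 8.5x fewer allocations, though it allocates about 44% more bytes per op. Numbers in this shape normally come from `go test -bench . -benchmem` with `BenchmarkXxx(b *testing.B)` functions. The self-contained sketch below reproduces that measurement for encoding/csv from a plain `main` by calling `testing.Init` and `testing.Benchmark`; the synthetic data from the hypothetical `makeCSV` helper (10,000 rows of 10 numeric columns) is an assumption, not the poster's 1M file.

```go
package main

import (
	"bytes"
	"encoding/csv"
	"fmt"
	"io"
	"strconv"
	"testing"
)

// makeCSV (hypothetical) builds a synthetic CSV payload of rows x cols numbers.
func makeCSV(rows, cols int) []byte {
	var buf bytes.Buffer
	for i := 0; i < rows; i++ {
		for j := 0; j < cols; j++ {
			if j > 0 {
				buf.WriteByte(',')
			}
			buf.WriteString(strconv.Itoa(i * j))
		}
		buf.WriteByte('\n')
	}
	return buf.Bytes()
}

// benchStdCSV parses the whole payload with encoding/csv once per iteration.
func benchStdCSV(data []byte) func(*testing.B) {
	return func(b *testing.B) {
		b.ReportAllocs()              // report B/op and allocs/op like -benchmem
		b.SetBytes(int64(len(data))) // also report throughput in MB/s
		for i := 0; i < b.N; i++ {
			r := csv.NewReader(bytes.NewReader(data))
			for {
				_, err := r.Read()
				if err == io.EOF {
					break
				}
				if err != nil {
					b.Fatal(err)
				}
			}
		}
	}
}

func main() {
	testing.Init() // registers testing flags; needed outside "go test"
	res := testing.Benchmark(benchStdCSV(makeCSV(10000, 10)))
	fmt.Println(res.String(), res.MemString())
}
```

A second function wrapping a custom reader can be passed to `testing.Benchmark` the same way to compare the two on identical input.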

you are not alone in this:

