Hi, I’m building an application that needs to parse a lot of CSV data very quickly. The standard library’s encoding/csv appears to be quite slow (CPU profiling doesn’t work on my version of OS X, so I’m inferring this from some benchmarks and memory profiling), presumably because it does a lot of allocating and copying.
I was able to throw together an implementation that scans each line into a buffer and returns a [][]byte where each subslice points into the backing buffer (so no copying or allocating per field). My prototype improves my small application’s overall performance by about 50% (I haven’t yet benchmarked the two CSV implementations directly against each other).
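For what it’s worth, the core of the idea can be sketched like this (a minimal sketch with hypothetical names, not my actual code; it ignores quoting and escaping entirely):

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
	"strings"
)

// splitFields slices one line into comma-separated fields without
// copying: each returned []byte aliases the input buffer. The caller
// passes in a reusable fields slice to avoid per-line allocations.
func splitFields(line []byte, fields [][]byte) [][]byte {
	for {
		i := bytes.IndexByte(line, ',')
		if i < 0 {
			// Last field on the line.
			return append(fields, line)
		}
		fields = append(fields, line[:i])
		line = line[i+1:]
	}
}

func main() {
	sc := bufio.NewScanner(strings.NewReader("a,b,c\n1,2,3\n"))
	fields := make([][]byte, 0, 8) // reused across lines
	for sc.Scan() {
		// sc.Bytes() is only valid until the next Scan, so the
		// fields must be consumed (or copied) before then.
		fields = splitFields(sc.Bytes(), fields[:0])
		fmt.Printf("%q\n", fields)
	}
}
```

The catch with this approach is lifetime: the fields are invalidated as soon as the underlying buffer is reused, so the caller has to process each row before reading the next one.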
I was wondering if anyone else has looked into a similar solution, or if anyone knows of a third-party library that does something like this?
EDIT: I just benchmarked my prototype with a 1M CSV file against the standard library implementation; here are the results:
    BenchmarkStdCsv-4    50   39056038 ns/op   2506322 B/op   85012 allocs/op
    BenchmarkMyCsv-4    100   15135853 ns/op   3614197 B/op   10007 allocs/op