[ASK] Multithreaded input from each line

nathankerr · April 8, 2017, 7:35pm

I think you are trying to do something like https://play.golang.org/p/Xd0K78QPSG

Without knowing the details of what a job entails, it is hard to help in a specific way. Depending on the workload, the approach I provided may be performant enough, but I don’t really expect it to be very good.

I would start by writing a single-threaded version (using a smaller dataset if needed) to gain an understanding of the work that needs to happen. It also needs to be correct. You will need enough tests to show that any optimized (by multithreading or other techniques) version is also correct.

The next step is to figure out the minimal scalable unit (explanation with example) and write out the parallel version.

Without knowing more about the actual work that needs to be done, my instincts say:

- mmap the file
- assign each worker a chunk of the file, by bytes
- each worker is responsible for the first full job spec in its chunk through the last job spec that starts before its chunk ends. This is a little tricky, but I have done it before for CSV files; make sure you test this thoroughly.
- try to handle results without communicating between workers or coordinators. Communication is overhead and reduces performance.

Hope this helps.