It’s pretty much finished now and is working acceptably.
I’ve structured the finished bars by month. Some of the commercial apps partition weekly or even daily, but that seems like overkill as I’m nowhere near maxing out my memory.
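Concretely, a monthly layout can be as simple as one file per instrument per month. The Go sketch below is purely illustrative; the directory structure, file naming, and .gob extension are placeholders rather than my actual scheme:

```go
package data

import (
	"path/filepath"
	"time"
)

// barPath illustrates a one-file-per-instrument-per-month layout,
// e.g. bars/SP500/2024-03.gob. Names and extension are placeholders.
func barPath(root, instrument string, month time.Time) string {
	return filepath.Join(root, instrument, month.Format("2006-01")+".gob")
}
```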
This is an event-driven app, so I have to load the bars for every instrument in the backtest universe into a single slice and then sort them by timestamp.
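A minimal sketch of that merge-and-sort step in Go (the Bar type and its fields are illustrative, not my actual structs):

```go
package data

import (
	"sort"
	"time"
)

// Bar is an illustrative bar record; field names are placeholders.
type Bar struct {
	Instrument             string
	Timestamp              time.Time
	Open, High, Low, Close float64
	Volume                 int64
}

// mergeBars appends every instrument's bars into one slice and sorts the
// result by timestamp so the event loop can replay them in order.
func mergeBars(byInstrument map[string][]Bar) []Bar {
	var all []Bar
	for _, bars := range byInstrument {
		all = append(all, bars...)
	}
	sort.Slice(all, func(i, j int) bool {
		return all[i].Timestamp.Before(all[j].Timestamp)
	})
	return all
}
```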
The main bottleneck is this appending and sorting, but that's trivially solved by caching, so I only need to do it once. Subsequent runs are much faster.
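One way to cache the result, reusing the Bar type from the sketch above and gob for serialisation (the format here is just an example, not necessarily what I use):

```go
package data

import (
	"encoding/gob"
	"os"
)

// loadOrBuild decodes the cache file if it exists; otherwise it calls build
// (the expensive append + sort) and writes the result out for next time.
func loadOrBuild(cachePath string, build func() []Bar) ([]Bar, error) {
	if f, err := os.Open(cachePath); err == nil {
		defer f.Close()
		var bars []Bar
		if err := gob.NewDecoder(f).Decode(&bars); err != nil {
			return nil, err
		}
		return bars, nil
	}
	bars := build()
	f, err := os.Create(cachePath)
	if err != nil {
		return nil, err
	}
	defer f.Close()
	if err := gob.NewEncoder(f).Encode(bars); err != nil {
		return nil, err
	}
	return bars, nil
}
```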
Running a cached dataset is pretty fast - I’m pre-loading into a buffer in the background as I run the trading rules. Now the main constraint is disk read speed. If this takes off, I’ll invest in a big chunk of memory and run off a RAM disk.
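One way to do that background pre-loading is a goroutine that decodes the next chunk from disk while the rules are still working through the current one, handing bars over a buffered channel. This is only a sketch; the chunking-by-file and buffer depth are guesses:

```go
package data

import (
	"encoding/gob"
	"log"
	"os"
)

// preload decodes each bar file on a background goroutine and buffers the
// results in a channel, so disk reads overlap with running the trading rules.
func preload(paths []string) <-chan []Bar {
	out := make(chan []Bar, 2) // a couple of chunks of lookahead
	go func() {
		defer close(out)
		for _, p := range paths {
			f, err := os.Open(p)
			if err != nil {
				log.Printf("preload %s: %v", p, err)
				return
			}
			var bars []Bar
			err = gob.NewDecoder(f).Decode(&bars)
			f.Close()
			if err != nil {
				log.Printf("decode %s: %v", p, err)
				return
			}
			out <- bars
		}
	}()
	return out
}
```

The event loop then just ranges over the channel and works through each chunk of bars as it arrives.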