Fastest way to read a large slice from disk?

scotpip · October 20, 2021, 10:47pm

Hi - my lack of low-level knowledge is biting me a bit with this one, so I’d appreciate some pointers.

Requirement

I have a flat struct containing a number of int and float64 values.
I have a sorted slice containing around 100,000,000 of these structs
I want to cache the slices to disk for convenient re-use as they are expensive to build
This is a write once, read many operation so read speed is the priority
Files will be written and read on the same workstation - safety and portability are not an issue.

Failed attempts to date…

I have this working with glob, but it’s painfully slow. Zipping the file doesn’t help much.

There are a wide range of serialization packages, most of them rather poorly documented. Benchmarks are contradictory. Trying them is proving a painful experience and I suspect they aren’t the right solution in any case.

Can I write from memory direct to disk?

Given that I don’t need safety or portability,is there is a more direct way to write from memory to disk?

Unfortunately binary files are outside my comfort zone and googling has come up blank - I found a couple of blogs but the code doesn’t compile.

So I’d very much appreciate any pointers that would help speed up the search for the fastest solution.

PS: Done a bit more digging and realised working with the struct as a whole is becoming memory-bound. So I’m going to have to break up the files. Gob works relatively better with smaller files, but I’m still looking for the fastest possible route as this is now the key performance bottleneck in my system.

doggedOwl · October 21, 2021, 10:27am

How are you reading the File? Are you using a buffered reader to minimize kernel round-trips? Are you reading in chunks and potentially parallelize the reading starting from different offsets?

Maybe the following link will give you some pointers mainly on parallelizing the reading.
https://medium.com/swlh/processing-16gb-file-in-seconds-go-lang-3982c235dfa2

scotpip · October 21, 2021, 11:26am

Thanks for responding.

As I posted in my edit I was planning to cache a single large file for simplicity, but testing has shown that the data set is going to be too big to fit in memory.

The slice is a time series, so now the plan is to break the cache into smaller periods and read ahead as I process the sequence of records.

But that still requires working out the most efficient way to save the slice to disk and read it back into memory. I strongly suspect that using glob is not optimal.

For these small files glob is 4 times faster than github.com/vmihailenco/msgpack/v5, but if I sacrifice portability and safety, is there not a more direct way to write memory to disk?

As I said, I’m not concerned with write time, and I’m not concerned with disk space.

My sole interest is the fastest possible unmarshalling of the data.

doggedOwl · October 21, 2021, 1:08pm

You have two factors here.

Reading the actual bytes from the file. For this the previous suggestion to use a buffered reader and read in chunks still stands. You will need to profile what is the best buffer/chunk size for you system.
Decoding the bytes into the actual slice. I doubt this will be the bottleneck in Gob. Profile first and if Gob decoding is really the bottleneck than maybe consider something like GitHub - alexflint/go-memdump: Very fast, very unsafe serialization for Go. To do even better you’ll have to use unsafe package to work with raw bytes and pointers and write a custom encoder/decoder tailored to your struct, (Not really recommended)

scotpip · October 21, 2021, 1:26pm

Thanks

I suspect that a smart approach to preloading in the background as I go may be as important as the raw unmarshall speed.

I have plenty cores to play with so I’ll focus on that first and then profile as you suggest if it’s still too sluggish. That will give me a chance to dig into Go’s cool concurrency features!

christophberger · October 21, 2021, 4:58pm

A quick and maybe naïve thought: Would a database do the trick?

Your data is uniform and hence maps well to a table
Database reads are usually quick, many DB systems are optimized for speed
The database can even do the initial sorting for you
A database as simple as sqlite (or the Go port, modernc.org/sqlilte) or maybe just a plain KV store like bbolt might be sufficient

scotpip · October 22, 2021, 6:54am

Christoph

Thanks for the suggestion.

I did try this some time ago with a previous iteration of the app in a different language.

I’m dealing with financial tick data, which means tens-of-millions to billions of records.

I tried both SQL and specialised OS time-series databases but they were simply too slow. Working with binary files is orders of magnitude quicker.

This is widely discussed in the financial programming community - wrangling the data is one of the thornier issues with backtesting - and almost all personal projects and consumer-level platforms end up opting to use binary files.

Most simply roll their own, as I am doing, and some use a scientific data format like HDF5

The corporates do use very specialised databases to handle their trillions of records, but licences start north of $100k a year, so not for mere mortals…

christophberger · October 22, 2021, 10:18am

Understood. When even specialized databases are too slow, then you need a custom solution (unless you can throw faster hardware at the problem). OTOH attempting to get faster than specialized databases can be a tough task.

If you decide to go with HDF5, there seem to be a couple of Go packages available…

scotpip · October 22, 2021, 4:32pm

The advantage of HT5 is that it allows you to slice and dice your data with queries.

But a lower-tech way to do that is simply to break the files into, say, years and months per instrument and bar type. That gives you trivial access to any time range you need.

This is what all the main consumer level platforms seem to do, like MT5, JForex, Tickblaze, Zorro etc.

Pretty much every OS project I’ve found on GitHub takes the same approach. Though very few are finished - writing trading apps is non-trivial!

In general simplicity trumps complexity, I find, especially when humungous quantities of data are involved.

And not just for the data handling - my first iteration of the app in Java ground to a halt in a sea of complexity.

I’ve switched to Go for a second iteration in part because it virtually forces you to simplify your design, which is a great discipline. This time it’s turning much simpler, cleaner and faster, with just as many features and far more flexibility…

I’m done with Java-style OOP - I’m returning to separating data and code the way I did in the old days before the OOP cult got established, and starting to enjoy my coding far more. Sprinkle in a few Functional ideas and you’re running on steam…

telo_tade · October 24, 2021, 5:03am

your data is sorted. You can partition it into many files, only needed to keep in memory the filename and the range contained in that file.

If you do not need joins, or any complex queries, this could work.

scotpip · October 24, 2021, 8:56am

It’s pretty much finished now and is working acceptably.

I’ve structured the finished bars by month. Some of the commercial apps go weekly or even daily but this seems like overkill as I’m nowhere near to maxing out my memory.

This is an event-driven app so I have to load the bars for each instrument in the universe I am backtesting into a single slice and then sort by timestamp.

The main bottleneck is this appending and sorting, but that’s trivially solved by caching so I only need to do it once. Subsequent runs are much faster.

Running a cached dataset is pretty fast - I’m pre-loading into a buffer in the background as I run the trading rules. Now the main constraint is disk read-speed. If this takes off I’ll invest in a big chunk of memory and run off a RAM disk.

scotpip · November 2, 2021, 5:14pm

UPDATE - A PERFORMANT SOLUTION

For anyone else taking this path, here’s what I ended up with.

Because I’m writing and reading the data on the same machine, I’m using the unsafe package Memdump:

https://github.com/alexflint/go-memdump

Compared to gob and the safe packages I tried it’s substantially faster for reads.

But it produces humungous files which eat up space and are slow to hoist off the disk.

So I’m also using the compression package go-quicklz, a nice clean single file implementation of the QuickLZ algo. It runs much faster than the native compression and I’m seeing impressive compression ratios of around 25:1.

The package offers two compression levels 1 and 3. Level 3 offers marginally better read times, but is extremely stow for writing so I settled for level 1.

https://github.com/dgryski/go-quicklz

In theory the LZ4 algo benchmarks a little better, but I hit issues with the only native Go implementation I could find when using it with large files.

I’m using an old HDD, and the decompression time is less than the disk-reading time it saves, and I can now fit all my data on the disk.

There are probably marginally faster solutions out there, but I suspect that any gains will be exponentially harder to find and performance with this setup is better than most of the commercial backtesting apps out there, and much better than some.

dc0d · November 8, 2021, 10:49pm

It seems you have already found a working solution. However, memory-mapped files also may worth a look. They allow working with files that are larger than the available RAM. The OS is in charge of caching accessed parts and releasing the memory. Most likely it is more performant than most solutions that try to mitigate the same problem (in a safe manner at least). bbolt uses memory-mapped files and it is great for read-heavy scenarios. Also, if it is possible to batch the writes, it is good for write-heavy scenarios too.

go-memdump seems very interesting. It seems it uses the reflect package for a specific kind of serialization.

scotpip · November 22, 2021, 1:14am

Thanks for the suggestion. In the end I went with lots of small files instead - more practical and flexible.

To get going with I have 14 billion ticks of FX data, organised into slices by trading week and currency pair.

I have generators to turn those into time aggregations, tick bars, range bars and various types of renko bar and store them to an archive. The generation runs quickly enough, and it’s write once, read often.

By far the largest bottleneck is simply hoisting the generated bars off disk and into memory for backtesting.

The biggest speedup was switching the compression to a different library.

ZSDT is a fairly new algo by the LZ guy, and it’s much the fastest I’ve found for medium sized files (40 meg-ish). I’m talking 300% speedup on my previous best performer, which was a pleasant surprise.

Also, very easy to use. I’m using level 1 compression, which is the fastest and still gives good compression levels.

Although it’s a wrapper for a C library and I’m on Windoze, > go get installed it seamlessly. Yay for Golang!

Simply switching compression library gave me much more of a speedup than various cunning caching schemes, job queues and the like, though they did help a bit.

system · February 20, 2022, 1:15am

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.