LevelDB written in Go: Build from scratch?

mrjn · January 25, 2017, 7:08am

I’m Manish – author of dgraph.io. We’ve been using RocksDB via Cgo for a while now, but have memory leak issues, ugly code – I’ve started to hate the whole having to use Cgo to run this.

The alternative, BoltDB, is really slow for us. Dgraph is designed to allow high write load, while also doing high read load, and BoltDB falters in such an environment. We even batch up our writes, but the global mutex lock around all reads and writes really make the performance suck. Our RDF insert rate goes down to 10% of what we can achieve with RocksDB using Cgo. So, there’s no comparison here!

I hate bad code, and therefore I think the clean, good thing to do here would be to build a leveldb equivalent using Go from scratch, no Cgo. We tried using https://github.com/syndtr/goleveldb, but that’s even slower than BoltDB.

At this point, I think taking a month off Dgraph and just building a good, well optimized LSM tree based Key-Value store, which can do concurrent writes is the best way to achieve the performance and the clean code that we want. But before I embark on this project, I wanted to check if there’re other options that we should look into; and if there’s interest in the community for helping out with the project.

Basing it on LevelDB’s logic would be a good beginning, but eventually, this might become the equivalent of RocksDB in Go. Long term. For now, we just need Get, Put (also in batch), Iterate – all happening concurrently, fast.

Rio_M · January 25, 2017, 8:21am

Of course. There is interest. I am using BoltDB and something better will be great.

mrjn · January 25, 2017, 2:52pm

This looks like an interesting paper to follow for implementation. They propose an improvement over LSM-tree based KV store, which is optimized to use the faster random access of modern SSDs.

I think most databases today can and probably do run on SSD storage. So, this might give a great performance improvement.

Any suggestions about any other papers that’d be worth reading?

mrjn · January 26, 2017, 12:40am

Btw, not an expert on LevelDB or RocksDB. So, if anyone has any advice for me, do ping me at manish@dgraph.io. Meanwhile, I have a bunch of reading up to do.

Tamas_Gulacsi · January 27, 2017, 6:56pm

I wouldn’t start from scratch, but try to optimize/simplify syndtr’s code first.

The linked WiscKey (separate key and data) can be implemented by having a separate LevelDB for keys and values, where values could be represented as sequential uint64 values in addition order.
That’d mean values are always sorted, only deletion means some work, and keys are a lot less data to move around.
So an interesting concept!

jaten · January 29, 2017, 2:27pm

Get, Put (also in batch), Iterate – all happening concurrently, fast.

Multiple writers needs MVCC, which significantly complicates the implementation. Such code typically needs years of work to become reliable.

So most of the databases you are discussing allow multiple readers, (and possibly readers overlapping one writer), but only a single writer at a time.

It will be far easier to find and fix your memory leaks than to write a whole new database.

It’s not released as 1.0 yet, but you could experiment with CockroachDB and see what kind of write throughput you get there. They also maintain a go library packaging of RocksDB (which is used under the covers) that may avoid some of the build pain you mention.

Also, there is GitHub - tikv/tikv: Distributed transactional key-value database, originally created to complement TiDB to evaluate.

mrjn · January 29, 2017, 7:37pm

True. I don’t plan to do that. Instead, I plan to allow most reads to be executed while a write is happening, acquiring locks at a very granular level. That would give most of the performance needed from this KV store.

We have debated that for a while now, but that didn’t lead us any closer to fixing all the issues (leaks is just the latest issue, we have had many before). Also, just the sheer ugliness of calling C++ code, via C, via Cgo (with no profiling or debugging abilities beyond the Cgo wall) gets annoying to the extent where you question if you’d have been better just writing the whole thing in C++.

We chose Go over C++ for a reason. It is clean, the code is understandable, it’s minimal, goroutines are awesome. Cgo breaks goroutines and all of the other mentioned benefits. Historically it has triggered deep Cgo bugs for us (pthread something, fixed in Go1.4, then again in Go1.5), then caused file descriptor issues (all fixed or circumvented over time).

IMO it’s more trouble to keep going with a dead elephant tied to your back than to take a detour and dump the elephant somewhere. If I’m able to build something decent enough for our use, it might just be good for the whole community.

Having spent some time learning the ins of LSM and B+ trees, I think I have a good solution in mind (verifying with original LevelDB authors, if I’m lucky to get a reply) which would come close to RocksDB in performance; while being written entirely in Go – simpler design, but same concepts.

system · April 29, 2017, 7:37pm

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.