elPrep is a tool for accelerating major parts of common DNA sequencing software pipelines, with performance improvements of up to 10x compared to the de facto standard tools used in the field. Performance and memory use are extremely important for us, because we are dealing with workloads that run for up to several hours and use up to 200 GB of RAM, depending on the specific use case.
When choosing a new programming language for elPrep version 3, we investigated Go, Java, and C++ as candidates and implemented significant parts of elPrep in all three languages. The Go implementation performed best overall, offering the best balance between runtime performance and memory use. While the Java version was somewhat faster, its memory use was significantly higher than that of the Go version. The C++17 version ran significantly slower than both the Go and Java versions, primarily due to the lack of a concurrent, parallel garbage collector.
You can find the details of the experiment in this paper.
From your text: “Our analysis shows that concurrent, parallel garbage collection is better at managing a large heap of objects than reference counting in our case.” It is hard to see how a GC can perform better than reference counting. Can you elaborate on that?
This is actually discussed in detail in the discussion section of the paper (in the first two subsections, “Memory management issues” and “C++17 performance in more detail”).
The short version is that there is a phase where the reference counts of a large number of objects all drop to zero, which in turn transitively decrements other reference counts to zero, and so on. This is an inherently sequential process that causes a long pause, similar to that of a stop-the-world sequential garbage collector. A concurrent, parallel garbage collector, on the other hand, can reclaim these obsolete objects while the rest of the program keeps running, almost completely avoiding that pause.
This is not a new insight from our paper: the literature already discusses that reference counting can perform worse than garbage collection, depending on the circumstances. See, for example, “The Garbage Collection Handbook” by Jones, Hosking, and Moss.
(And yes, we are aware of C++ allocators that would, in theory, allow deallocating large numbers of objects in one go, but the paper also explains why they would not be practical in our case.)
FYI, I have refactored the packages a bit (the library is now called groot instead of rootio), but the performance gap hasn’t budged.
I’ll probably offer a reward to the first person who manages to bring the read performance of the Go library on par with (or even beyond?) that of the C++ one.
(Gathering updated examples… And €€€)
In this article, we discuss the difficulty of achieving the best performance in each language in terms of programming language constructs and standard library support. While benchmark results are easy to measure and evaluate objectively, ease of programming is much harder to assess. However, because we expect elPrep to be regularly modified and extended, it is an equally important aspect. We illustrate representative challenges in all three languages and explain why we think Go is a reasonable choice in this respect as well.
This is a more subjective paper, which is why it is categorized as an “Article Commentary”, but we believe it still offers an interesting perspective for a wider audience.