Our paper comparing Go, Java, and C++ for implementing our DNA sequencing tool elPrep has finally been published: https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-019-2903-5
elPrep is a tool for accelerating major parts of common DNA sequencing software pipelines, with performance improvement factors of up to 10x compared to de facto standard tools used in the field. Performance and memory use are extremely important for us, because our workloads run for up to several hours and use up to 200 GB of RAM, depending on the specific use case.
When deciding on a new programming language for elPrep version 3, we investigated Go, Java, and C++ as candidates and implemented significant parts of elPrep in all three languages. The Go implementation performed best, yielding the best balance between runtime performance and memory use. While the Java version was somewhat faster, its memory use was significantly higher than that of the Go version. The C++17 version ran significantly slower than both Go and Java, primarily due to the lack of a concurrent, parallel garbage collector.
You can find the details of the experiment in this paper.
elPrep itself is available at https://github.com/exascience/elprep and another paper describing elPrep at a more general level is available at https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0209523
Interesting! It is pretty wild that the Go and Java garbage collectors are so advanced that professionals can’t write faster C++ code.
Goroutines are the best for low-memory concurrency. I see the best returns using Go when doing http.Get requests to millions of links.
From your text: “Our analysis shows that concurrent, parallel garbage collection is better at managing a large heap of objects than reference counting in our case.” It seems pretty hard to understand how a GC can perform better than reference counting. Can you elaborate on that?
This is actually discussed in detail in the discussion section of the paper (first two subsections “Memory management issues” and “C++17 performance in more detail”).
The short version is that there is a phase where the reference counts of a large number of objects all drop to zero, which in turn transitively decrements other reference counts down to zero, and so on. This is an inherently sequential process that creates a large pause, similar to that of stop-the-world sequential garbage collectors. On the other hand, a concurrent, parallel garbage collector can take care of these obsolete objects while the rest of the program continues running, avoiding that large pause almost completely.
This is not a new insight of this paper; it is already discussed elsewhere in the literature that reference counting may perform worse than garbage collection, depending on circumstances. See for example “The Garbage Collection Handbook” by Jones, Hosking, and Moss.
(And yes, we are aware of C++ allocators which theoretically would allow for deallocating large numbers of objects in one go, but we also discuss in the paper why they would not be practical in our case.)
FYI, I have refactored the packages a bit (it’s now called groot instead of rootio), but the gap in performance hasn’t budged.
I’ll probably offer a reward to the first person who manages to bring the read performance of the Go library on par with (or faster than?) the C++ one.
(Gathering updated examples… And €€€)
We published a small follow-up paper to this in August: “Comparing Ease of Programming in C++, Go, and Java for Implementing a Next-Generation Sequencing Tool”
In this article, we discuss the difficulty of achieving the best performance in each language in terms of programming language constructs and standard library support. While benchmarks are easy to measure and evaluate objectively, this is less obvious for assessing ease of programming. However, because we expect elPrep to be regularly modified and extended, this is an equally important aspect. We illustrate representative examples of challenges in all three languages, and explain why we think that Go is a reasonable choice in this light as well.
This is a more subjective paper, which is why it is categorized as an “Article Commentary”, but we believe this is still an interesting perspective for a wider audience.