Performance comparison Go, Java, C++


(Pascal Costanza) #1

Our paper about comparing Go, Java, and C++ for implementing our DNA sequencing tool elPrep has finally been published at https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-019-2903-5

elPrep is a tool for accelerating major parts of common DNA sequencing software pipelines, with performance improvement factors of up to 10x compared to de-facto standard tools used in the field. Performance and memory use are extremely important for us, because we are dealing with workloads of up to several hours and memory use of up to 200 GB RAM, depending on specific use cases.

When deciding on a new programming language for elPrep version 3, we investigated Go, Java, and C++ as candidates and implemented significant parts of elPrep in all three languages. The Go implementation performed best, yielding the best balance between runtime performance and memory use. While the Java version was somewhat faster, its memory use was significantly higher than that of the Go version. The C++17 version ran significantly slower than both Go and Java, primarily due to the lack of a concurrent, parallel garbage collector.

You can find the details of the experiment in this paper.

elPrep itself is available at https://github.com/exascience/elprep and another paper describing elPrep at a more general level is available at https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0209523


(Buckler Co) #2

Interesting. It is pretty wild that the Go and Java garbage collectors are so advanced that professionals can't write faster C++ code!

Goroutines are the best for low-memory concurrency. I see the best returns from Go when making http.Get requests to millions of links.


(Ivan Matmati) #3

Hi,

From your text: “Our analysis shows that concurrent, parallel garbage collection is better at managing a large heap of objects than reference counting in our case.” It seems pretty hard to understand how a GC can perform better than reference counting. Can you elaborate on that?


(Pascal Costanza) #4

This is actually discussed in detail in the discussion section of the paper (first two subsections “Memory management issues” and “C++17 performance in more detail”).

The short version is that there is a phase where the reference counts of a large number of objects all drop to zero, which in turn transitively decrements other reference counts down to zero, and so on. This is an inherently sequential process that creates a large pause, similar to that of stop-the-world sequential garbage collectors. A concurrent, parallel garbage collector, on the other hand, can take care of these obsolete objects while the rest of the program continues running, avoiding that large pause almost completely.

This is not a new insight of this paper. It is already discussed elsewhere in the literature that reference counting may perform worse than garbage collection depending on circumstances. See for example “The Garbage Collection Handbook” by Jones, Hosking, and Moss.

(And yes, we are aware of C++ allocators which theoretically would allow for deallocating large numbers of objects in one go, but we also discuss in the paper why they would not be practical in our case.) :wink:


(Eric Lindblad) #6

From the golang-nuts discussion list on Google Groups, thread “perfs of a Go binary VS C++”.

Comment from a C++ coder not posted on that thread.

Being a factor of two off compared to C++ isn’t all that bad :slight_smile:

October 2017


(Sebastien Binet) #7

FYI, I have refactored the packages a bit (it’s now called groot instead of rootio), but the performance gap hasn’t budged.

I’ll probably offer a reward to the first person who manages to bring the read performance of the Go library on par with (or faster than?) the C++ one.
(Gathering updated examples… And €€€)