elPrep version 3.0 released

We have just released elPrep version 3.0. elPrep is a high-performance tool for preparing .sam/.bam/.cram files for variant calling in DNA sequencing pipelines. It is designed as an in-memory and multi-threaded application to fully take advantage of the processing power available with modern servers. Its software architecture is based on functional programming techniques, which allows easy composition of multiple alignment filters and optimisations such as loop merging.
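To make the idea of composing filters and merging loops concrete, here is a minimal sketch in Go. It is not elPrep's actual API: the `Alignment` type, the `Filter` signature, and the two example filters are assumptions made purely for illustration. The point is that several filters can be combined into one function and applied in a single pass over the data, rather than traversing the data once per filter.

```go
package main

import (
	"fmt"
	"strings"
)

// Alignment is a simplified, hypothetical stand-in for one SAM record.
type Alignment struct {
	QName string // read name
	Flag  uint16 // SAM bitwise flag
	RName string // reference sequence name
}

// Filter may modify an alignment and reports whether to keep it.
type Filter func(*Alignment) bool

// Compose merges several filters into one, so the data is traversed once
// instead of once per filter. It stops at the first filter that rejects.
func Compose(filters ...Filter) Filter {
	return func(a *Alignment) bool {
		for _, f := range filters {
			if !f(a) {
				return false
			}
		}
		return true
	}
}

// Apply runs a (possibly composed) filter over all alignments in one pass.
func Apply(in []*Alignment, f Filter) []*Alignment {
	out := make([]*Alignment, 0, len(in))
	for _, a := range in {
		if f(a) {
			out = append(out, a)
		}
	}
	return out
}

// DropUnmapped rejects reads whose "segment unmapped" flag bit (0x4) is set.
func DropUnmapped(a *Alignment) bool { return a.Flag&0x4 == 0 }

// AddChrPrefix adds a "chr" prefix to the reference name if it is missing.
func AddChrPrefix(a *Alignment) bool {
	if a.RName != "*" && !strings.HasPrefix(a.RName, "chr") {
		a.RName = "chr" + a.RName
	}
	return true
}

func main() {
	reads := []*Alignment{
		{QName: "r1", Flag: 0, RName: "1"},
		{QName: "r2", Flag: 4, RName: "*"},
	}
	kept := Apply(reads, Compose(DropUnmapped, AddChrPrefix))
	for _, a := range kept {
		fmt.Println(a.QName, a.RName)
	}
}
```

Because `Compose` returns an ordinary `Filter`, composed filters can themselves be composed again, which is what makes this style convenient for building pipelines.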

In our PLOS ONE paper, we showed that elPrep executes a 5-step preparation pipeline 6 to 10 times faster than Picard/SAMtools for whole-exome data, and 5 times faster for whole-genome data.

elPrep was previously developed in Common Lisp (versions 1.x and 2.x) and has now been ported to Go, which has led to some runtime performance improvements and easier handling of memory management thanks to Go's concurrent garbage collector. For multi-threading, elPrep uses the pargo library for parallel programming, which we previously announced here as well.

See https://github.com/ExaScience/elprep for the elPrep repository (open source released under a BSD license).

Sam! Bam! Cram! DNA sequencing must be a lot of fun. :slight_smile:

Joking aside, it is always good to read a success story about migrating a project to Go.
Just curious - how was your experience with doing the transition from a functional language to a mostly imperative one? I have the feeling that FP people are not always easy about giving up functional programming.

The file formats are not necessarily the most fun in DNA sequencing. :wink:

I wouldn’t characterise Common Lisp as a functional programming language. Yes, it’s one of the first languages that got lexical scoping and lambda expressions right, and was therefore for a while one of the few ones where you could do “serious” functional programming, but it also supports imperative and object-oriented programming really well, and most Common Lisp projects in the wild don’t emphasise functional programming that much. Common Lisp and Go are actually not that far away from each other (which I admit sounds very surprising), so porting from Common Lisp to Go is actually relatively straightforward.

The main reason we considered Go was that last year, a concurrent parallel garbage collector was announced for Go, and since Common Lisp implementations tend to use stop-the-world garbage collectors, this sounded like a very promising feature (which indeed turned out to be a boon). We also evaluated Java (for concurrent garbage collection) and C++ (for reference counting via shared_ptr), but they performed worse than both Common Lisp and Go.

What’s positive for us about Go: native UTF-8 support, which makes dealing with the ASCII-based file formats in sequencing much easier than in the other languages; goroutines which, although not designed for parallel programming, actually work quite well for expressing task-based parallelism; the concurrent parallel garbage collector is really good; functional programming is supported through lambda expressions; defer statements, which give us an equivalent of Common Lisp’s unwind-protect (or Java’s try-finally).
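Two of the points above, goroutines for task-based parallelism and defer as a counterpart to unwind-protect, can be sketched as follows. This is an illustrative example using only the standard library, not code from elPrep or pargo; the function names `ParallelSum` and `Protected` are made up for this sketch. Note that `Protected` additionally calls `recover`, which goes beyond what unwind-protect does; it is included only to show that the deferred cleanup runs even when the body panics.

```go
package main

import (
	"fmt"
	"sync"
)

// sumChunk adds up one chunk and stores the result in *out.
// The deferred wg.Done runs however the function exits.
func sumChunk(chunk []int, out *int, wg *sync.WaitGroup) {
	defer wg.Done()
	total := 0
	for _, v := range chunk {
		total += v
	}
	*out = total
}

// ParallelSum splits data into up to `parts` chunks, sums each chunk in its
// own goroutine, and combines the partial results after all tasks finish.
func ParallelSum(data []int, parts int) int {
	if parts < 1 {
		parts = 1
	}
	partial := make([]int, parts)
	size := (len(data) + parts - 1) / parts // chunk size, rounded up
	var wg sync.WaitGroup
	for i := 0; i < parts; i++ {
		lo, hi := i*size, (i+1)*size
		if lo > len(data) {
			lo = len(data)
		}
		if hi > len(data) {
			hi = len(data)
		}
		wg.Add(1)
		go sumChunk(data[lo:hi], &partial[i], &wg)
	}
	wg.Wait()
	total := 0
	for _, p := range partial {
		total += p
	}
	return total
}

// Protected runs body and guarantees that the cleanup step executes,
// similar in spirit to unwind-protect or try-finally.
func Protected(body func()) (cleaned bool, err error) {
	defer func() {
		cleaned = true // the cleanup step always runs
		if r := recover(); r != nil {
			err = fmt.Errorf("recovered: %v", r)
		}
	}()
	body()
	return cleaned, nil
}

func main() {
	fmt.Println(ParallelSum([]int{1, 2, 3, 4, 5, 6, 7, 8, 9, 10}, 3))
	cleaned, err := Protected(func() { panic("boom") })
	fmt.Println(cleaned, err)
}
```

Each goroutine writes to its own slot in `partial`, so no locking is needed; the `WaitGroup` is the only synchronisation point.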

What needed some adaptation: lack of (optional) dynamic scoping; error handling (but we got used to it ;); some issues with moving from list-based to slice-based containers; optional/keyword parameters. None of these are major issues, and all of them can be worked around with good solutions in Go that don’t feel “wrong” in the end.
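As an illustration of one such workaround, optional/keyword parameters are often emulated in Go with the "functional options" idiom. The sketch below shows that idiom in general; it is not necessarily the approach taken in elPrep, and `SortConfig`, `WithOrder`, and `WithThreads` are hypothetical names chosen for this example.

```go
package main

import "fmt"

// SortConfig holds settings that would be keyword parameters in Common Lisp.
type SortConfig struct {
	Order   string
	Threads int
}

// Option modifies a SortConfig; each option plays the role of one keyword.
type Option func(*SortConfig)

// WithOrder overrides the default sort order.
func WithOrder(order string) Option {
	return func(c *SortConfig) { c.Order = order }
}

// WithThreads overrides the default number of worker threads.
func WithThreads(n int) Option {
	return func(c *SortConfig) { c.Threads = n }
}

// NewSortConfig starts from defaults and applies any options the caller passes.
func NewSortConfig(opts ...Option) SortConfig {
	c := SortConfig{Order: "coordinate", Threads: 1} // default values
	for _, opt := range opts {
		opt(&c)
	}
	return c
}

func main() {
	fmt.Printf("%+v\n", NewSortConfig())
	fmt.Printf("%+v\n", NewSortConfig(WithThreads(8)))
}
```

Callers name only the settings they want to change, in any order, which comes close to the call-site readability of keyword arguments.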

What I do miss in Go is Lisp macros, and I would actually prefer adding them over generic types (which seem to be discussed a lot in the Go community). A limited macro system, like syntax-rules in R5RS Scheme, wouldn’t add too much expressive power to Go and would actually feel like an appropriate addition, in my humble opinion. (One can dream… :wink:)

We will probably continue to prototype in Common Lisp and deploy with Go…

Yeah, a hygienic macro system would be interesting to have.
I don’t know yet how I’d like it to work (nor its drawbacks or impedance mismatches with other parts of Go 2), but I think it’s an interesting avenue to explore.

Thanks for the reply, and for correcting my misconception about Common Lisp. I would never have guessed that it is so close to Go.

The Go GC has indeed been improved a lot since the start, and it is no surprise for me to see projects migrate to Go for improving performance.

What I can see from your list of positive features is that Go’s strengths lie not so much in having some sort of outstanding programming paradigm but rather in providing a sane set of pragmatic features where each one perhaps does not look dramatically impressive on its own, but taken together they really start to shine.

I absolutely like your attitude towards the things that did not work out so well. I know of discussion threads (outside this forum) where people freak out even on a single one of the issues you list. (Go’s verbose error handling, for example, is a frequent topic.)

If you are interested in Lisp-like macros for Go, then the gomacro project might be something for you. It is a Go REPL with macro support. I haven’t tried it myself yet, but the README made me so curious that I’ll try it as soon as I have some idle time.

Thanks a lot for the feedback, very much appreciated! gomacro looks interesting, I will take a look at it.
