Why does Go terminate the whole process if one goroutine panics?

scottlieb · April 14, 2022, 10:35am

According to the Go specification:

While executing a function F, an explicit call to panic or a run-time panic terminates the execution of F. Any functions deferred by F are then executed as usual. Next, any deferred functions run by F’s caller are run, and so on up to any deferred by the top-level function in the executing goroutine. At that point, the program is terminated and the error condition is reported, including the value of the argument to panic. This termination sequence is called panicking.

This means that if a goroutine panics and is not recovered, the application terminates.
Aside from the fact that an unhandled panic can be an easy bug to miss, this also puts my application at the mercy of imported packages which may start goroutines with unhandled panics. If an imported package panics in this way, my application will terminate and, as far as I am aware, there is no way to recover.
In a production environment, keeping an application alive is mission-critical, and this behavior poses a problem. My question is twofold:

What is the “philosophical” idea behind this design choice? Why did Go developers decide to omit isolation between lifecycles of goroutines?
Is there some way that I missed to keep goroutines alive in the case where a different goroutine panics?

telo_tade · April 14, 2022, 12:39pm

an unhandled panic can be an easy bug to miss

Panics are not bugs; they are something that should never happen, so we do not handle them. You want to crash your whole app and read the log.

What is meant to be handled is errors. When one of the functions in your code detects an error, it should return/handle that situation appropriately.

In a production environment, keeping an application alive is mission-critical

True, this is why you design your code so that it is unlikely to panic. You should be grateful for the panic mechanism, as it gives you precious feedback on why your code failed to run correctly.

Is there some way that I missed to keep goroutines alive in the case where a different goroutine panics?

Yes, you can recover from a panic (similarly to catching an error):

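defer func() {
    // If a panic occurred, convert it into an error.
    // err must be a named return value of the enclosing function;
    // it is left untouched when there was no panic.
    if r := recover(); r != nil {
        err = fmt.Errorf("recovered from panic: %v", r)
    }
}()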

But this is cheating; it is in your best interest to write code that will not panic. Do not use recover() to hide your bugs.
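For context, a deferred recover() like the one above only takes effect inside a function that declares a named error return, and it only intercepts panics raised on the same goroutine. A minimal sketch of the full pattern, assuming a hypothetical safeIndex helper (the name and the error message are illustrative, not from any library):

```go
package main

import (
	"fmt"
)

// safeIndex is a hypothetical helper: it returns s[i], converting an
// out-of-range panic into an ordinary error via the named return err.
func safeIndex(s []int, i int) (v int, err error) {
	defer func() {
		// recover only intercepts panics raised on this goroutine.
		if r := recover(); r != nil {
			err = fmt.Errorf("recovered from panic: %v", r)
		}
	}()
	return s[i], nil // panics if i is out of range
}

func main() {
	v, err := safeIndex([]int{1, 2, 3}, 1)
	fmt.Println(v, err)

	_, err = safeIndex([]int{1, 2, 3}, 5)
	fmt.Println(err)
}
```

Because recover() is scoped to the panicking goroutine, placing this defer in main or in the caller does nothing for a panic raised on a goroutine started elsewhere.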

scottlieb · April 17, 2022, 8:02am

Hi, thanks for your answer!
I am familiar with and often use the panic(), recover() pattern. I agree that recover() should not be used to hide bugs but, as the name suggests, to recover from extreme situations which caused a panic().
However, my problem is deeper and more related to the design philosophies of Go. Maybe an example will illustrate:
In my application, I used an external Go package to collect metrics at runtime; let’s call it metrics. metrics starts a new goroutine to handle metrics processing asynchronously. Due to human error, there was a bug in this goroutine that caused a send on a closed channel, triggering a panic. This panic was never recovered and my application crashed over a totally recoverable error in a non-critical metrics service.

Of course, many mistakes were made here. metrics should, by convention, never have started its own goroutine, and if it did, it most certainly should have recover()ed from any panics and reported them as errors. I, as a developer, should have gone over the metrics package more thoroughly, realized that there was an unrecovered goroutine, and never imported this particular package.
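What "recover()ed from any panics and reported them as errors" could look like, as a rough sketch: a hypothetical goSafe helper (not part of the standard library or of any real metrics package) that starts a goroutine with a deferred recover and hands the panic back on an error channel. The send on a closed channel stands in for the bug described above.

```go
package main

import (
	"fmt"
)

// goSafe is a hypothetical helper: it runs fn on a new goroutine and
// reports a panic as an error on the returned channel instead of
// letting it terminate the process. The channel is closed when fn ends.
func goSafe(fn func()) <-chan error {
	errc := make(chan error, 1) // buffered so the goroutine never blocks
	go func() {
		defer close(errc) // runs last: deferred calls execute in LIFO order
		defer func() {
			if r := recover(); r != nil {
				errc <- fmt.Errorf("goroutine panicked: %v", r)
			}
		}()
		fn()
	}()
	return errc
}

func main() {
	errc := goSafe(func() {
		ch := make(chan int)
		close(ch)
		ch <- 1 // send on closed channel: run-time panic
	})
	if err := <-errc; err != nil {
		fmt.Println("non-critical worker failed:", err)
	}
}
```

This only helps when the code that starts the goroutine cooperates. The caller cannot wrap a goroutine that a third-party package launches internally, because recover() has no effect on panics raised on other goroutines.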

However, the point still stands. A series of human errors caused a totally needless crash and, as a Go developer, there was nothing I could have done to preempt this. Of course, I could have “not written bugs” (which I strive to do), but we all know this isn’t possible. For this reason we write mechanisms to deal with unexpected bugs during runtime. For this reason Go functions return errors (an idiom which I love, btw).

Still, many bugs cause panics, and in my production environment staying up is everything. How can I protect my critical services from panics in non-critical ones? Why does Go force me to terminate my application?

I love Go and the design choices behind it. I feel the language is so well considered and built to encourage idiomatic, readable, resilient and maintainable code. I wonder what the consideration is in this case?
