I want to use the execution trace package to debug a deadlock by studying the collected traces. However, when a deadlock happens, the program exits without producing any trace.
Deadlocks are detected by checkdead(), which checks whether the number of running goroutines is 0 and, if so, throws a fatal error. Fatal errors (unlike panics) cannot be caught or recovered from: the runtime prints a message and then calls exit(2). So the idea of using recover() to call StopTrace() (to flush the trace buffers) when a deadlock occurs is not feasible.
So I was thinking that I could modify checkdead() to call StopTrace() before throwing the fatal error. That is, turning the code snippet below (from the body of checkdead()):
...
getg().m.throwing = -1 // do not dump full stacks
unlock(&sched.lock) // unlock so that GODEBUG=scheddetail=1 doesn't hang
throw("all goroutines are asleep - deadlock!")
...
into
...
getg().m.throwing = -1 // do not dump full stacks
unlock(&sched.lock) // unlock so that GODEBUG=scheddetail=1 doesn't hang
StopTrace() // to flush trace buffers before exit
throw("all goroutines are asleep - deadlock!")
...
However, because of the compiler directives (pragmas) that forbid write barriers in some functions, the build fails with the above change when I rebuild the runtime. For example, the function templateThread() in the runtime package calls checkdead() and carries the directive //go:nowritebarrierrec, meaning that neither this function nor any function it calls (transitively) may contain write barriers.
With all that said, I have two questions:
- Does anybody know a better approach to force checkdead() to flush the trace buffers before exiting the program?
- Is there any way to bypass the compiler directives to avoid the build failure? It might cause the runtime to misbehave, though. I am just curious.
P.S. I am not sure how well I explained my problem. Please let me know in the comments if any part is unclear.