I want to use the execution trace package (`runtime/trace`) to debug a deadlock by studying the collected traces. However, when a deadlock happens, the program crashes without producing any trace.
Deadlocks are detected by `checkdead()`, which checks whether the number of running goroutines is 0 and, if so, throws a fatal error. Unlike panics, fatal errors cannot be caught or recovered: the runtime prints an error message and then exits the process (with exit status 2). So the idea of calling `StopTrace()` (to flush the trace buffers) from a `recover()` when a deadlock occurs is not feasible.
So I was thinking that I could modify `checkdead()` to call `StopTrace()` before throwing the fatal error. That is, I would turn this snippet (from the `checkdead()` declaration):

```go
...
getg().m.throwing = -1 // do not dump full stacks
unlock(&sched.lock)    // unlock so that GODEBUG=scheddetail=1 doesn't hang
throw("all goroutines are asleep - deadlock!")
...
```

into:

```go
...
getg().m.throwing = -1 // do not dump full stacks
unlock(&sched.lock)    // unlock so that GODEBUG=scheddetail=1 doesn't hang
StopTrace()            // to flush trace buffers before exit
throw("all goroutines are asleep - deadlock!")
...
```
However, because of the compiler directives (pragmas) that forbid write barriers in some functions, the build fails when I rebuild the runtime with this change. For example, the function `templateThread()` in the `runtime` package calls `checkdead()`, and it is marked `//go:nowritebarrierrec`, meaning that neither this function nor any function it calls (transitively) may contain write barriers.
With all that said, I have two questions:
- Does anybody know a better way to force `checkdead()` to flush the trace buffers before exiting the program?
- Is there any way to bypass the compiler directives to avoid the build failure? I realize this might make the runtime misbehave; I am just curious.
P.S. I am not sure how well I explained my problem. Please let me know in the comments if any part is unclear.