Force Go deadlock detector to flush some traces out

staheri · May 29, 2020, 5:44pm

I want to use the execution tracer (the runtime/trace package) to debug a deadlock by studying the collected traces. However, when a deadlock happens, the program crashes without producing any trace.
Deadlocks are detected by checkdead(), which checks whether the number of running goroutines is 0 and throws a fatal error if so. Fatal errors, unlike panics, cannot be caught or recovered: the runtime prints a message and then exits the process with status 2. So the idea of using recover() to call StopTrace() (which flushes the trace buffers) when a deadlock occurs is not feasible.
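
To make the panic versus fatal-error distinction concrete, here is a minimal sketch of the recover()-based flush that does work for ordinary panics. The helper name traceUntilPanic and the in-memory buffer are illustrative assumptions, not part of any existing code; the deferred function runs for a panic but is never reached when the runtime throws a fatal error such as the deadlock report.

```go
package main

import (
	"bytes"
	"fmt"
	"runtime/trace"
)

// traceUntilPanic (hypothetical helper) runs work with tracing enabled.
// It returns the number of trace bytes collected and the recovered panic value.
func traceUntilPanic(work func()) (n int, recovered interface{}, err error) {
	var buf bytes.Buffer
	if err := trace.Start(&buf); err != nil {
		return 0, nil, err
	}
	defer func() {
		// recover() intercepts panics only. A runtime fatal error
		// (throw) terminates the process before any deferred call runs.
		recovered = recover()
		trace.Stop() // flushes the trace buffers into buf
		n = buf.Len()
	}()
	work()
	return
}

func main() {
	n, r, err := traceUntilPanic(func() { panic("boom") })
	fmt.Printf("recovered=%v traceBytes>0=%t err=%v\n", r, n > 0, err)
}
```

Replacing the panicking function with one that blocks forever would not reach the deferred call at all, which is the problem described above.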

So I was thinking that maybe I can modify checkdead() so that it calls StopTrace() before throwing the fatal error. That is, change the following snippet (from the body of checkdead()):

...
getg().m.throwing = -1                   // do not dump full stacks
unlock(&sched.lock)                      // unlock so that GODEBUG=scheddetail=1 doesn't hang
throw("all goroutines are asleep - deadlock!")
...

into

...
getg().m.throwing = -1                   // do not dump full stacks
unlock(&sched.lock)                      // unlock so that GODEBUG=scheddetail=1 doesn't hang
StopTrace()                              // to flush trace buffers before exit
throw("all goroutines are asleep - deadlock!")
...

However, because of the compiler directives (pragmas) that forbid write barriers in certain functions, the modified runtime fails to build. For example, the function templateThread() in the runtime package calls checkdead(), and it carries the pragma //go:nowritebarrierrec, meaning that neither this function nor any function it calls (transitively) may contain write barriers.

With all that said, I have two questions:

1. Does anybody know a better approach to force checkdead() to flush the trace buffers before the program exits?
2. Is there any way to bypass the compiler directives to avoid the build failure? It might cause the runtime to misbehave, though; I am just curious.

P.S. I am not sure how well I explained my problem. Please let me know in the comments if any part is unclear.

system · August 27, 2020, 5:44pm

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.