My test case for this issue is a simple TCP ping-pong server that uses channels for connection handoff. The wrinkle is that I’m trying to periodically close the listener from a separate goroutine, then open a new listener that rebinds to the same address. (The non-toy use case for this is implementing reload of configuration files in a long-running server.) This is implemented by:
1. Calling Listener.Close() to interrupt the Listener.Accept() call on the other goroutine
2. Using a channel to tell the other goroutine that it should call Listener.Close() itself and exit
3. Using a channel to wait for the other goroutine to complete its call to Listener.Close()
As written, the code appears to work (it runs correctly, and no races are reported under stress testing).
When step 3 above (waiting for the other goroutine to call Listener.Close()) is omitted, by running the test case with the command-line argument --waitforstop=false, the program crashes with:
listen error: listen tcp :7777: bind: address already in use
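For readers without the test case at hand, the three steps and the --waitforstop switch might look roughly like the sketch below. It is not the original program: listenerWrapper, newWrapper, Shutdown, reload and the Stop channel are hypothetical names (only HasStopped appears in the post), and the sketch signals stop before closing so that the accept goroutine can tell a deliberate shutdown from a real error.

```go
package main

import (
	"fmt"
	"net"
)

// listenerWrapper is a hypothetical stand-in for the wrapper in the post.
type listenerWrapper struct {
	ln         net.Listener
	Stop       chan struct{} // closed to tell the accept goroutine to exit (step 2)
	HasStopped chan struct{} // closed by the accept goroutine after its own Close (step 3)
}

func newWrapper(addr string) (*listenerWrapper, error) {
	ln, err := net.Listen("tcp", addr)
	if err != nil {
		return nil, err
	}
	w := &listenerWrapper{ln: ln, Stop: make(chan struct{}), HasStopped: make(chan struct{})}
	go w.acceptLoop()
	return w, nil
}

func (w *listenerWrapper) acceptLoop() {
	// Deferred calls run last-in first-out: Close() first, then the signal.
	defer close(w.HasStopped)
	defer w.ln.Close()
	for {
		conn, err := w.ln.Accept()
		if err != nil {
			select {
			case <-w.Stop: // deliberate shutdown requested
			default:
				fmt.Println("accept error:", err)
			}
			return
		}
		conn.Close() // a real server would hand conn off over a channel
	}
}

// Shutdown performs the steps from the post; wait mirrors --waitforstop.
func (w *listenerWrapper) Shutdown(wait bool) {
	close(w.Stop) // step 2: tell the goroutine to close and exit
	w.ln.Close()  // step 1: interrupt the blocked Accept
	if wait {
		<-w.HasStopped // step 3: wait for the goroutine's own Close
	}
}

// reload binds, shuts down and rebinds the same address several times.
func reload(addr string, rounds int) error {
	for i := 0; i < rounds; i++ {
		w, err := newWrapper(addr)
		if err != nil {
			return fmt.Errorf("listen error: %w", err)
		}
		w.Shutdown(true)
	}
	return nil
}

func main() {
	if err := reload("127.0.0.1:7777", 100); err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println("all rebinds succeeded")
}
```

Calling Shutdown(false) instead corresponds to omitting step 3. Whether the next bind then fails depends on when the runtime actually closes the socket.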
According to strace, the problem is that the close(2) call on the underlying socket has not executed by the time the goroutine executing main() tries to create the new listener; therefore the bind(2) call fails with EADDRINUSE. Based on my reading of the runtime’s source code, this is because the underlying file descriptor is reference-counted, and close(2) is not called until the reference count drops to 0.
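To make the suspected mechanism concrete, here is a toy model of a reference-counted descriptor. It is only an illustration of the behavior described above, not the runtime's actual code: refFD and all of its methods are invented for this sketch, and the closed flag stands in for close(2) having run.

```go
package main

import (
	"fmt"
	"sync"
)

// refFD is a toy model of a reference-counted file descriptor.
type refFD struct {
	mu      sync.Mutex
	refs    int  // one for the open descriptor plus one per in-flight operation
	closing bool // set once Close has been called
	closed  bool // stands in for close(2) having executed
}

func newRefFD() *refFD { return &refFD{refs: 1} }

// acquire is what an operation such as Accept would do before using the fd.
func (f *refFD) acquire() bool {
	f.mu.Lock()
	defer f.mu.Unlock()
	if f.closing {
		return false
	}
	f.refs++
	return true
}

// release drops one reference; whoever drops the last one does the real close.
func (f *refFD) release() {
	f.mu.Lock()
	defer f.mu.Unlock()
	f.refs--
	if f.refs == 0 {
		f.closed = true
	}
}

// Close marks the fd as closing and drops the owner's reference.
// It returns without waiting for in-flight operations to release theirs.
func (f *refFD) Close() {
	f.mu.Lock()
	already := f.closing
	f.closing = true
	f.mu.Unlock()
	if !already {
		f.release()
	}
}

func (f *refFD) isClosed() bool {
	f.mu.Lock()
	defer f.mu.Unlock()
	return f.closed
}

func main() {
	fd := newRefFD()
	fd.acquire()               // a blocked Accept holds a reference
	fd.Close()                 // the main goroutine calls Close
	fmt.Println(fd.isClosed()) // false: Accept has not released its reference yet
	fd.release()               // Accept returns with an error and releases
	fmt.Println(fd.isClosed()) // true: the last reference is gone
}
```

In this model a bind attempted between Close() and the final release() would still see the old socket open, which matches the EADDRINUSE failure seen when step 3 is skipped.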
The problem I’m having is that, based on my reading of the documentation, I’m unable either to prove that step 3 makes the code safe (i.e., that by the time <-wrapper.HasStopped returns, the underlying socket has definitely been closed) or to understand why omitting step 3 makes the code unsafe (i.e., why the reference count does not immediately go to 0 after the main() goroutine calls Listener.Close()). Is there any way to get a synchronous guarantee that the underlying socket has been closed?
Incidentally, this issue was discussed here:
but the solution there (obtaining the file descriptor with listener.(*net.TCPListener).File() and calling Close() on it) does not work with current versions of the Linux runtime, since File() produces a new fd via dup(2) and returns that instead.
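The effect of the duplicated descriptor can be checked with a small program. This is a minimal sketch assuming a Unix-like system; listenerSurvivesFileClose is a hypothetical helper name, and port 0 is used so the kernel picks a free port.

```go
package main

import (
	"fmt"
	"net"
)

// listenerSurvivesFileClose reports whether a listener still accepts
// connections after the *os.File returned by File() has been closed.
func listenerSurvivesFileClose() (bool, error) {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return false, err
	}
	defer ln.Close()

	// File() returns a copy of the underlying descriptor, not the original.
	f, err := ln.(*net.TCPListener).File()
	if err != nil {
		return false, err
	}
	if err := f.Close(); err != nil {
		return false, err
	}

	// If the listening socket were really closed, this dial would be refused.
	conn, err := net.Dial("tcp", ln.Addr().String())
	if err != nil {
		return false, nil
	}
	conn.Close()
	return true, nil
}

func main() {
	ok, err := listenerSurvivesFileClose()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("listener still accepting after File().Close():", ok)
}
```

Closing the returned file only closes the duplicate, so the original listening socket stays bound.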