My test case for this issue is a simple TCP ping-pong server that uses channels for connection handoff. The wrinkle is that I’m trying to periodically close the listener from a separate goroutine, then open a new listener that rebinds to the same address. (The non-toy use case for this is implementing reload of configuration files in a long-running server.) This is implemented by:
- Using `Listener.Close()` to interrupt the `Listener.Accept()` call on the other goroutine
- Using a channel to tell the other goroutine that it should call `Listener.Close()` itself and exit
- Using a channel to wait for the other goroutine to complete its call to `Listener.Close()`
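The three items above can be sketched roughly as follows. This is a minimal illustration and not the actual test case: the `listenerWrapper` type, its `stop` channel, and the `startListener` and `Stop` names are hypothetical; only `HasStopped` and the wait/no-wait switch come from the description. It also assumes the stop signal is sent before `Close()` is called, so the accept goroutine can tell a requested shutdown from an unexpected error.

```go
package main

import (
	"fmt"
	"net"
)

// listenerWrapper is a hypothetical stand-in for the wrapper in the test case.
type listenerWrapper struct {
	ln         net.Listener
	stop       chan struct{} // closed to tell the accept goroutine to shut down
	HasStopped chan struct{} // closed by the accept goroutine after its own Close
}

func startListener(addr string) (*listenerWrapper, error) {
	ln, err := net.Listen("tcp", addr)
	if err != nil {
		return nil, fmt.Errorf("listen error: %w", err)
	}
	w := &listenerWrapper{
		ln:         ln,
		stop:       make(chan struct{}),
		HasStopped: make(chan struct{}),
	}
	go w.acceptLoop()
	return w, nil
}

func (w *listenerWrapper) acceptLoop() {
	// Deferred, so it runs only after the Close call below has returned.
	defer close(w.HasStopped)
	for {
		conn, err := w.ln.Accept()
		if err != nil {
			select {
			case <-w.stop:
				// Shutdown was requested; nothing to report.
			default:
				fmt.Println("accept error:", err)
			}
			w.ln.Close() // second item: this goroutine calls Close itself, then exits
			return
		}
		conn.Close() // a real server would hand conn off over a channel here
	}
}

// Stop shuts the listener down. waitForStop mirrors --waitforstop.
func (w *listenerWrapper) Stop(waitForStop bool) {
	close(w.stop) // signal first (assumption, see above)
	w.ln.Close()  // first item: interrupt the blocked Accept
	if waitForStop {
		<-w.HasStopped // third item: wait for the other goroutine's Close
	}
}

func main() {
	w, err := startListener("127.0.0.1:0")
	if err != nil {
		fmt.Println(err)
		return
	}
	addr := w.ln.Addr().String()
	for i := 0; i < 3; i++ {
		w.Stop(true)
		if w, err = startListener(addr); err != nil {
			fmt.Println(err)
			return
		}
	}
	w.Stop(true)
	fmt.Println("rebound 3 times on", addr)
}
```

Passing `false` to `Stop` corresponds to the `--waitforstop=false` case; whether the rebind then fails depends on timing and on the Go version in use.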
Observed behavior:
- As written, the code appears to work (it runs correctly, and no races are reported under stress testing with `-race`)
- When step 3 above (waiting for the other goroutine to call `Listener.Close`) is omitted (by running the test case with the command-line argument `--waitforstop=false`), the program crashes with: `listen error: listen tcp :7777: bind: address already in use`.
According to `strace`, the problem is that the `close(2)` call on the underlying socket has not executed by the time the goroutine executing `main()` tries to create the new listener; therefore the `bind(2)` call fails with `EADDRINUSE`. Based on my reading of the runtime’s source code, this is because the underlying file descriptor is reference-counted, and `close(2)` is not called until the reference count drops to 0.
The problem I’m having is that based on my reading of the documentation, I’m unable to either prove that step 3 makes the code safe (i.e., that by the time `<-wrapper.HasStopped` finishes, the underlying socket has definitely been closed), or understand why the omission of step 3 makes the code unsafe (i.e., why the reference count does not immediately go to 0 after the `main()` goroutine calls `Listener.Close()`). Is there any way to get a synchronous guarantee that the underlying socket has been closed?
Incidentally, this issue was discussed here:
but the solution there (obtaining the file descriptor with `listener.(*net.TCPListener).File()` and calling `Close()` on it) does not work with current versions of the Linux runtime, since `File()` produces a new fd via `dup(2)` and returns that instead.