"bind: address already in use" even after listener closed


(Matt Holt) #1

I’m trying to close a listener then open it again, and I get this error:

bind: address already in use

With this code:

// ln is listening on :8080
err = ln.Close() // succeeds, no error
if err != nil {
    log.Fatal(err)
}
ln2, err := net.Listen("tcp", ":8080")
if err != nil {
    log.Fatal(err) // bind: address already in use
}

I was wondering if SO_REUSEADDR had something to do with this, but as far as I know, that is already being used under the hood in the Go standard library when creating a new tcp listener.

Any ideas how I can re-bind to that address without delay?


(Joe Henke) #2

Hey Matt,

I tried to reproduce this. I could not on my Mac, but I could on the Go playground.

Where did this happen for you?

Interestingly, both on my Mac and on the Go playground, if you use -addr="" or change defaultAddr to "" in the source (which I think just means it will bind to any open port, yeah?), it never rebinds to the same port; instead it binds to the previous attempt’s port + 1. Not sure if this is significant; I don’t know precisely what binding to "" is specified to do.

Joe


(Matt Holt) #3

Thanks for trying to reproduce it - next time I’ll try to provide a full code sample :smile:

I’m experiencing the Go playground’s behavior on my Mac. But when I ran your test program on my Mac, it passed. :cold_sweat:

I will look into this further in my own program. Meanwhile, I am intensely curious as to why the test fails on the Go playground…


(Matt Holt) #4

Okay, I’ve narrowed it down a little bit.

This only happens for me when my program has restarted itself using exec.Command(os.Args[0], ...) and, in that command, it sets ExtraFiles to a list of file descriptors for listeners. (Similar to this method: http://grisha.org/blog/2014/06/03/graceful-restart-in-golang/) This lets the child process (itself) use the existing listeners without downtime.

In the “restarted” process, then: I close the listeners, immediately create new ones on the same addresses again, and it fails with “address already in use”. But if I pause 5 seconds after closing the listener (before creating the new listener), it succeeds.

The original process where the listeners were created doesn’t have this problem. In other words, if I don’t “restart” the process, I can close and re-create the listeners immediately, like @jdh’s program does. But if I do the same thing after a restart, it doesn’t work.

Here’s a program that reproduces what I’m describing:

The erroring line is line 44.

Are you seeing the same thing? Any idea why?


(Joe Henke) #5

I think I’ve got it.

Check out my example shared-conn program.

Two issues.

  1. You must close the os.Files themselves; I assume they have their own file descriptor, so closing the net.Listener doesn’t close them, and if those are left open, I assume the OS leaves that port bound, so you cannot rebind. This issue exists in both the parent and child processes.
  2. There is a race condition in the handoff. Basically, if the child attempts to rebind before the parent closes its shared FD, the child will fail to rebind because the port is still bound.

My example program has flags to expose both of these failures.

Solutions

  1. Close the files.
  2. Signal the child process (e.g. via os.Process.Signal) once the parent has closed its file descriptors, and only then let the child attempt to rebind.
    • Would love to hear your findings if you try this.
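
A minimal single-process sketch of the first fix, assuming a Unix-like system. It is not the shared-conn program mentioned above, and rebindAfterClose is a hypothetical helper name. It duplicates the listener's descriptor with File(), closes the listener, and then tries to listen on the same address with the duplicate either still open or already closed.

```go
package main

import (
	"fmt"
	"net"
)

// rebindAfterClose (hypothetical helper) listens on a free loopback port,
// duplicates the listener's descriptor via File(), closes the listener and
// then tries to listen on the same address again. With closeFile == false the
// duplicate stays open during the second Listen call.
func rebindAfterClose(closeFile bool) error {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return err
	}
	addr := ln.Addr().String()

	f, err := ln.(*net.TCPListener).File()
	if err != nil {
		ln.Close()
		return err
	}
	if err := ln.Close(); err != nil {
		f.Close()
		return err
	}

	if closeFile {
		// Close the duplicate before rebinding.
		if err := f.Close(); err != nil {
			return err
		}
	} else {
		// Keep the duplicate open until after the rebind attempt.
		defer f.Close()
	}

	ln2, err := net.Listen("tcp", addr)
	if err != nil {
		return err
	}
	return ln2.Close()
}

func main() {
	fmt.Println("duplicate left open:", rebindAfterClose(false))
	fmt.Println("duplicate closed:   ", rebindAfterClose(true))
}
```

If the explanation in this thread holds on your system, the first call should report an "address already in use" error and the second should succeed.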

(Matt Holt) #6

Oh my. You’ve done it.

I had discovered (concurrently with you) that calling .File() on a *net.TCPListener returns a duplicated file descriptor. So I started playing with closing those too, but couldn’t get the combination of the placement and ordering of the .Close() calls right. Seems you have, as both parent and child pass on my machine.

This makes sense, though. Both parent and child have two file descriptors for the socket that gets transferred over, so you have to close both in each case.

Let me get over being really excited about this and work your methodology into my program… huge thank you! I couldn’t find any other explanation for this.

Update: I rewired my program and it’s working. Hours of debugging have come to an end. :sweat_smile:


(Joe Henke) #7

Awesome! You’re very welcome. I enjoyed figuring this out and am glad it was useful!


(Austin Cherry) #8

Hey @matt,

I think I found the solution to your issue. I’m definitely not a socket expert, but I was able to get your example working. Short version: the kernel keeps a TCP connection in a TIME_WAIT state while it is being closed gracefully, meaning you can’t reuse the address until that is finished. Normally in C an easy way to “bypass” this “problem” is to set the SO_REUSEADDR socket option to true, which you won’t have access to in the net package. Luckily, the net package is kind enough to set this for us by default on a TCPListener, as in your example application. Obviously, you still get the bind: address already in use error. This SO answer explains in depth why this still happens even though SO_REUSEADDR is enabled on the socket.

TL;DR: either use raw syscalls so you can enable SO_REUSEPORT on your socket, which isn’t portable to all systems, or (the better solution) bind the parent process to 0.0.0.0:1234 and the child to 192.168.0.1:1234 (or whatever the machine IP is).


(Matt Holt) #9

Hey Austin, thanks for the answer! I didn’t know much about TIME_WAIT (or CLOSE_WAIT) until I started researching this bug. I definitely spent some time with this code:

syscall.SetsockoptInt(fd, syscall.SOL_SOCKET, syscall.SO_REUSEADDR, 1)

but it was a dead end; like you said, Go already does this. So then I tried migrating as much to syscall as I could, (syscall.Close(), etc) just to see if I could bypass any Go wrapping, but this failed too so I reverted.
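
For anyone who wants to experiment with socket options anyway, here is a minimal sketch, assuming a Unix-like system and Go 1.11 or later (which added net.ListenConfig). The helper name listenReuseAddr is hypothetical. It sets SO_REUSEADDR explicitly before bind, which is redundant for TCP listeners since Go already sets it, but the same hook is one place other options could be tried.

```go
package main

import (
	"context"
	"fmt"
	"net"
	"syscall"
)

// listenReuseAddr (hypothetical helper) sets SO_REUSEADDR on the socket
// before it is bound, using the ListenConfig.Control hook.
// Unix-only: on Windows the descriptor is not an int.
func listenReuseAddr(addr string) (net.Listener, error) {
	lc := net.ListenConfig{
		Control: func(network, address string, c syscall.RawConn) error {
			var sockErr error
			err := c.Control(func(fd uintptr) {
				sockErr = syscall.SetsockoptInt(int(fd), syscall.SOL_SOCKET, syscall.SO_REUSEADDR, 1)
			})
			if err != nil {
				return err
			}
			return sockErr
		},
	}
	return lc.Listen(context.Background(), "tcp", addr)
}

func main() {
	ln, err := listenReuseAddr("127.0.0.1:0")
	if err != nil {
		fmt.Println("listen:", err)
		return
	}
	defer ln.Close()
	fmt.Println("listening on", ln.Addr())
}
```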

Dave Cheney suggested (on Slack) that I use netstat -anp tcp to check the socket state. I saw that it was actually still in LISTEN state when it should have been in CLOSE_WAIT. So I started playing with closing more files like Joe was talking about above, which did lead to the final solution.

This was not obvious. :slight_smile: Thanks again!

