Stuck while reading TCP socket

Hello everyone, I’m very unfamiliar with Go as well as lower level details of sockets. I hope you all can help me out.

I have this client-server application that I’m running on a cluster of computers. The server is running on any one system and it will connect to several processes and maintain this open connection throughout the lifetime of the application. Whenever a client connects to the server, the server reads one line from this connection. This process works smoothly until I have 130 client processes. With 131 client processes, it fails most of the time. The server gets stuck trying to read that single line. Relevant code is shown below.

server

func main() {
// Some code
	port := 8080
	host := "0.0.0.0"

	ln, err := net.Listen("tcp", fmt.Sprintf("%s:%d", host, port))
	utils.CheckError(err)
	log.Printf("Server running on port %d\n", port)
	for {
		conn, err := ln.Accept()
		utils.CheckError(err)
		handleConnection(conn)
	}
// Some other code
}

func handleConnection(c net.Conn) {
	status, err := bufio.NewReader(c).ReadString('\n') // This is where the server gets stuck!!!
	utils.CheckError(err)
// Some other code
}

client

func main() {
// Some code
	conn, err := net.Dial("tcp", os.Args[1])
	utils.CheckError(err)
	filename := os.Args[2]
// Some code
	f, err := os.Open(pdFilename)
	utils.CheckError(err)
	line, err := bufio.NewReader(f).ReadString('\n')
	utils.CheckError(err)
	fmt.Fprintf(conn, "%s", line)
// Some other code
}

As the following example shows, the server should handle each connection in its own goroutine when clients connect concurrently.

https://golang.org/src/net/example_test.go
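For reference, here is a minimal sketch of that goroutine-per-connection pattern. The helper names (serve, readStatus, demo) and the loopback setup are made up for illustration and are not from the original program; it only shows the shape of the accept loop, not a claim about what causes the hang.

```go
package main

import (
	"bufio"
	"fmt"
	"log"
	"net"
	"strings"
	"sync"
)

// serve accepts connections until ln is closed and hands each one
// to its own goroutine, so a slow client cannot block the accept loop.
func serve(ln net.Listener, handle func(net.Conn)) {
	for {
		conn, err := ln.Accept()
		if err != nil {
			return // listener closed (or a fatal accept error)
		}
		go handle(conn)
	}
}

// readStatus reads one newline-terminated line and closes the connection.
func readStatus(c net.Conn) (string, error) {
	defer c.Close()
	line, err := bufio.NewReader(c).ReadString('\n')
	return strings.TrimSpace(line), err
}

// demo (hypothetical) starts a server on an ephemeral loopback port,
// connects n clients, and returns how many lines the server received.
func demo(n int) (int, error) {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return 0, err
	}
	defer ln.Close()

	var (
		mu       sync.Mutex
		received int
		wg       sync.WaitGroup
	)
	wg.Add(n)
	go serve(ln, func(c net.Conn) {
		defer wg.Done()
		if _, err := readStatus(c); err != nil {
			log.Printf("read: %v", err)
			return
		}
		mu.Lock()
		received++
		mu.Unlock()
	})

	for i := 0; i < n; i++ {
		conn, err := net.Dial("tcp", ln.Addr().String())
		if err != nil {
			return 0, err
		}
		fmt.Fprintf(conn, "client %d ready\n", i)
		conn.Close()
	}
	wg.Wait() // every accepted connection has been handled
	return received, nil
}

func main() {
	n, err := demo(5)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("lines received:", n)
}
```

With this structure a client that never sends its line only ties up its own goroutine; the accept loop keeps serving the others.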

The example is handling individual requests in goroutines only to serve the requests faster. I don’t think handling things serially should lead to the kind of behaviour that I’m getting.

Well, in server programming nothing is truly serial, even if it looks that way. Everything is concurrent, because there is no guarantee that requests will arrive one at a time. Keep this in mind in all your future development.

Well, thanks for the pro tip. I do launch a goroutine later in the handleConnection function. I was hoping to get an idea of what might be wrong in this code. I don’t think handling requests serially is what makes this code misbehave when there are more than 130 processes. When I say serially, I don’t mean in any particular order. I know requests can arrive in any order, but I’m processing them one after another.

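Hi Samvid, maybe try adding something like this:

import (
    "os"
    "runtime/pprof"
    "time"
)
/* ... */
func main() {
    /* ... */
    go func() {
        time.Sleep(5 * time.Second)
        // debug.PrintStack() would print only this goroutine's stack;
        // the goroutine profile at debug level 2 dumps every goroutine.
        pprof.Lookup("goroutine").WriteTo(os.Stderr, 2)
    }()
    /* ... */
}

That should print the stack traces of all goroutines after five seconds, so you can see what each of them is doing.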

@samvidmistry don’t forget to close connections with conn.Close().
Also make sure that the client is actually sending data. Do you check all the errors (if any) on both the client and the server?

I can’t figure out why this happens at around 130 requests, but that is not the point. Basically, concurrent means running at the same time. So if two or more requests reach ln.Accept() at almost the same time, you must immediately hand each one off to a new goroutine to handle the result. Without those goroutines you will have the same code invoked by many requests at the same time, with unpredictable results.

This is not true. A non-concurrent TCP server is not a bad thing and should work. It has disadvantages, yes, but it does not produce “unpredictable results”.
If too many clients are waiting for .Accept() because the server cannot keep up with the load, some of their connection attempts might be refused or time out.

Check again all the errors that are returned and make the utils.CheckError(err) fail the process if there is an error. And don’t forget to close all the connections.
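A rough sketch of what that could look like, assuming a checkError helper that aborts the process (a hypothetical stand-in for utils.CheckError, whose real behaviour isn't shown in the thread). The sendLine and roundTrip names are also made up for illustration. The client checks the error from its write instead of discarding it, and both sides close their connections.

```go
package main

import (
	"bufio"
	"fmt"
	"log"
	"net"
)

// checkError is a hypothetical stand-in for utils.CheckError:
// it logs the error and exits the process with a non-zero status.
func checkError(err error) {
	if err != nil {
		log.Fatalf("fatal: %v", err)
	}
}

// sendLine dials addr, writes one line, and reports any dial,
// write, or close error instead of silently ignoring it.
func sendLine(addr, line string) error {
	conn, err := net.Dial("tcp", addr)
	if err != nil {
		return fmt.Errorf("dial: %w", err)
	}
	if _, err := fmt.Fprintf(conn, "%s\n", line); err != nil {
		conn.Close()
		return fmt.Errorf("write: %w", err)
	}
	return conn.Close()
}

// roundTrip (hypothetical) runs one serial accept/read cycle on
// loopback and returns the line the server side read.
func roundTrip(line string) (string, error) {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return "", err
	}
	defer ln.Close()

	// The kernel completes the handshake and buffers the data
	// before Accept is called, so a serial sequence works here.
	if err := sendLine(ln.Addr().String(), line); err != nil {
		return "", err
	}

	conn, err := ln.Accept()
	if err != nil {
		return "", err
	}
	defer conn.Close() // always release the server-side socket

	return bufio.NewReader(conn).ReadString('\n')
}

func main() {
	got, err := roundTrip("ready")
	checkError(err)
	fmt.Printf("server read %q\n", got)
}
```

If the client had failed before writing, sendLine would return the error and the process would exit loudly instead of leaving the server waiting.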

Thank you for your help. The bug was in the client. It was getting stuck in some other code and could not send data to the server, so the server got stuck as well. The bug was so severe that I don’t even know how the program worked for 2 processes, let alone 130. Just to reiterate, a serially accepting server should not be problematic in itself. Also, how do I close this issue?
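For anyone who hits the same symptom: one way to keep a silent client from stalling the server indefinitely is a read deadline. The sketch below is only an illustration under assumptions. The names readLineWithin and silentClientTimesOut and the timeout value are made up, and os.ErrDeadlineExceeded requires Go 1.15 or newer.

```go
package main

import (
	"bufio"
	"errors"
	"fmt"
	"log"
	"net"
	"os"
	"time"
)

// readLineWithin reads one line from c but gives up once the
// timeout has elapsed, instead of blocking forever.
func readLineWithin(c net.Conn, timeout time.Duration) (string, error) {
	if err := c.SetReadDeadline(time.Now().Add(timeout)); err != nil {
		return "", err
	}
	return bufio.NewReader(c).ReadString('\n')
}

// silentClientTimesOut (hypothetical) connects a client that never
// writes anything and reports whether the server-side read ended
// with a deadline error.
func silentClientTimesOut(timeoutMillis int) (bool, error) {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return false, err
	}
	defer ln.Close()

	// The client connects but, like the buggy client, sends nothing.
	client, err := net.Dial("tcp", ln.Addr().String())
	if err != nil {
		return false, err
	}
	defer client.Close()

	conn, err := ln.Accept()
	if err != nil {
		return false, err
	}
	defer conn.Close()

	_, err = readLineWithin(conn, time.Duration(timeoutMillis)*time.Millisecond)
	return errors.Is(err, os.ErrDeadlineExceeded), nil
}

func main() {
	timedOut, err := silentClientTimesOut(100)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("read timed out:", timedOut)
}
```

A timeout here does not fix a broken client, but it turns a silent hang into an error that names the connection at fault.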

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.