Why closing a tcp client connection causes restart of tcp server, after flush returns error?

parath · May 2, 2017, 5:47am

Hi,

I am not sure if this is the right platform to ask this question. Please help me reach to right place in that case
I have created one go application which is based on socket programming. I am making call to bufio.flush() function after writing response to tcp socket. I am closing tcp connection whenever flush or write function returns any error.
Below is my code.

    err :=  TcpUtilsStructPtr.writer.WriteByte(tByte)
		if err != nil {
            logger.Println("Could not  write to socket");
			TcpUtilsStructPtr.CloseTcpClient()
			return
        }
    }
    if TcpUtilsStructPtr.writer.Flush() != nil {
        logger.Println("Error while flushing output stream");
		TcpUtilsStructPtr.CloseTcpClient()
    }

Below is CloseTcpClient function code:

func  (TcpUtilsStructPtr	*TcpUtilsStruct)    CloseTcpClient()  {
err:= TcpUtilsStructPtr.tcpClient.Close()
if err != nil {
	logger.Println("Error closing TCP client")
	logger.Println(err)
}

TcpUtilsStructPtr.reader = nil
TcpUtilsStructPtr.writer = nil
TcpUtilsStructPtr.responseLineBuff = nil
TcpUtilsStructPtr.commandLineBuff = nil
TcpUtilsStructPtr.itemA= nil
TcpUtilsStructPtr.itemB = nil
TcpUtilsStructPtr.ResponseReceivedArray = nil
TcpUtilsStructPtr.itemC = nil
TcpUtilsStructPtr.itemD = false

}

I have seen two kinds of errors with flush function(There can be more)

broken pipe
write tcp tcpServerIP->ClientIp: write: connection timed out

In case of broken error, my application behaves as expected, only close the tcp client connection for particular request.

But in case 2) my TCP server is also restarting, that is completely unknown to me.

Could someone pls help me on this.

Advanced thanks.

christophberger · May 2, 2017, 6:53am

Why does your connection time out? I could imagine that the server restart is actually causing the timeout (rather than being a result of the timeout).

parath · May 2, 2017, 7:14am

Thanks Christophberger for quick suggestion, but when I see server logs, which I have hard codded in my application, first flush error (wtire tcp … write:connection timed out) is occurring and then after few more common operation (2-3 lines of logs), the server restart is happening.

Is it possible because of connection got terminated from client’s end? I think in that case I will get “broken pipe” error.

christophberger · May 2, 2017, 7:45am

To clarify, is the code you posted part of a TCP client connecting to an external server, or is the code actually part of the TCP server itself? I might be confused because your function is called CloseTcpClient().

Given the code you posted, the next thing I would do is write out the contents of err in the first code snippet. This should reveal why the write fails.

parath · May 2, 2017, 8:19am

Hi Christophberger

The code is part of tcp server itself, where I am closing a tcp connection, to which server supposed to write some resposne.

Given the code you posted, the next thing I would do is write out the contents of err in the first code snippet. This should reveal why the write fails.

Yes I have made changes to the first code snippet to see the error and the error in above post are the ones which I am getting. (write tcp tcpServerIP->ClientIp: write: connection timed out)
Here is the update code:

error_flush := TcpUtilsStructPtr.writer.Flush() 
    if error_flush != nil {
            logger.println("flush error" , error_flush)
	TcpUtilsStructPtr.CloseTcpClient()
}

christophberger · May 2, 2017, 9:13am

Ok, so that’s the error returned by Flush(). What about the error returned by WriteByte()? What does this one contain?

parath · May 2, 2017, 9:33am

No error with WriteByte function. but flush function make a calls to Basic write method

func (b *Writer) Flush() error {
if b.err != nil {
	return b.err
}
if b.n == 0 {
	return nil
}
n, err := b.wr.Write(b.buf[0:b.n])
if n < b.n && err == nil {
	err = io.ErrShortWrite
}
if err != nil {
	if n > 0 && n < b.n {
		copy(b.buf[0:b.n-n], b.buf[n:b.n])
	}
	b.n -= n
	b.err = err
	return err
}
b.n = 0
return nil

}

nathankerr · May 2, 2017, 9:44am

What do you mean by “my TCP server is also restarting”?

This indicates that something is killing the server (maybe itself). How are you running the server? How does it restart? Are there logs from whatever is restarting the server?

christophberger · May 2, 2017, 11:13am

Makes sense, as you said the error happens when calling Flush(). My fault.

The Write call within Flush() is not unexpected. b.Flush() calls Write on the unbuffered Writer within b to write out any pending data contained in b.buf.

But let’s perhaps approach this from a higher level. The server is your own code, so I guess you somehere have coded the restart logic. What events trigger a server restart in your code? Could these events be related to a connection timeout?

parath · May 2, 2017, 11:14am

Hi Nathankerr,

How are you running the server?
I have a go routine which is basically a startTcpServer function, where I am creating listener.

And this go-routine is in the main function, which is called only when I run my application binary generated after build.

How does it restart?

Application is restarting automatically after the above mentioned error is occurring in flush function.

Are there logs from whatever is restarting the server?

There is no particular logs where the restart call is happening. Only thing I can see is after that flush error, control is coming to the main function.

This indicates that something is killing the server (maybe itself)

Are there any exceptional cases when it can get killed? Because as I mentioned, other than other this write time out error, no flush error is causing the same problem.

christophberger · May 2, 2017, 11:23am

So the restart is not initiated by your code? I suppose you run the application as a service then, and the OS reloads the service in case of an (unexpected) exit, right?

Then the question is, why does the application exit.

If the application indeed runs as a service, redirecting stdout and stderr to a file can help catching any output from a panic (or any other form of unexpected exit).

parath · May 2, 2017, 12:07pm

No, Application is not running as a service. But OS can be cause for the restart.

Because from application, I am only closing tcpclient, when flush returns error. And only in a specific case of flush error "write tcp tcpServerIP->ClientIp: write: connection timed out " server is restarting.

nathankerr · May 2, 2017, 1:21pm

What OS are you using?

Go version?

How are you running the application binary?

How do you know the application restarts?

If you can provide the code or a small example that has the same problem, we will have an easier time figuring out what is going on.

parath · May 4, 2017, 5:44am

Hi Nathankerr
Sorry for the delay.

What OS are you using?

Amazon Linux AMI

How are you running the application binary?

go version go1.5.2 linux/amd64

How do you know the application restarts?

In console logs, I can see that binary which was created at the time of build is getting executed, which contains the code to restart tcp server.
Though we have a gocron job running which checks status of the tcp listener on that particular port, if it;s not running then it will initiate from the build process, which is not happening in the case. Because only binary is getting executed.

below is my code

The very main function

 package main
import (
	"bufio"
	"log"
	"net"
	"net/http"
	"os"
	"os/exec"
	"time"

	"github.com/go-socket.io"
	"github.com/gocron"
)

var (
	logger *log.Logger
	server *socketio.Server
	so     socketio.Socket
	buff   []string
	isconn bool
)

const (
	remoteLog     = false
	maxbufflength = 5000
)

func runServer(port string) {
	out, err := exec.Command("fuser", port+"/tcp").Output()
	if err != nil {
		logger.Println(err)
	}
	s := string(out[:])
	if len(s) == 0 {
		logger.Println("Build Started!!!")
		_, eorr := exec.Command("go", "build", "myApplication").Output()
		if eorr != nil {
			logger.Println(eorr)
		}
		logger.Println("Starting Server!!!")
		//cmd:=exec.Command("./myApplication")
		cmd := exec.Command("Streetcomm_WebServer_App.exe")
		stdout, err := cmd.StdoutPipe()
		if err != nil {
			logger.Println(err)
		}
		//create log file
		f, err := os.OpenFile("log.txt", os.O_CREATE|os.O_APPEND|os.O_WRONLY, 0600)
		if err != nil {
			logger.Println(err)
		}
		defer f.Close()
		// start the command after having set up the pipe
		if err := cmd.Start(); err != nil {
			logger.Println(err)
		}
		// read command's stdout line by line
		in := bufio.NewScanner(stdout)
		for in.Scan() {
			str := time.Now().Format(time.RFC850) + "=>"
			str += in.Text()
			if _, err = f.WriteString(str + "\n"); err != nil {
				logger.Println(err)
			}
			logger.Println("out=", str) // write each line to your log, or anything you need
			if len(buff) > maxbufflength {
				buff = buff[1:]
			}
			buff = append(buff, str)
			if isconn {
				so.Emit("chat message", str)
				so.BroadcastTo("chat", "chat message", str)
			}

		}
		if err := in.Err(); err != nil {
			logger.Println("error: %s", err)
		}
	} else {
		logger.Println("Server Running with Pid=", s)
	}

}
func startCron(port string) {
	gocron.Every(5).Seconds().Do(runServer, port)
	_, time := gocron.NextRun()
	logger.Println("CRON JOB SET AT=", time)
	<-gocron.Start()
}
func main() {
	wl, err := net.Dial("udp", "logs3.papertrailapp.com:32240")
	defer wl.Close()
	if remoteLog {
		logger = log.New(wl, "runServer: ", log.Lshortfile)
		if err != nil {
			log.Fatal("error")
		}

	} else {
		logger = log.New(os.Stdout, "runServer: ", log.Lshortfile)
	}
	port := os.Getenv("PORT")
	go startCron(port)
	server, err = socketio.NewServer(nil)
	if err != nil {
		log.Fatal(err)
	}

	server.On("connection", func(so1 socketio.Socket) {
		so = so1
		so.Join("chat")
		isconn = true
		log.Println("on connection")
		restr := ""
		for k, v := range buff {
			if k != 0 {
				restr += "##"
			}
			restr += v
			//so.BroadcastTo("chat","chat message", v)
		}
		if restr != "" {
			so.Emit("chat message", restr)
		}

	})
	server.On("error", func(so socketio.Socket, err error) {
		log.Println("error:", err)
	})
	http.Handle("/socket.io/", server)
	http.Handle("/", http.FileServer(http.Dir("./assFet")))
	log.Println("Serving at localhost:5000...")
	log.Fatal(http.ListenAndServe(":5000", nil))
}

The main function which is in the myApplication binary

func main() {
wl, err := net.Dial("udp", "logs3.papertrailapp.com:32240")
defer wl.Close()
if remoteLog {
	logger = log.New(wl, "HAVELLS_STREET_COMM: ", log.Lshortfile)
	if err != nil {
		log.Fatal("error")
	}

} else {
	logger = log.New(os.Stdout, "HAVELLS_STREET_COMM: ", log.Lshortfile)
}

port := os.Getenv("PORT")
logger.Println("new logger")

logger.Println("error")
//port="8000"
if port == "" {
	logger.Println("$PORT must be set")
}
logger.Println("Starting Application")

log.SetFlags(log.LstdFlags | log.Lshortfile)

dbController.DbUtilsInit(logger)

if dbController.DbConnected {
	defer dbController.Db.Close()
}
logger.Println("Strting TCP server")

go tcpServer.StartTcpServer(deviceConnectionChannel, logger)

Once it receives a connection, we are forming response and writting it to and the making a call to bufio.flush and that’s where the connection time out error occurred and we are closing particular tcpclient connection.

nathankerr · May 4, 2017, 8:13am

The code from “The very main function” does not allow for the possibility that the application could be restarted or that if it is not running (e.g., was killed) that it could be started without building. This means that something else is going on.

Please try running the application directly (i.e., from the command line) without “The very main function” and with remoteLog disabled. If the application still restarts, figure out what is restarting it and make it stop. I think something in your setup is interfering with your application.

parath · May 12, 2017, 5:21am

Hi nathankerr,
Sorry for the delay, because didn’t get a chance to reproduce the issue in the way you mentioned above, as it’s prod application.

But finally I was able to run the application in the same way, mentioned by you. And this time application didn’t get restarted.
So here is my conclusion:

The issue is occurring because server is trying to write on a closed tcp connection.
Restart might be cause of gocron job running in ‘very main’ function.

Below are the logs:

wrtite tcp error : serverIP -> Client IP : write: connection reset by peer

 panic: runtime error: index out of range
 goroutine 75 [running]:
 panic(0xa4b2a0, 0xc82000a0b0)
     /usr/local/go/src/runtime/panic.go:481 +0x3e6
 tcpUtils.(*TcpUtilsStruct).AddByteToOutputBuff(0xc82014eac0, 0x7e)
     /home/ec2-user/Havels/src/tcpUtils/tcpUtils.go:312 +0x94
 tcpUtils.(*TcpUtilsStruct).SendResponsePacket(0xc82014eac0, 0x3000, 0x20001900000000, 0x10a, 0x0, 0x0, 0x0, 0x0)
     /home/ec2-user/Havels/src/tcpUtils/tcpUtils.go:671 +0x30c
 sguUtils.(*SguUtilsStruct).SendWithRetry(0xc82014ea80, 0x3000, 0x20001900000000, 0x10a, 0x0, 0x0, 0x0, 0x0, 0x1)
     /home/ec2-user/Havels/src/sguUtils/sguUtils.go:1431 +0x1093
 created by itemUtils.HandleitemEvents.func1
     /home/ec2-user/Havels/src/itemUtils/itemUtils.go:1529 +0x1555

Here is the flow

itemUtils.HandleitemEvents.func1 -> calling SendWithRetry function as a goroutine

SendWithRetry making call to SendResponsePacket function

SendResponsePacket is calling AddByteToOutputBuff and the code is below:

func  (TcpUtilsStructPtr	*TcpUtilsStruct)   AddByteToOutputBuff( newByte byte){
       if (TcpUtilsStructPtr.outputBufferDipstick < MaxInOutBufferLength){
            TcpUtilsStructPtr.responseLineBuff[TcpUtilsStructPtr.outputBufferWritePtr] = newByte
			TcpUtilsStructPtr.outputBufferWritePtr++
            TcpUtilsStructPtr.outputBufferWritePtr &= (MaxInOutBufferLength-1)
            TcpUtilsStructPtr.outputBufferDipstick++
            
        } else {
            //should be spinning here till thread empties buffer.
            //TBD
            logger.Println("Warning! Output Buff is full");
        }
    }

And after that we are getting above panic, and application is getting crashed.

nathankerr · May 12, 2017, 7:46am

In general, panic crashes the program.

In this case it did so because the code tried to access an “index out of range” at /home/ec2-user/Havels/src/tcpUtils/tcpUtils.go:312. Assuming that the following is line 312:

TcpUtilsStructPtr.responseLineBuff[TcpUtilsStructPtr.outputBufferWritePtr] = newByte

This means that TcpUtilsStructPtr.outputBufferWritePtr is less that 0 or greater than len(TcpUtilsStructPtr.responseLineBuff)-1.

From looking at this code, I think this is supposed to be some sort of a circular buffer. In which case,

TcpUtilsStructPtr.outputBufferWritePtr &= (MaxInOutBufferLength-1)

should be:

TcpUtilsStructPtr.outputBufferWritePtr %= MaxInOutBufferLength

Also, instead of using MaxInOutBufferLength use len(TcpUtilsStructPtr.responseLineBuff) because the actual buffer could be changed somewhere else in the code, which is probably what happened here.

system · August 10, 2017, 7:47am

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.