Golang crashing in production (netgo)

I have a Go program which connects to a TCP server and publishes the messages to a web service. The application works fine at first, but it panics after a week or so, and I am not sure why. The stack trace isn't helpful either. (I cross-compile from macOS to Linux with env GOOS=linux GOARCH=386 go build.) Could someone give me pointers on what might be the issue?

The TCP server connection may drop once in a while; I have built the reconnection logic by hand. The same error handling applies to posting. The internet connection can break at any time, so the cgo DNS resolver which Go uses by default was panicking; I had to recompile with the netgo tag to use the pure-Go network implementation.

Could someone help me work out what is wrong in this program?

Go program:

Stack trace:

net.runtime_pollWait(0xf75c12d8, 0x72, 0x18690000)
/usr/local/go/src/runtime/netpoll.go:157 +0x55
net.(*pollDesc).Wait(0x18a6bb38, 0x72, 0x0, 0x0)
/usr/local/go/src/net/fd_poll_runtime.go:73 +0x35
net.(*pollDesc).WaitRead(0x18a6bb38, 0x0, 0x0)
/usr/local/go/src/net/fd_poll_runtime.go:78 +0x33
net.(*netFD).Read(0x18a6bb00, 0x18cd6000, 0x1000, 0x1000, 0x0, 0xf75bc018, 0x18690000)
/usr/local/go/src/net/fd_unix.go:232 +0x19a
net.(*conn).Read(0x18cd4d78, 0x18cd6000, 0x1000, 0x1000, 0x0, 0x0, 0x0)
/usr/local/go/src/net/net.go:172 +0xb9
net/http.noteEOFReader.Read(0xf75fc018, 0x18cd4d78, 0x1889770c, 0x18cd6000, 0x1000, 0x1000, 0x2c3620, 0x0, 0x0)
/usr/local/go/src/net/http/transport.go:1370 +0x55
net/http.(*noteEOFReader).Read(0x18cd29e0, 0x18cd6000, 0x1000, 0x1000, 0x8056f0c, 0x0, 0x0)
<autogenerated>:126 +0xae
bufio.(*Reader).fill(0x1881b7d0)
/usr/local/go/src/bufio/bufio.go:97 +0x172
bufio.(*Reader).Peek(0x1881b7d0, 0x1, 0x0, 0x0, 0x0, 0x0, 0x0)
/usr/local/go/src/bufio/bufio.go:132 +0xad
net/http.(*persistConn).readLoop(0x188976e0)
/usr/local/go/src/net/http/transport.go:876 +0xe3
created by net/http.(*Transport).dialConn
/usr/local/go/src/net/http/transport.go:685 +0xabc

You’re not showing the complete panic output, and the program isn’t runnable by itself, so it’s hard for anyone to reproduce.

Generally speaking, run the program compiled with -race.

just guessing…

bufio.(*Reader).fill(0x1881b7d0)
/usr/local/go/src/bufio/bufio.go:97 +0x172

go func() {
		reader := bufio.NewReaderSize(conn, 1024*1024)
		for {
			message, err := reader.ReadBytes('\n')

My assumption is that you create a new buffer every time in that goroutine, and it is possible to exceed some limits. Try to reuse the buffer, and also check the size of this buffer.
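Very roughly, something like this (a sketch only; the address, the reconnect handling, and what you do with each message are assumptions, since only a fragment of your program is visible). The point is one reader, sized well below 1 MB, whose buffer is reused across reconnects via Reset:

package main

import (
	"bufio"
	"fmt"
	"net"
)

// consume reads newline-delimited messages, reusing one reader (and its
// buffer) across reconnects instead of allocating a new 1 MB buffer each time.
func consume(addr string) error {
	conn, err := net.Dial("tcp", addr)
	if err != nil {
		return err
	}
	reader := bufio.NewReaderSize(conn, 64*1024) // much smaller than 1 MB
	for {
		message, err := reader.ReadBytes('\n')
		if err != nil {
			conn.Close()
			conn, err = net.Dial("tcp", addr)
			if err != nil {
				return err
			}
			reader.Reset(conn) // reuse the existing buffer for the new connection
			continue
		}
		fmt.Printf("read %d bytes\n", len(message))
	}
}

func main() {
	if err := consume("127.0.0.1:4000"); err != nil {
		fmt.Println(err)
	}
}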

I guess I can’t cross-compile with the -race option, right?
env GOOS=linux GOARCH=386 go build -race

Try bufio.Scanner; it’s exactly the right tool for this problem (I assume your program has crashed because you’ve used more than 2 GB of heap).

scanner := bufio.NewScanner(os.Stdin)
for scanner.Scan() {
    fmt.Println(scanner.Text()) // Println will add back the final '\n'
}
if err := scanner.Err(); err != nil {
    fmt.Fprintln(os.Stderr, "reading standard input:", err)
}

@dfc: how would this resolve the issue? (I don’t know the internals of bufio.NewReader/bufio.NewScanner.)

I was able to get the stack trace once it crashed (compiled with the -race flag), but even that is not showing me any valuable information. Pasting it here…

runtime/cgo: pthread_create failed: Resource temporarily unavailable
SIGABRT: abort
PC=0x7f4671b1a625 m=41

goroutine 0 [idle]:

goroutine 71 [running]:
runtime.systemstack_switch()
/usr/local/go/src/runtime/asm_amd64.s:245 fp=0xc820048d10 sp=0xc820048d08
runtime.gcMarkTermination()
/usr/local/go/src/runtime/mgc.go:1260 +0x393 fp=0xc820048f20 sp=0xc820048d10
runtime.gcMarkDone()
/usr/local/go/src/runtime/mgc.go:1146 +0x29e fp=0xc820048f40 sp=0xc820048f20
runtime.gcBgMarkWorker(0xc820024a00)
/usr/local/go/src/runtime/mgc.go:1480 +0x448 fp=0xc820048fb8 sp=0xc820048f40
runtime.goexit()
/usr/local/go/src/runtime/asm_amd64.s:1998 +0x1 fp=0xc820048fc0 sp=0xc820048fb8
created by runtime.gcBgMarkStartWorkers
/usr/local/go/src/runtime/mgc.go:1330 +0x92

goroutine 1 [select (no cases), 3405 minutes]:
main.main()
/home/myuser/goprog/publishmessage/publisher.go:61 +0x5ac

goroutine 17 [syscall, 3405 minutes, locked to thread]:
runtime.goexit()
/usr/local/go/src/runtime/asm_amd64.s:1998 +0x1

goroutine 19 [IO wait]:
net.runtime_pollWait(0x7f4672121758, 0x72, 0x415d89)
/usr/local/go/src/runtime/netpoll.go:160 +0x63
net.(*pollDesc).Wait(0xc820118060, 0x72, 0x0, 0x0)
/usr/local/go/src/net/fd_poll_runtime.go:73 +0x56
net.(*pollDesc).WaitRead(0xc820118060, 0x0, 0x0)
/usr/local/go/src/net/fd_poll_runtime.go:78 +0x44
net.(*netFD).Read(0xc820118000, 0xc82011c000, 0x100000, 0x100000, 0x0, 0x7f4672118028, 0xc820092150)
/usr/local/go/src/net/fd_unix.go:250 +0x27b
net.(*conn).Read(0xc8200b6020, 0xc82011c000, 0x100000, 0x100000, 0x43833f, 0x0, 0x0)
/usr/local/go/src/net/net.go:172 +0x121
net.(*TCPConn).Read(0xc8200b6020, 0xc82011c000, 0x100000, 0x100000, 0x0, 0x0, 0x0)
:68 +0x7d
bufio.(*Reader).fill(0xc82005fee0)
/usr/local/go/src/bufio/bufio.go:97 +0x365
bufio.(*Reader).ReadSlice(0xc82005fee0, 0x46390a, 0x0, 0x0, 0x0, 0x0, 0x0)
/usr/local/go/src/bufio/bufio.go:328 +0x5a9
bufio.(*Reader).ReadBytes(0xc82005fee0, 0xa, 0x0, 0x0, 0x0, 0x0, 0x0)
/usr/local/go/src/bufio/bufio.go:406 +0xc0
main.(*TcpConnector).connect.func1(0x7f4672120788, 0xc8200b6020, 0xc820094e70)
/home/myuser/goprog/publishmessage/publisher.go:120 +0x12e
created by main.(*TcpConnector).connect
/home/myuser/goprog/publishmessage/publisher.go:133 +0x35f

goroutine 20 [runnable]:
bytes.(*Buffer).WriteString(0xc82027c000, 0xc820569c7c, 0x7, 0x0, 0x0, 0x0)
/usr/local/go/src/bytes/buffer.go:141
encoding/json.(*encodeState).string(0xc82027c000, 0xc820568000, 0x2677, 0xc820568000)
/usr/local/go/src/encoding/json/encode.go:799 +0x2a2
encoding/json.stringEncoder(0xc82027c000, 0x7a7ec0, 0xc8200125c0, 0x198, 0x0)
/usr/local/go/src/encoding/json/encode.go:551 +0x4bc
encoding/json.(*structEncoder).encode(0xc820012240, 0xc82027c000, 0x8445e0, 0xc8200125a0, 0x199, 0x100)
/usr/local/go/src/encoding/json/encode.go:587 +0x366
encoding/json.(*structEncoder).(encoding/json.encode)-fm(0xc82027c000, 0x8445e0, 0xc8200125a0, 0x199, 0xc820012500)
/usr/local/go/src/encoding/json/encode.go:601 +0x5f
encoding/json.(*ptrEncoder).encode(0xc82004e030, 0xc82027c000, 0x78c800, 0xc8200125a0, 0x16, 0x0)
/usr/local/go/src/encoding/json/encode.go:709 +0x11a
encoding/json.(*ptrEncoder).(encoding/json.encode)-fm(0xc82027c000, 0x78c800, 0xc8200125a0, 0x16, 0x0)
/usr/local/go/src/encoding/json/encode.go:714 +0x5f
encoding/json.(*encodeState).reflectValue(0xc82027c000, 0x78c800, 0xc8200125a0, 0x16)
/usr/local/go/src/encoding/json/encode.go:301 +0x79
encoding/json.(*encodeState).marshal(0xc82027c000, 0x78c800, 0xc8200125a0, 0x0, 0x0)
/usr/local/go/src/encoding/json/encode.go:274 +0xb7
encoding/json.Marshal(0x78c800, 0xc8200125a0, 0x0, 0x0, 0x0, 0x0, 0x0)
/usr/local/go/src/encoding/json/encode.go:139 +0xc0
main.doPut(0xc820274d00, 0x3a, 0xc8200125a0)
/home/myuser/goprog/publishmessage/publisher.go:66 +0xa1
main.main.func1(0xc820094e70)
/home/myuser/goprog/publishmessage/publisher.go:56 +0x418
created by main.main
/home/myuser/goprog/publishmessage/publisher.go:59 +0x5a7

rax 0x0
rbx 0x7f4671e76d98
rcx 0xffffffffffffffff
rdx 0x6
rdi 0xad65
rsi 0x1642
rbp 0x94e242
rsp 0x7f45c8d7c9b8
r8 0xa
r9 0x7f45c8d7d700
r10 0x8
r11 0x206
r12 0x7f45c8d7cbe8
r13 0xc820083000
r14 0xc82033d380
r15 0x94c7c0
rip 0x7f4671b1a625
rflags 0x206
cs 0x33
fs 0x0
gs 0x0

Thank you for your report. Is this a new issue, or do you believe it to be the same issue as you reported previously?

You mentioned that you cross-compiled from your Mac to Linux with GOARCH=386, but the stack trace shows pointers that are larger than 2^31, which makes me think this report comes from a 64-bit system.

The stack trace you showed is using ReadBytes, not bufio.Scanner as discussed.

Long story short, your program is using too much memory, and possibly doing too many concurrent DNS lookups.

Can you please post the source to the program that matches this stack trace?

Thanks

The source code is available at https://play.golang.org/p/Xu7clWnXTb

  1. Yes, the error is from a 64-bit system, but the stack trace above is from a binary which is _not_ cross-compiled; it was compiled natively with the -race flag.
  2. It seems I have multiple issues at play, but I’m not able to pinpoint them.
  3. I tried to cross-compile using GOARCH=arm64, but the binary didn’t work on the 64-bit machine (I don’t know why; that’s a separate issue which I didn’t spend much time on).
  4. I will change to a Scanner and try it out once more. (The funny thing is that it runs for a week or two and then crashes; a classic memory-leaking program.)

OK, let’s start trimming the fat to discover the root cause.

Line 45: there is no point in spinning off a goroutine and then blocking. Just do the work in the main goroutine.
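In other words, something like this (connector and out are placeholders, not names taken from your program):

// Instead of:
//   go connector.connect(out)
//   select {} // park main forever
// do the work directly on the main goroutine:
connector.connect(out)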

Line 71: when an error is encountered your function should return; if you do not, you risk operating on invalid data.

Also, in the same function you are not closing the response body; you must always do this or the HTTP client will not be able to reuse connections.
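A sketch of the shape I mean for doPut (the URL, the Message type, and the request details here are assumptions based on the stack trace, not your actual code):

package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"io/ioutil"
	"net/http"
)

// Message stands in for whatever struct the real program marshals.
type Message struct {
	Text string `json:"text"`
}

func doPut(url string, msg *Message) error {
	payload, err := json.Marshal(msg)
	if err != nil {
		return err // return early rather than carrying on with invalid data
	}
	req, err := http.NewRequest("PUT", url, bytes.NewReader(payload))
	if err != nil {
		return err
	}
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()            // always close the body…
	io.Copy(ioutil.Discard, resp.Body) // …and drain it so the connection can be reused
	if resp.StatusCode >= 300 {
		return fmt.Errorf("unexpected status: %s", resp.Status)
	}
	return nil
}

func main() {
	if err := doPut("http://localhost:8080/messages", &Message{Text: "hello"}); err != nil {
		fmt.Println("put failed:", err)
	}
}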

Looking at your program, you are making too many concurrent DNS requests. When your program is not cross-compiled it will use the system DNS resolver, which consumes a thread and megabytes of memory per request. If DNS is slow to respond, your application will logjam on DNS requests.
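One way to keep that under control is to bound the number of in-flight puts with a buffered channel used as a semaphore. This is only a sketch; out, endpoint and the Message type are placeholders for whatever your program actually uses:

// Allow at most 4 HTTP puts (and therefore at most 4 DNS lookups) at a time.
sem := make(chan struct{}, 4)

for msg := range out {
	sem <- struct{}{} // acquire a slot; blocks when 4 puts are already running
	go func(m *Message) {
		defer func() { <-sem }() // release the slot
		if err := doPut(endpoint, m); err != nil {
			fmt.Println(err)
		}
	}(msg)
}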

When you cross-compile, the Go runtime uses a different DNS resolver and you will see a different usage pattern.

If at all possible, I recommend compiling natively on the target system; or, if you plan to deploy via cross-compilation, disable cgo entirely so you have a more consistent platform to develop on.
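Concretely, that means building with something along these lines (adjust GOARCH for your target):

env CGO_ENABLED=0 GOOS=linux GOARCH=amd64 go build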

  1. The current program is compiled natively.
  2. It is also compiled with the netgo tag so that it uses the pure-Go implementation of DNS lookup.

I do have defer response.Body.Close() in the case of a successful HTTP PUT. It is a fire-and-forget thing; the messages are generated at a rate of 5 messages/second, and I am fine with missing a couple in between.

response, err := client.Do(request)
if err != nil {
	fmt.Println(err)
} else {
	defer response.Body.Close()
	fmt.Println("   ", response.StatusCode)
	hdr := response.Header
	for key, value := range hdr {
		fmt.Println("   ", key, ":", value)
	}
}

I’m sorry, I missed that you had a defer on the other side of the else block.

I recommend taking a look at your reconnection logic inside TcpConnector.connect. I think you could be leaking file descriptors, and it’s not clear that you’re not accidentally losing messages taken off the out queue during reconnection.
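The shape I would aim for is something like this sketch (t.addr and the backoff are assumptions; the point is that the old connection is always closed before a new one is dialled, so descriptors are not leaked):

func (t *TcpConnector) connect(out chan<- []byte) {
	for {
		conn, err := net.Dial("tcp", t.addr)
		if err != nil {
			time.Sleep(time.Second) // back off, then retry
			continue
		}
		reader := bufio.NewReader(conn)
		for {
			msg, err := reader.ReadBytes('\n')
			if err != nil {
				conn.Close() // release the descriptor before reconnecting
				break
			}
			out <- msg
		}
	}
}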

Hmm, I was also thinking of redoing the reconnect part… Ideally speaking, if we are in the reconnecting phase it means there are no messages, as the messages are produced by the TCP server. So yes, we are missing those messages, but at the source itself and not inside the Go program.

Anyway, I really appreciate your effort in providing valuable pointers!

How would we check whether there are leaking file descriptors via go trace?

I don’t think go trace has that feature; it is a profiling tool, not a debugger. I would avoid using it until you’ve stabilised your program.
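If you only want to watch for descriptor leaks on Linux you don’t need a Go tool at all; counting the entries in /proc for the running process is enough, for example (publisher here is a placeholder for your binary’s name):

ls /proc/$(pidof publisher)/fd | wc -l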

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.