I've run into something very weird. I wrote a bunch of code centred around making many concurrent HTTP requests using net/http. The code appears to work fine on my local machine, but on cloud servers it seems to cause a memory leak that neither pprof nor runtime ever picks up.
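In case it matters, the profiling endpoint is wired up the standard net/http/pprof way, roughly like this (a minimal sketch; the real program obviously does more than block):

```go
package main

import (
	"log"
	"net/http"
	_ "net/http/pprof" // registers /debug/pprof/* on the default mux
)

func main() {
	// Run the profiling server alongside the real work so heap profiles can
	// be pulled at any time with:
	//   go tool pprof -top http://localhost:6060/debug/pprof/heap
	go func() {
		log.Println(http.ListenAndServe("localhost:6060", nil))
	}()

	select {} // stand-in for the actual request loop
}
```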
Whenever I deploy the code to the cloud (I have tried AWS, Google Cloud, and Microsoft Azure), the system's memory steadily depletes until the lack of memory crashes the code and leaves the cloud server hanging.
I have been using free -m and top (RES, not VIRT) to track the system's memory, and the same depletion does not occur on my local machines (there is also a small RSS-logging sketch after the free -m output below):
Microsoft Azure (Ubuntu):
total used free shared buff/cache available
Mem: 16041 505 15055 14 480 15228
Swap: 0 0 0
lewington@gambling:~/go/src/bitbucket.org/lewington/autoroller$ free -m
total used free shared buff/cache available
Mem: 16041 506 15054 14 481 15227
Swap: 0 0 0
lewington@gambling:~/go/src/bitbucket.org/lewington/autoroller$ free -m
total used free shared buff/cache available
Mem: 16041 507 15052 14 481 15226
Swap: 0 0 0
lewington@gambling:~/go/src/bitbucket.org/lewington/autoroller$ free -m
total used free shared buff/cache available
Mem: 16041 508 15051 14 481 15225
Swap: 0 0 0
lewington@gambling:~/go/src/bitbucket.org/lewington/autoroller$ free -m
total used free shared buff/cache available
Mem: 16041 510 15049 14 481 15223
Swap: 0 0 0
lewington@gambling:~/go/src/bitbucket.org/lewington/autoroller$ free -m
total used free shared buff/cache available
Mem: 16041 510 15049 14 481 15223
Swap: 0 0 0
lewington@gambling:~/go/src/bitbucket.org/lewington/autoroller$ free -m
total used free shared buff/cache available
Mem: 16041 510 15049 14 481 15223
Swap: 0 0 0
lewington@gambling:~/go/src/bitbucket.org/lewington/autoroller$ free -m
total used free shared buff/cache available
Mem: 16041 513 15047 14 481 15220
Swap: 0 0 0
lewington@gambling:~/go/src/bitbucket.org/lewington/autoroller$ free -m
total used free shared buff/cache available
Mem: 16041 514 15045 14 481 15219
Swap: 0 0 0
lewington@gambling:~/go/src/bitbucket.org/lewington/autoroller$ free -m
total used free shared buff/cache available
Mem: 16041 516 15043 14 481 15217
Swap: 0 0 0
Desktop (Ubuntu 16.04):
lewington@lewington-desktop:~/code/go/src/bitbucket.org/lewington/autoroller$ free -m
total used free shared buff/cache available
Mem: 16001 4386 7938 95 3677 11065
Swap: 16343 0 16343
lewington@lewington-desktop:~/code/go/src/bitbucket.org/lewington/autoroller$ free -m
total used free shared buff/cache available
Mem: 16001 4386 7938 95 3677 11065
Swap: 16343 0 16343
lewington@lewington-desktop:~/code/go/src/bitbucket.org/lewington/autoroller$ free -m
total used free shared buff/cache available
Mem: 16001 4386 7938 95 3677 11065
Swap: 16343 0 16343
lewington@lewington-desktop:~/code/go/src/bitbucket.org/lewington/autoroller$ free -m
total used free shared buff/cache available
Mem: 16001 4386 7937 95 3677 11065
Swap: 16343 0 16343
lewington@lewington-desktop:~/code/go/src/bitbucket.org/lewington/autoroller$ free -m
total used free shared buff/cache available
Mem: 16001 4387 7936 95 3677 11064
Swap: 16343 0 16343
lewington@lewington-desktop:~/code/go/src/bitbucket.org/lewington/autoroller$ free -m
total used free shared buff/cache available
Mem: 16001 4387 7936 95 3677 11064
Swap: 16343 0 16343
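Since free -m is system-wide, a small helper like the following could log the process's own RSS from /proc for a more direct comparison with what top reports (just a sketch, Linux-only; vmRSSkB is a made-up name, not something in my code):

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

// vmRSSkB reads the VmRSS line from /proc/self/status (Linux only) and
// returns the resident set size in kilobytes, so it can be logged alongside
// runtime.MemStats and compared against RES in top.
func vmRSSkB() (int64, error) {
	f, err := os.Open("/proc/self/status")
	if err != nil {
		return 0, err
	}
	defer f.Close()

	scanner := bufio.NewScanner(f)
	for scanner.Scan() {
		line := scanner.Text()
		if strings.HasPrefix(line, "VmRSS:") {
			// The line looks like: "VmRSS:     123456 kB"
			var kb int64
			if _, err := fmt.Sscanf(strings.TrimPrefix(line, "VmRSS:"), "%d", &kb); err != nil {
				return 0, err
			}
			return kb, nil
		}
	}
	return 0, fmt.Errorf("VmRSS not found")
}

func main() {
	kb, err := vmRSSkB()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("RSS = %d kB\n", kb)
}
```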
pprof and runtime don't seem able to pick up the memory leak. go tool pprof -top http://localhost:6060/debug/pprof/heap shows a steady ~2 MB of in-use heap even while the system is crashing around it:
$ go tool pprof -top http://localhost:6060/debug/pprof/heap
Fetching profile over HTTP from http://localhost:6060/debug/pprof/heap
Saved profile in /home/lewington/pprof/pprof.main.alloc_objects.alloc_space.inuse_objects.inuse_space.063.pb.gz
File: main
Build ID: 5cb3aeea203dbef664dde0fb5ed13f564d3476e3
Type: inuse_space
Time: Jan 11, 2019 at 7:01am (UTC)
Showing nodes accounting for 2611.12kB, 100% of 2611.12kB total
flat flat% sum% cum cum%
1537.31kB 58.88% 58.88% 2049.62kB 78.50% bitbucket.org/lewington/autoroller/brain/braindata.newChain
561.50kB 21.50% 80.38% 561.50kB 21.50% html.init
512.31kB 19.62% 100% 512.31kB 19.62% bitbucket.org/lewington/autoroller/brain/scale.newPoints
0 0% 100% 2049.62kB 78.50% bitbucket.org/lewington/autoroller/brain/braindata.(*contestant).addEmptyFluc
0 0% 100% 2049.62kB 78.50% bitbucket.org/lewington/autoroller/brain/braindata.(*contestantBag).AddEmptyFluc
0 0% 100% 2049.62kB 78.50% bitbucket.org/lewington/autoroller/brain/braindb.(*Lib).DetailedRaceOdds
0 0% 100% 2049.62kB 78.50%
And logging runtime.MemStats as the code executes also gives constant output:
time="2019-01-11T08:56:26Z" level=info msg="****** MEMORY USAGE *******"
time="2019-01-11T08:56:26Z" level=info msg="Alloc = 9 MiB"
time="2019-01-11T08:56:26Z" level=info msg="HeapIdle = 5 MiB"
time="2019-01-11T08:56:26Z" level=info msg="HeapInuse = 11 MiB"
time="2019-01-11T08:56:26Z" level=info msg="HeapReleased = 0 MiB"
time="2019-01-11T08:56:26Z" level=info msg="Malloc = 1\n"
time="2019-01-11T08:56:26Z" level=info msg="Frees = 1\n"
time="2019-01-11T08:56:26Z" level=info msg="TotalAlloc = 214 MiB"
time="2019-01-11T08:56:26Z" level=info msg="Sys = 23 MiB"
time="2019-01-11T08:56:26Z" level=info msg="StackInUse = 3 MiB"
time="2019-01-11T08:56:36Z" level=info msg="****** MEMORY USAGE *******"
time="2019-01-11T08:56:36Z" level=info msg="Alloc = 11 MiB"
time="2019-01-11T08:56:36Z" level=info msg="HeapIdle = 2 MiB"
time="2019-01-11T08:56:36Z" level=info msg="HeapInuse = 13 MiB"
time="2019-01-11T08:56:36Z" level=info msg="HeapReleased = 0 MiB"
time="2019-01-11T08:56:36Z" level=info msg="Malloc = 2\n"
time="2019-01-11T08:56:36Z" level=info msg="Frees = 2\n"
time="2019-01-11T08:56:36Z" level=info msg="TotalAlloc = 312 MiB"
time="2019-01-11T08:56:36Z" level=info msg="Sys = 23 MiB"
time="2019-01-11T08:56:36Z" level=info msg="StackInUse = 3 MiB"
time="2019-01-11T08:56:46Z" level=info msg="****** MEMORY USAGE *******"
time="2019-01-11T08:56:46Z" level=info msg="Alloc = 8 MiB"
time="2019-01-11T08:56:46Z" level=info msg="HeapIdle = 6 MiB"
time="2019-01-11T08:56:46Z" level=info msg="HeapInuse = 11 MiB"
time="2019-01-11T08:56:46Z" level=info msg="HeapReleased = 0 MiB"
time="2019-01-11T08:56:46Z" level=info msg="Malloc = 3\n"
time="2019-01-11T08:56:46Z" level=info msg="Frees = 3\n"
time="2019-01-11T08:56:46Z" level=info msg="TotalAlloc = 379 MiB"
time="2019-01-11T08:56:46Z" level=info msg="Sys = 24 MiB"
time="2019-01-11T08:56:46Z" level=info msg="StackInUse = 3 MiB"
One more helpful titbit comes from go tool pprof -top -alloc_space http://localhost:6060/debug/pprof/heap (which tracks cumulative allocation, i.e. it never goes down). According to this profile, the biggest allocator overall on the virtual servers is compress/flate.NewReader, but when I run the same command locally that function doesn't show up at all, and most bytes are attributed to compress/flate.NewWriter instead (there's a quick test sketch after the two profiles below).
From the Azure VM:
go tool pprof -top -alloc_space http://localhost:6060/debug/pprof/heap
Fetching profile over HTTP from http://localhost:6060/debug/pprof/heap
Saved profile in /home/lewington/pprof/pprof.main.alloc_objects.alloc_space.inuse_objects.inuse_space.065.pb.gz
File: main
Build ID: 5cb3aeea203dbef664dde0fb5ed13f564d3476e3
Type: alloc_space
Time: Jan 11, 2019 at 7:02am (UTC)
Showing nodes accounting for 2717.45MB, 92.24% of 2946.06MB total
Dropped 186 nodes (cum <= 14.73MB)
flat flat% sum% cum cum%
1118.27MB 37.96% 37.96% 1118.27MB 37.96% compress/flate.NewReader
368.09MB 12.49% 50.45% 368.09MB 12.49% bytes.makeSlice
115.95MB 3.94% 54.39% 1234.22MB 41.89% compress/gzip.(*Reader).Reset
111.88MB 3.80% 58.19% 130.88MB 4.44% net/http.(*Transport).dialConn
104.52MB 3.55% 61.73% 104.52MB 3.55% bitbucket.org/lewington/autoroller/realtime/odds.(*Odds).Odd
104.02MB 3.53% 65.26% 276.05MB 9.37% fmt.Sprintf
86MB 2.92% 68.18% 86MB 2.92% reflect.unsafe_New
75.03MB 2.55% 70.73% 75.03MB 2.55% strconv.appendEscapedRune
60.51MB 2.05% 72.78% 60.51MB 2.05% compress/flate.(*huffmanDecoder).init
58.51MB 1.99% 74.77% 58.51MB 1.99% bitbucket.org/lewington/autoroller/brain/braindata.(*contestant).autorollDegrees
53.51MB 1.82% 76.59% 54.01MB 1.83% fmt.Sprint
49MB 1.66% 78.25% 49MB 1.66% bitbucket.org/lewington/autoroller/realtime.(*interpreter).saveFlucForProvider
48.59MB 1.65% 79.90% 50.09MB 1.70% net.(*dnsMsg).Pack
42.01MB 1.43% 81.33% 42.01MB 1.43% strings.genSplit
42MB 1.43% 82.75% 205.04MB 6.96% github.com/sirupsen/logrus.(*TextFormatter).Format
35.02MB 1.19% 83.94% 35.02MB 1.19% net/textproto.(*Reader).ReadMIMEHeader
23.50MB 0.8% 84.74% 228.54MB 7.76% github.com/sirupsen/logrus.Entry.log
22MB 0.75% 85.48% 22MB 0.75% bitbucket.org/lewington/autoroller/realtime.(*cleaner).intCleaned
Local:
$ go tool pprof -top -alloc_space http://localhost:6060/debug/pprof/heap
Fetching profile over HTTP from http://localhost:6060/debug/pprof/heap
Saved profile in /home/lewington/pprof/pprof.main.alloc_objects.alloc_space.inuse_objects.inuse_space.058.pb.gz
File: main
Build ID: 6996cbd6c46216e87717e5f3b483a90c9021d6b2
Type: alloc_space
Time: Jan 11, 2019 at 8:34pm (AEDT)
Showing nodes accounting for 5868.78kB, 100% of 5868.78kB total
flat flat% sum% cum cum%
3610.34kB 61.52% 61.52% 4843.97kB 82.54% compress/flate.NewWriter
1233.63kB 21.02% 82.54% 1233.63kB 21.02% compress/flate.(*compressor).init
512.75kB 8.74% 91.27% 512.75kB 8.74% bytes.makeSlice
512.05kB 8.73% 100% 512.05kB 8.73% net/http.(*conn).readRequest
0 0% 100% 512.75kB 8.74% bytes.(*Buffer).ReadFrom
0 0% 100% 512.75kB 8.74% bytes.(*Buffer).grow
0 0% 100% 4843.97kB 82.54% compress/gzip.(*Writer).Write
0 0% 100% 512.75kB 8.74% io/ioutil.ReadFile
0 0% 100% 512.75kB 8.74% io/ioutil.readAll
0 0% 100% 5356.72kB 91.27% net/http.(*ServeMux).ServeHTTP
0 0% 100% 5868.78kB 100% net/http.(*conn).serve
0 0% 100% 5356.72kB 91.27% net/http.HandlerFunc.ServeHTTP
0 0% 100% 5356.72kB 91.27% net/http.serverHandler.ServeHTTP
0 0% 100% 5356.72kB 91.27% net/http/pprof.Index
0 0% 100% 5356.72kB 91.27% net/http/pprof.handler.ServeHTTP
0 0% 100% 5356.72kB 91.27% runtime/pprof.(*Profile).WriteTo
0 0% 100% 4843.97kB 82.54% runtime/pprof.(*profileBuilder).build
0 0% 100% 512.75kB 8.74% runtime/pprof.(*profileBuilder).readMapping
0 0% 100% 512.75kB 8.74% runtime/pprof.newProfileBuilder
0 0% 100% 5356.72kB 91.27% runtime/pprof.writeHeap
0 0% 100% 5356.72kB 91.27% runtime/pprof.writeHeapProto
I did this a bunch of times and those ratios are representative.
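My best guess for where those compress/flate.NewReader bytes come from on the VMs is the transport transparently gunzipping response bodies (the compress/gzip.(*Reader).Reset frame in the profile points that way). One quick check would be to send the requests through a client with transparent compression disabled and see whether those allocations vanish from -alloc_space. A minimal sketch (plainClient, fetch and the URL are placeholders, not my real code):

```go
package main

import (
	"io"
	"io/ioutil"
	"net/http"
	"time"
)

// A client with transparent gzip turned off. If the flate.NewReader
// allocations in the VM profile come from the transport gunzipping
// responses, they should disappear when requests go through this client.
var plainClient = &http.Client{
	Timeout: 30 * time.Second,
	Transport: &http.Transport{
		DisableCompression: true,
	},
}

func fetch(url string) error {
	resp, err := plainClient.Get(url)
	if err != nil {
		return err
	}
	// Drain and close the body so the underlying connection can be reused.
	defer resp.Body.Close()
	_, err = io.Copy(ioutil.Discard, resp.Body)
	return err
}

func main() {
	if err := fetch("https://example.com"); err != nil {
		panic(err)
	}
}
```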
Finally, I am sure that the process in question is the source of the leak, rather than some other rogue process, because top told me as much.
Side note: the thread count for the process in question stays a very steady 19-22 out of ~210 total.
To my inexperienced eye it looks a bit like the Go GC might not be releasing virtual memory back to the OS properly, maybe because of some strange way net/http interacts with how networking on big virtual servers is set up. It's probably more likely that my code is doing something ghastly. I was able to find one genuinely similar issue online, but the discussion stops infuriatingly short: https://github.com/golang/go/issues/15590.
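If the GC really were holding on to memory, I'd expect forcing a release to bring RES back down, so one experiment would be to call debug.FreeOSMemory() on a timer and watch free -m (a sketch of the experiment only, not something I'd leave in production):

```go
package main

import (
	"runtime/debug"
	"time"
)

func main() {
	// Experiment only: force a GC and return as much memory as possible to
	// the OS every minute, then watch free -m / RES in top. If RSS drops,
	// the memory is being held by the Go runtime; if it keeps climbing, the
	// growth is coming from somewhere the heap profile can't see.
	go func() {
		for range time.Tick(time.Minute) {
			debug.FreeOSMemory()
		}
	}()

	select {} // stand-in for the real work
}
```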
I can't include all the code, but my next comment contains the net/http portion of it. What I would like to know is what kinds of issues might cause this sort of memory leak, and/or how I might go about debugging it. I have used the wonderful pprof in the past to find out where in my code excessive memory usage is coming from, but in this case neither pprof nor runtime is registering any excessive usage. What tools are out there to help me track down the source of this issue?
Thanks very much guys, sorry for the huge read, but it all seemed pertinent. I'm kind of at my wits' end right now.