Virtual memory fills up while allocated memory stays the same

I've encountered something very weird. I wrote a bunch of code centred around making many concurrent HTTP requests using net/http. It appears to work fine on my local machine, but on cloud servers it seems to cause a memory leak that pprof and runtime never pick up.

Whenever I deploy the code to the cloud (I have tried AWS, Google Cloud and Microsoft Azure), the system's memory steadily depletes until the lack of memory causes the code to crash and the cloud server to hang.

I have been using free -m and top (RES, not VIRT) to track the system's memory depletion; the same depletion does not occur on my local machines:

Microsoft Azure (Ubuntu):

              total        used        free      shared  buff/cache   available
Mem:          16041         505       15055          14         480       15228
Swap:             0           0           0
lewington@gambling:~/go/src/bitbucket.org/lewington/autoroller$ free -m
              total        used        free      shared  buff/cache   available
Mem:          16041         506       15054          14         481       15227
Swap:             0           0           0
lewington@gambling:~/go/src/bitbucket.org/lewington/autoroller$ free -m
              total        used        free      shared  buff/cache   available
Mem:          16041         507       15052          14         481       15226
Swap:             0           0           0
lewington@gambling:~/go/src/bitbucket.org/lewington/autoroller$ free -m
              total        used        free      shared  buff/cache   available
Mem:          16041         508       15051          14         481       15225
Swap:             0           0           0
lewington@gambling:~/go/src/bitbucket.org/lewington/autoroller$ free -m
              total        used        free      shared  buff/cache   available
Mem:          16041         510       15049          14         481       15223
Swap:             0           0           0
lewington@gambling:~/go/src/bitbucket.org/lewington/autoroller$ free -m
              total        used        free      shared  buff/cache   available
Mem:          16041         510       15049          14         481       15223
Swap:             0           0           0
lewington@gambling:~/go/src/bitbucket.org/lewington/autoroller$ free -m
              total        used        free      shared  buff/cache   available
Mem:          16041         510       15049          14         481       15223
Swap:             0           0           0
lewington@gambling:~/go/src/bitbucket.org/lewington/autoroller$ free -m
              total        used        free      shared  buff/cache   available
Mem:          16041         513       15047          14         481       15220
Swap:             0           0           0
lewington@gambling:~/go/src/bitbucket.org/lewington/autoroller$ free -m
              total        used        free      shared  buff/cache   available
Mem:          16041         514       15045          14         481       15219
Swap:             0           0           0
lewington@gambling:~/go/src/bitbucket.org/lewington/autoroller$ free -m
              total        used        free      shared  buff/cache   available
Mem:          16041         516       15043          14         481       15217
Swap:             0           0           0

Desktop (Ubuntu 16.04):

lewington@lewington-desktop:~/code/go/src/bitbucket.org/lewington/autoroller$ free -m
              total        used        free      shared  buff/cache   available
Mem:          16001        4386        7938          95        3677       11065
Swap:         16343           0       16343
lewington@lewington-desktop:~/code/go/src/bitbucket.org/lewington/autoroller$ free -m
              total        used        free      shared  buff/cache   available
Mem:          16001        4386        7938          95        3677       11065
Swap:         16343           0       16343
lewington@lewington-desktop:~/code/go/src/bitbucket.org/lewington/autoroller$ free -m
              total        used        free      shared  buff/cache   available
Mem:          16001        4386        7938          95        3677       11065
Swap:         16343           0       16343
lewington@lewington-desktop:~/code/go/src/bitbucket.org/lewington/autoroller$ free -m
              total        used        free      shared  buff/cache   available
Mem:          16001        4386        7937          95        3677       11065
Swap:         16343           0       16343
lewington@lewington-desktop:~/code/go/src/bitbucket.org/lewington/autoroller$ free -m
              total        used        free      shared  buff/cache   available
Mem:          16001        4387        7936          95        3677       11064
Swap:         16343           0       16343
lewington@lewington-desktop:~/code/go/src/bitbucket.org/lewington/autoroller$ free -m
              total        used        free      shared  buff/cache   available
Mem:          16001        4387        7936          95        3677       11064
Swap:         16343           0       16343

pprof and runtime don't seem able to pick up the memory leak. go tool pprof -top http://localhost:6060/debug/pprof/heap shows a steady ~2 MB of usage even while the system is crashing around it.

$ go tool pprof -top http://localhost:6060/debug/pprof/heap
Fetching profile over HTTP from http://localhost:6060/debug/pprof/heap
Saved profile in /home/lewington/pprof/pprof.main.alloc_objects.alloc_space.inuse_objects.inuse_space.063.pb.gz
File: main
Build ID: 5cb3aeea203dbef664dde0fb5ed13f564d3476e3
Type: inuse_space
Time: Jan 11, 2019 at 7:01am (UTC)
Showing nodes accounting for 2611.12kB, 100% of 2611.12kB total
      flat  flat%   sum%        cum   cum%
 1537.31kB 58.88% 58.88%  2049.62kB 78.50%  bitbucket.org/lewington/autoroller/brain/braindata.newChain
  561.50kB 21.50% 80.38%   561.50kB 21.50%  html.init
  512.31kB 19.62%   100%   512.31kB 19.62%  bitbucket.org/lewington/autoroller/brain/scale.newPoints
         0     0%   100%  2049.62kB 78.50%  bitbucket.org/lewington/autoroller/brain/braindata.(*contestant).addEmptyFluc
         0     0%   100%  2049.62kB 78.50%  bitbucket.org/lewington/autoroller/brain/braindata.(*contestantBag).AddEmptyFluc
         0     0%   100%  2049.62kB 78.50%  bitbucket.org/lewington/autoroller/brain/braindb.(*Lib).DetailedRaceOdds
         0     0%   100%  2049.62kB 78.50% 
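
(For context, the pprof endpoint is exposed in the standard way, roughly like this; the side-effect import registers the handlers:)

import (
	"log"
	"net/http"
	_ "net/http/pprof" // registers /debug/pprof/* on http.DefaultServeMux
)

func init() {
	go func() {
		// Serve the profiling endpoints in the background on localhost:6060.
		log.Println(http.ListenAndServe("localhost:6060", nil))
	}()
}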

And logging runtime.MemStats as the code executes also gives essentially constant output:

time="2019-01-11T08:56:26Z" level=info msg="****** MEMORY USAGE *******"
time="2019-01-11T08:56:26Z" level=info msg="Alloc = 9 MiB"
time="2019-01-11T08:56:26Z" level=info msg="HeapIdle = 5 MiB"
time="2019-01-11T08:56:26Z" level=info msg="HeapInuse = 11 MiB"
time="2019-01-11T08:56:26Z" level=info msg="HeapReleased = 0 MiB"
time="2019-01-11T08:56:26Z" level=info msg="Malloc = 1\n"
time="2019-01-11T08:56:26Z" level=info msg="Frees = 1\n"
time="2019-01-11T08:56:26Z" level=info msg="TotalAlloc = 214 MiB"
time="2019-01-11T08:56:26Z" level=info msg="Sys = 23 MiB"
time="2019-01-11T08:56:26Z" level=info msg="StackInUse = 3 MiB"

time="2019-01-11T08:56:36Z" level=info msg="****** MEMORY USAGE *******"
time="2019-01-11T08:56:36Z" level=info msg="Alloc = 11 MiB"
time="2019-01-11T08:56:36Z" level=info msg="HeapIdle = 2 MiB"
time="2019-01-11T08:56:36Z" level=info msg="HeapInuse = 13 MiB"
time="2019-01-11T08:56:36Z" level=info msg="HeapReleased = 0 MiB"
time="2019-01-11T08:56:36Z" level=info msg="Malloc = 2\n"
time="2019-01-11T08:56:36Z" level=info msg="Frees = 2\n"
time="2019-01-11T08:56:36Z" level=info msg="TotalAlloc = 312 MiB"
time="2019-01-11T08:56:36Z" level=info msg="Sys = 23 MiB"
time="2019-01-11T08:56:36Z" level=info msg="StackInUse = 3 MiB"

time="2019-01-11T08:56:46Z" level=info msg="****** MEMORY USAGE *******"
time="2019-01-11T08:56:46Z" level=info msg="Alloc = 8 MiB"
time="2019-01-11T08:56:46Z" level=info msg="HeapIdle = 6 MiB"
time="2019-01-11T08:56:46Z" level=info msg="HeapInuse = 11 MiB"
time="2019-01-11T08:56:46Z" level=info msg="HeapReleased = 0 MiB"
time="2019-01-11T08:56:46Z" level=info msg="Malloc = 3\n"
time="2019-01-11T08:56:46Z" level=info msg="Frees = 3\n"
time="2019-01-11T08:56:46Z" level=info msg="TotalAlloc = 379 MiB"
time="2019-01-11T08:56:46Z" level=info msg="Sys = 24 MiB"
time="2019-01-11T08:56:46Z" level=info msg="StackInUse = 3 MiB"

One more helpful titbit comes via go tool pprof -top -alloc_space http://localhost:6060/debug/pprof/heap, which tracks cumulative allocations (i.e. the total only ever goes up). According to this profile, the biggest allocator overall on the virtual servers is compress/flate.NewReader, but when I run the same command locally that function doesn't show up at all; instead most bytes are attributed to compress/flate.NewWriter. (There's an experiment sketch after the two profiles below.)

From the Azure VM

go tool pprof -top -alloc_space http://localhost:6060/debug/pprof/heap
Fetching profile over HTTP from http://localhost:6060/debug/pprof/heap
Saved profile in /home/lewington/pprof/pprof.main.alloc_objects.alloc_space.inuse_objects.inuse_space.065.pb.gz
File: main
Build ID: 5cb3aeea203dbef664dde0fb5ed13f564d3476e3
Type: alloc_space
Time: Jan 11, 2019 at 7:02am (UTC)
Showing nodes accounting for 2717.45MB, 92.24% of 2946.06MB total
Dropped 186 nodes (cum <= 14.73MB)
      flat  flat%   sum%        cum   cum%
 1118.27MB 37.96% 37.96%  1118.27MB 37.96%  compress/flate.NewReader
  368.09MB 12.49% 50.45%   368.09MB 12.49%  bytes.makeSlice
  115.95MB  3.94% 54.39%  1234.22MB 41.89%  compress/gzip.(*Reader).Reset
  111.88MB  3.80% 58.19%   130.88MB  4.44%  net/http.(*Transport).dialConn
  104.52MB  3.55% 61.73%   104.52MB  3.55%  bitbucket.org/lewington/autoroller/realtime/odds.(*Odds).Odd
  104.02MB  3.53% 65.26%   276.05MB  9.37%  fmt.Sprintf
      86MB  2.92% 68.18%       86MB  2.92%  reflect.unsafe_New
   75.03MB  2.55% 70.73%    75.03MB  2.55%  strconv.appendEscapedRune
   60.51MB  2.05% 72.78%    60.51MB  2.05%  compress/flate.(*huffmanDecoder).init
   58.51MB  1.99% 74.77%    58.51MB  1.99%  bitbucket.org/lewington/autoroller/brain/braindata.(*contestant).autorollDegrees
   53.51MB  1.82% 76.59%    54.01MB  1.83%  fmt.Sprint
      49MB  1.66% 78.25%       49MB  1.66%  bitbucket.org/lewington/autoroller/realtime.(*interpreter).saveFlucForProvider
   48.59MB  1.65% 79.90%    50.09MB  1.70%  net.(*dnsMsg).Pack
   42.01MB  1.43% 81.33%    42.01MB  1.43%  strings.genSplit
      42MB  1.43% 82.75%   205.04MB  6.96%  github.com/sirupsen/logrus.(*TextFormatter).Format
   35.02MB  1.19% 83.94%    35.02MB  1.19%  net/textproto.(*Reader).ReadMIMEHeader
   23.50MB   0.8% 84.74%   228.54MB  7.76%  github.com/sirupsen/logrus.Entry.log
      22MB  0.75% 85.48%       22MB  0.75%  bitbucket.org/lewington/autoroller/realtime.(*cleaner).intCleaned

Local

$ go tool pprof -top -alloc_space http://localhost:6060/debug/pprof/heap
Fetching profile over HTTP from http://localhost:6060/debug/pprof/heap
Saved profile in /home/lewington/pprof/pprof.main.alloc_objects.alloc_space.inuse_objects.inuse_space.058.pb.gz
File: main
Build ID: 6996cbd6c46216e87717e5f3b483a90c9021d6b2
Type: alloc_space
Time: Jan 11, 2019 at 8:34pm (AEDT)
Showing nodes accounting for 5868.78kB, 100% of 5868.78kB total
      flat  flat%   sum%        cum   cum%
 3610.34kB 61.52% 61.52%  4843.97kB 82.54%  compress/flate.NewWriter
 1233.63kB 21.02% 82.54%  1233.63kB 21.02%  compress/flate.(*compressor).init
  512.75kB  8.74% 91.27%   512.75kB  8.74%  bytes.makeSlice
  512.05kB  8.73%   100%   512.05kB  8.73%  net/http.(*conn).readRequest
         0     0%   100%   512.75kB  8.74%  bytes.(*Buffer).ReadFrom
         0     0%   100%   512.75kB  8.74%  bytes.(*Buffer).grow
         0     0%   100%  4843.97kB 82.54%  compress/gzip.(*Writer).Write
         0     0%   100%   512.75kB  8.74%  io/ioutil.ReadFile
         0     0%   100%   512.75kB  8.74%  io/ioutil.readAll
         0     0%   100%  5356.72kB 91.27%  net/http.(*ServeMux).ServeHTTP
         0     0%   100%  5868.78kB   100%  net/http.(*conn).serve
         0     0%   100%  5356.72kB 91.27%  net/http.HandlerFunc.ServeHTTP
         0     0%   100%  5356.72kB 91.27%  net/http.serverHandler.ServeHTTP
         0     0%   100%  5356.72kB 91.27%  net/http/pprof.Index
         0     0%   100%  5356.72kB 91.27%  net/http/pprof.handler.ServeHTTP
         0     0%   100%  5356.72kB 91.27%  runtime/pprof.(*Profile).WriteTo
         0     0%   100%  4843.97kB 82.54%  runtime/pprof.(*profileBuilder).build
         0     0%   100%   512.75kB  8.74%  runtime/pprof.(*profileBuilder).readMapping
         0     0%   100%   512.75kB  8.74%  runtime/pprof.newProfileBuilder
         0     0%   100%  5356.72kB 91.27%  runtime/pprof.writeHeap
         0     0%   100%  5356.72kB 91.27%  runtime/pprof.writeHeapProto

I did this a bunch of times and those ratios are representative.
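
If those compress/flate.NewReader allocations on the servers turn out to matter, one experiment I'm considering (purely a guess on my part, not something I've verified) is turning off the transport's transparent gzip handling and checking whether that entry disappears from the alloc_space profile:

// Hypothetical experiment: a client whose Transport does not transparently
// gunzip response bodies, to check whether compress/flate.NewReader drops out
// of the alloc_space profile. Bodies would then arrive still compressed, so
// this is diagnosis only, not a fix.
var rawClient = &http.Client{
	Timeout: 10 * time.Second,
	Transport: &http.Transport{
		DisableCompression: true,
	},
}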

Finally, I am sure that the process in question is the source of the leak, rather than some other rogue process, because top told me as much.

Side note: the thread count for the process in question stays a very steady 19-22, out of ~210 total on the machine.

To my inexperienced eye it looks a bit like the Go GC might not be releasing memory back to the OS properly, maybe because of some strange way net/http interacts with how networking is set up on big virtual servers. It's probably more likely that my code is doing something ghastly. I was able to find one genuinely similar issue online, but the discussion stops infuriatingly short: https://github.com/golang/go/issues/15590.
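
One way I could test the "GC isn't returning memory" theory (just an idea, I haven't run this yet): force the runtime to hand memory back on a ticker and watch whether free -m stops creeping:

// If the GC simply isn't returning memory to the OS, forcing a GC plus
// scavenge should stop the creep in `free -m`. If RSS keeps growing anyway,
// the memory probably isn't owned by the Go heap at all.
go func() {
	for range time.Tick(30 * time.Second) {
		debug.FreeOSMemory() // runtime/debug
	}
}()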

I can't include all the code, but I've included the net/http portion of it below. What I would like to know is what kind of issues might cause this kind of memory leak, and/or how I might go about debugging it. I have used the wonderful pprof in the past to find out where in my code excessive memory usage was coming from, but in this case neither pprof nor runtime is registering any excessive memory usage. What tools are out there to help me find the source of this issue?

Thanks very much guys, sorry for the huge read, but it all seemed pertinent. I'm kind of at my wits' end right now.

Here are the 3 classes that I use to actually make requests:

Throttle:

package session

import (
	"time"

	"bitbucket.org/lewington/autoroller/assist"
	"bitbucket.org/lewington/autoroller/lg"
	"github.com/jbowtie/gokogiri/xml"
)

// throttle caps the request rate: it hands out bots from readyBots, and once
// a bot has been used, a delay goroutine returns it to the ready queue after
// roughly a second.
type throttle struct {
	readyBots  chan *bot
	workedBots chan *workedBot
	bots       []*bot
	delays     []*delay
	perSecond  int
	botIndex   int
	timer
}

func (t *throttle) work(bot *bot, url string) (error, xml.Node) {
	err, elapsed, response := t.timedResponse(bot, url)
	lg.L.Debug("elapsed time: %v", elapsed)
	t.addRequestTime(elapsed)
	t.workedBots <- bot.worked(elapsed)
	return err, response
}

func (t *throttle) timedResponse(bot *bot, url string) (error, time.Duration, xml.Node) {
	startTime := time.Now()
	err, response := bot.request(url)
	endTime := assist.Timestamp()
	return err, endTime.Sub(startTime), response
}

func (t *throttle) request(url string) (error, xml.Node) {
	lg.L.Debug("Making request, %v workers remaining", len(t.readyBots))
	bot := <-t.readyBots
	return t.work(bot, url)
}

func (t *throttle) createBot(index int) {
	bot := newBot()
	t.readyBots <- bot
	t.bots[index] = bot
}

func (t *throttle) createDelay() {
	delay := newDelay(t.workedBots, t.readyBots, 1000)
	delay.startListening()
}

func (t *throttle) start() {
	for index := range t.bots {
		t.createBot(index)
		t.createDelay()
	}
}

func newThrottle(perSecond int) *throttle {
	t := &throttle{
		make(chan *bot, perSecond),
		make(chan *workedBot, perSecond),
		make([]*bot, perSecond),
		make([]*delay, perSecond),
		perSecond,
		0,
		newTimer(),
	}
	t.start()
	return t
}

Bot:

package session

import (
	"io/ioutil"
	"net/http"
	"time"

	"bitbucket.org/lewington/autoroller/lg"
	"github.com/jbowtie/gokogiri"
	"github.com/jbowtie/gokogiri/xml"
)

type bot struct {
	netClient *http.Client
}

var client *http.Client = &http.Client{
	Timeout: time.Second * 10,
}

func (b *bot) request(url string) (error, xml.Node) {
	lg.L.Debug("visiting: %v", url)

	response, err := b.netClient.Get(url)

	if err != nil {
		return err, nil
	}

	defer response.Body.Close()
	body, err := ioutil.ReadAll(response.Body)

	if err != nil {
		return err, nil
	}

	doc, err := gokogiri.ParseXml(body)

	if err != nil {
		return err, nil
	}

	return err, doc
}
func (b *bot) worked(elapsed time.Duration) *workedBot {
	return &workedBot{
		b,
		elapsed,
	}
}

func newBot() *bot {
	return &bot{
		client,
	}
}

delay:

package session

import (
	"time"
)

// delay returns a worked bot to the ready queue once roughly `wait` has
// passed since the bot began its last request.
type delay struct {
	input  chan *workedBot
	output chan *bot
	wait   time.Duration
}

func (d *delay) startListening() {
	go d.listen()
}

func (d *delay) listen() {
	for bot := range d.input {
		d.delayedQueueAdd(bot)
	}
}

func (d *delay) delayedQueueAdd(b *workedBot) {
	toWait := d.wait - b.elapsed
	if toWait > 0 {
		time.Sleep(toWait)
	}
	d.output <- b.bot
}

func newDelay(input chan *workedBot, output chan *bot, millisecondsDelay int) *delay {
	return &delay{
		input,
		output,
		time.Millisecond * time.Duration(millisecondsDelay),
	}
}

Is there anything blindingly wrong with these lads?

Update: based on this talk (https://youtu.be/ydWFpcoYraU?t=1084) I suspect we may be dealing with a cgo memory leak. Does anybody know anything about debugging those?
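
If it is cgo, one rough way I can think of to check (this is my own assumption, not something from the talk) is to compare the OS's idea of the process size with what the Go runtime thinks it owns; a gap that keeps widening would point at C allocations that pprof can't see:

// Compare OS-reported RSS with runtime MemStats.Sys (Linux-only, reads /proc).
// Imports: io/ioutil, log, os, runtime, strconv, strings.
func rssVsGoSys() {
	data, err := ioutil.ReadFile("/proc/self/statm")
	if err != nil {
		return
	}
	// The second field of statm is the resident set size in pages.
	fields := strings.Fields(string(data))
	residentPages, _ := strconv.ParseUint(fields[1], 10, 64)
	rss := residentPages * uint64(os.Getpagesize())

	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	log.Printf("RSS = %v MiB, Go Sys = %v MiB", rss/1024/1024, m.Sys/1024/1024)
}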

Hi

A quick peek and I think you make a recursive call which fills the stack. Go hasn’t got tail recursion optimisation (if I remember correctly) so this could be the problem. This is the call chain:

work > timedResponse > request > work

Heya Johandalabaka,

Thanks for taking the time!

If this were the case, wouldn't I expect to see the stack growing over time, in something like MemStats.StackSys (https://golang.org/pkg/runtime/#MemStats)? I'm currently logging it and it's holding steady.

Also, I think your peek may have been a bit too quick, haha: throttle.timedResponse calls bot.request, not throttle.request. I appreciate the suggestion though.

Hey guys, just a quick update: I found out that one of my dependencies, gokogiri, is known to cause memory leaks, and that appears to have been the issue.
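
For anyone who lands here with the same symptoms: gokogiri wraps libxml2 via cgo, so (as far as I understand its API) parsed documents live in C memory that the Go GC never sees and have to be freed explicitly once you're finished with them. Roughly:

// gokogiri documents are backed by libxml2 (C) memory, which never shows up
// in Go heap profiles; whatever code ultimately owns the parsed document has
// to release it explicitly.
doc, err := gokogiri.ParseXml(body)
if err != nil {
	return err
}
defer doc.Free() // in the function that is the last user of doc

// ... walk doc and extract what you need ...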

Good you found it
