Golang service running on Kubernetes (EKS) gets OOMKilled (high RES memory value, low runtime.MemStats.Alloc value)

I have a Go service running in Kubernetes (AWS EKS). I'm observing that RES memory grows until it reaches resources.limits.memory and the pod gets OOMKilled by Kubernetes. As a point of reference, the same code is currently running in production on an EC2 t3.small instance, where RES memory stays under 50MB under a heavy load of 20k rps (async message processing). In my k8s dev environment I only see 5 rps, yet RES grows to whatever limit I set; I tried assigning up to 2GB to the pod, but the pattern is the same and the process gets OOMKilled. It seems like memory is not being fully released back to the OS, even though the GC shows a small memory footprint (12->12->9 MB, 19 MB goal).

K8s config and env variables

For easier testing, I configured the pod with 365Mi, since the pattern was the same with 2GB. This is the resources configuration in the k8s manifest:

    resources:
      limits:
        memory: 365Mi
      requests:
        cpu: 1000m
        memory: 365Mi

With the following environment variables:

    GOMEMLIMIT: '180MiB'
    GOMAXPROCS: '2'
    GODEBUG: 'gctrace=1'
    GOGC: '50'

Initially I didn't set GOMEMLIMIT, but I read that it could help, so I set it to half of the pod's memory limit; it didn't make a difference. I also started with the default GOGC and then tried a value of 50, which didn't help either.
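
As an aside, instead of hardcoding GOMEMLIMIT, the value could be derived from the container's own memory limit through the Kubernetes downward API. This is only a sketch of that idea, not my actual manifest; note that it pins GOMEMLIMIT at 100% of the limit (GOMEMLIMIT accepts a plain byte count), so it is not a drop-in replacement for the 180MiB I use:

    env:
      - name: GOMEMLIMIT
        valueFrom:
          resourceFieldRef:
            # Exposes the container's memory limit in bytes
            # (e.g. 382730240 for 365Mi); the Go runtime accepts
            # a bare byte count for GOMEMLIMIT.
            resource: limits.memory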

Garbage collection

GC logs before it gets killed show the following:

21:03:41.854 gc 1643 @80939.735s 0%: 0.060+2.4+0.002 ms clock, 0.12+1.1/1.4/0.57+0.005 ms cpu, 12->12->8 MB, 12 MB goal, 0 MB stacks, 0 MB globals, 2 P

Memstats

runtime.MemStats shows the following (I added the MB conversions manually for easier readability):

21:03:43.195 
{
  "Alloc":8922888 (8.9MB),
  "TotalAlloc":5646312096 (5.6GB),
  "Sys":28415240 (28.4MB),
  "HeapSys":18284544 (18.2MB),
  "HeapIdle":6414336 (6.4MB),
  "HeapReleased":3121152 (3.1MB),
  "HeapInuse":11870208 (11.8MB),
  "HeapObjects":24393,
  "MallocsObjects":43016155,
  "FreesObjects":42991762,
  "LiveObjects":24393,
  "PauseTotalNs":153890330,
  "NumGC":1643,
  "NumGoroutine":265
}

Alloc is 8.9MB, which matches the trailing 8 MB (live heap after the collection) in the gc trace (12->12->8 MB).
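
For context, a dump like the one above can be produced with runtime.ReadMemStats. The following is only a minimal sketch of that idea, not my actual logging code; MallocsObjects, FreesObjects and LiveObjects are names I derive myself (from Mallocs and Frees) rather than real MemStats fields:

    package main

    import (
        "encoding/json"
        "log"
        "runtime"
        "time"
    )

    // memSnapshot mirrors the fields logged above; LiveObjects is derived.
    type memSnapshot struct {
        Alloc, TotalAlloc, Sys          uint64
        HeapSys, HeapIdle, HeapReleased uint64
        HeapInuse, HeapObjects          uint64
        MallocsObjects, FreesObjects    uint64
        LiveObjects, PauseTotalNs       uint64
        NumGC                           uint32
        NumGoroutine                    int
    }

    func logMemStats() {
        var m runtime.MemStats
        runtime.ReadMemStats(&m) // briefly stops the world, so keep it out of hot paths
        s := memSnapshot{
            Alloc: m.Alloc, TotalAlloc: m.TotalAlloc, Sys: m.Sys,
            HeapSys: m.HeapSys, HeapIdle: m.HeapIdle, HeapReleased: m.HeapReleased,
            HeapInuse: m.HeapInuse, HeapObjects: m.HeapObjects,
            MallocsObjects: m.Mallocs, FreesObjects: m.Frees,
            LiveObjects:  m.Mallocs - m.Frees, // objects still alive
            PauseTotalNs: m.PauseTotalNs,
            NumGC:        m.NumGC,
            NumGoroutine: runtime.NumGoroutine(),
        }
        b, _ := json.Marshal(s)
        log.Println(string(b))
    }

    func main() {
        for range time.Tick(time.Minute) { // one snapshot per minute
            logMemStats()
        }
    }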

Here’s another log sample before and after the OOMKilled:

15:39:21.969  my-service gc 1709 @168600.017s 0%: 0.033+3.5+0.002 ms clock, 0.033+0/0.059/3.4+0.002 ms cpu, 12->12->9 MB, 19 MB goal, 0 MB stacks, 0 MB globals, 1 P (forced)
15:39:23.947  my-service {"Alloc":10126368,"TotalAlloc":5661878296,"Sys":36803848,"HeapSys":26771456,"HeapIdle":13369344,"HeapReleased":13336576,"HeapObjects":42613,"MallocsObjects":35141353,"FreesObjects":35098740,"LiveObjects":42613,"PauseTotalNs":70123823,"NumGC":1709,"NumGoroutine":264}
15:40:23.948  my-service {"Alloc":14120360,"TotalAlloc":5665872288,"Sys":36803848,"HeapSys":26738688,"HeapIdle":10780672,"HeapReleased":10780672,"HeapObjects":73826,"MallocsObjects":35172566,"FreesObjects":35098740,"LiveObjects":73826,"PauseTotalNs":70123823,"NumGC":1709,"NumGoroutine":264}
15:41:16.861  my-service Killed
15:41:18.201  my-service gc 1 @0.015s 6%: 0.007+4.9+0.002 ms clock, 0.007+0.027/1.3/0+0.002 ms cpu, 3->4->2 MB, 4 MB goal, 0 MB stacks, 0 MB globals, 1 P

top

kubectl top pod my-service-pod-56f7fcffbb-d8tdh shows:

NAME                              CPU(cores)   MEMORY(bytes)
my-service-pod-56f7fcffbb-d8tdh   8m           344Mi

top shows:

top - 05:04:05 up 14 days,  8:05,  0 user,  load average: 1.78, 1.95, 1.89
Tasks:   4 total,   1 running,   3 sleeping,   0 stopped,   0 zombie
%Cpu(s): 13.2 us,  5.2 sy,  0.0 ni, 79.2 id,  0.8 wa,  0.0 hi,  1.6 si,  0.0 st
MiB Mem :  15801.5 total,   4087.7 free,   9661.0 used,   2460.5 buff/cache
MiB Swap:      0.0 total,      0.0 free,      0.0 used.   6140.5 avail Mem

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
     11 guest     20   0 2315940 348884  10532 S   0.7   2.2   7:02.35 my-service

pprof

pprof doesn't show anything out of the ordinary either; total in-use memory is 7096.39kB.

go tool pprof ~/Downloads/my-service-pod-56f7fcffbb-d8tdh.tar.gz
File: my-service
Type: inuse_space
Time: Feb 6, 2024 at 8:40pm (PST)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top
Showing nodes accounting for 7096.39kB, 100% of 7096.39kB total
Showing top 10 nodes out of 35
      flat  flat%   sum%        cum   cum%
    4104kB 57.83% 57.83%     4104kB 57.83%  github.com/DataDog/datadog-go/v5/statsd.newStatsdBuffer (inline)
  902.59kB 12.72% 70.55%   902.59kB 12.72%  compress/flate.NewWriter (inline)
  553.04kB  7.79% 78.34%   553.04kB  7.79%  gopkg.in/DataDog/dd-trace-go.v1/ddtrace/tracer.newConcentrator
  512.34kB  7.22% 85.56%   512.34kB  7.22%  crypto/x509.map.init.0
  512.31kB  7.22% 92.78%   512.31kB  7.22%  vendor/golang.org/x/net/http/httpguts.map.init.0
  512.10kB  7.22%   100%   512.10kB  7.22%  github.com/aws/aws-sdk-go/aws/endpoints.init
         0     0%   100%   902.59kB 12.72%  bufio.(*Writer).Flush
         0     0%   100%   902.59kB 12.72%  compress/gzip.(*Writer).Write
         0     0%   100%   512.34kB  7.22%  crypto/x509.init
         0     0%   100%     4104kB 57.83%  github.com/DataDog/datadog-go/v5/statsd.(*bufferPool).addNewBuffer
(pprof)
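
For reference, a heap profile like the one above can be served by the standard net/http/pprof package; this is a minimal sketch of wiring that up (the port is illustrative, and it may not match how this profile was actually collected):

    package main

    import (
        "net/http"
        _ "net/http/pprof" // registers /debug/pprof/* on the default mux
    )

    func main() {
        // Serve the profiling endpoints on a separate port; a heap profile
        // can then be fetched with:
        //   go tool pprof http://localhost:6060/debug/pprof/heap
        http.ListenAndServe("localhost:6060", nil)
    }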

Question

Am I correct in concluding that there's no memory leak, based on the following:

  1. runtime.MemStats.Alloc is low.
  2. The GC is running regularly (NumGC keeps increasing) and the goroutine count stays roughly constant (no leaked goroutines).
  3. runtime.MemStats.TotalAlloc is large because it is cumulative since the service started, and most of that memory has since been freed.
  4. The GC has very little memory to collect each cycle (12->12->8 MB, 12 MB goal); I haven't seen that goal go above ~19 MB.

It looks like the GC is releasing memory, but not all of it is going back to the OS. runtime.MemStats.TotalAlloc shows 5.6GB, which tells me memory is being freed, but unlike on the EC2 instance, where the process keeps its RES footprint under 50MB, in Kubernetes the service ends up holding whatever maximum is set in resources.limits.memory.
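
One check I could think of (a sketch of the idea, not something running in my service) would be to force the runtime to hand idle heap back to the OS and watch whether RES drops; if it doesn't, whatever is holding the memory is not the Go heap:

    package main

    import (
        "net/http"
        "runtime/debug"
    )

    func main() {
        // Hitting /free-os-memory forces a GC and returns as much memory
        // to the OS as possible; RES should shrink if idle Go heap is
        // what the kernel is charging to the process.
        http.HandleFunc("/free-os-memory", func(w http.ResponseWriter, r *http.Request) {
            debug.FreeOSMemory()
            w.Write([]byte("ok\n"))
        })
        http.ListenAndServe("localhost:8081", nil)
    }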

I wasn't expecting any issues, since the service runs fine on an EC2 instance, and I've spent weeks looking through posts trying to find an answer.


I swear to God it's happening to me as well.
When running our Go code on a bare-metal host, the memory stays stable. I monitored the Go code with pprof and several runtime package functions (ReadMemStats, goroutine counts, etc.) and also checked resource usage from the OS side, and everything was completely stable.

When running it in our Kubernetes k3s cluster, the Go-side monitoring looked exactly the same, but the memory used on the host kept climbing and climbing (like a memory leak). It did eventually stabilize, but only after reaching a very high amount of used memory (it stayed very high, it just stopped growing).

So I think the GC really is freeing resources correctly, but the OS seems to take a long time before it starts reclaiming them.
I guess the scenario is quite similar to yours.


Yes! That sounds exactly like what I'm seeing. I'm still trying to find the point where the memory stabilizes and stops climbing. I'll post an update when I have those numbers.

I appreciate you sharing that information with me. It’s exactly what I needed.

If you clear the cache on your host, you'll notice the memory gets freed suddenly and then starts growing again.

The culprit turned out to be that we were building the binary with -race in our Dockerfile:

    RUN go build -race -v

As soon as we removed the flag, the resident set size (RES) dropped significantly, from 137MB to 22.4MB, as seen in this screenshot:

[Screenshot: container_memory_max_usage_bytes before and after removing -race]

We also ended up switching the Docker image to Alpine for a smaller image size (from 1.19GB to 535MB), but that didn't affect the memory footprint. Now we have:

    FROM golang:1.22.0-alpine

    WORKDIR /app

    COPY . .

    RUN go build -v

    EXPOSE 8080

    USER guest

    CMD /app/my-service

See the following image of what RES memory looked like over several weeks, ending in OOM pod restarts:

[Chart: RES memory of the service running on EKS Kubernetes, climbing until the OOM pod restarts]

Note: As explained in the post, we thought we didn't have this issue on AWS t3.small EC2 instances with the same code and build process (there, RES memory stayed between 45MB and 60MB). But double-checking the build step, we realized we had already removed the -race flag from the build command when deploying to EC2.

While chasing this issue, we also realized we could run fine with GOMAXPROCS=1 and less CPU, so now we have:

    resources:
      limits:
        cpu: 250m
        memory: 100Mi
      requests:
        cpu: 250m
        memory: 100Mi
    env:
      open:
        GOMEMLIMIT: '100MiB'
        GOMAXPROCS: '1'

This is unrelated to your original question, but you could further reduce your image size quite a bit by doing a staged build in the Dockerfile. You don't need the Go compiler in the image you ship, just the binary you built!
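
A sketch of what that could look like, building on the Dockerfile shown above (the final base image, output path, and binary name here are illustrative choices, not taken from this thread):

    # Build stage: has the Go toolchain.
    FROM golang:1.22.0-alpine AS build
    WORKDIR /app
    COPY . .
    RUN go build -v -o /app/my-service

    # Runtime stage: ships only the compiled binary.
    FROM alpine:3.19
    COPY --from=build /app/my-service /app/my-service
    EXPOSE 8080
    USER guest
    CMD ["/app/my-service"]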


It seems this is already a known issue; there are articles talking about exactly this: When Kubernetes and Go don't work well together

By the way, the reason this was hard to diagnose is that building with the -race flag apparently pulls in some C libraries (the race detector's runtime), and the memory they use belongs to the Go process but does not show up in the GC logs or in runtime.MemStats. So the memory seemed unaccounted for from Go's perspective, and it is not freed by Go's GC.
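
A rough way to see that gap from inside the process (a Linux-only sketch with very simplistic parsing, not something from this thread) is to compare the kernel's view of the process (VmRSS in /proc/self/status) with what the Go runtime thinks it owns (MemStats.Sys); with a -race build the difference is large:

    package main

    import (
        "bufio"
        "fmt"
        "os"
        "runtime"
        "strconv"
        "strings"
    )

    // vmRSSKiB reads the resident set size the kernel reports for this process.
    func vmRSSKiB() (uint64, error) {
        f, err := os.Open("/proc/self/status")
        if err != nil {
            return 0, err
        }
        defer f.Close()
        sc := bufio.NewScanner(f)
        for sc.Scan() {
            fields := strings.Fields(sc.Text()) // e.g. ["VmRSS:", "348884", "kB"]
            if len(fields) >= 2 && fields[0] == "VmRSS:" {
                return strconv.ParseUint(fields[1], 10, 64)
            }
        }
        return 0, fmt.Errorf("VmRSS not found")
    }

    func main() {
        var m runtime.MemStats
        runtime.ReadMemStats(&m)
        rss, err := vmRSSKiB()
        if err != nil {
            panic(err)
        }
        fmt.Printf("kernel VmRSS: %d KiB\n", rss)
        fmt.Printf("Go Sys:       %d KiB\n", m.Sys/1024)
        fmt.Printf("gap:          %d KiB (memory Go does not account for)\n",
            int64(rss)-int64(m.Sys/1024))
    }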

Thanks for the link to the article @Xavi_Ros. GOMEMLIMIT definitely has to be used, as mentioned there:

When the Go program (as a single process) runs in a container in a Kubernetes pod, by default it doesn’t know about the resource limits set for the container it is running in

In my case, that was already set, but it didn't help, since the memory from the C libraries doesn't seem to be freed by the GC, so the process would go OOM no matter what limit I specified.

A high Resident Set Size (RES) in system monitoring tools indicates the amount of physical memory your process is holding (not necessarily using actively).