The culprit turned out to be that we were building the binary with -race in our Dockerfile. The race detector instruments memory accesses and keeps shadow state for them, so race-enabled binaries use far more RAM (the Go docs cite roughly 5-10x memory overhead):
RUN go build -race -v
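In hindsight, it would have been easy to make the binary tell us this. Below is a minimal sketch (assuming Go 1.18+, which embeds build settings into every binary) of a startup check that reports whether the running binary was compiled with the race detector; the file name and messages are just for illustration:

// raceinfo.go (hypothetical): report at startup whether this binary was
// compiled with -race. The "-race" build setting is only embedded when the
// flag was actually used.
package main

import (
	"fmt"
	"runtime/debug"
)

func main() {
	info, ok := debug.ReadBuildInfo()
	if !ok {
		fmt.Println("no build info embedded in this binary")
		return
	}
	for _, s := range info.Settings {
		if s.Key == "-race" && s.Value == "true" {
			fmt.Println("WARNING: built with the race detector enabled")
			return
		}
	}
	fmt.Println("built without the race detector")
}

The same build settings can also be inspected from outside the container with go version -m /app/my-service.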
As soon as we removed the flag, the resident set size (RES) dropped significantly, from 137MB to 22.4MB, as seen in this screenshot:
We also switched the Docker image to Alpine for a smaller image size (from 1.19GB down to 535MB), though that didn't affect the memory footprint. Now we have:
FROM golang:1.22.0-alpine
WORKDIR /app
COPY . .
RUN go build -v
EXPOSE 8080
USER guest
CMD ["/app/my-service"]
The following image shows what RES memory looked like over several weeks, ending in OOM pod restarts:
Note: As explained in the post, we thought we didn't have this issue on AWS t3.small EC2 instances running the same code and build process; there, RES memory stayed between 45MB and 60MB. But double-checking the build step, we realized we had already removed the -race flag from the build command when deploying to EC2.
While chasing this issue, we also realized the service runs fine with GOMAXPROCS=1 and less CPU, so now we have:
resources:
  limits:
    cpu: 250m
    memory: 100Mi
  requests:
    cpu: 250m
    memory: 100Mi
env:
  - name: GOMEMLIMIT
    value: '100MiB'
  - name: GOMAXPROCS
    value: '1'
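To double-check that these settings actually reach the Go runtime inside the pod, a small startup log helps. This is only a sketch; the runtime picks up GOMAXPROCS and GOMEMLIMIT on its own, and the snippet (assuming Go 1.19+ for runtime/debug.SetMemoryLimit) just reports the values it ended up with:

package main

import (
	"fmt"
	"runtime"
	"runtime/debug"
)

func main() {
	// GOMAXPROCS(0) only queries the current value; it does not change it.
	procs := runtime.GOMAXPROCS(0)
	// SetMemoryLimit(-1) returns the current soft memory limit (set here via
	// the GOMEMLIMIT env var) without modifying it.
	limit := debug.SetMemoryLimit(-1)
	fmt.Printf("GOMAXPROCS=%d, memory limit=%d bytes\n", procs, limit)
}

With the env block above, this should print GOMAXPROCS=1 and a memory limit of 104857600 bytes (100MiB).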

