Hey everyone, I’ve been maintaining a small restaurant-focused website (mainly about Texas Roadhouse locations, menu updates, and deals), and the backend is written entirely in Go. I’ve recently started facing strange performance issues in my API layer specifically: requests time out randomly and goroutines hang even under light traffic. What’s odd is that everything ran perfectly fine for months, and the issue only started appearing after I made a few structural changes to the code.
The website’s backend is built on the net/http package with a few helper functions that serve JSON responses to the frontend. It’s deployed on an Ubuntu VPS with Caddy as the reverse proxy. The endpoints mostly fetch menu data from a local PostgreSQL database and occasionally call an external API for restaurant details. Lately, I’ve noticed that several requests get stuck indefinitely: the browser keeps waiting for a response, and the logs show no indication that the handler ever completed.
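For context, here’s roughly the shape of my handlers. This is a simplified sketch, not my actual code; MenuItem, menuHandler, and fetchMenu are placeholder names, and the query helper it calls is in the next snippet.

```go
package main

import (
	"database/sql"
	"encoding/json"
	"net/http"
)

type MenuItem struct {
	Name  string  `json:"name"`
	Price float64 `json:"price"`
}

// menuHandler is roughly the shape of my JSON endpoints: run a query with the
// request's context, then encode the result as JSON.
func menuHandler(db *sql.DB) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		items, err := fetchMenu(r.Context(), db)
		if err != nil {
			http.Error(w, "internal error", http.StatusInternalServerError)
			return
		}
		w.Header().Set("Content-Type", "application/json")
		_ = json.NewEncoder(w).Encode(items)
	}
}
```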
When I checked the runtime metrics using pprof, I saw an unexpected increase in the number of goroutines over time, even though I’m not spawning any in tight loops. It looks like some of my database query contexts aren’t being properly canceled when the client disconnects. I’ve been using context.WithTimeout() for all DB queries, but maybe I’ve missed something subtle — because even after the timeout, the goroutines stay alive until I restart the service.
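Here’s the pattern I’m using for the DB calls, simplified down (the 3-second deadline and the query are illustrative, and it assumes the context, database/sql, and time imports plus the MenuItem type from the snippet above):

```go
// fetchMenu is a simplified version of my query helper. The timeout context is
// derived from the caller's context (the request context in the handler above),
// so cancellation should propagate when the client disconnects or the deadline
// passes.
func fetchMenu(ctx context.Context, db *sql.DB) ([]MenuItem, error) {
	ctx, cancel := context.WithTimeout(ctx, 3*time.Second)
	defer cancel()

	rows, err := db.QueryContext(ctx, `SELECT name, price FROM menu_items`)
	if err != nil {
		return nil, err
	}
	defer rows.Close()

	var items []MenuItem
	for rows.Next() {
		var it MenuItem
		if err := rows.Scan(&it.Name, &it.Price); err != nil {
			return nil, err
		}
		items = append(items, it)
	}
	return items, rows.Err()
}
```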
I also ran benchmarks and stress tests locally with ab and wrk: after around 2000–3000 requests, response times spike from roughly 100 ms to over 5 seconds. CPU usage stays relatively stable, but memory usage keeps climbing. I suspect either a connection pooling issue with database/sql or a race condition introduced by my recent caching layer (a simple map guarded by a mutex). I added some defer statements and double-checked for possible deadlocks, but I haven’t found anything conclusive yet.
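To be clear about the pooling side, I just mean the standard database/sql settings; something like this is the kind of setup in question (the driver name and limits are illustrative, not copied from my config, and it assumes database/sql and time are imported along with a registered Postgres driver):

```go
// openDB shows the pooling knobs I mean. "postgres" assumes lib/pq; the limits
// are example values, not necessarily what my production config uses.
func openDB(dsn string) (*sql.DB, error) {
	db, err := sql.Open("postgres", dsn)
	if err != nil {
		return nil, err
	}
	db.SetMaxOpenConns(25)                 // cap concurrent connections to Postgres
	db.SetMaxIdleConns(25)                 // keep idle connections around for reuse
	db.SetConnMaxLifetime(5 * time.Minute) // recycle connections periodically
	return db, nil
}
```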
The really strange part is that when I disable my caching middleware, everything runs more smoothly (though slower), so the bug seems tied to how I’m managing shared state between goroutines; I might not be handling concurrent reads and writes correctly. I’m using Go 1.22.3, and the code runs fine locally for small tests, but under real-world load it degrades fast.
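The caching layer itself is nothing fancy; it’s roughly this shape (simplified, with placeholder type and method names, and it assumes the sync package is imported):

```go
// responseCache is roughly the shape of my caching layer: cached JSON
// responses in a map, guarded by a single mutex.
type responseCache struct {
	mu    sync.Mutex
	items map[string][]byte // keyed by request path
}

func newResponseCache() *responseCache {
	return &responseCache{items: make(map[string][]byte)}
}

func (c *responseCache) Get(key string) ([]byte, bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	v, ok := c.items[key]
	return v, ok
}

func (c *responseCache) Set(key string, v []byte) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.items[key] = v
}
```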
Has anyone here run into goroutines hanging because of improper context cancellation or mutex contention in a Go web server? What’s the best way to debug issues like this: should I use pprof’s block profiling, or is there a more direct way to detect stuck goroutines in real time? I’d love to make the backend more resilient, since my site’s restaurant data API is growing quickly and I don’t want to keep doing manual restarts every few hours. Sorry for the long post!
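One last thing, in case the setup itself matters: this is more or less how I have pprof exposed at the moment (the port and helper name are approximate, not copied from my code):

```go
package main

import (
	"log"
	"net/http"
	_ "net/http/pprof" // registers the /debug/pprof/* handlers on the default mux
)

// startPprofServer serves the profiling endpoints on localhost only, so they
// aren't reachable through Caddy. I call something like this from main.
func startPprofServer() {
	go func() {
		// /debug/pprof/goroutine?debug=2 dumps full stacks for every live
		// goroutine, which is how I noticed the count climbing.
		log.Println(http.ListenAndServe("localhost:6060", nil))
	}()
}
```

Polling that goroutine endpoint is the closest thing I have to real-time detection right now, so better ideas are very welcome.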