SQL temporarily failed to return connections to pool

I’m working on the Boulder project. We use go-sql-driver/mysql and gorp to connect to a MariaDB 10.1 database. We use db.SetMaxOpenConns(500) to limit our outbound connections. Under normal conditions, we have about 15 open connections at any given time, with that number increasing during traffic spikes. We set readTimeout and writeTimeout to 4.9s, and timeout to 5s.
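As a rough illustration of the setup described above, here is a minimal sketch of how those three timeouts map onto a go-sql-driver/mysql DSN. The helper name `dsnWithTimeouts`, the credentials, the address, and the database name are all placeholders for illustration, not Boulder's actual configuration.

```go
package main

import "fmt"

// dsnWithTimeouts builds a go-sql-driver/mysql DSN with the timeout values
// mentioned in the post. The helper and its arguments are hypothetical.
//   timeout      = connection (dial) timeout
//   readTimeout  = I/O read timeout
//   writeTimeout = I/O write timeout
func dsnWithTimeouts(user, password, addr, dbName string) string {
	return fmt.Sprintf(
		"%s:%s@tcp(%s)/%s?timeout=5s&readTimeout=4.9s&writeTimeout=4.9s",
		user, password, addr, dbName,
	)
}

func main() {
	// Placeholder values; substitute real ones.
	dsn := dsnWithTimeouts("user", "password", "127.0.0.1:3306", "exampledb")
	fmt.Println(dsn)

	// With the driver imported (_ "github.com/go-sql-driver/mysql"),
	// the DSN would then be used roughly like this:
	//   db, err := sql.Open("mysql", dsn)
	//   db.SetMaxOpenConns(500) // cap on open connections in the pool
}
```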

We recently had a mysterious outage that I am trying to debug retroactively. According to our stats, at 23:36:15 one day, the number of open connections from our service that talks to the DB started increasing rapidly. It continued to rise until 23:38:16, when it hit the maximum of 500, at which point all new queries blocked awaiting a connection (as expected). My first thought was that we had a bad query that took too long, or the database was very slow. However, our RPC stats show that our Go service (sa/sa.go) was continuing to successfully serve RPCs promptly during those two minutes, and since each RPC includes one or more SQL queries, I know that the queries were completing promptly. There were no errors or timeouts in the logs until 23:38:16, when RPCs started timing out because our Go service couldn’t acquire a connection to the database.

We restarted the Go service, and the problem disappeared and hasn’t recurred. However, it’s unsatisfying knowing that the behavior could recur at any time. Does anyone have any ideas of what could cause this behavior? My best guess is that something was causing the SQL driver not to return connections to its pool. Is there any known condition that could cause that?

Thanks,
Jacob

If it recurs, send the service a SIGQUIT before restarting it. The Go runtime will print every running goroutine and its stack, then exit. That report should make it possible to figure out what happened.
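SIGQUIT terminates the process after printing the stacks. If you would rather capture the same kind of report without killing the service, one alternative (not what is suggested above, just a related technique) is to dump goroutine stacks on demand with `runtime/pprof`. The sketch below assumes a Unix-like system and picks SIGUSR1 arbitrarily; the function names are made up for illustration.

```go
package main

import (
	"bytes"
	"fmt"
	"os"
	"os/signal"
	"runtime/pprof"
	"syscall"
)

// goroutineDump returns the stacks of all goroutines as text. Debug level 2
// uses the same format the runtime prints for an unrecovered panic.
func goroutineDump() (string, error) {
	var buf bytes.Buffer
	if err := pprof.Lookup("goroutine").WriteTo(&buf, 2); err != nil {
		return "", err
	}
	return buf.String(), nil
}

// dumpOnSignal writes a goroutine dump to stderr each time SIGUSR1 arrives,
// leaving the process running. SIGUSR1 is an arbitrary choice (Unix only).
func dumpOnSignal() {
	ch := make(chan os.Signal, 1)
	signal.Notify(ch, syscall.SIGUSR1)
	go func() {
		for range ch {
			dump, err := goroutineDump()
			if err != nil {
				fmt.Fprintln(os.Stderr, "goroutine dump failed:", err)
				continue
			}
			fmt.Fprint(os.Stderr, dump)
		}
	}()
}

func main() {
	dumpOnSignal()

	// Take one dump directly to show what the report contains.
	dump, err := goroutineDump()
	if err != nil {
		fmt.Fprintln(os.Stderr, "goroutine dump failed:", err)
		os.Exit(1)
	}
	fmt.Printf("captured %d bytes of goroutine stacks\n", len(dump))
}
```

In a pool-exhaustion incident, the interesting goroutines are the ones blocked inside `database/sql` waiting for a connection, and any that are holding one while stuck elsewhere.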

It turns out that one of our code paths failed to roll back a transaction on error. The dangling transaction kept a dedicated connection open indefinitely, which quickly exhausted our available connections.
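For anyone who lands here with the same symptom, here is a small sketch of the failure mode and the usual fix. It is not Boulder's code: `leakyUpdate`, `safeUpdate`, and `inUseAfter` are made-up names, and the `stub` driver is a stand-in that does nothing, registered only so the example runs without a database. The point is the `defer tx.Rollback()` right after `Begin`, which guarantees the connection goes back to the pool on every return path.

```go
package main

import (
	"database/sql"
	"database/sql/driver"
	"errors"
	"fmt"
)

// Stand-in driver so the sketch runs without a real database. It supports
// only opening connections and beginning transactions.
type stubDriver struct{}
type stubConn struct{}
type stubTx struct{}

func (stubDriver) Open(string) (driver.Conn, error) { return stubConn{}, nil }
func (stubConn) Prepare(string) (driver.Stmt, error) {
	return nil, errors.New("stub: statements not supported")
}
func (stubConn) Close() error              { return nil }
func (stubConn) Begin() (driver.Tx, error) { return stubTx{}, nil }
func (stubTx) Commit() error               { return nil }
func (stubTx) Rollback() error             { return nil }

func init() { sql.Register("stub", stubDriver{}) }

var errWork = errors.New("work failed")

// leakyUpdate mirrors the bug: on error it returns without rolling back,
// so the transaction keeps its connection checked out of the pool.
func leakyUpdate(db *sql.DB, fail bool) error {
	tx, err := db.Begin()
	if err != nil {
		return err
	}
	if fail {
		return errWork // BUG: tx is neither committed nor rolled back
	}
	return tx.Commit()
}

// safeUpdate defers Rollback immediately after Begin. After a successful
// Commit the deferred Rollback just returns sql.ErrTxDone and changes nothing.
func safeUpdate(db *sql.DB, fail bool) error {
	tx, err := db.Begin()
	if err != nil {
		return err
	}
	defer tx.Rollback()
	if fail {
		return errWork
	}
	return tx.Commit()
}

// inUseAfter calls f n times with a forced failure against a fresh pool and
// reports how many connections are still checked out afterwards.
func inUseAfter(f func(*sql.DB, bool) error, n int) int {
	db, err := sql.Open("stub", "")
	if err != nil {
		panic(err)
	}
	defer db.Close()
	for i := 0; i < n; i++ {
		f(db, true) // the returned error is expected and ignored here
	}
	return db.Stats().InUse
}

func main() {
	fmt.Println("leaky in-use connections:", inUseAfter(leakyUpdate, 3))
	fmt.Println("safe in-use connections: ", inUseAfter(safeUpdate, 3))
}
```

With the leaky version, every failed call strands one more connection, so with a `SetMaxOpenConns` limit in place the pool eventually fills up and new queries block, even though each individual query still completes quickly.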
