Not traffic. Not memory. Just database connections.
Here’s what happens: your app talks to the database through connections - think of them as phone lines. Most cloud databases limit these (Oracle Cloud: 300, CloudSQL varies by tier). Each user request might grab one. Sounds fine until it isn’t.
Real story #1: Built a WordPress hosting platform on CloudSQL. Load testing showed connection counts creeping toward the limit. Caught it in the metrics before any prod chaos. The fix: CloudSQL Proxy + connection pooling.
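For context, app-layer pooling looks roughly like this in Python with SQLAlchemy. This is a sketch, not the exact setup we ran - the URL and pool numbers are placeholders you'd tune to your own limits:

```python
from sqlalchemy import create_engine, text

# Placeholder URL - swap in your own driver, host, and credentials.
DB_URL = "postgresql+psycopg2://app:secret@127.0.0.1:5432/appdb"

engine = create_engine(
    DB_URL,
    pool_size=5,         # steady-state connections held per app instance
    max_overflow=5,      # temporary extras allowed under burst load
    pool_timeout=10,     # wait at most 10s for a free connection, then fail loudly
    pool_recycle=1800,   # refresh connections before the server drops idle ones
    pool_pre_ping=True,  # detect dead connections instead of handing them out
)

# Requests borrow from the pool and give back on exit -
# no per-request connect/disconnect.
with engine.connect() as conn:
    conn.execute(text("SELECT 1"))
```

The key point: pool_size × number of app instances has to stay comfortably under the database's connection cap.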
Real story #2: Client’s site on Oracle Cloud (300 connection limit). Ran fine for months. Then one Monday morning: dead. No errors. No warnings. Just… stopped responding. Database showed 333 active connections, pegged at max, no response. Restarting the DB instance hung. Support spent an hour manually resurrecting it. Back online: 4 connections. Four. Out of 300.
Wrong thoughts:
- “We’re not big enough for this problem” (size doesn’t matter - bad connection handling does)
- Treating SaaS databases like they’re infinite
- No monitoring on DB metrics because “it’s managed”
- No connection pooling in the app layer
What would be better:
- Connection pooling (PgBouncer, ProxySQL, cloud-native proxies)
- Proper connection timeout handling in your app
- Monitoring + alerts on connection count, even on SaaS (see the sketch after this list)
- Cache layers to reduce DB pressure
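Here's a minimal sketch of the monitoring piece, assuming Postgres (pg_stat_activity); on MySQL you'd count rows in information_schema.processlist instead. The DSN, LIMIT, and alert hook are placeholders for whatever you actually run:

```python
import psycopg2

DSN = "dbname=appdb user=monitor host=127.0.0.1"  # placeholder DSN
LIMIT = 300      # your plan's connection cap
ALERT_AT = 0.8   # warn at 80% utilization

def check_connections() -> None:
    conn = psycopg2.connect(DSN)
    try:
        with conn.cursor() as cur:
            # Postgres view of current connections.
            cur.execute("SELECT count(*) FROM pg_stat_activity;")
            (active,) = cur.fetchone()
    finally:
        conn.close()  # don't let the watchdog leak connections itself

    if active >= LIMIT * ALERT_AT:
        # Wire this into whatever alerting you already have (Slack, PagerDuty, email).
        print(f"WARNING: {active}/{LIMIT} database connections in use")

if __name__ == "__main__":
    check_connections()
```

Run it on a cron or as a scheduled job. Five lines of cron beats an hour of support resurrecting a hung instance.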
You don’t need perfect architecture on day one. But monitoring connection usage? That’s baseline observability, not optional.
Got an "oh crap" database story of your own? Drop it in the comments - let's learn from each other's pain.
