No context. Which project, which app - not specified. This client has four projects and ten apps.
Found the right one.
Disk - 100%. Replica - read-only:on. Checking table sizes:
┌────────────────┬───────┐
│ exception_logs │ 33 GB │
│ failed_jobs    │ 15 GB │
│ jobs           │ 22 GB │
└────────────────┴───────┘
Why would you write logs to the same database as your actual data?
Fine. Let’s see what’s in those logs:
▸ cURL error 6: Could not resolve host: teams-webhook-url
Someone added this to helm values:
▸ TEAMS_WEBHOOK_URL: "teams-webhook-url"
Literally the string “teams-webhook-url”. The idea - pull the value from Bitbucket Pipelines variables. Except nobody added --set in the pipeline.
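A placeholder like this can be caught at startup instead of at 100% disk. A minimal sketch, in Python purely for illustration (the app's actual language and framework are not shown here). The function names are hypothetical, and the "host must contain a dot" rule is an assumption that would also reject hosts like localhost:

```python
from urllib.parse import urlparse


def is_usable_webhook_url(value: str) -> bool:
    """Return True only for an absolute http(s) URL whose host contains a dot."""
    parsed = urlparse(value.strip())
    host = parsed.hostname or ""
    # A bare placeholder such as "teams-webhook-url" has no scheme and no host.
    return parsed.scheme in ("http", "https") and "." in host


def require_webhook_url(env: dict) -> str:
    """Fail fast at startup if TEAMS_WEBHOOK_URL is missing or a placeholder."""
    value = env.get("TEAMS_WEBHOOK_URL", "")
    if not is_usable_webhook_url(value):
        raise ValueError(f"TEAMS_WEBHOOK_URL is not a usable URL: {value!r}")
    return value
```

Called once at boot with the process environment, for example `require_webhook_url(dict(os.environ))`, a forgotten override becomes a failed deploy rather than a runtime error on every exception.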
So it went:
exception → writes log to DB → tries to send to Teams → host doesn’t resolve → exception → writes log → tries to send → …
Until it filled the disk.
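One way to break a loop like this: the reporter must never report its own failures. A rough sketch, again in Python for illustration only. `report_exception`, `write_log` and `send_to_teams` are hypothetical names, not the app's real code, and the sketch assumes the reporter and the notifier run in the same thread:

```python
import threading

# Per-thread flag: are we already inside the exception reporter?
_state = threading.local()


def report_exception(exc, write_log, send_to_teams):
    """Log an exception and notify Teams without re-reporting notifier failures.

    Returns True if the exception was logged, False if it was dropped
    because it was raised while another report was in progress.
    """
    if getattr(_state, "reporting", False):
        # Raised while reporting: drop it instead of logging and notifying again.
        return False
    _state.reporting = True
    try:
        write_log(exc)
        try:
            send_to_teams(exc)
        except Exception:
            # A failed notification must not turn into a new logged exception.
            pass
        return True
    finally:
        _state.reporting = False
```

With this guard, an unresolvable webhook host costs one log row per original exception, not an unbounded chain.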
Tried to clean the tables. It’s a managed database on OVH: no superuser access, OVH doesn’t allow it. Great feature. And without superuser, you can’t write to a read-only database.
Dropped the database. Restored from backup.
The error notification system took down the database.
There was a backup. Lucky.