Prometheus is free. Until you actually count.
You look at your cluster and start counting: wait, what are all these pods?
prometheus-0 3387 MB
prometheus-1 3028 MB
thanos-store-0 439 MB
thanos-store-1 899 MB
thanos-query 486 MB
thanos-query 139 MB
thanos-compact 300 MB
thanos-bucket 13 MB
grafana 334 MB
kube-state-metrics 40 MB
prometheus-operator 21 MB
prometheus-adapter 24 MB
blackbox-exporter 16 MB
node-exporter ×8 ~10 MB each
That’s just monitoring. ~9 GB RAM, 21 pods.
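If you want to check the count yourself, the list above adds up in a few lines. A minimal sketch: the numbers are copied from the list, node-exporter is treated as 8 pods at a flat 10 MB each (the list only says "~10 MB"), and the variable names are just for illustration.

```python
# Memory figures copied from the pod list above, in MB.
# A list of pairs rather than a dict, because thanos-query appears twice.
PODS_MB = [
    ("prometheus-0", 3387),
    ("prometheus-1", 3028),
    ("thanos-store-0", 439),
    ("thanos-store-1", 899),
    ("thanos-query", 486),
    ("thanos-query", 139),
    ("thanos-compact", 300),
    ("thanos-bucket", 13),
    ("grafana", 334),
    ("kube-state-metrics", 40),
    ("prometheus-operator", 21),
    ("prometheus-adapter", 24),
    ("blackbox-exporter", 16),
]
NODE_EXPORTER_PODS = 8
NODE_EXPORTER_MB_EACH = 10  # assumption: "~10 MB each" taken as exactly 10

# Total memory and pod count for the whole monitoring stack.
total_mb = sum(mb for _, mb in PODS_MB) + NODE_EXPORTER_PODS * NODE_EXPORTER_MB_EACH
pod_count = len(PODS_MB) + NODE_EXPORTER_PODS

# Share of the total used by the two Prometheus replicas alone.
prometheus_mb = sum(mb for name, mb in PODS_MB if name in ("prometheus-0", "prometheus-1"))
prometheus_share = prometheus_mb / total_mb

print(f"{total_mb} MB across {pod_count} pods")            # 9206 MB across 21 pods
print(f"Prometheus replicas: {prometheus_share:.0%} of it")  # Prometheus replicas: 70% of it
```

A little over 9 GB, and roughly two thirds of it is the two Prometheus replicas.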
kube-prometheus-stack deploys all of this with a single command. Defaults are almost reasonable - you just need to enable storage and set resource limits for your cluster. Neither is configured out of the box.
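Here is roughly what "enable storage and set resource limits" looks like as a values override. A sketch, not a recommendation: the sizes (50Gi, 4Gi, 6Gi, 500m, 30d) are placeholders to replace with numbers from your own cluster, and the helper name `monitoring_values` is made up. The keys under `prometheus.prometheusSpec` follow the chart's values layout as I understand it, so check them against the chart version you run.

```python
import json


def monitoring_values(storage="50Gi", retention="30d",
                      mem_request="4Gi", mem_limit="6Gi", cpu_request="500m"):
    """Build a values override for kube-prometheus-stack.

    All sizes are placeholder assumptions, not tuned recommendations.
    """
    return {
        "prometheus": {
            "prometheusSpec": {
                # How long Prometheus keeps data on disk.
                "retention": retention,
                # Requests and limits for the Prometheus container.
                "resources": {
                    "requests": {"cpu": cpu_request, "memory": mem_request},
                    "limits": {"memory": mem_limit},
                },
                # Persistent storage via a PVC template.
                "storageSpec": {
                    "volumeClaimTemplate": {
                        "spec": {
                            "accessModes": ["ReadWriteOnce"],
                            "resources": {"requests": {"storage": storage}},
                        }
                    }
                },
            }
        }
    }


values = monitoring_values()
print(json.dumps(values, indent=2))
```

JSON is valid YAML, so the printed output can be saved to a file and passed to Helm as a values file. Keeping it in code makes it easy to generate per-environment sizes from one place.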
On a three-node cluster, the monitoring stack easily takes 20-30% of cluster resources. Before you’ve monitored a single application pod. On larger projects we usually dedicate a separate node pool to monitoring - at least two nodes. So monitoring doesn’t affect workloads. And workloads don’t affect monitoring.
This is no longer just a few extra pods. These are separate machines running around the clock.
The biggest consumer is Prometheus. RAM and disk depend on three things: how many active time series you have, how often you scrape, how long you retain.
Rough math: 100,000 series × 30 days retention × 15s scrape interval ≈ 25-40 GB of disk. That’s a small cluster. A busy one - 500K series → 200-300 GB. And that’s just Prometheus.
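To redo the rough math with your own numbers, the Prometheus storage docs give a sizing formula: disk ≈ retention in seconds × ingested samples per second × bytes per sample, with an average of 1-2 bytes per sample after compression. A minimal sketch of that formula, counting each metric as one active time series; the function name is made up for illustration.

```python
SECONDS_PER_DAY = 86_400


def estimate_disk_gb(active_series, retention_days, scrape_interval_s, bytes_per_sample):
    """Disk estimate: retention seconds x samples per second x bytes per sample."""
    samples_per_second = active_series / scrape_interval_s
    retention_seconds = retention_days * SECONDS_PER_DAY
    return retention_seconds * samples_per_second * bytes_per_sample / 1e9


# 30 days retention, 15s scrape interval, at 1 and 2 bytes per sample.
for series in (100_000, 500_000):
    low = estimate_disk_gb(series, 30, 15, 1.0)
    high = estimate_disk_gb(series, 30, 15, 2.0)
    print(f"{series:>7,} series: {low:.0f}-{high:.0f} GB")
# 100,000 series: 17-35 GB
# 500,000 series: 86-173 GB
```

Treat this as a floor. It covers compressed samples only; the WAL, the index, and series churn from pods restarting come on top, which is one plausible reason real numbers land higher.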
Maintenance costs get counted less often than resource costs.
Stack upgrades - quarterly is reasonable, breaking changes happen (minor versions too, even though they shouldn’t). Alert rules go stale as services change - an alert for a service that no longer exists is just noise. Grafana dashboards need someone to maintain them. Usually nobody does. Then “why isn’t this showing data?” - the service was renamed three months ago.
Realistically: 2-4 hours a month when things are healthy. Plus incident time.
Prometheus is free.
You’re not paying for Datadog - you never bought it.
You pay in engineering time, cluster resources, and the ongoing attention it takes to keep all of this working.
Do you know what it’s actually costing you?