Prometheus is free. Until you actually count.

You look at your cluster and start counting: wait, what are all these pods?

prometheus-0           3387 MB
prometheus-1           3028 MB
thanos-store-0          439 MB
thanos-store-1          899 MB
thanos-query            486 MB
thanos-query            139 MB
thanos-compact          300 MB
thanos-bucket            13 MB
grafana                 334 MB
kube-state-metrics       40 MB
prometheus-operator      21 MB
prometheus-adapter       24 MB
blackbox-exporter        16 MB
node-exporter ×8         ~10 MB each

That’s just monitoring. ~9.2 GB RAM, 20+ pods.
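If you want to check the count yourself, the list above tallies in a few lines of Python. This is a rough sketch: the numbers are copied from the list, node-exporter is assumed to be a flat 10 MB per pod, and the two thanos-query entries get made-up suffixes only so the dictionary keys stay unique.

```python
# Memory per pod in MB, copied from the list above.
# The -a/-b suffixes on thanos-query are hypothetical, added only to keep
# the dictionary keys unique.
pods_mb = {
    "prometheus-0": 3387,
    "prometheus-1": 3028,
    "thanos-store-0": 439,
    "thanos-store-1": 899,
    "thanos-query-a": 486,
    "thanos-query-b": 139,
    "thanos-compact": 300,
    "thanos-bucket": 13,
    "grafana": 334,
    "kube-state-metrics": 40,
    "prometheus-operator": 21,
    "prometheus-adapter": 24,
    "blackbox-exporter": 16,
}
NODE_EXPORTER_PODS = 8
NODE_EXPORTER_MB = 10  # assumption: "~10 MB each" taken as exactly 10

total_mb = sum(pods_mb.values()) + NODE_EXPORTER_PODS * NODE_EXPORTER_MB
total_pods = len(pods_mb) + NODE_EXPORTER_PODS

print(f"{total_pods} pods, {total_mb} MB (~{total_mb / 1000:.1f} GB)")
```

The total lands a little above 9 GB across 21 pods.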


kube-prometheus-stack deploys all of this with a single command. The defaults are almost reasonable - you just need to enable persistent storage and set resource requests and limits sized for your cluster. Neither is configured out of the box.
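For illustration, those two missing pieces could be filled in with a small values override. The sketch below builds one in Python and prints it as JSON, which Helm can read as a values file because JSON is valid YAML. The key paths follow the chart's prometheus.prometheusSpec section; every size, the retention period, and the CPU request are placeholder assumptions, not recommendations.

```python
import json

# Placeholder sizing: every number here is an assumption to adjust per cluster.
values = {
    "prometheus": {
        "prometheusSpec": {
            "retention": "30d",
            # Requests and limits are not set by default.
            "resources": {
                "requests": {"cpu": "500m", "memory": "4Gi"},
                "limits": {"memory": "6Gi"},
            },
            # Without a storageSpec, Prometheus data lives on an emptyDir
            # volume and does not survive the pod being rescheduled.
            "storageSpec": {
                "volumeClaimTemplate": {
                    "spec": {
                        "accessModes": ["ReadWriteOnce"],
                        "resources": {"requests": {"storage": "50Gi"}},
                    }
                }
            },
        }
    }
}

values_json = json.dumps(values, indent=2)
print(values_json)
```

Saved to a file, the output can be passed to helm install or helm upgrade with the -f flag.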

On a three-node cluster, the monitoring stack easily takes 20-30% of cluster resources. Before you’ve monitored a single application pod. On larger projects we usually dedicate a separate nodepool for monitoring - at least two nodes. So monitoring doesn’t affect workloads. And workloads don’t affect monitoring.
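To see where a figure like that comes from, here is a rough sketch that divides the stack's memory by total cluster RAM. The node sizes are hypothetical, and the calculation uses raw node RAM rather than what Kubernetes actually leaves allocatable, so real shares will be somewhat higher.

```python
def monitoring_share(monitoring_gb: float, nodes: int, node_ram_gb: float) -> float:
    """Return the fraction of total cluster RAM used by the monitoring stack."""
    return monitoring_gb / (nodes * node_ram_gb)


MONITORING_GB = 9.2  # approximate sum of the pod figures listed earlier

for node_ram_gb in (8, 12, 16, 32):  # hypothetical node sizes
    share = monitoring_share(MONITORING_GB, nodes=3, node_ram_gb=node_ram_gb)
    print(f"3 x {node_ram_gb} GB nodes: {share:.0%} of cluster RAM")
```

With nodes in the 12-16 GB range, the share comes out roughly in the band quoted above. Smaller nodes push it well past that.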

This is no longer just a few extra pods. These are separate machines running around the clock.


The biggest consumer is Prometheus. RAM and disk depend on three things: how many active time series you ingest, how often you scrape, how long you retain.

Rough math for disk: 100,000 active series × 30 days retention × 15s scrape interval ≈ 25-40 GB. That’s a small cluster. A busy one - 500K series → 200-300 GB. And that’s just Prometheus.
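The rough math can be reproduced with the sizing rule from the Prometheus storage documentation: retention time in seconds × ingested samples per second × bytes per sample, with samples averaging about 1-2 bytes after compression. The sketch below treats each metric as one active time series scraped at a fixed interval, which is a simplification.

```python
def disk_estimate_gb(active_series: int, retention_days: int,
                     scrape_interval_s: int, bytes_per_sample: float) -> float:
    """Estimate TSDB disk usage in GB (10^9 bytes) for compressed samples only."""
    samples_per_second = active_series / scrape_interval_s
    retention_seconds = retention_days * 24 * 60 * 60
    return samples_per_second * retention_seconds * bytes_per_sample / 1e9


for series in (100_000, 500_000):
    # 1-2 bytes per sample is the typical range; actual compression varies.
    low = disk_estimate_gb(series, 30, 15, bytes_per_sample=1.0)
    high = disk_estimate_gb(series, 30, 15, bytes_per_sample=2.0)
    print(f"{series:,} series: {low:.0f}-{high:.0f} GB")
```

This gives the floor for compressed samples only. The figures quoted above sit at or beyond the top of that range, which would be consistent with headroom for the write-ahead log, index, and series churn, though that is a guess about how they were derived.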


Maintenance costs get counted less often than resource costs.

Stack upgrades - quarterly is reasonable, and breaking changes happen (in minor versions too, even though they shouldn’t). Alert rules go stale as services change - an alert for a service that no longer exists is just noise. Grafana dashboards need someone to maintain them. Usually nobody does. Then someone asks “why isn’t this showing data?” - the service was renamed three months ago.
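Part of that staleness check can be automated. The sketch below is one possible approach, not a standard tool: it scans alert expressions for exact job="..." matchers and reports any that reference a job no longer being scraped. The alert names, expressions, and job list are all hypothetical, and regex matchers (job=~"...") are deliberately ignored.

```python
import re

# Matches exact-equality matchers such as job="checkout"; skips job=~ and job!=.
JOB_MATCHER = re.compile(r'\bjob\s*=\s*"([^"]+)"')


def find_stale_alerts(alert_exprs: dict[str, str],
                      live_jobs: set[str]) -> dict[str, list[str]]:
    """Map alert name -> referenced job names that are no longer scraped."""
    stale = {}
    for name, expr in alert_exprs.items():
        missing = sorted(set(JOB_MATCHER.findall(expr)) - live_jobs)
        if missing:
            stale[name] = missing
    return stale


# Hypothetical data: in practice these would come from your rule files and
# from the list of jobs Prometheus currently scrapes.
alert_exprs = {
    "CheckoutHighErrorRate": 'rate(http_requests_total{job="checkout", code=~"5.."}[5m]) > 1',
    "PaymentsDown": 'up{job="payments-api"} == 0',
}
live_jobs = {"checkout-service", "payments-api"}  # "checkout" was renamed

print(find_stale_alerts(alert_exprs, live_jobs))
```

Here the first alert gets flagged because it still points at the old job name. The same idea applies to dashboard queries.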

Realistically: 2-4 hours a month when things are healthy. Plus incident time.


Prometheus is free.

You’re not paying for Datadog - you never bought it.

You pay for Prometheus in engineering time, cluster resources, and the ongoing attention it takes to keep this working.

Do you know what it’s actually costing you?