Every Day I Find Something New. Or a Variation of an Old Fuckup

Today I’m a DevOps/Cloud Architect, and when I mentor juniors, one of the first topics is “never use :latest in production.” But I learned this the hard way, in practice.

Some Context

We were the logging/monitoring team, working with Azure Kubernetes. We built custom OpenSearch images: the pipeline built the image, tagged it with a version, and moved the :latest tag to it.

After each build, we manually updated the version in the Helm chart for deployment. One day my colleague and I watched yet another StatefulSet restart and thought: “Why bother? Let’s just use :latest. We deploy right after the build anyway.”

We changed it. The pull policy was Always, so it seemed fine.

Things we realised

It lasted two deploys in the dev environment. Then the issues emerged:

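  • Can’t roll back easily: the previous deploy also says :latest, so there is no known version to go back to
  • Can’t test images separately before deployment: whatever the pipeline builds goes straight to deploy
  • Most important: dev was “dev” for the developers, but production for us. They still needed their logs :)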

The fix was quick: we reverted to explicit versions. They just work.
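
For illustration, here is roughly what “explicit versions” looks like in Helm values. This is a minimal sketch, not our actual chart: the registry, image name, version string, and the image.repository / image.tag / image.pullPolicy key layout are all assumptions (the layout is a common convention, but it depends on how your chart is written).

```yaml
# values.yaml (hypothetical; key names depend on your chart's templates)
image:
  repository: myregistry.example.com/logging/opensearch  # assumed registry and image name
  tag: "2.11.1-build.42"        # assumed version string; bumped after each pipeline build
  pullPolicy: IfNotPresent      # a versioned tag is not expected to move, so no forced re-pull
```

With a versioned tag in the chart, every Helm revision records which image it deployed, so helm rollback actually changes the image. With :latest, the old and the new revision reference the same tag, and rolling back the release changes nothing about what gets pulled.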

Recent Example

A month ago I helped a team with a MongoDB crash in dev:

image: mongo  # no tag, which means :latest is used

The node went down for maintenance, the pod moved to another node, and the image was pulled again. Mongo v8 had been released 16 days earlier, and the :latest tag had moved to it. The pod started with the new version, but the storage held a database from the old one, and the two were incompatible. Dev was down for hours.
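
The fix for a case like this is a one-line change. A minimal sketch, assuming the data was created by a 7.0 release (I’m not stating which version that team actually ran; the container name is hypothetical too):

```yaml
# Fragment of a pod template (hypothetical; only the image name comes from the story above)
containers:
  - name: mongo
    image: mongo:7.0    # assumed version; pin to the release line your data files were created with
```

Note that mongo:7.0 still moves between patch releases. If that is too loose, pin a full patch version or an image digest (image: mongo@sha256:<digest>), and upgrade major versions deliberately rather than on a node restart.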

Developers were more surprised than upset - they work locally in docker compose anyway, so for them it was just “huh, interesting, didn’t know this could happen”.


Now I always use explicit versions in manifests. :latest is only for local development, and only when I’m too lazy to type a new tag for every rebuild.

When I want to drive the point home, or explain the nuances, I mention that the :latest tag isn’t “latest stable”, it’s “whatever sits under this tag right now”. Google “:latest problems” and you’ll find tons of “10 ways to shoot yourself in the foot” stories.

What’s your :latest story?