“Prod is down” is not a task. It’s the beginning of a conversation. Does DevOps know which prod is down — or are they waiting for someone to explain?
I was scrolling LinkedIn. Saw a post — “many DevOps engineers don’t know” — and a question: “prod is down, what’s your first move?”
Answered in the comments: “I’d ask — which prod exactly?”
Almost nobody agreed. Most people explained how they’d check something in AWS.
Guys, what made you assume it’s AWS?
One of my clients runs on AWS, GCP, Alibaba, and AWS-CN. How exactly are AWS logs going to help me?
Ideally, a monitoring alert would have answered that question already. But someone had to call, so it didn't.
So. Prod is down. What do I do first? I ask: what exactly went down, what does the customer see, how long has it been?
Then — documentation, runbooks. Then logs and metrics.
The questions and the runbooks come before the logs, not after.
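For anyone who prefers it as a checklist, here is a minimal Python sketch of that order. The class, field, and function names are hypothetical, made up for illustration; they are not from any real incident tool.

```python
from __future__ import annotations

from dataclasses import dataclass, fields


@dataclass
class IncidentIntake:
    """Answers to collect before touching any logs (hypothetical field names)."""
    which_prod: str | None = None       # e.g. "AWS-CN" or "GCP"
    what_went_down: str | None = None   # the failing service or feature
    customer_sees: str | None = None    # the symptom the customer reports
    down_since: str | None = None       # how long it has been going on


def open_questions(intake: IncidentIntake) -> list[str]:
    """Return the names of the questions that are still unanswered."""
    return [f.name for f in fields(intake) if not getattr(intake, f.name)]


def next_step(intake: IncidentIntake) -> str:
    """Questions first, then runbooks, then logs and metrics."""
    missing = open_questions(intake)
    if missing:
        return "ask: " + ", ".join(missing)
    return f"open the runbook for {intake.which_prod}, then check its logs and metrics"
```

With an empty `IncidentIntake()`, `next_step` returns the questions still to ask. Only a fully answered intake points you at a runbook, and only then at the logs of the right environment.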
Whoever skips the questions is fighting a fire in the wrong building.