I spent two hours doing it right.
Terraform. CodeBuild. IAM roles. VPC endpoints.
I was wrong.
We have an AWS China account. One app for Chinese users.
Two ECS services. 0.25 CPU. Minimal memory. App + MongoDB.
Around it: ALB, Service Discovery, ECR, VPC, EFS, a couple of VPC endpoints.
~1800 CNY/month. Everything works.
Docker image is built on CircleCI, pushed to AWS CN ECR.
Then pushes started timing out at 10 minutes.
The image is 120 MB - not large. The problem isn’t size.
It’s that AWS China sits behind the Great Firewall.
External bandwidth is unpredictable. CircleCI in the US pushing to cn-northwest-1 is a lottery.
My idea: move to CodeBuild. The logic was solid.
CircleCI uploads only the source - 5 MB. Triggers CodeBuild inside CN.
CodeBuild builds the image and pushes to ECR. External channel carries code, not a 120 MB image.
Monday, 4pm. Two hours left in the workday.
I started. Got absorbed.
Additional Terraform module. Terragrunt parameters. IAM roles - always their own story: AWS permissions are genuinely complex, and there’s a reason people treat them as a separate topic.
When I finished and opened the architecture diagram, it felt good to look at. Clean. Logical. External channel carries only source code. The image never crosses the firewall.
Time to deploy. I always pause here - step back, review everything once more, sometimes talk it through out loud. It catches things you miss while building.
It caught something.
ECR VPC endpoints. Two of them.
I’d already thought about this when first setting up ECS. It felt odd: ECS doesn’t work without ECR - why pay for two additional endpoints? I accepted it then and moved on.
But what, exactly, was I adding them for now?
Then I ran the numbers.
Each VPC endpoint: $0.01/hour per AZ. Two AZs over a ~730-hour month - about $14.60/month per endpoint. Two endpoints - about $29/month total.
Image is built twice a year.
$348/year. For two builds.
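A rough sketch of that arithmetic in Python. The rate, AZ count, endpoint count and build frequency are the figures above; the 730-hour month and the function name are my assumptions, and data processing fees are left out.

```python
HOURS_PER_MONTH = 730  # assumption: average number of hours in a month


def endpoint_cost_per_month(rate_per_az_hour: float, azs: int, endpoints: int) -> float:
    """Fixed monthly charge for interface endpoints (data processing fees ignored)."""
    return rate_per_az_hour * azs * endpoints * HOURS_PER_MONTH


monthly = endpoint_cost_per_month(0.01, azs=2, endpoints=2)
yearly = monthly * 12
per_build = yearly / 2  # the image is built twice a year

print(f"{monthly:.2f}/month, {yearly:.2f}/year, {per_build:.2f} per build")
```

Without rounding the monthly figure down to $29, the yearly total comes out closer to $350 than $348. Either way, it is roughly $175 per build.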
Removed both - works without them anyway.
Fine. But then I kept thinking.
I spent two engineering hours on this. That’s not free.
The CircleCI runner with a longer timeout - how much does it cost per run, twice a year?
Pennies.
Result: increased the timeout to 20 minutes. Closed the ticket.
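The comparison behind that decision fits in a few lines. A minimal sketch: the hourly rate, the CI price per minute and the 0.1 hours for the timeout change are made-up placeholders, not my real numbers, and CodeBuild's own per-minute charge is left out.

```python
def first_year_cost(setup_hours: float, hourly_rate: float,
                    recurring_per_month: float, per_run: float,
                    runs_per_year: int) -> float:
    """First-year cost of an option: setup time + fixed monthly fees + per-run fees."""
    return (setup_hours * hourly_rate
            + recurring_per_month * 12
            + per_run * runs_per_year)


HOURLY_RATE = 50.0          # hypothetical engineering rate, USD/hour
CI_PRICE_PER_MINUTE = 0.01  # hypothetical CI runner price, USD/minute

# Option A: CodeBuild inside CN as first designed, with two ECR endpoints (~$29/month).
codebuild = first_year_cost(setup_hours=2, hourly_rate=HOURLY_RATE,
                            recurring_per_month=29.0, per_run=0.0,
                            runs_per_year=2)

# Option B: keep CircleCI and allow up to 20 minutes per push.
longer_timeout = first_year_cost(setup_hours=0.1, hourly_rate=HOURLY_RATE,
                                 recurring_per_month=0.0,
                                 per_run=20 * CI_PRICE_PER_MINUTE,
                                 runs_per_year=2)

print(f"CodeBuild: {codebuild:.2f}, longer timeout: {longer_timeout:.2f}")
```

Plug in your own rates. With only two builds a year, the fixed costs dominate whatever the per-run price is.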
This isn’t a story about laziness.
It’s about how easy it is to start doing things right - and not notice that the right solution costs more than the problem.
New service. New permissions. New endpoints. New costs. Each step makes sense. The total doesn’t.
Infrastructure grows complex not because anyone means harm.
But because each individual decision looks reasonable.
That’s exactly what I review in audits - not individual services, but the sum of decisions.
→ itaudit.yushkov.org