Is it worth investing in a replicated AWS environment or a disaster recovery setup beyond US-EAST-1?
The Amazon outage yesterday was pretty spectacular – with systems down for hours at a time, and data lagging or missing for over 24 hours in certain cases.
Public outrage was everywhere.
Is it justified? 👇
AWS maintains over 300 different SLAs for their infrastructure and products. In certain cases, these outages still fell within the promised uptime or reliability thresholds, because hours of delays and “partial interruption” are not counted as complete downtime.
🏢 AWS-hosted companies losing revenue can:
1. Accept it as a cost of running a business and move on
2. Migrate to a different provider/data center and hope their SLA ends up being better (a random guess in a non-replicated environment)
3. Budget for a replicated environment – a secondary region within AWS or one outside of AWS if the setup is too cumbersome
We measure the “Cost of Inaction” against the aggregated cost of engineering, provisioning, server costs, and maintenance for the last scenario.
Companies that lost $100K or less yesterday may find the loss too small to justify setting up a replicated environment with all the pipelines, database syncs, routing, messaging, and failover switches it needs. Simple systems relying only on EC2 and/or RDS may be easier to provision and maintain, but systems that simple are rare.
Companies that lost $1M or more should weigh the cost of inaction against the long-term impact on the business: additional churn, the risk of swapping vendors, or further complications with revenue generation.
While outages are unpredictable at best, maintaining a secondary environment may cost $150K in one-off provisioning, plus a $100K/year AWS bill and a $50K to $100K annual maintenance/support/emergency fee.
Your mileage may vary, but the formula remains comparable.
…
And the easiest workaround for Amazon now is to introduce a one-click “run a second replica of your entire AWS environment” button: provisioned, synced, and routed internally from AWS with no human supervision. 👀

