Post Mortem lessons from Amazon
Last week's AWS outage caused a surge in discussions about cloud reliability. I believe it shows that using a cloud service provider is much like using any other IT service: you must do your homework on how to deal with failures, and you need appropriate vendor management procedures to choose wisely.
Having said that, Amazon is still the leader in cloud computing services, and in my opinion their behaviour in reacting to this incident clearly shows why. They have just published an extremely detailed Post Mortem analysis that presents the root causes, explains what is being done to avoid similar events in the future, and offers reasonable compensation to affected clients. It's also worth pointing out that they disclosed the root cause even though it was a mistake made during a change, which is a very honest posture from my point of view.
If all service providers behaved like that, we would definitely keep seeing more business move to the cloud. Congratulations to Amazon.