Channel: Zoli's Sandbox » krishnan

Some Lessons From the AWS Outage


Yesterday’s AWS outage is still buzzing around the tech blogosphere more than 24 hours later. As usual, cloud naysayers are up in arms, determined not to miss this golden opportunity to spread FUD, and Amazon’s competitors are capitalizing on its misery to push their own services. Well, people have come to accept this as a legitimate strategy in a free market. Without ranting any further about it, or spending time on how Amazon botched this up big time, I want to talk about some of the lessons we can learn from this outage.

Before we talk about the lessons learned from this AWS debacle, I want to emphasize one difference between the cloud world and the traditional IT world. Amid the FUD and noise surrounding the outage, many people miss this important advantage of the cloud. In traditional IT, any DR plan carries significant costs because you have to provision the additional servers (or datacenters) needed for recovery well in advance. This not only adds significantly to capital expenses but also substantially to operating expenses. Even if your IT is run by a managed provider, you spend a lot of money reserving capacity for possible DR needs. In a cloud-based environment, by contrast, if you keep a current backup of your data in another location, you can switch on the processing power just by swiping your credit card, with no need to provision ahead of time or to wait a long time after the disaster. This is a very important advantage of the cloud: when disaster strikes, you can recover with minimal monetary pinch (provided the DR plan is solid).

Yesterday’s EC2 outage exposed how many startups are running without a proper DR strategy. It is a shame that some well-funded startups didn’t bother to plan for such eventualities. I guess this outage will teach startups (and their investors) a good lesson and prepare them for the next disaster. There are many lessons we can learn from yesterday’s outage, but I want to highlight some key ones in this post. After all, CloudAve is one of the well-respected blogs on cloud computing, and we cannot shy away from a topic that has reached even the consumer media.

The following are the key lessons we should learn from the episode:

  • Even though I don’t like the idea of coding for failure, just do it. When we shop at Walmart, we clearly understand that we accept a compromise in quality in exchange for low prices. If we want to take advantage of public clouds built on commodity servers, there is no option but to code for failure.
  • Now imagine me jumping up and down the stage like Steve Ballmer, shouting “DR, DR, DR, DR, ……….”. Well, a proper DR strategy is key to any cloud plan. As I pointed out above, cloud computing offers some cost advantages when planning for disaster recovery. In spite of that advantage, we have seen many businesses get hit by the AWS outage. There are many reasons why this happened. The picture painted by cloud evangelists (including myself in the past) gave the impression that the cloud is failproof. The greater emphasis on devs over ops bred a kind of complacency. People started believing religiously that the cloud removes ops from the picture entirely and that everything works automagically. All this evangelism-driven dogma led people not to worry about DR at all. I am glad that this failure has woken people up from their complacency.
  • SLAs are important, but what matters is how you have negotiated the compensation. This is one of the reasons I promote federated clouds over consolidation. When there are only a handful of infrastructure players, they will not care about compensating for losses during an outage unless the customers are Fortune 500 companies. We need providers who differentiate their offerings on the basis of how they compensate. For this to happen, we need large-scale competition, not consolidation. Only federated clouds can ensure a marketplace where customers are not screwed by cloud downtime.
  • Make geographical redundancy, and proximity to another cloud provider, your key mantra when planning your DR strategy.
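To make “code for failure” and geographical redundancy a little more concrete, here is a minimal sketch. It is not anything AWS provides; it assumes your application can reach an equivalent deployment in more than one region (or with more than one provider), and the names `call_with_failover`, `AllRegionsFailed`, `primary-region` and `backup-region` are all hypothetical. It retries each location a few times with exponential backoff, then fails over to the next one.

```python
import time
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


class AllRegionsFailed(Exception):
    """Hypothetical error raised when every configured location has failed."""


def call_with_failover(
    regions: Sequence[str],
    request: Callable[[str], T],
    retries_per_region: int = 2,
    base_delay: float = 0.1,
) -> T:
    """Try `request` against each region in order.

    Each region gets `retries_per_region` attempts with exponential backoff
    between them; only after those are exhausted do we fail over to the next.
    """
    errors = []
    for region in regions:
        for attempt in range(retries_per_region):
            try:
                return request(region)
            except Exception as exc:  # treat any error as a failed attempt
                errors.append((region, attempt, exc))
                # Back off base_delay, 2*base_delay, ... but don't sleep
                # after the last attempt in this region.
                if attempt + 1 < retries_per_region:
                    time.sleep(base_delay * (2 ** attempt))
    raise AllRegionsFailed(errors)


def fake_request(region: str) -> str:
    # Simulate the primary region being down, as during an outage.
    if region == "primary-region":
        raise ConnectionError("primary unavailable")
    return f"served from {region}"


print(call_with_failover(["primary-region", "backup-region"], fake_request, base_delay=0.0))
# prints: served from backup-region
```

The failover only helps if the backup location already has current data, which is exactly the DR point above: compute can be switched on on demand, but the data has to be replicated ahead of time.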

Whether we like it or not, customers are just as responsible for outages as the cloud providers. Cloud is not a magic pill that solves the erection, sorry, scaling problems without any other worries. As with pills for erection issues, the cloud, which helps with rapid infrastructure scaling, comes with some side effects. It is important that customers understand the compromises they have to make to take advantage of the benefits offered by cloud computing. Yesterday’s AWS outage is a good opportunity to take a step back and be realistic about our approach to the cloud.

