Cloudfail: Lessons Learned from the AWS Outage

The Amazon AWS outage has raised questions about whether AWS (and the cloud in general) is ready to host revenue-critical production applications. For many popular sites, such as Reddit and Zuora, the outage lasted more than a day, and it cast doubt on cloud computing as a whole.

But before we write off the cloud, let’s review a few lessons we can learn from this outage.

Some survived, many did not
The number one lesson is that not EVERY application running in AWS died. Netflix, one of the biggest web apps running in AWS, survived the outage without any issues, while sites like Reddit and Zuora were down for more than a day. So why did some survive when many did not? Simply because many of these companies forgot that the cloud is not a magical solution to everything. As you move into the cloud, you still have to implement the architectural techniques that have been perfected over years in the physical data center world.
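As a deliberately simplified illustration of one such technique, avoiding a single point of failure, the Python sketch below spreads instances round-robin across several hypothetical failure zones (`zone-a`, `zone-b`, `zone-c`) and reports how much capacity survives if one zone goes down entirely. The function names are illustrative only; this is not how Netflix or anyone else is known to have deployed, and it calls no AWS API.

```python
from collections import Counter


def spread_across_zones(instance_count: int, zones: list[str]) -> dict[str, int]:
    """Assign instances round-robin so no single zone holds all of them.

    Zone names are hypothetical placeholders, not real AWS identifiers.
    """
    if not zones:
        raise ValueError("need at least one zone")
    placement = Counter(zones[i % len(zones)] for i in range(instance_count))
    # Report every zone, including any that received zero instances.
    return {zone: placement.get(zone, 0) for zone in zones}


def surviving_capacity(placement: dict[str, int], failed_zone: str) -> int:
    """Count the instances still running if one whole zone goes dark."""
    return sum(n for zone, n in placement.items() if zone != failed_zone)


spread = spread_across_zones(10, ["zone-a", "zone-b", "zone-c"])
print(spread)                                # {'zone-a': 4, 'zone-b': 3, 'zone-c': 3}
print(surviving_capacity(spread, "zone-a"))  # 6

# Everything in one zone: losing that zone takes the whole app down.
single = spread_across_zones(10, ["zone-a"])
print(surviving_capacity(single, "zone-a"))  # 0
```

With all ten instances in one zone, losing that zone leaves nothing running; spread across three zones, 6 of 10 instances (60% of capacity) survive the same failure.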

“This Week in Cloud Computing” – Agility & the Cloud

I recently had a chance to join the webcast “This Week in Cloud Computing” and share some of my thoughts on cloud trends and application performance management. One thread of the conversation that I found particularly interesting was the discussion of agility in cloud computing. Although this theme comes up from time to time, most discussions I hear about cloud computing focus on cost-cutting and security.

These are extremely important concerns, of course; security in particular can be seen as a prerequisite for any sound cloud computing strategy. But there’s a “forest for the trees” risk in focusing so much on cloud computing’s pitfalls that we overlook its benefits, of which agility is certainly a major one.

Among our own customers, we’re seeing the need to be even more agile than before: scrums are becoming common, and engineering stand-ups are becoming a way of life. Any process change that speeds up the application deployment chain is more than just a “nice to have”; it’s a sea change in a company’s ability to deliver value to its end users.

Bernard Golden makes some interesting points about two types of cloud computing agility in this discussion on CIO; it’s definitely worth a read if you’re interested in the topic.


The Three Faces of the Hybrid Cloud

Most people, thankfully, have acknowledged that the cloud is real. For groups such as development and testing, it’s not only real but highly successful; we’re seeing our customers and prospects use public clouds, or build and deploy private clouds, at a much more rapid pace than even a year ago. The real question is this: when will companies start deploying their complex production environments in the cloud?

Some pundits believe that enterprises will ditch their data centers and move their complex production applications completely into the cloud. But this is not pragmatic. We have talked to hundreds of enterprises and the people responsible for their complex production applications, and from these conversations we have gained a sense of the overarching strategy that many companies are adopting. What we are seeing is a recurring scenario in which enterprises move production environments to the cloud, but through a physical-cloud “hybrid” approach rather than a “pure cloud” approach.

Here are the hybrid approaches that I believe we will begin to see in abundance:

1. Hybrid Physical-Cloud SOA: The business decides that it needs quick time-to-market for some new functionality, but the IT folks can’t procure new machines quickly enough to meet the deadline, so the business groups decide to host the new functionality in the cloud. These new cloud services communicate with the existing services and functionality in the data center.

2. Cloud Bursting: Companies use the cloud for temporary capacity during a spike in demand. For example, let’s say an eCommerce greeting card company needs 100 machines on Mother’s Day but doesn’t want to buy them, so it gets that capacity temporarily in the cloud.

3. Cloud Failover: Companies use the cloud for failover and disaster recovery.
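Cloud bursting and cloud failover both boil down to a placement decision. The Python sketch below is a minimal model under stated assumptions: capacity is counted in whole machines, the data center has a fixed capacity, and `place_load` and `Placement` are hypothetical names, not any cloud provider’s API. Demand beyond the data center’s capacity bursts to the cloud, and if the data center is down, all load fails over to the cloud.

```python
from dataclasses import dataclass


@dataclass
class Placement:
    """How many machines' worth of load runs in each environment."""
    datacenter_machines: int
    cloud_machines: int


def place_load(demand: int, datacenter_capacity: int,
               datacenter_up: bool = True) -> Placement:
    """Hypothetical hybrid placement rule.

    - Normal day: everything that fits stays in the data center.
    - Cloud bursting: demand above data center capacity spills into the cloud.
    - Cloud failover: if the data center is down, all load moves to the cloud.
    """
    if demand < 0 or datacenter_capacity < 0:
        raise ValueError("demand and capacity must be non-negative")
    if not datacenter_up:
        return Placement(datacenter_machines=0, cloud_machines=demand)
    in_house = min(demand, datacenter_capacity)
    return Placement(datacenter_machines=in_house, cloud_machines=demand - in_house)


# Hypothetical greeting card company with 20 machines in-house.
print(place_load(15, 20))                       # ordinary day: all in-house
print(place_load(120, 20))                      # Mother's Day: 100 machines burst to the cloud
print(place_load(15, 20, datacenter_up=False))  # data center down: everything in the cloud
```

A real deployment would also need to provision cloud capacity ahead of the spike and keep data synchronized for failover; the sketch deliberately ignores both.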

We are definitely seeing companies begin to move services to the cloud, but they don’t want to risk critical production apps responsible for $2 billion in revenue. For now, hybrid is the way to go.

But only for now…