Bad Premise: Cloud Outages are *not* driving IT back to premises


I wrote this in response to Lauren Carlson’s (Software Advice) blog post.  Lauren – I’d be more likely to agree with the statement that “SLAs are dead.”  Here’s why…

<soapbox>

Recent industry buzz about cloud service level agreements (SLAs) and reliability misses the core point about cloud.  Cloud is about agility, new business models, the consumerization of software, and the merciless pursuit of efficiency.

The fact that Amazon EC2 built its base without an “enterprise” SLA is exhibit #1 that the IT world has changed and is not going back.

Here are my reasons why IT Pandoras can’t get cloud back into the box.

#1. Cloud has vastly superior network connectivity

The concept of your users accessing your applications from inside your firewall is so 2005.  Today’s reality is that a significant amount of network access is externally routed, which means that applications need to live where they have excellent bandwidth to their users and to other applications.

#2. Cloud has elastic consumption of resources

Cloud is not less expensive infrastructure; it is mainly more flexible infrastructure.  If you’re worried about an outage, then cloud is exactly the investment for you, because you can position a backup site at another location without having to pay to keep those resources online.  It’s much harder to take down a site that invests the time to design a system that dynamically reallocates load between sites.
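To make that concrete, here is a minimal sketch of the elastic failover idea: the backup site costs nothing while idle and only gets launched when the primary fails a health check.  The URL is a placeholder and launch_backup_site() is a hypothetical stand-in for your provider’s launch API (e.g. EC2’s RunInstances), not a specific implementation.

```python
import urllib.request


def primary_is_healthy(url="https://primary.example.com/health"):
    """Probe the primary site; any HTTP error or timeout counts as down."""
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            return resp.getcode() == 200
    except OSError:
        return False


def launch_backup_site():
    """Hypothetical stand-in: call your cloud provider's launch API here
    (e.g. EC2 RunInstances).  The backup site costs nothing until this runs."""
    raise NotImplementedError


if __name__ == "__main__":
    if not primary_is_healthy():
        launch_backup_site()  # then repoint DNS/load at the new site
```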

#3. Cloud drives more robust architecture

The fact that cloud delivery is opaque and modular, and comes without a five-9s SLA, has driven a cloud application architecture revolution (see CAP).  We have shifted the application paradigm from robust scale-up hardware to robust scale-out software.  Also significant, DevOps innovations have made deployments repeatable and adaptable.
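The scale-out pattern is easy to see in code.  Here is a minimal sketch, with hypothetical hostnames, of an application that treats any single node as expendable and routes around faults in software instead of demanding five-9s hardware:

```python
import random
import urllib.request

# Hypothetical replica pool; any node can serve any request.
REPLICAS = [
    "https://app-1.example.com",
    "https://app-2.example.com",
    "https://app-3.example.com",
]


def fetch(path, attempts=3):
    """Try replicas in random order until one answers; no node is special."""
    last_error = None
    for host in random.sample(REPLICAS, min(attempts, len(REPLICAS))):
        try:
            with urllib.request.urlopen(host + path, timeout=2) as resp:
                return resp.read()
        except OSError as err:  # node down or partitioned; expected, not fatal
            last_error = err
    raise RuntimeError("all replicas failed") from last_error
```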

The only “logical” argument for pulling applications back from the cloud is to assert control over more of the delivery chain for your application.  It’s the same reason that we think driving is safer than flying – we’re the ones sitting behind the wheel when we drive.  News flash – driving is NOT safer than flying.

Cloud applications are not about hardware infrastructure, they are about SOFTWARE.  Perhaps one of the greatest disservices foisted on the market was making “cloud” synonymous with “Infrastructure as a Service” and “Virtualization.”  Cloud applications are powerful because we created ways to circumvent the limitations of IaaS and VMs!

</soapbox>

Notes from 2011 Cloud Connect Event Day 2 (#ccevent)

With the OpenStack launch behind me, I have some time to attend the Cloud Connect Event.  I missed all the DevOps sessions, but got to geek out on the NoSQL & Big Data sessions.  I jumped to the private cloud track (based on Twitter traffic) and was rewarded for the shift.

I’m surprised at how much of this cloud conference is dedicated to private cloud.  At other cloud conferences I’ve attended, the focus has been on learning how to use the cloud (specifically the public cloud).  This is the first cloud show I’ve attended with so much emphasis, dialog, and vendor feeding frenzy around private cloud.  This was a suits & slacks show with few jeans, t-shirts, and pony tails.  Perhaps private cloud is where the $$$ is being spent now?

It definitely feels like using cloud has become assumed, but the best practices and tools are just emerging.

The Twitter #ccevent stream is interesting but ephemeral.  I’m posting my raw (spelling optional) notes (below the more tag) because there is a lot of great content from the show to support and extend the Twitter stream.  I’ll try to italicize some of the better lines.


Cloud Gravity – launching apps into the clouds

Dave McCrory’s Cloud Gravity series (Data Gravity & Escape Velocity) brings up some really interesting concepts and has led to some spirited airplane discussions while Dell shuttled us to an end-of-year strategy meeting.  Note: whoever was on American 34, seats 22A/C – we apologize if we were too geek-rowdy for you.

Dave’s Cloud Gravity is the latest unfolding of how clouds are evolving as application architectures become more platform capable.  I’ve explored these concepts in previous posts (Storage Banana, PaaS vs IaaS, CAP Chasm) to show how cloud applications are using services differently than traditional applications.

Dave’s Escape Velocity post got me thinking about how cleanly Data Gravity fits with cloud architecture change and CAP theorem.

My first sketch shows how traditional applications are tightly coupled with the data they manipulate.  For example, most apps work directly on files or over a direct database connection.  These apps rely on very consistent and available data access.  They are effectively in direct contact with their data, much like a building resting on its foundation.  That works great until your building is too small (or too large).  In that case, you’re looking at a substantial time delay before you can expand your capacity.

Cloud applications have broken into orbit around their data.  They still have close proximity to the data, but they do their work via more generic network connections.  These connections add some latency, but allow much more flexible and dynamic applications.  Working within the orbit analogy, it’s much, much easier to realign assets in orbit (cloud servers) to help do work than to move buildings around on the surface.
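The coupling difference fits in a few lines of code.  This is just an illustration (the path and URL are hypothetical) contrasting an app bolted onto its data with one that reaches it over a generic HTTP object interface:

```python
import urllib.request


def load_orders_traditional(path="/mnt/san/orders.csv"):
    """Traditional: the app and its data share a filesystem, like a building
    sitting directly on its foundation."""
    with open(path) as f:
        return f.read()


def load_orders_cloud(url="https://objects.example.com/bucket/orders.csv"):
    """Cloud: the app orbits its data, paying a little network latency for
    the freedom to run anywhere with good bandwidth to the store."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return resp.read().decode()
```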

In the cloud application orbital analogy, components of applications may be located in close proximity if they need fast access to the data.  Other components may be located farther away depending on resource availability, price, or security.  The larger (or more valuable) the data, the more likely it is to pull applications into tight orbits.

My second sketch extends the analogy to show that our cloud universe is not simply point apps and data sources.  There is truly a universe of data on the internet, with huge sources (Facebook, Twitter, New York Stock Exchange, my blog, etc.) creating gravitational pull that brings other data into orbit around them.  Once again, applications can work effectively on data at stellar distances but benefit from proximity (“location does not matter, but proximity does”).

Looking at data gravity in this light leads me to expect a data race where clouds (PaaS and SaaS) seek to capture as much data as possible.

Exploding the Cloud Storage Banana

Storage Banana shows how cloud persistence is functionally diverse and optimized

Internally, my group (specifically Dave McCrory & Greg Althaus) has been kicking around some new ways of expressing clouds in an effort to help reconcile Dell’s traditional and cloud focused businesses.  We’ve found it challenging to translate CAP theorem and externalized application state into more enterprise-ready concepts.

Our latest effort led to a pleasantly succinct explanation of why cloud storage is different from enterprise storage.  Ultimately, it’s a matter of control and optimization.  Cloud persistence (Cache, Queue, Tables, Objects) is functionally diverse in order to optimize for price and performance, while enterprise storage (SAN, NAS, SQL) is driven by control and centralization.  Unfortunately for enterprises, the data genie is out of Pandora’s box with respect to architectures that drive much lower cost and higher performance.

The background on this irresistible transformation begins with seeing storage as a spectrum of services as per the table below.

| Category | Persistence Type | Protocols / Examples |
|----------|------------------|----------------------|
| Enterprise: Consistent | Block (SAN) | iSCSI, InfiniBand: Amazon EBS, EqualLogic, EMC Symmetrix |
| Enterprise: Consistent | File (NAS) | NFS, CIFS: NetApp, PowerVault, EMC Clariion |
| Enterprise: Consistent | Database (ACID) | MS SQL, Oracle 11g, MySQL, Postgres |
| Cloud: Distributed, Partitioned | Object | DX/Caringo, OpenStack Swift, EMC Atmos |
| Cloud: Distributed, Partitioned | Map/Reduce | Hadoop DFS |
| Cloud: Distributed, Partitioned | Key-Value | Cassandra, CouchDB, Riak, Redis, Mongo |
| Cloud: Distributed, Partitioned | Queue (Bus) | RabbitMQ, ActiveMQ, ZeroMQ, OpenMQ, Celery |
| Cloud: Transitory | Messaging | AMQP, MSMQ (.NET) |
| Cloud: Transitory | Shared RAM | Memcached, Tokyo Cabinet |

From this table, I approximated the relative price and performance for each component in the storage spectrum.

The result was the “cloud storage banana” graph.  In this graph, enterprise storage is clustered in the “compromise” quadrant, where there’s a high price for relatively low performance.  Cloud persistence refuses to be clustered at all.  To save cost and enable distributed data, applications will use cheap but slow object storage.  This drives the need for high-speed RAM-based caches and distributed buses.  These approaches are required when developers build fault tolerance at the application level.
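A cache-aside loop is the classic way the two ends of the banana get combined.  Here is a minimal sketch assuming the python-memcached client and a local memcached; fetch_from_object_store() is a hypothetical stand-in for a GET against slow, cheap object storage (e.g. Swift):

```python
import memcache  # python-memcached client

cache = memcache.Client(["127.0.0.1:11211"])


def fetch_from_object_store(key):
    """Hypothetical stand-in for a GET against cheap-but-slow object storage."""
    return "payload-for-%s" % key


def get(key):
    value = cache.get(key)  # fast, RAM-priced path
    if value is None:
        value = fetch_from_object_store(key)  # slow, disk-priced path
        cache.set(key, value, time=300)  # keep hot data close for 5 minutes
    return value
```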

Enterprises have enjoyed the false luxury of perceived hardware reliability.  When these assumptions are removed, applications are freed to scale more gracefully and to consider resource cost in their consumption plans.

When we compare the enterprise Pandora’s box storage to the cloud persistence banana, a more general pattern emerges.  The cloud persistence pattern represents a fragmentation of monolithic, IT-controlled services into a more functionally driven architecture.  In this case, we see the desire for speed, distribution, and lower cost forcing change to application design patterns.

We also see similar dispersion patterns driving changes in compute and networking conventions.

So next time your corporate IT refuses to deploy RabbitMQ or memcached, just remember my mother’s sage advice for cloud architects: “time flies like an arrow, fruit flies like a banana.”

CAP Chasm: why clouds say “no SANk you” to SANs

My personal bias against SANs in cloud architectures is well documented; however, I am in the minority at my employer (Dell), and few enterprise IT shops share my view.  In his recent post about CAP theorem, Dave McCrory has persuaded me to look beyond their failure to bask in my flawless reasoning.  Apparently, this crazy CAP thing explains why some people love SANs (enterprise) and others don’t (clouds).

The deal with CAP is that you can only have two of Consistency, Availability, or Partition Tolerance.  Since everyone wants Availability, the choice is really between Consistency and Partition Tolerance.  Seeking Availability, you’ve got two approaches:

  1. Legacy applications tried to eliminate faults to achieve Consistency with physically redundant, scale-up designs.
  2. Cloud applications assume faults to achieve Partition Tolerance with logically redundant, scale-out designs.

According to CAP, the legacy and cloud approaches are so fundamentally different that they create a “CAP Chasm,” in which the very infrastructure fabric needed to deploy these applications is different.

As a cloud geek, I consider the inherent cost and scale limitations of a CA approach much too limiting.  My first-hand experience is that our customers and partners share my view: they have embraced AP patterns.  These patterns make more efficient use of resources, dictate simpler infrastructure layouts, scale like hormone-crazed rabbits at a carrot farm, and can be deployed on less expensive commodity hardware.
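For the curious, here is a toy sketch of what an AP-style write path looks like: replicate to several nodes and declare success on a quorum, treating a partitioned node as business as usual.  The node list and send() are hypothetical stand-ins, not any particular product’s API:

```python
NODES = ["node-a", "node-b", "node-c"]  # hypothetical replica set
QUORUM = 2  # majority of 3


def send(node, key, value):
    """Hypothetical stand-in for a network write to one replica."""
    return True  # pretend the replica acknowledged


def ap_write(key, value):
    """Accept the write once a quorum acks; stragglers converge later
    (eventual consistency), which is the C we trade for A and P."""
    acks = 0
    for node in NODES:
        try:
            if send(node, key, value):
                acks += 1
        except OSError:
            pass  # a partitioned node is expected, not fatal
    if acks < QUORUM:
        raise RuntimeError("not enough replicas reachable")
```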

As a CAP theorem enlightened IT professional, I can finally accept that there are other intellectually valid infrastructure models. 

See Mom?  I can play nicely with others after all.