Rethinking Storage

Or “UNthinking SANs”

Back in 2001, I was co-founder of a start-up building the first Internet virtualized cloud.  Dual CPU 1U pizza box servers were brand new and we were ready to build out an 8 node, 64 VM cloud!  It was going to be a dream – all that RAM and CPU just begging to be oversubscribed.  It was enough to make Turing weep for joy.

Unfortunately, all those VMs needed lots and lots of storage.

Never fear, EMC was more than happy to quote us a lovely SAN with plenty of redundant HBAs and interconnected fabric switches.  It was all so shiny and cool yet totally unscalable and obscenely expensive.  Yes, unscalable because that nascent 8 node cloud was already at the port limit for the solution!  Yes, expensive because that $50,000 hardware solution would have needed a $1,000,000 storage solution!

The funny part is that even after learning all that, we still wanted to buy the SAN.  It was just that cool.

We never bought that SAN, but we did buy a very workable NAS device.  Then it was my job to change (“pragmatic-ize”) our architecture so that our cloud management did not require expensive shiny objects.

Our ultimate solution used the NAS for master images that were accessed by many nodes.  Those requests were mostly reads and easy to optimize.  Writes went to differencing disks kept on each node’s local disk, which made them highly scalable.  In some systems, we were even able to keep the masters local and save bandwidth.  This same strategy could easily be applied in current “stateless” VM deployments.
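A minimal sketch of that pattern with today’s tooling might look like the following; the qcow2 backing files, paths, and qemu-img command are my assumptions for a modern equivalent, not the stack we used back then.

```ruby
MASTER = "/mnt/nas/masters/base-image.qcow2"  # shared, read-mostly master on the NAS
LOCAL  = "/var/lib/vms/vm01-diff.qcow2"       # per-node differencing disk on local storage

# Each node creates a thin local overlay: reads fall through to the NAS master,
# while writes land on the node's local disk and scale with the number of nodes.
ok = system("qemu-img", "create", "-f", "qcow2", "-b", MASTER, "-F", "qcow2", LOCAL)
abort "failed to create differencing disk" unless ok
```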

Some of the SANless benefits are:

  • Less cost
  • Simplicity of networking and management
  • Nearer to linear scale out
  • Improved I/O throughput
  • Better fault tolerance (storage faults are isolated to individual nodes)

Of course, there are costs:

  • More spindles means more energy use (depending on drive selection and other factors)
  • Lack of centralized data management
  • Potentially wasted space because each system carries excess capacity
  • The need to synchronize data stored in multiple locations

These are real costs; however, I believe the data management problems are unsolved issues for SAN deployments too.  Data proliferation is simply hidden inside of the VMs.

Today, I observe many different SAN focused architectures and cringe.  These same solutions could be much simpler, more scalable and dramatically more affordable with minimal (or even no) changes.  If you’re serious about deploying a cloud based on commodity systems, then you seriously need to re-evaluate your storage.

Dell goes to the Clouds (hardware & Joyent)

As a Dell employee, I’ve had the privilege of being on the front lines of Dell’s cloud strategy.  Until today, I have not been able to post about the exciting offerings that we’ve been brewing.

Two related components have been occupying my days.  The first is the new cloud optimized hardware and the second is the agreement to offer private clouds using Joyent’s infrastructure. Over the next few weeks, I’ll be exploring some of the implications of these technologies.  I’ve already been exploring them in previous posts.

Cloud optimized hardware grew out of lessons learned in Dell’s custom mega-volume hardware business (that’s another story!).  This hardware is built for applications and data centers that embrace scale out designs.  These customers build applications that are so fault tolerant that they can focus on power, density, and cost optimizations instead of IT hardening.  It’s a different way of looking at the data center because they see the applications and the hardware as a whole system.

To me, that system view is the soul of cloud computing.

The Dell-Joyent relationship is a departure from the expected.  As a founder of Surgient, I’m no stranger to hypervisor private clouds; however, Joyent takes a fundamentally different approach.  Riding on top of OpenSolaris’ paravirtualization, this cloud solution virtually eliminates the overhead and complexity that seem to be the default for other virtualization solutions.  I especially like Joyent’s application architectures and their persistent vision of how to build scale-out applications from the ground up.

To me, scale should be baked into the heart of cloud applications.

So when I look at Dell’s offerings, I think we’ve captured the heart and soul of true cloud computing.

Cloud Application Life Cycle

Or “you learn by doing, and doing, and doing”

One of the most consistent comments I hear about cloud applications is that the cloud fundamentally changes the way applications are written.  I’m not talking about the technologies, but the processes and infrastructure.

Since our underlying assumption for a cloud application is that node failure is expected, our development efforts need to build in that assumption before any code is written.  Consequently, cloud apps should be written directly on cloud infrastructure.

In old school development, I would have all the components for my application on my desktop.  That’s necessary for daily work, but does not give me a warm fuzzy for success in production.

Today’s at-scale production environments involve replicated data with synchronization lags, shared multi-writer memcache, load balancers, and mixed code versions.  There is no way that I can simulate that on my desktop!  There is no way I can fully anticipate how it will all behave together!

The traditional alternative is to wait.  Wait for QA to try and find bugs through trial and error.  Or (more likely) wait for users to discover the problem post deployment.

My alternative is to constantly deploy the application to a system that matches production.    As a bonus, I then attack the deployment with integration tests and simulators.
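As a hedged sketch (not a prescribed toolchain), the “deploy every build, then attack it” loop could be as simple as a Rakefile; the host names, rsync layout, and cucumber profile here are my own placeholder assumptions.

```ruby
# Hypothetical Rakefile: every run pushes the app to a production-like cluster
# and then runs the integration suite against it.  Hosts, paths, and the
# "integration" cucumber profile are placeholder assumptions.
STAGING_HOSTS = %w[cloud-node1 cloud-node2]   # mirrors the production topology

task :deploy do
  STAGING_HOSTS.each do |host|
    sh "rsync -az --delete ./ deploy@#{host}:/srv/app/"       # push the code
    sh "ssh deploy@#{host} 'cd /srv/app && script/restart'"   # restart the app
  end
end

# Attack the fresh deployment with integration tests and simulators.
task :integration => :deploy do
  sh "cucumber --profile integration"
end
```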

If you’re thinking that is too much effort then you are not thinking deeply enough.  This model forces developers to invest in install and deployment automation.  That means that you will be able to test earlier in the cycle.  It means you will be able to fix issues more quickly.  And that you’ll be able to ship more often.  It means that you can involve operations and networking specialists well before production.  You may even see more collaboration between your development, quality, and operations teams.

Forget about that last one – if those teams actually worked together you might accidentally ship product on time.  Gasp!

Ready to Fail

Or “How Monty Python taught me to program”

Sometimes you learn the most from boring conference calls.  In this case, I was listening to a presentation about a deployment that was so painfully by-the-book, reference-example, super-redundant that I could have completed the presenter’s sentences.  Except that he kept complaining about the cost.  It turns out that our typical failure-proofed, belt-and-suspenders infrastructure is really, really expensive.

Shouldn’t our applications be Monty Python’s Black Knight yelling “It’s just a flesh wound!  Come back and fight!”   Instead, we’ve grown to tolerate princess applications that throw a tantrum over skim milk instead of organic soy in their mochaito.

Making an application failure-ready requires a mindset change.  It means taking off our architecture space suit and donning our welding helmet.

Fragility is often born from complexity, and complexity is the compound interest on system design assumptions.

Let’s consider a transactional SQL database.  I love relational databases.  Really, I do.  Just typing SELECT * FROM or LEFT OUTER JOIN gives me XKCD-like goose bumps.  Unfortunately, they are as fragile as Cinderella’s glass slippers.  The whole concept of relational databases requires a complex web of sophisticated data integrity that we’ve been able to take for granted.  That web requires intricate locking mechanisms that make data replication tricky.  We could take it for granted because our operations people have built up super-complex, triple-redundant infrastructure so that we did not have to consider what happens when the database can’t perform its magic.

What is the real cost for that magic?

I’m learning about CouchDB.  It’s not a relational database; it’s a distributed JSON document warehouse with smart indexing.  And compared to some of the fine-grained features of SQL, it’s an arc welder.  The data in CouchDB is loosely structured (JSON!) and relationships are ad hoc.  The system doesn’t care (let alone enforce) whether you’ve maintained referential integrity within the document – it just wants to make sure that the documents are stored, replicated, and indexed.  The goodness here is that CouchDB allows you to distribute your data broadly so that it can be local and redundant.  Even better, weak structure allows you to evolve your schema agilely (look for a future post on this topic).
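To make that concrete, here’s a minimal sketch of a document round trip over CouchDB’s plain HTTP API; the database name, document fields, and local server address are illustrative assumptions on my part.

```ruby
require "net/http"
require "json"

# A loosely structured document – just JSON, no schema, no enforced relationships.
DOC = { "_id" => "bus-42", "route" => "42", "last_seen" => "08:15", "stop" => "Oak & 5th" }

Net::HTTP.start("localhost", 5984) do |http|             # CouchDB's default port
  http.request(Net::HTTP::Put.new("/whatthebus"))        # create the database (errors harmlessly if it exists)

  put = Net::HTTP::Put.new("/whatthebus/#{DOC['_id']}", "Content-Type" => "application/json")
  put.body = DOC.to_json
  puts http.request(put).body                            # => {"ok":true,"id":"bus-42","rev":"1-..."}

  puts http.request(Net::HTTP::Get.new("/whatthebus/bus-42")).body   # read it back, revision included
end
```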

If you’re cringing about the lack of referential integrity then get over it – every SQL backed application I ever wrote required RI double-checking anyway!

If you’re cringing about possible dirty reads or race conditions then get over it – every SQL backed application I ever wrote required collision protection too!

I’m not pitching CouchDB (or similar) as a SQL replacement.  I’m holding it up as an example of a pragmatic approach to failure-ready design.  I’m asking you to think about the hidden complexity and consequential fragility that you may blindly inherit.

So cut off my arms and legs – I can still spit on your shoes.

Cloud Reference App, “What The Bus” intro

Today I started working on an application to demonstrate “Cloud Scale” concepts.  I had planned to do this using the PetShop application; unfortunately, the 1995 era PetShop Rails migration would take more repair work than a complete rewrite (HTML tables, no CSS, bad forms, no migrations, poor session architecture).

If I’m considering a fresh start, I’d rather do it with one of my non-PetShop pet projects called “WhatTheBus.”  The concept combines inbound live data feeds and geo mapping with a hyper-scale use target.  The use case is to allow parents to see when their kids’ bus is running late using their phone from the bus stop.

I’m putting the code in git://github.com/ravolt/WhatTheBus.git and tracking my updates on this blog.

My first sprint is to build the shell for this application.  That includes:

  • the shell Rails application
  • Cucumber for testing
  • memcached
  • a simple test that sets the location of a bus (using a GET, sorry) in the cache and checks that it can retrieve that update (sketched below)

This sprint does not include a map or any database.  I’ll post more as we build out this app.
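Here’s a hedged sketch of that cache round trip as plain Ruby; the memcache-client calls are real, but the host, key names, and coordinates are my illustrative assumptions rather than the actual WhatTheBus code.

```ruby
require "memcache"   # the memcache-client gem

cache = MemCache.new("localhost:11211")

# What a GET like /buses/update?id=42&lat=30.27&long=-97.74 would do on the server:
cache.set("bus:42", { :lat => 30.27, :long => -97.74 })

# What the matching Cucumber step would assert on the way back out:
location = cache.get("bus:42")
raise "location not found in cache" unless location && location[:lat] == 30.27
puts "bus 42 last seen at #{location[:lat]}, #{location[:long]}"
```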

Note: http://WhatTheBus.com is a working name for this project because it appeals to my warped sense of humor.  It will likely appear under the sanitary ShowBus moniker: http://showb.us.

Making Cloud Applications RAIN, part 1

An application that runs “in the cloud” is designed fundamentally differently than a traditional enterprise application.  Cloud apps live on fundamentally unreliable, oversubscribed infrastructure; consequently, we must adopt the same mindset that drove the first RAID storage systems and create a Redundant Array of Inexpensive Nodes (RAIN).

The drivers for RAIN are the same as RAID.  It’s more cost effective and much more scalable to put together a set of inexpensive units redundantly than to build a single large super-reliable unit.  Each node in the array handles a fraction of the overall workload, so application design must partition the workloads into atomic units.
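As an illustrative sketch (my addition, not part of the original design), partitioning atomic work units across an array of inexpensive nodes can be as simple as hashing each unit’s key to a node:

```ruby
require "digest/md5"

NODES = %w[node-a node-b node-c node-d]   # the "inexpensive units"

# Hash the work unit's key so each atomic unit lands on exactly one node;
# losing a node loses only that fraction of the workload, not the whole app.
def node_for(key)
  NODES[Digest::MD5.hexdigest(key).to_i(16) % NODES.size]
end

puts node_for("bus:42")        # stable across calls and processes
puts node_for("parent:1138")   # a different unit may land on a different node
```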

I’ve attempted to generally map RAIN into RAID style levels.  Not a perfect fit, but helpful.

  • RAIN 0 – no redundancy.  If one part fails then the whole application dies.  Think of a web server handing off to a backend system that fronts for the database.  You may succeed in subdividing the workload to improve throughput, but a failure in any component breaks the system.
  • RAIN 1 – active-passive clustering.   If one part fails then a second steps in to take over the workload.  Simple redundancy yet expensive because half your resources are idle.
  • RAIN 2 – active-active clustering.  Both parts of the application perform work so resource utilization is better, but now you’ve got a data synchronization problem.
  • RAIN 5 – multiple nodes can process the load. 
  • RAIN 6 – multiple nodes with specific dedicated stand-by capacity.  Sometimes called “N+1” deployment, this approach works well with failure-ready designs.
  • RAIN 5-1 or 5-2 – multiple front end nodes (“farm”) backed by a redundant database.
  • RAIN 5-5 – multiple front end nodes with a distributed database tier.
  • RAIN 50 – mixed use nodes where data is stored local to the front end nodes.
  • RAIN 551 or 552 – geographical distribution of an application so that nodes are running in multiple data centers with data synchronization.
  • RAIN 555 – nirvana (no, I’m not going to suggest a 666).

Unlike RAID, there’s an extra hardware dimension to RAIN.  All our careful redundancy goes out the window if the nodes are packed onto the same server and/or network path.  We’ll save that for another post. 

I hope you’ll agree that Clouds create RAINy apps.