Rethinking Storage

Or “UNthinking SANs”

Back in 2001, I was co-founder of a start-up building the first Internet virtualized cloud.  Dual CPU 1U pizza box servers were brand new and we were ready to build out an 8 node, 64 VM cloud!  It was going to be a dream – all that RAM and CPU just begging to be oversubscribed.  It was enough to make Turing weep for joy.

Unfortunately, all those VMs needed lots and lots of storage.

Never fear, EMC was more than happy to quote us a lovely SAN with plenty of redundant FBAs and interconnected fabric switches.  It was all so shiny and cool yet totally unscalable and obscenely expensive.   Yes, unscalable because that nascent 8 node cloud was already at the port limit for the solution!  Yes, expensive because that $50,000 hardware solution would have needed a $1,000,000 storage solution!

The funny part is that even after learning all that, we still wanted to buy the SAN.  It was just that cool.

We never bought that SAN, but we did buy a very workable NAS device.  Then it was my job to change (“pragmatic-ize”) our architecture so that our cloud management did not require expensive shiny objects.

Our ultimate solution used the NAS for master images that were accessed by many nodes.  These requests were mainly reads and optimized.  Writes were made to differencing disks kept on local disk and highly scalable.  In systems, we were able to keep the masters local and save bandwidth.  This same strategy could easily be applied in current “stateless” VM deployments.

Some of the SANless benefits are:

  • Less cost
  • Simplicity of networking and management
  • Nearer to linear scale out
  • Improved I/O throughput
  • Better fault tolerance (storage faults are isolated to individual nodes)

Of course, there are costs:

  • More spindles means more energy use (depending on drive selection and other factors)
  • Lack of centralized data management
  • Potentially wasted space because each system carries excess capacity
  • The need to synchronize data stored in multiple locations

These are real costs; however, I believe the data management problems are unsolved issues for SAN deployments too.  Data proliferation is simply hidden inside of the VMs.

Today, I observe many different SAN focused architectures and cringe.  These same solutions could be much simpler, more scalable and dramatically affordable with minimal (or even no) changes.  If you’re serious about deploying a cloud based on commodity system then you need seriously need to re-evaluate your storage.

1 thought on “Rethinking Storage

  1. Pingback: CAP Chasm: why clouds say “no SANank you” to SANs « Rob Hirschfeld's Blog

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s