Yes, you can now build the open version of Crowbar, and it includes the code to configure a bare-metal server.
Let me be very specific about this… my team at Dell tests Crowbar on a limited set of hardware configurations. Specifically, Dell server versions R720 + R720XD (using WSMAN and iDRAC) and C6220 + C8000 (using open tools). Even on those servers, we test a limited RAID and NIC matrix; consequently, we are not positioned to duplicate other field configurations in our lab. So, while we’re excited to work with the community, caveat emptor still applies to open source.
Another thing about RAID and BIOS configuration is that it’s REALLY HARD to get right. I know this because our team spends a lot of time testing and tweaking these, now open, parts of Crowbar. I’ve learned that doing hard things creates value; however, it also means that contributors to these barclamps need to be prepared to get some silicon under their fingernails.
I’m proud that we’ve reached this critical milestone and I hope that it encourages you to play along.
PS: It’s worth noting that community activity on Crowbar has really increased. I’m excited to see all the energy.
An application that runs “in the cloud” is designed fundamentally differently from a traditional enterprise application. Cloud apps live on unreliable, oversubscribed infrastructure; consequently, we must adopt the same mindset that drove the first RAID storage systems and create a Redundant Array of Inexpensive Nodes (RAIN).
The drivers for RAIN are the same as for RAID. It’s more cost-effective and much more scalable to assemble a set of inexpensive units redundantly than to build a single large super-reliable unit. Each node in the array handles a fraction of the overall workload, so application design must partition the workload into atomic units.
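To make the partitioning idea concrete, here is a minimal sketch of routing atomic units of work to nodes in an array by hashing a work-item key. The node names and function are hypothetical, purely for illustration:

```python
import hashlib

# Hypothetical node pool; names are illustrative only.
NODES = ["node-a", "node-b", "node-c", "node-d"]

def node_for(work_item_id: str, nodes=NODES) -> str:
    """Route one atomic unit of work to one node in the array.

    Each node handles only a fraction of the overall workload,
    so every request must carry a key the router can partition on.
    """
    digest = hashlib.sha256(work_item_id.encode("utf-8")).hexdigest()
    return nodes[int(digest, 16) % len(nodes)]
```

The same key always lands on the same node, which is what lets each inexpensive node own just its slice of the workload.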
I’ve attempted to generally map RAIN into RAID style levels. Not a perfect fit, but helpful.
RAIN 0 – no redundancy. If one part fails then the whole application dies. Think of a web server handing off to a backend system that fronts for the database. You may succeed in subdividing the workload to improve throughput, but a failure in any component breaks the system.
RAIN 1 – active-passive clustering. If one part fails then a second steps in to take over the workload. Simple redundancy yet expensive because half your resources are idle.
RAIN 2 – active-active clustering. Both parts of the application perform work so resource utilization is better, but now you’ve got a data synchronization problem.
RAIN 5 – multiple nodes share the load; any single node can fail and the others absorb its work.
RAIN 6 – multiple nodes with specific dedicated stand-by capacity. Sometimes called “N+1” deployment, this approach works well with failure-ready designs.
RAIN 5-1 or 5-2 – multiple front end nodes (“farm”) backed by a redundant database.
RAIN 5-5 – multiple front end nodes with a distributed database tier.
RAIN 50 – mixed use nodes where data is stored local to the front end nodes.
RAIN 551 or 552 – geographical distribution of an application so that nodes are running in multiple data centers with data synchronization.
RAIN 555 – nirvana (no, I’m not going to suggest a 666).
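The simplest of the levels above, RAIN 1 active-passive clustering, can be sketched in a few lines. This is a toy model, not a real clustering implementation; the class and node names are my own:

```python
class Node:
    """A toy node that can be marked unhealthy to simulate failure."""
    def __init__(self, name: str):
        self.name = name
        self.healthy = True

    def handle(self, request: str) -> str:
        if not self.healthy:
            raise RuntimeError(f"{self.name} is down")
        return f"{self.name} handled {request}"


class ActivePassivePair:
    """RAIN 1: a standby node takes over when the active node fails.

    Half the resources sit idle until failover, which is exactly
    why RAIN 1 is simple but expensive.
    """
    def __init__(self, active: Node, standby: Node):
        self.active, self.standby = active, standby

    def handle(self, request: str) -> str:
        try:
            return self.active.handle(request)
        except RuntimeError:
            # Failover: promote the standby and retry once.
            self.active, self.standby = self.standby, self.active
            return self.active.handle(request)
```

RAIN 2 would have both nodes handling requests at once, which is where the data-synchronization problem appears.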
Unlike RAID, there’s an extra hardware dimension to RAIN. All our careful redundancy goes out the window if the nodes are packed onto the same server and/or network path. We’ll save that for another post.
I hope you’ll agree that Clouds create RAINy apps.