How do platforms die? One step at a time [the Fidelity Gap]

The RackN team is developing a “Start to Scale” position for Digital Rebar that targets the “fidelity gap” problem found across the IT industry.  When we started the Digital Rebar journey back in 2011 with Crowbar, we focused on “last mile” problems in metal and operations.  Only in the last few months did we recognize the importance of automating smaller “first mile” desktop and lab environments.

A fidelity gap is created when work done on one platform (a developer laptop) does not translate faithfully to the next platform (a QA lab).  Since there are gaps at each stage of deployment, we end up with the ops staircase of despair.

These gaps hide defects until they are expensive to fix and make it hard to share improvements.  Even worse, they keep teams from collaborating.

With everyone trying out Container Orchestration platforms like Kubernetes, Docker Swarm, Mesosphere or Cloud Foundry (all of which we deploy, btw), it’s important that we can gracefully scale operational best practices.

For companies implementing containers, it’s not just about turning their apps into microservice-enabled, immutable rock stars: they also need to figure out how to implement the underlying platforms at scale.

My go-to example of fidelity gap harm is OpenStack’s “all in one, single node” DevStack.  There is no useful single-system OpenStack deployment; however, that is the primary system for developers and automated testing.  This design hides production defects and usability issues from developers.  These are issues that would be exposed quickly if the community required multi-instance development.  Even worse, it keeps developers from dealing with the operational consequences of their decisions.
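To make the contrast concrete, DevStack can run multi-node labs today.  The sketch below, loosely following DevStack’s multi-node configuration variables, is a compute-only node’s local.conf pointed at a separate controller; the addresses and passwords are placeholders, not a tested configuration.

    [[local|localrc]]
    # Compute-only node: run just the compute and network agents here
    HOST_IP=192.168.42.12          # this node (placeholder address)
    SERVICE_HOST=192.168.42.11     # separate controller node (placeholder address)
    MYSQL_HOST=$SERVICE_HOST
    RABBIT_HOST=$SERVICE_HOST
    GLANCE_HOSTPORT=$SERVICE_HOST:9292
    ADMIN_PASSWORD=placeholder
    DATABASE_PASSWORD=placeholder
    RABBIT_PASSWORD=placeholder
    SERVICE_PASSWORD=placeholder
    ENABLED_SERVICES=n-cpu,q-agt   # no local database, queue, or API services

Even a toy split like this forces developers to confront the cross-node database, messaging, and networking paths that an all-in-one install hides.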

What are we doing about fidelity gaps?  We’ve made it possible to run and faithfully provision multi-node systems in Digital Rebar on relatively light hardware (16 GB RAM, 4 cores) using VMs or containers.  That system can then be fully automated with Ansible, Chef, Puppet, and Salt.  Because of our abstractions, if a deployment works in Digital Rebar, then it can scale up to hundreds of physical nodes.
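To give a feel for the “fully automated with Ansible” step, here is a minimal sketch (not Digital Rebar’s actual integration; the hostnames and group names are hypothetical): a plain inventory plus a small playbook that runs unchanged whether the inventory lists three VMs or three hundred machines.

    # inventory.ini -- hypothetical nodes provisioned by Digital Rebar
    [controllers]
    node-01.rebar.local

    [workers]
    node-02.rebar.local
    node-03.rebar.local

    # site.yml -- the same play applies at lab or datacenter scale
    - hosts: all
      become: true
      tasks:
        - name: Ensure time sync is installed
          package:
            name: chrony
            state: present

        - name: Start and enable the time service
          service:
            name: chronyd
            state: started
            enabled: true

Run it with “ansible-playbook -i inventory.ini site.yml”; as the environment grows, only the inventory changes, which is exactly the fidelity we are after.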

My takeaway?  If you want to get to scale, start with the end in mind.

Rackspace unveils OpenStack reference architecture & private cloud offering

Yesterday, Rackspace Cloud Builders unveiled both their open reference architecture (RA) and a private cloud offering based upon the RA (covered on GigaOM).  The RA (which is well aligned with our Dell OpenStack RA) does a good job laying out the different aspects of an OpenStack deployment.  It also calls for the use of Dell C6100 servers and the open-source version of Crowbar.

The Rackspace RA and the Crowbar deployment barclamps pursue the same objective: sharing best practices for OpenStack operations.

Over the last 12+ months, my team at Dell has had the opportunity to work with many customers on OpenStack deployment designs.  While no two of these are identical, they do share many similarities.  We are pleased to collaborate with Rackspace and others on capturing these practices as operational code (or “opscode” if you want a reference to the Chef cookbooks that are an intrinsic part of Crowbar’s architecture).
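For readers who have not looked inside Crowbar, the flavor of that operational code is ordinary Chef.  The recipe below is a generic sketch in the Chef DSL, not an actual barclamp; the cookbook name and attribute are made up for illustration.

    # cookbooks/ntp/recipes/default.rb -- generic illustration, not a real barclamp
    package 'ntp'

    # Render site-specific servers from a node attribute (hypothetical key)
    template '/etc/ntp.conf' do
      source 'ntp.conf.erb'
      variables(servers: node['ntp']['servers'])
      notifies :restart, 'service[ntp]'
    end

    service 'ntp' do
      action [:enable, :start]
    end

Capturing a practice as a resource like this is what lets it be reviewed, versioned, and reused across deployments.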

In our customer interactions, we hear clearly that Crowbar must remain flexible and ready to adapt to both customers’ on-site requirements and evolution within the OpenStack code base.  You are also telling us that there is a broader application space for Crowbar, and we are listening to that too.

I believe that it will take some time for the community and markets to process these Rackspace announcements.  Rackspace is showing strong leadership in both information sharing and commercialization around OpenStack.  Both of these actions will drive responses from other community members.