How do platforms die? One step at a time [the Fidelity Gap]

The RackN team is working on the “Start to Scale” position for Digital Rebar that targets the IT industry-wide “fidelity gap” problem. When we started on the Digital Rebar journey back in 2011 with Crowbar, we focused on “last mile” problems in metal and operations. Only in the last few months did we recognize the importance of automating smaller “first mile” desktop and lab environments.

A fidelity gap is created when work done on one platform (say, a developer laptop) does not translate faithfully to the next platform (say, a QA lab). Since there is a gap at each stage of deployment, we end up with the ops staircase of despair.

These gaps hide defects until they are expensive to fix and make it hard to share improvements. Even worse, they keep teams from collaborating.
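To make the staircase concrete, here is a minimal sketch that lists what changes between each pair of adjacent stages. It is only an illustration: the `Stage` class, the `fidelity_gaps` function, and every trait value below are hypothetical and do not come from Digital Rebar or any real environment.

```python
from dataclasses import dataclass


@dataclass
class Stage:
    """One step of the deployment staircase (hypothetical model)."""
    name: str
    traits: dict


def fidelity_gaps(stages):
    """For each adjacent pair of stages, collect the traits whose values differ."""
    gaps = []
    for prev, nxt in zip(stages, stages[1:]):
        keys = sorted(set(prev.traits) | set(nxt.traits))
        diff = {
            key: (prev.traits.get(key), nxt.traits.get(key))
            for key in keys
            if prev.traits.get(key) != nxt.traits.get(key)
        }
        gaps.append((prev.name, nxt.name, diff))
    return gaps


# Illustrative values only; a real inventory would be gathered, not typed in.
stages = [
    Stage("laptop", {"nodes": 1, "provisioning": "manual", "os": "ubuntu"}),
    Stage("qa-lab", {"nodes": 3, "provisioning": "scripts", "os": "ubuntu"}),
    Stage("production", {"nodes": 100, "provisioning": "automated", "os": "centos"}),
]

for lower, upper, diff in fidelity_gaps(stages):
    print(f"{lower} -> {upper}: {len(diff)} trait(s) change")
    for key, (before, after) in diff.items():
        print(f"  {key}: {before!r} -> {after!r}")
```

Every trait that changes between two stages is a place where work done on the lower step may not carry over to the next one. The fewer differences per step, the smaller the gap.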

With everyone trying out container orchestration platforms like Kubernetes, Docker Swarm, Mesosphere or Cloud Foundry (all of which we deploy, btw), it’s important that we can gracefully scale operational best practices.

For companies implementing containers, it’s not just about turning their apps into microservice-enabled, immutable rock stars: they also need to figure out how to implement the underlying platforms at scale.

My example of fidelity gap harm is OpenStack’s “all in one, single node” DevStack. There is no useful single-system OpenStack deployment; however, DevStack is the primary environment for developers and automated testing. This design hides production defects and usability issues from developers. These are issues that would be exposed quickly if the community required multi-instance development. Even worse, it keeps developers from dealing with the operational consequences of their decisions.

What are we doing about fidelity gaps? We’ve made it possible to run and faithfully provision multi-node systems in Digital Rebar on a relatively light system (16 GB RAM, 4 cores) using VMs or containers. That system can then be fully automated with Ansible, Chef, Puppet and Salt. Because of our abstractions, if a deployment works in Digital Rebar, then it can scale up to hundreds of physical nodes.

My takeaway? If you want to get to scale, start with the end in mind.

Rob Hirschfeld

On Computing, Containers, Cloud & Tech Culture
