Why we can’t move past installers to talk about operations – the underlay gap

Posted on October 12, 2016 by Rob H

20 minutes. That’s the amount of time most developers are willing to spend installing a tool or platform that could become the foundation for their software. I’ve watched our industry obsess over the “out of box” experience, which usually translates into a single CLI command to get started (and then fails to scale up).

Secure, scalable and robust production operations are complex. In fact, most of these platforms are specifically designed to hide that fact from developers.

That means that these platforms intentionally hide the very complexity that they themselves need to run effectively. Adding that complexity, at best, undermines the utility of the platform and, at worst, causes distractions that keep us forever looping on “day 1” installation issues.

I believe that systems designed to manage ops process and underlay are different from the platforms designed to manage the developer life-cycle. This is different from the fidelity gap, which is about portability. Accepting that difference allows us to focus on delivering secure, scalable and robust infrastructure for both kinds of users.

In a pair of DevOps.com posts, I lay out in much more detail my arguments about the harm caused by trying to blend these concepts:

5 Key Aspects of High Fidelity DevOps [repost from DevOps.com]

Posted on May 5, 2016 by Rob H

For all our cloud enthusiasm, I feel like ops automation is suffering as we increase choice and complexity. Why is this happening? It’s about loss of fidelity.

Nearly a year ago, I was inspired by a mention of “Fidelity Gaps” during a Cloud Foundry After Dark session. With additional advice from DevOps leader Gene Kim, this narrative about the why and how of DevOps Fidelity emerged.

As much as we talk about how we should have shared goals spanning Dev and Ops, it’s not nearly as easy as it sounds. To fuel a DevOps culture, we also have to build robust tooling.

That means investing up front in five key areas: abstraction, composability, automation, orchestration, and idempotency.
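Of those five, idempotency is the easiest to show in a few lines. The sketch below is only an illustration, not code from the article: the function name `ensure_line` and the sample settings are made up for the example. It shows an operation that can be run repeatedly and only changes the system the first time.

```python
def ensure_line(lines, wanted):
    """Return (new_lines, changed). Hypothetical helper for illustration.

    The line is appended only when it is missing, so running the same
    step again leaves the configuration untouched.
    """
    if wanted in lines:
        return list(lines), False
    return list(lines) + [wanted], True


# Sample settings are assumptions chosen for the example.
config = ["PermitRootLogin no"]
config, changed_first = ensure_line(config, "PasswordAuthentication no")
config, changed_second = ensure_line(config, "PasswordAuthentication no")
```

The second call reports no change, which is what lets the same automation run safely on a laptop, in a lab and in production.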

Together, these concepts allow sharing work at every level of the pipeline. Unfortunately, it’s tempting to optimize work at one level and miss the true system bottlenecks.

Creating production-like fidelity for developers is essential: We need it for scale, security and upgrades. It’s not just about sharing effort; it’s about empathy and collaboration.

But even with growing acceptance of DevOps as a cultural movement, I believe deployment disparities are a big unsolved problem. When developers have vastly different working environments from operators, it creates a “fidelity gap” that makes it difficult for the teams to collaborate.

Before we talk about the costs and solutions, let me first share a story from back when I was a bright-eyed OpenStack enthusiast…

Read the full article on DevOps.com, including my section on why OpenStack DevStack harms the project and five specific ways to improve DevOps fidelity.

Hybrid & Container Disruption [Notes from CTP Mike Kavis’ Interview]

Posted on February 23, 2016 by Rob H

Last week, Cloud Technology Partner VP Mike Kavis (aka MadGreek65) and I talked for 30 minutes about current trends in Hybrid Infrastructure and Containers.

Three of the top questions that we discussed were:

Why is composability required for deployment? [5:45]
Is Configuration Management dead? [10:15]
How can containers be more secure than VMs? [23:30]

Here’s the audio matching the time stamps in my notes:

00:44: What is RackN? – scale data center operations automation
01:45: Digital Rebar is… 3rd generation provisioning to manage data center ops & bring up
02:30: Customers were struggling on Ops more than code or hardware
04:00: Rethinking “open” to include user choice of infrastructure, not just if the code is open source.
05:00: Use platforms where it’s right for users.
05:45: Composability – it’s how do we deal with complexity. Hybrid DevOps
06:40: How do we make Ops more portable
07:00: Five components of Hybrid DevOps
07:27: Rob has “Rick Perry” Moment…
08:30: 80/20 Rule for DevOps where 20% is mixed.
10:15: “Is configuration management dead” > Docker does hurt Configuration Management
11:00: How Service Registry can replace Configuration.
11:40: Reference to John Willis on the importance of sequence.
12:30: Importance of Sequence, Services & Configuration working together
12:50: Digital Rebar intermixes all three
13:30: The race to have orchestration – “it’s always been there”
14:30: Rightscale Report > Enterprises average SIX platforms in use
15:30: Fidelity Gap – Why everyone will hybrid but need to avoid monoliths
16:50: Avoid hybrid trap and keep a level of abstraction
17:41: You have to pay some “abstraction tax” if you want to hybrid BUT you can get some additional benefits: hybrid + ops management.
18:00: Rob gives a shout out to Rightscale
19:20: Rushing to solutions does not create secure and sustained delivery
20:40: If you work in a silo, you lose the ability to collaborate and reuse others’ work
21:05: Rob is sad about “OpenStack explosion of installers”
21:45: Containers benefit from service containers – how they can be MORE SECURE
23:00: Automation required for security
23:30: How containers will be more secure than VMs
24:30: Rob brings up “cheese” again…
26:15: If you have more situational awareness, you can be more secure WITHOUT putting more work for developers.
27:00: Containers can help developers not worry about as many aspects of Ops
27:45: Wrap up

What do you think? I’d love to hear your opinion on these topics!

Deployment Fidelity – reducing tooling transitions for fun and profit

Posted on January 11, 2016 by Rob H

At the OpenStack Tokyo summit, I gave a short interview on Deployment Fidelity. I’ve come to see the fidelity problem more broadly as the hybrid DevOps challenge that I described in my 2016 Predictions post as the end of mono-clouds. Thanks to Ken Hui from OpenStack Superuser TV for resurfacing this link!

From Start to Scale: learn faster with heterogeneous deployments

Posted on November 3, 2015 by Rob H

Why mix VMs and Physical? Having a consistent deploy approach can dramatically speed learning cycles that result in better scale ops. I would never deploy production OpenStack on VMs, but I strongly recommend rehearsing that deployment on VMs hundreds of times before you touch metal.

Over the last two months, the RackN team redefined “heterogeneous” infrastructure in Digital Rebar from being “just” multi-vendor hardware to include any server resource from containers and Vagrant/Virtualbox to clouds like AWS or Packet. To support this truly diverse range, there were both technical and operational challenges to overcome.

The technical challenge arises from the fundamental control differences between cloud and physical infrastructure. In cloud, infrastructure is much more prescribed – you cannot change most aspects of your system and especially not your network interfaces or IPs. To provision hardware efficiently, we had to establish control over the very things that Cloud systems manage for you.

That management diversity exercised the full extent of the Digital Rebar “functional ops” architecture.

Over the last year, we’ve been unwinding baked-in control assumptions from earlier versions of Digital Rebar. That added flexibility allows Digital Rebar to mix control APIs for infrastructure ranging from Cobbler to Docker, Vagrant and AWS. Since we could already cope with heterogeneous control APIs using Digital Rebar’s unique functional ops design, we retained the ability to mix and match container, virtual and physical infrastructure.
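To make the idea of mixing control APIs behind one deployment flow concrete, here is a minimal sketch. It is not Digital Rebar code and does not reflect its actual API: the `Provider` interface, the two provider classes and the `deploy` function are hypothetical names invented for this illustration.

```python
from abc import ABC, abstractmethod


class Provider(ABC):
    """Hypothetical control API: each backend creates nodes its own way."""

    @abstractmethod
    def create_node(self, name: str) -> dict:
        ...


class ContainerProvider(Provider):
    def create_node(self, name: str) -> dict:
        # On container or cloud backends the platform prescribes networking.
        return {"name": name, "kind": "container", "network_owner": "platform"}


class MetalProvider(Provider):
    def create_node(self, name: str) -> dict:
        # On physical hardware the operator has to control networking.
        return {"name": name, "kind": "metal", "network_owner": "operator"}


def deploy(provider: Provider, names: list, roles: list) -> list:
    """Apply the same roles to every node, whatever the backend is."""
    nodes = [provider.create_node(n) for n in names]
    for node in nodes:
        node["roles"] = list(roles)
    return nodes


# Role and node names are placeholders for the example.
rehearsal = deploy(ContainerProvider(), ["n1", "n2"], ["etcd", "worker"])
production = deploy(MetalProvider(), ["n1", "n2"], ["etcd", "worker"])
```

The control details differ per backend, but the deployment logic that assigns roles is written once, which is the property that lets a rehearsal on containers carry over to metal.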

The operational challenge was more subtle. We were motivated to make this change by firsthand observations of the fidelity gap. I am a strong believer that container platforms will directly target metal in the next two years. The challenge is how we get there from our current virtualization-focused infrastructure.

It’s easy to look at the completed work as an obvious step forward. Looking over my shoulder, I know that it took years of learning and perseverance to create a platform that was flexible enough to handle both extremes of control. Even more important was understanding why it was so important for a physical scale deployment platform to provide ops fidelity for developers too.

With the infrastructure work behind us, we’re seeing Digital Rebar deliver real operational transformation. We want to help IT embrace containers and immutable infrastructure without having to discard the hard-won battles installing cloud and traditional infrastructure. Most critically, we hope that you’ll join our open community and share your operational journey with us.

How do platforms die? One step at a time [the Fidelity Gap]

Posted on September 26, 2015 by Rob H

The RackN team is working on the “Start to Scale” position for Digital Rebar that targets the IT industry-wide “fidelity gap” problem. When we started on the Digital Rebar journey back in 2011 with Crowbar, we focused on “last mile” problems in metal and operations. Only in the last few months did we recognize the importance of automating smaller “first mile” desktop and lab environments.

A fidelity gap is created when work done on one platform (say, a developer laptop) does not translate faithfully to the next platform (say, a QA lab). Since there are gaps at each stage of deployment, we end up with the ops staircase of despair.
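One rough way to picture the staircase is to describe each stage as a set of attributes and list what changes at every step. The sketch below is purely illustrative: the stage names, attributes and values are assumptions made up for the example, not measurements from any real deployment.

```python
def fidelity_gaps(stages):
    """Return, for each pair of consecutive stages, the attributes that differ."""
    gaps = []
    for (name_a, env_a), (name_b, env_b) in zip(stages, stages[1:]):
        differing = sorted(
            key
            for key in env_a.keys() | env_b.keys()
            if env_a.get(key) != env_b.get(key)
        )
        gaps.append((name_a, name_b, differing))
    return gaps


# Hypothetical stage descriptions, for illustration only.
stages = [
    ("laptop", {"nodes": 1, "installer": "script", "network": "nat"}),
    ("qa lab", {"nodes": 3, "installer": "script", "network": "vlan"}),
    ("production", {"nodes": 100, "installer": "pxe", "network": "vlan"}),
]

gaps = fidelity_gaps(stages)
for src, dst, differing in gaps:
    print(f"{src} -> {dst}: {', '.join(differing) or 'no gap'}")
```

Every attribute that shows up in a step is something the team has to relearn or retool at that transition; the fewer that change, the shallower the stair.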

These gaps hide defects until they are expensive to fix and make it hard to share improvements. Even worse, they keep teams from collaborating.

With everyone trying out Container Orchestration platforms like Kubernetes, Docker Swarm, Mesosphere or Cloud Foundry (all of which we deploy, btw), it’s important that we can gracefully scale operational best practices.

For companies implementing containers, it’s not just about turning their apps into microservice-enabled, immutable rock stars: they also need to figure out how to implement the underlying platforms at scale.

My example of fidelity gap harm is OpenStack’s “all in one, single node” DevStack. There is no useful single system OpenStack deployment; however, that is the primary system for developers and automated testing. This design hides production defects and usability issues from developers. These are issues that would be exposed quickly if the community required multi-instance development. Even worse, it keeps developers from dealing with operational consequences of their decisions.

What are we doing about fidelity gaps? We’ve made it possible to run and faithfully provision multi-node systems in Digital Rebar on a relatively light system (16 GB RAM, 4 cores) using VMs or containers. That system can then be fully automated with Ansible, Chef, Puppet and Salt. Because of our abstractions, if a deployment works in Digital Rebar then it can scale up to 100s of physical nodes.

My takeaway? If you want to get to scale, start with the end in mind.

Rob Hirschfeld

On Computing, Containers, Cloud & Tech Culture

Tag Archives: Fidelity Gap