Why we can’t move past installers to talk about operations – the underlay gap

20 minutes.  That’s the amount of time most developers are willing to spend installing a tool or platform that could become the foundation for their software.  I’ve watched our industry obsess on the “out of box” experience which usually translates into a single CLI command to get started (and then fails to scale up).

Secure, scalable and robust production operations is complex.  In fact, most of these platforms are specifically designed to hide that fact from developers.  

That means that these platforms intentionally hide the very complexity that they themselves need to run effectively.  Adding that complexity, at best, undermines the utility of the platform and, at worst, causes distractions that keep us forever looping on “day 1” installation issues.

I believe that systems designed to manage ops process and underlay are different than the platforms designed to manage developer life-cycle.  This is different than the fidelity gap which is about portability. Accepting that allows us to focus on delivering secure, scalable and robust infrastructure for both users.

In a pair of DevOps.com posts, I lay out my arguments about the harm being caused by trying to blend these concepts in much more detail:

  1. It’s Time to Slay the Universal Installer Unicorn
  2. How the Lure of an ‘Easy Button’ Installer Traps Projects

5 Key Aspects of High Fidelity DevOps [repost from DevOps.com]

For all our cloud enthusiasm, I feel like ops automation is suffering as we increase choice and complexity.  Why is this happening?  It’s about loss of fidelity.

Nearly a year ago, I was inspired by a mention of “Fidelity Gaps” during a Cloud Foundry After Dark session.  With additional advice from DevOps leader Gene Kim, this narrative about the why and how of DevOps Fidelity emerged.

As much as we talk about how we should have shared goals spanning Dev and Ops, it’s not nearly as easy as it sounds. To fuel a DevOps culture, we have to build robust tooling, also.

That means investing up front in five key areas: abstraction, composability, automation, orchestration, and idempotency.

Together, these concepts allow sharing work at every level of the pipeline. Unfortunately, it’s tempting to optimize work at one level and miss the true system bottlenecks.

Creating production-like fidelity for developers is essential: We need it for scale, security and upgrades. It’s not just about sharing effort; it’s about empathy and collaboration.

But even with growing acceptance of DevOps as a cultural movement, I believe deployment disparities are a big unsolved problem. When developers have vastly different working environments from operators, it creates a “fidelity gap” that makes it difficult for the teams to collaborate.

Before we talk about the costs and solutions, let me first share a story from back when I was a bright-eyed OpenStack enthusiast…

Read the Full Article on DevOps.com including my section about Why OpenStack Devstack harms the project and five specific ways to improve DevOps fidelity.

Hybrid & Container Disruption [Notes from CTP Mike Kavis’ Interview]

Last week, Cloud Technology Partner VP Mike Kavis (aka MadGreek65) and I talked for 30 minutes about current trends in Hybrid Infrastructure and Containers.

leadership-photos-mike

Mike Kavis

Three of the top questions that we discussed were:

  1. Why Composability is required for deployment?  [5:45]
  2. Is Configuration Management dead? [10:15]
  3. How can containers be more secure than VMs? [23:30]

Here’s the audio matching the time stamps in my notes:

  • 00:44: What is RackN? – scale data center operations automation
  • 01:45: Digital Rebar is… 3rd generation provisioning to manage data center ops & bring up
  • 02:30: Customers were struggling on Ops more than code or hardware
  • 04:00: Rethinking “open” to include user choice of infrastructure, not just if the code is open source.
  • 05:00: Use platforms where it’s right for users.
  • 05:45: Composability – it’s how do we deal with complexity. Hybrid DevOps
  • 06:40: How do we may Ops more portable
  • 07:00: Five components of Hybrid DevOps
  • 07:27: Rob has “Rick Perry” Moment…
  • 08:30: 80/20 Rule for DevOps where 20% is mixed.
  • 10:15: “Is configuration management dead” > Docker does hurt Configuration Management
  • 11:00: How Service Registry can replace Configuration.
  • 11:40: Reference to John Willis on the importance of sequence.
  • 12:30: Importance of Sequence, Services & Configuration working together
  • 12:50: Digital Rebar intermixes all three
  • 13:30: The race to have orchestration – “it’s always been there”
  • 14:30: Rightscale Report > Enterprises average SIX platforms in use
  • 15:30: Fidelity Gap – Why everyone will hybrid but need to avoid monoliths
  • 16:50: Avoid hybrid trap and keep a level of abstraction
  • 17:41: You have to pay some “abstraction tax” if you want to hybrid BUT you can get some additional benefits: hybrid + ops management.
  • 18:00: Rob gives a shout out to Rightscale
  • 19:20: Rushing to solutions does not create secure and sustained delivery
  • 20:40: If you work in a silo, you loose the ability to collaborate and reuse other works
  • 21:05: Rob is sad about “OpenStack explosion of installers”
  • 21:45: Container benefit from services containers – how they can be MORE SECURE
  • 23:00: Automation required for security
  • 23:30: How containers will be more secure than containers
  • 24:30: Rob bring up “cheese” again…
  • 26:15: If you have more situationalleadership-photos-mike awareness, you can be more secure WITHOUT putting more work for developers.
  • 27:00: Containers can help developers worry about as many aspects of Ops
  • 27:45: Wrap up

What do you think?  I’d love to hear your opinion on these topics!

Deployment Fidelity – reducing tooling transistions for fun and profit

At the OpenStack Tokyo summit, I gave a short interview on Deployment Fidelity.  I’ve come to see the fidelity problem more broadly as the hybrid DevOps challenge that I described in my 2016 Predictions post as the end of mono-clouds.  Thanks Ken Hui from OpenStack Superuser TV for resurfacing this link!

How do platforms die? One step at a time [the Fidelity Gap]

The RackN team is working on the “Start to Scale” position for Digital Rebar that targets the IT industry-wide “fidelity gap” problem.  When we started on the Digital Rebar journey back in 2011 with Crowbar, we focused on “last mile” problems in metal and operations.  Only in the last few months did we recognize the importance of automating smaller “first mile” desktop and lab environments.

A fidelityFidelity Gap gap is created when work done on one platform, a developer laptop, does not translate faithfully to the next platform, a QA lab.   Since there are gaps at each stage of deployment, we end up with the ops staircase of despair.

These gaps hide defects until they are expensive to fix and make it hard to share improvements.  Even worse, they keep teams from collaborating.

With everyone trying out Container Orchestration platforms like Kubernetes, Docker Swarm, Mesosphere or Cloud Foundry (all of which we deploy, btw), it’s important that we can gracefully scale operational best practices.

For companies implementing containers, it’s not just about turning their apps into microservice-enabled immutable-rock stars: they also need to figure out how to implement the underlying platforms at scale.

My example of fidelity gap harm is OpenStack’s “all in one, single node” DevStack.  There is no useful single system OpenStack deployment; however, that is the primary system for developers and automated testing.  This design hides production defects and usability issues from developers.  These are issues that would be exposed quickly if the community required multi-instance development.  Even worse, it keeps developers from dealing with operational consequences of their decisions.

What are we doing about fidelity gaps?  We’ve made it possible to run and faithfully provision multi-node systems in Digital Rebar on a relatively light system (16 Gb RAM, 4 cores) using VMs or containers.  That system can then be fully automated with Ansible, Chef, Puppet and Salt.  Because of our abstractions, if deployment works in Digital Rebar then it can scale up to 100s of physical nodes.

My take away?  If you want to get to scale, start with the end in mind.