DevOps Concept: “Ready State” Infrastructure as hand-off milestone

Working for Dell, it’s no surprise that I have a lot of discussions around building up and maintaining the physical infrastructure to run data centers at scale.  Generally the context is around OpenCrowbar, Hadoop or OpenStack Ironic/TripleO/Heat, but in my cloud operations experience the concerns are really universal.

Three Teams

Typically, deployments have three distinct phases: 1) mechanically plug together the systems, 2) get the systems ready to the OS and network level and then 3) install the application.  Often these phases are so distinct that they are handled by completely different teams!

That’s a problem because errors or unexpected changes from one phase are very expensive to address once you change teams.  The solution has been to become more and more prescriptive about what the system looks like between the second (“ready”) and third (“installed”) phases.  I’ve taken to calling this hand-off achieving a ready state infrastructure.

I define a “ready state” infrastructure as having been configured so that the application lay down steps are simple and predictable.

In my experience, most application deployment guides start with a ready state assumption.  They read like “Step 0: rack, configure, provision and tweak the nodes and network to have this specific starting configuration.”   If you are really lucky then “specific configuration” is actually a documented and validated reference architecture.

The magic of cloud IaaS is that it always creates ready state infrastructure.  If I request 10 servers with 2 NICs running Ubuntu 14.04 then that’s exactly what I get.  The fact that cloud always provisions a ready state infrastructure has become an essential operating assumption for cloud orchestration and configuration management.
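To make that assumption concrete, here is a minimal Python sketch of what checking a ready state could look like.  The names (ReadyStateSpec, Node, ready_state_gaps) are hypothetical and not taken from OpenCrowbar or any other tool; the point is only that the hand-off can be written down as an explicit, checkable description rather than a paragraph in a deployment guide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReadyStateSpec:
    """What the installing team expects to receive (hypothetical shape)."""
    node_count: int
    nics_per_node: int
    os_release: str


@dataclass(frozen=True)
class Node:
    """What was actually delivered for one server."""
    name: str
    nic_count: int
    os_release: str


def ready_state_gaps(spec, nodes):
    """Return human-readable reasons the nodes do not meet the spec."""
    gaps = []
    if len(nodes) != spec.node_count:
        gaps.append(f"expected {spec.node_count} nodes, found {len(nodes)}")
    for node in nodes:
        if node.nic_count != spec.nics_per_node:
            gaps.append(
                f"{node.name}: expected {spec.nics_per_node} NICs, "
                f"found {node.nic_count}"
            )
        if node.os_release != spec.os_release:
            gaps.append(
                f"{node.name}: expected {spec.os_release}, "
                f"found {node.os_release}"
            )
    return gaps
```

An empty list means the infrastructure matches the request.  In a cloud that is nearly always the case; on physical hardware, I would expect this kind of check to be where the surprises show up.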

Unfortunately, hardware provisioning is messy.  It takes significant effort to configure a physical system into a ready state.  This is caused by a number of factors:

  1. You can’t alter physical infrastructure with programming (an API) – for example, if the server enumerates the NICs differently than you expected, you have to adapt to that.
  2. You have to respect the physical topology of the system – for example, production deployments use teamed NICs that have to connect to different switches for redundancy.  You can’t make assumptions; you have to set up the team based on the specific configuration.
  3. You have to build up the configuration in sequence – for example, you can’t set up the RAID configuration after the operating system is installed.  If you made a bad choice then you’ll likely have to repeat the whole sequence of the deployment, and some bad choices (like using the wrong subnets) result in a total system rebuild.
  4. Hardware fails and is non-uniform – for example, in any order of sufficient size you will have NIC failures due to everything from simple mechanical card seating issues to BIOS interface mismatches.  Troubleshooting these issues can occupy significant time.
  5. Component configurations are interlocked – for example, a change to the switch settings could result in DHCP failures when systems are rebooted (real experience).  You cannot always work node-to-node; you must deal with the infrastructure as an integrated system.
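The first two factors can be illustrated with a short Python sketch.  It assumes a hypothetical discovery step that reports, for each NIC, the name the server gave it, the switch it is cabled to and whether the link is up.  None of these names come from a real tool; this is just one way to show the team being chosen from what was discovered rather than from what was assumed.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class DiscoveredNic:
    name: str      # name as the server enumerated it, e.g. "eth0"
    switch: str    # identifier of the upstream switch (hypothetical)
    link_up: bool  # whether the link was detected as working


def choose_team(nics):
    """Pick two working NICs that are cabled to different switches."""
    usable = [nic for nic in nics if nic.link_up]
    for first, second in combinations(usable, 2):
        if first.switch != second.switch:
            return (first.name, second.name)
    # Refuse to guess: a team on a single switch gives no redundancy.
    raise ValueError("no pair of working NICs spans two switches")
```

Failing loudly here is a deliberate choice in the sketch.  Given the sequencing problem in item 3, a wrong team discovered after the OS and application are installed is far more expensive than a node that is held back before hand-off.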

Being consistent at turning discovered state into ready state is a complex and unique problem space.  As I explore this bare metal provisioning space in the community, I am more and more convinced that it has a distinct architecture from applications built for ready state operations.

My hope in this post is to test whether the concept of “ready state” infrastructure is helpful in describing the transition point between provisioning and installation.  Please let me know what you think!

