Full Metal DevOps: 12 things we needed beyond Cobbler

The RackN team did not plan to replace Cobbler; we just needed something that met our need for full-cycle, cross-platform DevOps automation.

Provisioning an O/S is never enough! You need to coordinate a lot of operational activity to deploy a multi-node system like OpenStack, Kubernetes, Docker Swarm or Ceph. Since we believe an automated upgrade path is also required, there is a huge gap beyond basic provisioning.

So what was needed?  Here’s our (rather long!) list of gaps to fill for full Metal DevOps provisioning:

1. Needs to work with Cobbler!
   Improve? Yes. Disrupt? Hell No! It has to be OK to leave Cobbler in place while we do something better. I’d be OK tweaking my Cobbler to point it to the new stuff.

2. REST API & JSON CLI
   Beyond the obvious API, we really want a way to write scripts that drive deployment proactively.

3. Modular Components
   If I’ve got my own DNS, DHCP, NTP, etc., then let me use those instead (see #1 above).

4. Control over the discovery image
   RAM discovery images are awesome, BUT please let me mess with them too! Inject my keys and let me control when the image exits.

5. Configure heterogeneous RAID, BIOS & IPMI
   Servers need a mix of in-band (in the O/S) and out-of-band (BMC) configs. Don’t make me pick; I can’t.

6. Inject DevOps scripts dynamically based on system inventory or state
   Depending on the node’s role, I want to run a set of scripts AFTER the O/S is installed. And please let me mix Chef, Puppet, Ansible and Bash. Bash? Especially Bash.

7. Portable scripts between Cloud and Metal
   I’m going to practice on VMs and AWS. In fact, my devs only work there. I need high fidelity between my cloud and metal deploys.

8. One-click to reset and start over
   I don’t care if you want to call this “Metal as a Service.” Deployments are iterative and we need to go faster.

9. Don’t require PXE or IP control to add nodes to the system
   Beyond #2, I want to get control of servers that don’t PXE or are already provisioned.

10. System inventory including network topology. Then push it.
   No surprise that we need inventory to make provisioning decisions. Can we make that API available? Maybe push it into a CMDB?

11. Control SSH keys per system, group and deployment
   Darn, security is near the bottom again! Can we please control keys and access from first boot? It should be table stakes.

0. AND NEVER HAVE TO TOUCH KICKSTART or PRESEED TEMPLATES
   Well, there are times I have to do it (like soft RAID for O/S drives), so at least give me a template system, because Cobbler’s was pretty good.

We built Digital Rebar to close these gaps and many others (like operating transparently, running in containers, and failing fast). We think it’s time to bring cloud operational practices to metal. With this type of automation, we can make it happen!

What are your biggest challenges with Metal Ops? Does it match this list? I’d love to hear your opinion.

This entry was posted in Uncategorized by Rob H.

About Rob H

A Baltimore transplant to Austin, Rob thinks about ways of building scale infrastructure for the clouds using Agile processes. He sat on the OpenStack Foundation board for four years. He co-founded RackN to enable software that creates hyperscale converged infrastructure.
