Razor | Rob Hirschfeld

I was reading a ComputerWorld article about how Google and Amazon achieve scale. The theme: you must do better than linear cost scale and the only way to achieve that is to automate and commoditize hardware. I find interesting parallels in the Crowbar physical devops effort.

As the OpenCrowbar team continues to explore the concepts around “ready state,” I discover more and more small ops nuisances that need to be included in the build up before installing software. These small items quickly add up at scale breaking the rule above.

I’ve already posted about the performance benefit of building a Squid Proxy fabric as part of the underlying ops environment. As we work on Chef Metal, SaltStack and Packstack integrations (private beta), we’ve rediscovered the importance of management/population of SSH public keys.

In cloud infrastructure, key injection is taken for granted; however, it’s not an automatic behavior in the physical ops. Since OpenCrowbar handles keys by default but other tools (like Cobbler or Razor) expect that you will use kickstart to inject your SSH keys when you install the Operating System..

Including keys in kickstart (which I’m using generically instead of preseed, auto-yast, jumpstart, etc) hand generated scripts is a potentially dangerous security practice since it makes it difficult to propagate and manage your keys. It also means that every time a new operating system update is released that you may have to update and retest your kickstarts. OpenCrowbar has the same challenge but our approach allows everyone can share in the work because our bootstrapping files are scripted and generic.

OpenCrowbar takes care of these ready state configurations in our integrations with these DevOps platforms. Our experience has been that little items like SSH keys and proxy configurations can make a disproportionate advantage in running scale ops or during iterative development.

I’ve been watching to see if other open “bare metal” projects would morph to match the system-level capabilities that we proved in Crowbar v1 and honed in the re-architecture of OpenCrowbar. The answer appears to be that Crowbar simply takes a broader approach to solving the physical ops repeatably problem.

Crowbar Architect Victor Lowther says “What makes Crowbar a better tool than Cobbler, Razor, or Foreman is that Crowbar has an orchestration engine that can be used to safely and repeatably deploy complex workloads across large numbers of machines. This is different from (and better than, IMO) just being able to hand responsibility off to Chef/Puppet/Salt, because we can manage the entire lifecycle of a machine where Cobbler, Razor and Chef cannot, we can describe how we want workloads configured at a more abstract level than Foreman can, and we do it all using the same API and UI.”

Since we started with a vision of an integrated system to address the “apply-rinse-repeat” cycle; it’s no surprise that Crowbar remains the only open platform that’s managed to crack the complete physical deployment life-cycle.

The Crowbar team realized that it’s not just about automation setting values: physical ops requires orchestration to make sure the values are set in the correct sequence on the appropriate control surface including DNS, DHCP, PXE, Monitoring, et cetera. Unlike architectures for aaS platforms, the heterogeneous nature of the physical control planes requires a different approach.

We’ve seen that making more and more complex kickstart scripts or golden images is not a sustainable solution. There is simply too much hardware variation and dependency thrash for operators to collaborate with those tools. Instead, we’ve found that decomposing the provisioning operations into functional layers with orchestration is much more multi-site repeatable.

Accepting that physical ops (discovered infrastructure) is fundamentally different from cloud ops (created infrastructure) has been critical to architecting platforms that were resilient enough for the heterogeneous infrastructure of data centers.

If we want to start cleaning up physical ops, we need to stop looking at operating system provisioning in isolation and start looking at the full server bring up as just a part of a broader system operation that includes networking, management and operational integration.

Rob Hirschfeld

On Computing, Containers, Cloud & Tech Culture

Tag Archives: Razor

OpenCrowbar bootstrap positions SSH Keys for hand-offs

Need a physical ops baseline? Crowbar continues to uniquely fill gap