Podcast with Will Dennis talking Crowbar to Digital Rebar and BarClamps

Joining this week’s L8ist Sh9y Podcast is Will Dennis, a long-time member of the Crowbar community who continues to engage in helping drive Digital Rebar forward. Will is an excellent resource who takes us through the history from Crowbar to Digital Rebar Provision in a way that highlights how the project has changed and why the community scaled back from V2 to the new V3.1.

Topic (Time in Minutes:Seconds)

  • Introduction (0:00 – 1:12)
  • What drew you to Crowbar? (1:12 – 4:29)
  • Secret Language (3:05 – 3:39)
  • Ansible Add-On (4:29 – 5:08)
  • Crowbar v2 (5:08 – 6:03)
  • Heterogeneous Infra (6:03 – 8:25)
  • v3 – What had to go? (8:25 – 11:12)
  • Building Infra White Paper (11:12 – 12:07)
  • Cobbler Must Die (12:07 – 12:34)
  • UNIX Concept (12:34 – 13:00)
  • Cobbler Community (13:00 – 16:53)
  • DR – Service in a Workflow (16:53 – 18:42)
  • HashiCorp & Linux Tool Model (18:42 – 19:28)
  • Upgrades (19:28 – 20:09)
  • Immutability (20:09 – 26:35)
  • Compromise for Immutable (26:35 – 32:09)
  • Perfect Fit for Digital Rebar (32:09 – 33:20)
  • 3 Requests for DR Project (33:20 – END)

Podcast Guest: Will Dennis

Will Dennis is currently employed as a Senior Systems Administrator at NEC Laboratories America, and has over 25 years of experience in managing, installing, and troubleshooting enterprise computing systems, networks, and software. A lifelong learner, Will enjoys keeping current with both tech and culture in the field of Information Technology. Will can be found online on Twitter as @willarddennis, and through LinkedIn at https://www.linkedin.com/in/willdennis/

Breaking Up is Hard To Do – Why I Believe Ops Decomposition (pt 1)

Over the summer, the RackN team took a radical step with our previous Ansible Kubernetes workload install: we broke it into pieces.  Why?  We wanted to eliminate all “magic happens here” steps in the deployment.

The result, DR Kompos8, is a faster, leaner, transparent and parallelized installation that allows for pluggable extensions and upgrades (video tour). We also chose the operationally simplest configuration choice: Golang binaries managed by SystemD.
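
To make the “Golang binaries managed by SystemD” choice concrete, here is a minimal sketch of the pattern for a self-contained Go binary; the service name, paths and flags below are illustrative assumptions, not the actual Kompos8 packaging:

# drop in a unit file for a (hypothetical) self-contained Go binary
sudo tee /etc/systemd/system/example-svc.service >/dev/null <<'EOF'
[Unit]
Description=Example Go service managed by systemd (illustrative only)
After=network-online.target

[Service]
ExecStart=/usr/local/bin/example-svc --config /etc/example-svc/config.yml
Restart=on-failure
RestartSec=5

[Install]
WantedBy=multi-user.target
EOF

# reload systemd, start the service now, and start it on boot
sudo systemctl daemon-reload
sudo systemctl start example-svc
sudo systemctl enable example-svc

Because the binary is static and self-contained, the unit file plus a config file is the entire install footprint, which is what makes this the operationally simplest choice.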

Why decompose and simplify? Let’s talk about our hard-earned ops automation battle scars that led to composability as a core value:

Back in the early OpenStack days, when the project was actually much simpler, we were part of a community writing Chef Cookbooks to install it. These scripts are just a sequence of programmable steps (roles in Ops-speak) that drive the configuration of services on each node in the cluster. There is an ability to find cross-cluster information and look up local inventory, so we were able to inject specific details before the process began. However, once the process started, it was pretty much like starting a domino chain. If anything went wrong anywhere in the installation, we had to reset all the dominoes and start over.

Like a domino train, it is really fun to watch when it works. Also, like dominoes, it is frustrating to set up and fix. Often we were literally holding our breath during installation, hoping that we’d anticipated every variation in the software, hardware and environment. It is no surprise that the first and most critical feature we created was a redeploy command.

It turned out that the ability to successfully redeploy was the critical measure of success. We would not consider a deployment complete until we could wipe the systems and rebuild them automatically at least twice.

What made cluster construction so hard? There were three key things: cross-node dependencies (linking), a lack of service configuration (services) and isolating attribute chains (configuration).

We’ll explore these three reasons in detail in part 2 of this post tomorrow.

Even without the details, it is easy to understand that we want to avoid all magic in a deployment.

For scale operations, there should never be a “push and pray” step where we are counting on timing or unknown configuration for it to succeed. Likewise, we need to eliminate “it worked from my desktop” automation too.  Those systems are impossible to maintain, share and scale. Composed cluster operations address this problem by making work modular, predictable and transparent.

Talking Functional Ops & Bare Metal DevOps with vBrownBag [video]

Last Wednesday (3/11/15), I had the privilege of talking with the vBrownBag crowd about Functional Ops and bare metal deployment.  In this hour, I talk about how functional operations (FuncOps) works as an extension of ready state.  FuncOps is a critical concept for providing abstractions to scale heterogeneous physical operations.

Timing for this was fantastic since we’d just worked out ESXi install capability for OpenCrowbar (it will be exposed for work starting on Drill, the next Crowbar release cycle).

Here’s the brown bag:

If you’d like to see a demo, I’ve got hours of them posted:

Video Progression

Crowbar v2.1 demo: Visual Table of Contents [click for playlist]

My OpenStack Vancouver Session Promotion Dilemma – please, vote outside your block

We need people to promote their OpenStack Sessions, but how much is too much?

Semi-annually, I choose to be part of the growing dog pile of OpenStack summit submissions.  Looking at the list, I see some truly amazing sessions by committed and smart community members.  There is also a fair share of vendor promotions.

The nature of the crowded OpenStack vendor community is that everyone needs to pick up their social media megaphones (and encourage some internal block voting) to promote their talks.  Consequently, I need to ask you to consider voting for my list:

  1. DefCore 2015 
  2. The DefCore Show: “is it core or not” feud episode
  3. Mayflies: Improve Cloud Utilization by Forcing Rapid Server Death [Research Analysis] (xref)
  4. It’s all about the Base. If you want stability, start with the underlay [Crowbar] 
  5. State of OpenStack Product Management

Why am I so reluctant to promote these excellent talks?  Because I’m concerned about fanning the “PROMOTE MY TALKS” inferno.

For the community to function, we need users and operators to be heard.  The challenge is that the twin Conference/Summit venue serves a lot of different audiences.

In my experience, that leads to a lot of contributor navel gazing and vendor-on-vendor celebrations.  That in turn drowns out voices from the critical, but non-block-enabled users and operators.

Yes, please vote for those sessions of mine that interest you; however, please take time to vote more broadly too.  The system randomizes which talks you see to help distribute voting.

Thanks.

Want CI Consul Love? OK! Run Consul in Travis-CI [example scripts]

If you are designing an application that uses microservice registration AND continuous integration then this post is for you!  If not, get with the program, you are a fossil.

Sunday night, I posted about the Erlang Consul client I wrote for our Behavior Driven Development (BDD) testing infrastructure.  That exposed a need to run a Consul service in the OpenCrowbar Travis-CI build automation that validates all of our pull requests.  Basically, Travis spins up the full OpenCrowbar API and workers (we call it the annealer), which in turn registers services in Consul.

NOTE: These are pseudo-instructions.  In the actual code (here too), I created a script to install Consul, but this is more illustrative of the changes you need to make in your .travis.yml file.

In the first snippet, we download and unzip Consul.  It’s written in Go, so that’s about all we need for an install.  I added a version check for logging validation.

before_script:
  - wget 'https://dl.bintray.com/mitchellh/consul/0.4.1_linux_amd64.zip'
  - unzip "0.4.1_linux_amd64.zip"
  - ./consul --version

In the second step, we set up the Consul service and register it to itself in the background.  That allows the other services to access it.

script: 
  - ../consul agent -server -bootstrap-expect 1 -data-dir /tmp/consul &

After that, the BDD infrastructure can register the fake services that we expect (I created an Erlang consul:reg_serv(“name”) routine that makes this super easy).  Once the services are registered, OpenCrowbar will check for the services and continue without trying to instantiate them (which it cannot do in Travis).
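
If you are not using the Erlang helper, the same kind of fake registration can be done straight against the Consul agent’s HTTP API.  This is a generic sketch with a made-up service name and port, not the exact call the BDD code makes:

# register a fake "crowbar-api" service with the local Consul agent (example values)
curl -X PUT http://localhost:8500/v1/agent/service/register \
  -d '{ "Name": "crowbar-api", "Port": 3000 }'

Anything registered this way will show up when OpenCrowbar queries Consul for the service, which is all the Travis run needs.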

Here’s the pull request with the changes.

OpenCrowbar 2.1 Released Last Week with new integrations and support

Crowbar 2.1 release brings commercial support, hardware configs, Chef and SaltStack

Last week, the Crowbar community completed the OpenCrowbar “Broom” release and officially designated it as v2.1.  This release represents 8 months of hardening of the core orchestration engine (including automated testing), the addition of true hardware support (in the optional hardware workload) and preliminary advanced integration with Chef and SaltStack.

Core Features:

  • RAID – Automatically set RAID configuration parameters depending on how the system will be used.
    • Support for LSI controllers
    • Single and Dual RAID configuration
  • BIOS – Automatically set BIOS settings depending on how the system will be used.
    • Configuration setting for Dell PE series systems
  • Out of Band Support – Configure and manage systems via their OOB interface
    • Support for IPMI and WSMan
  • RPM Installation (it riseth again!) – Install OpenCrowbar via a standard RPM instead of a Docker container

Integrations:

  • SaltStack integration – OpenCrowbar can install SaltStack as a configuration tool to take over after “Ready State”
  • Chef Provisioning (was Chef Metal) – OpenCrowbar driver allows Chef to build clusters on bare metal using the Crowbar API.

Infrastructure:

  • Automated smoke test and code coverage analysis for all pull requests.

And…v2.1 is the first release with commercial support!

RackN (rackn.com) offers consulting and support for the OpenCrowbar v2.1 release.  The company was started by Crowbar founders Greg Althaus, Scott Jensen, Dan Choquette, and myself specifically to productize and extend Crowbar.

Want to try it out?

Ironic + Crowbar: United in Vision, Complementary in Approach

This post is co-authored by Devananda van der Veen, OpenStack Ironic PTL, and Rob Hirschfeld, OpenCrowbar Founder.  We discuss how Ironic and Crowbar work together today and into the future.

Normalizing the APIs for hardware configuration is a noble and long-term goal.  While the end result, a configured server, is very easy to describe, the differences between vendors’ hardware configuration tools are substantial.  These differences make it challenging to create repeatable operations automation (DevOps) on heterogeneous infrastructure.

Illustration to show potential changes in provisioning control flow over time.

The OpenStack Ironic project is a multi-vendor community solution to this problem at the server level.  By providing a common API for server provisioning, Ironic encourages vendors to write drivers for their individual tooling such as iDRAC for Dell or iLO for HP.

Ironic abstracts configuration and expects to be driven by an orchestration system that makes the decisions of how to configure each server. That type of orchestration is the heart of Crowbar physical ops magic [side note: 5 ways that physical ops is different from cloud].

The OpenCrowbar project created extensible orchestration to solve this problem at the system level.  By decomposing system configuration into isolated functional actions, Crowbar can coordinate disparate configuration actions for servers, switches and between systems.

Today, the Provisioner component of Crowbar performs similar functions as Ironic for operating system installation and image lay down.  Since configuration activity is tightly coupled with other Crowbar configuration, discovery and networking setup, it is difficult to isolate in the current code base.  As Ironic progresses, it should be possible to shift these activities from the Provisioner to Ironic and take advantage of the community-based configuration drivers.

The immediate synergy between Crowbar and Ironic comes from accepting two modes of operation for OpenStack: bootstrapping infrastructure and multi-tenant server allocation.

Crowbar was designed as an operational platform that seeds an OpenStack ready environment.  Once that environment is configured, OpenStack can take over ownership of the resources and allow Ironic to manage and deliver “hypervisor-free” servers for each tenant.  In that way, we can accelerate the adoption of OpenStack for self-service metal.

Physical operations is messy and challenging, but we’re committed to working together to make it suck less.  Operators of the world unite!

API Driven Metal = OpenCrowbar + Chef Provisioning

The OpenCrowbar community created a Chef-Provisioning driver that allows you to quickly build hardware clusters using Chef cookbooks.

When we started using Chef in 2011, there was a distinct gap around bootstrapping systems.  The platform did a great job of automation and even connecting services together (via the Search anti-pattern, see below) but lacked a way to build the initial clusters automatically.

The current answer to this problem from Chef is refreshingly simple: a cookbook API extension called Chef Provisioning.  This approach uses the regular Chef DSL in recipes to request and bind a cluster into Chef.  Basically, the code builds an array of nodes using an API that creates the nodes if they are missing from the array in the code.  Specifically, when a node is missing from the array, Chef calls out to create the node in an external system.

For clouds, that means using the API to request a server and then inject credentials for Chef management.  It’s trickier for physical gear because you cannot just make a server in the configuration you need it in.  Physical systems must first be discovered and profiled to ready state: the system must know how many NICs and disk drives are available to correctly configure the hardware prior to laying down the Operating System.

Consequently, Chef Provisioning automation is more about reallocation of existing discovered physical assets to Chef.  That’s exactly the approach the OpenCrowbar team took for our Chef Provisioning driver.

OpenCrowbar interacts with Chef Provisioning by pulling nodes from the System deployment into a Chef Provisioning deployment.  That action then allows the API client to request specific configurations like Operating System or network that need to be set up for Chef to execute.  Once these requests are made, Crowbar will simply run its normal annealing processes to ready state and then inject the Chef credentials.  Chef waits until the work queue is empty and then takes over management of the asset.  When Chef is finished, Crowbar can be instructed to reconfigure the node back to a base state.

Does that sound simple?  It is simple because the Crowbar APIs match the Chef needs very cleanly.

It’s worth noting that this integration is a great test of the OpenCrowbar API design.  Over the last two years, we’ve evolved the API to make it more final result focused.  Late binding is a critical concept for the project and the APIs reflect that objective.  For Chef Provisioning, we allow the integration to focus on simple requests like “give me a node then put this O/S on the node and go.”  Crowbar has the logic needed to figure out how to accomplish those objectives without much additional instruction.

Bonus Side Note: Why Search can become an anti-pattern

Search is an incredibly powerful feature in Chef that allows cross-role and cross-node integration; unfortunately, it’s also very difficult to maintain as complexity and contributor counts grow.  The reason is that search creates “forward dependencies” in the scripts that require operators creating data to be aware of downstream, hidden consumers.  High Availability (HA) is a clear example.  If I add a new “cluster database” role to the system then it is very likely to return multiple results for database searches.  That’s excellent until I learn that my scripts have coded search to assume that we only return one result for database lookups.  It’s very hard to find these errors since the searches are decoupled and downstream of the database cookbook.  Ultimately, the community had to advise against embedded search for shared cookbooks.

Unicorn captured! Unpacking multi-node OpenStack Juno from ready state.

OpenCrowbar Packstack install demonstrates that abstracting hardware to ready state smooths the install process.  It’s a working balance: Crowbar gets the hardware, O/S & networking right while Packstack takes care of OpenStack.

The Crowbar team produced the first open OpenStack installer back in 2011 and it’s been frustrating to watch the community fragment around building a consistent operational model.  This is not an OpenStack specific problem, but I think it’s exaggerated in a crowded ecosystem.

When I step back from that experience, I see an industry-wide pattern of struggle to create scale deployment patterns that can be reused.  Trying to make hardware uniform is unicorn hunting, so we need to create software abstractions.  That’s exactly why IaaS is powerful and the critical realization behind the OpenCrowbar approach to physical ready state.

So what has our team created?  It’s not another OpenStack installer – we just made the existing one easier to use.

We build up a ready state infrastructure that makes it fast and repeatable to use Packstack, one of the leading open OpenStack installers.  OpenCrowbar can do the same for the OpenStack Chef cookbooks or Salt Formulas.  It can even use SaltStack, Chef and Puppet together (which we do for the Packstack work)!  Plus we can do it on multiple vendors’ hardware and with different operating systems.  Plus we build the correct networks!

For now, the integration is available as a private beta (inquiries welcome!) because our team is not in the OpenStack support business – we are in the “get scale systems to ready state and integrate” business.  We are very excited to work with people who want to take this type of functionality to the next level and build truly repeatable, robust and upgradable application deployments.

Need a physical ops baseline? Crowbar continues to uniquely fill gap

I’ve been watching to see if other open “bare metal” projects would morph to match the system-level capabilities that we proved in Crowbar v1 and honed in the re-architecture of OpenCrowbar.  The answer appears to be that Crowbar simply takes a broader approach to solving the physical ops repeatability problem.

Crowbar Architect Victor Lowther says “What makes Crowbar a better tool than Cobbler, Razor, or Foreman is that Crowbar has an orchestration engine that can be used to safely and repeatably deploy complex workloads across large numbers of machines. This is different from (and better than, IMO) just being able to hand responsibility off to Chef/Puppet/Salt, because we can manage the entire lifecycle of a machine where Cobbler, Razor and Chef cannot, we can describe how we want workloads configured at a more abstract level than Foreman can, and we do it all using the same API and UI.”

Since we started with a vision of an integrated system to address the “apply-rinse-repeat” cycle, it’s no surprise that Crowbar remains the only open platform that’s managed to crack the complete physical deployment life-cycle.

The Crowbar team realized that it’s not just about automation setting values: physical ops requires orchestration to make sure the values are set in the correct sequence on the appropriate control surface including DNS, DHCP, PXE, Monitoring, et cetera.  Unlike architectures for aaS platforms, the heterogeneous nature of the physical control planes requires a different approach.

We’ve seen that making more and more complex kickstart scripts or golden images is not a sustainable solution.  There is simply too much hardware variation and dependency thrash for operators to collaborate with those tools.  Instead, we’ve found that decomposing the provisioning operations into functional layers with orchestration is much more multi-site repeatable.

Accepting that physical ops (discovered infrastructure) is fundamentally different from cloud ops (created infrastructure) has been critical to architecting platforms that were resilient enough for the heterogeneous infrastructure of data centers.

If we want to start cleaning up physical ops, we need to stop looking at operating system provisioning in isolation and start looking at the full server bring up as just a part of a broader system operation that includes networking, management and operational integration.