Rocking Docker – OpenCrowbar builds solid foundation & life-cycle [VIDEOS]

Docker has been gathering a substantial amount of interest as an additional way to solve application portability and dependency hell.  We’ve been enthusiastic participants in this fledgling community, both through Docker in OpenStack and through my work on DefCore’s Tempest in a Container (TCUP).

In OpenCrowbar, we’ve embedded Docker much deeper to solve a few difficult & critical problems: speeding up developing multi-node deployments and building the environment for the containers.  Check out my OpenCrowbar does Docker video or the community demo!

Bootstrapping Docker into a DevOps management framework turns out to be non-trivial because integrating new nodes into a functioning operating environment is very different on Docker than with physical servers or VMs.  Containers don’t PXE boot and have more limited configuration options.

How did we do this?  Unlike other bare metal provisioning frameworks, we made sure that Crowbar did not require DHCP+PXE as the only node discovery process.  While we default to and fully support PXE with our sledgehammer discovery image, we also allow operators to pre-populate the Crowbar database using our API and make configuration adjustments before the node is discovered/created.
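To make the "pre-populate, then discover" idea concrete, here is a minimal sketch.  It is a toy in-memory inventory, not the Crowbar database or API: the `NodeRegistry` class, its method names and the attribute keys are all hypothetical and chosen only for illustration.  The point is that an operator's choices made ahead of time survive when discovery later fills in the rest.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    attrs: dict = field(default_factory=dict)
    discovered: bool = False


class NodeRegistry:
    """Toy in-memory inventory (hypothetical; not the Crowbar API)."""

    def __init__(self):
        self._nodes = {}

    def preregister(self, name, **attrs):
        # The operator creates the record before the machine ever boots.
        node = self._nodes.setdefault(name, Node(name))
        node.attrs.update(attrs)
        return node

    def discover(self, name, **facts):
        # Discovery fills in missing facts but keeps the operator's choices.
        node = self._nodes.setdefault(name, Node(name))
        for key, value in facts.items():
            node.attrs.setdefault(key, value)
        node.discovered = True
        return node


registry = NodeRegistry()
registry.preregister("node1.example.com", os="ubuntu-12.04", role="docker-host")
node = registry.discover("node1.example.com", os="unknown", nics=2)
print(node.attrs)
```

Because `discover` uses `setdefault`, the pre-set `os` value is kept while the newly found `nics` fact is added.  A node that was never pre-registered simply gets created at discovery time, which mirrors the default PXE path.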

We even went a step farther and enabled the Crowbar dependency graph to take alternate routes (we call it the “provides” role).  This enhancement is essential for dealing with “alike but different” infrastructure like Docker.
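As a rough illustration of what "alternate routes" in a dependency graph can look like, here is a small sketch.  It is not Crowbar's implementation: the `resolve` function and the role names (`os-install`, `docker-node`, `managed-node`, `app`) are made up for this example.  The idea is that a role asks for a capability rather than a specific role, and any available role that provides the capability can satisfy it.

```python
def resolve(role, requires, provides, available, order=None):
    """Return an install order for `role`, or raise if a need cannot be met.

    requires:  role -> list of capabilities it needs
    provides:  role -> capability it satisfies (defaults to its own name)
    available: roles that may be used on this kind of node
    Note: this sketch does not detect dependency cycles.
    """
    if order is None:
        order = []
    if role in order:
        return order
    for need in requires.get(role, []):
        # Any available role that provides the capability is acceptable.
        candidates = [r for r in available if provides.get(r, r) == need]
        if not candidates:
            raise LookupError(f"nothing provides {need!r} for {role!r}")
        resolve(candidates[0], requires, provides, available, order)
    order.append(role)
    return order


# Hypothetical roles: both routes end in a "managed-node" capability.
requires = {"app": ["managed-node"]}
provides = {"os-install": "managed-node", "docker-node": "managed-node"}

metal = ["os-install", "app"]
docker = ["docker-node", "app"]

print(resolve("app", requires, provides, metal))
print(resolve("app", requires, provides, docker))
```

The same `app` role resolves through the OS install on metal and through container creation on Docker, without the role itself knowing which route was taken.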

The result is that you can request Docker nodes in OpenCrowbar (using the API only for now) and it will automatically create the containers and attach them into Crowbar management.  It’s important to stress that we are not adding existing containers to Crowbar by adding an agent; instead, Crowbar manages the container’s life-cycle and then works inside the container.

Getting around the PXE cycle using containers as part of Crowbar substantially improves Ops development cycle time because we don’t have to wait for boot > discovery > reboot > install to create a clean environment.  Bringing fresh Docker containers into a dev system takes seconds instead.

The next step is equally powerful: Crowbar should be able to configure the Docker host environment on host nodes (not just the Admin node as we are now demonstrating).  Setting up the host can be very complex: you need to have the correct RAID, BIOS, Operating System and multi-NIC networking configuration.  All of these must be configured from a system perspective that matches your Ops environment.  Luckily, this is exactly Crowbar’s sweet spot!

Until we’ve got that pulled together, OpenCrowbar’s ability to use upstream cookbooks and this latest Dev/Test focused step provide remarkable out of the gate advantages for everyone building multi-node DevOps tools.


PS: It’s worth noting that we’ve already been using Docker to run & develop the Crowbar Admin server.  This extra step makes Crowbar even more Dockeriffic.

OpenCrowbar Multi-OS deploy from Docker Admin

Last week I talked about OpenCrowbar reaching a critical milestone and this week I’ve posted two videos demonstrating how the new capabilities work.

The first video highlights the substantial improvements we’ve made in testing and developing OpenCrowbar.  By using Docker containers, OpenCrowbar is fast and reliable to set up and test.  We’ve dramatically streamlined the development environment and consolidated the whole code base into logical groups with logical names.

The second video shows off OpenCrowbar doing its deployment work (including setting up Docker nodes!).  This demonstration goes through the new node discovery and install process.  The new annealing process is very transparent and gives clear and immediate feedback about the entire discovery and provisioning process.  I also show how to configure networks (IPv4 and IPv6) and choose which operating system gets installed.

Note: In the videos, I demonstrate using our Docker install process.  Part of moving from Crowbar v2 (in the original Crowbar repo) to OpenCrowbar was so that we could also organize the code for an RPM install.  In either install process, OpenCrowbar no longer uses bloated ISOs with all components pre-cached so you must be connected to the Internet to complete the installation.

5 differences between Cloud ops and Bare Metal ops

Cloud APIs are about abstracting operations to simplify deployment.  We want users of our cloud infrastructure to operate with blissful unawareness of the underlying networking topology, storage configuration and physical infrastructure.  From their perspective, the cloud is perfectly elastic, totally configurable and wonderfully consistent. Cloud Admins, on the other hand, need visibility and controls that expose the complexity while keeping it rational. These are profoundly different concerns.

Maintaining the illusion of clean and simple Cloud ops infrastructure is very valuable; however, it’s just an illusion.  The black metal box behind those APIs is complex, messy, unpredictable and dynamic.

1. Metal Ops has to deal with network topology and details like whether an operating system enumerates the NICs correctly, bonding the correct NIC pair and which 10g network to use for the storage traffic. In networking, the topology determines how much traffic you can subscribe to a link and how to provide resiliency. Networking does not exist in isolation: you must consider the boundary firewalls and routers to either block or allow traffic because without connectivity the cloud is useless. Details like the access and registration in DNS, NTP and DHCP provide foundations for stable operations. These details are (and should be) hidden from the cloud user.

2. Metal Ops has to deal with firmware issues at every level.  It matters to the server if it boots into BIOS or UEFI mode.  We have to manage the fact that RAID partitions need to be optimized based on the workload and type of drive.  We have to consider if there are specialized drivers and caches to manage and security features (like Intel TXT) to activate.  These details are (and should be) hidden from the cloud user.

3. Metal Ops have to consider the security of their infrastructure.  We have to manage where the admin control network crosses security domains.  It matters which layer 2 networks have access to which parts of the infrastructure.  Separation of responsibility for network vs. storage vs. compute is a reality that is not going away. These details are (and should be) hidden from the cloud user.

4. Metal Ops have to manage operating system compatibility.  I know personally that vendors test and certify their operating systems on an enormous matrix of silicon.  I also have learned that the matrix of possible combinations is far larger and fundamentally impossible at the edges.  There’s a reason that operators seek homogeneous environments and LTS releases. These details are (and should be) hidden from the cloud user.

5. Metal Ops have to deal with hardware failures. By simple statistics, the larger the system the more things will break and metal ops have to cope with this reality. We have to expose failure zones and boundaries to make intelligent responses (like moving data from a failed drive to a non-adjacent one) that require intimate knowledge of system topography that is intentionally hidden in cloud ops. Further, we have to have monitoring and management tooling that knows how to identify which NIC in a bond failed or flash the lights on the failed drive of an array. These details are (and should be) hidden from the cloud user.
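The "simple statistics" behind item 5 can be sketched in a few lines.  This is back-of-the-envelope arithmetic under two loud assumptions that are mine, not measured data: every component fails independently, and each has the same 1% chance of failing in a given period.  With those assumptions, the chance that at least one of n components fails is 1 - (1 - p)^n.

```python
def p_any_failure(n_components, p_fail):
    """Probability that at least one of n independent components fails."""
    return 1.0 - (1.0 - p_fail) ** n_components


def expected_failures(n_components, p_fail):
    """Average number of failed components in the same period."""
    return n_components * p_fail


# Assumed (illustrative) per-component failure probability of 1%.
P_FAIL = 0.01

for n in (10, 100, 1000):
    chance = p_any_failure(n, P_FAIL)
    print(f"{n:>5} components: P(any failure) = {chance:.3f}, "
          f"expected failures = {expected_failures(n, P_FAIL):.1f}")
```

Even with a small per-component rate, the probability of seeing some failure climbs quickly with system size, which is why failure handling is routine work for metal ops rather than an exception.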

Cloud’s power is being able to abstract away this complexity.  Dealing with it gracefully behind the scenes requires transparency and details that make the Metal Ops job fundamentally different.

While both can be highly automated and pass my “Cloud is Infrastructure with an API” test, their objectives are different.

My insights from OpenStack “what is core” Spider > we need pluggable architectures


So what did we learn from the spider map exercise?  Above all else, the spider confirmed to me that OpenStack is a world of paradox.  The perfect definition of core may be elusive, but I believe we can find one that is sufficient.

The goal was understanding, not philosophical truth.  In a diverse and vibrant community with many objectives, understanding leads to consensus while being “right” can become very lonely.

Since our goal was not to answer the question, what did we want to accomplish?  Spider success was defined as creating a framework, really a list of agreed positions (post #4), that narrows the scope of the “what is core” dialog.

Too vague a framework leads to uncertainty about what’s included, stable and working while too rigid a baseline could drive away innovation and lead to forking.  Being too aggressively open could discourage commercial investment yet too proprietary an approach contradicts our collaboration and community values.

Having a workable framework that accommodates these diverse positions allows us to move forward.

So what did Alan and I learn from the spider to help the discussion?

  • “Plug-ins” are essential to the definition of core because they create safe places for innovation  [note: there has been much refinement of what "plug-in" means here]
  • It is possible to balance between stability and innovation if we have a way to allow implementations to evolve
  • OpenStack has a significant commercial ecosystem that needs to be accommodated in core
  • We need an approach that allows extension and improvement without having to incubate new projects
  • We need to ensure that we use brand and culture to combat forking
  • Interoperability is a worthy goal
  • Everyone thinks testing is good, but it’s still a sidebar
  • There are multiple distinct audiences with conflicting goals: some want stability and durability while others want innovation and flexibility.

Of these insights, the need to discuss how OpenStack promotes a plug-in architecture seemed to address the most points of tension.  [update: in the course of discussion, we've been defining plug-in more generally to be something like "a designated section of implementation code that can be altered without negatively changing the base function of the project."]

This is not the only item worth discussing, but it was the one that made the most sense to cover first based on the spider map.  Our idea was that having the community find agreement on how we approach plug-ins would lead us closer to a common ground for the “what is core” discussion.

Finding a common thread shrinks the problem space so the Board, TC and Community can advance in discussion.  So far, that assessment has proven accurate enough to move the dialog forward.


OpenStack steps toward Interoperability with Tempest, RAs &

I’m a cautious supporter of OpenStack leading with implementation (over API specification); however, it clearly has risks. OpenStack has the benefit of many live sites operating at significant scale. The short term cost is that those sites were not fully interoperable (progress is being made!). Even if they were, we lack the means to validate that they are.

The interoperability challenge was a major theme of the Havana Summit in Portland last week (panel I moderated).  Solving it creates significant benefits for the OpenStack community.  These benefits have significant financial opportunities for the OpenStack ecosystem.

This is a journey that we are on together – it’s not a deliverable from a single company or a release that we will complete and move on.

There were several themes that Monty and I presented during Heat for Reference Architectures (slides).  It’s pretty obvious that interop is valuable (I discuss why you should care in this earlier post) and running a cloud means dealing with hardware, software and ops in equal measures.  We also identified lots of important items like Open Operations, Upstreaming, Reference Architecture/Implementation and Testing.

During the session, I think we did a good job stating how we can use Heat for an RA to make incremental steps.   and I had a session about upgrade (slides).

Even with all this progress, Testing for interoperability was one of the largest gaps.

The challenge is not whether we should test, but how to create a set of tests that everyone will accept as adequate.  Approaching that goal with a standardization or specification objective is likely an impossible challenge.

Joshua McKenty & Monty Taylor found a starting point for interoperability FITS testing: “let’s use the Tempest tests we’ve got.”

We should question the assumption that faithful implementation test specifications (FITS) for interoperability are only useful with a matching specification and significant API coverage.  Any level of coverage provides useful information and, more importantly, visibility accelerates contributions to the test base.

I can speak from experience that this approach has merit.  The Crowbar team at Dell has been including OpenStack Tempest as part of our reference deployment since Essex and it runs as part of our automated test infrastructure against every build.  This process does not catch every issue, but passing Tempest is a very good indication that you’ve got a workable OpenStack deployment.

From orphans to open source, data matters

My wife’s day job helps Indian orphans through the Miracle Foundation here in Austin. On the surface, our jobs are very different; however, there is lately more and more intersection in both form and substance. It was not always like that: initially, the Miracle Foundation’s primary engagement had been an emotional appeal: “look, these orphans are sad, they need you. Did we mention that they are orphans?”

Joking aside, there are plenty of kind people who want to help children; however, there are lots of worthy causes with equally strong appeal. The question is how do you pick which one? Donors/Contributors want one that is both emotionally appealing and effective.

While radically different in human impact, both raising orphans and building open source rely heavily on personal engagement and passion for success. Just like non-profits, there are many open source projects that want you to invest your time in installing and contributing to their most worthy technology.

About 18 months ago, the Miracle Foundation pivoted their strategy from tending individual children towards cultivating whole orphanages (the “NEST program”, video below). They started tracking things like how much milk and fruit each child ate and if they had been vaccinated. They connected observable data like hemoglobin levels of children to their ability to pay attention in school. They were even aware of additional days girls spent in school just because they got monthly hygiene products.

NEST Spider Graph

Used with Permission, The Miracle Foundation

With this new program, the Miracle Foundation can tell you exactly how much benefit each child will receive from each dollar. These are real results derived from collecting real data, and the results are powerful.

The children the Miracle Foundation nurtures are going from subsistence to flourishing. This is not happening because people care more about these children than before. It is happening because someone is keeping the data and making sure that the support they give gets the results they want. This in turn helps donors (become one) feel confident that their emotional response is delivering tangible improvements. Both are essential to TMF’s mission.

Open source projects have a similar gestalt.

People and companies contributing time and resources to a project want to both believe in the technology and see tangible metrics to validate adoption. Open source transparency makes it easier to find active projects and engaged contributors, but it can be harder to determine if the project is having broader impact.

For OpenStack, these tangible metrics began to surface in the last few days. Before the summit, Stefano Maffulli, community manager, launched the OpenStack Activity Board to show commit and quality data for the project. Last Monday, Tim Bell & Ryan Lane presented the results of the first user survey which showed how and what users are adopting for OpenStack.

If you like seeing this type of data driven behavior then vote with your keyboard and become part of an active open source project. For non-profits like the Miracle Foundation, voting is even easier – you just need a credit card to join in their Mothers’ Day campaign. Your mom may not understand anything you add to open source, but she can understand when you help orphans.


As OpenStack enters rapids with Grizzly, watch for strong currents, hidden rocks & eddies.

White Water

Play Boating From Wikipedia

I enjoy kayaking white water rapids – they are exhilarating and demanding. The water accelerates around obstacles and shows its power. You cannot simply ride the current; you must navigate your way around obstacles, stay clear of eddies that pull you back and watch for hidden rocks. The secret to success is to read the current and make small adjustments as you are carried along – resistance is futile.

After the summit, I see OpenStack with the Grizzly release as water entering the rapids. The quality and capability of the code base continues to improve while the number of players with offerings in the ecosystem is also increasing rapidly. Until now, there was plenty of room to play together; however, as scope, activity and velocity increase there will be more inter-vendor interactions.

As a member of the OpenStack board, I have tremendous enthusiasm for what the OpenStack community has accomplished. There have been some really positive accounts of the summit including CSC “OpenStack gains maturity…“, Silicon Angle “OpenStack has reached a Flash Point”, Randy Bias’ “OpenStack is THE Stack”, Wayne Walls “Hallway Track” and much more on the Planet OpenStack aggregator.

In fact, we’ve created such a love fest for OpenStack that I fear we are drinking our own Kool-Aid.

I have a responsibility to be transparent and honest about challenges facing us because it’s the Foundation’s job to guide us forward. My positions result from many conversations that I had throughout the week of the Summit. They are also the result of my first hand experiences along with my 14 years of cloud experience.

Over the next posts, I’ll explore a number of these topics with the goal of helping navigate a path through the potential turbulence. The simple fact is that OpenStack is growing quickly and that creates challenges:

  1. A growing number of new developers are joining. Since our work surface area is expanding, it’s both easier than ever to participate and harder to navigate where to begin. We need to get ahead of the design cycles.
  2. A growing number of non-devs are participating and bringing important contributions and experience. We must include them in the OpenStack meritocracy because they speak for the quality and usability of the project.
  3. A growing number of companies (many “name brands”) who are still trying to figure out how to participate and collaborate in open source projects. Lack of experience increases the risk of divergence (forking) and market confusion.
  4. A growing number of products based on OpenStack also increases forking risk as OpenStack contributors feel compelled to differentiate.
  5. A growing number of core components (compute+block+network+…) that are required to have base functionality.
  6. A growing number of incubated projects that continue to stress innovation and pace of change that challenges the very question of “what is OpenStack?”
  7. A growing number of deployed sites offering OpenStack clouds but the community lacks a way to verify (or really discuss) compatibility between the sites.

This list is a cause for celebration not a cause for alarm – every item is a challenge based on our success. The community and Foundation are already working to address the risks.

While some of us enjoy the chaos and excitement of rapids, others can take comfort from the fact that they are always followed by calm waters. Don’t worry – we’ll navigate through this together.

“Stack Shop” cover of Macklemore’s Thrift Shop

Sometimes a meme glitters too strongly for me to resist getting pulled in… that happened to great effect just before the OpenStack Havana summit. When my code-addled mind kept swapping “poppin’ tags” for “OpenStack” on the radio edit, I stopped fighting and rewrote the Thrift Shop lyrics for OpenStack (see below the split).

With a lot of help from summit attendees (many of them are OpenStack celebrities, CEOs, VPs and members of the OpenStack Foundation board), I was able to create a freaking awesome cover of Macklemore’s second hand confection (NSFW).

Frankly, I don’t know everyone in the video (what, what?)!

But here’s a list of those that I do know.  I’m happy to update so the victims (actors) get credit.  Singers (in order):

Rob Hirschfeld (me) & Monty Taylor, Peter Poulliot, Judd Maltin, Forrest Norrod, Josh Kleinpeter, Tristan Goode, Dan Bode, Jay Pipes, Prabhakar Gopalan, Peter Chadwick, Simon Andersen, Vish Ishaya, Wayne Walls, Alex Freedland, Niki Acosta, Ops Track Monday Session 1, Ben Cherian, Eric Windisch, Brandon Draeger, Joseph B George,  Mark Collier, Joseph Heck, Tim Bell,  Chris Kemp, Kyle McDonald & Joshua McKenty,


Strategy is…

Strategy is confidence and execution.

If you’ve got a process that supports your strategy, you can execute. 

You can say no to the right things and focus on the important elements.  Strategy is not just knowing where you are going, it’s knowing which things to say no to along the way.

We suffer from an over-abundance of opportunity.  Success often means knowing what not to do.

double Block Head with OpenStack+Equallogic & Crowbar+Ceph

Block Head

Whew… Yesterday, Dell announced TWO OpenStack block storage capabilities (Equallogic & Ceph) for our OpenStack Essex Solution (I’m on the Dell OpenStack/Crowbar team) and community edition.  The addition of block storage effectively fills the “persistent storage” gap in the solution.  I’m quadruply excited because we now have:

  1. both open source (Ceph) and enterprise (Equallogic) choices
  2. both Nova drivers’ code is in the open as part of our open source Crowbar work

Frankly, I’ve been having trouble sitting on the news until Dell World because both features have been available in Github before the announcement (EQLX and Ceph-Barclamp).  Such is the emerging intersection of corporate marketing and open source.

As you may expect, we are delivering them through Crowbar; however, we’ve already had customers pick up the EQLX code and apply it without Crowbar.

The Equallogic+Nova Connector


If you are using Crowbar 1.5 (Essex 2) then you already have the code!  Of course, you still need to have the admin information for your SAN: we automated the Nova Volume integration, not the configuration of the storage system itself.

We have it under a split test so you need to do the following to enable the configuration options:

  1. Install OpenStack as normal
  2. Create the Nova proposal
  3. Enter “Raw” Attribute Mode
  4. Change the “volume_type” to “eqlx”
  5. Save
  6. The Equallogic options should be available in the custom attribute editor!  (of course, you can edit in raw mode too)

Want Docs?  Got them!  Check out these > EQLX Driver Install Addendum

Usage note: the integration uses SSH sessions.  It has been performance tested but has not been tested at scale.

The Ceph+Nova Connector


The Ceph capability includes a Ceph barclamp!  That means that all the work to set up and configure Ceph is done automatically by Crowbar.  Even better, their Nova barclamp (Ceph provides it from their site) will automatically find the Ceph proposal and link the components together!

Ceph has provided excellent directions and videos to support this install.