Posted by Rob H in CloudFoundry, CloudOps, Crowbar, DevOps, Greg Althaus, Hadoop, Linux, Open source, OpenStack.
I don’t usually call out my credentials, but knowing that I have a Master’s in Industrial Engineering helps (partially) explain my passion for process as being essential to successful software delivery. One of my favorite authors, Mary Poppendieck, explains undeployed code as perishable inventory that you need to get to market before it loses value. The big lessons (low inventory, high quality, system perspective) from Lean manufacturing translate directly into software and, lately, into operations as DevOps.
What we have observed from delivering our own cloud products, and working with customers on theirs, is that the operations process for deployment is as important as the software and hardware. It is simply not acceptable for us to market clouds without a compelling model for maintaining the solution into the future. Clouds are simply moving too fast to be delivered without a continuous delivery story.
This white paper [link here!] has been available since the OpenStack conference, but not linked to the rest of our OpenStack or Crowbar content.
Posted by Rob H in Architecture, Crowbar, DevOps, Greg Althaus, Matt Ray, Opscode.
Tags: blade, Chef, Crowbar, external entity, hack, managed node, SAN, switch
Note: this post is very technical and relates to detailed Chef design patterns used by Crowbar. I apologize in advance for the post’s opacity. Just unleash your inner DevOps geek and read on. I promise you’ll find some gems.
At the Opscode Community Summit, Dell’s primary focus was creating an “External Entity” or “Managed Node” model. Matt Ray prefers the term “managed node” so I’ll defer to that name for now. This model is needed for Crowbar to manage system components that cannot run an agent such as a network switch, blade chassis, IP power distribution unit (PDU), and a SAN array. The concept for a managed node is that there is an instance of the chef-client agent that can act as a delegate for the external entity. We’ve been reluctant to call it a “proxy” because that term is so overloaded.
My Crowbar vision is to manage an end-to-end cloud application life-cycle. This starts from power and network connections to hardware RAID and BIOS then up to the services that are installed on the node and ultimately reaches up to applications installed in VMs on those nodes.
Our design goal is that you can control a managed node with the same Chef semantics that we already use. For example, adding a Network proposal role to the Switch managed node will force the agent to update its configuration during the next chef-client run. During the run, the managed node will see that the network proposal has several VLANs configured in its attributes. The node will then update the actual switch entity to match the attributes.
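To make that flow concrete, here is a minimal Python sketch of what the delegate does during a run. It is not Crowbar or Chef code: `FakeSwitch`, `converge_vlans`, and the `network`/`vlans` attribute layout are names I made up for illustration, and a real delegate would talk to the actual switch instead of an in-memory object.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class FakeSwitch:
    """Stand-in for the external entity (hypothetical, in-memory only)."""
    vlans: set[int] = field(default_factory=set)

    def create_vlan(self, vlan_id: int) -> None:
        self.vlans.add(vlan_id)

    def delete_vlan(self, vlan_id: int) -> None:
        self.vlans.discard(vlan_id)


def converge_vlans(desired: set[int], switch: FakeSwitch) -> dict[str, list[int]]:
    """Make the switch match the desired VLAN set and report what changed."""
    to_add = sorted(desired - switch.vlans)
    to_remove = sorted(switch.vlans - desired)
    for vlan_id in to_add:
        switch.create_vlan(vlan_id)
    for vlan_id in to_remove:
        switch.delete_vlan(vlan_id)
    return {"added": to_add, "removed": to_remove}


# Assumed attribute layout delivered by a network proposal role.
node_attributes = {"network": {"vlans": [100, 200, 666]}}
switch = FakeSwitch(vlans={100, 300})

desired = set(node_attributes["network"]["vlans"])
changes = converge_vlans(desired, switch)      # first run applies the difference
second_run = converge_vlans(desired, switch)   # second run finds nothing to do
```

The point of the second call is that the run is idempotent: once the switch matches the attributes, another run changes nothing.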
Design Considerations
There are five key aspects of our managed node design. They are configuration, discovery, location, relationships, and sequence. Let’s explore each in detail.
A managed node’s configuration is different from a service or actuator pattern. The core concept of a node in Chef is that the node owns the configuration. You make changes to the node’s configuration and it’s the node’s job to manage its state to maintain that configuration. In a service pattern, the consumer manages specific requests directly. At the summit (with apologies to Bill Clinton), I described Chef configuration as telling a node what it “is” while a service provides verbs that change a node. The critical difference is that a node is expected to maintain configuration as its composition changes (e.g.: node is now connected for VLAN 666) while a service responds to specific change requests (node adds tag for VLAN 666). Our goal is to maintain Chef’s configuration management concept for the external entities.
Managed nodes also have a resource discovery concept that must align with the current ohai discovery model. Like a regular node, the managed node’s data attributes reflect the state of the managed entity; consequently we’d expect a blade chassis managed node to enumerate the blades that are included. This creates an expectation that the managed node appears to be “root” for the entity that it represents. We are also assuming that the Chef server can be trusted with the sharable discovered data. There may be cases where these assumptions do not have to be true, but we are making them for now.
Another essential element of managed nodes is that their agent location matters because the external resource generally has restricted access. There are several examples of this requirement. Switch configuration may require a serial connection from a specific node. Blade, SAN, and PDU management ports are restricted to specific networks. This means that the managed node agents must run from a specific location. This location is not important to the Chef server or the nodes’ actions against the managed node; however, it’s critical for the system when starting the managed node agent. While it’s possible for managed nodes to run on nodes that are outside the overall Chef infrastructure, our use cases make it more likely that they will run as independent processes from regular nodes. This means that we’ll have to add some relationship information for managed nodes and perhaps a barclamp to install and manage managed nodes.
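As a rough illustration of the discovery idea, the Python sketch below turns a raw chassis inventory into a nested attribute tree of the kind a managed node could publish. The function name, the attribute keys, and the sample inventory (including the serial numbers) are all made up for this example; this is not ohai’s actual plugin interface.

```python
def discover_chassis(raw_slots):
    """Build an attribute tree that enumerates the blades found in a chassis."""
    blades = {
        str(slot["slot"]): {"serial": slot["serial"], "power": slot["power"]}
        for slot in raw_slots
        if slot["present"]  # empty slots are not reported as blades
    }
    return {"chassis": {"blade_count": len(blades), "blades": blades}}


# Made-up inventory, standing in for whatever the chassis management port returns.
raw_slots = [
    {"slot": 1, "present": True, "serial": "ABC123", "power": "on"},
    {"slot": 2, "present": False, "serial": None, "power": "off"},
    {"slot": 3, "present": True, "serial": "DEF456", "power": "off"},
]

discovered = discover_chassis(raw_slots)
```

In this sketch the managed node is the only source of truth for the chassis contents, which is the “root” expectation described above.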
All of our use cases for managed nodes have a direct physical linkage between the managed node and server nodes. For a switch, it’s the ports connected. For a chassis, it’s the blades installed. For a SAN, it’s the LUNs exposed. These links imply a hierarchical graph that is not currently modeled in Chef data – in fact, it’s completely missing and difficult to maintain. At this time, it’s not clear how we or Opscode will address this. My current expectation is that we’ll use yet more roles to capture the relationships and add some hierarchical UI elements into Crowbar to help visualize it. We’ll also need to comprehend node types because “managed nodes” are too generic in our UI context.
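One way to picture the missing graph is as a list of typed links that gets indexed in both directions. The Python sketch below is only a thought experiment under that assumption: the node names, link types, and `index_links` helper are hypothetical, and it says nothing about how we or Opscode will eventually store this in Chef.

```python
from collections import defaultdict

# (managed node, link type, server node); every name here is made up.
LINKS = [
    ("switch-1", "port", "node-a"),
    ("switch-1", "port", "node-b"),
    ("chassis-1", "blade", "node-a"),
    ("san-1", "lun", "node-b"),
]


def index_links(links):
    """Index the links both ways: managed node to servers, and server to managed nodes."""
    children = defaultdict(list)
    parents = defaultdict(list)
    for managed, kind, server in links:
        children[managed].append((kind, server))
        parents[server].append((kind, managed))
    return dict(children), dict(parents)


children, parents = index_links(LINKS)
```

With the reverse index, a UI could answer “what does node-a hang off of?” without scanning every managed node.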
Finally, we have to consider the sequence of actions between managed nodes and nodes. In all of our use cases, the steps to bring up a node require orchestration with the managed node. Specifically, there needs to be a hand-off between the managed node and the node. For example, installing an application that uses VLANs does not work until the switch has created the VLAN. There are the same challenges with LUNs and SANs, and with blades and chassis. Crowbar provides orchestration that we can leverage, assuming we can declare the linkages.
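If the linkages are declared, the hand-off order falls out of a dependency sort. Here is a small Python sketch using the standard library’s `graphlib` (Python 3.9+). The step names are invented for illustration and this is not how Crowbar’s orchestration is implemented; it only shows that declared dependencies are enough to derive a safe order.

```python
from graphlib import TopologicalSorter

# Each key is a step; its value is the set of steps that must finish first.
# All step names are hypothetical.
depends_on = {
    "switch-1: create VLAN 666": set(),
    "san-1: expose LUN": set(),
    "node-a: configure VLAN interface": {"switch-1: create VLAN 666"},
    "node-a: mount LUN": {"san-1: expose LUN"},
    "node-a: install application": {
        "node-a: configure VLAN interface",
        "node-a: mount LUN",
    },
}

# static_order() yields every step after all of its prerequisites.
run_order = list(TopologicalSorter(depends_on).static_order())
```

The sorter also raises `CycleError` if two steps wait on each other, which is a useful early warning for a badly declared linkage.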
For now, a hack to get started…
For now, we’ve started on a workable hack for managed nodes. This involves running multiple chef-clients on the admin server in their own paths & processes. We’ll also have to add yet more roles to comprehend the relationships between the managed nodes and the things that are connected to them. Watch the crowbar listserv for details!
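For the curious, here is roughly what “their own paths & processes” could look like, sketched in Python. The directory layout and the `prepare_delegate` helper are my assumptions, not what Crowbar ships. The sketch only writes a per-delegate `client.rb` and builds the `chef-client -c <config>` command line; it does not launch anything.

```python
from __future__ import annotations

import tempfile
from pathlib import Path


def prepare_delegate(base_dir: Path, node_name: str) -> list[str]:
    """Create an isolated config directory for one managed node and
    return the command that would start its chef-client."""
    home = base_dir / node_name
    cache = home / "cache"
    cache.mkdir(parents=True, exist_ok=True)
    config = home / "client.rb"
    # Each delegate gets its own identity and cache path so runs do not collide.
    config.write_text(
        f'node_name "{node_name}"\n'
        f'file_cache_path "{cache}"\n'
    )
    return ["chef-client", "-c", str(config)]


# A temporary directory stands in for a path on the admin server.
with tempfile.TemporaryDirectory() as tmp:
    commands = [prepare_delegate(Path(tmp), name) for name in ("switch-1", "pdu-1")]
    configs_exist = all(Path(cmd[2]).is_file() for cmd in commands)
```

Each command could then be started as its own process (for example with `subprocess.Popen`), which keeps one misbehaving delegate from blocking the others.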
Extra Credit
Notes on the Opscode wiki from the Crowbar & Managed Node sessions
Posted by Rob H in Crowbar, DevOps, Greg Althaus, Opscode.
Tags: Chef, couchdb, Crowbar, DevOps, erlang, open space, OpsCode, orchestration, Wiki

Opscode Summit Agenda created by open space
I have to say that last week’s Opscode Community Summit was one of the most productive summits that I have attended. Their use of the open-space meeting format proved to be highly effective for a team of motivated people to self-organize and talk about critical topics. I especially like the agenda negotiations (see picture for an agenda snapshot) because everyone worked to adjust session times and locations based on what other sessions were being offered. Of course, it also helped to have an unbelievable level of Chef expertise on tap.
Overall
Overall, I found the summit to be a very valuable two days; consequently, I feel some need to pay it forward with a good summary. Part of the goal was for the community to document their sessions on the event wiki (which I have done).
The roadmap sessions were of particular interest to me. In short, Opscode is converging the code bases of its three Chef products (hosted, private and open). The primary changes will be moving from CouchDB to a SQL-based DB and moving the API calls away from Merb/Ruby to Erlang. They are also improving search so that we can make more fine-tuned requests that perform better and return less extraneous data.
I had a lot of great conversations. Some of the companies represented included: Monster, Oracle, HP, DTO, Opscode (of course), InfoChimps, Reactor8, and Rackspace. There were many others – overall >100 people attended!
Crowbar & Chef
Greg Althaus and I attended for Dell with a Crowbar specific agenda so my notes reflect the fact that I spent 80% of my time on sessions related to features we need and explaining what we have done with Chef.
Observations related to Crowbar’s use of Chef
- There is a class of “orchestration” products that have objectives similar to Crowbar’s. Ones that I remember are Cluster Chef, Run Deck, and Domino.
- Crowbar uses Chef in a way that is different than users who have a single application to deploy. We use roles and databags to store configuration that other users inject into their recipes. This is due to the fact that we are trying to create generic recipes that can be applied to many installations.
- Our heavy use of roles enables something of a cookbook service pattern. We found that this was confusing to many Chef users who rely on the UI and knife. It works for us because all of these interactions are automated by Crowbar.
- We picked up some smart security ideas that we’ll incorporate into future versions.
Managed Nodes / External Entities
Our primary focus was creating an “External Entity” or “Managed Node” model. Matt Ray prefers the term “managed node” so I’ll defer to that name for now. This model is needed for Crowbar to manage system components that cannot run an agent such as a network switch, blade chassis, IP power distribution unit (PDU), and a SAN array. The concept for a managed node is that there is an instance of the chef-client agent that can act as a delegate for the external entity. I had so much to say about that part of the session, I’m posting it as its own topic shortly.
Posted by Rob H in Architecture, Clouds, Dave McCrory, DevOps.
Tags: 451, applications, Architecture, cloud, DevOps, hybrid, networking, orchestration, ProTier, Storage
It’s impossible to resist posting about this month’s 451 Group Cloudscape report when it calls me out by name as a leading cloud innovator:
… ProTier founders Dave McCrory and Rob Hirschfeld. ProTier [note: now part of Quest] was, indeed, the first VMware ecosystem vendor to be tracked by The 451 Group. In the face of a skeptical world, these entrepreneurs argued that virtualization needed automation in order to realize its full potential, and that the test lab was the low-hanging fruit. Subsequent events have more than vindicated their view (pg. 33).
It’s even better when the report is worth reading and offers insights into forces shaping the industry. It’s nice to be “more than vindicated” on an amazing journey we started over 10 years ago!
Rather than recite 451’s points (hybrid cloud = automation + orchestration + devops + pixie dust), I’d rather look at the problem in a different way as a counterpoint.
The problem is “how do we deal with applications that are scattered over multiple data centers?”
I do not think orchestration is the complete answer. Current orchestration is too focused on moving around virtual machines (aka workloads).
Ultimately, the solution lies in application architecture; however, I feel that is also a misdirection because cloud is redefining what an “application architecture” means.
Applications are a dynamic mix of compute, storage, and connectivity.
We’re entering an age when all of these ingredients will be delivered as elastic services that will be managed by the applications themselves. The concept of self management is an extension of DevOps principles that fuse application function and deployment. There are missing pieces, but I’m seeing the innovation moving to fill those gaps.
If you want to see the future of cloud applications then look at the network and storage services that are emerging. They will tell you more about the future than orchestration.
Posted by Rob H in Amazon, CloudOps, Crowbar, DevOps, Michael Cote, Open source, OpenStack, OpenStack Design Summit, Opscode.
I’m working on a larger post about the OpenStack Summit around API Implementation vs. Specification. You can have a preview of that AND A LOT OF OTHER STUFF (OpenStack, Crowbar, lunch) in this 40 minute interview w/ Michael Cote.
Setting: Dell World
Interview w/ @Cote at the Hilton Hotel Lobby on 6th street in Austin.
I know that Cote’s post does not have time markers for easy navigation; however, I added them to help guide you through the interview (link for audio) if you want to jump around.
Posted by Rob H in CloudOps, Crowbar, DevOps, Hadoop, Migration, OpenStack, Opscode, RackSpace.
Tags: Barclamp, Cloud Foundry, Crowbar, Github, hadoop, ISO, migration, OpenStack, PowerEdge, rackspace, repo, RHEL, version
In the last week, my team at Dell completed a major refactoring of Crowbar that significantly improves our ability to bring in community contributions and field customizations. Today, we merged it into Crowbar’s public repo(s).

From the very first versions, our objective for Crowbar was to create the fastest and most reliable cloud deployments. Along the way, we realized Crowbar’s true potential lay in embracing DevOps as an operational model for maintaining clouds. That meant building up cloud deployments in layers from pieces that we call barclamps (extensions of Chef cookbooks). Our first version, centered on OpenStack Cactus, leveraged barclamps but was still created as a single system. This unified system was a huge step forward in cloud deployments, but did not live up to our CloudOps vision of continuous delivery.
In this version, each Crowbar barclamp is an independent delivery unit that can be integrated before, while or after installing Crowbar.
The core of the change is that each barclamp, even the core ones, is stored in its own independent code repository. Putting the code into distinct repos means that each barclamp can have its own life cycle, its own maintainer site and its own dependency tree. This modularization allows customers to manage their Crowbar deployments with a very fine brush: they may choose to customize parts of the system, they could lock components to a specific tag and they can bring in barclamps from other vendors.
While the core barclamps are automatically integrated into the Crowbar build using git submodules, other barclamps are installed into the system as needed. This allows you to pull in the suite of OpenStack barclamps at build time or to wait until your Crowbar system is running before installing. Once you install a barclamp, you are able to retrieve an updated barclamp and reapply it to the system.
This feature gives you the ability to 1) choose exactly what you want to include and 2) perform field updates to a live Crowbar system.
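To show what “lock components to a specific tag” plus field updates could mean in practice, here is a small Python sketch. It is a thought experiment rather than Crowbar’s actual mechanism: the `plan_updates` helper, the `locked` flag, and every tag value below are assumptions made for illustration.

```python
def plan_updates(installed, upstream):
    """Return {name: (old_ref, new_ref)} for barclamps that should be updated.

    Locked barclamps and barclamps that are already current are skipped.
    """
    plan = {}
    for name, info in installed.items():
        new_ref = upstream.get(name)
        if info["locked"] or new_ref is None or new_ref == info["ref"]:
            continue
        plan[name] = (info["ref"], new_ref)
    return plan


# Hypothetical state of a running system and of the upstream repos.
installed = {
    "network": {"ref": "v1.0", "locked": True},   # pinned by the operator
    "nova": {"ref": "v1.0", "locked": False},
    "swift": {"ref": "v1.1", "locked": False},    # already current
}
upstream = {"network": "v1.1", "nova": "v1.1", "swift": "v1.1", "cloudfoundry": "v0.1"}

plan = plan_updates(installed, upstream)
# Barclamps that exist upstream but were never installed stay opt-in.
not_installed = sorted(set(upstream) - set(installed))
```

Here only `nova` is scheduled for an update: `network` is locked, `swift` is current, and `cloudfoundry` is left for the operator to install when they choose.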
Let’s look at some examples:
- The Cloud Foundry barclamp can be sourced from Cloud Foundry instead of bundled into the Crowbar repository. This allows the team working on the cloud application to take ownership for their own deployment. As a continuous delivery proponent, I believe strongly that the development team should be responsible for ensuring that their code is deployable (refer to my OpenStack “Deployer API” blueprint attempting to codify this).
- DreamHost, maintainers of Ceph Storage, can maintain their own local barclamp repos for OpenStack that are cloned from our community Swift barclamp. This allows them to innovate and customize OpenStack deployments for their business and choose which updates to merge back to the community.
- Rackspace Cloud Builders can work on the most leading edge OpenStack features while maintaining workable deployments on branches. As the code stabilizes, they simply merge in their changes.
- Dell BIOS and RAID barclamps only support the PowerEdge C line today. When we offer PowerEdge R support, you will be able to install or update the barclamps to add that capability. If another hardware vendor creates a barclamp for their hardware then you can install that into your existing system.
I believe that these changes to Crowbar are a huge step forward on our journey of creating a community supportable Open Operations framework. I hope that you are as excited as I am about these changes.
I encourage you to take the first step by trying out Crowbar and, ultimately, writing your own barclamps.
Post Scripts:
- In addition to the modularization, the updated code includes RHEL as a deployment platform. At present, you must choose to be either RHEL or Ubuntu at build time.
- We have enhanced the network barclamp to describe connections as more abstract connections, called conduits, between nodes. This is a powerful change, but requires some understanding before you start making changes.
- We only began testing the change as of 9/12; we expect the system to be fully stabilized by 10/3. If you are not willing to deal with bugs then I recommend building the Crowbar “v1.0” tag (or using the ISOs from our July launch).
Posted by Rob H in Crowbar, Dave McCrory, DevOps, Hadoop, Joseph George, Open source, OpenStack.
Tags: Barclamps, Big Data, Cloudcast, community, Crowbar, Delp, Gracely, hadoop, Humility, OpenStack
TheCloudcast.net – Thank you for such a great series of questions. Wow, nearly 36+ minutes of cloudicious interview about the work my team at Dell is doing!
Thanks to our hosts for putting together a great series! They are:
Highlights from Episode 16: Dell, Dude you’re getting a cloud
- 3:40 JBG “we are listening to our customers tell us what they want to accomplish”
- 4:40 RAH “humility is part of [what we're] doing … cloud is about learning and collaboration”
- 6:40 RAH “OpenStack filled a niche. It was the first open source community cloud. … Not just open source, it’s open community.”
- 7:15 RAH “We’re beyond critical mass. We’re seeing acceleration… we are transitioning into a community development.”
- 7:30 RAH “It’s accelerating. It’s happening so fast.”
- 8:00 RAH “We felt it was really important for people to be able to use it. We felt that it was important to get away from just people developing into people using. “
- 8:57 – RAH “Cloud is not just one thing. You have to have all the pieces.”
- 10:15 – RAH “Cloud is always ready, never finished”
- 10:50 – RAH “OpenStack is an alternative to public cloud including hosting providers seeking to offer their own cloud”
- 12:40 – AD “Dell has been in the Big data space for many years now”
- 20:15 – JBG “There’s a legacy of great partnerships that we leverage”
- 20:48 – JBG “Conflicts have not come up because we are focused on the customer”
- 21:30 – RAH “Shout out to Greg Althaus for solving these problems in such an elegant way. And we rewrote it 3 times”
- 22:02 – RAH “Crowbar started from our frustration of bringing up a cloud quickly … so we took a DevOps approach.”
- 22:41 – RAH “You had to have a system view AND a boot strapping view simultaneously”
- 23:50 – JBG “Crowbar was born out of necessity because we were setting up and blowing away our clouds over and over and over again. “
- 24:40 – JBG “We realized there were not many people thinking about all the pieces before OpenStack was installed”
- 25:20 – RAH “We don’t think customers have all the answers before we show up. This is not unique to OpenStack.”
- 28:20 – JBG “We’re seeing the community pick up Crowbar as a way to deploy”
Posted by Rob H in DevOps, OSCON.
Tags: DevOp, OSCON, presentation, sandwich, Soup
Today I presented about how Crowbar + DevOps + OpenStack = CloudOps. The highlight of the presentation (to me, anyway) is the Images vs Layers analogy of Soup vs Sandwiches. I hope it helps explain why we believe that a DevOps approach to Cloud is essential to success.
Here’s the preso: OSCON 07 2011
I’ll add a link to the videos when they are available.
Posted by Rob H in Agile, DevOps, Lean, Open Trends.
Tags: Agile, antidepressants, cloud, DevOps, Lean, operational model, operations, roadmap
Today my mother-in-law (a practicing psychiatrist) was bemoaning the current medical practice of substituting action for knowledge. In her world, many doctors will make rapid changes to their patients’ therapy. Their goal is to address the issues immediately presented (patient feels sad so Dr prescribes antidepressants) rather than taking time to understand the patients’ history or make changes incrementally and measure impacts. It feels like another example of our cultural compulsion to fix problems as quickly as possible.
Her comments made me question the core way that I evangelize!
Do Lean and Agile substitute action for knowledge? No. We use action to acquire knowledge.
The fundamental assumption that drives poor decision-making is that we have enough information to make a design, solve a problem or define a market. Lean and Agile’s core tenet is that we must attack this assumption. We must assume that we can’t gather enough information to fully define our objective. The good news is that even without much analysis we know a lot! We know:
- roughly what we want to do (road map)
- the first steps we should take (tactics)
- who will be working on the problem (team members)
- generally how much effort it will take (time & team size)
- who has the problem that we are trying to solve (market)
We also know that we’ll learn a lot more as we get closer to our target. Every delay in starting effectively pushes our “day of clarity” further into the future. For that reason, it is essential that we build a process that constantly reviews and adjusts its targets.
We need to build a process that acquires knowledge as progress is made and makes rapid progress.
In Agile, we translate this need into the decorations of our process: reviews for learning, retrospectives for adjustments, planning for taking action and short iterations to drive the feedback loop. Agile’s mantra is “ready, fire, aim, fire, aim, fire, aim, …” which is very different from simply jumping out of a plane without a parachute and hoping you’ll find a haystack to land in.
For cloud deployments, this means building operational knowledge in stages. Technology is simply evolving too quickly and best practices too slowly for anyone to wait for a packaged solution to solve all their cloud infrastructure problems. We tried this and it does not work: clouds are a mixture of hardware, software and operations. More accurately, clouds are an operational model supported by hardware and software.
Currently, 80% of cloud deployment effort is operations (or “DevOps”).
When I listen to people’s plans about building product or deploying cloud, I get very skeptical when they take a lot of time to aim at objects far off on the horizon. Perhaps they are worried that they will substitute action for knowledge; however, I think they would be better served to test their knowledge with a little action.
My MIL agrees – she sees her patients frequently and makes small adjustments to their treatment as needed. Wow, that’s an Rx for Agile!