If you noticed Citrix’s Olympus OpenStack announcement with Dell and Rackspace (see my #2s chasing #1s commentary), then you may be wondering if there is a Crowbar story there too.
Yes, there is.
<service bulletin> Server virtualization is not cloud: it is a commonly used technology that creates convenient resource partitions for cloud operators and infrastructure-as-a-service providers. </service bulletin>
OpenStack claims support for nearly every virtualization platform on the market. While the basics of virtualization are common across platforms, there are important variances in how these platforms are deployed, and understanding those variances is essential to making an informed choice.
Your virtualization model choice will have deep implications for your server/networking choices, deployment methodology and operations infrastructure.
My focus is on architecture, not specific hypervisors, so I’m generalizing to just three to make each architecture description more concrete:
Of course, there are many more hypervisors and many different ways to deploy the three I’m referencing.
This picture shows all three options as a single system. In practice, only operators wishing to avoid exposure to RESTful recreational activities would implement multiple virtualization architectures in a single system. Let’s explore the three options:
OS + Hypervisor (KVM) architecture deploys the hypervisor as a free-standing application on top of an operating system (OS). In this model, the service provider manages the OS and the hypervisor independently. This means that the OS needs to be maintained, but it also allows the OS to be enhanced to better manage the cloud or to take on other functions (such as shared storage). Because they are the least restricted, free-standing hypervisors lead the virtualization innovation wave.
Bare Metal Hypervisor (XenServer) architecture integrates the hypervisor and the OS as a single unit. In this model, the service provider manages the combined package as one unit. This makes the hypervisor easier to support and maintain because the platform can be tightly controlled; however, it limits the operator’s ability to extend or multi-purpose the server. In this model, operators may add agents directly to the individual hypervisor but would not make changes to the underlying OS or resource allocation.
Clustered Hypervisor (ESX + vCenter) architecture integrates multiple servers into a single hypervisor pool. In this model, the service provider does not manage the individual hypervisor; instead, they operate the environment through the cluster supervisor. This makes it easier to perform resource balancing and fault tolerance within the domain of the cluster; however, the operator must rely on the supervisor because directly managing the system creates a multi-master problem. Lack of direct management improves supportability at the cost of flexibility. Scale is also a challenge for clustered hypervisors because their span of control is limited to practical resource boundaries: this means that large clouds add complexity as they deal with multiple clusters.
Clearly, choosing a virtualization architecture is difficult with significant trade-offs that must be considered. It would be easy to get lost in the technical weeds except that the ultimate choice seems to be more stylistic.
Ultimately, the choice of virtualization approach comes down to your capability to manage and support cloud operations. The OS + Hypervisor approach offers maximum flexibility and minimum cost but requires an investment to build a level of operational competence. Generally, this choice reflects an overall approach that embraces open cloud operations. Selecting more controlled models for virtualization reduces operational risk and allows operators to leverage (at a price, of course) their vendor’s core competencies and mature software delivery timelines.
While all of these choices are seeing strong adoption in the general market, I have been looking at the OpenStack community in particular. In that community, the primary architectural choice is an agent per host instead of clusters. KVM is favored for development and is the hypervisor of NASA’s Nova implementation. XenServer has strong support from both Citrix and Rackspace.
Choice is good: know thyself.
OpenStack has grown amazingly and picked up serious corporate support in its first year. Understanding how that happened helps to explain why the initiative has legs and where adopters should invest. While OpenStack has picked up a lot of industry partners, early participation by Dell and Citrix has been important to its meteoric trajectory.
So why are we (Dell) working so hard to light a fire under OpenStack?
To explain OpenStack support and momentum, we have to start with a self-reflective fact: Dell (my employer), Citrix and Rackspace are seeking to gain dominant positions in their respective areas of “cloud.” Individually, we have competitors in more entrenched positions for cloud silos in solutions, ecosystem, and hypervisor; however, our competitors are not acting in concert to maintain their position.
OpenStack offers an opportunity for companies trying to gain market share to leverage the strengths of partners against their competitors.
The surprising aspect of OpenStack is how much better this collaboration is working in practice than in theory. This, dare I say it, synergistic benefit comes to OpenStack from all the partners (Dell, Citrix, Rackspace and others) working together because:
Today, three major cloud players are standing together to demonstrate commitment to the community. This announcement is a foundation for the OpenStack ecosystem. In the next few months, I expect to see more and more collaborative announcements as the community proves the value of working together.
By aligning around an open platform, we collectively outflank previously dominant players who choose to go it alone. The technology is promising; however, the power of OpenStack flows mainly from the ecosystem that we are building together.
Our OpenStack team here at Dell has been busy getting Crowbar ready to open source and that does not leave much time for blog posts. We’re putting on a new UI, modularizing with barclamps and creating network options for Nova Cactus.
However, I wanted to take a minute to update the community about Swift and Nova recipes that we are intentionally leaking in advance of the larger Crowbar code drop.
As part of our collaboration with Opscode, Matt Ray has been merging our recipes into his most excellent OpenStack cookbook tree. If you want to see our unmerged recipes, we’re also posting those to our github. So far, we have the Swift recipes available (thanks to Andi Abes!) with Nova to follow soon.
5/31 Update: These are now online.

Greg Althaus (@glathaus) and I will be leading a discussion about OpenStack at the May CTLUG on 5/19 at 7pm. The location is Mangia Pizza on Burnet and Duval (In the strip mall where Taco Deli is).
We’ll talk about how OpenStack works, where we see it going, and what Dell is doing to participate in the community.
OpenStack should be very interesting to the CTLUG because of the technologies being used AND the way that the community is engaged in helping craft the software.
Tonight I submitted a formal OpenStack Common blueprint for Crowbar as a cloud installer. My team at Dell considers this to be our first step towards delivering the code as open source (next few weeks), and we want to show the community the design thinking behind the project. Crowbar currently embodies only a fraction of this scope, but we have designed it looking forward.
I’ve copied the text of our initial blueprint here until it is approved. The living document will be maintained at the OpenStack Launchpad and I will update links appropriately.
Note: “Installer” is used here by convention. The scope of this blueprint is intended to include expansion and maintenance of the OpenStack infrastructure.
This blueprint creates a common installation system for OpenStack infrastructure and components. The installer should be able to discover and configure physical equipment (servers, switches, etc.) and then deploy the OpenStack software components in an optimal way for the discovered infrastructure. Minimal manual steps should be needed for setup and maintenance of the system.
Users should be able to leverage and contribute to components of the system without deploying 100% of the system. This encourages community collaboration. For example, installation scripts that deploy and configure OpenStack components should be usable without using bare metal configuration and vice-versa.
The expected result will be installations that are 100% automated after racking gear with no individual touch of any components.
This means that the installer will be able to
Not currently released. Reference code (“Crowbar”) to be delivered by Dell via GitHub.
While a complete deployment system is an essential component to ensure adoption, it also fosters sharing and encoding of operational methods by the community. This follows an “Open Ops” strategy that encourages OpenStack users to create and share best practices.
The installer addresses the following needs
It is important that the installer does NOT
This design includes an “Ops Infrastructure API” for use by other components and services. This REST API will allow trusted applications to discover and inspect the operational infrastructure to provide additional services. The API should expose
The installation process has multiple operational phases: 1) bare metal provisioning, 2) component deployment, and 3) upgrade/redeployment. While each phase is distinct, they must act in a coordinated way.
A provisioning state machine (PSM) is a core concept for this overall installation architecture. The PSM must be extensible so that new capabilities and sequences can be added.
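To make the extensibility goal concrete, here is a minimal sketch (in Ruby, since Crowbar itself is built on Rails and Chef) of a pluggable state machine; the class name, states, and events are illustrative assumptions rather than Crowbar’s actual implementation.

```ruby
# Hypothetical, minimal PSM sketch: new capabilities register transitions
# instead of editing the core sequencing logic.
class ProvisioningStateMachine
  attr_reader :state

  def initialize(start_state)
    @state = start_state
    @transitions = {}              # { [from_state, event] => to_state }
  end

  # Extensions add their own states and transitions here.
  def add_transition(from, event, to)
    @transitions[[from, event]] = to
  end

  def fire(event)
    to = @transitions[[@state, event]]
    raise "no transition from #{@state} on #{event}" unless to
    @state = to
  end
end

psm = ProvisioningStateMachine.new(:discovered)
psm.add_transition(:discovered,  :hardware_inventoried, :bios_config)
psm.add_transition(:bios_config, :hardware_configured,  :os_install)
psm.add_transition(:os_install,  :install_complete,     :ready)
psm.fire(:hardware_inventoried)  # node advances to :bios_config
```

The point of the pattern is that a new capability (say, a burn-in step) can slot its own states into the sequence without modifying the installer core.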
It is important that the installer support IPv6 as an end state. It is not required that the entire process be exclusively IPv4 or IPv6, since changing addressing schemes may be desirable depending on the task to be performed.
Modular Design Objective
The core element for Phase 1 is a “PXE State Machine” (a subset of the PSM) that orchestrates node provisioning through multiple installation points. This allows different installation environments to be used while the system is prepared for its final state. These environments may include BIOS & RAID configuration, diagnostics, burn-in, and security validation.
It is anticipated that nodes will pass through phase 1 provisioning FOR EACH boot cycle. This allows the Installation Manager to perform any steps that may be dictated based on the PSM. This could include diagnostic and security checks of the physical infrastructure.
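As a rough illustration of the per-boot-cycle idea, the fragment below shows how an Installation Manager might map a node’s current state to the environment served on its next PXE boot. The state names and image labels are hypothetical, chosen only to mirror the environments listed above.

```ruby
# Assumed states and image names for illustration only.
BOOT_ENVIRONMENTS = {
  unknown:    "discovery_image",   # inventory the hardware first
  discovered: "bios_raid_image",   # apply BIOS & RAID settings
  configured: "burnin_image",      # diagnostics / burn-in pass
  burned_in:  "os_install_image",  # install the operating system
  ready:      "local_boot"         # hand control to the installed OS
}.freeze

# Called on every PXE boot: consult the node's state, decide what to serve next.
def next_boot_environment(node_state)
  BOOT_ENVIRONMENTS.fetch(node_state, "discovery_image")
end

puts next_boot_environment(:discovered)  # => bios_raid_image
```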
Considerations:
During Phase 2, the installer must act on the system as a whole. The focus shifts from single-node provisioning to system-level deployment and configuration.
Phase 2 extends the PSM to comprehend the dependencies between system components. The use of a state machine is essential because system configuration may require that individual nodes return to Phase 1 in order to change their physical configuration. For example, a node identified for use by Swift may need to be set up as JBOD, while the same node could be configured as RAID 10 for Nova. The PSM would also be used to handle inter-dependencies between components that are difficult to script in stages, such as rebalancing a Swift ring.
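Here is a toy sketch of that role-driven reconfiguration decision; the roles, RAID settings, and helper names are purely illustrative and not taken from Crowbar.

```ruby
# Illustrative role-to-hardware expectations (not prescriptive).
ROLE_HARDWARE = {
  "swift-storage" => "JBOD",      # Swift handles replication across raw disks
  "nova-compute"  => "RAID 10"    # protect local instance storage
}.freeze

# Decide whether a node can be deployed as-is or must revisit Phase 1.
def hardware_plan(node)
  desired = ROLE_HARDWARE[node[:role]]
  if node[:raid] == desired
    { action: :deploy }
  else
    { action: :return_to_phase1, set_raid: desired }
  end
end

plan = hardware_plan({ role: "swift-storage", raid: "RAID 10" })
# => the node is sent back to Phase 1 to be reconfigured as JBOD
```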
Considerations:
The ultimate objective for Phase 3 is to foster a continuous deployment capability in which updates from OpenStack can be frequently and easily implemented in a production environment with minimal risk. This requires a substantial amount of self-testing and automation.
Phase 3 maintains the system when new components arrive. Phase 3 includes the added requirements:
This needs additional requirements.
The objective of the Ops API is to provide a standard way for operations tools to map the internal cloud infrastructure without duplicating discovery effort. This will allow tools that can:
Event: The Dell equipment has just arrived.
Event: System checked out healthy from base configuration
Event: System checked out healthy from base configuration
This is a special case, for Denise.
Event: System has passed lab inspection, is about to be connected into the corporate network (or hosting data center)
We are offering Crowbar as a starting point. It is an extension of the Opscode Chef Server that provides the state machine for Phases 1 and 2. Both code bases are Apache 2 licensed.
TBD
I’m not going to post a full OpenStack conference summary because I spent more time talking 1-on-1 with partners and customers than participating in sessions. Other members of the Dell team (@galthaus) did spend more time in sessions (I’ll see if he’ll post his notes).
I did lead an IPv6 unconference and those notes are below.
Overall, my observations from the conference are:
While IPv6 deserves more coverage here, I thought it would be worthwhile to at least preserve my notes/tweets from the IPv6 unconference discussion (To IP or not to IPv6? That will be the question.) at the OpenStack Design Summit.
NOTE: My tweets for this topic are notes, not my own experience/opinions
We had a hallway conversation after the unconference about what would force the switch. In a single character, it’s $.
Votes for IPv6 during the keynote (tweet: I’d like to hear from audience here if that’s important to them. RT to vote). Retweeters:
My team at Dell is working diligently to release Crowbar (Apache 2) to the community.
The single most critical aspect of Crowbar involves a recent architectural change by Greg Althaus to make Crowbar much more modular. He dubbed the modules “barclamps” because they are used to attach new capabilities into the system. For example, we include barclamps for DNS, discovery, Nova, Swift, Nagios, Ganglia, and BIOS configuration. Users select which combination to use based on their deployment objectives.
In the Crowbar architecture, nearly every capability of the system is expressed as a barclamp. This means that the code base can be expanded and updated modularly. We feel that this pattern is essential to community involvement.
For example, another hardware vendor can add a barclamp that does the BIOS configuration for their specific equipment (yes! that is our intent). While many barclamps will be included with the open source release to install open source components, we anticipate that other barclamps will be only available with licensed products or in limited distribution.
A barclamp is like a cloud menu planner: it evaluates the whole environment and proposes configurations, roles, and recipes that fit your infrastructure. If you like the menu, then it tells Chef to start cooking.
Barclamps complement the “PXE state machine” aspect of Crowbar by providing logic that Crowbar evaluates as the servers reach deployment states. These states are completely configurable via the provisioner barclamp; consequently, Crowbar users can choose to change the order of operations. They can also add barclamps and easily incorporate them into their workflow where needed.
Barclamps take the form of a Rails controller that inherits from the barclamp superclass. The superclass provides the basic REST verbs that each barclamp must service while the child class implements the logic to create a “proposal” leveraging the wealth of information in Chef. Proposals are JSON collections that include configuration data needed for the deployment recipes and a mapping of nodes into roles.
Users are able to review and edit proposals (which are versioned) before asking Crowbar to implement the proposal in Chef. The proposal is implemented by assigning the nodes into the proposed roles and allowing Chef to work its magic.
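For readers who want a feel for the pattern, below is a rough, hypothetical skeleton of a barclamp: the subclass supplies the proposal logic while the superclass owns the shared REST plumbing. None of the class names, method names, or proposal fields are taken from the actual Crowbar code.

```ruby
# Hypothetical superclass: in Crowbar this would also wire up the REST verbs.
class BarclampController
  def create_proposal(nodes)
    raise NotImplementedError, "each barclamp supplies its own proposal logic"
  end
end

# Hypothetical Swift barclamp: builds a proposal (config data plus a mapping
# of nodes into roles) from what it knows about the environment.
class SwiftBarclamp < BarclampController
  def create_proposal(nodes)
    {
      "id"         => "swift-proposal-1",
      "attributes" => { "swift" => { "replicas" => 3, "zones" => 5 } },
      "deployment" => {
        "elements" => {
          "swift-proxy"   => nodes.take(1),
          "swift-storage" => nodes.drop(1)
        }
      }
    }
  end
end

proposal = SwiftBarclamp.new.create_proposal(%w[node1 node2 node3])
# A user could review and edit this structure before Crowbar asks Chef to
# assign the proposed roles and converge the nodes.
```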
Users can operate barclamps in parallel. In fact, most of our barclamps are designed to operate in conjunction with each other.
Reminder: It is vital to understand that Crowbar is not a stand-alone utility. It is coupled to Chef Server for deployment and data storage. Our objective was to leverage the outstanding capabilities and community support for Chef as much as possible.
We’re excited about this architecture addition to Crowbar and encourage you to think about barclamps that would be helpful to your cloud deployment.
During the OpenStack Design Summit, Forrester’s James Staten (@Staten7) raved about OpenStack’s transparency compared to AWS. Within the enclave of OpenStack supporters (Dell alone sent >14 people to the summit), his post drew considerable attention but did little to really further the value proposition.
“Open deployments” are a much more significant value to implementors than transparency from open source code.
For any technology solution, there are significant challenges that will only be understood when the system is under stress. In some cases, these challenges are code defects; however, many will be related to configuration and deployment choices that are site specific. Correcting these issues results in the design patterns and practices that create a robust infrastructure; consequently, the process of hardening a solution is critical to its ultimate stability and success.
When a solution, like AWS, is deployed and managed by a single entity, it is extremely rare for operational lessons learned and best practices to make it to the larger community. Amazon’s recent post mortem is a welcome exception. This is not a bad thing (see Roman Stanek’s contrasting point); it is just the reality of a proprietary cloud. AWS operates as a black box and I don’t believe that Amazon’s operational experience would be relevant to others unless they were also operationally transparent.
While it makes business sense to remain operationally opaque, service providers lose the benefit of external lessons learned when there is no community working in parallel with them.
OpenStack’s community has an opportunity to iterate on CloudOps patterns and practices at a dramatically faster rate than any single provider. This creates distinct value for OpenStack adopters because they can shorten or eliminate their own challenges because other adopters will have the same pains and benefit from the same fixes.
It is critical to understand that the benefit is conferred to both the party sharing the problem (they get advice and support) and the party lending assistance (they avoid the problem). This is distinctly different from proprietary clouds, where sharing is likely to cause embarrassment and unlikely to create helpful outcomes.
I am not advocating that all OpenStack deployments be the same or follow a prescriptive pattern.
I believe that each installation will be unique in some way; however, there will be enough commonalities and shared code to make sharing worthwhile. This is especially true for adopters who start with tools like Crowbar that leverage community-based Chef recipes and automation scripts. Tools that encourage automation and shared scripts help accelerate the establishment of robust deployment patterns and practices.
Ultimately, the ability to collaborate on cloud operation practice does more to strengthen OpenStack than developers, code reviews or corporate endorsements.
Today, I gave a presentation that is the sequel to “bootstrapping the hyperscale clouds.” The topic addresses concerns raised by the BlackOps concept but brings up history and perspectives that drive Dell’s solution focus for OpenStack.
You can view it on SlideShare or here’s a PDF of the preso: Operating the Hyperscale Cloud 4-27
I tend to favor pictures over text, so you may get more when OpenStack posts the video [link pending].
The talk is about our journey to build an OpenStack solution starting from our drive to build a cloud installer (Crowbar). The critical learning is that success for clouds is the adoption of an “operational model” (aka Cloud Ops) that creates cloud agility.
Ultimately, the presentation is the basis for another white paper.
Appending: