When Greg Althaus and I first proposed the project that would become Dell’s Crowbar, we had already learned first-hand that there was a significant gap in both the technologies and the processes for scale operations. Our team at Dell saw that the successful cloud data centers were treating their deployments as integrated systems (now called DevOps) in which the configuration of many components was coordinated and orchestrated; however, these approaches fell short of the mark in our opinion. We wanted to create a truly integrated operational environment from the bare metal through the networking up to the applications and out to the operations tooling.
Our ultimate technical nirvana is to achieve closed-loop continuous deployments. We want to see applications that constantly optimize new code, deployment changes, quality, revenue and cost of operations. We could find parts, but not a complete foundation adequate for this vision.
The business driver for Crowbar is system thinking around improved time to value and flexibility. While our technical vision is a long-term objective, we see very real short-term ROI. It does not matter if you are writing your own software or deploying applications; the faster you can move that code into production, the sooner you get value from innovation. It is clear to us that the most successful technology companies have reorganized around speed to market and adapting to the pace of change.
System flexibility and acceleration were key values when the lean manufacturing revolution gave Dell a competitive advantage, and they have proven even more critical in today’s dynamic technology innovation climate.
We hope that this post helps define a vision for Crowbar beyond the upcoming refactoring. We started the project with the idea that new tools meant we could take operations to a new level.
While that’s a great objective, we’re too pragmatic in delivery to rest on a broad objective. Let’s take a look at Crowbar’s concrete strengths and growth areas.
Key strength areas for Crowbar
Late binding – hardware and network configuration is held until software configuration is known. This is a huge system concept.
Dynamic and Integrated Networking – means that we treat networking as a 1st class citizen for ops (sort of like software defined networking but integrated into the application)
System Perspective – no Application is an island. You can’t optimize just the deployment, you need to consider hardware, software, networking and operations all together.
Bootstrapping (bare metal) – while not “rocket science” it takes a lot of careful effort to get this right in a way that is meaningful in a continuous operations environment.
Open Source / Open Development / Modular Design – this problem is simply too complex to solve alone. We need to get a much broader net of environments and thinking involved.
Continuing Areas of Leadership
Open / Lean / Incremental Architecture – these are core aspects of our approach. While we have a vision, we also are very open to ways that solve problems faster and more elegantly than we’d expected.
Continuous deployment – we think release cycles are getting faster, and the only way to survive is to build change into the foundation of operations.
Integrated networking – software defined networking is cool, but not enough. We need to have semantics that link applications, networks and infrastructure together.
Equivalent physical / virtual – we’re not saying that you won’t care whether it’s physical or virtual (you should); we think that it should not impact your operations.
Scale / Hybrid – the key element to hybrid is scale. The missing connection is being able to close the loop.
Closed loop deployment – treating load management, code quality, profit, and cost of operations as factors in managed operations.
The response to Crowbar has been exciting and humbling. I most appreciate those who looked at Crowbar and saw more than a bare metal installer. They are the ones who recognized that we are trying to solve a bigger problem: it has been too difficult to cope with change in IT operations.
During this year, we have made many changes. Many have been driven by customer, user and partner feedback while others support Dell product delivery needs. Happily, these inputs are well aligned in intent if not always in timing.
Introduction of barclamps as modular components
Expansion into multiple applications (most notably OpenStack and Apache Hadoop)
Working in the open (with public commits)
Collaborative License Agreements
Dell’s understanding of open source and open development has made a similar transformation. Crowbar was originally open sourced under Apache 2 because we imagined it becoming part of the OpenStack project. While that ambition has faded, the practical benefits of open collaboration have proven to be substantial.
The results from this first year are compelling:
For OpenStack Diablo, coordination with the Rackspace Cloud Builder team enabled Crowbar to include the Keystone and Dashboard projects in Dell’s solution
We’ve amassed hundreds of mail subscribers and Github followers
Support for multiple releases of RHEL, CentOS & Ubuntu, including Ubuntu 12.04 while it was still in beta.
SUSE did its own port of Crowbar, with important advances in Crowbar’s install model (from ISO to package).
We stand on the edge of many exciting transformations for Crowbar’s second year. Based on the amount of change from this year, I’m hesitant to make long-term predictions. Yet, just within the next few months there are significant plans based on the Crowbar 2.0 refactor. We have line of sight to changes that expand our tool choices, improve networking, add operating systems and become even more production-ops capable.
Last week, my team at Dell led a world-wide OpenStack Essex Deploy event. Kamesh Pemmaraju, our OpenStack-powered solution product manager, did a great summary of the event results (200+ attendees!). What started as a hack-a-thon for deploy scripts morphed into a stunning 14+ hour event with rotating intro content and an ecosystem showcase (videos). Special kudos to Kamesh, Andi Abes, Judd Maltin, Randy Perryman & Mike Pittaro for leadership at our regional sites.
Clearly, OpenStack is attracting a lot of interest. We’ve been investing time in content to help people who are curious about OpenStack to get started.
On that measure, we have room for improvement. We had some great discussions about how to handle upgrades and market drivers for OpenStack; however, we did not spend as much time improving Essex deployments as I had hoped. I know it’s possible – I’ve talked with developers in the Crowbar community who want this.
For those who wanted more expert interaction, here are some of my thoughts for future events.
The expert track did not get to deploy coding. I think that we need to focus even more tightly on Crowbar deployments. That means having a Crowbar hack with an OpenStack focus instead of vice versa.
Efforts to serve OpenStack n00bs did not protect time for experts. If we offer expert sessions then we won’t try to have parallel intro sessions. We’ll simply have to direct novices to the homework pages and videos.
Combining on-site and on-line is too confusing. As much as I enjoy meeting people face-to-face, I think we’d have a more skilled audience if we kept it online only.
Connectivity! Dropped connections, sigh.
Better planning for videos (not by the presenters) to make sure that we have good results on the expert track.
This event was too long. It’s just not practical to serve Europe, US and Asia in a single event. I think that 2-3 hours is a much more practical maximum. 10-12am Eastern or 6-8pm Pacific would be much more manageable.
Do you have other comments and suggestions? Please let me know!
Overall, we had a good meeting with strong attendance. Unlike last meeting, the attendees were less OpenStack experienced; however, many of us worked for companies that are members of the OpenStack Foundation. I work for Dell (a gold sponsor).
Rather than posting before the summit, I’ve scored my summit experiences against our poll to see if our priorities were met. (note: Thanks to Greg Althaus for additional input in the commentary)
Results from Summit
Stability vs. Features Prioritization & Processes
This was a major thread throughout the summit in multiple sessions. My read of the dialog was that stability (including continuous integration) is a core requirement.
API vs Code. What does it mean to be “OpenStack”
This is a good news / bad news story. As OpenStack Compute gets split into more and more independent pieces, their interactions will require well-defined, externalized APIs. The continuing issue is that these APIs will still be driven by the Python-based reference implementation. In some regards, APIs will emerge and be better codified. Newer PTLs bring additional perspectives and beliefs around APIs vs. code.
Operations focus: making OpenStack easy to deploy and manage
This was a major topic with many sessions dedicated to operationalizing OpenStack. Special focus was given to shared Puppet and Chef deployment code. There were specific sessions around High Availability and what that means. From this session, consensus was built for infrastructure HA documentation using Pacemaker for Folsom. There was NOT consensus for instance-level HA.
Documentation Standards and improved user guides
Anne Gentle is championing this and had a presence throughout the summit.
Driving for Hypervisor feature parity (KVM, Xen and also VMware/HyperV)
Libvirt/KVM continues to dominate, but Citrix was present to support XenServer and Microsoft made commitments for (returning) Hyper-V support.
Improving collaboration (get beyond listserv & IRC) so information is more persistent
I was not involved in discussions around this topic.
Have more operations discussion / design at the Design Summit
We had many sessions about operations tooling but little about specific considerations for operations. Perhaps we need to take a step towards shared deployment scripts.
Action with Fragmentation
Nova-volume to split out and/or more API driven (less integrated)
This was a major topic in multiple sessions. A number of parties are signing up to create block storage as a stand-alone project. Cinder will be the block storage service. Not only were good sessions held, but good plans were built for constructing and improving the project. The project will start as a clone of the current Nova project, with unique chunks living in Cinder and common pieces of both projects moving to the openstack-common project. The Cinder working group is very cross-company and has a strong desire to maintain a minimal specification (current API replacement) with only one additional feature required for Folsom (boot from volume). The boot-from-volume feature is really a Nova feature, but the Cinder team will most likely drive it to ensure Cinder/Nova separation.
OpenStack on Windows & HyperV
This is really two topics. Microsoft is committing to support Hyper-V as a Nova compute node. Running the rest of the suite on Windows does not appear to be a priority (or practical?).
Orchestration. More projects like Donabe?
There are a number of ecosystem projects emerging. Now that Essex has emerged as a solid release, I expect to see an acceleration of projects. At this time, they are still incubating. There was also the acknowledgement that there are two levels of orchestration: instance orchestration (think the Nova scheduler) and workload orchestration (think Donabe or vApp). Instance orchestration had many good discussions, and improvements were suggested and started (host aggregates, filter scheduler extensions, …).
Making Nova into smaller components
This was a thread in several sessions and is part of the ongoing stabilization work to improve collaboration. One important component of this is moving common code into a shared library.
In process, needs focus
How should invitations be handed out to the Summit? Was the last process too Dev focused?
I was not aware of any discussion of this at the summit. Looks like we all need to go out and commit some code!
Overall, I think that the Austin Stacker priorities were well positioned at the Design Summit.
After the split, I posted the Twitter feed from the meeting (in post order):
It’s OpenStack Summit time again for my team at Dell and there’s deployment in the air. It’s been an amazing journey from the first Austin summit to Folsom today. Since those first heady days, the party has gotten a lot more crowded, founding members have faded away, recruiters became enriched as employees changed email TLDs and buckets of code were delivered.
Throughout, Dell has stayed the course: our focus from day-one has been ensuring OpenStack can be deployed into production in a way that was true to the OpenStack mission of community collaboration and Apache-2-licensed open source.
We’ve delivered on the vision of making OpenStack deployable by collaborating broadly on the OpenStack components of the open source Crowbar project. I believe that our vision for sustainable open operations based on DevOps principles is the most complete strategy for production cloud deployments.
We are at the Folsom Summit in force and we’re looking forward to discussions with the OpenStack community. Here are some of the ways to engage with us:
During the summit (M-W), we’ll have our Crowbar OpenStack Essex deployments running. We kicked off Essex development with a world-wide event in early March and we want more people to come and join in.
During the conference (W-F), we’ll be showing off application deployments using enStratus and Chef against our field proven Diablo release.
Thursday 1:00pm, OpenStack Gains Momentum: Customers are Speaking Up by Kamesh Pemmaraju (Dell)
Friday 9:50am, Deploy Apps on OpenStack using Dashboard, Chef and enStratus by Rob Hirschfeld (Dell), Matt Ray (Opscode) and Keith Hudgins (enStratus).
Friday 11:30am, Expanding the Community Panel including Joseph George (Dell)
This fun round-trip road trip from the Rackspace & Dell HQs in Austin to the summit and home again promises to be an odyssey of inclusion, with Dell OpenStack/Crowbar engineer Andi Abes (@a_abes) aboard. Follow @RoadstackRV to follow along as they return home and share their thoughts about the summit!
Monday 6pm Mirantis Welcome Party, co-sponsored with Dell, at Sens Restaurant (RSVP)
Tuesday 5pm “Demos & Drinks” Happy Hour, co-hosted by Dell, Mirantis, Morphlabs, Canonical at the Hyatt Regency Hospitality Room off the Atrium
My team has been in the field talking to customers and doing OpenStack deployments. We are proud to talk about it and our approach.
Most importantly, we want to collaborate with you on our Essex deployments using Crowbar. Get on our list, download/build Crowbar, run the “essex-hack” branch and start banging on the deploy. Let’s work together to make this one rock-solid Essex deploy.
I’m really pleased about this update because it reflects real world experience my team has working with customers and partners on OpenStack (and Hadoop) deployments.
Bringing a cloud infrastructure online can be a daunting bootstrapping challenge. Before hanging out a shingle as a private or public cloud service provider, you must select a platform, acquire hardware, configure your network, set up operations services, and integrate it all to work well together. That is a lot of moving parts before you have even installed a sellable application.
This white paper walks you through the decision process to get started with an open source cloud infrastructure based on OpenStack™ and Dell™ PowerEdge™ C servers. At the end, you’ll be ready to design your own trial system that will serve as the foundation of your hyperscale cloud.
2011 Revision Notes
In the year since the original publication of this white paper, we have worked with many customers building OpenStack clouds. These clouds range in size from small six-node lab systems to larger production deployments. Based on these experiences, we updated this white paper to reflect lessons learned.