If registered, you have 8 votes to allocate as you wish. You will get a link via email – you must use that link.
Joseph B George and I are cross-blogging this post because we are jointly seeking your vote(s) for individual member seats on the OpenStack Foundation board. This is a key point in the OpenStack journey and we strongly encourage eligible voters to participate no matter who you vote for! As we have said before, the success of the Foundation governance process matters just as much as the code because it ensures equal access and limits forking.
We think that OpenStack succeeds because it is collaboratively developed. It is essential that we select board members who have a proven record of community development, a willingness to partner, and a demonstrated investment in the project.
Our OpenStack vision favors production operations by being operator, user and ecosystem focused. If elected, we will represent these interests by helping advance deployability, API specifications, open operations and both large and small scale cloud deployments.
Of course, we’re asking you to consider voting for both of us; however, if you want to focus on just one, here’s the balance between us. Rob (bio) is a technologist with deep roots in cloud technology, data center operations and open source. Joseph is a business professional with experience in new product introduction and enterprise delivery.
Not sure if you can vote? If you registered as an individual member then your name should be on the voting list. In that case, you can vote between 8/20 and 8/24.
I could not be happier with the results Crowbar collaborators and my team at Dell achieved around the 1st Crowbar design summit. We had great discussions and even better participation.
The attendees represented major operating system vendors, configuration management companies, OpenStack hosting companies, OpenStack cloud software providers, OpenStack consultants, OpenStack private cloud users, and (of course) a major infrastructure provider. That’s a very complete cross-section of the cloud community.
I knew from the start that we had too little time and, thankfully, people were tolerant of my need to stop the discussions. In the end, we were able to cover all the planned topics. This was important because all these features are interlocked so discussions were iterative. I was impressed with the level of knowledge at the table and it drove deep discussion. Even so, there are still parts of Crowbar that are confusing (networking, late binding, orchestration, chef coupling) even to collaborators.
In typing up these notes, it becomes even more blindingly obvious that the core features for Crowbar 2 are highly interconnected. That’s no surprise technically; however, it will make the notes harder to follow because of knowledge bootstrapping. You need to take time and grok the gestalt and surf the zeitgeist.
Collaboration Invitation: I wanted to remind readers that this summit was just the kick-off for a series of open weekly design (Tuesdays 10am CDT) and coordination (Thursdays 8am CDT) meetings. Everyone is welcome to join in those meetings – information is posted, recorded, folded, spindled and mutilated on the Crowbar 2 wiki page.
These notes are my reflection of the online etherpad notes that were made live during the meeting. I’ve grouped them by design topic.
We are refactoring Crowbar at this time because we have a collection of interconnected features that could not be decoupled.
Some items (Database use, Rails3, documentation, process) are not for debate. They are core needs but require little design.
There are 5 key topics for the refactor: online mode, networking flexibility, OpenStack pull from source, heterogeneous/multi operating systems, and being CMDB agnostic.
Due to time limits, we have to stop discussions and continue them online.
We are hoping to align Crowbar 2 beta and OpenStack Folsom release.
Online / Connected Mode
Online mode is more than simply internet connectivity. It is the foundation of how Crowbar stages dependencies and components for deploy. It’s required for heterogeneous O/S, pull from source and it has dependencies on how we model networking so nodes can access resources.
We are thinking of using caching proxies to stage resources (a quick sketch follows below). This would allow isolated production environments and preserve the ability to run everything from the ISO without a connection (that is still a key requirement for us).
SuSE’s Crowbar fork does not build an ISO; instead, it relies on RPM packages for barclamps and their dependencies.
Pulling packages directly from the Internet has proven to be unreliable, so this method cannot rely on that alone.
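To make the staging idea more concrete, here is a minimal sketch in Chef-style Ruby of pointing a node’s package manager at a caching proxy. The attribute names are hypothetical placeholders for this example, not the final Crowbar 2 schema.

```ruby
# Hypothetical sketch: route apt downloads through a caching proxy staged by
# the admin node.  The attribute path below is illustrative only.
proxy_url = node['provisioner'] && node['provisioner']['proxy_url']

if proxy_url
  # With the cache in place, isolated environments and ISO-based installs
  # keep working without direct internet access.
  file '/etc/apt/apt.conf.d/01-crowbar-proxy' do
    content "Acquire::http::Proxy \"#{proxy_url}\";\n"
    mode '0644'
  end
end
```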
Install From Source
This feature is mainly focused on OpenStack, but it could be applied more generally. The principles that we are looking at could apply to any application where the source code is changing quickly (all of them?!). Hadoop is an obvious second candidate.
We spent some time reviewing the use-cases for this feature. While this appears to be very dev and pre-release focused, there are important applications for production. Specifically, we expect that scale customers will need to run ahead of or slightly adjacent to trunk due to patches or proprietary code. In both cases, it is important that users can deploy from their repository.
We discussed briefly our objective to pull configuration from upstream (not just OpenStack, but potentially any common cookbooks/modules). This topic is central to the CMDB agnostic discussion below.
The overall sentiment is that this could be a very powerful capability if we can manage to make it work. There is a substantial challenge in tracking dependencies – current RPMs and Debs do a good job of this and other configuration steps beyond just the bits. Replicating that functionality is the real obstacle.
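As a rough illustration of what pull-from-source could look like inside a barclamp recipe, here is a hedged sketch; the attribute names and the pip-based install step are assumptions made for this example, not the agreed design.

```ruby
# Illustrative sketch only: install OpenStack Nova either from distro packages
# or straight from a git branch, driven by a (hypothetical) node attribute.
install_mode = node['nova']['install_mode'] || 'package'

if install_mode == 'source'
  package 'git'

  # Pull the code from whatever repository/branch the operator points at,
  # e.g. an internal fork carrying patches ahead of trunk.
  git '/opt/stack/nova' do
    repository node['nova']['git_repo']
    revision   node['nova']['git_revision'] || 'master'
    action     :sync
  end

  # Dependency tracking is the hard part: packages normally handle this for
  # us, so a source install has to resolve requirements itself.
  execute 'install nova from source' do
    command 'pip install /opt/stack/nova'
  end
else
  package 'nova-compute'
end
```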
CMDB agnostic (decoupling Chef)
This feature is confusing because we are not eliminating the need for a configuration management database (CMDB) tool like Chef; instead, we are decoupling Crowbar from a single CMDB and moving to a pluggable model using an abstraction layer.
It was stressed that Crowbar does orchestration – we do not rely on convergence over multiple passes to get the configuration correct.
We had strong agreement that the modules should not be tightly coupled but did need a consistent way (API? Consistent namespace? Pixie dust?) to share data between each other. Our priority is to maintain loose coupling and follow integration by convention and best practices rather than rigid structures.
The abstraction layer needs to have both import and export functions (see the sketch at the end of this section).
Crowbar will use attribute injection so that Cookbooks can leverage Crowbar but will not require Crowbar to operate. Crowbar’s database will provide the links between the nodes instead of having to wedge it into the CMDB.
In 1.x, networking was the part most tightly coupled to Chef. This is a major part of the refactor and of the modeling for Crowbar’s database.
There are a lot of notes captured about this on the etherpad – I recommend reviewing them.
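To show the shape of the abstraction layer being discussed, here is a purely illustrative Ruby sketch; the class names, methods, and Chef calls are assumptions for this example rather than the agreed Crowbar 2 API.

```ruby
# Illustrative only: a pluggable CMDB driver with the import/export functions
# described above.  Crowbar's own database stays authoritative for node links.
class CmdbDriver
  # Export: push attributes from Crowbar's database into the CMDB before a run.
  def export(node_name, attributes)
    raise NotImplementedError
  end

  # Import: read values discovered or managed by the CMDB back into Crowbar.
  def import(node_name)
    raise NotImplementedError
  end
end

# A Chef-backed driver implements the same interface (assumes a configured
# Chef server connection via the chef gem).
require 'chef'

class ChefDriver < CmdbDriver
  def export(node_name, attributes)
    chef_node = Chef::Node.load(node_name)
    chef_node.normal.merge!(attributes)   # attribute injection into the node
    chef_node.save
  end

  def import(node_name)
    Chef::Node.load(node_name).to_hash
  end
end
```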
Heterogeneous OS (bare metal provisioning and beyond)
This topic was the most divergent of all our topics because most of the participants were using some variant of their own bare metal provisioning project (check the etherpad for the list).
Since we can’t pack an unlimited set of stuff on the ISO, this feature requires online mode.
Most of these projects do nothing beyond OS provisioning; however, their simplicity is beneficial. Crowbar needs to consider users who just want a streamlined OS provisioning experience.
Late binding is a programming term that I’ve commandeered for Crowbar’s DevOps design objectives.
We believe that late binding is a best practice for CloudOps.
Understanding this concept is turning out to be an important but confusing differentiation for Crowbar. We’ve effectively inverted the typical deploy pattern of building up a cloud from bare metal; instead, Crowbar allows you to build a cloud from the top down. The difference is critical – we delay hardware decisions until we have the information needed to do the correct configuration.
If Late Binding is still confusing, the concept is really very simple: “we hold off all work until you’ve decided how you want to setup your cloud.”
Late binding arose from our design objectives. We started the project with a few critical operational design objectives:
Treat the nodes and application layers as an interconnected system
Realize that application choices should drive configuration down the entire stack, including BIOS, RAID and networking
Expect the entire system to be constantly changing, so we must track state and avoid locked configurations.
We’d seen these objectives as core tenets in hyperscale operators who considered bare metal and network configuration to be an integral part of their application deployment. We know it is possible to build the system in layers that only (re)deploy once the application configuration is defined.
We have all this great interconnected automation! Why waste it by having to pre-stage the hardware or networking?
In cloud, late binding is known as “elastic computing” because you wait until you need resources to deploy. But running apps on cloud virtual machines is simple when compared to operating a physical infrastructure. In physical operations, RAID, BIOS and networking matter a lot because there are important and substantial variations between nodes. These differences are what drive late binding as one of Crowbar’s core design principles.
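Here is a tiny, hypothetical Ruby sketch of the idea; none of the names are real Crowbar syntax. It just shows hardware choices being derived from the application role at deploy time rather than fixed up front.

```ruby
# Hypothetical illustration of late binding: RAID/BIOS/network choices are
# derived from the role a node is finally given, not decided when it is racked.
ROLE_PROFILES = {
  'hadoop-datanode'   => { raid: 'jbod',   bios: 'max_performance', nics: ['storage', 'admin'] },
  'openstack-compute' => { raid: 'raid10', bios: 'virtualization',  nics: ['tenant', 'admin'] }
}

def bind_hardware(node_name, role)
  profile = ROLE_PROFILES.fetch(role)
  # Only now -- once the application role is known -- do we commit to a
  # hardware configuration and (re)deploy the node to match it.
  puts "#{node_name}: applying #{profile[:raid]} / #{profile[:bios]} for #{role}"
  profile
end

bind_hardware('node-42', 'hadoop-datanode')
```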
Not only are we simultaneously releasing both of these solutions, they reflect a significant acceleration in pace of delivery. Both solutions had beta support for their core technologies (Cloudera 4 & OpenStack Essex) when the components were released and we have dramatically reduced the lag from component RC to solution release compared to past (3.7 & Diablo) milestones.
As before, the core deployment logic of these open source based solutions was developed in the open on Crowbar’s github. You are invited to download and try these solutions yourself. For Dell solutions, we include validated reference architectures, hardware configuration extensions for Crowbar, services and support.
The latest versions of Hadoop and OpenStack represent great strides for both solutions. It’s great to have made them more deployable and faster to evaluate and manage.
Getting the core Crowbar 2.0 changes working is not a major refactoring effort in calendar time; however, it will impact current Crowbar developers by changing and improving the programming APIs. The Dell Crowbar team decided to treat this as a focused refactoring effort because several important changes are tightly coupled. We cannot solve them independently without causing a larger disruption.
All of the Crowbar 2.0 changes address issues and concerns raised in the community and are needed to support the expansion of our OpenStack and Hadoop application deployments.
Our technical objective for Crowbar 2.0 is to simplify and streamline development efforts as the development and user community grows. We are seeking to:
simplify our use of Chef and eliminate Crowbar requirements in our Opscode Chef recipes.
reduce the initial effort required to leverage Crowbar
provide heterogeneous / multiple operating system deployments. This enables:
multiple versions of the same OS running for upgrades
different operating systems operating simultaneously (and deal with heterogeneous packaging issues)
accommodation of no-agent systems like locked systems (e.g.: virtualization hosts) and switches (aka external entities)
UEFI booting in Sledgehammer
strengthen networking abstractions
allow networking configurations to be created dynamically (so that users are not locked into choices made before Crowbar deployment)
better manage connected operations
enable pull-from-source deployments that are ahead of (or forked from) available packages.
improvements in Crowbar’s core database and state machine to enable
larger scale concerns
controlled production migrations and upgrades
other important items
make documentation more coupled to current features and easier to maintain
upgrade to Rails 3 to simplify the code base and improve security and performance
deepen automated test coverage and capabilities
Beyond these great technical targets, we want Crowbar 2.0 to address barriers to adoption that have been raised by our community, customers and partners. We have been tracking concerns about the learning curve for adding barclamps, the complexity of networking configuration and packaging into a single ISO.
My team at Dell does not take on any refactoring changes lightly because they are disruptive to our community; however, a convergence of requirements has made it necessary to update several core components simultaneously. Specifically, we found that desired changes in networking, operating systems, packaging, configuration management, scale and hardware support all required interlocked changes. We have been bringing many of these changes into the code base in preparation and have reached a point where the next steps require changing Crowbar 1.0 semantics.
We are first and foremost an incremental architecture & lean development team – Crowbar 2.0 will have the smallest footprint needed to begin the transformations that are currently blocking us. There is significant room during and after the refactor for the community to shape Crowbar.
The response to Crowbar has been exciting and humbling. I most appreciate those who looked at Crowbar and saw more than a bare metal installer. They are the ones who recognized that we are trying to solve a bigger problem: it has been too difficult to cope with change in IT operations.
During this year, we have made many changes. Many have been driven by customer, user and partner feedback while others support Dell product delivery needs. Happily, these inputs are well aligned in intent if not always in timing. Key changes include:
Introduction of barclamps as modular components
Expansion into multiple applications (most notably OpenStack and Apache Hadoop)
Working in the open (with public commits)
Collaborative License Agreements
Dell’s understanding of open source and open development has undergone a similar transformation. Crowbar was originally Apache 2 open sourced because we imagined it becoming part of the OpenStack project. While that ambition has faded, the practical benefits of open collaboration have proven to be substantial.
The results from this first year are compelling:
For OpenStack Diablo, coordination with the Rackspace Cloud Builder team enabled Crowbar to bring the Keystone and Dashboard projects into Dell’s solution
We’ve amassed hundreds of mail subscribers and Github followers
Support for multiple releases of RHEL, CentOS & Ubuntu, including Ubuntu 12.04 while it was still in beta.
SuSE created their own port of Crowbar to SuSE Linux, with important advances in Crowbar’s install model (from ISO to packages).
We stand on the edge of many exciting transformations for Crowbar’s second year. Based on the amount of change from this year, I’m hesitant to make long term predictions. Yet, just within the next few months there are significant plans based on the Crowbar 2.0 refactor. We have line of sight to changes that expand our tool choices, improve networking, add operating systems and become even more production ops capable.
One of my Dell team’s most critical lessons from hyperscale cloud deployments was that DevOps tooling and operations processes are key to success. Our Crowbar project was born out of this realization.
I have been tracking the progress of the Copper ARM-based server internally from design to implementation. Now, I’m excited to see it getting some deserved attention.
The Copper platform is really cool because the cost, power, and density ratios of the nodes are unparalleled. This makes it an ideal platform for distributed mixed compute/store workloads like Hadoop. The nodes in the platform have excellent RAM/CPU/Spindle ratios.
While Copper is driving huge density, it also drives forward the same hyperscale challenges that we’ve been trying to address with Crowbar; consequently, we’re already working to ensure that we can deploy and manage Copper with Crowbar at scale.
Copper and Crowbar make a natural team and we’re excited to be part of today’s announcement:
Dell is staging clusters of the Dell “Copper” ARM server within the Dell Solution Centers and with TACC so developers may book time on the platforms. Dell also will deliver an ARM-supported version of Crowbar, Dell’s open-source management infrastructure software, to the industry in the future.
We are RECORDING everything and will link posts from the event page.
There is HOMEWORK if you want to get ahead by installing OpenStack yourself.
For last minute updates about the event, I recommend that you join the Crowbar Listserver.
Content Logistics work like this.
Everything will be available ONLINE. We are also coordinating many physical sites as rally points.
Introductory: FOUR 3-hour sessions for people who do not have OpenStack or Crowbar experience. These sessions will show how to install OpenStack using Crowbar, discuss DevOps and showcase companies that are in the OpenStack ecosystem. They are planned to have 2 European slots (afternoon & evening), 3 US slots (morning, afternoon & evening), and 1 Asian slot (morning).
Expert: ON-GOING deep technical sessions for engineers who have OpenStack and/or Crowbar experience. There will be one main screen and voice channel in which we are planning to highlight and discuss these topics in blocks throughout the day. We have a long list of topics to discuss and will maintain an ongoing Google Hangout for each topic. Depending on interest, we will jump back and forth to different hangouts.
Intro/Overview Session Logistics work like this
We’re planning FOUR introductory sessions throughout the day (read ahead?). Each session should be approximately 3 hours. The first hour of the sessions will be about OpenStack Essex and installing it using Crowbar. After some Q&A, we’re going to highlight the OpenStack ecosystem. The schedule for the ecosystem is in flux and will likely shift even during the event.
Session start times for Overview & Ecosystem content:
6/1 10 am
* There are no planned live venues at this time/region. You are always welcome to join online!
Experts Track Logistics
Note: we expect experts to have already installed OpenStack (see homework page). Ideally, an expert has already setup a build environment.
We have a list of topics (Essex, Quantum, Networking, Pull from Source, Documentation, etc) that we plan to cover on a 30-60 minute rotation.
We will cover the OpenStack Essex deploy at the start of each planned session (9am, Noon, 3pm & 8pm EDT). Before we cover the OpenStack deploy, we’ll spend 10 minutes setting (and posting) the agenda for the next three hours based on attendee input.
Even if we are not talking about a topic on the main channel, we will keep a dialog going on topic specific Google hangouts. The links to the hangouts will be posted with the Expert track agenda.
If you want to read more about the event, check out my event logistics post (link pending).
I do not apologize for my promotion of the Dell-led open source Crowbar as the deployment tool for the OpenStack Essex Deploy. For a community to focus on improving deployment tooling, there must be a stable reference infrastructure. Crowbar provides a fast and repeatable multi-node environment with scriptable networking and packaging.
I believe that OpenStack benefits from a repeatable multi-node reference deployment. I’ll go further and state that this requires DevOps tooling to ensure consistency both within and between deployments.
DevStack makes trunk development more canonical between different developers. I hope that Crowbar will help provide a similar experience for operators so that we can truly share deployment experience and troubleshooting. I think Crowbar deployments are already repeatable enough to provide a reference for defect documentation and reproduction.
Said more plainly, it’s a good thing if a lot of us use OpenStack in the same way so that we can help each other out.