OpenStack Board needs Consensus Governance

I am humbled by the community support for my election (I finished first in the results) and have been surprised to realize that one of my unlisted credentials, 5 years as Secretary of our local public Community Development Corporation, could also be an important asset. Dealing with Texas Open Government laws around parliamentary minutiae such as open discussions, voting, minutes and agendas turns out to translate directly to open source governance (which affects everyone!).

I believe that the OpenStack Board should operate by Consensus rules.

Boards can choose to operate either by Consensus or Majority. A Consensus board typically passes all resolutions unanimously while a Majority board does not need agreement on decisions (see table below).

At first blush, majority process seems more efficient; unfortunately, split votes are divisive and polarizing. The consequence of split votes is that minority positions seek longer debate, resort to back-room politics and add procedural overhead. This type of behavior would be destructive for our community.

A Consensus board, which only happens by implied agreement of the members and leadership, works to ensure that decisions can be supported by all the members. This does not mean that all the members agree with the board positions, hold hands during meetings, or participate in Polynesian drum circles! It does mean that the board as a whole considers minority positions and their motivation before calling a vote. If there is too much difference of opinion, then the majority may defer voting or minority members may abstain from voting. One common aspect of Consensus boards is that members may appear to argue against their own positions to ensure that minority views have been represented.

While the consensus model takes discipline for our Directors, it also takes patience and cooperation from the community that we serve. Board actions may take longer or be less direct than members of the community desire.

I believe that committing to a Consensus board is essential for OpenStack because our board is large (24 members!), our community is diverse and the financial impacts to members are high. So far, I’m proud that we’ve been following that model and will try to ensure we maintain that tradition.

Post Script Table: Consensus vs. Majority Governance Snapshot

                          Consensus              Majority
Voting                    Unanimous              Split
Process                   Flexible               Strict
Position in Discussions   Ambiguous              Polarized
Controversy               Avoided / Postponed    Forced / Decisive Wins
Community                 Encouraged             Divided
Minority Interests        Incorporated           Excluded
Board Unity               High                   Low

I am seeking your vote(s) for the OpenStack Board

If registered, you have 8 votes to allocate as you wish.  You will get a link via email – you must use that link.

Joseph B George and I are cross-blogging this post because we are jointly seeking your vote(s) for individual member seats on the OpenStack Foundation board.  This is a key point in the OpenStack journey and we strongly encourage eligible voters to participate no matter who you vote for!  As we have said before, the success of the Foundation governance process matters just as much as the code because it ensures equal access and limits forking.

We think that OpenStack succeeds because it is collaboratively developed.  It is essential that we select board members who have a proven record of community development, a willingness to partner, and a demonstrated investment in the project.

Our OpenStack vision favors production operations by being operator, user and ecosystem focused.  If elected, we will represent these interests by helping advance deployability, API specifications, open operations and both large and small scale cloud deployments.

Of the nominees, we best represent OpenStack users and operators (as opposed to developers).  We have the most diverse experience in real-world OpenStack deployments because our solution has been deployed broadly (both as Dell and through Crowbar).  We have a proven record of collaborating broadly with contributors, demonstrated skills at building the OpenStack community, and a history of doing real open source work to ensure that OpenStack is the most deployable cloud platform anywhere.

Let’s get specific about our leadership in the OpenStack project and community:

  • We have been active and vocal leaders in the OpenStack community
    • our team has established two very active user groups (Austin & Boston)
    • we have led multiple world-wide deploy day events (March 2012 & May 2012).
    • we have substantial experience in the field and know the challenges of running OpenStack for a wide variety of real-world deployments
    • our first solution came out on Cactus!  We’ve been delivering on Essex since OSCON 2012 (http://www.oscon.com/ ).
  • We represent a broad range of deployment scenarios ranging from hosting, government, healthcare, retail, education, media, financial and more!
  • We have broad engagements and partnerships at the infrastructure (SUSE, Canonical, Red Hat), consulting (Canonical, Mirantis) and ecosystem layers (enStratus) and beyond!
  • We have a proven track record of collaboration instead of forking/disrupting – a critical skill for this project reflected by our consistent actions to preserve the integrity of the project.
  • We have led the “make OpenStack deployable” campaign with substantial investments (open source Crowbar, white papers, documentation & cookbooks).
  • We have a very long and consistent history with the project, starting even before the first OpenStack summit in Austin.

Of course, we’re asking you to consider both of us; however, if you want to focus on just one then here’s the balance between us.  Rob (bio) is a technologist with deep roots in cloud technology, data center operations and open source.  Joseph is a business professional with experience in new product introduction and enterprise delivery.

Not sure if you can vote?  If you registered as an individual member then your name should be on the voting list.  In that case, you can vote between 8/20 and 8/24.

I’m seeking Nominations for OpenStack Board

The OpenStack Foundation is currently seeking nominations for community representatives for the board and I am asking for you to nominate me for that position.  Candidates are required to have ten (10) nominations to be considered for the election.  To nominate, you can join and nominate from here.  (I’m nominated, thanks!)

Rob Background:

As the OpenStack technology lead within Dell and long-time cloud deployer and developer, I made OpenStack deployability a top concern for Dell and the community.  My leadership has changed the dialog around OpenStack to be balanced between Ops and Dev.  I have also been pivotal in bringing open collaboration to OpenStack operations through our Crowbar project.   Through my role at Dell, I am actively engaged with numerous field deployments and uniquely positioned to represent OpenStack’s developer, provider and enterprise user bases.  I bring substantial process experience (Agile/Lean/CI) into my decision making.  My focus will be on ensuring OpenStack is deployable and ready for use.

More?  Read my background post from previous OpenStack elections.

Crowbar 2.0 Design Summit Notes (+ open weekly meetings starting)

I could not be happier with the results Crowbar collaborators and my team at Dell achieved around the 1st Crowbar design summit. We had great discussions and even better participation.

The attendees represented major operating system vendors, configuration management companies, OpenStack hosting companies, OpenStack cloud software providers, OpenStack consultants, OpenStack private cloud users, and (of course) a major infrastructure provider. That’s a very complete cross-section of the cloud community.

I knew from the start that we had too little time and, thankfully, people were tolerant of my need to stop the discussions. In the end, we were able to cover all the planned topics. This was important because all these features are interlocked so discussions were iterative. I was impressed with the level of knowledge at the table and it drove deep discussion. Even so, there are still parts of Crowbar that are confusing (networking, late binding, orchestration, chef coupling) even to collaborators.

In typing up these notes, it becomes even more blindingly obvious that the core features for Crowbar 2 are highly interconnected. That’s no surprise technically; however, it will make the notes harder to follow because of knowledge bootstrapping. You need to take time to grok the gestalt and surf the zeitgeist.

Collaboration Invitation: I wanted to remind readers that this summit was just the kick-off for a series of open weekly design (Tuesdays 10am CDT) and coordination (Thursdays 8am CDT) meetings. Everyone is welcome to join in those meetings – information is posted, recorded, folded, spindled and mutilated on the Crowbar 2 wiki page.

These notes are my reflection of the online etherpad notes that were made live during the meeting. I’ve grouped them by design topic.

Introduction

  • Contributors need to sign CLAs
  • We are refactoring Crowbar at this time because we have a collection of interconnected features that could not be decoupled
  • Some items (Database use, Rails3, documentation, process) are not for debate. They are core needs but require little design.
  • There are 5 key topics for the refactor: online mode, networking flexibility, OpenStack pull from source, heterogeneous/multi operating systems, being CMDB agnostic
  • Due to time limits, we have to stop discussions and continue them online.
  • We are hoping to align the Crowbar 2 beta with the OpenStack Folsom release.

Online / Connected Mode

  • Online mode is more than simply internet connectivity. It is the foundation of how Crowbar stages dependencies and components for deploy. It is required for heterogeneous O/S and pull from source, and it depends on how we model networking so that nodes can access resources.
  • We are considering caching proxies to stage resources. This would allow isolated production environments and preserve the ability to run everything from the ISO without a connection (still a key requirement for us); a minimal sketch follows this list.
  • SUSE’s Crowbar fork does not build an ISO; instead, it relies on RPM packages for barclamps and their dependencies.
  • Pulling packages directly from the Internet has proven to be unreliable, so this approach cannot rely on that alone.
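
To make the connected / ISO duality concrete, here is a minimal sketch of a Chef recipe fragment that points a Debian/Ubuntu node at a local caching proxy (for example apt-cacher-ng) when one is configured, and otherwise falls back to the repositories staged from the ISO. The node['crowbar']['proxy'] attribute name is an assumption for illustration, not Crowbar's actual API.

    # Hypothetical sketch: use a local caching proxy when one is defined so that
    # isolated environments can still resolve packages; otherwise stay with the
    # repositories staged from the ISO. The attribute name is illustrative only.
    proxy = node['crowbar'] && node['crowbar']['proxy']   # e.g. "10.124.0.10:3142"

    if proxy
      file '/etc/apt/apt.conf.d/01proxy' do
        content "Acquire::http::Proxy \"http://#{proxy}\";\n"
        mode '0644'
      end
    else
      Chef::Log.info('No caching proxy configured; using local (ISO) repositories')
    end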

Install From Source

  • This feature is mainly focused on OpenStack, but it could be applied more generally. The principles that we are looking at apply to any application where the source code is changing quickly (all of them?!). Hadoop is an obvious second candidate.
  • We spent some time reviewing the use-cases for this feature. While this appears to be very dev and pre-release focused, there are important applications for production. Specifically, we expect that scale customers will need to run ahead of or slightly adjacent to trunk due to patches or proprietary code. In both cases, it is important that users can deploy from their own repository (a rough sketch follows this list).
  • We discussed briefly our objective to pull configuration from upstream (not just OpenStack, but potentially any common cookbooks/modules). This topic is central to the CMDB agnostic discussion below.
  • The overall sentiment is that this could be a very powerful capability if we can manage to make it work. There is a substantial challenge in tracking dependencies – current RPMs and Debs do a good job of this and other configuration steps beyond just the bits. Replicating that functionality is the real obstacle.
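
As a rough illustration of the pull-from-source idea (and assuming Chef remains the deployment tool), the fragment below syncs an OpenStack service from a git repository, which could just as easily be a customer's fork, and installs it with pip. The attribute names are made up for this sketch, and it deliberately ignores the dependency-tracking problem called out above, which is the real obstacle.

    # Illustrative only: deploy an OpenStack service from a source checkout
    # instead of distro packages. Attribute names are assumptions; dependency
    # resolution (the hard part) is not handled here.
    repo   = (node['nova'] && node['nova']['source_repo'])   || 'https://github.com/openstack/nova.git'
    branch = (node['nova'] && node['nova']['source_branch']) || 'master'

    git '/opt/nova' do
      repository repo
      revision   branch
      action     :sync
    end

    execute 'install nova from source' do
      command 'pip install -e /opt/nova'
      not_if  'python -c "import nova"'   # crude guard; real code would track the synced revision
    end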

CMDB agnostic (decoupling Chef)

  • This feature is confusing because we are not eliminating the need for a configuration management database (CMDB) tool like Chef; instead, we are decoupling Crowbar from a single CMDB to a pluggable model using an abstraction layer (a rough sketch of such an adapter follows this list).
  • It was stressed that Crowbar does orchestration – we do not rely on convergence over multiple passes to get the configuration correct.
  • We had strong agreement that the modules should not be tightly coupled but did need a consistent way (API? Consistent namespace? Pixie dust?) to share data between each other. Our priority is to maintain loose coupling and follow integration by convention and best practices rather than rigid structures.
  • The abstraction layer needs to have both import and export functions
  • Crowbar will use attribute injection so that Cookbooks can leverage Crowbar but will not require Crowbar to operate. Crowbar’s database will provide the links between the nodes instead of having to wedge it into the CMDB.
  • In 1.x, networking was the part most tightly coupled to Chef. This is a major part of the refactor and of the modeling for Crowbar’s database.
  • There are a lot of notes captured about this on the etherpad – I recommend reviewing them
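
To give a feel for what the abstraction layer could look like, here is a rough Ruby sketch of a pluggable adapter with the import and export functions discussed above, plus a Chef-backed implementation that injects attributes under a 'crowbar' key. All class, method and attribute names are illustrative assumptions, not Crowbar's actual code.

    require 'chef'   # assumes the chef client gem and a configured Chef server

    # Hypothetical shape of the CMDB abstraction layer (names are illustrative).
    class CmdbAdapter
      # Push desired configuration from Crowbar's database out to the CMDB.
      def export(node_name, attributes)
        raise NotImplementedError
      end

      # Pull discovered facts/state from the CMDB back into Crowbar's database.
      def import(node_name)
        raise NotImplementedError
      end
    end

    class ChefAdapter < CmdbAdapter
      def export(node_name, attributes)
        chef_node = Chef::Node.load(node_name)
        chef_node.normal_attrs['crowbar'] = attributes   # attribute injection point
        chef_node.save
      end

      def import(node_name)
        Chef::Node.load(node_name).automatic_attrs       # ohai-discovered facts
      end
    end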

Heterogeneous OS (bare metal provisioning and beyond)

  • This topic was the most divergent of all our topics because most of the participants were using some variant of their own bare metal provisioning project (check the etherpad for the list).
  • Since we can’t pack an unlimited set of stuff on the ISO, this feature requires online mode.
  • Most of these projects do nothing beyond OS provisioning; however, their simplicity is beneficial. Crowbar needs to consider users who just want a streamlined OS provisioning experience.
  • We discussed Crowbar’s late binding capability, but did not resolve how to reconcile that with these other projects.
  • Critical use cases to consider:
    • an API for provisioning (not sure if it needs to be more than the current one)
    • pick which Operating Systems go on which nodes (potentially with a rules engine?)
    • inventory capabilities of available nodes (like Ohai and Facter) into a database (see the sketch after this list)
    • inventory available operating systems
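
For illustration, a node inventory record of the kind discussed above might carry data like the following; the field names and the trivial OS-selection rule are assumptions, not a proposed schema.

    # Hypothetical inventory record for a discovered node (Ohai/Facter style).
    node_inventory = {
      'fqdn'         => 'd00-26-9e-aa-bb-cc.example.com',
      'macs'         => ['00:26:9e:aa:bb:cc'],
      'cpu_count'    => 24,
      'ram_gb'       => 96,
      'disks'        => [{ 'device' => 'sda', 'size_gb' => 930 }],
      'bios_version' => '1.3.6',
      'allocated'    => false,   # still in the free pool
      'target_os'    => nil      # chosen later, possibly by a rules engine
    }

    # A minimal "rules engine" pass that picks an OS per node could start as simply as:
    node_inventory['target_os'] = node_inventory['ram_gb'] >= 64 ? 'ubuntu-12.04' : 'centos-6.2'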

Ops “Late Binding” is a critical best practice and key to Crowbar differentiation

(Image from OSCON 2012: Portland’s light rail understands good dev practice.)

Late binding is a programming term that I’ve commandeered for Crowbar’s DevOps design objectives.

We believe that late binding is a best practice for CloudOps.

Understanding this concept is turning out to be an important but confusing differentiation for Crowbar. We’ve effectively inverted the typical deploy pattern of building up a cloud from bare metal; instead, Crowbar allows you to build a cloud from the top down.  The difference is critical – we delay hardware decisions until we have the information needed to do the correct configuration.

If Late Binding is still confusing, the concept is really very simple: “we hold off all work until you’ve decided how you want to setup your cloud.”

Late binding arose from our design objectives. We started the project with a few critical operational design objectives:

  1. Treat the nodes and application layers as an interconnected system
  2. Realize that application choices should drive down the entire application stack including BIOS, RAID and networking
  3. Expect the entire system to be constantly changing, so we must track state and avoid locked configurations.

We’d seen these objectives as core tenets in hyperscale operators who considered bare metal and network configuration to be an integral part of their application deployment. We know it is possible to build the system in layers that only (re)deploy once the application configuration is defined.

We have all this great interconnected automation! Why waste it by having to pre-stage the hardware or networking?

In cloud, late binding is known as “elastic computing” because you wait until you need resources to deploy. But running apps on cloud virtual machines is simple when compared to operating a physical infrastructure. In physical operations, RAID, BIOS and networking matter a lot because there are important and substantial variations between nodes. These differences are what drive late binding as one of Crowbar’s core design principles; the sketch below illustrates the idea.
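
A minimal sketch of the idea, with invented role names and hardware profiles: the node's RAID, BIOS and network configuration is a function of the application role, so nothing is applied until that role is chosen.

    # Illustrative late binding: hardware configuration follows the application
    # choice, so unassigned nodes are left untouched. Names are hypothetical.
    HARDWARE_PROFILES = {
      'swift-storage' => { raid: 'jbod',  bios: 'max-io',  nics: %w[admin storage] },
      'nova-compute'  => { raid: 'raid1', bios: 'virt-on', nics: %w[admin private public] }
    }

    def bind_node(node, role)
      profile = HARDWARE_PROFILES[role]
      return node if profile.nil?   # late binding: do nothing until a role is chosen
      node.merge(role: role, raid: profile[:raid], bios: profile[:bios], networks: profile[:nics])
    end

    node = { name: 'd00-26-9e-aa-bb-cc' }
    node = bind_node(node, nil)               # still untouched -- no decision has been made
    node = bind_node(node, 'swift-storage')   # RAID/BIOS/network now follow the application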

Late Binding Visualized

Crowbar’s early twins: Cloudera Hadoop & OpenStack Essex

I’m proud to see my team announce the twin arrival of the Dell | Cloudera Apache Hadoop (Manager v4) and Dell OpenStack-Powered Cloud (Essex) solutions.

Not only are we simultaneously releasing both of these solutions, they reflect a significant acceleration in pace of delivery.  Both solutions had beta support for their core technologies (Cloudera 4 & OpenStack Essex) when the components were released and we have dramatically reduced the lag from component RC to solution release compared to past (3.7 & Diablo) milestones.

As before, the core deployment logic of these open source based solutions was developed in the open on Crowbar’s github.  You are invited to download and try these solutions yourself.   For Dell solutions, we include validated reference architectures, hardware configuration extensions for Crowbar, services and support.

The latest versions of Hadoop and OpenStack represent great strides for both solutions.  It’s great to have made them more deployable and faster to evaluate and manage.

Community Participation in Crowbar 2.0 Efforts

I’ve laid out the Dell Crowbar team’s technical objectives for Crowbar 2.0. Now it’s time to lay out how we plan to involve the Crowbar community. We welcome community participation and contributions, but (I’m not making any pretense here) the refactoring must also satisfy needs expressed by the “Dell OpenStack powered Cloud” and “Dell | Cloudera Apache Hadoop” solutions. There is a happy alignment between Dell’s needs and our community’s since both of these solutions are openly developed.

Our community played a substantial role in the definition of Crowbar 2.0. We have been watching and listening carefully and the majority of the changes will leverage recommendations and address concerns raised by Crowbar users.

The Dell Crowbar team has been working towards Crowbar 2 for quite a while. Networking and operating system code is already in place, and our team began work on design requirements for the refactoring in mid-June.

Community Communication

We have set up the following meetings for community engagement around the Crowbar refactoring. Our purpose is to enable development collaboration during the refactoring – these will not be “learning Crowbar” sessions.

  • Design Brief (2 hour meeting)
    • Objective: Review of design and refactoring plans in progress by the Dell Crowbar team. Provide a basis for community input and discussion during the week.
    • When: Monday 7/16 at 10am CDT
    • Where: Online Session
    • Who: Anyone (this is a listening session, not a discussion event)
  • Design Summit (4+2 hour meeting)
    • Objective: Provide input and suggestions about design proposal (see preparation). Identify members of collaboration teams for refactoring effort.
    • Preparation: Attend 7/16 Design Brief & Complete Expert Homework.
    • When: Friday 7/20
    • Where: Portland during OSCON 2012 (not on-site at OSCON). To facilitate face-to-face dialog, there will be no interactive online component.
    • Who: 20 people, invitation only due to space limitations. Attendees representing DevOps tools, hosting companies, operating system vendors, Hadoop & OpenStack contributors
    • Notes will be posted
  • Follow-up sessions
    • Objective: Coordinate delivery of refactored code
    • When: Likely weekly following Summit
    • Where: Online (Skype or Google Hangout)
    • Who: Crowbar Developers

Of course, we will continue to engage on the Crowbar list and Skype channels.

More on the 7/20 Design Summit

The purpose of the summit is strictly limited to discussing and organizing efforts around Crowbar 2.0 refactoring. To maintain a tight focus on this effort, we made the decision to limit the audience of this event by making it in-person and invitation only.

The Summit is not a broad discussion about Crowbar’s future with an open space format. We have a specific need and agenda – coordinating the development of 2.0. For that reason, we made the decision to restrict invitations to partners who are ready to commit substantial effort towards Crowbar development in the next 6 months. This is a technical session only: attendees must have first-hand experience coding Crowbar, be able to build the code and take on development or test commitments.

Because of the time limitations, we are holding a mandatory pre-summit brief about the design changes. This session will be open for the community and we welcome comments.

The agenda for the Design Summit is as follows:

  • 8:30 (45 mins): Crowbar 2.0 Challenges & Objectives
  • 9:15 (30 mins): Packaging and Delivery
  • 9:45 (15 mins) Break
  • 10:00 (30 mins): Networking
  • 10:30 (30 mins): Data Representation and Framework
  • 11:00 (30 mins): Next Steps, Future meetings & work assignments
  • 11:30 official content ends & lunch
  • 12:00 (30 mins) breakout sessions based on discussions above, TBD during summit
  • 12:30 (30 mins) breakout sessions based on discussions above, TBD during summit
  • 1:00 (30 mins) breakout sessions based on discussions above, TBD during summit
  • 1:30 (30 mins) breakout sessions based on discussions above, TBD during summit
  • 2:00 meeting ends

Post Script: A Community Design Summit

We have already begun discussing a true community summit with an open agenda. That summit will likely be a dedicated event in Austin, Texas with 2 (or more!) days of content. Please let me know if this is of interest to you so we can begin building the business case for hosting it!

Our Vision for Crowbar – taking steps towards closed loop operations

When Greg Althaus and I first proposed the project that would become Dell’s Crowbar, we had already learned first-hand that there was a significant gap in both the technologies and the processes for scale operations. Our team at Dell saw that the successful cloud data centers were treating their deployments as integrated systems (now called DevOps) in which the configuration of many components was coordinated and orchestrated; however, these approaches fell short of the mark in our opinion. We wanted to create a truly integrated operational environment from the bare metal through the networking up to the applications and out to the operations tooling.

Our ultimate technical nirvana is to achieve closed-loop continuous deployments. We want to see applications that constantly optimize new code, deployment changes, quality, revenue and cost of operations. We could find parts, but not a complete, adequate foundation for this vision.

The business driver for Crowbar is system thinking around improved time to value and flexibility. While our technical vision is a long-term objective, we see very real short-term ROI. It does not matter if you are writing your own software or deploying applications; the faster you can move that code into production, the sooner you get value from innovation. It is clear to us that the most successful technology companies have reorganized around speed to market and adapting to the pace of change.

System flexibility & acceleration were key values when the lean manufacturing revolution gave Dell a competitive advantage, and they have proven even more critical in today’s dynamic technology innovation climate.

We hope that this post helps define a vision for Crowbar beyond the upcoming refactoring. We started the project with the idea that new tools meant we could take operations to a new level.

While that’s a great objective, we’re too pragmatic in delivery to rest on a broad objective. Let’s take a look at Crowbar’s concrete strengths and growth areas.

Key strength areas for Crowbar

  1. Late binding – hardware and network configuration is held until software configuration is known.  This is a huge system concept.
  2. Dynamic and Integrated Networking – means that we treat networking as a 1st class citizen for ops (sort of like software defined networking but integrated into the application)
  3. System Perspective – no Application is an island.  You can’t optimize just the deployment, you need to consider hardware, software, networking and operations all together.
  4. Bootstrapping (bare metal) – while not “rocket science” it takes a lot of careful effort to get this right in a way that is meaningful in a continuous operations environment.
  5. Open Source / Open Development / Modular Design – this problem is simply too complex to solve alone.  We need to get a much broader net of environments and thinking involved.

Continuing Areas of Leadership

  1. Open / Lean / Incremental Architecture – these are core aspects of our approach.  While we have a vision, we also are very open to ways that solve problems faster and more elegantly than we’d expected.
  2. Continuous deployment – we think the release cycles are getting faster and the only way to survive is to build change into the foundation of operations.
  3. Integrated networking – software defined networking is cool, but not enough.  We need to have semantics that link applications, networks and infrastructure together.
  4. Equivalent physical / virtual – we’re not saying that you won’t care if it’s physical or virtual (you should), but we think that it should not impact your operations.
  5. Scale / Hybrid – the key element to hybrid is scale.  The missing connection is being able to close the loop.
  6. Closed loop deployment – seeking load management, code quality, profit, and cost of operations as factors in managed operations.

Crowbar 2.0 Objectives: Scalable, Heterogeneous, Flexible and Connected

The seeds for Crowbar 2.0 have been in the 1.x code base for a while and were recently accelerated by SUSE.  With the Dell | Cloudera 4 Hadoop and Essex OpenStack-powered releases behind us, we will now be totally focused on bringing these seeds to fruition in the next two months.

Getting the core Crowbar 2.0 changes working is not a major refactoring effort in calendar time; however, it will impact current Crowbar developers by changing and improving the programming APIs. The Dell Crowbar team decided to treat this as a focused refactoring effort because several important changes are tightly coupled. We cannot solve them independently without causing a larger disruption.

All of the Crowbar 2.0 changes address issues and concerns raised in the community and are needed to support the expansion of our OpenStack and Hadoop application deployments.

Our technical objective for Crowbar 2.0 is to simplify and streamline development efforts as the development and user community grows. We are seeking to:

  1. simplify our use of Chef and eliminate Crowbar requirements in our Opscode Chef recipes (a sketch of this decoupling follows this list). This will:
    1. reduce the initial effort required to leverage Crowbar
    2. open Crowbar to a broader audience (see Upstreaming)
  2. provide heterogeneous / multiple operating system deployments. This enables:
    1. multiple versions of the same OS running for upgrades
    2. different operating systems operating simultaneously (and deal with heterogeneous packaging issues)
    3. accommodation of no-agent systems like locked systems (e.g.: virtualization hosts) and switches (aka external entities)
    4. UEFI booting in Sledgehammer
  3. strengthen networking abstractions
    1. allow networking configurations to be created dynamically (so that users are not locked into choices made before Crowbar deployment)
    2. better manage connected operations
    3. enable pull-from-source deployments that are ahead of (or forked from) available packages.
  4. improvements in Crowbar’s core database and state machine to enable
    1. larger scale concerns
    2. controlled production migrations and upgrades
  5. other important items
    1. make documentation more coupled to current features and easier to maintain
    2. upgrade to Rails 3 to simplify code base, security and performance
    3. deepen automated test coverage and capabilities
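
As a sketch of objective 1 (not actual Crowbar code), a cookbook can read Crowbar-injected attributes when they exist and fall back to plain Chef/Ohai data otherwise, so the same recipe runs with or without Crowbar. The 'crowbar' and 'myapp' attribute keys are assumptions for illustration.

    # Illustrative recipe fragment: prefer Crowbar-injected attributes, but do
    # not require them. Attribute keys here are assumptions.
    crowbar_data = node['crowbar'] || {}

    listen_ip = crowbar_data['admin_ip'] || node['ipaddress']
    db_host   = crowbar_data['db_host']  || (node['myapp'] && node['myapp']['db_host']) || 'localhost'

    template '/etc/myapp/myapp.conf' do
      source    'myapp.conf.erb'
      variables listen_ip: listen_ip, db_host: db_host
      mode      '0644'
    end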

Beyond these great technical targets, we want Crowbar 2.0 to address barriers to adoption that have been raised by our community, customers and partners. We have been tracking concerns about the learning curve for adding barclamps, the complexity of networking configuration and packaging into a single ISO.

We will kick off the community part of this effort with an online review on 7/16 (details).

PS: why a refactoring?

My team at Dell does not take on any refactoring changes lightly because they are disruptive to our community; however, a convergence of requirements has made it necessary to update several core components simultaneously. Specifically, we found that desired changes in networking, operating systems, packaging, configuration management, scale and hardware support all required interlocked changes. We have been bringing many of these changes into the code base in preparation and have reached a point where the next steps require changing Crowbar 1.0 semantics.

We are first and foremost an incremental architecture & lean development team – Crowbar 2.0 will have the smallest footprint needed to begin the transformations that are currently blocking us. There is significant room during and after the refactor for the community to shape Crowbar.

Stop the Presses! Austin OpenStack Meetup 7/12 features docs, bugs & cinder

Don’t miss the 7/12 OpenStack Austin meetup!  We’ve got a great agenda lined up.

This meetup is sponsored by HP (Mark Padovani will give the intro).

Topics will include

  1. 6:30 pre-meeting OpenStack intro & overview for N00bs.
  2. Anne Gentle, OpenStack Technical Writer at Rackspace Hosting, talking about How to contribute to docs & the areas needed. *
  3. Report on the Folsom.3 bug squash day (http://wiki.openstack.org/BugDays/20120712BugSquashing)
  4. (tentative) Greg Althaus, Dell, talking about the “Cinder” Block Storage project
  5. White Board – Next Meeting Topics

* if you contribute to docs then you’ll get an invite to the next design summit!   It’s a great way to support OpenStack even if you don’t write code.