To thrive, OpenStack must better balance dev, ops and business needs.

OpenStack has grown dramatically in many ways but we have failed to integrate development, operations and business communities in a balanced way.

My most urgent observation from Paris is that these three critical parts of the community are having vastly different dialogs about OpenStack.

Clouds DownAt the Conference, business people were talking were about core, stability and utility while the developers were talking about features, reorganizing and expanding projects. The operators, unfortunately segregated in a different location, were trying to figure out how to share best practices and tools.

Much of this structural divergence was intentional and should be (re)evaluated as we grow.

OpenStack events are split into distinct focus areas: the conference for business people, the summit for developers and specialized days for operators. While this design serves a purpose, the community needs to be taking extra steps to ensure communication. Without that communication, corporate sponsors and users may find it easier to solve problems inside their walls than outside in the community.

The risk is clear: vendors may find it easier to work on a fork where they have business and operational control than work within the community.

Inside the community, we are working to help resolve this challenge with several parallel efforts. As a community member, I challenge you to get involved in these efforts to ensure the project balances dev, biz and ops priorities.  As a board member, I feel it’s a leadership challenge to make sure these efforts converge and that’s one of the reasons I’ve been working on several of these efforts:

  • OpenStack Project Managers (was Hidden Influencers) across companies in the ecosystem are getting organized into their own team. Since these managers effectively direct the majority of OpenStack developers, this group will allow
  • DefCore Committee works to define a smaller subset of the overall OpenStack Project that will be required for vendors using the OpenStack trademark and logo. This helps the business community focus on interoperability and stability.
  • Technical leadership (TC) lead “Big Tent” concept aligns with DefCore work and attempts to create a stable base platform while making it easier for new projects to enter the ecosystem. I’ve got a lot to say about this, but frankly, without safeguards, this scares people in the ops and business communities.
  • An operations “ready state” baseline keeps the community from being able to share best practices – this has become a pressing need.  I’d like to suggest as OpenCrowbar an outside of OpenStack a way to help provide an ops neutral common starting point. Having the OpenStack developer community attempting to create an installer using OpenStack has proven a significant distraction and only further distances operators from the community.

We need to get past seeing the project primarily as a technology platform.  Infrastructure software has to deliver value as an operational tool for enterprises.  For OpenStack to thrive, we must make sure the needs of all constituents (Dev, Biz, Ops) are being addressed.

Self-Exposure: Hidden Influencers become OpenStack Product Working Group

Warning to OpenStack PMs: If you are not actively involved in this effort then you (and your teams) will be left behind!

ManagersThe Hidden Influencers (now called “OpenStack Product Working Group”) had a GREAT and PRODUCTIVE session at the OpenStack (full notes):

  1. Named the group!  OpenStack Product Working Group (now, that’s clarity in marketing) [note: I was incorrect saying “Product Managers” earlier].
  2. Agreed to use the mailing list for communication.
  3. Committed to a face-to-face mid-cycle meetup (likely in South Bay)
  4. Output from the meetup will be STRATEGIC DIRECTION doc to board (similar but broader than “Win the Enterprise”)
  5. Regular meeting schedule – like developers but likely voice interactive instead of IRC.  Stefano Maffulli is leading.

PMs starting this group already direct the work for a super majority (>66%) of active contributors.

The primary mission for the group is to collaborate and communicate around development priorities so that we can ensure that project commitments get met.

It was recognized that the project technical leads are already strapped coordinating release and technical objectives.  Further, the product managers are already but independently engaged in setting strategic direction, we cannot rely on existing OpenStack technical leadership to have the bandwidth.

This effort will succeed to the extent that we can help the broader community tied in and focus development effort back to dollars for the people paying for those developers.  In my book, that’s what product managers are supposed to do.  Hopefully, getting this group organized will help surface that discussion.

This is a big challenge considering that these product managers have to balance corporate, shared project and individual developers’ requirements.  Overall, I think Allison Randall summarized our objectives best: “we’re herding cats in the same direction.”

Leveling OpenStack’s Big Tent: is OpenStack a product, platform or suite?

Question of the day: What should OpenStack do with all those eager contributors?  Does that mean expanding features or focusing on a core?

IMG_20141108_101906In the last few months, the OpenStack technical leadership (Sean Dague, Monty Taylor) has been introducing two interconnected concepts: big tent and levels.

  • Big tent means expanding the number of projects to accommodate more diversity (both in breath and depth) in the official OpenStack universe.  This change accommodates the growth of the community.
  • Levels is a structured approach to limiting integration dependencies between projects.  Some OpenStack components are highly interdependent and foundational (Keystone, Nova, Glance, Cinder) while others are primarily consumers (Horizon, Saraha) of lower level projects.

These two concepts are connected because we must address integration challenges that make it increasingly difficult to make changes within the code base.  If we substantially expand the code base with big tent then we need to make matching changes to streamline integration efforts.  The levels proposal reflects a narrower scope at the base than we currently use.

By combining big tent and levels, we are simultaneously growing and shrinking: we grow the community and scope while we shrink the integration points.  This balance may be essential to accommodate OpenStack’s growth.

UNIVERSALLY, the business OpenStack community who wants OpenStack to be a product.  Yet, what’s included in that product is unclear.

Expanding OpenStack projects tends to turn us into a suite of loosely connected functions rather than a single integrated platform with an ecosystem.  Either approach is viable, but it’s not possible to be both simultaneously.

On a cautionary note, there’s an anti-Big Tent position I heard expressed at the Paris Conference.  It’s goes like this: until vendors start generating revenue from the foundation components to pay for developer salaries; expanding the scope of OpenStack is uninteresting.

Recent DefCore changes also reflect the Big Tent thinking by adding component and platform levels.  This change was an important and critical compromise to match real-world use patterns by companies like Swiftstack (Object), DreamHost (Compute+Ceph), Piston (Compute+Ceph) and others; however, it creates the need to explain “which OpenStack” these companies are using.

I believe we have addressed interoperability in this change.  It remains to be seen if OpenStack vendors will choose to offer the broader platform or limit to themselves to individual components.  If vendors chase the components over platform then OpenStack becomes a suite of loosely connect products.  It’s ultimately a customer and market decision.

It’s not too late to influence these discussions!  I’m very interested in hearing from people in the community which direction they think the project should go.

API Driven Metal = OpenCrowbar + Chef Provisioning

The OpenCrowbar community created a Chef-Provisioning driver that allows you to quickly build hardware clusters using Chef cookbooks.

2012-08-05_14-13-18_310When we started using Chef in 2011, there was a distinct gap around bootstrapping systems.  The platform did a great job of automation and even connecting services together (via the Search anti-pattern, see below) but lacked a way to build the initial clusters automatically.

The current answer to this problem from Chef is refreshingly simply: a cookbook API extension called Chef Provisioning.  This approach uses the regular Chef DSL in recipes to create request and bind a cluster into Chef.  Basically, the code simply builds an array of nodes using an API that creates the nodes if they are missing from the array in the code.  Specifically, when a node is missing from the array, Chef calls out to create the node in an external system.

For clouds, that means using the API to request a server and then inject credentials for Chef management.  It’s trickier for physical gear because you cannot just make a server in the configuration you need it in.  Physical systems must first be discovered and profiled to ready state: the system must know how many NICs and disk drives are available to correctly configure the hardware prior to laying down the Operating System.

Consequently, Chef Provisioning automation is more about reallocation of existing discovered physical assets to Chef.  That’s exactly the approach the OpenCrowbar team took for our Chef Provisioning driver.

OpenCrowbar interacts with Chef Provisioning by pulling nodes from the System deployment into a Chef Provisioning deployment.  That action then allows the API client to request specific configurations like Operating System or network that need to be setup for Chef to execute.  Once these requests are made, Crowbar will simply run its normal annealing processes to ready state and then injects the Chef credentials.  Chef waits until the work queue is empty and then takes over management of the asset.  When Chef is finished, Crowbar can be instructed to reconfigure the node back to a base state.

Does that sound simple?  It is simple because the Crowbar APIs match the Chef needs very cleanly.

It’s worth noting that this integration is a great test of the OpenCrowbar API design.  Over the last two years, we’ve evolved the API to make it more final result focused.  Late binding is a critical concept for the project and the APIs reflect that objective.  For Chef Provisioning, we allow the integration to focus on simple requests like “give me a node then put this O/S on the node and go.”  Crowbar has the logic needed to figure out how to accomplish those objectives without much additional instruction.

Bonus Side Note: Why Search can become an anti-pattern?  

Search is an incredibly powerful feature in Chef that allows cross-role and cross-node integration; unfortunately, it’s also very difficult to maintain as complexity and contributor counts grow.  The reason is that search creates “forward dependencies” in the scripts that require operators creating data to be aware of downstream, hidden consumers.  High Availability (HA) is a clear example.  If I add a new “cluster database” role to the system then it is very likely to return multiple results for database searches.  That’s excellent until I learn that my scripts have coded search to assume that we only return one result for database lookups.  It’s very hard to find these errors since the searches are decoupled and downstream of the database cookbook.  Ultimately, the community had to advise against embedded search for shared cookbooks

Unicorn captured! Unpacking multi-node OpenStack Juno from ready state.

OpenCrowbar Packstack install demonstrates that abstracting hardware to ready state smooths install process.  It’s a working balance: Crowbar gets the hardware, O/S & networking right while Packstack takes care of OpenStack.

LAYERSThe Crowbar team produced the first open OpenStack installer back in 2011 and it’s been frustrating to watch the community fragment around building a consistent operational model.  This is not an OpenStack specific problem, but I think it’s exaggerated in a crowded ecosystem.

When I step back from that experience, I see an industry wide pattern of struggle to create scale deployments patterns that can be reused.  Trying to make hardware uniform is unicorn hunting, so we need to create software abstractions.  That’s exactly why IaaS is powerful and the critical realization behind the OpenCrowbar approach to physical ready state.

So what has our team created?  It’s not another OpenStack installer – we just made the existing one easier to use.

We build up a ready state infrastructure that makes it fast and repeatable to use Packstack, one of the leading open OpenStack installers.  OpenCrowbar can do the same for the OpenStack Chef cookbooks or Salt Formula.   It can even use Saltstack, Chef and Puppet together (which we do for the Packstack work)!  Plus we can do it on multiple vendors hardware and with different operating systems.   Plus we build the correct networks!

For now, the integration is available as a private beta (inquiries welcome!) because our team is not in the OpenStack support business – we are in the “get scale systems to ready state and integrate” business.  We are very excited to work with people who want to take this type of functionality to the next level and build truly repeatable, robust and upgradable application deployments.

Tweaking DefCore to subdivide OpenStack platform (proposal for review)

The following material will be a major part of the discussion for The OpenStack Board meeting on Monday 10/20.  Comments and suggest welcome!

OpenStack in PartsFor nearly two years, the OpenStack Board has been moving towards creating a common platform definition that can help drive interoperability.  At the last meeting, the Board paused to further review one of the core tenants of the DefCore process (Item #3: Core definition can be applied equally to all usage models).

Outside of my role as DefCore chair, I see the OpenStack community asking itself an existential question: “are we one platform or a suite of projects?”  I’m having trouble believing “we are both” is an acceptable answer.

During the post-meeting review, Mark Collier drafted a Foundation supported recommendation that basically creates an additional core tier without changing the fundamental capabilities & designated code concepts.  This proposal has been reviewed by the DefCore committee (but not formally approved in a meeting).

The original DefCore proposed capabilities set becomes the “platform” level while capability subsets are called “components.”  We are considering two initial components, Compute & Object, and both are included in the platform (see illustration below).  The approach leaves the door open for new core component to exist both under and outside of the platform umbrella.

In the proposal, OpenStack vendors who meet either component or platform requirements can qualify for the “OpenStack Powered” logo; however, vendors using the only a component (instead of the full platform) will have more restrictive marks and limitations about how they can use the term OpenStack.

This approach addresses the “is Swift required?” question.  For platform, Swift capabilities will be required; however, vendors will be able to implement the Compute component without Swift and implement the Object component without Nova/Glance/Cinder.

It’s important to note that there is only one yard stick for components or the platform: the capabilities groups and designed code defined by the DefCore process.  From that perspective, OpenStack is one consistent thing.  This change allows vendors to choose sub-components if that serves their business objectives.

It’s up to the community to prove the platform value of all those sub-components working together.

OpenStack Goldilocks’ Syndrome: three questions to help us find our bearings

Goldilocks Atlas

Action: Please join Stefano. Allison, Sean and me in Paris on Monday, November 3rd, in the afternoon (schedule link)

If wishes were fishes, OpenStack’s rapid developer and user rise would include graceful process and commercial transitions too.  As a Foundation board member, it’s my responsibility to help ensure that we’re building a sustainable ecosystem for the project.  That’s a Goldilock’s challenge because adding either too much or too little controls and process will harm the project.

In discussions with the community, that challenge seems to breaks down into three key questions:

After last summit, a few of us started a dialog around Hidden Influencers that helps to frame these questions in an actionable way.  Now, it’s time for us to come together and talk in Paris in the hallways and specifically on Monday, November 3rd, in the afternoon (schedule link).   From there, we’ll figure out about next steps using these three questions as a baseline.

If you’ve got opinions about these questions, don’t wait for Paris!  I’d love to start the discussion here in the comments, on twitter (@zehicle), by phone, with email or via carrier pidgins.