OpenStack DefCore Review [interview by Jason Baker]

I was interviewed about DefCore by Jason Baker of Red Hat as part of my participation in OSCON Open Cloud Day (speaking Monday 11:30am).  This is just one of fifteen in a series of speaker interviews covering everything from Docker to Girls in Tech.

This interview serves as a good review of DefCore so I’m reposting it here:

Without giving away too much, what are you discussing at OSCON? What drove the need for DefCore?

I’m going to walk through the impact of the OpenStack DefCore process in real terms for users and operators. I’ll talk about how the process works and how we hope it will make OpenStack users’ lives better. Our goal is to take steps towards interoperability between clouds.

DefCore grew out of a need to answer hard and high stakes questions around OpenStack. Questions like “is Swift required?” and “which parts of OpenStack do I have to ship?” have very serious implications for the OpenStack ecosystem.

It was impossible to reach consensus about these questions in regular board meetings so DefCore stepped back to base principles. We’ve been building up a process that helps us make decisions in a transparent way. That’s very important in an open source community because contributors and users want ground rules for engagement.

It seems like there has been a lot of discussion over the OpenStack listservs over what DefCore is and what it isn’t. What’s your definition?

First, DefCore applies only to commercial uses of the OpenStack name. There are different rules for the integrated code base and community activity. That’s the place of most confusion.

Basically, DefCore establishes the required minimum feature set for OpenStack products.

The longer version includes that it’s a board managed process that’s designed to be very transparent and objective. The long-term objective is to ensure that OpenStack clouds are interoperable in a measurable way and that we also encourage our vendor ecosystem to keep participating in upstream development and creation of tests.

A final important component of DefCore is that we are defending the OpenStack brand. While we want a vibrant ecosystem of vendors, we must first have a community that knows what OpenStack is and trusts that companies using our brand comply with a meaningful baseline.

Are there other open source projects out there using “designated sections” of code to define their product, or is this concept unique to OpenStack? What lessons do you think can be learned from other projects’ control (or lack thereof) of what must be included to retain the use of the project’s name?

I’m not aware of other projects using those exact words. We picked up ‘designated sections’ because the community felt that ‘plug-ins’ and ‘modules’ were too limited and generic. I think the term can be confusing, but it was the best we found.

If you consider designated sections to be plug-ins or modules, then there are other projects with similar concepts. Many successful open source projects (Eclipse, Linux, Samba) are functionally frameworks that have very robust extensibility. These projects encourage people to use their code base creatively and then give back some (not all) of their lessons learned in the form of code contributes. If the scope returning value to upstream is too broad then sharing back can become onerous and forking ensues.

All projects must work to find the right balance between collaborative areas (which have community overhead to join) and independent modules (which allow small teams to move quickly). From that perspective, I think the concept is very aligned with good engineering design principles.

The key goal is to help the technical and vendor communities know where it’s safe to offer alternatives and where they are expected to work in the upstream. In my opinion, designated sections foster innovation because they allow people to try new ideas and to target specialized use cases without having to fight about which parts get upstreamed.

What is it like to serve as a community elected OpenStack board member? Are there interests you hope to serve that are difference from the corporate board spots, or is that distinction even noticeable in practice?

It’s been like trying to row a dragon boat down class III rapids. There are a lot of people with oars in the water but we’re neither all rowing together nor able to fight the current. I do think the community members represent different interests than the sponsored seats but I also think the TC/board seats are different too. Each board member brings a distinct perspective based on their experience and interests. While those perspectives are shaped by their employment, I’m very happy to say that I do not see their corporate affiliation as a factor in their actions or decisions. I can think of specific cases where I’ve seen the opposite: board members have acted outside of their affiliation.

When you look back at how OpenStack has grown and developed over the past four years, what has been your biggest surprise?

Honestly, I’m surprised about how many wheels we’ve had to re-invent. I don’t know if it’s cultural or truly a need created by the size and scope of the project, but it seems like we’ve had to (re)create things that we could have leveraged.

What are you most excited about for the “K” release of OpenStack?

The addition of platform services like database as a Service, DNS as a Service, Firewall as a Service. I think these IaaS “adjacent” services are essential to completing the cloud infrastructure story.

Any final thoughts?

In DefCore, we’ve moved slowly and deliberately to ensure people have a chance to participate. We’ve also pushed some problems into the future so that we could resolve the central issues first. We need to community to speak up (either for or against) in order for us to accelerate: silence means we must pause for more input.

OpenStack DefCore Update & 7/16 Community Reviews

The OpenStack Board effort to define “what is core” for commercial use (aka DefCore).  I have blogged extensively about this topic and rely on you to review that material because this post focuses on updates from recent activity.

First, Please Join Our Community DefCore Reviews on 7/16!

We’re reviewing the current DefCore process & timeline then talking about the Advisory Havana Capabilities Matrix (decoder).

To support global access, there are TWO meetings (both will also be recorded):

  1. July 16, 8 am PDT / 1500 UTC
  2. July 16, 6 pm PDT / 0100 UTC July 17

Note: I’m presenting about DefCore at OSCON on 7/21 at 11:30!

We want community input!  The Board is going discuss and, hopefully, approve the matrix at our next meeting on 7/22.  After that, the Board will be focused on defining Designated Sections for Havana and Ice House (the TC is not owning that as previously expected).

The DefCore process is gaining momentum.  We’ve reached the point where there are tangible (yet still non-binding) results to review.  The Refstack efforts to collect community test results from running clouds is underway: the Core Matrix will be fed into Refstack to validate against the DefCore required capabilities.

Now is the time to make adjustments and corrections!  

In the next few months, we’re going to be locking in more and more of the process as we get ready to make it part of the OpenStack by-laws (see bottom of minutes).

If you cannot make these meetings, we still want to hear from you!  The most direct way to engage is via the DefCore mailing list but 1×1 email works too!  Your input is import to us!

Understanding OpenStack Designated Code Sections – Three critical questions

A collaboration with Michael Still (TC Member from Rackspace) & Joshua McKenty and Cross posted by Rackspace.

After nearly a year of discussion, the OpenStack board launched the DefCore process with 10 principles that set us on path towards a validated interoperability standard.   We created the concept of “designated sections” to address concerns that using API tests to determine core would undermine commercial and community investment in a working, shared upstream implementation.

Designated SectionsDesignated sections provides the “you must include this” part of the core definition.  Having common code as part of core is a central part of how DefCore is driving OpenStack operability.

So, why do we need this?

From our very formation, OpenStack has valued implementation over specification; consequently, there is a fairly strong community bias to ensure contributions are upstreamed. This bias is codified into the very structure of the GNU General Public License (GPL) but intentionally missing in the Apache Public License (APL v2) that OpenStack follows.  The choice of Apache2 was important for OpenStack to attract commercial interests, who often consider GPL a “poison pill” because of the upstream requirements.

Nothing in the Apache license requires consumers of the code to share their changes; however, the OpenStack foundation does have control of how the OpenStack™ brand is used.   Thus it’s possible for someone to fork and reuse OpenStack code without permission, but they cannot called it “OpenStack” code.  This restriction only has strength if the OpenStack brand has value (protecting that value is the primary duty of the Foundation).

This intersection between License and Brand is the essence of why the Board has created the DefCore process.

Ok, how are we going to pick the designated code?

Figuring out which code should be designated is highly project specific and ultimately subjective; however, it’s also important to the community that we have a consistent and predictable strategy.  While the work falls to the project technical leads (with ratification by the Technical Committee), the DefCore and Technical committees worked together to define a set of principles to guide the selection.

This Technical Committee resolution formally approves the general selection principles for “designated sections” of code, as part of the DefCore effort.  We’ve taken the liberty to create a graphical representation (above) that visualizes this table using white for designated and black for non-designated sections.  We’ve also included the DefCore principle of having an official “reference implementation.”

Here is the text from the resolution presented as a table:

Should be DESIGNATED: Should NOT be DESIGNATED:
  • code provides the project external REST API, or
  • code is shared and provides common functionality for all options, or
  • code implements logic that is critical for cross-platform operation
  • code interfaces to vendor-specific functions, or
  • project design explicitly intended this section to be replaceable, or
  • code extends the project external REST API in a new or different way, or
  • code is being deprecated

The resolution includes the expectation that “code that is not clearly designated is assumed to be designated unless determined otherwise. The default assumption will be to consider code designated.”

This definition is a starting point.  Our next step is to apply these rules to projects and make sure that they provide meaningful results.

Wow, isn’t that a lot of code?

Not really.  Its important to remember that designated sections alone do not define core: the must-pass tests are also a critical component.   Consequently, designated code in projects that do not have must-pass tests is not actually required for OpenStack licensed implementation.

How DefCore is going to change your world: three advisory cases

The first release of the DefCore Core Capabilities Matrix (DCCM) was revealed at the Atlanta summit.  At the Summit, Joshua and I had a session which examined what this means for the various members of the OpenStack community.   This rather lengthy post reviews the same advisory material.

DefCore sets base requirements by defining 1) capabilities, 2) code and 3) must-pass tests for all OpenStack products. This definition uses community resources and involvement to drive interoperability by creating the minimum standards for products labeled “OpenStack.”

As a refresher, there are three uses of the OpenStack mark:

  • Community: The non-commercial use of the word OpenStack by the OpenStack community to describe themselves and their activities. (like community tweets, meetups and blog posts)
  • Code: The non-commercial use of the word OpenStack to refer to components of the OpenStack framework integrated release (as in OpenStack Compute Project Nova)
  • Commerce: The commercial use of the word OpenStack to refer to products and services as governed by the OpenStack trademark policy. This is where DefCore is focused.

In the DefCore/Commerce use, properly licensed vendors have three basic obligations to meet:is_it_openstack_graphic

  1. Pass the required Refstack tests for the capabilities matrix in the version of OpenStack that they use. Vendors are expected (not required) to share their results.
  2. Run and include the “designated sections” of code for the OpenStack components that you include.
  3. Other basic obligations in their license agreement like being a currently paid up corporate sponsor or foundation member, etc.

If they meet these conditions, vendors can use the OpenStack mark in their product names and descriptions.

Enough preamble!  Let’s see the three Advisory Cases

MANDATORY DISCLAIMER: These conditions apply to fictional public, private and client use cases.  Any resemblence to actual companies is a function of the need to describe real use-cases.  These cases are advisory for illustration use only and are not to be considered definitive guidenance because DefCore is still evolving.

Public Cloud: Service Provider “BananaCloud”

A popular public cloud operator, BananaCloud has been offering OpenStack-based IaaS since the Diablo release. However, they don’t use the Keystone component. Since they also offer traditional colocation and managed services, they have an existing identity management system that they use. They made a similar choice for Horizon in favor of their own cloud portal.

banana

  1. They use Nova a custom scheduler and pass all the Nova tests. This is the simplest case since they use code and pass the tests.
  2. In the Havana DCCM, the Keystone capabilities are a must pass test; however, there are no designated sections of code for Keystone. So BananaCloud must implement a Keystone-compatible API on their IaaS environment (an effort they had underway already) that will pass Refstack, and they’re good to go.
  3. There are no must pass tests for Horizon so they have no requirements to include those features or code. They can still be OpenStack without Horizon.
  4. There are no must pass tests for Trove so they have no brand requirements to include those features or code so it’s not a brand issue; however, by using Trove and promoting its use, they increase the likelihood of its capabilities becoming must pass features.

BananaCloud also offers some advanced OpenStack capabilities, including Marconi and Trove. Since there are no must pass capabilities from these components in the Havana DCCM, it has no impact on their offering additional services. DefCore defines the minimum requirements and encourages vendors to share their full test results of additional capabilities because that is how OpenStack identifies new must pass candidates.

Note: The DefCore DCCM is advisory for the Havana release, so if BananaCloud is late getting their Keystone-compatibility work done there won’t be any commercial impact. But it will be a binding part of the trademark license agreement by the Juno release, which is only 6 months away.

Private Cloud: SpRocket Small-Business OpenStack Software

SpRocket is a new OpenStack software vendor, specializing in selling a Windows-powered version of OpenStack with tight integration to Sharepoint and AzurePack. In their feature set, they only need part of Nova and provide an alternative object storage to Swift that implements a version of the Swift API. They do use Heat as part of their implementation to set up applications back ended by Sharepoint and AzurePack.

  1. sprocketFor Nova, they already use the code and have already implemented all required capabilities except for the key-store. To comply with the DefCore requirement, they must enable the key-store capability.
  2. While their implementation of Swift passes the tests, We are still working to resolve the final disposition of Swift so there are several possible outcomes:
    1. If Swift is 0% designated then they are OK (that’s illustrated here)
    2. If Swift is 100% designated then they cannot claim to be OpenStack.
    3. If Swift is partially designated then they have to adapt their deploy to include the required code.
  3. Their use of Heat is encouraged since it is an integrated project; however, there are no required capabilities and does not influence their ability to use the mark.
  4. They use the trunk version of Windows HyperV drivers which are not designated and have no specific tests.

Ecosystem Client: “Mist” OpenStack-consuming Client Library

Mist is a client library for load+kt programmers working on applications using the OpenStack APIs. While it’s an open source project, there are many commercial applications that use the library for their applications. Unlike a “pure” OpenStack program, it also supports other Cloud APIs.

Since the Mist library does not ship or implement the OpenStack code base, the DefCore process does not apply to their effort; however, there are several important intersections with Mist and OpenStack and Core.

  • First, it is very important for the DefCore process that Mist map their use of the OpenStack APIs to the capabilities matrix. They are asked to help with this process because they are the best group to answer the “works with clients” criteria.
  • Second, if there are APIs used by Mist that are not currently tested then the OpenStack community should work with the Mist community to close those test gaps.
  • Third, if Mist relies on an API that is not must-pass they are encouraged to help identify those capabilities as core candidates in the community.

DefCore Capabilities Scorecard & Core Identification Matrix [REVIEW TIME!]

Attribution Note: This post was collaboratively edited by members of the DefCore committee and cross posted with DefCore co-chair Joshua McKenty of Piston Cloud.

DefCore sets base requirements by defining 1) capabilities, 2) code and 3) must-pass tests for all OpenStack products. This definition uses community resources and involvement to drive interoperability by creating minimum standards minimum standards for products labeled “OpenStack.”

The OpenStack Core definition process (aka DefCore) is moving steadily along and we’re looking for feedback from community as we move into the next phase.  Until now, we’ve been mostly working out principles, criteria and processes that we will use to answer “what is core” in OpenStack.  Now we are applying those processes and actually picking which capabilities will be used to identify Core.

TL;DR! We are now RUNNING WITH SCISSORS because we’ve reached the point there you can review early thoughts about what’s going to be considered Core (and what’s not).  We now have a tangible draft list for community review.

capabilities_selectionWhile you will want to jump directly to the review draft matrix (red means needs input), it is important to understand how we got here because that’s how DefCore will resolve the inevitable conflicts.  The very nature of defining core means that we have to say “not in” to a lot of capabilities.  Since community consensus seems to favor a “small core” in principle, that means many capabilities that people consider important are not included.

The Core Capabilities Matrix attempts to find the right balance between quantitative detail and too much information.  Each row represents an “OpenStack Capability” that is reflected by one or more individual tests.  We scored each capability equally on a 100 point scale using 12 different criteria.  These criteria were selected to respect different viewpoints and needs of the community ranging from popularity, technical longevity and quality of documentation.

While we’ve made the process more analytical, there’s still room for judgement.  Eventually, we expect to weight some criteria more heavily than others.  We will also be adjusting the score cut-off.  Our goal is not to create a perfect evaluation tool – it should inform the board and facilitate discussion.  In practice, we’ve found this approach to bring needed objectivity to the selection process.

So, where does this take us?  The first matrix is, by design, old news.  We focused on getting a score for Havana to give us a stable and known quantity; however, much of that effort will translate forward.  Using Havana as the base, we are hoping to score Ice House ninety days after the Juno summit and score Juno at K Summit in Paris.

These are ambitious goals and there are challenges ahead of us.  Since every journey starts with small steps, we’ve put ourselves on feet the path while keeping our eyes on the horizon.

Specifically, we know there are gaps in OpenStack test coverage.  Important capabilities do not have tests and will not be included.  Further, starting with a small core means that OpenStack will be enforcing an interoperability target that is relatively permissive and minimal.  Universally, the community has expressed that including short-term or incomplete items is undesirable.  It’s vital to remember that we are looking for evolutionary progress that accelerates our developer, user, operator and ecosystem communities.

How can you get involved?  We are looking for community feedback on the DefCore list on this 1st pass – we do not think we have the scores 100% right.  Of course, we’re happy to hear from you however you want to engage: in intentionally named the committed “defcore” to make it easier to cross-reference and search.

We will eventually use Refstack to collect voting/feedback on capabilities directly from OpenStack community members.

DefCore Core Capabilities Selection Criteria SIMPLIFIED -> how we are picking Core

I’ve posted about the early DefCore core capabilities selection process before and we’ve put them into application and discussed them with the community.  The feedback was simple: tl;dr.  You’ve got the right direction but make it simpler!

So we pulled the 12 criteria into four primary categories:

  1. Usage: the capability is widely used (Refstack will collect data)
  2. Direction: the capability advances OpenStack technically
  3. Community: the capability builds the OpenStack community experience
  4. System: the capability integrates with other parts of OpenStack

These categories summarize critical values that we want in OpenStack and so make sense to be the primary factors used when we select core capabilities.  While we strive to make the DefCore process objective and quantitive, we must recognize that these choices drive community behavior.

With this perspective, let’s review the selection criteria.  To make it easier to cross reference, we’ve given each criteria a shortened name:

Shows Proven Usage

  • Widely Deployed” Candidates are widely deployed capabilities.  We favor capabilities that are supported by multiple public cloud providers and private cloud products.
  • “Used by Tools” Candidates are widely used capabilities:Should be included if supported by common tools (RightScale, Scalr, CloudForms, …)
  • Used by Clients” Candidates are widely used capabilities: Should be included if part of common libraries (Fog, Apache jclouds, etc)
Aligns with Technical Direction
  • Future Direction” Should reflect future technical direction (from the project technical teams and the TC) and help manage deprecated capabilities.
  • “Stable” Test is required stable for >2 releases because we don’t want core capabilities that do not have dependable APIs.
  • “Complete” Where the code being tested has a designated area of alternate implementation (extension framework) as per the Core Principles, there should be parity in capability tested across extension implementations.  This also implies that the capability test is not configuration specific or locked to non-open technology.

Plays Well with Others

  • “Discoverable” Capability being tested is Service Discoverable (can be found in Keystone and via service introspection)
  • “Doc’d” Should be well documented, particularly the expected behavior.  This can be a very subjective measure and we expect to refine this definition over time.
  • “Core in Last Release”  A test that is a must-pass test should stay a must-pass test.  This make makes core capabilities sticky release per release.  Leaving Core is disruptive to the ecosystem

Takes a System View

  • Foundation” Test capabilities that are required by other must-pass tests and/or depended on by many other capabilities
  • “Atomic” Capabilities is unique and cannot be build out of other must-pass capabilities
  • “Proximity” (sometimes called a Test Cluster) selects for Capabilities that are related to Core Capabilities.  This helps ensure that related capabilities are managed together.

Note: The 13th “non-admin” criteria has been removed because Admin APIs cannot be used for interoperability and cannot be considered Core.

Mayflies and Dinosaurs (extending Puppies and Cattle)

Dont Be FragileJosh McKenty and I were discussing the common misconception of the “Puppies and Cattle” analogy. His position is not anti-puppy! He believes puppies are sometimes unavoidable and should be isolated into portable containers (VMs) so they can be shuffled around seamlessly. His more provocative point is that we want our underlying infrastructure to be cattle so it remains highly elastic and flexible. More cattle means a more resilient system. To me, this is a fundamental CloudOps design objective.

We realized that the perfect cloud infrastructure would structurally discourage the creation of puppies.

Imagine a cloud in which servers were automatically decommissioned after a week of use. In a sort of anti-SLA, any VM running for more than 168 hours would be (gracefully) terminated. This would force a constant churn of resources within the infrastructure that enables true cattle-like management. This cloud would be able to very gracefully rebalance load and handle disruptive management operations because the workloads are designed for the churn.

We called these servers mayflies due to their limited life span.

While this approach requires a high degree of automation, the most successful cloud operators I have met are effectively building workloads with this requirement. If we require application workloads to be elastic and fault-resilient then we have a much higher degree of flexibility with the underlying infrastructure. I’ve seen this in practice with several OpenStack clouds: operators with helped applications deploy using automation were able to decommission “old” clouds much more gracefully. They effectively turned their entire cloud into a cow. Sadly, the ones without that investment puppified™ the ops infrastructure and created a much more brittle environment.

The opposite of a mayfly is the dinosaur: a server that is so brittle and locked that the slightest disturbance wipes out everything it touches.

Dinosaurs are puppies grown into a T-Rex with rows of massive razor sharp teeth and tiny manicured hands. These are systems that are so unique and historical that there’s no way to recreate them if there’s a failure. The original maintainers exit happy hour was celebrated by people who were laid-off two CEOs ago. The impact of dinosaurs goes beyond their operational risk; they are typically impossible to extend or maintain and, consequently, ossify other server around them. This type of server drains elasticity from your ops team.

Puppies do not grow up to become dogs, they become dinosaurs.

It’s a classic lean adage to do hard things more frequently. Perhaps it’s time to start creating mayflies in your ops infrastructure.