Anyone else find picking OpenStack summit sessions overwhelming?!

Choices!The scope and diversity of sessions for the upcoming OpenStack conference in Atlanta are simply overwhelming.   As a board member, that’s a positive sign of our success as a community; however, it’s also a challenge as we attempt to pick topics.  That’s why we turn to you, the OpenStack community, to help sift and select the content.

Even if you are not attending, we need your help in selection!  Content from the summits is archived and has a much larger outside of the two conference days.  You’re voice matters for the community.

While it’s a simple matter to ask you to vote for my DefCore presentation and some excellent ones from my peers at Dell, I’d also like share some of my thoughts about general trends I saw illustrated by the offerings:

  • Swift has a strong following as a solution outside of other products
  • Ceph seems to emerging as a critical component with Cinder
  • Neutron has breath but not depth in practice
  • HA and Upgrades remain challenges
  • We are starting to see specializations emerge (like NFV)
  • OpenStack case studies!  There are many – some of uncertain utility as references
  • Some community members and companies are super prolific in submitting sessions.  Perhaps these sessions are all great but on first pass it seems out of balance.
  • Vendor pitch or conference session?  You often get both in the same session.  We’re still not certain how to balance this.

The number and diversity of sessions is staggering – we need your help on voting.

We also need you to be part of the dialog about the conference and summits to make sure they are meeting the community needs.  My review of the sessions indicates that we are trying to serve many different audiences in a very limited time window.  I’m interested in hearing yours!  Review some sessions and let me know.

Spinning up OpenStack “DefCore” Committee by spotting elephants

ElephantsThis week, Joshua McKenty, me and a handful of interested individuals (board member Eileen Evans included) met to start organizing the DefCore* Committee.  This standing committee was established by an OpenStack Foundation resolution just before the Hong Kong Summit.  Joshua and I were nominated as co-chairs (and about half the board volunteered to be members).    This action was an immediate result of the unanimous passage of the 10 Principles that I was driving in the DefCore “Spider” cycle.

We heard overwhelmingly at the Hong Kong summit that defining core should be a major focus for the Board.

The good news is that we’re doing exactly that in CoreDef.  Our challenge is to go quickly but not get ahead of community consensus.  So far, that means eating the proverbial elephant in small bites and intentionally deferring topics where we cannot find consensus.

This meeting was primarily about Joshua and I figuring out how to drive DefCore quickly (go fast!) without exceeding the communities ability to review and discuss (build consensus!).  While we had future-post-worthy conceptual discussions, we had a substantial agenda of get-it-done in front of us too.

Here’s a summary of key outcomes from the meeting:

1)      We’ve established a tentative schedule for our first two meetings (12/3 and 12/17).

  1. We’ve started building agendas for these two meetings.
  2. We’ve also established rules for governance that include members to do homework!

2)      We’ve agreed it’s important to present a bylaws change to the committee for consideration by the board.

  1. This change is to address confusion around how core is defined and possibly move towards the bylaws defining a core process not a list of core projects.
  2. This is on an accelerated track because we’d like to include it with the Community Board Member elections.

3)      We’ve broken DefCore into clear “cycles” so we can be clearer about concrete objectives and what items are out of scope for a cycle.  We’re using names to designate cycles for clarity.

  1. The first cycle, “Spider,” was about finding the connections between core issues and defining a process to resolve the tension in those connections.
  2. This cycle, “Elephant,” is about breaking the Core definition into
  3. The next cycle(s) will be named when we get there.  For now, they are all “Future”
  4. We agreed there is a lot of benefit from being clear to community about items that we “kick down the road” for future cycles.  And, yes, we will proactively cut off discussion of these items out of respect for time.

4)      We reviewed the timeline proposed at the end of Spider and added it to the agenda.

  1. The timeline assumes a staged introduction starting with Havana and accelerating for each release.
  2. We are working the timeline backwards to ensure time for Board, TC and community input.

5)      We agreed that consensus is going to be a focus for keeping things moving

  1. This will likely drive to a smaller core definition
  2. We will actively defer issues that cannot reach consensus in the Elephant cycle.

6)      We identified some concepts that may help guide the process in this cycle

  1. We likely need to create categories beyond “core” to help bucket tests
  2. Committee discussion is needed but debate will be time limited

7)      We identified the need to start on test criteria immediately

  1. Board member John Zannos (in absentia) offered to help lead this effort
  2. In defining test criteria, we are likely to have lively discussions about “OpenStack’s values”

8)      We identified some out of scope topics that are important but too big to solve.

  1. We are calling these “elephants” (or the elephant in the room).
  2. The list of elephants needs to be agreed by DefCore and clearly communicated
  3. We expect that the Elephant cycle will make discussing these topics more fruitful

9)      We talked about RefStack code features

  1. Allowing users to upload/post test results from their clouds to enable white box test reporting
  2. Allowing users who have uploaded results to +/- vote on tests they think are important
  3. We established a requirement that posting results requires an OpenStack ID
  4. We established a requirement that only a single Corporate designate (provided by the Foundation) can make a result official for their company.
  5. Collecting opt-in data with test results using tags for things like alternate implementations use, host operating system(s), deployment method, size of cloud, and hypervisor.
  6. We discussed (but did not resolve) that it could be possible to have people run RefStack against public cloud end points and post their results
  7. We agreed that RefStack needs to be able to run locally or as a hosted site.

10)   We identified a lot of missing communication channels

  1. We created a DefCore wiki page to be a home for information.
  2. Joshua and I (and others?) will work with the Foundation staff to create “what is core” video to help the community understand the Principles and objectives for the Elephant cycle.
  3. We are in the process of setting up mail lists, IRC, blog tags, etc.

Yikes!  That’s a lot of progress priming the pump for our first DefCore meeting!

* We picked “DefCore” for the core definition committee name.  One overriding reason for the name is that it has very clean search results.  Since the word “core” is so widely used, we wanted to make sure that commentary on this topic is easy to track against the noisy term core.  We also liked 1) the reference to DefCon and 2) that the Urban Dictionary defines it as going deaf from standing too close to the speakers.

5 differences between Cloud ops and Bare Metal ops

OpenStack SummitCloud APIs are about abstracting operations to simplify deployment.  We want users of our cloud infrastructure to operate with blissful unawareness of the underlying networking topology, storage configuration and physical infrastructure.  For their perspective, the cloud is perfectly elastic, totally configurable and wonderfully consistent. Cloud Admins on the other hand need visibility and controls that expose the complexity while keeping it rational. These are profoundly different concerns.

Maintaining the illusion of clean and simple Cloud ops infrastructure is very valuable; however, it’s just an illusion.  The black metal box behind those APIs is complex, messy, unpredictable and dynamic.

1. Metal Ops has to deal with network topology and details like if an operating system enumerates the NICs correctly, bonding the correct NIC pair and which 10g network to use for the storage traffic. In networking, the topology determines how much traffic you can subscribe to a link and how to provide resliency. Networking does not exist in isolation: you must consider the boundary firewalls and routers to either block or allow traffic because without connectivity the cloud is useless. Details like the access and registration in DNS, NTP and DHCP provide foundations our stable operations. These details are (and should be) hidden from the cloud user.

2. Metal Ops has to deal with firmware issues at every level.  It matters to the server if it boots into BIOS or UEFI mode.  We have to manage the fact that RAID partitions need to be optimized based on the workload and type of drive.  We have to consider if there are specialized drivers and caches to manage and security features (like Intel TXT) to activate.  These details are (and should be) hidden from the cloud user.

3. Metal Ops have to consider the security of their infrastructure.  We have to manage where the admin control network crosses security domains.  It matters which layer 2 networks have access to which parts of the infrastructure.  Separation of responsibility for network vs. storage vs. compute is a reality that it not going away. These details are (and should be) hidden from the cloud user.

4. Metal Ops have to manage operating system compatibility.  I know personally that vendors test and certify their operating systems on an enormous matrix of silicon.  I also have learned that the matrix of possible combinations is far larger and fundamentally impossible at the edges.  There’s a reason that operators seek homogeneous environments and LTS releases. These details are (and should be) hidden from the cloud user.

5. Metal Ops have to deal with hardware failures. By simple statistics, the larger the system the more things will break and metal ops have to cope with this reality. We have to expose failure zones and boundaries to make intelligent responses (like moving data from a failed drive to a non-adjacent one) that require intimate knowledge of system topography that are intentionally hidden in cloud ops. Further, we have to have monitoring and management tooling that knows how to identify which NIC in a bond failed or flash the lights on the failed drive of an array. These details are (and should be) hidden from the cloud user.

Cloud’s power is being able to abstract away this complexity.  Dealing with it gracefully behind the scenes requires transparency and details that make Metal Ops job fundamentally different.

While both can be highly automated and pass my “Cloud is Infrastructure with an API” test, their objectives are different.

Looking to Leverage OpenStack Havana? Crowbar delivers 3xL!

openstack_havanaThe Crowbar community has a tradition of “day zero ops” community support for the latest OpenStack release at the summit using our pull-from-source capability.  This release we’ve really gone the extra mile by doing it one THREE Linux distros (Ubuntu, RHEL & SLES) in parallel with a significant number of projects and new capabilities included.

I’m especially excited about Crowbar implementation of Havana Docker support which required advanced configuration with Nova and Glance.  The community also added Heat and Celiometer in the last release cycle plus High Availability (“Titanium”) deployment work is in active development.  Did I mention that Crowbar is rocking OpenStack deployments?  No, because it’s redundant to mention that.  We’ll upload ISOs of this work for easy access later in the week.

While my team at Dell remains a significant contributor to this work, I’m proud to point out to SUSE Cloud leadership and contributions also (including the new Ceph barclamp & integration).  Crowbar has become a true multi-party framework!

 

Want to learn more?  If you’re in Hong Kong, we are hosting a Crowbar Developer Community Meetup on Monday, November 4, 2013, 9:00 AM to 12:00 PM (HKT) in the SkyCity Marriott SkyZone Meeting Room.  Dell, dotCloud/Docker, SUSE and others will lead a lively technical session to review and discuss the latest updates, advantages and future plans for the Crowbar Operations Platform. You can expect to see some live code demos, and participate in a review of the results of a recent Crowbar 2 hackathon.  Confirm your seat here – space is limited!  (I expect that we’ll also stream this event using Google Hangout, watch Twitter #Crowbar for the feed)

My team at Dell has a significant presence at the OpenStack Summit in Hong Kong (details about activities including sponsored parties).  Be sure to seek out my fellow OpenStack Board Member Joseph George, Dell OpenStack Product Manager Kamesh Pemmaraju and Enstratius/Dell Multi-Cloud Manager Founder George Reese.

Note: The work referenced in this post is about Crowbar v1.  We’ve also reached critical milestones with Crowbar v2 and will begin implementing Havana on that platform shortly.

Crowbar 2 Status Update > I can feel the rumble of the engines

two

Crowbar Two

While I’ve been more muted on our Crowbar 2 progress since our pivot back to CB1 for Grizzly, it has been going strong and steady.  We took advantage of the extra time to do some real analysis about late-binding, simulated annealing, emergent services and functional operations that are directly reflected in Crowbar’s operational model (yes, I’m working on posted explaining each concept).

We’re planning Crowbar 2 hack-a-thon in Hong Kong before the OpenStack Ice House Summit (11/1-3).  We don’t expect a big crowd on site, but the results will be fun to watch remote and it should be possible to play along (watch the crowbar list for details).

In the mean time, I wanted to pass along this comprehensive status update by Crowbar’s leading committer, Victor Lowther:

It has been a little over a month since my last status report on
Crowbar 2.0, so now that we have hit the next major milestone
(installing the OS on a node and being able to manage it afterwards),
it is time for another status report.

Major changes since the initial status report:

* The Crowbar framework understands node aliveness and availability.
* The Network barclamp is operational, and can manage IPv4 and IPv6 in
  the same network.
* delayed_jobs + a stupidly thin queuing layer handle all our
  long-running tasks.
* We have migrated to postgresql 9.3 for all our database needs.
* DHCP and DNS now utilize the on_node_* role hooks to manage their
  databases.
* We support a 2 layer deployment tree -- system on top, everything
  else in the second layer.
* The provisioner can install Ubuntu 12.04 on other nodes.
* The crowbar framework can manage other nodes that are not in
  Sledgehammer.
* We have a shiny installation wizard now.

In more detail:

Aliveness and availability:

Nodes in the Crowbar framework have two related flags that control
whether the annealer can operate on them.

Aliveness is under the control of the Crowbar framework and
encapsulates the framework's idea of whether any given node is
manageable or not.  If a node is pingable and can be SSH'ed into as
root without a password using the credentials of the root user on
the admin node, then the node is alive, otherwise it is dead.
Aliveness is tested everytime a jig tries to do something on a node
-- if a node cannot be pinged and SSH'ed into from at least one of
its addresses on the admin network, it will be marked as
dead.  When a node is marked as dead, all of the noderoles on that
node will be set to either blocked or todo (depending on the state of
their parent noderoles), and those changes will ripple down the
noderole dependency graph to any child noderoles.

Nodes will also mark themselves as alive and dead in the course of
their startup and shutdown routines.

Availability is under the control of the Crowbar cluster
administrators, and should be used by them to tell Crowbar that it
should stop managing noderoles on the node.  When a node is not
available, the annealer will not try to perform any jig runs on a
node, but it will leave the state of the noderoles alone.

A node must be both alive and available for the annealer to perform
operations on it.

The Network Barclamp:

The network barclamp is operational, with the following list of
features:

* Everything mentioned in Architecture for the Network Barclamp in
  Crowbar 2.0
* IPv6 support.  You can create ranges and routers for IPv6 addresses
  as well as IPv4 addresses, and you can tell a network that it should
  automatically assign IPv6 addresses to every node on that network by
  setting the v6prefix setting for that network to either:
  * a /64 network prefix, or
  * "auto", which will create a globally unique RFC4193 IPv6 network
    prefix from a randomly-chosen 40 bit number (unique per cluster
    installation) followed by a subnet ID based on the ID of the
    Crowbar network.
  Either way, nodes in a Crowbar network that has a v6prefix will get
  an interface ID that maps back to their FQDN via the last 64 bits of
  the md5sum of that FQDN. For now, the admin network will
  automatically create an RFC4193 IPv6 network if it is not passed a
  v6prefix so that we can easily test all the core Crowbar components
  with IPv6 as well as IPv4.  The DNS barclamp has been updated to
  create the appropriate AAAA records for any IPv6 addresses in the
  admin network.

Delayed Jobs and Queuing:

The Crowbar framework runs all jig actions in the background using
delayed_jobs + a thin queuing layer that ensures that only one task is
running on a node at any given time.  For now, we limit ourselves to
having up to 10 tasks running in the background at any given time,
which should be enough for the immediate future until we come up with
proper tuning guidelines or auto-tuning code for significantly larger
clusters.

Postgresql 9.3:

Migrating to delayed_jobs for all our background processing made it
immediatly obvious that sqlite is not at all suited to handling real
concurrency once we started doing multiple jig runs on different nodes
at a time. Postgresql is more than capable of handling our forseeable
concurrency and HA use cases, and gives us lots of scope for future
optimizations and scalability.

DHCP and DNS:

The roles for DHCP and DNS have been refactored to have seperate
database roles, which are resposible for keeping their respective
server roles up to date.  Theys use the on_node_* roles mentioned in
"Roles, nodes, noderoles, lifeycles, and events, oh my!" along with a
new on_node_change event hook create and destroy DNS and DHCP database
entries, and (in the case of DHCP) to control what enviroment a node
will PXE/UEFI boot into.  This gives us back the abiliy to boot into
something besides Sledgehammer.

Deployment tree:

Until now, the only deployment that Crowbar 2.0 knew about was the
system deployment.  The system deployment, however, cannot be placed
into proposed and therefore cannot be used for anything other than
initial bootstrap and discovery.  To do anything besides
bootstrap the admin node and discover other nodes, we need to create
another deployment to host the additional noderoles needed to allow
other workloads to exist on the cluster.  Right now, you can only
create deployments as shildren of the system deployment, limiting the
deployment tree to being 2 layers deep.

Provisioner Installing Ubuntu 12.04:

Now, we get to the first of tqo big things that were added in the last
week -- the provisioner being able to install Ubuntu 12.04 and bring
the resulting node under management by the rest of the CB 2.0
framework.  This bulds on top of the deployment tree and DHCP/DNS
database role work.  To install Ubuntu 12.04 on a node from the web UI:

1: Create a new deployment, and add the provisioner-os-install role to
that deployment.  In the future you will be able to edit the
deployment role information to change what the default OS for a
deployment should be.
2: Drag one of the non-admin nodes onto the provisioner-os-install
role.  This will create a proposed noderole binding the
provisioner-os-install role to that node, and in the future you would
be able to change what OS would be installed on that node by editing
that noderole before committing the deployment.
3: Commit the snapshot.  This will cause several things to happen:
  * The freshly-bound noderoles will transition to TODO, which will
    trigger an annealer pass on the noderoles.
  * The annealer will grab all the provisioner-os-install roles that
    are in TODO, set them in TRANSITION, and hand them off to
    delayed_jobs via the queuing system.
  * The delayed_jobs handlers will use the script jig to schedule a
    reboot of the nodes for 60 seconds in the future and then return,
    which will transition the noderole to ACTIVE.
  * In the crowbar framework, the provisioner-os-install role has an
    on_active hook which will change the boot environment of the node
    passed to it via the noderole to the appropriate os install state
    for the OS we want to install, and mark the node as not alive so
    that the annealer will ignore the node while it is being
    installed.
  * The provisioner-dhcp-database role has an on_node_change handler
    that watches for changes in the boot environment of a node.  It
    will see the bootenv change, update the provisioner-dhcp-database
    noderoles with the new bootenv for the node, and then enqueue a
    run of all of the provisioner-dhcp-database roles.
  * delayed_jobs will see the enqueued runs, and run them in the order
    they were submitted.  All the runs sholuld happen before the 60
    seconds has elapsed.
  * When the nodes finally reboot, the DHCP databases should have been
    updated and the nodes will boot into the Uubntu OS installer,
    install, and then set their bootenv to local, which will tell the
    provisioner (via the provisioner-dhcp-database on_node_change
    hook) to not PXE boot the node anymore.
  * When the nodes reboot off their freshly-installed hard drive, they
    will mark themselves as alive, and the annealer will rerun all of
    the usual discovery roles.
The semi-astute observer will have noticed some obvious bugs and race
conditions in the above sequence of steps.  These have been left in
place in the interest of expediency and as learning oppourtunities for
others who need to get familiar with the Crowbar codebase.

Installation Wizard:

We have a shiny installation that you can use to finish bootstrapping
your admin node.  To use it, pass the --wizard flag after your FQDN to
/opt/dell/bin/install-crowbar when setting up the admin node, and the
install script will not automatically create an admin network or an
entry for the admin node, and logging into the web UI will let you
customize things before creating the initial admin node entry and
committing the system deployment.  

Once we get closer to releasing CB 2.0, --wizard will become the default.

OpenStack Core Online Forum, Oct 16 13:30 UTC / Oct 22 0100 UTC

Go Online!OpenStack Community, you are invited on an online discussion about OpenStack Core on October 16th at UTC 13:30 (8:30 am US Central) and October 22nd at UTC 0100 (8:00 pm US Central)

At the next OpenStack Foundation Board meeting, we will be setting a timeline for implementing an OpenStack Core Definition process that promotes a clear and implementation driven metric for deciding which projects should be considered “required.”  This is your chance to review and influence the process!

We’ll review the OpenStack Core Definition process (20 minutes) and then open up the channel for discussion using the IRC (#openstack-meeting) & Google Hangout on Air (link posted in IRC).

The forum will be coordinated through the IRC channel for links and questions.

Can’t make it?  The session was recorded > here!

Thinking about how to Implement OpenStack Core Definition

THIS POST IS #10 IN A SERIES ABOUT “WHAT IS CORE.”

Tied UpWe’ve had a number of community discussions (OSCON, SFO & SA-TX) around the process for OpenStack Core definition.  These have been animated and engaged discussions (video from SA-TX): my notes for them are below.

While the current thinking of a testing-based definition of Core adds pressure on expanding our test suite, it seems to pass the community’s fairness checks.

Overall, the discussions lead me to believe that we’re on the right track because the discussions jump from process to impacts.  It’s not too late!  We’re continuing to get community feedback.  So what’s next?

First…. Get involved: Upcoming Community Core Discussions

These discussions are expected to have online access via Google Hangout.  Watch Twitter when the event starts for a link.

Want to to discuss this in your meetup? Reach out to me or someone on the Board and we’ll be happy to find a way to connect with your local community!

What’s Next?  Implementation!

So far, the Core discussion has been about defining the process that we’ll use to determine what is core.  Assuming we move forward, the next step is to implement that process by selecting which tests are “must pass.”  That means we have to both figure out how to pick the tests and do the actual work of picking them.  I suspect we’ll also find testing gaps that will have developers scrambling in Ice House.

Here’s the possible (aggressive) timeline for implementation:

  • November: Approval of approach & timeline at next Board Meeting
  • January: Publish Timeline for Roll out (ideally, have usable definition for Havana)
  • March: Identify Havana must pass Tests (process to be determined)
  • April: Integration w/ OpenStack Foundation infrastructure

Obviously, there are a lot of details to work out!  I expect that we’ll have an interim process to select must-pass tests before we can have a full community driven methodology.

Notes from Previous Discussions (earlier notes):

  • There is still confusion around the idea that OpenStack Core requires using some of the project code.  This requirement helps ensure that people claiming to be OpenStack core have a reason to contribute, not just replicate the APIs.
  • It’s easy to overlook that we’re trying to define a process for defining core, not core itself.  We have spent a lot of time testing how individual projects may be effected based on possible outcomes.  In the end, we’ll need actual data.
  • There are some clear anti-goals in the process that we are not ready to discuss but will clearly going to become issues quickly.  They are:
    • Using the OpenStack name for projects that pass the API tests but don’t implement any OpenStack code.  (e.g.: an OpenStack Compatible mark)
    • Having speciality testing sets for flavors of OpenStack that are different than core.  (e.g.: OpenStack for Hosters, OpenStack Private Cloud, etc)
  • We need to be prepared that the list of “must pass” tests identifies a smaller core than is currently defined.  It’s possible that some projects will no longer be “core”
  • The idea that we’re going to use real data to recommend tests as must-pass is positive; however, the time it takes to collect the data may be frustrating.
  • People love to lobby for their favorite projects.  Gaps in testing may create problems.
  • We are about to put a lot of pressure on the testing efforts and that will require more investment and leadership from the Foundation.
  • Some people are not comfortable with self-reporting test compliance.   Overall, market pressure was considered enough to punish cheaters.
  • There is a perceived risk of confusion as we migrate between versions.  OpenStack Core for Havana seems to specific but there is concern that vendors may pass in one release and then skip re-certification.  Once again, market pressure seems to be an adequate answer.
  • It’s not clear if a project with only 1 must-pass test is a core project.  Likely, it would be considered core.  Ultimately, people seem to expect that the tests will define core instead of the project boundary.

What do you think?  I’d like to hear your opinions on this!

Taking OpenStack Core discussions to community

core flowTHIS POST IS #9 IN A SERIES ABOUT “WHAT IS CORE.”

We’ve been building up to a broad discussion about the OpenStack Core and I’d like to invite everyone in the OpenStack community to participate (review latest).

Alan Clark (Board Chairman) officially kicked off this open discussion with his post on the OpenStack blog last week.  And we’re trying to have face-to-face events for dialog like the Core meetup tonight in San Francisco.  Look for more to come!

Of course, this will also be a topic at the summit (Alan and I submitted two sessions about this).  The Board needs to move this forward in the November meeting, so NOW is the time to review and give us input.

refined: 10 OpenStack Core Positions

core flowTHIS POST IS #8 IN A SERIES ABOUT “WHAT IS CORE.”

Last week, I posted a streamlined visual of the core discussion that distilled the 12 positions into 10.  Here are reordered and cleaned up matching positions.  This should make it much easier to understand the context.

Note 11/3: The Core Definition is now maintained on the OpenStack Wiki.  This list may not reflect the latest changes.
  1. Implementations that are Core can use OpenStack trademark (OpenStack™)

    1. This is the legal definition of “core” and the  why it matters to  the community.

    2. We want to make sure that the OpenStack™ mark means something.

    3. The OpenStack™ mark is not the same as the OpenStack brand; however, the Board uses it’s control of the mark as a proxy to help manage the brand.

  2. Core is a subset of the whole project

    1. The OpenStack project is supposed to be a broad and diverse community with new projects entering incubation and new implementations being constantly added.  This innovation is vital to OpenStack but separate from the definition of Core.

    2. There may be other marks that are managed separately by the foundation, and available for the platform ecosystem as per the Board’s discretion

    3. “OpenStack API Compatible ” mark not part of this discussion and should be not be assumed.

  3. Core definition can be applied equally to all usage models

    1. There should not be multiple definitions of OpenStack depending on the operator (public, private, community, etc)

    2. While expected that each deployment is identical, the differences must be quantifiable

  4. Claiming OpenStack requiring use of designated upstream code

    1. Implementation’s claiming the OpenStack™ mark must use the OpenStack upstream code (or be using code submitted to upstream)

    2. You are not OpenStack, if you pass all the tests but do not use the API framework

    3. This prevents people from using the API without joining the community

    4. This also surfaces bit-rot in alternate implementations to the larger community

    5. This behavior improves interoperability because there is more shared code between implementation

  5. Projects must have an open reference implementation

    1. OpenStack will require an open source reference base plug-in implementation for projects (if not part of OpenStack, license model for reference plug-in must be compatible).

    2. Definition of a plug-in: alternate backend implementations with a common API framework that uses common _code_ to implement the API

    3. This expects that projects (where technically feasible) are expected to implement a plug-in or extension architecture.

    4. This is already in place for several projects and addresses around ecosystem support, enabling innovation

    5. Reference plug-ins are, by definition, the complete capability set.  It is not acceptable to have “core” features that are not functional in the reference plug-in

    6. This will enable alternate implementations to offer innovative or differentiated features without forcing changes to the reference plug-in implementation

    7. This will enable the reference to expand without forcing other  alternate implementations to match all features and recertify

  6. Vendors may substitute alternate implementations

    1. If a vendor plug-in passes all relevant tests then it can be considered a full substitute for the reference plug-in

    2. If a vendor plug-in does NOT pass all relevant test then the vendor is required to include the open source reference in the implementation.

    3. Alternate implementations may pass any tests that make sense

    4. Alternate implementations should add tests to validate new functionality.

    5. They must have all the must-pass tests (see #10) to claim the OpenStack mark.

  7. OpenStack Implementations are verified by open community tests

    1. Vendor OpenStack implementations must achieve 100% of must-have coverage?

    2. Implemented tests can be flagged as may-have requires list  [Joshua McKenty]

    3. Certifiers will be required to disclose their testing gaps.

    4. This will put a lot of pressure on the Tempest project

    5. Maintenance of the testing suite to become a core Foundation responsibility.  This may require additional resources

    6. Implementations and products are allowed to have variation based on publication of compatibility

    7. Consumers must have a way to determine how the system is different from reference (posted, discovered, etc)

    8. Testing must respond in an appropriate way on BOTH pass and fail (the wrong return rejects the entire suite)

  8. Tests can be remotely or self-administered

    1. Plug-in certification is driven by Tempest self-certification model

    2. Self-certifiers are required to publish their results

    3. Self-certified are required to publish enough information that a 3rd party could build the reference implementation to pass the tests.

    4. Self-certified must include the operating systems that have been certified

    5. It is preferred for self-certified implementation to reference an OpenStack reference architecture “flavor” instead of defining their own reference.  (a way to publish and agree on flavors is needed)

    6. The Foundation needs to define a mechanism of dispute resolution. (A trust but verify model)

    7. As an ecosystem partner, you have a need to make a “works against OpenStack” statement that is supportable

    8. API consumer can claim working against the OpenStack API if it works against any implementation passing all the “must have” tests(YES)

    9. API consumers can state they are working against the OpenStack API with some “may have” items as requirements

    10. API consumers are expected to write tests that validate their required behaviors (submitted as “may have” tests)

  9. A subset of tests are chosen by the Foundation as “must-pass”

    1. An OpenStack body will recommend which tests are elevated from may-have to must-have

    2. The selection of “must-pass” tests should be based on quantifiable information when possible.

    3. Must-pass tests should be selected from the existing body of “may-pass” tests.  This encourages people to write tests for cases they want supported.

    4. We will have a process by which tests are elevated from may to must lists

    5. Potentially: the User Committee will nominate tests that elevated to the board

  10. OpenStack Core means passing all “must-pass” tests

    1. The OpenStack board owns the responsibility to define ‘core’ – to approve ‘musts’

    2. We are NOT defining which items are on the list in this effort, just making the position that it is how we will define core

    3. May-have tests include items in the integrated release, but which are not core.

    4. Must haves – must comply with the Core criteria defined from the IncUp committee results

    5. Projects in Incubation or pre-Incubation are not to be included in the ‘may’ list

Visualizing the OpenStack Core discussion points

THIS POST IS #6 IN A SERIES ABOUT “WHAT IS CORE.”

As we take the OpenStack Core discussion to a larger audience, I was asked to create the summary version the discussion points.  We needed a quick visual way to understand how these consensus statements interconnect and help provide context.  To address this need, I based it on a refined 10 core positions to create the following OpenStack Core flowchart.

core flow

The flow diagram below is grouped into three main areas: core definition (green), technical requirements (blue), and testing impacts (orange).

  1. Core Definition (green) walks through the fundamental scope and premise of the “what is core” discussion.  We are looking for the essential OpenStack: the parts that everyone needs and nothing more.  While OpenStack can be something much larger, core lives at the heart of the use-case venn diagram.  It’s the magical ice cream flavor that everyone loves like Triple Unicorn Rainbow Crunch.
  2. Technical Requirements (blue) covers some of the most contentious parts of the dialog.  This section states the expectation that OpenStack™ implementations must use parts the OpenStack code (you can’t just provide a compatible API).  It goes further to expect that we will maintain an open reference implementation and also identify places where parts of the code can be substituted with alternate implementations.  Examples of alternate implementations are plug-ins, API extensions, different hypervisors, and alternate libraries.
  3. Testing Impacts (orange) reviews some of the important new thinking around Core.  These points focus on the use of OpenStack community tests (e.g.: Tempest) to validate the total code base.  We expect users to be able to self-administer these tests or rely on an external validation.  Either way, we do not expect all tests to pass for all configurations; instead, the Foundation will identify a subset of the tests as required or must-pass.  The current thinking is that these must-pass tests will become the effective definition of OpenStack™ Core.

I hope this helps connect the dots on the core discussions so far.

I’d like to clean-up the positions to match the flow chart and cross reference.  Stay tuned!  This flowchart is a work in process – updates and suggestions are welcome!

READ POST IS #7: WHERE IS THIS GOING?