Short lived VM (Mayflies) research yields surprising scheduling benefit

Last semester, Alex Hirschfeld (my son) did a simulation to explore the possible efficiency benefits of the Mayflies concept proposed by Josh McKenty and me.

Mayflies swarming from Wikipedia

In the initial phase of the research, he simulated a data center using load curves designed to oversubscribe the resources (he’s still interesting in actual load data).  This was sufficient to test the theory and find something surprising: mayflies can really improve scheduling.

Alex found an unexpected benefit comes when you force mayflies to have a controlled “die off.”  It allows your scheduler to be much smarter.

Let’s assume that you have a high mayfly ratio (70%), that means every day 10% of your resources would turn over.  If you coordinate the time window and feed that information into your scheduler, then it can make much better load distribution decisions.  Alex’s simulation showed that this approach basically eliminated hot spots and server over-crowding.

Here’s a snippet of his report explaining the effect in his own words:

On a system that is more consistent and does not have a massive virtual machine through put, Mayflies may not help with balancing the systems load, but with the social engineering aspect, it can increase the stability of the system.

Most of the time, the requests for new virtual machines on a cloud are immutable. They came in at a time and need to be fulfilled in the order of their request. Mayflies has the potential to change that. If a request is made, it has the potential to be added to a queue of mayflies that need to be reinitialized. This creates a queue of virtual machine requests that any load balancing algorithm can work with.

Mayflies can make load balancing a system easier. Knowing the exact size of the virtual machine that is going to be added and knowing when it will die makes load balancing for dynamic systems trivial.

Research showing that Short Lived Servers (“mayflies”) create efficiency at scale [DATA REQUESTED]

Last summer, Josh McKenty and I extended the puppies and cattle metaphor to limited life cattle we called “mayflies.” It was an attempt to help drive the cattle mindset (I think of it as social engineering, or maybe PsychOps) by forcing churn. I’ve come to think of it a step in between cattle and chaos monkeys (see Adrian Cockcroft).

While our thoughts were on mainly ops patterns, I’ve heard that there could be a real operational benefit from encouraging this behavior. The increased turn over in the environment improves scheduler optimization, planned load drains and coping with platform/environment migration.

Now we have a chance to quantify this benefit: a college student (disclosure: he’s my son) has created a data center emulation to see if Mayflies help with utilization. His model appears to work.

Now, he needs some real world data, here’s his request for assistance [note: he needs data by 1/20 to be included in this term]:

Hello!

I am Alexander Hirschfeld, a freshman at Rose-Hulman Institute of Technology. I am working on an independent study about Mayflies, a new idea in virtual machine management in cloud computing. Part of this management is load balancing and resource allocation for virtual machines across a collection of servers. The emulation that I am working on needs a realistic set of data to be the most accurate when modeling the results of using the methods outlined by the theory of mayflies.

Mayflies are an extension of the puppies verses cattle approach to machines, they are the extreme version of cattle as they have a known limited lifespan, such as 7 days. This requires the users of the cloud to build inherently more automated and fault-resistant applications. If you could send me a collection of the requests for new virtual machines(per standard unit of time and their requested specs/size), as well as an average lifetime for the virtual machines (or a graph or list of designated/estimated life times), and a basic summary of the collection of servers running the virtual machines(number, ram, cores), I would be better able to understand how Mayflies can affect a cloud.

Thanks,
Alexander Hirschfeld, twitter: @d-qoi

Needless to say, I’m really excited about the progress on demonstrating some the impact of this practice and am looking forward to posting about his results in the near future.

If you post in the comments, I will make sure you are connected to Alex.

OpenStack DefCore Enters Execution Phase. Help!

OpenStack DefCore Committee has established the principles and first artifacts required for vendors using the OpenStack trademark.  Over the next release cycle, we will be applying these to the Ice House and Juno releases.

Like a rockLearn more?  Hear about it LIVE!  Rob will be doing two sessions about DefCore next week (will be recorded):

  1. Tues Dec 16 at 9:45 am PST- OpenStack Podcast #14 with Jeff Dickey
  2. Thurs Dec 18 at 9:00 am PST – Online Meetup about DefCore with Rafael Knuth (optional RSVP)

At the December 2014 OpenStack Board meeting, we completed laying the foundations for the DefCore process that we started April 2013 in Portland. These are a set of principles explaining how OpenStack will select capabilities and code required for vendors using the name OpenStack. We also published the application of these governance principles for the Havana release.

  1. The OpenStack Board approved DefCore principles to explain
    the landscape of core including test driven capabilities and designated code (approved Nov 2013)
  2. the twelve criteria used to select capabilities (approved April 2014)
  3. the creation of component and framework layers for core (approved Oct 2014)
  4. the ten principles used to select designated sections (approved Dec 2014)

To test these principles, we’ve applied them to Havana and expressed the results in JSON format: Havana Capabilities and Havana Designated Sections. We’ve attempted to keep the process transparent and community focused by keeping these files as text and using the standard OpenStack review process.

DefCore’s work is not done and we need your help!  What’s next?

  1. Vote about bylaws changes to fully enable DefCore (change from projects defining core to capabilities)
  2. Work out going forward process for updating capabilities and sections for each release (once authorized by the bylaws, must be approved by Board and TC)
  3. Bring Havana work forward to Ice House and Juno.
  4. Help drive Refstack process to collect data from the field

OpenStack DefCore Process Flow: Community Feedback Cycles for Core [6 points + chart]

If you’ve been following my DefCore posts, then you already know that DefCore is an OpenStack Foundation Board managed process “that sets base requirements by defining 1) capabilities, 2) code and 3) must-pass tests for all OpenStack™ products. This definition uses community resources and involvement to drive interoperability by creating the minimum standards for products labeled OpenStack™.”

In this post, I’m going to be very specific about what we think “community resources and involvement” entails.

The draft process flow chart was provided to the Board at our OSCON meeting without additional review.  It below boils down to a few key points:

  1. We are using the documents in the Gerrit review process to ensure that we work within the community processes.
  2. Going forward, we want to rely on the technical leadership to create, cluster and describe capabilities.  DefCore bootstrapped this process for Havana.  Further, Capabilities are defined by tests in Tempest so test coverage gaps (like Keystone v2) translate into Core gaps.
  3. We are investing in data driven and community involved feedback (via Refstack) to engage the largest possible base for core decisions.
  4. There is a “safety valve” for vendors to deal with test scenarios that are difficult to recreate in the field.
  5. The Board is responsible for approving the final artifacts based on the recommendations.  By having a transparent process, community input is expected in advance of that approval.
  6. The process is time sensitive.  There’s a need for the Board to produce Core definition in a timely way after each release and then feed that into the next one.  Ideally, the definitions will be approved at the Board meeting immediately following the release.

DefCore Process Draft

Process shows how the key components: designated sections and capabilities start from the previous release’s version and the DefCore committee manages the update process.  Community input is a vital part of the cycle.  This is especially true for identifying actual use of the capabilities through the Refstack data collection site.

  • Blue is for Board activities
  • Yellow is or user/vendor community activities
  • Green is for technical community activities
  • White is for process artifacts

This process is very much in draft form and any input or discussion is welcome!  I expect DefCore to take up formal review of the process in October.

Patchwork Onion delivers stability & innovation: the graphics that explains how we determine OpenStack Core

This post was coauthored by the DefCore chairs, Rob Hirschfeld & Joshua McKenty.

The OpenStack board, through the DefCore committee, has been working to define “core” for commercial users using a combination of minimum required capabilities (APIs) and code (Designated Sections).  These minimums are decided on a per project basis so it can be difficult to visualize the impact on the overall effect on the Integrated Release.

Patchwork OnionWe’ve created the patchwork onion graphic to help illustrate how core relates to the integrated release.  While this graphic is pretty complex, it was important to find a visual way to show how different DefCore identifies distinct subsets of APIs and code from each project.  This graphic tries to show how that some projects have no core APIs and/or code.

For OpenStack to grow, we need to have BOTH stability and innovation.  We need to give clear guidance to the community what is stable foundation and what is exciting sandbox.  Without that guidance, OpenStack is perceived as risky and unstable by users and vendors. The purpose of defining “Core” is to be specific in addressing that need so we can move towards interoperability.

Interoperability enables an ecosystem with multiple commercial vendors which is one of the primary goals of the OpenStack Foundation.

Ecosystem OnionOriginally, we thought OpenStack would have “core” and “non-core” projects and we baked that expectation into the bylaws.  As we’ve progressed, it’s clear that we need a less binary definition.  Projects themselves have a maturity cycle (ecosystem -> incubated -> integrated) and within the project some APIs are robust and stable while others are innovative and fluctuating.

Encouraging this mix of stabilization and innovation has been an important factor in our discussions about DefCore.  Growing the user base requires encouraging stability and growing the developer base requires enabling innovation within the same projects.

The consequence is that we are required to clearly define subsets of capabilities (APIs) and implementation (code) that are required within each project.  Designating 100% of the API or code as Core stifles innovation because stability dictates limiting changes while designating 0% of the code (being API only) lessens the need to upstream.  Core reflects the stability and foundational nature of the code; unfortunately, many people incorrectly equate “being core” with the importance of the code, and politics ensues.

To combat the politics, DefCore has taken a transparent, principles-based approach to selecting core.   You can read about in Rob’s upcoming “Ugly Babies” post (check back on 8/14) .

Share the love & vote for OpenStack Paris Summit Sessions (closes Wed 8/6)

 

This is a friendly PSA that OpenStack Paris Summit session community voting ends on Wednesday 8/6.  There are HUNDREDS (I heard >1k) submissions so please set aside some time to review a handful.

Robot VoterMY PLEA TO YOU > There is a tendency for companies to “vote-up” sessions from their own employees.  I understand the need for the practice BUT encourage you to make time to review other sessions too.  Affiliation voting is fine, robot voting is not.

If you are interested topics that I discuss on this blog, here’s a list of sessions I’m involved in:

 

 

DefCore Advances at the Core > My take on the OSCON’14 OpenStack Board Meeting

Last week’s day-long Board Meeting (Jonathan’s summary) focused on three major topics: DefCore, Contribute Licenses (CLA/DCO) and the “Win the Enterprise” initiative. In some ways, these three topics are three views into OpenStack’s top issue: commercial vs. individual interests.

But first, let’s talk about DefCore!

DefCore took a major step with the passing of the advisory Havana Capabilities (the green items are required). That means that vendors in the community now have a Board approved minimum requirements.  These are not enforced for Havana so that the community has time to review and evaluate.

Designated Sections (1)For all that progress, we only have half of the Havana core definition complete. Designated Sections, the other component of Core, will be defined by the DefCore committee for Board approval in September. Originally, we expected the TC to own this part of the process; however, they felt it was related to commercial interested (not technical) and asked for the Board to manage it.

The coming meetings will resolve the “is Swift code required” question and that topic will require a dedicated post.  In many ways, this question has been the challenge for core definition from the start.  If you want to join the discussion, please subscribe to the DefCore list.

The majority of the board meeting was spent discussion other weighty topics that are work a brief review.

Contribution Licenses revolve around developer vs broader community challenge. This issue is surprisingly high stakes for many in the community. I see two primary issues

  1. Tension between corporate (CLA) vs. individual (DCO) control and approval
  2. Concern over barriers to contribution (sadly, there are many but this one is in the board’s controls)

Win the Enterprise was born from product management frustration and a fragmented user base. My read on this topic is that we’re pushing on the donkey. I’m hearing serious rumbling about OpenStack operability, upgrade and scale.  This group is doing a surprisingly good job of documenting these requirements so that we will have an official “we need this” statement. It’s not clear how we are going to turn that statement into either carrots or sticks for the donkey.

Overall, there was a very strong existential theme for OpenStack at this meeting: are we a companies collaborating or individuals contributing?  Clearly, OpenStack is both but the proportions remain unclear.

Answering this question is ultimately at the heart of all three primary topics. I expect DefCore will be on the front line of this discussion over the next few weeks (meeting 1, 2, and 3). Now is the time to get involved if you want to play along.

OpenStack DefCore Review [interview by Jason Baker]

I was interviewed about DefCore by Jason Baker of Red Hat as part of my participation in OSCON Open Cloud Day (speaking Monday 11:30am).  This is just one of fifteen in a series of speaker interviews covering everything from Docker to Girls in Tech.

This interview serves as a good review of DefCore so I’m reposting it here:

Without giving away too much, what are you discussing at OSCON? What drove the need for DefCore?

I’m going to walk through the impact of the OpenStack DefCore process in real terms for users and operators. I’ll talk about how the process works and how we hope it will make OpenStack users’ lives better. Our goal is to take steps towards interoperability between clouds.

DefCore grew out of a need to answer hard and high stakes questions around OpenStack. Questions like “is Swift required?” and “which parts of OpenStack do I have to ship?” have very serious implications for the OpenStack ecosystem.

It was impossible to reach consensus about these questions in regular board meetings so DefCore stepped back to base principles. We’ve been building up a process that helps us make decisions in a transparent way. That’s very important in an open source community because contributors and users want ground rules for engagement.

It seems like there has been a lot of discussion over the OpenStack listservs over what DefCore is and what it isn’t. What’s your definition?

First, DefCore applies only to commercial uses of the OpenStack name. There are different rules for the integrated code base and community activity. That’s the place of most confusion.

Basically, DefCore establishes the required minimum feature set for OpenStack products.

The longer version includes that it’s a board managed process that’s designed to be very transparent and objective. The long-term objective is to ensure that OpenStack clouds are interoperable in a measurable way and that we also encourage our vendor ecosystem to keep participating in upstream development and creation of tests.

A final important component of DefCore is that we are defending the OpenStack brand. While we want a vibrant ecosystem of vendors, we must first have a community that knows what OpenStack is and trusts that companies using our brand comply with a meaningful baseline.

Are there other open source projects out there using “designated sections” of code to define their product, or is this concept unique to OpenStack? What lessons do you think can be learned from other projects’ control (or lack thereof) of what must be included to retain the use of the project’s name?

I’m not aware of other projects using those exact words. We picked up ‘designated sections’ because the community felt that ‘plug-ins’ and ‘modules’ were too limited and generic. I think the term can be confusing, but it was the best we found.

If you consider designated sections to be plug-ins or modules, then there are other projects with similar concepts. Many successful open source projects (Eclipse, Linux, Samba) are functionally frameworks that have very robust extensibility. These projects encourage people to use their code base creatively and then give back some (not all) of their lessons learned in the form of code contributes. If the scope returning value to upstream is too broad then sharing back can become onerous and forking ensues.

All projects must work to find the right balance between collaborative areas (which have community overhead to join) and independent modules (which allow small teams to move quickly). From that perspective, I think the concept is very aligned with good engineering design principles.

The key goal is to help the technical and vendor communities know where it’s safe to offer alternatives and where they are expected to work in the upstream. In my opinion, designated sections foster innovation because they allow people to try new ideas and to target specialized use cases without having to fight about which parts get upstreamed.

What is it like to serve as a community elected OpenStack board member? Are there interests you hope to serve that are difference from the corporate board spots, or is that distinction even noticeable in practice?

It’s been like trying to row a dragon boat down class III rapids. There are a lot of people with oars in the water but we’re neither all rowing together nor able to fight the current. I do think the community members represent different interests than the sponsored seats but I also think the TC/board seats are different too. Each board member brings a distinct perspective based on their experience and interests. While those perspectives are shaped by their employment, I’m very happy to say that I do not see their corporate affiliation as a factor in their actions or decisions. I can think of specific cases where I’ve seen the opposite: board members have acted outside of their affiliation.

When you look back at how OpenStack has grown and developed over the past four years, what has been your biggest surprise?

Honestly, I’m surprised about how many wheels we’ve had to re-invent. I don’t know if it’s cultural or truly a need created by the size and scope of the project, but it seems like we’ve had to (re)create things that we could have leveraged.

What are you most excited about for the “K” release of OpenStack?

The addition of platform services like database as a Service, DNS as a Service, Firewall as a Service. I think these IaaS “adjacent” services are essential to completing the cloud infrastructure story.

Any final thoughts?

In DefCore, we’ve moved slowly and deliberately to ensure people have a chance to participate. We’ve also pushed some problems into the future so that we could resolve the central issues first. We need to community to speak up (either for or against) in order for us to accelerate: silence means we must pause for more input.

OpenStack DefCore Update & 7/16 Community Reviews

The OpenStack Board effort to define “what is core” for commercial use (aka DefCore).  I have blogged extensively about this topic and rely on you to review that material because this post focuses on updates from recent activity.

First, Please Join Our Community DefCore Reviews on 7/16!

We’re reviewing the current DefCore process & timeline then talking about the Advisory Havana Capabilities Matrix (decoder).

To support global access, there are TWO meetings (both will also be recorded):

  1. July 16, 8 am PDT / 1500 UTC
  2. July 16, 6 pm PDT / 0100 UTC July 17

Note: I’m presenting about DefCore at OSCON on 7/21 at 11:30!

We want community input!  The Board is going discuss and, hopefully, approve the matrix at our next meeting on 7/22.  After that, the Board will be focused on defining Designated Sections for Havana and Ice House (the TC is not owning that as previously expected).

The DefCore process is gaining momentum.  We’ve reached the point where there are tangible (yet still non-binding) results to review.  The Refstack efforts to collect community test results from running clouds is underway: the Core Matrix will be fed into Refstack to validate against the DefCore required capabilities.

Now is the time to make adjustments and corrections!  

In the next few months, we’re going to be locking in more and more of the process as we get ready to make it part of the OpenStack by-laws (see bottom of minutes).

If you cannot make these meetings, we still want to hear from you!  The most direct way to engage is via the DefCore mailing list but 1×1 email works too!  Your input is import to us!

Understanding OpenStack Designated Code Sections – Three critical questions

A collaboration with Michael Still (TC Member from Rackspace) & Joshua McKenty and Cross posted by Rackspace.

After nearly a year of discussion, the OpenStack board launched the DefCore process with 10 principles that set us on path towards a validated interoperability standard.   We created the concept of “designated sections” to address concerns that using API tests to determine core would undermine commercial and community investment in a working, shared upstream implementation.

Designated SectionsDesignated sections provides the “you must include this” part of the core definition.  Having common code as part of core is a central part of how DefCore is driving OpenStack operability.

So, why do we need this?

From our very formation, OpenStack has valued implementation over specification; consequently, there is a fairly strong community bias to ensure contributions are upstreamed. This bias is codified into the very structure of the GNU General Public License (GPL) but intentionally missing in the Apache Public License (APL v2) that OpenStack follows.  The choice of Apache2 was important for OpenStack to attract commercial interests, who often consider GPL a “poison pill” because of the upstream requirements.

Nothing in the Apache license requires consumers of the code to share their changes; however, the OpenStack foundation does have control of how the OpenStack™ brand is used.   Thus it’s possible for someone to fork and reuse OpenStack code without permission, but they cannot called it “OpenStack” code.  This restriction only has strength if the OpenStack brand has value (protecting that value is the primary duty of the Foundation).

This intersection between License and Brand is the essence of why the Board has created the DefCore process.

Ok, how are we going to pick the designated code?

Figuring out which code should be designated is highly project specific and ultimately subjective; however, it’s also important to the community that we have a consistent and predictable strategy.  While the work falls to the project technical leads (with ratification by the Technical Committee), the DefCore and Technical committees worked together to define a set of principles to guide the selection.

This Technical Committee resolution formally approves the general selection principles for “designated sections” of code, as part of the DefCore effort.  We’ve taken the liberty to create a graphical representation (above) that visualizes this table using white for designated and black for non-designated sections.  We’ve also included the DefCore principle of having an official “reference implementation.”

Here is the text from the resolution presented as a table:

Should be DESIGNATED: Should NOT be DESIGNATED:
  • code provides the project external REST API, or
  • code is shared and provides common functionality for all options, or
  • code implements logic that is critical for cross-platform operation
  • code interfaces to vendor-specific functions, or
  • project design explicitly intended this section to be replaceable, or
  • code extends the project external REST API in a new or different way, or
  • code is being deprecated

The resolution includes the expectation that “code that is not clearly designated is assumed to be designated unless determined otherwise. The default assumption will be to consider code designated.”

This definition is a starting point.  Our next step is to apply these rules to projects and make sure that they provide meaningful results.

Wow, isn’t that a lot of code?

Not really.  Its important to remember that designated sections alone do not define core: the must-pass tests are also a critical component.   Consequently, designated code in projects that do not have must-pass tests is not actually required for OpenStack licensed implementation.