Short-lived VM (Mayflies) research yields surprising scheduling benefit

Last semester, Alex Hirschfeld (my son) did a simulation to explore the possible efficiency benefits of the Mayflies concept proposed by Josh McKenty and me.


In the initial phase of the research, he simulated a data center using load curves designed to oversubscribe the resources (he’s still interested in actual load data).  This was sufficient to test the theory and find something surprising: mayflies can really improve scheduling.

Alex found that an unexpected benefit comes when you force mayflies into a controlled “die off”: it allows your scheduler to be much smarter.

Let’s assume that you have a high mayfly ratio (70%): assuming week-long mayfly lifespans, that means every day roughly 10% of your resources would turn over.  If you coordinate the time window and feed that information into your scheduler, then it can make much better load distribution decisions.  Alex’s simulation showed that this approach essentially eliminated hot spots and server over-crowding.
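To make the scheduling win concrete, here’s a minimal Python sketch of the idea (my own illustration, not Alex’s simulation code; the Host and schedule structures and the week-long lifespan are assumptions): because each mayfly’s death date is known at placement time, the scheduler can reclaim capacity deterministically and best-fit pack new requests instead of reacting to hot spots after they form.

```python
import heapq
from dataclasses import dataclass, field

@dataclass
class Host:
    capacity: int
    used: int = 0
    # min-heap of (death_day, size) for every mayfly placed here
    expiries: list = field(default_factory=list)

    def reclaim(self, today: int) -> None:
        # Die-off times are known in advance, so freeing capacity is
        # deterministic rather than reactive.
        while self.expiries and self.expiries[0][0] <= today:
            _, size = heapq.heappop(self.expiries)
            self.used -= size

    def place(self, size: int, death_day: int) -> None:
        # Caller (schedule) has already verified the request fits.
        self.used += size
        heapq.heappush(self.expiries, (death_day, size))

def schedule(hosts: list, size: int, today: int, lifespan: int = 7):
    """Best-fit placement that exploits known die-off windows."""
    for h in hosts:
        h.reclaim(today)
    # Best fit: the host with the least remaining headroom that still
    # fits the request, which packs load tightly and avoids hot spots.
    fits = [h for h in hosts if h.capacity - h.used >= size]
    if not fits:
        return None  # defer the request to the next die-off window
    best = min(fits, key=lambda h: h.capacity - h.used)
    best.place(size, today + lifespan)
    return best

# Example: ten 100-unit hosts absorbing a month of mayfly requests.
hosts = [Host(capacity=100) for _ in range(10)]
for day in range(30):
    for size in (8, 16, 4):
        schedule(hosts, size, today=day)
```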

Here’s a snippet of his report explaining the effect in his own words:

On a system that is more consistent and does not have a massive virtual machine throughput, Mayflies may not help with balancing the system’s load, but with the social engineering aspect, it can increase the stability of the system.

Most of the time, the requests for new virtual machines on a cloud are immutable. They come in at a given time and need to be fulfilled in the order of their request. Mayflies has the potential to change that. If a request is made, it has the potential to be added to a queue of mayflies that need to be reinitialized. This creates a queue of virtual machine requests that any load balancing algorithm can work with.

Mayflies can make load balancing a system easier. Knowing the exact size of the virtual machine that is going to be added and knowing when it will die makes load balancing for dynamic systems trivial.

OpenStack DefCore Community Review – TWO Sessions April 21 (agenda)

During the DefCore process, we’ve had regular community checkpoints to review and discuss the latest materials from the committee.  With the latest work on the official process and a flurry of Guidelines, we’ve got a lot of concrete material to show.

To accommodate global participants, we’ll have TWO sessions (and record both):

  1. April 21 8 am Central (1 pm UTC) https://join.me/874-029-687 
  2. April 21 8 pm Central (9 am Hong Kong) https://join.me/903-179-768 

Consult the call etherpad for call-in details and other material.

Planned Agenda:

  • Background on DefCore (very short) – 10 minutes
    • short description
    • why a Board process and where the community fits in
  • Interop AND Trademark – why it’s both – 5 minutes
  • Vendors AND Community – balancing the needs – 5 minutes
  • Mechanics
    • testing & capabilities – 5 minutes
    • self-testing & certification – 5 minutes
    • platform & components & trademark – 5 minutes
  • Quick overview of the Process (to help w/ reviewers) – 15 minutes
  • How to get involved (Gerrit) – 5 minutes

Jazz vs. Symphony: Why micromanaging digital work FAILS.

Third in an 8-post series, Brad Szollose and Rob Hirschfeld invite you to share in our discussion about failures, fights and frightening transformations going on around us as digital work changes workplace deliverables, planning and culture.

Now that we’ve introduced music as a functional analogy for a stable 21st century leadership model and defined digital work, we’re ready to expose how work actually gets done in the information age.

First, has work really changed?  Yes.  Traditionally, there was a distinct difference between organized production and service-based/creative work such as advertising, accounting or medicine, where you solve a problem by looking for clues and coming up with creative solutions.


Digital work, on the other hand, and more importantly digital workers, live in a strange limbo: they do creative work while relying on business structures and management models that were developed during the industrial age.

In today’s multi-generational workforce, what appears to be a generational divide has transformed into a non-age-specific cultural rift. As Brad and Rob compared notes, we came to believe that what is really happening is a learned difference in the approach to work and work culture.

There is a learned difference in the approach to work and work culture that’s more obvious in, but not limited to, digital natives.

In most companies, the executives are traditionalists (Baby Boomers or hand-selected by Boomers).  While previous generations were trained to follow hierarchy, the new culture values performance, flexibility and teamwork, with a less top-down, control-oriented outlook.

It’s as if a symphonic conductor who is used to picking the chair order and directing the tempo were handing out sheet music to a Jazz ensemble.  So how is the traditional manager going to deliver a stellar performance when his performers are Jazz trained?

In a traditional concert orchestra, each musician has to go to college, train hard, earn a shot to get into the orchestra and, over time, work very hard to earn the First Chair position (think earning the corner office).  Once in that position, they stay there until death or retirement.  Anyone who deviates is fired. Improv is only allowed during certain songs, by a select few.  It’s the workplace equivalent of climbing the corporate ladder.

Most digital workers think they belong to a Jazz ensemble.  

It’s a mistake to believe less organized means less skilled.  Workers in the Jazz model are also talented and trained professionals.  If you look at the careers of Thelonious Monk, Duke Ellington and Dizzy Gillespie, they all had formal training, and many started as children.  The same is true for digital workers: many started building job skills as children and then honed their teamwork playing video games.

But can a loosely organized group consistently deliver results? Yes. In fact, they deliver better results!

When a Jazz Improv group plays, they have a rough composition to start with. Each member is given time for a solo.  To the uninitiated there appear to be no leaders in this milieu of talent, but the leader is there.  They just refuse to control the performance; instead, they trust that each member will bring their A Game and perform at 100% of their capacity.

In business, this is scary. Don’t we need someone to check each person’s work? People are just messing around right? I mean, is this actual work? Who is in charge?

In business environments that operate more like Jazz, studies have shown a 32% increase in productivity over traditional command-and-control environments driven by hierarchy.

Age, experience and position are NOT the criteria for the Digital Worker. Output is.  And output is different for each product. Management’s role in this model is to get out of the way and let the musicians create. Instead of conforming to a single style and method, the people producing in the model each bring something unique and also experience a high degree of ownership.

This is a powerful type of workplace diversity: by allowing different ways of problem solving to co-exist, we also make the workplace more inclusive and collaborative.

Sound too good to be true?  In our next post we’ll discuss trust as the critical ingredient for Jazz performance.  (Teaser)

How Nebula shows why the OpenStack community (and OSS in general) should care about profits

In dog-piling onto the news about early OpenStack provider Nebula’s demise, I’ll use their news to reinforce my belief that open communities need to ensure that funds are flowing to leaders and contributors.  This is one of the reasons that I invest time in DefCore: having a clear base product definition helps create healthy vendors.

Open source software is not simply free, VC-funded projects.  The industry needs a stable, transparent supply chain of software to build on.

I am not asserting anything about Nebula’s business or model.  To me, it is a single downward data point in my ongoing review of vendor health in the OpenStack community.  OpenStack features and functions will not matter if vendors in the community cannot afford to sustain them.

For enterprises to rely on open source software supply chains, they need to make sure they are paying into the community.  The simplest route to “paying in” is through vendors.

Are VMs becoming El Caminos? Containers & Metal provide new choices for DevOps

I released this post, “VMs are Dead,” two weeks ago on DevOps.com.  My point there is that Ops Automation (aka DevOps) is FINALLY growing beyond Cloud APIs and VMs.  This creates a much richer ecosystem of deployment targets instead of having to shoehorn every workload into the same platform.

In 2010, it looked as if virtualization had won. We expected all servers to virtualize workloads, and the primary question was which cloud infrastructure manager would dominate. Now in 2015, the picture is not as clear. I’m seeing a trend that threatens the “virtualize all things” battle cry.

Really, it’s two intersecting trends: metal is getting cheaper and easier while container orchestration is advancing at rocket speed. If metal can truck around the heavy, stable workloads while containers zip around like sports cars, that leaves VMs as a strange hybrid in the middle.

What’s the middle? It’s the El Camino, that notorious discontinued half car, half pick-up truck.

The explosion of interest in containerized workloads (I know, they’ve been around for a long time, but Docker made them sexy somehow) has been creating a secondary wave of container orchestration. Five years ago, I called that Platform as a Service (PaaS), but this new generation looks more like a CI/CD pipeline plus DevOps platform than our original PaaS concepts. These emerging pipelines obfuscate the operational environment differently than virtualized infrastructure (let’s call it IaaS). The platforms do not care about servers or application tiers; their semantics are about connecting services together. It’s a different deployment paradigm, more about SOA than resource reservation, as the sketch below illustrates.
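To make the contrast concrete, here’s a hedged sketch of the two shapes of deployment request (hypothetical structures of my own invention, not any real platform’s API):

```python
# IaaS-style request: the unit of scheduling is a server reservation.
# The caller reasons about machine sizes and placement.
iaas_request = {
    "flavor": "m1.large",         # CPU/RAM reservation
    "image": "ubuntu-14.04",
    "availability_zone": "az-1",  # placement is the caller's problem
}

# Pipeline/PaaS-style request: the unit is a service and its wiring.
# Servers never appear in the declaration at all.
service_request = {
    "service": "checkout",
    "artifact": "registry.example.com/checkout:1.4",
    "connects_to": ["cart", "payments"],  # semantics are about connections
    "scale": {"min": 2, "max": 10},       # the platform decides where it runs
}
```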

On the other side, we’ve been working hard to make physical ops more automated using the same DevOps tool chains. To complicate matters, the physics of silicon means that we’ve gone from scale-up to scale-out. Modern applications are so massive that they are going to exceed any single system, so economics drives us to lots and lots of small, inexpensive servers. If you factor in the operational complexity and cost of hypervisors/clouds, a small dedicated physical server is a cost-effective substitute for a comparable virtual machine.

I’ll repeat that: a small dedicated server is a cost-effective substitute for a comparable virtual machine.
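As a purely illustrative back-of-envelope check (every number below is a hypothetical placeholder, not a measurement), the claim comes down to the hypervisor and cloud-stack overhead you fold into each virtual core:

```python
def cost_per_usable_core(hw_cost: float, ops_cost: float,
                         overhead_fraction: float) -> float:
    """Monthly cost per core actually available to workloads.

    overhead_fraction models capacity consumed by the hypervisor and
    cloud control plane (zero for bare metal).
    """
    usable = 1.0 - overhead_fraction
    return (hw_cost + ops_cost) / usable

# Hypothetical inputs -- substitute your own environment's numbers.
metal = cost_per_usable_core(hw_cost=10.0, ops_cost=4.0, overhead_fraction=0.0)
virt = cost_per_usable_core(hw_cost=10.0, ops_cost=6.0, overhead_fraction=0.15)
print(f"metal: ${metal:.2f}/core/month  virtual: ${virt:.2f}/core/month")
```

With these placeholder inputs, the dedicated server comes out ahead; the point is that the comparison hinges on the overhead terms, which each operator has to measure for themselves.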

I am not speaking against virtualized servers or clouds. They have a critical role in data center operations; however, I hear from operators who are rethinking the idea that all servers will be virtualized and moving towards a more heterogeneous view of their data center: one where they have a fleet of trucks, sports cars and El Caminos.

Of course, I’d be disingenuous if I neglected to point out that trucks are used to transport cars too. At some point, everything is metal.

Want more metal friendly reading?  See Packet CEO Zac Smith’s thinking on this topic.

My OpenStack Super User Interview [cross-post]

My interview for the OpenStack Super User site originally appeared there on 3/23 under the title “OpenStack at 10: different code, same collaboration?”

With over 15 years of cloud experience, Rob Hirschfeld also goes way back with OpenStack. His involvement dates to before it was officially founded and he was also one of the initial Board Members. In addition to his role as Individual Director, Hirschfeld currently chairs the DefCore committee. He’ll be speaking about DefCore at the upcoming Vancouver Summit with Alan Clark, Egle Sigler and Sean Roberts.

He talks to Superuser about the importance of patches, priorities for 2015 and why you should care about OpenStack vendors making money.

Superuser: You’ve been with the project since before it started. Where do you hope it will be in five years?

In five years, I expect that nearly every line of code will have been replaced. The thing that will endure is the community governance and interaction models that we’re working out to ensure commercial collaboration.

[3/24 Added Clarification Note: I find myself humbled watching traditionally open-unfriendly corporations using OpenStack to learn how to become open source collaborators.  Our governance choices will have long-lasting ramifications in the industry.]

What is something that a lot of people don’t know about OpenStack?

It was essentially a “rewrite fork” of Eucalyptus, created because they would not accept patches.  That’s a cautionary tale about why accepting patches is essential, and one that should not get lost from the history books.

Any thoughts on your first steps to the priorities you laid out in your candidacy profile?

I’ve already started to get DefCore into an execution phase with additional Board and Foundation leadership joining the effort.  We’ve set a very active schedule of meetings with two sub-committees running in parallel… It’s going to be a busy spring.

You say that the company you founded, RackN, is not creating an OpenStack product. How are you connected to the community?

RackN supports OpenCrowbar, which provides physical, ready-state infrastructure for scale platforms like OpenStack. We are very engaged in the community from below by helping make other distributions, vendors and operators successful.

What are the next steps to creating the “commercially successful ecosystem” you mentioned in your candidacy profile? What are the biggest obstacles to this?

We have to make stability and scale a critical feature. This will mean slowing features and new projects; however, I hear a lot of frustration that OpenStack is not focused on delivering a solid base.

Without a base, the vendors cannot build profitable products.  Without profits, they cannot keep funding the project. This may be radical for an open project, but I think everyone needs to care more if vendors are making money.

What are some more persistent myths about the cloud?

That the word cloud really means anything.  Everyone has their own definition.  Mine is “infrastructure with an API” but I’d happily tell you it’s also about process and ops.

Who are your real-life heroes?

FIRST (For Inspiration and Recognition of Science and Technology) founders Dean Kamen and Woodie Flowers. They executed a real vision about how to train for both competition and collaboration in the next generation of engineers.  Their efforts in building the next generation of leaders really impact how we should approach open source collaboration. That’s real innovation.

What do you hope to get out of the next summit?

First, I want to see vendors passing DefCore requirements.  After that, I’d like to see the operators get more equal treatment and I’m hoping to spend more time working with them so they can create places to share knowledge.

What’s your favorite/most important OpenStack debate?

There are two.  First, I think the API vs. implementation debate is a critical growth curve for OpenStack.  We need to mature past being so implementation-driven so we can have stand-alone APIs.

Second, I think the “benevolent dictator” discussion is useful. Since we are never going to have one, we need a real discussion about how to define and defend project wide priorities in a meaningful way.  Resolving both items is essential to our long-term viability.

OpenStack DefCore Process Draft Posted for Review [major milestone]

OpenStack DefCore Committee is looking for community feedback about the proposed DefCore Process.

March has been a month of OpenStack DefCore milestones.  At the March Board meeting, we approved the first official DefCore Guideline (called DefCore 2015.03), and we are poised to commit the first DefCore Process draft.

Once this initial commit is approved by the DefCore Committee (expected at the DefCore Scale.8 Meeting, 3/25 @ 9 PT), we’ll be ready for broader input from the community using the standard OpenStack Gerrit review process.  If you are not comfortable with Gerrit, we’ll take your input any way that you want to give it except via telepathy (we’ve already got a lot on our minds).

Note: We’re also looking for input on the 2015.next Guideline targeted for 2015.04.

The DefCore Process documents the rules (who, what, when and where) that govern how we create the DefCore Guidelines.  By design, it has to be detailed and specific without adding complexity and confusion.  The why of DefCore is covered by all the work we did on the principles that shape the process.

This process reflects nearly a year of gestation, starting from the June 2014 DefCore face-to-face.  One of the notable recent refinements was to organize the material into time phases and to be more specific about who is responsible for specific actions.

To make review easier, I’ve reposted the draft.  Comments are welcome here and on the patch (and here after it lands).

DRAFT: OpenStack DefCore Process 2015A (reposted from OpenStack/DefCore)

This document describes the DefCore process required by the OpenStack bylaws and approved by the OpenStack Technical Committee and Board.

Expected Timeline:

Time Frame   Milestone   Activities                           Lead By
-3 months    S-3         “preliminary” draft (from current)   DefCore
-2 months    S-2         ID new Capabilities                  Community
-1 month     S-1         Score capabilities                   DefCore
Summit       S           “solid” draft                        Community
                         Advisory items selected              DefCore
+1 month     S+1         Self-testing                         Vendors
+2 months    S+2         Test Flagging                        DefCore
+3 months    S+3         Approve Guidance                     Board

Note: DefCore may accelerate the process to correct errors and omissions.

Process Definition

Continue reading