Short-lived VM (Mayflies) research yields surprising scheduling benefit

Last semester, Alex Hirschfeld (my son) did a simulation to explore the possible efficiency benefits of the Mayflies concept proposed by Josh McKenty and me.

Mayflies swarming from Wikipedia

In the initial phase of the research, he simulated a data center using load curves designed to oversubscribe the resources (he’s still interested in actual load data).  This was sufficient to test the theory and find something surprising: mayflies can really improve scheduling.

Alex found that an unexpected benefit comes when you force mayflies to have a controlled “die off”: it allows your scheduler to be much smarter.

Let’s assume that you have a high mayfly ratio (70%); with mayflies turning over weekly, that means every day roughly 10% of your resources would turn over.  If you coordinate the time window and feed that information into your scheduler, it can make much better load distribution decisions.  Alex’s simulation showed that this approach basically eliminated hot spots and server over-crowding.

Here’s a snippet of his report explaining the effect in his own words:

On a system that is more consistent and does not have massive virtual machine throughput, Mayflies may not help with balancing the system’s load, but with the social engineering aspect, it can increase the stability of the system.

Most of the time, the requests for new virtual machines on a cloud are immutable. They come in at a given time and need to be fulfilled in the order of their request. Mayflies has the potential to change that. If a request is made, it has the potential to be added to a queue of mayflies that need to be reinitialized. This creates a queue of virtual machine requests that any load balancing algorithm can work with.

Mayflies can make load balancing a system easier. Knowing the exact size of the virtual machine that is going to be added and knowing when it will die makes load balancing for dynamic systems trivial.
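
To make the effect concrete, here’s a minimal sketch in Go (not Alex’s simulation code) of a placement decision that uses the known die-off times. The host names, core counts and 24-hour horizon are made up for illustration; the point is that a host that looks busy right now can still be the best target if most of its load is about to expire.

package main

import (
    "fmt"
    "time"
)

// VM is a simplified instance with a known "die off" time (the mayfly contract).
type VM struct {
    Cores  int
    Expiry time.Time
}

// Host tracks the VMs currently placed on it.
type Host struct {
    Name string
    VMs  []VM
}

// projectedLoad counts the cores still committed on the host at time t,
// ignoring any VM whose die-off happens before then.
func (h Host) projectedLoad(t time.Time) int {
    load := 0
    for _, vm := range h.VMs {
        if vm.Expiry.After(t) {
            load += vm.Cores
        }
    }
    return load
}

// pickHost returns the host with the lowest projected load at the horizon,
// which is where the new VM's demand will actually land.
func pickHost(hosts []Host, horizon time.Time) *Host {
    best := &hosts[0]
    for i := 1; i < len(hosts); i++ {
        if hosts[i].projectedLoad(horizon) < best.projectedLoad(horizon) {
            best = &hosts[i]
        }
    }
    return best
}

func main() {
    now := time.Now()
    hosts := []Host{
        // busier right now, but 8 of its cores die off in 2 hours
        {Name: "rack1-n1", VMs: []VM{
            {Cores: 8, Expiry: now.Add(2 * time.Hour)},
            {Cores: 4, Expiry: now.Add(48 * time.Hour)},
        }},
        // lighter right now, but its load sticks around for 3 days
        {Name: "rack1-n2", VMs: []VM{
            {Cores: 8, Expiry: now.Add(72 * time.Hour)},
        }},
    }
    target := pickHost(hosts, now.Add(24*time.Hour))
    fmt.Println("place new VM on", target.Name)
}

Run against the sample data, pickHost selects rack1-n1 even though it currently has more cores committed, because 8 of those cores die off inside the scheduling window. That expiry information is exactly what a conventional scheduler never has.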

Golang Example JSON REST HTTP Get with Digest Auth

Since I could not find a complete example of a Go REST call that returned JSON and used digest auth (for the Crowbar API), I wanted to feed the SEO monster for the next person.

My purpose is to illustrate the pattern, not deliver reference code.  Once I got all the pieces in the right place, the code was wonderfully logical.  The basic workflow is:

  1. define a structure with JSON mapping markup
  2. define an alternate HTTP transport that includes digest auth
  3. initialize the client
  4. perform the GET request
  5. extract the response body into a stream
  6. decode the stream into the mapped data structure (from step 1)
  7. use the information

Here’s the sample:

package main

import (
    "encoding/json"
    "fmt"
    "log"

    digest "code.google.com/p/mlab-ns2/gae/ns/digest"
)

// the struct maps to the JSON automatically with the added metadata tags
type Deployment struct {
    ID          int    `json:"id"`
    State       int    `json:"state"`
    Name        string `json:"name"`
    Description string `json:"description"`
    System      bool   `json:"system"`
    ParentID    int64  `json:"parent_id"`
    CreatedAt   string `json:"created_at"`
    UpdatedAt   string `json:"updated_at"`
}

func main() {

    // setup a transport that handles digest auth
    transport := digest.NewTransport("crowbar", "password")

    // initialize the client
    client, err := transport.Client()
    if err != nil {
        log.Fatal(err)
    }

    // make the call (auth will happen)
    resp, err := client.Get("http://127.0.0.1:3000/api/v2/deployments")
    if err != nil {
        log.Fatal(err)
    }
    defer resp.Body.Close()

    // magic of the structure definition will map automatically
    var d []Deployment // an array is returned, so we need a slice here
    if err := json.NewDecoder(resp.Body).Decode(&d); err != nil {
        log.Fatal(err)
    }

    // print results
    fmt.Printf("Header:%s\n", resp.Header.Get("Content-Type"))
    fmt.Printf("Code:%s\n", resp.Status)
    if len(d) > 0 {
        fmt.Printf("Name:%s\n", d[0].Name)
    }
}

PS: I’m doing this for a Crowbar Docker Machine driver.

Manage Hardware like a BOSS – latest OpenCrowbar brings API to Physical Gear

A few weeks ago, I posted about VMs being squeezed between containers and metal.   That observation comes from our experience fielding the latest metal provisioning feature sets for OpenCrowbar; consequently, it’s exciting to see that the team has cut the next quarterly release:  OpenCrowbar v2.2 (aka Camshaft).  Even better, you can top it off with official software support.

Camshaft coordinates activity

Dual overhead camshaft housing by Neodarkshadow from Wikimedia Commons

The Camshaft release had two primary objectives: Integrations and Services.  Both build on the unique functional operations and ready state approach in Crowbar v2.

1) For Integrations, we’ve been busy leveraging our ready state API to make physical servers work like a cloud.  It gets especially interesting with the RackN burn-in/tear-down workflows added in.  Our prototype Chef Provisioning driver showed how you can use the Crowbar API to spin servers up and down.  We’re now expanding this cloud-like capability for Saltstack, Docker Machine and Pivotal BOSH.

2) For Services, we’ve taken ops decomposition to a new level.  The “secret sauce” for Crowbar is our ability to interweave ops activity between components in the system.  For example, building a cluster requires setting up pieces on different systems in a very specific sequence.  In Camshaft, we’ve added externally registered services (using Consul) into the orchestration.  That means that Crowbar will either use existing DNS, Database, or NTP services or set up its own.  Basically, Crowbar can now FIT YOUR EXISTING OPS ENVIRONMENT without forcing dedicated Crowbar-only services like DHCP or DNS.
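
To show what “externally registered services” look like in practice, here’s a minimal sketch, assuming a local Consul agent on its default port (8500) and a made-up address for a DNS server the environment already runs; it is illustrative only and not code from the Crowbar tree. It registers the existing service with Consul’s agent API so an orchestration layer can discover it instead of standing up its own.

package main

import (
    "bytes"
    "encoding/json"
    "fmt"
    "log"
    "net/http"
)

// Service mirrors the basic fields Consul's /v1/agent/service/register API accepts.
type Service struct {
    Name    string `json:"Name"`
    Address string `json:"Address"`
    Port    int    `json:"Port"`
}

func main() {
    // hypothetical DNS server that already exists in the environment
    svc := Service{Name: "dns", Address: "192.168.124.10", Port: 53}

    body, err := json.Marshal(svc)
    if err != nil {
        log.Fatal(err)
    }

    // register it with the local Consul agent so other components can look it up
    req, err := http.NewRequest("PUT",
        "http://127.0.0.1:8500/v1/agent/service/register", bytes.NewReader(body))
    if err != nil {
        log.Fatal(err)
    }
    resp, err := http.DefaultClient.Do(req)
    if err != nil {
        log.Fatal(err)
    }
    defer resp.Body.Close()
    fmt.Println("register status:", resp.Status)
}

Once registered, the service shows up in Consul’s catalog (and its DNS interface), which is the hook that lets orchestration reuse what is already there.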

In addition to all these features, you can now purchase support for OpenCrowbar from RackN (my company).  The Enterprise version includes additional server life-cycle workflow elements and features like HA and Upgrade as they are available.

There are AMAZING features coming in the next release (“Drill”) including a message bus to broadcast events from the system, more operating systems (ESXi, Xenserver, Debian and Mirantis’ Fuel) and increased integration/flexibility with existing operational environments.  Several of these have already been added to the develop branch.

It’s easy to set up and test OpenCrowbar using containers, VMs or metal.  Want to learn more?  Join our community on Gitter, the email list or the weekly interactive community meetings (Wednesdays @ 9am PT).

Are VMs becoming El Caminos? Containers & Metal provide new choices for DevOps

I released this post (“VMs are Dead”) two weeks ago on DevOps.com.  My point here is that Ops Automation (aka DevOps) is FINALLY growing beyond Cloud APIs and VMs.  This creates a much richer ecosystem of deployment targets instead of having to shoehorn every workload into the same platform.

In 2010, it looked as if virtualization had won. We expected all servers to virtualize workloads, and the primary question was which cloud infrastructure manager would dominate. Now in 2015, the picture is not as clear. I’m seeing a trend that threatens the “virtualize all things” battle cry.

Really, it’s two intersecting trends: metal is getting cheaper and easier while container orchestration is advancing on rockets. If metal can truck around the heavy, stable workloads while containers zip around like sports cars, that leaves VMs as a strange hybrid in the middle.

What’s the middle? It’s the El Camino, that notorious discontinued half car, half pick-up truck.

The explosion of interest in containerized workloads (I know, they’ve been around for a long time, but Docker made them sexy somehow) has been creating a secondary wave of container orchestration. Five years ago, I called that Platform as a Service (PaaS), but this new generation looks more like a CI/CD pipeline plus DevOps platform than our original PaaS concepts. These emerging pipelines obfuscate the operational environment differently than virtualized infrastructure (let’s call it IaaS). The platforms do not care about servers or application tiers; their semantics are about connecting services together. It’s a different deployment paradigm that’s more about SOA than resource reservation.

On the other side, we’ve been working hard to make physical ops more automated using the same DevOps tool chains. To complicate matters, the physics of silicon has meant that we’ve gone from scale up to scale out. Modern applications are so massive that they are going to exceed any single system, so economics drives us to lots and lots of small, inexpensive servers. If you factor in the operational complexity and cost of hypervisors/clouds, a small dedicated physical server is a cost-effective substitute for a comparable virtual machine.

I’ll repeat that: a small dedicated server is a cost-effective substitute for a comparable virtual machine.

I am not speaking against virtualized servers or clouds. They have a critical role in data center operations; however, I hear from operators who are rethinking the idea that all servers will be virtualized and moving towards a more heterogeneous view of their data center. One where they have a fleet of trucks, sports cars and El Caminos.

Of course, I’d be disingenuous if I neglected to point out that trucks are used to transport cars too. At some point, everything is metal.

Want more metal friendly reading?  See Packet CEO Zac Smith’s thinking on this topic.

From the archives circa 2001: “logical service cloud” patent

Sometimes, it’s fun to go back and read old things.

Abstract: A virtualized logical server cloud that enables logical servers to exist independent of physical servers that instantiate the logical servers. Servers are treated as logical resources in order to create a logical server cloud. The logical attributes of a logical server are non-deterministically allocated to physical resources creating a cloud of logical servers over the physical servers. Logical separation is facilitated by the addition of a server cloud manager, which is an automated multi-server management layer. Each logical server has persistent attributes that establish its identity. Each physical server includes or is coupled to physical resources including a network resource, a data storage resource and a processor resource. At least one physical server executes virtualization software that virtualizes physical resources for logical servers. The server cloud manager maintains status and instance information for the logical servers including persistent and non-persistent attributes that link each logical server with a physical server

Inventors: Rob Hirschfeld (me) and Dave McCrory.

What is digital “work”? Can we sell a cloud of smoke? Yes, and the impacts are very tangible.

In this second post of an 8-post series, Brad Szollose and Rob Hirschfeld invite you to share in our discussion about the failures, fights and frightening transformations going on around us as digital work changes workplace deliverables, planning and culture.

So, what is a Digital Worker?  Before we talk about managing them, we need to agree on the very concept of digital work.

A Digital Worker is someone who creates value primarily by creating virtual goods and services.  This creates a challenge for traditional notions of work because no material goods are created in the physical world.

Back in the day, this type of work was equivalent to selling daydreams – it had no material value. It was intangible.

Even though their output exists simply as numbers in the “cloud,” digital work is tangible to today’s tech-savvy workforce of digital natives.

Tangible work is directly consumable. If I create something, I can see it, hold it in my hands, eat it and enjoy it in the three-dimensional meatverse we call “reality.” So, if I baked a pie and Brad ate it, then I produced consumable work. That same rule applies to digital work like this blog post that Brad and I produced and you are reading. It’s nothing more than photons on a screen, but the value is immense and you can see the tangible results of our work.

The entire industrial age up until now was driven by a basic premise of effort equals results, so eloquently stated by management consultant Peter Drucker: “If you can’t measure it, you can’t manage it.”

But much of what we do in the nascent stage of the Digital Age, the beginning of the 21st Century, can NOT be measured using traditional value placements.

Case in point, what happens if we only worked when our spouses told us it was time to stop playing Candy Crush and get back to writing? We’re still producing digital work but now our spouses have taken on the role of managers. While they played an essential part in the content being created, their input is intangible and something that cannot be measured. Our spouse becomes the influencer in this model.

We need to revisit “If you can’t measure it, you can’t manage it.”  It’s BS! It no longer applies to digital work.

This distinction is important because we want to distinguish between digital workers and managers. They do very similar actions (type on keyboards, send email, go to meetings), but one creates digital goods while the other coordinates the creation of digital goods.

In the world of physical goods, the people coordinating HOW the work gets done have a significant amount of power. They provide the raw materials, tools, capital, supply chains and other requirements to get the goods to market, i.e., logistics. The actions of any single worker cannot scale in a meaningful way without management being involved; consequently, management has a tremendous amount of power (and corresponding respect) in the worker-manager relationship paradigm. This is not just industrial work; the same applies to farming, singing, writing and other industries, and it defines most work in the pre-digital world of the Boomers, Traditionalists and earlier generations.

But let’s extend our simple example to a team of animators creating special effects for a movie. Pixar, for example. The work requires each member of the team to already be up to speed on their specific role in the animation process. Whether a sculptor, a character developer, a digital set designer or a character animator, each member knows what they need to do to be their very best, and how to reach their own deadlines. They are self-managed and the very best at their jobs. And each is in charge of creating, from the digital universe, the same logistics mentioned above. Instead of management providing that support, the digital worker is their own support within the team.

The digital world inverts the traditional worker-manager dynamic.

With digital goods, the raw materials, tools, capital, supply chains and other requirements to get the goods to market (again, the logistics) are readily available, so the worker’s creativity and effort become the critical resource.

A so-called “manager” in this framework has one job: to provide the support and the right environment to get the work done. Like a beekeeper, he must trust that each bee knows how to create honey. His or her job is to make the workers’ jobs friction-free by making the environment the very best place to get that work done. Trust is the key word.

There is still a need for management and coordination, but the power dynamic has been radically altered. While anyone could follow Rob’s pie recipe, you cannot simply replace his role as co-author of this blog post. Even more radical, there’s often no perceived need for managers at all!  Digital workers simply order pizza and produce digital goods in their bathrobes and bunny slippers.

While this vision is held as a core belief by many digital natives, we don’t believe it entirely.

But wait, Rob and Brad, what about those YouTube millionaires who upload cat videos and cash in?  In those cases, there is a lot of invisible coordination in the distribution channel. The massive infrastructure needed to deliver Grumpy Cat is also digital work, and Google invests vast sums of money to reduce the friction connecting those content creators and consumers. [Google and YouTube are the beekeeper.]

We believe the coordination of digital work is a critical and necessary component for real digital work to get done on time. Unfortunately, the inversion of power means that managers have neither the authority nor the resource controls that were in place when “modern management techniques” were created.

Our focus here is not on the lone wolf digital workers; instead, we are focused on the collaborative digital worker: those people who must collaborate with each other to deliver their goods. For those workers, there is a need for capital, supply chains and coordination. Their work is just a bit of the larger digital whole.

If “modern management” does not work for digital workers then what does?

Let’s keep in mind as we explore this discussion that these are High Trust environments and the subject for our next 6 posts.

Talking Functional Ops & Bare Metal DevOps with vBrownBag [video]

Last Wednesday (3/11/15), I had the privilege of talking with the vBrownBag crowd about Functional Ops and bare metal deployment.  In this hour, I talk about how functional operations (FuncOps) works as an extension of ready state.  FuncOps is a critical concept for providing abstractions to scale heterogeneous physical operations.

Timing for this was fantastic since we’d just worked out ESXi install capability for OpenCrowbar (it will be exposed for work starting in Drill, the next Crowbar release cycle).

Here’s the brown bag:

If you’d like to see a demo, I’ve got hours of them posted:

Video Progression

Crowbar v2.1 demo: Visual Table of Contents [click for playlist]