Problems with the “Give me a Wookiee” hybrid API

Greg Althaus, RackN CTO, creates amazing hybrid DevOps orchestration that spans metal and cloud implementations.  When it comes to knowing the nooks and crannies of data centers, his ops scar tissue has scar tissue.  So, I knew you’d all enjoy this funny story he wrote after previewing my OpenStack API report.  

“APIs are only valuable if the parameters mean the same thing and you get back what you expect.” Greg Althaus

The following is a guest post by Greg:

While building the Digital Rebar OpenStack node provider, Rob Hirschfeld tried to integrate with 7+ OpenStack clouds.  While the APIs matched across instances, there are all sorts of challenges with what comes out of the API calls.  

The discovery made me realize that APIs are not the end of interoperability.  They are the beginning.  

I found I could best describe it with a story.

I found an API on a service and that API creates a Wookiee!

I can tell the API that I want a tall or short Wookiee or young or old Wookiee.  I test against the Kashyyyk service.  I consistently get a 8ft Brown 300 year old Wookiee when I ask for a Tall Old Wookiee.  

I get a 6ft Brown 50 Year old Wookiee when I ask for a Short Young Wookiee.  Exactly what I want, all the time.  

My pointy-haired emperor boss says I need to now use the Forest Moon of Endor (FME) Service.  He was told it is the exact same thing but cheaper.  Okay, let’s do this.  It consistently gives me 5 year old 4 ft tall Brown Ewok (called a Wookiee) when I ask for the Tall Young Wookiee.  

This is a fail.  I mean, yes, they are both furry and brown, but the Ewok can’t reach the top of my bookshelf.  

The next service has to work, right?  About the same price as FME, the Tatooine Service claims to be really good too.  It passes tests.  It hands out things called Wookiees.  The only problem is that, while size is an API field, the service requires the use of petite and big instead of short and tall.  This is just annoying.  This time my tall (well big) young Wookiee is 8 ft tall and 50 years old, but it is green and bald (scales are like that).  

I don’t really know what it is.  I’m sure it isn’t a Wookiee.  

And while she is awesome (better than the male Wookiees), she almost froze to death in the arctic tundra that is Boston.  

My point: APIs are only valuable if the parameters mean the same thing and you get back what you expect.

 

Repair Kenmore Dishwasher Error F4E1 needs THIS Reset Code

This post is paying it forward on the SEO for people repairing their own Kenmore Dishwasher.  The repair is VERY managable but requires an undocumented clear code at the end of the replacement.

20160312_104922.jpgIf you get an F4-E1 (washer pump bad) code, then you MUST clear the code after you replace the drive motor.  The reset code is pressing any three bottons in a 1-2-3 sequence three times (so 1-2-3, 1-2-3, 1-2-3).  I took a picture of my unit with all lights on after I entered the diagnostic code.

Kudos to Kenmore for making their unit incredibly servicable.  

Replacing the drive components was EASY because of their thoughtful design. The repair basically replaced all the mechanical parts of the device as a single unit for $250 (a new unit is about $900).  Once I had the drive, it only required removing a few obvious connections.  The parts effectively snap together.

My only element of panic when I put the unit back together with the new parts and got the original code again.  Entering the clear code made everything work.  While it’s easy to find the parts, the reset code is NOT easy to find.  Hopefully, this post helps you resolve this final step in the repair!

A year of RackN – 9 lessons from the front lines of evangalizing open physical ops

Let’s avoid this > “We’re heading right at the ground, sir!  Excellent, all engines full power!

another scale? oars & motors. WWF managing small scale fisheries

RackN is refining our from “start to scale” message and it’s also our 1 year anniversary so it’s natural time for reflection. While it’s been a year since our founders made RackN a full time obsession, the team has been working together for over 5 years now with the same vision: improve scale datacenter operations.

As a backdrop, IT-Ops is under tremendous pressure to increase agility and reduce spending.  Even worse, there’s a building pipeline of container driven change that we are still learning how to operate.

Over the year, we learned that:

  1. no one has time to improve ops
  2. everyone thinks their uniqueness is unique
  3. most sites have much more in common than is different
  4. the differences between sites are small
  5. small differences really do break automation
  6. once it breaks, it’s much harder to fix
  7. everyone plans to simplify once they stop changing everything
  8. the pace of change is accelerating
  9. apply, rinse, repeat with lesson #1

Where does that leave us besides stressed out?  Ops is not keeping up.  The solution is not to going faster: we have to improve first and then accelerate.

What makes general purpose datacenter automation so difficult?  The obvious answer, variation, does not sufficiently explain the problem. What we have been learning is that the real challenge is ordering of interdependencies.  This is especially true on physical systems where you have to really grok* networking.

The problem would be smaller if we were trying to build something for a bespoke site; however, I see ops snowflaking as one of the most significant barriers for new technologies. At RackN, we are determined to make physical ops repeatable and portable across sites.

What does that heterogeneous-first automation look like? First, we’ve learned that to adapt to customer datacenters. That means using the DNS, DHCP and other services that you already have in place. And dealing with heterogeneous hardware types and a mix of devops tools. It also means coping with arbitrary layer 2 and layer 3 networking topologies.

This was hard and tested both our patience and architecture pattern. It would be much easier to enforce a strict hardware guideline, but we knew that was not practical at scale. Instead, we “declared defeat” about forcing uniformity and built software that accepts variation.

So what did we do with a year?  We had to spend a lot of time listening and learning what “real operations” need.   Then we had to create software that accommodated variation without breaking downstream automation.  Now we’ve made it small enough to run on a desktop or cloud for sandboxing and a new learning cycle begins.

We’d love to have you try it out: rebar.digital.

* Grok is the correct work here.  Thinking that you “understand networking” is often more dangerous when it comes to automation.

Transitioning from a Bossy Boss into a Digital Age Leader [Series Conclusion]

Now that we are to the end of our 8 POST SERIES, BRAD SZOLLOSE AND ROB HIRSCHFELD INVITE YOU TO SHARE IN OUR DISCUSSION ABOUT FAILURES, FIGHTS AND FRIGHTENING TRANSFORMATIONS GOING ON AROUND US AS DIGITAL WORK CHANGES WORKPLACE DELIVERABLES, PLANNING AND CULTURE.

We hope you’ve enjoyed our discussion about digital management over the last seven posts. This series was born of our frustration with patterns of leadership in digital organizations: overly directing leaders stifle their team while hands-off leaders fail to provide critical direction. Neither culture is leading effectively!

Digital managers have to be two things at once

We felt that our “cultural intuition” is failing us.  That drove us to describe what’s broken and how to fix it.

Digital work and workers operate in a new model where top-down management is neither appropriate nor effective. To point, many digital workers actively resist being given too much direction, rules or structure. No, we are not throwing out management; on the contrary, we believe management is more important than ever, but changes to both work and workers has made it much harder than before.

That’s especially true when Boomers and Millennials try to work together because of differences in leadership experience and expectation. As Brad is always pointing out in his book Liquid Leadership, “what motivates a Millennial will not motivate a Boomer,” or even a Gen Xer.

Millennials may be so uncomfortable having to set limits and enforce decisions that they avoid exerting the very leadership that digital workers need! While GenX and Boomers may be creating and expecting unrealistic deadlines simply because they truly do not understand the depth of the work involved.

So who’s right and who’s wrong? As we’ve pointed out in previous posts, it’s neither! Why? Because unlike Industrial Age Models, there is no one way to get something done in The Information Age.

We desperately need a management model that works for everyone. How does a digital manager know when it’s time to be directing? If you’ve communicated a shared purpose well then you are always at liberty to 1) ask your team if this is aligned and 2) quickly stop any activity that is not aligned.

The trap we see for digital managers who have not communicated the shared goals is that they lack the team authority to take the lead.

We believe that digital leadership requires finding a middle ground using these three guidelines:

  1. Clearly express your intent and trust, don’t force, your team will follow it
  2. Respect your teams’ ability to make good decisions around the intent.
  3. Don’t be shy to exercise your authority when your team needs direction

Digital management is hard: you don’t get the luxury of authority or the comfort of certainty.

If you are used to directing then you have to trust yourself to communicate clearly at an abstract level and then let go of the details. If you are used to being hands-off then you have to get over being specific and assertive when the situation demands it.

Our frustration was that neither Boomer nor Millennial culture is providing effective management. Instead, we realized that elements of both are required. It’s up to the digital manager to learn when each mode is required.

Thank you for following along. It has been an honor.

The Matrix & Surrogates as an analogies for VMs, Containers and Metal

010312_1546_2012CloudOu1.jpgTrench coats aside, I used The Matrix as a useful analogy to explain visualization and containers to a non-technical friend today.  I’m interested in hearing from others if this is a helpful analogy.

Why does anyone care about virtual servers?

Virtual servers (aka virtual machines or VMs) are important because data centers are just like the Matrix.  The real world of data centers is a ugly, messy place fraught with hidden dangers and unpleasant chores.  Most of us would rather take the blue pill and live in a safe computer generated artificial environment where we can ignore those details and live in the convenient abstraction of Mega City.

Do VMs really work to let you ignore the physical stuff?

Pretty much.  For most people, they can live their whole lives within the virtual world.  They can think they are eating the steak and never try bending the spoons.

So why are containers disruptive?  

Well, it’s like the Surrogates movie.  Right now, a lot of people living in the Cloud Matrix are setting up even smaller bubbles.  They are finding that they don’t need a whole city, they can just live inside a single room.  For them, it’s more like Surrogates where people never leave their single room.

But if they never leave the container, do they need the Matrix?

No.  And that’s the disruption.  If you’ve wrapped yourself in a smaller bubble then you really don’t need to larger wrapper.

What about that messy “real world”?

It’s still out there in both cases.  Just once you are inside the inner bubble, you can’t really tell the difference.

Short lived VM (Mayflies) research yields surprising scheduling benefit

Last semester, Alex Hirschfeld (my son) did a simulation to explore the possible efficiency benefits of the Mayflies concept proposed by Josh McKenty and me.

Mayflies swarming from Wikipedia

In the initial phase of the research, he simulated a data center using load curves designed to oversubscribe the resources (he’s still interesting in actual load data).  This was sufficient to test the theory and find something surprising: mayflies can really improve scheduling.

Alex found an unexpected benefit comes when you force mayflies to have a controlled “die off.”  It allows your scheduler to be much smarter.

Let’s assume that you have a high mayfly ratio (70%), that means every day 10% of your resources would turn over.  If you coordinate the time window and feed that information into your scheduler, then it can make much better load distribution decisions.  Alex’s simulation showed that this approach basically eliminated hot spots and server over-crowding.

Here’s a snippet of his report explaining the effect in his own words:

On a system that is more consistent and does not have a massive virtual machine through put, Mayflies may not help with balancing the systems load, but with the social engineering aspect, it can increase the stability of the system.

Most of the time, the requests for new virtual machines on a cloud are immutable. They came in at a time and need to be fulfilled in the order of their request. Mayflies has the potential to change that. If a request is made, it has the potential to be added to a queue of mayflies that need to be reinitialized. This creates a queue of virtual machine requests that any load balancing algorithm can work with.

Mayflies can make load balancing a system easier. Knowing the exact size of the virtual machine that is going to be added and knowing when it will die makes load balancing for dynamic systems trivial.

Golang Example JSON REST HTTP Get with Digest Auth

Since I could not find a complete example of a GO REST Call that returned JSON and used Digest Auth (for Digital Rebar API), I wanted to feed the SEO monster for the next person.

My purpose is to illustrate the pattern, not deliver reference code.  Once I got all the pieces in the right place, the code was wonderfully logical.  The basic workflow is:

  1. define a structure with JSON mapping markup
  2. define an alternate HTTP transport that includes digest auth
  3. enable the client
  4. perform the get request
  5. extract the request body into a stream
  6. decode the stream into the mapped data structure (from step 1)
  7. use the information

Here’s the sample:

package main

import (
“fmt”
digest “code.google.com/p/mlab-ns2/gae/ns/digest”
“encoding/json”
)

// the struct maps to the JSON automatically with the added meta data
type Deployment struct {
ID int `json:”id”`
State int `json:”state”`
Name string `json:”name”`
Description string `json:”description”`
System bool `json:”system”`
ParentID int64 `json:”parent_id”`
CreatedAt string `json:”created_at”`
UpdatedAt string `json:”updated_at”`
}

func main() {

// setup a transport to handle disgest
transport := digest.NewTransport(“crowbar”, “password”)

// initialize the client
client, err := transport.Client()
if err != nil {
return err
}

// make the call (auth will happen)
resp, err := client.Get(“http://127.0.0.1:3000/api/v2/deployments”)
if err != nil {
return err
}
defer resp.Body.Close()

// magic of the structure definition will map automatically
var d []Deployment // it’s an array returned, so we need an array here.
err = json.NewDecoder(resp.Body).Decode(&d)

// print results
fmt.Printf(“Header:%s\n”, resp.Header[“Content-Type”])
fmt.Printf(“Code:%s\n”, resp.Status)
fmt.Printf(“Name:%s\n”, d[0].Name)

}

PS: I’m doing this for the  Digital Rebar API driver because it uses REST and Digest.  We’re actively maintaining it there if you want the latest.

Are VMs becoming El Caminos? Containers & Metal provide new choices for DevOps

I released “VMS ARE DEAD” this post two weeks ago on DevOps.com.  My point here is that Ops Automation (aka DevOps) is FINALLY growing beyond Cloud APIs and VMs.  This creates a much richer ecosystem of deployment targets instead of having to shoehorn every workload into the same platform.

In 2010, it looked as if visualization had won. We expected all servers to virtualize workloads and the primary question was which cloud infrastructure manager would dominate. Now in 2015, the picture is not as clear. I’m seeing a trend that threatens the “virtualize all things” battle cry.

IMG_20150301_170558985Really, it’s two intersecting trends: metal is getting cheaper and easier while container orchestration is advancing on rockets. If metal can truck around the heavy stable workloads while containers zip around like sports cars, that leaves VMs as a strange hybrid in the middle.

What’s the middle? It’s the El Camino, that notorious discontinued half car, half pick-up truck.

The explosion of interest in containerized workloads (I know, they’ve been around for a long time but Docker made them sexy somehow) has been creating secondary wave of container orchestration. Five years ago, I called that Platform as a Service (PaaS) but this new generation looks more like a CI/CD pipeline plus DevOps platform than our original PaaS concepts. These emerging pipelines obfuscate the operational environment differently than virtualized infrastructure (let’s call it IaaS). The platforms do not care about servers or application tiers, their semantic is about connecting services together. It’s a different deployment paradigm that’s more about SOA than resource reservation.

On the other side, we’ve been working hard to make physical ops more automated using the same DevOps tool chains. To complicate matters, the physics of silicon has meant that we’ve gone from scale up to scale out. Modern applications are so massive that they are going to exceed any single system so economics drives us to lots and lots of small, inexpensive servers. If you factor in the operational complexity and cost of hypervisors/clouds, an small actual dedicated server is a cost-effective substitute for a comparable virtual machine.

I’ll repeat that: a small dedicated server is a cost-effective substitute for a comparable virtual machine.

I am not speaking against virtualize servers or clouds. They have a critical role in data center operations; however, I hear from operators who are rethinking the idea that all servers will be virtualized and moving towards a more heterogeneous view of their data center. Once where they have a fleet of trucks, sports cars and El Caminos.

Of course, I’d be disingenuous if I neglected to point out that trucks are used to transport cars too. At some point, everything is metal.

Want more metal friendly reading?  See Packet CEO Zac Smith’s thinking on this topic.

From the archives circa 2001: “logical service cloud” patent

 

 

Sometimes, it’s fun to go back and read old things

Abstract: A virtualized logical server cloud that enables logical servers to exist independent of physical servers that instantiate the logical servers. Servers are treated as logical resources in order to create a logical server cloud. The logical attributes of a logical server are non-deterministically allocated to physical resources creating a cloud of logical servers over the physical servers. Logical separation is facilitated by the addition of a server cloud manager, which is an automated multi-server management layer. Each logical server has persistent attributes that establish its identity. Each physical server includes or is coupled to physical resources including a network resource, a data storage resource and a processor resource. At least one physical server executes virtualization software that virtualizes physical resources for logical servers. The server cloud manager maintains status and instance information for the logical servers including persistent and non-persistent attributes that link each logical server with a physical server

Inventors: Rob Hirschfeld (me) and Dave McCrory.

What Is digital “work?” Can we sell a cloud of smoke? Yes and the impacts are very tangible. [2 of 8]

IN THIS second in an 8 POST SERIES, BRAD SZOLLOSE AND ROB HIRSCHFELD INVITE YOU TO SHARE IN OUR DISCUSSION ABOUT FAILURES, FIGHTS AND FRIGHTENING TRANSFORMATIONS GOING ON AROUND US AS DIGITAL WORK CHANGES WORKPLACE DELIVERABLES, PLANNING AND CULTURE.

So, what is a Digital worker?  Before we talk about managing them, we need to agree to the very concept of digital work.

A Digital Worker is someone who creates value primarily by creating virtual goods and services.  This creates a challenge for traditional work because, in the physical world, no material goods were created.

ManagersBack in The Day, this type of work was equivalent to selling day dreams – it had no material value. It was intangible.

To today’s tech savvy workforce, even though their output exists simply as numbers in the “cloud,” digital work is tangible to digital natives.

Tangible work is directly consumable. If I create something I can see it, hold it in my hands. Eat it and enjoy it in the three dimensional meatverse we call “reality.” So, If I baked a pie and Brad ate it then I produced consumable work. That same rule applies for digital work like this blog post that Brad and I produced and you are reading. It’s nothing more than photons on a screen, but the value is immense and you can see the tangible results of our work.

The entire industrial age up until now was driven by a basic premise of effort equals results so eloquently stated by management consultant, Peter Drucker“If you can’t measure it, you can’t manage it.”

But much of what we do in the nascent stage of Digital Age, the beginning of the 21st Century, can NOT be measured using traditional value placements.

Case in point, what happens if we only worked when our spouses told us it was time to stop playing Candy Crush and get back to writing? We’re still producing digital work but now our spouses have taken on the role of managers. While they played an essential part in the content being created, their input is intangible and something that cannot be measured. Our spouse becomes the influencer in this model.

We need to revisit “If You Can’t Measure It You Can’t Manage.”  It is BS! no longer applies in digital work.

This distinction is important because we want to distinguish between digital workers and managers. They do very similar actions (type on keyboards, send email, go to meetings), but one creates digital goods while the other coordinates the creation of digital goods.

In the world of physical goods, the people coordinating HOW the work gets done have a significant amount of power. They provide the raw materials, tools, capital, supply chains and other requirements to get the goods to market. i.e…logistics. The actions of any single worker cannot scale in a meaningful way without management being involved; consequently, management has a tremendous amount of power (and corresponding respect) in the worker-manager relationship paradigm. This is not just for industrial work, the same applies to farming, singing, writing or other industries and defines most work in the pre-digital world of the Boomers, Traditionalists and earlier generations.

But let’s extend our simple example to a team of animators creating special effects for a movie. Pixar for example. The work requires each member of the team to already be up to speed on their specific role in the animation process. Whether a sculptor, character development, a digital set designer or character animator, each member knows what they need to do to be their very best, and how to reach their own deadlines. They are self managed and the very best at their jobs. And each is in charge of creating from the digital universe the same logistics mentioned above. Instead of management providing that support, the digital worker is their own support within the team.

The digital world inverts the traditional worker-manager dynamic.

With digital goods, the raw materials, tools, capital, supply chains and other requirements to get the goods to market. i.e…again, logistics are readily available so the worker’s creativity and effort become the critical resource.

A so called “manager” in this framework has one job: to provide the support and right environment to get the work done. Like a beekeeper, he must trust that each bee knows how to create honey. His or her job is to make their jobs friction free by making the environment the very best to get that work done. Trust is the key word.

bunny slippersThere is still need for management and coordination, but the power dynamic has been radically altered. While anyone could follow Rob’s pie recipe, you cannot simply replace his role as co-author on this blog post. Even more radical, there’s often no perceived need for managers at all!  Digital workers simply order pizza and produce digital goods in their bathrobe and bunny slippers.

While this vision is held as a core belief by many digital natives, we don’t believe it entirely.

But wait Rob and Brad, what about those YouTube millionaires who upload cat videos and cash in?  In those cases, there is a lot of invisible coordination in the distribution channel. The massive infrastructure needed to deliver Grumpy Cat is also digital work and Google invests vast sums of money to reduce the friction connecting those content creators and consumers. [Google and YouTube are the beekeeper.]

We believe the need for coordination of digital work is a critical and necessary component for real digital work to get done on time. Unfortunately, the inversion of power means that managers have neither the authority not resource controls that were in place when “modern management techniques” were created.

Our focus here is not on the lone wolf digital workers, but instead, we are focused on the collaborative digital worker. Those people who must collaborate with each other to deliver their goods. For those workers, there is a need for capital, supply chains and coordination. Their work is just a bit of the larger digital whole.

If “modern management” does not work for digital workers then what does?

Let’s keep in mind as we explore this discussion that these are High Trust environments and the subject for our next 6 posts.  Read post 3.