Docker-Machine Crowbar Driver Delivers Metal Containers

I’ve just completed a basic Docker Machine driver for OpenCrowbar.  This enables you to quickly spin up (and down) remote Docker hosts on bare metal servers from the Docker Machine command line tool.  There are significant cost, simplicity and performance advantages to this approach if you were already planning to dedicate servers to container workloads.

Docker Machine

The basics are pretty simple: using Docker Machine CLI you can “create” and “rm” new Docker hosts on bare metal using the crowbar driver.  Since we’re talking about metal, “create” is really “assign a machine from an available pool.”

Behind the scenes, Crowbar is doing a full provision cycle of the system, including installing the operating system and injecting the user keys.  Crowbar’s design would allow operators to automatically inject additional steps, such as adding monitoring agents and security, into the provisioning process without changing the driver operation.

Beyond Create, the driver supports the other Machine verbs like remove, stop, start, ssh and inspect.  In the case of remove, the Machine is cleaned up and put back in the pool for the next user [note: work remains on the full remove>recreate process].
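To make the “create is really assign from a pool” idea concrete, here is a minimal Go sketch.  It is not the driver’s code: the Pool type, its methods and the node names are all hypothetical, and it skips the real provisioning work that Crowbar does.  It only shows the allocate and release bookkeeping.

```go
package main

import (
	"errors"
	"fmt"
)

// Pool is a hypothetical stand-in for Crowbar's pool of discovered servers.
// It is not the driver's real data model, only an illustration of the idea.
type Pool struct {
	available []string          // node names waiting to be assigned
	assigned  map[string]string // machine name -> node name
}

// NewPool builds a pool from a list of already-discovered node names.
func NewPool(nodes ...string) *Pool {
	return &Pool{
		available: append([]string(nil), nodes...),
		assigned:  make(map[string]string),
	}
}

// Create assigns the next available node to the named machine.
// On metal nothing new is created: if the pool is empty, the call fails.
func (p *Pool) Create(machine string) (string, error) {
	if _, taken := p.assigned[machine]; taken {
		return "", errors.New("machine name already in use: " + machine)
	}
	if len(p.available) == 0 {
		return "", errors.New("no nodes available in the pool")
	}
	node := p.available[0]
	p.available = p.available[1:]
	p.assigned[machine] = node
	return node, nil
}

// Remove releases the machine's node back to the pool for the next user.
func (p *Pool) Remove(machine string) error {
	node, ok := p.assigned[machine]
	if !ok {
		return errors.New("unknown machine: " + machine)
	}
	delete(p.assigned, machine)
	p.available = append(p.available, node)
	return nil
}

func main() {
	pool := NewPool("node-1", "node-2") // hypothetical node names
	node, err := pool.Create("testme")
	fmt.Println(node, err)
	fmt.Println(pool.Remove("testme"))
}
```

The point of the sketch is that a metal “create” can fail simply because the pool is empty, which is a different failure mode from a cloud API.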

Overall, this driver allows Docker Machine to work transparently against metal infrastructure alongside whatever cloud services you also choose.

Want to try it out?

  1. You need to set up OpenCrowbar – if you follow the defaults (192.168.124.10 IP, user, password) then the Docker Machine driver defaults will also work. Also, make sure you have the Ubuntu 14.04 ISO available for the Crowbar provisioner.
  2. Discover some nodes in Crowbar – you do NOT need metal servers to try this, the tests work fine with virtual machines (tools/kvm-slave &)
  3. Clone my Machine repo (We’re looking for feedback before a pull to Docker/Machine)
  4. Compile the code using script/build.
  5. Allocate a Docker Node using ./docker-machine create --driver crowbar testme
  6. Go to the Crowbar UI to watch the node be provisioned and configured into the Docker-Machines pool
  7. Release the node using ./docker-machine rm testme
  8. Go to the Crowbar UI to watch the node be redeployed back to the System pool
  9. Try to contain your enthusiasm 🙂

Want More?  Linux binary & readme.

OpenSource.com Interview on DefCore, project management, and the future of OpenStack

Reposted from My Interview with RedHat’s OpenSource.com Jason Baker

Rob Hirschfeld has been involved with OpenStack since before the project was even officially formed, and so he brings a rich perspective as to the project’s history, its organization, and where it may be headed next. Recently, he has focused primarily on the physical infrastructure automation space, working with an enterprise version of OpenCrowbar, an “API-driven metal” project which started as an OpenStack installer and moved to a generic workload underlay.

Rob is speaking on two panels at the upcoming OpenStack Summit in Vancouver, including DefCore 2015 and the State of OpenStack Project Management. We caught up with Rob to get updates about these two topics and what else lies ahead for OpenStack.

We asked you to help walk us through DefCore as it was being developed last year; just as a reminder, what is DefCore and why should people care about it?

DefCore creates a minimal definition for OpenStack vendors to help ensure interoperability and stability for the user community. While DefCore definitions apply only to vendors asking to use the OpenStack trademark, there are technical impacts on the tests and APIs that we select as required. We’ve worked hard to make sure that the selection process for picking “core” is transparent and fair.

What did the changes approved by the OpenStack Foundation membership earlier this year mean for DefCore?

The by-laws changes approved by the community were important to allow us to use DefCore’s more granular definition of Core. The previous by-laws were much more project focused. The changes allow us to select specific APIs and code components from a project as required instead of picking everything blindly. That allows projects to have both stable and new innovative components.

What can we expect from OpenStack’s structure and organization as we move forward towards the next release?

There are a lot of changes still to come. The technical leadership is making it easier to become part of the OpenStack code base. I’ve written about this change potentially having both positive and negative impacts on OpenStack, making it appear more like a suite of projects than a tightly integrated product. In many ways, DefCore helps vendors define OpenStack as a product as the community is expanding to include more capabilities. In my discussions, this is a good balance.

Switching gears a bit, you’ve also been heavily involved in the OpenStack project management working group. How has that group been progressing since they convened at the Paris Summit?

This group has made a lot of progress. We’ve seen non-board leadership step in and lead the group. That leadership is more organic and based in the companies that are directly contributing. I think that’s resulted in a lot of good ideas and documentation from the group. We’ll see some excellent results in Vancouver from them. It’s going to come back to the community and technical leadership to leverage that work. I think that’s the real test: we have to share ownership of direction between multiple perspectives. The first step in doing that is writing it down (which is what they have been doing).

Aside from the organization, let’s talk about the software itself. What are you hoping to see from the Liberty release?

I’m hoping to see adoption of Neutron accelerate. Having two network approaches makes it impossible to really have an interoperability story. That means Neutron has to be working technically, but also for operators and users. To be brutally honest, it also has to overcome its own reputation. If Neutron does not become the dominant choice, we are going to effectively have two major flavors of OpenStack. From the DefCore, vendor, or user perspective, that’s a very challenging position.

Anything else you’d like to add?

We’ve accomplished a lot together. In some ways, chasing too many targets is our biggest threat. I think that container workloads and orchestration are already being very disruptive for OpenStack. I’m hoping that we focus on delivering a stable core infrastructure. That’s why I’ve been working so hard on DefCore. Looking forward, there’s an increasing risk of trying to chase too many targets and losing the core of what users want.

This article is part of the Speaker Interview Series for OpenStack Summit Vancouver, a five-day conference for developers, users, and administrators of OpenStack Cloud Software.

As CloudFoundry Builds Ecosystem and Utility, What Challenges Arise? (observations from CFSummit)

I’ve been on the outskirts of the CloudFoundry (CF) universe from the dawn of the project (it’s a little remembered fact that there was a 2011 Crowbar install of CloudFoundry).

Progress and investment have been substantial and, happily, organic. Like many platforms, its success relies on a reasonable balance between strong opinions about “right” patterns and enough flexibility to accommodate exceptions.

From a well patterned foundation, development teams find acceleration.  This seems to be helping CloudFoundry win some high-profile enterprise adopters.

The interesting challenge ahead of the project comes from building more complex autonomous deployments. With the challenge of horizontal scale arguably behind them, CF users are starting to build more complex architectures.  This includes dynamic provisioning of the providers (like databases, object stores and other persistent adjacent services) and connecting to containerized “micro-services.”  (see Matt Stine’s preso)

While this is a natural evolution, it adds an order of magnitude more complexity because the contracts between previously isolated layers are suddenly not reliable.

For example, what happens to a CF deployment when the database provider is field upgraded to a new version?  That could introduce breaking changes in dependent applications that are completely opaque to the data provider.  These are hard problems to solve.

Happily, those are exactly the discussions that we’re starting to have with container orchestration systems.  It’s also part of the dialog that I’ve been trying to drive with Functional Operations (FuncOps Preso) on the physical automation side.  I’m optimistic that CloudFoundry patterns will help make this problem more tractable.

Hidden costs of Cloud? No surprises, it’s still about complexity = people cost

Last week, Forbes and ZDnet posted articles discussing the cost of various clouds (451 source material behind wall), full of dollar-per-hour cost analysis.  Their analysis talks about private infrastructure being an order of magnitude cheaper (yes, cheaper) to own than public cloud; however, the open source price advantages offered by OpenStack are swallowed by the added cost of finding skilled operators and its lack of maturity.

At the end of the day, operational concerns are the differential factor.

The Magic 8 Cube


These articles get tied down in trying to normalize clouds to a $/vm/hour analysis and bury the lead: operational decisions are what contribute most to cloud operational costs.   I explored this a while back in my “magic 8 cube” series about six added management variations between public and private clouds.

In most cases, operations decisions are not just about cost – they factor in flexibility, stability and organizational readiness.  From that perspective, the additional costs of public clouds and well-known stacks (VMware) are easily justified for smaller operations.  Using alternatives means paying higher salaries and finding talent that requires larger scale to justify.

Operational complexity is a material cost that strongly detracts from new platforms (yes, OpenStack – we need to address this!)

Unfortunately, it’s hard for people building platforms to perceive the complexity experienced by people outside their community.  We need to make sure that stability and operability are top line features, because complexity comes directly back as cost of operation.

In my thinking, the winners will be solutions that reduce BOTH cost and complexity.  I’ve talked about that in the past and see the trend accelerating as more and more companies invest in ops automation.

Short lived VM (Mayflies) research yields surprising scheduling benefit

Last semester, Alex Hirschfeld (my son) did a simulation to explore the possible efficiency benefits of the Mayflies concept proposed by Josh McKenty and me.

Mayflies swarming from Wikipedia

In the initial phase of the research, he simulated a data center using load curves designed to oversubscribe the resources (he’s still interested in actual load data).  This was sufficient to test the theory and find something surprising: mayflies can really improve scheduling.

Alex found that an unexpected benefit comes when you force mayflies to have a controlled “die off.”  It allows your scheduler to be much smarter.

Let’s assume that you have a high mayfly ratio (70%); that means every day 10% of your resources would turn over.  If you coordinate the time window and feed that information into your scheduler, then it can make much better load distribution decisions.  Alex’s simulation showed that this approach basically eliminated hot spots and server over-crowding.
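A small Go sketch may help show why a known die-off window is useful to a scheduler.  This is not Alex’s simulation: the Host type, the capacity units and the placement rule are hypothetical, and the seven-day mayfly lifetime is my assumption, chosen because it makes a 70% ratio turn over 10% per day.

```go
package main

import "fmt"

// Host is a hypothetical server with a fixed capacity in arbitrary units.
type Host struct {
	Name     string
	Capacity int
	Load     int // units in use right now
	Expiring int // units held by mayflies that die in the next window
}

// dailyTurnover returns the fraction of total resources that turns over each
// day, assuming mayflies die off evenly across their lifetime.
func dailyTurnover(mayflyRatio float64, lifetimeDays int) float64 {
	return mayflyRatio / float64(lifetimeDays)
}

// projectedFree is the free capacity a host will have once its scheduled
// die-off has happened.
func projectedFree(h Host) int {
	return h.Capacity - h.Load + h.Expiring
}

// place picks the host with the most projected free capacity among those
// that can fit the request right now. It returns -1 if nothing fits.
func place(hosts []Host, size int) int {
	best := -1
	for i, h := range hosts {
		if h.Capacity-h.Load < size {
			continue // does not fit today
		}
		if best == -1 || projectedFree(h) > projectedFree(hosts[best]) {
			best = i
		}
	}
	return best
}

func main() {
	// Assumption: a 7 day lifetime, which is not stated in the text above.
	fmt.Printf("daily turnover: %.0f%%\n", dailyTurnover(0.70, 7)*100)

	hosts := []Host{
		{Name: "a", Capacity: 10, Load: 6, Expiring: 0},
		{Name: "b", Capacity: 10, Load: 7, Expiring: 4},
	}
	if i := place(hosts, 2); i >= 0 {
		fmt.Println("place on:", hosts[i].Name)
	}
}
```

A scheduler that only looks at current load would pick host a here.  With the die-off information, host b is the better choice because it is about to free up more room.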

Here’s a snippet of his report explaining the effect in his own words:

On a system that is more consistent and does not have a massive virtual machine through put, Mayflies may not help with balancing the systems load, but with the social engineering aspect, it can increase the stability of the system.

Most of the time, the requests for new virtual machines on a cloud are immutable. They came in at a time and need to be fulfilled in the order of their request. Mayflies has the potential to change that. If a request is made, it has the potential to be added to a queue of mayflies that need to be reinitialized. This creates a queue of virtual machine requests that any load balancing algorithm can work with.

Mayflies can make load balancing a system easier. Knowing the exact size of the virtual machine that is going to be added and knowing when it will die makes load balancing for dynamic systems trivial.

OpenStack DefCore Community Review – TWO Sessions April 21 (agenda)

During the DefCore process, we’ve had regular community check points to review and discuss the latest materials from the committee.  With the latest work on the official process and flurry of Guidelines, we’ve got a lot of concrete material to show.

To accommodate global participants, we’ll have TWO sessions (and record both):

  1. April 21 8 am Central (1 pm UTC) https://join.me/874-029-687 
  2. April 21 8 pm Central (9 am Hong Kong) https://join.me/903-179-768 

Consult the call etherpad for call-in details and other material.

Planned Agenda:

  • Background on DefCore – very short 10 minutes
    • short description
    • why board process- where community
  •  Interop AND Trademark – why it’s both – 5 minutes
  •  Vendors AND Community – balancing the needs – 5 minutes
  •  Mechanics
    • testing & capabilities – 5 minutes
    • self testing & certification – 5 minutes
    • platform & components & trademark – 5 minutes
  • Quick overview of the Process (to help w/ reviewers) – 15 minutes
  • How to get involved (Gerrit) – 5 minutes

Golang Example JSON REST HTTP Get with Digest Auth

Since I could not find a complete example of a Go REST call that returned JSON and used Digest Auth (for Digital Rebar API), I wanted to feed the SEO monster for the next person.

My purpose is to illustrate the pattern, not deliver reference code.  Once I got all the pieces in the right place, the code was wonderfully logical.  The basic workflow is:

  1. define a structure with JSON mapping markup
  2. define an alternate HTTP transport that includes digest auth
  3. enable the client
  4. perform the get request
  5. extract the response body into a stream
  6. decode the stream into the mapped data structure (from step 1)
  7. use the information

Here’s the sample:

package main

import (
	"encoding/json"
	"fmt"
	"log"

	digest "code.google.com/p/mlab-ns2/gae/ns/digest"
)

// the struct maps to the JSON automatically with the added meta data
type Deployment struct {
	ID          int    `json:"id"`
	State       int    `json:"state"`
	Name        string `json:"name"`
	Description string `json:"description"`
	System      bool   `json:"system"`
	ParentID    int64  `json:"parent_id"`
	CreatedAt   string `json:"created_at"`
	UpdatedAt   string `json:"updated_at"`
}

func main() {

	// set up a transport to handle digest
	transport := digest.NewTransport("crowbar", "password")

	// initialize the client
	client, err := transport.Client()
	if err != nil {
		log.Fatal(err) // main cannot return an error, so exit instead
	}

	// make the call (auth will happen)
	resp, err := client.Get("http://127.0.0.1:3000/api/v2/deployments")
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()

	// magic of the structure definition will map automatically
	var d []Deployment // it's an array returned, so we need a slice here
	if err := json.NewDecoder(resp.Body).Decode(&d); err != nil {
		log.Fatal(err)
	}
	if len(d) == 0 {
		log.Fatal("no deployments returned")
	}

	// print results
	fmt.Printf("Header:%s\n", resp.Header["Content-Type"])
	fmt.Printf("Code:%s\n", resp.Status)
	fmt.Printf("Name:%s\n", d[0].Name)

}
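
If you want to try the JSON mapping pattern without a running API or the third-party digest package, here is a standard-library-only sketch.  It is an assumption-laden stand-in: it uses a local httptest server instead of the real API, skips Digest Auth entirely, and the fetchDeployments and newFakeServer helpers and the payload are made up for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
	"log"
	"net/http"
	"net/http/httptest"
)

// Deployment mirrors a subset of the fields from the sample above.
type Deployment struct {
	ID   int    `json:"id"`
	Name string `json:"name"`
}

// fetchDeployments performs the GET and decodes the JSON array in the
// response body into a slice of Deployment.
func fetchDeployments(client *http.Client, url string) ([]Deployment, error) {
	resp, err := client.Get(url)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()

	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("unexpected status: %s", resp.Status)
	}

	var d []Deployment
	if err := json.NewDecoder(resp.Body).Decode(&d); err != nil {
		return nil, err
	}
	return d, nil
}

// newFakeServer returns a local test server with a made-up payload.
// It does NOT perform Digest Auth; it only exercises the decode path.
func newFakeServer() *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "application/json")
		fmt.Fprint(w, `[{"id":1,"name":"system"}]`)
	}))
}

func main() {
	srv := newFakeServer()
	defer srv.Close()

	d, err := fetchDeployments(srv.Client(), srv.URL)
	if err != nil {
		log.Fatal(err)
	}
	for _, dep := range d {
		fmt.Println(dep.ID, dep.Name)
	}
}
```

Because fetchDeployments takes any *http.Client, a client built from a digest transport could be passed in the same way.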

PS: I’m doing this for the  Digital Rebar API driver because it uses REST and Digest.  We’re actively maintaining it there if you want the latest.

Manage Hardware like a BOSS – latest OpenCrowbar brings API to Physical Gear

A few weeks ago, I posted about VMs being squeezed between containers and metal.   That observation comes from our experience fielding the latest metal provisioning feature sets for OpenCrowbar, so it’s exciting to see that the team has cut the next quarterly release:  OpenCrowbar v2.2 (aka Camshaft).  Even better, you can top it off with official software support.

Camshaft coordinates activity

Dual overhead camshaft housing by Neodarkshadow from Wikimedia Commons

The Camshaft release had two primary objectives: Integrations and Services.  Both build on the unique functional operations and ready state approach in Crowbar v2.

1) For Integrations, we’ve been busy leveraging our ready state API to make physical servers work like a cloud.  It gets especially interesting with the RackN burn-in/tear-down workflows added in.  Our prototype Chef Provisioning driver showed how you can use the Crowbar API to spin servers up and down.  We’re now expanding this cloud-like capability for Saltstack, Docker Machine and Pivotal BOSH.

2) For Services, we’ve taken ops decomposition to a new level.  The “secret sauce” for Crowbar is our ability to interweave ops activity between components in the system.  For example, building a cluster requires setting up pieces on different systems in a very specific sequence.  In Camshaft, we’ve added externally registered services (using Consul) into the orchestration.  That means that Crowbar will either use existing DNS, Database, or NTP services or set up its own.  Basically, Crowbar can now FIT YOUR EXISTING OPS ENVIRONMENT without forcing dedicated Crowbar-only services like DHCP or DNS.
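
The “use existing or set up its own” decision can be sketched in a few lines of Go.  This is only an illustration of the fallback logic: the Registry type and resolve function are hypothetical, the addresses are made up, and no Consul API is called.

```go
package main

import "fmt"

// Registry is a hypothetical stand-in for externally registered services.
// The real system registers services through Consul; this is just a map.
type Registry map[string]string // service name -> address

// resolve returns the address to use for a service, and whether Crowbar
// would need to deploy its own instance because none was registered.
func resolve(external Registry, service, internalAddr string) (addr string, deployOwn bool) {
	if a, ok := external[service]; ok && a != "" {
		return a, false // an existing service is registered, so reuse it
	}
	return internalAddr, true // nothing registered, so fall back to our own
}

func main() {
	// Assumption: an operator has registered an existing DNS server only.
	external := Registry{"dns": "10.0.0.53:53"}

	for _, svc := range []string{"dns", "ntp"} {
		addr, deployOwn := resolve(external, svc, "192.168.124.10")
		fmt.Printf("%s -> %s (deploy own: %v)\n", svc, addr, deployOwn)
	}
}
```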

In addition to all these features, you can now purchase support for OpenCrowbar from RackN (my company).  The Enterprise version includes additional server life-cycle workflow elements and features like HA and Upgrade as they are available.

There are AMAZING features coming in the next release (“Drill”) including a message bus to broadcast events from the system, more operating systems (ESXi, Xenserver, Debian and Mirantis’ Fuel) and increased integration/flexibility with existing operational environments.  Several of these have already been added to the develop branch.

It’s easy to set up and test OpenCrowbar using containers, VMs or metal.  Want to learn more?  Join our community in Gitter, email list or weekly interactive community meetings (Wednesdays @ 9am PT).

Jazz vs. Symphony: Why micromanaging digital work FAILS. [post 3 of 8]

THIRD IN AN 8 POST SERIES, BRAD SZOLLOSE AND ROB HIRSCHFELD INVITE YOU TO SHARE IN OUR DISCUSSION ABOUT FAILURES, FIGHTS AND FRIGHTENING TRANSFORMATIONS GOING ON AROUND US AS DIGITAL WORK CHANGES WORKPLACE DELIVERABLES, PLANNING AND CULTURE.

Now that we’ve introduced music as a functional analogy for a stable 21st century leadership model and defined digital work, we’re ready to expose how work actually gets done in the information age.

First, has work really changed?  Yes.  Traditionally there was a distinct difference between organized production and service-based/creative work such as advertising, accounting or medicine, where you solve a problem by looking for clues and coming up with creative solutions.

Jazz Hands By RevolvingRevolver on DeviantArt http://revolvingrevolver.deviantart.com/

Digital work on the other hand, and more importantly – digital workers, live in a strange limbo of doing creative work but needing business structures and management models that were developed during the industrial age.

In today’s multi-generational workforce, what appears to be a generational divide has transformed into a non-age-specific cultural rift. As Brad and Rob compared notes, we came to believe that what is really happening is a learned difference in the approach to work and work culture.

There is a learned difference in the approach to work and work culture that’s more obvious in, but not limited to, digital natives.

In most companies, the executives are traditionalists (Baby Boomers or hand-selected by Boomers).  While previous generations have been trained to follow hierarchy, the new culture values performance, flexibility and teamwork with a less top-down control oriented outlook.

It’s like a symphonic conductor, used to picking the chair order and directing the tempo, handing out sheet music to a Jazz ensemble.  So how is the traditional manager going to deliver a stellar performance when his performers are Jazz trained?

In a traditional concert orchestra, each musician has to go to college, train hard, earn a shot to get into the orchestra, and over time, work very hard to earn the First Chair position (think earning the corner office).  Once in that position, they stay there until death or retirement.  Anyone who deviates is fired. Improv is only allowed during certain songs, by a select few.  It’s the workplace equivalent to climbing the corporate ladder.

Most digital workers think they belong to a Jazz ensemble.  

It’s a mistake to believe less organized means less skilled.  Workers in the Jazz model are also talented and trained professionals.  If you look at the careers of Thelonious Monk, Duke Ellington and Dizzy Gillespie, they all had formal training, many started as children.  The same is true for digital workers: many started building job skills as children and then honed their teamwork playing video games.

But can a loosely organized group consistently deliver results? Yes. In fact, they deliver better results!

When a Jazz Improv group plays, they have a rough composition to start with. Each member is given time for a solo.  To the uninitiated there appear to be no leaders in this milieu of talent, but the leader is there.  They just refuse to control the performance; instead, they trust that each member will bring their A Game and perform at 100% of their capacity.

In business, this is scary. Don’t we need someone to check each person’s work? People are just messing around right? I mean, is this actual work? Who is in charge?

In business environments that operate more like Jazz, studies have proven that there is a 32% increase in productivity over traditional command and control environments driven by hierarchy.

Age, experience and position are NOT the criteria for the Digital Worker. Output is.  And output is different for each product. Management’s role in this model is to get out of the way and let the musicians create. Instead of conforming to a single style and method, the people producing in the model each bring something unique and also experience a high degree of ownership.

This is a powerful type of workplace diversity: by allowing different ways of problem solving to co-exist, we also make the workplace more inclusive and collaborative.

Sound too good to be true?  In our next post we’ll discuss trust as the critical ingredient for Jazz performance.  (Teaser)

Showing how others explain Ready State & OpenCrowbar

I’m working on a series for DevOps.com to explain Functional Ops (expect it to start early next week!) and it’s very hard to convey its east-west API nature.  So I’m always excited to see how other people explain how OpenCrowbar does ops and ready state.

This week I was blown away by the drawing that I’ve recreated for this blog post.  It’s a very clear graphic showing the operational complexity of heterogeneous infrastructure AND how OpenCrowbar normalizes it into a ready state.

It’s critical to realize that the height of each component tower varies by vendor and also by location within the data center topology.  Ready state is not just about normalizing different vendors’ gear; it’s really about dealing with the complexity that’s inherent in building a functional data center.  It’s “little” things like knowing how to enumerate the networking interfaces and uplinks to build the correct teams.

If you think this graphic helps, please let me know.