Is there something between a Container and VM? Apparently, yes.

The RackN team has started designing reference architectures for containers on metal, with the hope of finding a hardware design that is cost- and performance-optimized for containers instead of simply repurposing premium virtualized cloud infrastructure.  That discussion turned up something unexpected…

That post generated a Twitter thread that surfaced ClearLinux, among others, as hardware-enabled (Intel VT-x) alternatives to containers.

This container alternative likely escapes the notice of many because it requires hardware capabilities that are not (or only partially) exposed inside cloud virtual machines; however, it could be a very compelling story for operators looking to run containers on metal.

Here’s my basic understanding: these technologies offer container-like lightweight and elastic behavior with the isolation provided by virtual machines.  This is possible because they use CPU virtualization capabilities to isolate environments.

7/3 Update: Feedback about this post has largely been “making it easier for VMs to run docker automatically is not interesting.”  What’s your take on it?

Details behind RackN Kubernetes Workload for OpenCrowbar

Since I’ve already bragged about how this workload validates OpenCrowbar’s deep ops impact, I can get right down to the nuts and bolts of what RackN CTO Greg Althaus managed to pack into this workload.

Like any scale install, once you’ve got a solid foundation, the actual installation goes pretty quickly.  In Kubernetes’ case, that means creating strong networking and etcd configuration.

Here’s a 30 minute video showing the complete process from O/S install to working Kubernetes:

Here are the details:

Clustered etcd – distributed key store

etcd is the central data service that maintains the state for the Kubernetes deployment.  The strength of the installation rests on the correctness of etcd.  The workload builds an etcd cluster and synchronizes all the instances as nodes are added.
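For illustration, a static bootstrap of a three-node etcd cluster looks roughly like the fragment below.  The node names and addresses are made up; each member is started with the same `--initial-cluster` list so the instances can find each other.

```shell
# Started on the first node (repeat on each node with its own --name
# and addresses; the --initial-cluster list is identical everywhere).
etcd --name node0 \
  --initial-advertise-peer-urls http://10.0.0.10:2380 \
  --listen-peer-urls http://10.0.0.10:2380 \
  --listen-client-urls http://10.0.0.10:2379,http://127.0.0.1:2379 \
  --advertise-client-urls http://10.0.0.10:2379 \
  --initial-cluster node0=http://10.0.0.10:2380,node1=http://10.0.0.11:2380,node2=http://10.0.0.12:2380 \
  --initial-cluster-state new
```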

Networking with Flannel and Proxy

Flannel is the default overlay network for Kubernetes; it handles IP assignment and inter-container communication with UDP encapsulation.  The workload configures Flannel for networking with etcd as the backing store.
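As a sketch of what that backing store holds: Flannel reads its network configuration from a well-known etcd key.  Writing it with the etcd v2 CLI looks something like this (the subnet range is illustrative):

```shell
# Flannel's default configuration key; each node's flanneld carves its
# own subnet out of the Network range stored here.
etcdctl set /coreos.com/network/config \
  '{ "Network": "10.244.0.0/16", "Backend": { "Type": "udp" } }'
```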

An important part of the overall networking setup is the configuration of a proxy so that the nodes can get external access to Docker image repositories.
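One common way to do this (a sketch, not necessarily how the workload does it) is a systemd drop-in that hands proxy settings to the Docker daemon; the proxy host is hypothetical:

```shell
# Drop-in so dockerd can reach external registries through the proxy.
mkdir -p /etc/systemd/system/docker.service.d
cat > /etc/systemd/system/docker.service.d/http-proxy.conf <<'EOF'
[Service]
Environment="HTTP_PROXY=http://proxy.example.com:3128"
Environment="NO_PROXY=localhost,127.0.0.1,.cluster.local"
EOF
systemctl daemon-reload && systemctl restart docker
```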

Docker Setup

We install the latest Docker on the system.  That may not sound very exciting; however, Docker iterates faster than most Linux images so it’s important that we keep you current.
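For reference, the quickest way to get a current Docker onto most Linux images is Docker’s own convenience script (worth pinning or mirroring for production use):

```shell
# Installs the latest Docker release for the detected distribution.
curl -fsSL https://get.docker.com | sh
docker --version
```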

Master & Minion Kubernetes Nodes

Using etcd as a backend, the workload sets up one (or more) master nodes with the API server and other master services.  When the minions are configured, they are pointed to the master API server(s).  You get to choose how many masters and which systems become masters.  If you did not choose correctly, it’s easy to rinse and repeat.
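A hedged sketch of that wiring, using flags from the Kubernetes of that era (hostnames and addresses are illustrative):

```shell
# On a master node: the API server backed by the etcd cluster.
kube-apiserver \
  --etcd-servers=http://10.0.0.10:2379,http://10.0.0.11:2379,http://10.0.0.12:2379 \
  --service-cluster-ip-range=10.96.0.0/12

# On each minion: point kubelet and kube-proxy at the master API server.
kubelet --api-servers=http://k8s-master.example.com:8080
kube-proxy --master=http://k8s-master.example.com:8080
```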

Highly Available using DNS Round Robin

As the workload configures API servers, it also adds them to a DNS round robin pool (made possible by [new DNS integrations]).  Minions are configured to use the shared DNS name so that they automatically round-robin all the available API servers.  This ensures both load balancing and high availability.  The pool is automatically updated when you add or remove servers.
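The client-side effect can be sketched in a few lines of Python: with one shared name resolving to the whole pool, successive requests rotate across the available API servers (the addresses are illustrative):

```python
from itertools import cycle

# Hypothetical pool of API-server A records behind one shared DNS name.
# Real resolvers rotate the record order; cycling over the pool gives
# the same load-spreading effect.
API_POOL = ["10.0.0.10", "10.0.0.11", "10.0.0.12"]

def round_robin(pool):
    """Yield servers from the pool in rotating order."""
    yield from cycle(pool)

servers = round_robin(API_POOL)
picks = [next(servers) for _ in range(6)]
# Each server handles two of the six requests.
```

If a server is removed from the pool, new lookups simply stop returning it, which is what makes the shared name both a load balancer and an availability mechanism.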

Installed on Real Metal

It’s worth including that we’ve done cluster deployments of 20 physical nodes (with 80 in process!).  Since the OpenCrowbar architecture abstracts the vendor hardware, the configuration is multi-vendor and heterogeneous.  That means that this workload (and our others) delivers tangible scale implementations quickly and reliably.

Future Work for Advanced Networking

Flannel is really very basic SDN.  We’d like to see additional networking integrations, including OpenContrail as per Pedro Marques’ work.

At this time, we are not securing communication with etcd.  This requires key management and is a more advanced topic.

Why is RackN building this?  We are a physical ops automation company.

We are seeking to advance the state of data center operations by helping get complex scale platforms operationalized.  We want to work with the relevant communities to deliver repeatable best practices around next-generation platforms like Kubernetes.  Our speciality is in creating a general environment for ops success: we work with partners who are experts on using the platforms.

We want to engage with potential users before we turn this into an open community project; however, we’ve chosen to make the code public.  Please get involved (community forum)!  You’ll need a working OpenCrowbar or RackN Enterprise install as a prerequisite, and we want to help you be successful.

From Metal Foundation to FIVE new workloads in five weeks

The OpenCrowbar Drill release (which will likely become v2.3) is wrapping up in the next few weeks, and it’s been amazing to watch the RackN team validate our designs by pumping out workloads and API integrations (list below).

I’ve posted about the acceleration from having a ready state operations base and we’re proving that out.  Having an automated platform is critical for metal deployment because there is substantial tuning and iteration needed to complete installations in the field.

Getting software set up once is not a victory: that’s called a snowflake.

Real success is tearing it down and having it work the second, third and nth times.  That’s because scale ops is not about being able to install platforms.  It’s about operationalizing them.

Integration: the difference between install and operationalization.

When we build a workload, we are able to build up the environment one layer at a time.  For OpenCrowbar, that starts with a hardware inventory and works up through RAID/BIOS and O/S configuration.  After the O/S is ready, we are able to connect into the operational environment (SSH keys, NTP, DNS, proxy, etc.) and build real multi-switch/layer-2 topologies.  Next we coordinate multi-node actions like creating Ceph, Consul and etcd clusters so that the install is demonstrably correct across nodes and repeatable at every stage.  If something has to change, you can repeat the whole install or just the impacted layers.  That is what I consider integrated operation.
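The layering idea can be sketched in a toy model: a change re-runs only the impacted layer and everything built on top of it.  The layer names here are illustrative, not OpenCrowbar’s actual role names.

```python
# Ordered bottom-up: each layer depends on everything below it.
LAYERS = ["inventory", "raid-bios", "os", "access", "network", "cluster"]

def layers_to_rerun(changed_layer, layers=LAYERS):
    """Return the changed layer plus all layers built on top of it."""
    idx = layers.index(changed_layer)
    return layers[idx:]

# Changing the network layer leaves the hardware and O/S layers alone:
# layers_to_rerun("network") -> ["network", "cluster"]
```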

It’s not just automating a complex install.  We design to be repeatable site-to-site.

Here’s the list of workloads we’ve built on OpenCrowbar and for RackN in the last few weeks:

  1. Ceph (OpenCrowbar) with advanced hardware optimization and networking that synchronizes changes in monitors.
  2. Docker Swarm (RackN) (or DIY with Docker Machine on Metal)
  3. StackEngine (RackN) builds a multi-master cluster and connects all systems together.
  4. Kubernetes (RackN) that includes automatic highly available DNS configuration, Flannel networking and etcd cluster building.
  5. CloudFoundry on Metal via BOSH (RackN) uses pools of hardware that are lifecycle managed by OpenCrowbar, including powering off systems that are idle.
  6. I don’t count the existing RackN OpenStack via Packstack (RackN) workload because it does not directly leverage OpenCrowbar clustering or networking.  It could if someone wanted to help build it.

And… we also added a major DNS automation feature and updated the network mapping logic to work in environments where Crowbar does not manage the administrative networks (like inside clouds).  We’ve also been integrating deeply with Hashicorp Consul to allow true “ops service discovery.”

2015, the year cloud died. Meet the seven riders of the cloudocalypse

After writing pages of notes about the impact of Docker, microservice architectures, mainstreaming of Ops Automation, software defined networking, exponential data growth and the explosion of alternative hardware architecture, I realized that it all boils down to the death of cloud as we know it.

OK, we’re not killing cloud per se this year.  It’s more that we’ve put 10 pounds of cloud into a 5 pound bag so it’s just not working in 2015 to call it cloud.

Cloud was happily misunderstood back in 2012 as virtualized infrastructure wrapped in an API beside some platform services (like object storage).

That illusion will be shattered in 2015 as we fully digest the extent of the beautiful and complex mess that we’ve created in the search for better scale economics and faster delivery pipelines.  2015 is going to cause a lot of indigestion for CIOs, analysts and wandering technology executives.  No one can pick the winners with Decisive Leadership™ alone because there are simply too many possible right ways to solve problems.

Here’s my list of the seven cloud disrupting technologies and frameworks that will gain even greater momentum in 2015:

  1. Docker – I think that Docker is the face of a larger disruption around containers and packaging.  I’m sure Docker is not the thing alone.  There are a fleet of related technologies and Docker replacements; however, there’s no doubt that it’s leading a timely rethinking of application life-cycle delivery.
  2. New languages and frameworks – it’s not just the rapid maturity of Node.js and Go, but the frameworks and services that we’re building (like Cloud Foundry or Apache Spark) that change the way we use traditional languages.
  3. Microservice architectures – this is more than containers, it’s really Functional Programming for Ops (aka FuncOps) that’s a new generation of service oriented architecture that is being empowered by container orchestration systems (like Brooklyn or Fleet).  Using microservices well seems to redefine how we use traditional cloud.
  4. Mainstreaming of Ops Automation – We’re past “if DevOps” and into the how. Ops automation, not cloud, is the real puppies vs cattle battle ground.  As IT creates automation to better use clouds, we create application portability that makes cloud disappear.  This freedom translates into new choices (like PaaS, containers or hardware) for operators.
  5. Software defined networking – SDN means different things but the impacts are all the same: we are automating networking and integrating it into our deployments.  The days of networking and compute silos are ending and that’s going to change how we think about cloud and the supporting infrastructure.
  6. Exponential data growth – you cannot build applications or infrastructure without considering how your storage needs will grow as we absorb more data streams and internet of things sources.
  7. Explosion of alternative hardware architecture – In 2010, infrastructure was basically pizza box or blade from a handful of vendors.  Today, I’m seeing a rising tide of alternative architectures, including ARM, converged and storage-focused designs, from an increasing cadre of sources including vendors sharing open designs (OCP).  With improved automation, these new “non-cloud” options become part of the dynamic infrastructure spectrum.

Today these seven items create complexity and confusion as we work to balance the new concepts and technologies.  I can see a path forward that redefines IT to be both more flexible and dynamic while also being stable and performant.

Want more 2015 predictions?  Here’s my OpenStack EOY post about limiting/expanding the project scope.

9 scenarios to have prepared for a College Interview [from someone who does interviews]

This quick advice for preparing for a college interview is also useful for any interview: identify three key strengths and activities then prepare short insightful stories that show your strengths in each activity.  Stories are the strongest way to convey information.

I’ve been doing engineering college interviews since 2013 for my Alma mater, Duke University.  I love meeting the upcoming generation of engineers and seeing how their educational experiences will shape their future careers.  Sadly, I also find that few students are really prepared to showcase themselves well in these interviews.  Since it makes my job simpler if you are prepared, I’m going to post my recommendation for future interviews!

It does not take much to prepare for a college interview: you mainly need to be able to tell some short, detailed stories from your experiences that highlight your strengths.

In my experience, the best interviewees are good at telling short and specific stories that highlight their experiences and strengths.  It’s not that they have better experiences; they are just better prepared to showcase them.  Being prepared makes you more confident and comfortable, which helps you control how the interview goes and ensures that you leave the right impression.

1/9/15 Note: Control the interview?  Yes!  You should be planning to lead the interviewer to your strengths.  Don’t passively expect them to dig that information out of you.  It’s a two-way conversation, not an interrogation.

Here’s how it works:

  1. Identify three activities that you are passionate about.  They do not have to represent the majority of your effort.  Select ones that define who you are, or caused you to grow in some way.  They could be general items like “reading” or very specific like “summer camp 2016.”  You need to get excited when you talk about these items.  Put these on the rows/y-axis of a 3×3 grid (see below).
  2. Identify three attributes that describe you (you may want help from friends or parents here).  These words should be enough to give a fast snapshot of who you are.  In the example below, the person would be something like “an adventure seeking leader who values standing out as an individual.”  Put these attributes on the columns/x-axis of your grid as I’ve shown below.
  3. Come up with the nine short stories (3-6 sentences!) for the intersections on the grid where you demonstrated the key attribute during the activity.  They cannot just be statements – you must have stories because they provide easy to remember examples for your interview.  If you don’t have a story for an intersection, then talk about how you plan to work this in the future.

Note: This might feel repetitive when you construct your grid, but this technique works exceptionally well during an hour-long interview.  You should repeat yourself because you need to reinforce your strengths and leave the interviewer with a sure sense of who you are.

Interview Grid

Sample Grid


Remember: An admissions, alumni or faculty interview is all about making a strong impression about who you are and, more importantly, what you will bring to the university.

Having a concrete set of experiences and attributes makes sure that you reinforce your strengths.  By showing them in stories, you will create a much richer picture about who you are than if you simply assert statements about yourself.  Remember the old adage of “show, don’t tell.”

Don’t use this grid as the only basis for your interview!  It should be a foundation that you can come back to during your conversations with college representatives.  These are your key discussion points – let them help you round out the dialog.

Good luck!

PS: Google your interviewer!  I expect candidates to know me before they meet me.  It is perfectly normal, and you’d be crazy not to take advantage of that.

unBIOSed? Is Redfish an IPMI retread or can vendors find unification?

Server management interfaces stink.  They are inconsistent both between vendors and within their own product suites.  Ideally, vendors would agree on a single API; however, it’s not clear if the diversity is a product of competition or actual platform variation.  Likely, it’s both.

What is Redfish?  It’s a REST API for server configuration that aims to replace both IPMI and vendor-specific server interfaces (like WSMAN).  Here’s the official text from the Redfish site:

Redfish is a modern intelligent [server] manageability interface and lightweight data model specification that is scalable, discoverable and extensible.  Redfish is suitable for a multitude of end-users, from the datacenter operator to an enterprise management console.
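To give a flavor of the data model behind that REST API, here is a sketch that walks a Redfish-style Systems collection.  The property names follow the published Redfish schema, but the payload itself is fabricated for illustration:

```python
import json

# Illustrative response for GET /redfish/v1/Systems; real values vary
# by vendor, but the @odata.id / Members shape is part of the spec.
SAMPLE = json.loads("""
{
  "@odata.id": "/redfish/v1/Systems",
  "Members@odata.count": 2,
  "Members": [
    { "@odata.id": "/redfish/v1/Systems/1" },
    { "@odata.id": "/redfish/v1/Systems/2" }
  ]
}
""")

def member_paths(collection):
    """Extract the per-server resource paths from a Redfish collection."""
    return [m["@odata.id"] for m in collection["Members"]]

# member_paths(SAMPLE) -> ["/redfish/v1/Systems/1", "/redfish/v1/Systems/2"]
```

The appeal over IPMI is exactly this: discoverable, self-describing JSON that any HTTP client can walk, instead of binary commands over a side channel.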

I think that it’s great to see vendors trying to get on the same page and I’m optimistic that we could get something better than IPMI (that’s a very low bar).  However, I don’t expect that vendors can converge to a single API; it’s just not practical due to release times and pressures to expose special features.  I think the divergence in APIs is due both to competitive pressures and to real variance between platforms.

Even if we manage a grand server management unification, the problem of interface heterogeneity has a long legacy tail.

In the best case reality, we’re going from N versions to N+1 (and likely N*2) versions because the legacy gear is still around for a long time.  Adding Redfish means API sprawl is going to get worse until it gets back to being about the same as it is now.

Putting pessimism aside, the sprawl problem is severe enough that it’s worth supporting Redfish on the hope that it makes things better.

That’s easy to say, but expensive to do.  If I were making hardware (I left Dell in Oct 2014), I’d consider it an expensive investment for an uncertain return.  Even so, several major hardware players are stepping forward to help standardize.  I think Redfish has good ROI for smaller vendors: those looking to displace a major player can ride on the standard.

Redfish is GREAT NEWS for me since RackN/Crowbar provides hardware abstraction and heterogeneous interface support.  More API variation makes my work more valuable.

One final note: if Redfish improves hardware security in a real way then it could be a game changer; however, embedded firmware web servers can be tricky to secure and patch compared to larger application-focused software stacks.  This is one area where I’m hoping to see a lot of vendor collaboration!  [note: this should be its own subject – the security issue is more than API, it’s about system-wide configuration.  stay tuned!]

To thrive, OpenStack must better balance dev, ops and business needs.

OpenStack has grown dramatically in many ways but we have failed to integrate development, operations and business communities in a balanced way.

My most urgent observation from Paris is that these three critical parts of the community are having vastly different dialogs about OpenStack.

At the Conference, business people were talking about core, stability and utility while the developers were talking about features, reorganizing and expanding projects. The operators, unfortunately segregated in a different location, were trying to figure out how to share best practices and tools.

Much of this structural divergence was intentional and should be (re)evaluated as we grow.

OpenStack events are split into distinct focus areas: the conference for business people, the summit for developers and specialized days for operators. While this design serves a purpose, the community needs to be taking extra steps to ensure communication. Without that communication, corporate sponsors and users may find it easier to solve problems inside their walls than outside in the community.

The risk is clear: vendors may find it easier to work on a fork where they have business and operational control than work within the community.

Inside the community, we are working to help resolve this challenge with several parallel efforts. As a community member, I challenge you to get involved in these efforts to ensure the project balances dev, biz and ops priorities.  As a board member, I feel it’s a leadership challenge to make sure these efforts converge and that’s one of the reasons I’ve been working on several of these efforts:

  • OpenStack Project Managers (was Hidden Influencers) across companies in the ecosystem are getting organized into their own team. Since these managers effectively direct the majority of OpenStack developers, this group will allow better cross-company coordination of development priorities.
  • DefCore Committee works to define a smaller subset of the overall OpenStack Project that will be required for vendors using the OpenStack trademark and logo. This helps the business community focus on interoperability and stability.
  • Technical leadership (TC) lead “Big Tent” concept aligns with DefCore work and attempts to create a stable base platform while making it easier for new projects to enter the ecosystem. I’ve got a lot to say about this, but frankly, without safeguards, this scares people in the ops and business communities.
  • The lack of an operations “ready state” baseline keeps the community from being able to share best practices – this has become a pressing need.  I’d like to suggest OpenCrowbar, from outside of OpenStack, as a way to help provide an ops-neutral common starting point. Having the OpenStack developer community attempting to create an installer using OpenStack has proven a significant distraction and only further distances operators from the community.

We need to get past seeing the project primarily as a technology platform.  Infrastructure software has to deliver value as an operational tool for enterprises.  For OpenStack to thrive, we must make sure the needs of all constituents (Dev, Biz, Ops) are being addressed.