Composable Infrastructure – is there a balance between configurability and complexity?

Recently, I was part of a Composable Infrastructure briefing hosted by the enterprise side of HP and moderated by Tim Crawford. The event focused on dynamically (re)configurable hardware. This type of innovation points to a new generation of hardware with exciting capabilities; however, my takeaway is that we need more focus on the system-level challenges of operations when designing units of infrastructure. I’d like to see more focus on “composable ops” than on “composable hardware.”

What is “composable infrastructure?”

Basically, these new servers use a flexible interconnect bus between CPU, RAM, disk and network components that allows operators to connect a chassis’ internal commodity parts together on the fly. Conceptually, this allows operators to build fit-to-purpose servers via a chassis API. For example, an operator could build a four-CPU, high-RAM system for VMs on even days and then reconfigure it as a single-socket, four-drive big data node on odd days. That means we have to block out three extra CPUs and drives for this system even though it only needs them 50% of the time.
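To make that concrete, here is a purely hypothetical sketch of composing a node against a chassis pool. The ChassisPool class, resource names and sizes are my own illustration of the idea, not HP’s actual composable API.

```python
# Hypothetical sketch of "composing" a node from a chassis resource pool.
# The ChassisPool class, resource names and sizes are illustrative only --
# they are not HP's actual composable infrastructure API.

class ChassisPool:
    def __init__(self, cpus, ram_gb, drives):
        self.free = {"cpu": cpus, "ram_gb": ram_gb, "drive": drives}

    def compose(self, name, cpu, ram_gb, drive):
        """Reserve components for a fit-to-purpose node, or fail if the pool can't cover it."""
        want = {"cpu": cpu, "ram_gb": ram_gb, "drive": drive}
        if any(self.free[k] < v for k, v in want.items()):
            raise RuntimeError(f"cannot compose {name}: not enough free components")
        for k, v in want.items():
            self.free[k] -= v
        return {"name": name, **want}

    def decompose(self, node):
        """Return a node's components to the pool (the reconfigure-on-odd-days step)."""
        for k in ("cpu", "ram_gb", "drive"):
            self.free[k] += node[k]

pool = ChassisPool(cpus=8, ram_gb=512, drives=8)
vm_host = pool.compose("vm-host", cpu=4, ram_gb=256, drive=1)    # even days
pool.decompose(vm_host)
data_node = pool.compose("big-data", cpu=1, ram_gb=64, drive=4)  # odd days
```

The catch is that the components not in use at any given moment don’t disappear; they sit idle in the pool, which is exactly the packing problem below.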

I’ve seen designs like this before. They are very cool to review but not practical in scale operations.

My first concern is the inventory packing problem. You cannot make physical resources truly dynamic because they have no option for oversubscription. Useful composable infrastructure will likely have both a lot of idle capacity to service requests and isolated idle resource fragments from building ad hoc servers. It’s a classic inventory design trade-off between capacity and density.
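To put toy numbers on that trade-off (the figures below are made up, not measurements), consider two chassis that each have leftovers after a round of ad hoc composition:

```python
# Toy illustration (made-up numbers) of stranded capacity in composable chassis.
# Each chassis has spare parts left over after ad hoc compositions, but neither
# can satisfy the next request on its own, so the fragments sit idle.

chassis_free = [
    {"cpu": 3, "drive": 0},   # CPUs left, but every drive is consumed
    {"cpu": 0, "drive": 5},   # drives left, but every CPU is consumed
]
request = {"cpu": 2, "drive": 2}

total_free = {k: sum(c[k] for c in chassis_free) for k in request}
fits_somewhere = any(all(c[k] >= request[k] for k in request) for c in chassis_free)

print(total_free)      # {'cpu': 3, 'drive': 5} -- plenty in aggregate
print(fits_somewhere)  # False -- the fragments are isolated per chassis
```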

The pro-composable argument is that data centers already have a lot of idle capacity. While composable designs could allow tighter packing of resources, this argument misses the high cost of complexity in operations.

My second concern is complexity. What do scale operators want? Looking at Facebook’s Open Compute models, those operators want efficient, cheap and interchangeable parts. Scale operators are very clear: heterogeneity is expensive inside of a single infrastructure. It’s impossible to create a completely homogeneous data center, so operators must be able to cope with variation. As a rule, finding ways to limit variation increases predictability and scale efficiency.

What about small and mid-tier operators? If they need purpose-specific hardware then they will buy it. For general purpose tasks, virtual machines already offer composable flexibility and are much more portable. The primary lesson is that operational costs, both hard and soft, are much more significant than the fractional improvements from optimized hardware utilization.

I’m a believer that tremendous opportunities are created by hardware innovation. Composable hardware raises interesting questions; however, I get much more excited by Open Compute (OCP) designs because OCP puts operational concerns first.

How do platforms die? One step at a time [the Fidelity Gap]

The RackN team is working on the “Start to Scale” position for Digital Rebar that targets the IT industry-wide “fidelity gap” problem.  When we started on the Digital Rebar journey back in 2011 with Crowbar, we focused on “last mile” problems in metal and operations.  Only in the last few months did we recognize the importance of automating smaller “first mile” desktop and lab environments.

A fidelity gap is created when work done on one platform, a developer laptop, does not translate faithfully to the next platform, a QA lab. Since there are gaps at each stage of deployment, we end up with the ops staircase of despair.

These gaps hide defects until they are expensive to fix and make it hard to share improvements.  Even worse, they keep teams from collaborating.

With everyone trying out Container Orchestration platforms like Kubernetes, Docker Swarm, Mesosphere or Cloud Foundry (all of which we deploy, btw), it’s important that we can gracefully scale operational best practices.

For companies implementing containers, it’s not just about turning their apps into microservice-enabled, immutable rock stars: they also need to figure out how to implement the underlying platforms at scale.

My example of fidelity gap harm is OpenStack’s “all in one, single node” DevStack.  There is no useful single system OpenStack deployment; however, that is the primary system for developers and automated testing.  This design hides production defects and usability issues from developers.  These are issues that would be exposed quickly if the community required multi-instance development.  Even worse, it keeps developers from dealing with operational consequences of their decisions.

What are we doing about fidelity gaps? We’ve made it possible to run and faithfully provision multi-node systems in Digital Rebar on a relatively light system (16 GB RAM, 4 cores) using VMs or containers. That system can then be fully automated with Ansible, Chef, Puppet and Salt. Because of our abstractions, if a deployment works in Digital Rebar then it can scale up to 100s of physical nodes.

My takeaway? If you want to get to scale, start with the end in mind.

RackN fills holes with Drill Release


We’re so excited about our in-process release that we’ve been relatively quiet about the last OpenCrowbar Drill release (video tour here). That’s not a fair reflection of the level of capability and maturity reflected in the code base; yes, Drill’s purpose was to set the stage for truly groundbreaking ops automation work in the next release (“Epoxy”).

So, what’s in Drill?  Scale and Containers on Metal Workloads!  [official release notes]

The primary focus for this release was proving our functional operations architectural pattern against a wide range of workloads, and that is exactly what the RackN team has been doing with Ceph, Docker Swarm, Kubernetes, CloudFoundry and StackEngine workloads.

In addition to workloads, we put the platform through its paces in real ops environments at scale.  That resulted in even richer network configurations and options plus performance…


Is there something between a Container and VM? Apparently, yes.

The RackN team has started designing reference architectures for containers on metal, with the hope of finding a hardware design that is cost and performance optimized for containers instead of simply repurposing premium virtualized cloud infrastructure. That discussion turned up something unexpected…

That post generated a Twitter thread that surfaced ClearLinux, among others, as hardware-enabled (Intel VT-x) alternatives to containers.

This container alternative likely escapes the notice of many because it requires hardware capabilities that are not exposed (or only partially exposed) inside cloud virtual machines; however, it could be a very compelling story for operators looking for containers on metal.

Here’s my basic understanding: these technologies offer container-like lightweight and elastic behavior with the isolation provided by virtual machines. This is possible because they use CPU capabilities to isolate environments.
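A practical consequence is that they only work where those CPU capabilities are actually visible to the host. Here’s a quick, illustrative Linux check (my own sketch, not part of any of these projects):

```python
# Quick Linux check for the CPU flags that hardware-isolated containers rely on.
# On bare metal these normally show up; inside most cloud VMs the vmx/svm flags
# are hidden unless nested virtualization is enabled.

def hardware_virt_flags(cpuinfo_path="/proc/cpuinfo"):
    with open(cpuinfo_path) as f:
        for line in f:
            if line.startswith("flags"):
                flags = set(line.split(":", 1)[1].split())
                return {"vmx", "svm"} & flags   # Intel VT-x or AMD-V
    return set()

print(hardware_virt_flags() or "no hardware virtualization exposed")
```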

7/3 Update: Feedback about this post has largely been “making it easier for VMs to run docker automatically is not interesting.”  What’s your take on it?

Details behind RackN Kubernetes Workload for OpenCrowbar

Since I’ve already bragged about how this workload validates OpenCrowbar’s deep ops impact, I can get right down to the nuts and bolts of what RackN CTO Greg Althaus managed to pack into this workload.

Like any scale install, once you’ve got a solid foundation, the actual installation goes pretty quickly.  In Kubernetes’ case, that means creating strong networking and etcd configuration.

Here’s a 30 minute video showing the complete process from O/S install to working Kubernetes:

Here are the details:

Clustered etcd – distributed key-value store

etcd is the central data service that maintains the state for the Kubernetes deployment.  The strength of the installation rests on the correctness of etcd.  The workload builds an etcd cluster and synchronizes all the instances as nodes are added.
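The post doesn’t show the wiring, but a static etcd bootstrap boils down to handing every member the same --initial-cluster list. Here is a rough sketch that renders standard etcd flags for a set of members; the node names and IPs are placeholders, not what the workload actually generates:

```python
# Sketch: render static-bootstrap flags for each etcd cluster member.
# The flags are standard etcd options; the node names and IPs below are
# placeholders, not values produced by the OpenCrowbar workload.

nodes = {"node-1": "10.0.0.11", "node-2": "10.0.0.12", "node-3": "10.0.0.13"}

initial_cluster = ",".join(f"{name}=http://{ip}:2380" for name, ip in nodes.items())

for name, ip in nodes.items():
    flags = [
        f"--name {name}",
        f"--initial-advertise-peer-urls http://{ip}:2380",
        f"--listen-peer-urls http://{ip}:2380",
        f"--listen-client-urls http://{ip}:2379,http://127.0.0.1:2379",
        f"--advertise-client-urls http://{ip}:2379",
        f"--initial-cluster {initial_cluster}",
        "--initial-cluster-state new",
    ]
    print(f"etcd {' '.join(flags)}")
```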

Networking with Flannel and Proxy

Flannel is the default overlay network for Kubernetes; it handles IP assignment and inter-container communication with UDP encapsulation. The workload configures Flannel for networking with etcd as the backing store.
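For reference, Flannel reads its settings from a single etcd key (/coreos.com/network/config by default). Here is a minimal sketch of seeding that key; the subnet is a placeholder, and this isn’t necessarily how the workload writes the value:

```python
# Sketch: seed flannel's network config in etcd using the etcd v2 HTTP API.
# Flannel's default config key is /coreos.com/network/config; the subnet below
# is a placeholder, not the network the workload actually assigns.

import json
import urllib.parse
import urllib.request

config = {"Network": "10.244.0.0/16", "Backend": {"Type": "udp"}}

data = urllib.parse.urlencode({"value": json.dumps(config)}).encode()
req = urllib.request.Request(
    "http://127.0.0.1:2379/v2/keys/coreos.com/network/config",
    data=data,
    method="PUT",
)
with urllib.request.urlopen(req) as resp:
    print(resp.status)  # 201 on create, 200 on update
```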

An important part of the overall networking setup is the configuration of a proxy so that the nodes can get external access to Docker image repos.
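The post doesn’t spell out how the proxy is injected. One common pattern, sketched here with placeholder proxy values and not claimed to be what the workload does, is a systemd drop-in for the Docker daemon:

```python
# Sketch: give the Docker daemon a proxy for image pulls via a systemd drop-in.
# The proxy host and NO_PROXY list are placeholders; this is one common pattern,
# not necessarily how the OpenCrowbar workload wires the proxy in.

from pathlib import Path

proxy = "http://proxy.example.local:3128"        # placeholder admin proxy
no_proxy = "127.0.0.1,localhost,.example.local"  # placeholder bypass list

dropin = Path("/etc/systemd/system/docker.service.d/http-proxy.conf")
dropin.parent.mkdir(parents=True, exist_ok=True)
dropin.write_text(
    "[Service]\n"
    f'Environment="HTTP_PROXY={proxy}"\n'
    f'Environment="HTTPS_PROXY={proxy}"\n'
    f'Environment="NO_PROXY={no_proxy}"\n'
)
# Then reload and restart: systemctl daemon-reload && systemctl restart docker
```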

Docker Setup

We install the latest Docker on the system. That may not sound very exciting; however, Docker iterates faster than the versions packaged in most Linux images, so it’s important that we keep you current.

Master & Minion Kubernetes Nodes

Using etcd as a backend, the workload sets up one (or more) master nodes with the API server and other master services.  When the minions are configured, they are pointed to the master API server(s).  You get to choose how many masters and which systems become masters.  If you did not choose correctly, it’s easy to rinse and repeat.

Highly Available using DNS Round Robin

As the workload configures API servers, it also adds them to a DNS round robin pool (made possible by [new DNS integrations]).  Minions are configured to use the shared DNS name so that they automatically round-robin all the available API servers.  This ensures both load balancing and high availability.  The pool is automatically updated when you add or remove servers.
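As a rough illustration of why this works (the DNS name and port below are placeholders, not the records the workload registers), a shared name with one A record per API server hands a client the whole pool on every lookup:

```python
# Illustration: a round-robin DNS name with one A record per API server returns
# the whole pool on each lookup.  The name and port are placeholders, not the
# records the workload actually registers.

import random
import socket

def api_server_pool(name="k8s-masters.example.local", port=6443):
    """Resolve the shared name and return every (address, port) pair in the pool."""
    infos = socket.getaddrinfo(name, port, proto=socket.IPPROTO_TCP)
    return sorted({info[4] for info in infos})

pool = api_server_pool()
print(pool)                 # every registered API server
print(random.choice(pool))  # a simple way for a minion to spread its load
```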

Installed on Real Metal

It’s worth including that we’ve done cluster deployments of 20 physical nodes (with 80 in process!). Since the OpenCrowbar architecture abstracts the vendor hardware, the configuration is multi-vendor and heterogeneous. That means that this workload (and our others) delivers tangible scale implementations quickly and reliably.

Future Work for Advanced Networking

Flannel is really a very basic SDN. We’d like to see additional networking integrations, including OpenContrail as per Pedro Marques’ work.

At this time, we are not securing communication with etcd. That requires key management, which is a more advanced topic.

Why is RackN building this?  We are a physical ops automation company.

We are seeking to advance the state of data center operations by helping get complex scale platforms operationalized.  We want to work with the relevant communities to deliver repeatable best practices around next-generation platforms like Kubernetes.  Our speciality is in creating a general environment for ops success: we work with partners who are experts on using the platforms.

We want to engage with potential users before we turn this into an open community project; however, we’ve chosen to make the code public. Please get us involved (community forum)! You’ll need a working OpenCrowbar or RackN Enterprise install as a pre-req, and we want to help you be successful.

From Metal Foundation to FIVE new workloads in five weeks

The OpenCrowbar Drill release (which will likely become v2.3) is wrapping up in the next few weeks, and it’s been amazing to watch the RackN team validate our designs by pumping out workloads and API integrations (list below).

I’ve posted about the acceleration from having a ready state operations base and we’re proving that out.  Having an automated platform is critical for metal deployment because there is substantial tuning and iteration needed to complete installations in the field.

Getting software set up once is not a victory: that’s called a snowflake.

Real success is tearing it down and having it work the second, third and nth times. That’s because scale ops is not about being able to install platforms. It’s about operationalizing them.

Integration: the difference between install and operationalization.

When we build a workload, we are able to build up the environment one layer at a time. For OpenCrowbar, that starts with a hardware inventory and works up through RAID/BIOS and O/S configuration. After the OS is ready, we are able to connect into the operational environment (SSH keys, NTP, DNS, proxy, etc.) and build real multi-switch/layer-2 topologies. Next, we coordinate multi-node actions like creating Ceph, Consul and etcd clusters so that the install is demonstrably correct across nodes and repeatable at every stage. If something has to change, you can repeat the whole install or just the impacted layers. That is what I consider integrated operation.
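To make the “just the impacted layers” point concrete, here is a conceptual sketch of an ordered, idempotent layer pipeline. It is my illustration of the idea, not OpenCrowbar’s actual data model:

```python
# Conceptual sketch of layered, repeatable provisioning: each layer is an
# idempotent step, and a change re-runs only that layer and everything above it.
# This illustrates the idea; it is not OpenCrowbar's actual data model.

LAYERS = [
    "hardware-inventory",
    "raid-bios",
    "os-install",
    "ops-environment",   # SSH keys, NTP, DNS, proxy
    "networking",        # multi-switch / layer-2 topology
    "cluster-services",  # Ceph, Consul and etcd clusters
    "workload",
]

def converge(changed_layer=None):
    """Apply all layers, or only the changed layer plus the layers above it."""
    start = 0 if changed_layer is None else LAYERS.index(changed_layer)
    for layer in LAYERS[start:]:
        print(f"applying {layer}")   # each step must be safe to re-run

converge()               # the full install
converge("networking")   # a network change only redoes the upper layers
```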

It’s not just automating a complex install.  We design to be repeatable site-to-site.

Here’s the list of workloads we’ve built on OpenCrowbar and for RackN in the last few weeks:

  1. Ceph (OpenCrowbar) with advanced hardware optimization and networking that synchronizes changes in monitors.
  2. Docker Swarm (RackN) (or DIY with Docker Machine on Metal)
  3. StackEngine (RackN) builds a multi-master cluster and connects all systems together.
  4. Kubernetes (RackN) that includes automatic highly available DNS configuration, Flannel networking and etcd cluster building.
  5. CloudFoundry on Metal via BOSH (RackN) uses pools of hardware that are lifecycle managed by OpenCrowbar, including powering off systems that are idle.
  6. I don’t count the existing RackN OpenStack via Packstack (RackN) workload because it does not directly leverage OpenCrowbar clustering or networking.  It could if someone wanted to help build it.

And… we also added a major DNS automation feature and updated the network mapping logic to work in environments where Crowbar does not manage the administrative networks (like inside clouds). We’ve also been integrating deeply with HashiCorp Consul to allow true “ops service discovery.”