Beyond Expectations: OpenStack via Kubernetes Helm (Fully Automated with Digital Rebar)

RackN revisits OpenStack deployments with an eye on ongoing operations.

I’ve been an outspoken skeptic of a Joint OpenStack Kubernetes Environment (my OpenStack BCN presoSuper User follow-up and BOS Proposal) because I felt that the technical hurdles of cloud native architecture would prove challenging.  Issues like stable service positioning and persistent data are requirements for OpenStack and hard problems in Kubernetes.

I was wrong: I underestimated how fast these issues could be addressed.

youtube-thumb-nail-openstackThe Kubernetes Helm work out of the AT&T Comm Dev lab takes on the integration with a “do it the K8s native way” approach that the RackN team finds very effective.  In fact, we’ve created a fully integrated Digital Rebar deployment that lays down Kubernetes using Kargo and then adds OpenStack via Helm.  The provisioning automation includes a Ceph cluster to provide stateful sets for data persistence.  

This joint approach dramatically reduces operational challenges associated with running OpenStack without taking over a general purpose Kubernetes infrastructure for a single task.

sre-seriesGiven the rise of SRE thinking, the RackN team believes that this approach changes the field for OpenStack deployments and will ultimately dominate the field (which is already  mainly containerized).  There is still work to be completed: some complex configuration is required to allow both Kubernetes CNI and Neutron to collaborate so that containers and VMs can cross-communicate.

We are looking for companies that want to join in this work and fast-track it into production.  If this is interesting, please contact us at sre@rackn.com.

Why should you sponsor? Current OpenStack operators facing “fork-lift upgrades” should want to find a path like this one that ensures future upgrades are baked into the plan.  This approach provide a fast track to a general purpose, enterprise grade, upgradable Kubernetes infrastructure.

Closing note from my past presentations: We’re making progress on the technical aspects of this integration; however, my concerns about market positioning remain.

OpenCrowbar.Anvil released – hammering out a gold standard in open bare metal provisioning

OpenCrowbarI’m excited to be announcing OpenCrowbar’s first release, Anvil, for the community.  Looking back on our original design from June 2012, we’ve accomplished all of our original objectives and more.
Now that we’ve got the foundation ready, our next release (OpenCrowbar Broom) focuses on workload development on top of the stable Anvil base.  This means that we’re ready to start working on OpenStack, Ceph and Hadoop.  So far, we’ve limited engagement on workloads to ensure that those developers would not also be trying to keep up with core changes.  We follow emergent design so I’m certain we’ll continue to evolve the core; however, we believe the Anvil release represents a solid foundation for workload development.
There is no more comprehensive open bare metal provisioning framework than OpenCrowbar.  The project’s focus on a complete operations model that comprehends hardware and network configuration with just enough orchestration delivers on a system vision that sets it apart from any other tool.  Yet, Crowbar also plays nicely with others by embracing, not replacing, DevOps tools like Chef and Puppet.
Now that the core is proven, we’re porting the Crowbar v1 RAID and BIOS configuration into OpenCrowbar.  By design, we’ve kept hardware support separate from the core because we’ve learned that hardware generation cycles need to be independent from the operations control infrastructure.  Decoupling them eliminates release disruptions that we experienced in Crowbar v1 and­ makes it much easier to use to incorporate hardware from a broad range of vendors.
Here are some key components of Anvil
  • UI, CLI and API stable and functional
  • Boot and discovery process working PLUS ability to handle pre-populating and configuration
  • Chef and Puppet capabilities including Birk Shelf v3 support to pull in community upstream DevOps scripts
  • Docker, VMs and Physical Servers
  • Crowbar’s famous “late-bound” approach to configuration and, critically, networking setup
  • IPv6 native, Ruby 2, Rails 4, preliminary scale tuning
  • Remarkably flexible and transparent orchestration (the Annealer)
  • Multi-OS Deployment capability, Ubuntu, CentOS, or Different versions of the same OS
Getting the workloads ported is still a tremendous amount of work but the rewards are tremendous.  With OpenCrowbar, the community has a new way to collaborate and integration this work.  It’s important to understand that while our goal is to start a quarterly release cycle for OpenCrowbar, the workload release cycles (including hardware) are NOT tied to OpenCrowbar.  The workloads choose which OpenCrowbar release they target.  From Crowbar v1, we’ve learned that Crowbar needed to be independent of the workload releases and so we want OpenCrowbar to focus on maintaining a strong ops platform.
This release marks four years of hard-earned Crowbar v1 deployment experience and two years of v2 design, redesign and implementation.  I’ve talked with DevOps teams from all over the world and listened to their pains and needs.  We have a long way to go before we’re deploying 1000 node OpenStack and Hadoop clusters, OpenCrowbar Anvil significantly moves the needle in that direction.
Thanks to the Crowbar community (Dell and SUSE especially) for nurturing the project, and congratulations to the OpenCrowbar team getting us this to this amazing place.

 

Success Factors of Operating Open Source Infrastructure [Series Intro]

2012-10-28_14-13-24_502Building a best practices platform is essential to helping companies share operations knowledge.   In the fast-moving world of open source software, sharing documentation about what to do is not sufficient.  We must share the how to do it also because the operations process is tightly coupled to achieving ongoing success.

Further, since change is constant, we need to change our definition of “stability” to reflect a much more iterative and fluid environment.

Baseline testing is an essential part of this platform. It enables customers to ensure not only fast time to value, but the tests are consistently conforming with industry best practices, even as the system is upgraded and migrates towards a continuous deployment infrastructure.

The details are too long for a single post so I’m going to explore this as three distinct topics over the next two weeks.

  1. Reference Deployments talks about needed an automated way to repeat configuration between sites.
  2. Ops Validation using Development Tests talks about having a way to verify that everyone uses a common reference platform
  3. Shared Open Operatons / DevOps (pending) talks about putting reference deployment and common validation together to create a true open operations practice.

OpenStack, Hadoop, Ceph, Docker and other open source projects are changing the landscape for information technology. Customers seeking to become successful with these evolving platforms must look beyond the software bits, and consider both the culture and operations.  The culture is critical because interacting with the open source projects community (directly or through a proxy) can help ensure success using the software. Operations are critical because open source projects expect the community to help find and resolve issues. This results in more robust and capable products. Consequently, users of open source software must operate in a more fluid environment.

My team at Dell saw this need as we navigated the early days of OpenStack.  The Crowbar project started because we saw that the community needed a platform that could adapt and evolve with the open source projects that our advanced customers were implementing. Our ability to deliver an open operations platform enables the community to collaborate, and to skip over routine details to refocus on shared best practices.

My recent focus on the OpenStack DefCore work reinforces these original goals.  Using tests to help provide a common baseline is a concrete, open and referenceable way to promote interoperability.  I hope that this in turn drives a dialog around best practices and shared operations because those help mature the community.

Crowbar HK Hack Report

Purple Fuzzy H for Hackathon (and Havana)Overall, I’m happy with our three days of hacking on Crowbar 2.  We’ve reached the critical “deploys workload” milestone and I’m excited about well the design is working and how clearly we’ve been able to articulate our approach in code & UI.

Of course, it’s worth noting again that Crowbar 1 has also had significant progress on OpenStack Havana workloads running on Ubuntu, Centos/RHEL, and SUSE/SLES

Here are the focus items from the hack:

  • Documentation – cleaned up documentation specifically by updating the README in all the projects to point to the real documentation in an effort to help people find useful information faster.  Reminder: if unsure, put documentation in barclamp-crowbar/doc!
  • Docker Integration for Crowbar 2 progress.  You can now install Docker from internal packages on an admin node.  We have a strategy for allowing containers be workload nodes.
  • Ceph installed as workload is working.  This workload revealed the need for UI improvements and additional flags for roles (hello “cluster”)
  • Progress on OpenSUSE and Fedora as Crowbar 2 install targets.  This gets us closer to true multi-O/S support.
  • OpenSUSE 13.1 setup as a dev environment including tests.  This is a target working environment.
  • Being 12 hours offset from the US really impacted remote participation.

One thing that became obvious during the hack is that we’ve reached a point in Crowbar 2 development where it makes sense to move the work into distinct repositories.  There are build, organization and packaging changes that would simplify Crowbar 2 and make it easier to start using; however, we’ve been trying to maintain backwards compatibility with Crowbar 1.  This is becoming impossible; consequently, it appears time to split them.  Here are some items for consideration:

  1. Crowbar 2 could collect barclamps into larger “workload” repos so there would be far fewer repos (although possibly still barclamps within a workload).  For example, there would be a “core” set that includes all the current CB2 barclamps.  OpenStack, Ceph and Hadoop would be their own sets.
  2. Crowbar 2 would have a clearly named “build” or “tools” repo instead of having it called “crowbar”
  3. Crowbar 2 framework would be either part of “core” or called “framework”
  4. We would put these in a new organization (“Crowbar2” or “Crowbar-2”) so that the clutter of Crowbar’s current organization is avoided.

While we clearly need to break apart the repo, this suggestion needs community more discussion!