Sirens of Open Infrastructure beckon to the OpenStack Community

OpenStack is a real platform doing real work for real users.  So why does OpenStack have a reputation for not working?  It falls into the lack-of-core-focus paradox: trying to be too much to too many undermines your ability to do anything well.  In this case, we keep conflating the community and the code.

I have a long history with the project but have been largely outside of it (yay, Kubernetes!) for the last 18 months.  That perspective helps me feel like I’m getting closer to the answer after spending a few days with the community at the latest OpenStack Summit in Sydney, Australia.  While I love to think about the why, what the leaders are doing about it is just as interesting.

Fundamentally, OpenStack’s problem is that infrastructure automation is too hard and big to be solved within a single effort.  

It’s so big that any workable solution will fail for a sizable number of hopeful operators.  That does not keep people from the false aspiration that OpenStack code will perfectly fit their needs (especially if they are unwilling to trim their requirements).

But the problem is not inflated expectations for OpenStack VM IaaS code; it’s that we keep feeding them.  I have been a long-time champion of a small core with a clear ecosystem boundary.  When OpenStack code claims support for other use cases, it invites disappointment and frustration.

So why is the OpenStack Foundation moving to expand its scope as an Open Infrastructure community with additional focus areas?  It’s simple: the community is asking it to.

Within the vast space of infrastructure automation, there are clusters of aligned interest.  These clusters are sufficiently narrow that they can collaborate on shared technologies and practices.  They also partially overlap (picture a Venn diagram) with adjacencies where OpenStack is already present.  There is a strong economic and social drive for members of these overlapping communities to bridge together instead of creating new, disparate groups.  Having the OpenStack Foundation organize these efforts is a natural and expected function.

The danger of this expansion comes from also carrying the expectation that the technology (code) will be carried into the adjacencies.  That is my exact rationale for why the original VM IaaS needs to be smaller.  The wealth of non-core projects crosses clusters of interests.  Instead of allowing these clusters to optimize around their shared interests, users get the impression that they must broadly adopt unneeded or poorly fitting components.  The idea of “competitive” projects should be reframed because they may overlap in function but not in use-case fit.

It’s long past time to give up expectations that OpenStack is a “one-stop-shop” of infrastructure automation.  In my opinion, it undermines the community mission by excluding adjacencies.

I believe that OpenStack must work to embrace its role as an open infrastructure community; however, it must also do the hard work to create welcoming space for adjacencies.  These adjacencies will compete with existing projects currently under the OpenStack code tent.  The community needs to accept that the hard work done so far may simply be sunk cost for new use cases.

It’s the OpenStack community and the experience, not the code, that creates long term value.

November 10 – Weekly Recap of all things Digital Rebar and RackN

Welcome to the weekly RackN blog recap of all things Digital Rebar, RackN, SRE, and DevOps. If you have any ideas for this recap or would like to include content, please contact us at info@rackn.com or tweet Rob (@zehicle) or RackN (@rackngo).

Items of the Week

Digital Rebar

Digital Rebar Releases V3.2 – Stage Workflow

In v3.2, Digital Rebar continues to refine the groundbreaking provisioning workflow introduced in v3.1. Updates make the workflow easier for external systems like Terraform to consume. We’ve also improved the consistency and performance of both the content and the service.

The release of workflow and the addition of inventory mean that Digital Rebar v3 effectively replaces all the key functions of v2 with a significantly smaller footprint, a minimal learning curve and improved performance. One major v2 feature, multi-node coordination, is not on any roadmap for v3 because we believe those use cases are well served by upstack integrations like Terraform and Ansible. Full Post
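To make “easier for external systems to consume” concrete, here is a hypothetical sketch of how an outside tool could walk the provisioner’s machine list over its REST API and push each machine into a named workflow stage. The endpoint paths, the Uuid/Stage field names, and the auth handling below are illustrative assumptions rather than the documented v3.2 API; check the release notes for the real calls.

```python
import requests

# Hypothetical sketch: endpoints, field names, and auth are assumptions for
# illustration, not the documented Digital Rebar v3.2 API.
DRP = "https://provisioner.example.local:8092"
HEADERS = {"Authorization": "Bearer <token>"}  # placeholder credentials

# Discover the machines the provisioner knows about -- the same first step an
# external tool like Terraform would take before deciding what to build.
machines = requests.get(f"{DRP}/api/v3/machines", headers=HEADERS).json()

for machine in machines:
    # Advance each machine into a named workflow stage; the orchestration
    # logic (which stage, and in what order) lives in the external system.
    requests.patch(
        f"{DRP}/api/v3/machines/{machine['Uuid']}",
        headers=HEADERS,
        json=[{"op": "replace", "path": "/Stage", "value": "ubuntu-install"}],
    )
```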

RackN

Joining this week’s L8ist Sh9y Podcast is Zach Smith, CEO of Packet and long-time champion of bare metal hardware. Rob Hirschfeld and Zach discuss the trends in bare metal, the impact of AWS changing the way developers view infrastructure, and issues between networking and server groups in IT organizations. (Blog with Topics and Times)

OpenStack Summit Sydney

Rob Hirschfeld and Ihor Dvoretskyi presented “Building Kubernetes based highly Customizable Environments on OpenStack with Kubespray.” Full Post

https://www.slideshare.net/RackN/slideshelf

UPCOMING EVENTS

Rob Hirschfeld and Greg Althaus are preparing for a series of upcoming events where they are speaking or just attending. If you are interested in meeting with them at these events, please email info@rackn.com.

If you are attending any of these events, please reach out to Rob Hirschfeld to set up time to learn more about our solutions or discuss the latest industry trends.

Building Kubernetes based highly customizable environments on OpenStack with Kubespray

This talk was given on November 8 at the OpenStack Summit Sydney event.

Abstract

Kubespray (formerly Kargo) is a project under the Kubernetes community umbrella. On the technical side, it is a set of tools that makes it easy to deploy a production-ready Kubernetes cluster.

Kubespray supports multiple Linux distributions for hosting Kubernetes clusters (including Ubuntu, Debian, CentOS/RHEL and Container Linux by CoreOS) and multiple cloud providers as an underlay for the cluster deployment (AWS, DigitalOcean, GCE, Azure and OpenStack), together with the ability to use bare metal installations. It can consume Docker or rkt as the container runtime for containerized workloads, along with a wide variety of networking plugins (Flannel, Weave, Calico and Canal), or use the built-in cloud provider networking instead.

In this talk we will describe the options for using Kubespray to build Kubernetes environments on OpenStack and how you can benefit from it.
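For readers who have not used it, a typical Kubespray run boils down to pointing Ansible at an inventory and running the cluster playbook. The sketch below (Python driving ansible-playbook via subprocess) assumes the conventional inventory layout and the cloud_provider and kube_network_plugin variables; paths and variable names can differ between Kubespray releases, so treat it as a rough outline rather than the exact commands from the talk.

```python
import subprocess

# Rough sketch of a typical Kubespray invocation; inventory paths and variable
# names follow common Kubespray conventions and may differ between releases.
inventory = "inventory/mycluster/hosts.ini"   # copied from inventory/sample
extra_vars = {
    "cloud_provider": "openstack",    # enable the OpenStack cloud provider
    "kube_network_plugin": "calico",  # one of the supported networking plugins
}

cmd = ["ansible-playbook", "-i", inventory, "cluster.yml", "-b"]
for key, value in extra_vars.items():
    cmd += ["-e", f"{key}={value}"]

subprocess.run(cmd, check=True)  # run from the root of a Kubespray checkout
```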

What can I expect to learn?

Active Kubernetes community members, Ihor Dvoretskyi and Rob Hirschfeld, will highlight the benefits of running Kubernetes on top of OpenStack, and will describe how Kubespray may simplify the cluster building and management options for these use-cases.

Complete presentation

Slides
https://www.slideshare.net/RackN/slideshelf

Speakers

Ihor Dvoretskyi

Ihor is a Developer Advocate at the Cloud Native Computing Foundation (CNCF), focused on upstream Kubernetes-related efforts. He acts as a Product Manager in the Kubernetes community, leading the Product Management Special Interest Group with the goal of growing Kubernetes as the #1 open source container orchestration platform.

Rob Hirschfeld

Rob Hirschfeld has been involved in OpenStack since the earliest days, with a focus on ops and building the infrastructure that powers cloud and storage.  He’s also co-chair of the Kubernetes Cluster Ops SIG and a four-term OpenStack board member.

 

Go CI/CD and Immutable Infrastructure for Edge Computing Management

In our last post, we pretty much tore apart the idea of running mini-clouds on the edge because they are not designed to be managed at scale in resource constrained environments without deep hardware automation.  While I’m a huge advocate of API-driven infrastructure, I don’t believe in a one-size-fits-all API because a good API provides purpose-driven abstractions.

The logical extension is that having deep hardware automation means there’s no need for cloud (aka virtual infrastructure) APIs.  This is exactly what container-focused customers have been telling us at RackN in regular data centers so we’d expect the same to apply for edge infrastructure.

If “cloudification” is not the solution then where should we look for management patterns?  

We believe that software development CI/CD and immutable infrastructure patterns are well suited to edge infrastructure use cases.  We discussed this at a session at the OpenStack OpenDev Edge summit.

Continuous Integration / Continuous Delivery (CI/CD) software pipelines help to manage environments where the risk of making changes is significant by breaking the changes into small, verifiable units.  This is essential for edge because lack of physical access makes it very hard to mitigate problems.  Using CI/CD, especially with A/B testing, allows for controlled rolling distribution of new software.  

For example, in a 10,000 site deployment, the CI/CD infrastructure would continuously roll out updates and patches over the entire system.  Small incremental changes reduce the risk of a major flaw being introduced.  The effect is enhanced when changes are rolled slowly over the entire fleet instead of simultaneously rolled out to all sites (known as A/B or blue/green testing).  In the rolling deployment scenario, breaking changes can be detected and stopped before they have significant impacts.
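As a minimal sketch of that rolling pattern (the deploy, health-check, and rollback hooks below are placeholders rather than any particular product’s API), a pipeline can walk the fleet in small batches and halt the rollout the moment a batch fails verification:

```python
import random

def deploy(site: str, version: str) -> None:
    """Stand-in for pushing an image or package to one site."""
    print(f"deploying {version} to {site}")

def healthy(site: str) -> bool:
    """Stand-in health probe; a real pipeline would query monitoring."""
    return random.random() > 0.001  # pretend ~0.1% of sites surface a fault

def rollback(sites: list[str], version: str) -> None:
    """Stand-in for reverting a batch to the previous known-good image."""
    print(f"rolling back {len(sites)} sites from {version}")

def rolling_rollout(sites: list[str], version: str, batch_size: int = 100) -> bool:
    """Push a release across the fleet in small batches, halting on failure."""
    for start in range(0, len(sites), batch_size):
        batch = sites[start:start + batch_size]
        for site in batch:
            deploy(site, version)
        if not all(healthy(site) for site in batch):
            rollback(batch, version)   # stop before the flaw spreads further
            return False
    return True

if __name__ == "__main__":
    fleet = [f"site-{n:05d}" for n in range(10_000)]
    rolling_rollout(fleet, "v1.2.3")
```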

These processes and the support software systems are already in place for large scale cloud software deployments.  There are likely gaps around physical proximity and heterogeneity; however, the process is there and initial use-case fit seems to be very good.

Immutable Infrastructure is a catch-all term for deployments based on images instead of configuration.  This concept is popular in cloud deployments where teams produce “golden” VM or container images that contain the exact versions of software needed and are then provisioned with minimal secondary configuration.  In most cases, the images only need a small file injected (known as cloud-init) to complete the process.

In this immutable pattern, images are never updated post deployment; instead, instances are destroyed and recreated.  It’s a deploy, destroy, repeat process.  At RackN, we’ve been able to adapt Digital Rebar Provisioning to support this even at the hardware layer where images are delivered directly to disk and re-provisioning happens on a constant basis just like a cloud managing VMs.
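Here is a minimal sketch of that deploy, destroy, repeat loop with every provisioning call stubbed out (the function names are placeholders, not the Digital Rebar API): a node is never patched in place; it is simply re-imaged from the current golden image and handed a small identity file.

```python
def reimage(node: str, image: str) -> None:
    """Placeholder for writing a golden image directly to the node's disk."""
    print(f"writing {image} to {node}")

def inject_identity(node: str, config: dict) -> None:
    """Placeholder for the small cloud-init style file that completes setup."""
    print(f"injecting {config} into {node}")

def converge(node: str, image: str, config: dict) -> None:
    """Deploy, destroy, repeat: nodes are never updated in place.

    When a new golden image is published, the node is wiped and
    re-provisioned from it; only per-node identity survives as config.
    """
    reimage(node, image)
    inject_identity(node, config)

if __name__ == "__main__":
    converge("rack1-node07", "site-image-2017-11.img",
             {"hostname": "rack1-node07", "role": "edge-gateway"})
```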

The advantage of the immutable pattern is that we create a very repeatable and controlled environment.  Instead of trying to maintain elaborate configurations and bi-directional systems of record, we can simply reset whole environments.  In a CI/CD system, we constantly generate fresh images that are incrementally distributed through the environment.

Immutable Edge Infrastructure would mean building and deploying complete system images for our distributed environment.  Clearly, this requires moving around larger images than just pushing patches; however, these uploads can easily be staged and they provide critical repeatability in management.  The alternative is trying to keep track of which patches have been applied successfully to distributed systems.  Based on personal experience, having an atomic deliverable sounds very attractive.

CI/CD and Immutable patterns are deep and complex subjects that go beyond the scope of a single post; however, they also offer a concrete basis for building manageable data centers.

The takeaway is that we need to be looking first to scale distributed software management patterns to help build robust edge infrastructure platforms. Picking a cloud platform before we’ve figured out these concerns is a waste of time.

Previous 2 Posts on OpenStack Conference:

Post 1 – OpenStack on Edge? 4 Ways Edge is Distinct from Cloud
Post 2 – Edge Infrastructure is Not Just Thousands of Mini Clouds

Podcast: OpenStack OpenDev Highlights Edge vs Cloud Computing Confusion

Rob Hirschfeld provides his thoughts from last week’s OpenStack OpenDev conference focused on Edge Computing. This podcast is part of a three blog series from Rob on the issues surrounding Edge and Cloud computing:

Post 1 – OpenStack on Edge? 4 Ways Edge is Distinct from Cloud
Post 2 – Edge Infrastructure is Not Just Thousands of Mini Clouds

Edge Infrastructure is Not Just Thousands of Mini Clouds

I left the OpenStack OpenDev Edge Infrastructure conference with a lot of concerns relating to how to manage geographically distributed infrastructure at scale.  We’ve been asking similar questions at RackN as we work to build composable automation that can be shared and reused.  The critical need is to dramatically reduce site-specific customization in a way that still accommodates required variation – this is something we’ve made surprising advances on in Digital Rebar v3.1.

These are very serious issues for companies like AT&T with thousands of local exchanges, Walmart with tens of thousands of in-store server farms, or Verizon with tens of thousands of coffee shop Wi-Fi zones.  These workloads are not moving into centralized data centers.  In fact, with machine learning and IoT, we are expecting to see more and more distributed computing needs.

Running each site as a mini-cloud is clearly not the right answer.

While we do need the infrastructure to be easily API addressable, adding cloud without fixing the underlying infrastructure management moves us in the wrong direction.  For example, AT&T’s initial 100+ OpenStack deployments were not field-upgradable and led to their efforts to deploy OpenStack on Kubernetes; however, that may have simply moved the upgrade problem to a different platform because Kubernetes does not address the physical layer either!

There are multiple challenges here.  First, any scale infrastructure problem must be solved at the physical layer first.  Second, we must have tooling that brings repeatable automation processes to that layer.  It’s not sufficient to have deep control of a single site: we must be able to reliably distribute automation over thousands of sites with limited operational support and bandwidth.  These requirements are outside the scope of cloud-focused tools.

Containers and platforms like Kubernetes have a significant part to play in this story.  I was surprised that they were present only in a minor way at the summit.  The portability and light footprint of these platforms make them a natural fit for edge infrastructure.  I believe that lack of focus comes from the audience believing (incorrectly) that edge applications are not ready for container management.

With hardware layer control (which is required for edge), there is no need for a virtualization layer to provide infrastructure management.  In fact, “cloud” only adds complexity and cost for edge infrastructure when the workloads are containerized.  Our current cloud platforms are not designed to run in small environments and not designed to be managed in a repeatable way at thousands of data centers.  This is a deep architectural gap and not easily patched.

OpenStack sponsoring the edge infrastructure event got the right people in the room but also got in the way of discussing how we should be solving these operational challenges.  How should we be solving them?  In the next post, we’ll talk about management models that we should be borrowing for the edge…

Read 1st Post of 3 from OpenStack OpenDev: OpenStack on Edge? 4 Ways Edge is Distinct from Cloud

OpenStack on Edge? 4 Ways Edge Is Distinct From Cloud

Last week, I attended a unique OpenDev Edge Infrastructure focused event hosted by the OpenStack Foundation to help RackN understand the challenges customers are facing at the infrastructure edges.  We are exploring how the new lightweight, remote API-driven Digital Rebar Provision can play a unique role in these resource and management constrained environments.

I had also hoped the event would be part of the Foundation’s pivot towards being an “open infrastructure” community, a shift we’ve seen emerging as the semiannual conferences attract a broader set of open source operations technologies like Kubernetes, Ceph, Docker and SDN platforms.  As a past board member, I believe this is a healthy recognition of how the community uses a growing mix of open technologies in the data center and cloud.

It’s logical for the OpenStack community, especially the telcos, to be leaders in edge infrastructure; unfortunately, that too often seemed to mean trying to “square peg” OpenStack into every round hole at the edge.  For companies with a diverse solution portfolio, like RackN, being too myopic about using OpenStack to solve all problems keeps us from listening to the real use-cases.  OpenStack has real utility but there is no one-size-fits-all solution (and that goes for Kubernetes too).

By far the largest issue of the edge discussion was actually agreeing about what “edge” meant.  It seemed as if every session carried a 50% mandatory overhead of defining terms.  I heard some very interesting attempts to define edge in terms of 1) resource constraints of the equipment, 2) proximity to data sources, or 3) bandwidth limitations to the infrastructure.  All of these are helpful ways to describe edge infrastructure.

Putting my usual operations spin on the problem, I choose to define edge infrastructure in data center management terms.  Edge infrastructure has very distinct challenges compared to hyperscale data centers.  

Here is my definition:

1) Edge is inaccessible to operators, so remote lights-out operation is required

2) Edge requires distributed scale management because there are many thousands of instances to be managed

3) Edge is heterogeneous because the breadth of environments and scale imposes variation

4) Edge has a physical location awareness component because proximity matters by design

These four items are hard operational management challenges.  They are also very distinctive challenges when compared to traditional hyperscale data center operations issues, where we typically enjoy easy access, consolidated management, homogeneous infrastructure and equal network access.

In our next post, ….