Edge Infrastructure is Not Just Thousands of Mini Clouds

I left the OpenStack OpenDev Edge Infrastructure conference with a lot of concerns about how to manage geographically distributed infrastructure at scale. We’ve been asking similar questions at RackN as we work to build composable automation that can be shared and reused. The critical need is to dramatically reduce site-specific customization in a way that still accommodates required variation – this is something we’ve made surprising advances on in Digital Rebar v3.1.
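To make the composability idea concrete, here is a minimal sketch of the pattern we are after, using hypothetical profile names and a plain dictionary merge rather than Digital Rebar’s actual data model: one shared automation profile carries the bulk of the content, and each site declares only its required variation.

```python
# Minimal sketch of shared automation plus contained per-site variation.
# The profiles, keys, and site names below are hypothetical illustrations,
# not Digital Rebar's actual API or content format.

BASE_PROFILE = {
    "os_image": "ubuntu-18.04",
    "ntp_servers": ["time.example.com"],
    "workflow": "discover-provision-configure",
}

# Each site states only what must differ; everything else stays shared.
SITE_OVERRIDES = {
    "store-0042": {"ntp_servers": ["10.42.0.1"]},
    "exchange-0017": {"os_image": "centos-7"},
}

def render_site_profile(site_id: str) -> dict:
    """Merge the shared base profile with a site's declared overrides."""
    profile = dict(BASE_PROFILE)
    profile.update(SITE_OVERRIDES.get(site_id, {}))
    return profile

if __name__ == "__main__":
    # Two sites, one shared definition, two small deltas.
    print(render_site_profile("store-0042"))
    print(render_site_profile("exchange-0017"))
```

The point is not the merge itself but where the information lives: the shared profile can be versioned and tested once, while each per-site delta stays small enough to review and audit.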

These are very serious issues for companies like AT&T with 1000s of local exchanges, Walmart with 10,000s of in-store server farms, or Verizon with 10,000s of coffee shop WiFi zones. These workloads are not moving into centralized data centers. In fact, with machine learning and IoT, we expect to see more and more distributed computing needs.

Running each site as a mini-cloud is clearly not the right answer.

While we do need the infrastructure to be easily API addressable, adding cloud without fixing the underlying infrastructure management moves us in the wrong direction. For example, AT&T’s initial 100+ OpenStack deployments were not field-upgradable, which led to their efforts to deploy OpenStack on Kubernetes; however, that may have simply moved the upgrade problem to a different platform because Kubernetes does not address the physical layer either!

There are multiple challenges here. First, any at-scale infrastructure problem must be solved at the physical layer first. Second, we must have tooling that brings repeatable automation processes to that layer. It’s not sufficient to have deep control of a single site: we must be able to reliably distribute automation over thousands of sites with limited operational support and bandwidth. These requirements are outside the scope of cloud-focused tools.
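As a sketch of what reliably distributing automation over thousands of sites could look like (assuming a hypothetical site inventory and transport, not any specific product), the key property is that every site receives the same versioned bundle and reports a status that can be retried later:

```python
# Minimal sketch: push one content-addressed automation bundle to every
# site and track per-site results. The inventory, transport, and bundle
# contents are hypothetical placeholders.

import hashlib

BUNDLE = b"packaged automation content, built and tested once"
BUNDLE_VERSION = hashlib.sha256(BUNDLE).hexdigest()[:12]

SITES = [f"site-{n:05d}" for n in range(10_000)]  # imaginary inventory

def push_bundle(site: str, bundle: bytes) -> bool:
    """Stand-in for a real low-bandwidth transport (HTTP, rsync, a queue)."""
    return True  # pretend the transfer succeeded

def rollout(sites, bundle):
    """Apply the same bundle everywhere; record which sites need a retry."""
    return {site: ("ok" if push_bundle(site, bundle) else "retry")
            for site in sites}

if __name__ == "__main__":
    results = rollout(SITES, BUNDLE)
    done = sum(1 for state in results.values() if state == "ok")
    print(f"bundle {BUNDLE_VERSION}: {done}/{len(SITES)} sites updated")
```

Because the bundle is identified by its hash, a site can verify it is running exactly the content that was tested centrally, which matters when no operator is on location to debug drift.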

Containers and platforms like Kubernetes have a significant part to play in this story. I was surprised that they played only a minor role at the summit. The portability and light footprint of these platforms make them a natural fit for edge infrastructure. I believe the lack of focus comes from the audience believing (incorrectly) that edge applications are not ready for container management.

With hardware layer control (which is required for edge), there is no need for a virtualization layer to provide infrastructure management. In fact, “cloud” only adds complexity and cost for edge infrastructure when the workloads are containerized. Our current cloud platforms are not designed to run in small environments, nor to be managed in a repeatable way across thousands of data centers. This is a deep architectural gap and not easily patched.

OpenStack sponsoring the edge infrastructure event got the right people in the room, but it also got in the way of discussing how we should be solving these operational challenges. How should we be solving them? In the next post, we’ll talk about management models that we should be borrowing for the edge…

Read 1st Post of 3 from OpenStack OpenDev: OpenStack on Edge? 4 Ways Edge is Distinct from Cloud

OpenStack on Edge? 4 Ways Edge Is Distinct From Cloud

Last week, I attended a unique OpenDev Edge Infrastructure focused event hosted by the OpenStack Foundation to help RackN understand the challenges customers are facing at the infrastructure edges. We are exploring how the new lightweight, remote API-driven Digital Rebar Provision can play a unique role in these resource- and management-constrained environments.

I had also hoped the event would be part of the Foundation’s pivot towards being an “open infrastructure” community, a shift we’ve seen emerging as the semiannual conferences attract a broader set of open source operations technologies like Kubernetes, Ceph, Docker and SDN platforms. As a past board member, I believe this is a healthy recognition of how the community uses a growing mix of open technologies in the data center and cloud.

It’s logical for the OpenStack community, especially the telcos, to be leaders in edge infrastructure; unfortunately, that too often seemed to mean trying to “square peg” OpenStack into every round hole at the Edge. For companies with a diverse solution portfolio, like RackN, being too myopic on using OpenStack to solve all problems keeps us from listening to the real use cases. OpenStack has real utility, but there is no one-size-fits-all solution (and that goes for Kubernetes too).

By far the largest issue of the Edge discussion was actually agreeing about what “edge” meant. It seemed as if every session carried a mandatory 50% overhead just defining terms. I heard some very interesting attempts to define edge in terms of 1) resource constraints of the equipment, 2) proximity to data sources, or 3) bandwidth limitations to the infrastructure. All of these are helpful ways to describe edge infrastructure.

Putting my usual operations spin on the problem, I chose to define edge infrastructure in data center management terms. Edge infrastructure has very distinct challenges compared to hyperscale data centers.

Here is my definition:

1) Edge is inaccessible by operators, so remote, lights-out operation is required

2) Edge requires distributed scale management because there are many thousands of instances to be managed

3) Edge is heterogeneous because the breadth of environments and the scale involved impose variation

4) Edge has a physical location awareness component because proximity matters by design

These four items are hard operational management challenges. They are also very distinctive when compared to traditional hyperscale data center operations, where we typically enjoy easy access, consolidated management, homogeneous infrastructure and equal network access.

In our next post, ….