About Rob H

A Baltimore transplant to Austin, Rob thinks about ways of building scale infrastructure for the clouds using Agile processes. He sat on the OpenStack Foundation board for four years. He co-founded RackN to enable software that creates hyperscale converged infrastructure.

First Digital Rebar Online Meetup Next Week

Welcome to the first Digital Rebar online meetup!  In our inaugural meetup we'll provide an introduction to Digital Rebar Provision, name our mascot, discuss current and future features, and do a short demo of the product. The meetup is Sept 26, 2017 at 11:00am PST. Please join the community at https://www.meetup.com/digitalrebar/ and register for the event.

Online Link – https://zoom.us/j/3403934274  

We will cover the following topics:

  • Welcome!!
  • Introduction to Digital Rebar Provision (DRP) and RackN
  • Naming the Digital Rebar mascot [1]
  • Discussion on DRP version 3.1 features
  • Feature and roadmap planning for DRP version 3.2
  • Decide whether to use GitHub Projects or a Trello board for roadmap tracking
  • Demo of DRP workload deployment
  • Getting in touch with the Digital Rebar community and RackN
  • Question and answer period

NOTES:

Please note we'll be using Zoom.us for our meeting, so please join a few minutes early and make sure you have the Zoom client installed and working.

[1]
Name the mascot: https://twitter.com/digitalrebar/status/907724637487935488
Digital Rebar Provision:  http://rebar.digital/
RackN: https://www.rackn.com/

Digital Rebar v3.1 Release Announcement

We've made open network provisioning radically simpler.  So simple, you can install in 5 minutes and be provisioning in under 30.  That's a bold claim, but it's also an essential deliverable for us to bridge the Ops execution gap in a way that does not disrupt your existing toolchains.

We've got a remarkable list of feature additions between Digital Rebar Provision (DRP) v3.0 and v3.1 that take it from a basic provisioning service to a powerful distributed infrastructure automation tool.

But first, we need to put v3.1 into a broader perspective: the new features are built from hard-learned DevOps lessons.  The v2 combination of integrated provisioning and orchestration meant we needed a lot of overhead like Docker, Compose, PostgreSQL, Consul and Rails.  That was needed for complex "one-click" cluster builds; however, it's overkill for users of Ansible, Terraform and immutable infrastructure flows.

The v3 mantra is about starting simple and allowing users to grow automation incrementally.  RackN has been building advanced automation packages and powerful UX management to support that mission.

So what's in the release?  The v3.0 release focused on getting the core Provision infrastructure APIs, processes and patterns working as a standalone service. The v3.1 release targeted major architectural needs: streamlining content management and event notification, and adding out-of-band actions.

Key v3.1 Features

  • New Mascot and Logo!  We have a cloud native bare metal bear.  DRP fans should ask about stickers and t-shirts. Name coming soon! 
  • Layered Storage System.  The DRP storage model allows for layered storage tiers that support the content model plus a read-only base layer. These features allow operators to distribute content in a number of different ways and make field upgrades and multi-site synchronization possible.
  • Content packaging system.  The DRP contents API allows operators to manage packages of models via a single API call.  Content bundles are read-only and versioned so that field upgrades and patches can be distributed.
  • Plug-in system.  DRP allows API extensions and event listeners that run in the same process space as the DRP server.  This enables IPMI extensions and Slack notifiers.
  • Stages, Tasks & Jobs.  DRP has a simple work queue system in which tasks are stored and tracked on machines during stages in their boot sequences.  This feature combines server and DRP client actions to create fast, simple and flexible workflows that don’t require agents or SSH access.
  • Websocket API for event subscription.  DRP clients can subscribe to system events using a long-lived websocket interface.  Subscriptions include filters so that operators can select very narrow notification scopes (see the sketch after this list).
  • Removal of the minimal embedded UI (moving to a community-hosted UX).  DRP decoupled the user interface from the service API.  This allows features to be added to the UX without having to replace the service, and it allows community members to create their own UX.  RackN has agreed to support community users at no cost on a limited version of our commercial UX.
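
To make the event-subscription feature concrete, here is a minimal Python sketch of a client listening for machine events. The websocket path, the token header, and the "register" message syntax are assumptions for illustration only, not the documented DRP protocol.

```python
# Minimal sketch of subscribing to DRP system events over a websocket.
# Assumptions (not the documented DRP protocol): the /api/v3/ws path,
# bearer-token auth, and a plain-text "register <object>.<action>.<key>"
# subscription message that narrows the notification scope.
import json

from websocket import create_connection  # pip install websocket-client


def watch_machine_events(endpoint: str, token: str) -> None:
    ws = create_connection(endpoint, header=[f"Authorization: Bearer {token}"])
    try:
        # Subscribe only to machine update events (hypothetical filter syntax).
        ws.send("register machines.update.*")
        while True:
            event = json.loads(ws.recv())
            print(event)
    finally:
        ws.close()


if __name__ == "__main__":
    # Replace with the endpoint and token of a real DRP installation.
    watch_machine_events("wss://drp.example.local:8092/api/v3/ws", "TOKEN")
```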

All of these features enable DRP to perform 100% of the hardware provisioning workflows that our customers need to run a fully autonomous, CI/CD-enabled data center.  RackN has been showing Ansible, Kubernetes, and Terraform-to-Metal integrations as reference implementations.

Getting the physical layer right is critical to closing your infrastructure execution gaps.  DRP v3.1 goes beyond getting it right – it makes it fast, simple and open.  Take a test drive of the open source code or give RackN a call to see our advanced automation demos.

Exploring the Edge Series: “Edge is NOT just Mini-Cloud”

While the RackN team and I have been heads down radically simplifying physical data center automation, I’ve still been tracking some key cloud infrastructure areas.  One of the more interesting ones to me is Edge Infrastructure.

This once obscure topic has come front and center based on the coming computing stress from home video, retail machine learning and distributed IoT.  It's clear that these needs cannot be solved from centralized data centers.

While I'm posting primarily on the RackN.com blog, I like to take time to bring critical items back to my personal blog as a collection.  WARNING: Some of these statements run counter to prevailing industry opinion.  Please let me know what you think!

Don’t want to read?  Here’s a summary podcast.

Post 1: OpenStack On Edge? 4 Ways Edge Is Distinct From Cloud

By far the largest issue of the Edge discussion was actually agreeing on what "edge" meant.  It seemed as if every session had a 50% mandatory overhead just defining terms.  Putting my usual operations spin on the problem, I chose to define edge infrastructure in data center management terms.  Edge infrastructure has very distinct challenges compared to hyperscale data centers.  Read the article for the list...

Post 2: Edge Infrastructure Is Not Just Thousands Of Mini Clouds

Running each site as a mini-cloud is clearly not the right answer.  There are multiple challenges here. First, any scale infrastructure problem must be solved at the physical layer first. Second, we must have tooling that brings repeatable automation processes to that layer. It's not sufficient to have deep control of a single site: we must be able to reliably distribute automation over thousands of sites with limited operational support and bandwidth. These requirements are outside the scope of cloud-focused tools.

Post 3: Go CI/CD And Immutable Infrastructure For Edge Computing Management

If “cloudification” is not the solution then where should we look for management patterns?  We believe that software development CI/CD and immutable infrastructure patterns are well suited to edge infrastructure use cases.  We discussed this at a session at the OpenStack OpenDev Edge summit.

What do YOU think?  This is an evolving topic and it’s time to engage in a healthy discussion.

Data Center Bacon: Terraform to Metal with Digital Rebar

TL;DR: We've built a buttery smooth Terraform provider for Bare Metal that runs equally well on physical servers, Packet.net servers or VirtualBox VMs.  If you like HashiCorp Terraform and want it to own your data center too, then read on.

Deep into the Digital Rebar Provision (DRP) release plan, a customer asked the RackN team to build a Terraform provider for DRP.  They had some very specific requirements that would stress all the new workflows and out-of-band management features in the release: in many ways, this integration is the ultimate proof point for DRP v3.1 because it drives DRP autonomously.

The primary goal was simple: run a data center as a resource pool for Terraform.

Here is our CTO, Greg Althaus, giving a short demo of the integration.

Of course, it is not that simple.  Operators need to be able to provide plans that pick the correct nodes from resource pools.  Also, the customer request was to deploy both Linux and Windows images based on Packet.  That meant that the system needed both direct-to-disk image writing and cloud-init style post-configuration.  The result is deployments that are blazingly fast (sub 5 minutes) and highly portable.

An additional challenge in building the Terraform Provider is that no one wants to practice building plans against actual servers.  They are way too slow.  We need to be able to build and test the Terraform provider and plans quickly on a laptop or cloud infrastructure like Packet.net.  Our solution was to build parallel out-of-band IPMI type plugins for all three platforms so that the Terraform provider could interact with Digital Rebar Provision consistently regardless of the backing infrastructure.
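
To illustrate that design choice, here is a minimal Python sketch of the idea behind the parallel plugins: one power-control interface with interchangeable IPMI, Packet.net, and VirtualBox backends, so the caller never cares which infrastructure sits behind it. All class and method names here are hypothetical; the real DRP plugins are separate components.

```python
# Illustrative sketch of the "parallel out-of-band plugins" idea: one power-control
# interface with interchangeable backends so a caller (such as a Terraform provider)
# never needs to know whether a machine is physical (IPMI), Packet.net, or VirtualBox.
# All names are hypothetical; the real DRP plugins are separate components.
from abc import ABC, abstractmethod


class PowerPlugin(ABC):
    @abstractmethod
    def power_on(self, machine_id: str) -> None: ...

    @abstractmethod
    def power_off(self, machine_id: str) -> None: ...


class IpmiPlugin(PowerPlugin):
    def power_on(self, machine_id: str) -> None:
        print(f"[ipmi] chassis power on: {machine_id}")    # placeholder for a real IPMI call

    def power_off(self, machine_id: str) -> None:
        print(f"[ipmi] chassis power off: {machine_id}")


class PacketPlugin(PowerPlugin):
    def power_on(self, machine_id: str) -> None:
        print(f"[packet] device power on: {machine_id}")   # placeholder for the Packet.net API

    def power_off(self, machine_id: str) -> None:
        print(f"[packet] device power off: {machine_id}")


class VirtualBoxPlugin(PowerPlugin):
    def power_on(self, machine_id: str) -> None:
        print(f"[vbox] startvm {machine_id}")              # placeholder for VBoxManage

    def power_off(self, machine_id: str) -> None:
        print(f"[vbox] controlvm {machine_id} poweroff")


def power_cycle(plugin: PowerPlugin, machine_id: str) -> None:
    # The caller sees only the shared interface; the backing infrastructure is invisible.
    plugin.power_off(machine_id)
    plugin.power_on(machine_id)


if __name__ == "__main__":
    # The same workflow runs on a laptop VM or a physical server without changes.
    power_cycle(VirtualBoxPlugin(), "test-vm-01")
    power_cycle(IpmiPlugin(), "rack2-node-07")
```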

We were able to build a full fidelity CI/CD pipeline for plans without committing dedicated infrastructure at the dev or test phases.  That is a significant breakthrough.

Terraform is kicking aaS for cluster deployments on cloud and we're getting some very enthusiastic responses when we describe both the depth and simplicity of the integration with Digital Rebar Provision.  We're still actively collecting feedback and testing both new DRP features and the Terraform integration, so it's not yet available for open consumption; however, we very much want to find operators interested in field trials.

Please contact us if Terraform on Metal is interesting.  We’d be happy to show you how it works and discuss our next steps.

Further Listening?  Our Latest Shiny (L8ist Sh9y) podcast with Greg Althaus and Stephen Spector covers the work.

Go CI/CD and Immutable Infrastructure for Edge Computing Management

In our last post, we pretty much tore apart the idea of running mini-clouds on the edge because they are not designed to be managed at scale in resource-constrained environments without deep hardware automation.  While I'm a huge advocate of API-driven infrastructure, I don't believe in a one-size-fits-all API because a good API provides purpose-driven abstractions.

The logical extension is that having deep hardware automation means there’s no need for cloud (aka virtual infrastructure) APIs.  This is exactly what container-focused customers have been telling us at RackN in regular data centers so we’d expect the same to apply for edge infrastructure.

If “cloudification” is not the solution then where should we look for management patterns?  

We believe that software development CI/CD and immutable infrastructure patterns are well suited to edge infrastructure use cases.  We discussed this at a session at the OpenStack OpenDev Edge summit.

Continuous Integration / Continuous Delivery (CI/CD) software pipelines help to manage environments where the risk of making changes is significant by breaking the changes into small, verifiable units.  This is essential for edge because lack of physical access makes it very hard to mitigate problems.  Using CI/CD, especially with A/B testing, allows for controlled rolling distribution of new software.  

For example, in a 10,000 site deployment, the CI/CD infrastructure would continuously roll out updates and patches over the entire system.  Small incremental changes reduce the risk of a major flaw being introduced.  The effect is enhanced when changes are rolled slowly over the entire fleet instead of simultaneously rolled out to all sites (known as A/B or blue/green testing).  In the rolling deployment scenario, breaking changes can be detected and stopped before they have significant impacts.
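
As a rough sketch of that rolling pattern, the Python below pushes a change to the fleet in small batches and halts the moment a batch fails its health check. The deploy and health-check functions are placeholders, not any particular tool's API.

```python
# Minimal sketch of a rolling rollout across many edge sites: push the change to a
# small batch, verify health, and stop on failure so a breaking change never reaches
# the whole fleet. deploy() and healthy() are placeholders for real operations.
import random
from typing import List


def deploy(site: str, version: str) -> None:
    pass  # placeholder for pushing the new image or patch to the site


def healthy(site: str) -> bool:
    return random.random() > 0.01  # placeholder for a real post-deploy check


def rolling_rollout(sites: List[str], version: str, batch_size: int = 100) -> bool:
    for start in range(0, len(sites), batch_size):
        batch = sites[start:start + batch_size]
        for site in batch:
            deploy(site, version)
        failures = [s for s in batch if not healthy(s)]
        if failures:
            print(f"halting rollout: {len(failures)} unhealthy sites in batch {start // batch_size}")
            return False
    return True


if __name__ == "__main__":
    fleet = [f"site-{i:05d}" for i in range(10_000)]
    rolling_rollout(fleet, "v3.1.2", batch_size=500)
```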

These processes and the support software systems are already in place for large scale cloud software deployments.  There are likely gaps around physical proximity and heterogeneity; however, the process is there and initial use-case fit seems to be very good.

Immutable Infrastructure is a catch-all term for deployments based on images instead of configuration.  This concept is popular in cloud deployments where teams produce "golden" VM or container images that contain the exact version of software needed and are then provisioned with minimal secondary configuration.  In most cases, the images only need a small file injected (known as cloud-init) to complete the process.

In this immutable pattern, images are never updated post deployment; instead, instances are destroyed and recreated.  It’s a deploy, destroy, repeat process.  At RackN, we’ve been able to adapt Digital Rebar Provisioning to support this even at the hardware layer where images are delivered directly to disk and re-provisioning happens on a constant basis just like a cloud managing VMs.
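
Here is a minimal sketch of that deploy, destroy, repeat loop against a hypothetical provisioning API: reset the machine, write the golden image directly to disk, and inject the small cloud-init style file. The endpoint paths and payload fields are illustrative assumptions, not the actual DRP API.

```python
# Sketch of the immutable "deploy, destroy, repeat" loop against a hypothetical
# provisioning API: machines are reset and re-imaged rather than patched in place.
# The endpoint paths and payload fields are assumptions, not the real DRP API.
import requests  # pip install requests

DRP = "https://drp.example.local:8092"
HEADERS = {"Authorization": "Bearer TOKEN"}


def reprovision(machine_id: str, image: str, user_data: str) -> None:
    # 1. Destroy: return the machine to a clean, bootable state (hypothetical endpoint).
    requests.post(f"{DRP}/api/machines/{machine_id}/reset", headers=HEADERS, verify=False)
    # 2. Deploy: write the golden image directly to disk and inject the small
    #    cloud-init style file that carries per-instance configuration.
    payload = {"Image": image, "CloudInit": user_data}
    requests.post(f"{DRP}/api/machines/{machine_id}/image", json=payload,
                  headers=HEADERS, verify=False)


if __name__ == "__main__":
    reprovision("edge-node-042", "ubuntu-16.04-golden-2017-09.img",
                "#cloud-config\nhostname: edge-node-042\n")
```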

The advantage of the immutable pattern is that we create a very repeatable and controlled environment.  Instead of trying to maintain elaborate configurations and bi-directional systems of record, we can simply reset whole environments.  In a CI/CD system, we constantly generate fresh images that are incrementally distributed through the environment.

Immutable Edge Infrastructure would mean building and deploying complete system images for our distributed environment.  Clearly, this requires moving around larger images than just pushing patches; however, these uploads can easily be staged and they provide critical repeatability in management.  The alternative is trying to keep track of which patches have been applied successfully to distributed systems.  Based on personal experience, having an atomic deliverable sounds very attractive.

CI/CD and Immutable patterns are deep and complex subjects that go beyond the scope of a single post; however, they also offer a concrete basis for building manageable data centers.

The takeaway is that we need to be looking first to scale distributed software management patterns to help build robust edge infrastructure platforms. Picking a cloud platform before we’ve figured out these concerns is a waste of time.

Previous 2 Posts on OpenStack Conference:

Post 1 – OpenStack on Edge? 4 Ways Edge is Distinct from Cloud
Post 2 – Edge Infrastructure is Not Just Thousands of Mini Clouds

Podcast: OpenStack OpenDev Highlights Edge vs Cloud Computing Confusion

Rob Hirschfeld provides his thoughts from last week’s OpenStack OpenDev conference focused on Edge Computing. This podcast is part of a three blog series from Rob on the issues surrounding Edge and Cloud computing:

Post 1 – OpenStack on Edge? 4 Ways Edge is Distinct from Cloud
Post 2 – Edge Infrastructure is Not Just Thousands of Mini Clouds

Edge Infrastructure is Not Just Thousands of Mini Clouds

I left the OpenStack OpenDev Edge Infrastructure conference with a lot of concerns relating to how to manage geographically distributed infrastructure at scale.  We’ve been asking similar questions at RackN as we work to build composable automation that can be shared and reused.  The critical need is to dramatically reduce site-specific customization in a way that still accommodates required variation – this is something we’ve made surprising advances on in Digital Rebar v3.1.

These are very serious issues for companies like AT&T with 1000s of local exchanges, Walmart with 10,000s of in-store server farms or Verizon with 10,000s of coffee shop Wifi zones.  These workloads are not moving into centralized data centers.  In fact, with machine learning and IoT, we are expecting to see more and more distributed computing needs.

Running each site as a mini-cloud is clearly not the right answer.

While we do need the infrastructure to be easily API addressable, adding cloud without fixing the underlying infrastructure management moves us in the wrong direction.  For example, AT&T's initial 100+ OpenStack deployments were not field-upgradable and led to their efforts to deploy OpenStack on Kubernetes; however, that may have simply moved the upgrade problem to a different platform because Kubernetes does not address the physical layer either!

There are multiple challenges here.  First, any scale infrastructure problem must be solved at the physical layer first.  Second, we must have tooling that brings repeatable automation processes to that layer.  It's not sufficient to have deep control of a single site: we must be able to reliably distribute automation over thousands of sites with limited operational support and bandwidth.  These requirements are outside the scope of cloud-focused tools.

Containers and platforms like Kubernetes have a significant part to play in this story.  I was surprised that they were present only in a minor way at the summit.  The portability and light footprint of these platforms make them a natural fit for edge infrastructure.  I believe that lack of focus comes from the audience believing (incorrectly) that edge applications are not ready for container management.

With hardware layer control (which is required for edge), there is no need for a virtualization layer to provide infrastructure management.  In fact, "cloud" only adds complexity and cost for edge infrastructure when the workloads are containerized.  Our current cloud platforms are not designed to run in small environments, nor to be managed in a repeatable way across thousands of data centers.  This is a deep architectural gap and not easily patched.

OpenStack sponsoring the edge infrastructure event got the right people in the room but also got in the way of discussing how we should be solving these operational challenges.  How should we be solving them?  In the next post, we'll talk about management models that we should be borrowing for the edge…

Read 1st Post of 3 from OpenStack OpenDev: OpenStack on Edge? 4 Ways Edge is Distinct from Cloud

OpenStack on Edge? 4 Ways Edge Is Distinct From Cloud

Last week, I attended a unique OpenDev Edge Infrastructure focused event hosted by the OpenStack Foundation to help RackN understand the challenges customers are facing at the infrastructure edges.  We are exploring how the new lightweight, remote API-driven Digital Rebar Provision can play a unique role in these resource and management constrained environments.

I had also hoped the event was part of the Foundation's pivot towards being an "open infrastructure" community, a shift we've seen emerging as the semiannual conferences attract a broader set of open source operations technologies like Kubernetes, Ceph, Docker and SDN platforms.  As a past board member, I believe this is a healthy recognition of how the community uses a growing mix of open technologies in the data center and cloud.

It's logical for the OpenStack community, especially the telcos, to be leaders in edge infrastructure; unfortunately, that too often seemed to mean trying to "square peg" OpenStack into every round hole at the Edge.  For companies with a diverse solution portfolio, like RackN, being too myopic about using OpenStack to solve all problems keeps us from listening to the real use-cases.  OpenStack has real utility but there is no one-size-fits-all solution (and that goes for Kubernetes too).

By far the largest issue of the Edge discussion was actually agreeing on what "edge" meant.  It seemed as if every session had a 50% mandatory overhead just defining terms.  I heard some very interesting attempts to define edge in terms of 1) resource constraints of the equipment, 2) proximity to data sources, or 3) bandwidth limitations to the infrastructure.  All of these are helpful ways to describe edge infrastructure.

Putting my usual operations spin on the problem, I chose to define edge infrastructure in data center management terms.  Edge infrastructure has very distinct challenges compared to hyperscale data centers.

Here is my definition:

1) Edge is inaccessible to operators, so remote lights-out operation is required

2) Edge requires distributed scale management because there are many thousands of instances to be managed

3) Edge is heterogeneous because the breadth of environments and scale impose variations

4) Edge has a physical location awareness component because proximity matters by design

These four items are hard operational management challenges.  They are also very distinct from traditional hyperscale data center operations issues, where we typically enjoy easy access, consolidated management, homogeneous infrastructure and equal network access.

In our next post, ….

September 8 – Weekly Recap of All Things Site Reliability Engineering (SRE)

Welcome to the weekly RackN blog recap of all things SRE. If you have any ideas for this recap or would like to include content, please contact us at info@rackn.com or tweet Rob (@zehicle) or RackN (@rackngo).

SRE Items of the Week

Nora Jones on Establishing, Growing, and Maturing a Chaos Engineering Practice
https://www.infoq.com/podcasts/nora-jones-chaos-engineering

Nora Jones, a senior software engineer on Netflix's Chaos Team, talks with Wesley Reisz about what Chaos Engineering means today. She covers what it takes to build a practice, how to establish a strategy, how to define the cost of impact, and key technical considerations when leveraging chaos engineering. Read more and listen to the podcast.

SRE Jobs

I ran a job search on LinkedIn to find the number of SRE positions currently open; there are 854 positions available as of this morning. Dice.com listed 30,665 positions based on a search. In comparison, DevOps only had 2,975 positions on Dice.com.

Podcast on Ansible, Kubernetes, Kubespray and Digital Rebar

Stephen Spector, HPE Cloud Evangelist, talks with Rob Hirschfeld, Co-Founder and CEO of RackN, about the installation process for Kubernetes using Kubespray, Ansible, and Digital Rebar Provision. The podcast also includes overviews of Kubernetes, containers, and installation.

_____________

Subscribe to our new daily DevOps, SRE, & Operations Newsletter https://paper.li/e-1498071701#/
_____________

UPCOMING EVENTS

Rob Hirschfeld and Greg Althaus are preparing for a series of upcoming events where they are speaking or just attending. If you are interested in meeting with them at these events please email info@rackn.com.

OTHER NEWSLETTERS

Podcast – Install Kubernetes with Ansible, Kubespray and Digital Rebar Provision

Stephen Spector, HPE Cloud Evangelist, talks with Rob Hirschfeld, Co-Founder and CEO of RackN, about the installation process for Kubernetes using Kubespray, Ansible, and Digital Rebar Provision. The podcast also includes overviews of Kubernetes, containers, and installation.

More info on Digital Rebar Provisioning

Follow the RackN L8ist Sh9y Podcast