Week in Review: Building Bridges between DevOps and Architects

Welcome to our new format for the RackN and Digital Rebar Weekly Review. It contains the same great information you are accustomed to; however, I have reorganized it to place a new section at the start with my thoughts on various topics. You can still find the latest news items related to Edge, DevOps and other relevant topics below.

Building Bridges between Operators, Developers and Architects

DevOps is not enough. Infrastructure architects play a key role and are often not considered. It is our experience that bringing architects together with the DevOps team leads to the optimal solution. Hear from Rob Hirschfeld on why this is critical for success:

Redefining PXE Provisioning for the Modern Data Center

Over the past 20 years, Linux admins have defined provisioning with a limited scope: PXE boot with Cobbler. This approach remains popular today even though it only installs an operating system, limiting operators’ ability to move beyond this outdated paradigm.

Digital Rebar is the answer operators have been looking for. Provisioning has taken on a new role within the data center that includes workflow management, infrastructure automation, bare metal and virtual machines inside and outside the firewall, as well as the coming need for edge IoT management.

Read More


News

RackN

Digital Rebar Community

L8ist Sh9y Podcast

Social Media

RackN and Digital Rebar Philosophy of Provisioning

Redefining physical automation to make it highly repeatable and widely consumable, while also meeting the demands of necessarily complex and evolving heterogeneous data center environments, is the challenge the RackN team is solving. To meet this challenge, we have developed a unique philosophy in how we build our technology: both the open source Digital Rebar and the additional RackN packages.

  • Stand-alone Provisioning
  • Building Software from the API
  • Single Golang Executable
  • Modular Components – Composable Content
  • Operator Defined Workflows
  • Immutable Infrastructure
  • Distributed or Consolidated Architectures

Stand-alone Provisioning

It is critical that Digital Rebar Provision (DRP) provides operators the maximum flexibility in terms of where to run the service (server, Top-of-Rack switch, ARM, Intel, etc.) as well as removing any dependencies that might restrict its deployment. Each environment has its own unique Infrastructure DNA: the hardware, operating systems, and application stacks that drive the infrastructure underlay.

Building Software from the API

The Digital Rebar Provision solution is built with an API-first mentality. Features and enhancements are implemented as an API (making it a first-class citizen), and the CLI is dynamically generated from the API, which ensures 100% coverage of the API within the CLI.

This methodology also allows the CLI to directly follow the structure and syntax of the API, making it easy for an operator or developer to understand and flexibly interchange the API and CLI syntax.
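
To make the API/CLI parity concrete, here is a minimal Go sketch (not an official client) of the API call behind a CLI machine listing: `drpcli machines list` corresponds to `GET /api/v3/machines`. The port and credentials below are commonly cited defaults and should be treated as assumptions for your environment.

```go
// Minimal sketch: list machines via the DRP API, mirroring the CLI.
// Endpoint, port, and credentials are illustrative assumptions.
package main

import (
	"crypto/tls"
	"fmt"
	"io"
	"net/http"
)

func main() {
	// DRP typically serves HTTPS with a self-signed certificate,
	// so this sketch skips verification for brevity only.
	client := &http.Client{Transport: &http.Transport{
		TLSClientConfig: &tls.Config{InsecureSkipVerify: true},
	}}

	req, err := http.NewRequest("GET", "https://127.0.0.1:8092/api/v3/machines", nil)
	if err != nil {
		panic(err)
	}
	req.SetBasicAuth("rocketskates", "r0cketsk8ts") // assumed default credentials

	resp, err := client.Do(req)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	body, _ := io.ReadAll(resp.Body)
	fmt.Println(string(body)) // the same JSON the CLI renders
}
```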

At RackN we believe strongly in the 12-Factor App methodology for designing modern software. DRP is a direct reflection of these principles.

Single Golang Executable

DRP is built with Golang, a modern procedural language that is easily cross-compiled for multiple operating systems and processor architectures. As a benefit, the DRP service and CLI tool (dr-provision and drpcli, respectively) can run on platforms ranging from small Raspberry Pi embedded systems and Top-of-Rack network switches to huge Hyper-Converged Infrastructure (HCI) servers, and everything in between. It is currently compiled for and runs on Linux (ARM and Intel, 32-bit and 64-bit), Mac OS X (64-bit), and Windows (64-bit).

The dr-provision binary is small and lightweight, with almost zero external dependencies. The current external dependencies are unzip, p7zip, and bsdtar, and these should be removed in a future version. At only 30 MB in size, it requires very few resources to run.

Modular Components – Composable Content

A modular architecture allows us to create complex solutions from a set of simple, well-tested building blocks. Breaking complex problems down into small components, combined with strong templating capabilities, creates a structure with strong reuse patterns. This approach permeates all of the “Content” components that form the foundational building blocks for composable provisioning activities.
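
Because DRP is written in Go, its templating builds on Go’s standard text/template engine. The sketch below is a generic illustration of the nesting idea, not actual DRP content; the snippet names and properties are invented.

```go
// Generic illustration of nested template expansion with Go's
// text/template, the engine underlying DRP templates. Snippet and
// property names are invented for this example.
package main

import (
	"os"
	"text/template"
)

const bootScript = `{{define "ntp-setup"}}ntpdate {{.NtpServer}}{{end -}}
{{define "add-admin"}}useradd -m {{.AdminUser}}{{end -}}
#!/bin/bash
# Composed from reusable snippets:
{{template "ntp-setup" .}}
{{template "add-admin" .}}
`

func main() {
	tmpl := template.Must(template.New("bootenv").Parse(bootScript))

	// In DRP, values like these would come from global, profile, or
	// machine-specific properties.
	props := map[string]string{
		"NtpServer": "time.example.com",
		"AdminUser": "ops",
	}
	if err := tmpl.Execute(os.Stdout, props); err != nil {
		panic(err)
	}
}
```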

Operator Defined Workflows

Each environment has a unique set of services, applications, tooling, and practices for managing its infrastructure. Building on the concepts of Composable Content, we give operators and developers a flexible structure in which they control how loosely or tightly to integrate the DRP provisioning services into their environment. Every customer environment has a unique set of tools, and this methodology allows for smooth integration with those operational principles.

Immutable Infrastructure    

Maintaining hardware and software in a massive data center or cloud is a significant challenge even without the additional overhead of ensuring that patches are properly applied. Any change to an active solution can introduce complications on a live system, which is a major barrier to completing security updates and other patches in a timely manner.

A better method is to deploy only a “golden image” to the live system; rather than patch each individual instance, simply tear it down and replace it with a new copy of the golden image. All patches can be applied and tested to create a new golden image, which is then easily rolled out in the create-destroy-recreate model of immutability.
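
The control flow is simple enough to sketch. This hypothetical Go program (none of these functions correspond to a real API) captures the create-destroy-recreate loop: patch and test offline to produce a new golden image, then replace instances rather than patching them in place.

```go
// Hypothetical sketch of the create-destroy-recreate rollout pattern.
// The types and functions are invented purely to show the control flow.
package main

import "fmt"

type Instance struct {
	ID    string
	Image string // the golden image the instance was booted from
}

// rollOut replaces every instance instead of patching it in place.
func rollOut(fleet []Instance, goldenImage string) []Instance {
	updated := make([]Instance, 0, len(fleet))
	for _, inst := range fleet {
		destroy(inst)                                  // tear down the old copy
		updated = append(updated, create(goldenImage)) // boot a fresh one
	}
	return updated
}

func destroy(inst Instance) { fmt.Println("destroying", inst.ID) }

func create(image string) Instance {
	fmt.Println("creating instance from", image)
	return Instance{ID: "replacement", Image: image}
}

func main() {
	fleet := []Instance{{"a", "golden-v1"}, {"b", "golden-v1"}}
	// Patches were applied and tested offline to produce golden-v2.
	fleet = rollOut(fleet, "golden-v2")
	fmt.Println("fleet size after rollout:", len(fleet))
}
```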

Distributed or Consolidated Architectures

Traditional data center and lab environments utilize centralized provisioning services. While DRP has strong support for this scale-up or consolidated model, shifting patterns in application and service deployment topology dictate an evolving provisioning service. Current Internet-of-Things (IoT), Edge, and Fog architectures distribute resources across widely dispersed environments.

In the traditional model, a large-scale operator might support a handful of data centers with tens of thousands of hosts in each facility. These emerging architecture patterns can encompass thousands of different locations, each hosting a few dozen to a few hundred hosts. This shift places a significant burden on operational and infrastructure management tooling to support the complexities of these scale-out designs.

With strong multi-endpoint management tooling, the RackN portal easily supports both models of provisioning. Long-lived scale-up environments with a service that is updated, upgraded, managed, loved, and cared for can exist seamlessly alongside environments with a create/destroy pattern that treats thousands of provisioning endpoints as disposable assets.

100,000 of Anything is Hard – Scaling Concerns for Digital Rebar Architecture

Our architectural plans for Digital Rebar are beyond big – they are for massive distributed scale. Not up, but out. We are designing for the case where we have common automation content packages distributed over 100,000 stand-alone sites (think 5G cell towers) that are not synchronously managed. In that case, there will be version drift between the endpoints and content. For example, we may need to patch an installation script quickly over a whole fleet but want to upgrade the endpoints more slowly.

It’s a hard problem and it’s why we’ve focused on composable systems and fine-grain versioning.

It’s also part of the RackN move to a biweekly release cadence for Digital Rebar. That means we are iterating from tip development to stable every two weeks. The cadence is fast because we don’t want operators to have to deploy the development “tip” just to access features or bug fixes.

This works for several reasons. First, much of the Digital Rebar value is delivered as content instead of in the scaffolding. Each content package has its own version cycle and is not tied to Digital Rebar versions. Second, many Digital Rebar features are relatively small, incremental additions. Faster releases allow content creators and operators to access that buttery goodness more quickly without trying to manage the less stable development tip.

Critical enablers for this release pace are feature flags. Starting in v3.2, Digital Rebar introduced system-level tags that are set when new features are added. These flags allow content developers to introspect the system in a multi-version way to see which behaviors are available on each endpoint. This is much more consistent and granular than version matching.

We are not designing a single-endpoint system: we are planning for content that spans thousands of endpoints.

Feature flags are part of our 100,000 endpoint architecture thinking. In large-scale systems, there can be significant version drift within a fleet deployment. We have to expect that automation designers will want to enable advanced features before they are universally deployed in the fleet. That means the system needs a way to easily advertise specific capabilities internally. Automation can then be written with different behaviors depending on the environment. For example, changing exit codes could have broken existing scripts, except that scripts used flags to determine which codes were appropriate for the system. These are NOT API issues that work well with semantic versioning (semver); they are deeper system behaviors.
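
To make that concrete, here is a hypothetical Go sketch of flag-gated automation. The `extended-exit-codes` flag name, the `Features` list, and the exit code values are all invented for illustration; they are not the actual Digital Rebar introspection API.

```go
// Hypothetical sketch of feature-flag gated automation behavior.
// Flag names, fields, and exit codes are illustrative only.
package main

import "fmt"

type Endpoint struct {
	Name     string
	Features []string // capabilities advertised by this endpoint
}

func (e Endpoint) Has(flag string) bool {
	for _, f := range e.Features {
		if f == flag {
			return true
		}
	}
	return false
}

// exitCodeFor picks an exit code the target endpoint understands.
func exitCodeFor(e Endpoint, outcome string) int {
	if e.Has("extended-exit-codes") && outcome == "retry" {
		return 16 // illustrative expanded code on newer endpoints
	}
	if outcome == "retry" || outcome == "fail" {
		return 1 // legacy behavior: all failures collapse to 1
	}
	return 0
}

func main() {
	older := Endpoint{Name: "site-a"}
	newer := Endpoint{Name: "site-b", Features: []string{"extended-exit-codes"}}
	// The same automation behaves differently per endpoint capability.
	fmt.Println(exitCodeFor(older, "retry"), exitCodeFor(newer, "retry"))
}
```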

This matters even if you only have a single endpoint because it also enables sharing in the Digital Rebar community.

Without these changes, composable automation designed for the Digital Rebar community would quickly become brittle and hard to maintain. Our goal is to decouple endpoint and content. That same decoupling allows the community to share packages and large-scale sites to coordinate upgrades. I don’t think we’re done yet: this is a hard problem, and we’re still evolving the intricacies of updating and delivering composable automation.

It’s the type of complex, operational thinking that excites the RackN engineering team. I hope it excites you too because we’d love to get your thinking on how to make it even better!


Data Center’s Last Mile: Zero Touch Metal Automation

The embedded video is an excellent RackN and Digital Rebar overview created by Rob Hirschfeld and Greg Althaus, co-founders of RackN, on the critical issue facing data center operations teams. Their open-source-based offering closes the integration gap between platform/orchestration tools and control/provision technology.

By integrating with the platform and orchestration solutions, RackN is able to replace the control and provisioning tools without adding complexity or replacing established technology.

Watch the complete video below as Rob Hirschfeld provides the background on how RackN arrived at its current offering and the benefits for data center operators of supporting bare metal provisioning as well as immutable infrastructure. (Slides)

The demonstration video referenced in this overview:

The white paper referenced in this overview:

Have more questions? Contact us at sales@rackn.com or via social media on Twitter at @rackngo to learn more.

Podcast – Terraform and Digital Rebar Provision Bare Metal

In this podcast, Stephen Spector, HPE Cloud Evangelist, and Greg Althaus, Co-Founder and CTO of RackN, talk about the integration point for Digital Rebar Provision with the Terraform solution. The specific focus is on delivering bare metal provisioning to users of Terraform.

About Terraform (LINK)

Terraform enables you to safely and predictably create, change, and improve production infrastructure. It is an open source tool that codifies APIs into declarative configuration files that can be shared amongst team members, treated as code, edited, reviewed, and versioned.

More info on Digital Rebar Provisioning

Follow the RackN L8ist Sh9y Podcast

Podcast – A Nice Mix of Ansible and Digital Rebar

Rob Hirschfeld, CEO and Co-Founder of RackN, talks with Stephen Spector, HPE Cloud Evangelist, about the recent uptick in Ansible news as well as how Digital Rebar Provision assists Ansible users.

Listen to the 9-minute podcast here:

As this is the launch of the L8ist Sh9y Podcast from RackN, we encourage you to visit our site at https://soundcloud.com/user-410091210 or subscribe to the RSS feed. We will also be publishing on iTunes shortly.

Coal or Diamonds? Configuration Management is Under Pressure

Cloud Native thinking is thankfully changing the way we approach traditional IT infrastructure. These profound changes in how we build applications with 12-factor design and containers have deep implications for how we manage configuration and the tools we use to do it. These are not cloud-only impacts – the changes reach every corner of IT data centers.

“You still have to do configuration management but… we’re getting to a point we can do a lot less” (8:30)

Configuration Management is both necessary and very hard. I’ve written and spoken about the developer rebellion against infrastructure (and will again at DOD Dallas!). The TL;DR of that lightning talk is “infrastructure sucks.”

In this podcast, Eric and I have time to stretch out and really discuss what’s going on in both broad and specific terms. At the 15-minute mark, we start talking about how “radical simplicity” is coming to provisioning and deployment automation. We break down how the business needs for repeatable and robust automation are driving IT to rethink huge swaths of its infrastructure. That transitions into treating a whole data center as a CI/CD pipeline.

Podcast: Episode 15: The Death of Configuration Management with Rob Hirschfeld 

“If we have radically better control of the physical infrastructure, then I don’t need anything else to install Kubernetes.” (22:00)

As always, Eric and I are not shy about taking on IT hot topics. Dig deep, enjoy, and let us know what YOU think about these topics. We want to hear from you.

Cloud Native PHYSICAL PROVISIONING? Come on! Really?!

We believe Cloud Native development disciplines are essential regardless of the infrastructure.

Today, RackN announced very low entry-level support for Digital Rebar Provision – the RESTful Cobbler PXE/DHCP replacement. Having a company actually stand behind this core data center function with support is a big deal; however…

We’re making two BIG claims with Provision: breaking DevOps bottlenecks and cloud native physical provisioning.  We think both points are critical to SRE and Ops success because our current approaches are not keeping pace with developer productivity and hardware complexity.

I’m going to post more about how Provision can help address the political struggles of SRE and DevOps that I’ve been watching in our industry. A hint is in the release, but the Cloud Native comment needs to be addressed.

First, Cloud Native is an architecture, not an infrastructure statement.

There is no requirement that we use VMs or AWS in Cloud Native. From that perspective, “Cloud” is a useful but deceptive adjective. Cloud Native was born from applications that had to succeed on hands-off, lower-SLA infrastructure with fast delivery cycles on untrusted systems. These are very hostile environments compared to “legacy” IT.

What makes Digital Rebar Provision Cloud Native?  A lot!

The following is a list of key attributes I consider essential for Cloud Native design.

Micro-services Enabled: The larger Digital Rebar project is a micro-services design.  Provision reflects a stand-alone bundling of two services: DHCP and Provision.  The new Provision service is designed to both stand alone (with embedded UX) and be part of a larger system.

Swagger RESTful API: We designed the APIs first based on years of experience.  We spent a lot of time making sure that the API conformed to spec and that includes maintaining the Swagger spec so integration is easy.

Remote CLI: We build and test our CLI extensively.  In fact, we expect that to be the primary user interface.

Security Designed In: We are serious about security, even in challenging environments like PXE where options are limited by 20-year-old protocols. HTTPS is required, as is user or bearer token authentication. That means even API calls from machines can be secured.

12 Factor & API Config: There is no file configuration for Provision. The system starts with command line flags or environment variables. Deeper configuration is done via the API/CLI. That ensures the system can be fully managed remotely and configured securely, because credentials are required for configuration.
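
As a generic illustration of that pattern (the flag and environment variable names below are invented, not actual dr-provision options), a 12-factor style startup reads command line flags with environment-variable fallbacks and leaves deeper configuration to the API:

```go
// Generic 12-factor startup sketch: flags with environment fallbacks
// and no configuration file. Flag/variable names are illustrative.
package main

import (
	"flag"
	"fmt"
	"os"
)

// envOr returns the environment value if set, otherwise the default.
func envOr(key, def string) string {
	if v, ok := os.LookupEnv(key); ok {
		return v
	}
	return def
}

func main() {
	apiPort := flag.String("api-port", envOr("RS_API_PORT", "8092"), "API listen port")
	dhcp := flag.Bool("dhcp", envOr("RS_DHCP", "true") == "true", "enable the DHCP service")
	flag.Parse()

	fmt.Printf("starting with api-port=%s dhcp=%v\n", *apiPort, *dhcp)
	// Deeper configuration (subnets, bootenvs, etc.) is then applied
	// over the authenticated API/CLI after the service is running.
}
```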

Fast Start / Golang: Provision is a totally self-contained Golang app, including the UX. Even so, it’s very small. You can run it on a laptop from nothing in about 2 minutes, including download.

CI/CD Coverage: We committed to deep test coverage for Provision and have consistently increased coverage with every commit.  It ensures quality and prevents regressions.

Documentation In-project, Auto-generated: On-boarding is important since we’re talking about small, API-driven units. Much of the Provision documentation is generated directly from the code into the actual project documentation. Also, the written documentation is in reStructuredText within the project, with good indexes and cross-references. We regenerate the documentation with every commit.

We believe these development disciplines are essential regardless of the infrastructure.  That’s why we made sure the v3 Provision (and ultimately every component of Digital Rebar as we iterate to v3) was built to these standards.

What do you think?  Is this Cloud Native?  What did we miss?

Cloud-first Physical Provisioning? 10 ways that the DR is in to fix your PXE woes.


Why has it been so hard to untie from Cobbler? Why can’t we just REST-ify these 1990s-era protocols? Dealing with the limits of PXE, DHCP and TFTP in wide-ranging data centers is tricky, and Cobbler’s manual, pre-defined approach was adequate only in legacy data centers.

Now, we have to rethink Physical Ops in Cloud-first terms. DevOps- and SRE-minded operators need services that have real APIs, day-2 ops, security and control as primary design requirements.

The Digital Rebar team at RackN is hunting for Cobbler, Stacki, MaaS and Foreman users to evaluate our RESTful, Golang, template-based PXE provisioning utility. Deep within Digital Rebar’s full life-cycle hybrid control was a cutting-edge bare metal provisioning utility. As part of our v3 roadmap, we carved out the Provisioner to also work as a stand-alone service.

Here are 10 reasons why DR Provision kicks aaS:

  1. Swagger REST API & CLI. Cloud-first means having a great, tested API. Years of provisioning experience went into this 3rd generation design and it shows. That includes a powerful API-driven DHCP.
  2. Security & Authenticated API. Not an afterthought: we require both HTTPS and user authentication for the API. Our mix of basic and bearer token authentication recognizes that both users and automation will use the API. This brings a new level of security and control to data center provisioning.
  3. Stand-alone multi-architecture Golang binary. There are no dependencies or prerequisites, plus upgrades are drop-in replacements. That allows users to experiment in isolation on a laptop and then easily register the utility as a systemd service.
  4. Nested Template Expansion. In DR Provision, Boot Environments are composed of reusable template snippets. These templates can incorporate global, profile or machine specific properties that enable users to set services, users, security or scripting extensions for their environment.
  5. Configuration at Global, Group/Profile and Node level. Properties for templates can be managed in a wide range of ways that allows operators to manage large groups of servers in consistent ways.
  6. Multi-mode (but optional) DHCP. Network IP allocation is a key component of any provisioning infrastructure; however, DHCP needs are highly site dependent. DR Provision works as a multi-interface DHCP listener and can also resolve addresses from DHCP forwarders. It can even be disabled if your environment already has a DHCP service that can configure the “next boot” provider.
  7. Dynamic Provisioner templates for TFTP and HTTP. For security and scale, DR Provision builds provisioning files dynamically based on the Boot Environment Template system. This means that critical system information is not written to disk and files do not have to be synchronized. Of course, when you need to just serve a file, that works too.
  8. Node Discovery Bootstrapping. Digital Rebar’s long-standing discovery process is enabled in the Provisioner with the included discovery boot environment. That process includes an integrated secure token sequence so that new machines can self-register with the service via the API. This eliminates the need to pre-populate the DR Provision system (see the sketch after this list).
  9. Multiple Seeding Operating Systems. DR Provision comes with a long list of Boot Environments and Templates including support for many Linux flavors, Windows, ESX and even LinuxKit. Our template design makes it easy to expand and update templates even on existing deployments.
  10. Two-stage TFTP/HTTP Boot. Our specialized Sledgehammer and Discovery images are designed for speed, with optimized install cycles that improve boot speed by switching from PXE TFTP to iPXE HTTP in a two-stage process. This ensures maximum hardware compatibility without creating excess network load.
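
Expanding on item 8, here is a hedged sketch of the self-registration idea: a newly discovered machine announces itself to the provisioner over the authenticated API. The `/api/v3/machines` path follows the v3 API convention, but the host, payload fields, and token delivery are illustrative assumptions.

```go
// Hedged sketch of discovery self-registration via the API.
// Host name, payload fields, and token source are assumptions.
package main

import (
	"bytes"
	"crypto/tls"
	"encoding/json"
	"fmt"
	"net/http"
	"os"
)

func main() {
	payload, _ := json.Marshal(map[string]string{
		"Name": "discovered-node-01", // illustrative machine name
	})

	req, err := http.NewRequest("POST",
		"https://provisioner.example.com:8092/api/v3/machines",
		bytes.NewReader(payload))
	if err != nil {
		panic(err)
	}
	req.Header.Set("Content-Type", "application/json")
	// The discovery bootenv would supply a scoped token; assume it
	// arrives in an environment variable for this sketch.
	req.Header.Set("Authorization", "Bearer "+os.Getenv("DISCOVERY_TOKEN"))

	client := &http.Client{Transport: &http.Transport{
		TLSClientConfig: &tls.Config{InsecureSkipVerify: true}, // sketch only
	}}
	resp, err := client.Do(req)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	fmt.Println("registration status:", resp.Status)
}
```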

Digital Rebar Provision is a new generation of data center automation designed for operators with a cloud-first approach. Data center provisioning is surprisingly complex because it’s caught between cutting-edge hardware and arcane protocols embedded in firmware requirements that are still very much alive.

We invite you to try out Digital Rebar Provision yourself and let us know what you think. It only takes a few minutes. If you want more help, contact RackN for a $1000 Quick Start offer.

LinuxKit and Three Concerns with Physical Provisioning of Immutable Images

At Dockercon this week, Docker announced an immutable operating system called LinuxKit, which is powered by a Packer-like utility called Moby that RackN CTO Greg Althaus explains in the video below.

For additional conference notes, check out Rob Hirschfeld’s Dockercon retro blog post.

Three Concerns with Immutable O/S on Physical

With a mix of excitement and apprehension, the RackN team has been watching physical deployments of immutable operating systems like CoreOS Container Linux and RancherOS. Overall, we like the idea of a small, locked (aka immutable) in-memory image for servers; however, the concept does not map perfectly to hardware.

Note: if you want to provision these operating systems in a production way, we can help you!

These operating systems work on a “less is more” approach that strips everything out of the images to make them small and secure.  

This is great for cloud-first approaches, where VM size has a material impact on cost. It’s particularly well matched to container platforms, where VMs are constantly being created and destroyed. In these cases, the immutable image is easy to update and saves money.

So, why does that not work as well on physical?

First: HA DHCP?! The model is not as good a fit for physical systems, where operating system overhead is already minimal. It requires orchestrated rebooting of your hardware, and it means you need a highly available (HA) PXE provisioning infrastructure (like the one we’re building with Digital Rebar).

Second: Configuration. These operating systems must rely on cloud-init-injected configuration. In a physical environment, there is no way to create cloud-init-like injections without integrating with the kickstart systems (a feature of Digital Rebar Provision). Further, hardware has many more configuration options (like hard drives and network interfaces) than VMs. That means we need a robust, system-by-system way to manage these configurations.

Third: No SSH. Yet another problem with these minimal images is that they are supposed to eliminate SSH. Ideally, their image and configuration provide everything required to run without additional administration. Unfortunately, many applications assume post-boot configuration. That means people often re-enable SSH to use tools like Ansible. If it did not conflict with the very nature of the “do-not-configure-the-server” immutable model, I would suggest that SSH is a perfectly reasonable requirement for operators running physical infrastructure.

In summary, even with those issues, we are excited about the positive impact this immutable approach can have on data center operations.

With tooling like Digital Rebar, it’s possible to manage the issues above.  If this appeals to you, let us know!