Data Center Bacon: Terraform to Metal with Digital Rebar

TL;DR: We’ve built a buttery smooth Terraform provider for Bare Metal that runs equally well on physical servers, Packet.net servers, or VirtualBox VMs.  If you like HashiCorp Terraform and want it to own your data center too, then read on.

Deep into the Digital Rebar Provision (DRP) release plan, a customer asked the RackN team to build a Terraform provider for DRP.  They had some very specific requirements that would stress all the new workflows and out-of-band management features in the release: in many ways, this integration is the ultimate proof point for DRP v3.1 because it drives DRP autonomously.

The primary goal was simple: run a data center as a resource pool for Terraform.

Here is our CTO, Greg Althaus, giving a short demo of the integration.

Of course, it is not that simple.  Operators need to be able to provide plans that pick the correct nodes from resource pools.  Also, the customer request was to deploy both Linux and Windows images on Packet.  That meant that the system needed both direct-to-disk image writing and cloud-init style post-configuration.  The result is deployments that are blazingly fast (under five minutes) and highly portable.
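As a sketch of what such a plan might look like, a Terraform configuration could claim a node from a resource pool, write an image direct to disk, and inject cloud-init style post-configuration.  The provider, resource, and attribute names below are hypothetical illustrations, not the provider’s actual schema:

```hcl
# Hypothetical DRP provider configuration; the real schema may differ.
provider "drp" {
  api_endpoint = "https://drp.example.local:8092"
}

# Claim a machine from a resource pool and image it direct-to-disk.
resource "drp_machine" "worker" {
  pool  = "rack1-general"       # which resource pool to pick nodes from
  image = "ubuntu-16.04-image"  # written straight to disk, no installer

  # cloud-init style post-configuration applied after the image lands
  cloud_init = <<-EOF
    #cloud-config
    hostname: tf-worker-1
    EOF
}
```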

An additional challenge in building the Terraform provider is that no one wants to practice building plans against actual servers.  They are way too slow.  We need to be able to build and test the Terraform provider and plans quickly on a laptop or cloud infrastructure like Packet.net.  Our solution was to build parallel out-of-band IPMI-type plugins for all three platforms so that the Terraform provider could interact with Digital Rebar Provision consistently regardless of the backing infrastructure.

We were able to build a full-fidelity CI/CD pipeline for plans without committing dedicated infrastructure at the dev or test phases.  That is a significant breakthrough.

Terraform is kicking aaS for cluster deployments on cloud, and we’re getting some very enthusiastic responses when we describe both the depth and simplicity of the integration with Digital Rebar Provision.  We’re still actively collecting feedback and testing both new DRP features and the Terraform integration, so it’s not yet available for open consumption; however, we very much want to find operators interested in field trials.

Please contact us if Terraform on Metal is interesting.  We’d be happy to show you how it works and discuss our next steps.

Further Listening: Our L8ist Sh9y (“Latest Shiny”) podcast with Greg Althaus and Stephen Spector covers the work.

Go CI/CD and Immutable Infrastructure for Edge Computing Management

In our last post, we pretty much tore apart the idea of running mini-clouds on the edge because they are not designed to be managed at scale in resource-constrained environments without deep hardware automation.  While I’m a huge advocate of API-driven infrastructure, I don’t believe in a one-size-fits-all API because a good API provides purpose-driven abstractions.

The logical extension is that having deep hardware automation means there’s no need for cloud (aka virtual infrastructure) APIs.  This is exactly what container-focused customers have been telling us at RackN in regular data centers so we’d expect the same to apply for edge infrastructure.

If “cloudification” is not the solution then where should we look for management patterns?  

We believe that software development CI/CD and immutable infrastructure patterns are well suited to edge infrastructure use cases.  We discussed this at a session at the OpenStack OpenDev Edge summit.

Continuous Integration / Continuous Delivery (CI/CD) software pipelines help to manage environments where the risk of making changes is significant by breaking the changes into small, verifiable units.  This is essential for edge because lack of physical access makes it very hard to mitigate problems.  Using CI/CD, especially with A/B testing, allows for controlled rolling distribution of new software.  

For example, in a 10,000 site deployment, the CI/CD infrastructure would continuously roll out updates and patches over the entire system.  Small incremental changes reduce the risk of a major flaw being introduced.  The effect is enhanced when changes are rolled slowly over the entire fleet instead of simultaneously rolled out to all sites (known as A/B or blue/green testing).  In the rolling deployment scenario, breaking changes can be detected and stopped before they have significant impacts.
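The rolling pattern above can be sketched in a few lines of Python.  This is a toy model with hypothetical `apply_update` and `health_check` hooks, not real CD tooling:

```python
def rolling_deploy(sites, apply_update, health_check, batch_size=100):
    """Roll an update across sites in small batches, halting on failure.

    apply_update(site) pushes the change; health_check(site) returns True
    if the site is healthy afterwards. Returns the sites touched so far.
    """
    updated = []
    for i in range(0, len(sites), batch_size):
        batch = sites[i:i + batch_size]
        for site in batch:
            apply_update(site)
        # Verify the batch before continuing; a breaking change is
        # detected here before it reaches the rest of the fleet.
        if not all(health_check(site) for site in batch):
            print(f"halting rollout after {len(updated) + len(batch)} sites")
            return updated + batch
        updated.extend(batch)
    return updated
```

With 10,000 sites and a batch size of 100, a change that breaks sites in the third batch stops the rollout after roughly 300 sites instead of impacting all 10,000.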

These processes and the support software systems are already in place for large scale cloud software deployments.  There are likely gaps around physical proximity and heterogeneity; however, the process is there and initial use-case fit seems to be very good.

Immutable Infrastructure is a catch-all term for deployments based on images instead of configuration.  This concept is popular in cloud deployments where teams produce “golden” VM or container images that contain the exact version of software needed and are then provisioned with minimal secondary configuration.  In most cases, the images only need a small file injected (known as cloud-init) to complete the process.

In this immutable pattern, images are never updated post-deployment; instead, instances are destroyed and recreated.  It’s a deploy, destroy, repeat process.  At RackN, we’ve been able to adapt Digital Rebar Provision to support this even at the hardware layer, where images are delivered directly to disk and re-provisioning happens on a constant basis, just like a cloud managing VMs.

The advantage of the immutable pattern is that we create a very repeatable and controlled environment.  Instead of trying to maintain elaborate configurations and bi-directional systems of record, we can simply reset whole environments.  In a CI/CD system, we constantly generate fresh images that are incrementally distributed through the environment.
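A minimal sketch of that reset loop, where the `provision` and `destroy` hooks are illustrative placeholders rather than a DRP API:

```python
def reconcile(fleet, desired_image, provision, destroy):
    """Immutable reconcile: never patch in place.

    fleet maps machine -> currently deployed image id. Any machine not
    running desired_image is destroyed and re-provisioned from the
    golden image; nothing is ever updated post-deployment.
    """
    replaced = []
    for machine, image in fleet.items():
        if image != desired_image:
            destroy(machine)                      # deploy, destroy, repeat
            provision(machine, desired_image)
            replaced.append(machine)
    return replaced
```

Drift is corrected by replacement, never by patching, so every machine is provably running the current golden image.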

Immutable Edge Infrastructure would mean building and deploying complete system images for our distributed environment.  Clearly, this requires moving around larger images than just pushing patches; however, these uploads can easily be staged and they provide critical repeatability in management.  The alternative is trying to keep track of which patches have been applied successfully to distributed systems.  Based on personal experience, having an atomic deliverable sounds very attractive.

CI/CD and Immutable patterns are deep and complex subjects that go beyond the scope of a single post; however, they also offer a concrete basis for building manageable data centers.

The takeaway is that we need to be looking first to scale distributed software management patterns to help build robust edge infrastructure platforms. Picking a cloud platform before we’ve figured out these concerns is a waste of time.

Previous 2 Posts on OpenStack Conference:

Post 1 – OpenStack on Edge? 4 Ways Edge is Distinct from Cloud
Post 2 – Edge Infrastructure is Not Just Thousands of Mini Clouds

Podcast: OpenStack OpenDev Highlights Edge vs Cloud Computing Confusion

Rob Hirschfeld provides his thoughts from last week’s OpenStack OpenDev conference focused on Edge Computing. This podcast is part of a three-post series from Rob on the issues surrounding Edge and Cloud computing:

Post 1 – OpenStack on Edge? 4 Ways Edge is Distinct from Cloud
Post 2 – Edge Infrastructure is Not Just Thousands of Mini Clouds

Edge Infrastructure is Not Just Thousands of Mini Clouds

I left the OpenStack OpenDev Edge Infrastructure conference with a lot of concerns relating to how to manage geographically distributed infrastructure at scale.  We’ve been asking similar questions at RackN as we work to build composable automation that can be shared and reused.  The critical need is to dramatically reduce site-specific customization in a way that still accommodates required variation – this is something we’ve made surprising advances on in Digital Rebar v3.1.

These are very serious issues for companies like AT&T with 1000s of local exchanges, Walmart with 10,000s of in-store server farms or Verizon with 10,000s of coffee shop Wifi zones.  These workloads are not moving into centralized data centers.  In fact, with machine learning and IoT, we are expecting to see more and more distributed computing needs.

Running each site as a mini-cloud is clearly not the right answer.

While we do need the infrastructure to be easily API addressable, adding cloud without fixing the underlying infrastructure management moves us in the wrong direction.  For example, AT&T’s initial 100+ OpenStack deployments were not field upgradable, which led to their efforts to deploy OpenStack on Kubernetes; however, that may have simply moved the upgrade problem to a different platform because Kubernetes does not address the physical layer either!

There are multiple challenges here.  First, any scale infrastructure problem must be solved at the physical layer first.  Second, we must have tooling that brings repeatable, automation processes to that layer.  It’s not sufficient to have deep control of a single site: we must be able to reliably distribute automation over thousands of sites with limited operational support and bandwidth.  These requirements are outside the scope of cloud focused tools.

Containers and platforms like Kubernetes have a significant part to play in this story.  I was surprised that they were present only in a minor way at the summit.  The portability and light footprint of these platforms make them a natural fit for edge infrastructure.  I believe that lack of focus comes from the audience believing (incorrectly) that edge applications are not ready for container management.

With hardware layer control (which is required for edge), there is no need for a virtualization layer to provide infrastructure management.  In fact, “cloud” only adds complexity and cost for edge infrastructure when the workloads are containerized.  Our current cloud platforms are not designed to run in small environments and not designed to be managed in a repeatable way at thousands of data centers.  This is a deep architectural gap and not easily patched.

OpenStack sponsoring the edge infrastructure event got the right people in the room but also got in the way of discussing how we should be solving these operational challenges.  How should we be solving them?  In the next post, we’ll talk about management models that we should be borrowing for the edge…

Read 1st Post of 3 from OpenStack OpenDev: OpenStack on Edge? 4 Ways Edge is Distinct from Cloud

OpenStack on Edge? 4 Ways Edge Is Distinct From Cloud

Last week, I attended a unique OpenDev Edge Infrastructure focused event hosted by the OpenStack Foundation to help RackN understand the challenges customers are facing at the infrastructure edges.  We are exploring how the new lightweight, remote API-driven Digital Rebar Provision can play a unique role in these resource and management constrained environments.

I had also hoped the event would be part of the Foundation’s pivot towards being an “open infrastructure” community, which we’ve seen emerging as the semiannual conferences attract a broader set of open source operations technologies like Kubernetes, Ceph, Docker and SDN platforms.  As a past board member, I believe this is a healthy recognition of how the community uses a growing mix of open technologies in the data center and cloud.

It’s logical for the OpenStack community, especially the telcos, to be leaders in edge infrastructure; unfortunately, that too often seemed to mean trying to “square peg” OpenStack into every round hole at the Edge.  For companies with a diverse solution portfolio, like RackN, being too myopic about using OpenStack to solve all problems keeps us from listening to the real use-cases.  OpenStack has real utility but there is no one-size-fits-all solution (and that goes for Kubernetes too).

By far the largest issue of the Edge discussion was actually agreeing on what “edge” meant.  It seemed as if every session carried a mandatory 50% overhead of defining terms.  I heard some very interesting attempts to define edge in terms of 1) resource constraints of the equipment, 2) proximity to data sources, or 3) bandwidth limitations to the infrastructure.  All of these are helpful ways to describe edge infrastructure.

Putting my usual operations spin on the problem, I choose to define edge infrastructure in data center management terms.  Edge infrastructure has very distinct challenges compared to hyperscale data centers.  

Here is my definition:

1) Edge is inaccessible to operators, so remote lights-out operation is required

2) Edge requires distributed scale management because there are many thousands of instances to be managed

3) Edge is heterogeneous because the breadth of environments and scale imposes variation

4) Edge has a physical location awareness component because proximity matters by design

These four items are hard operational management related challenges.  They are also very distinctive challenges when compared to traditional hyperscale data center operations issues where we typically enjoy easy access, consolidated management, homogeneous infrastructure and equal network access.

In our next post, ….

September 8 – Weekly Recap of All Things Site Reliability Engineering (SRE)

Welcome to the weekly post of the RackN blog recap of all things SRE. If you have any ideas for this recap or would like to include content please contact us at info@rackn.com or tweet Rob (@zehicle) or RackN (@rackngo)

SRE Items of the Week

Nora Jones on Establishing, Growing, and Maturing a Chaos Engineering Practice
https://www.infoq.com/podcasts/nora-jones-chaos-engineering

Nora Jones, a senior software engineer on Netflix’s Chaos Team, talks with Wesley Reisz about what Chaos Engineering means today. She covers what it takes to build a practice, how to establish a strategy, how to define the cost of impact, and key technical considerations when leveraging chaos engineering. Read more and listen to the podcast

SRE Jobs

I ran a job search on LinkedIn to find the # of available SRE positions currently open; there are 854 positions available as of this morning. Dice.com listed 30,665 positions based on a search. In comparison, DevOps only had 2,975 positions on Dice.com.

Podcast on Ansible, Kubernetes, Kubespray and Digital Rebar

Stephen Spector, HPE Cloud Evangelist talks with Rob Hirschfeld, Co-Founder and CEO RackN about the installation process for Kubernetes using Kubespray, Ansible, and Digital Rebar Provisioning. Additional commentary on the overviews of Kubernetes, Containers, and Installation in this podcast.

_____________

Subscribe to our new daily DevOps, SRE, & Operations Newsletter https://paper.li/e-1498071701#/
_____________

UPCOMING EVENTS

Rob Hirschfeld and Greg Althaus are preparing for a series of upcoming events where they are speaking or just attending. If you are interested in meeting with them at these events please email info@rackn.com.

OTHER NEWSLETTERS

Podcast – Install Kubernetes with Ansible, Kubespray and Digital Rebar Provision

Stephen Spector, HPE Cloud Evangelist talks with Rob Hirschfeld, Co-Founder and CEO RackN about the installation process for Kubernetes using Kubespray, Ansible, and Digital Rebar Provisioning. Additional commentary on the overviews of Kubernetes, Containers, and Installation in this podcast.

More info on Digital Rebar Provisioning

Follow the RackN L8ist Sh9y Podcast

 

September 1 – Weekly Recap of All Things Site Reliability Engineering (SRE)

Welcome to the weekly post of the RackN blog recap of all things SRE. If you have any ideas for this recap or would like to include content please contact us at info@rackn.com or tweet Rob (@zehicle) or RackN (@rackngo)


Image from @DevOpsDaysDFW

10 Essential Skills of a Site Reliability Engineer (SRE) by AppDynamics
https://cloud.kapostcontent.net/pub/1418185e-b325-49d3-b65c-de338e45cb6f/ebook-10-essential-skills-of-a-site-reliability-engineer-sre.pdf

Almost overnight, it seems that Site Reliability Engineer (SRE) has become one of the hottest job titles across the IT Industry. So why all the sudden buzz and momentum around the SRE role? READ MORE

DevOps Tool Market Size Applications 2017 to 2022
http://www.tradecalls.org/2017-08-31-devops-tool-market

Global DevOps Tool Market Research Report 2017 to 2022 presents an in-depth assessment of the DevOps Tool Market including enabling technologies, key trends, market drivers, challenges, standardization, regulatory landscape, deployment models, operator case studies, opportunities, future roadmap, value chain, ecosystem player profiles and strategies. The report also presents forecasts for DevOps Tool Market investments from 2017 till 2022.

READ REPORT

Don’t be ageist: In the DevOps era, experience matters by @Jenz514
https://techbeacon.com/dont-be-ageist-devops-era-experience-matters

When it comes to attitudes toward age, DevOps is a lot like IT in general, but possibly more so. Defenders of an IT workforce that skews young have always noted that technology changes quickly, skills must be updated rapidly, business demands evolve fast, and long workdays just don’t appeal to professionals who have families to go home to. All of that may ratchet up even higher in DevOps culture. READ MORE

L8ist Sh9y Podcast : Digital Rebar and Terraform Provisioning
Blog Link http://bit.ly/2xPILHb 

Stephen Spector, HPE Cloud Evangelist talks with Greg Althaus, CTO and Co-Founder of RackN about how the Digital Rebar Provisioning solution provides bare metal server support for the HashiCorp Terraform Solution.

_____________

Subscribe to our new daily DevOps, SRE, & Operations Newsletter
_____________

UPCOMING EVENTS

Rob Hirschfeld and Greg Althaus are preparing for a series of upcoming events where they are speaking or just attending. If you are interested in meeting with them at these events please email info@rackn.com

OTHER NEWSLETTERS

 

Podcast – Terraform and Digital Rebar Provision Bare Metal

In this podcast, Stephen Spector, HPE Cloud Evangelist and Greg Althaus, Co-Founder and CTO RackN, talk about the integration point for Digital Rebar Provisioning with the Terraform solution. The specific focus is on delivering bare metal provisioning to users of Terraform.

About Terraform (LINK)

Terraform enables you to safely and predictably create, change, and improve production infrastructure. It is an open source tool that codifies APIs into declarative configuration files that can be shared amongst team members, treated as code, edited, reviewed, and versioned.

More info on Digital Rebar Provisioning

Follow the RackN L8ist Sh9y Podcast

August 25 – Weekly Recap of All Things Site Reliability Engineering (SRE)

Welcome to the weekly post of the RackN blog recap of all things SRE. If you have any ideas for this recap or would like to include content please contact us at info@rackn.com or tweet Rob (@zehicle) or RackN (@rackngo)

SRE Items of the Week



What is “Site Reliability Engineering?” 
https://landing.google.com/sre/interview/ben-treynor.html 

In this interview, Ben Treynor shares his thoughts with Niall Murphy about what Site Reliability Engineering (SRE) is, how and why it works so well, and the factors that differentiate SRE from operations teams in industry. READ MORE

Podcast: A Nice Mix of Ansible and Digital Rebar
http://bit.ly/2vkBYEe 

Follow our new L8ist Sh9y Podcast on SoundCloud at https://soundcloud.com/user-410091210.

Digital Rebar Mascot Naming 

Next week the Digital Rebar community will be finalizing the name for our mascot.

Several possible names are listed on a recent blog post for your consideration. Please tweet to @DigitalRebar any ideas you have as we will be choosing a name next week via a Twitter poll.

Digital Rebar v3 Provision
http://rebar.digital/

Digital Rebar is the open, fast and simple data center provisioning and control scaffolding designed with a cloud native architecture.

Our extensible stand-alone DHCP/PXE/iPXE service has minimal overhead, so it can be installed and begin provisioning in under 5 minutes on a laptop, RPi or switch. From there, users can add custom or pre-packaged workflows for full life-cycle automation using our API and CLI or a community UX.

A cloud native bare metal approach provides API-driven infrastructure-as-code automation without locking you into a specific hardware platform, operating system or configuration model.

For physical infrastructure provisioning, Digital Rebar replaces Cobbler, Foreman, MaaS or similar tools, with the added bonus of being able to include simple control workflows for RAID, IPMI and BIOS configuration. We also provide event-driven actions via a websockets API and a simple plug-in model. By design, Digital Rebar is not opinionated about scripting tools, so you can mix and match Chef, Puppet, Ansible, SaltStack and even Bash.

Next version: release of v3.1 is anticipated on 9/4/2017.

UPCOMING EVENTS

Rob Hirschfeld and Greg Althaus are preparing for a series of upcoming events where they are speaking or just attending. If you are interested in meeting with them at these events please email info@rackn.com.

OTHER NEWSLETTERS