Cloud-first Physical Provisioning? 10 ways that the DR is in to fix your PXE woes.

image

Why has it been so hard to untie from Cobbler? Why can’t we just REST-ify these 1990s Era Protocols? Dealing with the limits of PXE, DHCP and TFTP in wide-ranging data centers is tricky and Cobbler’s manual pre-defined approach was adequate in legacy data centers.

Now, we have to rethink Physical Ops in Cloud-first terms. DevOps and SRE minded operators services that have need real APIs, day-2 ops, security and control as primary design requirements.

The Digital Rebar team at RackN is hunting for Cobbler, Stacki, MaaS and Forman users to evaluate our RESTful, Golang, Template-based PXE Provisioning utility. Deep within the Digital Rebar full life-cycle hybrid control was a cutting-edge bare metal provisioning utility. As part of our v3 roadmap, we carved out the Provisioner to also work as a stand-alone service.

Here’s 10 reasons why DR Provisioning kicks aaS:

  1. Swagger REST API & CLI. Cloud-first means having a great, tested API. Years of provisioning experience went into this 3rd generation design and it shows. That includes a powerful API-driven DHCP.
  2. Security & Authenticated API. Not an afterthought, we both HTTPS and user authentication for using the API. Our mix of basic and bearer token authentication recognizes that both users and automation will use the API. This brings a new level of security and control to data center provisioning.
  3. Stand-alone multi-architecture Golang binary. There are no dependencies or prerequisites, plus upgrades are drop in replacements. That allows users to experiment isolated on their laptop and then easily register it as a SystemD service.
  4. Nested Template Expansion. In DR Provision, Boot Environments are composed of reusable template snippets. These templates can incorporate global, profile or machine specific properties that enable users to set services, users, security or scripting extensions for their environment.
  5. Configuration at Global, Group/Profile and Node level. Properties for templates can be managed in a wide range of ways that allows operators to manage large groups of servers in consistent ways.
  6. Multi-mode (but optional) DHCP. Network IP allocation is a key component of any provisioning infrastructure; however, DHCP needs are highly site dependant. DR Provision works as a multi-interface DHCP listener and can also resolve addresses from DHCP forwarders. It can even be disabled if your environment already has a DHCP service that can configure a the “next boot” provider.
  7. Dynamic Provisioner templates for TFTP and HTTP. For security and scale, DR Provision builds provisioning files dynamically based on the Boot Environment Template system. This means that critical system information is not written to disk and files do not have to be synchronized. Of course, when you need to just serve a file that works too.
  8. Node Discovery Bootstrapping. Digital Rebar’s long-standing discovery process is enabled in the Provisioner with the included discovery boot environment. That process includes an integrated secure token sequence so that new machines can self-register with the service via the API. This eliminates the need to pre-populate the DR Provision system.
  9. Multiple Seeding Operating Systems. DR Provision comes with a long list of Boot Environments and Templates including support for many Linux flavors, Windows, ESX and even LinuxKit. Our template design makes it easy to expand and update templates even on existing deployments.
  10. Two-stage TFTP/HTTP Boot. Our specialized Sledgehammer and Discovery images are designed for speed with optimized install cycles the improve boot speed by switching from PXE TFTP to IPXE HTTP in a two stage process. This ensures maximum hardware compatibility without creating excess network load.

Digital Rebar Provision is a new generation of data center automation designed for operators with a cloud-first approach. Data center provisioning is surprisingly complex because it’s caught between cutting edge hardware and arcane protocols embedded in firmware requirements that are still very much alive.

We invite you to try out Digital Rebar Provision yourself and let us know what you think. It only takes a few minutes. If you want more help, contact RackN for a $1000 Quick Start offer.

Why IBM’s hybrid “no-single-way” is a good plan

I got to spend a few days hearing IBM’s cloud plans at IBM Interconnect including a presentation, dinner and guest blogging.  Read below for links to that content.

As part of their CloudMinds group, we’re encouraged to look at the big picture of the conference and there’s a lot to take in. IBM has serious activity around machine learning, cognitive, serverless, functional languages, block chain, platform and infrastructure as a service. Frankly, that’s a confusing array of technologies.

Does this laundry list of technologies fit into a unified strategy? No, and that’s THE POINT.

Anyone who thinks they can predict a definitive right mix of technologies to solve customer problems is not paying attention to the pace of innovation. IBM is listening to their customers and hearing that needs are expanding not consolidating. In this type of market, limiting choice hurts customers.

That means that a hybrid strategy with overlapping offerings serves their customers interests.

IBM has the luxury and scale of being able to chase multiple technologies to find winners. Of course, there’s a danger of hanging on to losers too long too. So far, it looks like they are doing a good job of riding that sweet spot. Their agility here may be the only way that they can reasonably find a chink in Amazon’s cloud armour.

While the hybrid story is harder to tell, it’s the right one for this market.

Four Posts For Deeper Reading

The posts below cover a broad range of topics! Chris Ferris and I did some serious writing about collaboration and my DevOps/Hybrid post has been getting some attention. It’s all recommended reading so I’ve included some highlights.

#CloudMinds tackle the future of cognitive in Las Vegas huddle

Rob is part of the IBM CloudMinds group that meets occasionally to discuss rising cloud, infrastructure and technology challenges.

“Cognitive cannot and will not exist without trust. Humans will not trust cognitive unless we can show that our cognitive solutions understand them.”

How open communities can hurt, and help, interoperability

“The days of using open software passively from vendors are past, users need to have a voice and opinion about project governance. This post is a joint effort with Rob Hirschfeld, RackN, and Chris Ferris, IBM, based on their IBM Interconnect 2017 “Open Cloud Architecture: Think You Can Out-Innovate the Best of the Rest?” presentation.”

When DevOps and hybrid collide (2017 trend lines)

“We’ve clearly learned that DevOps automation pays back returns in agility and performance. Originally, small-batch, lean thinking was counter-intuitive. Now it’s time to make similar investments in hybrid automation so that we can leverage the most innovation available in IT today.”

Open Source Collaboration: The Power of No & Interoperability

“Users and operators can put significant pressure on project leaders and vendors to ensure that the platforms are interoperable. “

Hey Dockercon, let’s get Physical!

IMG_20170419_121918Overall, Dockercon did a good job connecting Docker users with information.  In some ways, it was a very “let’s get down to business” conference without the open source collaboration feel of previous events.  For enterprise customers and partners, that may be a welcome change.

Unlike past Dockercons, the event did not have major announcements or a lot of non-Docker ecosystem buzz.  That said, I miss that the event did not have major announcements or a lot of non-Docker ecosystem buzz.

One item that got me excited was an immutable operating system called LinuxKit which is powered by a Packer-like utility called Moby (ok, I know it does more but that’s still fuzzy to me).

RackN CTO, Greg Althaus, was able to turn around a working LinuxKit Kubernetes demo (VIDEO) overnight.  This short video explains Moby & LinuxKit plus uses the new Digital Rebar Provision in an amazing integration.

Want to hear more about immutable operating systems?  Check out our post on RackN’s site about three challenges of running things like LinuxKit, CoreOS Container Linux and RancherOS on metal.

Oh, and YES, that was my 15-year-old daughter giving a presentation at Dockercon about workplace diversity.  I’ll link the video when they’ve posted them.

https://www.slideshare.net/KateHirschfeld/slideshelf

Packet Pushers 333: Orchestration v Automation < YES, this is what we are doing!

Iix34grhy_400x400 highly recommend catching Packet Pushers 333 “Automation & Orchestration In Networking” by Drew Conry-Murray and guests Pete Lumbis and Michael Damkot.

While the discussion is all about NETWORK DevOps, they do a good job of decrying WHY current state of system orchestration is so sad – in a word: heterogeneity.  It’s not going away because the alternative is lock-in.  They also do a good job of describing the difference between automation and orchestration; however, I think there’s a middle tier  of resource “scheduling” that better describes OpenStack and Kubernetes.

Around 5:00 minutes into the podcast, they effectively describe the composable design of Digital Rebar and the rationale for the way that we’ve abstracted interfaces for automation.  If you guys really do want to cash in by consulting with it (at 10 minutes), just give me a call.

It’s great to hear acknowledgement of both the complexity and need for solving these problems.   Thanks for the great podcast Drew, Pete and Michael!

Oh… and I’m going to be presenting at Interop ITX also.  Hopefully, I’ll get a chance to talk 1×1 with Drew.

Need PXE? Try out this Cobbler Replacement

DR Provision

Operators & SREs – we need your feedback on an open DHCP/PXE technical preview that will amaze you and can be easily tested right from your laptop.

We wanted to make open basic provisioning API-driven, secure, scalable and fast.  So we carved out the Provision & DHCP services as a stand alone unit from the larger open Digital Rebar project.  While this Golang service lacks orchestration, this complete service is part of Digital Rebar infrastructure and supports the discovery boot process, templating, security and extensive image library (Linux, ESX, Windows, … ) from the main project.

TL;DR: FIVE MINUTES TO REPLACE COBBLER?  YES.

The project APIs and CLIs are complete for all provisioning functions with good Swagger definitions and docs.  After all, it’s third generation capability from the Digital Rebar project.  The integrated UX is still evolving.

Here’s a video of the quick install process.

 

Here are some examples from the documentation:

core_servicesinstall_discovered

Starting Weekly SRE Update Posts!

sre-seriesEveryone at RackN is excited to talk about Site Reliability Engineering and we’re spinning up several efforts including an interview series about it (contact me if you want talk!).

Our first deliverable is a weekly SRE industry round up blog post (our first one!)  You can subscribe to the updates on the RackN site.

Let us know if you find news that we should include by posting comments there or tweeting to @RackNGo.

If you want to hear more of my regular opinionated stuff, never fear!  I’ve got some fun “server cage match” material coming together around DevOpsDays Austin talking about how Cloud Native, DevOps and SRE align.  You can subscribe to this blog too (button on the left)- we generally don’t double post material, so you’ll get fresh insights from both.

 

10x Faster Today but 10x Harder to Maintain Tomorrow: the Cul-De-Sac problem

I’ve been digging into what it means to be a site reliability engineer (SRE) and thinking about my experience trying to automate infrastructure in a way to scales dramatically better.  I’m not thinking about scale in number of nodes, but in operator efficiency.  The primary way to create that efficiency is limit site customization and to improve reuse.  Those changes need to start before the first install.

As an industry, we must address the “day 2” problem in collaboratively developed open software before users’ first install.

Recently, RackN asked the question “Shouldn’t we have Shared Automation for Commodity Infrastructure?” which talked about fact that we, as an industry, keep writing custom automation for what should be commodity servers.  This “snow flaking” happens because there’s enough variation at the data center system level that it’s very difficult to share and reuse automation on an ongoing basis.

Since variation enables innovation, we need to solve this problem without limiting diversity of choice.

 

Happily, platforms like Kubernetes are designed to hide these infrastructure variations for developers.  That means we can expect a productivity explosion for the huge number of applications that can narrowly target platforms.  Unfortunately, that does nothing for the platforms or infrastructure bound applications.  For this lower level software, we need to accept that operations environments are heterogeneous.

 

I realized that we’re looking at a multidimensional problem after watching communities like OpenStack struggle to evolve operations practice.

It’s multidimensional because we are building the operations practice simultaneously with the software itself.  To make things even harder, the infrastructure and dependencies are also constantly changing.  Since this degree of rapid multi-factor innovation is the new normal, we have to plan that our operations automation itself must be as upgradable.

If we upgrade both the software AND the related deployment automation then each deployment will become a cul-de-sac after day 1.

For open communities, that cul-de-sac challenge limits projects’ ability to feed operational improvements back into the user base and makes it harder for early users to stay current.  These challenges limit the virtuous feedback cycles that help communities grow.  

The solution is to approach shared project deployment automation as also being continuously deployed.

This is a deceptively hard problem.

This is a hard problem because each deployment is unique and those differences make it hard to absorb community advances without being constantly broken.  That is one of the reasons why company opt out of the community and into vendor distributions. While Vendors are critical to the ecosystem, the practice ultimately limits the growth and health of the community.

Our approach at RackN, as reflected in open Digital Rebar, is to create management abstractions that isolate deployment variables based on system level concerns.  Unlike project generated templates, this approach absorbs heterogeneity and brings in the external information that often complicate project deployment automation.  

We believe that this is a general way to solve the broader problem and invite you to participate in helping us solve the Day 2 problems that limit our open communities.