Cloud-first Physical Provisioning? 10 ways the DR is in to fix your PXE woes.


Why has it been so hard to untie from Cobbler? Why can’t we just REST-ify these 1990s-era protocols? Dealing with the limits of PXE, DHCP and TFTP in wide-ranging data centers is tricky, and Cobbler’s manual, pre-defined approach was only adequate in legacy data centers.

Now, we have to rethink physical ops in cloud-first terms. DevOps- and SRE-minded operators need services that treat real APIs, day-2 ops, security and control as primary design requirements.

The Digital Rebar team at RackN is hunting for Cobbler, Stacki, MaaS and Foreman users to evaluate our RESTful, Golang, template-based PXE provisioning utility. Deep within Digital Rebar’s full life-cycle hybrid control sat a cutting-edge bare metal provisioning utility. As part of our v3 roadmap, we carved out the Provisioner to also work as a stand-alone service.

Here are 10 reasons why DR Provision kicks aaS:

  1. Swagger REST API & CLI. Cloud-first means having a great, tested API. Years of provisioning experience went into this 3rd generation design and it shows. That includes a powerful API-driven DHCP (see the API sketch after this list).
  2. Security & Authenticated API. Not an afterthought, we include both HTTPS and user authentication for the API. Our mix of basic and bearer token authentication recognizes that both users and automation will use the API. This brings a new level of security and control to data center provisioning.
  3. Stand-alone multi-architecture Golang binary. There are no dependencies or prerequisites, and upgrades are drop-in replacements. That allows users to experiment in isolation on a laptop and then easily register the same binary as a systemd service.
  4. Nested Template Expansion. In DR Provision, boot environments are composed of reusable template snippets. These templates can incorporate global, profile or machine-specific properties that let users add services, users, security settings or scripting extensions to their environment (see the template sketch after this list).
  5. Configuration at Global, Group/Profile and Node level. Template properties can be managed at each of these scopes, which lets operators configure large groups of servers consistently.
  6. Multi-mode (but optional) DHCP. Network IP allocation is a key component of any provisioning infrastructure; however, DHCP needs are highly site-dependent. DR Provision works as a multi-interface DHCP listener and can also resolve addresses from DHCP forwarders. It can even be disabled if your environment already has a DHCP service that can configure the “next boot” provider.
  7. Dynamic Provisioner templates for TFTP and HTTP. For security and scale, DR Provision builds provisioning files dynamically based on the Boot Environment Template system. This means that critical system information is not written to disk and files do not have to be synchronized. Of course, when you need to just serve a file, that works too.
  8. Node Discovery Bootstrapping. Digital Rebar’s long-standing discovery process is enabled in the Provisioner with the included discovery boot environment. That process includes an integrated secure token sequence so that new machines can self-register with the service via the API. This eliminates the need to pre-populate the DR Provision system.
  9. Multiple Seeding Operating Systems. DR Provision comes with a long list of Boot Environments and Templates including support for many Linux flavors, Windows, ESX and even LinuxKit. Our template design makes it easy to expand and update templates even on existing deployments.
  10. Two-stage TFTP/HTTP Boot. Our specialized Sledgehammer and Discovery images are designed for speed, with optimized install cycles that improve boot speed by switching from PXE TFTP to iPXE HTTP in a two-stage process. This ensures maximum hardware compatibility without creating excess network load.
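
Here is a minimal sketch of what cloud-first provisioning looks like from code: listing machines over the authenticated REST API in Go. The port, endpoint path and token handling are illustrative assumptions, so check your installation’s Swagger documentation for the exact routes.

    // List machines via the DR Provision REST API (illustrative sketch).
    package main

    import (
        "crypto/tls"
        "fmt"
        "io"
        "net/http"
    )

    func main() {
        // Fresh installs often use self-signed certs; skip verification
        // only for local experiments, never in production.
        client := &http.Client{Transport: &http.Transport{
            TLSClientConfig: &tls.Config{InsecureSkipVerify: true},
        }}

        // Endpoint and port are assumptions; confirm against your Swagger UI.
        req, err := http.NewRequest("GET", "https://127.0.0.1:8092/api/v3/machines", nil)
        if err != nil {
            panic(err)
        }
        // Bearer tokens suit automation; basic auth works for interactive users.
        req.Header.Set("Authorization", "Bearer YOUR-TOKEN-HERE")

        resp, err := client.Do(req)
        if err != nil {
            panic(err)
        }
        defer resp.Body.Close()

        body, _ := io.ReadAll(resp.Body)
        fmt.Println(resp.Status, string(body))
    }

The same call works with basic auth for interactive users; automation is better served by short-lived bearer tokens, as reason 2 above notes.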
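
The nested template expansion in reason 4 is built on Go’s standard text/template package. The sketch below shows the composition mechanism with hypothetical snippet and property names; DR Provision layers its own helpers for global, profile and machine-specific lookups on top of this.

    // Nested template expansion: a boot environment composed from
    // reusable snippets, with properties injected at render time.
    package main

    import (
        "os"
        "text/template"
    )

    const snippets = `
    {{define "ntp-setup"}}echo "server {{.NtpServer}}" >> /etc/ntp.conf{{end}}
    {{define "add-user"}}useradd -m {{.OperatorUser}}{{end}}
    `

    const bootEnv = `# boot environment composed from reusable snippets
    {{template "ntp-setup" .}}
    {{template "add-user" .}}
    `

    func main() {
        props := map[string]string{
            "NtpServer":    "10.0.0.5", // imagine a global default...
            "OperatorUser": "ops",      // ...or a machine-level override
        }
        t := template.Must(template.New("snippets").Parse(snippets))
        template.Must(t.New("bootenv").Parse(bootEnv))
        if err := t.ExecuteTemplate(os.Stdout, "bootenv", props); err != nil {
            panic(err)
        }
    }

Because snippets are shared, updating one snippet updates every boot environment that includes it.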

Digital Rebar Provision is a new generation of data center automation designed for operators with a cloud-first approach. Data center provisioning is surprisingly complex because it is caught between cutting-edge hardware and arcane protocols embedded in firmware that are still very much alive.

We invite you to try out Digital Rebar Provision yourself and let us know what you think. It only takes a few minutes. If you want more help, contact RackN for a $1000 Quick Start offer.

Hey Dockercon, let’s get Physical!

Overall, Dockercon did a good job connecting Docker users with information.  In some ways, it was a very “let’s get down to business” conference without the open source collaboration feel of previous events.  For enterprise customers and partners, that may be a welcome change.

Unlike past Dockercons, the event did not have major announcements or a lot of non-Docker ecosystem buzz.  That said, I missed having that buzz.

One item that got me excited was an immutable operating system called LinuxKit, which is assembled by a Packer-like utility called Moby (OK, I know Moby does more, but that’s still fuzzy to me).

RackN CTO Greg Althaus was able to turn around a working LinuxKit Kubernetes demo (VIDEO) overnight.  This short video explains Moby & LinuxKit and uses the new Digital Rebar Provision in an amazing integration.

Want to hear more about immutable operating systems?  Check out our post on RackN’s site about three challenges of running things like LinuxKit, CoreOS Container Linux and RancherOS on metal.

Oh, and YES, that was my 15-year-old daughter giving a presentation at Dockercon about workplace diversity.  I’ll link the video when they’ve posted them.

https://www.slideshare.net/KateHirschfeld/slideshelf

Why we can’t move past installers to talk about operations – the underlay gap

20 minutes.  That’s the amount of time most developers are willing to spend installing a tool or platform that could become the foundation for their software.  I’ve watched our industry obsess over the “out of box” experience, which usually translates into a single CLI command to get started (and then fails to scale up).

Secure, scalable and robust production operations are complex.  In fact, most of these platforms are specifically designed to hide that fact from developers.

That means that these platforms intentionally hide the very complexity that they themselves need to run effectively.  Adding that complexity back, at best, undermines the utility of the platform and, at worst, causes distractions that keep us forever looping on “day 1” installation issues.

I believe that systems designed to manage ops processes and underlay are different from platforms designed to manage the developer life-cycle.  This is distinct from the fidelity gap, which is about portability.  Accepting that split allows us to focus on delivering secure, scalable and robust infrastructure for both audiences.

In a pair of DevOps.com posts, I lay out my arguments about the harm being caused by trying to blend these concepts in much more detail:

  1. It’s Time to Slay the Universal Installer Unicorn
  2. How the Lure of an ‘Easy Button’ Installer Traps Projects

Three reasons why Ops Composition works: Cluster Linking, Services and Configuration (pt 2)

In part 1, we reviewed the RackN team’s hard-won insights from previous deployment automation. We feel strongly that prioritizing portability in provisioning automation is important. Individual sites may initially succeed building just for their own needs; however, these divergences limit future collaboration and ultimately make operations more expensive to maintain.

If it’s more expensive to isolate, then why have we failed to create a shared underlay? Very simply, it’s hard to encapsulate differences between sites in a consistent way.

What makes cluster construction so hard?

There are three key things we have to solve together: cross-node dependencies (linking), a lack of service configuration (services) and isolating attribute chains (configuration).  They all come back to thinking of the whole system as a cluster instead of individual nodes. Let’s break them down:

Cross Dependencies (Cluster Linking) – The reason for building a multi-node system is to create an interconnected system. For example, we want a database cluster with automated fail-over, or we want a storage system that predictably distributes redundant copies of our data. Most critically and most overlooked, we also want to make sure that we can trust cluster members before we share secrets with them.

These cluster-building actions require that we synchronize configuration so that each step has the information it requires. While it’s possible to repeatedly bang on the configuration until it converges, that approach is frustrating to watch, hard to troubleshoot and fraught with timing issues.  Taking this to the next logical step, upgrades require sequence control with circuit breakers – that’s exactly what Digital Rebar was built to provide.

Service Configuration (Cluster Services) – We’ve been so captivated with node configuration tools (like Ansible) that we overlook the reality that real deployments are an intertwined mix of service, node and cross-node configuration.  Even after interacting with a cloud service to get nodes, we still need to configure services for network access, load balancers and certificates.  Once the platform is installed, we then use the platform as a service.  On physical infrastructure, there are even more services, including DNS, IPAM and provisioning.

The challenge with service configurations is that they are not static and generally impossible to predict in advance.  Using a load balancer?  You can’t configure it until you’ve got the node addresses allocated.  And then it needs to be updated as you manage your cluster.  This is what makes platforms awesome – they handle the housekeeping for the apps once they are installed.

Digital Rebar decomposition solves this problem because it is able to mix service and node configuration.  The orchestration engine can use node-specific information to update services in the middle of a node configuration workflow.  For example, bringing a NIC online with a new IP address requires multiple trusted DNS entries.  The same applies to PKI, load balancers and networking.
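
To make that concrete, here is a purely conceptual Go sketch (every type and function name is hypothetical, not Digital Rebar’s actual API) of a workflow step where node configuration immediately drives a service update:

    // Conceptual sketch: node configuration driving service configuration.
    // All names here are hypothetical, not Digital Rebar's real API.
    package main

    import "fmt"

    type Node struct {
        Name string
        IP   string // allocated mid-workflow, unknowable in advance
    }

    type DNSService struct{ records map[string]string }

    func (d *DNSService) Upsert(name, ip string) {
        d.records[name] = ip
        fmt.Printf("dns: %s -> %s\n", name, ip)
    }

    // bringNICOnline is a node step that immediately triggers a service
    // step with the freshly allocated address.
    func bringNICOnline(n *Node, dns *DNSService) {
        n.IP = allocateIP(n)     // node configuration...
        dns.Upsert(n.Name, n.IP) // ...drives service configuration
    }

    func allocateIP(n *Node) string { return "10.0.0.42" } // stand-in for IPAM

    func main() {
        dns := &DNSService{records: map[string]string{}}
        bringNICOnline(&Node{Name: "node-01"}, dns)
    }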

Isolating Attribute Chains (Cluster Configuration) – Clusters have a difficult duality: they are managed as both a single entity and a collection of parts. That means that our configuration attributes are coupled together and often iterative. Typically, we solve this problem by front-loading all the configuration. This leads to several problems: first, clusters must be configured in stages and, second, configuration attributes are predetermined and then statically passed into each component, making variation and substitution difficult.

Our solution to this problem is to treat configuration more like functional programming where configuration steps are treated as isolated units with fully contained inputs and outputs. This approach allows us to accommodate variation between sites or cluster needs without tightly coupling steps. If we need to change container engines or networking layers then we can insert or remove modules without rewriting or complicating the majority of the chain.
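
A minimal Go sketch of that functional style, using illustrative names rather than Digital Rebar’s actual constructs, shows how a module can be inserted or removed without disturbing the chain:

    // Configuration steps as isolated units with explicit inputs/outputs.
    package main

    import "fmt"

    // Config flows through the chain as an explicit value; no step
    // reaches outside its declared inputs.
    type Config map[string]string

    type Step func(Config) Config

    func containerEngine(engine string) Step {
        return func(c Config) Config {
            c["engine"] = engine
            return c
        }
    }

    func networkLayer(layer string) Step {
        return func(c Config) Config {
            c["network"] = layer
            return c
        }
    }

    // chain composes steps; swapping the network module does not
    // require rewriting anything else in the sequence.
    func chain(steps ...Step) Step {
        return func(c Config) Config {
            for _, s := range steps {
                c = s(c)
            }
            return c
        }
    }

    func main() {
        build := chain(containerEngine("docker"), networkLayer("flannel"))
        fmt.Println(build(Config{}))
    }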

This approach is a critical consideration because it allows us to accommodate both site and time changes. Even if a single site remains consistent, the software being installed will not. We must be resilient both site to site and version to version on a component basis. Any other pattern forces us into an unmaintainable lock-step provisioning model.

In the past, to avoid solving these three hard issues, we built provisioning monoliths. Even worse, we’ve seen projects try to solve these cluster-building problems within their own context. That leads to confusing bootstrap architectures that distract from making the platforms easy for their intended audiences. It is OK for running a platform to be a different problem than using the platform.

In summary, we want composition because we are totally against ops magic.  No unicorns, no rainbows, no hidden anything.

Basically, we want to avoid all magic in a deployment. For scale operations, there should never be a “push and pray” step where we are counting on timing or unknown configuration for it to succeed. Those systems are impossible to maintain, share and scale.

I hope that this helps you look at the Digital Rebar underlay approach in a holistic way and see how it can help create a more portable and sustainable IT foundation.

Introducing Digital Rebar. Building strong foundations for New Stack infrastructure

This week, I have the privilege to showcase the emergence of RackN’s updated approach to data center infrastructure automation that is container-ready and drives “cloud-style” DevOps on physical metal.  While it works at scale, we’ve also ensured it’s light enough to run a production-fidelity deployment on a laptop.

You grow to cloud scale with a ready-state foundation that scales up at every step.  That’s exactly what we’re providing with Digital Rebar.

Over the past two years, the RackN team has been working on microservices operations orchestration in the OpenCrowbar code base.  By embracing these new tools and architecture, Digital Rebar takes that base in new directions.  Yet we also get to leverage a scalable, heterogeneous provisioner and integrations for all major devops tools.  We began with critical data center automation already working.

Why Digital Rebar? Traditional data center ops is being disrupted by container and service architectures, and legacy data centers are challenged to gracefully integrate this new way of managing containers at scale. We felt it was time to start a dialog about the new foundational layer of scale ops.

Both our code and vision have substantially diverged from the groundbreaking “OpenStack Installer” MVP that the RackN team members launched in 2011 from inside Dell, and that is still winning prizes for SUSE.

We have not regressed our leading vendor-neutral hardware discovery and configuration features; however, today, our discussions are about service wrappers, heterogeneous tooling, immutable container deployments and next generation platforms.

Over the next few days, I’ll be posting more about how Digital Rebar works (plus video demos).

Deploy to Metal? No sweat with RackN’s new Ansible Dynamic Inventory API

Content originally posted by Ansible & RackN; I added a video demo.  Also, see Ansible’s original post for more details about the multi-vendor “Simple OpenStack Initiative.”

The RackN team takes our already super easy Ansible integration to a new level with added SSH key control and dynamic inventory in the recent OpenCrowbar v2.3 (Drill) release.  These two items make full metal control more accessible than ever for Ansible users.

The platform offers full key management.  You can add keys at the system, deployment (group of machines) and machine levels.  These keys are operator-settable and can be added and removed after provisioning has completed.  If you want to control access on a server or group-of-servers basis, OpenCrowbar provides that control via our API, CLI and UI.

We also provide an API path for Ansible dynamic inventory.  Using the simple Python client script (reference example), you can instantly pull a complete, up-to-date node inventory of your system.  The inventory data includes items like number of disks, CPUs and amount of RAM.  If you’ve grouped machines in OpenCrowbar, those groups are passed to Ansible.  Even better, the metadata schema includes the networking configuration and machine status.
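
For reference, Ansible’s dynamic inventory contract is straightforward: a script invoked with --list prints JSON describing groups plus per-host variables under _meta.hostvars.  This Go sketch shows the expected shape with stubbed values; the real data would come from the OpenCrowbar API, as the reference Python client demonstrates.

    // Emit the JSON shape Ansible expects from a dynamic inventory script.
    // Values are stubbed; a real script would query the OpenCrowbar API.
    package main

    import (
        "encoding/json"
        "fmt"
    )

    func main() {
        inventory := map[string]interface{}{
            // groups map to host lists; OpenCrowbar deployments become groups
            "crowbar-nodes": map[string]interface{}{
                "hosts": []string{"node-01", "node-02"},
            },
            // per-host metadata: addresses, disks, cpus, ram, status
            "_meta": map[string]interface{}{
                "hostvars": map[string]map[string]interface{}{
                    "node-01": {"ansible_host": "192.168.124.81", "ram_mb": 32768},
                    "node-02": {"ansible_host": "192.168.124.82", "ram_mb": 32768},
                },
            },
        }
        out, err := json.MarshalIndent(inventory, "", "  ")
        if err != nil {
            panic(err)
        }
        fmt.Println(string(out))
    }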

With no added configuration, you can immediately use Ansible as your multi-server CLI for ad hoc actions and installation using playbooks.

Of course, the OpenCrowbar tools are also available if you need remote power control or want a quick reimage of the system.

RackN respects that data centers are heterogeneous.  Our vision is that your choice of hardware, operating system and network topology should not break devops deployments!  That’s why we work hard to provide useful abstracted information.  We want to work with you to help make sure that OpenCrowbar provides the right details to create best practice installations.

For working with bare metal, there’s no simpler way to deliver consistent, repeatable results.

too easy to bare metal? Ansible just works with OpenCrowbar

I’ve talked before about how OpenCrowbar distributes SSH keys automatically as part of its deployment process.  Now, it’s time to unleash some of the subsequent magic!

[5/21 Update: We added the “crowbar-access” role to the Drill release that allows you to inject/remove keys on a per node basis from the API or CLI at any point in the node life-cycle]

If you provision servers with your keys in place, then Ansible will just work with truly minimal configuration (one line in a file!).

Video Demo (steps below):

Here are my steps:

  1. Install OpenCrowbar and run some nodes to ready state [videos]
  2. Install Ansible [simple steps]
  3. Add the host range “192.168.124.[81:83] ansible_ssh_user=root” to the “/etc/ansible/hosts” file
  4. If you are really lazy, add a “[defaults]” section containing “host_key_checking = False” to your “~/.ansible.cfg” file
  5. now ping the hosts, “ansible all -m ping”
  6. pat yourself on the back, you’re done.
  7. to show off:
    1. touch all machines: ansible all -a "/bin/echo hello"
    2. look at types of Linux: ansible all -a "uname -a"

Further integration work can make this even more powerful.  

I’d like to see OpenCrowbar generate the Ansible inventory file from the discovery data and map Ansible groups from deployments.  Crowbar could also call Ansible directly to use playbooks or even do a direct hand-off to Tower to complete an install without user intervention.

Wow, that would be pretty handy!   If you think so too, please join us in the OpenCrowbar community.