Surgical Ansible & Script Injections before, during or after deployment

RackN CEO, Rob Hirschfeld, has been posting about our unique composable operations approach with Digital Rebar to enable hybrid infrastructure and mix-and-match underlay tooling.

This post shows some remarkable flexibility enabled by the approach, allowing operators to perform limited, secure operations against running systems.

via Surgical Ansible & Script Injections before, during or after deployment. — Rob Hirschfeld

 

Surgical Ansible & Script Injections before, during or after deployment.

I’ve been posting about the unique composable operations approach the RackN team has taken with Digital Rebar to enable hybrid infrastructure and mix-and-match underlay tooling.  The orchestration design (what we call annealing) allows us to dynamically add roles to the environment and execute them as single role/node interactions in operational chains.

With our latest patches (short demo videos below), you can now create single-role Ansible or Bash scripts dynamically and then incorporate them into the node execution.

That makes it very easy to extend an existing deployment on-the-fly for quick changes or as part of a development process.

You can also run an ad hoc bash script against one machine or a group of machines.  If that script is something unique to your environment, you can manage it without having to push it back upstream because Digital Rebar workloads are composable and designed to be safely integrated from multiple sources.

Beyond tweaking running systems, this is the fastest script development workflow that I’ve ever seen.  I can make fast, surgical, iterative changes to my scripts without having to rerun whole playbooks or runlists.  Even better, I can build multiple operating system environments side-by-side and test changes in parallel.

For secure environments, I don’t have to hand out user SSH access to systems because the actions run in Digital Rebar context.  Digital Rebar can limit control per user or tenant.

I’m very excited about how this capability can be used for dev, test and production systems.  Check it out and let me know what you think.

 

 

 

Breaking Up is Hard To Do – Why I Believe in Ops Decomposition (pt 1)

Over the summer, the RackN team took a radical step with our previous Ansible Kubernetes workload install: we broke it into pieces.  Why?  We wanted to eliminate all “magic happens here” steps in the deployment.

The result, DR Kompos8, is a faster, leaner, transparent and parallelized installation that allows for pluggable extensions and upgrades (video tour). We also chose the operationally simplest configuration choice: Golang binaries managed by SystemD.

Why decompose and simplify? Let’s talk about the hard-earned ops automation battle scars that led to composability as a core value:

Back in the early OpenStack days, when the project was actually much simpler, we were part of a community writing Chef Cookbooks to install it. These scripts are just a sequence of programmable steps (roles in Ops-speak) that drive the configuration of services on each node in the cluster. There is an ability to find cross-cluster information and look up local inventory, so we were able to inject specific details before the process began. However, once the process started, it was pretty much like starting a dominoes chain. If anything went wrong anywhere in the installation, we had to reset all the dominoes and start over.

Like a dominoes train, it is really fun to watch when it works. Also, like dominoes, it is frustrating to set up and fix. Often we were literally holding our breath during installation, hoping that we’d anticipated every variation in the software, hardware and environment. It is no surprise that the first and most critical feature we created was a redeploy command.

It turned out that the ability to successfully redeploy was the critical measure of success. We would not consider a deployment complete until we could wipe the systems and rebuild them automatically at least twice.

What made cluster construction so hard? There were three key things: cross-node dependencies (linking), a lack of service configuration (services) and isolating attribute chains (configuration).

We’ll explore these three reasons in detail for part 2 of this post tomorrow.

Even without the details, it is easy to understand that we want to avoid all magic in a deployment.

For scale operations, there should never be a “push and pray” step where we are counting on timing or unknown configuration for success. Likewise, we need to eliminate “it worked from my desktop” automation too.  Those systems are impossible to maintain, share and scale. Composed cluster operations address this problem by making work modular, predictable and transparent.

Composability is Critical in DevOps: let’s break the monoliths

This post was inspired by my DevOps.com Git for DevOps post and is an evolution of my “Functional Ops (the cake is a lie)” talks.

2016 is the year we break down the monoliths.  We’ve spent a lot of time talking about monolithic applications and microservices; however, there’s an equally deep challenge in ops automation.

Anti-monolith composability means making our automation into function blocks that can be chained together by orchestration.

What is going wrong?  We’re building fragile, tightly coupled automation.

Most of the automation scripts that I’ve worked with become very long, interconnected sequences that reach well beyond the actual application that they are trying to install.  For example, Kubernetes needs etcd as a datastore.  The current model is to include the etcd install in the install script.  The same is true for SDN install/configuration and for post-install tests and dashboard UIs.  The simple “install Kubernetes” quickly explodes into a kitchen sink of related adjacent components.

Those installs quickly become fragile and bloated.  Even worse, they have hidden dependencies.  What happens when etcd changes?  Now we’ve got to track down all the references to it buried in etcd-based applications.  Further, we don’t get the benefits of etcd deployment improvements like secure or scale configuration.
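
As a purely illustrative sketch (the function names and the tiny run_chain orchestrator below are hypothetical stand-ins, not Digital Rebar’s or Kubernetes’ actual install code), the difference between the kitchen-sink script and composable function blocks chained by orchestration looks roughly like this:

    # Toy illustration only: role names and run_chain() are hypothetical,
    # not the real Kubernetes, etcd or Digital Rebar installers.

    def install_kubernetes_kitchen_sink(node):
        """Monolithic style: etcd, SDN and dashboards baked into one script.
        Any change to etcd means editing (and re-testing) this whole block."""
        print(f"[{node}] install etcd + kubernetes + SDN + dashboard, all in one")

    # Composed style: each concern is a small, single-purpose role.
    def install_etcd(node):
        print(f"[{node}] install etcd")

    def install_kubernetes(node):
        print(f"[{node}] install kubernetes (expects etcd to already exist)")

    def install_sdn(node):
        print(f"[{node}] configure SDN overlay")

    def run_chain(node, roles):
        """Tiny stand-in for an orchestrator: chains independent roles so any
        one of them can be swapped or upgraded without touching the others."""
        for role in roles:
            role(node)

    # The chain is declared by orchestration, not baked into one script.
    run_chain("node-01", [install_etcd, install_kubernetes, install_sdn])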

What can we do about it?  Resist the urge to create vertical silos.

It’s tempting and fast to create automation that works in a very prescriptive way for a single platform, operating system and tool chain.  The work of creating abstractions between configuration steps seems like a lot of overhead.  Even if you create those boundaries or reuse upstream automation, you’re likely to be vulnerable to changes within that component.  All these concerns drive operators to walk away from working collaboratively with each other and with developers.

Giving up on collaborative Ops hurts us all and makes it impossible to engineer excellent operational tools.  

Don’t give up!  Like git for development, we can do this together.

Got some change? Build a datacenter ops lab on your coffee break [with Packet.net MaaS]

We’re using Packet.net hosted metal to test automation for private metal (video).  You can use discount code “RACKN100” to get a credit on Packet and try it yourself.

At RackN, we’ve been shrinking our scale deployment platform down to run faithfully on a desktop-class system. Since we abstract the network and hardware complexity, you can build automation that scales to physical from as little as 16 GB of RAM (the same size as Packet’s smaller server). That allows the exact same logic we use for an 80-node Ceph or Kubernetes cluster to work on my 14” laptop.

In fact, we’ve been getting a bit obsessed with making a clean restart small and fast using containers, VMs and bootstrapping scripts.

Creating a remote test lab is part of this obsession because many rehearsals make great performances.  We wanted to eliminate the setup time and process for users who just want to experiment with a production grade deployment. Using Packet.net hosted metal and some Ansible scripts, we can build a complete HA Kubernetes cluster in about 15 minutes using VMs. This lets us iterate on Kubernetes best practices virtually since the “setup metal part” is handled abstractly by Digital Rebar.

Yawn. You could do the same in AWS. Why is that exciting?

The process for the lab system we build in Packet.net can then be used to provision a complete private infrastructure on metal including RAID, BIOS and server networking. Even though the lab uses VMs, we still do real networking, storage and configuration. For example, we can iterate building real software defined networking (SDN) overlays in this environment and then scale the work up to physical gear.

The provision and deploy time is so fast (generally, under 15 minutes) that we are using it as a clean environment for Dev and QA cycles on new automation. It’s also a very practical demo environment for these platforms because of the fidelity between this environment and an actual pilot. For me, that means spending $0.40 so I don’t have to sweat losing my work in progress, battery life or my wifi connection to crank out a demo.

BTW… Packet.net servers are SUPER FAST. Even the small 16 GB RAM machine is packed with SSDs and great connectivity.

If you are exploring any of the several workloads that we’ve been building (Docker Swarm, Kubernetes, Mesos, CloudFoundry, Ceph and OpenStack) or just playing around with API driven physical provisioning, we just made that work a little easier and a lot faster.

Deploy to Metal? No sweat with RackN’s new Ansible Dynamic Inventory API

Content originally posted by Ansible & RackN, so I added a video demo.  Also, see Ansible’s original post for more details about the multi-vendor “Simple OpenStack Initiative.”

The RackN team takes our already super easy Ansible integration to a new level with added SSH Key control and dynamic inventory with the recent OpenCrowbar v2.3 (Drill) release.  These two items make full metal control more accessible than ever for Ansible users.

The platform offers full key management.  You can add keys at the system, deployment (group of machines) and machine levels.  These keys are operator-settable and can be added and removed after provisioning has been completed.  If you want to control access on a per-server or group-of-servers basis, OpenCrowbar provides that control via our API, CLI and UI.

We also provide an API path for Ansible dynamic inventory.  Using the simple Python client script (reference example), you can instantly pull a complete, up-to-date node inventory of your system.  The inventory data includes items like the number of disks, CPUs and amount of RAM.  If you’ve grouped machines in OpenCrowbar, those groups are passed to Ansible.  Even better, the metadata schema includes the networking configuration and machine status.
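
To give a feel for the mechanics: an Ansible dynamic inventory source is just an executable that answers --list (and optionally --host) with JSON.  The sketch below is illustrative only; the admin address, the /api/v2/nodes endpoint and the field names are placeholder assumptions rather than the documented OpenCrowbar API, so use the reference example above for real deployments.

    #!/usr/bin/env python3
    # Sketch of an Ansible dynamic inventory client for OpenCrowbar.
    # ASSUMPTION: the admin URL, the /api/v2/nodes endpoint and the JSON field
    # names below are illustrative placeholders, not the documented API.
    import argparse
    import json
    import urllib.request

    CROWBAR = "http://192.168.124.10:3000"  # OpenCrowbar admin node (placeholder)

    def fetch_nodes():
        with urllib.request.urlopen(CROWBAR + "/api/v2/nodes") as resp:
            return json.load(resp)

    def build_inventory(nodes):
        # Ansible's --list format: groups with host lists plus _meta.hostvars
        inventory = {"_meta": {"hostvars": {}}}
        for node in nodes:
            name = node["name"]
            # map OpenCrowbar deployments (machine groups) onto Ansible groups
            group = node.get("deployment", "ungrouped")
            inventory.setdefault(group, {"hosts": []})["hosts"].append(name)
            # surface hardware facts (disks, CPUs, RAM) as host variables
            inventory["_meta"]["hostvars"][name] = {
                "ansible_host": node.get("address"),
                "crowbar_cpus": node.get("cpus"),
                "crowbar_ram": node.get("ram"),
                "crowbar_disks": node.get("disks"),
            }
        return inventory

    if __name__ == "__main__":
        parser = argparse.ArgumentParser()
        parser.add_argument("--list", action="store_true")
        parser.add_argument("--host")
        args = parser.parse_args()
        if args.host:
            print(json.dumps({}))  # per-host vars already delivered via _meta
        else:
            print(json.dumps(build_inventory(fetch_nodes()), indent=2))

With a script like that marked executable, something along the lines of “ansible all -i ./crowbar_inventory.py -m setup” exercises it (the filename is just the sketch above); Ansible accepts any executable inventory source this way, and the OpenCrowbar deployments show up as ordinary Ansible groups.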

With no added configuration, you can immediately use Ansible as your multi-server CLI for ad hoc actions and installation using playbooks.

Of course, the OpenCrowbar tools are also available if you need remote power control or want a quick reimage of the system.

RackN respects that data centers are heterogeneous.  Our vision is that your choice of hardware, operating system and network topology should not break DevOps deployments!  That’s why we work hard to provide useful abstracted information.  We want to work with you to help make sure that OpenCrowbar provides the right details to create best practice installations.

For working with bare metal, there’s no simpler way to deliver consistent, repeatable results.

too easy to bare metal? Ansible just works with OpenCrowbar

I’ve talked before about how OpenCrowbar distributes SSH keys automatically as part of its deployment process.  Now, it’s time to unleash some of the subsequent magic!

[5/21 Update: We added the “crowbar-access” role to the Drill release that allows you to inject/remove keys on a per node basis from the API or CLI at any point in the node life-cycle]

If you provision servers with your keys in place, then Ansible will just work with truly minimal configuration (one line in a file!).

Video Demo (steps below):

Here are my steps:

  1. Install OpenCrowbar and run some nodes to ready state [videos]
  2. Install Ansible [simple steps]
  3. Add the host range “192.168.124.[81:83] ansible_ssh_user=root” to the
    “/etc/ansible/hosts” file
  4. If you are really lazy, add “host_key_checking = False” under the “[defaults]”
    section of your “~/.ansible.cfg” file (both files are sketched below)
  5. Now ping the hosts: “ansible all -m ping”
  6. Pat yourself on the back, you’re done.
  7. To show off:
    1. touch all machines: “ansible all -a '/bin/echo hello'”
    2. look at the types of Linux: “ansible all -a 'uname -a'”
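
For reference, the two files from steps 3 and 4 end up looking roughly like this (the paths and IP range are the example values above; adjust them to your own nodes):

    # /etc/ansible/hosts
    192.168.124.[81:83]  ansible_ssh_user=root

    # ~/.ansible.cfg
    [defaults]
    host_key_checking = False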

Further integration work can make this even more powerful.  

I’d like to see OpenCrowbar generate the Ansible inventory file from the discovery data and to map Ansible groups from deployments.  Crowbar could also call Ansible directly to use playbooks or even do a direct hand-off to Tower to complete an install without user intervention.

Wow, that would be pretty handy!   If you think so too, please join us in the OpenCrowbar community.