This week, I have the privilege of showcasing RackN’s updated approach to data center infrastructure automation: container-ready, and driving “cloud-style” DevOps on physical metal. While it works at scale, we’ve also made it light enough to run a production-fidelity deployment on a laptop.
You grow to cloud scale with a ready-state foundation that scales up at every step. That’s exactly what we’re providing with Digital Rebar.
Over the past two years, the RackN team has been working on microservices operations orchestration in the OpenCrowbar code base. By embracing these new tools and architecture, Digital Rebar takes that base in a new direction. Yet we also get to leverage a scalable heterogeneous provisioner and integrations for all major DevOps tools. We began with critical data center automation already working.
Why Digital Rebar? Traditional data center ops is being disrupted by container and service architectures, and legacy data centers are challenged to gracefully integrate this new way of managing containers at scale. We felt it was time to start a dialog about the new foundational layer of scale ops.
We have not regressed our leading vendor-neutral hardware discovery and configuration features; however, today, our discussions are about service wrappers, heterogeneous tooling, immutable container deployments and next generation platforms.
Over the next few days, I’ll be posting more about how Digital Rebar works (plus video demos).
With the recent OpenCrowbar v2.3 (Drill) release, the RackN team takes our already super easy Ansible integration to a new level with added SSH key control and dynamic inventory. These two items make full metal control more accessible than ever for Ansible users.
The platform offers full key management. You can add keys at the system, deployment (group of machines) and machine levels. These keys are operator settable and can be added and removed after provisioning has been completed. If you want to control access on a per-server or per-group basis, OpenCrowbar provides that control via our API, CLI and UI.
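To make the three scopes concrete, here is a minimal sketch of how a client might build requests against the key-management API. The paths and payload shape below are illustrative assumptions, not the actual OpenCrowbar API; the point is the system/deployment/machine scoping and that keys can be revoked after provisioning.

```python
import json

# Hypothetical sketch: the real OpenCrowbar API paths and payload shapes
# may differ. This just illustrates the three scopes at which keys can
# be added (system, deployment, machine) and removed later.
def key_request(scope, name, public_key, remove=False):
    """Build an (HTTP method, path, body) triple for an SSH key change."""
    paths = {
        "system":     "/api/v2/keys",                      # every machine
        "deployment": f"/api/v2/deployments/{name}/keys",  # a group of machines
        "machine":    f"/api/v2/nodes/{name}/keys",        # a single machine
    }
    method = "DELETE" if remove else "POST"
    body = json.dumps({"public_key": public_key})
    return method, paths[scope], body

# Add an operator key to one deployment, then revoke it after the fact:
add = key_request("deployment", "web-tier", "ssh-rsa AAAA... ops@example")
revoke = key_request("deployment", "web-tier", "ssh-rsa AAAA... ops@example",
                     remove=True)
```

Because keys are plain operator-settable data, the same call pattern works equally well from the CLI or UI.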
We also provide an API path for Ansible dynamic inventory. Using the simple Python client script (reference example), you can instantly generate a complete, up-to-date node inventory of your system. The inventory data includes items like number of disks, CPUs and amount of RAM. If you’ve grouped machines in OpenCrowbar, those groups are passed to Ansible. Even better, the metadata schema includes the networking configuration and machine status.
With no added configuration, you can immediately use Ansible as your multi-server CLI for ad hoc actions and installation using playbooks.
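A dynamic inventory script like the one above boils down to translating node records into the JSON shape Ansible expects. The inventory format (groups of hosts plus a `_meta.hostvars` map) is standard Ansible; the node records below are hard-coded stand-ins for what the OpenCrowbar API would return, so treat this as a sketch rather than the reference client.

```python
#!/usr/bin/env python
import json

# Stand-in for the OpenCrowbar API call; the real client script fetches
# these node records over HTTP instead of hard-coding them.
def fetch_nodes():
    return [
        {"name": "d52-54-00-aa-bb-cc", "group": "compute",
         "ip": "192.168.124.81", "ram_mb": 65536, "cpus": 16, "disks": 4},
        {"name": "d52-54-00-dd-ee-ff", "group": "storage",
         "ip": "192.168.124.82", "ram_mb": 32768, "cpus": 8, "disks": 12},
    ]

def build_inventory(nodes):
    # Ansible's dynamic inventory format: one key per group, plus a
    # _meta.hostvars map so Ansible doesn't call the script per host.
    inv = {"_meta": {"hostvars": {}}}
    for n in nodes:
        inv.setdefault(n["group"], {"hosts": []})["hosts"].append(n["name"])
        inv["_meta"]["hostvars"][n["name"]] = {
            "ansible_host": n["ip"], "ram_mb": n["ram_mb"],
            "cpus": n["cpus"], "disks": n["disks"],
        }
    return inv

if __name__ == "__main__":
    # Ansible invokes the script with --list to get the whole inventory.
    print(json.dumps(build_inventory(fetch_nodes()), indent=2))
```

Point `ansible -i` at a script like this and your OpenCrowbar groups become Ansible groups, ready for ad hoc commands or playbooks.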
Of course, the OpenCrowbar tools are also available if you need remote power control or want a quick reimage of the system.
RackN respects that data centers are heterogenous. Our vision is that your choice of hardware, operating system and network topology should not break devops deployments! That’s why we work hard to provide useful abstracted information. We want to work with you to help make sure that OpenCrowbar provides the right details to create best practice installations.
For working with bare metal, there’s no simpler way to deliver consistent, repeatable results.
Last week, Scott Jensen, RackN COO, uploaded a batch of OpenCrowbar install and demo videos. I’ve presented them in reverse chronological order so you can see what OpenCrowbar looks like before you run the installation process.
Why DNS? Maintaining DNS is essential to scale ops. It’s not as simple as naming servers, because each server will have multiple addresses (IPv4, IPv6, teams, bridges, etc.) on multiple NICs depending on the system’s function and applications. Plus, errors in DNS are hard to diagnose.
I love talking about the small Ops things that make a huge impact in quality of automation. Things like automatically building a squid proxy cache infrastructure.
Today, I get to rave about the DNS integration that just surfaced in the OpenCrowbar code base. RackN CTO, Greg Althaus, just completed work that incrementally updates DNS entries as new IPs are added into the system.
Why is that a big deal? There are a lot of names & IPs to manage.
In physical ops, every time you bring up a physical or virtual network interface, you are assigning at least one IP to that interface. For OpenCrowbar, we are assigning two addresses: IPv4 and IPv6. Servers generally have 3 or more active interfaces (e.g., BMC, admin, internal, public and storage), so that’s a lot of references. It gets even more complex when you factor in DNS round robin or other common practices.
Plus mistakes are expensive. Name resolution is an essential service for operations.
I know we all love memorizing IPv4 addresses (just wait for IPv6!) so accurate naming is essential. OpenCrowbar already aligns the address 4th octet (Admin .106 goes to the same server as BMC .106) but that’s not always practical or useful. This is not just a Day 1 problem – DNS drift or staleness becomes an increasingly challenging problem when you have to reallocate IP addresses. The simple fact is that registering IPs is not the hard part of this integration – it’s the flexible and dynamic updates.
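The 4th-octet alignment is easy to picture with a tiny sketch. The network prefixes below are made-up examples, not OpenCrowbar defaults; the point is that one host index produces matching last octets on every network, so an operator can guess the BMC address from the admin address.

```python
# Illustration of 4th-octet alignment across networks. The prefixes
# here are examples only, not OpenCrowbar defaults.
NETWORKS = {
    "admin": "192.168.124",
    "bmc":   "192.168.128",
}

def addresses_for(host_index):
    """Give every network the same final octet for one server."""
    return {net: f"{prefix}.{host_index}" for net, prefix in NETWORKS.items()}

addrs = addresses_for(106)
# admin 192.168.124.106 and BMC 192.168.128.106 name the same server.
```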
What DNS automation did we enable in OpenCrowbar? Here’s a partial list:
recovery of names and IPs when interfaces and systems are decommissioned
use of flexible naming patterns so that you can control how the systems are registered
ability to register names in multiple DNS infrastructures
ability to understand sub-domains so that you can map DNS by region
ability to register the same system under multiple names
wild card support for C-Names
ability to create a DNS round-robin group and keep it updated
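The behaviors in that list share one lifecycle: names get registered, aliased, grouped, and eventually reclaimed. The in-memory class below is a conceptual model of that lifecycle only; the real implementation drives BIND and PowerDNS (via the Golang wrapper mentioned next), and the record handling here is simplified.

```python
# Conceptual model of the incremental DNS behaviors listed above.
# The real code drives BIND/PowerDNS; this in-memory class just shows
# the lifecycle: register, alias, round-robin, decommission.
class DnsRegistry:
    def __init__(self):
        self.records = {}   # name -> set of IPs (round-robin = many IPs)

    def register(self, name, ip):
        self.records.setdefault(name, set()).add(ip)

    def alias(self, other_name, target):
        # Same system under multiple names: share the target's IP set.
        self.records[other_name] = self.records[target]

    def decommission(self, ip):
        # Reclaim names and IPs when an interface or system goes away.
        for name in list(self.records):
            self.records[name].discard(ip)
            if not self.records[name]:
                del self.records[name]

reg = DnsRegistry()
reg.register("web.pod1.example.com", "10.0.0.5")   # pattern: role.region
reg.register("web.pod1.example.com", "10.0.0.6")   # round-robin group
reg.alias("www.example.com", "web.pod1.example.com")
reg.decommission("10.0.0.5")                       # node repurposed
```

Note how `decommission` is driven by the IP, not the name: that is what keeps DNS from drifting when addresses are reallocated.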
But there’s more! The integration supports both BIND and PowerDNS. Since BIND does not have an API that allows incremental additions, Greg added a Golang service that wraps BIND and provides incremental updates and deletes.
When we talk about infrastructure ops automation and ready state, this is the type of deep integration that makes a difference and is the hallmark of the RackN team’s ops focus with RackN Enterprise and OpenCrowbar.
While this work is early, it is complete enough for field installs. We’d like to include potential users in our initial integration because we value your input.
Why is this important? We believe that there are significant cost, operational and performance benefits to running containers directly on metal. This collaboration is a tangible step towards demonstrating that value.
What did we create? The RackN workload leverages our enterprise distribution of OpenCrowbar to create a ready state environment for StackEngine to be able to deploy and automate Docker container apps.
In this pass, that’s a pretty basic CentOS 7.1 environment that’s provisioned and configured on the hardware. The workload takes your StackEngine customer key as the input. From there, it will download and install StackEngine on all the nodes in the system. When you choose which nodes also manage the cluster, the workload will automatically handle the cross registration.
What is our objective? We want to provide a consistent and sharable way to run directly on metal. That accelerates the exploration of this approach to operationalizing container infrastructure.
What is the roadmap? We want feedback on the workload to drive the roadmap. Our first priority is to tune to maximize performance. Later, we expect to add additional operating systems, more complex networking and closed-loop integration with StackEngine and RackN for things like automatic resources scheduling.
How can you get involved? If you are interested in working with a tech-preview version of the technology, you’ll need a working OpenCrowbar Drill implementation (via GitHub or early access available from RackN), a StackEngine registration key and access to the RackN/StackEngine workload (email firstname.lastname@example.org or email@example.com for access).
It’s really pretty simple: The workload does the work to deliver an integrated physical system (CentOS 7.1 right now) that has Docker installed and running. Then we build a Consul cluster to track the to-be-created Swarm. As new nodes are added into the cluster, they register into Consul and then get added into the Docker Swarm cluster. If you reset or repurpose a node, Swarm will automatically time out the missing node, so scaling up and down is pretty seamless.
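The register-then-time-out behavior is the key to seamless scaling. The sketch below models it with an in-memory stand-in for Consul (nodes heartbeat, and a node that stops heartbeating falls out of membership after a TTL); the TTL value and class are assumptions for illustration, not the actual Consul/Swarm mechanics.

```python
import time

# In-memory stand-in for Consul-backed Swarm membership: nodes
# heartbeat, and any node silent longer than the TTL drops out.
# The 30-second TTL is an example value, not a product default.
class ClusterMembership:
    def __init__(self, ttl_seconds=30):
        self.ttl = ttl_seconds
        self.last_seen = {}   # node name -> last heartbeat timestamp

    def heartbeat(self, node, now=None):
        self.last_seen[node] = now if now is not None else time.time()

    def members(self, now=None):
        now = now if now is not None else time.time()
        return sorted(n for n, t in self.last_seen.items()
                      if now - t <= self.ttl)

swarm = ClusterMembership(ttl_seconds=30)
swarm.heartbeat("node-1", now=0)
swarm.heartbeat("node-2", now=0)
# node-1 is reset/repurposed and goes silent; node-2 keeps heartbeating.
swarm.heartbeat("node-2", now=60)
live = swarm.members(now=60)   # node-1 timed out, only node-2 remains
```

Scaling up is the mirror image: a newly provisioned node simply starts heartbeating and appears in the membership on the next check.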
When building the cluster, you have the option to pick which machines are masters for the swarm. Once the cluster is built, you just use the Docker CLI’s -H option against the chosen master node on the configured port (defaults to port 2475).
This work is intended as a foundation for more complex Swarm and/or non-Docker Container Orchestration deployments. Future additions include allowing multiple network and remote storage options.
You don’t need metal to run a quick test of this capability. You can test drive RackN OpenCrowbar using virtual machines and then expand to the full metal experience when you are ready.
Contact firstname.lastname@example.org for access to the Docker Swarm trial. For now, we’re managing the subscriber base for the workload. OpenCrowbar is a pre-req and ungated. We’re excited to give access to the code – just ask.
You can go from nothing to a distributed Ceph cluster in an hour. Need to rehearse on VMs? That’s even faster. Want to test and retune your configuration? Make some changes, take a coffee break and retest. Of course, with redeploy that fast, you can iterate until you’ve got it exactly right.
2. Automatically Optimized Disk Configuration
The RackN update optimizes the Ceph installation for disk performance by finding and flagging SSDs. That means that our deploy just works(tm) without you having to reconfigure your OS provisioning scripts or vendor disk layout.
3. Cluster Building and Balancing
This update allows you to choose which roles you want on which nodes before you commit to the deployment. You can decide the right MON-to-OSD ratio for your needs. If you expand your cluster, the system will automatically rebalance it.
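As a toy illustration of role placement (not the actual RackN/OpenCrowbar logic), a placement pass might keep the monitor count odd for quorum while putting OSDs everywhere; the real workload lets you pick placements explicitly before committing.

```python
# Toy placement heuristic, not the actual RackN/OpenCrowbar algorithm:
# pick an odd number of monitor nodes (Ceph MONs need a quorum) and
# run OSDs on every node.
def place_roles(nodes, target_mons=3):
    mons = nodes[:min(target_mons, len(nodes))]
    if len(mons) % 2 == 0:          # keep the monitor count odd
        mons = mons[:-1]
    return {"mon": mons, "osd": list(nodes)}

roles = place_roles([f"node-{i}" for i in range(1, 6)])
# roles["mon"] -> ['node-1', 'node-2', 'node-3']; OSDs on all five nodes
```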
4. Advanced Networking Topology & IPv6
Using the network conduit abstraction, you can separate front and back end networks for the cluster. We also take advantage of native IPv6 support and even use that as the preferred addressing.
Ceph is the leading open source block storage back-end for OpenStack; however, it’s tricky to install and few vendors invest the effort to hardware optimize their configuration. Like any foundation layer, configuration or performance errors in the storage layer will impact the entire system. Further, the Ceph infrastructure needs to be built before OpenStack is installed.
OpenCrowbar was designed to deploy platforms like Ceph. It has detailed knowledge of the physical infrastructure and sufficient orchestration to synchronize Ceph Mon cluster bring-up.
We are only at the start of the Ceph install journey. Today, you can use the open source components to bring up a Ceph cluster in a reliable way that works across hardware vendors. Much remains to be done to optimize and tune this configuration to take advantage of SSDs, non-CentOS environments and more.
We’d love to work with you to tune and extend this workload! Please join us in the OpenCrowbar community.
I’ve just completed a basic Docker Machine driver for OpenCrowbar. This enables you to quickly spin-up (and down) remote Docker hosts on bare metal servers from their command line tool. There are significant cost, simplicity and performance advantages for this approach if you were already planning to dedicate servers to container workloads.
The basics are pretty simple: using Docker Machine CLI you can “create” and “rm” new Docker hosts on bare metal using the crowbar driver. Since we’re talking about metal, “create” is really “assign a machine from an available pool.”
Behind the scenes Crowbar is doing a full provision cycle of the system including installing the operating system and injecting the user keys. Crowbar’s design would allow operators to automatically inject additional steps, add monitoring agents and security, to the provisioning process without changing the driver operation.
Beyond Create, the driver supports the other Machine verbs like remove, stop, start, ssh and inspect. In the case of remove, the Machine is cleaned up and put back in the pool for the next user [note: work remains on the full remove>recreate process].
Overall, this driver allows Docker Machine to work transparently against metal infrastructure alongside whatever cloud services you also choose.
Want to try it out?
You need to set up OpenCrowbar – if you follow the defaults (192.168.124.10 IP, user, password) then the Docker Machine driver defaults will also work. Also, make sure you have the Ubuntu 14.04 ISO available for the Crowbar provisioner.
Discover some nodes in Crowbar – you do NOT need metal servers to try this, the tests work fine with virtual machines (tools/kvm-slave &)
Clone my Machine repo (we’re looking for feedback before a pull to Docker/Machine)
Compile the code using script/build.
Allocate a Docker Node using ./docker-machine create --driver crowbar testme
Go to the Crowbar UI to watch the node be provisioned and configured into the Docker-Machines pool
Release the node using ./docker-machine rm testme
Go to the Crowbar UI to watch the node be redeployed back to the System pool
I’m working on a series for DevOps.com to explain Functional Ops (expect it to start early next week!) and it’s very hard to convey its east-west API nature. So I’m always excited to see how other people explain how OpenCrowbar does ops and ready state.
It’s critical to realize that the height of each component tower varies by vendor and also by location within the data center topology. Ready state is not just about normalizing different vendors’ gear; it’s really about dealing with the complexity that’s inherent in building a functional data center. It’s “little” things like knowing how to enumerate the networking interfaces and uplinks to build the correct teams.
If you think this graphic helps, please let me know.