This week, I have the privilege of showcasing the emergence of RackN’s updated approach to data center infrastructure automation that is container-ready and drives “cloud-style” DevOps on physical metal. While it works at scale, we’ve also ensured it’s light enough to run a production-fidelity deployment on a laptop.
You grow to cloud scale with a ready-state foundation that scales up at every step. That’s exactly what we’re providing with Digital Rebar.
Over the past two years, the RackN team has been working on microservices operations orchestration in the OpenCrowbar code base. By embracing these new tools and architecture, Digital Rebar takes that base in a new direction. Yet we also get to leverage a scalable heterogeneous provisioner and integrations for all major DevOps tools. We began with critical data center automation already working.
Why Digital Rebar? Traditional data center ops is being disrupted by container and service architectures, and legacy data centers are challenged with gracefully integrating this new way of managing containers at scale. We felt it was time to start a dialog about the new foundational layer of scale ops.
We have not regressed our leading vendor-neutral hardware discovery and configuration features; however, today, our discussions are about service wrappers, heterogeneous tooling, immutable container deployments and next generation platforms.
Over the next few days, I’ll be posting more about how Digital Rebar works (plus video demos).
With the recent OpenCrowbar v2.3 (Drill) release, the RackN team takes our already super easy Ansible integration to a new level with added SSH key control and dynamic inventory. These two items make full metal control more accessible than ever for Ansible users.
The platform offers full key management. You can add keys at the system, deployment (group of machines), and machine levels. These keys are operator settable and can be added and removed after provisioning has been completed. If you want to control access on a server or group-of-servers basis, OpenCrowbar provides that control via our API, CLI and UI.
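As a rough sketch of what driving that key management from the API could look like, here is a minimal payload builder. The endpoint path and attribute names in the comments are invented for illustration; they are not the documented OpenCrowbar API, so check the actual API reference before using anything like this.

```python
import json

# Hypothetical sketch: the attribute name "ssh-keys" and the deployment
# update shape below are illustrative, NOT the documented OpenCrowbar API.
def build_key_update(deployment, operator_keys):
    """Build a JSON payload that adds operator SSH keys at the deployment level."""
    return {
        "deployment": deployment,
        "attrib": "ssh-keys",                 # illustrative attribute name
        "value": {"keys": operator_keys},     # keys can be added/removed post-provision
    }

payload = build_key_update("cluster-1", ["ssh-rsa AAAA... ops@example.com"])
# A client would PUT/POST this JSON to the appropriate deployment endpoint.
print(json.dumps(payload, indent=2))
```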
We also provide an API path for Ansible dynamic inventory. Using the simple Python client script (reference example), you can instantly get a complete, up-to-date node inventory of your system. The inventory data includes items like the number of disks and CPUs and the amount of RAM. If you’ve grouped machines in OpenCrowbar, those groups are passed to Ansible. Even better, the metadata schema includes the networking configuration and machine status.
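The inventory flow can be sketched as a small script that turns node records into the JSON shape Ansible expects from a dynamic inventory source. The field names and hostnames below are illustrative samples; the real client script pulls this data from the OpenCrowbar API rather than a hard-coded list.

```python
#!/usr/bin/env python
"""Sketch of an Ansible dynamic inventory fed by OpenCrowbar node data."""
import json

# Illustrative sample of what the API might return for two registered machines.
nodes = [
    {"name": "d52-54-00-aa-01.example.com", "group": "compute",
     "disks": 4, "cpus": 16, "ram_mb": 65536},
    {"name": "d52-54-00-aa-02.example.com", "group": "storage",
     "disks": 12, "cpus": 8, "ram_mb": 32768},
]

def build_inventory(nodes):
    """Convert node records into Ansible's dynamic inventory JSON layout."""
    inventory = {"_meta": {"hostvars": {}}}
    for node in nodes:
        # OpenCrowbar groups become Ansible groups you can target directly.
        inventory.setdefault(node["group"], {"hosts": []})["hosts"].append(node["name"])
        # Per-host facts become hostvars, usable in playbooks as {{ disks }} etc.
        inventory["_meta"]["hostvars"][node["name"]] = {
            "disks": node["disks"], "cpus": node["cpus"], "ram_mb": node["ram_mb"],
        }
    return inventory

print(json.dumps(build_inventory(nodes), indent=2))
```

Pointed at with `-i`, a script like this lets you target groups directly, e.g. `ansible compute -i inventory.py -m ping`.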
With no added configuration, you can immediately use Ansible as your multi-server CLI for ad hoc actions and installation using playbooks.
Of course, the OpenCrowbar tools are also available if you need remote power control or want a quick reimage of the system.
RackN respects that data centers are heterogeneous. Our vision is that your choice of hardware, operating system and network topology should not break DevOps deployments! That’s why we work hard to provide useful abstracted information. We want to work with you to help make sure that OpenCrowbar provides the right details to create best practice installations.
For working with bare metal, there’s no simpler way to deliver consistent, repeatable results.
Last week, Scott Jensen, RackN COO, uploaded a batch of OpenCrowbar install and demo videos. I’ve presented them in reverse chronological order so you can see what OpenCrowbar looks like before you run the installation process.
Why DNS? Maintaining DNS is essential to scale ops. It’s not as simple as naming servers because each server will have multiple addresses (IPv4, IPv6, teams, bridges, etc.) on multiple NICs depending on the system’s function and applications. Plus, errors in DNS are hard to diagnose.
I love talking about the small Ops things that make a huge impact in quality of automation. Things like automatically building a squid proxy cache infrastructure.
Today, I get to rave about the DNS integration that just surfaced in the OpenCrowbar code base. RackN CTO, Greg Althaus, just completed work that incrementally updates DNS entries as new IPs are added into the system.
Why is that a big deal? There are a lot of names & IPs to manage.
In physical ops, every time you bring up a physical or virtual network interface, you are assigning at least one IP to that interface. For OpenCrowbar, we are assigning two addresses: IPv4 and IPv6. Servers generally have 3 or more active interfaces (e.g.: BMC, admin, internal, public and storage) so that’s a lot of references. It gets even more complex when you factor in DNS round robin or other common practices.
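To put rough numbers on that, here is a back-of-the-envelope count. The fleet size is an assumption for illustration; the per-interface and address-family figures come from the paragraph above.

```python
# Back-of-the-envelope DNS record count for a modest physical deployment.
servers = 200                 # assumed fleet size (illustrative)
interfaces_per_server = 4     # e.g. BMC, admin, internal, storage
address_families = 2          # OpenCrowbar assigns IPv4 and IPv6 per interface

addresses = servers * interfaces_per_server * address_families
# Each address typically needs a forward (A/AAAA) and a reverse (PTR) record.
dns_records = addresses * 2

print(addresses, dns_records)  # 1600 addresses, 3200 records
```

Even before round robin groups or aliases, that is thousands of entries to keep current by hand.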
Plus mistakes are expensive. Name resolution is an essential service for operations.
I know we all love memorizing IPv4 addresses (just wait for IPv6!) so accurate naming is essential. OpenCrowbar already aligns the address 4th octet (Admin .106 goes to the same server as BMC .106) but that’s not always practical or useful. This is not just a Day 1 problem – DNS drift or staleness becomes an increasingly challenging problem when you have to reallocate IP addresses. The simple fact is that registering IPs is not the hard part of this integration – it’s the flexible and dynamic updates.
What DNS automation did we enable in OpenCrowbar? Here’s a partial list:
recovery of names and IPs when interfaces and systems are decommissioned
use of flexible naming patterns so that you can control how the systems are registered
ability to register names in multiple DNS infrastructures
ability to understand sub-domains so that you can map DNS by region
ability to register the same system under multiple names
wildcard support for CNAMEs
ability to create a DNS round-robin group and keep it updated
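To make the “flexible naming patterns” item above concrete, here is a toy pattern expansion. The `${...}` token syntax is invented for illustration and is not OpenCrowbar’s actual template language; it only shows the idea of operator-controlled name generation.

```python
import string

# Hypothetical pattern; real OpenCrowbar patterns may use different tokens.
pattern = "${role}-${index}.${region}.example.com"

def render_name(pattern, **fields):
    """Expand a naming pattern into a registrable DNS name."""
    return string.Template(pattern).substitute(fields)

# Sub-domain awareness lets the region map to a delegated zone.
name = render_name(pattern, role="osd", index="03", region="us-west")
print(name)  # osd-03.us-west.example.com
```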
But there’s more! The work includes both BIND and PowerDNS integrations. Since BIND does not have an API that allows incremental additions, Greg added a Golang service to wrap BIND and provide incremental updates and deletes.
When we talk about infrastructure ops automation and ready state, this is the type of deep integration that makes a difference and is the hallmark of the RackN team’s ops focus with RackN Enterprise and OpenCrowbar.
While this work is early, it is complete enough for field installs. We’d like to include potential users in our initial integration because we value your input.
Why is this important? We believe that there are significant cost, operational and performance benefits to running containers directly on metal. This collaboration is a tangible step towards demonstrating that value.
What did we create? The RackN workload leverages our enterprise distribution of OpenCrowbar to create a ready state environment for StackEngine to be able to deploy and automate Docker container apps.
In this pass, that’s a pretty basic CentOS 7.1 environment that’s provisioned and configured on the hardware. The workload takes your StackEngine customer key as the input. From there, it will download and install StackEngine on all the nodes in the system. When you choose which nodes also manage the cluster, the workload will automatically handle the cross registration.
What is our objective? We want to provide a consistent and sharable way to run directly on metal. That accelerates the exploration of this approach to operationalizing container infrastructure.
What is the roadmap? We want feedback on the workload to drive the roadmap. Our first priority is to tune to maximize performance. Later, we expect to add additional operating systems, more complex networking and closed-loop integration with StackEngine and RackN for things like automatic resources scheduling.
How can you get involved? If you are interested in working with a tech-preview version of the technology, you’ll need a working OpenCrowbar Drill implementation (via GitHub or early access available from RackN), a StackEngine registration key and access to the RackN/StackEngine workload (email firstname.lastname@example.org or email@example.com for access).
It’s really pretty simple: The workload does the work to deliver an integrated physical system (CentOS 7.1 right now) that has Docker installed and running. Then we build a Consul cluster to track the to-be-created Swarm. As new nodes are added into the cluster, they register into Consul and then get added into the Docker Swarm cluster. If you reset or repurpose a node, Swarm will automatically time out the missing node, so scaling up and down is pretty seamless.
When building the cluster, you have the option to pick which machines are masters for the swarm. Once the cluster is built, you just use the Docker CLI’s -H option against the chosen master node on the configured port (defaults to port 2475).
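For example, a wrapper script might build the `-H` invocation like this. The master hostname is invented for illustration; the port is the workload’s configured default from the paragraph above.

```python
# Build the `docker -H` invocation against a chosen Swarm master.
# "swarm-master-1" is an illustrative hostname; 2475 is the workload default.
master = "swarm-master-1"
port = 2475

cmd = ["docker", "-H", "tcp://%s:%d" % (master, port), "ps"]
print(" ".join(cmd))  # docker -H tcp://swarm-master-1:2475 ps
```

From there, every standard Docker CLI command is dispatched across the Swarm rather than a single host.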
This work is intended as a foundation for more complex Swarm and/or non-Docker Container Orchestration deployments. Future additions include allowing multiple network and remote storage options.
You don’t need metal to run a quick test of this capability. You can test drive RackN OpenCrowbar using virtual machines and then expand to the full metal experience when you are ready.
Contact firstname.lastname@example.org for access to the Docker Swarm trial. For now, we’re managing the subscriber base for the workload. OpenCrowbar is a pre-req and ungated. We’re excited to give access to the code – just ask.
You can go from nothing to a distributed Ceph cluster in an hour. Need to rehearse on VMs? That’s even faster. Want to test and retune your configuration? Make some changes, take a coffee break and retest. Of course, with redeploy that fast, you can iterate until you’ve got it exactly right.
2. Automatically Optimized Disk Configuration
The RackN update optimizes the Ceph installation for disk performance by finding and flagging SSDs. That means that our deploy just works(tm) without you having to reconfigure your OS provisioning scripts or vendor disk layout.
3. Cluster Building and Balancing
This update allows you to choose which roles you want on which nodes before you commit to the deployment. You can decide the right OSD-to-monitor ratio for your needs. If you expand your cluster, the system will automatically rebalance it.
4. Advanced Networking Topology & IPv6
Using the network conduit abstraction, you can separate front and back end networks for the cluster. We also take advantage of native IPv6 support and even use that as the preferred addressing.