Some features are worth SHOUTING about, so it’s with great pride that I get to announce DRP v3.11.
The latest Digital Rebar release (v3.11) does the impossible: PROVISION WITHOUT REBOOTING. Combined with image-based deploy and our unique multi-boot workflows, this capability makes server operations 10x faster than traditional net install processes.
But it’s not enough to have a tiny golang utility that can drive any hardware and install any operating system (we added MacOS netboot to this release). RackN has been adding enterprise integrations to core platforms like Ansible Tower, Terraform, Active Directory, Remedy, Run Book and Logstash.
Oh! And check out our open zero-touch, HA Kubernetes installer (KRIB) based on kubeadm. We just added advanced Helm features for automatic Istio and Rook Ceph examples.
While the RackN team and I have been heads down radically simplifying physical data center automation, I’ve still been tracking some key cloud infrastructure areas. One of the more interesting ones to me is Edge Infrastructure.
This once obscure topic has come front and center because of the coming computing stress from home video, retail machine learning, and distributed IoT. It's clear that these workloads cannot be served from centralized data centers.
While I'm posting primarily on the RackN.com blog, I like to take time to bring critical items back to my personal blog as a collection. WARNING: Some of these statements run counter to prevailing industry positions. Please let me know what you think!
By far the largest issue of the Edge discussion was actually agreeing about what "edge" meant. It seemed as if every session had a 50% mandatory overhead in defining terms. Putting my usual operations spin on the problem, I chose to define edge infrastructure in data center management terms. Edge infrastructure has very distinct challenges compared to hyperscale data centers. Read the article for the list.
Running each site as a mini-cloud is clearly not the right answer. There are multiple challenges here. First, any scale infrastructure problem must be solved at the physical layer first. Second, we must have tooling that brings repeatable, automated processes to that layer. It's not sufficient to have deep control of a single site: we must be able to reliably distribute automation over thousands of sites with limited operational support and bandwidth. These requirements are outside the scope of cloud-focused tools.
If “cloudification” is not the solution then where should we look for management patterns? We believe that software development CI/CD and immutable infrastructure patterns are well suited to edge infrastructure use cases. We discussed this at a session at the OpenStack OpenDev Edge summit.
What do YOU think? This is an evolving topic and it’s time to engage in a healthy discussion.
Welcome to the weekly post of the RackN blog recap of all things SRE. If you have any ideas for this recap or would like to include content, please contact us at firstname.lastname@example.org or tweet Rob (@zehicle) or RackN (@rackngo).
SRE Items of the Week
OpenStack on Kubernetes: Will it blend? (OpenStack Summit Session) w/ Rob Hirschfeld
Contrary to pundit expectations, OpenStack did not roll over and die during the keynotes yesterday. In fact, I saw the signs of a maturing project seeing real use and adoption. More critically, OpenStack leadership started the event with an acknowledgement of being part of, not owning, the vibrant open infrastructure community. READ MORE
Immutable Infrastructure Webinar
Greg Althaus, Co-Founder and CTO, RackN
Erica Windisch, Founder and CEO, Piston
Christopher MacGown, Advisor, IOpipe
Riyaz Faizullabhoy, Security Engineer, Docker
Sheng Liang, Founder and CEO, Rancher Labs
Moderated by Stephen Spector, HPE, Cloud Evangelist
SREies Part 1: Configuration Management by Krishelle Hardson-Hurley
SREies is a series on topics related to my job as a Site Reliability Engineer (SRE). About a month ago, I wrote an article about what it means to be an SRE, which included a compatibility quiz and a resource list for those who were intrigued by the role. If you are unfamiliar with SRE, I would suggest starting there before moving on. In this series, I will extend my description to include more specific summaries of concepts that I have learned during my first six months at Dropbox. In this edition, I will be discussing Configuration Management. READ MORE
Rob Hirschfeld and Greg Althaus are preparing for a series of upcoming events where they are speaking or just attending. If you are interested in meeting with them at these events, please email email@example.com.
With a mix of excitement and apprehension, the RackN team has been watching physical deployment of immutable operating systems like CoreOS Container Linux and RancherOS. Overall, we like the idea of a small locked (aka immutable) in-memory image for servers; however, the concept does not map perfectly to hardware.
Note: if you want to provision these operating systems in a production way, we can help you!
These operating systems work on a “less is more” approach that strips everything out of the images to make them small and secure.
This is great for cloud-first approaches where VM size has a material impact on cost. It's particularly well matched to container platforms where VMs are constantly being created and destroyed. In these cases, the immutable image is easy to update and saves money.
So, why does that not work as well on physical?
First: HA DHCP?! The model does not map as well to physical systems, where operating system overhead is already minimal. It requires orchestrated rebooting of your hardware, and it means you need a highly available (HA) PXE provisioning infrastructure (like we're building with Digital Rebar).
Second: Configuration. These operating systems rely on cloud-init injected configuration. In a physical environment, there is no way to create cloud-init-like injections without integrating with the kickstart system (a feature of Digital Rebar Provision). Further, hardware has many more configuration options (like hard drives and network interfaces) than VMs do. That means we need a robust, system-by-system way to manage these configurations (a sketch of this per-machine templating pattern appears after the third issue below).
Third: No SSH. Yet another problem with these minimal images is that they are supposed to eliminate SSH. Ideally, their image and configuration provide everything required to run without additional administration. Unfortunately, many applications assume post-boot configuration, so people often re-enable SSH to use tools like Ansible. If it did not conflict with the very nature of the "do-not-configure-the-server" immutable model, I would suggest that SSH is a perfectly reasonable requirement for operators running physical infrastructure.
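To make the configuration point above concrete, here is a minimal Go sketch of the general pattern: rendering per-machine values into host configuration at provision time. It is illustrative only; the Machine type, its fields, and the template are hypothetical stand-ins, not the actual Digital Rebar Provision data model or template engine.

```go
package main

import (
	"os"
	"text/template"
)

// Machine models the per-host parameters a provisioner would inject.
// These names are hypothetical, not the real Digital Rebar Provision schema.
type Machine struct {
	Hostname  string
	Interface string
	Address   string
}

// A tiny network-config template; real provision templates also cover
// disks, users, services, and much more.
const netTmpl = `# generated for {{.Hostname}}
auto {{.Interface}}
iface {{.Interface}} inet static
    address {{.Address}}
`

func main() {
	m := Machine{Hostname: "edge-01", Interface: "eth0", Address: "10.0.0.12/24"}

	// Render the per-machine configuration to stdout; a provisioner would
	// serve the result to the booting host instead.
	t := template.Must(template.New("net").Parse(netTmpl))
	if err := t.Execute(os.Stdout, m); err != nil {
		panic(err)
	}
}
```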
In summary, even with those issues, we are excited about the positive impact this immutable approach can have on data center operations.
With tooling like Digital Rebar, it’s possible to manage the issues above. If this appeals to you, let us know!
Overall, Dockercon did a good job connecting Docker users with information. In some ways, it was a very “let’s get down to business” conference without the open source collaboration feel of previous events. For enterprise customers and partners, that may be a welcome change.
Unlike past Dockercons, the event did not have major announcements or much non-Docker ecosystem buzz. That said, I do miss that buzz from previous events.
As an industry, we should be using yesterday's CIA hacking release to drive discussions about how to make our IT infrastructure more robust and fluid. It is not enough simply to harden, because both the attacks and the platforms are evolving too quickly.
We must be delivering solutions with continuous delivery and immutability assumptions baked in.
A more fluid IT that assumes constant updates and rebuilding from sources (immutable) is not just a security posture but a proven business benefit. For me, that means actually building from the hardware up, where we patch and scrub systems regularly to shorten the half-life of all attack surfaces. It also means enabling the existing security built into our systems that is generally ignored because of configuration complexity. These are hard but solvable automation challenges.
The problem is too big to fix individually: we need to collaborate in the open.
I've been thinking deeply about how we accelerate SRE and DevOps collaboration across organizations and in open communities. The lack of common infrastructure foundations costs companies significant overhead and speed as teams across the globe reimplement automation in divergent ways. It also drags down software platforms that must adapt to each data center as a unique snowflake.
That’s why hybrid automation within AND between companies is an imperative. It enables collaboration.
Making automation portable, able to handle the differences between infrastructures and environments, is harder; however, it also enables sharing and reuse that allows us to improve collectively instead of individually.
That's been the vision driving us at RackN with the open hybrid Digital Rebar project. Curious? Here's the RackN post that inspired this one:
“Like the hardware that runs it, the foundation automation layer must be commoditized. That means that Operators should be able to buy infrastructure (physical and cloud) from any vendor and run it in a consistent way. Instead of days or weeks to get infrastructure running, it should take hours and be fully automated from power-on. We should be able to rehearse on cloud and transfer that automation directly to (and from) physical without modification. That practice and pace should be the norm instead of the exception.”
It’s been a banner year for container awareness and adoption so we wanted to recap 2015. For RackN, container acceleration is near to our heart because we both enable and use them in fundamental ways. Look for Rob’s 2016 predictions on his blog.
The RackN team has truly deep and broad experience with containers in practical use. In the summer, we delivered multiple container orchestration workloads including Docker Swarm, Kubernetes, Cloud Foundry, StackEngine and others. In the fall, we refactored Digital Rebar to use Docker Compose with dramatic results. And we’ve been using Docker since 2013 (yes, “way back”) for ops provisioning and development.
To make it easier to review that experience, we are consolidating a list of our container related posts for 2015.
Nearly 10 TIMES faster system resets – that's the result of fully enabling a multi-container immutable deployment on Digital Rebar.
I’ve been having a “containers all the way down” month since we launched Digital Rebar deployment using Docker Compose. I don’t want to imply that we rubbed Docker on the platform and magic happened. The RackN team spent nearly a year building up the Consul integration and service wrappers for our platform before we were ready to fully migrate.
During the Digital Rebar migration, we took our already service-oriented code base and broke it into microservices. Specifically, the Digital Rebar parts (the API and engine) now run in their own container, and each service (DNS, DHCP, Provisioning, Logging, NTP, etc.) also has a dedicated container. Likewise, supporting items like Consul and PostgreSQL are, surprise, managed in dedicated containers too. Altogether, that's over nine containers, and we continue to partition out services.
We use Docker Compose to coordinate the start-up and Consul to wire everything together. Both play a role, but Consul is the critical glue that allows Digital Rebar components to find each other. These were not random choices. We’ve been using a Docker package for over two years and using Consul service registration as an architectural choice for over a year.
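As a concrete illustration, registering a service with Consul from Go takes only a few lines using the official hashicorp/consul/api client. This is a generic, hedged sketch with made-up service values (a "dns" service on 10.0.0.5), not Digital Rebar's actual wrapper code.

```go
package main

import (
	"log"

	"github.com/hashicorp/consul/api"
)

func main() {
	// Connect to the local Consul agent (default 127.0.0.1:8500).
	client, err := api.NewClient(api.DefaultConfig())
	if err != nil {
		log.Fatal(err)
	}

	// Register a hypothetical "dns" service with a TCP health check so
	// consumers only discover it while it is actually answering on its port.
	reg := &api.AgentServiceRegistration{
		ID:      "dns-1",
		Name:    "dns",
		Address: "10.0.0.5",
		Port:    53,
		Check: &api.AgentServiceCheck{
			TCP:      "10.0.0.5:53",
			Interval: "10s",
			Timeout:  "2s",
		},
	}
	if err := client.Agent().ServiceRegister(reg); err != nil {
		log.Fatal(err)
	}
}
```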
Service registration plays a major role in the functional ops design because we've been wrapping datacenter services like DNS with APIs. Consul provides a clean separation between providing and consuming a service. Our previous design required us to track the running service ourselves. This worked until customers asked for pluggable services (and every customer needs pluggable services as they scale).
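The consuming side of that separation only asks Consul for a healthy instance of a service name; it never needs to know which container, host, or implementation is behind it. Here is a minimal sketch of that lookup, again using the hypothetical "dns" service from the registration example above.

```go
package main

import (
	"fmt"
	"log"

	"github.com/hashicorp/consul/api"
)

func main() {
	client, err := api.NewClient(api.DefaultConfig())
	if err != nil {
		log.Fatal(err)
	}

	// Ask Consul for healthy ("passing") instances of the "dns" service;
	// the caller does not care which container provides it.
	entries, _, err := client.Health().Service("dns", "", true, nil)
	if err != nil {
		log.Fatal(err)
	}
	for _, e := range entries {
		fmt.Printf("dns available at %s:%d\n", e.Service.Address, e.Service.Port)
	}
}
```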
Besides faster environment resets, there are several additional wins:
more transparent in how it operates – it’s obvious which containers provide each service and easy to monitor them as individuals.
easier to distribute services in the environment – we can find where the service runs because of the Consul registration, so we don’t have to manage it.
possible to have redundant services – it's easy to spin up new services, even on the same system.
make services pluggable – as long as the service registers and there’s an API, we can replace the implementation.
no concern about which distribution is used – all our containers are Ubuntu user space but the host can be anything.
changes to components are more isolated – changing one service does not require a lot of downloading.
Docker and microservices are not magic but the benefits are real. Be prepared to make architectural investments to realize the gains.