About Rob H

A Baltimore transplant to Austin, Rob thinks about ways of building at-scale infrastructure for clouds using Agile processes. He sat on the OpenStack Foundation board for four years. He co-founded RackN to enable software that creates hyperscale converged infrastructure.

DRP v3.11 PROVISIONS WITHOUT REBOOTING

Some features are worth SHOUTING about, so it’s with great pride that I get to announce DRP v3.11.

The latest Digital Rebar release (v3.11) does the impossible: PROVISION WITHOUT REBOOTING.  Combined with image-based deploy and our unique multi-boot workflows, this capability makes server operations 10x faster than traditional net install processes.
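
For the API-minded, here is roughly what that looks like from the outside: a minimal Python sketch that points a machine at an image-based workflow through the DRP HTTP API. The endpoint, credentials, workflow name and the Machine field used below are illustrative assumptions, so check the v3.11 docs for the exact contract.

    # Minimal sketch: switch a machine onto an image-based deploy workflow via the
    # Digital Rebar Provision HTTP API. Endpoint, credentials, workflow name and
    # the "Workflow" field are illustrative assumptions, not a documented recipe.
    import requests

    DRP = "https://drp.example.local:8092/api/v3"   # hypothetical endpoint
    AUTH = ("rocketskates", "change-me")            # replace with real credentials

    def start_image_deploy(machine_uuid: str, workflow: str = "image-deploy") -> None:
        """Fetch a machine record and set it onto an image-based workflow."""
        url = f"{DRP}/machines/{machine_uuid}"
        machine = requests.get(url, auth=AUTH, verify=False).json()
        machine["Workflow"] = workflow              # assumed field name
        requests.put(url, json=machine, auth=AUTH, verify=False).raise_for_status()

The speed claim comes from what the workflow stages do (an image write instead of a net install, no full reboot cycle); the API call above only declares the target state.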

But it’s not enough to have a tiny Golang utility that can drive any hardware and install any operating system (we added macOS netboot in this release). RackN has been adding enterprise integrations to core platforms like Ansible Tower, Terraform, Active Directory, Remedy, Run Book and Logstash.

Oh!  And check out our open zero-touch, HA Kubernetes installer (KRIB) based on kubeadm.  We just added advanced Helm features for automatic Istio and Rook Ceph examples.

To see more: https://github.com/digitalrebar/provision/releases/tag/v3.11.0

Success means putting People and Process above Tech

“I don’t care about the tech – what I really want to hear is how this product fits in our processes and helps our people get more done.”

That was the message my co-founder and I heard from an executive at a major bank last week.  For us, it was both déjà vu and a major relief because we’d just presented at the CableLabs Summer Showcase about the importance of aligning people, process and technology. The executive was pleased with how RackN had achieved that balance.

It wasn’t always that way: putting usability and simplicity ahead of features is scary.

One of the most humbling startup lessons is that making great technology is not about the technology. Showing a 10x (or 100x!) improvement in provisioning speed misses the real problem for IT operators. Happily, we had some great early users who got excited about the vision for simple tooling that we built around Digital Rebar Provision v3.  Equally important was a deeply experienced team who insisted on building great tests, docs and support tooling from day 0.

We are thrilled to watch as new users are able to learn, adopt and grow their use of our open technology with minimal help from RackN.  Even without the 10x performance components RackN has added, they have been able to achieve significant time and automation improvements in their existing operational processes.  That means simpler processes, less IT complexity and more time for solving important problems.

The bank executive wanted the people and process benefits: our job with technology was to enable that first and then get out of the way.  It’s a much harder job than “make it faster” but, ultimately, much more rewarding.

If you’re interested in seeing how we’ve found that balance for bare metal automation, please check out our self-service trial at https://portal.RackN.io or contact us directly at info@rackn.com.

Getting Edge-y at OpenStack Summit – 5 ways it’s an easy concept with hard delivery

The 2018 Vancouver OpenStack Summit is very focused on IT infrastructure at the Edge. It’s a fitting topic considering the telcos’ embrace of the project; however, building the highly distributed, small-footprint management needed for these environments is very different than OpenStack’s architectural priorities. There is a significant risk that the community’s bias towards its current code base (which still has work needed to serve hyper-scale and enterprise data centers) will undermine progress in building suitable Edge IT solutions.

There are five significant ways that Edge is different than a “traditional” datacenter.  We often discuss these differences on our L8istSh9y podcast, and it’s time to summarize them in a blog post.

IT infrastructure at the Edge is different than “edge” in general. Edge is often used as a superset of Internet of Things (IoT), personal devices (phones) and other emerging smart devices. Our interest here is not the devices but the services one hop back that support data storage, processing, aggregation and sharing. To scale, these services need to move from homes to controlled environments in shared locations like 5G towers, POPs and regional data centers.

Unlike built-to-purpose edge devices, the edge infrastructure will be built on generic commodity hardware.

Here are five key ways that managing IT infrastructure at the edge is distinct from anything we’ve built so far:

  • Highly Distributed – Even at hyper-scale, we’re used to building cloud platforms in terms of tens of data centers; however, edge infrastructure sites will number in the thousands and millions!  That’s distinct management sites, not servers or cores. Since the sites will not have homogeneous hardware specifications, the management of these sites requires zero-touch management that is vendor neutral, resilient and secure.  
  • Low Latency Applications – Latency is the reason why Edge needs to be highly distributed.  Edge applications like A/R, V/R, autonomous robotics and even voice controls interact with humans (and other apps) in ways that require millisecond response times.  This speed-of-light limitation means that we cannot rely on hyper-scale data centers to consolidate infrastructure; instead, we have to push that infrastructure into the latency range of the users and devices.
  • Decentralized Data – A lot of data comes from all of these interactive edge devices.  In our multi-vendor innovative market, data from each location could end up being sprayed all over the planet.  Shared edge infrastructure provides an opportunity to aggregate this data locally where it can be shared and (maybe?) controlled. This is a very hard technical and business problem to solve.  While it’s easy to inject blockchain as a possible solution, the actual requirements are still evolving.
  • Remote, In-Environment Infrastructure – To make matters even harder, the sites are not traditional raised floor data centers with 24×7 attendants: most will be small, remote and unstaffed sites that require a truck roll for services.  Imagine an IT shed at the base of a vacant lot cell tower behind rusted chain link fences guarded by angry squirrels and monitored by underfunded FCC regulators.
  • Multi-Tenant and Trusted – Edge infrastructure will be a multi-tenant environment because simple economics drive as-a-Service style resource sharing. Unlike buy-on-credit-card public clouds, the participants in the edge will have deeper, trusted relationships with the service providers.  A high degree of trust is required because distributed application and data management must be coordinated between the Edge infrastructure manager and the application authors.  This level of integration requires deeper trust and inspection than current public clouds require.

These are hard problems!  Solving them requires new thinking and tools that, while cloud native in design, are not cloud tools.  We should not expect to lift-and-shift cloud patterns directly into edge because the requirements are fundamentally different.  This next wave of innovation requires building for an even more distributed and automated architecture.

I hope you’re as excited as we are about helping build infrastructure at the edge.  What do you think the challenges are? We’d like to hear from you!

DC2020: Putting the Data back in the Data Center

For the past two decades, data centers have been more about compute than data, but the machine learning and IoT revolutions are changing that focus for the 2020 Data Center (aka DC2020). My experience at IBM Think 2018 suggests that we should be challenging our compute centric view of a data center; instead, we should be considering the flow and processing of data. Since data is not localized, that reinforces our concept of DC2020 as a distributed and integrated environment.

We have defined data centers by the compute infrastructure stored there. Cloud (especially when equated with virtualized machines) has been an infrastructure-as-a-service (IaaS) story. Even big data “lakes” are primarily compute clusters with distributed storage. This model dominates because data sources are locked in application silos: control of the compute translates directly to control of the data.

What if control of data is being decoupled from applications? Data is becoming its own thing with new technologies like machine learning, IoT, blockchain and other distributed sourcing.

In a data centric model, we are more concerned with the movement of and access to data than with building applications to control it. Think of event driven (serverless) and microservice platforms that effectively operate on data-in-flight. As function as a service progresses, it will become impossible to know all the ways that data is manipulated because applications no longer have clear boundaries.
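
As a toy sketch of what operating on data-in-flight looks like, here is a serverless-style handler that consumes an event, emits a derived one and keeps no application state of its own. The event shape and topic name are hypothetical.

    # Toy sketch of a data-centric, event-driven handler: the function reacts to
    # data in flight and emits a derived event; it never "owns" the data or lives
    # inside a bounded application. Event shape and topic name are hypothetical.
    from statistics import mean

    def handle_sensor_batch(event: dict) -> dict:
        """Aggregate a batch of edge sensor readings and emit a summary event."""
        readings = event.get("readings", [])            # e.g. [{"temp_c": 21.4}, ...]
        temps = [r["temp_c"] for r in readings if "temp_c" in r]
        summary = {
            "site": event.get("site", "unknown"),
            "count": len(temps),
            "avg_temp_c": round(mean(temps), 2) if temps else None,
        }
        # Whoever cares (another function, a stream processor, a model) subscribes
        # to the summary topic; there is no monolithic application in the middle.
        return {"topic": "sensor.summary", "payload": summary}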

This data-centric, distributed architecture model will be even more pronounced as processing moves out of data centers and into the edge. IT infrastructure at the edge will be used for handling latency critical data and aggregating data for centralization. These operations will not look like traditional application stacks: they will be data processing microservices and functions.

This data centric approach relegates infrastructure services to a subordinate role. We should not care about servers or machines except as they support platforms driving data flows.

I am not abandoning making infrastructure simple and easy – we need to do that more than ever! However, it’s easy to underestimate the coming transformation of application architectures based on advanced data processing and sharing technologies. The amount and sources of data have already grown beyond human comprehension because we still think of applications in a client-server mindset.

We’re only at the start of really embedding connected sensors and devices into our environment. As devices from many sources and vendors proliferate, they also need to coordinate. That means we’re reaching a point where devices will start talking to each other locally instead of via our centralized systems. It’s part of the coming data avalanche.

Current management systems will not survive explosive growth.  We’re entering a phase where control and management paradigms cannot keep up.

As an industry, we are rethinking management automation from declarative (“start this”) to intent-focused (“maintain this”) systems.  This is the simplest way to express the difference between OpenStack and Kubernetes. That change is required to create autonomous infrastructure designs; however, it also means that we need to change our thinking about infrastructure as something that follows data instead of leading it.
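
To make the declarative-versus-intent distinction concrete, here is a toy sketch of the two styles. The callables are hypothetical stand-ins, not real OpenStack or Kubernetes APIs.

    # Illustrative contrast between "start this" and "maintain this". The callables
    # are hypothetical stand-ins, not real OpenStack or Kubernetes calls.
    import time
    from typing import Callable

    def start_this(launch: Callable[[], None]) -> None:
        """Declarative one-shot: issue the request once; later drift goes unnoticed."""
        launch()

    def maintain_this(observe: Callable[[], int], desired: int,
                      scale_to: Callable[[int], None], interval: float = 30.0) -> None:
        """Intent loop: continuously reconcile observed state toward desired state."""
        while True:
            if observe() != desired:
                scale_to(desired)        # heal drift automatically, no human in the loop
            time.sleep(interval)

The second loop is what autonomous infrastructure needs: the system carries the goal with it and heals drift instead of waiting for the next instruction.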

That’s exactly what RackN has solved with Digital Rebar Provision.  Deeply composable, simple APIs and extensible workflows are an essential component of integrated automation in DC2020 to put the data back in the data center.

DC2020: Is Exposing Bare Metal Practical or Dangerous?

One of IBM’s major announcements at Think 2018 was Managed Kubernetes on Bare Metal. This new offering combines elements of their existing offerings to expose some additional security, attestation and performance isolation. Bare metal has been a hot topic for cloud service providers recently with AWS adding it to their platform and Oracle using it as their primary IaaS. With these offerings as a backdrop, let’s explore the role of bare metal in the 2020 Data Center (DC2020).

Physical servers (aka bare metal) are the core building block for any data center; however, they are often abstracted out of sight by a virtualization layer such as VMware, KVM, HyperV or many others. These platforms are useful for many reasons. In this post, we’re focused on the fact that they provide a control API for infrastructure that makes it possible to manage compute, storage and network requests. Yet the abstraction comes at a price in cost, complexity and performance.

The historical lack of good API control has made bare metal less attractive, but that is changing quickly due to two forces.

These two forces are Container Platforms and Bare Metal as a Service, or BMaaS (disclosure: RackN offers a private BMaaS platform called Digital Rebar). Container Platforms such as Kubernetes provide an application service abstraction for data center consumers that eliminates the need for users to worry about traditional infrastructure concerns.  That means most users no longer rely on APIs for compute, network or storage, allowing the platform to handle those issues. On the other side, BMaaS provides infrastructure-level APIs for the actual physical layer of the data center, giving users who do care about compute, network or storage the ability to work without VMs.

The combination of containers and bare metal APIs has the potential to squeeze virtualization into a limited role.

The IBM bare metal Kubernetes announcement illustrates both of these forces working together.  Users of the managed Kubernetes service are working through the container abstraction interface and really don’t worry about the infrastructure; however, IBM is able to leverage their internal bare metal APIs to offer enhanced features to those users without changing the service offering.  These benefits include security (IBM White Paper on Security), isolation, performance and (eventually) access to metal features like GPUs. While the IBM offering still includes VMs as an option, it is easy to anticipate that becoming less attractive for all but smaller clusters.

The impact for DC2020 is that operators need to rethink how they rely on virtualization as a ubiquitous abstraction.  As more applications rely on container service abstractions, the platforms will grow in size and virtualization will provide less value.  With the advent of better control of the bare metal infrastructure, operators have real options to get deep control without adding virtualization as a requirement.

Shifting to new platforms creates opportunities to streamline operations in DC2020.

Even with virtualization and containers, having better control of the bare metal is a critical addition to data center operations.  The ideal data center has automation and control APIs for every possible component from the metal up.

Learn more about the open source Digital Rebar community.

DC2020: Skeptics Guide to Blockchain in the Data Center

At Think 2018, Machine Learning and Blockchain technologies are beyond pervasive; they are assumed to be beneficial to ROI in every situation. That type of hype begs for closer review. In this post, we’ll look at a potentially real use of blockchain for operations.

There is so much noise about blockchain that it can be difficult to find a starting point. I’m leaving background reading as an exercise for the reader; instead, I want to focus on how blockchain creates a distributed ledger with shared trust. That’s a lot of buzzwords! Basically, we’re talking about a system where nodes share data and use consensus with their peers to determine whether the information is trustworthy.

The key concept in blockchain is moving from a central authority to a distributed authority.

In the data center, administrative trust is essential. The premises, networks, and access credentials all rely on the idea that we have a centralized authoritative group. Even PKI, which is designed for decentralized trust, relies on a central authority to sign keys. Looking objectively at the bundle of passwords, certificates, keys and isolation layers, there are gaping risks in this model. It only takes getting the right access to flip administrative control from an asset into a liability.

Blockchain allows us to decentralize trust in the data center by requiring systems to collaboratively validate administrative instructions.

In this model, we’d still have administrative controls and management; however, the nodes would be able to validate configuration changes with their peers or other administrative sources. For example, an out-of-process change (potential hack?) on a single node would be confirmed via consensus with other nodes instead of automatically trusting the source. The body of nodes protects against a bad administrative request. It also allows operators to quickly propagate configurations peer-to-peer instead of relying on a central hub-and-spoke model.

This is even more powerful if configuration is composited from multiple sources in a pipeline. In a multiple-author system, each contributor is involved in verifying changes to the whole configuration. This ensures that downstream insertions are both communicated and accepted by upstream steps.  This works because blockchain is a distributed ledger: changes made to the chain are passed back to all parties. Just like in a decentralized supply chain model, this ensures both validation and transparency.
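
To show the shape of that idea without dragging in a full blockchain stack, here is a toy Python sketch of quorum-validated configuration. The peer validators and the two-thirds threshold are purely illustrative.

    # Toy sketch of peer-validated configuration: a node applies a change only when
    # a quorum of peers independently approves its digest. This models just the
    # consensus shape; a real system would add signatures and an append-only ledger.
    import hashlib
    import json
    from typing import Callable, Iterable

    def change_digest(change: dict) -> str:
        """Stable hash of a proposed configuration change."""
        return hashlib.sha256(json.dumps(change, sort_keys=True).encode()).hexdigest()

    def quorum_approved(change: dict,
                        peers: Iterable[Callable[[str], bool]],
                        threshold: float = 0.66) -> bool:
        """Ask each peer to validate the digest; require a supermajority of yes votes."""
        digest = change_digest(change)
        votes = [peer(digest) for peer in peers]
        return bool(votes) and sum(votes) / len(votes) >= threshold

In practice, each peer would compare the digest against what the upstream pipeline told it to expect before voting yes.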

Blockchain’s ability to provide both horizontal and vertical integrity for operations is an intriguing possibility.

I’m interested in hearing your thoughts about this application for blockchain. From a RackN and Digital Rebar perspective, these capabilities are well aligned with our composable approach to configuration. We’d be happy to talk with operators who want to look more deeply into this type of integration.

DC2020: Mono-clouds are easier! Why do Hybrid?

Background: This post was inspired by a multi-cloud session at IBM Think 2018, where I am attending as a guest of IBM. Providing hybrid solutions is a priority for IBM, and its customers are clearly looking for multi-cloud options. In this way, IBM has made a choice to support competitive platforms. This post explores why they would do that.

There is considerable angst and hype over the terms multi-cloud and hybrid-cloud. While it would be much simpler if companies could silo into a single platform, innovation and economics require a multi-party approach. The problem is NOT that we want to have choice and multiple suppliers. The problem is that we are moving so quickly that there is minimal interoperability and minimal effort to create interoperability.

To drive interoperability, we need a strong commercial incentive to create a neutral ecosystem.

Even something with a clear ANSI standard like SQL has interoperability challenges. It also seems like the software industry has given up on standards in favor of APIs and rapid innovation. The reality on the ground is that technology is fundamentally heterogeneous and changing. For this reason, mono-anything is a myth and hybrid is really status quo.

If we accept multi-* as a starting point, then we need to invest in portability and avoid platform assumptions when we build automation. Good design is to assume change at multiple points in your stack. Automation itself is a key requirement because it enables rapid iterative build, test and deploy cycles. It is not enough to automate for day 1; the key to working multi-* infrastructure is a continuous deployment pipeline.

Pipelines provide insurance for hybrid infrastructure by exposing issues quickly before they accumulate technical debt.

That means the utility of tools like Terraform, Ansible or Docker is limited to how often you exercise them. Ideally, we’d build abstraction automation layers above these primitives; however, this has proven very difficult in practice. The degrees of variation between environments and pace of innovation make it impossible to standardize without becoming very restrictive. This may be possible for a single company but is not practical for a vendor trying to support many customers with a single platform.
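
One way to keep that exercise honest is a scheduled pipeline step that dry-runs the same automation against every environment. Here is a minimal sketch, assuming per-environment Terraform var files and Ansible inventories (both hypothetical layouts); the terraform and ansible-playbook flags shown are standard.

    # Minimal sketch of continuously exercising hybrid automation: dry-run the same
    # plan/check step against every environment so drift surfaces before it turns
    # into technical debt. Environment names and file paths are hypothetical.
    import subprocess
    import sys

    ENVIRONMENTS = ["aws-prod", "vmware-dc1", "edge-lab"]      # hypothetical targets

    def exercise(env: str) -> bool:
        """Dry-run the automation for one environment; True means no drift, no errors."""
        plan = subprocess.run(
            ["terraform", "plan", "-detailed-exitcode", f"-var-file={env}.tfvars"],
            cwd="infra", capture_output=True)
        check = subprocess.run(
            ["ansible-playbook", "site.yml", "--check", "-i", f"inventories/{env}"],
            capture_output=True)
        # terraform's -detailed-exitcode returns 2 when changes are pending: that is drift.
        return plan.returncode == 0 and check.returncode == 0

    if __name__ == "__main__":
        drifted = [env for env in ENVIRONMENTS if not exercise(env)]
        sys.exit(1 if drifted else 0)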

This means that hybrid, while required in the market, carries an integration tax that needs to be considered.

My objective for discussing Data Center 2020 topics is to find ways to lower that tax and improve the outcome. I’m interested in hearing your opinion about this challenge and if you’ve found ways to solve it.

Counterpoint Addendum: if you are in a position to avoid multi-* deployments (e.g. a start-up) then you should consider that option. There is measurable overhead to heterogeneous automation; however, I’ve found the tipping point away from a mono-stack can be surprisingly low, and committing to a vertical stack does make applications less resilient to innovation.