100,000 of Anything is Hard – Scaling Concerns for Digital Rebar Architecture

Posted on November 28, 2017 by Rob H

Our architectural plans for Digital Rebar are beyond big – they are for massive distributed scale. Not up, but out. We are designing for the case where we have common automation content packages distributed over 100,000 stand-alone sites (think 5G cell towers) that are not synchronously managed. In that case, there will be version drift between the endpoints and content. For example, we may need to patch an installation script quickly over a whole fleet but want to upgrade the endpoints more slowly.

It’s a hard problem and it’s why we’ve focused on composable systems and fine-grain versioning.

It’s also part of the RackN move into a biweekly release cadence for Digital Rebar. That means that we are iterating from tip development to stable every two weeks. It’s fast because we don’t want operators deploying the development “tip” to access features or bug fixes.

This works for several reasons. First, much of the Digital Rebar value is delivered as content instead of in the scaffolding. Each content package has its own version cycle and is not tied to Digital Rebar versions. Second, many Digital Rebar features are relatively small, incremental additions. Faster releases allow content creators and operators to access that buttery goodness more quickly without trying to manage the less stable development tip.

Critical enablers for this release pace are feature flags. Starting in v3.2, Digital Rebar introduced system-level flags that are set when new features are added. These flags allow content developers to introspect the system in a multi-version way to see which behaviors are available in each endpoint. This is much more consistent and granular than version matching.

We are not designing a single endpoint system: we are planning for content that spans 1,000s of endpoints.

Feature flags are part of our 100,000 endpoint architecture thinking. In large scale systems, there can be significant version drift within a fleet deployment. We have to expect that automation designers want to enable advanced features before they are universally deployed in the fleet. That means that the system needs a way to easily advertise specific capabilities internally. Automation can then be written with different behaviors depending on the environment. For example, changing exit codes could have broken existing scripts except that scripts used flags to determine which codes were appropriate for the system. These are NOT API issues that work well with semantic versioning (semver); they are deeper system behaviors.
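To make the exit code example concrete, here is a minimal sketch of content that branches on advertised flags instead of matching versions. The flag names, the exit code values and the `fleet` data are all hypothetical and chosen only for illustration; they are not taken from Digital Rebar itself.

```python
from typing import Iterable

# Hypothetical flag name; a real endpoint would advertise its own list.
NEW_EXIT_CODES_FLAG = "new-exit-codes"


def reboot_exit_code(endpoint_features: Iterable[str]) -> int:
    """Pick the exit code a task should use to request a reboot."""
    features = set(endpoint_features)
    if NEW_EXIT_CODES_FLAG in features:
        return 64  # assumed code on endpoints that advertise the new behavior
    return 1       # assumed legacy code on endpoints that do not


# Two endpoints in the same fleet that have drifted apart (made-up data).
fleet = {
    "tower-0001": ["new-exit-codes", "profile-tokens"],
    "tower-0002": [],
}

codes = {name: reboot_exit_code(flags) for name, flags in fleet.items()}
print(codes)
```

The same content package works on both endpoints because it asks what the endpoint can do rather than which version it runs.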

This matters even if you only have a single endpoint because it also enables sharing in the Digital Rebar community.

Without these changes, composable automation designed for the Digital Rebar community would quickly become very brittle and hard to maintain. Our goal is to ensure a decoupling of endpoint and content. This same benefit allows the community to share packages and large scale sites to coordinate upgrades. I don’t think that we’re done yet. This is a hard problem and we’re still evolving all the intricacies of updating and delivering composable automation.

It’s the type of complex, operational thinking that excites the RackN engineering team. I hope it excites you too because we’d love to get your thinking on how to make it even better!

Podcast with Krishnan Subramanian on Edge, the Kubernetes Ecosystem & the Composable Enterprise

Posted on November 27, 2017 by spector13

In this week’s L8ist Sh9y podcast Krishnan Subramanian, Founder and Chief Research Advisor of Rishidot Research talks about Edge Computing, the Kubernetes Ecosystem and the Composable Enterprise. Key highlights:

“Multi-Cloud is the foundation of Modern Enterprise” – Krishnan
Kubernetes ecosystem and the possibility that Serverless could replace it
IT innovation requires a composable and layered approach, without this approach IT will find themselves trapped in a hard-wired infrastructure unable to move forward

Topic Time (Minutes.Seconds)

Introduction 0.0 – 1.28
Edge Computing 1.28 – 4.25
What is the Edge? 4.25 – 6.06
Use Cases Not For Cloud 6.06 – 8.50 (Networking and 5G)
Distributed Scale of Edge 8.50 – 10.03
Multi-Cloud Progress 10.03 – 12.07
Supporting Diff Infra Types? 12.07 – 16.40
Multi-Cloud & Kubernetes 16.40 – 20.54
Kubernetes Ecosystem 20.54 – 28.00 (Serverless can replace)
Ecosystem Gaps 28.00 – 29.44
Best of Breed IT 29.44 – 32.25 (Composable Enterprise)
IT Moves to Smaller Units 32.25 – 35.30
Back to Edge 35.30 – 41.45
Conclusion 41.45 – 42.35

Podcast Guest: Krishnan Subramanian
Founder and Chief Research Advisor, Infrastructure, Application Platforms and DevOps

Krishnan Subramanian (a.k.a Krish) is a well-known expert in the field of cloud computing. He is the founder and Chief Research Advisor at Rishidot Research, a boutique analyst firm focused on Modern Enterprise. Their open data-based research helps enterprise decision makers on their enterprise modernization strategy. His Modern Enterprise model helps enterprises innovate rapidly by transforming their IT as the core part of the innovation team. He was a speaker and panelist at various cloud computing conferences and he was also an advisor for Glue conference in 2011 and Cloud Connect Santa Clara in 2012. He has also organized industry-leading conferences like Deploycon and Cloud2020. He is also an advisor to cloud computing startups. He can be reached on Twitter @krishnan.

Putting a little ooooh! in orchestration

Posted on November 22, 2017 by Rob H

The RackN team is proud of saying that we left the Orchestration out when we migrated from Digital Rebar v2 to v3. That would mean more if anyone actually agreed on what orchestration means… In our case, I think we can be pretty specific: Digital Rebar v3 does not manage work across multiple nodes. At this point, we’re emphatic about it because cross machine actions add a lot of complexity and require application awareness that quickly blossoms into operational woe, torture and frustration (aka WTF).

That’s why Digital Rebar focused on a simple yet powerful job: running multi-boot workflow on a single machine.

In the latest releases (v3.2+), we’ve delivered an easy to understand stage and task running system that is simple to extend, transparent in operation and extremely fast. There’s no special language (DSL) to learn or database to master. And if you need those things, then we encourage you to use the excellent options from Chef, Puppet, SaltStack, Ansible and others. This is because our primary design focus is planning work over multiple boots and operating system environments instead of between machines. Digital Rebar shines when you need 3+ reboots to automatically scrub, burn-in, inventory, install and then post-configure a machine.

But we may have crossed an orchestration line with our new cluster token capability.

Starting in the v3.4 release, automation authors will be able to use a shared profile to coordinate work between multiple machines. This is not a Digital Rebar feature per se – it’s a data pattern that leverages Digital Rebar locking, profiles and parameters to share information between machines. This allows scripts to elect leaders, create authoritative information (like tokens) and synchronize actions. The basic mechanism is simple: we create a shared machine profile that includes a token that allows editing the profile. Normally, machines can only edit themselves so we have to explicitly enable editing profiles with a special use token. With this capability, all the machines assigned to the profile can update the profile (and only that profile). The profile becomes an atomic, secure shared configuration space.

For example, when building a Kubernetes cluster using Kubeadm, the installation script needs to take different actions depending on which node is first. The first node needs to initialize the cluster master, generate a token and share its IP address. The subsequent nodes must wait until the master is initialized and then join using the token. The installation pattern is basically a first-in leader election while all others wait for the leader. There’s no need for more complex sequencing because the real install “orchestration” is done after the join when Kubernetes starts to configure the nodes.
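The first-in leader election described above can be sketched as follows. This is an in-memory model, not the Digital Rebar API: the `SharedProfile` class, its method names, the node names, IP addresses and the token value are all assumptions made for illustration, and a lock stands in for the atomic profile updates.

```python
import threading
import time


class SharedProfile:
    """In-memory stand-in for a shared profile; the lock models atomic edits."""

    def __init__(self):
        self._lock = threading.Lock()
        self._params = {}

    def set_if_absent(self, key, value):
        """Atomically set key if unset and return the value now stored."""
        with self._lock:
            return self._params.setdefault(key, value)

    def set(self, key, value):
        with self._lock:
            self._params[key] = value

    def get(self, key):
        with self._lock:
            return self._params.get(key)


def bootstrap(node, ip, profile, actions):
    """First node to claim 'leader' initializes; the rest wait, then join."""
    if profile.set_if_absent("leader", node) == node:
        # A real script would initialize the cluster master here.
        profile.set("join", {"master_ip": ip, "token": "example-token"})
        actions[node] = "init"
    else:
        # Poll the shared profile until the leader publishes join details.
        while profile.get("join") is None:
            time.sleep(0.01)
        actions[node] = "join " + profile.get("join")["master_ip"]


ips = {"node-a": "10.0.0.1", "node-b": "10.0.0.2", "node-c": "10.0.0.3"}
profile = SharedProfile()
actions = {}
threads = [
    threading.Thread(target=bootstrap, args=(name, ip, profile, actions))
    for name, ip in ips.items()
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(actions)
```

Which node wins the election depends on start order, but exactly one node initializes and every other node joins using the details the leader published.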

Our experience is that recent cloud native systems are all capable of this type of shotgun start where all the nodes start in parallel with the minimal bootstrap coordination that Digital Rebar can provide.

Individually, the incremental features needed to enable cluster building were small additions to Digital Rebar. Together, they provide a simple yet powerful management underlay. At RackN, we believe that simple beats complex every day and we’re fighting hard to make sure operations stays that way.

Data Center’s Last Mile: Zero Touch Metal Automation

Posted on November 21, 2017 by spector13

The embedded video is an excellent RackN and Digital Rebar overview, created by RackN co-founders Rob Hirschfeld and Greg Althaus, of the critical issue facing data center operations teams. Their open-source-based offering addresses the integration challenge that exists between platform/orchestration tools and control/provisioning technology.

By integrating with the platform and orchestration solutions, RackN is able to replace the control and provisioning tools without adding complexity or replacing established technology.

Watch the complete video below as Rob Hirschfeld provides the background of how RackN arrived at the current offering and the benefits for data center operators to support bare metal provisioning as well as immutable infrastructure. (Slides)

The demonstration video referenced in this overview:

The white paper referenced in this overview:

Have more questions? Contact us at sales@rackn.com or via social media on Twitter at @rackngo to learn more.

Podcast with Peter Miron talking NATS Service, Edge and Cloud Native Foundation

Posted on November 20, 2017 by spector13

Joining this week’s L8ist Sh9y Podcast is Peter Miron, General Manager for the NATS project sponsored by Apcera. He provides details on this open source project and how it integrates with modern application architecture, as well as their participation in the Cloud Native Foundation.

About NATS

NATS is a family of open source products that are tightly integrated but can be deployed independently. NATS is being deployed globally by thousands of companies, spanning innovative use-cases including: Mobile apps, Microservices and Cloud Native, and IoT. NATS is also available as a hosted solution, NATS Cloud.

The core NATS Server acts as a central nervous system for building distributed applications. There are dozens of clients ranging from Java and .NET to Go. NATS Streaming extends the platform to provide for real-time streaming & big data use-cases.

Topic Time (Minutes.Seconds)

Introduction 0.00 – 2.07
What is NATS? 2.07 – 3.36
Built for Containers, Short Term 3.36 – 5.14
Simple Example 5.14 – 6.51
Container ServiceMesh Concept 6.51 – 9.20
Loosely Coupled? 9.20 – 12.02
Inter-process Communication 12.02 – 15.11
Security 15.11 – 18.02
Generic Politics Discussion 18.02 – 24.10
Edge Computing & NATS 24.10 – 28.55
Apps to Service Portability 28.55 – 32.37
Open Source Politics – CNCF 32.37 – 39.53
Conclusion 39.53 – END

Podcast Guest: Peter Miron
General Manager for NATS team

Peter Miron is an architect at Apcera, a highly secure, policy-driven platform for cloud-native applications and microservices. He was previously the director of technology for Pershing.

Before joining Pershing, Miron worked as the SVP of engineering at Bitly and vice president at Vonage. He also worked as the CTO of Knewton.

Miron holds a bachelor’s degree in art history from Syracuse University.

November 17 – Weekly Recap Of All Things Digital Rebar And RackN

Posted on November 17, 2017 by spector13

Welcome to the weekly post of the RackN blog recap of all things Digital Rebar, RackN, SRE, and DevOps. If you have any ideas for this recap or would like to include content please contact us at info@rackn.com or tweet Rob (@zehicle) or RackN (@rackngo)

Items of the Week

Digital Rebar

Terraform Bare Metal – A Leap Forward for SDx

Software Defined Infrastructure (SDx) allows operators to manage data centers in a more consistent and controlled way. It allows teams to define their environment as code and use automation to execute that definition in practice. To deliver this capability for physical (aka bare metal) servers, RackN has created a Digital Rebar provider for Terraform. The provider is a simple addition that takes just seconds to enable. Read More

Digital Rebar Online Community Meetup

Our 5th Meetup is Tuesday Nov 21…

Welcome to the fifth (v005) Digital Rebar online meetup! In today’s meetup we’ll discuss the status of Digital Rebar Provision v3.3.0 features and planning activities along with Understanding the Runner and Jobs system in Stage transitions. We’ll conclude with opening up the floor for community feedback.

Join the Meetup Group

RackN

The RackN Beta now contains Digital Rebar Provision v3.2 as well as the Terraform Bare Metal Plug-in currently in final testing for official release from HashiCorp. To join the beta, simply provide your email on our registration page so we can provide the software as well as ensure our engineers are able to engage directly in support during setup and operation.

Joining this week’s L8ist Sh9y Podcast is Yves Boudreau, VP of Partnerships and Ecosystem Strategy at Ericsson. Rob Hirschfeld and Yves discuss the Ericsson Unified Delivery Network platform and the concept of a global content provider service built on heterogeneous infrastructure. Yves also provides insight into what webscale customers are looking for in the Edge as they give thought to balancing their applications from public cloud services to future edge clouds. Finally, Rob and Yves talk about the coming fundamental change in how software is created and run “independent” of hardware. (Blog with Time/Questions)

UPCOMING EVENTS

Rob Hirschfeld and Greg Althaus are preparing for a series of upcoming events where they are speaking or just attending. If you are interested in meeting with them at these events please email info@rackn.com

Gartner IT Infrastructure, Operations Management and Data Center – Dec 4 – 7 Event Link
CloudNativeCon + KubeCon – Dec 8 : Zero Configuration Pattern of Kubernetes on Bare Metal

If you are attending any of these events, please reach out to Rob Hirschfeld to set up time to learn more about our solutions or discuss the latest industry trends.

OTHER NEWSLETTERS

SRE Weekly (@SREWeekly)– Issue #97
DevOps, SRE, & Operations – LINK
The DevOps/WebOps Marketing Geek – LINK from @LukasHertig
Julia Evans Blog – LINK

Terraform Bare Metal – A Leap forward for SDx

Posted on November 15, 2017 by Rob H

The Terraform Bare Metal provider allows plans to provision and recover servers using a node resource.

The operation of this provider is simple and relies on standard workflow stages in Digital Rebar. Adding the Terraform Content Package installs a new stage that adds Terraform parameters. Including this stage in the global workflow will automatically register machines as available for Terraform. The integration uses two parameters to manage the server pool: Terraform Managed and Terraform Assigned.

When the Terraform provider asks for a node resource, it queries the Digital Rebar API for machines that are managed (true) and not assigned (false) plus whatever additional filters were required in the plan. The provider then uses the API to set assigned true and the requested Stage (e.g. centos-install) and polls until the node enters the Complete stage. The destroy action reverses the action to release the node. Digital Rebar uses the stage changes as a trigger to restart the machine workflow.
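The allocation flow above can be sketched with a small in-memory model. This is not the provider's code or the Digital Rebar API: the `Machine` fields, the function names, the `rack` filter, the lowercase stage names and the stage a released node returns to are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Machine:
    name: str
    managed: bool = False    # stands in for the Terraform Managed parameter
    assigned: bool = False   # stands in for the Terraform Assigned parameter
    stage: str = "discover"  # assumed idle stage name
    params: dict = field(default_factory=dict)


def allocate(pool, stage, filters=None):
    """Claim the first managed, unassigned machine matching the plan filters."""
    filters = filters or {}
    for m in pool:
        matches = all(m.params.get(k) == v for k, v in filters.items())
        if m.managed and not m.assigned and matches:
            m.assigned = True
            m.stage = stage  # the stage change triggers the machine workflow
            return m
    return None


def wait_for_complete(machine, refresh, max_polls=10):
    """Poll until the machine reports the complete stage or polls run out."""
    for _ in range(max_polls):
        refresh(machine)
        if machine.stage == "complete":
            return True
    return False


def release(machine, idle_stage="discover"):
    """Reverse an allocation so the machine returns to the pool."""
    machine.assigned = False
    machine.stage = idle_stage


pool = [
    Machine("m1", managed=False),
    Machine("m2", managed=True, assigned=True, stage="complete"),
    Machine("m3", managed=True, params={"rack": "r2"}),
]
node = allocate(pool, "centos-install", {"rack": "r2"})
# The lambda fakes the workflow finishing on the first poll.
done = wait_for_complete(node, lambda m: setattr(m, "stage", "complete"))
print(node.name, done)
```

In a real deployment the query, update and polling steps would be API calls, and `refresh` would re-read the machine from the endpoint rather than fake completion.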

Using a Terraform plan with Digital Rebar, operators can manage complex data center layouts from a single command line.

For users, all of the above steps are completely hidden. Operators can monitor the request using the Digital Rebar UX to ensure the plan is executing. In addition, plan metadata can set user or identification values to the machines when they are reserved to help track allocations. In this way, administrators can easily track and account for machines reserved via Terraform.

For full out-of-band control, users should add the RackN IPMI plugin. This adds the ability to force power states during plan execution. The provider does not require out-of-band management to function. RackN also maintains Packet.net and VirtualBox plugins with the same API as the IPMI plugin. This allows developers to easily test plans against virtual or cloud resources.

RackN customers are making big plans to use this simple and powerful integration to manage their own SDx roadmap. We’re excited to hear about new ways to improve data center operations, especially new edge ideas. Let us know what you are thinking!

Demonstration of Terraform Bare Metal Provisioning with Digital Rebar Provision V3.2

Setting up the Environment to run Digital Rebar Provision V3.2 for Terraform

Sirens of Open Infrastructure beacons to OpenStack Community

Posted on November 14, 2017 by Rob H

OpenStack is a real platform doing real work for real users. So why does OpenStack have a reputation for not working? It falls into the lack of core-focus paradox: being too much to too many undermines your ability to do something well. In this case, we keep conflating the community and the code.

I have a long history with the project but have been pretty much outside of it (yay, Kubernetes!) for the last 18 months. That perspective helps me feel like I’m getting closer to the answer after spending a few days with the community at the latest OpenStack Summit in Sydney, Australia. While I love to think about the why, what the leaders are doing about it is very interesting too.

Fundamentally, OpenStack’s problem is that infrastructure automation is too hard and big to be solved within a single effort.

It’s so big that any workable solution will fail for a sizable number of hopeful operators. That does not keep people from the false aspiration that OpenStack code will perfectly fit their needs (especially if they are unwilling to trim their requirements).

But the problem is not inflated expectations for OpenStack VM IaaS code, it’s that we keep feeding them. I have been a long time champion for a small core with a clear ecosystem boundary. When OpenStack code claims support for other use cases, it invites disappointment and frustration.

So why is OpenStack foundation moving to expand its scope as an Open Infrastructure community with additional focus areas? It’s simple: the community is asking them to do it.

Within the vast space of infrastructure automation, there are clusters of aligned interest. These clusters are sufficiently narrow that they can collaborate on shared technologies and practices. They also have a partial overlap (Venn) with adjacencies where OpenStack is already present. There is a strong economic and social drive for members in these overlapped communities to bridge together instead of creating new disparate groups. Having the OpenStack foundation organize these efforts is a natural and expected function.

The danger of this expansion comes from also carrying the expectation that the technology (code) will be carried into the adjacencies. That’s my exact rationale for why the original VM IaaS needs to be smaller. The wealth of non-core projects crosses clusters of interests. Instead of allowing these clusters to optimize their needs around shared interests, the users get the impression that they must broadly adopt unneeded or poorly fit components. The idea of “competitive” projects should be reframed because they may overlap in function but not in use-case fit.

It’s long past time to give up expectations that OpenStack is a “one-stop-shop” of infrastructure automation. In my opinion, it undermines the community mission by excluding adjacencies.

I believe that OpenStack must work to embrace its role as an open infrastructure community; however, it must also do the hard work to create welcoming space for adjacencies. These adjacencies will compete with existing projects currently under the OpenStack code tent. The community needs to embrace that the hard work done so far may simply be sunk cost for new use cases.

It’s the OpenStack community and the experience, not the code, that creates long term value.

Podcast with Yves Boudreau talks Heterogeneity in the Edge

Posted on November 13, 2017 by spector13

Topic Time (Minutes.Seconds)

Introduction 0.00 – 2.11
Ericsson Unified Delivery Network 2.11 – 3.01
Service Providers Space 3.01 – 4.05
Operator Customers 4.05 – 5.22
Content Provider want global coverage 5.22 – 7.15
Example 7.15 – 8.34
Edge Infrastructure w/ CDN 8.34 – 9.42
Distributed Heterogeneous Infra 9.42 – 11.30
Baking Cloud Consumption into Edge 11.30 – 11.56
Multi-Tenant Infra at Edge 11.56 – 14.05
Delivery of the Edge 14.05 – 16.16
Amazon Lambda is Expectation 16.16 – 20.36
Containers are Edge EC2? 20.36 – 25.18
Is Edge Greenfield Work? 25.18 – 29.12
Fundamental Software Change 29.12 – 31.29
Locked-In “Debt” always Re-appears 31.29 – 35.28
Conclusion 35.28 – END

Podcast Guest: Yves Boudreau

Mr. Boudreau is a 20-year veteran of the Digital, Telecom and Cable TV industries. From modest beginnings at one of the first cable broadband ISPs in Canada to the fast paced technology hub of Silicon Valley, Yves joined ERICSSON in 2011 as Vice President of Technical Sales Support and most recently has accepted a position as the VP of Partnerships and Ecosystem Strategy for the ERICSSON Unified Delivery Network. Previously, Mr. Boudreau has worked in R&D, Systems Engineering & Business Development for companies such as Com21 Inc., ARRIS Group (Cable), Imagine Communication (Video Compression) and Verivue Inc. (CDN). Yves now resides in Atlanta, Georgia with his wife Josée and 3 children. Mr. Boudreau completed his undergraduate studies in Commerce @ Laurentian University and graduate studies in Information Technology Management @ Athabasca University. Yves currently also serves on the Board of Directors of the Streaming Video Alliance (www.streamingvideoalliance.org).

November 10 – Weekly Recap of all things Digital Rebar and RackN

Posted on November 10, 2017 by spector13

Items of the Week

Digital Rebar

Digital Rebar Releases V3.2 – Stage Workflow

In v3.2, Digital Rebar continues to refine the groundbreaking provisioning workflow introduced in v3.1. Updates to the workflow make it easier to consume by external systems like Terraform. We’ve also improved the consistency and performance of both the content and service.

The release of workflow and the addition of inventory means that Digital Rebar v3 effectively replaces all key functions of v2 with a significantly smaller footprint, minimal learning curve and improved performance. One major v2 feature, multi-node coordination, is not on any roadmap for v3 because we believe those use cases are well served by upstack integrations like Terraform and Ansible. Full Post

RackN

Joining this week’s L8ist Sh9y Podcast is Zach Smith, CEO of Packet and long-time champion of bare metal hardware. Rob Hirschfeld and Zach discuss the trends in bare metal, the impact of AWS changing the way developers view infrastructure, and issues between networking and server groups in IT organizations. (Blog with Topics and Times)

OpenStack Summit Sydney

Rob Hirschfeld and Ihor Dvoretskyi presented “Building Kubernetes based highly Customizable Environments on OpenStack with Kubespray.” Full Post

https://www.slideshare.net/RackN/slideshelf

UPCOMING EVENTS

Gartner IT Infrastructure, Operations Management and Data Center – Dec 4 – 7 Event Link
CloudNativeCon + KubeCon – Dec 8 : Zero Configuration Pattern of Kubernetes on Bare Metal

If you are attending any of these events, please reach out to Rob Hirschfeld to set up time to learn more about our solutions or discuss the latest industry trends.

OTHER NEWSLETTERS

SRE Weekly (@SREWeekly)– Issue #96
DevOps, SRE, & Operations – LINK
The DevOps/WebOps Marketing Geek – LINK from @LukasHertig
Julia Evans Blog – LINK