DNS is critical – getting physical ops integrations right matters

Why DNS? Maintaining DNS is essential to scaling ops.  It’s not as simple as naming servers, because each server will have multiple addresses (IPv4, IPv6, teams, bridges, etc.) on multiple NICs depending on the system’s function and applications. Plus, errors in DNS are hard to diagnose.

I love talking about the small ops things that make a huge impact on the quality of automation.  Things like automatically building a squid proxy cache infrastructure.

Today, I get to rave about the DNS integration that just surfaced in the OpenCrowbar code base. RackN CTO Greg Althaus just completed work that incrementally updates DNS entries as new IPs are added into the system.

Why is that a big deal?  There are a lot of names & IPs to manage.

In physical ops, every time you bring up a physical or virtual network interface, you are assigning at least one IP to that interface. For OpenCrowbar, we are assigning two addresses: IPv4 and IPv6.  Servers generally have three or more active interfaces (e.g., BMC, admin, internal, public, and storage), so that’s a lot of references.  It gets even more complex when you factor in DNS round robin or other common practices.

Plus mistakes are expensive.  Name resolution is an essential service for operations.

I know we all love memorizing IPv4 addresses (just wait for IPv6!), so accurate naming is essential.  OpenCrowbar already aligns the 4th octet of addresses (Admin .106 goes to the same server as BMC .106), but that’s not always practical or useful.  This is not just a Day 1 problem – DNS drift or staleness becomes an increasingly challenging problem when you have to reallocate IP addresses.  The simple fact is that registering IPs is not the hard part of this integration – it’s the flexible and dynamic updates.
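
The octet alignment described above boils down to a simple allocation rule: one node index drives the final octet on every network.  Here’s a minimal sketch of the idea (the network names, prefixes, and base offset are hypothetical, not OpenCrowbar’s actual allocator):

```python
def aligned_addresses(node_index, networks):
    """Give one node the same final octet on every network it joins."""
    octet = 100 + node_index          # hypothetical base offset
    return {name: f"{prefix}.{octet}" for name, prefix in networks.items()}

# Node 6 lands on .106 everywhere, so Admin .106 and BMC .106 are the same box.
networks = {"admin": "10.1.1", "bmc": "10.2.1", "storage": "10.3.1"}
print(aligned_addresses(6, networks))
# {'admin': '10.1.1.106', 'bmc': '10.2.1.106', 'storage': '10.3.1.106'}
```

The payoff is purely human: an operator who sees .106 on any network knows which physical box it is without a lookup.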

What DNS automation did we enable in OpenCrowbar?  Here’s a partial list:

  1. recovery of names and IPs when interfaces and systems are decommissioned
  2. use of flexible naming patterns so that you can control how the systems are registered
  3. ability to register names in multiple DNS infrastructures
  4. ability to understand sub-domains so that you can map DNS by region
  5. ability to register the same system under multiple names
  6. wildcard support for CNAMEs
  7. ability to create a DNS round-robin group and keep it updated
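
The flexible naming patterns in item 2 (and the multiple names per system in item 5) can be pictured as simple templates; here’s a hypothetical sketch, where the pattern syntax is purely illustrative and not OpenCrowbar’s actual template language:

```python
def dns_name(pattern, **fields):
    """Expand a naming template such as '{role}-{node}.{region}.{domain}'."""
    return pattern.format(**fields).lower()

# One system registered under multiple names (list item 5),
# with a sub-domain mapping the region (list item 4):
patterns = ["{node}.{domain}", "{role}-{node}.{region}.{domain}"]
names = [dns_name(p, node="d23", role="ceph-osd",
                  region="us-west", domain="example.com")
         for p in patterns]
print(names)
# ['d23.example.com', 'ceph-osd-d23.us-west.example.com']
```

Because the patterns are data rather than code, operators can control how systems register without touching the automation itself.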

But there’s more! The work includes both BIND and PowerDNS integrations. Since BIND does not have an API that allows incremental additions, Greg added a Golang service to wrap BIND and provide incremental updates and deletes.
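
The actual wrapper is a Go service, but the core idea is small: keep a record store that supports incremental add and delete, and regenerate the BIND zone data on each change.  A toy Python sketch of that pattern (not the RackN code; a real wrapper would also bump the SOA serial and trigger an `rndc reload`):

```python
class ZoneWrapper:
    """Track A records and emit a BIND zone fragment on every change."""
    def __init__(self, domain):
        self.domain = domain
        self.records = {}             # name -> ip

    def add(self, name, ip):
        self.records[name] = ip       # incremental add or update

    def delete(self, name):
        self.records.pop(name, None)  # incremental delete (item 1 above)

    def zone_fragment(self):
        # Regenerated in full, but driven by incremental API calls.
        return "\n".join(f"{n}.{self.domain}. IN A {ip}"
                         for n, ip in sorted(self.records.items()))

zone = ZoneWrapper("example.com")
zone.add("d23", "10.1.1.106")
zone.add("d24", "10.1.1.107")
zone.delete("d24")
print(zone.zone_fragment())
# d23.example.com. IN A 10.1.1.106
```

PowerDNS needs no such wrapper because it exposes a native HTTP API for record changes.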

When we talk about infrastructure ops automation and ready state, this is the type of deep integration that makes a difference and is the hallmark of the RackN team’s ops focus with RackN Enterprise and OpenCrowbar.

StackEngine Docker on Metal via RackN Workload for OpenCrowbar


In our quest for fast and cost-effective container workloads, RackN and StackEngine have teamed up to jointly develop a bare metal StackEngine workload for the RackN Enterprise version of OpenCrowbar.  Want more background on StackEngine?  TheNewStack.io also did a recent post covering StackEngine capabilities.

While this work is early, it is complete enough for field installs.  We’d like to include potential users in our initial integration because we value your input.

Why is this important?  We believe that there are significant cost, operational and performance benefits to running containers directly on metal.  This collaboration is a tangible step towards demonstrating that value.

What did we create?  The RackN workload leverages our enterprise distribution of OpenCrowbar to create a ready state environment for StackEngine to be able to deploy and automate Docker container apps.

In this pass, that’s a pretty basic CentOS 7.1 environment with hardware and OS configured.  The workload takes your StackEngine customer key as the input.  From there, it will download and install StackEngine on all the nodes in the system.  When you choose which nodes also manage the cluster, the workload will automatically handle the cross-registration.

What is our objective?  We want to provide a consistent and sharable way to run directly on metal.  That accelerates the exploration of this approach to operationalizing container infrastructure.

What is the roadmap?  We want feedback on the workload to drive the roadmap.  Our first priority is to tune to maximize performance.  Later, we expect to add additional operating systems, more complex networking and closed-loop integration with StackEngine and RackN for things like automatic resources scheduling.

How can you get involved?  If you are interested in working with a tech-preview version of the technology, you’ll need a working OpenCrowbar Drill implementation (via GitHub or early access available from RackN), a StackEngine registration key, and access to the RackN/StackEngine workload (email info@rackn.com or info@stackengine.com for access).

Exploring Docker Swarm on Bare Metal for raw performance and ops simplicity

As part of our exploration of containers on metal, the RackN team has created a workload on top of OpenCrowbar as the foundation for a Docker Swarm on bare metal cluster.  This provides a second more integrated and automated path to Docker Clusters than the Docker Machine driver we posted last month.

It’s really pretty simple: The workload does the work to deliver an integrated physical system (CentOS 7.1 right now) that has Docker installed and running.  Then we build a Consul cluster to track the to-be-created Swarm.  As new nodes are added into the cluster, they register into Consul and then get added into the Docker Swarm cluster.  If you reset or repurpose a node, Swarm will automatically time out the missing node, so scaling up and down is pretty seamless.
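
The register-then-expire behavior can be modeled as a TTL’d membership set: nodes heartbeat into the registry, and members that stop reporting age out.  A toy sketch of that pattern (no real Consul or Swarm calls; the class name and TTL are illustrative):

```python
import time

class SwarmMembership:
    """Nodes heartbeat in; nodes that stop reporting age out, the way
    Swarm drops a node whose Consul registration goes stale."""
    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self.last_seen = {}

    def heartbeat(self, node):
        self.last_seen[node] = time.monotonic()

    def members(self):
        now = time.monotonic()
        return sorted(n for n, t in self.last_seen.items()
                      if now - t < self.ttl)

cluster = SwarmMembership(ttl_seconds=30)
cluster.heartbeat("node-1")
cluster.heartbeat("node-2")
print(cluster.members())
# ['node-1', 'node-2']
```

Because membership is derived from recent heartbeats rather than a static roster, repurposed nodes simply stop appearing with no explicit deregistration step.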

When building the cluster, you have the option to pick which machines are masters for the swarm.  Once the cluster is built, you just use the Docker CLI’s -H option against the chosen master node on the configured port (defaults to port 2475).

This work is intended as a foundation for more complex Swarm and/or non-Docker Container Orchestration deployments.  Future additions include allowing multiple network and remote storage options.

You don’t need metal to run a quick test of this capability.  You can test drive RackN OpenCrowbar using virtual machines and then expand to the full metal experience when you are ready.

Contact info@rackn.com for access to the Docker Swarm trial.   For now, we’re managing the subscriber base for the workload.  OpenCrowbar is a pre-req and ungated.  We’re excited to give access to the code – just ask.

Leading vs. Directing: Digital Managers must learn the difference [post 5 of 8]


Digital Management has a deep paradox: digital workers resist direct management but require that their efforts fit into a larger picture.

If you believe the next generation companies we discussed in post #4, then the only way to unlock worker potential is to enable self-motivated employees and remove all management. In Zappos’ case, they encouraged workers who don’t believe in extreme self-management to simply leave the company, and 14% did.

Companies like W. L. Gore & Associates, the makers of GORE-TEX, operate and thrive very well in a team-driven environment… This apparently loosey-goosey management style has brought about hundreds of major multibillion-dollar ideas and made W. L. Gore a leading incubator of consistently great ideas and products for more than fifty years. To an outside observer it looks as though the focus is on having fun. But to the initiated, it is about hiring intense self-starters who contribute wholeheartedly to what they are doing and to the team, and most important, who can self-manage their time and skill sets.

— Liquid Leadership by Brad Szollose, page 154

Frankly, both of us (Brad and Rob) are skeptical. We believe that these tactics do enhance productivity, but they gloss over the essential ingredient in their success: a shared set of goals.

Like our Jazz analogy, the performance is the sum of the parts and the players need to understand how their work fits into the bigger picture. A traditional management structure, with controlling leadership and über clear, micromanaged direction, backfires because it restricts the workers’ ability to interpret and adapt; however, that does not mean we are advocates of “no management whatsoever” zones.  

The trendy word is Holacracy.  That loosely translates into removal of management hierarchy and power while redistributing it throughout the organization.  Are you scared of that free-fall model?  If workers reject traditional management then what are the alternatives?

We need a way to manage today’s independent thinking workforce.

According to Forbes, digital workers have an even higher need to understand the purpose of their work than previous generations. If you are a Baby Boomer (Conductor of a Symphony), then this last statement may cause you to roll your eyes in disagreement.

Directing a Jazz ensemble requires a different type of leadership. One that hierarchy junkies —orchestra members who need a conductor—would call ambiguous…IF they didn’t truly know what was happening.

Great musicians don’t join mediocre bands; they purposely seek out teams that challenge them, with a shared set of goals and standards that produce results and success. This may require a shift in mindset for some of our readers.

Freedom in jazz improvisation comes from understanding structure. When people listen to jazz, they often believe that the soloist is “doing whatever they want.” In fact, as experienced improvisers will tell you, the soloist is rarely “doing whatever they want.”  An improvisational soloist is always following a complicated set of rules and being creative within the context of those rules.  (From Jazzpath.com)

In the past generation, there was no need to communicate a shared vision: you either did what you were told or told people what to do. And people obeyed, mostly out of fear of losing their jobs. But in the digital workforce, shared goals are what make the work fit together. Players participate of their own will, not fear.

Putting this into generational terms: if you were born after 1977 (aka Gen X to the Millennials), then you were encouraged to see ALL adults as peers.  In the public school system, this trend continued as the generation was encouraged to speak up, speak out and make as many mistakes as possible…after all, THAT is how you learn. The fear of screwing up faded as making mistakes was actually encouraged, and teachers also became friends and mentors.  Video games simply reinforced the same iterative learning lessons at home.

Thousands of years of social programming were flipped over in favor of iterative learning and flattened hierarchy.  Those skills showed up just in time to enable us to survive the chaos of the digital work / social media revolution.

But survival is not enough, we are looking for a way to lead and win.

Since hierarchy is flat, it’s become critical to replace directing action with building a common mission.  In individual-centric digital work, there are often multiple right ways to accomplish the team objective (our topic for post 7).  While having clear shared goals will not help pick the right option, it will help the team accept that 1) the team has to choose and 2) the team is still on track even if some individuals have to change direction.

Just listen to the most complex work out there that has been influenced by Jazz: the late Jeff Porcaro, pop rock drummer and cofounder of Toto, admitted to being influenced by Bo Diddley for his drum riffs on the song Rosanna. Or, if you are a RUSH fan, you know that songs like La Villa Strangiato owe their syncopated rhythms, chord changes and drum riffs to Jazz.

Or the modern artist Piet Mondrian, who invented neoplasticism, was inspired by listening incessantly to a particular type of jazz called “Boogie-Woogie.”

Participants in this type of performance do not tune out and wait for direction. They must be present, bring 100% of themselves to each performance, and let go of what they did in the last concert because each new performance is customized.

You have until our next post to cry in your beer while whining that digital managers have it too hard.  In the next post, we’ll lay out 12 very concrete actions that you should be taking as a leader in the digital workforce.

PS: Brad had some important insights about how their childhood experience shapes digital natives’ behavior.  We felt that topic was important but external to the primary narrative, so Rob included them in a separate post.


Curious about SDN & OpenStack? We discuss at Open Networking Summit Panel (next Thursday)

Next Thursday (6/18), I’m on a panel at the SJC Open Networking Summit with John Zannos (Canonical), Mark Carroll (HP), and Mark McClain (VMware).  Our topic is software defined networking (SDN) and OpenStack, which could go anywhere in discussion.

OpenStack is clearly driving a lot of open innovation around SDN (and NFV).

I have no idea what others want to bring in, but I was so excited about the questions I suggested that I decided to just post them with my answers here as a teaser.

1) Does OpenStack require an SDN to be successful?

Historically, no.  There were two networking modes (nova-network and Neutron).  In the future, expect that some level of SDN will be required via the Neutron part of the project.

More broadly, SDN appears to be a critical component to broader OpenStack success.  Getting it right creates a lock-in for OpenStack.

2) If you have an SDN for OpenStack, does it need to integrate with your whole datacenter or can it be an island around OpenStack?

On the surface, you can create an Island and get away with it.  More broadly, I think that SDN is most interesting if it provides network isolation throughout your data center or your hosting provider’s data center.  You may not run everything on top of OpenStack but you will be connecting everything together with networking.

SDN has the potential to be the common glue.

3) Of the SDN approaches, which ones seem to be working?  Why?

Overall, the overlay networking approaches seem to be leading.  Anything that requires central control and administration will have to demonstrate it can scale.  Anything that actually requires re-configuring the underlay networking quickly is also going to have to make a lot of progress.

Networking is already distributed.  Anything that breaks that design pattern has an uphill battle.

4) Are SDN and NFV co-dependent?  Are they driving each other?

Yes.  The idea of spreading networking functions throughout your data center to manage east-west or individual tenant requirements (my definition of NFV) requires a way to have isolated traffic (one of the uses for SDN).

5) Is SDN relevant outside of OpenStack?  If so, in what?

Yes.  SDN on containers will become increasingly important.  SDN termination to multi-user systems (like a big database) also makes sense.

6) IPv6?  A threat or assistance to SDN?

IPv6 is coming, really.  I think that IPv6 has isolation and encryption capabilities that compete with SDN as an overlay.  Widespread IPv6 adoption could make SDN less relevant.  It also does a better job for multi-cloud networking since it’s neutral and you don’t have to worry about which SDN tech your host is using.

Ceph in an hour? Boring! How about Ceph hardware optimized with advanced topology networking & IPv6?

This is the most remarkable deployment that I’ve had the pleasure to post about.

The RackN team has refreshed the original OpenCrowbar Ceph deployment to take advantage of the latest capabilities of the platform.  The updated workload (APL2) requires first installing RackN Enterprise or OpenCrowbar.

The update provides five distinct capabilities:

1. Fast and Repeatable

You can go from nothing to a distributed Ceph cluster in an hour.  Need to rehearse on VMs?  That’s even faster.  Want to test and retune your configuration?  Make some changes, take a coffee break and retest.  Of course, with redeploy that fast, you can iterate until you’ve got it exactly right.

2. Automatically Optimized Disk Configuration

The RackN update optimizes the Ceph installation for disk performance by finding and flagging SSDs.  That means that our deploy just works(tm) without you having to reconfigure your OS provisioning scripts or vendor disk layout.
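
On Linux, flagging SSDs usually comes down to reading the kernel’s `rotational` flag in sysfs.  Here’s a minimal sketch of that probe (OpenCrowbar’s actual detection is richer than this; the function names are mine):

```python
from pathlib import Path

def is_ssd(device, sys_block=Path("/sys/block")):
    """Return True when the kernel reports the device as non-rotational."""
    flag = sys_block / device / "queue" / "rotational"
    return flag.read_text().strip() == "0"

def classify(devices, sys_block=Path("/sys/block")):
    """Split a device list so fast media can be flagged for journals."""
    ssds = [d for d in devices if is_ssd(d, sys_block)]
    spinners = [d for d in devices if d not in ssds]
    return ssds, spinners
```

On a real host, `classify(["sda", "sdb", "nvme0n1"])` would return the SSD and spinning-disk groups, which is the raw material for placing Ceph journals on the fast media.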

3. Cluster Building and Balancing

This update allows you to place which roles you want on which nodes before you commit to the deployment.  You can decide the right MON-to-OSD ratio for your needs.  If you expand your cluster, the system will automatically rebalance it.

4. Advanced Networking Topology & IPv6

Using the network conduit abstraction, you can separate front and back end networks for the cluster.  We also take advantage of native IPv6 support and even use that as the preferred addressing.

5. Both Block and Object Services

Building up from Ready State Core, you can add the Ceph workload and be quickly installing Ceph for block and object storage.

That’s a lot of advanced capabilities included out-of-the-box, made possible by having an ops orchestration platform that actually understands metal.

Of course, there’s always more to improve.  Before we take on further automated tuning, we want to hear from you and learn which use-cases are most important.

The Matrix & Surrogates as analogies for VMs, Containers and Metal

Trench coats aside, I used The Matrix as a useful analogy to explain virtualization and containers to a non-technical friend today.  I’m interested in hearing from others if this is a helpful analogy.

Why does anyone care about virtual servers?

Virtual servers (aka virtual machines or VMs) are important because data centers are just like the Matrix.  The real world of data centers is an ugly, messy place fraught with hidden dangers and unpleasant chores.  Most of us would rather take the blue pill and live in a safe, computer-generated artificial environment where we can ignore those details and live in the convenient abstraction of Mega City.

Do VMs really work to let you ignore the physical stuff?

Pretty much.  For most people, they can live their whole lives within the virtual world.  They can think they are eating the steak and never try bending the spoons.

So why are containers disruptive?  

Well, it’s like the Surrogates movie.  Right now, a lot of people living in the Cloud Matrix are setting up even smaller bubbles.  They are finding that they don’t need a whole city; they can just live inside a single room.  For them, it’s more like Surrogates, where people never leave their single room.

But if they never leave the container, do they need the Matrix?

No.  And that’s the disruption.  If you’ve wrapped yourself in a smaller bubble, then you really don’t need the larger wrapper.

What about that messy “real world”?

It’s still out there in both cases.  It’s just that once you are inside the inner bubble, you can’t really tell the difference.