DNS is critical – getting physical ops integrations right matters

Why DNS? Maintaining DNS is essential to scale ops.  It’s not as simple as naming servers because each server will have multiple addresses (IPv4, IPv6, teams, bridges, etc) on multiple NICs depending on the systems function and applications. Plus, Errors in DNS are hard to diagnose.

Names MatterI love talking about the small Ops things that make a huge impact in quality of automation.  Things like automatically building a squid proxy cache infrastructure.

Today, I get to rave about the DNS integration that just surfaced in the OpenCrowbar code base. RackN CTO, Greg Althaus, just completed work that incrementally updates DNS entries as new IPs are added into the system.

Why is that a big deal?  There are a lot of names & IPs to manage.

In physical ops, every time you bring up a physical or virtual network interface, you are assigning at least one IP to that interface. For OpenCrowbar, we are assigning two addresses: IPv4 and IPv6.  Servers generally have 3 or more active interfaces (e.g.: BMC, admin, internal, public and storage) so that’s a lot of references.  It gets even more complex when you factor in DNS round robin or other common practices.

Plus mistakes are expensive.  Name resolution is an essential service for operations.

I know we all love memorizing IPv4 addresses (just wait for IPv6!) so accurate naming is essential.  OpenCrowbar already aligns the address 4th octet (Admin .106 goes to the same server as BMC .106) but that’s not always practical or useful.  This is not just a Day 1 problem – DNS drift or staleness becomes an increasing challenging problem when you have to reallocate IP addresses.  The simple fact is that registering IPs is not the hard part of this integration – it’s the flexible and dynamic updates.

What DNS automation did we enable in OpenCrowbar?  Here’s a partial list:

  1. recovery of names and IPs when interfaces and systems are decommissioned
  2. use of flexible naming patterns so that you can control how the systems are registered
  3. ability to register names in multiple DNS infrastructures
  4. ability to understand sub-domains so that you can map DNS by region
  5. ability to register the same system under multiple names
  6. wild card support for C-Names
  7. ability to create a DNS round-robin group and keep it updated

But there’s more! The integration includes both BIND and PowerDNS integrations. Since BIND does not have an API that allows incremental additions, Greg added a Golang service to wrap BIND and provide incremental updates and deletes.

When we talk about infrastructure ops automation and ready state, this is the type of deep integration that makes a difference and is the hallmark of the RackN team’s ops focus with RackN Enterprise and OpenCrowbar.

Leading vs. Directing: Digital Managers must learn the difference [post 5 of 8]

Fifth IN AN 8 POST SERIESBRAD SZOLLOSE AND ROB HIRSCHFELD INVITE YOU TO SHARE IN OUR DISCUSSION ABOUT FAILURES, FIGHTS AND FRIGHTENING TRANSFORMATIONS GOING ON AROUND US AS DIGITAL WORK CHANGES WORKPLACE DELIVERABLES, PLANNING AND CULTURE.

On the shouldersDigital Management has a challenging deep paradox: digital workers resist direct management but require that their efforts fit into a larger picture.

If you believe the next generation companies we discussed in post #4, then the only way to unlock worker potential is enable self-motivated employees and remove all management. In Zappos case, they encouraged 14% of their workers to simply leave the company because they don’t believe in extreme self-management.

Companies like W. L. Gore & Associates, the makers of GORE-TEX, operate and thrive very well in a team-driven environment… This apparently loosey-goosey management style has brought about hundreds of major multibillion-dollar ideas and made W. L. Gore a leading incubator of consistently great ideas and products for more than fifty years. To an outside observer it looks as though the focus is on having fun. But to the initiated, it is about hiring intense self-starters who contribute wholeheartedly to what they are doing and to the team, and most important, who can self-manage their time and skill sets.

— Liquid Leadership by Brad Szollose, page 154

Frankly, both of us—Brad and Robare skeptical. We believe that these tactics do enhance productivity, but gloss over the essential ingredient in their success: a shared set of goals.

Like our Jazz analogy, the performance is the sum of the parts and the players need to understand how their work fits into the bigger picture. A traditional management structure, with controlling leadership and über clear, micromanaged direction, backfires because it restricts the workers’ ability to interpret and adapt; however, that does not mean we are advocates of “no management whatsoever” zones.  

The trendy word is Holacracy.  That loosely translates into removal of management hierarchy and power while redistributing it throughout the organization.  Are you scared of that free-fall model?  If workers reject traditional management then what are the alternatives?

We need a way to manage today’s independent thinking workforce.

According to Forbes, digital workers have an even higher need to understand the purpose of their work than previous generations. If you are a Baby Boomer (Conductor of a Symphony), then this last statement may cause you to roll your eyes in disagreement.

Directing a Jazz ensemble requires a different type of leadership. One that hierarchy junkies —orchestra members who need a conductor—would call ambiguous…IF they didn’t truly know what was happening.

Great musicians don’t join mediocre bands; they purposely seek out other teams that are challenging them, a shared set of goals and standards that produce results and success. This may require a shift in mindset for some of our readers.

Freedom in jazz improvisation comes from understanding structure. When people listen to jazz, they often believe that the soloist is “doing whatever they want.” If fact, as experienced improvisers will tell you, the soloist is rarely “doing whatever they want”.  An improvisational soloist is always following a complicated set of rules and being creative within the context of those rules.  From Jazzpath.com

In the past generation, there was no need to communicate a shared vision: you either did what you were told, OR just told people what to do. And people obeyed. Mostly out of fear of losing your job. But, in the digital workforce, shared goals are what makes the work fit together. Players participate of their own will. Not fear.

Putting this into generational terms: if you were born after 1977 (aka Gen X to the Millennials) then you were encouraged to see ALL adults as peers.  In the public school system, this trend continued as the generation was encouraged to speak up, speak out and make as many mistakes as possible…after all, THAT is how you learn. And the fear of screwing up and making mistakes was actually encouraged, as teachers also became friends and mentors.  Video games simply reinforced the same iterative learning lessons at home.

Thousands of years of social programming were flipped over in favor of iterative learning and flattened hierarchy.  Those skills showed up just in time to enable us to survive the chaos of the digital work / social media revolution.

But survival is not enough, we are looking for a way to lead and win.

Since hierarchy is flat, it’s become critical to replace directing action with building a common mission.  In individual-centric digital work, there are often multiple right ways to accomplish the team objective (our topic for post 7).  While having a clear shared goals will not help pick the right option, it will help the team accept that 1) the team has to choose and 2) the team is still on track even if some some individuals have to change direction.

Just listen to the most complex work out there that has been influenced by Jazz; the late Jeff Porcaro, pop rock drummer and cofounder of Toto admits to being influenced by Bo Diddley for his drum riffs on the song Rosanna. Or if you are a RUSH fan you know that songs like La Villa Strangiato owe the syncopated rhythms, chord changes and drum riffs to Jazz.

Or the modern artist Piet Mondrian who invented neoplasticism, was inspired by listening incessantly to a particular type of jazz called “Boogie-Woogie.”

Participants in this type of performance do not tune out and wait for direction. They must be present, bring 100% of themselves to each performance, and let go of what they did in the last concert because each new performance is customized.

You have until our next post to cry in your beer while whining that digital managers have it too hard.  In the next post, we’ll lay out very concrete actions that you should be taking as a leader in the digital workforce.

PS: Brad some important insights about how their childhood experience shapes digital natives’ behavior.  We felt that topic was important but external to the primary narrative so Rob included them here:

Continue reading

Hidden costs of Cloud? No surprises, it’s still about complexity = people cost

Last week, Forbes and ZDnet posted articles discussing the cost of various cloud (451 source material behind wall) full of dollar per hour costs analysis.  Their analysis talks about private infrastructure being an order of magnitude cheaper (yes, cheaper) to own than public cloud; however, the open source price advantages offered by OpenStack are swallowed by added cost of finding skilled operators and its lack of maturity.

At the end of the day, operational concerns are the differential factor.

The Magic 8 Cube

The Magic 8 Cube

These articles get tied down into trying to normalize clouds to $/vm/hour analysis and buried the lead that the operational decisions about what contributes to cloud operational costs.   I explored this a while back in my “magic 8 cube” series about six added management variations between public and private clouds.

In most cases, operations decisions is not just about cost – they factor in flexibility, stability and organizational readiness.  From that perspective, the additional costs of public clouds and well-known stacks (VMware) are easily justified for smaller operations.  Using alternatives means paying higher salaries and finding talent that requires larger scale to justify.

Operational complexity is a material cost that strongly detracts from new platforms (yes, OpenStack – we need to address this!)

Unfortunately, it’s hard for people building platforms to perceive the complexity experienced by people outside their community.  We need to make sure that stability and operability are top line features because complexity adds a very real cost because it comes directly back to cost of operation.

In my thinking, the winners will be solutions that reduce BOTH cost and complexity.  I’ve talked about that in the past and see the trend accelerating as more and more companies invest in ops automation.

Short lived VM (Mayflies) research yields surprising scheduling benefit

Last semester, Alex Hirschfeld (my son) did a simulation to explore the possible efficiency benefits of the Mayflies concept proposed by Josh McKenty and me.

Mayflies swarming from Wikipedia

In the initial phase of the research, he simulated a data center using load curves designed to oversubscribe the resources (he’s still interesting in actual load data).  This was sufficient to test the theory and find something surprising: mayflies can really improve scheduling.

Alex found an unexpected benefit comes when you force mayflies to have a controlled “die off.”  It allows your scheduler to be much smarter.

Let’s assume that you have a high mayfly ratio (70%), that means every day 10% of your resources would turn over.  If you coordinate the time window and feed that information into your scheduler, then it can make much better load distribution decisions.  Alex’s simulation showed that this approach basically eliminated hot spots and server over-crowding.

Here’s a snippet of his report explaining the effect in his own words:

On a system that is more consistent and does not have a massive virtual machine through put, Mayflies may not help with balancing the systems load, but with the social engineering aspect, it can increase the stability of the system.

Most of the time, the requests for new virtual machines on a cloud are immutable. They came in at a time and need to be fulfilled in the order of their request. Mayflies has the potential to change that. If a request is made, it has the potential to be added to a queue of mayflies that need to be reinitialized. This creates a queue of virtual machine requests that any load balancing algorithm can work with.

Mayflies can make load balancing a system easier. Knowing the exact size of the virtual machine that is going to be added and knowing when it will die makes load balancing for dynamic systems trivial.

Manage Hardware like a BOSS – latest OpenCrowbar brings API to Physical Gear

A few weeks ago, I posted about VMs being squeezed between containers and metal.   That observation comes from our experience fielding the latest metal provisioning feature sets for OpenCrowbar; consequently, so it’s exciting to see the team has cut the next quarterly release:  OpenCrowbar v2.2 (aka Camshaft).  Even better, you can top it off with official software support.

Camshaft coordinates activity

Dual overhead camshaft housing by Neodarkshadow from Wikimedia Commons

The Camshaft release had two primary objectives: Integrations and Services.  Both build on the unique functional operations and ready state approach in Crowbar v2.

1) For Integrations, we’ve been busy leveraging our ready state API to make physical servers work like a cloud.  It gets especially interesting with the RackN burn-in/tear-down workflows added in.  Our prototype Chef Provisioning driver showed how you can use the Crowbar API to spin servers up and down.  We’re now expanding this cloud-like capability for Saltstack, Docker Machine and Pivotal BOSH.

2) For Services, we’ve taken ops decomposition to a new level.  The “secret sauce” for Crowbar is our ability to interweave ops activity between components in the system.  For example, building a cluster requires setting up pieces on different systems in a very specific sequence.  In Camshaft, we’ve added externally registered services (using Consul) into the orchestration.  That means that Crowbar will either use existing DNS, Database, or NTP services or set it’s own.  Basically, Crowbar can now work FIT YOUR EXISTING OPS ENVIRONMENT without forcing a dedicated Crowbar only services like DHCP or DNS.

In addition to all these features, you can now purchase support for OpenCrowbar from RackN (my company).  The Enterprise version includes additional server life-cycle workflow elements and features like HA and Upgrade as they are available.

There are AMAZING features coming in the next release (“Drill”) including a message bus to broadcast events from the system, more operating systems (ESXi, Xenserver, Debian and Mirantis’ Fuel) and increased integration/flexibility with existing operational environments.  Several of these have already been added to the develop branch.

It’s easy to setup and test OpenCrowbar using containers, VMs or metal.  Want to learn more?  Join our community in Gitteremail list or weekly interactive community meetings (Wednesdays @ 9am PT).

Jazz vs. Symphony: Why micromanaging digital work FAILS. [post 3 of 8]

Third IN AN 8 POST SERIES, BRAD SZOLLOSE AND ROB HIRSCHFELD INVITE YOU TO SHARE IN OUR DISCUSSION ABOUT FAILURES, FIGHTS AND FRIGHTENING TRANSFORMATIONS GOING ON AROUND US AS DIGITAL WORK CHANGES WORKPLACE DELIVERABLES, PLANNING AND CULTURE.

Now that we’ve introduced music as a functional analogy for a stable 21st century leadership model and defined digital work, we’re ready to expose how work actually gets done in the information age.

First, has work really changed?  Yes.  Traditionally there was a distinct difference between organized production and service-based/creative work such as advertising, accounting or medicine.  Solve a problem by looking for clues and coming up with creative solutions to solve it.

Jazz Hands By RevolvingRevolver on DeviantArt http://revolvingrevolver.deviantart.com/

Digital work on the other hand, and more importantly – digital workers, live in a strange limbo of doing creative work but needing business structures and management models that were developed during the industrial age.

In today’s multi-generational workforce, what appears to be a generational divide has transformed into a non-age-specific cultural rift. As Brad and Rob compared notes, we came to believe that what is really happening is a learned difference in the approach to work and work culture.

There is learned difference in the approach to work and work culture that’s more obvious in, but not limited to, digital natives.

In most companies, the executives are traditionalists (Baby Boomers or hand-selected by Boomers).  While previous generations have been trained to follow hierarchy, the new culture values performance, flexibility and teamwork with a less top-down control oriented outlook.

It’s like a symphonic conductor who is used to picking the chair order and directing the tempo is handing out sheet music to a Jazz ensemble.  So how is the traditional manager going to deliver a stellar performance when his performers are Jazz trained?

In traditional concert orchestra, each musician has to go to college, train hard, earn a shot to get into the orchestra, and overtime, work very hard to earn the First Chair position (think earning the corner office).  Once in that position, they stay there until death or retirement.  Anyone who deviates, is fired. Improv is only allowed during certain songs, by a select few.  It’s the workplace equivalent to climbing the corporate ladder.

Most digital workers think they belong to a Jazz ensemble.  

It’s a mistake to believe less organized means less skilled.  Workers in the Jazz model are also talented and trained professionals.  If you look at the careers of Thelonius Monk, Duke Ellington and Dizzy Gillespie, they all had formal training, many started as children.  The same is true for digital workers: many started build job skills as children and then honed their teamwork playing video games.

But can a loosely organized group consistently deliver results? Yes. In fact, they deliver better results!

When a Jazz Improv group plays, they have a rough composition to start with. Each member is given time for a solo.  To the uninitiated there appears to be no leaders in this milieu of talent, but the leader is there.  They just refuse to control the performance; instead, they trust that each member will bring their A Game and perform at 100% of their capacity.

In business, this is scary. Don’t we need someone to check each person’s work? People are just messing around right? I mean, is this actual work? Who is in charge?

In businesss environments that operate more like Jazz, studies have proven that there is a 32% increase in productivity from traditional command and control environments driven by hierarchy.

Age, experience and position are NOT the criteria for the Digital Worker. Output is.  And output is different for each product. Management’s role in this model is to get out of the way and let the musicians create. Instead of conforming to a single style and method, the people producing in the model each bring something unique and also experience a high degree of ownership.

This is a powerful type of workplace diversity: by allowing different ways of problem solving to co-exist, we also make the workplace more inclusive and collaborative.

Sound too good to be true?  In our next post we’ll discuss trust as the critical ingredient for Jazz performance.  (Teaser)

Showing to how others explain Ready State & OpenCrowbar

I’m working on a series for DevOps.com to explain Functional Ops (expect it to start early next week!) and it’s very hard to convey it’s east-west API nature.  So I’m always excited to see how other people explain how OpenCrowbar does ops and ready state.

Ready State PictureThis week I was blown away by the drawing that I’ve recreated for this blog post.  It’s very clear graphic showing the operational complexity of heterogeneous infrastructure AND how OpenCrowbar normalizes it into a ready state.

It’s critical to realize that the height of each component tower varies by vendor and also by location with in the data center topology.  Ready state is not just about normalizing different vendors gear; it’s really about dealing with the complexity that’s inherent in building a functional data center.  It’s “little” things liking knowing how to to enumerate the networking interfaces and uplinks to build the correct teams.

If you think this graphic helps, please let me know.