I’ve been hearing the Hindu phrase “turtles all the way down” a lot lately to describe the practice of using products to install themselves (my original posting attributed this to Dr Seuss). This seems especially true of the container platforms that use containers to install the containers that manage the containers. Yes, really – I don’t make this stuff up.
While I’m a HUGE fan of containers (RackN uses them like crazy with Digital Rebar), they do not magically solve operational issues like security, upgrades or networking. In fact, they actually complicate operational concerns by creating additional segmentation.
Solving these issues requires building a robust, repeatable, and automated underlay. That is a fundamentally different problem than managing containers or virtual machines. Asking container or VM abstraction APIs to do underlay work breaks the purpose of the abstraction, which is to hide complexity.
The lure of a universal abstraction, the proverbial “single pane of glass,” is the ultimate siren song that breeds turtle recursion.
Another form of this pattern emerges from the square peg / round hole syndrome when we take a great tool and apply it to every job. For example, I was in a meeting when I heard “If you don’t think Kubernetes is the greatest way to deploy software then go away [because we’re using it to install Kubernetes].” It may be the greatest way to deploy software for applications that fit its model, but it’s certainly not the only way.
What’s the solution? We should accept that there are multiple right ways to manage platforms depending on the level of abstraction that we want to expose.
Using an abstraction in the wrong place hides information that we need to make good decisions. That makes it harder to automate, monitor and manage. It’s always faster, easier and safer when you’ve got the right tool for the job.
... "docker exec configure file" is a sad but common pattern ...
Interesting discussions happen when you hang out with straight-talking Paul Czarkowski. There’s a long chain of circumstance that led us from an Interop panel together at Barcelona (video) to bemoaning Ansible and Docker integration early one Sunday morning outside a gate in IAD.
What started as a rant about the crazy ways people find to inject configuration into containers (we seemed to think file-mounting configs was “least horrific”) turned into a discussion about how to retrofit application registry features (like Consul or etcd) into legacy applications.
Ansible Inventory is basically a static registry service.
While we both acknowledge that Ansible inventory is distinctly not a registry service, the idea is a useful way to help explain the interaction between registry and configuration. The most basic goal of a registry (there are others!) is to have system components be able to find and integrate with other system components. In that sense, the inventory allows operators to pre-wire this information in advance in a functional way.
The utility quickly falls apart because it’s difficult to create re-runnable Ansible (people can barely pronounce idempotent as it is) that could handle incremental updates. Also, a registry provides other important functions, like service health checks and basic cross-node storage, that a static inventory simply does not.
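To make the analogy concrete, here is a minimal sketch (the group, host and template names are invented for illustration) of how a static inventory “pre-wires” the same question a registry answers at runtime – which hosts provide a given service:

    # inventory.ini - the "static registry": which hosts provide the database service
    [database]
    db1.example.com

    [app]
    app1.example.com

    # templates/app.conf.j2 - a consumer looks up the provider by group name
    db_host = {{ groups['database'][0] }}

A dynamic registry like Consul answers the same lookup at runtime, which is exactly why the analogy is useful – and why it breaks down as soon as services move, scale or fail.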
It may not be perfect, but I thought it was a @pczarkowski insight worth passing on. What do you think?
I’ve been on the outskirts of the CloudFoundry (CF) universe from the dawn of the project (it’s a little-remembered fact that there was a 2011 Crowbar install of CloudFoundry).
Progress and investment have been substantial and, happily, organic. Like many platforms, its success relies on a reasonable balance between strong opinions about “right” patterns and enough flexibility to accommodate exceptions.
From a well patterned foundation, development teams find acceleration. This seems to be helping CloudFoundry win some high-profile enterprise adopters.
The interesting challenge ahead of the project comes from building more complex autonomous deployments. With the challenge of horizontal scale arguably behind them, CF users are starting to build more complex architectures. This includes dynamic provisioning of the providers (like databases, object stores and other persistent adjacent services) and connecting to containerized “micro-services.” (see Matt Stine’s preso)
While this is a natural evolution, it adds an order of magnitude more complexity because the contracts between previously isolated layers are suddenly not reliable.
For example, what happens to a CF deployment when the database provider is field upgraded to a new version? That could introduce breaking changes in dependent applications that are completely opaque to the data provider. These are hard problems to solve.
Happily, that’s exactly the discussions that we’re starting to have with container orchestration systems. It’s also part of the dialog that I’ve been trying to drive with Functional Operations (FuncOps Preso) on the physical automation side. I’m optimistic that CloudFoundry patterns will help make this problem more tractable.
A little while back, Art Fewell and I had two excellent discussions about general trends and challenges in the cloud and scale data center space. Due to technical difficulties, the first (funnier one) was lost forever to NSA archives, but the second survived!
The video and transcript were just posted to Network World as part of Art’s ongoing interview series. It was an action-packed hour so I don’t want to re-post the transcript here. I thought selected quotes (under the video) were worth calling out to whet your appetite for the whole tamale.
My highlights:
.. partnering with a start-up was really hard, but partnering with an open source project actually gave us a lot more influence and control.
Then we got into OpenStack, … we [Dell] wanted something we could invest our time in, that we could be part of, and that would be sustained and transparent to the community.
Incumbents are starting to be threatened by these new open technologies … what I think levels the playing field is having an open platform.
…I was pointing at you and laughing… [you’ll have to see the video]
docker and containerization … potentially is disruptive to OpenStack and how OpenStack is operating
You have to turn the crank faster and faster and faster to keep up.
Small things I love about OpenStack … vendors are learning how to work in these open communities. When they don’t do it right they’re told very strongly that they don’t.
It was literally a Power Point of everything that was wrong … [I said,] “Yes, that’s true. You want to help?”
…people aiming missiles at your house right now…
With containers you can sell that same piece of hardware 10 times or more and really pack in the workloads and so you get better performance and over subscription and so the utilization of the infrastructure goes way up.
I’m not as much of a believer in that OpenStack eats the data center phenomena.
First thing is automate. I’ve talked to people a lot about getting ready for OpenStack and what they should do. The bottom line is before you even invest in these technologies, automating your workloads and deployments is a huge component for being successful with that.
Now, all of sudden the SDN layer is connecting these network function virtualization .. It’s a big mess. It’s really hard, it’s really complex.
The thing that I’m really excited about is the service architecture. On the RackN and Crowbar side, we’re in the middle of doing an architecture that’s basically turning data center operations into services.
What platform as a service really is about, it’s about how you store the information. What services do you offer around the elastic part? Elastic is time based; it’s where you’re manipulating the data.
RE RackN: You can’t manufacture infrastructure but you can use it in a much “cloudier way”. It really redefines what you can do in a datacenter.
That abstraction layer means that people can work together and actually share scripts.
I definitely think that OpenStack’s legacy will more likely be the community and the governance and what we’ve learned from that than probably the code.
If you are designing an application that uses microservice registration AND continuous integration then this post is for you! If not, get with the program, you are a fossil.
Sunday night, I posted about the Erlang Consul client I wrote for our Behavior Driven Development (BDD) testing infrastructure. That exposed a need to run a Consul service in the OpenCrowbar Travis-CI build automation that validates all of our pull requests. Basically, Travis spins up the full OpenCrowbar API and workers (we call it the annealer) which in turn registers services in Consul.
NOTE: These are pseudo instructions. In the actual code (here too), I created a script to install Consul, but this is more illustrative of the changes you need to make in your .travis.yml file.
In the first snippet, we download and unzip Consul. It’s written in Go, so that’s about all we need for an install. I added a version check for logging validation.
After that, the BDD infrastructure can register the fake services that we expect (I created an Erlang consul:reg_serv(“name”) routine that makes this super easy). Once the services are registered, OpenCrowbar will check for the services and continue without trying to instantiate them (which it cannot do in Travis).
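Here’s roughly what that first snippet looks like – a sketch rather than our exact file, so treat the Consul version and download URL as illustrative:

    # .travis.yml (excerpt)
    before_script:
      - wget https://releases.hashicorp.com/consul/0.5.2/consul_0.5.2_linux_amd64.zip
      - unzip consul_0.5.2_linux_amd64.zip
      - ./consul version
      - ./consul agent -server -bootstrap-expect 1 -data-dir /tmp/consul &
      - sleep 5

The version check gives us a line in the Travis log to confirm the download worked, and the sleep gives the single-node agent a moment to bootstrap itself as leader before the tests start.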
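For reference, registering a placeholder service against a local agent is a single HTTP call; the Erlang consul:reg_serv/1 helper mentioned above just wraps something equivalent to this (the service name and port here are made up):

    # register a fake "database" service with the local Consul agent
    curl -X PUT http://127.0.0.1:8500/v1/agent/service/register \
         -d '{"Name": "database", "Port": 5432}'

    # verify the registration
    curl http://127.0.0.1:8500/v1/catalog/service/database

Once that returns the fake entry, OpenCrowbar’s service check is satisfied and the annealer keeps going.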
We’ve made great strides in ops automation, but there’s no one-size-fits-all approach to ops because abstractions have limitations.
Perhaps it’s my Industrial Engineering background, but I’m a huge fan of operational automation and tooling. I can remember my first experience with VMware ESX and thinking that it needed automation tooling. Since then, I’ve watched as cloud ops has revolutionized application development and deployment. We are just at the beginning of the cloud automation curve, and our continuous deployment tooling and platform services deliver exponential increases in value.
These cloud breakthroughs are fundamental to Ops and have uncovered real best practices for operators. Unfortunately, many of the cloud-specific scripts and tools do not translate well to physical ops. How can we correct that?
Now that I focus on physical ops, I’m in awe of the capabilities being unleashed by cloud ops. Looking at the Netflix Chaos Monkey pattern alone, we’ve reached a point where it’s practical to inject artificial failures to improve application robustness. The idea of breaking things on purpose as an optimization is both terrifying and exhilarating.
In the last few years, I’ve watched (and led) the application of these cloud tool chains down to physical infrastructure. Fundamentally, there’s a great fit between DevOps configuration management tooling (Chef, Puppet, Salt, Ansible) and physical ops. Most of the configuration and installation work (post-ready state) is fundamentally the same regardless of whether the services are physical, virtual or containerized. Installing PostgreSQL is pretty much the same on any platform.
But pretty much the same is not exactly the same. The differences between platforms often prevent us from translating useful work between frames. In physics, we’d call that an impedance mismatch: similar devices that cannot work together due to minor variations.
An example of this Ops impedance mismatch is networking. Virtual systems present interfaces and networks that are specific to the desired workload, while physical systems present all the available physical interfaces plus additional system interfaces like VLANs, bridges and teams. On a typical server, there are at least 10 available interfaces, and you don’t get to choose which ones are connected – you have to discover the topology. To complicate matters, the interface list will vary depending on both the server model and the site requirements.
By comparison, it’s trivial in virtual: you get only the NICs you need, and they are ordered consistently based on your network requests. While the basic script is the same, it’s essential that it identify the correct interface. That’s simple in cloud scripting and highly variable for physical!
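A small shell sketch shows the gap (the 192.168.124.0/24 admin subnet is just an example value, not a requirement):

    # Cloud: the NIC you requested is predictably named and ordered
    ADMIN_IF=eth0

    # Physical: enumerate everything, then discover which interface carries the admin network
    ADMIN_IF=$(ip -o -4 addr show | awk '/192\.168\.124\./ {print $2; exit}')
    echo "using ${ADMIN_IF} for the admin network"

The downstream application script is identical either way; the variability is entirely in how you find the interface.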
Another example is drive configuration. Hardware presents limitless combinations of RAID, JBOD, SSD and HDD. These differences have dramatic performance and density implications that are, by design, completely obfuscated in cloud resources.
The solution is to create functional abstractions between the application configuration and the networking configuration. The abstraction isolates configuration differences between the scripts so the application setup can be reused even if the networking is radically different.
With some of our latest OpenCrowbar work, we’re finally able to create practical abstractions for physical ops that are repeatable site to site. For example, we have patterns that allow us to functionally separate the network from the application layer. Using that separation, we can build network interfaces in one layer and allow the next layer to assume the networking is correct, as if it were a virtual machine. That’s a very important advance because it allows us to finally share and reuse operational scripts.
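As a simplified illustration of that layering (the file path, keys and configure_postgres step are hypothetical stand-ins, not OpenCrowbar’s actual attribute schema):

    # network layer: publish the result of interface discovery and bonding
    echo '{"admin_interface": "bond0", "admin_address": "192.168.124.10"}' > /etc/ops/network.json

    # application layer: consume the published value as if it were a cloud NIC
    ADMIN_IF=$(jq -r .admin_interface /etc/ops/network.json)
    configure_postgres --listen-interface "${ADMIN_IF}"   # stand-in for the real app config step

The application layer never touches the hardware details, so the same script runs unchanged on a VM where the answer is simply eth0.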
We’ll never fully eliminate the physical vs cloud impedance issue, but I think we can make the gaps increasingly small if we continue to 1) isolate automation layers with clear APIs and 2) tune operational abstractions so they can be reused.
Digital Natives have been trained to learn the rules of the game by just leaping in and trying. They seek out mentors, learn the politics at each level, and fail as many times as possible in order to learn how NOT to do something. Think about it this way: You gain more experience when you try and fail quickly than by carefully planning every step of your journey. As long as you are willing to make adjustments to your plans, experience always trumps prediction.
Just like in life and business, games no longer come with an instruction manual.
In Wii Sports, users learn the basics in-game and figure out the subtleties of the game as they level up. Tom Bissell, in Extra Lives: Why Video Games Matter, explains that the in-game learning model is core to the evolution of video games. Game design involves interactive learning through the game experience; consequently, we’ve trained Digital Natives that success comes from overcoming failure.
Early failure is the expected process for mastery.
You don’t believe that games lead to better decision making in real life? In a January 2010 article, WIRED magazine reported that observations of the new generation of football players showed they had adapted tactics learned in Madden NFL to the field. It is not just the number of virtual downs played; these players have gained a strategic field-level perspective on the game that was previously limited to coaches. Their experience playing video games has shattered the on-field hierarchy.
For your amusement… here is a video about L33T versus N00B culture from College Humor: “L33Ts don’t date N00Bs.” Youtu.be/JVfVqfIN8_c
Digital Natives embrace iterations and risk as a normal part of life.
Risk is also a trait we see in entrepreneurial startups. Changing the way things were done before requires you to push the boundaries, try something new, and consistently discard what doesn’t work. In Lean Startup Lessons Learned, Eric Ries built his entire business model around the try-learn-adjust process. He’s shown that iterations don’t just work, they consistently out-innovate the competition.
The entire reason Dell grew from a dorm to a multinational company is due to this type of fast-paced, customer-driven interactive learning. You are either creating something revolutionary or you will be quickly phased out of the Information Age. No one stays at the top just because he or she is cash rich anymore. Today’s Information Age company needs to be willing to reinvent itself consistently … and systematically.
Why do you think larger corporations that embrace entrepreneurship within their walls seem to survive through the worst of times and prosper like crazy during the good times?
Gamers have learned that risk with a purpose earns rewards.
Before we start, we already know that some of you are cynical about what we are suggesting—Video games? Are you serious? But we’re not talking about Ms. Pac-Man. We are talking about deeply complex, task-driven games with rich storytelling that rely on multiple missions and worldwide player communities working together on a singular mission.
Leaders in the Cloud Generation don’t just know this environment, they excel in it.
The next generation of technology decision makers is made up of self-selected masters of the games. They enjoy the flow of learning and solving problems; however, they don’t expect to solve them alone or in a single way. Today’s games are not about getting blocks to fall into lines; they are complex and nuanced. Winning is not about reflexes and reaction times; winning is about being adaptive and resourceful.
These environments can look like chaos. Digital workspaces and processes are not random; they are leveraging new-generation skills. In the book Different, Youngme Moon explains how innovations look crazy when they are first revealed. How is the work getting done? What is the goal here? These are called “results only work environments,” and studies have shown they increase productivity significantly.
Digital Natives reject top-down hierarchy.
These college-educated self-starters are not rebels; they just understand that success is about process and dealing with complexity. They don’t need someone to spoon-feed them instructions.
Studies at MIT and the London School of Economics have revealed that when high-end results are needed, giving people self-direction, the ability to master complex tasks, and the ability to serve a larger mission outside of themselves will garner groundbreaking results.
Gaming does not create mind-addled Mountain Dew-addicted unhygienic drone workers. Digital Natives raised on video games are smart, computer savvy, educated, and, believe it or not, resourceful independent thinkers.
Thomas Edison said:
“I didn’t fail 3,000 times. I found 3,000 ways how not to create a light bulb.”
Being comfortable with making mistakes thousands of times ’til mastery sounds counter-intuitive until you realize that is how some of the greatest breakthroughs in science and physics were discovered. Thomas Edison made 3,000 failed iterations in creating the light bulb.
Level up: You win the game by failing successfully.
Translation: Learn by playing, fail fast, and embrace risk.
Unlike other generations, Digital Natives believe that expertise comes directly from doing, not from position or education. This is not hubris; it’s a reflection of both their computer experience and dramatic improvements in technology usability.
If you follow Joel Spolsky’s blog, “Joel on Software,” you know about a term he uses when describing information architects obsessed with the abstract and not the details: Architecture Astronauts—so high up above the problem that they might as well be in space. “They’re astronauts because they are above the oxygen level, I don’t know how they’re breathing.”
For example, a Digital Native is much better positioned to fly a military attack drone than a Digital Immigrant. According to New Scientist, March 27, 2008, the military is using game controllers for drones and robots because they are “far more intuitive.” Beyond the fact that the interfaces are intuitive to them, Digital Natives have likely logged hundreds of hours flying simulated jets under trying battle conditions. Finally, they rightly expect that they can access all the operational parameters and technical notes about the plane with a Google search.
Our new workforce is ready to perform like none other in history.
Being able to perform is just the tip of the iceberg; having the right information is the more critical asset. A Digital Native knows information (and technology) is very fast moving and fluid. It also comes from all directions … after all it’s The Information Age. This is a radical paradigm shift. Harvard Researcher David Weinberger highlights in his book Too Big to Know that people are not looking up difficult technical problems in a book or even relying on their own experiences; they query their social networks and discover multiple valid solutions. The diversity of their sources is important to them, and an established hierarchy limits their visibility; inversely, they see leaders who build strict organizational hierarchies as cutting off their access to information and diversity.
Today’s thought worker is on the front lines of the technological revolution. They see all the newness, data, and interaction with a peer-to-peer network. Remember all that code on the screen in the movie The Matrix? You get the picture.
To a Digital Native, the vice presidents of most organizations are business astronauts floating too high above the world to see what’s really going on but feeling like they have perfect clarity. Who really knows the truth? Mission Control or Major Tom? This is especially true with the acceleration of business that we are experiencing. While the Astronaut in Chief is busy ordering the VPs to move the mountains out of the way, the engineers at ground control have already collaborated on a solution to leverage an existing coal mine and sell coal as a byproduct.
The business hierarchy of yesterday worked for a specific reason: workers needed to just follow rules, keep their mouths shut, and obey. Input, no matter how small, was seen as intrusive and insubordinate … and could get one fired. Henry Ford wanted an obedient worker to mass manufacture goods. The digital age requires a smarter worker because, in today’s world, we make very sophisticated stuff that does not conform to simple rules. Responsibility, troubleshooting, and decision-making have moved to the front lines. This requires open-source style communication.
Do not confuse the Astronaut problem with a lack of respect for authority.
Digital Natives respect informational authority, not positional. For Digital Natives, authority is flexible. They have experience forming and dissolving teams to accomplish a mission. The mission leader is the one with the right knowledge and skills for the situation, not the most senior or highest scoring. In Liquid Leadership, Brad explains that Digital Natives are not expecting managers to solve team problems; they are looking to their leadership to help build, manage, and empower their teams to do it themselves.
So why not encourage more collaboration with a singular mission in mind: develop a better end product? In a world that is expanding at such mercurial speed, a great idea can come from anywhere! Even from a customer! So why not remember to include customers in the process?
Who is Leroy Jenkins?
This viral video is about a spectacular team failure caused by one individual (Leroy Jenkins) who goes rogue during a massively multiplayer team game. This is the Digital Natives’ version of the ant and grasshopper parable: “Don’t pull a Leroy Jenkins on us—we need to plan this out.” Youtu.be/LkCNJRfSZBU
Think about it like this: Working as a team is like joining a quest.
If comparing work to a game scenario sounds counterintuitive then let’s reframe the situation. We may have the same destination and goals, but we are from very different backgrounds. Some of us speak different languages, have different needs and wants. Some went to MIT, some to community college. Some came through Internet startups, others through competitors. Big, little, educated, and smart. Intense and humble. Outgoing and introverted. Diversity of perspective creates stronger teams.
This also means that leadership roles rotate according to each mission.
This is the culture of the gaming universe. Missions and quests are the equivalent of workplace tasks accomplished and benchmarks achieved. Each member expects to earn a place through tasks and points. This is where Digital Natives’ experience becomes an advantage. They expect to advance in experience and skills. When you adapt the workplace to these expectations, Digital Natives thrive.
Leaders need to come down to earth and remove the spacesuit.
A leader at the top needs to stay connected to that information and disruption. Start by removing your helmet. Breathe the same oxygen as the rest of us and give us solutions that can be used here on planet earth.
On Gamification
Jeff Atwood, co-founder of the community-based Q&A site Stack Overflow, has been very articulate about using game design to influence how he builds communities around sharing knowledge. We recommend reading his post about “Building Social Software for the Anti-Social” on his blog, CodingHorror.com.
I’ve been seeing great acceptance of the concept of an ops Ready State. Technologists from both ops and dev immediately understand the need to “draw a line in the sand” between system prep and installation. We also admit that getting physical infrastructure to Ready State is largely taken for granted; however, it often takes multiple attempts to get it right, and even small application changes can require a full system rebuild.
Since even small changes can redefine the Ready State requirements, changing Ready State can feel like being told to tear down your house so you can remodel the kitchen.
A friend asked me to explain “Ready State” in non-technical terms. So far, the best analogy I’ve found is when a house is “roughed in.” The analogy helps if you’ve ever been part of house construction, but it may not be universally accessible, so I’ll explain.
Getting to rough in means that all of the basic infrastructure of the house is in place but nothing is finished. The foundation is poured, the plumbing lines are placed, the electrical mains are ready, the roof is on and the walls are up. The house is being built according to architectural plans, and the major decisions have been made, like how many rooms there are and what each room is for (bathroom, kitchen, great room, etc.). For Ready State, that’s like having the servers racked and set up with disk, BIOS, and network configured.
While we’ve built a lot, rough in is a relatively early milestone in construction. Even major items like the type of roof, siding and windows can still be changed. Speaking of windows, this is like installing an operating system in Ready State. We want to consider this a distinct milestone because there’s still room to make changes. Once the roof and exteriors are added, changes become much more disruptive and expensive to make.
Once the house is roughed in, the finishing work begins. Almost nothing from rough in will be visible to the people living in the house. Like a Ready State setup, the users interact with what gets laid on top of the infrastructure. For homes, it’s the walls, counters, fixtures and flooring. For operators, it’s applications like Hadoop, OpenStack or CloudFoundry.
Taking this analogy back to where we started, what if we could make rebuilding an entire house take just a day?! In construction, that’s simply not practical; however, we’re getting to a place in Ops where automation makes it possible to reconstruct the infrastructure configuration much faster.
While we can’t re-pour the foundation (aka swap out physical gear) instantly, we should be able to build up from there to ready state in a much more repeatable way.