Step Size (shown as X axis): do we make upgrades in small frequent steps or queue up changes into larger bundles? Larger steps mean that there are more changes to be accommodated simultaneously.
Change Leader (shown as Y axis): do we upgrade the server or the client first? Regardless of the choice, the followers should be able to handle multiple protocol versions if we are going to have any hope of a reasonable upgrade.
Safeness (shown as Z axis): do the changes preserve the data and productivity of the entity being upgraded? It is simpler to assume that we just add new components and remove old components, but this approach carries significant risks or redundancy requirements.
I’m strongly biased towards continuous deployment because I think it reduces risk and increases agility; however, laying out all the vertices of the upgrade cube helps to visualize where costs and risks are added in the traditional upgrade models.
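To make the shape of the cube concrete, here is a minimal sketch that treats each of the three axes as a binary choice and enumerates the resulting combinations. The axis keys and the value labels ("small", "large", "preserving", "replacing" and so on) are my own shorthand for illustration, not terms defined elsewhere, and the sketch does not try to say which named upgrade model sits at which corner.

```python
from itertools import product

# Each axis reduced to a binary choice. The labels are illustrative
# shorthand (assumptions), not an official vocabulary.
AXES = {
    "step_size": ("small", "large"),
    "change_leader": ("server", "client"),
    "safeness": ("preserving", "replacing"),
}


def cube_vertices():
    """Return every combination of axis choices as a dict, one per vertex."""
    names = list(AXES)
    return [dict(zip(names, combo)) for combo in product(*AXES.values())]


if __name__ == "__main__":
    for vertex in cube_vertices():
        print(vertex)
```

Three binary axes give 2 x 2 x 2 = 8 corners, which is why the cube has eight upgrade models to compare.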
Breaking down each vertex:
Continuous Deploy – core infrastructure is updated on a regular (usually daily or faster) basis
Protocol Driven – like changing to HTML5, the clients are tolerant of multiple protocols and changes take a long time to roll out
Staged Upgrade – tightly coordinate migration between major versions over a short period of time in which all of the components in the system step from one version to the next together.
Rolling Upgrade – system operates a small band of versions simultaneously where the components with the oldest versions are in process of being removed and their capacity replaced with new nodes using the latest versions.
Parallel Operation – two server systems operate and clients choose when to migrate to the latest version.
Protocol Stepping – roll out clients that support multiple versions, then upgrade the server infrastructure only after all clients can support both versions.
Forced Client Migration – change the server infrastructure and then force the clients to upgrade before they can reconnect.
Big Bang – you have to shut down all components of the system to upgrade it
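Several of the models above (Protocol Driven, Rolling Upgrade, Protocol Stepping) depend on followers tolerating more than one protocol version. Here is a rough sketch of that idea, assuming versions are plain integers and each component advertises the set it can speak. The function names `negotiate_version` and `safe_to_upgrade_leader` are hypothetical and not taken from any particular system.

```python
def negotiate_version(leader_versions, follower_versions):
    """Pick the highest protocol version both sides support, or None."""
    common = set(leader_versions) & set(follower_versions)
    return max(common) if common else None


def safe_to_upgrade_leader(new_leader_versions, followers):
    """True when every follower can still talk to the upgraded leader.

    `followers` is an iterable of version sets, one per follower.
    """
    return all(
        negotiate_version(new_leader_versions, supported) is not None
        for supported in followers
    )


if __name__ == "__main__":
    # One client still speaks only version 1, so moving the server to
    # version 2 alone would strand it.
    clients = [{1, 2}, {1, 2}, {1}]
    print(safe_to_upgrade_leader({2}, clients))

    # Protocol stepping: bring the last client up to both versions first.
    clients[2] = {1, 2}
    print(safe_to_upgrade_leader({2}, clients))
```

In this framing, Protocol Stepping waits until the check passes before touching the leader, while Forced Client Migration upgrades the leader anyway and makes stranded clients upgrade before they reconnect.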
This type of visualization helps me identify costs and options. It’s not likely to get much time in the final presentation so I’m hoping to hear in advance if it resonates with others.
PS: like this visualization? check out my “magic 8 cube” for cloud hosting options.
When Greg Althaus and I first proposed the project that would become Dell’s Crowbar, we had already learned first-hand that there was a significant gap in both the technologies and the processes for scale operations. Our team at Dell saw that the successful cloud data centers were treating their deployments as integrated systems (now called DevOps) in which the configuration of many components was coordinated and orchestrated; however, these approaches fell short of the mark in our opinion. We wanted to create a truly integrated operational environment from the bare metal through the networking up to the applications and out to the operations tooling.
Our ultimate technical nirvana is to achieve closed-loop continuous deployments. We want to see applications that constantly optimize new code, deployment changes, quality, revenue and cost of operations. We could find parts of this vision, but not a complete and adequate foundation for it.
The business driver for Crowbar is system thinking around improved time to value and flexibility. While our technical vision is a long-term objective, we see very real short-term ROI. It does not matter if you are writing your own software or deploying applications; the faster you can move that code into production, the sooner you get value from innovation. It is clear to us that the most successful technology companies have reorganized around speed to market and adapting to the pace of change.
System flexibility & acceleration were key values when the lean manufacturing revolution gave Dell a competitive advantage, and they have proven even more critical in today’s dynamic technology innovation climate.
We hope that this post helps define a vision for Crowbar beyond the upcoming refactoring. We started the project with the idea that new tools meant we could take operations to a new level.
While that’s a great objective, we’re too pragmatic in delivery to rest on a broad objective. Let’s take a look at Crowbar’s concrete strengths and growth areas.
Key strength areas for Crowbar
Late binding – hardware and network configuration is held until software configuration is known. This is a huge system concept.
Dynamic and Integrated Networking – means that we treat networking as a 1st class citizen for ops (sort of like software defined networking but integrated into the application)
System Perspective – no Application is an island. You can’t optimize just the deployment, you need to consider hardware, software, networking and operations all together.
Bootstrapping (bare metal) – while not “rocket science” it takes a lot of careful effort to get this right in a way that is meaningful in a continuous operations environment.
Open Source / Open Development / Modular Design – this problem is simply too complex to solve alone. We need to get a much broader net of environments and thinking involved.
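As an illustration of the late binding idea in the list above, here is a minimal sketch in which a node is discovered with no hardware or network settings, and those settings are only derived once its software role is known. This is not Crowbar's actual data model or API: the `Node` class, the `PROFILES` table, and every value in it are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical role -> hardware/network profile table; values are
# placeholders chosen for illustration only.
PROFILES = {
    "storage": {"raid": "jbod", "vlan": "storage"},
    "compute": {"raid": "raid1", "vlan": "tenant"},
}


@dataclass
class Node:
    name: str
    role: Optional[str] = None       # software role, unknown at discovery
    hardware: Optional[dict] = None  # left unset until the role is bound


def discover(name):
    """Register a bare-metal node without committing to any configuration."""
    return Node(name=name)


def bind_role(node, role):
    """Late binding: derive hardware and network settings from the role."""
    node.role = role
    node.hardware = dict(PROFILES[role])
    return node


if __name__ == "__main__":
    node = discover("node-01")
    print(node)                      # no hardware decisions made yet
    print(bind_role(node, "storage"))
```

The point of the ordering is that the same discovered node can end up with different RAID or network settings depending on what software is eventually placed on it.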
Continuing Areas of Leadership
Open / Lean / Incremental Architecture – these are core aspects of our approach. While we have a vision, we also are very open to ways that solve problems faster and more elegantly than we’d expected.
Continuous deployment – we think the release cycles are getting faster and the only way to survive is to build change into the foundation of operations.
Integrated networking – software defined networking is cool, but not enough. We need to have semantics that link applications, networks and infrastructure together.
Equivalent physical / virtual – we’re not saying that you won’t care if it’s physical or virtual (you should); we think that it should not impact your operations.
Scale / Hybrid – the key element to hybrid is scale and to scale is hybrid. The missing connection is being able to close the loop.
Closed loop deployment – seeking load management, code quality, profit, and cost of operations as factors in managed operations.
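One way to picture closing the loop is a single decision step that folds those factors into one score and compares a candidate deployment against the one currently running. The sketch below is only that, a sketch: the metric names, the weights, and the promote/rollback rule are all assumptions made for illustration, not a description of how Crowbar does it.

```python
# Hypothetical weights: positive factors add to the score, cost subtracts.
WEIGHTS = {"load_headroom": 1.0, "quality": 2.0, "profit": 1.0, "cost": -1.0}


def score(metrics, weights=WEIGHTS):
    """Combine observed metrics into one number; higher is better."""
    return sum(weights[key] * metrics[key] for key in weights)


def closed_loop_step(current, candidate, weights=WEIGHTS):
    """Keep the candidate deployment only if it scores at least as well."""
    if score(candidate, weights) >= score(current, weights):
        return "promote"
    return "rollback"


if __name__ == "__main__":
    current = {"load_headroom": 2, "quality": 3, "profit": 5, "cost": 4}
    candidate = {"load_headroom": 3, "quality": 3, "profit": 5, "cost": 4}
    print(closed_loop_step(current, candidate))
```

Run repeatedly against live measurements, a step like this is what turns continuous deployment into a feedback loop rather than a one-way push.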