The 451 Group Cloudscape report strikes chord misses harmony (DevOps, Hybrid Cloud, Orchestration) November 2, 2011
Posted by Rob H in Architecture, Clouds, Dave McCrory, DevOps. Tags: 451, applications, Architecture, cloud, DevOps, hybrid, networking, orchestration, ProTier, Storage
3 comments
It’s impossible to resist posting about this month’s 451 Group Cloudscape report when it calls me out by name as a leading cloud innovator:
… ProTier founders Dave McCrory and Rob Hirschfeld. ProTier [note: now part of Quest] was, indeed, the first VMware ecosystem vendor to be tracked by The 451 Group. In the face of a skeptical world, these entrepreneurs argued that virtualization needed automation in order to realize its full potential, and that the test lab was the low-hanging fruit. Subsequent events have more than vindicated their view (pg. 33).
It’s even better when the report is worth reading and offers insights into forces shaping the industry. It’s nice to be “more than vindicated” on an amazing journey we started over 10 years ago!
Rather than recite 451’s points (hybrid cloud = automation + orchestration + devops + pixie dust), I’d rather look at the problem a different way as a counterpoint.
The problem is “how do we deal with applications that are scattered over multiple data centers?”
I do not think orchestration is the complete answer. Current orchestration is too focused on moving around virtual machines (aka workloads).
Ultimately, the solution lies in application architecture; however, I feel that is also a misdirection because cloud is redefining what an “application architecture” means.
Applications are a dynamic mix of compute, storage, and connectivity.
We’re entering an age when all of these ingredients will be delivered as elastic services that will be managed by the applications themselves. The concept of self management is an extension of DevOps principles that fuse application function and deployment. There are missing pieces, but I’m seeing the innovation moving to fill those gaps.
If you want to see the future of cloud applications then look at the network and storage services that are emerging. They will tell you more about the future than orchestration.
Virtualizing #OpenStack Nova: looking at the many ways to skin the CAcTus (#KVM v #XenServer v #ESX) May 27, 2011
Posted by Rob H in Uncategorized. Tags: cloud, esx, kvm, Nova, OpenStack, vcenter, virtualization, VMware, xen
3 comments
<service bulletin> Server virtualization is not cloud: it is a commonly used technology that creates convenient resource partitions for cloud operations and infrastructure as a service providers. </service bulletin>
OpenStack claims support for nearly every virtualization platform on the market. While the basics of “what is virtualization” are common across all platforms, there are important variances in how these platforms are deployed. It is important to understand these variances to make informed choices about virtualization platforms.
Your virtualization model choice will have deep implications on your server/networking choice, deployment methodology and operations infrastructure.
My focus is on architecture, not specific hypervisors, so I’m generalizing to just three to make each architecture description more concrete:
- KVM (open source) is widely used by developers and on single-host systems
- XenServer (open/freemium) leads public cloud infrastructure (Amazon EC2, Rackspace Cloud, and GoGrid)
- ESX/vCenter (licensed) leads enterprise virtualized infrastructure
Of course, there are many more hypervisors and many different ways to deploy the three I’m referencing.
This picture shows all three options as a single system. In practice, only operators wishing to avoid exposure to RESTful recreational activities would implement multiple virtualization architectures in a single system. Let’s explore the three options:
OS + Hypervisor (KVM) architecture deploys the hypervisor as a free-standing application on top of an operating system (OS). In this model, the service provider manages the OS and the hypervisor independently. This means that the OS needs to be maintained, but it also allows the OS to be enhanced to better manage the cloud or add other functions (such as shared storage). Because they are least restricted, free-standing hypervisors lead the virtualization innovation wave.
Bare Metal Hypervisor (XenServer) architecture integrates the hypervisor and the OS as a single unit. In this model, the service provider manages the hypervisor as a single unit. This makes it easier to support and maintain the hypervisor because the platform can be tightly controlled; however, it limits the operator’s ability to extend or multi-purpose the server. In this model, operators may add agents directly to the individual hypervisor but would not make changes to the underlying OS or resource allocation.
Clustered Hypervisor (ESX + vCenter) architecture integrates multiple servers into a single hypervisor pool. In this model, the service provider does not manage the individual hypervisor; instead, they operate the environment through the cluster supervisor. This makes it easier to perform resource balancing and fault tolerance within the domain of the cluster; however, the operator must rely on the supervisor because directly managing the system creates a multi-master problem. Lack of direct management improves supportability at the cost of flexibility. Scale is also a challenge for clustered hypervisors because their span of control is limited to practical resource boundaries: this means that large clouds add complexity as they deal with multiple clusters.
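To make the contrast concrete, here is a minimal Python sketch of the three models as data. It is only my reading of the descriptions above, not anything from OpenStack or a vendor API: the `Model`, `Traits` and `management_targets` names are hypothetical, and the `agent_on_host` flag for the clustered case is an assumption drawn from the multi-master warning.

```python
from dataclasses import dataclass
from enum import Enum


class Model(Enum):
    OS_PLUS_HYPERVISOR = "OS + hypervisor (e.g. KVM)"
    BARE_METAL = "bare metal hypervisor (e.g. XenServer)"
    CLUSTERED = "clustered hypervisor (e.g. ESX + vCenter)"


@dataclass(frozen=True)
class Traits:
    management_point: str      # what the operator actually operates
    operator_manages_os: bool  # may the operator maintain/extend the host OS?
    agent_on_host: bool        # may the operator add an agent to each host?


# Illustrative summary of the three descriptions above (an assumption,
# not a vendor statement).
TRAITS = {
    Model.OS_PLUS_HYPERVISOR: Traits("OS and hypervisor, independently", True, True),
    Model.BARE_METAL: Traits("hypervisor as a single unit", False, True),
    Model.CLUSTERED: Traits("cluster supervisor", False, False),
}


def management_targets(model, hosts, supervisor="supervisor"):
    """Return the endpoints an operator addresses to manage `hosts`."""
    if model is Model.CLUSTERED:
        # Going around the supervisor would create a multi-master problem.
        return [supervisor]
    # Otherwise every host is managed individually.
    return list(hosts)
```

The point of the sketch is the shape of `management_targets`: in the first two models the operator's work grows with the host list, while in the clustered model it collapses to one endpoint per cluster.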
Clearly, choosing a virtualization architecture is difficult with significant trade-offs that must be considered. It would be easy to get lost in the technical weeds except that the ultimate choice seems to be more stylistic.
Ultimately, the choice of virtualization approach comes down to your capability to manage and support cloud operations. The Hypervisor+OS approach offers maximum flexibility and minimum cost but requires an investment to build a level of competence. Generally, this choice pervades an overall approach to embrace open cloud operations. Selecting more controlled models for virtualization reduces risk for operations and allows operators to leverage (at a price, of course) their vendor’s core competencies and mature software delivery timelines.
While all of these choices are seeing strong adoption in the general market, I have been looking at the OpenStack community in particular. In that community, the primary architectural choice is an agent per host instead of clusters. KVM is favored for development and is the hypervisor of NASA’s Nova implementation. XenServer has strong support from both Citrix and Rackspace.
Choice is good: know thyself.
BlackOps: 7 tenets for infrastructure & operations in hyperscale clouds. #CloudOps #Hyperscale April 18, 2011
Posted by Rob H in Agile, Amazon, CloudOps, Clouds, Lean, OpenStack. Tags: amazon, BlackOps, cloud, CloudOps, hyperscale, OpenStack, rightscale, Taxonomy
7 comments
In my work queue at Dell, the request for a “cloud taxonomy” keeps turning up on my priority list just behind world dominance peace. Basically, a cloud taxonomy is a layer cake picture that shows all the possible cloud components stacked together like gears in an antique Swiss watch. Unfortunately, our clock-like layer cake has evolved into a collaboration between the Swedish Chef and Rube Goldberg as we try to accommodate more and more technologies into the mix.
The key to de-spaghettifying our cloud taxonomy was to realize that clouds have two distinct sides: an external well-known API and internal “black box” operations. Each side has different objectives that together create an elastic, scalable cloud.
The objective of the API side is to provide the smallest usable “surface area” for cloud consumers. Surface area describes the scope of the interface that is published to the users. The smaller the area, the easier it is for users to comprehend and the harder it is for them to break. Amazon’s EC2 & S3 APIs set the standards for small surface area design and spawned a huge cloud ecosystem.
To understand the cloud taxonomy, it is essential to digest the impact of the cloud ecosystem. The cloud ecosystem exists primarily beyond the API of the cloud. It provides users with flexible options that address their specific use cases. Since the ecosystem provides the user experience on top of the APIs (e.g.: RightScale), it frees the cloud provider to focus on services and economies of scale inside the black box.
The objective of the internal side of clouds is to create a perfect black box to give API users the illusion of a perfectly performing, strictly partitioned and totally elastic resource pool. To the consumer, it should not matter how ugly, inefficient, or inelegant the cloud operations really are; except, of course, that it does matter a great deal to the cloud operator.
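A toy Python sketch may help show the two sides together. Everything in it is hypothetical (the `Cloud` class, its three calls and the least-loaded placement rule are my assumptions for illustration, not EC2's or OpenStack's API): the public surface is three methods, and everything prefixed with an underscore is the black box.

```python
import itertools


class Cloud:
    """Hypothetical cloud with a deliberately small public surface area."""

    def __init__(self, hosts):
        # Internal state: consumers never see hosts or placement.
        self._hosts = {h: [] for h in hosts}
        self._ids = itertools.count(1)
        self._where = {}

    # --- public API: the whole surface area is these three calls ---
    def create_server(self):
        server_id = f"srv-{next(self._ids)}"
        host = self._place()
        self._hosts[host].append(server_id)
        self._where[server_id] = host
        return server_id

    def delete_server(self, server_id):
        host = self._where.pop(server_id)
        self._hosts[host].remove(server_id)

    def list_servers(self):
        return sorted(self._where)

    # --- black box: free to change without breaking consumers ---
    def _place(self):
        # Least-loaded placement; an arbitrary choice for this sketch.
        return min(self._hosts, key=lambda h: len(self._hosts[h]))


cloud = Cloud(["host-a", "host-b"])
first = cloud.create_server()
second = cloud.create_server()
cloud.delete_server(first)
```

The operator can swap `_place` for something smarter (or uglier) at any time; a consumer who only ever calls the three public methods cannot tell the difference.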
Cloud operation cannot succeed at scale without mastering the discipline of operating the black box cloud (BlackOps).
The BlackOps challenge is that clouds cannot wait until all of the answers are known because issues (or solutions) to scale architecture are difficult to predict in advance. Even worse, taking the time to solve them in advance likely means that you will miss the market.
Since new technologies and approaches are constantly emerging, there is no “design pattern” for hyperscale. To cope with constant changes, BlackOps live by seven tenets that help manage their infrastructure efficiently in a dynamic environment.
- Operational ownership – don’t wait for all the king’s horses and consultants to put you back together again (but asking for help is OK).
- Simple APIs – reduce the ways that consumers can stress the system making the scale challenges more predictable.
- Efficiency based financial incentives – customers will dramatically modify their consumption if you offer rewards that better match your black box’s capabilities.
- Automated processes & verification – ensures that changes and fixes can propagate at scale while errors are self-correcting.
- Frequent incremental rolling adjustments – prevents the great from being the enemy of the good so that systems are constantly improving (learn more about “split testing”)
- Passion for operational simplicity – at hyperscale, technical debt compounds very quickly. Debt translates into increased risk and reduced agility and can infect hardware, software, and process aspects of operations.
- Hunger for feedback & root-cause knowledge – if you’re building the airplane in flight, it’s worth taking extra time to check your work. You must catch problems early before they infect your scale infrastructure. The only thing more frustrating than fixing a problem at scale is fixing the same problem multiple times.
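Two of the tenets above, automated verification and frequent incremental rolling adjustments, combine naturally. The following Python sketch is one way to picture it, under my own assumptions: `rolling_update` and its callbacks are hypothetical names, and the simulated failure on node `n3` exists only to show the stop-and-roll-back path.

```python
def rolling_update(nodes, apply_change, verify, rollback, batch_size=2):
    """Apply a change in small batches; stop and roll back on a failed check.

    Returns (nodes_updated, nodes_that_failed_verification).
    """
    done = []
    for start in range(0, len(nodes), batch_size):
        batch = nodes[start:start + batch_size]
        for node in batch:
            apply_change(node)
        failed = [node for node in batch if not verify(node)]
        if failed:
            # Undo only the current batch; earlier batches already verified.
            for node in batch:
                rollback(node)
            return done, failed
        done.extend(batch)
    return done, []


# Simulated fleet: every node starts on version 1.
versions = {f"n{i}": 1 for i in range(1, 6)}


def apply_change(node):
    versions[node] = 2


def verify(node):
    # Pretend node n3 has a latent problem that the check catches.
    return versions[node] == 2 and node != "n3"


def rollback(node):
    versions[node] = 1


done, failed = rolling_update(list(versions), apply_change, verify, rollback)
```

In this run the first batch (`n1`, `n2`) is updated and kept, the second batch is rolled back because `n3` fails its check, and `n5` is never touched. The problem surfaces on two nodes rather than the whole fleet, which is the feedback the last tenet asks for.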
It’s no surprise that these are exactly the Agile & Lean principles. The pace of change of cloud is so fast and fluid, that BlackOps must use an operational model that embraces iterative and rolling deployment.
Compared to highly orchestrated traditional IT operations, this approach seems like sending a team of ninjas to battle on quicksand with objectives delivered in a fortune cookie.
I am not advocating fuzzy mysticism or by-the-seat-of-your-pants do-or-die strategies. BlackOps is a highly disciplined process based on well understood principles from just-in-time (JIT) and lean manufacturing. Best of all, they are fast to market, able to deliver high quality and capable of responding to change.
Post Script / Plug: My understanding of BlackOps is based on the operational model that Dell has introduced around our OpenStack Crowbar project. I’m going to be presenting more about this specific topic at the OpenStack Design Conference next week.
Substituting Action for Knowledge – adopting “ready, fire, aim” as a strategy (and when to run like hell) March 28, 2011
Posted by Rob H in Agile, DevOps, Lean, Open Trends. Tags: Agile, antidepressants, cloud, DevOps, Lean, operational model, operations, roadmap
1 comment so far
Today my mother-in-law (a practicing psychiatrist) was bemoaning the current medical practice of substituting action for knowledge. In her world, many doctors will make rapid changes to their patients’ therapy. Their goal is to address the issues immediately presented (patient feels sad so Dr prescribes antidepressants) rather than taking time to understand the patients’ history or make changes incrementally and measure impacts. It feels like another example of our cultural compulsion to fix problems as quickly as possible.
Her comments made me question the core way that I evangelize!
Do Lean and Agile substitute action for knowledge? No. We use action to acquire knowledge.
The fundamental assumption that drives poor decision-making is that we have enough information to make a design, solve a problem or define a market. Lean and Agile’s most core tenet is that we must attack this assumption. We must assume that we can’t gather enough information to fully define our objective. The good news is that even without much analysis we know a lot! We know:
- roughly what we want to do (road map)
- the first steps we should take (tactics)
- who will be working on the problem (team members)
- generally how much effort it will take (time & team size)
- who has the problem that we are trying to solve (market)
We also know that we’ll learn a lot more as we get closer to our target. Every delay in starting effectively pushes our “day of clarity” further into the future. For that reason, it is essential that we build a process that constantly reviews and adjusts its targets.
We need to build a process that acquires knowledge as progress is made and makes rapid progress.
In Agile, we translate this need into the decorations of our process: reviews for learning, retrospectives for adjustments, planning for taking action and short iterations to drive the feedback loop. Agile’s mantra is “ready, fire, aim, fire, aim, fire, aim, …” which is very different from simply jumping out of a plane without a parachute and hoping you’ll find a haystack to land in.
For cloud deployments, this means building operational knowledge in stages. Technology is simply evolving too quickly and best practices too slowly for anyone to wait for a packaged solution to solve all their cloud infrastructure problems. We tried this and it does not work: clouds are a mixture of hardware, software and operations. More accurately, clouds are an operational model supported by hardware and software.
Currently, 80% of cloud deployment effort is operations (or “DevOps”).
When I listen to people’s plans about building product or deploying cloud, I get very skeptical when they take a lot of time to aim at objects far off on the horizon. Perhaps they are worried that they will substitute action for knowledge; however, I think they would be better served to test their knowledge with a little action.
My MIL agrees – she sees her patients frequently and makes small adjustments to their treatment as needed. Wow, that’s an Rx for Agile!
Notes from 2011 Cloud Connect Event Day 2 (#ccevent) March 10, 2011
Posted by Rob H in CAP Theorem, Cloud Connect, Clouds, Economics. Tags: cloud, CloudConnect, NoSQL, private cloud, public cloud, Reddit
With the OpenStack launch behind me, I have some time to attend the Cloud Connect Event. I missed all the DevOps sessions, but I did get to geek out on the NoSQL & Big Data sessions. I jumped to the private cloud track (based on Twitter traffic) and was rewarded for the shift.
I’m surprised at how much of this cloud conference is dedicated to private cloud. At other cloud conferences I’ve attended, the focus has been on learning how to use the cloud (specifically the public cloud). This is the first cloud show I’ve attended that has so much emphasis, dialog and vendor feeding around private. This was a suits & slacks show with few jeans, t-shirts, and pony tails. Perhaps private cloud is where the $$$ is being spent now?
It definitely feels like using cloud has become assumed, but the best practices and tools are just emerging.
The twitter #ccevent stream is interesting but temporal. I’m posting my raw (spelling optional) notes (below the more tag) because there is a lot of great content from the show to support and extend the twitter stream. I’ll try to italicize some of the better lines.
(more…)
“Flatness at the Edges” guides hyperscale cloud design January 29, 2011
Posted by Rob H in Architecture, Clouds, Events. Tags: cloud, edge computing, flat network, hyperscale, meetup, OpenStack, VLAN
8 comments
As I’m working on a larger “cloud bootstrapping” white paper (look for a pending Dell release), I stumbled on an apparent unifying principle for hyperscale cloud design. I’m interested in feedback about this concept to see if it fairly encapsulates a common target for cloud hardware, networking and software design.
“Flatness at the Edges” is one of the guiding principles of hyperscale cloud designs.
Flatness means that cloud infrastructure avoids creating tiers where possible. For example, having a blade in a frame aggregating networking that is connected to a SAN via a VLAN is a tiered design in which the components are vertically coupled. A single node with local disk connected directly to the switch has all the same components but in a single “flat” layer.
Edges are the bottom tier (or “leaves” to us CS geeks) of the cloud. Being flat creates a lot of edges because most of the components are self contained. To scale and reduce complexity, clouds must rely on the edges to make independent decisions such as how to route network traffic, where to replicate data, or when to throttle VMs. The anti-example of edge design is using VLANs to segment tenants because VLANs (a limited resource) require configuration at the switching tier to manage traffic generated by an edge component. We are effectively distributing an intelligence overhead tax on each component of the cloud rather than relying on a “centralized overcloud” to rule them all.
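As an illustration of an edge making an independent decision, here is a small Python sketch of replica placement. It uses rendezvous (highest-random-weight) hashing, which is a technique I am swapping in for the example and not something the designs above prescribe; `replica_nodes`, the node names and the `volume-42` key are all hypothetical.

```python
import hashlib


def replica_nodes(key, nodes, copies=2):
    """Pick replica holders for `key` using only locally known inputs.

    Every edge that knows the same node list computes the same answer,
    so no central coordinator has to be consulted.
    """
    def score(node):
        digest = hashlib.sha256(f"{node}:{key}".encode()).hexdigest()
        return int(digest, 16)

    # Highest scores win; the result does not depend on list order.
    return sorted(nodes, key=score, reverse=True)[:copies]


nodes = ["node-1", "node-2", "node-3", "node-4", "node-5"]
chosen = replica_nodes("volume-42", nodes)

# A different edge, holding the same list in a different order, agrees.
chosen_elsewhere = replica_nodes("volume-42", list(reversed(nodes)))

# Losing a node that holds no replica leaves the placement unchanged.
bystander = next(n for n in nodes if n not in chosen)
chosen_after_loss = replica_nodes(
    "volume-42", [n for n in nodes if n != bystander]
)
```

The cost is the “intelligence overhead tax” described above: every edge carries the node list and does the hashing itself. In exchange there is no higher tier to configure, which is exactly what the VLAN anti-example lacks.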
Combining flatness and edges evolves the sympathetic concepts into a full-fledged cloud design principle.
Interested in discussing this face to face? I’ll be presenting this and other cloud setup concepts at the SJC OpenStack meetup on 2/3.







