PaaS Simplified: an application architecture that responds to load

handoff

In addition to attending the great sessions at the OpenStack Design Conference, our Dell team realized that we’ve been making Platform as a Service (PaaS) much more complex.  Stripping away the detritus is important because it looks like “What is a PaaS” is changing on a daily basis so boiling it down to the must fundamental is essential.

At its core, a PaaS is an application that changes its architecture based on the load.   That’s it no further definition is required.

I’ve been playing with this definition since April and am finding that it’s a much more productive definition of PaaS than any that I’ve used so far.  The reason is that it’s

  1. application focused,
  2. not language or services bound and
  3. captures the business use cases

Of course, I’m going to have to provide more backup in future posts.  I want to invite discussion about this perspective on PaaS.  I’m especially interesting in seeing how recent offerings from VMware (OpenPaaS/CloudFoundry) or Amazon (Elastic Beanstalk) measure against this concept.

Virtualizing #OpenStack Nova: looking at the many ways to skin the CAcTus (#KVM v #XenServer v #ESX)

<service bulletin> Server virtualization is not cloud: it is a commonly used technology that creates convenient  resource partitions for cloud operations and infrastructure as a service providers. </service bulletin>

OpenStack claims support for nearly every virtualization platform on the market.  While the basics of “what is virtualization” are common across all platforms, there are important variances in how these platforms are deployed.   It is important to understand these variances to make informed choices about virtualization platforms. 

Your virtualization model choice will have deep implications on your server/networking choice, deployment methodology and operations infrastructure.

My focus is on architecture not specific hypervisors so I’m generalizing to just three to make the each architecture description more concrete:

  1. KVM (open source) is highly used by developers and single host systems
  2. XenServer (open/freemium) leads public cloud infrastructure (Amazon EC2, Rackspace Cloud, and GoGrid)
  3. ESX/vCenter (licensed) leads enterprise virtualized infrastructure

Of course, there are many more hypervisors and many different ways to deploy the three I’m referencing.

This picture shows all three options as a single system.  In practice, only operators wishing to avoid exposure to RESTful recreational activities would implement multiple virtualization architectures in a single system.   Let’s explore the three options:

OS + Hypervisor (KVM) architecture deploys the hypervisor a free standing application on top of an operating system (OS).  In this model, the service provider manages the OS and the hypervisor independently.  This means that the OS needs to be maintained, but is also allows the OS to be enhanced to better manage the cloud or add other functions (share storage).  Because they are least restricted, free standing hypervisors lead the virtualization innovation wave.

Bare Metal Hypervisor (XenServer) architecture integrates the hypervisor and the OS as a single unit.  In this model, the service provider manages the hypervisor as a single unit.  This makes it easier to support and maintain the hypervisor because the platform can be tightly controlled; however, it limits the operator’s ability to extend or multi-purpose the server.   In this model, operators may add agents directly to the individual hypervisor but would not make changes to the underlying OS or resource allocation.

Clustered Hypervisor (ESX + vCenter) architecture integrates multiple servers into a single hypervisor pool.  In this model, the service provider does not manage the individual hypervisor; instead, they operate the environment through the cluster supervisor.  This makes it easier to perform resource balancing and fault tolerance within the domain of the cluster; however, the operator must rely on the supervisor because directly managing the system creates a multi-master problem.  Lack of direct management improves supportability at the cost of flexibility.  Scale is also a challenge for clustered hypervisors because their span of control is limited to practical resource boundaries: this means that large clouds add complexity as they deal with multiple clusters.

Clearly, choosing a virtualization architecture is difficult with significant trade-offs that must be considered.  It would be easy to get lost in the technical weeds except that the ultimate choice seems to be more stylistic.

Ultimately, the choice of virtualization approach comes down to your capability to manage and support cloud operations.  The Hypervisor+OS approach maximum flexibility and minimum cost but requires an investment to build a level competence.  Generally, this choice pervades an overall approach to embrace open cloud operations.  Selecting more controlled models for virtualization reduces risk for operations and allows operators to leverage (at a price, of course) their vendor’s core competencies and mature software delivery timelines.

While all of these choices are seeing strong adoption in the general market, I have been looking at the OpenStack community in particular.  In that community, the primary architectural choice is an agent per host instead of clusters.  KVM is favored for development and is the hypervisor of NASA’s Nova implementation.  XenServer has strong support from both Citrix and Rackspace. 

Choice is good: know thyself.

McCrory lays out VMware vision

Props are due to Dave McCrory for his fine investigative work reading the VMware cloudy tea leaves.  Over the weekend, he posted a series of articles about VMware’s Open PaaS and VMforce offerings.  This is a significant write-up based on information gleaned from their public code check-ins that he validated with them after the fact.

I have not had time to digest it yet – check back later for actual commentary.

Alert the villagers, it’s Frankencloud!

I’m growing more and more concerned about the preponderance of Frankencloud offerings that I see being foisted into the market place (no, my employer, Dell, is not guiltless).  Frankenclouds are “cloud solutions” that are created by using duct tape, twine, wishful marketing brochures, and at least 4 marginally cloud enabled products.

The official Frankencloud recipe goes like this:

  • Take 1 product that includes server virtualization (substitutions to VMware at your own risk)
  • Take 1 product that does storage virtualization (substitutions to SAN at your own risk)
  • Take 1 product that does network virtualization (substitutions to VLANs at your own risk)
  • Take 1 product that does IT orchestration (your guess is as good as any)
  • Take 1 product that does IT monitoring
  • Take 1 product that does Virtualization monitoring
  • Recommended: an unlimited Pizza budget for your IT Ops team

Combine the ingredients at high voltage in a climate conditioned environment.  Stir in a seriously large amounts of consulting services, training, and Red Bull.  At the end of this process, you will have your very own Frankencloud!

Frankenclouds are notoriously difficult to maintain because each part has its own version life cycle.  More critically, they also lack a brain.

Unfortunately, there are few alternatives to the Frankencloud today.  I think that the alternatives will rewrite the rules that Ops uses to create clouds.  Here are the rules that I think help drive a wooden stake through the heart of the Frankencloud (yeah, I mixed monsters):

  • not assume that server virtualization == cloud. 
  • simple, simple and simpler than that
  • focus on applications (need to write more about DevOps)
  • start with networking, not computation
  • assume that software containers are replaced, not upgraded

What do you think we can do to defeat Frankenclouds?

API vs. API: How Amazon EC2 kicks VMware, RackSpace, and Microsoft

My day job is to try and choose and influence Cloud technologies so it’s no surprise when to hear different vendors pitching why their cloud API is more open, standards based, or performant.  They have convincing yet irrelevant arguments: the primary measure of a cloud API is the size of its ecosystem.

The API’s ecosystem is the number (and vitality) of the upstream partners, SaaS services, PaaS vendors, and ISVs that have built their business on top of that API.  The fundamental truth of this model, like all ad hoc IT standards, is that success is built on business traction, not on technical merit or endorsement by standards bodies.

So which Cloud API will be the winner?  We’re just rounding the first turn and Amazon is ahead.  Let’s look at the lead fillies

  • Amazon EC2/S3 has the clear leadership.  Their API is widely copied (without clear license to do so!), includes storage and their billing model is highly innovative.
  • Microsoft Azure is making a big push.  Windows continues to dominate as a platform and their SQL cloud helps address application porting.  In addition, their PaaS integration provides a forward migration.
  • VMware vCloud has taken to high road through the official standards bodies.  VMware dominates the private cloud space and their vCenter API represents a larger ecosystem than any other virtualization API.   This ecosystem guarantees that vCloud will be widely adopted but if they can cross over into public clouds is fuzzier.
  • RackSpace has an interesting position by offering both dedicated and shared hosting.  Their service and API have been along for a long time.  They have just not created the buzz that Amazon gets.  They could be a swing vote depending on their future decisions around Cloud APIs.

But maybe we don’t have to pick the winner!  Perhaps there’s an option for a trifecta bet where we don’t have to pick a single winner.  This scenario of building a multi-API abstraction layer is getting a lot of interest and creating a lot of value.  Vendors include RightScale, DeltaCloud (was RedHat, now Apache), and jCloud.

Right now, I’m sitting in the Delta Cloud session at RedHat Summit/JBoss World.  One of my concerns about API aggregation is that the API abstraction has to be either least common denominator (LCD) or have strange exceptions.  For example, the speaker is saying that approaches to Firewalls are very different or completely missing.  This creates a serious aggravation for aggregation:  does the API leave a gap, favor one API, or invent yet another way to solve the problem.

I believe the cloud API race is not just a single horse race for the Cloud Computing Cup, it’s more like the Triple Crown.   The real winning API will cover compute, network, and storage management.   

Then again, accelerating PaaS adoption could make these IaaS Clouds into buggy whip manufacturers.

Disclosure:  My employeer, Dell, is a partner with many of the companies listed above.

Java makes stange bedfellows of VMware and Google

I was thinking about Sci-Tech’s story about VMware and Google. I’ve been watching and wondering how giants VMware and Google will dance to the music of Java (now an Oracle asset). VMware’s Spring and Groovy seems like a natural fit with Google’s AppEngine. However, neither own the Java platform yet both are banking big on it becoming the major development language. It puts them into the interesting position of having the evangelize Java together.

If they can marshall their shared interests then this combination could be a potent counter point to Microsoft’s .NET. They could provide the corporate support and lift that Sun did not. Or they could just create more confusion and dilution for an already fragmented platform.

6/29 update: after the JBoss World show, I need to add RedHat to the list of java supporters. Starting to take on an AntiMS feeling.

Putting on my Dell hat, accelerating these platforms helps our customers and our industry.

Rethinking Storage

Or “UNthinking SANs”

Back in 2001, I was co-founder of a start-up building the first Internet virtualized cloud.  Dual CPU 1U pizza box servers were brand new and we were ready to build out an 8 node, 64 VM cloud!  It was going to be a dream – all that RAM and CPU just begging to be oversubscribed.  It was enough to make Turing weep for joy.

Unfortunately, all those VMs needed lots and lots of storage.

Never fear, EMC was more than happy to quote us a lovely SAN with plenty of redundant FBAs and interconnected fabric switches.  It was all so shiny and cool yet totally unscalable and obscenely expensive.   Yes, unscalable because that nascent 8 node cloud was already at the port limit for the solution!  Yes, expensive because that $50,000 hardware solution would have needed a $1,000,000 storage solution!

The funny part is that even after learning all that, we still wanted to buy the SAN.  It was just that cool.

We never bought that SAN, but we did buy a very workable NAS device.  Then it was my job to change (“pragmatic-ize”) our architecture so that our cloud management did not require expensive shiny objects.

Our ultimate solution used the NAS for master images that were accessed by many nodes.  These requests were mainly reads and optimized.  Writes were made to differencing disks kept on local disk and highly scalable.  In systems, we were able to keep the masters local and save bandwidth.  This same strategy could easily be applied in current “stateless” VM deployments.

Some of the SANless benefits are:

  • Less cost
  • Simplicity of networking and management
  • Nearer to linear scale out
  • Improved I/O throughput
  • Better fault tolerance (storage faults are isolated to individual nodes)

Of course, there are costs:

  • More spindles means more energy use (depending on drive selection and other factors)
  • Lack of centralized data management
  • Potentially wasted space because each system carries excess capacity
  • The need to synchronize data stored in multiple locations

These are real costs; however, I believe the data management problems are unsolved issues for SAN deployments too.  Data proliferation is simply hidden inside of the VMs.

Today, I observe many different SAN focused architectures and cringe.  These same solutions could be much simpler, more scalable and dramatically affordable with minimal (or even no) changes.  If you’re serious about deploying a cloud based on commodity system then you need seriously need to re-evaluate your storage.