To avoid echo chamber, OpenStack must embrace competitive cloud ecosystem

wpid-20151023_100533.jpg
Japanese Bullet Train View

I was in Japan before the Tokyo summit on a bullet train to Kyoto watching the mix of heavy industry and bucolic mountains pass by. That scene reflects an OpenStack duality: we want to be both a dominant platform delivering core cloud services and an open source values driven collective.

First, I fundamentally believe in the success of OpenStack as the open virtual infrastructure management platform.

I believe that we have solved the virtual compute/storage/network problem sufficiently to become the de facto open IaaS platform. While not perfect, the technologies are sufficient assuming we continue to improve ease of use and operational hardening. Pursing that base capability is my primary motivation for DefCore work.

I don’t believe that the OpenStack community is, or should try to become, the authority on “all things cloud.”

In the presence of Amazon, VMware, Microsoft and Google, we cannot make that claim with any degree of self-respect. Even newcomers like DigitalOcean have an undeniable footprint and influence. Those vendor platforms drive cloud ecosystems and technologies which foster fast innovation because there is no friction to joining their ecosystems and they are sufficiently large and stable enough to represent a target market. We’ve seen clear signs from Rackspace, HP and others that platform diversity improves cloud strength.

I continue to think we (OpenStack) spend too much time evaluating what is “in” or “out” of the project and too little time talking about what’s “on,” “under” and “with” the project like Kubernetes, Mesos, Docker, SDN, Hadoop and Ceph. That type of thinking creates distance between OpenStack efforts and the majority of the market.

What motivates the drive to an all open captive community? It’s the reasonable concern that critical parts of the infrastructure will become pay-to-play. For example, what if a non-OpenStack alternative to Heat Orchestration gained popularity for OpenStack implementers. Perhaps something that ran on Amazon also. That would create external pressure that would drive internal priorities. These “non-OpenStack” products would then have influence without having to contribute back to upstream.

Can we afford to have external entities driving internal priorities? Hell yes, that’s what customer adoption looks like.

OpenStack does not own the market sufficiently to create cloud echo chamber. The next wave of cloud innovation (my money is on container platforms) will follow the path of least resistance and widest adoption. We need to embrace that these innovations will not all be inside our community so that we can welcome them as part of our ecosystem. The community needs to find peace with that.

The Upstream Imperative: paving the way for content creators is required for platform success

Since content is king, platform companies (like Google, Microsoft, Twitter, Facebook and Amazon) win by attracting developers to build on their services.  Open source tooling and frameworks are the critical interfaces for these adopters; consequently, they must invest in building communities around those platforms even if it means open sourcing previously internal only tools.

This post expands on one of my OSCON observations: companies who write lots of code have discovered an imperative to upstream their internal projects.   For background, review my thoughts about open source and supply chain management.

Huh?  What is an “upstream imperative?”  It sounds like what salmon do during spawning then read the post-script!

Historically, companies with a lot of internal development tools had no inventive to open those projects.  In fact, the “collaboration tax” of open source discouraged companies from sharing code for essential operations.   Historically, open source was considered less featured and slower than commercial or internal projects; however, this perception has been totally shattered.  So companies are faced with a balance between the overhead of supporting external needs (aka collaboration) and the innovation those users bring into the effort.

Until recently, this balance usually tipped towards opening a project but under-investing in the community to keep the collaboration costs low.  The change I saw at OSCON is that companies understand that making open projects successful bring communities closer to their products and services.

That’s a huge boon to the overall technology community.

Being able to leverage and extend tools that have been proven by these internal teams strengthens and accelerates everyone. These communities act as free laboratories that breed new platforms and build deep relationships with critical influencers.  The upstream savvy companies see returns from both innovation around their tools and more content that’s well matched to their platforms.

Oh, and companies that fail to upstream will find it increasingly hard to attract critical mind share.  Thinking the alternatives gives us a Windows into how open source impacts past incumbents.

That leads to a future post about how XaaS dog fooding and “pure-play” aaS projects like OpenStack and CloudFoundry.

Post Script about Upstreaming:

Continue reading

Why cloud compute will be free

Today at Dell, I was presenting to our storage teams about cloud storage (aka the “storage banana”) and Dave “Data Gravity” McCrory reminded me that I had not yet posted my epiphany explaining “why cloud compute will be free.”  This realization derives from other topics that he and I have blogged but not stated so simply.

Overlooking that fact that compute is already free at Google and Amazon, you must understand that it’s a cloud eat cloud world out there where losing a customer places your cloud in jeopardy.  Speaking of Jeopardy…

Answer: Something sought by cloud hosts to make profits (and further the agenda of our AI overlords).

Question: What is lock-in?

Hopefully, it’s already obvious to you that clouds are all about data.  Cloud data takes three primary forms:

  1. Data in transformation (compute)
  2. Data in motion (network)
  3. Data at rest (storage)

These three forms combine to create cloud architecture applications (service oriented, externalized state).

The challenge is to find a compelling charge model that both:

  1. Makes it hard to leave your cloud AND
  2. Encourages customers to use your resources effectively (see #1 in Azure Top 20 post)

While compute demands are relatively elastic, storage demand is very consistent, predictable and constantly grows.  Data is easily measured and difficult to move.  In this way, data represents the perfect anchor for cloud customers (model rule #1).  A host with a growing data consumption foot print will have a long-term predictable revenue base.

However, storage consumption along does not encourage model rule #2.  Since storage is the foundation for the cloud, hosts can fairly judge resource use by measuring data egress, ingress and sidegress (attrib @mccrory 2/20/11).  This means tracking not only data in and out of the cloud, but also data transacted between the providers own cloud services.  For example, Azure changes for both data at rest ($0.15/GB/mo) and data in motion ($0.01/10K).

Consequently, the financially healthiest providers are the ones with most customer data.

If hosting success is all about building a larger, persistent storage footprint then service providers will give away services that drive data at rest and/or in motion.  Giving away compute means eliminating the barrier for customers to set up web sites, develop applications, and build their business.  As these accounts grow, they will deposit data in the cloud’s data bank and ultimately deposit dollars in their piggy bank.

However, there is a no-free-lunch caveat:  free compute will not have a meaningful service level agreement (SLA).  The host will continue to charge for customers who need their applications to operate consistently.  I expect that we’ll see free compute (or “spare compute” from the cloud providers perspective) highly used for early life-cycle (development, test, proof-of-concept) and background analytic applications.

The market is starting to wake up to the idea that cloud is not about IaaS – it’s about who has the data and the networks.

Oh, dem golden spindles!  Oh, dem golden spindles!

Microsoft Azure Cloud – Top 20 Lessons Learned about MS’s PaaS

Last week Dave McCrory (@McCrory) and I (@Zehicle) had the benefit of intensive Azure training at Microsoft HQ to support Dell’s Azure Stamp.

We’ve assembled a top 20 list of things to know about programming for Azure (and really any PaaS leaning cloud):

  1. If you want performance, optimize to reduce fees. Azure (and any cloud) is architected to penalize you if you use their resources poorly. The challenge is to fix this before your boss get the tab for your unenlightened design decisions.
  2. Coding .NET on Azure easy, architecting for Azure requires learning. Clouds put things in different places than you are used to and the rules are different. Expect a learning curve.
  3. Partitioning = parallelism. Learn to love partitions in all their forms, because your app will be throttled if you throw everything into a single partition! On the upside, each partition operates in parallel and even better, they usually don’t cost extra (SQL is the exception).
  4. Roles are flexible. You can run web servers (Apache, etc) on a worker and worker tasks on a web role. This is a good way to save some change since you pay per role instance. It’s counter to separation of concerns, but financially you should also combine workers into a single role where possible.
  5. Understand walking deployments. You can (and should) have simultaneous versions of the code operating against the same data so that you can roll upgrades (ala Timothy Fitz/Eric Ries) to reduce risk and without reducing performance. You should expect your data schema to simultaneously span mutiple code versions.
  6. Learn about Update Domains (UDs). Deployment domains allow rolling upgrades and changes to Applications and Services. They are part of how you partition your overall application. If you’re planning a VIP swap deployment, then you won’t care.
  7. Each service = ONE external IP. You can have many VMs backing each service (and multiple roles in a service) and Azure will load balance between them so you can scale out each service. Think of each service as a clonable entity: there will be at least 1 and more can be added if you want to scale.
  8. Understand between VIP and DIP. VIPs stand for Virtual IPs and are external, public, and metered. DIPs are internal, private, and load balanced. Azure provides an API to discover your DIPs – do not assume you know them because they are DYNAMIC IPs. Azure won’t let you see other DIPs inside the system.
  9. Azure has rich diagnostics, but beware. Azure leverages the existing diagnostics built into their system, but has to get the data off box since instances are volitile. This means that problems can be hard to isolate while excessive logging can impact performance and generate fees. Microsoft lets you target individual systems for elevated levels. You can also Terminal Server to a VM for troubleshooting (with caution).
  10. The new Azure admin console rocks. Take your pick between Silverlight or MMC Snap-in.
  11. Everything goes into Azure Storage. Learn to love it. Queues -> storage. Tables -> storage. Blobs -> storage. Logging -> storage. Code Repo -> storage. vDisk -> storage. SQL -> SQL (they march to their own drummer).
  12. Queues are essential, but tricky. Learn the meaning of idempotent because using queues requires you to handle failures and timeouts. The scary part is that it will work nicely until you exceed some limits and then you’ll experience cascading failure. Whee! Oh yea, and queues require polling (which stinks as a notification model).
  13. SQL Azure is just mostly like MS SQL. Microsoft did a smart thing in keeping Cloud SQL so it was highly compatible with Local SQL. The biggest note is that limited in size of partition. If you embrace the size limits you will get better performance. So stop pushing BLOBs into databases and start sharding.
  14. Duplicating data in tables will improve performance. This has to do with how partitions and keys operate but is an interesting architecture for NoSQL – stage data for use. Don’t be afraid to stage the same data in multiple ways. It may be faster/cheaper to write data twice if it becomes easier to find when you search it 1000s of times.
  15. Table data can be “warmed up.” Storage has logic that makes frequently accessed items faster (sort of like a cache ;). If you can anticipate load spikes then you should warm the data just before the spike.
  16. Storage billing is both amount and transactions. You can get burned on a small, but busy set of data. Note: you will pay even if you 404 a request.
  17. Azure has a CDN. Leveraging Microsoft’s Content Delivery Network (CDN) will improve performance for your users with small, low latency, high request items. You need to change your URLs for those assets. Best practice is to use some versioning in the URI so that you can force changes. Remember, CDN is SLOWER for the first hit when the data is not in cache so avoid CDN for low volume assets!
  18. Provisioning time is not instant. Azure needs anywhere from 1-3 minutes to spin a new instance of a role. Build this lag into your architecture and dynamic scale plans. New databases and partitions are fast.
  19. The VM Role is maintained by YOU. Using the VM role is a handy shortcut, but has a long list of gotcha’s. Some of note: 1) the VM can be “reset” to the last VM image state that you uploaded, 2) you are responsble for VM OS upgrades and patches, 3) VMs must be clonable because they will operate in parallel.
  20. Azure supports more than .NET. You can setup anything in a worker (and now VM) role, but there are nuances to doing this effectively. You really need to understand how Azure works and had better be ready to crack open Visual Studio for some things even if you’re writing in Java.

We hope this list helps you navigate Azure deployments. No matter what cloud you use, understanding Azure’s architecture will help you write better cloud scale applications.

We’d love to hear your suggestions and recommendations!

Mirrored on both blogs: Rob Hirschfeld’s Blog & Dave McCrory’s Blog

Seattle Cloud Camp, Dec 2010

While I was in Seattle for Azure training preparing for Dell’s Azure Appliance , Dave @McCrory suggested that we also attend the Seattle Cloud Camp (SCC Tweets).  This event was very well attended (200 people!).  With heavy attendance by Amazon (at their HQ), Microsoft (in the ‘hood), and Google, there was a substantial cloud vendor presence (>25% from those vendors alone).  Notable omission: VMware.

My reflection about the event by segment.

Opening Sessions:

  • Most of the opening sessions were too light for the audience.  I thought we were past the “what is cloud” level, sigh.
  • Of note, the Amazon security presentation by Steve Rileywas fun and entertaining.
  • Picking on a Dell competitor specifically: calling your cloud solution “WAS” is a branding #fail (not that DCSWA much is better).

Unpanel of self-appointed cloud extroverts experts:

  • The unpanel covered some decent topics (@adronbh captured them on twitter), unfortunately none of the answers really stood out to me.  Except for NoSQL.
  • The unpanel discussion about NoSQL drew 2 answers.  1) It’s not NoSQL, it’s eventually consistent instead of strictly consistent.  (note: I’ve been calling it “Storage++”) 2) We’ll see more and more choices in this area as we tune the models for utility then we’ll see some consolidation.  The suggestion was that NoSQL would follow the same explosion/contraction pattern of SQL databases.

Session on Cloud APIs (my suggested topic)

  • The Cloud API topic was well attended (30+).  The vast overwhelming majority or the attendees were using Amazon.
  • There was some interest in having “standard” APIs for cloud functions was not well received because it was felt to stifle innovation.  We are still to early.
  • It was postulated but not generally agreed that cloud aggregation (DeltaCloud, RightScale, etc) is workable.  This was considered a reason to not require standard clouds.
  • CloudCamp sponsor, Skytap, has their own API.  These APIs are value added and provide extra abstraction levels.
  • It was said that there are a LOT (50 now, 500 soon) smaller hosts that want to enter the cloud space.  These hosts will need an API – some are inventing their own.
  • I brought up the concept discussed at OpenStack that the logical abstraction for cloud network APIs is a “vlan.”  This created confusion because some thought that I meant actual 802.1q tags.  NO!  I just meant that is was the ABSTRACTION of a VLAN connecting VMs together.
  • There was agreement from the clouderati in the room that cloud networking was f’ed up, but most people were not ready to discuss.
  • Cloud APIs have some basics that are working (semantics around VMs) but still have lots of wholes.  Notably: networking, application, services, and identity)

Session on Google App Engine (GAE)

  • GAE is got a lot going on, especially in the social/mobile space.
  • Do not think a lack of news about GAE means that they are going slow, it’s just the opposite.  It looks like they are totally kicking ass with a very focused strategy.  I suspect that they are just waiting for the market to catch-up.
  • GAE understands what a “platform” really is.  They talk about their platform as the SERVICES that they are offering.  The code is just code.  The services are impressive and include identity, mail, analysis, SQL (business only), map (as in Map-Reduce), prediction (yes, prediction!), storage, etc.  The total list was nearly 20 distinct services.
  • GAE compared them selves to Azure, not Amazon.

Getting cozy with “Adjacent Services”

I’ve had a busy week with Azure Training and Cloud Camp Seattle.  It’s going to take a few days to unwind specific posts about both, but I wanted to hit some shiny new thoughts.

Services helping each other

  • Adjacent Services are dedicated and/or public services (XaaS) that are offered along side generic public cloud offerings.   For a company like Dell (my employer), this could be specific brands of storage or databases (e.g. Oracle).  I believe these are much higher margin XaaS than IaaS.
  • Layer 7 Load Balancers represent a more intelligent link between load direction and the applications. I heard people using this term in multiple contexts.   For example,  In Azure, the apps can set themselves as “offline” and they will stop getting traffic then they can turn themselves online when they are ready for more.
  • Cloud Rollout/Migration is a rolling upgrade scheme where you can send traffic to 2 versions of your application at the same time!  You upgrade by zones and if you have >2 zones then you’ll have two active versions at the same time.  Your data models need to accommodate this.
  • We don’t have enough Agile Cloud programming books (like Dave Thomas’ RoR Intro).  We need a cloud programming book that STARTS WITH INTEGRATION TESTS and shows how to use all the adjacent services.  I may just have to write one (or three).

Thanks to many many at Microsoft for the great Azure training sessions.  I’ll add more names, but for now I have links to Steve Marx (Smarx.com) & Srirm Krishnan (Sriram Krishnan.com) .

VM != Cloud! Comparision draws ire, misses point

Having the requirement benefit of working with both Dave McCrory and Joyent on a daily basis at Dell, I cannot resist weighing in on the blog pong between them.

Dave’s post comparing VM pricing prompted Joyent to blog that VMs are not the only measure of cloud.

While I completely agree that clouds are not all about VMs, I think that Joyent is too limited in their definition of cloud in their reply.  We’re seeing an emergence of services as the differentiator between clouds.

Looking at Amazon, Azure, and Google, the clear way to reduce cloud spend is to migrate applications to consume their services (SQL, Storage, Bus, etc).

If cloud users are primarily concerned about price per hour (which I’m not convinced is the case) then they have real motivation to migrate from purely VM (or SmartMachine(tm) ) based applications to ones that use services.

PaaS, much ado about network services

There’s a surprising about of a hair pulling regarding IaaS vs PaaS.  People in the industry get into shouting matches about this topic as if it mattered more than Lindsay Lohan’s journey through rehab.

The cold hard reality is that while pundits are busy writing XaaS white papers, developers are off just writing software.  We are writing software that fits within cloud environments (weak SLA, small VMs), saves money (hosted data instead of data in VMs), and changes quickly (interpreted languages).  We’re doing using an expanding tool kit of networked components like databases, object stores, shared cache, message queue, etc.

Using network components in an application architecture is about as novel as building houses made of bricks.  So, what makes cloud architectures any better or different?

Nothing!  There is no difference if you buy VMs, install services, and wire together your application in its own little cloud bubble.  If I wanted to bait trolls, I’d call that an IaaS deployment.

However, there’s an emerging economic driver to leverage lower cost and more elastic infrastructure by using services provided by hosts rather than standing them up in a VM.  These services replace dedicated infrastructure with managed network attached services and they have become a key differentiator for all the cloud vendors

  • At Google App Engine, they include Big Tables, Queues, MemCache, etc
  • At Microsoft Azure, they include SQL Azure, Azure Storage, AppFabric, etc
  • At Amazon AWS, they include S3, SimpleDB, RDS (MySQL), Queue & Notify, etc

Using these services allows developers to focus on the business problems we are solving instead of building out infrastructure to run our applications.  We also save money because consuming an elastic managed network service is less expensive (and more consumption based) than standing up dedicated VMs to operate the services.

Ultimately, an application can be written as stateless code (really “externalized state” is more a accurate description) that relies on these services for persistence.  If a host were to dynamically instantiate instances of that code based on incoming requests then my application resource requirements would become strictly consumption based.   I would describe that as true cloud architecture. 

On a bold day, I would even consider an environment that enforced offered that architecture to be a platform.  Some may even dare to describe that as a PaaS; however, I think it’s a mistake to look to the service offering for the definition when it’s driven by the application designers’ decisions to use network services.

While we argue about PaaS vs IaaS, developers are just doing what they need.  Today they may stand-up their own services and tomorrow they incorporate 3rd party managed services.  The choice is not a binary switch, a layer cake, or a holy war.

The choice is about choosing the right most cost effective and scalable resource model.

Microsoft & Dell partner for Azure hosting

Vacation rental in Fort Walton Beach, FLA

Interesting juxtaposition, last week I was vacationing at the Azure condos in Florida and this week Dell is making a strategic announcement to develop a Dell-powered Microsoft Windows Azure Platform Appliance, as well as Azure based services delivered by Dell Services.

Press Release: http://content.dell.com/us/en/corp/d/press-releases/2010-07-12-dell-microsoft-cloud-azure-appliance.aspx

This is a project that I’m involved in, so watch for updates as details emerge.