To avoid echo chamber, OpenStack must embrace competitive cloud ecosystem

Japanese Bullet Train View

I was in Japan before the Tokyo summit on a bullet train to Kyoto watching the mix of heavy industry and bucolic mountains pass by. That scene reflects an OpenStack duality: we want to be both a dominant platform delivering core cloud services and an open source values driven collective.

First, I fundamentally believe in the success of OpenStack as the open virtual infrastructure management platform.

I believe that we have solved the virtual compute/storage/network problem sufficiently to become the de facto open IaaS platform. While not perfect, the technologies are sufficient assuming we continue to improve ease of use and operational hardening. Pursuing that base capability is my primary motivation for DefCore work.

I don’t believe that the OpenStack community is, or should try to become, the authority on “all things cloud.”

In the presence of Amazon, VMware, Microsoft and Google, we cannot make that claim with any degree of self-respect. Even newcomers like DigitalOcean have an undeniable footprint and influence. Those vendor platforms drive cloud ecosystems and technologies which foster fast innovation because there is no friction to joining their ecosystems and they are large and stable enough to represent a target market. We’ve seen clear signs from Rackspace, HP and others that platform diversity improves cloud strength.

I continue to think we (OpenStack) spend too much time evaluating what is “in” or “out” of the project and too little time talking about what’s “on,” “under” and “with” the project like Kubernetes, Mesos, Docker, SDN, Hadoop and Ceph. That type of thinking creates distance between OpenStack efforts and the majority of the market.

What motivates the drive to an all-open captive community? It’s the reasonable concern that critical parts of the infrastructure will become pay-to-play. For example, what if a non-OpenStack alternative to Heat Orchestration gained popularity with OpenStack implementers, perhaps something that also ran on Amazon? That would create external pressure that would drive internal priorities. These “non-OpenStack” products would then have influence without having to contribute back to upstream.

Can we afford to have external entities driving internal priorities? Hell yes, that’s what customer adoption looks like.

OpenStack does not own the market sufficiently to create a cloud echo chamber. The next wave of cloud innovation (my money is on container platforms) will follow the path of least resistance and widest adoption. We need to embrace that these innovations will not all be inside our community so that we can welcome them as part of our ecosystem. The community needs to find peace with that.

Austin OpenStack Meetup: Keystone & Knife (2/20 notes via Greg Althaus)

I could not make it to the recent Austin OpenStack Meetup, but Greg Althaus generously let me post his notes from the event.

Background

Matt Ray talks about Chef

Matt Ray from Opscode presented some of the work with Chef and OpenStack. He talked about the three main Chef repos floating around: Anso’s original cookbook set, which is the basis for the Crowbar cookbooks (his second set), and his final set, the emerging set of cookbooks in OpenStack proper. The third one is interesting and is what he plans to continue working on to turn into his public OpenStack cookbooks. These are an amalgamation of smokestack, RCB, Anso improvements, and his (Crowbar’s).

He then demoed his knife plugin (slideshare) to build OpenStack virtual servers using the OpenStack API. This is nice and works against TryStack.org (previously “Free Cloud”) and RCB’s demo cloud. All of that is on his github repo with instructions on how to build and use it. Matt and I talked about trying to get that into our Crowbar distro.

There were some questions about flow and choice of OpenStack API versus Amazon EC2 API because there was already an EC2 knife set of plugins.

Ziad Sawalha talks about Keystone

Ziad Sawalha is the PTL (Project Technical Lead) for Keystone. He works for Rackspace out of San Antonio. He drove up for the meeting.

He split his talk into two pieces, Incubation Process and Keystone Overview. He asked who was interested in what and focused his talk more towards overview than incubation.

Some key take-aways:

  • Keystone comes from Rackspace’s strong, flexible, and scalable API. It started as a known quantity from his perspective.
  • Community trusted nothing his team produced from an API perspective
  • Community is python or nothing
    • His team was ignored until they had a python prototype implementing the API
    • At this point, comments on API came in.
  • Churn in API caused problems with implementation and expectations around the close of Diablo.
    • Because comments were late, changes occurred.
    • The official implementation lagged and was slow to arrive.
  • API has been stable since Diablo final, but code is changing. That is good and shows strength of API.
  • Side note from Greg: Keystone represents to me the power of API over Code. You can have innovation around the implementation as long as all the implementations share a fair groundwork to work from, which is an API specification. The replacement of Keystone with the Keystone Light code base is an example of this. The only reason this is possible is that the API was sound and documented. (Rob’s post on this)
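
To make Greg's "API over Code" point concrete, here is a minimal sketch of two interchangeable implementations behind one specification. Everything in it is an assumption for illustration: `TokenAPI`, `TableTokens`, `EncodedTokens` and `conforms` are hypothetical names, not Keystone or Keystone Light code, and the toy tokens are not secure.

```python
from abc import ABC, abstractmethod
from typing import Optional


class TokenAPI(ABC):
    """The 'specification': what every implementation must do (hypothetical)."""

    @abstractmethod
    def issue(self, user: str) -> str:
        """Return a token for the user."""

    @abstractmethod
    def owner(self, token: str) -> Optional[str]:
        """Return the user behind a token, or None if it is unknown."""


class TableTokens(TokenAPI):
    """First implementation: remembers issued tokens in a lookup table."""

    def __init__(self):
        self._table = {}

    def issue(self, user):
        token = f"t{len(self._table)}"
        self._table[token] = user
        return token

    def owner(self, token):
        return self._table.get(token)


class EncodedTokens(TokenAPI):
    """Replacement implementation: stateless, the user is encoded in the token.

    Toy encoding for illustration only; real tokens must not be guessable.
    """

    PREFIX = "user:"

    def issue(self, user):
        return self.PREFIX + user

    def owner(self, token):
        if token.startswith(self.PREFIX):
            return token[len(self.PREFIX):]
        return None


def conforms(impl: TokenAPI) -> bool:
    """Contract check written against the API only, never the internals."""
    token = impl.issue("alice")
    return impl.owner(token) == "alice" and impl.owner("no-such-token") is None


for impl in (TableTokens(), EncodedTokens()):
    print(type(impl).__name__, conforms(impl))
```

The contract check never looks inside either class, which is why one code base can be swapped for another without callers noticing.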

Ziad spent the rest of his time talking about the workflow of Keystone and covering the API points:

  • Client to Keystone, Keystone to Client for initial auth token
  • Client to Middleware API for the services to have a front.
  • Middleware to Keystone to verify and establish identity.
  • Middleware to Service to pass identity

Not many details other than flow and flexibility. He stressed that the API design separated protocol from actions and data at all the layers. This allows for future variations and innovations while maintaining the APIs.
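
The four hops above can be sketched in a few lines. This is only a rough illustration of the flow as described in the notes, not Keystone's actual API: `IdentityService`, `AuthMiddleware` and `list_servers` are hypothetical stand-ins, and the in-memory token table is an assumption.

```python
import secrets


class IdentityService:
    """Stand-in for Keystone: issues and validates tokens (hypothetical)."""

    def __init__(self, users):
        self._users = users    # username -> password
        self._tokens = {}      # token -> username

    def authenticate(self, username, password):
        # Hop 1: client to identity service, token back to the client.
        if self._users.get(username) != password:
            raise PermissionError("invalid credentials")
        token = secrets.token_hex(16)
        self._tokens[token] = username
        return token

    def validate(self, token):
        # Hop 3: the middleware asks who owns this token (None if nobody).
        return self._tokens.get(token)


def list_servers(identity):
    """Stand-in for a backing service; it only ever sees a verified identity."""
    return f"servers visible to {identity}"


class AuthMiddleware:
    """Fronts a service: validates the token, then passes identity along."""

    def __init__(self, identity_service, service):
        self._identity_service = identity_service
        self._service = service

    def handle(self, token):
        # Hop 2: the client calls the middleware, not the service directly.
        identity = self._identity_service.validate(token)
        if identity is None:
            return "401 Unauthorized"
        # Hop 4: the middleware forwards the verified identity to the service.
        return self._service(identity)


keystone = IdentityService({"alice": "s3cret"})
front = AuthMiddleware(keystone, list_servers)
token = keystone.authenticate("alice", "s3cret")
print(front.handle(token))
print(front.handle("bogus-token"))
```

Note that the service function never touches credentials or tokens; that separation is what lets the identity backend change without the services changing.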

Ziad talked about the state of Essex.

  • Planned
    • RBAC (aka Role Based Access Control)
    • Stability
    • Many backends
  • Actual
    • Code replacement Keystone Light
    • Stability
    • LDAP backend
    • SQL backend

Folsom work:

  • RBAC
  • Stability
  • AD backend
  • Another backend
  • Federation was planned but will most likely be pushed to G
    • Federation is the ability for multiple independent Keystones to operate (bursting use case)
    • Dependent upon two other federation components (networking and billing/metering)

Cote & Rob interview: Crowbar+OpenStack Summit/Conference Reflections (40 mins)

I’m working on a larger post about the OpenStack Summit around API Implementation vs. Specification. You can have a preview of that AND A LOT OF OTHER STUFF (OpenStack, Crowbar, lunch) in this 40 minute interview w/ Michael Cote.

Setting: Dell World
Interview w/ @Cote at the Hilton Hotel Lobby on 6th street in Austin.

I know that Cote’s post does not have time markers for easy navigation, so I added them here to help you jump around in the interview (link for audio).

  • 0:00 Introductions
  • 1:00 OpenStack
    • 1:00 Essex Conference – what is it, naming conventions
    • 2:45 Diablo is adding projects from incubation (Keystone, Dashboard, Quantum, …)
    • 5:30 OpenStack vs. Amazon – “OpenStack has ambitions.” We see it as a “platform for innovation.”
    • 6:30 OpenStack is a competitor for Amazon. It implements the EC2 APIs.
    • 7:30 How are people managing the evolving nature?
    • 8:20 We’re going to see OpenStack in production for the next release based on what we see in our deal flow.
    • 9:00 Every user that comes on adds momentum
    • 9:30 Rackspace setting up the OpenStack foundation is a reflection of the speed of adoption
    • 11:15 Our message is “we’re doing it, we’re in the field.” We are very hands on
    • 11:15 We chose early on to focus on helping deployment to help drive adoption
    • 12:00 “Our first test for partners is: Are you contributing back to the community?”
    • 12:44 The community told us “if you are participating then you are going to open source.” Our commits for OpenStack are live and in the open on our github.
    • 13:40 Why Github? We’re happy with it.
    • 14:20 OpenStack is using Gerrit because they have a gated trunk. They are migrated to Github
    • 15:20 APIs have been a big topic for OpenStack
    • 16:00 Do you track who is forking and following? Yes. We also have a listserv. We are trying to do a better job managing the Crowbar community. We know we need to do a better job.
    • 17:30 OpenStack is defined by its Implementation. That’s “an effective way to move the project forward quickly;” however, we’re getting to a point where people want to use alternate implementations.
    • 19:20 Implementation vs Specification is like the SOAP vs REST debate
    • 20:05 This is something the community needs to wrestle with
    • 21:45 Specification would allow the efforts to scale. The more people consume the API, the more people care about how it operates
    • 22:30 “Bugs can become the API”
    • 23:10 Asia and Europe are very active. We are seeing a ton of activity overseas.
  • 23:30 Crowbar
    • 24:00 Crowbar arose out of our need to deploy cloud software regardless of customer infrastructure
    • 24:45 We would show up and the customer needed all this cloud infrastructure. We created Crowbar because we always needed this
    • 26:00 We extended Chef because we had to do the initial bring-up including BIOS and RAID
    • 26:45 We added a state machine and an orchestration layer
    • 27:45 Updating the system is a huge component. Every month you may be upgrading the infrastructure!
    • 28:30 In our lab, we build whole clouds multiple times a day
    • 29:45 Crowbar is the “cloud unboxer”
    • 30:00 We modularized Crowbar with barclamps. Hadoop and OpenStack are a series of barclamps. Over 5 for each
    • 31:00 Barclamps are applied as layers. We are using that as a term to define DevOps
    • 31:15 We are using Crowbar to help message that we understand DevOps
    • 31:45 Soup vs Sandwich analogy – Images are like soup while DevOps is like a sandwich.
    • 32:45 If you don’t want something in a 1000 server deployment, DevOps lets you make a small change. Gives you flexibility.
    • 33:45 We added Cloud Foundry
    • 34:00 We’ve made it so easy with barclamps that partners are coming to us with ideas for barclamps. It’s like “changing the meat for the sandwich.”
    • 34:30 Dreamhost Ceph team created a barclamp and was actually running a majority of the Crowbar demos at the OpenStack conference
  • 35:25 What’s the future for Crowbar?
    • 35:30 More aspects of the infrastructure as open source
    • 35:45 More Hardware
    • 36:00 Multiple operating systems at the same time (XenServer, ESX, etc)
    • 36:30 Larger scale
    • 36:50 More types of infrastructure: storage & network
    • 37:40 Scalr shout out
    • 38:00 We know we need to collaborate more with our community
    • 38:30 The first step is to download it and try. Read my blog and sign up for the listserv
    • 39:00 CROWBAR IS NOT DELL SPECIFIC – we are working with people who want to create support for other vendors’ hardware. This benefits Dell.
    • 39:40 We don’t pretend that our customers are single vendor


PaaS Simplified: an application architecture that responds to load


In addition to attending the great sessions at the OpenStack Design Conference, our Dell team realized that we’ve been making Platform as a Service (PaaS) much more complex than it needs to be. Stripping away the detritus is important because it looks like “What is a PaaS” is changing on a daily basis, so boiling it down to the most fundamental is essential.

At its core, a PaaS is an application that changes its architecture based on the load. That’s it; no further definition is required.
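
As a rough sketch of what that definition could mean in practice, the toy planner below picks a different architecture as load rises. The `Architecture` fields, the per-node capacity and the cache threshold are all made-up assumptions for illustration, not numbers from any real PaaS.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Architecture:
    """Hypothetical description of how the application is laid out."""
    web_nodes: int
    load_balancer: bool
    cache_tier: bool


def plan_architecture(requests_per_second: float,
                      per_node_capacity: float = 100.0) -> Architecture:
    """Choose a layout for the given load (thresholds are assumptions)."""
    web_nodes = max(1, math.ceil(requests_per_second / per_node_capacity))
    return Architecture(
        web_nodes=web_nodes,
        load_balancer=web_nodes > 1,               # only needed once we scale out
        cache_tier=requests_per_second >= 1000,    # assumed threshold
    )


for load in (20, 450, 2500):
    print(load, plan_architecture(load))
```

The point is that the shape of the application (one node, many nodes behind a balancer, an added cache tier) is an output of the load, not something the developer fixes up front.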

I’ve been playing with this definition since April and am finding that it’s a much more productive definition of PaaS than any that I’ve used so far.  The reason is that it’s

  1. application focused,
  2. not language or services bound and
  3. captures the business use cases

Of course, I’m going to have to provide more backup in future posts. I want to invite discussion about this perspective on PaaS. I’m especially interested in seeing how recent offerings from VMware (OpenPaaS/CloudFoundry) or Amazon (Elastic Beanstalk) measure against this concept.

Bad Premise: Cloud Outages are *not* driving IT back to premises


I wrote this in response to Lauren Carlson’s (Software Advice) blog post. Lauren, I’d be more likely to agree with the statement that “SLAs are dead.” Here’s why…

<soapbox>

Recent industry buzz about cloud service level agreements (SLAs) and reliability misses the core point about cloud. Cloud is about agility, business models, consumerization of software and merciless pursuit of efficiency.

The fact that Amazon EC2 built its base without an “enterprise” SLA is exhibit #1 that the IT world changed and it’s not going back.

Here are my reasons why IT pandoras can’t get cloud back into the box.

#1. Cloud has vastly superior network connectivity

The concept of your users accessing your applications from inside your firewall is so 2005. Today’s reality is that a significant amount of network access is externally routed, which means that applications need to live where they have excellent bandwidth to their users and to other applications.

#2. Cloud has elastic consumption of resources

Cloud is not less expensive infrastructure; it is mainly more flexible. If you’re worried about an outage, then cloud is exactly the investment for you because you can position a backup site at another location without having to pay for online resources. It’s much harder to take down a site that invests the time to design a system that dynamically reallocates load between sites.

#3. Cloud drives more robust architecture

The fact that cloud delivery is more opaque and modular without a five 9s SLA has driven a cloud application architecture revolution (see CAP).  We have shifted the app paradigm from robust scale up hardware to robust scale out software.  Also significant, DevOps innovations have made deployments repeatable and adaptable.

The only “logical” argument for pulling applications back from the cloud is to assert control over more of the delivery chain for your application. It’s the same reason that we think that driving is safer than flying – we’re the ones sitting behind the wheel when we drive. News flash – driving is NOT safer than flying.

Cloud applications are not about hardware infrastructure, they are about SOFTWARE.  Perhaps one of the greatest disservices foisted on the market was saying cloud is synonymous with “Infrastructure as a Service” and “Virtualization.”  Cloud applications are powerful because we created ways that circumvent the limitations of IaaS and VMs!

</soapbox>

Not all APIs are equal: the power of API + implementation (OpenStack vs LibCloud vs DeltaCloud)


I’ve been getting a lot of questions about Apache LibCloud and RedHat’s DeltaCloud vs. OpenStack.  While all of these projects offer APIs, only OpenStack is based on an implementation.

Having an implementation means that the API is reflected by code that delivers the functionality of the API. This means that the implementation-based API more closely reflects the actual workings of the system, while the “pure” API must abstract the workings of multiple systems. The API-only approach ends up having to become a least common denominator instead of a vision of the pure use cases.

LibCloud and DeltaCloud are important and useful.  They provide abstractions that help developers write applications without being tied to a specific cloud vendor.  While lack of lock-in is a concrete benefit, it comes at a price.  The price is that the API shim cannot expose features that differentiate the platforms.  This may represent a significant loss of functionality or performance.
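
The least common denominator effect is easy to see with a toy example. The provider names and feature sets below are invented for illustration and do not describe what LibCloud, DeltaCloud or any real cloud actually exposes.

```python
# Hypothetical feature sets; not taken from any real provider or library.
PROVIDER_FEATURES = {
    "cloud_a": {"create_server", "delete_server", "snapshot", "live_migrate"},
    "cloud_b": {"create_server", "delete_server", "snapshot"},
    "cloud_c": {"create_server", "delete_server", "spot_pricing"},
}


def portable_features(providers):
    """Features a cross-cloud shim can offer: only what every provider has."""
    if not providers:
        return set()
    return set.intersection(*(set(feats) for feats in providers.values()))


def hidden_features(providers):
    """Per provider, the differentiating features the shim cannot expose."""
    common = portable_features(providers)
    return {name: set(feats) - common for name, feats in providers.items()}


print(sorted(portable_features(PROVIDER_FEATURES)))
for name, hidden in hidden_features(PROVIDER_FEATURES).items():
    print(name, sorted(hidden))
```

Every provider added to the shim can only shrink the portable set, which is the price of avoiding lock-in.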

When developers implement directly against an implemented API, they can take advantage of the full feature set of their target cloud.  They can also test and verify more directly.  These are significant benefits that result in richer, more robust and faster to market products.

Both approaches have their place and are needed in the market.  If I needed to write against multiple clouds for portability then Libcloud is a slam dunk.  If I needed rich features and an ecosystem then OpenStack or Amazon are better choices.

The unexpected openness of OpenStack: why it’s important to learn from others’ operations experience.

During the OpenStack Design Conference, Forrester’s James Staten (@Staten7) raved about OpenStack’s transparency compared to AWS. Within the enclave of OpenStack supporters (Dell alone sent >14 people to the summit), his post drew considerable attention but did little to really further the value proposition.

“Open deployments” are a much more significant value to implementors than transparency from open source code.

For any technology solution, there are significant challenges that will only be understood when the system is under stress. In some cases, these challenges are code defects; however, many will be related to configuration and deployment choices that are site specific. It is correcting these issues that results in design patterns and practices that create a robust infrastructure; consequently, the process of hardening a solution is critical to its ultimate stability and success.

When a solution, like AWS, is deployed and managed by a single entity, it is extremely rare for operational lessons learned and best practices to make it to the larger community.  Amazon’s recent post mortem is a welcome exception.   This is not a bad thing (Roman Stanek’s contrasting point), it is just the reality of a proprietary cloud.  AWS operates as a black box and I don’t believe that Amazon’s operational experience would be relevant to others unless they were also operationally transparent.

While it makes business sense to remain operationally opaque, service providers lose the benefit of external lessons learned when there is no community working in parallel with them.

OpenStack’s community has an opportunity to iterate on CloudOps patterns and practices at a dramatically faster rate than any single provider.  This creates distinct value for OpenStack adopters because they can shorten or eliminate their own challenges because other adopters will have the same pains and benefit from the same fixes.

It is critical to understand that the benefit is conferred to both the party sharing the problem (they get advice and support) and the party lending assistance (they avoid the problem). This is distinctly different from proprietary clouds where sharing is likely to cause embarrassment and unlikely to create helpful outcomes.

I am not advocating that all OpenStack deployments be the same or follow prescriptive patterns.

I believe that each installation will be unique in some way; however, there will be enough commonalities and shared code to make sharing worthwhile. This is especially true for adopters who start with tools like Crowbar that leverage community-based Chef recipes and automation scripts. Tools that encourage automation and shared scripts help accelerate the establishment of robust deployment patterns and practices.

Ultimately, the ability to collaborate on cloud operation practice does more to strengthen OpenStack than developers, code reviews or corporate endorsements.