Nextcast #14 Transcription on OpenStack & Crowbar > “we can’t hand out trophies to everyone”

Last week, I was a guest on the NextCast OpenStack podcast hosted by Niki Acosta (EMC) [Jeff Dickey could not join].   I’ve taken some time to transcribe highlights.

We had a great discussion about OpenStack, Ops and Crowbar.  I appreciate Niki’s insightful questions and the opportunity to share my opinions.  I feel that we covered years of material in just one hour, and I’m grateful for the chance to appear on the podcast.

Video of the full podcast (YouTube) and the audio are available for download.

Plus, a FULL TRANSCRIPT!  Here’s my NextCast #14 Short Transcription.

The objective of this transcription is to help navigate the recording, not replace it.  I did not provide complete context for remarks.

  • 04:30 Birth of Crowbar (to address Ops battle scars)
  • 08:00 The need for a repeatable Ready State baseline to help the community work together
  • 10:30 Should hardware matter in OpenStack? It has to – details and topology matter, not vendor.
  • 11:20 OpenCompute – people are trying to open source hardware design
  • 11:50 When you are dealing with hardware, it matters. You have to get it right.
  • 12:40 Customers are hardware heterogeneous by design (and for ops tooling). Crowbar is neutral territory
  • 14:50 It’s not worth telling people they are wrong, because they are not. There are a lot of right ways to install OpenStack
  • 16:10 Sometimes people make expensive choices because it’s what they are comfortable with, and it’s not helpful for me to tell them they are wrong – they are not.
  • 16:30 You get into a weird corner if you don’t tell anyone no. And an equally weird corner if you tell everyone yes.
  • 18:00 The aspiration of having an interoperable cloud was much harder than the actual work to build it
  • 18:30 The community wants to say yes, “bring your code,” but to operators that’s very frustrating because they want to be able to make substitutions
  • 19:30 Thinking that if something is included then it’s required – that’s not clear
  • 19:50 Interlock Dilemma [see my back reference]
  • 20:10 Orwell Animal Farm reference – “all animals are equal but pigs are more equal”
  • 22:20 Rob defines DefCore, it’s not big and scary
  • 22:35 DefCore is about commercial use, not running the technical project
  • 23:35 OpenStack had to make money for the companies that are paying for the developers who participate… they need to see ROI
  • 24:00 OpenStack is an infrastructure project, stability is the #1 feature
  • 24:40 You have to give a reason why you are saying no and a path to yes
  • 25:00 DefCore is test driven: quantitative results
  • 26:15 Balance between whole project and parts – examples are SwiftStack (wants Object only) and DreamHost (wants Compute only)
  • 27:00 DefCore created core components vs platform levels
  • 27:30 No vendor has said they can implement DefCore without some effort
  • 28:10 We have outlets for vendors who do not want to implement the process
  • 28:30 The Board is not in a position to make technical call about what’s in, we had to build a process for community input
  • 29:10 We had to define something that could say, “this is it and we have to move on”
  • 29:50 What we want is for people to start with the core and then bring in the other projects. We want to know what people are adding so we can make that core in time
  • 30:10 This is not a recommendation; it is a base.
  • 30:35 OpenStack is a bubble – it does not help if we just get together to pat each other on the back; we want to have a thriving ecosystem
  • 31:15 Question: “have vendors been selfish?”
  • 31:35 Rob rephrased it as “does OpenStack have a tragedy of the commons problem?”
  • 32:30 We need to make sure that everyone is contributing back upstream
  • 32:50 Benefit of a Benevolent Dictator is that they can block features unless community needs are met
  • 33:10 We have NOT made it clear where companies should be contributing to the community. We are not doing a good job directing community efforts
  • 33:45 Hidden Influencers becomes OpenStack Product group
  • 34:55 Hidden Influencers were not connecting at the summit in a public way (like developers were)
  • 35:20 Developers could not really make big commitments of their time without the buy in from their managers (product and line)
  • 35:50 Subtle selfishness – focusing on your own features can disrupt the whole release, when things would flow better if people helped others
  • 37:40 Rob was concerned that there was a lot of drift between developers and companies’ product descriptions
  • 38:20 BYLAWS CHANGES – vote! Here’s why we need to change
  • 38:50 Having whole projects designated as core sucks – code in core should be slower and less changing. Innovation at the core will break interoperability
  • 39:40 Hoping that core will help product managers understand where they are using the standard and adding value
  • 41:10 All babies are ugly > with core, that’s good. We are looking for the grown-ups who can do work and deliver value. Babies are things you nurture and help grow because they have potential.
  • 42:00 We undermine our credibility in the community when we talk about projects that are babies as if they were ready.
  • 43:15 DefCore’s job was to help pick projects. If everyone is core then we look like a youth soccer team where everyone is getting a trophy
  • 44:30 Question: “What do you tell users to instill confidence in OpenStack?”
  • 44:50 First thing: focus on operations and automation. Table stakes (for any cloud) is getting your deployments automated. Puppies vs Cattle.
  • 45:25 People who were successful with early OpenStack were using automated deployments against the APIs (see the sketch after this list).
  • 46:00 DevOps is a fundamental part of cloud computing – if you’re hand-built and not automated then you are old school IT.
  • 46:40 Niki references Gartner “Bimodal IT” [excellent reference, go read it!]
  • 47:20 VMWare is a great crutch for OpenStack. We can use VMWare for the puppies.
  • 47:45 OpenStack is not going to run on every server (perhaps that’s heresy), and it does not make sense for every workload
  • 48:15 One size does not fit all – we need to be good at what we’re good at
  • 48:30 OpenStack needs to focus on doing something really well. That means helping people who want to bring automated workloads into the cloud
  • 49:20 Core was about sending a signal about what’s ready and what people can rely on
  • 49:45 Back in 2011, I was saying OpenStack was ready for people who would make the operational investment
  • 50:30 We use Crowbar because it makes it easier to do automated deployments for infrastructure like Hadoop and Ceph where you want access to the physical media
  • 51:00 We should be encouraging people to use OpenStack for its use cases
  • 51:30 Existential question for OpenStack: are we a suite or a product? The community is split here
  • 51:30 In comparing with Amazon, does OpenStack have to implement everything itself or build an ecosystem to compete?
  • 53:00 As soon as you make something THE OpenStack project (like Heat) you are sending a message that the alternates are not welcome
  • 54:30 OpenStack ends up in a trap if we pick a single project and make it the way that we are going to do something. New implementations are going to surface from WITHIN the projects and we need to be ready for that.
  • 55:15 New implementations are coming, and we have to be ready for them. We make ourselves vulnerable to splitting if we do not prepare.
  • 56:00 API vs Implementation? This is something that splits the community. Ultimately we want to be an API spec but we are not ready for that. We have a lot of work to do first using the same code base.
  • 56:50 DefCore has taken a balanced approach using our diversity as a strength
  • 57:20 Bylaws did not allow for enough flexibility for what is core
  • 59:00 We need voters for the quorum!
  • 59:30 Rob recommended Rocky Grober (Huawei) and Shamail Tahir (EMC) for future shows
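
To make the automation point (45:25) concrete, here is a minimal sketch of mine (not from the podcast) of treating servers as cattle by scripting them through the Nova API with python-novaclient. The credentials, names, image, and auth URL are all placeholders.

```python
# Minimal sketch of "cattle, not puppies": boot an instance through the
# Nova API instead of hand-building it.  All names and credentials are
# placeholders for illustration.
from novaclient import client

nova = client.Client(
    "2",                                       # compute API version
    "demo-user",                               # username (placeholder)
    "secret",                                  # password (placeholder)
    "demo-project",                            # project/tenant (placeholder)
    "http://keystone.example.com:5000/v2.0",   # Keystone auth URL (placeholder)
)

# Look up a flavor and image by name; both must already exist in the cloud.
flavor = nova.flavors.find(name="m1.small")
image = nova.images.find(name="ubuntu-cloud")

# Because the build is scripted, the server is reproducible and disposable:
# if it misbehaves, delete it and run the script again.
server = nova.servers.create(name="cattle-0001", image=image, flavor=flavor)
print(server.id, server.status)
```

The point is less the specific client library than the discipline: when every deployment step is a repeatable API call, rebuilding is cheap and hand-tuned “puppies” never creep in.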

OpenStack Austin: What we’d like to see at the Design Summit

Last week, the OpenStack Austin user group discussed what we’d like to see at the upcoming OpenStack Design Summit. We had a strong turnout (48?!).

  1. To get the meeting started, Marc Padovani from HP (this month’s sponsor) provided some lessons learned from the HP OpenStack-Powered Cloud. While Marc noted that HP has not been able to share much of their development work on OpenStack, he was able to show performance metrics relating to a fix that HP contributed back to the OpenStack community. The defect related to the scheduler’s ability to handle load. The pre-fix data showed a climb and then a gap where the scheduler simply stopped responding. Post-fix, the performance curve is flat without any “dead zones.” (Sharing data like this is what I call “open operations.”)
  2. Next, I (Rob Hirschfeld) gave a brief overview of the OpenStack Essex Deploy Day (my summary) that Dell coordinated with world-wide participation. The Austin deploy day location was in the same room as the meetup so several of the OSEDD participants were still around.
  3. The meat of the meetup was a freeform discussion about what the group would like to see discussed at the Design Summit. My objective for the discussion was that the Austin OpenStack community could have a broader voice if we showed consensus for certain topics in advance of the meeting.

At Jim Plamondon’s suggestion, we captured our brainstorming on the OpenStack etherpad. The Etherpad is super cool – it allows simultaneous editing by multiple parties, so the notes below were crowdsourced during the meeting as we discussed topics that we’d like to see highlighted at the conference. The etherpad preserves editors, but I removed the highlights for clarity.

The next step is for me to consolidate the list into a voting page and ask the membership to rank the items (poll online!) below.

Brainstorm results (unedited)

Stability vs. Features

API vs. Code

  • What is the measurable feature set?
  • Is it an API, or an implementation?
  • Is the Foundation a formal-ish standards body?
  • Imagine the late end-game: can Azure/VMWare adopt OpenStack’s APIs and data formats to deliver interop, without running OpenStack’s code? Is this good? Are there conversations on displacing incumbents and spurring new adoption?
  • Logo issues

Documentation Standards

  • Dev docs vs user docs
  • Lag of update/fragmentation (10 blogs, 10 different methods, 2 “work”)
  • Per-release getting-started guide validated and available prior to or at release.

Operations Focus

  • Error messages and codes vs python stack traces
  • Alternatively put, “how can we make error messages more ops-friendly, without making them less developer-friendly?” (see the sketch after this list)
  • Operations of rolling updates and upgrades. Hot migrations?
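
As a concrete illustration of the ops-friendly error question above, here is a small sketch of mine (not from the meetup) of one common pattern: keep the full Python traceback for developers in the debug log, while surfacing a short, stable error code that operators can grep and alert on. The `pick_host` function and the `SCHED-001` code are hypothetical.

```python
# Illustrative pattern: stable error codes for operators, full
# tracebacks for developers.  pick_host() and SCHED-001 are invented.
import logging

log = logging.getLogger("nova.scheduler")

class OpsError(Exception):
    """Failure carrying a stable code that operators can alert on."""
    def __init__(self, code, message):
        super().__init__("%s: %s" % (code, message))
        self.code = code

def pick_host(instance):
    # Stand-in for real placement logic, which can fail in many ways.
    raise RuntimeError("no compute host has enough free RAM")

def schedule(instance):
    try:
        return pick_host(instance)
    except Exception:
        # Developers still get the stack trace, but at debug level...
        log.debug("scheduler failure for %s", instance, exc_info=True)
        # ...while operators get a short, stable, actionable message.
        raise OpsError("SCHED-001", "no valid host found; check hypervisor capacity")

try:
    schedule("instance-1234")
except OpsError as err:
    print(err)  # e.g. "SCHED-001: no valid host found; check hypervisor capacity"
```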

If OpenStack was installable on Windows/Hyper-V as a simple MSI/Service installer – would you try it as a node?

  • Yes.

Is Nova too big?  How does it get fixed?

  • libraries?
  • sections?
  • split it into smaller sub-projects?
  • shorter release cycles?

nova-volume

  • volume split out?
  • volume expansion of backend storage systems
  • Is nova-volume the canonical control plane for storage provisioning?  Regardless of transport? It presently deals in block devices only… is the following blueprint correctly targeted to nova-volume?

https://blueprints.launchpad.net/nova/+spec/filedriver

Orchestration

  • Is the Donabe project dead?

Discussion about invitations to Summit

  • What is a contribution that warrants an invitation?
  • Look at Launchpad’s Karma system, which confers karma for many different “contributory” acts, including bug fixes and doc fixes, in addition to code commits

Summit Discussions

  • Is there a time for an operations summit?
  • How about an operators’ track?
  • Just a note: forums.openstack.org for users/operators to drive/show need and participation.

How can we capture the implicit knowledge (of mailing list and IRC content) in explicit content (documentation, forums, wiki, stackexchange, etc.)?

Hypervisors: room for discussion?

  • Do we want hypervisor feature parity?
  • From the cloud-app developer’s perspective, I want to “write once, run anywhere,” and hypervisor features can preclude that (by having incompatible VM images, for example)
  • (RobH: But “write once, run anywhere” [WORA] didn’t work for Java, right?)
  • (JimP: Yeah, but I was one of Microsoft’s anti-Java evangelists, when we were actively preventing it from working — so I know the dirty tricks vendors can use to hurt WORA in OpenStack, and how to prevent those tricks from working.)

CDMI

Swift API is an evolving de facto open alternative to S3… CDMI is on the SNIA standards track.  Should the Swift API become CDMI compliant?  Should CDMI exist as a shim… a la the S3 stuff?
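
For context on the shim idea: a shim in this sense is a translation layer that accepts one API on the front and speaks Swift’s native API on the back, much as the S3 compatibility middleware does. Below is a purely illustrative WSGI sketch of mine; the `/cdmi/` path mapping is invented for illustration and does not reflect the actual CDMI specification, which also defines its own headers and JSON bodies.

```python
# Purely illustrative shim: rewrite an alternate-API path onto Swift's
# native /v1/<account>/<container>/<object> form before the Swift app
# sees it.  The "/cdmi/" prefix is an invented example, not real CDMI.

class CompatShim(object):
    """WSGI middleware that front-ends a Swift-style WSGI app."""

    def __init__(self, app, account="AUTH_demo"):
        self.app = app          # the downstream (Swift) WSGI app
        self.account = account  # placeholder account name

    def __call__(self, environ, start_response):
        path = environ.get("PATH_INFO", "")
        if path.startswith("/cdmi/"):
            # /cdmi/<container>/<object> -> /v1/<account>/<container>/<object>
            environ["PATH_INFO"] = "/v1/%s/%s" % (
                self.account, path[len("/cdmi/"):])
        return self.app(environ, start_response)

# Tiny demo app standing in for Swift, just to show the path rewrite.
def echo_app(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [environ["PATH_INFO"].encode()]

app = CompatShim(echo_app)
```

A real shim would also have to translate headers, metadata formats, and response bodies, which is exactly where the standards-compliance question gets interesting.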

OpenStack Essex Events (Austin & Boston 3/8, WW Hack Day 3/1, Docs 3/6)

The excitement over the OpenStack Essex release is building!  While my team has been making plans around the upcoming design summit in SF,  there is more immediate action afoot.

Tomorrow (3/1), numerous sites are gathering for a World Wide Essex Hack Day.  If you want to participate or even host a hack venue, get on the list and IRC channel (details).

My team at Dell is organizing a community follow-up OpenStack Essex Install Day next week (3/8) in both Austin and Boston.  Just like the Hack Day, the install fest will focus on Essex release code with both online and local presence.  Unlike the Hack Day, our focus will be on deployments.  For the Dell team, that means working on the Essex deployment for Crowbar.  We’re still working on a schedule and partner list, so stay tuned.  I’m trying to webcast Crowbar & OpenStack training sessions during the install day.

The install day will close with the regularly scheduled 3/8 OpenStack Austin Meetup (6:30pm at Austin TechRanch).  The topic for the meetup will be, … wait for it …, the Essex Release.  Thanks go to HP and Dell for sponsoring!

It’s important to note that Anne Gentle is also coordinating an OpenStack Essex Doc Day on 3/6.

To recap:

  • 3/1 – OpenStack Essex World Wide Hack Day
  • 3/6 – OpenStack Essex Doc Day
  • 3/8 – OpenStack Essex Install Day (Austin & Boston)
  • 3/8 – OpenStack Austin Meetup (6:30pm at Austin TechRanch)

Wow… that should satisfy your Essex cravings.

OpenStack Deployments Abound at Austin Meetup (12/9)

I was very impressed by the quality of discussion at the Deployment topic meeting for the Austin OpenStack Meetup (#OSATX). Of the 45ish people attending, we had representation from at least 6 different OpenStack deployments (Dell, HP, ATT, Rackspace Internal, Rackspace Cloud Builders, Opscode Chef)!  Considering the scope of those deployments (several are aiming at 1000+ nodes), that’s a truly impressive accomplishment for such a young project.

Even with the depth of the discussion (notes below), we did not go into details on how individual OpenStack components are connected together.  The image my team at Dell uses is included below.  I also recommend reviewing Rackspace’s published reference architecture.

Figure 1: Diablo Software Architecture. Source: Dell/OpenStack (CC with attribution)

Notes

Our deployment discussion was a round table so it is difficult to link statements back to individuals, but I was able to track companies (mostly).

  • HP
    • picked Ubuntu & KVM because they were the most vetted. They are also using Chef for deployment.
    • running Diablo 2, moving to Diablo Final & a flat network model. The network controller is a bottleneck. Their biggest scale issue is RabbitMQ.
    • is creating their own Nova Volume plugin for their block storage.
    • At this point, scale limits are due to simultaneous loading rather than total number of nodes.
    • The Nova node image cache can get corrupted without any notification or way to force a refresh – this defect is being addressed in Essex.
    • has set up availability zones as completely independent (500 node) systems. Expecting to converge them in the future.
  • Rackspace
    • is using the latest Ubuntu. Always stays current.
    • using Puppet to setup their cloud.
    • They are expecting to go live on Essex and are keeping their deployment on the Essex trunk. This is causing some extra work but they expect it to pay back by allowing them to get to production on Essex faster.
    • Deploying on XenServer
    • “Devs move fast, Ops not so much.”  Trying to not get behind.
  • Rackspace Cloud Builders (RCB) runs major releases through an automated test suite. The verified releases are being published to https://github.com/cloudbuilders (note: Crowbar is pulling our OpenStack bits from this repo).
  • Dell commented that our customers are using Crowbar primarily for pilots – they are learning how to use OpenStack
    • Said they have >10 customer deployments pending
    • ATT is using OpenSource version of Crowbar
    • Keystone and Dashboard were considered essential additions to Diablo
  • Hypervisors
    • KVM is considered the top one for now
    • Libvirt (which uses KVM) also supports LXC, which people found to be interesting
    • XenServer via XAPI is also popular
    • Not so much activity on ESX & HyperV
    • We talked about why some hypervisors are more popular – it’s about the node agent architecture of OpenStack.
  • Storage
    • NetApp via Nova Volume appears to be a popular block storage choice
  • Keystone / Dashboard
    • Customers want both together
    • Including keystone/dashboard was considered essential in Diablo. It was part of the reason why Diablo Final was delayed.
    • HP is not using dashboard
  • OpenStack API
    • Members of the audience commented that we need to deprecate the EC2 APIs (because it does not help OpenStack long term to maintain EC2 APIs over its own).  [1/5 Note: THIS IS NOT OFFICIAL POLICY, it is a reflection of what was discussed]
    • HP started on the EC2 API but is moving to the OpenStack API

Meetup Housekeeping

  • Next meeting is Tuesday 1/10 and sponsored by SUSE (note: Tuesday is just for this January).  Topic TBD.
  • We’ve got sponsors for the next SIX meetups! Thanks to Dell (my employer), Rackspace, HP, SUSE, Canonical and PuppetLabs for sponsoring.
  • We discussed topics for the next meetings (see the post image). We’re going to throw it to a vote for guidance.
  • The OSATX tag is also being used by Occupy San Antonio.  Enjoy the cross chatter!

OpenStack Seattle Meetup 11/30 Notes

We had an informal OpenStack meetup after the Opscode Summit in Seattle.

This turned out to be a major open cloud gab fest! In addition to the Dell OpenStack leads (Greg and me), we had the Nova Project Technical Lead (PTL, Vish Ishaya, @vish), HP’s Cloud Architect (Alex Howells, @nixgeek), and Opscode OpenStack cookbook master (Matt Ray, @mattray). We were joined by several other Chef Summit attendees with OpenStack interest, including a pair of engineers from Spain.

We’d planned to demo using Knife-OpenStack against the Crowbar Diablo build.  Unfortunately, knife-openstack is out of date (August 15th?!).  We need Keystone support.  Anyone up for that?

Highlights

There’s no way I can recapture everything that was said, but here are some highlights I jotted down on the way home.

  • After the miss with Keystone and the Diablo release, solving the project dependency problem is important. Vish talked at length about the ambiguity challenge of Keystone being required and also incubated. He said we were not formal enough around new projects even though we had dependencies on them. In future releases, new projects (specifically, Quantum) will not be allowed to be dependencies.
  • The focus for Essex is on quality and stability. The plan is for Essex to be a long-term supported (LTS) release tied to the Ubuntu LTS. That’s putting pressure on all the projects to ensure quality, lock features early, and avoid unproven dependencies.
  • There is a lot of activity around storage and companies are creating volume plug-ins for Nova. Vish said he knew of at least four.
  • Networking has a lot of activity. Quantum has a lot of momentum, but may not emerge as a core project in time for Essex. There was general agreement that Quantum is “the killer app” for OpenStack and will take cloud to the next level.  The Quantum Open vSwitch implementation is completely open source and free. Some other plugins may require proprietary hardware and/or software, but there is definitely a (very) viable and completely open source option for Quantum networking.
  • HP has some serious cloud mojo going on. Alex talked about defects they have found and submitted fixes back to core. He also hinted about some interesting storage and networking IP that’s going into their OpenStack deployment. Based on his comments, I don’t expect those to become public so I’m going to limit my observations about them here.
  • We talked about hypervisors for a while. KVM and XenServer (via XAPI) were the primary topics. We did talk about LXC & OpenVZ as popular approaches too. Vish said that some of the XenServer work is using Xen Storage Manager to manage SAN images.
  • Vish is seeing a constant rise in committers. It’s hard to judge because some committers appear to be individuals acting on behalf of teams (10 to 20 people).

Note: cross posted on the OpenStack Blog.

Reminder: 12/8 Meetup @ Austin!

Missed us in Seattle? Join us at the 12/8 OpenStack meetup in Austin co-hosted by Dell and Rackspace.

Based on our last meetup, it appears deployment is a hot topic, so we’ll kick off with that – bring your experiences, opinions, and thoughts! We’ll also open the floor to other OpenStack topics – open technical and business discussions – no commercials please!

We’ll also talk about organizing future OpenStack meetups! If your company is interested in sponsoring a future meetup, find Joseph George at the meetup and he can work with you on details.

Don’t fork it up. OpenStack needs community collaboration

Can’t we just be friends?

We’re standing on the eve of the OpenStack 4th Design Summit (aka Essex) and I’m watching a frenzy of IT Goliaths (Dell, Citrix, Cisco, HP, Rackspace) and some Cloud Davids (Nebula, Stackops) try to untangle revenue streams from an open source cloud project.

I was pleased to read GigaOM’s Derrick Harris’ validation of Dell’s strategy, which featured my team’s contributions (Crowbar, OpenStack & Hadoop).  We are working hard to bring these technologies to our customers in an open and collaborative way.

Dell has substantial IT assets to bring to bear on cloud solutions.  All of them are ultimately tied to products that generate revenue for Dell; however, that does not prevent our being able to collaborate and share.  On the contrary, we benefit from input from our partners, customers and community to determine which features are needed to accelerate adoption.  Our recent decision to accelerate Crowbar modularization is a clear example of that process.

It is essential to understand that this is not just about cloud technologies!  It is about the collaborative way we are promoting them and the processes we are using to deliver them.

With Dell’s cloud moving at hurricane speed, it has been interesting to watch how other companies are setting up their own OpenStack initiatives.  It seems to me that many of these efforts involve forks from OpenStack that cannot/will not be contributed back to the community.  One (but not the only) example is from HP’s Emil Sayegh, who says that “HP developers … ideas will be shared…”  He does not commit to sharing HP’s code in his post.  I hope that is an oversight and not their plan.

In time, forking may be needed.  Right now, we need to focus on building a strong foundation.  Open contributions of code are the engine of that success.