Meh. Compared to cloud, Ops on physical infrastructure stinks.
Unfortunately, the cloud and scale platforms need to run somewhere, so someone’s got to deal with it. In fact, we’ve got to deal with crates of cranky servers and flocks of finicky platforms. It’s enough to keep a good operator down.
There is a light at the end of the tunnel! We can make provisioning OpenStack, Hadoop and other platforms repeatable.
As a community, we’re steadily bringing best practices and proven automation from cloud ops down into the physical space. On the OpenCrowbar project, we’re accelerating this effort using the ready state concept as a hand off point for “physical-cloud equivalency” and exploring the concept of “functional operations” to make DevOps scripts more portable.
TL;DR! We appreciate those in the community who have been patient enough to help define and learn the process we’re using to make selections; however, we also recognize that most people want to jump straight to the results.
While the current thinking of a testing-based definition of Core adds pressure on expanding our test suite, it seems to pass the community’s fairness checks.
Overall, the discussions lead me to believe that we’re on the right track because they jump from process to impacts. It’s not too late! We’re continuing to collect community feedback. So what’s next?
First, get involved: Upcoming Community Core Discussions
Week Before Summit: Beijing Meetup hosted by Alan Clark (details TBD)
These discussions are expected to have online access via Google Hangout. Watch Twitter when the event starts for a link.
Want to discuss this in your meetup? Reach out to me or someone on the Board and we’ll be happy to find a way to connect with your local community!
What’s Next? Implementation!
So far, the Core discussion has been about defining the process that we’ll use to determine what is core. Assuming we move forward, the next step is to implement that process by selecting which tests are “must pass.” That means we have to both figure out how to pick the tests and do the actual work of picking them. I suspect we’ll also find testing gaps that will have developers scrambling in Icehouse.
Here’s the possible (aggressive) timeline for implementation:
November: Approval of approach & timeline at next Board Meeting
January: Publish timeline for rollout (ideally, have a usable definition for Havana)
March: Identify Havana must pass Tests (process to be determined)
April: Integration w/ OpenStack Foundation infrastructure
Obviously, there are a lot of details to work out! I expect that we’ll have an interim process to select must-pass tests before we can have a full community driven methodology.
There is still confusion around the idea that OpenStack Core requires using some of the project code. This requirement helps ensure that people claiming to be OpenStack core have a reason to contribute, not just replicate the APIs.
It’s easy to overlook that we’re trying to define a process for defining core, not core itself. We have spent a lot of time testing how individual projects may be affected by possible outcomes. In the end, we’ll need actual data.
There are some clear anti-goals in the process that we are not ready to discuss but that will clearly become issues quickly. They are:
Using the OpenStack name for projects that pass the API tests but don’t implement any OpenStack code. (e.g.: an OpenStack Compatible mark)
Having specialty testing sets for flavors of OpenStack that differ from core. (e.g.: OpenStack for Hosters, OpenStack Private Cloud, etc.)
We need to be prepared for the possibility that the list of “must pass” tests identifies a smaller core than is currently defined. Some projects may no longer be “core.”
The idea that we’re going to use real data to recommend tests as must-pass is positive; however, the time it takes to collect the data may be frustrating.
People love to lobby for their favorite projects. Gaps in testing may create problems.
We are about to put a lot of pressure on the testing efforts and that will require more investment and leadership from the Foundation.
Some people are not comfortable with self-reporting test compliance. Overall, market pressure was considered enough to punish cheaters.
There is a perceived risk of confusion as we migrate between versions. “OpenStack Core for Havana” seems too specific, but there is concern that vendors may pass in one release and then skip re-certification. Once again, market pressure seems to be an adequate answer.
It’s not clear whether a project with only one must-pass test is a core project. Likely, it would be considered core. Ultimately, people seem to expect that the tests, rather than the project boundary, will define core.
What do you think? I’d like to hear your opinions on this!
Please RSVP so that we know how much food to get! SUSE is this month’s sponsor for food, and my team at Dell continues to pick up the room rental. We have 35 RSVPs as of Monday noon – this will be another popular meeting (last meeting minutes).
With the Summit next week, I think it is very important that we pre-discuss Summit topics and priorities as a community. It will help us be more productive individually and for our collective interests when we engage the larger community next week.
To get the meeting started, Marc Padovani from HP (this month’s sponsor) provided some lessons learned from the HP OpenStack-Powered Cloud. While Marc noted that HP has not been able to share much of their development work on OpenStack, he was able to show performance metrics relating to a fix that HP contributed back to the OpenStack community. The defect related to the scheduler’s ability to handle load. The pre-fix data showed a climb and then a gap where the scheduler simply stopped responding. Post-fix, the performance curve is flat without any “dead zones.” (Sharing data like this is what I call “open operations.”)
The meat of the meetup was a freeform discussion about what the group would like to see discussed at the Design Summit. My objective for the discussion was that the Austin OpenStack community could have a broader voice if we showed consensus on certain topics in advance of the meeting.
At Jim Plamondon‘s suggestion, we captured our brainstorming on the OpenStack etherpad. The etherpad is super cool – it allows simultaneous editing by multiple parties, so the notes below were crowdsourced during the meeting as we discussed topics that we’d like to see highlighted at the conference. The etherpad preserves editors, but I removed the highlights for clarity.
Imagine the late end-game: can Azure/VMWare adopt OpenStack’s APIs and data formats to deliver interop, without running OpenStack’s code? Is this good? Are there conversations on displacing incumbents and spurring new adoption?
Dev docs vs user docs
Lag of update/fragmentation (10 blogs, 10 different methods, 2 “work”)
A per-release getting-started guide, validated and available prior to or at release.
Error messages and codes vs python stack traces
Alternatively put, “how can we make error messages more ops-friendly, without making them less developer-friendly?”
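One way to frame that trade-off is a minimal sketch like the following (this is an illustration, not an actual OpenStack convention; the `OpsError` class and the `SCHED-503` code are hypothetical): give each failure a stable, documentable code that operators can grep and look up, while developers can still opt in to the full stack trace.

```python
import traceback

class OpsError(Exception):
    """A failure carrying a stable code for operators plus detail for developers."""
    def __init__(self, code, message):
        super().__init__(message)
        self.code = code

def render_error(err, for_developers=False):
    # Operators get a one-line, grep-able message with a stable code;
    # developers can opt in to the full Python traceback.
    line = f"[{err.code}] {err}"
    if for_developers:
        line += "\n" + "".join(
            traceback.format_exception(type(err), err, err.__traceback__)
        )
    return line

# Example: the scheduler failure described earlier, rendered both ways.
err = OpsError("SCHED-503", "scheduler stopped responding under load")
print(render_error(err))  # ops view: [SCHED-503] scheduler stopped responding under load
```

The point of the sketch is that the two audiences are served by the same error object, so making messages ops-friendly doesn’t have to cost developers anything.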
Upgrade and operations: rolling updates and upgrades. Hot migrations?
If OpenStack was installable on Windows/Hyper-V as a simple MSI/Service installer – would you try it as a node?
Is Nova too big? How does it get fixed?
break it into smaller sub-projects
shorter release cycles?
volume split out?
volume expansion of backend storage systems
Is nova-volume the canonical control plane for storage provisioning? Regardless of transport? It presently deals in block devices only… is the following blueprint correctly targeted to nova-volume?
What is a contribution that warrants an invitation?
Look at Launchpad’s Karma system, which confers karma for many different “contributory” acts, including bug fixes and doc fixes, in addition to code commits
Is there a time for an operations summit?
How about an operators’ track?
Just a note: forums.openstack.org for users/operators to drive/show need and participation.
How can we capture the implicit knowledge (of mailing list and IRC content) in explicit content (documentation, forums, wiki, stackexchange, etc.)?
Hypervisors: room for discussion?
Do we want hypervisor feature parity?
From the cloud-app developer’s perspective, I want to “write once, run anywhere,” and hypervisor feature differences (incompatible VM images, for example) can preclude that.
(RobH: But “write once, run anywhere” [WORA] didn’t work for Java, right?)
(JimP: Yeah, but I was one of Microsoft’s anti-Java evangelists, when we were actively preventing it from working, so I know the dirty tricks vendors can use to hurt WORA in OpenStack, and how to prevent those tricks from working.)
The Swift API is an evolving de facto open alternative to S3, while CDMI is on the SNIA standards track. Should the Swift API become CDMI compliant? Should CDMI exist as a shim, a la the S3 stuff?
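To make the shim question concrete, here is a minimal sketch (host names, account, and credentials are hypothetical) of how the same object upload is addressed under the Swift API versus an S3-style API. The URL layout and auth-header differences shown here are exactly what a shim has to translate; CDMI would add a third scheme with its own addressing and metadata conventions.

```python
# Sketch: the same object PUT, addressed two ways. All hosts, accounts,
# and credentials below are made up for illustration.

def swift_put_request(host, account, container, obj, token):
    """Swift addresses objects as /v1/<account>/<container>/<object>
    and authenticates with an X-Auth-Token header."""
    url = f"https://{host}/v1/{account}/{container}/{obj}"
    headers = {"X-Auth-Token": token}
    return "PUT", url, headers

def s3_put_request(host, bucket, key, signature):
    """S3-style APIs address objects as /<bucket>/<key> (path style)
    and authenticate with a signed Authorization header."""
    url = f"https://{host}/{bucket}/{key}"
    headers = {"Authorization": f"AWS {signature}"}
    return "PUT", url, headers
```

Seen this way, “Swift becomes CDMI compliant” and “CDMI as a shim, like the S3 middleware” are both translation problems; the debate is about where the translation lives.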