SRE role with DevOps for Enterprise [@HPE podcast]


My SRE series continues… At RackN, we see a coming infrastructure explosion in both complexity and scale. Unless our industry radically rethinks operational processes, current backlogs will escalate and stability, security and sharing will suffer.

Yes, DevOps and SRE are complementary

In this short 16-minute podcast, HPE’s Stephen Spector and I discuss how DevOps and SRE thinking overlap and where they differ. We also discuss how enterprises should evaluate Site Reliability Engineering as a function and where it fits in their organization.

unBIOSed? Is Redfish an IPMI retread or can vendors find unification?

Server management interfaces stink. They are inconsistent both between vendors and within their own product suites. Ideally, vendors would agree on a single API; however, it’s not clear if the diversity is a product of competition or actual platform variation. Likely, it’s both.

What is Redfish? It’s a REST API for server configuration that aims to replace both IPMI and vendor-specific server interfaces (like WSMAN). Here’s the official text from RedfishSpecification.org.

Redfish is a modern intelligent [server] manageability interface and lightweight data model specification that is scalable, discoverable and extensible.  Redfish is suitable for a multitude of end-users, from the datacenter operator to an enterprise management console.
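
To make that concrete, here’s a minimal sketch of what driving a Redfish service looks like. The resource paths follow the DMTF Redfish specification, but the BMC address and credentials are hypothetical (and verify=False reflects the self-signed certificates typical of BMCs).

    # Minimal Redfish interaction sketch (hypothetical BMC address and
    # credentials; resource paths follow the DMTF Redfish specification).
    import requests

    BMC = "https://bmc.example.com"   # hypothetical BMC address
    AUTH = ("admin", "password")      # hypothetical credentials

    # Walk the Systems collection from the service root.
    systems = requests.get(BMC + "/redfish/v1/Systems",
                           auth=AUTH, verify=False).json()

    for member in systems["Members"]:
        # Each member links to a ComputerSystem resource.
        system = requests.get(BMC + member["@odata.id"],
                              auth=AUTH, verify=False).json()
        print(system["Name"], system["PowerState"])

    # Power actions are POSTs to the resource's Actions endpoint, e.g.:
    # requests.post(BMC + "/redfish/v1/Systems/1/Actions/ComputerSystem.Reset",
    #               json={"ResetType": "On"}, auth=AUTH, verify=False)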

I think that it’s great to see vendors trying to get on the same page and I’m optimistic that we could get something better than IPMI (that’s a very low bar).  However, I don’t expect that vendors can converge to a single API; it’s just not practical due to release times and pressures to expose special features.  I think the divergence in APIs is due both to competitive pressures and to real variance between platforms.

Even if we manage a grand server-management unification, the problem of interface heterogeneity has a long legacy tail.

In the best case, we’re going from N versions to N+1 (and likely N*2) because legacy gear sticks around for a long time. Adding Redfish means API sprawl will get worse before it settles back to about where it is now.

Putting pessimism aside, the sprawl problem is severe enough that it’s worth supporting Redfish in the hope that it makes things better.

That’s easy to say, but expensive to do. If I were still making hardware (I left Dell in Oct 2014), I’d consider it an expensive investment with an uncertain return. Even so, several major hardware players are stepping forward to help standardize. I think Redfish could have good ROI for smaller vendors looking to displace a major player, since they can ride on the standard.

Redfish is GREAT NEWS for me since RackN/Crowbar provides hardware abstraction and heterogeneous interface support.  More API variation makes my work more valuable.

One final note: if Redfish improves hardware security in a real way, it could be a game changer; however, embedded firmware web servers can be tricky to secure and patch compared to larger, application-focused software stacks. This is one area where I’m hoping to see a lot of vendor collaboration! [note: this should be its own subject – the security issue is more than API, it’s about system-wide configuration. stay tuned!]

Ironic + Crowbar: United in Vision, Complementary in Approach

This post is co-authored by Devananda van der Veen, OpenStack Ironic PTL, and Rob Hirschfeld, OpenCrowbar Founder. We discuss how Ironic and Crowbar work together today and into the future.

Normalizing the APIs for hardware configuration is a noble, long-term goal. While the end result, a configured server, is easy to describe, the differences between vendors’ hardware configuration tools are substantial. These differences make it challenging to create repeatable operations automation (DevOps) on heterogeneous infrastructure.

[Illustration: potential changes in provisioning control flow over time.]

The OpenStack Ironic project is a multi-vendor community solution to this problem at the server level.  By providing a common API for server provisioning, Ironic encourages vendors to write drivers for their individual tooling such as iDRAC for Dell or iLO for HP.
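
As a rough sketch of what a common provisioning API buys you, enrolling a node through Ironic’s REST interface is the same call regardless of vendor; only the driver name and its credentials change. The endpoint, token, and BMC credentials below are hypothetical, and driver names have varied across releases.

    # Sketch: enrolling a node via Ironic's REST API (hypothetical endpoint,
    # token, and BMC credentials; driver names vary by release).
    import json
    import requests

    IRONIC = "http://ironic.example.com:6385"     # hypothetical API endpoint
    HEADERS = {"X-Auth-Token": "KEYSTONE_TOKEN",  # hypothetical token
               "Content-Type": "application/json"}

    node = {
        "driver": "pxe_ipmitool",   # or a vendor driver, e.g. pxe_drac / pxe_ilo
        "driver_info": {
            "ipmi_address": "10.0.0.5",           # hypothetical BMC address
            "ipmi_username": "admin",
            "ipmi_password": "password",
        },
    }

    # The create-node call is identical for every vendor; the driver field
    # selects the vendor-specific tooling underneath.
    resp = requests.post(IRONIC + "/v1/nodes",
                         headers=HEADERS, data=json.dumps(node))
    print(resp.json()["uuid"])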

Ironic abstracts configuration and expects to be driven by an orchestration system that decides how to configure each server. That type of orchestration is the heart of Crowbar’s physical ops magic. [side note: 5 ways that physical ops is different from cloud]

The OpenCrowbar project created extensible orchestration to solve this problem at the system level.  By decomposing system configuration into isolated functional actions, Crowbar can coordinate disparate configuration actions for servers, switches and between systems.
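
To illustrate the idea generically (a toy model, not OpenCrowbar’s actual code), decomposed configuration can be treated as a dependency graph of isolated actions that an orchestrator executes in order:

    # Toy model of decomposed orchestration: isolated configuration actions
    # with dependencies, executed in topological order. The action names are
    # illustrative, not OpenCrowbar's real workflow steps.
    from graphlib import TopologicalSorter

    actions = {
        "discover-server": set(),
        "configure-switch-port": {"discover-server"},
        "configure-raid": {"discover-server"},
        "install-os": {"configure-raid", "configure-switch-port"},
    }

    def run(action):
        print("running", action)  # stand-in for the real configuration step

    # Predecessors run before dependents, so cross-system ordering holds.
    for action in TopologicalSorter(actions).static_order():
        run(action)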

Today, the Provisioner component of Crowbar performs similar functions as Ironic for operating system installation and image lay down.  Since configuration activity is tightly coupled with other Crowbar configuration, discovery and networking setup, it is difficult to isolate in the current code base.  As Ironic progresses, it should be possible to shift these activities from the Provisioner to Ironic and take advantage of the community-based configuration drivers.

The immediate synergy between Crowbar and Ironic comes from accepting two modes of operation for OpenStack: bootstrapping infrastructure and multi-tenant server allocation.

Crowbar was designed as an operational platform that seeds an OpenStack ready environment.  Once that environment is configured, OpenStack can take over ownership of the resources and allow Ironic to manage and deliver “hypervisor-free” servers for each tenant.  In that way, we can accelerate the adoption of OpenStack for self-service metal.

Physical operations is messy and challenging, but we’re committed to working together to make it suck less.  Operators of the world unite!

Just for fun, putting themes to OpenStack Conferences

I’ve been to every OpenStack summit and, in retrospect, each one has had a different theme. I see these as community themes, beyond the release train, that track how the OpenStack ecosystem has changed.

The themes are, of course, highly subjective and intended to spark reflection and discussion.

City | Release | Theme | My Commentary
ATL | Icehouse | It’s my sandbox! | The new marketplace is great and there are also a lot of vendors who want to differentiate their offering and are not sure where to play.
HK | Havana | Project land grab | It felt like a PTL gold rush as lots of new projects were tossed into the ecosystem mix. I’m wary of perceived “anointed” projects that define “the way” to do things.
PDX | Grizzly | Shiny new things | We went from having a defined core set of projects to a much richer and more varied set of platforms, environments and solutions.
SD | Folsom | Breaking up is hard to do | Nova began to fragment (cinder & quantum, later renamed neutron).
SF | Essex | New kids are here | Move over Rackspace. Lots of new operating systems, providers, consulting and hosting companies participating. Stackalytics makes it into a real commit race.
BOS | Diablo | Race to be the first | Everyone was trying to show that OpenStack could be used for real work. Lots of startups launched.
SJC | Cactus | Oh, you like us! We need some process | This was real, so everyone was exploring OpenStack. We clearly needed to figure out how to work together. This is where we migrated to git.
SA | Bexar | We’re going to take over the world | We handed out rose-colored glasses that mostly turned out pretty accurate; however, some top names from that time are not in the community now (Citrix, NASA, Accenture, and others).
ATX | Austin | We choose “none of the above” | There was a building sense of potential energy while companies figured out that 1) there was a gap and 2) they wanted to fill it together.

Stop the Presses! Austin OpenStack Meetup 7/12 features docs, bugs & cinder

Don’t miss the 7/12 OpenStack Austin meetup!  We’ve got a great agenda lined up.

This meetup is sponsored by HP (Marc Padovani will give the intro).

Topics will include

  1. 6:30 pre-meeting OpenStack intro & overview for N00bs.
  2. Anne Gentle, OpenStack Technical Writer at Rackspace Hosting, talking about how to contribute to docs & the areas where help is needed. *
  3. Report on the Folsom.3 bug squash day (http://wiki.openstack.org/BugDays/20120712BugSquashing)
  4. (tentative) Greg Althaus, Dell, talking about the “Cinder” Block Storage project
  5. White Board – Next Meeting Topics

* if you contribute to docs then you’ll get an invite to the next design summit!   It’s a great way to support OpenStack even if you don’t write code.

OpenStack Austin: What we’d like to see at the Design Summit

Last week, the OpenStack Austin user group discussed what we’d like to see at the upcoming OpenStack Design Summit. We had a strong turnout (48?!).

  1. To get the meeting started, Marc Padovani from HP (this month’s sponsor) provided some lessons learned from the HP OpenStack-Powered Cloud. While Marc noted that HP has not been able to share much of their development work on OpenStack, he was able to show performance metrics for a fix that HP contributed back to the OpenStack community. The defect related to the scheduler’s ability to handle load. The pre-fix data showed a climb and then a gap where the scheduler simply stopped responding. Post-fix, the performance curve is flat without any “dead zones.” (Sharing data like this is what I call “open operations.”)
  2. Next, I (Rob Hirschfeld) gave a brief overview of the OpenStack Essex Deploy Day (my summary) that Dell coordinated with world-wide participation. The Austin deploy day location was in the same room as the meetup so several of the OSEDD participants were still around.
  3. The meat of the meetup was a freeform discussion about what the group would like to see discussed at the Design Summit. My objective for the discussion was that the Austin OpenStack community could have a broader voice if we showed consensus for certain topics in advance of the meeting.

At Jim Plamondon‘s suggestion, we captured our brainstorming on the OpenStack etherpad. The Etherpad is super cool – it allows simultaneous editing by multiple parties, so the notes below were crowdsourced during the meeting as we discussed topics that we’d like to see highlighted at the conference. The etherpad preserves editors, but I removed the highlights for clarity.

The next step is for me to consolidate the list into a voting page and ask the membership to rank the items below (poll online!).

Brainstorm results (unedited)

Stability vs. Features

API vs. Code

  • What is the measurable feature set?
  • Is it an API, or an implementation?
  • Is the Foundation a formal-ish standards body?
  • Imagine the late end-game: can Azure/VMware adopt OpenStack’s APIs and data formats to deliver interop, without running OpenStack’s code? Is this good? Are there conversations on displacing incumbents and spurring new adoption?
  • Logo issues

Documentation Standards

  • Dev docs vs user docs
  • Lag of update/fragmentation (10 blogs, 10 different methods, 2 “work”)
  • Per-release getting-started guide validated and available prior to or at release.

Operations Focus

  • Error messages and codes vs python stack traces
  • Alternatively put, “how can we make error messages more ops-friendly, without making them less developer-friendly?”
  • Upgrade and operations: rolling updates and upgrades. Hot migrations?

If OpenStack was installable on Windows/Hyper-V as a simple MSI/Service installer – would you try it as a node?

  • Yes.

Is Nova too big?  How does it get fixed?

  • libraries?
  • sections?
  • break it into smaller sub-projects?
  • shorter release cycles?

nova-volume

  • volume split out?
  • volume expansion of backend storage systems
  • Is nova-volume the canonical control plane for storage provisioning?  Regardless of transport? It presently deals in block devices only… is the following blueprint correctly targeted to nova-volume?

https://blueprints.launchpad.net/nova/+spec/filedriver

Orchestration

  • Is the Donabe project dead?

Discussion about invitations to Summit

  • What is a contribution that warrants an invitation?
  • Look at Launchpad’s Karma system, which confers karma for many different “contributory” acts, including bug fixes and doc fixes, in addition to code commits.

Summit Discussions

  • Is there a time for an operations summit?
  • How about an operators’ track?
  • Just a note: forums.openstack.org for users/operators to drive/show need and participation.

How can we capture the implicit knowledge (of mailing list and IRC content) in explicit content (documentation, forums, wiki, stackexchange, etc.)?

Hypervisors: room for discussion?

  • Do we want hypervisor feature parity?
  • From the cloud-app developer’s perspective, I want to “write once, run anywhere,” and if hypervisor features preclude that (by having incompatible VM images, for example), that’s a problem.
  • (RobH: But “write once, run anywhere” [WORA] didn’t work for Java, right?)
  • (JimP: Yeah, but I was one of Microsoft’s anti-Java evangelists, when we were actively preventing it from working — so I know the dirty tricks vendors can use to hurt WORA in OpenStack, and how to prevent those tricks from working.)

CDMI

The Swift API is an evolving de facto open alternative to S3… CDMI is on the SNIA standards track. Should the Swift API become CDMI compliant? Should CDMI exist as a shim… a la the S3 stuff?

OpenStack Essex Events (Austin & Boston 3/8, WW Hack Day 3/1, Docs 3/6)

The excitement over the OpenStack Essex release is building!  While my team has been making plans around the upcoming design summit in SF,  there is more immediate action afoot.

Tomorrow (3/1), numerous sites are gathering for a World Wide Essex Hack Day. If you want to participate or even host a hack venue, get on the list and IRC channel (details).

My team at Dell is organizing a community follow-up OpenStack Essex Install Day next week (3/8) in both Austin and Boston. Just like the Hack Day, the install fest will focus on Essex release code with both online and local presence. Unlike the Hack Day, our focus will be on deployments. For the Dell team, that means working on the Essex deployment for Crowbar. We’re still working on a schedule and partner list, so stay tuned. I’m trying to webcast Crowbar & OpenStack training sessions during the install day.

The install day will close with the regularly scheduled 3/8 OpenStack Austin Meetup (6:30pm at Austin TechRanch). The topic for the meetup will be… wait for it… the Essex Release. Thanks go to HP and Dell for sponsoring!

It’s important to note that Anne Gentle is also coordinating an OpenStack Essex Doc Day on 3/6.

To recap:

  • 3/1 – World Wide Essex Hack Day
  • 3/6 – OpenStack Essex Doc Day
  • 3/8 – OpenStack Essex Install Day (Austin & Boston)
  • 3/8 – OpenStack Austin Meetup (6:30pm at Austin TechRanch)

Wow… that should satisfy your Essex cravings.