From Metal Foundation to FIVE new workloads in five weeks

SpinningOpenCrowbar Drill release (will likely become v2.3) is wrapping up in the next few weeks and it’s been amazing to watch the RackN team validate our designs by pumping out workloads and API integrations (list below).

I’ve posted about the acceleration from having a ready state operations base and we’re proving that out.  Having an automated platform is critical for metal deployment because there is substantial tuning and iteration needed to complete installations in the field.

Getting software setup once is not a victory: that’s called a snowflake   

Real success is tearing it down and having work the second, third and nth times.  That’s because scale ops is not about being able to install platforms.  It’s about operationalizing them.

Integration: the difference between install and operationalization.

When we build a workload, we are able to build up the environment one layer at a time.  For OpenCrowbar, that starts with a hardware inventory and works up though RAID/BIOS and O/S configuration.  After the OS is ready, we are able to connect into the operational environment (SSH keys, NTP, DNS, Proxy, etc) and build real multi-switch/layer2 topographies.  Next we coordinate multi-node actions like creating Ceph, Consul and etcd clusters so that the install is demonstrably correct across nodes and repeatable at every stage.  If something has to change, you can repeat the whole install or just the impacted layers.  That is what I consider integrated operation.

It’s not just automating a complex install.  We design to be repeatable site-to-site.

Here’s the list of workloads we’ve built on OpenCrowbar and for RackN in the last few weeks:

  1. Ceph (OpenCrowbar) with advanced hardware optimization and networking that synchronizes changes in monitors.
  2. Docker Swarm (RackN) (or DIY with Docker Machine on Metal)
  3. StackEngine (RackN) builds a multi-master cluster and connects all systems together.
  4. Kubernetes (RackN) that includes automatic high available DNS configuration, flannel networking and etcd cluster building.
  5. CloudFoundry on Metal via BOSH (RackN) uses pools of hardware that are lifecycle managed OpenCrowbar including powering off systems that are idle.
  6. I don’t count the existing RackN OpenStack via Packstack (RackN) workload because it does not directly leverage OpenCrowbar clustering or networking.  It could if someone wanted to help build it.

And… we also added a major DNS automation feature and updated the network mapping logic to work in environments where Crowbar does not manage the administrative networks (like inside clouds).  We’ve also been integrating deeply with Hashicorp Consul to allow true “ops service discovery.”

DNS is critical – getting physical ops integrations right matters

Why DNS? Maintaining DNS is essential to scale ops.  It’s not as simple as naming servers because each server will have multiple addresses (IPv4, IPv6, teams, bridges, etc) on multiple NICs depending on the systems function and applications. Plus, Errors in DNS are hard to diagnose.

Names MatterI love talking about the small Ops things that make a huge impact in quality of automation.  Things like automatically building a squid proxy cache infrastructure.

Today, I get to rave about the DNS integration that just surfaced in the OpenCrowbar code base. RackN CTO, Greg Althaus, just completed work that incrementally updates DNS entries as new IPs are added into the system.

Why is that a big deal?  There are a lot of names & IPs to manage.

In physical ops, every time you bring up a physical or virtual network interface, you are assigning at least one IP to that interface. For OpenCrowbar, we are assigning two addresses: IPv4 and IPv6.  Servers generally have 3 or more active interfaces (e.g.: BMC, admin, internal, public and storage) so that’s a lot of references.  It gets even more complex when you factor in DNS round robin or other common practices.

Plus mistakes are expensive.  Name resolution is an essential service for operations.

I know we all love memorizing IPv4 addresses (just wait for IPv6!) so accurate naming is essential.  OpenCrowbar already aligns the address 4th octet (Admin .106 goes to the same server as BMC .106) but that’s not always practical or useful.  This is not just a Day 1 problem – DNS drift or staleness becomes an increasing challenging problem when you have to reallocate IP addresses.  The simple fact is that registering IPs is not the hard part of this integration – it’s the flexible and dynamic updates.

What DNS automation did we enable in OpenCrowbar?  Here’s a partial list:

  1. recovery of names and IPs when interfaces and systems are decommissioned
  2. use of flexible naming patterns so that you can control how the systems are registered
  3. ability to register names in multiple DNS infrastructures
  4. ability to understand sub-domains so that you can map DNS by region
  5. ability to register the same system under multiple names
  6. wild card support for C-Names
  7. ability to create a DNS round-robin group and keep it updated

But there’s more! The integration includes both BIND and PowerDNS integrations. Since BIND does not have an API that allows incremental additions, Greg added a Golang service to wrap BIND and provide incremental updates and deletes.

When we talk about infrastructure ops automation and ready state, this is the type of deep integration that makes a difference and is the hallmark of the RackN team’s ops focus with RackN Enterprise and OpenCrowbar.

OpenStack Vancouver six observations: partners, metal, tents, defore, brands & breakage

As always, OpenStack conferences/summits are packed with talks and discussions.  Any one of these six points could be a full post; however, I would rather post now and start discussions.  Let me know what you think!

1. Partnering Everywhere – it’s froth, not milk

Everyone is partnering with everyone! It’s a good way to appear to cover more around and appear more open. Right now, I believe these partnerships are for show and very shallow. There will be blood when money is flowing and both partners want the lion’s share.

2. Metal is Hot! attention on Ironic & MaaS

Metal is very hot topic. No surprise, but I do not think that either MaaS or Ironic have the right architecture to deal with the real complexity of automating metal in a generalized way. The consequence is that they are limited and hard to operate.

Container talks were also very hot and I believe are ultimately disruptive.  The very fact that all the container talks were overflowing is an indication of the challenges facing virtualization.

3. DefCore – Just in the Nick of Time

I think that the press and analysts were ready to proclaim that OpenStack was fragmenting and being unable to deliver the “one cloud, multiple vendors” vision. DefCore (presented as Interopability by Jonathan Bryce, DefCore shout out!) came in on the buzzer to buy us more time.

4. Big Tent Concerns – what is ecosystem & release?

Big Tent is shorthand for project governance changes that make it easier for new projects to become OpenStack projects and removes the concept of integrated releases.  The exact definition is still a work in progress.

The top concerns I have are:

  1. We cannot tell difference between community & ecosystem. We’re back to anointed projects because we’re now telling projects they have to join OpenStack to work with OpenStack.
  2. We’re changing the definition of the release but have not defined how it will change. I acknowledge that continuous release is ideal but we’re confusing people again.

5. Brands are battling – will they destroy the city?

OpenStack is hard for startups – read the full post here.  The short version is that big companies are taking up all the air.

While some are leading, others they are learning how to collaborate.  Those new to open source are slow to trust and uncertain about where to invest.  Unfortunately, we’ve created a visible contributions economy that does not reward doing the scut work so it’s no surprise that there are concerns that some of the bigger companies are free riding.

6. OpenStack is broken talks – could we reboot?  no.

It’s a sign of OpenStack’s age that Bias, Termie and others suggested we need clean slate.  Frankly, I think that OpenStack would be irrelevant by the time a rewrite was completed and it not helpful to suggest it.

What would I suggest?  I’d promote a strong core (doing!), ensure big companies collaborate on roadmap (doing!) and stop having a single node install as gate and dev reference (I’d happily help use OCB for this with partners)

PS: Apparently Neutron is not broken.

I’m very excited about the “just give me a network” work to make Neutron duplicate Nova-Net functionality.  Finally.

OpenSource.com Interview on DefCore, project management, and the future of OpenStack

Reposted from My Interview with RedHat’s OpenSource.com Jason Baker

Rob Hirschfeld has been involved with OpenStack since before the project was even officially formed, and so he brings a rich perspective as to the project’s history, its organization, and where it may be headed next. Recently, he has focused primarily on the physical infrastructure automation space, working with an an enterprise version of OpenCrowbar, an “API-driven metal” project which started as an OpenStack installer and moved to a generic workload underlay.

Rob is speaking on two panels at the upcoming OpenStack Summit in Vancouver, including DefCore 2015 and the State of OpenStack Project Management. We caught up with Rob to get updates about these two topics and what else lies ahead for OpenStack.

We asked you to help walk us through DefCore as it was being developed last year; just as a reminder, what is DefCore and why should people care about it?

DefCore creates a minimal definition for OpenStack vendors to help ensure interoperability and stability for the user community. While DefCore definitions apply only to vendors asking to use the OpenStack trademark, there are technical impacts on the tests and APIs that we select as required. We’ve worked hard to make sure the that selection process for picking “core” is transparent and fair.

What did the changes approved by the OpenStack Foundation membership earlier this year mean for DefCore?

The by-laws changes approved by the community were important to allow us to use DefCore more granular definition of Core. The previous by-laws were much more project focused. The changes allow us to select specific APIs and code components from a project as required instead of picking everything blindly. That allows projects to have both stable and new innovative components.

What can we expect from OpenStack’s structure and organization as we move forward towards the next release?

There are a lot of changes still to come. The technical leadership is making it easier to become part of the OpenStack code base. I’ve written about this change having potentially both positive and negative impacts on OpenStack to make it appear more like a suite of projects than a tightly integrate product. In many ways, DefCore helps vendors define OpenStack as a product as the community is expanding to include more capabilities. In my discussions, this is a good balance.

Switching gears a bit, you’ve also been heavily involved in the OpenStack project management working group. How has that group been progressing since they convened at the Paris Summit?

This group has made a lot of progress. We’ve seen non-board leadership step in and lead the group. That leadership is more organic and based in the companies that are directly contributing. I think that’s resulted in a lot of good ideas and documentation from the group. We’ll see some excellent results in Vancouver from them. It’s going to come back to the community and technical leadership to leverage that work. I think that’s the real test: we have to share ownership of direction between multiple perspectives. The first step in doing that is writing it down (which is what they have been doing).

Aside from the organization, let’s talk about the software itself. What are you hoping to see from the Liberty release?

I’m hoping to see adoption of Neutron accelerate. Having two network approaches makes it impossible to really have an interoperability story. That means Neutron has to be working technically, but also for operators and users. To be brutally honest, it also has to overcome its own reputation. If Neutron does not become the dominate choice, we are going to effectively have two major flavors of OpenStack. From the DefCore, vendor, or user perspective, that’s a very challenging position.

Anything else you’d like to add?

We’ve accomplished a lot together. In some ways, chasing too many targets is our biggest threat. I think that container workloads and orchestration are already being very disruptive for OpenStack. I’m hoping that we focus on delivering a stable core infrastructure. That’s why I’ve been working so hard on DefCore. Looking forward, there’s an increasing risk of trying to chase too many targets and losing the core of what users want.

This article is part of the Speaker Interview Series for OpenStack Summit Vancouver, a five-day conference for developers, users, and administrators of OpenStack Cloud Software.

Hidden costs of Cloud? No surprises, it’s still about complexity = people cost

Last week, Forbes and ZDnet posted articles discussing the cost of various cloud (451 source material behind wall) full of dollar per hour costs analysis.  Their analysis talks about private infrastructure being an order of magnitude cheaper (yes, cheaper) to own than public cloud; however, the open source price advantages offered by OpenStack are swallowed by added cost of finding skilled operators and its lack of maturity.

At the end of the day, operational concerns are the differential factor.

The Magic 8 Cube

The Magic 8 Cube

These articles get tied down into trying to normalize clouds to $/vm/hour analysis and buried the lead that the operational decisions about what contributes to cloud operational costs.   I explored this a while back in my “magic 8 cube” series about six added management variations between public and private clouds.

In most cases, operations decisions is not just about cost – they factor in flexibility, stability and organizational readiness.  From that perspective, the additional costs of public clouds and well-known stacks (VMware) are easily justified for smaller operations.  Using alternatives means paying higher salaries and finding talent that requires larger scale to justify.

Operational complexity is a material cost that strongly detracts from new platforms (yes, OpenStack – we need to address this!)

Unfortunately, it’s hard for people building platforms to perceive the complexity experienced by people outside their community.  We need to make sure that stability and operability are top line features because complexity adds a very real cost because it comes directly back to cost of operation.

In my thinking, the winners will be solutions that reduce BOTH cost and complexity.  I’ve talked about that in the past and see the trend accelerating as more and more companies invest in ops automation.

OpenStack DefCore Community Review – TWO Sessions April 21 (agenda)

During the DefCore process, we’ve had regular community check points to review and discuss the latest materials from the committee.  With the latest work on the official process and flurry of Guidelines, we’ve got a lot of concrete material to show.

To accommodate global participants, we’ll have TWO sessions (and record both):

  1. April 21 8 am Central (1 pm UTC) https://join.me/874-029-687 
  2. April 21 8 pm Central (9 am Hong Kong) https://join.me/903-179-768 

Eye on OpenStackConsult the call etherpad for call in details and other material.

Planned Agenda:

  • Background on DefCore – very short 10 minutes
    • short description
    • why board process- where community
  •  Interop AND Trademark – why it’s both – 5 minutes
  •  Vendors AND Community – balancing the needs – 5 minutes
  •  Mechanics
    • testing & capabilities – 5 minutes
    • self testing & certification – 5 minutes
    • platform & components & trademark – 5 minutes
  • Quick overview of the the Process (to help w/ reviewers) – 15 minutes
  • How to get involved (Gerrit) – 5 minute

How Nebula shows why the OpenStack community (and OSS in general) should care about profits

Fancy DogsIn dog piling onto the news about early OpenStack provider Nebula’s demise, I’ll use their news to reinforce my belief that open communities need to ensure that funds are flowing to leaders and contributors.  This is one of the reasons that I invest time in DefCore: having a clear base product definition helps create healthy vendors.

Open source software is not simply free VC funded projects.  The industry needs a stable transparent supply chain of software to build on.

I am not asserting anything about Nebula’s business or model.  To me, it is a single downward data point in my ongoing review of vendor health in the OpenStack community.  OpenStack features and functions will not matter if vendors in the community cannot afford to sustain them.

For enterprises to rely on open source software supply chains, they need to make sure they are paying into the community.  The simplest route to “paying in” is through vendors.

My OpenStack Super User Interview [cross-post]

This post of my interview for the OpenStack Super User site originally appeared there on 3/23 under the title “OpenStack at 10: different code, same collaboration?”

With over 15 years of cloud experience, Rob Hirschfeld also goes way back with OpenStack. His involvement dates to before it was officially founded and he was also one of the initial Board Members. In addition to his role as Individual Director, Hirschfeld currently chairs the DefCore committee. He’ll be speaking about DefCore at the upcoming Vancouver Summit with Alan Clark, Egle Sigler and Sean Roberts.

He talks to Superuser about the importance of patches, priorities for 2015 and why you should care about OpenStack vendors making money.

Superuser: You’ve been with the project since before it started, where do you hope it will be in five years?

In five years, I expect that nearly every line of code will have been replaced. The thing that will endure is the community governance and interaction models that we’re working out to ensure commercial collaboration.

[3/24 Added Clarification Note: I find humbled watching traditionally open-unfriendly corporations using OpenStack to learn how to become open source collaborations.  Our governance choices will have long lasting ramifications in the industry.] 

What is something that a lot of people don’t know about OpenStack?

It was essentially a “rewrite fork” of Eucalyptus created because they would not accept patches.  That’s a cautionary tale about why accepting patches is essential that should not get lost from the history books.

Any thoughts on your first steps to the priorities you laid out in your candidacy profile?

I’ve already started to get DefCore into an execution phase with additional Board and Foundation leadership joining into the effort.  We’ve set a very active schedule of meetings with two sub-committees running in parallel…It’s going to be a busy spring.

You say that the company you founded, RackN, is not creating an OpenStack product. How are you connected to the community?

RackN supports OpenCrowbar which provides a physical ready state infrastructure for scale platforms like OpenStack. We are very engaged in the community from below by helping make other distributions, vendors and operators successful.

What are the next steps to creating the “commercially successful ecosystem” you mentioned in your candidacy profile? What are the biggest obstacles to this?

We have to make stability and scale a critical feature. This will mean slowing features and new projects; however, I hear a lot of frustration that OpenStack is not focused on delivering a solid base.

Without a base, the vendors cannot build profitable products.  Without profits, they cannot keep funding the project. This may be radical for an open project, but I think everyone needs to care more if vendors are making money.

What are some more persistent myths about the cloud?

That the word cloud really means anything.  Everyone has their own definition.  Mine is “infrastructure with an API” but I’d happily tell you it’s also about process and ops.

Who are your real-life heroes?

FIRST (For Inspiration and Recognition of Science and Technology) founders Dean Kamen and Woodie Flowers. They executed a real vision about how to train for both competition and collaboration in the next generation of engineers.  Their efforts in building the next generation of leaders really impact how we will should open source collaboration. That’s real innovation.

What do you hope to get out of the next summit?

First, I want to see vendors passing DefCore requirements.  After that, I’d like to see the operators get more equal treatment and I’m hoping to spend more time working with them so they can create places to share knowledge.

What’s your favorite/most important OpenStack debate?

There are two.  First, I think the API vs. implementation is a critical growth curve for OpenStack.  We need to mature past being so implementation driven so we can have stand alone APIs.

Second, I think the “benevolent dictator” discussion is useful. Since we are never going to have one, we need a real discussion about how to define and defend project wide priorities in a meaningful way.  Resolving both items is essential to our long-term viability.

OpenStack DefCore Process Draft Posted for Review [major milestone]

OpenStack DefCore Committee is looking for community feedback about the proposed DefCore Process.

Golden PathMarch has been a month for OpenStack DefCore milestones.  At the March Board meeting, we approved the first official DefCore Guideline (called DefCore 2015.03) and we are poised to commit the first DefCore Process draft.

Once this initial commit is approved by the DefCore Committee (expected at DefCore Scale.8 Meeting 3/25 @ 9 PT), we’ll be ready for broader input by the community using the standard OpenStack Gerrit review process.  If you are not comfortable with Gerrit, we’ll take your input anyway that you want to give it except via telepathy (we’ve already got a lot on our minds).

Note: We’re also looking for input on the 2015.next Guideline targeted for 2015.04,

The DefCore Process documents the rules (who, what, when and where) that will govern how we create the DefCore Guidelines.  By design, it has to be detailed and specific without adding complexity and confusion.  The why of DefCore is all that work we did on principles that shape the process.

This process reflects nearly a year of gestation starting from the June 2014 DefCore face-to-face.  Once of the notable recent refinements was to organize material into time phases and to be more specific about who is responsible for specific actions.

To make review easier, I’ve reposted the draft.  Comments are welcome here and on the patch (and here after it lands).

DRAFT: OpenStack DefCore Process 2015A (reposted from OpenStack/DefCore)

This document describes the DefCore process required by the OpenStack bylaws and approved by the OpenStack Technical Committee and Board.

Expected Time line:

Time Frame Milestone Activities Lead By
-3 months S-3 “preliminary” draft (from current) DefCore
-2 months S-2 ID new Capabilities Community
-1 month S-1 Score capabilities DefCore
Summit S “solid” draft Community
Advisory items selected DefCore
+1 month S+1 self-testing Vendors
+2 months S+2 Test Flagging DefCore
+3 months S+3 Approve Guidance Board

Note: DefCore may accelerate the process to correct errors and omissions.

Process Definition

Continue reading

Start-ups are Time Machines for Big Companies [and open source is a worm hole]

My time at Dell (ended 10/2014) forced me to correct one of the most common misconceptions I hear about big companies – that they cannot innovate. My surprise at Dell was not the lack of innovation, but it’s overabundance. Having talked to many colleagues at big companies, I find the same pattern is everywhere. It’s not that these companies lack amazing and creative ideas, it’s that they have so many that it’s impossible for them to filter and promote them.

Innovation at a big company is like a nest of baby chicks all fighting for a worm but the parent bird can’t decide which chicks to feed.

In order for an idea to win at a big company, it generally has to shout so loudly and promise so extravagantly that it’s setup to fail right out of incubation. Consequently, great ideas are either never launched or killed in adolescence. Of course YMMV, but I’ve seen this pattern repeated throughout the tech industry.

TeamTimeCar.com-BTTF DeLorean Time Machine-OtoGodfrey.com-JMortonPhoto.com-07.jpg

“TeamTimeCar.com-BTTF DeLorean Time Machine-OtoGodfrey.com-JMortonPhoto.com-07” by Terabass. Licensed under CC BY-SA 4.0 via Wikimedia Commons

What big companies really need is a time machine. That way, they can retroactively pick the right innovation and nurture it into a product that immediately benefits from their customer base, support infrastructure and market presence.

Money is a time machine.

With enough money, they can go backwards in time and unwind the decision to not invest in that innovative idea or team. It’s called purchasing a company. Sure, there’s a significant cash premium but that’s easier than stealing more plutonium for your DeLorean. In my experience, it’s behaviorally consistent for companies to act quickly on large outlays for retroactively correct decisions while being unwilling to deal with the political and long-term planning aspects of incubation.

I’ve come to embrace this cycle of innovation with an interesting twist: the growth of open source business models enables a new degree of cross innovation between start-ups and big companies. With open source, corporate locked innovators can exercise their ideas with start-ups and start-ups can leverage the talent and financial depth of big companies.

That’s like creating temporal worm holes in the venture-time continuum. Now that sounds like a topic for a future post… thoughts?