Ops Validation using Development Tests [3/4 series on Operating Open Source Infrastructure]

Posted on May 5, 2014 by Rob H

This post is the third in a 4 part series about Success factors for Operating Open Source Infrastructure.

In an automated configuration deployment scenario, problems surface very quickly. They prevent deployment and force resolution before progress can be made. Unfortunately, many times this appears to be a failure within the deployment automation. My personal experience has been exactly the opposite: automation creates a “fail fast” environment in which critical issues are discovered and resolved during provisioning instead of sleeping until later.

Our ability to detect and stop until these issues are resolved creates exactly the type of repeatable, successful deployment that is essential to long-term success. When we look at these deployments, the most important success factors are that the deployment is consistent, known and predictable. Our ability to quickly identify and resolve issues that do not match those patterns dramatically improves the long-term stability of the system by creating an environment that has been benchmarked against a known reference.

Benchmarking against a known reference is ultimately the most significant value that we can provide in helping customers bring up complex solutions such as OpenStack and Hadoop. Being successful with these deployments over the long term means that you have established a known configuration, and that you have maintained it in a way that is explainable and reference-able to other places.

Reference Implementation

The concept of a reference implementation provides tremendous value in deployment. Following a pattern that is a reference implementation enables you to compare notes, get help and ultimately upgrade and change deployment in known, predictable ways. Customers who can follow and implement a vendor’s reference, or the community’s reference implementation, are able to ask for help on the mailing lists, call in for help and work with the community in ways that are consistent and predictable.

Let’s explore what a reference implementation looks like.

In a reference implementation you have a consistent, known state of your physical infrastructure that has been implemented based upon a reference architecture (RA). That implementation follows a known best practice using standard gear in a consistent, known configuration. You can therefore explain your configuration to a community of other developers, or other people who have a similar configuration, and can validate that your problem is not the physical configuration. Fundamentally, everything in a reference implementation is driving towards the elimination of possible failure causes. In this case, we are making sure that the physical infrastructure is not causing problems (getting to a ready state), because other people are using a similar (or identical) physical infrastructure configuration.

The next components of a reference implementation are the underlying software configurations: operating system, management, monitoring, network configuration and IP networking stacks. This is pretty much everything that the application is riding on. There are a lot of moving parts and complexity in this scenario, with a high likelihood of causing failures. Implementing and deploying the software stacks in an automated way has enabled us to dramatically reduce the potential for problems caused by misconfiguration. Because the number of permutations of software in the reference stack is so high, it is essential that a successful deployment tightly manages exactly what is deployed, in such a way that operators can identify, name, and compare notes with other deployments.

Achieving Repeatable Deployments

In this case, our referenced deployment consists of the exact composition of the operating system, infrastructure tooling, and capabilities for the deployment. By having a reference capability, we can ensure that we have the same:

Operating system
Monitoring
Configuration stacks
Security tooling
Patches
Network stack (including bridges and VLAN, IP table configurations)

Each one of these components is a potential failure point in a deployment. By being able to configure and maintain that configuration automatically, we dramatically increase the opportunities for success by enabling customers to have a consistent configuration between sites.
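As a rough illustration of comparing a site against a reference, here is a minimal Python sketch. The component keys mirror the list above; the version strings, the `REFERENCE` dictionary and the `find_drift` helper are all made-up names for this example and are not part of any Dell or Crowbar tooling.

```python
from typing import Dict, List, Tuple

# Hypothetical reference description. The keys follow the component list
# above; every value is an invented example, not a recommended version.
REFERENCE: Dict[str, str] = {
    "operating_system": "ubuntu-12.04",
    "monitoring": "nagios-3.5",
    "configuration_stack": "chef-11",
    "security_tooling": "baseline-v2",
    "patches": "2014-04",
    "network_stack": "bridged-vlan",
}


def find_drift(reference: Dict[str, str],
               site: Dict[str, str]) -> List[Tuple[str, str, str]]:
    """Return (component, expected, actual) for every component that differs."""
    drift = []
    for component, expected in sorted(reference.items()):
        actual = site.get(component, "<missing>")
        if actual != expected:
            drift.append((component, expected, actual))
    return drift


if __name__ == "__main__":
    # A site that matches the reference except for its patch level.
    site_b = dict(REFERENCE, patches="2014-03")
    for component, expected, actual in find_drift(REFERENCE, site_b):
        print(f"{component}: expected {expected}, found {actual}")
```

An empty result means the site matches the reference on every listed component, which is the condition that makes comparing notes between sites meaningful.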

Repeatable reference deployments enable customers to compare notes with Dell and with others in the community. It enables us to take and apply what we have learned from one site to another. For example, if a new patch breaks functionality, then we can quickly determine how that was caused. We can then fix the solution, add in the complementary fix, and deploy it at that one site. If we are aware that 90% of our other sites have exactly the same configuration, it enables those other sites to avoid a similar problem. In this way, having both a pattern and practice referenced deployment enables the community to absorb or respond much more quickly, and be successful with a changing code base. We found that it is impractical to expect things not to change.

The only thing that we can do is build resiliency for change into these deployments. Creating an automated and tested referenceable deployment is the best way to cope with change.

DefCore Core Capabilities Selection Criteria SIMPLIFIED -> how we are picking Core

Posted on May 1, 2014 by Rob H

I’ve posted about the early DefCore core capabilities selection process before and we’ve put them into application and discussed them with the community. The feedback was simple: tl;dr. You’ve got the right direction but make it simpler!

So we pulled the 12 criteria into four primary categories:

Usage: the capability is widely used (Refstack will collect data)
Direction: the capability advances OpenStack technically
Community: the capability builds the OpenStack community experience
System: the capability integrates with other parts of OpenStack

These categories summarize critical values that we want in OpenStack and so make sense to be the primary factors used when we select core capabilities. While we strive to make the DefCore process objective and quantitative, we must recognize that these choices drive community behavior.

With this perspective, let’s review the selection criteria. To make it easier to cross reference, we’ve given each criterion a shortened name:

Shows Proven Usage

“Widely Deployed” Candidates are widely deployed capabilities. We favor capabilities that are supported by multiple public cloud providers and private cloud products.
“Used by Tools” Candidates are widely used capabilities: Should be included if supported by common tools (RightScale, Scalr, CloudForms, …)
“Used by Clients” Candidates are widely used capabilities: Should be included if part of common libraries (Fog, Apache jclouds, etc)

Aligns with Technical Direction

“Future Direction” Should reflect future technical direction (from the project technical teams and the TC) and help manage deprecated capabilities.
“Stable” Test is required to be stable for >2 releases because we don’t want core capabilities that do not have dependable APIs.
“Complete” Where the code being tested has a designated area of alternate implementation (extension framework) as per the Core Principles, there should be parity in capability tested across extension implementations. This also implies that the capability test is not configuration specific or locked to non-open technology.

Plays Well with Others

“Discoverable” Capability being tested is Service Discoverable (can be found in Keystone and via service introspection)

“Doc’d” Should be well documented, particularly the expected behavior. This can be a very subjective measure and we expect to refine this definition over time.

“Core in Last Release” A test that is a must-pass test should stay a must-pass test. This makes core capabilities sticky release per release. Leaving Core is disruptive to the ecosystem.

Takes a System View

“Foundation” Test capabilities that are required by other must-pass tests and/or depended on by many other capabilities
“Atomic” Capability is unique and cannot be built out of other must-pass capabilities

“Proximity” (sometimes called a Test Cluster) selects for Capabilities that are related to Core Capabilities. This helps ensure that related capabilities are managed together.

Note: The 13th “non-admin” criterion has been removed because Admin APIs cannot be used for interoperability and cannot be considered Core.

Networking in Cloud Environments, SDN, NFV, and why it matters [part 1 of 2]

Posted on May 1, 2014 by Rob H

Scott Jensen is an Engineering Director and colleague of mine from Dell with deep networking and operations experience. He has first-hand experience deploying OpenStack and Hadoop and has a critical role in defining Dell’s Reference Architectures in those areas. When I saw this writeup about cloud networking, I asked if it would be OK to share it with you.

Guest Post 1 of 2 by Scott Jensen:

Having a basis in enterprise data center networking and Cloud computing, I have many conversations with customers implementing a cloud infrastructure. The design of their networking infrastructure can and should be different from a classic network configuration, and many do not understand why. This is either due to a lack of knowledge in networking or due to a lack of understanding as to why cloud computing is different from virtualization. Once you have an understanding of both of these areas you can begin to see why emerging technologies such as SDN (Software Defined Networking) and NFV (Network Function Virtualization) begin to address some of the issues that Cloud Computing can cause with your network.

Networking is all about traffic flows. In order to properly design your infrastructure you need to understand where traffic is originating, where it is going and how much traffic will be following a specific route and at what times.

There are many differences between Cloud Computing and virtualization. In many cases, people I talk to think of Cloud as virtualization in a different environment. Of course this will work just fine; however, it does not take advantage of the goodness that a Cloud infrastructure can bring. Some of the major differences between Virtualization and Cloud Computing have profound effects on how the network is utilized. This all has to do with the application. That is really what it is all about anyway. Rob Hirschfeld has a great post on the difference between Pets and Cattle which describes this well.

Pets and Cattle as a workload evolution

In typical virtualized infrastructures, the applications have a fairly common pattern. Many people describe these as Pets, and they are managed largely the same as a physical system. They have a name, they are one of a kind, they are cared for, and when they die it can be traumatic (I know I have been there).

They run on large stateful VMs
They have a lifecycle which is typically very long such as years
The applications themselves are not designed to tolerate failures. Other technologies are brought in to ensure uptime.
The application is scaled up when demands increase. This is done by adding more memory or CPU to the VM.

Cloud applications are different. Some people describe them as cattle and they are treated like cattle in many ways. They do not necessarily have a name and if one dies it is sad but not a really big deal. We should probably figure out what killed it but life goes on.

They run on smaller stateless VMs
They have a lifecycle measured in hours or months. Sometimes even less than an hour.
The application is designed to expect failures
The application scales out by increasing the number of instances which are running when the demand increases.

In his follow-up post next week, Scott discusses how this impacts the network and how SDN and NFV promise to help.

Mayflies and Dinosaurs (extending Puppies and Cattle)

Posted on March 17, 2014 by Rob H

Josh McKenty and I were discussing the common misconception of the “Puppies and Cattle” analogy. His position is not anti-puppy! He believes puppies are sometimes unavoidable and should be isolated into portable containers (VMs) so they can be shuffled around seamlessly. His more provocative point is that we want our underlying infrastructure to be cattle so it remains highly elastic and flexible. More cattle means a more resilient system. To me, this is a fundamental CloudOps design objective.

We realized that the perfect cloud infrastructure would structurally discourage the creation of puppies.

Imagine a cloud in which servers were automatically decommissioned after a week of use. In a sort of anti-SLA, any VM running for more than 168 hours would be (gracefully) terminated. This would force a constant churn of resources within the infrastructure that enables true cattle-like management. This cloud would be able to very gracefully rebalance load and handle disruptive management operations because the workloads are designed for the churn.

We called these servers mayflies due to their limited life span.
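To make the anti-SLA concrete, here is a minimal Python sketch that picks out the VMs due for termination. The 168 hour limit comes from the text; the `expired_vms` helper, the VM names and the idea of keeping start times in a dictionary are assumptions made only for this example.

```python
from datetime import datetime, timedelta
from typing import Dict, List

# One week, expressed the same way as in the text above.
MAX_LIFETIME = timedelta(hours=168)


def expired_vms(started_at: Dict[str, datetime], now: datetime) -> List[str]:
    """Return the names of VMs that have been running for more than MAX_LIFETIME."""
    return sorted(
        name for name, start in started_at.items()
        if now - start > MAX_LIFETIME
    )


if __name__ == "__main__":
    now = datetime(2014, 3, 17, 12, 0)
    vms = {
        "web-1": now - timedelta(hours=169),  # past the limit
        "web-2": now - timedelta(hours=24),   # still a young mayfly
    }
    for name in expired_vms(vms, now):
        print(f"{name} has exceeded {MAX_LIFETIME} and should be gracefully terminated")
```

A real operator would feed this from the cloud's inventory and hand the result to whatever performs the graceful shutdown; both of those pieces are deliberately left out of the sketch.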

While this approach requires a high degree of automation, the most successful cloud operators I have met are effectively building workloads with this requirement. If we require application workloads to be elastic and fault-resilient then we have a much higher degree of flexibility with the underlying infrastructure. I’ve seen this in practice with several OpenStack clouds: operators who helped applications deploy using automation were able to decommission “old” clouds much more gracefully. They effectively turned their entire cloud into a cow. Sadly, the ones without that investment puppified™ the ops infrastructure and created a much more brittle environment.

The opposite of a mayfly is the dinosaur: a server that is so brittle and locked that the slightest disturbance wipes out everything it touches.

Dinosaurs are puppies grown into a T-Rex with rows of massive razor sharp teeth and tiny manicured hands. These are systems that are so unique and historical that there’s no way to recreate them if there’s a failure. The original maintainer’s exit happy hour was celebrated by people who were laid-off two CEOs ago. The impact of dinosaurs goes beyond their operational risk; they are typically impossible to extend or maintain and, consequently, ossify other servers around them. This type of server drains elasticity from your ops team.

Puppies do not grow up to become dogs, they become dinosaurs.

It’s a classic lean adage to do hard things more frequently. Perhaps it’s time to start creating mayflies in your ops infrastructure.

OpenCrowbar reaches critical milestone – boot, discover and forge on!

Posted on March 17, 2014 by Rob H

We started the Crowbar project because we needed to make OpenStack deployments fast, repeatable and sharable. We wanted a tool that looked at deployments as a system and integrated with our customers’ operations environment. Crowbar was born as an MVP and quickly grew into a more dynamic tool that could deploy OpenStack, Hadoop, Ceph and other applications, but most critically we recognized that our knowledge gaps were substantial and we wanted to collaborate with others on the learning. The result of that learning was a rearchitecture effort that we started at OSCON in 2012.

After nearly two years, I’m proud to show off the framework that we’ve built: OpenCrowbar addresses the limitations of Crowbar 1.x and adds critical new capabilities.

So what’s in OpenCrowbar? Pretty much what we targeted at the launch and we’ve added some wonderful surprises too:

Heterogeneous Operating Systems – choose which operating system you want to install on the target servers.
CMDB Flexibility – don’t be locked in to a devops toolset. Attribute injection allows clean abstraction boundaries so you can use multiple tools (Chef and Puppet, playing together).
Ops Annealer – the orchestration at Crowbar’s heart combines the best of directed graphs with late binding and parallel execution. We believe annealing is the key ingredient for repeatable and OpenOps shared code upgrades.
Upstream Friendly – infrastructure as code works best as a community practice and Crowbar uses upstream code without injecting “crowbarisms” that were previously required. So you can share your learning with the broader DevOps community even if they don’t use Crowbar.
Node Discovery (or not) – Crowbar maintains the same proven discovery image based approach that we used before, but we’ve streamlined and expanded it. You can use Crowbar’s API outside of the PXE discovery system to accommodate Docker containers, existing systems and VMs.
Hardware Configuration – Crowbar maintains the same optional hardware neutral approach to RAID and BIOS configuration. Configuring hardware with repeatability is difficult and requires much iterative testing. While our approach is open and generic, my team at Dell works hard to validate on a specific set of gear: it’s impossible to make statements beyond that test matrix.
Network Abstraction – Crowbar dramatically extended our DevOps network abstraction. We’ve learned that networking is the key to success for deployment and upgrade so we’ve made Crowbar networking flexible and concise. Crowbar networking works with attribute injection so that you can avoid hardwiring networking into DevOps scripts.
Out of band control – when the Annealer hands off work, Crowbar gives the worker implementation flexibility to do it on the node (using SSH) or remotely (using an API). Making agents optional allows operators and developers to make the best choices for the actions that they need to take.
Technical Debt Paydown – We’ve also updated the Crowbar infrastructure to use the latest libraries like Ruby 2, Rails 4, Chef 11. Even more importantly, we’ve dramatically simplified the code structure, including in-repo documentation and a Docker based developer environment that makes building a working Crowbar environment fast and repeatable.
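The directed graph and parallel execution ideas mentioned for the Annealer can be illustrated with a small, generic Python sketch. This is not Crowbar's Annealer code and it ignores late binding entirely; the `parallel_waves` function and the role names are hypothetical, chosen only to show how a dependency graph yields groups of steps that could run side by side.

```python
from typing import Dict, List, Set


def parallel_waves(depends_on: Dict[str, Set[str]]) -> List[List[str]]:
    """Group the nodes of a dependency graph into ordered waves.

    Every node in a wave depends only on nodes from earlier waves, so the
    members of a single wave could be executed in parallel.
    """
    remaining = {node: set(deps) for node, deps in depends_on.items()}
    done: Set[str] = set()
    waves: List[List[str]] = []
    while remaining:
        # A node is ready when all of its dependencies have completed.
        ready = sorted(n for n, deps in remaining.items() if deps <= done)
        if not ready:
            raise ValueError("dependency cycle or unknown dependency")
        waves.append(ready)
        done.update(ready)
        for node in ready:
            del remaining[node]
    return waves


if __name__ == "__main__":
    # Hypothetical roles and dependencies, for illustration only.
    roles = {
        "os": set(),
        "network": {"os"},
        "monitoring": {"os"},
        "database": {"network"},
        "app": {"database", "monitoring"},
    }
    for number, wave in enumerate(parallel_waves(roles), start=1):
        print(f"wave {number}: {', '.join(wave)}")
```

The design choice worth noting is that ordering is derived from declared dependencies instead of a hand-written sequence, which is what lets independent roles proceed at the same time.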

Why change to OpenCrowbar? This new generation of Crowbar is structurally different from Crowbar 1 and we’ve invested substantially in refactoring the tooling, paying down technical debt and cleaning up documentation. Since Crowbar 1 is still being actively developed, splitting the repositories allows both versions to progress with less confusion. The majority of the principles and deployment code is very similar, so I think of Crowbar as a single community.

Interested? Our new Docker Admin node is quick to set up and can boot and manage both virtual and physical nodes.

OpenStack Core Definition (DefCore) Progress in 6 key areas

Posted on January 6, 2014 by Rob H

I’m excited to report about the OpenStack Board progress on defining OpenStack core. At the Hong Kong summit, Joshua McKenty and I were asked to chair a new standing committee, now known as DefCore, to define “OpenStack Core” based on the core principles that we determined over the last 6 months (aka “the spider”).

Joshua and I took on the challenge with gusto and I’m proud to say that we’ve already made significant progress against an aggressive timeline to have the pilot must-pass tests for Havana defined before the Juno Summit in May 2014. It’s important to remember that we’re moving from a project based definition of core to test-driven capabilities because this best addresses our interoperability objectives.

In the 8 weeks since the summit, we’ve had six very productive meetings (etherpads for Prep, DefCore.1, DefCore.2, Criteria 1 and 2) with detailed notes and recorded content. Here’s my summary of our results so far:

An Aggressive Timeline for having pilot Havana must-pass tests approved by the Juno summit in May 2014. That drives the schedule backward toward a preliminary list in March. Once we have a pilot list for Havana, we expect to have Icehouse done +90 days and Juno done at the Paris summit.
Test Selection Criteria are a preliminary set of 14 criteria (needs a stand alone post) that will be used to quantitatively score the current 700+ tests. We also agreed to use a max 100 point weighting system for the criteria. The weights and score requirement will be adjusted iteratively once we have done a first scoring pass. Our objective is to make must-pass test selection as objective and transparent as possible (post with details).
Distinction between Capability & Test is important because we recognize that individual tests may validate multiple capabilities and individual capabilities may have multiple tests. Our hope is to present the results in terms of capabilities not individual tests.
Holding Off on Bylaws Changes needed to clarify how OpenStack manages core definition. It was widely expected that the DefCore committee would have to make changes to the OpenStack bylaws; however, we believe we can proceed without rushing changes. We have an active subcommittee preparing changes in advance of the next DefCore cycle.
Program vs. Project Definition efforts are needed to help take pressure off requests to have “projects promoted to core status” and how the OpenStack trademark is used for projects. We are trying to clarify that OpenStack Programs (e.g.: OpenStack™ Compute) carry the trademark while OpenStack Projects (e.g.: Nova and Glance) are members of those programs and do not carry the OpenStack trademark directly. Consequently, we’d expect people to say “OpenStack Compute Project Nova” instead of “OpenStack Nova.” This approach addresses several issues that impact DefCore Board activities around trademark, core and brand.
RefStack Development and Use Cases provide the framework for community reporting of test results. We consider this infrastructure critical to getting community input about must-pass tests and also sharing interoperability information. This effort is just beginning and needs help from the community.
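The 100 point weighting idea from the Test Selection Criteria item can be sketched in a few lines of Python. Only the 100 point maximum comes from the text; the three criteria names, their weights, the `score_test` and `is_must_pass` helpers and the required score used below are placeholders, since the committee had not settled any of those at the time of writing.

```python
from typing import Dict, Iterable

# Hypothetical weights. The only constraint taken from the text is that the
# weights add up to a maximum of 100 points.
WEIGHTS: Dict[str, int] = {
    "widely_deployed": 40,
    "stable": 35,
    "documented": 25,
}
assert sum(WEIGHTS.values()) == 100


def score_test(criteria_met: Iterable[str],
               weights: Dict[str, int] = WEIGHTS) -> int:
    """Sum the weights of the criteria that a test satisfies (0 to 100)."""
    met = set(criteria_met)
    unknown = met - set(weights)
    if unknown:
        raise KeyError(f"unknown criteria: {sorted(unknown)}")
    return sum(weights[name] for name in met)


def is_must_pass(criteria_met: Iterable[str], required_score: int) -> bool:
    """A test qualifies when its score reaches the required score."""
    return score_test(criteria_met) >= required_score


if __name__ == "__main__":
    met = ["widely_deployed", "stable"]
    print(score_test(met), is_must_pass(met, required_score=70))
```

Keeping the weights and the required score as plain data makes it cheap to adjust both after a first scoring pass, which is the iteration the committee describes.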

For all this progress, we are only starting! We’ve cleared the blocks preventing implementation and that will expose a new set of issues to discuss. Look for us to start applying the criteria to tests in the next months. That will quickly expose the strengths and weaknesses of our criteria set. We’ve also got to make progress on Program vs. Project and start RefStack coding.

We want community participation! Please let us know what you think.

Spinning up OpenStack “DefCore” Committee by spotting elephants

Posted on November 22, 2013 by Rob H

This week, Joshua McKenty, a handful of interested individuals (board member Eileen Evans included) and I met to start organizing the DefCore* Committee. This standing committee was established by an OpenStack Foundation resolution just before the Hong Kong Summit. Joshua and I were nominated as co-chairs (and about half the board volunteered to be members). This action was an immediate result of the unanimous passage of the 10 Principles that I was driving in the DefCore “Spider” cycle.

We heard overwhelmingly at the Hong Kong summit that defining core should be a major focus for the Board.

The good news is that we’re doing exactly that in DefCore. Our challenge is to go quickly but not get ahead of community consensus. So far, that means eating the proverbial elephant in small bites and intentionally deferring topics where we cannot find consensus.

This meeting was primarily about Joshua and me figuring out how to drive DefCore quickly (go fast!) without exceeding the community’s ability to review and discuss (build consensus!). While we had future-post-worthy conceptual discussions, we had a substantial agenda of get-it-done in front of us too.

Here’s a summary of key outcomes from the meeting:

1) We’ve established a tentative schedule for our first two meetings (12/3 and 12/17).

We’ve started building agendas for these two meetings.
We’ve also established rules for governance that include requiring members to do homework!

2) We’ve agreed it’s important to present a bylaws change to the committee for consideration by the board.

This change is to address confusion around how core is defined and possibly move towards the bylaws defining a core process not a list of core projects.
This is on an accelerated track because we’d like to include it with the Community Board Member elections.

3) We’ve broken DefCore into clear “cycles” so we can be clearer about concrete objectives and what items are out of scope for a cycle. We’re using names to designate cycles for clarity.

The first cycle, “Spider,” was about finding the connections between core issues and defining a process to resolve the tension in those connections.
This cycle, “Elephant,” is about breaking the Core definition into
The next cycle(s) will be named when we get there. For now, they are all “Future”
We agreed there is a lot of benefit from being clear to the community about items that we “kick down the road” for future cycles. And, yes, we will proactively cut off discussion of these items out of respect for time.

4) We reviewed the timeline proposed at the end of Spider and added it to the agenda.

The timeline assumes a staged introduction starting with Havana and accelerating for each release.
We are working the timeline backwards to ensure time for Board, TC and community input.

5) We agreed that consensus is going to be a focus for keeping things moving

This will likely drive to a smaller core definition
We will actively defer issues that cannot reach consensus in the Elephant cycle.

6) We identified some concepts that may help guide the process in this cycle

We likely need to create categories beyond “core” to help bucket tests
Committee discussion is needed but debate will be time limited

7) We identified the need to start on test criteria immediately

Board member John Zannos (in absentia) offered to help lead this effort
In defining test criteria, we are likely to have lively discussions about “OpenStack’s values”

8) We identified some out of scope topics that are important but too big to solve.

We are calling these “elephants” (or the elephant in the room).
The list of elephants needs to be agreed by DefCore and clearly communicated
We expect that the Elephant cycle will make discussing these topics more fruitful

9) We talked about RefStack code features

Allowing users to upload/post test results from their clouds to enable white box test reporting
Allowing users who have uploaded results to +/- vote on tests they think are important
We established a requirement that posting results requires an OpenStack ID
We established a requirement that only a single Corporate designate (provided by the Foundation) can make a result official for their company.
Collecting opt-in data with test results using tags for things like alternate implementations use, host operating system(s), deployment method, size of cloud, and hypervisor.
We discussed (but did not resolve) that it could be possible to have people run RefStack against public cloud end points and post their results
We agreed that RefStack needs to be able to run locally or as a hosted site.

10) We identified a lot of missing communication channels

We created a DefCore wiki page to be a home for information.
Joshua and I (and others?) will work with the Foundation staff to create a “what is core” video to help the community understand the Principles and objectives for the Elephant cycle.
We are in the process of setting up mail lists, IRC, blog tags, etc.

Yikes! That’s a lot of progress priming the pump for our first DefCore meeting!

* We picked “DefCore” for the core definition committee name. One overriding reason for the name is that it has very clean search results. Since the word “core” is so widely used, we wanted to make sure that commentary on this topic is easy to track against the noisy term core. We also liked 1) the reference to DefCon and 2) that the Urban Dictionary defines it as going deaf from standing too close to the speakers.

Crowbar HK Hack Report

Posted on November 3, 2013 by Rob H

Overall, I’m happy with our three days of hacking on Crowbar 2. We’ve reached the critical “deploys workload” milestone and I’m excited about how well the design is working and how clearly we’ve been able to articulate our approach in code & UI.

Of course, it’s worth noting again that Crowbar 1 has also had significant progress on OpenStack Havana workloads running on Ubuntu, Centos/RHEL, and SUSE/SLES.

Here are the focus items from the hack:

Documentation – cleaned up documentation specifically by updating the README in all the projects to point to the real documentation in an effort to help people find useful information faster. Reminder: if unsure, put documentation in barclamp-crowbar/doc!
Docker Integration for Crowbar 2 progress. You can now install Docker from internal packages on an admin node. We have a strategy for allowing containers to be workload nodes.
Ceph installed as workload is working. This workload revealed the need for UI improvements and additional flags for roles (hello “cluster”)
Progress on OpenSUSE and Fedora as Crowbar 2 install targets. This gets us closer to true multi-O/S support.
OpenSUSE 13.1 setup as a dev environment including tests. This is a target working environment.
Being 12 hours offset from the US really impacted remote participation.

One thing that became obvious during the hack is that we’ve reached a point in Crowbar 2 development where it makes sense to move the work into distinct repositories. There are build, organization and packaging changes that would simplify Crowbar 2 and make it easier to start using; however, we’ve been trying to maintain backwards compatibility with Crowbar 1. This is becoming impossible; consequently, it appears time to split them. Here are some items for consideration:

Crowbar 2 could collect barclamps into larger “workload” repos so there would be far fewer repos (although possibly still barclamps within a workload). For example, there would be a “core” set that includes all the current CB2 barclamps. OpenStack, Ceph and Hadoop would be their own sets.
Crowbar 2 would have a clearly named “build” or “tools” repo instead of having it called “crowbar”
Crowbar 2 framework would be either part of “core” or called “framework”
We would put these in a new organization (“Crowbar2” or “Crowbar-2”) so that the clutter of Crowbar’s current organization is avoided.

While we clearly need to break apart the repo, this suggestion needs more community discussion!

Looking to Leverage OpenStack Havana? Crowbar delivers 3xL!

Posted on October 29, 2013 by Rob H

The Crowbar community has a tradition of “day zero ops” community support for the latest OpenStack release at the summit using our pull-from-source capability. This release we’ve really gone the extra mile by doing it on THREE Linux distros (Ubuntu, RHEL & SLES) in parallel with a significant number of projects and new capabilities included.

I’m especially excited about Crowbar implementation of Havana Docker support which required advanced configuration with Nova and Glance. The community also added Heat and Ceilometer in the last release cycle plus High Availability (“Titanium”) deployment work is in active development. Did I mention that Crowbar is rocking OpenStack deployments? No, because it’s redundant to mention that. We’ll upload ISOs of this work for easy access later in the week.

While my team at Dell remains a significant contributor to this work, I’m proud to point to SUSE Cloud leadership and contributions also (including the new Ceph barclamp & integration). Crowbar has become a true multi-party framework!

Want to learn more? If you’re in Hong Kong, we are hosting a Crowbar Developer Community Meetup on Monday, November 4, 2013, 9:00 AM to 12:00 PM (HKT) in the SkyCity Marriott SkyZone Meeting Room. Dell, dotCloud/Docker, SUSE and others will lead a lively technical session to review and discuss the latest updates, advantages and future plans for the Crowbar Operations Platform. You can expect to see some live code demos, and participate in a review of the results of a recent Crowbar 2 hackathon. Confirm your seat here – space is limited! (I expect that we’ll also stream this event using Google Hangout, watch Twitter #Crowbar for the feed)

My team at Dell has a significant presence at the OpenStack Summit in Hong Kong (details about activities including sponsored parties). Be sure to seek out my fellow OpenStack Board Member Joseph George, Dell OpenStack Product Manager Kamesh Pemmaraju and Enstratius/Dell Multi-Cloud Manager Founder George Reese.

Note: The work referenced in this post is about Crowbar v1. We’ve also reached critical milestones with Crowbar v2 and will begin implementing Havana on that platform shortly.

OpenStack Neutron using Linux Bridges (technical explanation)

Posted on October 16, 2013 by Rob H

Apparently this is “Showcase Dell OpenStack/Crowbar Team Member Week” because today I’m proxy positioning for Dell OpenStack engineer Chris Dearborn. Chris has been leading our OpenStack Neutron deployment for Grizzly and Havana.

If you’re familiar with OpenStack Networking, skip over my introductory preamble and jump right down to the meat under “SDN Client Connection: Linux Bridge.” Hopefully we can convince Chris to put together more in this series and cover GRE and VLAN configurations too.

OpenStack and Software Defined Network

Software Defined Networking (SDN) is an emerging concept that describes a family of functionality. Like cloud, the exact meaning of SDN appears to be in the eye (or brochure) of the company providing the technology. Overall, the concept for SDN is to have programmable networks that can be automatically provisioned.

Early approaches to this used the OpenFlow™ API to programmatically modify switch routing tables (aka OSI Layer 2) on a flow by flow basis across multiple switches. While highly controlled, OpenFlow has proven difficult to implement at scale in dynamic environments; consequently, many SDN implementations are now using overlay networks based on inventoried VLANs and/or dynamic tunnels.

Inventoried VLAN overlay networks create a stable base layer 2 infrastructure that can be inventoried and handed out dynamically on-demand. Generally, the management infrastructure dynamically connects the end-points (typically virtual machines) to a dedicated existing layer 2 network. This provides all of the isolation desired without having to thrash the underlying network switch infrastructure.

Dynamic tunnel overlay networks also use client connection points to isolate traffic but do not rely on switch layer 2. Instead, they encapsulate traffic before sending it over a shared network. This avoids having to match dynamic networks to static inventory; however, it also adds substantial encapsulation overhead to the network communication. Consequently, tunnels provide more flexibility and require less up-front configuration but with lower performance.

OpenStack Networking, project Neutron (previously Quantum), is responsible for connecting virtual machines set up by OpenStack Compute (aka Nova) to the software defined networking infrastructure. By design, Neutron accommodates different implementation plug-ins. That allows operators to choose between different approaches including the addition of commercial offerings. While it is possible to use open source capabilities for small deployments and trials, most large scale deployments choose proprietary SDN technologies.

The Crowbar OpenStack installation allows operators to choose between “Open vSwitch GRE Tunnels” and “Linux Bridge VLAN” configurations. The GRE option is more flexible and requires less up-front configuration; however, the encapsulation used by GRE will degrade performance. The Linux Bridge VLAN option requires more up-front configuration and design.

Since GRE works with minimal configuration, let’s explore what’s required for Crowbar to set up OpenStack Neutron Linux Bridge VLAN networking.

Note: This review assumes that you already have a working knowledge of Crowbar and OpenStack.

Background

Before we dig into how OpenStack configures SDN, we need to understand how we connect between virtual machines running in the system and the physical network. This connection uses Linux Bridges. For GRE tunnels, Crowbar configures an Open vSwitch (aka OVS) on the node to create and manage the tunnels.

One challenge with SDN traffic isolation is that we can no longer assume that virtual machines with network access can reach destinations on our same network. This means that the infrastructure must provide paths (aka gateways and routers) between the tenant and infrastructure networks. A major part of the OpenStack configuration includes setting up these connections when new tenant networks are created.

Note: In the OpenStack Grizzly and earlier releases, the open source network routers were not configured in a highly available or redundant way. This problem is addressed in the Havana release.

For the purposes of this explanation, the “network node” is the shared infrastructure server that bridges networks. The “compute node” is any one of the servers hosting guest virtual machines. Traffic in the cloud can be between virtual machines within the cloud instance (internal) or between a virtual machine and something outside the OpenStack cloud instance (external).

Let’s make sure we’re on the same page with terminology.

OSI Layer 2 – just above physical connections (layer 1), layer 2 manages traffic between servers, including providing logical separation of traffic.
VLAN – Virtual Local Area Networks are switch-enforced isolation zones created by adding 1 of 4096 tags to the network traffic (aka tagged traffic).
Tenant – a group of users in a cloud that are logically isolated (cannot see other traffic or information) but still using shared resources.
Switch – a physical device used to provide layer 1 networking connections between end points. May provide additional services on other OSI layers such as VLANs.
Network Node – an OpenStack infrastructure server that connects tenant networks to infrastructure networks.
Compute Node – an OpenStack server that runs user workloads in virtual machines or containers.

SDN Client Connection: Linux Bridge

The VLAN range for Linux Bridge is configurable in /etc/quantum/quantum.conf by changing the network_vlan_ranges parameter. Note that this parameter is set by the Crowbar Neutron chef recipe. The VLAN range is configured to start at whatever the “vlan” attribute in the nova_fixed network in the bc-template-network.json is set to. The VLAN range end is hard coded to end at the VLAN start plus 2000.

Reminder: VLAN tags are 12-bit values and the highest usable tag is 4094 (0 and 4095 are reserved), so the VLAN tag for nova_fixed should never be set to anything greater than 2094.
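As a rough illustration of the range rule described above, here is a small Python sketch that computes the range from a starting tag. The function name and constants are hypothetical and are not taken from the Crowbar recipe; the 4094 ceiling assumes the IEEE 802.1Q convention that tags 0 and 4095 are reserved.

```python
from typing import Tuple

RANGE_SIZE = 2000        # hard-coded offset from the start tag (per the text)
MAX_USABLE_VLAN = 4094   # assumption: 802.1Q reserves tags 0 and 4095


def vlan_range(nova_fixed_vlan: int) -> Tuple[int, int]:
    """Return (start, end) of the VLAN range for a given nova_fixed tag."""
    start = nova_fixed_vlan
    end = start + RANGE_SIZE
    if start < 1 or end > MAX_USABLE_VLAN:
        raise ValueError(
            f"range {start}:{end} falls outside usable tags 1-{MAX_USABLE_VLAN}"
        )
    return start, end
```

With a starting tag of 500, for example, vlan_range(500) gives a range of 500 through 2500.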

Networks are assigned the next available VLAN tag as they are created. For instance, the first manually created network will be assigned VLAN 501, the next VLAN 502, etc. Note that this is independent of what tenant the new network resides in.
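One plausible reading of “next available” is “lowest unused tag in the range.” The sketch below models that reading; the class and its names are made up for illustration and do not reflect how Neutron stores allocations internally.

```python
from typing import Dict, Iterable, Optional


class VlanAllocator:
    """Hands out the lowest unused VLAN tag in a fixed range."""

    def __init__(self, start: int, end: int, in_use: Iterable[int] = ()) -> None:
        self.start = start
        self.end = end
        self.assigned: Dict[int, str] = {tag: "(pre-existing)" for tag in in_use}

    def allocate(self, network_name: str, tenant: Optional[str] = None) -> int:
        # The tenant is accepted but deliberately ignored: per the text,
        # tag assignment does not depend on which tenant owns the network.
        for tag in range(self.start, self.end + 1):
            if tag not in self.assigned:
                self.assigned[tag] = network_name
                return tag
        raise RuntimeError("no free VLAN tags left in the configured range")


# Assumption: the range starts at 500 and tag 500 is already taken,
# which matches the 501, 502 sequence in the example above.
allocator = VlanAllocator(500, 2500, in_use=[500])
first = allocator.allocate("net-a", tenant="tenant-1")
second = allocator.allocate("net-b", tenant="tenant-2")
```

Here the two networks belong to different tenants but still receive consecutive tags.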

The convention in Linux Bridge is to name the various network constructs including the first 11 characters of the UUID of the associated Neutron object. This allows you to run the quantum CLI command listing out the objects you are interested in, and grepping on the 11 uuid characters from the network construct name. This shows what Neutron object a given network construct maps to.
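The grep step can be mimicked in a few lines of Python. This is only a sketch: the UUIDs, labels and the construct name below are made up, and the dictionary stands in for whatever listing the CLI returns.

```python
from typing import Dict, List, Tuple

UUID_FRAGMENT_LEN = 11  # per the text: only the first 11 characters are used


def match_constructs(construct_name: str, objects: Dict[str, str]) -> List[Tuple[str, str]]:
    """Return (uuid, label) pairs whose 11-character fragment appears in the name."""
    return [
        (uuid, label)
        for uuid, label in objects.items()
        if uuid[:UUID_FRAGMENT_LEN] in construct_name
    ]


# Made-up UUIDs and labels, standing in for the output of a CLI listing.
ports = {
    "3f2a9c1e-7b4d-4e0a-9c57-0d8e6b1a2c34": "port for vm-1",
    "a81b0f22-19cc-4d6e-8f3a-5c2e7d9b0e11": "port for vm-2",
}
matches = match_constructs("tap3f2a9c1e-7b", ports)
```

Only the first entry matches, because its leading 11 characters (3f2a9c1e-7b) appear in the construct name.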

Network Creation

When a network is created, a corresponding bridge is created and is given the name br<network_uuid>. A subinterface of the NIC is also created and is named <interface_name>.<vlan_tag>. This subinterface is slaved to the bridge. Note that this only happens when the network is needed (when a VM is created on the network).

This occurs on both the network node and the compute nodes.
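To make the naming concrete, here is a minimal sketch that builds both names from their inputs. It assumes the bridge name uses the 11-character UUID fragment mentioned earlier; the helper names, the UUID and the eth0 interface are hypothetical.

```python
UUID_FRAGMENT_LEN = 11  # assumption: same 11-character fragment as other constructs


def bridge_name(network_uuid: str) -> str:
    """Bridge name: 'br' followed by the leading fragment of the network UUID."""
    return "br" + network_uuid[:UUID_FRAGMENT_LEN]


def subinterface_name(interface_name: str, vlan_tag: int) -> str:
    """Subinterface name: <interface_name>.<vlan_tag>."""
    return f"{interface_name}.{vlan_tag}"


# Made-up network UUID and NIC name, for illustration only.
bridge = bridge_name("9d4c1a70-2e5b-4f8a-b6c3-1a2b3c4d5e6f")
subif = subinterface_name("eth0", 501)
```

For this made-up network, the bridge would be br9d4c1a70-2e and the subinterface slaved to it would be eth0.501.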

Additional Steps Taken On The Network Node During Network Creation

On the network node, a bridge and subinterface are created per network and the subinterface is slaved to the bridge as described above. If the network is attached to the router, then a TAP interface that the router listens on is created and slaved to the bridge. If DHCP is selected, then another TAP interface is created that the dnsmasq process talks to, and that interface is also slaved to the bridge.
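The conditions above can be summarized as a small function that lists what ends up slaved to a network’s bridge on the network node. This is a descriptive model only; the function name and the member labels are invented for illustration.

```python
from typing import List


def network_node_bridge_members(
    subinterface: str, attached_to_router: bool, dhcp_enabled: bool
) -> List[str]:
    """List the interfaces slaved to one network's bridge on the network node."""
    members = [subinterface]            # always present once the network is in use
    if attached_to_router:
        members.append("router TAP")    # interface the router listens on
    if dhcp_enabled:
        members.append("dnsmasq TAP")   # interface the dnsmasq process talks to
    return members
```

A network with both a router attachment and DHCP therefore has three members on its bridge, while an isolated network without DHCP has only the subinterface.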

VM Creation On A Compute Node

When a VM is created, a TAP interface is created and named tap<port_uuid>. The port is the Neutron port that the VM is plugged in to. This TAP interface is slaved to the bridge associated with the network that the user selected when creating the VM. Note that this occurs on compute nodes only.

Determining the dnsmasq port/tap interface for a network

The TAP port associated with dnsmasq for a network can be determined by first getting the uuid of the network, then looking on the network node in /var/lib/quantum/dhcp/<network_uuid>/interface. The interface will be named ns-. Note that this is only the first 11 characters of the uuid. The tap interface will be named tap.
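The lookup can be scripted. The sketch below reads the interface file at the path given above; the function name is hypothetical, and the state_dir parameter exists only so the function can be tried against a scratch directory instead of a live network node.

```python
from pathlib import Path

DHCP_STATE_DIR = Path("/var/lib/quantum/dhcp")  # location given in the text


def dnsmasq_interface(network_uuid: str, state_dir: Path = DHCP_STATE_DIR) -> str:
    """Return the interface name recorded for a network's dnsmasq process."""
    interface_file = state_dir / network_uuid / "interface"
    return interface_file.read_text().strip()
```

On a network node, calling dnsmasq_interface with the full network UUID returns the contents of that network’s interface file.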

Summary

Understanding OpenStack Networking is critical to operating a successful cloud deployment. The Crowbar Team at Dell has invested significant effort to automate the configuration of Neutron. This helps you eliminate the risk of manual configuration and leverage our extensive testing and field experience.

If you are interested in seeing the exact sequences used by Crowbar, please visit the Crowbar Github repository for the “Quantum Barclamp.”

Rob Hirschfeld

On Computing, Containers, Cloud & Tech Culture
