You need a Squid Proxy fabric! Getting Ready State Best Practices

Sometimes solving a small problem well makes a huge impact for operators.  Talking to operators, it appears that automated configuration of Squid does exactly that.

Not a SQUID but...

If you were installing OpenStack or Hadoop, you would not find “set up a squid proxy fabric to optimize your package downloads” in the install guide.  That’s simply out of scope for those guides; however, it’s essential operational guidance.  That’s what I mean by open operations and creating a platform for sharing best practice.

Deploying a base operating system (e.g., CentOS) on a lot of nodes creates bit-tons of identical internet traffic.  By default, each node will attempt to reach internet mirrors for packages.  If you multiply that by even 10 nodes, that’s a lot of traffic and a significant performance impact if your connection is limited.

For OpenCrowbar developers, external package resolution means that each dev/test cycle with a node boot (which happens up to 10+ times a day) is bottlenecked.  For QA and install testing, the problem is even worse!

Our solution was 1) to embed Squid proxies into the configured environments and 2) to automatically configure nodes to use the proxies.  By making this behavior the default, we improve the overall performance of a deployment.  It also improves the overall network topology of the operating environment while adding better control of traffic.
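As a concrete illustration of the node-side change, here is a minimal sketch (not Crowbar’s actual Chef recipe) of what “automatically configure nodes to use the proxies” amounts to for the package managers; the proxy hostname and target files are assumptions for illustration.

```python
# Hypothetical sketch: route package downloads through a local Squid proxy
# instead of public mirrors. OpenCrowbar drives the real change through its
# Chef orchestration; the URL and file paths below are illustrative only.

PROXY_URL = "http://squid.example.local:3128"  # assumed Squid endpoint

def yum_proxy_stanza(proxy_url: str) -> str:
    """Line appended to /etc/yum.conf on CentOS-family nodes."""
    return f"proxy={proxy_url}\n"

def apt_proxy_stanza(proxy_url: str) -> str:
    """Directive dropped into /etc/apt/apt.conf.d/01proxy on Ubuntu nodes."""
    return f'Acquire::http::Proxy "{proxy_url}";\n'

if __name__ == "__main__":
    # Printed here rather than written, so the sketch is safe to run anywhere.
    print(yum_proxy_stanza(PROXY_URL), end="")
    print(apt_proxy_stanza(PROXY_URL), end="")
```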

This is a great example of how Crowbar uses existing operational tool chains (Chef configures Squid) in best practice ways to solve operations problems.  The magic is not in the tool or the configuration, it’s that we’ve included it in our out-of-the-box default orchestrations.

It’s time to stop fumbling around in the operational dark.  We need to compose our tool chains in an automated way!  This is how we advance operational best practice for ready state infrastructure.

OpenCrowbar Design Principles: Attribute Injection [Series 6 of 6]

This is part 5 of 6 in a series discussing the principles behind the “ready state” and other concepts implemented in OpenCrowbar.  The content is reposted from the OpenCrowbar docs repo.

Attribute Injection

Attribute Injection is an essential aspect of the “FuncOps” story because it helps establish the clean boundaries needed to implement consistent scripting behavior across divergent sites.

It also allows Crowbar to abstract and isolate provisioning layers. This operational approach means that deployments are composed of layered services (see emergent services) instead of locked “golden” images. The layers can be maintained independently and allow users to compose specific configurations à la carte. This approach works if the layers have clean functional boundaries (FuncOps) that can be scoped and managed atomically.

To explain how Attribute Injection accomplishes this, we need to explore why search became an anti-pattern in Crowbar v1. Originally, being able to use server-based search functions in operational scripting was a critical feature. It allowed individual nodes to act as part of a system by searching for global information needed to make local decisions. This greatly aided Crowbar’s mission of system-level configuration; however, it also created significant hidden interdependencies between scripts. As Crowbar v1 grew in complexity, searches became more and more difficult to maintain because they were hard to correctly scope, hard to centrally manage and prone to timing issues.

Crowbar was not unique in dealing with this problem – the Attribute Injection pattern has become a preferred alternative to search in integrated community cookbooks.

Attribute Injection in OpenCrowbar works by establishing specific inputs and outputs for all state actions (NodeRole runs). By declaring the exact inputs needed and outputs provided, Crowbar can better manage each annealing operation. This control includes deployment scoping boundaries, the time sequencing of information, and the override or substitution of inputs based on execution paths.
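As a rough sketch of the pattern (this is not OpenCrowbar’s actual API), a state action that declares its inputs and outputs might look like the following; the attribute names and values are hypothetical.

```python
# Illustrative attribute injection: the run consumes only the attributes the
# orchestrator injected and returns declared outputs, instead of searching a
# shared server for global state at run time.

from typing import Dict

def configure_database_client(inputs: Dict[str, str]) -> Dict[str, str]:
    """A NodeRole-style run with explicit inputs and outputs (hypothetical)."""
    db_host = inputs["database/host"]   # injected by the annealer
    db_port = inputs["database/port"]   # injected by the annealer
    connection_string = f"postgresql://{db_host}:{db_port}/app"
    # Declared outputs become available to downstream runs.
    return {"app/database_url": connection_string}

# The orchestrator supplies exactly the declared inputs, which is what makes
# scoping, sequencing, and overrides controllable.
print(configure_database_client({"database/host": "10.0.0.5",
                                 "database/port": "5432"}))
```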

This concept is not unique to Crowbar. It has become best practice for operational scripts. Crowbar simply extends the paradigm to the system and orchestration levels.

Attribute Injection enables operations to be:

  • Atomic – only the information needed for the operation is provided so risk of “bleed over” between scripts is minimized. This is also a functional programming preference.
  • Isolated &amp; Idempotent – risk of accidentally picking up changed information from previous runs is reduced by controlling the inputs. That makes it more likely that scripts can be idempotent.
  • Cleanly Scoped – information passed into operations can be limited based on system deployment boundaries instead of search parameters. This allows the orchestration to manage when and how information is added into configurations.
  • Easy to troubleshoot – since the information is limited and controlled, it is easier to recreate runs for troubleshooting. This is a substantial value for diagnostics.

OpenCrowbar.Anvil released – hammering out a gold standard in open bare metal provisioning

I’m excited to announce OpenCrowbar’s first release, Anvil, to the community.  Looking back on our original design from June 2012, we’ve accomplished all of our original objectives and more.
Now that we’ve got the foundation ready, our next release (OpenCrowbar Broom) focuses on workload development on top of the stable Anvil base.  This means that we’re ready to start working on OpenStack, Ceph and Hadoop.  So far, we’ve limited engagement on workloads to ensure that those developers would not also be trying to keep up with core changes.  We follow emergent design so I’m certain we’ll continue to evolve the core; however, we believe the Anvil release represents a solid foundation for workload development.
There is no more comprehensive open bare metal provisioning framework than OpenCrowbar.  The project’s focus on a complete operations model that comprehends hardware and network configuration with just enough orchestration delivers on a system vision that sets it apart from any other tool.  Yet, Crowbar also plays nicely with others by embracing, not replacing, DevOps tools like Chef and Puppet.
Now that the core is proven, we’re porting the Crowbar v1 RAID and BIOS configuration into OpenCrowbar.  By design, we’ve kept hardware support separate from the core because we’ve learned that hardware generation cycles need to be independent from the operations control infrastructure.  Decoupling them eliminates the release disruptions that we experienced in Crowbar v1 and makes it much easier to incorporate hardware from a broad range of vendors.
Here are some key components of Anvil:
  • UI, CLI and API stable and functional
  • Boot and discovery process working PLUS ability to handle pre-populating and configuration
  • Chef and Puppet capabilities, including Berkshelf v3 support to pull in community upstream DevOps scripts
  • Docker, VMs and Physical Servers
  • Crowbar’s famous “late-bound” approach to configuration and, critically, networking setup
  • IPv6 native, Ruby 2, Rails 4, preliminary scale tuning
  • Remarkably flexible and transparent orchestration (the Annealer)
  • Multi-OS deployment capability: Ubuntu, CentOS, or different versions of the same OS
Getting the workloads ported is still a tremendous amount of work, but the rewards are tremendous.  With OpenCrowbar, the community has a new way to collaborate on and integrate this work.  It’s important to understand that while our goal is to start a quarterly release cycle for OpenCrowbar, the workload release cycles (including hardware) are NOT tied to OpenCrowbar.  The workloads choose which OpenCrowbar release they target.  From Crowbar v1, we’ve learned that Crowbar needed to be independent of the workload releases, so we want OpenCrowbar to focus on maintaining a strong ops platform.
This release marks four years of hard-earned Crowbar v1 deployment experience and two years of v2 design, redesign and implementation.  I’ve talked with DevOps teams from all over the world and listened to their pains and needs.  We have a long way to go before we’re deploying 1000-node OpenStack and Hadoop clusters, but OpenCrowbar Anvil significantly moves the needle in that direction.
Thanks to the Crowbar community (Dell and SUSE especially) for nurturing the project, and congratulations to the OpenCrowbar team for getting us to this amazing place.

 

OpenStack Neutron using Linux Bridges (technical explanation)

Apparently this is “Showcase Dell OpenStack/Crowbar Team Member Week” because today I’m proxy posting for Dell OpenStack engineer Chris Dearborn.  Chris has been leading our OpenStack Neutron deployment for Grizzly and Havana.

If you’re familiar with OpenStack Networking, skip over my introductory preamble and jump right down to the meat under “SDN Client Connection: Linux Bridge.”  Hopefully we can convince Chris to put together more posts in this series and cover GRE and VLAN configurations too.

OpenStack and Software Defined Network

Software Defined Networking (SDN) is an emerging concept that describes a family of functionality.  Like cloud, the exact meaning of SDN appears to be in the eye (or brochure) of the company providing the technology.  Overall, the concept for SDN is to have programmable networks that can be automatically provisioned.

Early approaches to this used the OpenFlow™ API to programmatically modify switch routing tables (aka OSI Layer 2) on a flow-by-flow basis across multiple switches.  While highly controlled, OpenFlow has proven difficult to implement at scale in dynamic environments; consequently, many SDN implementations are now using overlay networks based on inventoried VLANs and/or dynamic tunnels.

Inventoried VLAN overlay networks create a stable base layer 2 infrastructure that can be inventoried and handed out dynamically on-demand.  Generally, the management infrastructure dynamically connects the end-points (typically virtual machines) to a dedicated existing layer 2 network.  This provides all of the isolation desired without having to thrash the underlying network switch infrastructure.

Dynamic tunnel overlay networks also use client connection points to isolate traffic but do not rely on switch layer 2.  Instead, they encapsulate traffic in tunnels before sending it over a shared network.  This avoids having to match dynamic networks to a static inventory; however, it also adds encapsulation overhead to the network communication.  Consequently, tunnels provide more flexibility and less up-front configuration but with lower performance.

OpenStack Networking, project Neutron (previously Quantum), is responsible for connecting virtual machines set up by OpenStack Compute (aka Nova) to the software defined networking infrastructure.  By design, Neutron accommodates different implementation plug-ins.  That allows operators to choose between different approaches, including the addition of commercial offerings.  While it is possible to use open source capabilities for small deployments and trials, most large scale deployments choose proprietary SDN technologies.

The Crowbar OpenStack installation allows operators to choose between “Open vSwitch GRE Tunnels” and “Linux Bridge VLAN” configurations.  The GRE option is more flexible and requires less up-front configuration; however, the tunnel encapsulation used by GRE will degrade performance.  The Linux Bridge VLAN option requires more up-front configuration and design.

Since GRE works with minimal configuration, let’s explore what’s required for Crowbar to set up OpenStack Neutron Linux Bridge VLAN networking.

Note: This review assumes that you already have a working knowledge of Crowbar and OpenStack.

Background

Before we dig into how OpenStack configures SDN, we need to understand how virtual machines running in the system connect to the physical network.  This connection uses Linux Bridges.  For GRE tunnels, Crowbar configures an Open vSwitch (aka OVS) on the node to create and manage the tunnels.

One challenge with SDN traffic isolation is that we can no longer assume that virtual machines with network access can reach destinations on our same network.  This means that the infrastructure must provide paths (aka gateways and routers) between the tenant and infrastructure networks.  A major part of the OpenStack configuration includes setting up these connections when new tenant networks are created.

Note: In the OpenStack Grizzly and earlier releases, the open source network routers were not configured in a highly available or redundant way.  This problem is addressed in the Havana release.

For the purposes of this explanation, the “network node” is the shared infrastructure server that bridges networks.  The “compute node” is any one of the servers hosting guest virtual machines.  Traffic in the cloud can be between virtual machines within the cloud instance (internal) or between a virtual machine and something outside the OpenStack cloud instance (external).

Let’s make sure we’re on the same page with terminology.

  • OSI Layer 2 – just above the physical connections (Layer 1), Layer 2 manages traffic between servers, including providing logical separation of traffic.
  • VLAN – Virtual Local Area Networks are switch-enforced isolation zones created by adding one of 4,096 tags to the network traffic (aka tagged traffic).
  • Tenant – a group of users in a cloud that are logically isolated (cannot see other traffic or information) but still using shared resources.
  • Switch – a physical device used to provide layer 1 networking connections between end points.  May provide additional services on other OSI layers such as VLANs.
  • Network Node – an OpenStack infrastructure server that connects tenant networks to infrastructure networks.
  • Compute Node – an OpenStack server that runs user workloads in virtual machines or containers.

SDN Client Connection: Linux Bridge 


The VLAN range for Linux Bridge is configurable in /etc/quantum/quantum.conf by changing the network_vlan_ranges parameter.  Note that this parameter is set by the Crowbar Neutron chef recipe.  The VLAN range starts at whatever the “vlan” attribute of the nova_fixed network in bc-template-network.json is set to.  The end of the range is hard-coded to the start plus 2000.

Reminder: There are only 4,096 possible VLAN tags, so the VLAN tag for nova_fixed should never be set to anything greater than 2095 to be safe.

Networks are assigned the next available VLAN tag as they are created.  For instance, the first manually created network will be assigned VLAN 501, the next VLAN 502, and so on.  Note that this is independent of which tenant the new network resides in.
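A small sketch of that bookkeeping, assuming the nova_fixed “vlan” attribute is 500 (matching the 501/502 example above) and that the nova_fixed tag itself stays out of the tenant pool; this is illustrative, not Crowbar’s implementation.

```python
# VLAN allocation as described above: a fixed span of 2000 tags starting just
# above the nova_fixed tag, handed out in order regardless of tenant.

NOVA_FIXED_VLAN = 500  # assumed value of the nova_fixed "vlan" attribute
TENANT_VLANS = range(NOVA_FIXED_VLAN + 1, NOVA_FIXED_VLAN + 2001)  # 501..2500

def next_vlan(allocated: set) -> int:
    """Return the next free tag in the configured range."""
    for tag in TENANT_VLANS:
        if tag not in allocated:
            return tag
    raise RuntimeError("tenant VLAN range exhausted")

allocated: set = set()
for _ in range(2):
    tag = next_vlan(allocated)
    allocated.add(tag)
    print(tag)   # prints 501, then 502
```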

The convention in Linux Bridge is to name the various network constructs using the first 11 characters of the UUID of the associated Neutron object.  This allows you to run the quantum CLI command that lists the objects you are interested in and grep for the 11 UUID characters found in the network construct’s name.  This shows which Neutron object a given network construct maps to.
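Here is a sketch of how that 11-character convention lets you map a device name seen on the host back to a Neutron object; the UUIDs and names are made up for illustration.

```python
# Match bridge/tap device names against the first 11 characters of Neutron
# object UUIDs (as reported by the quantum CLI). All values are hypothetical.

def uuid_fragment(uuid: str) -> str:
    """The 11-character prefix embedded in device names."""
    return uuid[:11]

# Pretend output of `quantum net-list` and `quantum port-list`.
neutron_objects = {
    "3f9c2a7e-8b14-4d6a-9c01-5e2f7a8b9c0d": "network: tenant-net-1",
    "7d21e6b0-44aa-4f3c-8a9d-0b1c2d3e4f5a": "port: vm-1 eth0",
}

def identify(device_name: str) -> str:
    """Map a device name (bridge, tap, etc.) to its Neutron object."""
    for uuid, description in neutron_objects.items():
        if uuid_fragment(uuid) in device_name:
            return description
    return "unknown"

print(identify("br3f9c2a7e-8b"))    # -> network: tenant-net-1
print(identify("tap7d21e6b0-44"))   # -> port: vm-1 eth0
```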


Network Creation

When a network is created, a corresponding bridge is created and given the name br<network_uuid>.  A subinterface of the NIC is also created and is named <interface_name>.<vlan_tag>.  This subinterface is slaved to the bridge.  Note that this only happens when the network is first needed (i.e., when a VM is created on the network).
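For example (hypothetical values, with the bridge name truncated to the 11-character UUID fragment described in the naming convention above):

```python
# Names created for a new network, per the description above; values made up.

network_uuid = "3f9c2a7e-8b14-4d6a-9c01-5e2f7a8b9c0d"
nic, vlan_tag = "eth1", 501

bridge = f"br{network_uuid[:11]}"   # -> br3f9c2a7e-8b
subinterface = f"{nic}.{vlan_tag}"  # -> eth1.501, slaved to the bridge
print(bridge, subinterface)
```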

This occurs on both the network node and the compute nodes.

Additional Steps Taken On The Network Node During Network Creation

On the network node, a bridge and subinterface are created per network and the subinterface is slaved to the bridge as described above.  If the network is attached to the router, then a TAP interface that the router listens on is created and slaved to the bridge.  If DHCP is selected, then another TAP interface is created that the dnsmasq process talks to, and that interface is also slaved to the bridge.

VM Creation On A Compute Node

When a VM is created, a TAP interface is created and named tap<port_uuid>.  The port is the Neutron port that the VM is plugged in to.  This TAP interface is slaved to the bridge associated with the network that the user selected when creating the VM.  Note that this occurs on compute nodes only.

Determining the dnsmasq port/tap interface for a network

The TAP port associated with dnsmasq for a network can be determined by first getting the uuid of the network, then looking on the network node in /var/lib/quantum/dhcp/<network_uuid>/interface.  The interface named in that file will be of the form ns-<uuid fragment>; note that the fragment is only the first 11 characters of the uuid.  The corresponding tap interface will be named tap<same uuid fragment>.
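A minimal sketch of that lookup, assuming the file layout described above and that the tap device shares the same 11-character UUID fragment as the ns- interface:

```python
# Derive the dnsmasq tap device for a network from the interface file that
# the text describes. Paths and naming are as stated above; run on the
# network node.

from pathlib import Path

def dnsmasq_tap_interface(network_uuid: str,
                          dhcp_dir: str = "/var/lib/quantum/dhcp") -> str:
    interface_file = Path(dhcp_dir) / network_uuid / "interface"
    ns_name = interface_file.read_text().strip()   # e.g. "ns-1a2b3c4d5e6"
    return "tap" + ns_name[len("ns-"):]            # e.g. "tap1a2b3c4d5e6"

# Usage:
# print(dnsmasq_tap_interface("3f9c2a7e-8b14-4d6a-9c01-5e2f7a8b9c0d"))
```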


Summary

Understanding OpenStack Networking is critical to operating a successful cloud deployment.  The Crowbar Team at Dell has invested significant effort to automate the configuration of Neutron.  This helps you eliminate the risk of manual configuration and leverage our extensive testing and field experience.

If you are interested in seeing the exact sequences used by Crowbar, please visit the Crowbar Github repository for the “Quantum Barclamp.”

7 takeaways from DevOps Days Austin


I spent Tuesday and Wednesday at DevOpsDays Austin and continue to be impressed with the enthusiasm and collaborative nature of the DOD events.  We also managed to have a very robust and engaged twitter backchannel thanks to an impressive pace set by Gene Kim!

I’ve still got a 5+ post backlog from the OpenStack summit, but wanted to do a quick post while it’s top of mind.

My takeaways from DevOpsDays Austin:

  1. DevOpsDays spends a lot of time talking about culture.  I’m a huge believer in the importance of culture as the foundation for the type of fundamental changes that we’re making in the IT industry; however, it’s also a sign that we’re still in the minority if we have to talk about culture evangelism.
  2. Process and DevOps are tightly coupled.  It’s very clear that Lean/Agile/Kanban are essential for DevOps success (nice job by Dominica DeGrandis).  No one even suggested DevOps+Waterfall as a joke (but Patrick Debois had a picture of a xeroxed butt in his preso which is pretty close).
  3. Still need more dev people to show up!  My feeling is that we’ve got a lot of operators who are engaging with developers and fewer developers who are engaging with operators (the “opsdev” people).
  4. The Chef Omnibus installer is very compelling.  This approach addresses issues with packaging that were created because we did not have configuration management.  Now that we have good tooling, we can separate the concerns between bits, configuration, services and dependencies.  This is one thing to watch and something I expect to see in Crowbar.
  5. The old mantra still holds: If something is hard, do it more often.
  6. Eli Goldratt’s The Goal is alive again thanks to Gene Kim’s smart new novel, The Phoenix Project, about DevOps and IT (I highly recommend both; start with Kim’s).
  7. Not DevOps, but 3D printing is awesome.  This is clearly a game changing technology; however, it takes some effort to get right.  Dell brought a Solidoodle 3D printer to the event to try and print OpenStack & Crowbar logos (watch for this in the future).

I’d be interested in hearing what other people found interesting!  Please comment here and let me know.

5 things keeping DevOps from playing well with others (Chef, Crowbar and Upstream Patterns)

Since my earliest days on the OpenStack project, I’ve wanted to break the cycle of black box operations with open ops. With the rise of community-driven DevOps platforms like Opscode Chef and Puppet Labs, we’ve reached a point where it’s both practical and imperative to share operational practices in the form of code and tooling.

Being open and collaborating are not the same thing.

It’s a huge win that we can compare OpenStack cookbooks. The real victory comes when multiple deployments use the same trunk instead of forking.

This has been an objective I’ve helped drive for OpenStack (with Matt Ray); it has been a Crowbar objective from the start and is the keystone of our Crowbar 2 work.

This has proven to be a formidable challenge for several reasons:

  1. diverging DevOps patterns across private, public, large, small, and other deployments -> solution: the attribute injection pattern is promising
  2. tooling gaps prevent operators from leveraging shared deployments -> solution: this is part of Crowbar’s mission
  3. under-investing in community-supporting features because they are seen as taking away from getting into production -> solution: this needs leadership, and others will join
  4. drift between target versions creates the need for forking even if the cookbooks are fundamentally the same -> solution: pull-from-source approaches help create distro-independent baselines
  5. missing reference architectures interfere with having a stable baseline to deploy against -> solution: agree on a standard, machine-consumable RA format like OpenStack Heat.

Unfortunately, these five challenges are tightly coupled and we have to progress on them simultaneously. The tooling and community require patterns and RAs.

The good news is that we are making real progress.

Judd Maltin (@newgoliath), a Crowbar team member, has documented the emerging Attribute Injection practice that Crowbar has been leading. That practice has been refined in the open by AT&T and Rackspace. It is forming the foundation of the OpenStack cookbooks.

Understanding, discussing and supporting that pattern is an important step toward accelerating open operations. Please engage with us as we make the investments for open operations and help us implement the pattern.

OpenStack Summit: Let’s talk DevOps, Fog, Upgrades, Crowbar & Dell

If you are coming to the OpenStack summit in San Diego next week, please find me at the show! I want to hear from you about the Foundation, community, OpenStack deployments, Crowbar and anything else.  Oh, and I just ordered a handful of Crowbar stickers if you want some CB bling.

Matt Ray (Opscode), Jason Cannavale (Rackspace) and I were the Ops track co-chairs. If you have suggestions, we want to hear them. We managed to get great speakers and also some interesting sessions, like the DevOps panel and the upstreaming deploy working sessions. It’s only on Monday and Tuesday, so don’t snooze or you’ll miss it.

My team from Dell has a lot going on, so there are lots of chances to connect with us:

At the Dell booth, Randy Perryman will be sharing field experience about hardware choices. We’ve got a lot of OpenStack battle experience and we want to compare notes with you.

I’m in the board meeting on Monday, so I’ll likely be occupied until the Mirantis party.

See you in San Diego!

PS: My team is hiring for Dev, QA and Marketing. Let me know if you want details.