12 Predictions for ’16: mono-cloud ambitions die as containers drive more hybrid IT

I expect 2016 to be a confusing year for everyone in IT.  For 2015, I predicted that new uses for containers are going to upset cloud’s apple cart; however, the replacement paradigm is not clear yet.  Consequently, I’m doing a prognostication mix and match: five predictions and seven items on a “container technology watch list.”

TL;DR: In 2016, Hybrid IT arrives on Containers’ wings.

Considering my expectations below, I think it’s time to accept that all IT is heterogeneous and stop trying to box everything into a mono-cloud.  Accepting hybrid as current state unblocks many IT decisions that are waiting for things to settle down.

Here’s the memo: “Stop waiting.  It’s not going to converge.”

2016 Predictions

  1. Container Adoption Seen As Two Stages:  We will finally accept that Containers have strength for both infrastructure (first stage adoption) and application life-cycle (second stage adoption) transformation.  Stage one offers value so we will start talking about legacy migration into containers without shaming teams that are not also rewriting apps as immutable microservice unicorns.
  2. OpenStack continues to bump and grow.  Adoption is up and open alternatives are disappearing.  For dedicated/private IaaS, OpenStack will continue to gain in 2016 for basic VM management.  Both competitive and internal pressures continue to threaten the project but I believe they will not emerge in 2016.  Here’s my complete OpenStack 2016 post?
  3. Amazon, GCE and Azure make everything else questionable.  These services are so deep and rich that I’d question anyone who is not using them.  At least one of them simply have to be part of everyone’s IT strategy for financial, talent and technical reasons.
  4. Cloud API becomes irrelevant. Cloud API is so 2011!  There are now so many reasonable clients to abstract various Infrastructures that Cloud APIs are less relevant.  Capability, interoperability and consistency remain critical factors, but the APIs themselves are not interesting.
  5. Metal aaS gets interesting.  I’m a big believer in the power of operating metal via an API and the RackN team delivers it for private infrastructure using Digital Rebar.  Now there are several companies (Packet.net, Ubiquity Hosting and others) that offer hosted metal.

2016 Container Tech Watch List

I’m planning posts about all these key container ecosystems for 2016.  I think they are all significant contributors to the emerging application life-cycle paradigm.

  1. Service Containers (& VMs): There’s an emerging pattern of infrastructure managed containers that provide critical host services like networking, logging, and monitoring.  I believe this pattern will provide significant value and generate it’s own ecosystem.
  2. Networking & Storage Services: Gaps in networking and storage for containers need to get solved in a consistent way.  Expect a lot of thrash and innovation here.
  3. Container Orchestration Services: This is the current battleground for container mind share.  Kubernetes, Mesos and Docker Swarm get headlines but there are other interesting alternatives.
  4. Containers on Metal: Removing the virtualization layer reduces complexity, overhead and cost.  Container workloads are good choices to re-purpose older servers that have too little CPU or RAM to serve as VM hosts.  Who can say no to free infrastructure?!  While an obvious win to many, we’ll need to make progress on standardized scale and upgrade operations first.
  5. Immutable Infrastructure: Even as this term wins the “most confusing” concept in cloud award, it is an important one for container designers to understand.  The unfortunate naming paradox is that immutable infrastructure drives disciplines that allow fast turnover, better security and more dynamic management.
  6. Microservices: The latest generation of service oriented architecture (SOA) benefits from a new class of distribute service registration platforms (etcd and consul) that bring new life into SOA.
  7. Paywall Registries: The important of container registries is easy to overlook because they seem to be version 2.0 of package caches; however, container layering makes these services much more dynamic and central than many realize.  (more?  Bernard Golden and I already posted about this)

What two items did not make the 2016 cut?  1) Special purpose container-focused operating systems like CoreOS or RancherOS.  While interesting, I don’t think these deployment technologies have architectural level influence.  2) Container Security via VMs. I’m seeing patterns where containers may actually be more secure than VMs.  This is FUD created by people with a vested interest in virtualization.

Did I miss something? I’d love to know what you think I got right or wrong!

2015, the year cloud died. Meet the seven riders of the cloudocalypse

i can hazAfter writing pages of notes about the impact of Docker, microservice architectures, mainstreaming of Ops Automation, software defined networking, exponential data growth and the explosion of alternative hardware architecture, I realized that it all boils down to the death of cloud as we know it.

OK, we’re not killing cloud per se this year.  It’s more that we’ve put 10 pounds of cloud into a 5 pound bag so it’s just not working in 2015 to call it cloud.

Cloud was happily misunderstood back in 2012 as virtualized infrastructure wrapped in an API beside some platform services (like object storage).

That illusion will be shattered in 2015 as we fully digest the extent of the beautiful and complex mess that we’ve created in the search for better scale economics and faster delivery pipelines.  2015 is going to cause a lot of indigestion for CIOs, analysts and wandering technology executives.  No one can pick the winners with Decisive Leadership™ alone because there are simply too many possible right ways to solve problems.

Here’s my list of the seven cloud disrupting technologies and frameworks that will gain even greater momentum in 2015:

  1. Docker – I think that Docker is the face of a larger disruption around containers and packaging.  I’m sure Docker is not the thing alone.  There are a fleet of related technologies and Docker replacements; however, there’s no doubt that it’s leading a timely rethinking of application life-cycle delivery.
  2. New languages and frameworks – it’s not just the rapid maturity of Node.js and Go, but the frameworks and services that we’re building (like Cloud Foundry or Apache Spark) that change the way we use traditional languages.
  3. Microservice architectures – this is more than containers, it’s really Functional Programming for Ops (aka FuncOps) that’s a new generation of service oriented architecture that is being empowered by container orchestration systems (like Brooklyn or Fleet).  Using microservices well seems to redefine how we use traditional cloud.
  4. Mainstreaming of Ops Automation – We’re past “if DevOps” and into the how. Ops automation, not cloud, is the real puppies vs cattle battle ground.  As IT creates automation to better use clouds, we create application portability that makes cloud disappear.  This freedom translates into new choices (like PaaS, containers or hardware) for operators.
  5. Software defined networking – SDN means different things but the impacts are all the same: we are automating networking and integrating it into our deployments.  The days of networking and compute silos are ending and that’s going to change how we think about cloud and the supporting infrastructure.
  6. Exponential data growth – you cannot build applications or infrastructure without considering how your storage needs will grow as we absorb more data streams and internet of things sources.
  7. Explosion of alternative hardware architecture – In 2010, infrastructure was basically pizza box or blade from a handful of vendors.  Today, I’m seeing a rising tide of alternatives architectures including ARM, Converged and Storage focused from an increasing cadre of sources including vendors sharing open designs (OCP).  With improved automation, these new “non-cloud” options become part of the dynamic infrastructure spectrum.

Today these seven items create complexity and confusion as we work to balance the new concepts and technologies.  I can see a path forward that redefines IT to be both more flexible and dynamic while also being stable and performing.

Want more 2015 predictions?  Here’s my OpenStack EOY post about limiting/expanding the project scope.

SDN’s got Blind Spots! What are these Projects Ignoring? [Guest Post by Scott Jensen]

Scott Jensen returns as a guest poster about SDN!  I’m delighted to share his pointed insights that expand on previous 2 Part serieS about NFV and SDN.  I especially like his Rumsfeldian “unknowable workloads”

In my [Scott’s] last post, I talked about why SDN is important in cloud environments; however, I’d like to challenge the underlying assumption that SDN cures all ops problems.

SDN implementations which I have looked at make the following base assumption about the physical network.  From the OpenContrails documentation:

The role of the physical underlay network is to provide an “IP fabric” – its responsibility is to provide unicast IP connectivity from any physical device (server, storage device, router, or switch) to any other physical device. An ideal underlay network provides uniform low-latency, non-blocking, high-bandwidth connectivity from any point in the network to any other point in the network.

The basic idea is to build an overlay network on top of the physical network in order to utilize a variety of protocols (Netflow, VLAN, VXLAN, MPLS etc.) and build the networking infrastructure which is needed by the applications and more importantly allow the applications to modify this virtual infrastructure to build the constructs that they need to operate correctly.

All well and good; however, what about the Physical Networks?

Under Provisioned / FunnyEarth.comThat is where you will run into bandwidth issues, QOS issues, latency differences and where the rubber really meets the road.  Ignoring the physical networks configuration can (and probably will) cause the entire system to perform poorly.

Does it make sense to just assume that you have uniform low latency connectivity to all points in the network?  In many cases, it does not.  For example:

  • Accesses to storage arrays have a different traffic pattern than a distributed storage system.
  • Compute resources which are used to house VMs which are running web applications are different than those which run database applications.
  • Some applications are specifically sensitive to certain networking issues such as available bandwidth, Jitter, Latency and so forth.
  • Where others will perform actions over the network at certain times of the day but then will not require the network resources for the rest of the day.  Classic examples of this are system backups or replication events.

Over Provisioned / zilya.netIf the infrastructure you are trying to implement is truly unknown as to how it will be utilized then you may have no choice than to over-provision the physical network.  In building a public cloud, the users will run whichever application they wish it may not be possible to engineer the appropriate traffic patterns.

This unknowable workload is exactly what these types of SDN projects are trying to target!

When designing these systems you do have a good idea of how it will be utilized or at least how specific portions of the system will be utilized and you need to account for that when building up the physical network under the SDN.

It is my belief that SDN applications should not just create an overlay.  That is part of the story, but should also take into account the physical infrastructure and assist with modifying the configuration of the Physical devices.  This balance achieves the best use of the network for both the applications which are running in the environment AND for the systems which they run on or rely upon for their operations.

Correctly ProvisionedWe need to reframe our thinking about SDN because we cannot just keep assuming that the speeds of the network will follow Moore’s Law and that you can assume that the Network is an unlimited resource.

Networking in Cloud Environments, SDN, NFV, and why it matters [part 2 of 2]

scott_jensen2Scott Jensen is an Engineering Director and colleague of mine from Dell with deep networking and operations experience.  He had first hand experience deploying OpenStack and Hadoop and has a critical role in defining Dell’s Reference Architectures in those areas.  When I saw this writeup about cloud networking (first post), I asked if it would be OK to post it here and share it with you.

GUEST POST 2 OF 2 BY SCOTT JENSEN:

So what is different about Cloud and how does it impact on the network

In a traditional data center this was not all that difficult (relatively).  You knew what was going to running on what system (physically) and could plan your infrastructure accordingly.  The majority of the traffic moved in a North/South direction. Or basically from outside the infrastructure (the internet for example) to inside and then responded back out.  You knew that if you had to design a communication channel from an application server to a database server this could be isolated from the other traffic as they did not usually reside on the same system.

scj-net1

Virtualization made this more difficult.  In this model you are sharing systems resources for different applications.  From the networks point of view there are a large number of systems available behind a couple of links.  Live Migration puts another wrinkle in the design as you now have to deal with a specific system moving from one physical server to another.  Network Virtualization helps out a lot with this.  With this you can now move virtual ports from one physical server to another to ensure that when one virtual machine moves from a physical server to another that the network is still available.  In many cases you managed these virtual networks the same as you managed your physical network.  As a matter of fact they were designed to emulate the physical as much as possible.  The virtual machines still looked a lot like the physical ones they replaced and can be treated in vary much the same way from a traffic flow perspective.  The traffic still is primarily a North/South pattern.

Cloud, however, is a different ball of wax.  Think about the charistics of the Cattle described above.  A cloud application is smaller and purpose built.  The majority of its traffic is between VMs as different tiers which were traditionally on the same system or in the same VM are now spread across multiple VMs.  Therefore its traffic patterns are primarily East/West.  You cannot forget that there is a North/South pattern the same as what was in the other models which is typically user interaction.  It is stateless so that many copies of itself can run in tandem allowing it to elastically scale up and down based on need and as such they are appearing and disappearing from the network.  As these VMs are spawned on the system they may be right next to each other or on different servers or potentially in different Data Centers.  But it gets even better.   scj-net2

Cloud architectures are typically multi-tenant.  This means that multiple customers will utilize this infrastructure and need to be isolated from each other.  And of course Clouds are self-service.  Users/developers can design, build and deploy whenever they want.  Including designing the network interconnects that their applications need to function.  All of this will cause overlapping IP address domains, multiple virtual networks both L2 and L3, requirements for dynamically configuring QOS, Load Balancers and Firewalls.  Lastly in our list of headaches is not the least.  Cloud systems tend to breed like rabbits or multiply like coat hangers in the closet.  There are more and more systems as 10 servers become 40 which becomes 100 then 1000 and so on.

 

So what is a poor Network Engineer to do?

First get a handle on what this Cloud thing is supposed to be for.  If you are one of the lucky ones who can dictate the use of the infrastructure then rock on!  Unfortunately, that does not seem to be the way it goes for many.  In the case where you just cannot predict how the infrastructure will be used I am reminded of the phrase “there is not replacement for displacement”.  Fast links, non-blocking switches, Network Fabrics are all necessary for the physical network but will not get you there.  Sense as a network administrator you cannot predict the traffic patterns who can?  Well the developer and the application itself.  This is what SDN is all about.  It allows a programmatic interface to what is called an overlay network.  A series of tunnels/flows which can build virtual networks on top of the physical network giving that pesky application what it was looking for.  In some cases you may want to make changes to the physical infrastructure.  For example change the configuration of the Firewall or Load Balancer or other network equipment.  SDN vendors are creating plug-ins that can make those types of configurations.  But if this is not good enough for you there is NFV.  The basic idea here is that why have specialized hardware for your core network infrastructure when we can run them virtualized as well?  Let’s run those in VM’s as well, hook them into the virtual network and SDN to configure them and we now can virtualize the routers, load balancers, firewalls and switches.  These technologies are in very much a state of flux right now but they are promising none the less.  Now if we could just virtualize the monitoring and troubleshooting of these environments I’d be happy.

 

 

OpenStack Neutron using Linux Bridges (technical explanation)

chris_net3Apparently this is “Showcase Dell OpenStack/Crowbar Team Member Week” because today I’m proxy positioning for Dell OpenStack engineer Chris Dearborn.  Chris has been leading our OpenStack Neutron deployment for Grizzly and Havana.

If you’re familiar with the OpenStack Networking, skip over my introductory preamble and jump right down to the meat under “SDN Client Connection: Linux Bridge.”  Hopefully we can convince Chris to put together more in this series and cover GRE and VLAN configurations too.

OpenStack and Software Defined Network

Software Defined Networking (SDN) is an emerging concept that describes a family of functionality.  Like cloud, the exact meaning of SDN appears to be in the eye (or brochure) of the company providing the technology.  Overall, the concept for SDN is to have programmable networks that can be automatically provisioned.

Early approaches to this used the OpenFlow™ API to programmatically modify switch routing tables (aka OSI Layer 2) on a flow by flow basis across multiple switches.  While highly controlled, OpenFlow has proven difficult to implement at scale in dynamic environments; consequently, many SDN implementations are now using overlay networks based on inventoried VLANs  and/or dynamic tunnels.

Inventoried VLAN overlay networks create a stable base layer 2 infrastructure that can be inventoried and handed out dynamically on-demand.  Generally, the management infrastructure dynamically connects the end-points (typically virtual machines) to a dedicated existing layer 2 network.  This provides all of the isolation desired without having to thrash the underlying network switch infrastructure.

Dynamic tunnel overlay network also uses client connection points to isolate traffic but do not rely on switch layer 2.  Instead, they encrypt traffic before sending it over a shared network.  This avoids having to match dynamic networks to static inventory; however, it also adds substantial encryption overhead to the network communication.  Consequently, tunnels provide more flexibility and less up front-confirmation but with lower performance.

OpenStack Networking, project Neutron (previously Quantum), is responsible for connecting virtual machines setup by OpenStack Compute (aka Nova) to the software defined networking infrastructure.  By design, Neutron accommodates different implementation plug-ins.  That allows operators to choose between different approaches including the addition of commercial offerings.   While it is possible to use open source capabilities for small deployments and trials, most large scale deployments choose proprietary SDN technologies.

The Crowbar OpenStack installation allows operators to choose between “Open vSwitch GRE Tunnels” and “Linux Bridge VLAN” configuration.  The GRE option is more flexible and requires less up front configuration; however, the encryption used by GRE will degrade performance.  The Linux Bridge VLAN option requires more upfront configuration and design.

Since GRE works with minimal configuration, let’s explore what’s required to for Crowbar to setup OpenStack Neutron Linux Bridge VLAN networking.

Note: This review assumes that you already have a working knowledge of Crowbar and OpenStack.

Background

Before we dig into how OpenStack configures SDN , we need to understand how we connect between virtual machines running in the system and the physical network.  This connection uses Linux Bridges.  For GRE tunnels, Crowbar configures an Open vSwitch (aka OVS) on the node to create and manage the tunnels.

One challenge with SDN traffic isolation is that we can no longer assume that virtual machines with network access can reach destinations on our same network.  This means that the infrastructure must provide paths (aka gateways and routers) between the tenant and infrastructure networks.  A major part of the OpenStack configuration includes setting up these connections when new tenant networks are created.

Note: In the OpenStack Grizzly and earlier releases, open source code for network routers were not configured in a highly available or redundant way.  This problem is addressed in the Havana release.

For the purposes of this explanation, the “network node” is the shared infrastructure server that bridges networks.  The “compute node” is any one of the servers hosting guest virtual machines.  Traffic in the cloud can be between virtual machines within the cloud instance (internal) or between a virtual machine and something outside the OpenStack cloud instance (external).

Let’s make sure we’re on the same page with terminology.

  • OSI Layer 2 – just above physical connections (layer 1), Layer two manages traffic between servers including providing logical separation of traffic.
  • VLAN – Virtual Local Area Network are switch enforced isolation zones created by adding 1 of 4096 tags in the network traffic (aka tagged traffic).
  • Tenant – a group of users in a cloud that are logically isolated (cannot see other traffic or information) but still using shared resources.
  • Switch – a physical device used to provide layer 1 networking connections between end points.  May provide additional services on other OSI layers such as VLANs.
  • Network Node – an OpenStack infrastructure server that connects tenant networks to infrastructure networks.
  • Compute Node – an OpenStack server that runs user workloads in virtual machines or containers.

SDN Client Connection: Linux Bridge 

chris_net1

The VLAN range for Linux Bridge is configurable in /etc/quantum/quantum.conf by changing the network_vlan_ranges parameter.  Note that this parameter is set by the Crowbar Neutron chef recipe.  The VLAN range is configured to start at whatever the “vlan” attribute in the nova_fixed network in the bc-template-network.json is set to.  The VLAN range end is hard coded to end at the VLAN start plus 2000.

Reminder: The maximum VLAN tag is 4096 so the VLAN tag for nova_fixed should never be set to anything greater than 2095 to be safe.

Networks are assigned the next available VLAN tag as they are created.  For instance, the first manually created network will be assigned VLAN 501, the next VLAN 502, etc.  Note that this is independent of what tenant the new network resides in.

The convention in Linux Bridge is to name the various network constructs including the first 11 characters of the UUID of the associated Neutron object.  This allows you to run the quantum CLI command listing out the objects you are interested in, and grepping on the 11 uuid characters from the network construct name.  This shows what Neutron object a given network construct maps to.

chris_net2

Network Creation

When a network is created, a corresponding bridge is created and is given the name br<network_uuid>.  A subinterface of the NIC is also created and is named <interface_name>.<vlan_tag>.  This subinterface is slaved to the bridge.  Note that this only happens when the network is needed (when a VM is created on the network).

This occurs on both the network node and the compute nodes.

Additional Steps Taken On The Network Node During Network Creation

On the network node, a bridge and subinterface is created per network and the subinterface is slaved to the bridge as described above.  If the network is attached to the router, then a TAP interface that the router listens on is created and slaved to the bridge.  If DHCP is selected, then another TAP interface is created that the dnsmasq process talks to, and that interface is also slaved to the bridge.

VM Creation On A Compute Node

When a VM is created, a TAP interface is created and named tap<port_uuid>.  The port is the Neutron port that the VM is plugged in to.  This TAP interface is slaved to the bridge associated with the network that the user selected when creating the VM.  Note that this occurs on compute nodes only.

Determining the dnsmasq port/tap interface for a network

The TAP port associated with dnsmasq for a network can be determined by first getting the uuid of the network, then looking on the network node in /var/lib/quantum/dhcp/<network_uuid>/interface.  The interface will be named ns-.  Note that this is only the first 11 characters of the uuid.  The tap interface will be named tap.

chris_net3

Summary

Understanding OpenStack Networking is critical to operating a successful cloud deployment.  The Crowbar Team at Dell has invested significant effort to automate the configuration of Neutron.  This helps you eliminate the risk of manual configuration and leverage our extensive testing and field experience.

If you are interested in seeing the exact sequences used by Crowbar, please visit the Crowbar Github repository for the “Quantum Barclamp.