Boot me up! out-of-band IPMI rocks then shuts up and waits

It’s hard to get excited about re-implementing functionality from v1 unless the v2 happens to also be freaking awesome.  It’s awesome because the OpenCrowbar architecture allows us to do it “the right way” with real out-of-band controls against the open WSMAN APIs.

With out-of-band control, we can easily turn systems on and off using OpenCrowbar orchestration.  This means that it’s now standard practice to power off nodes after discovery & inventory until they are ready for OS installation.  This is especially interesting because many servers’ RAID and BIOS can be configured out-of-band without powering on at all.

Frankly, Crowbar 1 (cutting edge in 2011) was a bit hacky.  All of the WSMAN control was done in-band but looped through a gateway on the admin server so we could access the out-of-band API.  We also used the vendor (Dell) tools instead of open API sets.

By contrast, OpenCrowbar hardware configuration is truly multi-vendor.  I’ve got Dell & SuperMicro servers booting and out-of-band managed.  Want more vendors?  I’ll give you my shipping address.

OpenCrowbar does this out of the box and in the open so that everyone can participate.  That’s how we solve this problem as an industry and start to cope with hardware snowflaking.

And this out-of-band management gets even more interesting…

Since we’re talking to servers out-of-band (without the server being “on”), we can configure systems before they are even booted for provisioning.  Since OpenCrowbar does not require a discovery boot, you could pre-populate all your configurations via the API and have the disk and BIOS settings ready before the first power-on (for models like the Dell iDRAC where the BMCs start immediately on power connect).

Those are my favorite features, but there’s more to love:

  • the new design does not require a network gateway (v1 did) between the admin and BMC networks (which was a security issue)
  • the configuration will detect and preserve existing assigned IPs.  This is a big deal in lab configurations where you are reusing the same machines and have scripted remote consoles.
  • OpenCrowbar offers an API to turn machines on/off using the out-of-band BMC network (see the sketch after this list)
  • the system detects if nodes have IPMI (VMs & containers do not) and skips BMC configuration BUT still manages power control using SSH (and could use VM APIs in the future)
  • of course, we automatically set up the BMC network based on your desired configuration
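To make the power control concrete, here’s a minimal sketch of the kind of out-of-band call an orchestration step can make against a node’s BMC using the standard ipmitool CLI.  This is illustrative only, not OpenCrowbar’s actual worker code; the host, credentials and helper name are assumptions.

```python
# Minimal sketch: out-of-band power control via a node's BMC using ipmitool.
# Assumes ipmitool is installed and the BMC is reachable on the BMC network.
import subprocess

def bmc_power(bmc_ip, user, password, action):
    """Run an IPMI chassis power command ('on', 'off', 'cycle', 'status')."""
    cmd = [
        "ipmitool", "-I", "lanplus",   # IPMI-over-LAN interface
        "-H", bmc_ip,                  # BMC address, not the host OS address
        "-U", user, "-P", password,
        "chassis", "power", action,
    ]
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout

# Example: power a node down after discovery/inventory until it is needed.
# print(bmc_power("10.10.20.15", "root", "calvin", "off"))
```

Because the command talks to the BMC rather than the host OS, it works whether the node is powered on or off, which is what makes the power-off-after-discovery workflow above practical.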

 

OpenStack DefCore Update & 7/16 Community Reviews

The OpenStack Board effort to define “what is core” for commercial use (aka DefCore) continues to move forward.  I have blogged extensively about this topic and rely on you to review that material because this post focuses on updates from recent activity.

First, Please Join Our Community DefCore Reviews on 7/16!

We’re reviewing the current DefCore process & timeline then talking about the Advisory Havana Capabilities Matrix (decoder).

To support global access, there are TWO meetings (both will also be recorded):

  1. July 16, 8 am PDT / 1500 UTC
  2. July 16, 6 pm PDT / 0100 UTC July 17

Note: I’m presenting about DefCore at OSCON on 7/21 at 11:30!

We want community input!  The Board is going to discuss and, hopefully, approve the matrix at our next meeting on 7/22.  After that, the Board will be focused on defining Designated Sections for Havana and Ice House (the TC is not owning that as previously expected).

The DefCore process is gaining momentum.  We’ve reached the point where there are tangible (yet still non-binding) results to review.  The Refstack effort to collect community test results from running clouds is underway: the Core Matrix will be fed into Refstack to validate against the DefCore required capabilities.

Now is the time to make adjustments and corrections!  

In the next few months, we’re going to be locking in more and more of the process as we get ready to make it part of the OpenStack by-laws (see bottom of minutes).

If you cannot make these meetings, we still want to hear from you!  The most direct way to engage is via the DefCore mailing list, but 1×1 email works too!  Your input is important to us!

SDN’s got Blind Spots! What are these Projects Ignoring? [Guest Post by Scott Jensen]

Scott Jensen returns as a guest poster about SDN!  I’m delighted to share his pointed insights that expand on his previous two-part series about NFV and SDN.  I especially like his Rumsfeldian “unknowable workloads.”

In my [Scott’s] last post, I talked about why SDN is important in cloud environments; however, I’d like to challenge the underlying assumption that SDN cures all ops problems.

The SDN implementations which I have looked at make the following base assumption about the physical network.  From the OpenContrail documentation:

The role of the physical underlay network is to provide an “IP fabric” – its responsibility is to provide unicast IP connectivity from any physical device (server, storage device, router, or switch) to any other physical device. An ideal underlay network provides uniform low-latency, non-blocking, high-bandwidth connectivity from any point in the network to any other point in the network.

The basic idea is to build an overlay network on top of the physical network in order to utilize a variety of protocols (NetFlow, VLAN, VXLAN, MPLS, etc.) and build the networking infrastructure which is needed by the applications and, more importantly, allow the applications to modify this virtual infrastructure to build the constructs that they need to operate correctly.
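To make the overlay idea concrete, here is a minimal sketch, independent of any particular SDN project, that uses Linux’s built-in VXLAN support to build a virtual L2 segment on top of the IP underlay; the interface names, VNI and addresses are assumptions for illustration.

```python
# Minimal sketch: create a VXLAN overlay segment on top of an IP underlay.
# Requires root and the Linux 'ip' (iproute2) tooling; values are illustrative.
import subprocess

def run(cmd):
    print("+", cmd)
    subprocess.run(cmd.split(), check=True)

UNDERLAY_DEV = "eth0"        # physical NIC providing the "IP fabric"
VNI = 5001                   # VXLAN Network Identifier for this tenant segment
LOCAL_VTEP = "192.0.2.10"    # this host's tunnel endpoint on the underlay
REMOTE_VTEP = "192.0.2.20"   # peer hypervisor's tunnel endpoint

# Create the VXLAN device that encapsulates tenant traffic in UDP/4789.
run(f"ip link add vxlan{VNI} type vxlan id {VNI} local {LOCAL_VTEP} "
    f"remote {REMOTE_VTEP} dstport 4789 dev {UNDERLAY_DEV}")
# Give the overlay segment its own address space and bring it up.
run(f"ip addr add 10.50.0.1/24 dev vxlan{VNI}")
run(f"ip link set vxlan{VNI} up")
```

Note that nothing about the physical switches changes when this runs; that is precisely the blind spot the rest of this post is about.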

All well and good; however, what about the Physical Networks?

Under provisioned (image credit: FunnyEarth.com)

That is where you will run into bandwidth issues, QoS issues, latency differences and where the rubber really meets the road.  Ignoring the physical network’s configuration can (and probably will) cause the entire system to perform poorly.

Does it make sense to just assume that you have uniform low latency connectivity to all points in the network?  In many cases, it does not.  For example:

  • Accesses to storage arrays have a different traffic pattern than a distributed storage system.
  • Compute resources which are used to house VMs which are running web applications are different than those which run database applications.
  • Some applications are specifically sensitive to certain networking issues such as available bandwidth, Jitter, Latency and so forth.
  • Other applications will perform actions over the network at certain times of the day but then will not require the network resources for the rest of the day.  Classic examples of this are system backups or replication events.

Over provisioned (image credit: zilya.net)

If the infrastructure you are trying to implement is truly unknown as to how it will be utilized, then you may have no choice but to over-provision the physical network.  In building a public cloud, the users will run whichever applications they wish, so it may not be possible to engineer for the appropriate traffic patterns.

This unknowable workload is exactly what these types of SDN projects are trying to target!

But in many cases, when designing these systems, you do have a good idea of how they will be utilized, or at least how specific portions of the system will be utilized, and you need to account for that when building up the physical network under the SDN.

It is my belief that SDN applications should not just create an overlay.  That is part of the story, but they should also take into account the physical infrastructure and assist with modifying the configuration of the physical devices.  This balance achieves the best use of the network both for the applications which are running in the environment AND for the systems which they run on or rely upon for their operations.

Correctly provisioned

We need to reframe our thinking about SDN because we cannot just keep assuming that network speeds will follow Moore’s Law and that the network is an unlimited resource.

Understanding OpenStack Designated Code Sections – Three critical questions

A collaboration with Michael Still (TC Member from Rackspace) & Joshua McKenty, cross-posted by Rackspace.

After nearly a year of discussion, the OpenStack board launched the DefCore process with 10 principles that set us on a path towards a validated interoperability standard.  We created the concept of “designated sections” to address concerns that using API tests to determine core would undermine commercial and community investment in a working, shared upstream implementation.

Designated sections provide the “you must include this” part of the core definition.  Having common code as part of core is a central part of how DefCore is driving OpenStack interoperability.

So, why do we need this?

From our very formation, OpenStack has valued implementation over specification; consequently, there is a fairly strong community bias to ensure contributions are upstreamed. This bias is codified into the very structure of the GNU General Public License (GPL) but is intentionally missing from the Apache Public License (APL v2) that OpenStack follows.  The choice of Apache2 was important for OpenStack to attract commercial interests, who often consider the GPL a “poison pill” because of the upstream requirements.

Nothing in the Apache license requires consumers of the code to share their changes; however, the OpenStack Foundation does have control of how the OpenStack™ brand is used.  Thus it’s possible for someone to fork and reuse OpenStack code without permission, but they cannot call it “OpenStack” code.  This restriction only has strength if the OpenStack brand has value (protecting that value is the primary duty of the Foundation).

This intersection between License and Brand is the essence of why the Board has created the DefCore process.

Ok, how are we going to pick the designated code?

Figuring out which code should be designated is highly project specific and ultimately subjective; however, it’s also important to the community that we have a consistent and predictable strategy.  While the work falls to the project technical leads (with ratification by the Technical Committee), the DefCore and Technical committees worked together to define a set of principles to guide the selection.

This Technical Committee resolution formally approves the general selection principles for “designated sections” of code, as part of the DefCore effort.  We’ve taken the liberty to create a graphical representation (above) that visualizes this table using white for designated and black for non-designated sections.  We’ve also included the DefCore principle of having an official “reference implementation.”

Here is the text from the resolution presented as a table:

Should be DESIGNATED:
  • code provides the project external REST API, or
  • code is shared and provides common functionality for all options, or
  • code implements logic that is critical for cross-platform operation

Should NOT be DESIGNATED:
  • code interfaces to vendor-specific functions, or
  • project design explicitly intended this section to be replaceable, or
  • code extends the project external REST API in a new or different way, or
  • code is being deprecated

The resolution includes the expectation that “code that is not clearly designated is assumed to be designated unless determined otherwise. The default assumption will be to consider code designated.”

This definition is a starting point.  Our next step is to apply these rules to projects and make sure that they provide meaningful results.

Wow, isn’t that a lot of code?

Not really.  It’s important to remember that designated sections alone do not define core: the must-pass tests are also a critical component.  Consequently, designated code in projects that do not have must-pass tests is not actually required for an OpenStack licensed implementation.

OpenCrowbar Design Principles: Reintroduction [Series 1 of 6]

While “ready state” as a concept has been getting a lot of positive response, I forget that much of the innovation and learning behind that concept never surfaced as posts here.  The Anvil (2.0) release included the OpenCrowbar team cataloging our principles in docs.  Now it’s time to repost the team’s work into a short series over the next three days.

In architecting the Crowbar operational model, we’ve consistently adapted traditional computer science concepts like late binding, simulated annealing, emergent behavior, attribute injection and functional programming to create a repeatable platform for sharing open operations practice (post 2).

Functional DevOps aka “FuncOps”

Ok, maybe that’s not going to be the next 70’s-era hype bubble name, but… the operational model behind Crowbar is entering its third generation and it’s important to understand that the state isolation and integration principles behind that model are closer to functional than declarative programming.

Parliament is Crowbar’s official FuncOps sound track

The model is critical because it shapes how Crowbar approaches the infrastructure at a fundamental level, so it is easier to interact with the platform if you see how we are approaching operations. Crowbar’s goal is to create emergent services.

We’ll explore those topics in this series to explain Crowbar’s core architectural principles.  Before we get into that, I’d like to review some history.

The Crowbar Objective

Crowbar delivers repeatable best practice deployments. Crowbar is not just about installation: we define success as a sustainable operations model where we continuously improve how people use their infrastructure. The complexity and pace of technology change is accelerating so we must have an approach that embraces continuous delivery.

Crowbar’s objective is to help operators become more efficient, stable and resilient over time.

Background

When Greg Althaus (github @GAlthaus) and Rob “zehicle” Hirschfeld (github @CloudEdge) started the project, we had some very specific targets in mind. We’d been working towards using organic emergent swarming (think ants) to model continuous application deployment. We had also been struggling with the most routine foundational tasks (bios, raid, o/s install, networking, ops infrastructure) when bringing up early scale cloud & data applications. Another key contributor, Victor Lowther (github @VictorLowther), has critical experience in Linux operations, networking and dependency resolution that led to significant contributions around the Annealing and networking model. These backgrounds heavily influenced how we approached Crowbar.

First, we started with the best-of-field DevOps infrastructure: Opscode Chef. There was already a remarkable open source community around this tool and an enthusiastic following among cloud and scale operators. Using Chef to do the majority of the installation left the Crowbar team free to focus on the key features below.

Key Features

  • Heterogeneous Operating Systems – choose which operating system you want to install on the target servers.
  • CMDB Flexibility (see picture) – don’t be locked into a DevOps toolset. Attribute injection allows clean abstraction boundaries so you can use multiple tools (Chef and Puppet, playing together).
  • Ops Annealer – the orchestration at Crowbar’s heart combines the best of directed graphs with late binding and parallel execution (see the sketch after this list). We believe annealing is the key ingredient for repeatable, shared OpenOps code upgrades.
  • Upstream Friendly – infrastructure as code works best as a community practice, and Crowbar uses upstream code without injecting “crowbarisms” that were previously required. So you can share your learning with the broader DevOps community even if they don’t use Crowbar.
  • Node Discovery (or not) – Crowbar maintains the same proven discovery-image-based approach that we used before, but we’ve streamlined and expanded it. You can use Crowbar’s API outside of the PXE discovery system to accommodate Docker containers, existing systems and VMs.
  • Hardware Configuration – Crowbar maintains the same optional hardware-neutral approach to RAID and BIOS configuration. Configuring hardware with repeatability is difficult and requires much iterative testing. While our approach is open and generic, the team at Dell works hard to validate on a specific set of gear: it’s impossible to make statements beyond that test matrix.
  • Network Abstraction – Crowbar dramatically extended our DevOps network abstraction. We’ve learned that networking is the key to success for deployment and upgrade, so we’ve made Crowbar networking flexible and concise. Crowbar networking works with attribute injection so that you can avoid hardwiring networking into DevOps scripts.
  • Out-of-band control – when the Annealer hands off work, Crowbar gives the worker implementation flexibility to do it on the node (using SSH) or remotely (using an API). Making agents optional allows operators and developers to make the best choices for the actions that they need to take.
  • Technical Debt Paydown – We’ve also updated the Crowbar infrastructure to use the latest libraries like Ruby 2, Rails 4, Chef 11. Even more importantly, we’ve dramatically simplified the code structure, including in-repo documentation and a Docker-based developer environment that makes building a working Crowbar environment fast and repeatable.
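As promised in the Ops Annealer bullet, here is a minimal, hypothetical sketch of annealing-style orchestration: a directed dependency graph where every step whose prerequisites are satisfied runs in parallel, repeating until the whole graph converges.  This is not the actual Crowbar engine (which is Ruby/Rails); the step names are illustrative.

```python
# Minimal sketch of annealing-style orchestration: a directed dependency graph
# where every step whose prerequisites are complete runs in parallel, and the
# loop repeats until the whole graph converges. Names are illustrative only.
from concurrent.futures import ThreadPoolExecutor

# step -> set of steps it depends on (a tiny ready-state style graph)
GRAPH = {
    "bios-configure": set(),
    "raid-configure": set(),
    "os-install":     {"bios-configure", "raid-configure"},
    "network-config": {"os-install"},
    "chef-client":    {"os-install", "network-config"},
}

def run_step(name):
    print(f"running {name}")   # real code would bind attributes late, then
    return name                 # run the work locally (SSH) or via an API

def anneal(graph):
    done = set()
    with ThreadPoolExecutor() as pool:
        while len(done) < len(graph):
            ready = [s for s, deps in graph.items()
                     if s not in done and deps <= done]
            # everything that is ready executes as one parallel wave
            for finished in pool.map(run_step, ready):
                done.add(finished)

anneal(GRAPH)
```

The real Annealer also handles late-bound attributes, failures and retries, but this graph-plus-parallel-waves shape is the core idea.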

OpenCrowbar (CB2) vs Crowbar (CB1)?

Why change to OpenCrowbar? This new generation of Crowbar is structurally different from Crowbar 1, and we’ve invested substantially in refactoring the tooling, paying down technical debt and cleaning up documentation. Since Crowbar 1 is still being actively developed, splitting the repositories allows both versions to progress with less confusion. The majority of the principles and deployment code is very similar, so I think of Crowbar as a single community.

Continue Reading > post 2

OpenStack ATL Recap to the 11s: the danger of drama + 5 challenges & 5 successes

I’ve come to accept that the “Hallway Track” is my primary session at OpenStack events.  I want to thank the many people in the community who make that the best track.  It’s not only full of deep technical content; there are also healthy doses of intrigue, politics and “let’s fix that” in the halls.

I think honest reflection is critical to OpenStack growth (reflections from last year).  My role as a Board member must not translate into being a pom-pom-waving robot cheerleader.

 

What I heard that’s working:

  1. The Foundation event team did a great job on the logistics, and many appreciated the user and operator focus.  There is no doubt that OpenStack is being deployed at scale and helping transform cloud infrastructure.  I think that’s a great message.
  2. DefCore criteria were approved by the Board.  The overall process and impact was talked about positively at the summit.  To accelerate, we need +1s and feedback because “crickets” means we need to go slower.  I’ll have to dedicate a future post to next steps and “designated sections.”
  3. Marketplace!  Great turnout by vendors of all types, but I’m not hearing about them making a lot of money from OpenStack (which is needed for them to survive).  I like the diversity of the marketplace: consulting, aaServices, installers, networking, more networking, new distros, and ecosystem tools.
  4. There’s some real growth in aaS services for OpenStack (database, load balancer, DNS, etc.).  This is the ecosystem that many want OpenStack to drive because it helps displace the Amazon cloud.  I also heard concerns about making sure these services are pluggable so companies can compete on implementation.
  5. Lots of process changes to adapt to growing pains.  People felt that the community is adapting (yeah!) but were concerned about having to re-invent tooling (meh).

There are also challenges that people brought to me:

  1. Our #1 danger is drama.  Users and operators want collaboration and friendly competition.  They are turned off by vendor conflict or strong-arming in the community (e.g.: the WSJ Red Hat article and fallout).  I’d encourage everyone to breathe more and react less.
  2. Lack of product management is risking a tragedy of the commons.  Helping companies work together and across projects is needed for our collaboration processes to work.  I’ll be exploring this with Sean Roberts in future posts.
  3. Making sure there’s profit being generated from shared code.  We need to remember that most of the development is corporate funded so we need to make sure that companies generate revenue.  The trend of everyone creating unique distros may indicate a problem.
  4. We need to be more operator friendly.  I know we’re trying but we create distance with operators when we insist on creating new tools instead of using the existing ecosystem.  That also slows down dealing with upgrades, resilient architecture and other operational concerns.
  5. “Anointed projects” concerns have expanded since Hong Kong.  There’s a perception that Heat (orchestration), TripleO (provisioning) and Solum (platform) are considered THE only way OpenStack solves those problems and that other approaches are not welcome.  While that encourages collaboration, it also chills competition and discussion.
  6. There’s a lot of whispering about the status of challenged projects: neutron (works with proprietary backends but not open, may not stay integrated) and openstack boot-strap (state of TripleO/Ironic/Heat mix).  The issue here is NOT if they are challenged but finding ways to discuss concerns openly (see anointed projects concern).

I’d enjoy hearing more about success and deeper discussion around concerns.  I use community feedback to influence my work in the community and on the board.  If you think I’ve got it right or wrong then please let me know.

How DefCore is going to change your world: three advisory cases

The first release of the DefCore Core Capabilities Matrix (DCCM) was revealed at the Atlanta summit.  At the Summit, Joshua and I had a session which examined what this means for the various members of the OpenStack community.   This rather lengthy post reviews the same advisory material.

DefCore sets base requirements by defining 1) capabilities, 2) code and 3) must-pass tests for all OpenStack products. This definition uses community resources and involvement to drive interoperability by creating the minimum standards for products labeled “OpenStack.”

As a refresher, there are three uses of the OpenStack mark:

  • Community: The non-commercial use of the word OpenStack by the OpenStack community to describe themselves and their activities. (like community tweets, meetups and blog posts)
  • Code: The non-commercial use of the word OpenStack to refer to components of the OpenStack framework integrated release (as in OpenStack Compute Project Nova)
  • Commerce: The commercial use of the word OpenStack to refer to products and services as governed by the OpenStack trademark policy. This is where DefCore is focused.

In the DefCore/Commerce use, properly licensed vendors have three basic obligations to meet:

  1. Pass the required Refstack tests for the capabilities matrix in the version of OpenStack that they use. Vendors are expected (not required) to share their results.
  2. Run and include the “designated sections” of code for the OpenStack components that they include.
  3. Meet the other basic obligations in their license agreement, like being a currently paid-up corporate sponsor or foundation member, etc.

If they meet these conditions, vendors can use the OpenStack mark in their product names and descriptions.

Enough preamble!  Let’s see the three Advisory Cases

MANDATORY DISCLAIMER: These conditions apply to fictional public, private and client use cases.  Any resemblance to actual companies is a function of the need to describe real use-cases.  These cases are advisory for illustration use only and are not to be considered definitive guidance because DefCore is still evolving.

Public Cloud: Service Provider “BananaCloud”

A popular public cloud operator, BananaCloud has been offering OpenStack-based IaaS since the Diablo release. However, they don’t use the Keystone component. Since they also offer traditional colocation and managed services, they have an existing identity management system that they use. They made a similar choice for Horizon in favor of their own cloud portal.


  1. They use Nova with a custom scheduler and pass all the Nova tests. This is the simplest case since they use the code and pass the tests.
  2. In the Havana DCCM, the Keystone capabilities are a must pass test; however, there are no designated sections of code for Keystone. So BananaCloud must implement a Keystone-compatible API on their IaaS environment (an effort they had underway already) that will pass Refstack, and they’re good to go.
  3. There are no must pass tests for Horizon so they have no requirements to include those features or code. They can still be OpenStack without Horizon.
  4. There are no must pass tests for Trove so they have no brand requirement to include those features or code; however, by using Trove and promoting its use, they increase the likelihood of its capabilities becoming must pass features.

BananaCloud also offers some advanced OpenStack capabilities, including Marconi and Trove. Since there are no must pass capabilities from these components in the Havana DCCM, this has no impact on their offering additional services. DefCore defines the minimum requirements and encourages vendors to share their full test results of additional capabilities because that is how OpenStack identifies new must pass candidates.

Note: The DefCore DCCM is advisory for the Havana release, so if BananaCloud is late getting their Keystone-compatibility work done there won’t be any commercial impact. But it will be a binding part of the trademark license agreement by the Juno release, which is only 6 months away.

Private Cloud: SpRocket Small-Business OpenStack Software

SpRocket is a new OpenStack software vendor, specializing in selling a Windows-powered version of OpenStack with tight integration to Sharepoint and AzurePack. In their feature set, they only need part of Nova and provide an alternative object storage to Swift that implements a version of the Swift API. They do use Heat as part of their implementation to set up applications backed by Sharepoint and AzurePack.

  1. For Nova, they already use the code and have implemented all required capabilities except for the key-store. To comply with the DefCore requirement, they must enable the key-store capability.
  2. While their implementation of Swift passes the tests, we are still working to resolve the final disposition of Swift, so there are several possible outcomes:
    1. If Swift is 0% designated then they are OK (that’s illustrated here)
    2. If Swift is 100% designated then they cannot claim to be OpenStack.
    3. If Swift is partially designated then they have to adapt their deploy to include the required code.
  3. Their use of Heat is encouraged since it is an integrated project; however, it has no required capabilities and does not influence their ability to use the mark.
  4. They use the trunk version of Windows HyperV drivers which are not designated and have no specific tests.

Ecosystem Client: “Mist” OpenStack-consuming Client Library

Mist is a client library for programmers working on applications that use the OpenStack APIs. While it’s an open source project, there are many commercial applications that use the library. Unlike a “pure” OpenStack program, it also supports other cloud APIs.

Since the Mist library does not ship or implement the OpenStack code base, the DefCore process does not apply to their effort; however, there are several important intersections between Mist, OpenStack and Core.

  • First, it is very important for the DefCore process that Mist map their use of the OpenStack APIs to the capabilities matrix. They are asked to help with this process because they are the best group to answer the “works with clients” criterion.
  • Second, if there are APIs used by Mist that are not currently tested then the OpenStack community should work with the Mist community to close those test gaps.
  • Third, if Mist relies on an API that is not must-pass they are encouraged to help identify those capabilities as core candidates in the community.

OpenCrowbar: ready to fly as OpenOps neutral platform – Dell stepping back


Two of Crowbar’s founders: me with Greg Althaus [taken Jan 2013]

With the Anvil release in the bag, Dell announced on the community list yesterday that it has stopped active contribution on the Crowbar project.  This effectively relaunches Crowbar as a truly vendor-neutral physical infrastructure provisioning tool.

While I cannot speak for my employer, Dell, about Crowbar, I continue to serve in my role as a founder of the Crowbar Project.  I agree with Eric S. Raymond that founders of open source projects have a responsibility to sustain their community and ensure its longevity.

In the open DevOps bare metal provisioning market, there is nothing that matches the capabilities developed in either Crowbar v1 or OpenCrowbar.  The operations model and system focused approach is truly differentiated because no other open framework has been able to integrate networking, orchestration, discovery, provisioning and configuration management like Crowbar.

It is time for the community to take Crowbar beyond the leadership of a single hardware vendor, OS vendor, workload or CMDB tool.  OpenCrowbar offers operations freedom and flexibility to build upon an abstracted physical infrastructure (what I’ve called “ready state“).

We have the opportunity to make open operations a reality together.

As a Crowbar founder and its acting community leader, I welcome you to contact me directly or through the Crowbar list about how to get engaged in the Crowbar community or to get connected with like-minded Crowbar resources.

DefCore Capabilities Scorecard & Core Identification Matrix [REVIEW TIME!]

Attribution Note: This post was collaboratively edited by members of the DefCore committee and cross posted with DefCore co-chair Joshua McKenty of Piston Cloud.

DefCore sets base requirements by defining 1) capabilities, 2) code and 3) must-pass tests for all OpenStack products. This definition uses community resources and involvement to drive interoperability by creating minimum standards for products labeled “OpenStack.”

The OpenStack Core definition process (aka DefCore) is moving steadily along and we’re looking for feedback from the community as we move into the next phase.  Until now, we’ve been mostly working out the principles, criteria and processes that we will use to answer “what is core” in OpenStack.  Now we are applying those processes and actually picking which capabilities will be used to identify Core.

TL;DR! We are now RUNNING WITH SCISSORS because we’ve reached the point where you can review early thoughts about what’s going to be considered Core (and what’s not).  We now have a tangible draft list for community review.

While you will want to jump directly to the review draft matrix (red means needs input), it is important to understand how we got here because that’s how DefCore will resolve the inevitable conflicts.  The very nature of defining core means that we have to say “not in” to a lot of capabilities.  Since community consensus seems to favor a “small core” in principle, that means many capabilities that people consider important are not included.

The Core Capabilities Matrix attempts to find the right balance between quantitative detail and too much information.  Each row represents an “OpenStack Capability” that is reflected by one or more individual tests.  We scored each capability equally on a 100-point scale using 12 different criteria.  These criteria were selected to respect the different viewpoints and needs of the community, ranging from popularity to technical longevity to quality of documentation.

While we’ve made the process more analytical, there’s still room for judgement.  Eventually, we expect to weight some criteria more heavily than others.  We will also be adjusting the score cut-off.  Our goal is not to create a perfect evaluation tool – it should inform the board and facilitate discussion.  In practice, we’ve found this approach to bring needed objectivity to the selection process.
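For readers who want to see the mechanics, here is a minimal, hypothetical sketch of this style of scoring; the criteria names, weights and cut-off are placeholders rather than the committee’s actual values (the first pass weighted all criteria equally).

```python
# Minimal sketch of a DefCore-style capability scorecard: each capability is
# scored against a fixed set of criteria, criteria can be weighted, and a
# cut-off separates candidates. All names and numbers here are illustrative.

CRITERIA_WEIGHTS = {          # hypothetical weights summing to 1.0
    "widely_deployed": 0.4,
    "stable_api":      0.3,
    "documented":      0.2,
    "tests_exist":     0.1,
}

CAPABILITIES = {              # hypothetical 0-100 scores per criterion
    "compute-servers-create": {"widely_deployed": 95, "stable_api": 90,
                               "documented": 80, "tests_exist": 100},
    "images-v1-sharing":      {"widely_deployed": 40, "stable_api": 50,
                               "documented": 60, "tests_exist": 30},
}

CUTOFF = 70                   # capabilities at or above this score advance

def score(cap_scores):
    return sum(CRITERIA_WEIGHTS[c] * v for c, v in cap_scores.items())

for name, per_criterion in CAPABILITIES.items():
    total = score(per_criterion)
    verdict = "candidate" if total >= CUTOFF else "not included"
    print(f"{name}: {total:.1f} -> {verdict}")
```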

So, where does this take us?  The first matrix is, by design, old news.  We focused on getting a score for Havana to give us a stable and known quantity; however, much of that effort will translate forward.  Using Havana as the base, we are hoping to score Ice House ninety days after the Juno summit and score Juno at K Summit in Paris.

These are ambitious goals and there are challenges ahead of us.  Since every journey starts with small steps, we’ve put our feet on the path while keeping our eyes on the horizon.

Specifically, we know there are gaps in OpenStack test coverage.  Important capabilities do not have tests and will not be included.  Further, starting with a small core means that OpenStack will be enforcing an interoperability target that is relatively permissive and minimal.  Universally, the community has expressed that including short-term or incomplete items is undesirable.  It’s vital to remember that we are looking for evolutionary progress that accelerates our developer, user, operator and ecosystem communities.

How can you get involved?  We are looking for community feedback on the DefCore list on this 1st pass – we do not think we have the scores 100% right.  Of course, we’re happy to hear from you however you want to engage: we intentionally named the committee “defcore” to make it easier to cross-reference and search.

We will eventually use Refstack to collect voting/feedback on capabilities directly from OpenStack community members.

Networking in Cloud Environments, SDN, NFV, and why it matters [part 2 of 2]

Scott Jensen is an Engineering Director and colleague of mine from Dell with deep networking and operations experience.  He has first-hand experience deploying OpenStack and Hadoop and has a critical role in defining Dell’s Reference Architectures in those areas.  When I saw this writeup about cloud networking (first post), I asked if it would be OK to post it here and share it with you.

GUEST POST 2 OF 2 BY SCOTT JENSEN:

So what is different about Cloud and how does it impact the network?

In a traditional data center this was not all that difficult (relatively).  You knew what was going to be running on what system (physically) and could plan your infrastructure accordingly.  The majority of the traffic moved in a North/South direction: basically from outside the infrastructure (the internet, for example) to inside and then back out.  You knew that if you had to design a communication channel from an application server to a database server, this could be isolated from the other traffic as they did not usually reside on the same system.


Virtualization made this more difficult.  In this model you are sharing system resources among different applications.  From the network’s point of view, there is a large number of systems available behind a couple of links.  Live Migration puts another wrinkle in the design, as you now have to deal with a specific system moving from one physical server to another.  Network virtualization helps out a lot with this: you can now move virtual ports from one physical server to another to ensure that when a virtual machine moves between physical servers the network is still available.  In many cases you managed these virtual networks the same way you managed your physical network.  As a matter of fact, they were designed to emulate the physical as much as possible.  The virtual machines still looked a lot like the physical ones they replaced and can be treated in very much the same way from a traffic flow perspective.  The traffic is still primarily a North/South pattern.

Cloud, however, is a different ball of wax.  Think about the characteristics of the cattle described above.  A cloud application is smaller and purpose built.  The majority of its traffic is between VMs, as tiers which were traditionally on the same system or in the same VM are now spread across multiple VMs.  Therefore its traffic patterns are primarily East/West.  You cannot forget that there is still a North/South pattern, the same as in the other models, which is typically user interaction.  The application is stateless so that many copies of itself can run in tandem, allowing it to elastically scale up and down based on need; as such, VMs are appearing and disappearing from the network.  As these VMs are spawned they may be right next to each other, on different servers, or potentially in different data centers.  But it gets even better.

Cloud architectures are typically multi-tenant.  This means that multiple customers will utilize this infrastructure and need to be isolated from each other.  And of course Clouds are self-service.  Users/developers can design, build and deploy whenever they want, including designing the network interconnects that their applications need to function.  All of this causes overlapping IP address domains, multiple virtual networks (both L2 and L3), and requirements for dynamically configuring QoS, load balancers and firewalls.  The last headache on our list is not the least: cloud systems tend to breed like rabbits or multiply like coat hangers in the closet.  There are more and more systems as 10 servers become 40, which becomes 100, then 1000 and so on.

 

So what is a poor Network Engineer to do?

First, get a handle on what this Cloud thing is supposed to be for.  If you are one of the lucky ones who can dictate the use of the infrastructure then rock on!  Unfortunately, that does not seem to be the way it goes for many.  In the case where you just cannot predict how the infrastructure will be used, I am reminded of the phrase “there is no replacement for displacement.”  Fast links, non-blocking switches and network fabrics are all necessary for the physical network but will not get you there.

Since you as a network administrator cannot predict the traffic patterns, who can?  Well, the developer and the application itself.  This is what SDN is all about.  It allows a programmatic interface to what is called an overlay network: a series of tunnels/flows which can build virtual networks on top of the physical network, giving that pesky application what it was looking for.  In some cases you may want to make changes to the physical infrastructure, for example changing the configuration of the firewall, load balancer or other network equipment.  SDN vendors are creating plug-ins that can make those types of configurations.

But if this is not good enough for you, there is NFV.  The basic idea here is: why have specialized hardware for your core network infrastructure when we can run it virtualized as well?  Let’s run those functions in VMs too, hook them into the virtual network, use SDN to configure them, and we can now virtualize the routers, load balancers, firewalls and switches.  These technologies are very much in a state of flux right now, but they are promising nonetheless.  Now if we could just virtualize the monitoring and troubleshooting of these environments, I’d be happy.
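To ground the “programmatic interface” point, here is a minimal sketch of how an application (or developer) could ask OpenStack Neutron, a typical SDN-backed API, for its own isolated tenant network; the endpoint URL and token are placeholders, and error handling is omitted.

```python
# Minimal sketch: programmatically requesting a tenant network and subnet from
# OpenStack Neutron's REST API. The application, not the network admin, decides
# what virtual topology it needs. URL and token below are placeholders.
import json
import urllib.request

NEUTRON = "http://neutron.example.com:9696/v2.0"   # placeholder endpoint
TOKEN = "placeholder-keystone-token"               # placeholder auth token

def post(path, payload):
    req = urllib.request.Request(
        NEUTRON + path,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json", "X-Auth-Token": TOKEN},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# The application asks for its own isolated L2 segment...
net = post("/networks", {"network": {"name": "app-tier-net"}})
# ...and an L3 subnet on it, overlapping IP space and all.
post("/subnets", {"subnet": {
    "network_id": net["network"]["id"],
    "ip_version": 4,
    "cidr": "10.50.0.0/24",
}})
```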