Can we consider Ansible Inventory as a simple service registry?

... "docker exec configure file" is a sad but common pattern ...

Interesting discussions happen when you hang out with straight-talking Paul Czarkowski. There’s a long chain of circumstances that led us from an Interop panel together at Barcelona (video) to bemoaning Ansible and Docker integration early one Sunday morning outside a gate in IAD.

What started as a rant about the crazy ways people find to inject configuration into containers (we seemed to think file mounting configs was “least horrific”) turned into a discussion about how to retro-fit application registry features (like consul or etcd) into legacy applications.

Ansible Inventory is basically a static registry service.

While we both acknowledge that Ansible inventory is distinctly not a registry service, the idea is a useful way to help explain the interaction between registry and configuration.  The most basic goal of a registry (there are others!) is to let system components find and integrate with other system components.  In that sense, the inventory allows operators to pre-wire this information in advance in a functional way.
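To make the analogy concrete, here is a minimal sketch in Python of treating a static inventory as a “where is service X?” lookup. The inventory layout, group names, and hosts are hypothetical illustrations, not anything from the conversation.

```python
# Sketch: answering the basic registry question ("which hosts provide this
# service?") from a static, Ansible-style INI inventory.
# The inventory content, group names, and hostnames below are made up.
import configparser

INVENTORY = """
[db]
db01.example.com
db02.example.com

[web]
web01.example.com
web02.example.com
"""

def load_inventory(text):
    """Parse an INI-style inventory into {group: [hosts]}."""
    parser = configparser.ConfigParser(allow_no_value=True, delimiters=("=",))
    parser.read_string(text)
    return {group: list(parser[group]) for group in parser.sections()}

def find_service(inventory, group):
    """The 'registry' lookup: hosts that provide the named service."""
    return inventory.get(group, [])

if __name__ == "__main__":
    inv = load_inventory(INVENTORY)
    # An application or template can be pre-wired with its database endpoints:
    print(find_service(inv, "db"))  # ['db01.example.com', 'db02.example.com']
```

The key property of the analogy is that this lookup is resolved when the playbook runs, not continuously, which is exactly where it breaks down next.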

The utility quickly falls apart because it’s difficult to create re-runnable Ansible (people can barely pronounce idempotent as it is) that could handle incremental updates.  Also, a registry provides other important functions, like service health and basic cross-node storage, that an inventory simply does not.

It may not be perfect, but I thought @pczarkowski’s insight was worth passing on.  What do you think?

As CloudFoundry Builds Ecosystem and Utility, What Challenges Arise? (observations from CFSummit)

I’ve been on the outskirts of the CloudFoundry (CF) universe from the dawn of the project (it’s a little-remembered fact that there was a 2011 Crowbar install of CloudFoundry).

Progress and investment have been substantial and, happily, organic. Like many platforms, its success relies on a reasonable balance between strong opinions about “right” patterns and enough flexibility to accommodate exceptions.

From a well-patterned foundation, development teams find acceleration.  This seems to be helping CloudFoundry win some high-profile enterprise adopters.

The interesting challenge ahead of the project comes from building more complex autonomous deployments. With the challenge of horizontal scale arguably behind them, CF users are starting to build more complex architectures.  This includes dynamic provisioning of providers (like databases, object stores and other persistent adjacent services) and connecting to containerized “micro-services.”  (see Matt Stine’s preso)

While this is a natural evolution, it adds an order of magnitude more complexity because the contracts between previously isolated layers are suddenly not reliable.

For example, what happens to a CF deployment when the database provider is field-upgraded to a new version?  That could introduce breaking changes in dependent applications that are completely opaque to the data provider.  These are hard problems to solve.
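To illustrate why the contract between layers matters, here is a minimal sketch of the kind of compatibility check that currently has no natural home between a data provider and its consumers. The app names, service name, and version ranges are hypothetical assumptions, not anything CloudFoundry-specific.

```python
# Sketch: a version "contract" check between consuming apps and a shared,
# independently upgraded service.  All names and ranges below are made up;
# the point is that the provider has no visibility into which consumers break.

def parse_version(text):
    """Turn '9.6.3' into a comparable tuple (9, 6, 3)."""
    return tuple(int(part) for part in text.split("."))

# Each consuming app declares the provider versions it was tested against.
APP_CONTRACTS = {
    "billing-app": {"service": "postgres", "min": "9.4", "max_exclusive": "10.0"},
    "reports-app": {"service": "postgres", "min": "9.0", "max_exclusive": "11.0"},
}

def check_contracts(service, new_version, contracts=APP_CONTRACTS):
    """Return the apps whose declared range excludes the upgraded service."""
    broken = []
    version = parse_version(new_version)
    for app, contract in contracts.items():
        if contract["service"] != service:
            continue
        lower = parse_version(contract["min"])
        upper = parse_version(contract["max_exclusive"])
        if not (lower <= version < upper):
            broken.append(app)
    return broken

# A field upgrade of the shared database from 9.6 to 10.1 silently breaks billing-app:
print(check_contracts("postgres", "10.1"))  # ['billing-app']
```

Without something that plays this role, the breakage only surfaces when the dependent application fails at runtime.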

Happily, those are exactly the discussions that we’re starting to have with container orchestration systems.  It’s also part of the dialog that I’ve been trying to drive with Functional Operations (FuncOps preso) on the physical automation side.  I’m optimistic that CloudFoundry patterns will help make this problem more tractable.

Continuous Release combats disruptions of “Free Fall” development

Since I posted the “Free Fall” development post, I’ve been thinking a bit about the pros and cons of this type of off-release development.

The OpenStack Swift project does not do free fall because they maintain a constant “ship ready” state for the project and only loosely follow the broader OpenStack release track.  My team at Dell also has minimal free fall development because we have a more frequent release clock and choose to have the team focus together through dev/integrate/harden cycles as much as possible.

From a Lean/Agile/CI perspective, I would work to avoid hidden development where possible.  New features are introduced by split test (they are in the code, but not active for most users) so that all changes are incremental.  That means that refactoring, re-architecture and new capabilities appear less disruptively.  While this approach appears to take more effort in the short term, my experience is that it accelerates delivery because we are less likely to over-develop code.
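As a concrete illustration of the split-test approach, here is a minimal sketch of keeping new code merged but dark for most users. The feature name, rollout percentage, and checkout functions are hypothetical, not a specific framework’s API.

```python
# Sketch: a new feature ships in the code base but stays behind a flag that
# only a small, deterministic slice of users can reach.
import hashlib

ROLLOUT_PERCENT = {"new_checkout_flow": 10}  # feature -> % of users enabled

def is_enabled(feature, user_id):
    """Deterministically bucket a user into the rollout percentage."""
    digest = hashlib.sha1(f"{feature}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 100
    return bucket < ROLLOUT_PERCENT.get(feature, 0)

def checkout(user_id):
    if is_enabled("new_checkout_flow", user_id):
        return new_checkout(user_id)   # refactored path: merged, mostly dark
    return legacy_checkout(user_id)    # existing behavior for everyone else

def new_checkout(user_id):
    return f"new flow for {user_id}"

def legacy_checkout(user_id):
    return f"legacy flow for {user_id}"
```

Hashing the user id keeps the experience stable for any given user across requests, so the new code is exercised continuously in production instead of landing later as one big block.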

Unfortunately, free fall development has the opposite effect.  Having code that appears in big blocks is contrary to best practices in my opinion.  Further, it rewards groups that work asynchronously.

While OpenStack benefits from some free fall work, I think that it is ultimately counter-productive.

Crowbar 2.0 Objectives: Scalable, Heterogeneous, Flexible and Connected

The seeds for Crowbar 2.0 have been in the 1.x code base for a while and were recently accelerated by SuSE.  With the Dell | Cloudera 4 Hadoop and Essex OpenStack-powered releases behind us, we will now be totally focused on bringing these seeds to fruition in the next two months.

Getting the core Crowbar 2.0 changes working is not a major refactoring effort in calendar time; however, it will impact current Crowbar developers by changing and improving the programming APIs. The Dell Crowbar team decided to treat this as a focused refactoring effort because several important changes are tightly coupled. We cannot solve them independently without causing a larger disruption.

All of the Crowbar 2.0 changes address issues and concerns raised in the community and are needed to support the expansion of our OpenStack and Hadoop application deployments.

Our technical objective for Crowbar 2.0 is to simplify and streamline development efforts as the development and user community grows. We are seeking to:

  1. simplify our use of Chef and eliminate Crowbar requirements in our Opscode Chef recipes. This will:
    1. reduce the initial effort required to leverage Crowbar
    2. open Crowbar to a broader audience (see Upstreaming)
  2. provide heterogeneous / multiple operating system deployments. This enables:
    1. multiple versions of the same OS running for upgrades
    2. different operating systems operating simultaneously (and dealing with heterogeneous packaging issues)
    3. accommodation of no-agent systems like locked systems (e.g.: virtualization hosts) and switches (aka external entities)
    4. UEFI booting in Sledgehammer
  3. strengthen networking abstractions
    1. allow networking configurations to be created dynamically (so that users are not locked into choices made before Crowbar deployment)
    2. better manage connected operations
    3. enable pull-from-source deployments that are ahead of (or forked from) available packages.
  4. improve Crowbar’s core database and state machine to enable
    1. larger scale concerns
    2. controlled production migrations and upgrades
  5. other important items
    1. make documentation more coupled to current features and easier to maintain
    2. upgrade to Rails 3 to simplify code base, security and performance
    3. deepen automated test coverage and capabilities

Beyond these great technical targets, we want Crowbar 2.0 to address barriers to adoption that have been raised by our community, customers and partners. We have been tracking concerns about the learning curve for adding barclamps, the complexity of networking configuration and packaging into a single ISO.

We will kick off the community part of this effort with an online review on 7/16 (details).

PS: why a refactoring?

My team at Dell does not take on any refactoring changes lightly because they are disruptive to our community; however, a convergence of requirements has made it necessary to update several core components simultaneously. Specifically, we found that desired changes in networking, operating systems, packaging, configuration management, scale and hardware support all required interlocked changes. We have been bringing many of these changes into the code base in preparation and have reached a point where the next steps require changing Crowbar 1.0 semantics.

We are first and foremost an incremental architecture & lean development team – Crowbar 2.0 will have the smallest footprint needed to begin the transformations that are currently blocking us. There is significant room during and after the refactor for the community to shape Crowbar.

The 451 Group Cloudscape report strikes a chord but misses harmony (DevOps, Hybrid Cloud, Orchestration)

It’s impossible to resist posting about this month’s 451 Group Cloudscape report when it calls me out by name as a leading cloud innovator:

… ProTier founders Dave McCrory and Rob Hirschfeld. ProTier [note: now part of Quest] was, indeed, the first VMware ecosystem vendor to be tracked by The 451 Group. In the face of a skeptical world, these entrepreneurs argued that virtualization needed automation in order to realize its full potential, and that the test lab was the low-hanging fruit. Subsequent events have more than vindicated their view (pg. 33).

It’s even better when the report is worth reading and offers insights into forces shaping the industry.  It’s nice to be “more than vindicated” on an amazing journey we started over 10 years ago!

Rather than recite 451’s points (hybrid cloud = automation + orchestration + devops + pixie dust), I’d like to look at the problem a different way as a counterpoint.

The problem is “how do we deal with applications that are scattered over multiple data centers?”

I do not think orchestration is the complete answer.  Current orchestration is too focused on moving around virtual machines (aka workloads).

Ultimately, the solution lies in application architecture; however, I feel that is also a misdirection because cloud is redefining what an “application architecture” means.

Applications are a dynamic mix of compute, storage, and connectivity.

We’re entering an age when all of these ingredients will be delivered as elastic services that will be managed by the applications themselves.  The concept of self management is an extension of DevOps principles that fuse application function and deployment. There are missing pieces, but I’m seeing the innovation moving to fill those gaps.

If you want to see the future of cloud applications then look at the network and storage services that are emerging.  They will tell you more about the future than orchestration.


Bad Premise: Cloud Outages are *not* driving IT back to premises


I wrote this in response to Lauren Carlson‘s (Software Advice) blog post.  Lauren – I’d be more likely to agree with the statement that “SLAs are dead.”  Here’s why…

<soapbox>

Recent industry buzz about cloud service level agreements (SLAs) and reliability misses the core point about cloud.  Cloud is about agility, business models, consumerization of software and the merciless pursuit of efficiency.

The fact that Amazon EC2 built its base without an “enterprise” SLA is exhibit #1 that the IT world changed and it’s not going back.

Here are my reasons why IT pandoras can’t get cloud back into the box.

#1. Cloud has vastly superior network connectivity

The concept of your users accessing your applications from inside your firewall is so 2005.  Today’s reality, in which a significant amount of network access is externally routed, means that applications need to live where they have excellent bandwidth to their users and to other applications.

#2. Cloud has elastic consumption of resources

Cloud is not less expensive infrastructure; it is mainly more flexible infrastructure.  If you’re worried about an outage, then cloud is exactly the investment for you because you can position a backup site at another location without having to pay for online resources.  It’s much harder to take down a site that invests the time to design a system that dynamically reallocates load between sites.
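As a rough sketch of the kind of “reallocate load between sites” logic this point assumes, here is a minimal health-check loop that a DNS or load-balancer update job might run on a schedule. The site names and health URLs are hypothetical.

```python
# Sketch: only route traffic to sites that currently pass a health check,
# so a single regional outage shifts load instead of taking the app down.
import urllib.request

SITES = {
    "us-east": "https://us-east.example.com/health",  # hypothetical endpoints
    "eu-west": "https://eu-west.example.com/health",
}

def healthy(url, timeout=2):
    """Treat any HTTP 200 within the timeout as a healthy site."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        return False

def routable_sites():
    """Return only the sites currently able to take traffic."""
    return [name for name, url in SITES.items() if healthy(url)]

if __name__ == "__main__":
    print(routable_sites())
```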

#3. Cloud drives more robust architecture

The fact that cloud delivery is more opaque and modular without a five 9s SLA has driven a cloud application architecture revolution (see CAP).  We have shifted the app paradigm from robust scale up hardware to robust scale out software.  Also significant, DevOps innovations have made deployments repeatable and adaptable.

The only “logical” argument for pulling applications back from the cloud is to assert control over more of the delivery chain for your application.  It’s the same reason that we think that driving is safer than flying – we’re the ones sitting behind the wheel when we drive.  News flash – driving is NOT safer than flying.

Cloud applications are not about hardware infrastructure, they are about SOFTWARE.  Perhaps one of the greatest disservices foisted on the market was saying cloud is synonymous with “Infrastructure as a Service” and “Virtualization.”  Cloud applications are powerful because we created ways that circumvent the limitations of IaaS and VMs!

</soapbox>

DevOps: There’s a new sheriff in Cloudville

Lately there’s a flurry of interest (and hiring demand) for DevOps gurus.  It’s obvious to me that there’s as much agreement about the ethical use of ground unicorn horn as there is about the job description of a DevOps tech.

I look at the world very simply:

  • Developers = generate revenue
  • Ops = control expenses
  • DevOps = write code, setup infrastructure, ??? IDK!

Before I risk my supply of ethically obtained unicorn powder by defining DevOps, I want to explore why DevOps is suddenly hot.  With cloud driving horizontal scale applications (see RAIN posts), there’s been a sea change in the type of expertise needed to manage an application.

Stereotypically, Ops teams get code over the transom from Dev teams.  They have the job of turning the code into a smoothly running application.  That requires rigid controls and safeguards.  Traditionally, Ops could manage most of the scale and security aspects of an application with traditional scale-up, reliability, and network security practices.  These practices naturally created some IT expense and policy rigidity; however, that’s what it takes to keep the lights on with 5 nines (or 5 nyets if you’re an IT customer).

Stereotypically, Dev teams live a carpe diem struggle to turn their latest code into deployed product with the least delay.  They have the job of capturing mercurial customer value by changing applications rapidly.  Traditionally, they have assumed that problems like scale, reliability, and security could be added after the fact or fixed as they are discovered.  These practices naturally created a need to constantly evolve.

In the go-go cloud world, Dev teams are by-passing Ops by getting infrastructure directly from an IaaS provider.  Meanwhile, IaaS does not provide Ops the tools, access, and controls that they have traditionally relied on for control and management.  Consequently, Dev teams have found themselves having to stage, manage and deploy applications with little expertise in operations.  Further, Ops teams have found themselves handed running cloud applications that they have to secure, scale and maintain without the tools they have historically relied on.

DevOps has emerged as the way to fill that gap.  The DevOps hero is comfortable flying blind on an outsourced virtualized cloud, dealing with Ops issues to tighten controls and talking shop with Dev to make needed changes to architecture.  It’s a very difficult job because of the scope of skills and the utter lack of proven best practices.

So what is a day in the life of a DevOp?   Here’s my list:

  • Design and deploy scale out architecture
  • Identify and solve performance bottlenecks
  • Interact with developers to leverage cloud services
  • Interact with operations to integrate with enterprise services
  • Audit and secure applications
  • Manage application footprint based on scale
  • Automate actions on managed infrastructure

This job is so difficult that I think the market cannot supply the needed experts.  That deficit is becoming a forcing function: the cloud industry is being driven to adopt technologies and architectures that reduce the dependence on DevOps skills.  Now, that’s the topic for a future post!