10 ways to make OpenStack more Start-up Friendly [even more critical in wake of recent consolidation]

Josh McKenty’s comment to Business Insider that OpenStack is “aggressively anti-startup” got me thinking, and today’s news about IBM & Cisco acquiring startups Blue Box & Piston made me decide to release this post early.

I think there’s a general confusion about start-ups in OpenStack.  Many of the early (and now acquired) start-ups were selling OpenStack the platform.  Since OpenStack is community infrastructure, that’s a really hard place to differentiate.  Unfortunately, there’s no material install base (yet) to create an ecosystem of start-ups on top of OpenStack.

The real question is not how to make OpenStack start-up friendly, but how to create a thriving system around OpenStack like Amazon and VMware have created.

That said, here’s my list of ten ways that OpenStack could be more start-up friendly:

  1. Accept companies will have some closed tech – Many investors believe that companies need proprietary IP. An “open all things” company will have more trouble with investors.
  2. Stop scoring commits as community currency – Small companies don’t show up in the OpenStack committer economy because they are 1) small and 2) working on their own product’s code ahead of OpenStack upstream code.
  3. Have start-up travel assistance – OpenStack demands a lot of travel and start-ups don’t have the funds to chase the world-wide summits and mid-cycles.
  4. Embrace open projects outside of OpenStack governance – Not all companies want or need that type of governance for their start-up code base.  That does not make them less valuable, it just makes them not ready yet.
  5. Stop anointing ecosystem projects as OpenStack projects – Projects that are allowed into OpenStack get to grab a megaphone even if they have minimal feature sets.
  6. Be language neutral – Python is not the only language and start-ups need to make practical choices based on their objectives, staff and architecture.
  7. Have a stable base – start-ups don’t have time to troubleshoot both their own product and OpenStack.  Without core stability, it’s risky to add OpenStack as a product requirement.
  8. Focus on interoperability – Start-ups don’t have time to evangelize OpenStack.  They need OpenStack to have a large base of public and private installs because that creates an addressable market.
  9. Limit big companies from making big pre-announcements – A start-up’s primary advantage is being a first/fast mover.  When OpenStack members make announcements of intention (generally without substance) it damages the market for start-ups.  Normally corporate announcements are just noise but they are given credibility when they appear to come from the community.
  10. Reduce the contribution tax and patch backlog – Start-ups must seek the path of least friction.  If the OpenStack code changes they need require a lot of work and time, then they are likely to look for less expensive alternatives.

While I believe these items would help start-ups, they would have negative consequences for the large corporate contributors who have fashioned OpenStack into the type of project that supports their needs.

I’d love to hear what items you think I’ve overlooked or incorrectly added.

Hidden costs of Cloud? No surprises, it’s still about complexity = people cost

Last week, Forbes and ZDnet posted articles discussing the cost of various clouds (451 source material behind a paywall), full of dollar-per-hour cost analysis.  Their analysis talks about private infrastructure being an order of magnitude cheaper (yes, cheaper) to own than public cloud; however, the open source price advantages offered by OpenStack are swallowed by the added cost of finding skilled operators and its lack of maturity.

At the end of the day, operational concerns are the differential factor.

The Magic 8 Cube

These articles get tied down trying to normalize clouds into a $/VM/hour analysis and bury the lead: operational decisions are what drive cloud operational costs.  I explored this a while back in my “magic 8 cube” series about six added management variations between public and private clouds.

In most cases, operations decisions are not just about cost – they factor in flexibility, stability and organizational readiness.  From that perspective, the additional costs of public clouds and well-known stacks (VMware) are easily justified for smaller operations.  Using alternatives means paying higher salaries and finding talent that requires larger scale to justify.
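To make the scale argument concrete, here is a rough back-of-the-envelope sketch in Python. Every number and name in it is a hypothetical assumption for illustration, not a figure from the Forbes, ZDnet or 451 analysis. The point is only that operator salaries spread over a small number of VMs can dominate the $/VM/hour figure.

```python
HOURS_PER_YEAR = 24 * 365


def cost_per_vm_hour(infra_per_vm_hour, operators, salary_per_year, vm_count):
    """Blend infrastructure cost with the people cost of running it.

    Hypothetical model: operator salaries are spread evenly across
    every VM-hour the operation delivers in a year.
    """
    people_per_vm_hour = operators * salary_per_year / (vm_count * HOURS_PER_YEAR)
    return infra_per_vm_hour + people_per_vm_hour


# Hypothetical inputs: the same two-person ops team at very different scale.
small = cost_per_vm_hour(0.02, operators=2, salary_per_year=150_000, vm_count=100)
large = cost_per_vm_hour(0.02, operators=2, salary_per_year=150_000, vm_count=10_000)
print(f"100 VMs:    ${small:.3f} per VM-hour")
print(f"10,000 VMs: ${large:.3f} per VM-hour")
```

With the infrastructure price held constant, the only thing that changes between the two runs is how many VMs share the salary bill, which is why smaller operations can reasonably pay more per hour for a stack that needs fewer specialists.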

Operational complexity is a material cost that strongly detracts from new platforms (yes, OpenStack – we need to address this!)

Unfortunately, it’s hard for people building platforms to perceive the complexity experienced by people outside their community.  We need to make sure that stability and operability are top line features because complexity adds a very real cost that comes directly back to the cost of operation.

In my thinking, the winners will be solutions that reduce BOTH cost and complexity.  I’ve talked about that in the past and see the trend accelerating as more and more companies invest in ops automation.

From the archives circa 2001: “logical service cloud” patent



Sometimes, it’s fun to go back and read old things.

Abstract: A virtualized logical server cloud that enables logical servers to exist independent of physical servers that instantiate the logical servers. Servers are treated as logical resources in order to create a logical server cloud. The logical attributes of a logical server are non-deterministically allocated to physical resources creating a cloud of logical servers over the physical servers. Logical separation is facilitated by the addition of a server cloud manager, which is an automated multi-server management layer. Each logical server has persistent attributes that establish its identity. Each physical server includes or is coupled to physical resources including a network resource, a data storage resource and a processor resource. At least one physical server executes virtualization software that virtualizes physical resources for logical servers. The server cloud manager maintains status and instance information for the logical servers including persistent and non-persistent attributes that link each logical server with a physical server

Inventors: Rob Hirschfeld (me) and Dave McCrory.

Research showing that Short Lived Servers (“mayflies”) create efficiency at scale [DATA REQUESTED]

Last summer, Josh McKenty and I extended the puppies and cattle metaphor to limited life cattle we called “mayflies.” It was an attempt to help drive the cattle mindset (I think of it as social engineering, or maybe PsychOps) by forcing churn. I’ve come to think of it as a step in between cattle and chaos monkeys (see Adrian Cockcroft).

While our thoughts were mainly on ops patterns, I’ve heard that there could be a real operational benefit from encouraging this behavior. The increased turnover in the environment improves scheduler optimization, planned load drains and coping with platform/environment migration.

Now we have a chance to quantify this benefit: a college student (disclosure: he’s my son) has created a data center emulation to see if Mayflies help with utilization. His model appears to work.

Now he needs some real-world data. Here’s his request for assistance [note: he needs data by 1/20 to be included in this term]:


I am Alexander Hirschfeld, a freshman at Rose-Hulman Institute of Technology. I am working on an independent study about Mayflies, a new idea in virtual machine management in cloud computing. Part of this management is load balancing and resource allocation for virtual machines across a collection of servers. The emulation that I am working on needs a realistic set of data to be the most accurate when modeling the results of using the methods outlined by the theory of mayflies.

Mayflies are an extension of the puppies versus cattle approach to machines; they are the extreme version of cattle as they have a known limited lifespan, such as 7 days. This requires the users of the cloud to build inherently more automated and fault-resistant applications. If you could send me a collection of the requests for new virtual machines (per standard unit of time and their requested specs/size), as well as an average lifetime for the virtual machines (or a graph or list of designated/estimated lifetimes), and a basic summary of the collection of servers running the virtual machines (number, RAM, cores), I would be better able to understand how Mayflies can affect a cloud.

Alexander Hirschfeld, twitter: @d-qoi
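For readers wondering what such an emulation might look like, here is a minimal sketch in Python. It is not Alex’s model: the best-fit placement, the random workload and every parameter below are assumptions made up for illustration. Each workload needs some amount of run time; with a lifetime cap (the mayfly case) its VM expires early and the remaining work is placed again, which gives the scheduler another chance to pack it tightly.

```python
import random


def simulate(max_life=None, ticks=300, n_servers=20, cores=16, seed=1):
    """Return the average number of non-empty servers over the run.

    max_life=None models ordinary cattle; an integer caps every VM's
    lifetime (a mayfly) and re-places the workload when the VM expires.
    All defaults are hypothetical.
    """
    rng = random.Random(seed)
    free = [cores] * n_servers   # free cores on each server
    vms = []                     # (server, size, expires_at, work_left)
    busy_total = 0
    for t in range(ticks):
        pending = []             # (size, work) waiting to be placed this tick
        running = []
        for server, size, expires_at, work_left in vms:
            if expires_at > t:
                running.append((server, size, expires_at, work_left))
            else:
                free[server] += size          # VM expired, release its cores
                if work_left > 0:
                    pending.append((size, work_left))
        vms = running
        # New requests: 0 to 2 per tick, random size and run time.
        for _ in range(rng.randint(0, 2)):
            pending.append((rng.choice([1, 2, 4, 8]), rng.randint(1, 100)))
        for size, work in pending:
            fits = [s for s in range(n_servers) if free[s] >= size]
            if not fits:
                continue                      # no room: request is dropped
            target = min(fits, key=lambda s: free[s])   # best fit
            life = work if max_life is None else min(work, max_life)
            free[target] -= size
            vms.append((target, size, t + life, work - life))
        busy_total += sum(1 for f in free if f < cores)
    return busy_total / ticks


print("no lifetime cap:", simulate())
print("7-tick mayflies:", simulate(max_life=7))
```

Because the random request stream depends only on the seed, both runs see the same workload and differ only in the lifetime cap. Real request traces and server inventories, like the ones requested above, would replace the random generator.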

Needless to say, I’m really excited about the progress on demonstrating some of the impact of this practice and am looking forward to posting about his results in the near future.

If you post in the comments, I will make sure you are connected to Alex.

10 pounds of OpenStack cloud in a 5 pound bag? Do we need a bigger bag?

Yesterday, I posted about cloud disruptors that are pushing the boundaries of cloud. The same forces pull at OpenStack where we are working to balance between including all aspects of running workloads and focusing on a stable foundation.

Note: I am seeking re-election to the 2015 OpenStack Board.  Voting starts 1/12.

For weeks, I’ve been reading and listening to people inside and outside the community.  There is considerable angst about the direction of OpenStack.  We need to be honest and positive about challenges without simply throwing stones in our hall of mirrors.

Closing 2014, OpenStack has gotten very big, very fast.  We’ve exploded scope, contributions and commercial participants.  Unfortunately, our process infrastructure (especially the governance by-laws) simply has not kept pace.  It’s not a matter of scaling the processes we’ve got; many of the challenges created by growth require new approaches and thinking (Thierry’s post).

In 2015, we’re trying to put 10 pounds of OpenStack in a 5 pound bag.  That means we have to either a) shed 5 pounds or b) get a bigger bag.  In classic OpenStack style, we’re sort of doing both: identifying a foundational base while expanding to allow more subprojects.

To my ear, most users, operators and business people would like to see the focus on getting the integrated release scope solid.  So, in the spirit of finding 5 pounds to shave, I’ve got five “shovel ready” items that should help:

  1. Prioritizing stability as our #1 feature.  Accomplishing this will require across-the-board alignment of the vendors’ product managers to hold back on their individual priorities in favor of community.  We’ve started this effort but it’s going to take time to create the collaboration needed.
  2. Sending a clear signal about the required baseline for OpenStack.  That’s the purpose of DefCore and should be felt as we work on the Icehouse and Juno definitions.
  3. Alignment of the Board DefCore project with the Technical Committee’s Levels/Big Tent initiative.  By design, these efforts interconnect.  We need to make sure the work is coordinated so that we send a clearly aligned message to the technical, operator, vendor and user communities.
  4. Accelerate changes from single node gate to something that’s either a) more services focused or b) multi-node.  OpenStack’s scale of community development requires automation to validate that new contributions do not harm the existing code base (the gate).  The current single-node gate does not reflect the multi-node environments that users target with the code.  While it’s technically challenging to address this mismatch, it’s also essential so that we can validate multi-node features.
  5. Continue to reduce drama in the open source processes.  OpenStack is infrastructure software that should enable an exciting and dynamic next generation of IT.  I hear people talk about CloudStack as “it’s not as exciting or active a community but their stuff just works.”  That’s what enterprises and operators want.  Drama is great for grabbing headlines but not so great for building solid infrastructure.

What is the downside to OpenStack if we cannot accomplish these changes?  Forks.

I already see a clear pattern where vendors are creating their own distros (which are basically shallow forks) to preserve their own delivery cycle.  OpenStack’s success is tied to its utility for the customers of vendors who fund the contributors.  When the cost of being part of the community outweighs the value, those shallow forks may become true independent products.

In the case of potential forks, they allow vendors to create their own bag and pick how many pounds of cloud they want to carry.  It’s our job as a community in 2015 to make sure that we’ve reduced that temptation.

1/9/15 Note: Here’s the original analogy image used for this post

2015, the year cloud died. Meet the seven riders of the cloudocalypse

After writing pages of notes about the impact of Docker, microservice architectures, mainstreaming of Ops Automation, software defined networking, exponential data growth and the explosion of alternative hardware architecture, I realized that it all boils down to the death of cloud as we know it.

OK, we’re not killing cloud per se this year.  It’s more that we’ve put 10 pounds of cloud into a 5 pound bag so it’s just not working in 2015 to call it cloud.

Cloud was happily misunderstood back in 2012 as virtualized infrastructure wrapped in an API beside some platform services (like object storage).

That illusion will be shattered in 2015 as we fully digest the extent of the beautiful and complex mess that we’ve created in the search for better scale economics and faster delivery pipelines.  2015 is going to cause a lot of indigestion for CIOs, analysts and wandering technology executives.  No one can pick the winners with Decisive Leadership™ alone because there are simply too many possible right ways to solve problems.

Here’s my list of the seven cloud disrupting technologies and frameworks that will gain even greater momentum in 2015:

  1. Docker – I think that Docker is the face of a larger disruption around containers and packaging.  I’m sure Docker is not the only thing.  There is a fleet of related technologies and Docker replacements; however, there’s no doubt that it’s leading a timely rethinking of application life-cycle delivery.
  2. New languages and frameworks – it’s not just the rapid maturity of Node.js and Go, but the frameworks and services that we’re building (like Cloud Foundry or Apache Spark) that change the way we use traditional languages.
  3. Microservice architectures – this is more than containers; it’s really Functional Programming for Ops (aka FuncOps), a new generation of service oriented architecture that is being empowered by container orchestration systems (like Brooklyn or Fleet).  Using microservices well seems to redefine how we use traditional cloud.
  4. Mainstreaming of Ops Automation – We’re past “if DevOps” and into the how. Ops automation, not cloud, is the real puppies vs cattle battle ground.  As IT creates automation to better use clouds, we create application portability that makes cloud disappear.  This freedom translates into new choices (like PaaS, containers or hardware) for operators.
  5. Software defined networking – SDN means different things but the impacts are all the same: we are automating networking and integrating it into our deployments.  The days of networking and compute silos are ending and that’s going to change how we think about cloud and the supporting infrastructure.
  6. Exponential data growth – you cannot build applications or infrastructure without considering how your storage needs will grow as we absorb more data streams and internet of things sources.
  7. Explosion of alternative hardware architecture – In 2010, infrastructure was basically pizza box or blade from a handful of vendors.  Today, I’m seeing a rising tide of alternative architectures including ARM, Converged and Storage focused from an increasing cadre of sources including vendors sharing open designs (OCP).  With improved automation, these new “non-cloud” options become part of the dynamic infrastructure spectrum.

Today these seven items create complexity and confusion as we work to balance the new concepts and technologies.  I can see a path forward that redefines IT to be both more flexible and dynamic while also being stable and performing.

Want more 2015 predictions?  Here’s my OpenStack EOY post about limiting/expanding the project scope.

Networking in Cloud Environments, SDN, NFV, and why it matters [part 2 of 2]

Scott Jensen is an Engineering Director and colleague of mine from Dell with deep networking and operations experience.  He has first-hand experience deploying OpenStack and Hadoop and has a critical role in defining Dell’s Reference Architectures in those areas.  When I saw this writeup about cloud networking (first post), I asked if it would be OK to post it here and share it with you.


So what is different about Cloud and how does it impact the network?

In a traditional data center this was not all that difficult (relatively).  You knew what was going to be running on what system (physically) and could plan your infrastructure accordingly.  The majority of the traffic moved in a North/South direction: basically from outside the infrastructure (the internet, for example) to inside, with responses going back out.  You knew that if you had to design a communication channel from an application server to a database server, this could be isolated from the other traffic as they did not usually reside on the same system.


Virtualization made this more difficult.  In this model you are sharing system resources between different applications.  From the network’s point of view there are a large number of systems available behind a couple of links.  Live Migration puts another wrinkle in the design as you now have to deal with a specific system moving from one physical server to another.  Network Virtualization helps out a lot with this.  With it you can move virtual ports from one physical server to another to ensure that when a virtual machine moves between physical servers the network is still available.  In many cases you managed these virtual networks the same way you managed your physical network.  As a matter of fact, they were designed to emulate the physical as much as possible.  The virtual machines still looked a lot like the physical ones they replaced and could be treated in very much the same way from a traffic flow perspective.  The traffic was still primarily a North/South pattern.

Cloud, however, is a different ball of wax.  Think about the characteristics of the Cattle described above.  A cloud application is smaller and purpose built.  The majority of its traffic is between VMs, as different tiers which were traditionally on the same system or in the same VM are now spread across multiple VMs.  Therefore its traffic patterns are primarily East/West.  You cannot forget that there is still a North/South pattern, the same as in the other models, which is typically user interaction.  The application is stateless so that many copies of itself can run in tandem, allowing it to elastically scale up and down based on need, and as such instances are constantly appearing and disappearing from the network.  As these VMs are spawned on the system they may be right next to each other or on different servers or potentially in different Data Centers.  But it gets even better.

Cloud architectures are typically multi-tenant.  This means that multiple customers will utilize this infrastructure and need to be isolated from each other.  And of course Clouds are self-service.  Users/developers can design, build and deploy whenever they want, including designing the network interconnects that their applications need to function.  All of this will cause overlapping IP address domains, multiple virtual networks both L2 and L3, and requirements for dynamically configuring QoS, Load Balancers and Firewalls.  Last in our list of headaches, but not least: Cloud systems tend to breed like rabbits or multiply like coat hangers in the closet.  There are more and more systems as 10 servers become 40 which becomes 100 then 1000 and so on.


So what is a poor Network Engineer to do?

First get a handle on what this Cloud thing is supposed to be for.  If you are one of the lucky ones who can dictate the use of the infrastructure then rock on!  Unfortunately, that does not seem to be the way it goes for many.  In the case where you just cannot predict how the infrastructure will be used I am reminded of the phrase “there is no replacement for displacement”.  Fast links, non-blocking switches and Network Fabrics are all necessary for the physical network but will not get you there.  Since as a network administrator you cannot predict the traffic patterns, who can?  Well, the developer and the application itself.  This is what SDN is all about.  It allows a programmatic interface to what is called an overlay network: a series of tunnels/flows which can build virtual networks on top of the physical network, giving that pesky application what it was looking for.  In some cases you may want to make changes to the physical infrastructure, for example changing the configuration of the Firewall or Load Balancer or other network equipment.  SDN vendors are creating plug-ins that can make those types of configurations.  But if this is not good enough for you there is NFV.  The basic idea here is: why have specialized hardware for your core network infrastructure when we can run it virtualized as well?  Let’s run those functions in VMs, hook them into the virtual network and use SDN to configure them, and we can now virtualize the routers, load balancers, firewalls and switches.  These technologies are very much in a state of flux right now but they are promising nonetheless.  Now if we could just virtualize the monitoring and troubleshooting of these environments I’d be happy.
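To picture how an overlay lets tenants reuse the same IP addresses while VMs move between servers, here is a toy sketch in Python. It is an illustration under assumptions, not how any particular SDN product works: `OverlayTable`, the network IDs and the host names are all hypothetical. The idea is that every lookup is keyed on a virtual network ID (similar in spirit to a VXLAN VNI) plus the tenant’s address, and the value is the physical host where the tunnel should end.

```python
from ipaddress import ip_address


class OverlayTable:
    """Toy mapping of (virtual network ID, tenant IP) to a physical host."""

    def __init__(self):
        self._entries = {}

    def attach(self, network_id, tenant_ip, host):
        """Record that a VM on this virtual network now lives on host."""
        self._entries[(network_id, ip_address(tenant_ip))] = host

    def move(self, network_id, tenant_ip, new_host):
        """Re-point an existing VM at a new host, e.g. after a migration."""
        key = (network_id, ip_address(tenant_ip))
        if key not in self._entries:
            raise KeyError(f"no VM {tenant_ip} on virtual network {network_id}")
        self._entries[key] = new_host

    def locate(self, network_id, tenant_ip):
        """Return the physical host for a tenant address, or None if unknown."""
        return self._entries.get((network_id, ip_address(tenant_ip)))


table = OverlayTable()
# Two tenants pick the same private address; the network ID keeps them apart.
table.attach(100, "10.0.0.5", "host-a")
table.attach(200, "10.0.0.5", "host-b")
# Tenant 100's VM is respawned elsewhere; only the mapping changes.
table.move(100, "10.0.0.5", "host-c")
print(table.locate(100, "10.0.0.5"), table.locate(200, "10.0.0.5"))
```

The physical network only ever needs to carry traffic between hosts; the application-facing addresses live entirely in the table, which is what makes it programmable by the developer or the application.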