10 pounds of OpenStack cloud in a 5 pound bag? Do we need a bigger bag?

Yesterday, I posted about cloud disruptors that are pushing the boundaries of cloud.  The same forces pull at OpenStack, where we are working to balance including every aspect of running workloads against focusing on a stable foundation.

Note: I am seeking re-election to the 2015 OpenStack Board.  Voting starts 1/12.

For weeks, I’ve been reading and listening to people inside and outside the community.  There is considerable angst about the direction of OpenStack.  We need to be honest and positive about challenges without simply throwing stones in our hall of mirrors.

Closing 2014, OpenStack has gotten very big, very fast.  We’ve exploded in scope, contributions and commercial participants.  Unfortunately, our process infrastructure (especially the governance by-laws) simply has not kept pace.  It’s not a matter of scaling the processes we’ve got; many of the challenges created by growth require new approaches and thinking (Thierry’s post).

In 2015, we’re trying to put 10 pounds of OpenStack in a 5 pound bag.  That means we have to either a) shed 5 pounds or b) get a bigger bag.  In classic OpenStack style, we’re sort of doing both: identifying a foundational base while expanding to allow more subprojects.

To my ear, most users, operators and business people would like to see the focus be on getting the integrated release scope solid.  So, in the spirit of finding 5 pounds to shave, I’ve got five “shovel ready” items that should help:

  1. Prioritizing stability as our #1 feature.  Accomplishing this will require across-the-board alignment of the vendors’ product managers to hold back on their individual priorities in favor of the community’s.  We’ve started this effort but it’s going to take time to create the collaboration needed.
  2. Sending a clear signal about the required baseline for OpenStack.  That’s the purpose of DefCore and should be felt as we work on the Icehouse and Juno definitions.
  3. Alignment of the Board’s DefCore project with the Technical Committee’s Levels/Big Tent initiative.  By design, these efforts interconnect.  We need to make sure the work is coordinated so that we send a clearly aligned message to the technical, operator, vendor and user communities.
  4. Accelerate changes from the single-node gate to something that’s either a) more services focused or b) multi-node.  OpenStack’s scale of community development requires automation to validate that new contributions do not harm the existing code base (the gate).  The current single-node gate does not reflect the multi-node environments that users target with the code.  While it’s technically challenging to address this mismatch, it’s also essential so we ensure that we’re able to validate multi-node features.
  5. Continue to reduce drama in the open source processes.  OpenStack is infrastructure software that should enable an exciting and dynamic next generation of IT.  I hear people talk about CloudStack as “it’s not as exciting or active a community, but their stuff just works.”  That’s what enterprises and operators want.  Drama is great for grabbing headlines but not so great for building solid infrastructure.

What is the downside to OpenStack if we cannot accomplish these changes?  Forks.

I already see a clear pattern where vendors are creating their own distros (which are basically shallow forks) to preserve their own delivery cycle.  OpenStack’s success is tied to its utility for the customers of vendors who fund the contributors.  When the cost of being part of the community outweighs the value, those shallow forks may become true independent products.

Potential forks let vendors create their own bag and pick how many pounds of cloud they want to carry.  It’s our job as a community in 2015 to make sure that we’ve reduced that temptation.

1/9/15 Note: Here’s the original analogy image used for this post

2015, the year cloud died. Meet the seven riders of the cloudocalypse

After writing pages of notes about the impact of Docker, microservice architectures, mainstreaming of Ops Automation, software defined networking, exponential data growth and the explosion of alternative hardware architecture, I realized that it all boils down to the death of cloud as we know it.

OK, we’re not killing cloud per se this year.  It’s more that we’ve put 10 pounds of cloud into a 5 pound bag so it’s just not working in 2015 to call it cloud.

Cloud was happily misunderstood back in 2012 as virtualized infrastructure wrapped in an API beside some platform services (like object storage).

That illusion will be shattered in 2015 as we fully digest the extent of the beautiful and complex mess that we’ve created in the search for better scale economics and faster delivery pipelines.  2015 is going to cause a lot of indigestion for CIOs, analysts and wandering technology executives.  No one can pick the winners with Decisive Leadership™ alone because there are simply too many possible right ways to solve problems.

Here’s my list of the seven cloud disrupting technologies and frameworks that will gain even greater momentum in 2015:

  1. Docker – I think that Docker is the face of a larger disruption around containers and packaging.  I’m sure Docker alone is not the whole story.  There is a fleet of related technologies and Docker replacements; however, there’s no doubt that it’s leading a timely rethinking of application life-cycle delivery.
  2. New languages and frameworks – it’s not just the rapid maturity of Node.js and Go, but the frameworks and services that we’re building (like Cloud Foundry or Apache Spark) that change the way we use traditional languages.
  3. Microservice architectures – this is more than containers; it’s really Functional Programming for Ops (aka FuncOps), a new generation of service-oriented architecture empowered by container orchestration systems (like Brooklyn or Fleet).  Using microservices well seems to redefine how we use traditional cloud.
  4. Mainstreaming of Ops Automation – We’re past “if DevOps” and into the how. Ops automation, not cloud, is the real puppies vs cattle battle ground.  As IT creates automation to better use clouds, we create application portability that makes cloud disappear.  This freedom translates into new choices (like PaaS, containers or hardware) for operators.
  5. Software defined networking – SDN means different things but the impacts are all the same: we are automating networking and integrating it into our deployments.  The days of networking and compute silos are ending and that’s going to change how we think about cloud and the supporting infrastructure.
  6. Exponential data growth – you cannot build applications or infrastructure without considering how your storage needs will grow as we absorb more data streams and internet of things sources.
  7. Explosion of alternative hardware architecture – In 2010, infrastructure was basically a pizza box or a blade from a handful of vendors.  Today, I’m seeing a rising tide of alternative architectures, including ARM, converged and storage-focused designs, from an increasing cadre of sources, including vendors sharing open designs (OCP).  With improved automation, these new “non-cloud” options become part of the dynamic infrastructure spectrum.

Today these seven items create complexity and confusion as we work to balance the new concepts and technologies.  I can see a path forward that redefines IT to be both more flexible and dynamic while also remaining stable and performant.

Want more 2015 predictions?  Here’s my OpenStack EOY post about limiting/expanding the project scope.

9 scenarios to have prepared for a College Interview [from someone who does interviews]

This quick advice for preparing for a college interview is also useful for any interview: identify three key strengths and activities then prepare short insightful stories that show your strengths in each activity.  Stories are the strongest way to convey information.

I’ve been doing engineering college interviews since 2013 for my Alma mater, Duke University.  I love meeting the upcoming generation of engineers and seeing how their educational experiences will shape their future careers.  Sadly, I also find that few students are really prepared to showcase themselves well in these interviews.  Since it makes my job simpler if you are prepared, I’m going to post my recommendation for future interviews!

It does not take much to prepare for a college interview: you mainly need to be able to tell some short, detailed stories from your experiences that highlight your strengths.

In my experience, the best interviewees are good at telling short and specific stories that highlight their experiences and strengths.  It’s not that they have better experiences; they are just better prepared to showcase them.  Being prepared makes you more confident and comfortable, which then helps you control how the interview goes and ensures that you leave the right impression.

1/9/15 Note: Control the interview?  Yes!  You should be planning to lead the interviewer to your strengths.  Don’t passively expect them to dig that information out of you.  It’s a two-way conversation, not an interrogation.

Here’s how it works:

  1. Identify three activities that you are passionate about.  They do not have to represent the majority of your effort.  Select ones that define who you are, or caused you to grow in some way.  They could be general items like “reading” or very specific like “summer camp 2016.”  You need to get excited when you talk about these items.  Put these on the rows/y-axis of a 3×3 grid (see below).
  2. Identify three attributes that describe you (you may want help from friends or parents here).  These words should be enough to give a fast snapshot of who you are.  In the example below, the person would be something like “an adventure seeking leader who values standing out as an individual.”  Put these attributes on the columns/x-axis of your grid as I’ve shown below.
  3. Come up with nine short stories (3-6 sentences!) for the intersections on the grid where you demonstrated the key attribute during the activity.  They cannot just be statements – you must have stories because they provide easy-to-remember examples for your interview.  If you don’t have a story for an intersection, then talk about how you plan to work on this in the future.

Note: This might feel repetitive when you construct your grid, but this technique works exceptionally well during an hour-long interview.  You should repeat yourself because you need to reinforce your strengths and leave the interviewer with a sure sense of who you are.

Sample Interview Grid (image)

Remember: An admissions, alumni or faculty interview is all about making a strong impression about who you are and, more importantly, what you will bring to the university.

Having a concrete set of experiences and attributes makes sure that you reinforce your strengths.  By showing them in stories, you will create a much richer picture about who you are than if you simply assert statements about yourself.  Remember the old adage of “show, don’t tell.”

Don’t use this grid as the only basis for your interview!  It should be a foundation that you can come back to during your conversations with college representatives.  These are your key discussion points – let them help you round out the dialog.

Good luck!

PS: Google your interviewer!  I expect the candidates to know me before they meet me.  It is perfectly normal and you’d be crazy to not take advantage of that.

Nextcast #14 Transcription on OpenStack & Crowbar > “we can’t hand out trophies to everyone”

Last week, I was a guest on the NextCast OpenStack podcast hosted by Niki Acosta (EMC) [Jeff Dickey could not join].   I’ve taken some time to transcribe highlights.

We had a great discussion about OpenStack, Ops and Crowbar.  I appreciate Niki’s insightful questions and the chance to share my opinions.  I feel that we covered years of material in just one hour and I appreciate the opportunity to appear on the podcast.

Video from the full post (YouTube) and the audio for download.

Plus, a FULL TRANSCRIPT!  Here’s my NextCast #14 Short Transcription

The objective of this transcription is to help navigate the recording, not replace it.  I did not provide complete context for remarks.

  • 04:30 Birth of Crowbar (to address Ops battle scars)
  • 08:00 The need for repeatable Ready State baseline to help community work together
  • 10:30 Should hardware matter in OpenStack? It has to; details and topology matter, not vendor.
  • 11:20 OpenCompute – people are trying to open source hardware design
  • 11:50 When you are dealing with hardware, it matters. You have to get it right.
  • 12:40 Customers are hardware heterogeneous by design (and for ops tooling). Crowbar is neutral territory
  • 14:50 It’s not worth telling people they are wrong, because they are not. There are a lot of right ways to install OpenStack
  • 16:10 Sometimes people make expensive choices because it’s what they are comfortable with, and it’s not helpful for me to tell them they are wrong – they are not.
  • 16:30 You get into a weird corner if you don’t tell anyone no. And an equally weird corner if you tell everyone yes.
  • 18:00 Aspirations of having an interoperable cloud were much harder than the actual work to build it
  • 18:30 The community wants to say yes, “bring your code,” but to operators that’s very frustrating because they want to be able to make substitutions
  • 19:30 Thinking that if something is included then it’s required – that’s not clear
  • 19:50 Interlock Dilemma [see my back reference]
  • 20:10 Orwell Animal Farm reference – “all animals equal but pigs are more equal”
  • 22:20 Rob defines DefCore, it’s not big and scary
  • 22:35 DefCore is about commercial use, not running the technical project
  • 23:35 OpenStack had to make money for the companies that are paying for the developers who participate… they need to see ROI
  • 24:00 OpenStack is an infrastructure project, stability is the #1 feature
  • 24:40 You have to give a reason why you are saying no and a path to yes
  • 25:00 DefCore is test driven: quantitative results
  • 26:15 Balance between whole project and parts – examples are SwiftStack (wants Object only) and DreamHost (wants Compute only)
  • 27:00 DefCore created core components vs platform levels
  • 27:30 No vendor has said they can implement DefCore without some effort
  • 28:10 We have outlets for vendors who do not want to implement the process
  • 28:30 The Board is not in a position to make technical call about what’s in, we had to build a process for community input
  • 29:10 We had to define something that could say, “this is it and we have to move on”
  • 29:50 What we want is for people to start with the core and then bring in the other projects. We want to know what people are adding so we can make that core in time
  • 30:10 This is not a recommendation; it is a base.
  • 30:35 OpenStack is a bubble – it does not help if we just get together to pat each other on the back; we want to have a thriving ecosystem
  • 31:15 Question: “have vendors been selfish”
  • 31:35 Rob rephrased it as “does OpenStack have a tragedy of the commons problem?”
  • 32:30 We need to make sure that everyone is contributing back upstream
  • 32:50 Benefit of a Benevolent Dictator is that they can block features unless community needs are met
  • 33:10 We have NOT made it clear where companies should be contributing to the community. We are not doing a good job directing community efforts
  • 33:45 Hidden Influencers becomes OpenStack Product group
  • 34:55 Hidden Influencers were not connecting at the summit in a public way (like developers were)
  • 35:20 Developers could not really make big commitments of their time without the buy in from their managers (product and line)
  • 35:50 Subtle selfishness – focusing on your own features can disrupt the whole release where things would flow better if they helped others
  • 37:40 Rob was concerned that there was a lot of drift between developers and companies’ product descriptions
  • 38:20 BYLAWS CHANGES – vote! here’s why we need to change
  • 38:50 Having whole projects designated as core sucks – code in core should be slower and less changing. Innovation at the core will break interoperability
  • 39:40 Hoping that core will help product managers understand where they are using the standard and adding values
  • 41:10 All babies are ugly > with core, that’s good. We are looking for the grown ups who can do work and deliver value. Babies are things you nurture and help grow because they have potential.
  • 42:00 We undermine our credibility in the community when we talk about projects that are babies as if they were ready.
  • 43:15 DefCore’s job was to help pick projects. If everyone is core then we look like a youth soccer team where everyone is getting a trophy
  • 44:30 Question: “What do you tell to users to instill confidence in OpenStack”
  • 44:50 first thing: focus on operations and automation. Table stakes (for any cloud) is getting your deployments automated. Puppies vs Cattle.
  • 45:25 People who were successful with early OpenStack were using automated deployments against the APIs.
  • 46:00 DevOps is a fundamental part of cloud computing – if you’re hand-built and not automated then you are old school IT.
  • 46:40 Niki references Gartner “Bimodal IT” [excellent reference, go read it!]
  • 47:20 VMWare is a great crutch for OpenStack. We can use VMWare for the puppies.
  • 47:45 OpenStack is not going to run on every server (perhaps that’s heresy) but it does not make sense for every workload
  • 48:15 One size does not fit all – we need to be good at what we’re good at
  • 48:30 OpenStack needs to focus on doing something really well. That means helping people who want to bring automated workloads into the cloud
  • 49:20 Core was about sending a signal about what’s ready and people can rely on
  • 49:45 Back in 2011, I was saying OpenStack was ready for people who would make the operational investment
  • 50:30 We use Crowbar because it makes it easier to do automated deployments for infrastructure like Hadoop and Ceph where you want access to the physical media
  • 51:00 We should be encouraging people to use OpenStack for its use cases
  • 51:30 Existential question for OpenStack: are we a suite or product. The community is split here
  • 51:30 In comparing with Amazon, does OpenStack have to implement everything itself or build an ecosystem to compete?
  • 53:00 As soon as you make something THE OpenStack project (like Heat) you are sending a message that the alternates are not welcome
  • 54:30 OpenStack ends up in a trap if we pick a single project and make it the way that we are going to do something. New implementations are going to surface from WITHIN the projects and we need to be ready for that.
  • 55:15 new implementations are coming, we have to be ready for that. We can make ourselves vulnerable to splitting if we do not prepare.
  • 56:00 API vs Implementation? This is something that splits the community. Ultimately we want to be an API spec but we are not ready for that. We have a lot of work to do first using the same code base.
  • 56:50 DefCore has taken a balanced approach using our diversity as a strength
  • 57:20 Bylaws did not allow for enough flexibility for what is core
  • 59:00 We need voters for the quorum!
  • 59:30 Rob recommended Rocky Grober (Huawei) and Shamail Tahir (EMC) for future shows

OpenCrowbar 2.1 Released Last Week with new integrations and support

Crowbar 2.1 Release brings commercial support, hardware configs, Chef and SaltStack

Last week, the Crowbar community completed the OpenCrowbar “Broom” release and officially designated it as v2.1.  This release represents 8 months of hardening of the core orchestration engine (including automated testing), the addition of true hardware support (in the optional hardware workload) and preliminary advanced integration with Chef and SaltStack.

Core Features:

  • RAID – Automatically set RAID configuration parameters depending on how the system will be used.
    • Support for LSI controllers
    • Single and Dual RAID configuration
  • BIOS – Automatically set BIOS settings depending on how the system will be used.
    • Configuration setting for Dell PE series systems
  • Out of Band Support – Configure and manage systems via their OOB interface
    • Support for IPMI and WSMan
  • RPM Installation (it riseth again!) – Install OpenCrowbar via a standard RPM instead of a Docker container

Integrations:

  • SaltStack integration – OpenCrowbar can install SaltStack as a configuration tool to take over after “Ready State”
  • Chef Provisioning (was Chef Metal) – OpenCrowbar driver allows Chef to build clusters on bare metal using the Crowbar API.

Infrastructure:

  • Automated smoke test and code coverage analysis for all pull requests.

And…v2.1 is the first release with commercial support!

RackN (rackn.com) offers consulting and support for the OpenCrowbar v2.1 release.  The company was started by Crowbar founders Greg Althaus, Scott Jensen, Dan Choquette, and myself specifically to productize and extend Crowbar.

Want to try it out?

Delicious 7 Layer DIP (DevOps Infrastructure Provisioning) model with graphic!

Applying architecture and computer science principles to infrastructure automation helps us build better controls.  In this post, we create an OSI-like model that helps decompose the ops environment.

The RackN team discussions about “what is Ready State” have led to some interesting realizations about physical ops.  One of the most critical has been splitting the operational configuration (DNS, NTP, SSH Keys, Monitoring, Security, etc) from the application configuration.
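
To make that split concrete, here is a minimal sketch (in Python, with entirely hypothetical names and values; it is not OpenCrowbar’s actual data model) of operational configuration kept separate from application configuration:

  # Operational configuration: owned by ops, mostly stable, applied to every node.
  operational_config = {
      "dns": ["10.0.0.2", "10.0.0.3"],
      "ntp": ["0.pool.ntp.org", "1.pool.ntp.org"],
      "ssh_keys": ["ssh-rsa AAAA... ops-team"],
      "monitoring": {"endpoint": "monitor.example.local", "interval_sec": 60},
  }

  # Application configuration: owned by the workload, changes per deployment.
  application_config = {
      "workload": "ceph",
      "osd_journal_size_mb": 5120,
  }

  def provision(node):
      # Operational config is applied first (this is the "Ready State" baseline);
      # application config is handed off second to whatever tool owns the workload.
      print(node, "operational:", sorted(operational_config))
      print(node, "application:", application_config["workload"])

  provision("node-01")

The point of the split is that the application dictionary can change with every deployment without touching the operational baseline.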

Interactions between these layers are much more dynamic than developers and operators expect.

In cloud deployments, you can ask for the virtual infrastructure to be configured in advance via the IaaS and/or golden base images.  In hardware, the environment build-up needs to be more incremental because variations in physical infrastructure and operations have to be accommodated.

Greg Althaus, Crowbar co-founder, and I put together this 7 layer model (it started as 3 and grew) because we needed to be more specific in discussions about provisioning and upgrade activity.  The system view helps explain how layers 6 and 7 operate at the system level.

7 Layer DIP

The Seven Layers of our DIP:

  1. shared infrastructure – the base layer is about the interconnects between the nodes.  In this model, we care about the specific linkage to the node: VLAN tags on the switch port, which switch it is connected to, and which PDU port turns it on.
  2. firmware and management – nodes have substantial driver (RAID/BIOS/IPMI) software below the operating system that must be configured correctly.   In some cases, these configurations have external interfaces (BMC) that require out-of-band access while others can only be configured in pre-install environments (I call that side-band).
  3. operating system – while the operating system is critical, operators are striving to keep this layer as thin as possible to avoid overhead.  Even so, there are critical security, networking and device mapping functions that must be configured.  Critical local resource management items like mapping media or building network teams and bridges are layer 3 functions.
  4. operations clients – this layer connects the node to the logical data center infrastructure in basic ways like time synch (NTP) and name resolution (DNS).  It’s also where more sophisticated operators configure things like distributed cache, centralized logging and system health monitoring.  CMDB agents like Chef, Puppet or SaltStack are installed at the “top” of this layer to complete ready state.
  5. applications – once all the baseline is set up, this is the unique workload.  It can range from platforms for other applications (like OpenStack or Kubernetes) to the software itself, like Ceph, Hadoop or anything else.
  6. operations management – the external system references for layer 4 must be factored into the operations model because they often require synchronized configuration.  For example, registering a server name and IP addresses in DNS, updating an inventory database or adding its thresholds to a monitoring infrastructure.  For scale and security, it is critical to keep the node configuration (layer 4) constantly synchronized with the central management systems.
  7. cluster coordination – no application stands alone; consequently, actions on layer 5 nodes must be coordinated with other nodes.  This ranges from database registration and load balancing to complex upgrades with live data migration.  Working in layer 5 without layer 7 coordination creates unmanageable infrastructure.

This seven layer operations model helps us discuss which actions are required when provisioning a scale infrastructure.  In my experience, many developers want to work exclusively in layer 5 and overlook the need to have a consistent and managed infrastructure in all the other layers.  We enable this thinking in cloud and platform as a service (PaaS), and that helps improve developer productivity.
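
To illustrate how the layers stack, here is another minimal sketch (again Python, with hypothetical names; it is not OpenCrowbar’s real workflow) that treats the model as an ordered set of provisioning stages, with “ready state” as the boundary after layer 4:

  # The seven DIP layers in provisioning order. Layers 1-4 produce "ready state";
  # layers 5-7 are the workload and the system-level concerns around it.
  DIP_LAYERS = [
      (1, "shared infrastructure"),    # switch ports, VLANs, PDU power
      (2, "firmware and management"),  # RAID, BIOS, BMC/IPMI
      (3, "operating system"),         # security, networking, device mapping
      (4, "operations clients"),       # NTP, DNS, logging, CMDB agents
      (5, "applications"),             # OpenStack, Kubernetes, Ceph, Hadoop...
      (6, "operations management"),    # DNS records, inventory, monitoring thresholds
      (7, "cluster coordination"),     # registration, load balancing, rolling upgrades
  ]

  READY_STATE = 4  # ready state is complete once this layer is configured

  def provision_node(name):
      # Walk the layers in order; each layer assumes the ones below it are done.
      for number, layer in DIP_LAYERS:
          print(f"{name}: configuring layer {number} ({layer})")
          if number == READY_STATE:
              print(f"{name}: ready state reached; hand off to workload tooling")

  provision_node("node-01")

In this sketch, a node only reaches layer 5 after the full ready-state baseline is in place, which is exactly the boundary that developers working only in the application layer tend to skip over.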

We cannot overlook the other layers in physical ops; however, working to ready state helps us create more cloud-like boundaries.  Those boundaries are a natural segue to my upcoming post about functional operations (older efforts here).