Beyond Expectations: OpenStack via Kubernetes Helm (Fully Automated with Digital Rebar)

RackN revisits OpenStack deployments with an eye on ongoing operations.

I’ve been an outspoken skeptic of a Joint OpenStack Kubernetes Environment (my OpenStack BCN preso, Super User follow-up and BOS Proposal) because I felt that the technical hurdles of cloud native architecture would prove challenging.  Issues like stable service positioning and persistent data are requirements for OpenStack and hard problems in Kubernetes.

I was wrong: I underestimated how fast these issues could be addressed.

The Kubernetes Helm work out of the AT&T Comm Dev lab takes on the integration with a “do it the K8s native way” approach that the RackN team finds very effective.  In fact, we’ve created a fully integrated Digital Rebar deployment that lays down Kubernetes using Kargo and then adds OpenStack via Helm.  The provisioning automation includes a Ceph cluster to provide stateful sets for data persistence.
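As a rough sketch of what the Helm side of that flow looks like once Kargo has the cluster up (the chart repository URL and chart names below are illustrative assumptions based on the openstack-helm project, not our exact automation):

```shell
# Illustrative only: assumes a working kubeconfig from the Kargo install
# and Helm initialized against the cluster.  Chart names follow the
# openstack-helm project's conventions and may differ in your environment.
helm repo add openstack-helm https://tarballs.openstack.org/openstack-helm
helm install openstack-helm/mariadb   --namespace openstack --name mariadb
helm install openstack-helm/rabbitmq  --namespace openstack --name rabbitmq
helm install openstack-helm/keystone  --namespace openstack --name keystone
helm install openstack-helm/glance    --namespace openstack --name glance
helm install openstack-helm/nova      --namespace openstack --name nova
```

The point is less the specific charts than the pattern: each OpenStack service becomes an independently upgradable Helm release on a general purpose Kubernetes substrate.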

This joint approach dramatically reduces operational challenges associated with running OpenStack without taking over a general purpose Kubernetes infrastructure for a single task.

Given the rise of SRE thinking, the RackN team believes that this approach changes the game for OpenStack deployments and will ultimately dominate the field (which is already mainly containerized).  There is still work to be completed: some complex configuration is required to allow both Kubernetes CNI and Neutron to collaborate so that containers and VMs can cross-communicate.

We are looking for companies that want to join in this work and fast-track it into production.  If this is interesting, please contact us at sre@rackn.com.

Why should you sponsor? Current OpenStack operators facing “fork-lift upgrades” should want to find a path like this one that ensures future upgrades are baked into the plan.  This approach provides a fast track to a general purpose, enterprise grade, upgradable Kubernetes infrastructure.

Closing note from my past presentations: We’re making progress on the technical aspects of this integration; however, my concerns about market positioning remain.

Ceph in an hour? Boring! How about Ceph hardware optimized with advanced topology networking & IPv6?

This is the most remarkable deployment that I’ve had the pleasure to post about.

The RackN team has refreshed the original OpenCrowbar Ceph deployment to take advantage of the latest capabilities of the platform.  The updated workload (APL2) requires first installing RackN Enterprise or OpenCrowbar.

The update provides five distinct capabilities:

1. Fast and Repeatable

You can go from nothing to a distributed Ceph cluster in an hour.  Need to rehearse on VMs?  That’s even faster.  Want to test and retune your configuration?  Make some changes, take a coffee break and retest.  Of course, with redeploys that fast, you can iterate until you’ve got it exactly right.

2. Automatically Optimized Disk Configuration

The RackN update optimizes the Ceph installation for disk performance by finding and flagging SSDs.  That means that our deploy just works(tm) without you having to reconfigure your OS provisioning scripts or vendor disk layout.
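One common way to flag SSDs on Linux (a minimal sketch of the general technique, not RackN’s actual detection code) is the kernel’s per-device “rotational” flag under sysfs:

```python
# Sketch of SSD detection via the Linux "rotational" flag in sysfs:
# 0 = non-rotational (SSD), 1 = spinning disk.  Illustrative only; this
# is not RackN's implementation.
from pathlib import Path


def classify_devices(rotational_flags):
    """Split devices into SSDs and HDDs given a {device: rotational} map."""
    ssds = sorted(d for d, rot in rotational_flags.items() if rot == 0)
    hdds = sorted(d for d, rot in rotational_flags.items() if rot == 1)
    return ssds, hdds


def read_rotational_flags(sys_block=Path("/sys/block")):
    """Read /sys/block/<dev>/queue/rotational for every block device."""
    flags = {}
    if not sys_block.exists():
        return flags
    for dev in sys_block.iterdir():
        rot_file = dev / "queue" / "rotational"
        if rot_file.exists():
            flags[dev.name] = int(rot_file.read_text().strip())
    return flags


if __name__ == "__main__":
    ssds, hdds = classify_devices(read_rotational_flags())
    print("SSDs (journal candidates):", ssds)
    print("HDDs (OSD data candidates):", hdds)
```

The flagged SSDs can then be assigned to journals while spinning disks carry OSD data, which is the kind of layout decision the workload makes for you.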

3. Cluster Building and Balancing

This update allows you to choose which roles you want on which nodes before you commit to the deployment.  You can decide the right monitor to OSD/MON ratio for your needs.  If you expand your cluster, the system will automatically rebalance the cluster.
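As a rough illustration of the ratio decision, a common rule of thumb (not RackN’s actual placement logic) is that Ceph monitors should form an odd-sized quorum, usually capped at five:

```python
# Illustrative rule of thumb for sizing a Ceph monitor quorum: use an odd
# number of monitors (to avoid split-brain ties), at least 3 for production
# clusters, and rarely more than 5.  Not RackN's actual logic.
def recommended_mon_count(node_count):
    if node_count < 3:
        return 1  # a degenerate quorum for labs and VM rehearsal
    count = min(node_count, 5)
    # step down to the nearest odd number
    return count if count % 2 == 1 else count - 1


if __name__ == "__main__":
    for nodes in (1, 3, 4, 6, 20):
        print(nodes, "nodes ->", recommended_mon_count(nodes), "monitors")
```

Everything that is not a monitor becomes OSD capacity, and the rebalancing on expansion keeps that ratio sensible as the cluster grows.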

4. Advanced Networking Topology & IPv6

Using the network conduit abstraction, you can separate front and back end networks for the cluster.  We also take advantage of native IPv6 support and even use that as the preferred addressing.
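In Ceph terms, the front/back split and IPv6 preference boil down to a few settings (a hedged sketch; the subnets here are example values and the conduit abstraction fills them in for you):

```ini
# Illustrative ceph.conf fragment: separate public (front) and cluster (back)
# networks, with IPv6 as the preferred addressing.  Subnets are examples only.
[global]
ms_bind_ipv6 = true
public_network  = 2001:db8:100::/64   ; client-facing (front) traffic
cluster_network = 2001:db8:200::/64   ; OSD replication (back) traffic
```

Separating the replication traffic onto its own network keeps recovery storms from starving client I/O.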

5. Both Block and Object Services

Building up from Ready State Core, you can add the Ceph workload and quickly install Ceph for block and object storage.

That’s a lot of advanced capability included out-of-the-box, made possible by having an ops orchestration platform that actually understands metal.

Of course, there’s always more to improve.  Before we take on further automated tuning, we want to hear from you and learn which use-cases are most important.

Ready State Foundation for OpenStack now includes Ceph Storage

For the Paris summit, the OpenCrowbar team delivered a PackStack demo that leveraged Crowbar’s ability to create an OpenStack ready state environment.  For the Vancouver summit, we did something even bigger: we updated the OpenCrowbar Ceph workload.

Ceph is the leading open source block storage back-end for OpenStack; however, it’s tricky to install and few vendors invest the effort to hardware-optimize their configurations.  Like any foundation layer, configuration or performance errors in the storage layer will impact the entire system.  Further, the Ceph infrastructure needs to be built before OpenStack is installed.

OpenCrowbar was designed to deploy platforms like Ceph.  It has detailed knowledge of the physical infrastructure and sufficient orchestration to synchronize Ceph Mon cluster bring-up.

We are only at the start of the Ceph install journey.  Today, you can use the open source components to bring up a Ceph cluster in a reliable way that works across hardware vendors.  Much remains to optimize and tune this configuration to take advantage of SSDs, non-CentOS environments and more.

We’d love to work with you to tune and extend this workload!  Please join us in the OpenCrowbar community.

Crowbar HK Hack Report

Overall, I’m happy with our three days of hacking on Crowbar 2.  We’ve reached the critical “deploys workload” milestone and I’m excited about how well the design is working and how clearly we’ve been able to articulate our approach in code & UI.

Of course, it’s worth noting again that Crowbar 1 has also made significant progress on OpenStack Havana workloads running on Ubuntu, CentOS/RHEL, and SUSE/SLES.

Here are the focus items from the hack:

  • Documentation – cleaned up documentation specifically by updating the README in all the projects to point to the real documentation in an effort to help people find useful information faster.  Reminder: if unsure, put documentation in barclamp-crowbar/doc!
  • Docker Integration for Crowbar 2 progress.  You can now install Docker from internal packages on an admin node.  We have a strategy for allowing containers to be workload nodes.
  • Ceph installed as a workload is working.  This workload revealed the need for UI improvements and additional flags for roles (hello “cluster”).
  • Progress on OpenSUSE and Fedora as Crowbar 2 install targets.  This gets us closer to true multi-O/S support.
  • OpenSUSE 13.1 set up as a dev environment including tests.  This is a target working environment.
  • Being 12 hours offset from the US really impacted remote participation.

One thing that became obvious during the hack is that we’ve reached a point in Crowbar 2 development where it makes sense to move the work into distinct repositories.  There are build, organization and packaging changes that would simplify Crowbar 2 and make it easier to start using; however, we’ve been trying to maintain backwards compatibility with Crowbar 1.  This is becoming impossible; consequently, it appears time to split them.  Here are some items for consideration:

  1. Crowbar 2 could collect barclamps into larger “workload” repos so there would be far fewer repos (although possibly still barclamps within a workload).  For example, there would be a “core” set that includes all the current CB2 barclamps.  OpenStack, Ceph and Hadoop would be their own sets.
  2. Crowbar 2 would have a clearly named “build” or “tools” repo instead of having it called “crowbar”
  3. Crowbar 2 framework would be either part of “core” or called “framework”
  4. We would put these in a new organization (“Crowbar2” or “Crowbar-2”) so that the clutter of Crowbar’s current organization is avoided.

While we clearly need to break apart the repo, this suggestion needs more community discussion!

Looking to Leverage OpenStack Havana? Crowbar delivers 3xL!

The Crowbar community has a tradition of “day zero ops” community support for the latest OpenStack release at the summit using our pull-from-source capability.  This release we’ve really gone the extra mile by doing it on THREE Linux distros (Ubuntu, RHEL & SLES) in parallel with a significant number of projects and new capabilities included.

I’m especially excited about the Crowbar implementation of Havana Docker support, which required advanced configuration with Nova and Glance.  The community also added Heat and Ceilometer in the last release cycle, plus High Availability (“Titanium”) deployment work is in active development.  Did I mention that Crowbar is rocking OpenStack deployments?  No, because it’s redundant to mention that.  We’ll upload ISOs of this work for easy access later in the week.

While my team at Dell remains a significant contributor to this work, I’m proud to point out SUSE Cloud’s leadership and contributions too (including the new Ceph barclamp & integration).  Crowbar has become a true multi-party framework!

 

Want to learn more?  If you’re in Hong Kong, we are hosting a Crowbar Developer Community Meetup on Monday, November 4, 2013, 9:00 AM to 12:00 PM (HKT) in the SkyCity Marriott SkyZone Meeting Room.  Dell, dotCloud/Docker, SUSE and others will lead a lively technical session to review and discuss the latest updates, advantages and future plans for the Crowbar Operations Platform. You can expect to see some live code demos, and participate in a review of the results of a recent Crowbar 2 hackathon.  Confirm your seat here – space is limited!  (I expect that we’ll also stream this event using Google Hangout, watch Twitter #Crowbar for the feed)

My team at Dell has a significant presence at the OpenStack Summit in Hong Kong (details about activities including sponsored parties).  Be sure to seek out my fellow OpenStack Board Member Joseph George, Dell OpenStack Product Manager Kamesh Pemmaraju and Enstratius/Dell Multi-Cloud Manager Founder George Reese.

Note: The work referenced in this post is about Crowbar v1.  We’ve also reached critical milestones with Crowbar v2 and will begin implementing Havana on that platform shortly.

SUSE Cloud powered by OpenStack > Demo using Crowbar

As much as I love talking about Crowbar and OpenStack, it’s even more fun to cheer for other people doing it!

SUSE has been a great development partner for Crowbar and an active member of the OpenStack community.  I’m excited to see them giving a live demo today about their OpenStack technology stack (which includes Crowbar and Ceph).

Register for the Live Demo on Wed 06-26-2013 at 3.00 – 4.00 pm GMT to “learn about SUSE’s OpenStack distribution: SUSE Cloud with Dell Crowbar as the deployment mechanism and advanced features such as Ceph unified storage platform for object, block and file storage in the cloud.”

The presenter, Rick Ashford, lives in Austin and is a regular at the OpenStack Austin Meetups.  He has been working with Linux and open-source software since 1998 and currently specializes in the OpenStack cloud platform and the SUSE ecosystem surrounding it.

double Block Head with OpenStack+Equallogic & Crowbar+Ceph

Block Head

Whew…. Yesterday, Dell announced TWO OpenStack block storage capabilities (Equallogic & Ceph) for our OpenStack Essex Solution (I’m on the Dell OpenStack/Crowbar team) and community edition.  The addition of block storage effectively fills the “persistent storage” gap in the solution.  I’m quadruply excited because we now have:

  1. both open source (Ceph) and enterprise (Equallogic) choices
  2. both Nova drivers’ code in the open as part of our open source Crowbar work

Frankly, I’ve been having trouble sitting on the news until Dell World because both features have been available on GitHub since before the announcement (EQLX and Ceph-Barclamp).  Such is the emerging intersection of corporate marketing and open source.

As you may expect, we are delivering them through Crowbar; however, we’ve already had customers pick up the EQLX code and apply it without Crowbar.

The Equallogic+Nova Connector

block-eqlx

If you are using Crowbar 1.5 (Essex 2) then you already have the code!  Of course, you still need the admin information for your SAN – we automated the Nova Volume integration, not the configuration of the storage system itself.

We have it under a split test so you need to do the following to enable the configuration options:

  1. Install OpenStack as normal
  2. Create the Nova proposal
  3. Enter “Raw” Attribute Mode
  4. Change the “volume_type” to “eqlx”
  5. Save
  6. The Equallogic options should be available in the custom attribute editor!  (of course, you can edit in raw mode too)
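After step 4, the raw attributes would contain something along these lines (a hypothetical sketch; the key names and values here are illustrative, so consult the EQLX driver docs for the exact schema):

```json
{
  "volume_type": "eqlx",
  "eqlx": {
    "san_ip": "10.0.0.10",
    "san_login": "grpadmin",
    "san_password": "********",
    "eqlx_group_name": "group-0",
    "eqlx_pool": "default"
  }
}
```

Once saved, the custom attribute editor exposes the same fields in friendlier form.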

Want Docs?  Got them!  Check out the EQLX Driver Install Addendum.

Usage note: the integration uses SSH sessions.  It has been performance tested but not tested at scale.

The Ceph+Nova Connector

block-ceph

The Ceph capability includes a Ceph barclamp!  That means that all the work to set up and configure Ceph is done automatically by Crowbar.  Even better, the Nova barclamp (Ceph provides it from their site) will automatically find the Ceph proposal and link the components together!
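Under the hood, the Nova side of that link is a small amount of configuration (a hedged sketch of the Essex-era nova-volume RBD settings; the barclamp wires these values automatically and your pool name may differ):

```ini
# Illustrative nova.conf fragment for Ceph RBD-backed volumes (Essex era).
# The Ceph-provided Nova barclamp sets these from the linked Ceph proposal.
volume_driver = nova.volume.driver.RBDDriver
rbd_pool = volumes
```

The barclamp linkage means you never edit this by hand; it is shown only to make the integration concrete.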

Ceph has provided excellent directions and videos to support this install.

Dell Crowbar Project: Open Source Cloud Deployer expands into the Community

Note: Cross posted on Dell Tech Center Blogs.

Background: Crowbar is an open source cloud deployment framework originally developed by Dell to support our OpenStack and Hadoop powered solutions.  Recently, its scope has increased to include a DevOps operations model and other deployments for additional cloud applications.

It’s only been a matter of months since we open sourced the Dell Crowbar Project at OSCON in June 2011; however, the progress and response to the project has been overwhelming.  Crowbar is transforming into a community tool that is hardware, operating system, and application agnostic.  With that in mind, it’s time for me to provide a recap of Crowbar for those just learning about the project.

Crowbar started out simply as an installer for the “Dell OpenStack™-Powered Cloud Solution” with the objective of deploying a cloud from unboxed servers to a completely functioning system in under four hours.  That meant doing all the BIOS, RAID, operations services (DNS, NTP, DHCP, etc.), networking, O/S installs and system configuration required to create a complete cloud infrastructure.  It was a big job, but one that we’d been piecing together on earlier cloud installation projects.  A key part of the project involved collaborating with Opscode Chef Server on the many system configuration tasks.  Ultimately, we met and exceeded the target with a complete OpenStack install in less than two hours.

In the process of delivering Crowbar as an installer, we realized that Chef, and tools like it, were part of a larger cloud movement known as DevOps.

The DevOps approach to deployment builds up systems in a layered model rather than using packaged images.  This layered model means that parts of the system are relatively independent and highly flexible.  Users can choose which components of the system they want to deploy and where to place those components.  For example, Crowbar deploys Nagios by default, but users can disable that component in favor of their own monitoring system.  It also allows for new components to identify that Nagios is available and automatically register themselves as clients and setup application specific profiles.  In this way, Crowbar’s use of a DevOps layered deployment model provides flexibility for BOTH modularized and integrated cloud deployments.

We believe that operations that embrace layered deployments are essential for success because they allow our customers to respond to the accelerating pace of change.  We call this model for cloud data centers “CloudOps.”

Based on the flexibility of Crowbar, our team decided to use it as the deployment model for our Apache™ Hadoop™ project (“Dell | Apache Hadoop Solution”).  While a good fit, adding Hadoop required expanding Crowbar in several critical ways.

  1. We had to make major changes in our installation and build processes to accommodate multi-operating system support (RHEL 5.6 and Ubuntu 10.10 as of Oct 2011).
  2. We introduced a modularization concept that we call “barclamps” that package individual layers of the deployment infrastructure.  These barclamps reach from the lowest system levels (IPMI, BIOS, and RAID) to the highest (OpenStack and Hadoop).

Barclamps are a very significant architecture pattern for Crowbar:

  1. They allow other applications to plug into the framework and leverage other barclamps in the solution.  For example, VMware created a Cloud Foundry barclamp and DreamHost has created a Ceph barclamp.  Both barclamps are examples of applications that can leverage Crowbar for a repeatable and predictable cloud deployment.
  2. They are independent modules with their own life cycle.  Each one has its own code repository and can be imported into a live system after initial deployment.  This allows customers to expand and manage their system after initial deployment.
  3. They have many components such as Chef Cookbooks, custom UI for configuration, dependency graphs, and even localization support.
  4. They offer services that other barclamps can consume.  The Network barclamp delivers many essential services for bootstrapping clouds including IP allocation, NIC teaming, and node VLAN configuration.
  5. They can provide extensible logic to evaluate a system and make deployment recommendations.  So far, no barclamps have implemented more than the most basic proposals; however, they have the potential for much richer analysis.

Making these changes was a substantial investment by Dell, but it greatly expands the community’s ability to participate in Crowbar development.  We believe these changes were essential to our team’s core values of open and collaborative development.

Most recently, our team moved Crowbar development into the open.  This change was reflected in our work on OpenStack Diablo (+ Keystone and Dashboard) with contributions by Opscode and Rackspace Cloud Builders.  Rather than work internally and push updates at milestones, we are now coding directly from the Crowbar repositories on Github.  It is important to note that for licensing reasons, Dell has not open sourced the optional BIOS and RAID barclamps.  This level of openness better positions us to collaborate with the Crowbar community.

For a young project, we’re very proud of the progress that we’ve made with Crowbar.  We are starting a new chapter that brings new challenges such as expanding community involvement, roadmap transparency, and growing Dell support capabilities.  You will also begin to see optional barclamps that interact with proprietary and licensed hardware and software.  All of these changes are part of growing Crowbar into a framework that can support a vibrant and rich ecosystem.

We are doing everything we can to make it easy to become part of the Crowbar community.  Please join our mailing list, download the open source code or ISO, create a barclamp, and make your voice heard.  Since Dell is funding the core development on this project, contacting your Dell salesperson and telling them how much you appreciate our efforts goes a long way too.

OpenStack Day 2 Aspiration: Dreaming & Breathing

Between partnering meetings, I bounced through biz and tech sessions during Day 2 of the OpenStack conference (day 1 notes).   After my impression summary, I’m including some succinct impressions, pictures, and copies of presentations by my Dell team-mates Greg Althaus & Brent Douglas.

Clouds on the road to Bexar
My overwhelming impression is a healthy tension between aspirational* and practical discussions.  The community appetite for big, broad and bodacious features is understandably high: cloud seems on track as a solution for IT problems, but there is still an impedance mismatch between current apps and cloud capabilities.
As service providers ASPire to address these issues, some OpenStack blueprint discussions tended to digress towards more forward-looking or long-term designs.  However, watching the crowd, there was also a quietly heads-down and pragmatic audience ready to act and implement.  For this action-focused group, delivering a working cloud was the top priority.  The Rackers and Nebulizers have product to deploy and will not be distracted from the immediate concerns of living, breathing shippable code.
I find the tension between dreaming aspiration (cloud futures) and breathing aspiration (cloud delivery) necessary to the vitality of OpenStack.
[Day 3 update, these coders are holding the floor.  People who are coding have moved into the front seats of the fishbowl and the process is working very nicely.]
Specific Comments (sorry, not linking everything):
  • Cloud networking is a mess and there is substantial opportunity for innovation here.  Nicira was making an impression talking about how Open vSwitch and OpenFlow could address this at the edge switches.  Interesting, but messy.
  • I was happy with our (Dell’s) presentations: real clouds today (Bexas111010DataCenterChanges) and what to deploy on (Bexar111010OpenStackOnDCS).
  • SheepDog was presented as a way to handle block storage.  Not an iSCSI solution; it works directly w/ KVM.  Strikes me as too limiting – I’d rather see just using iSCSI.  We talked about GlusterFS or Ceph (NewDream).  This area needs a lot of work to catch up with Amazon EBS.  Unfortunately, persisting data on VM “local” disks is still the dominant paradigm.
  • Discussions about how to scale drifted towards aspirational.
  • Scalr did a side presentation about automating failover.
  • Discussion about migration from Eucalyptus to OpenStack got sidetracked by aspirations for a “hot” migration.  Ultimately, the differences between networks were a problem.  The practical issue is discovering the metadata – host info is not entirely available from the API.
  • Talked about an API for cloud networking.  This blueprint was heavily attended and messy.  The possible network topologies present too many challenges to describe easily.  Fundamentally, there seems to be consensus that the API should have a very simple concept of connecting VM end points to a logical segment.  That approach leverages the accepted (but outdated) VLAN semantic, but the implementation will have to be topology aware.  Ouch!
  • Day 3 topic, Live Migration: Big crowd arguing with bated breath about this.  The summary: “show us how to do it without shared storage, THEN we’ll talk about the API.”
Executive Tweet:  #OpenStack getting to down business.  Big dreams.  Real problems.  Delivering Code.
 
Note: I nominate Aspirational for 2010 buzzword of the year.
