Ceph in an hour? Boring! How about Ceph hardware optimized with advanced topology networking & IPv6?

This is the most remarkable deployment that I’ve had the pleasure to post about.

The RackN team has refreshed the original OpenCrowbar Ceph deployment to take advantage of the latest capabilities of the platform.  The updated workload (APL2) requires first installing RackN Enterprise or OpenCrowbar.

The update provides five distinct capabilities:

1. Fast and Repeatable

You can go from nothing to a distributed Ceph cluster in an hour.  Need to rehearse on VMs?  That’s even faster.  Want to test and retune your configuration?  Make some changes, take a coffee break and retest.  Of course, with redeploy that fast, you can iterate until you’ve got it exactly right.

2. Automatically Optimized Disc Configuration

The RackN update optimizes the Ceph installation for disk performance by finding and flagging SSDs.  That means that our deploy just works(tm) without you having to reconfigure your OS provisioning scripts or vendor disk layout.

3. Cluster Building and Balancing

This update allows you to place which roles you want on which nodes before you commit to the deployment.  You can decide the right monitor to OSD/MON ratio for your needs.  If you expand your cluster, the system will automatically rebalance the cluster.

4. Advanced Networking Topology & IPv6

Using the network conduit abstraction, you can separate front and back end networks for the cluster.  We also take advantage of native IPv6 support and even use that as the preferred addressing.

5. Both Block and Object Services

Building up from Ready State Core, you can add the Ceph workload and be quickly installing Ceph for block and object storage.
That’s a lot of advanced capabilities included out-of-the-box made possible by having a ops orchestration platform that actually understands metal.
Of course, there’s always more to improve.  Before we take on further automated tuning, we want to hear from you and learn what use-cases are most important.

Ready State Foundation for OpenStack now includes Ceph Storage

For the Paris summit, the OpenCrowbar team delivered a PackStack demo that leveraged Crowbar’s ability to create a OpenStack ready state environment.  For the Vancouver summit, we did something even bigger: we updated the OpenCrowbar Ceph workload.

Cp_1600_1200_DB2A1582-873B-413B-8F3C-103377203FDC.jpegeph is the leading open source block storage back-end for OpenStack; however, it’s tricky to install and few vendors invest the effort to hardware optimize their configuration.  Like any foundation layer, configuration or performance errors in the storage layer will impact the entire system.  Further, the Ceph infrastructure needs to be built before OpenStack is installed.

OpenCrowbar was designed to deploy platforms like Ceph.  It has detailed knowledge of the physical infrastructure and sufficient orchestration to synchronize Ceph Mon cluster bring-up.

We are only at the start of the Ceph install journey.  Today, you can use the open source components to bring up a Ceph cluster in a reliable way that works across hardware vendors.  Much remains to optimize and tune this configuration to take advantage of SSDs, non-Centos environments and more.

We’d love to work with you to tune and extend this workload!  Please join us in the OpenCrowbar community.

2015, the year cloud died. Meet the seven riders of the cloudocalypse

i can hazAfter writing pages of notes about the impact of Docker, microservice architectures, mainstreaming of Ops Automation, software defined networking, exponential data growth and the explosion of alternative hardware architecture, I realized that it all boils down to the death of cloud as we know it.

OK, we’re not killing cloud per se this year.  It’s more that we’ve put 10 pounds of cloud into a 5 pound bag so it’s just not working in 2015 to call it cloud.

Cloud was happily misunderstood back in 2012 as virtualized infrastructure wrapped in an API beside some platform services (like object storage).

That illusion will be shattered in 2015 as we fully digest the extent of the beautiful and complex mess that we’ve created in the search for better scale economics and faster delivery pipelines.  2015 is going to cause a lot of indigestion for CIOs, analysts and wandering technology executives.  No one can pick the winners with Decisive Leadership™ alone because there are simply too many possible right ways to solve problems.

Here’s my list of the seven cloud disrupting technologies and frameworks that will gain even greater momentum in 2015:

  1. Docker – I think that Docker is the face of a larger disruption around containers and packaging.  I’m sure Docker is not the thing alone.  There are a fleet of related technologies and Docker replacements; however, there’s no doubt that it’s leading a timely rethinking of application life-cycle delivery.
  2. New languages and frameworks – it’s not just the rapid maturity of Node.js and Go, but the frameworks and services that we’re building (like Cloud Foundry or Apache Spark) that change the way we use traditional languages.
  3. Microservice architectures – this is more than containers, it’s really Functional Programming for Ops (aka FuncOps) that’s a new generation of service oriented architecture that is being empowered by container orchestration systems (like Brooklyn or Fleet).  Using microservices well seems to redefine how we use traditional cloud.
  4. Mainstreaming of Ops Automation – We’re past “if DevOps” and into the how. Ops automation, not cloud, is the real puppies vs cattle battle ground.  As IT creates automation to better use clouds, we create application portability that makes cloud disappear.  This freedom translates into new choices (like PaaS, containers or hardware) for operators.
  5. Software defined networking – SDN means different things but the impacts are all the same: we are automating networking and integrating it into our deployments.  The days of networking and compute silos are ending and that’s going to change how we think about cloud and the supporting infrastructure.
  6. Exponential data growth – you cannot build applications or infrastructure without considering how your storage needs will grow as we absorb more data streams and internet of things sources.
  7. Explosion of alternative hardware architecture – In 2010, infrastructure was basically pizza box or blade from a handful of vendors.  Today, I’m seeing a rising tide of alternatives architectures including ARM, Converged and Storage focused from an increasing cadre of sources including vendors sharing open designs (OCP).  With improved automation, these new “non-cloud” options become part of the dynamic infrastructure spectrum.

Today these seven items create complexity and confusion as we work to balance the new concepts and technologies.  I can see a path forward that redefines IT to be both more flexible and dynamic while also being stable and performing.

Want more 2015 predictions?  Here’s my OpenStack EOY post about limiting/expanding the project scope.

You need a Squid Proxy fabric! Getting Ready State Best Practices

Sometimes a solving a small problem well makes a huge impact for operators.  Talking to operators, it appears that automated configuration of Squid does exactly that.

Not a SQUID but...

If you were installing OpenStack or Hadoop, you would not find “setup a squid proxy fabric to optimize your package downloads” in the install guide.   That’s simply out of scope for those guides; however, it’s essential operational guidance.  That’s what I mean by open operations and creating a platform for sharing best practice.

Deploying a base operating system (e.g.: Centos) on a lot of nodes creates bit-tons of identical internet traffic.  By default, each node will attempt to reach internet mirrors for packages.  If you multiply that by even 10 nodes, that’s a lot of traffic and a significant performance impact if you’re connection is limited.

For OpenCrowbar developers, the external package resolution means that each dev/test cycle with a node boot (which is up to 10+ times a day) is bottle necked.  For qa and install, the problem is even worse!

Our solution was 1) to embed Squid proxies into the configured environments and the 2) automatically configure nodes to use the proxies.   By making this behavior default, we improve the overall performance of a deployment.   This further improves the overall network topology of the operating environment while adding improved control of traffic.

This is a great example of how Crowbar uses existing operational tool chains (Chef configures Squid) in best practice ways to solve operations problems.  The magic is not in the tool or the configuration, it’s that we’ve included it in our out-of-the-box default orchestrations.

It’s time to stop fumbling around in the operational dark.  We need to compose our tool chains in an automated way!  This is how we advance operational best practice for ready state infrastructure.

OpenCrowbar.Anvil released – hammering out a gold standard in open bare metal provisioning

OpenCrowbarI’m excited to be announcing OpenCrowbar’s first release, Anvil, for the community.  Looking back on our original design from June 2012, we’ve accomplished all of our original objectives and more.
Now that we’ve got the foundation ready, our next release (OpenCrowbar Broom) focuses on workload development on top of the stable Anvil base.  This means that we’re ready to start working on OpenStack, Ceph and Hadoop.  So far, we’ve limited engagement on workloads to ensure that those developers would not also be trying to keep up with core changes.  We follow emergent design so I’m certain we’ll continue to evolve the core; however, we believe the Anvil release represents a solid foundation for workload development.
There is no more comprehensive open bare metal provisioning framework than OpenCrowbar.  The project’s focus on a complete operations model that comprehends hardware and network configuration with just enough orchestration delivers on a system vision that sets it apart from any other tool.  Yet, Crowbar also plays nicely with others by embracing, not replacing, DevOps tools like Chef and Puppet.
Now that the core is proven, we’re porting the Crowbar v1 RAID and BIOS configuration into OpenCrowbar.  By design, we’ve kept hardware support separate from the core because we’ve learned that hardware generation cycles need to be independent from the operations control infrastructure.  Decoupling them eliminates release disruptions that we experienced in Crowbar v1 and­ makes it much easier to use to incorporate hardware from a broad range of vendors.
Here are some key components of Anvil
  • UI, CLI and API stable and functional
  • Boot and discovery process working PLUS ability to handle pre-populating and configuration
  • Chef and Puppet capabilities including Birk Shelf v3 support to pull in community upstream DevOps scripts
  • Docker, VMs and Physical Servers
  • Crowbar’s famous “late-bound” approach to configuration and, critically, networking setup
  • IPv6 native, Ruby 2, Rails 4, preliminary scale tuning
  • Remarkably flexible and transparent orchestration (the Annealer)
  • Multi-OS Deployment capability, Ubuntu, CentOS, or Different versions of the same OS
Getting the workloads ported is still a tremendous amount of work but the rewards are tremendous.  With OpenCrowbar, the community has a new way to collaborate and integration this work.  It’s important to understand that while our goal is to start a quarterly release cycle for OpenCrowbar, the workload release cycles (including hardware) are NOT tied to OpenCrowbar.  The workloads choose which OpenCrowbar release they target.  From Crowbar v1, we’ve learned that Crowbar needed to be independent of the workload releases and so we want OpenCrowbar to focus on maintaining a strong ops platform.
This release marks four years of hard-earned Crowbar v1 deployment experience and two years of v2 design, redesign and implementation.  I’ve talked with DevOps teams from all over the world and listened to their pains and needs.  We have a long way to go before we’re deploying 1000 node OpenStack and Hadoop clusters, OpenCrowbar Anvil significantly moves the needle in that direction.
Thanks to the Crowbar community (Dell and SUSE especially) for nurturing the project, and congratulations to the OpenCrowbar team getting us this to this amazing place.

 

Mark Stouse’s “Making Predictions for 14” series

I was invited to be part of Mark Stouse’s 2014 big data & cloud predictions series.  His questions had me thinking deeply about the past year and I’m happy to repost them here with links to the other predictors too including (Robert ScobleShel Israel, and David H. Deans).

1.  Describe in one sentence what you do and why you’re good at it.

I specialize in architecture for infrastructure software for scale data center operations (aka “cloud”) and I have 14 years of battle scars that inform my designs.

 2.  Cloud Computing, Big Data or Consumerization: Which trend do you feel is having the most impact on IT today and why?

Cloud, Data & Consumerization are all connected, so there’s no one clear “most impactful” winner except that all three are forcing IT to rethink how we handle operations.   The pace of change for these categories (many of which are open source driven) is so fast that traditional IT governance cannot keep up.  I’m specifically talking about the DevOps and Lean Software Delivery paradigms.  These approaches do not mean that we’re trading speed for quality; in fact, I’ve seen that we’re adopting techniques that deliver both higher quality and speed.

 3.  What do you think is the biggest misconception about Cloud computing/Big Data/Consumerization?

That someone can purchase them as a SKU.  These are really architectural concepts that impact how we solve problems instead of specific products.  My experience is that customers overlook their need to understand how to change their business to take advantage of these technologies.  It’s the same classic challenge for ROI from most new technologies – they don’t exist apart from the business matching changes to the business to leverage them.

 4.  Which (Cloud Computing/Big Data/Consumerization) trend has surprised you most in the last five years?

Open source has surprised me because we’ve seen it transform from a cost concern into a supply chain concern.  When I started doing open source work for Dell, customers were very interested in innovation and controlling license costs.  This has really changed over the last few years.  Today, customers are more concerned with community participation and transparency of their product code base.  This surprised me until I realized that they are really seeking to ensure that they had maximum control and visibility into their “IT Supply Chain.”   It may seem like a paradox, but open source software is uniquely positioned to help companies maintain more control of their critical IT because they are not tightly coupled to a single vendor.

 5.  How has Cloud Computing/Big Data/Consumerization had the biggest impact in YOUR life to date?

Beyond it being my career, I believe these technologies have created a new degree of freedom for me.  I’m answering these questions from the SFO airport where I’m carrying all of the tools I need to do my job in a space small enough to fit under the seat in front of me plus a free Wifi connection.  I believe we are only just learning how access to information and portable computing will change our experience.  This learning process will be both liberating and painful as we work out the right balances between access, identity and privacy.

 6.  On a lighter note – If Cloud/Big Data/Consumerization could be personified by a superhero, which superhero would it be and why?

The Hulk.   Looks like a friendly geek but it’s going to crush you if you’re not careful.

 7.  What aspect of (Cloud Computing/Big Data/ Consumerization) are you most excited about in the future, and what excites you about it?

The Internet of Things (even if I hate the term) is very exciting because we’re moving into a place where we have real ways to connect our virtual and physical lives.  That translates into cool technologies like self-driving cars and smart power utilities.  I think it will also motivate a revolution in how people interact with computers and each other.  It’s going to open up a whole new dimension on our personal interaction with our surroundings.   I’m specifically thinking about a book “Rainbows End” by Vernor Vinge that paints this future in vivid detail.

Crowbar lays it all out: RAID & BIOS configs officially open sourced

MediaToday, Dell (my employer) announced a plethora of updates to our open source derived solutions (OpenStack and Hadoop). These solutions include the latest bits (Grizzly and Cloudera) for each project. And there’s another important notice for people tracking the Crowbar project: we’ve opened the remainder of its provisioning capability.

Yes, you can now build the open version of Crowbar and it has the code to configure a bare metal server.

Let me be very specific about this… my team at Dell tests Crowbar on a limited set of hardware configurations. Specifically, Dell server versions R720 + R720XD (using WSMAN and iIDRAC) and C6220 + C8000 (using open tools). Even on those servers, we have a limited RAID and NIC matrix; consequently, we are not positioned to duplicate other field configurations in our lab. So, while we’re excited to work with the community, caveat emptor open source.

Another thing about RAID and BIOS is that it’s REALLY HARD to get right. I know this because our team spends a lot of time testing and tweaking these, now open, parts of Crowbar. I’ve learned that doing hard things creates value; however, it’s also means that contributors to these barclamps need to be prepared to get some silicon under their fingernails.

I’m proud that we’ve reached this critical milestone and I hope that it encourages you to play along.

PS: It’s worth noting is that community activity on Crowbar has really increased. I’m excited to see all the excitement.