I spent Tuesday and Wednesday at DevOpsDays Austin and continue to be impressed with the enthusiasm and collaborative nature of the DOD events. We also managed to have a very robust and engaged Twitter backchannel, thanks to an impressive pace set by Gene Kim!
DevOpsDays spends a lot of time talking about culture. I’m a huge believer in the importance of culture as the foundation for the type of fundamental changes that we’re making in the IT industry; however, the fact that we still have to evangelize culture is also a sign that we’re in the minority.
Process and DevOps are tightly coupled. It’s very clear that Lean/Agile/Kanban are essential for DevOps success (nice job by Dominica DeGrandis). No one even suggested DevOps+Waterfall as a joke (but Patrick Debois had a picture of a xeroxed butt in his preso which is pretty close).
Still need more Dev people to show up! My feeling is that we’ve got a lot of operators who are engaging with developers and fewer developers who are engaging with operators (the “opsdev” people).
The Chef Omnibus installer is very compelling. This approach addresses packaging issues that were created because we did not have configuration management. Now that we have good tooling, we can separate the concerns of bits, configuration, services and dependencies. This is one to watch and something I expect to see in Crowbar.
The old mantra still holds: If something is hard, do it more often.
Not DevOps, but 3D printing is awesome. This is clearly a game changing technology; however, it takes some effort to get right. Dell brought a Solidoodle 3D printer to the event to try and print OpenStack & Crowbar logos (watch for this in the future).
I’d be interested in hearing what other people found interesting! Please comment here and let me know.
The OpenStack Board spent several hours (yes, hours) discussing interoperability related topics at the last board meeting. Fundamentally, the community benefits when users can operate easily across multiple OpenStack deployments (their own and/or public clouds).
Cloud interoperability: the ability to transfer workloads between systems without changes to the deployment operations management infrastructure.
This is NOT hybrid (which I define as a workload transparently operating in multiple systems); however, it is a prerequisite for achieving scalable hybrid operations.
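To make the definition concrete, here is a minimal sketch in Python (using the requests library; the endpoints, tokens, image and flavor IDs are all placeholders, and the call shape follows the OpenStack Compute v2 API). The test of interoperability is that the same deployment code drives two different clouds with only connection parameters changing:

    import requests

    def boot_server(compute_endpoint, token, name, image_ref, flavor_ref):
        # Boot a server through the OpenStack Compute (Nova) v2 API.
        body = {"server": {"name": name,
                           "imageRef": image_ref,
                           "flavorRef": flavor_ref}}
        resp = requests.post(compute_endpoint + "/servers",
                             headers={"X-Auth-Token": token},
                             json=body)
        resp.raise_for_status()
        return resp.json()["server"]["id"]

    # Identical code against a private and a public cloud; only the
    # endpoint and credentials differ (all values are placeholders).
    boot_server("https://private.example.com:8774/v2/TENANT", "PRIVATE-TOKEN",
                "app01", "IMAGE-ID", "FLAVOR-ID")
    boot_server("https://public.example.com:8774/v2/TENANT", "PUBLIC-TOKEN",
                "app01", "IMAGE-ID", "FLAVOR-ID")

If the second call needs different code paths, you do not have interoperability.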
Interoperability matters because the OpenStack value proposition is all about creating a common platform. IT World does a good job laying out the problem (note: I work for Dell). To create sites that can interoperate, we have to do some serious lifting.
Understanding, discussing and supporting that pattern is an important step toward accelerating open operations. Please engage with us as we make the investments for open operations and help us implement the pattern.
For my role at Dell, I’m continually invited to seasons of meetings to define cloud, cloud architecture and cloud strategy. The reason these meetings go on and on is that everyone wants to make cloud complicated when it’s really very simple.
Cloud is infrastructure with an API.
That’s it. Everything else is just a consequence of having infrastructure with an API, because the API is what provides remote control.
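To illustrate how little it takes to exercise that definition, here is a sketch using the boto library against EC2 (the region, credentials and AMI ID are placeholders):

    import boto.ec2

    # Remote control of infrastructure: one API call creates a server,
    # another destroys it. No data center visit required.
    conn = boto.ec2.connect_to_region("us-east-1",
                                      aws_access_key_id="YOUR-KEY",
                                      aws_secret_access_key="YOUR-SECRET")
    reservation = conn.run_instances("ami-00000000", instance_type="m1.small")
    instance = reservation.instances[0]
    print("Provisioned", instance.id)
    conn.terminate_instances(instance_ids=[instance.id])

The hardware behind those calls could be anywhere; that is the whole point.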
What else do people try to lump into cloud? Here are some of my top cloud obfuscators:
(inter)network. Yes, networks make an API interesting. They are an essential component, but they are not cloud. Most technologies are interesting because of networks: can we stop turning everything networked into cloud? Thanks to nonsensical mega-dollar marketing campaigns, I despair that this is a lost battle.
as-a-service. That’s another way of saying “accessible via an API.” We have many flavors of Platform, Data, Application, Love or whatever as a Service. That means they have an API. Infrastructure as a Service (IaaS) is a cloud.
virtualization. VMs were the first good example of hardware with an API; however, we had virtual containers (on Mainframes!) long before we had “cloud.” They make cloud easier but they are not cloud.
pay-as-you-go (service pricing). This is a common cloud requirement but it’s really a business model. If someone builds a private cloud then it is still a cloud.
multi-tenant. Another common requirement where we expect a cloud to be able to isolate users. I agree that this is a highly desirable attribute of a good API implementation; however, it’s not essential to a cloud. For example, most public clouds do not have a true network isolation model.
elastic demand. IMHO, another word for API driven provisioning.
live migration. This is a cool feature often implemented on top of virtualization, but it’s not cloud. We were doing live migration with shared storage and clusters before anyone heard of cloud. I don’t think this is cloud at all, but someone out there does, so I included it in the list.
security. A totally important consideration, required for deployments large and small, but its presence or absence does not make something a cloud.
We start talking about these points and then forget the whole API thing. These items are important, but they do not make something “a cloud.” When Dave McCrory and I first discussed API infrastructure as “cloud,” it was driven by the fact that you could hide the actual infrastructure behind the API. The critical concept was that the API allowed you to manage a server anywhere from anywhere.
When Amazon offered the first EC2 service, it was not a cloud simply because the servers were remote, and it was not a cloud because it was on the internet; plenty of other companies were offering hosted servers. It was a cloud because the offering required operators to use an API to interact with the infrastructure. I remember EC2’s lack of a UI (and SLA) causing many to predict it would be a failure; instead, it sparked a revolution.
I’m excited now because we’re entering a new generation of cloud where infrastructure APIs include networking and storage in addition to compute. Mix in some of the interesting data and network services and we’re going to have truly dynamic and powerful clouds. More importantly, we’re going to have some truly amazing applications.
What do you think? Is API a sufficient definition of cloud in your opinion?
A vibrant project requires that we reflect honestly: we have an equal measure of challenges, including shadow dev, free-fall pace, API vs. implementation, forking risk and others. As someone helping users deploy OpenStack today, I find myself straddling a solid release (Essex) and an innovative one (Grizzly). Frankly, I’m finding it very difficult to focus on Folsom.
Grizzly excites me, and clearly I’m not alone. Based on the pace of development, I believe we saw a significant developer migration to Grizzly during the feature-freeze free fall.
In Grizzly, both Cinder and Quantum will have progressed to a point where they are ready for mainstream consumption. That means that OpenStack will have achieved the cloud API trifecta of compute-store-network.
Cinder will get beyond the “replace Nova Volume” feature set and expand the list of connectors.
Quantum will reach parity with Nova Network, address overlapping VM IPs and go beyond L2 with L3 features like Load Balancing as a Service.
We are having a real dialog about upgrades while the code is still in progress.
And new projects like Ceilometer and Heat are poised to address real user problems in billing and application formation.
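To make the compute-store-network trifecta concrete, here is a sketch of the three API families side by side (Python with requests; the host, tenant, token and IDs are placeholders, and the ports are the conventional service defaults):

    import requests

    HEADERS = {"X-Auth-Token": "TOKEN"}  # placeholder auth token

    # Compute: boot a VM through Nova
    requests.post("http://cloud:8774/v2/TENANT/servers", headers=HEADERS,
                  json={"server": {"name": "web01",
                                   "imageRef": "IMAGE-ID", "flavorRef": "1"}})

    # Store: create a 10 GB block volume through Cinder
    requests.post("http://cloud:8776/v1/TENANT/volumes", headers=HEADERS,
                  json={"volume": {"size": 10, "display_name": "web01-data"}})

    # Network: create a tenant network through Quantum
    requests.post("http://cloud:9696/v2.0/networks", headers=HEADERS,
                  json={"network": {"name": "web-net"}})

One token, three resource types: that is the common platform story in miniature.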
Everything I hear about Folsom deployment is positive with stable code and significant improvements; however, we’re too late to really influence operability at the code level because the Folsom release is done. This is not a new dilemma. As operators, we seem to be forever chasing the tail of the release.
The perpetual cycle of implementing deployment after release is futile, exhausting and demoralizing because we finish just in time for the spotlight to shift to the next release.
I don’t want to slow the pace of releases. In Agile/Lean, we believe that if something is hard then we should do it more often. Instead, I am looking at Grizzly and seeing an opportunity to break the cycle. I am looking at Folsom and thinking that most people will be OK with Essex for a little longer.
Maybe I’m a dreamer, but if we can close the deployment time gap then we accelerate adoption, innovation and happy hour. If that means jilting Folsom at the release altar to elope with Grizzly then I can live with that.
“Double wide” is not a term I’ve commonly applied to servers, but that’s one of the cool things about this new class of servers that Dell, my employer, started shipping today.
My team has been itching for the chance to start building cloud and big data reference architectures on this super-dense and flexible chassis. You’ll see it included in our next Apache Hadoop release, and we’ve already got customers who are making it the foundation of their deployments (Texas Advanced Computing Center case study).
If you’re tracking the latest big data & cloud hardware then the Dell PowerEdge C8000 is worth some investigation.
Basically, the Dell C8000 is a chassis that holds a flexible configuration of compute or storage sleds. It’s not a blade frame because the sleds minimize shared infrastructure. In our experience, cloud customers like the dedicated I/O and independence of sleds (as per the Bootstrapping clouds white paper). Those attributes are especially well suited for Hadoop and OpenStack because they support a “flat edges” and scale-out design. While I/O independence is valued, we also want shared power infrastructure and density for efficiency reasons. Using a chassis design seems to capture the best of both worlds.
The novelty of the Dell PowerEdge C8000 is that the chassis is scary flexible. You are not locked into a pre-loaded server mix.
There is a plethora of sled choices, so you can mix and match for power, compute density and spindle count. That includes double-wide sleds positively brimming with drives and expanded GPU processors. Drive density is important for big data configurations that are disk-I/O hungry; however, our experience is that customer deployments vary widely based on the planned workload. There are also significant big data trends towards compute-heavy, network-heavy, and balanced hardware configurations. Using the C8000 as a foundation is powerful because it can cater to all of these use-case mixes.
Late binding is a programming term that I’ve commandeered for Crowbar’s DevOps design objectives.
We believe that late binding is a best practice for CloudOps.
Understanding this concept is turning out to be an important but confusing differentiation for Crowbar. We’ve effectively inverted the typical deploy pattern of building up a cloud from bare metal; instead, Crowbar allows you to build a cloud from the top down. The difference is critical – we delay hardware decisions until we have the information needed to do the correct configuration.
If Late Binding is still confusing, the concept is really very simple: “we hold off all work until you’ve decided how you want to setup your cloud.”
Late binding arose from our design objectives. We started the project with a few critical operational design objectives:
Treat the nodes and application layers as an interconnected system
Realize that application choices should drive configuration down the entire stack, including BIOS, RAID and networking
Expect the entire system to be constantly changing, so we must track state and avoid locked configurations.
We’d seen these objectives as core tenets of hyperscale operators, who considered bare metal and network configuration to be an integral part of their application deployment. We knew it was possible to build the system in layers that only (re)deploy once the application configuration is defined.
We have all this great interconnected automation! Why waste it by having to pre-stage the hardware or networking?
In cloud, late binding is known as “elastic computing” because you wait until you need resources to deploy. But running apps on cloud virtual machines is simple compared to operating a physical infrastructure. In physical operations, RAID, BIOS and networking matter a lot because there are important and substantial variations between nodes. These differences are what drive late binding as one of Crowbar’s core design principles.
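To make the principle concrete, here is a minimal, hypothetical sketch (illustrative names only, not actual Crowbar code) where the application role chosen at deploy time drives the BIOS, RAID and network configuration underneath it:

    # Application choice drives the whole stack: nothing below is bound
    # until the role is known (these profiles are invented examples).
    APP_PROFILES = {
        "hadoop-datanode":   {"raid": "JBOD",  "bios": "max-io",  "nics": ["admin", "data"]},
        "openstack-compute": {"raid": "RAID1", "bios": "virt-on", "nics": ["admin", "tenant"]},
    }

    def deploy(node, app_role):
        # Late binding: BIOS, RAID and networking are decided here, at
        # deploy time, not when the hardware was racked.
        hw = APP_PROFILES[app_role]
        print("%s: RAID=%s BIOS=%s NICs=%s"
              % (node, hw["raid"], hw["bios"], ",".join(hw["nics"])))
        # ...apply the settings, then install the application on top...

    # The same physical node could have become either role.
    deploy("node-17", "hadoop-datanode")

Invert the order (configure hardware first, pick the application later) and you are back to pre-staging, with all the rework that implies.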
Given Crowbar‘s frenetic freshman year, it’s impossible to predict everything that Crowbar could become. I certainly aspire to see the project gain a stronger developer community, and the seeds of this transformation are sprouting. I also see community-driven work positioning Crowbar to break beyond being the platform for the OpenStack and Apache Hadoop solutions that pay the bills for my team at Dell to invest in Crowbar development.
I don’t have to look beyond the summer to see important development for Crowbar because of the substantial goals of the Crowbar 2.0 refactor.
Crowbar 2.0 is really just around the corner so I’d like to set some longer range goals for our next year.
Growing acceptance of Crowbar as an in-data-center extension for DevOps tools (what I call CloudOps)
Deeper integration into more operating environments beyond the core Linux flavors (like virtualization hosts, closed and special-purpose operating systems)
Taking on production ops challenges of scale, high availability and migration
Formalization of our community engagement with summits, user groups, and broader developer contributions.
For example, Crowbar 2.0 will be able to handle downloading packages and applications from the internet. Online content is not a major benefit without the ability to stage and control how those new packages are deployed; consequently, our goals remain tightly focused on improvements in orchestration.
These changes create a foundation that enables a more dynamic operating environment. Ultimately, I see Crowbar driving towards a vision of fully integrated continuous operations; however, Greg & Rob’s Crowbar vision is the topic for tomorrow’s post.
When Greg Althaus and I first proposed the project that would become Dell’s Crowbar, we had already learned first-hand that there was a significant gap in both the technologies and the processes for scale operations. Our team at Dell saw that the successful cloud data centers were treating their deployments as integrated systems (now called DevOps) in which the configuration of many components was coordinated and orchestrated; however, in our opinion these approaches fell short of the mark. We wanted to create a truly integrated operational environment from the bare metal through the networking up to the applications and out to the operations tooling.
Our ultimate technical nirvana is to achieve closed-loop continuous deployments. We want to see applications that constantly optimize new code, deployment changes, quality, revenue and cost of operations. We could find parts of this vision, but no complete and adequate foundation for it.
The business driver for Crowbar is systems thinking around improved time to value and flexibility. While our technical vision is a long-term objective, we see very real short-term ROI. It does not matter if you are writing your own software or deploying applications; the faster you can move that code into production, the sooner you get value from innovation. It is clear to us that the most successful technology companies have reorganized around speed to market and adapting to the pace of change.
System flexibility and acceleration were key values when the lean manufacturing revolution gave Dell a competitive advantage, and they have proven even more critical in today’s dynamic technology innovation climate.
We hope that this post helps define a vision for Crowbar beyond the upcoming refactoring. We started the project with the idea that new tools meant we could take operations to a new level.
While that’s a great objective, we’re too pragmatic in delivery to rest on a broad objective. Let’s take a look at Crowbar’s concrete strengths and growth areas.
Key strength areas for Crowbar
Late binding – hardware and network configuration is held until software configuration is known. This is a huge system concept.
Dynamic and Integrated Networking – means that we treat networking as a 1st class citizen for ops (sort of like software defined networking but integrated into the application)
System Perspective – no Application is an island. You can’t optimize just the deployment, you need to consider hardware, software, networking and operations all together.
Bootstrapping (bare metal) – while not “rocket science” it takes a lot of careful effort to get this right in a way that is meaningful in a continuous operations environment.
Open Source / Open Development / Modular Design – this problem is simply too complex to solve alone. We need to get a much broader net of environments and thinking involved.
Continuing Areas of Leadership
Open / Lean / Incremental Architecture – these are core aspects of our approach. While we have a vision, we also are very open to ways that solve problems faster and more elegantly than we’d expected.
Continuous deployment – we think release cycles are getting faster and the only way to survive is to build change into the foundation of operations.
Integrated networking – software defined networking is cool, but not enough. We need to have semantics that link applications, networks and infrastructure together.
Equivalent physical / virtual – we’re not saying that you won’t care whether it’s physical or virtual (you should), but we think it should not impact your operations.
Scale / Hybrid – the key element of hybrid is scale. The missing connection is being able to close the loop.
Closed-loop deployment – using load, code quality, profit, and cost of operations as factors in managing operations.
The response to Crowbar has been exciting and humbling. I most appreciate those who looked at Crowbar and saw more than a bare metal installer. They are the ones who recognized that we are trying to solve a bigger problem: it has been too difficult to cope with change in IT operations.
During this year, we have made many changes. Many have been driven by customer, user and partner feedback while others support Dell product delivery needs. Happily, these inputs are well aligned in intent if not always in timing.
Introduction of barclamps as modular components (see the sketch after this list)
Expansion into multiple applications (most notably OpenStack and Apache Hadoop)
Working in the open (with public commits)
Collaborative License Agreements
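For readers who have not seen a barclamp, here is a rough sketch of how one is laid out on disk (reconstructed from memory of the Crowbar 1.x layout; exact files and names vary by release):

    barclamp-example/
        crowbar.yml          (barclamp metadata: name, dependencies, run order)
        chef/
            cookbooks/       (the Chef recipes that do the actual work)
            data_bags/       (proposal and role definitions)
            roles/
        crowbar_framework/   (UI and model extensions for the Crowbar app)
        bin/                 (install and helper scripts)

The point of the structure is that each capability ships as a self-contained unit that plugs into the framework.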
Dell’s understanding of open source and open development has undergone a similar transformation. Crowbar was originally open sourced under Apache 2 because we imagined it becoming part of the OpenStack project. While that ambition has faded, the practical benefits of open collaboration have proven to be substantial.
The results from this first year are compelling:
For OpenStack Diablo, coordination with the Rackspace Cloud Builder team enabled Crowbar to include the Keystone and Dashboard projects in Dell’s solution
We’ve amassed hundreds of mail subscribers and Github followers
Support for multiple releases of RHEL, CentOS & Ubuntu, including Ubuntu 12.04 while it was still in beta.
SUSE did their own port of Crowbar to SUSE Linux, with important advances in Crowbar’s install model (moving from ISO to package-based installation).
We stand on the edge of many exciting transformations for Crowbar’s second year. Based on the amount of change this year, I’m hesitant to make long-term predictions. Yet, just within the next few months there are significant plans based on the Crowbar 2.0 refactor. We have line of sight to changes that expand our tool choices, improve networking, add operating systems and make Crowbar even more production-ops capable.