OpenCrowbar 2.3 (Drill) Overview Videos

Last week, Scott Jensen, RackN COO, uploaded a batch of OpenCrowbar install and demo videos.  I’ve presented them in reverse chronological order so you can see what OpenCrowbar looks like before you run the installation process.

But… if you want to start downloading while you watch, here are the docs.

Please reach out on the chat, email, or IRC (Freenode #crowbar) channels during your install and let us know how it’s going!

OpenCrowbar Basics & Provisioning (recommended start)

OpenCrowbar Install

OpenCrowbar Setup the Environment (install prep)

DNS is critical – getting physical ops integrations right matters

Why DNS? Maintaining DNS is essential to scale ops. It’s not as simple as naming servers, because each server will have multiple addresses (IPv4, IPv6, teams, bridges, etc.) on multiple NICs depending on the system’s function and applications. Plus, errors in DNS are hard to diagnose.

Names matter. I love talking about the small Ops things that make a huge impact on the quality of automation. Things like automatically building a Squid proxy cache infrastructure.

Today, I get to rave about the DNS integration that just surfaced in the OpenCrowbar code base. RackN CTO, Greg Althaus, just completed work that incrementally updates DNS entries as new IPs are added into the system.

Why is that a big deal?  There are a lot of names & IPs to manage.

In physical ops, every time you bring up a physical or virtual network interface, you are assigning at least one IP to that interface. For OpenCrowbar, we are assigning two addresses: IPv4 and IPv6.  Servers generally have 3 or more active interfaces (e.g.: BMC, admin, internal, public and storage) so that’s a lot of references.  It gets even more complex when you factor in DNS round robin or other common practices.
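
To put illustrative numbers on it (a back-of-the-envelope estimate, not measured data): a modest 200-node deployment with five interfaces per node and both IPv4 and IPv6 on each interface is already roughly 2,000 address records, before you add aliases, CNAMEs, or round-robin groups.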

Plus mistakes are expensive.  Name resolution is an essential service for operations.

I know we all love memorizing IPv4 addresses (just wait for IPv6!), so accurate naming is essential. OpenCrowbar already aligns the 4th octet of addresses (Admin .106 goes to the same server as BMC .106), but that’s not always practical or useful. This is not just a Day 1 problem – DNS drift or staleness becomes an increasingly challenging problem when you have to reallocate IP addresses. The simple fact is that registering IPs is not the hard part of this integration – it’s the flexible and dynamic updates.

What DNS automation did we enable in OpenCrowbar?  Here’s a partial list:

  1. recovery of names and IPs when interfaces and systems are decommissioned
  2. use of flexible naming patterns so that you can control how the systems are registered (see the sketch after this list)
  3. ability to register names in multiple DNS infrastructures
  4. ability to understand sub-domains so that you can map DNS by region
  5. ability to register the same system under multiple names
  6. wildcard support for CNAMEs
  7. ability to create a DNS round-robin group and keep it updated
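
To make the naming-pattern idea concrete, here is a minimal Go sketch of how a hypothetical "{node}-{net}" template could expand into per-network names for one machine. The token syntax, field names and sub-domain handling are my own illustration, not the actual OpenCrowbar pattern format.

```go
// Illustrative sketch only (not the OpenCrowbar implementation):
// expand a naming pattern into one DNS name per network role.
package main

import (
	"fmt"
	"strings"
)

// expandName fills a naming pattern for one node on one network and
// appends a sub-domain, so regions can map to different zones.
func expandName(pattern, node, network, subdomain string) string {
	name := strings.NewReplacer("{node}", node, "{net}", network).Replace(pattern)
	return name + "." + subdomain
}

func main() {
	// One physical server ends up registered under several names,
	// one per active interface/network.
	for _, net := range []string{"bmc", "admin", "internal", "public"} {
		fmt.Println(expandName("{node}-{net}", "node042", net, "dfw.example.com"))
	}
}
```

A round-robin group is then just one more name mapped to several of the resulting addresses, and decommissioning a node means deleting every name its patterns generated, which is why the automated cleanup in item 1 matters.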

But there’s more! The work includes both BIND and PowerDNS integrations. Since BIND does not have an API that allows incremental additions, Greg added a Golang service to wrap BIND and provide incremental updates and deletes.
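
For flavor, here is a heavily simplified sketch of what such a wrapper could look like: a small Go HTTP service that turns add/delete requests into BIND dynamic updates by shelling out to nsupdate. The endpoint shape, TTL and key path are assumptions for illustration; this is not the actual RackN/OpenCrowbar service.

```go
// Illustrative sketch only: a tiny HTTP wrapper that applies incremental
// DNS changes to BIND via nsupdate (RFC 2136 dynamic updates).
package main

import (
	"fmt"
	"log"
	"net/http"
	"os/exec"
	"strings"
)

// applyUpdate sends one add or delete for an A record to the local BIND.
func applyUpdate(verb, fqdn, ip string) error {
	var line string
	if verb == "add" {
		line = fmt.Sprintf("update add %s 300 A %s", fqdn, ip) // 300s TTL is an assumption
	} else {
		line = fmt.Sprintf("update delete %s A %s", fqdn, ip)
	}
	cmd := exec.Command("nsupdate", "-k", "/etc/bind/ddns.key") // key path is hypothetical
	cmd.Stdin = strings.NewReader(line + "\nsend\n")
	return cmd.Run()
}

// handler maps POST /records to an add and DELETE /records to a delete.
func handler(w http.ResponseWriter, r *http.Request) {
	fqdn, ip := r.FormValue("name"), r.FormValue("ip")
	verb := "add"
	if r.Method == http.MethodDelete {
		verb = "delete"
	}
	if err := applyUpdate(verb, fqdn, ip); err != nil {
		http.Error(w, err.Error(), http.StatusInternalServerError)
		return
	}
	w.WriteHeader(http.StatusOK)
}

func main() {
	http.HandleFunc("/records", handler)
	log.Fatal(http.ListenAndServe(":8053", nil))
}
```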

When we talk about infrastructure ops automation and ready state, this is the type of deep integration that makes a difference and is the hallmark of the RackN team’s ops focus with RackN Enterprise and OpenCrowbar.

Hidden costs of Cloud? No surprises, it’s still about complexity = people cost

Last week, Forbes and ZDNet posted articles discussing the cost of various clouds (the 451 source material is behind a paywall), full of dollar-per-hour cost analysis. Their analysis talks about private infrastructure being an order of magnitude cheaper (yes, cheaper) to own than public cloud; however, the open source price advantage offered by OpenStack is swallowed by the added cost of finding skilled operators and by its lack of maturity.

At the end of the day, operational concerns are the differential factor.

[Image: The Magic 8 Cube]

These articles get bogged down trying to normalize clouds into $/VM/hour analysis and bury the lede: the operational decisions are what drive cloud operational costs. I explored this a while back in my “magic 8 cube” series about six added management variations between public and private clouds.

In most cases, operations decisions are not just about cost – they factor in flexibility, stability and organizational readiness. From that perspective, the additional costs of public clouds and well-known stacks (VMware) are easily justified for smaller operations. Using the alternatives means paying higher salaries and finding scarce talent, which takes larger scale to justify.

Operational complexity is a material cost that strongly detracts from new platforms (yes, OpenStack – we need to address this!)

Unfortunately, it’s hard for people building platforms to perceive the complexity experienced by people outside their community. We need to make stability and operability top-line features, because complexity adds a very real cost that comes directly back as cost of operation.

In my thinking, the winners will be solutions that reduce BOTH cost and complexity.  I’ve talked about that in the past and see the trend accelerating as more and more companies invest in ops automation.

Are VMs becoming El Caminos? Containers & Metal provide new choices for DevOps

I released the “VMS ARE DEAD” post two weeks ago on DevOps.com. My point there is that Ops Automation (aka DevOps) is FINALLY growing beyond Cloud APIs and VMs. This creates a much richer ecosystem of deployment targets instead of having to shoehorn every workload into the same platform.

In 2010, it looked as if virtualization had won. We expected all servers to virtualize workloads, and the primary question was which cloud infrastructure manager would dominate. Now in 2015, the picture is not as clear. I’m seeing a trend that threatens the “virtualize all things” battle cry.

Really, it’s two intersecting trends: metal is getting cheaper and easier while container orchestration is advancing on rockets. If metal can truck around the heavy, stable workloads while containers zip around like sports cars, that leaves VMs as a strange hybrid in the middle.

What’s the middle? It’s the El Camino, that notorious discontinued half car, half pick-up truck.

The explosion of interest in containerized workloads (I know, they’ve been around for a long time, but Docker made them sexy somehow) has been creating a secondary wave of container orchestration. Five years ago, I called that Platform as a Service (PaaS), but this new generation looks more like a CI/CD pipeline plus DevOps platform than our original PaaS concepts. These emerging pipelines obfuscate the operational environment differently than virtualized infrastructure (let’s call it IaaS). The platforms do not care about servers or application tiers; their semantics are about connecting services together. It’s a different deployment paradigm that’s more about SOA than resource reservation.

On the other side, we’ve been working hard to make physical ops more automated using the same DevOps tool chains. To complicate matters, the physics of silicon has meant that we’ve gone from scale up to scale out. Modern applications are so massive that they are going to exceed any single system, so economics drives us to lots and lots of small, inexpensive servers. If you factor in the operational complexity and cost of hypervisors/clouds, an actual small dedicated server is a cost-effective substitute for a comparable virtual machine.

I’ll repeat that: a small dedicated server is a cost-effective substitute for a comparable virtual machine.

I am not speaking against virtualized servers or clouds. They have a critical role in data center operations; however, I hear from operators who are rethinking the idea that all servers will be virtualized and moving towards a more heterogeneous view of their data center. One where they have a fleet of trucks, sports cars and El Caminos.

Of course, I’d be disingenuous if I neglected to point out that trucks are used to transport cars too. At some point, everything is metal.

Want more metal-friendly reading? See Packet CEO Zac Smith’s thinking on this topic.

Talking Functional Ops & Bare Metal DevOps with vBrownBag [video]

Last Wednesday (3/11/15), I had the privilege of talking with the vBrownBag crowd about Functional Ops and bare metal deployment.  In this hour, I talk about how functional operations (FuncOps) works as an extension of ready state.  FuncOps is a critical concept for providing abstractions to scale heterogeneous physical operations.

Timing for this was fantastic since we’d just worked out ESXi install capability for OpenCrowbar (it will be exposed for work starting in Drill, the next Crowbar release cycle).

Here’s the brown bag:

If you’d like to see a demo, I’ve got hours of them posted:

Video Progression

Crowbar v2.1 demo: Visual Table of Contents [click for playlist]

Art Fewell and I discuss DevOps, SDN, Containers & OpenStack [video + transcript]

A little while back, Art Fewell and I had two excellent discussions about general trends and challenges in the cloud and scale data center space. Due to technical difficulties, the first (funnier) one was lost forever to the NSA archives, but the second survived!

The video and transcript were just posted to Network World as part of Art’s ongoing interview series. It was an action-packed hour, so I don’t want to re-post the transcript here. I thought selected quotes (under the video) were worth calling out to whet your appetite for the whole tamale.

My highlights:

  1. .. partnering with a start-up was really hard, but partnering with an open source project actually gave us a lot more influence and control.
  2. Then we got into OpenStack, … we [Dell] wanted to invest our time and that we could be part of and would be sustained and transparent to the community.
  3. Incumbents are starting to be threatened by these new opened technologies … that I think levels of playing field is having an open platform.
  4. …I was pointing at you and laughing… [you’ll have to see the video]
  5. docker and containerization … potentially is disruptive to OpenStack and how OpenStack is operating
  6. You have to turn the crank faster and faster and faster to keep up.
  7. Small things I love about OpenStack … vendors are learning how to work in these open communities. When they don’t do it right they’re told very strongly that they don’t.
  8. It was literally a Power Point of everything that was wrong … [I said,] “Yes, that’s true. You want to help?”
  9. …people aiming missiles at your house right now…
  10. With containers you can sell that same piece of hardware 10 times or more and really pack in the workloads and so you get better performance and over subscription and so the utilization of the infrastructure goes way up.
  11. I’m not as much of a believer in that OpenStack eats the data center phenomena.
  12. First thing is automate. I’ve talked to people a lot about getting ready for OpenStack and what they should do. The bottom line is before you even invest in these technologies, automating your workloads and deployments is a huge component for being successful with that.
  13. Now, all of sudden the SDN layer is connecting these network function virtualization ..  It’s a big mess. It’s really hard, it’s really complex.
  14. The thing that I’m really excited about is the service architecture. We’re in the middle of doing on the RackN and Crowbar side, we’re in the middle of doing an architecture that’s basically turning data center operations into services.
  15. What platform as a service really is about, it’s about how you store the information. What services do you offer around the elastic part? Elastic is time based, it’s where you’re manipulating in the data.
  16. RE RackN: You can’t manufacture infrastructure but you can use it in a much “cloudier way”. It really redefines what you can do in a datacenter.
  17. That abstraction layer means that people can work together and actually share scripts
  18. I definitely think that OpenStack’s legacy will more likely be the community and the governance and what we’ve learned from that than probably the code.

Research showing that Short Lived Servers (“mayflies”) create efficiency at scale [DATA REQUESTED]

Last summer, Josh McKenty and I extended the puppies and cattle metaphor to limited-life cattle we called “mayflies.” It was an attempt to help drive the cattle mindset (I think of it as social engineering, or maybe PsychOps) by forcing churn. I’ve come to think of it as a step in between cattle and chaos monkeys (see Adrian Cockcroft).

While our thoughts were mainly on ops patterns, I’ve heard that there could be a real operational benefit from encouraging this behavior. The increased turnover in the environment improves scheduler optimization, planned load drains and coping with platform/environment migration.

Now we have a chance to quantify this benefit: a college student (disclosure: he’s my son) has created a data center emulation to see if Mayflies help with utilization. His model appears to work.
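
To make the idea concrete, here is a toy Go sketch (my own illustration, not Alex’s emulation) of the mechanic involved: bounded VM lifetimes regularly free capacity, giving a first-fit scheduler fresh chances to re-pack hosts. Whether utilization actually improves depends on the request sizes and lifetime distributions, which is exactly why real-world data is needed.

```go
// Toy sketch of the "mayfly" mechanic: every VM expires after a fixed
// number of days, so hosts are regularly drained and re-packed.
// All numbers here are invented for illustration.
package main

import (
	"fmt"
	"math/rand"
)

const (
	hosts       = 20 // number of physical hosts
	hostCores   = 16 // core capacity per host
	lifespan    = 7  // mayfly lifetime in days
	simDays     = 60 // length of the run
	requestsDay = 10 // new VM requests per day
)

type vm struct{ cores, expires int }

func main() {
	load := make([]int, hosts)     // cores in use per host
	running := make([][]vm, hosts) // VMs placed on each host

	for day := 0; day < simDays; day++ {
		// Expire mayflies, freeing capacity for re-packing.
		for h := range running {
			kept := running[h][:0]
			for _, v := range running[h] {
				if v.expires > day {
					kept = append(kept, v)
				} else {
					load[h] -= v.cores
				}
			}
			running[h] = kept
		}
		// Place today's requests first-fit across the hosts.
		for i := 0; i < requestsDay; i++ {
			cores := 1 + rand.Intn(4)
			for h := range load {
				if load[h]+cores <= hostCores {
					load[h] += cores
					running[h] = append(running[h], vm{cores, day + lifespan})
					break
				}
			}
		}
	}

	used := 0
	for _, l := range load {
		used += l
	}
	fmt.Printf("end-of-run utilization: %.0f%%\n", 100*float64(used)/float64(hosts*hostCores))
}
```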

Now he needs some real-world data; here’s his request for assistance [note: he needs data by 1/20 to be included this term]:

Hello!

I am Alexander Hirschfeld, a freshman at Rose-Hulman Institute of Technology. I am working on an independent study about Mayflies, a new idea in virtual machine management in cloud computing. Part of this management is load balancing and resource allocation for virtual machines across a collection of servers. The emulation that I am working on needs a realistic set of data to be the most accurate when modeling the results of using the methods outlined by the theory of mayflies.

Mayflies are an extension of the puppies versus cattle approach to machines; they are the extreme version of cattle as they have a known limited lifespan, such as 7 days. This requires the users of the cloud to build inherently more automated and fault-resistant applications. If you could send me a collection of the requests for new virtual machines (per standard unit of time and their requested specs/size), as well as an average lifetime for the virtual machines (or a graph or list of designated/estimated lifetimes), and a basic summary of the collection of servers running the virtual machines (number, RAM, cores), I would be better able to understand how Mayflies can affect a cloud.

Thanks,
Alexander Hirschfeld, twitter: @d-qoi

Needless to say, I’m really excited about the progress on demonstrating some of the impact of this practice and am looking forward to posting about his results in the near future.

If you post in the comments, I will make sure you are connected to Alex.