Note: this post is very technical and relates to detailed Chef design patterns used by Crowbar. I apologize in advance for the post’s opacity. Just unleash your inner DevOps geek and read on. I promise you’ll find some gems.
At the Opscode Community Summit, Dell’s primary focus was creating an “External Entity” or “Managed Node” model. Matt Ray prefers the term “managed node” so I’ll defer to that name for now. This model is needed for Crowbar to manage system components that cannot run an agent such as a network switch, blade chassis, IP power distribution unit (PDU), and a SAN array. The concept for a managed node is that there is an instance of the chef-client agent that can act as a delegate for the external entity. We’ve been reluctant to call it a “proxy” because that term is so overloaded.
My Crowbar vision is to manage an end-to-end cloud application life-cycle. This starts from power and network connections to hardware RAID and BIOS then up to the services that are installed on the node and ultimately reaches up to applications installed in VMs on those nodes.
Our design goal is that you can control a managed node with the same Chef semantics that we already use. For example, adding a Network proposal role to the Switch managed node will force the agent to update its configuration during the next chef-client run. During the run, the managed node will see that the network proposal has several VLANs configured in its attributes. The node will then update the actual switch entity to match the attributes.
There are five key aspects of our managed node design. They are configuration, discovery, location, relationships, and sequence. Let’s explore each in detail.
A managed node’s configuration is different than a service or actuator pattern. The core concept of a node in chef is that the node owns the configuration. You make changes to the nodes configuration and it’s the nodes job to manage its state to maintain that configuration. In a service pattern, the consumer manages specific requests directly. At the summit (with apologies to Bill Clinton), I described Chef configuration as telling a node what it “is” while a service provide verbs that change a node. The critical difference is that a node is expected to maintain configuration as its composition changes (e.g.: node is now connected for VLAN 666) while a service responds to specific change requests (node adds tag for VLAN 666). Our goal is the maintain Chef’s configuration management concept for the external entities.
Managed nodes also have a resource discovery concept that must align with the current ohai discovery model. Like a regular node, the manage node’s data attributes reflect the state of the managed entity; consequently we’d expect a blade chassis managed node to enumerate the blades that are included. This creates an expectation that the manage node appears to be “root” for the entity that it represents. We are also assuming that the Chef server can be trusted with the sharable discovered data. There may be cases where these assumptions do not have to be true, but we are making them for now.
Another essential element of managed nodes is that their agent location matters because the external resource generally has restricted access. There are several examples of this requirement. Switch configuration may require a serial connection from a specific node. Blade SANs and PDUs management ports are restricted to specific networks. This means that the manage node agents must run from a specific location. This location is not important to the Chef server or the nodes’ actions against the managed node; however, it’s critical for the system when starting the managed node agent. While it’s possible for managed nodes to run on nodes that are outside the overall Chef infrastructure, our use cases make it more likely that they will run as independent processes from regular nodes. This means that we’ll have to add some relationship information for managed nodes and perhaps a barclamp to install and manage managed nodes.
All of our use cases for managed nodes have a direct physical linkage between the managed node and server nodes. For a switch, it’s the ports connected. For a chassis, it’s the blades installed. For a SAN, it’s the LUNs exposed. These links imply a hierarchical graph that is not currently modeled in Chef data – in fact, it’s completely missing and difficult to maintain. At this time, it’s not clear how we or Opscode will address this. My current expectation is that we’ll use yet more roles to capture the relationships and add some hierarchical UI elements into Crowbar to help visualize it. We’ll also need to comprehend node types because “managed nodes” are too generic in our UI context.
Finally, we have to consider the sequence of action for actions between managed nodes and nodes. In all of our uses cases, steps to bring up a node requires orchestration with the managed node. Specifically, there needs to be a hand-off between the managed node and the node. For example, installing an application that uses VLANs does not work until the switch has created the VLAN, There are the same challenges on LUNs and SAN and blades and chassis. Crowbar provides orchestration that we can leverage assuming we can declare the linkages.
For now, a hack to get started…
For now, we’ve started on a workable hack for managed nodes. This involves running multiple chef-clients on the admin server in their own paths & processes. We’ll also have to add yet more roles to comprehend the relationships between the managed nodes and the things that are connected to them. Watch the crowbar listserv for details!
Notes on the Opscode wiki from the Crowbar & Managed Node sessions