Thursday, April 26, 2007

Agile Computing Catches Up to the Data Center

What is the biggest hurdle to adopting Service Level Automation--or even dynamic computing in general--in today's data centers?

Is it technology? Nope. Most servers, storage and even network equipment can be managed reasonably well today with tools from several vendors, offering varying degrees of dynamic, policy-based provisioning. Critical monitoring interfaces are also now standard in everything from power controllers to OSes to applications.

Is it infrastructure architecture? Not really, with one caveat. As long as an architecture has been built from the ground up to be easily managed and changed, with real attention paid to dependency management and virtualization where appropriate, most data centers are excellent candidates for automation, which is itself a small leap away from utility computing.

Is it software architecture? Nope. I talked about this before, but SLA systems are just your basic event processing architecture specialized to data center resource optimization. The really good ones (*ahem*) can do this without adding any proprietary agentry to the managed software payload. In other words, what ends up running in your data center is almost exactly what you would have run without automation. There is little evidence on the application host that it is being managed at all.
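
To make "basic event processing" concrete, here is a minimal sketch of the pattern in Python. Everything in it, from the metric names to the ResourcePool interface, is my own illustration, not any vendor's actual API:

    # Minimal sketch: Service Level Automation as plain event processing.
    # The metric names, thresholds and ResourcePool API are illustrative only.
    from dataclasses import dataclass

    @dataclass
    class Event:
        service: str
        metric: str
        value: float    # e.g., CPU utilization as a fraction

    class ResourcePool:
        def __init__(self, free):
            self.free = free                  # unassigned servers
            self.assigned = {}                # servers allocated per service

        def allocate(self, service):
            if self.free > 0:
                self.free -= 1
                self.assigned[service] = self.assigned.get(service, 0) + 1

        def release(self, service):
            if self.assigned.get(service, 0) > 0:
                self.assigned[service] -= 1
                self.free += 1

    def handle(event, pool):
        # The service-level policy: keep utilization inside a band.
        if event.metric == "cpu" and event.value > 0.80:
            pool.allocate(event.service)      # scale up to protect the SLA
        elif event.metric == "cpu" and event.value < 0.20:
            pool.release(event.service)       # reclaim idle capacity

    # Events stream in from standard monitoring interfaces; policy reacts.
    pool = ResourcePool(free=4)
    for e in [Event("web", "cpu", 0.92), Event("web", "cpu", 0.12)]:
        handle(e, pool)

Nothing exotic here: monitoring events in, provisioning actions out, with the policy in the middle.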

Then what is it? One word: culture. The overwhelming obstacle that I see in the data center market today is fear of rapid change.

It is true of the sys admins, though they get the value of automation right away. They just need to see everything work before they trust it.

It's true of the storage admins, though storage virtualization is gaining ground. Unfortunately, this doesn't yet translate into accepting constant, sometimes rapid and somewhat arbitrary change within their domain.

It is most true of the network guys. Networks are the last bastion of the relatively static "diagram", mapping each component of the network architecture exactly with an eye to controlling change. The idea of switching VLANs on the fly, reconfiguring firewalls on demand, or even not knowing which server is assigned which IP address without looking at a management UI is scary as hell for the average network administrator.

And who can blame any of them? The history of commodity computing in data centers is littered with bad results from untracked changes, or badly managed application rollouts. Add to that the subconscious (or even conscious) fear that they are being replaced by software, and you get staunch resistance to changing times.

What everyone is missing here, though, is the key distinction between planning for change and executing it. No one in the entire industry is arguing that data center administrators should stop understanding exactly how their data centers work, what can go wrong, and how to mitigate risk. Cassatt (and I'm sure its competitors) spends significant time with each customer, even in pilot projects, making sure the data center design, software images, and service level definitions result in well understood behavior in all use cases.

But once those parameters are set, and the target architectures, resources and service levels are defined, it's time to let a policy-based automation environment take over execution. A Service Level Automation environment is going to make optimal decisions about resource allocation, network and storage provisioning and event handling, and do it in a fraction of the time that it would take a single human (let alone a team of humans). And, as noted above, once provisioning takes place, the applications, networks and storage run just as if a human had done the same provisioning.
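
As a sketch of what "defining the parameters" versus "executing" might look like (the policy schema below is hypothetical, not Cassatt's actual format):

    # Sketch: humans define the service level once; software executes it.
    # The policy fields below are hypothetical, for illustration only.
    policy = {
        "service": "order-entry",
        "min_servers": 2,           # human-set floor
        "max_servers": 10,          # human-set ceiling
        "target_response_ms": 250,  # the service level to hold
    }

    def desired_capacity(current, observed_response_ms, policy):
        """Pick the next allocation, clamped to the human-defined bounds."""
        if observed_response_ms > policy["target_response_ms"]:
            proposed = current + 1   # too slow: add capacity
        else:
            proposed = current - 1   # headroom: give a server back
        return max(policy["min_servers"],
                   min(policy["max_servers"], proposed))

    print(desired_capacity(3, 400, policy))  # -> 4: add a server
    print(desired_capacity(3, 120, policy))  # -> 2: release one

The people still set the bounds and the service levels; the software just makes the moment-to-moment calls within them.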

(By the way, none of this breaks with ITIL standards. It just moves execution of key elements from human hands to digital hands. It also requires real integration between the SLA environment and configuration management, asset management, etc.)

All of this reminds me of the paradigm shift(s) that the software development industry went through from the highly planned, statically defined waterfall development methods of the early years to the always moving, but always well defined world of agile development methodologies. It's been painful to change the software engineering culture, but hasn't it been worth it for those that have found success? And, isn't it absolutely necessary for the highly decoupled and interdependent world of SOA?

Data center operations is about to undergo the same pain and upheaval. Developers, be kind and help your brethren through the cultural shift they are experiencing. Perhaps some of you in the agile methods field can begin to work out variations of your methods for data center planning and execution? Perhaps we should integrate data center planning activities into our "product-based" approaches?

Are you ready for this shift? Is your organization? What can you do today (architecturally and culturally) to ready your team for the coming utility computing revolution?

4 comments:

Anonymous said...

I like the link between Agile development and Service Level Automation.

Agile programming makes sense, it really does, yet waterfall methods remain. SLA makes sense, but manual processes remain.

A great analogy, James. I shall "borrow" it, I think. ;)

Abhijit said...

I agree with your comments and observations, and am confident the changes will happen sooner rather than later. It is just a matter of seeing which of the big organisations makes the first move. The challenge is that there are very few successful case studies to refer to. I think companies like Cassatt have started making the right move by explaining where and what the real problem is and helping with an ROI analysis.

James Urquhart said...

Thanks, Abhijit. I appreciate the feedback.

I agree that a big part of the discussion at the business level is ROI. Cassatt built a simple ROI calculator that is a good "guestimator" for savings provided by SLA in most environments. We can also provide a more detailed ROI analysis for specific environments, if requested. For the most part, we can show significant ROI for any reasonably sized data center right out of the box.

VMware has a TCO calculator, but it is not focused on automation, just consolidation and "virtualization" benefits. What our analysis has shown is that Cassatt can add significant additional savings on top of VMware's numbers.

To see this, take the outputs of the VMware calculations (from the details tab in the TCO Analysis) and input them into the Cassatt calculator. Remember to multiply the Cassatt savings by three to get the (rough) three-year projection. So far, I've been seeing additional ROI in the range of 30% above the VMware number.
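
To make the arithmetic concrete, here's the back-of-the-envelope version (the dollar figures are invented for illustration; only the procedure above is real):

    # Worked example of the rough projection described above.
    # All dollar amounts are made up; only the procedure comes from the comment.
    vmware_3yr_savings = 1_000_000       # from the VMware TCO "details" tab
    cassatt_annual_savings = 100_000     # output of the Cassatt calculator

    cassatt_3yr_savings = cassatt_annual_savings * 3   # rough 3-year projection
    additional_roi = cassatt_3yr_savings / vmware_3yr_savings
    print(f"Additional savings over consolidation alone: {additional_roi:.0%}")
    # -> 30%, in line with what I've been seeing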

I actually wish more partners and/or competitors would do a better job promoting ROI, as it would help reinforce the "here and now" of automation and utility computing for today's IT organizations.

Anonymous said...

Although automation solutions are on the rise for IT professionals, I am looking forward to a time when networks are entirely automated. Automation has already helped improve application availability for many companies, but if it were completely autonomous, IT people would only need to tend to the network when there was a total crash or failure.

Most database tools now allow for a fair amount of automation, and these will become more complex and powerful as time goes on. Currently, network automation is helping to process workflows a lot more efficiently, and that too will only improve. Where I work we follow basic ITSM practices, but I really want to see us improve our company's business software applications further through automation.