How is it that we can manage to follow an average of 245 friends on Facebook and 350 people on Twitter, yet we struggle to effectively manage a handful of cloud resources? Some will argue that it's because social information is less important and hence requires less vigilance. We can manage more because we actually manage less. After all, we're unlikely to be fired if we miss our uncle's post about how many miles he ran today, but we could be if we let the website crash and burn. But the problem may also derive from the user experience.
Systems management tools get a bad rap, and for good reason. The user experience can be less than appealing.
There’s a belief – a false one, in my experience – that technical IT folks must necessarily love complex ways in which to manage their systems. I’m sure there are über-geeks for whom complexity is an end in itself. But they’re the exception, not the rule.
In a conversation with an IT team at a Fortune 100 company earlier this year, one of the system administrators said that he'd buy a tool that "did whatever John would do." John was their expert on Nagios, a system that no one else on the team could decipher. As the sysadmin lamented, however, John sometimes goes on vacation or is otherwise unavailable while he (gasp!) spends time with his family. So he wanted to receive alerts on his phone when things went awry, with one button: "Do what John would do."
He’d click that button early and often.
Nodeable isn't yet at the point where we learn John's behavior in given situations and make it easily replicable by others in his absence, but as an industry I suspect we're not terribly far from being able to approximate this. What we can do is simplify IT management by surfacing trending issues, anomalies, and the like, so that the heavy lifting of managing cloud systems is done by Nodeable, not by the developer or her operations team. It's not exactly "what would John do?" management, but it's a head start on seriously reducing complexity so that IT can focus on tailoring systems to improve the business rather than on performing root cause analysis.
And, no, it's not necessarily easier in the cloud, even though the magic of the cloud can be that it hides infrastructure complexity. The real complexity lies in deciphering what's happening in real time as apps are updated, systems are tweaked, and so on. Changes are made at a granular level, not at a "system" level, as Enstratus's James Urquhart argues:
If something goes wrong with an application, developers are on the hook to fix it, change it or kill it…. However, developers and engineers can only make those changes one, or a few, components at a time. Nobody can configure the "system" to work an expected way. All you can do is constantly monitor the success and effectiveness of the technologies you deploy into the cloud, and constantly tweak them to make them as useful as they can be in that environment.
For developers to be most effective, they need to spend most of their time writing and optimizing their applications, not deciphering arcane error messages, constructing Splunk queries to hunt down root causes, or performing other traditional IT tasks. A good system will surface trending issues in real time, based on continuous tracking of machine data that indicates whether changes to the system are helping or hurting.
In sum, a system's power lies not only in the features it claims to have, but also in how well it hides the complexity behind the scenes so that developers can focus on writing apps. It's not yet "what John would do," but it's getting close.
