Historically, infrastructure monitoring was a backroom function. Because of this, IT organizations built siloed teams of domain experts, each focused on a specific type of IT technology. There was one team to manage data networks, another to manage storage, yet another for server monitoring, and so on.
At the time, this approach was aligned with IT’s inward-looking operating model, but it’s no longer appropriate for today’s service- and customer-centric IT organizations. We now recognize that businesses consume end-to-end services, not individual IT technologies. For example, very few businesses need a raw storage array service. However, they do need a responsive and reliable data warehouse service that depends on servers, databases, storage, network connectivity, event correlation and analysis, and analytical and reporting tools. Monitoring these components alone generates massive amounts of event, metric and log data.
To deliver an end-to-end service, all of these IT technologies have to work together. However, many IT operations groups still work in technology-specific towers. And, each technology team has its own management tools, whether that’s SCOM for operating systems and servers, or AppDynamics to manage application performance.
No End-to-End Visibility and Control
This creates huge problems. Each siloed operations team can only see one type of technology. So, when there’s a service impairment, the team doesn’t have visibility and control of other types of IT components – which could be the actual source of the issue. For instance, you can’t see application issues with a point infrastructure monitoring tool. In fact, you can’t even see beyond the scope of your own infrastructure domain.
What’s the outcome? L1 and L2 support teams continue to struggle with disconnected, technology-specific information, making it incredibly hard to identify and resolve service issues. Manually correlating data across domains is enormously difficult and time-consuming – which is the last thing you want when a mission-critical service is down. And, when support calls in the domain experts, no one can agree on what’s wrong. Each team has its own blinkered view and points the finger somewhere else. As a result, businesses continue to suffer prolonged service outages and serious business impacts.
Breaking Down the Walls
To manage end-to-end services effectively, you need to bring all of your monitoring data into one place. A manager of managers (MoM) monitoring system collects a comprehensive set of data – including events, logs and metrics – from all of your point tools. With modern systems, monitoring is no longer siloed: the point tools still exist, but you now have all of their data in a single pane of glass.
However, there is more to it than simply integrating your monitoring data. Without a manager of managers, monitoring tools generate huge amounts of disconnected, redundant and incompatible information. You need something to normalize, deduplicate, filter and correlate this data, so that your L1 and L2 support teams have consistent, meaningful events to work on, instead of chaotic noise. Other key IT event management capabilities, such as event correlation and analysis, make it much easier for support teams to separate cause from symptoms – for example, when poor application response time is due to a specific database or network issue.
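To make the normalize, deduplicate, filter and correlate steps concrete, here is a minimal Python sketch. It is an illustration only, not how any particular product works: the `Event` schema, the raw field names (`host`, `service`, `severity`, `message`, `ts`), the severity scale and the 60-second deduplication window are all assumptions made for this example.

```python
from dataclasses import dataclass

# Hypothetical common schema; real point tools each use their own formats.
@dataclass(frozen=True)
class Event:
    source: str
    host: str
    service: str
    severity: int      # assumed scale: 1 = info ... 5 = critical
    message: str
    timestamp: float   # seconds

SEVERITY_MAP = {"info": 1, "warning": 3, "error": 4, "critical": 5}

def normalize(raw, source):
    """Map one raw event dict from a point tool onto the common schema."""
    return Event(
        source=source,
        host=raw["host"].lower(),
        service=raw.get("service", "unknown"),
        severity=SEVERITY_MAP.get(str(raw.get("severity", "info")).lower(), 1),
        message=raw.get("message", "").strip(),
        timestamp=float(raw["ts"]),
    )

def deduplicate(events, window=60.0):
    """Drop repeats of the same host/service/message seen within `window` seconds."""
    last_seen = {}
    kept = []
    for ev in sorted(events, key=lambda e: e.timestamp):
        key = (ev.host, ev.service, ev.message)
        prev = last_seen.get(key)
        if prev is None or ev.timestamp - prev > window:
            kept.append(ev)
        last_seen[key] = ev.timestamp  # repeats keep extending the window
    return kept

def filter_noise(events, min_severity=3):
    """Keep only events at or above the chosen severity."""
    return [e for e in events if e.severity >= min_severity]

def correlate(events):
    """Group events by the business service they affect."""
    groups = {}
    for ev in events:
        groups.setdefault(ev.service, []).append(ev)
    return groups

def run_pipeline(raw_by_source):
    """Run all four steps over a dict of {source name: list of raw events}."""
    events = [normalize(raw, source)
              for source, raws in raw_by_source.items()
              for raw in raws]
    return correlate(filter_noise(deduplicate(events)))

if __name__ == "__main__":
    sample = {
        "network-tool": [
            {"host": "SW01", "service": "data-warehouse",
             "severity": "critical", "message": "link down", "ts": 100},
            {"host": "sw01", "service": "data-warehouse",
             "severity": "critical", "message": "link down", "ts": 110},
        ],
        "db-tool": [
            {"host": "db01", "service": "data-warehouse",
             "severity": "error", "message": "slow queries", "ts": 120},
            {"host": "db01", "service": "data-warehouse",
             "severity": "info", "message": "backup ok", "ts": 130},
        ],
    }
    for service, events in run_pipeline(sample).items():
        print(service, [(e.source, e.message) for e in events])
```

In this sample, the repeated "link down" event and the informational backup message are dropped, so the support team sees a network event and a database event side by side under the same service. Grouping by service name is the simplest possible form of correlation; a production system would typically also use topology and timing.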
Unlocking the Power of Artificial Intelligence
AIOps (Artificial Intelligence for IT Operations) systems incorporate machine learning capabilities in their core engines. This delivers advanced automation capabilities that dramatically increase service quality while driving down operational costs. For example, these kinds of platforms can recognize event patterns across multiple monitoring sources, and then use these to predict future service outages – before customers are affected. They can also automatically identify the root cause of service and infrastructure issues, and even score events based on their likely business impact.
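As a rough illustration of the idea of spotting trouble before customers are affected, the Python sketch below flags a metric value that deviates sharply from its recent baseline. This is a plain rolling z-score, a simple statistical stand-in rather than the machine learning an AIOps platform would actually use. The function names, the five-sample baseline and the threshold of 3.0 are assumptions chosen for the example.

```python
from statistics import mean, stdev

def anomaly_scores(values, baseline_size=5):
    """Score each value against the mean and standard deviation
    of the `baseline_size` values immediately before it."""
    scores = []
    for i in range(baseline_size, len(values)):
        window = values[i - baseline_size:i]
        mu = mean(window)
        sigma = stdev(window)
        if sigma == 0:
            # A perfectly flat baseline: any change at all is an outlier.
            scores.append(0.0 if values[i] == mu else float("inf"))
        else:
            scores.append((values[i] - mu) / sigma)
    return scores

def first_anomaly(values, baseline_size=5, threshold=3.0):
    """Return the index of the first value whose score magnitude
    reaches `threshold`, or None if nothing stands out."""
    for offset, score in enumerate(anomaly_scores(values, baseline_size)):
        if abs(score) >= threshold:
            return offset + baseline_size
    return None

if __name__ == "__main__":
    # Hypothetical response times in milliseconds for one service.
    latencies_ms = [100, 102, 98, 101, 99, 100, 103, 250]
    print(first_anomaly(latencies_ms))
```

With the sample data, the jump to 250 ms is flagged while the ordinary fluctuations around 100 ms are not. A learned model would go further by combining signals from multiple monitoring sources, which a single-metric check like this cannot do.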
In a Nutshell
If your IT operations team is still working in silos, then you’re putting your service quality at risk. Domain-specific management tools don’t give you an end-to-end view, which means that you don’t have visibility and control of your mission-critical services. By using a manager of managers monitoring tool with ITSM integration and AI capabilities, you break down these barriers and can identify and fix service issues more quickly. And, by bringing all of your data into one place, you can now leverage machine learning to prevent service outages, understand business impact, and further accelerate service restoration.
That’s why a proactive manager of managers is the single most important investment you can make for your IT operations team.