How will AI and ML advancements impact IT organizations? As open-source tools like Python and TensorFlow mature, ITOps teams will find it easier to capture and export metrics, traces, and logs. In addition, solutions like OpenTelemetry, with its single set of standards and technology tools, will simplify the task of monitoring distributed cloud-born applications and lead to exciting advances like self-healing systems.
Workplace and Business Drivers
Many businesses have spent the past decade developing a digital transformation plan and slowly pushing that effort forward. The recent forced move to remote or hybrid working has accelerated the pace of this effort while adding the complexity of a further-distributed network. In addition, customer expectations across industries encourage companies to push the envelope with respect to customer experience, including hyper-personalization, 24/7 service, and custom offerings.
Meanwhile, cloud computing has advanced in a way that has made vast amounts of data available for capture and use. Companies can use that data in myriad ways: to tackle cybercrime, halt fraudulent transactions, minimize customer churn, recommend products, and more.
Trends in Machine Learning
As mentioned in previous chapters, ML is not new. However, recent advances make it easier than ever to exploit its value: steadily increasing processing power, increasingly sophisticated software for analyzing data, and, of course, the vast amounts of data available to train predictive models. In addition, IT organizations can use ML algorithms to build models from historical data that help predict future behavior.
Marketing and sales organizations are likely to benefit from ML and AI almost immediately. According to Salesforce Research, adoption has been picking up steam: the share of these organizations using ML and AI climbed from 29% in 2018 to 84% in 2020. In addition, McKinsey Global Institute estimates that ML and AI overall will bring $1.4 trillion to $2.6 trillion in sales and marketing value over the next three years alone. These advances mean that IT organizations will be pivoting to support these efforts while also using ML for their own IT tasks.
Will this highly specialized technology and the skills to apply it become available to all companies or just those with deep pockets and a technology focus? Fortunately, nearly all major tech companies are working hard to democratize ML, AI, and deep learning. Remember how difficult it was for nontechnical people to sell products online before eBay and Etsy? Similarly, companies like Microsoft, AWS, and Google are working to ensure that companies will not need a massive team of data scientists and engineers to take advantage of ML.
In addition, organizations like the Open Machine Learning (OpenML) project provide a space for interested technologists to participate in an open, organized, online ecosystem for ML. OpenML builds open-source tools that let members find available data from any domain, easily draw it into just about any ML environment, and quickly build models in collaboration with thousands of data scientists. It also helps members analyze results and offers advice on building better models.
Likewise, TensorFlow is a popular end-to-end open-source platform for ML. It has a comprehensive environment of tools, libraries, and resources that helps researchers drive advancements in ML while developers quickly build and deploy ML-powered applications. OpenTelemetry is perhaps even more helpful to IT teams because it can collect telemetry data from distributed systems. This data collection helps IT organizations tackle the troubleshooting, debugging, and managing of applications and host environments. It’s a simple method for ITOps and DevOps teams to set up their code base for data collection and make adjustments that pave the way for growth.
Improving Operations with ML
Companies can use ML to drive business growth and secure operations for personnel and customers. But there are cautions to consider as businesses move forward with ML and AI-based solutions. Gartner highlights the extent to which ITOps, business units, and data scientists must work together to develop the right solutions.
“ML models often fail when data scientists initiate solutions without consulting the business unit owner or when the business team has unrealistic performance expectations from the model. Realizing the benefits of the ML model requires both teams to work together, while also owning several tasks throughout the process.”
IT organizations can employ ML and AI technologies in many ways to make it easier to gather and analyze data and help live up to digital transformation and security goals.
Infusing ML and AI Into Observability Platforms
More organizations are using observability platforms with the expectation that ML and AI will become more embedded soon. Observability is a progression of application performance monitoring (APM) data collection methods. It addresses the fast-growing, distributed nature of cloud-born applications and can enable better network monitoring and better APM. It can also help support the customer experience (CX), improve employee productivity, and maintain the digital infrastructure. In addition, incorporating ML, AI, and automation into these platforms can decrease the time required to find and fix problems before they impact your business.
Securing machine and user data flowing through a company’s infrastructure is a top priority of IT organizations. Algorithms can model the historical behavioral patterns and detect variances and anomalies compared to past patterns. This process could be further automated using ML and AI, ultimately enabling businesses to automatically block bad actors in near real-time.
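The idea of modeling historical behavior and flagging deviations can be illustrated with a minimal sketch. It assumes a simple z-score test against a historical baseline, which is only one of many possible techniques and not necessarily what any particular platform uses. The function name, the threshold of 3.0, and the failed-login counts are all hypothetical.

```python
from statistics import mean, stdev


def find_anomalies(history, recent, threshold=3.0):
    """Return (index, value) pairs from `recent` that deviate from `history`.

    A value is flagged when its z-score against the historical baseline
    exceeds `threshold` (a hypothetical default, to be tuned per metric).
    """
    baseline = mean(history)
    spread = stdev(history)
    if spread == 0:
        # A perfectly flat history: any different value is a deviation.
        return [(i, v) for i, v in enumerate(recent) if v != baseline]
    return [
        (i, v)
        for i, v in enumerate(recent)
        if abs(v - baseline) / spread > threshold
    ]


# Hypothetical hourly failed-login counts for a single account.
history = [4, 5, 3, 6, 5, 4, 5, 6, 4, 5]
recent = [5, 4, 42, 6]

anomalies = find_anomalies(history, recent)
print(anomalies)  # [(2, 42)]
```

In practice, a flagged value like this would feed an automated response, such as temporarily blocking the account, rather than simply being printed.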
Progressing Towards Self-Healing Systems
As ML and AI technology vendors work to improve and extend their platform capabilities, IT organizations will benefit, gaining increasingly actionable insights and proactive functionality at a reasonable cost. This will lay the foundation for integrated self-healing systems.
Use Case: Using ML in ITOps Today
Big data and machine learning can combine to help automate IT operations through Artificial Intelligence for IT Operations, known as AIOps. For example, ITOps teams can use AIOps for event correlation, anomaly detection, and causality determination. One advantage of this approach is the ability to bring together data from disparate sources and apply AI algorithms to provide context and minimize the volume of irrelevant alerts. This is particularly helpful for companies that are spread out geographically. Consider a manufacturing firm based in the UK that depends on the availability and performance of an IT and networking infrastructure located in data centers around the world.
The firm uses an IT infrastructure monitoring and observability platform with AIOps capabilities to consolidate views and gather analytics that help the company avoid duplicating monitoring data. This saves IT administrators time tracking down issues and provides the IT team with a single source of truth from a unified data set.
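To make the event correlation and consolidation described above more concrete, here is a minimal sketch. It assumes a simple rule: alerts for the same host and check that arrive within a fixed time window belong to one incident. Real AIOps platforms apply far richer logic. The alert fields, host names, and the 300-second window are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Incident:
    host: str
    check: str
    first_seen: int
    last_seen: int
    sources: set = field(default_factory=set)
    count: int = 0


def correlate(alerts, window_seconds=300):
    """Merge alerts for the same host and check into incidents.

    An alert joins the open incident for its (host, check) pair when it
    arrives within `window_seconds` of that incident's latest alert;
    otherwise it starts a new incident.
    """
    incidents = []
    open_by_key = {}
    for alert in sorted(alerts, key=lambda a: a["ts"]):
        key = (alert["host"], alert["check"])
        incident = open_by_key.get(key)
        if incident is None or alert["ts"] - incident.last_seen > window_seconds:
            incident = Incident(alert["host"], alert["check"], alert["ts"], alert["ts"])
            incidents.append(incident)
            open_by_key[key] = incident
        incident.last_seen = alert["ts"]
        incident.sources.add(alert["source"])
        incident.count += 1
    return incidents


# Hypothetical alerts from two monitoring sources (timestamps in seconds).
alerts = [
    {"ts": 0, "host": "db-lon-01", "check": "cpu", "source": "agent"},
    {"ts": 30, "host": "db-lon-01", "check": "cpu", "source": "snmp"},
    {"ts": 60, "host": "web-sgp-02", "check": "disk", "source": "agent"},
    {"ts": 90, "host": "db-lon-01", "check": "cpu", "source": "agent"},
    {"ts": 1000, "host": "db-lon-01", "check": "cpu", "source": "agent"},
]

incidents = correlate(alerts)
for incident in incidents:
    print(incident.host, incident.check, incident.count, sorted(incident.sources))
```

Here five raw alerts collapse into three incidents, and the first incident records that two different sources reported the same problem, which is the kind of duplication a unified data set is meant to remove.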
Likewise, support engineers use alert thresholds to identify and resolve issues before they require escalation. In one scenario, an engineer received a service call about an air conditioning outage at a data center. Because AIOps features were used to track the temperature of all the company’s servers, an early alert prevented the A/C outage from turning into a data center outage, which would have affected hundreds of people across the organization.
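A threshold-based early alert of the kind described in this scenario might look like the following sketch. The warning and critical temperatures, the function names, and the readings are hypothetical values chosen for illustration, not figures from the firm's actual deployment.

```python
def classify_temperature(celsius, warn_at=30.0, critical_at=38.0):
    """Map a server inlet temperature to an alert level.

    The default thresholds are hypothetical and would be tuned per site.
    """
    if celsius >= critical_at:
        return "critical"
    if celsius >= warn_at:
        return "warning"
    return "ok"


def first_alert(readings, warn_at=30.0, critical_at=38.0):
    """Return (index, level) for the first reading that is not "ok", or None."""
    for index, celsius in enumerate(readings):
        level = classify_temperature(celsius, warn_at, critical_at)
        if level != "ok":
            return index, level
    return None


# Hypothetical readings taken every few minutes after the A/C fails.
readings = [24.0, 25.5, 27.0, 31.2, 35.0, 39.1]

print(first_alert(readings))  # (3, 'warning')
```

The warning level fires two readings before the critical level would, which is the window an engineer needs to act before the outage spreads.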
Open-source tools like Python and TensorFlow can make it easier for IT organizations to capture and export metrics, traces, and logs. And solutions like OpenTelemetry will simplify the task of monitoring distributed cloud-born applications. In our next chapter, we’ll show in detail how you can use ML to monitor your infrastructure.