How MLOps Deliver Machine Learning Applications to Scale Production AI

As you work to build your business into an AI-driven enterprise, you’ve probably encountered a few baffling pitfalls along the way. While you have hired top-notch data scientists to build models and invested in expensive data science tools, you still can’t get your AI projects off the ground. Why? What factors make it so hard to implement AI at scale?

Photo by ThisisEngineering RAEng on Unsplash

In this article, we take a closer look at the real-world machine learning issues that may be holding back your AI projects, such as:

  • Why data scientists are very particular about how they like to work and why you should respect their preferences
  • What common stumbling blocks come up when the IT and data science teams work together
  • How thinking about production model governance proactively can save you and your business legal and regulatory headaches down the road
  • Why monitoring that is designed specifically for machine learning is so important

Table of contents

The Data Science Unicorn
Throwing the Model “Over the Wall”
Lack of Governance over Production Models
Having No Monitoring in Place Is Like Flying Blind in New Airspace
A Complex and Sometimes Difficult Life

You’re embracing AI and working to build your organization into an AI-driven enterprise. You’ve implemented high levels of machine learning and AI to power your business processes and applications. You have hired data scientists to build models, probably multiple teams of them. You have invested in data science tools and large-scale infrastructure projects to power your initiative.

But you still can’t seem to get off the ground. Why? What factors make it so hard to implement AI at scale? In this article, we take a closer look at the myths, mysteries, and mishaps that might be holding back your AI projects.

The Data Science Unicorn

Data scientists are a rare breed. As dedicated problem solvers, they are experts at working with data. They understand how to build models that use data to make predictions. They know how to optimize models for predictive accuracy through a process of structured experimentation.

But you have to be careful. They can be very particular about how they like to work, and it’s in your best interest to hear them out.

Each data scientist typically has a preferred language like Python or R and may have expertise in particular frameworks like TensorFlow or PyTorch or modeling algorithms like XGBoost. When you finally find a data scientist who can work on your problems, they are going to work in their preferred languages, tools, and frameworks. Asking them to do otherwise is a recipe for disaster.

Keep in mind that most data scientists are not trained software developers. The code that they create through their experimentation is often not suitable for production operations. So, while the model may look promising in the lab, it may never make it to a production environment.

Good data scientists are hard to find and highly desirable in the market. When you do find one, you want to make sure they are happy and productive, doing work that they value with the tools they like to work with. If they are not satisfied, they will leave for greener pastures. Then you have to start over and find another data science unicorn to decipher what work needs to be completed.

Throwing the Model “Over the Wall”

To put a model into production, the data scientists will need to go to the IT team for help. One technique they often use is to throw the model “over the wall” to IT and let them deal with it. This method fails for a variety of reasons.

For one thing, IT uses Java, C, and other familiar software environments, not languages like Python and R, and they don’t work in Jupyter notebooks. IT operates in infrastructure, application monitoring, security, and software development. So, they have no idea what the model is or what it is meant to do. When a model in Python in a Jupyter notebook comes their way from a data scientist, they have three options: ignore it, rewrite it in a language they understand, or get the data science team involved in the production side.

When IT decides to rewrite the model, they are treating the model like other software, but predictive models are not like other software. A predictive model is the product of training data, an algorithm, and tuning parameters. It can therefore become stale and lose its effectiveness if it is not retrained on fresh data.

Updating the model becomes vastly more complicated once it has been rewritten, because each new version has to be rewritten again. If an issue occurs that requires a new model version, the model is unlikely to be updated. Either it will continue to run with poor results or, worse, the whole project will be shut down.

While rewriting the model is an issue, pulling the data science team into production also has consequences. Data scientists may have the mandate to get their models up and running in production so that the business can see value from the models.

Putting this production process entirely on the data scientist is a recipe for disaster.

Data scientists are not experts on important issues, such as production coding practices, production environments, security, or governance. While they might get a production model up and running as a service once, that model is virtually guaranteed to be brittle and to fail as soon as conditions and data change in production. And they will change.

This situation leaves the data scientist with two options: babysit the production model or set it and forget it and hope for the best. Babysitting the model is problematic. Production models require maintenance, which can be nearly constant when the model and surrounding code are not production-grade. In this situation, it is not unheard of for data scientists to spend 60% to 80% of their time babysitting existing production models instead of developing new ones.

Moreover, the babysitting approach is an issue for both the data scientists and the business. The data scientist becomes unhappy because they are not doing what they want to do, which is to develop new models. The company is not happy because the development of new AI projects has stalled. Therefore, data scientists start to look for places where they can do data science. When they leave, the production model fails, and the project is shut down or restarted with a new data scientist, only to repeat the fatal process over again.


Lack of Governance over Production Models

When production systems fail, there are considerable consequences to business owners, IT stakeholders, customers, business users, executives, and even investors. To eliminate or reduce this risk, changes in code go through rigorous and tightly controlled processes before production deployment to maximize success and minimize risk.

  • First, access to production systems is tightly controlled, with only trained users making changes using established procedures.
  • Second, code moves from development to a staging or testing environment to simulate production conditions. Only after a period of success and testing can the code move to production.
  • Third, logs of all user access and changes made to production systems support both troubleshooting and legal and regulatory reporting needs.

With the initial deployment of less critical AI projects, many companies previously bypassed software governance requirements, in part to speed AI use but also because machine learning models require a new set of governance practices.

As AI projects move into the mainstream and companies develop models for critical customer-facing and business process applications, the need for governance becomes vital. Without strong governance in place, these new models cannot be placed into production. Production model governance operates on the same premise as software governance – to maximize the chance of a successful deployment and minimize risk through control of access and established update procedures.

Production model governance includes tight access control for AI models and related code moving into production, including the following:

  1. Only a very limited number of users will have the ability to make changes to your production models and environments.
  2. Models are validated and tested and then “warmed up” before replacing the running production version. These testing and warm-up processes ensure that the new version performs as expected.
  3. Access and event logs track all changes to the system, including those through automated means. These audit logs are critical for troubleshooting and legal and regulatory compliance.
  4. Predictive models influence downstream applications and behavior, such as deciding to provide credit to a banking customer. Logging this decision along with the model version that created the prediction is critical. Preserving the model version and the training data profile is also essential.
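The fourth item above can be sketched in code. This is a minimal illustration rather than a prescribed format: the `PredictionRecord` fields, the `log_prediction` helper, and the example values are all hypothetical, and an in-memory list stands in for a real append-only audit store.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class PredictionRecord:
    """One auditable prediction event (hypothetical schema)."""
    timestamp: str
    model_name: str
    model_version: str        # which model version produced the decision
    training_profile_id: str  # pointer to the preserved training data profile
    features_hash: str        # hash of the inputs, so raw data need not be logged
    decision: str


def hash_features(features: dict) -> str:
    # Sort keys so the same inputs always produce the same hash.
    canonical = json.dumps(features, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def log_prediction(audit_log: list, model_name: str, model_version: str,
                   training_profile_id: str, features: dict,
                   decision: str) -> PredictionRecord:
    record = PredictionRecord(
        timestamp=datetime.now(timezone.utc).isoformat(),
        model_name=model_name,
        model_version=model_version,
        training_profile_id=training_profile_id,
        features_hash=hash_features(features),
        decision=decision,
    )
    # Append one JSON line per decision; existing entries are never edited.
    audit_log.append(json.dumps(asdict(record)))
    return record


audit_log = []
record = log_prediction(
    audit_log, "credit_risk", "1.4.2", "profile-2024-01",
    {"income": 52000, "tenure_months": 18}, "approve",
)
```

Because every entry carries the model version and a reference to the training data profile, a later audit can tie any individual decision back to the exact model that made it.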

Only with these production model governance steps in place can a business minimize risk, ensure legal and regulatory compliance, and create a repeatable process to scale AI adoption.

Having No Monitoring in Place Is Like Flying Blind in New Airspace

Application performance monitoring is nothing new for businesses. A variety of tools exist to see that software applications are running as expected, including responding to requests in a timely fashion and profiling resource consumption. These systems alert operators when applications fail to meet target SLAs or start to consume more resources than usual.

While these same monitoring concepts apply to machine learning and AI-based apps, surprisingly, almost nobody applies them there.

A predictive model is the product of training data and an algorithm that produces a model. This model is capable of looking at new data and predicting an outcome based on patterns from the training data. As long as the data for predictions is similar to the training data, the model will predict results as expected. However, when the world changes – and it will – the data on requests to the model will become different enough to cause the model to lose predictive power. This decrease in predictive accuracy is insidious in that it won’t show up as a memory leak or as an unresponsive model. Instead, the model will keep responding, but the results will be less and less valuable.

This decay in model performance can trigger downstream consequences, including loss of revenue or trust from business stakeholders. With the realization that predictive models can become less predictive and that you have no way to know it, a business may choose two routes.

First, models may be periodically retrained based on an assessment of how often the data may change in a significant way. However, this method can waste both data science and computing resources.

The second and most popular route is to leave the model running until a new model replaces it. Of course, this may not happen for months or even years, which leaves an underperforming model in production for an extended period.

The solution is monitoring designed specifically for machine learning. This monitoring looks for characteristics of AI models that indicate possible accuracy changes.

One such monitoring technique is data drift detection. (Data drift, a change in the distribution of the input data, is often mentioned alongside concept drift, a change in the relationship between the inputs and the outcome being predicted; the two are related but not the same.) In data drift detection, the system compares the training data profile to the profile of the production data coming in on requests. If these two data sets begin to drift apart from each other, even in just one variable, this can be a warning sign that a decline in model accuracy may not be far behind. For this reason, data drift detection is an essential monitoring tool that can indicate the need for model retraining before there is an impact on model accuracy.
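As a rough illustration of comparing a training profile with a production profile, the sketch below uses the Population Stability Index (PSI), one common drift statistic. The article does not name a specific statistic, so the choice of PSI, the 0.25 threshold (a widely quoted rule of thumb), the function names, and the sample data are all assumptions made for the example.

```python
import math
from bisect import bisect_right


def psi(training, production, bins=10):
    """Population Stability Index of one numeric feature.

    Bins are equal-width over the training range; production values
    outside that range fall into the first or last bin.
    """
    lo, hi = min(training), max(training)
    if lo == hi:
        raise ValueError("training sample has no spread to bin")
    width = (hi - lo) / bins
    edges = [lo + i * width for i in range(1, bins)]  # interior bin edges

    def fractions(values):
        counts = [0] * bins
        for v in values:
            counts[bisect_right(edges, v)] += 1
        # Floor empty bins at a tiny value so the logarithm is defined.
        return [max(c / len(values), 1e-6) for c in counts]

    train_frac = fractions(training)
    prod_frac = fractions(production)
    return sum((p - t) * math.log(p / t) for t, p in zip(train_frac, prod_frac))


def drifted_features(training_profile, production_profile, threshold=0.25):
    """Names of features whose PSI exceeds the (assumed) threshold."""
    return [name for name, values in training_profile.items()
            if psi(values, production_profile[name]) > threshold]


# Hypothetical data: "income" has shifted in production, "age" has not.
train = {"income": [i / 100 for i in range(1000)],
         "age": [i % 50 for i in range(1000)]}
prod = {"income": [v + 5 for v in train["income"]],
        "age": list(train["age"])}
flagged = drifted_features(train, prod)  # only "income" is flagged
```

Checking each variable separately matters here, because a shift in a single input can be enough to warrant a closer look at the model.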

Without production model monitoring in place, your business has no way of knowing how its models are performing. From there, it could refuse to put models into production for fear of what might happen, leaving your business behind in an AI-driven world.

A Complex and Sometimes Difficult Life

The first deployment of your model is one of many. It may take weeks to build a model, and that model and versions of that model could run for years in production. Once the model is up and working in production, downstream systems and applications become dependent on it. From there, it needs to stay up and running continually — especially for real-time application use cases. Otherwise, you risk impacting your business.

But without a way to seamlessly replace a model with a newer or better version, you shouldn’t even roll out the first model.

AI models have a complex lifecycle that can include frequent updates and periodic replacement as new data and new techniques come along. Models in production require extensive testing, validation, and warm-up before deployment. Yet most businesses are not prepared for the level of effort needed to maintain production models. Manual methods by data scientists are expensive, cause turnover on your data science team, and don’t scale. Quite simply, a lack of automation and processes for lifecycle management prevents businesses from moving from a few models to many models.

Model lifecycle management automates vital parts of the model update process.

For example, via monitoring, you can determine that you need a new model version based on more recent data. Rather than needing to engage your data science team, your IT operator can simply request an update from an external system or, in a more integrated system, kick off the retraining job directly. Once the new model is available, the operator can quickly start it in warm-up mode and begin to compare its performance to the production version. If they are satisfied with the outcome, they can seamlessly move the new version into production with a click or two, while preserving the prior version for future reference or compliance purposes.
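The warm-up, comparison, and promotion flow described above can be sketched as follows. This is a simplified illustration under stated assumptions: `ModelRegistry`, `warm_up_and_maybe_promote`, the accuracy-based comparison, and the toy threshold models are hypothetical stand-ins, not the interface of any particular MLOps product.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Sequence, Tuple

Model = Callable[[float], int]        # toy model: one feature in, 0/1 label out
Versioned = Tuple[str, Model]         # (version label, model)


@dataclass
class ModelRegistry:
    production: Versioned
    history: List[Versioned] = field(default_factory=list)

    def promote(self, version: str, model: Model) -> None:
        # Preserve the prior version for reference, compliance, or rollback.
        self.history.append(self.production)
        self.production = (version, model)

    def rollback(self) -> None:
        if not self.history:
            raise RuntimeError("no prior version to roll back to")
        self.production = self.history.pop()


def accuracy(model: Model, labeled: Sequence[Tuple[float, int]]) -> float:
    return sum(model(x) == y for x, y in labeled) / len(labeled)


def warm_up_and_maybe_promote(registry: ModelRegistry, version: str,
                              challenger: Model,
                              labeled_traffic: Sequence[Tuple[float, int]],
                              min_gain: float = 0.0) -> bool:
    """Score the challenger on the same traffic as production; promote if better."""
    current = accuracy(registry.production[1], labeled_traffic)
    candidate = accuracy(challenger, labeled_traffic)
    if candidate - current > min_gain:
        registry.promote(version, challenger)
        return True
    return False


# Hypothetical models and labeled warm-up traffic.
registry = ModelRegistry(production=("v1", lambda x: int(x > 0.8)))
traffic = [(0.1, 0), (0.6, 1), (0.7, 1), (0.9, 1)]
promoted = warm_up_and_maybe_promote(registry, "v2", lambda x: int(x > 0.5), traffic)
```

Keeping the replaced version in `history` is what makes the rollback a one-step operation instead of a redeployment.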


You want to scale your use of AI, but now you realize you are blocked by production issues. MLOps will get your AI projects out of the lab and into production where they can generate value and help transform your business.

With MLOps, your Data Science and IT Operations teams can collaborate to deploy and manage models in production. You can continuously improve model performance with proactive management and, last but not least, you can reduce risk and ensure regulatory compliance with a single system to manage and govern all your production models.

MLOps covers four key areas for delivering machine learning applications in production at scale:

  • Production Model Deployment: With MLOps, the goal is to make model deployment easy. Operations teams, not data scientists, can deploy models written in a variety of modern programming languages like Python and R onto modern runtime environments in the cloud or on-premises. Users of the MLOps system don’t have to know any of these technologies to drag and drop a model into the system, create a container, and deploy the model to a production environment.
  • Production Model Monitoring: With MLOps, your monitoring is designed for machine learning. Monitoring includes service health, data drift, model accuracy, and proactive alerts that are sent to stakeholders, based on severity, using a variety of channels like email, Slack, and PagerDuty. With MLOps monitoring in place, your teams can deploy and manage thousands of models, and your business will be ready to scale production AI.
  • Model Lifecycle Management: MLOps recognizes that models need to be updated frequently and seamlessly. Model lifecycle management supports the testing and warm-up of replacement models, A/B testing of new models against older versions, the seamless rollout of updates, failover procedures, and full version control for simple rollback to prior model versions.
  • Production Model Governance: MLOps provides the integrations and capabilities you need to ensure consistent, repeatable, and reportable processes for your models in production. Key capabilities include access control for production models and systems, including integration with LDAP and role-based access control (RBAC) systems, as well as approval flows, logging, version storage, and traceability of results for legal and regulatory compliance.
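The severity-based alerting mentioned under Production Model Monitoring can be pictured as a small routing table. The sketch below is purely illustrative: the severity levels, the drift-score cutoffs, and the channel mapping are assumptions, and the function only builds the messages; it does not call any real email, Slack, or PagerDuty API.

```python
from enum import IntEnum


class Severity(IntEnum):
    INFO = 1
    WARNING = 2
    CRITICAL = 3


# Hypothetical routing: more severe alerts reach more channels.
ROUTES = {
    Severity.INFO: ["email"],
    Severity.WARNING: ["email", "slack"],
    Severity.CRITICAL: ["email", "slack", "pagerduty"],
}


def classify(drift_score, warn_at=0.1, critical_at=0.25):
    """Map a drift score to a severity using assumed cutoffs."""
    if drift_score >= critical_at:
        return Severity.CRITICAL
    if drift_score >= warn_at:
        return Severity.WARNING
    return Severity.INFO


def build_alerts(model_name, drift_score):
    """Return one message per channel; actual delivery is left to the caller."""
    severity = classify(drift_score)
    text = f"[{severity.name}] {model_name}: drift score {drift_score:.2f}"
    return [{"channel": channel, "text": text} for channel in ROUTES[severity]]


alerts = build_alerts("credit_risk", 0.31)
```

Separating classification from delivery keeps the severity policy in one place, so it can be reviewed and changed without touching the channel integrations.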

Source: DataRobot