Everybody agrees on the great value that machine learning models can bring to the optimisation of processes. However, despite how valuable these models are to businesses, they can be quite difficult to maintain and update. Riaktr has successfully worked with numerous clients to automate the maintenance and re-training of their data science models, saving over 80% of upkeep time, and freeing up data scientists to do more creative and engaging work.
Steven sounded frustrated when he called us up. He’s the Head of Data Science for a large media company. We had worked with his team in the past and knew that they had been doing great work. The company’s marketing department was raving about the product movement models that the data scientists were creating. These models allowed them to be a lot more effective in targeting customers with their campaigns. So, what was the problem?
“My data guys are world-class when it comes to creating new models,” he told us. “But they don’t have much time for that anymore, because they’re so busy maintaining the old ones.” The problem sounded familiar. We had heard of similar scenarios at several other companies we work with: data scientists create a model that delivers great impact at first, but it then requires ongoing maintenance to stay effective – new data needs to be aggregated and de-trended, and the model re-trained. This was mostly done manually, effectively locking up the scientists’ time – valuable time they would much rather spend being creative and solving new problems.
When we spoke on the phone, Steven’s team had 18 product movement models in production, based on data from the company’s main market. The plan was to roll out these models across the remaining 21 markets, and Steven did not know how he was going to handle all that workload: 18 models across 21 additional countries meant that his team would be busy, day and night, fixing and updating models – while he needed them to solve new business problems.
Luckily, we had been working on a solution for Steven’s type of problem – an automated factory for data science models. We agreed to customise a solution for their environment and spent the following four months building the ‘factory’.
What exactly is a model factory?
A model factory is a framework that helps data teams take models from development to production faster and more reliably. It’s assembled from various software products that are tied together to support a best practice workflow. We decided to use open-source software throughout the entire process. This allows us to rely on a large community of developers and researchers for solutions to our development challenges. It also helps us, and our clients, avoid vendor lock-in.
Automating a large number of data science models via a 5-step process
Our client already had a Customer Analytical Record (CAR) in place. On top of that, the model factory we built for our media client consists of the following fully-automated components:
1. Data Checker 1
The first Checker ensures the quality of the input variables, checking counts, distributions and consistency over time.
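To make this concrete, here is a minimal sketch of what such an input check could look like. The function name, field names and thresholds are our own illustrative assumptions, not the client’s actual implementation; it simply shows the three kinds of checks mentioned above: counts, distributions and consistency over time.

```python
# Hypothetical sketch of a "Data Checker 1" stage. All names and
# thresholds are illustrative assumptions for this example.

def check_input(batch, previous_batch, min_rows=1000, max_null_rate=0.05,
                max_mean_shift=0.25):
    """Run basic quality checks on a batch of input variables.

    Returns a list of failed check names (empty list = batch passes).
    """
    failures = []

    # Count check: the batch must contain a plausible number of rows.
    if len(batch) < min_rows:
        failures.append("row_count")

    # Distribution check: too many missing values signals a broken feed.
    nulls = sum(1 for row in batch if row.get("spend") is None)
    if nulls / max(len(batch), 1) > max_null_rate:
        failures.append("null_rate")

    # Consistency over time: the mean of a key variable should not
    # drift too far from the previous batch.
    def mean_spend(rows):
        vals = [r["spend"] for r in rows if r.get("spend") is not None]
        return sum(vals) / len(vals)

    prev, curr = mean_spend(previous_batch), mean_spend(batch)
    if abs(curr - prev) / abs(prev) > max_mean_shift:
        failures.append("mean_shift")

    return failures
```

A batch that fails any check would be held back and flagged, rather than silently fed into model training.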
2. Data Preparation
The preparation module selects the target and the corresponding training data. Historical data is aggregated and statistical features are computed. Variables not suitable for modelling are cleaned or removed.
Sampling techniques are used to de-trend the data and create relevant target aggregations. Training, validation and testing samples are marked.
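As an illustration of this step, the sketch below removes a simple trailing rolling mean as a stand-in for de-trending (the article mentions sampling techniques; this is a deliberately simpler substitute) and shows how rows could be marked as training, validation and test samples. All function names, column choices and split ratios are assumptions made for the example.

```python
import random

# Illustrative sketch of the preparation step. The rolling-mean
# de-trending and the 70/15/15 split are assumptions, not the
# client's actual technique.

def detrend(series, window=3):
    """Subtract a trailing rolling mean so values are level-free."""
    out = []
    for i, v in enumerate(series):
        lo = max(0, i - window + 1)
        baseline = sum(series[lo:i + 1]) / (i + 1 - lo)
        out.append(v - baseline)
    return out

def mark_samples(n_rows, train=0.7, valid=0.15, seed=42):
    """Randomly assign each row to 'train', 'valid' or 'test'."""
    rng = random.Random(seed)
    labels = []
    for _ in range(n_rows):
        r = rng.random()
        if r < train:
            labels.append("train")
        elif r < train + valid:
            labels.append("valid")
        else:
            labels.append("test")
    return labels
```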
3. Model Factory
The core piece is the factory itself. Here, models are trained in batch and scored on a daily basis. Models are automatically re-estimated and re-trained, and a set of business rules decides when a re-trained model replaces the original one.
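A business rule of this kind is often a simple champion/challenger comparison. The sketch below is our own minimal illustration, not the client’s actual rule set: the metric (AUC) and both thresholds are assumptions.

```python
# Hypothetical champion/challenger rule: the re-trained model
# ("challenger") replaces the production model ("champion") only if
# it is both meaningfully better and above an absolute quality floor.
# Metric and thresholds are illustrative assumptions.

def should_replace(champion_auc, challenger_auc,
                   min_gain=0.01, min_quality=0.65):
    """Return True if the re-trained model should go to production."""
    return (challenger_auc >= min_quality and
            challenger_auc - champion_auc >= min_gain)
```

Encoding the rule in code, rather than leaving the decision to a human, is what lets the factory re-train and promote models without a data scientist in the loop.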
4. Data Checker 2
At the end of the process, we validate the consistency and validity of scores created by the factory before releasing them.
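A sketch of such a release gate is shown below. The completeness tolerance and the probability-range check are assumptions for illustration; the point is simply that no score batch leaves the factory unchecked.

```python
# Hypothetical "Data Checker 2" release gate. The tolerance value
# is an illustrative assumption.

def validate_scores(scores, expected_count, tolerance=0.02):
    """Check that a score batch is complete and contains valid
    probabilities before it is released."""
    # Completeness: roughly one score per expected customer.
    if abs(len(scores) - expected_count) / expected_count > tolerance:
        return False
    # Validity: every score must be a probability in [0, 1].
    if any(s is None or not (0.0 <= s <= 1.0) for s in scores):
        return False
    return True
```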
5. Output formatting
Depending on the use case, scores are made available in batch or via an API. They are represented either as natural probabilities or as percentile ranks.
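The conversion from probabilities to percentile ranks could look like the sketch below; the exact rank convention (share of scores at or below each value, rounded to a whole number) is an assumption for this example.

```python
# Illustrative conversion of raw probabilities to percentile ranks
# (0-100). The "at or below" convention is an assumption.

def to_percentile_ranks(probabilities):
    """Map each probability to the percentage of scores <= it."""
    ordered = sorted(probabilities)
    n = len(ordered)
    ranks = []
    for p in probabilities:
        at_or_below = sum(1 for q in ordered if q <= p)
        ranks.append(round(100 * at_or_below / n))
    return ranks
```

Percentile ranks are often easier for a marketing team to act on ("target the top 10%") than raw probabilities.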
A model factory can be monitored by production support, costing far less of your data scientists’ time.
Once we handed over our solution to Steven, he was impressed by the impact the framework had on his team’s data science work. “I feel you reduced the time we invest in model maintenance by 80%,” he said when we met for a follow-up meeting a few months after the launch of the factory. Instead of occupying his data scientists’ time with model upkeep, he handed the monitoring work to a single business analyst. That person is responsible for checking on the original 18 models, but also oversees a total of around 100 other models, including those created for the company’s other markets. “The factory scales very well – the more models you have, the greater the time gained per model,” Steven concluded.
The standardisation of code is another benefit of integrating data science work into a model factory: it makes documentation and handovers a lot easier. In today’s ‘war for data science talent’, it is unavoidable that data scientists receive great offers from other employers. “I recently had one of my best guys leave my team for a very lucrative offer. I was surprised by how quickly and smoothly the handover of his models went, because everything had been pre-structured during the model factory automation,” Steven told us.
At the end of our on-site presence, we had a couple of after-work beers with some members of the data team. We jokingly asked what they were doing with all their free time now that the model factory had relieved them of most maintenance work. “It’s not like Steven lets us slack in our jobs now,” one data scientist said with a wink – Steven was known to be a very ambitious leader who pushed his team quite a bit. “But it’s true that we get to go home on time much more often now. It’s really also been a work-life balance factory for us so far.”
Words: Andrzej Pyrka – Business Development Director