Now that I have your attention, let's get down to business: perfection must be out of scope when dealing with Machine Learning (ML).
I am sorry, but you cannot train a model with 100% accuracy. In fact, if you claim something like this, it is likely that you have an «overfitting» problem or some «data leakage». Generally speaking, whenever someone from your Data Science team (or your friend Pepe, for that matter) shows you «too good to be true» results, these are the usual suspects.
Although you can google these terms for more details, in my own words: overfitting occurs when you trust that your model is smarter than common sense, and data leakage happens when the data scientist believes that he is smarter than physics.
As an example of the former, if you try to predict house prices in a given city using only the flat's surface area (the independent variable), you may be tempted to use a spline model (a polynomial of order 3) to win some points on your «target metric»… however, you know that the expected behaviour is that price should not decrease with surface area. As an example of the latter, if you try to predict the power generation of a solar plant on day D, you cannot use the actual mean temperature on day D… that is cheating, or better said, you have a data leakage problem.
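To make the overfitting case concrete, here is a minimal sketch (with made-up surfaces and prices, so the exact numbers are illustrative only) showing how a degree-3 polynomial can fit every point perfectly and still produce a price curve that decreases with surface area:

```python
import numpy as np

# Hypothetical prices (k€) for four flats, indexed by surface area (m²).
surface = np.array([40, 60, 80, 120])
price = np.array([150, 280, 290, 500])

# A degree-3 polynomial passes through all four points exactly...
model = np.poly1d(np.polyfit(surface, price, deg=3))

# ...but between roughly 70 and 80 m² the fitted price actually decreases,
# which contradicts the common-sense expectation.
grid = np.linspace(40, 120, 200)
monotonic = bool(np.all(np.diff(model(grid)) >= 0))
print(f"Price always increases with surface: {monotonic}")  # -> False
```

The fit looks great on paper (zero error on the training points), and that is exactly the trap: the «target metric» rewards the wiggle that common sense rejects.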
Building the perfect ML model for your customer (or stakeholders)
This is indeed doable, because building the perfect ML model for your customer does not need a sophisticated algorithm or a crazy «target metric». What you need to do is to focus on the business problem that your customers or stakeholders want to solve, and then find the Minimum Viable and Valuable Product (MVVP).
MVVP (Data Science MVP)
Please note that I added a second V to the classic MVP. Why? Because in Data Science we always need to focus on adding value to the business. If we achieve this (and of course IF it is viable), we did a very good job indeed. This is the workflow I follow:
- Define the target MVVP.
- Is this project doable within the given timeline? If not, try to redefine the MVVP or negotiate the timeline.
- Can we do this using our current tech stack? Unfortunately, in Data Science we usually work under time pressure. It is always cool for a data scientist to play with new toys, but we need to recycle as much as possible. If we need to learn some new tech, we should go back to the previous step.
- How is this project improving the current solution? Let's be real: before using ML the business was working just fine, so it is very likely that this problem was already solved somehow (a.k.a. the «naive approach»; see the baseline sketch after this list). If not, back to step 1.
- Can we wrap it better? Streamlit, Tableau or even an Excel sheet: the question here is not how cool the technology used to wrap our model is, but how usable it will be for the customer (see the Streamlit sketch after this list).
- We got the MVVP in time, congrats! Great news, but now go back to step 1 to find something more sophisticated that will increase value for the customer.
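As an illustration of the «naive approach» check from step 4, here is a minimal sketch using scikit-learn on a synthetic dataset (the data and the linear model are assumptions for illustration) that compares a trivial baseline against an actual model. If the model does not clearly beat the baseline, the project is not adding value yet:

```python
import numpy as np
from sklearn.dummy import DummyRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in for whatever data the business already has today.
rng = np.random.default_rng(42)
X = rng.uniform(40, 120, size=(200, 1))        # e.g. surface area (m²)
y = 3 * X[:, 0] + rng.normal(0, 20, size=200)  # e.g. price (k€), plus noise

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# The «naive approach»: always predict the historical mean.
baseline = DummyRegressor(strategy="mean").fit(X_train, y_train)
model = LinearRegression().fit(X_train, y_train)

mae_baseline = mean_absolute_error(y_test, baseline.predict(X_test))
mae_model = mean_absolute_error(y_test, model.predict(X_test))
print(f"Baseline MAE: {mae_baseline:.1f} | Model MAE: {mae_model:.1f}")
```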
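And on wrapping (step 5): the point is that a few lines can already put the model in front of the customer. Here is a minimal Streamlit sketch, with a placeholder prediction formula standing in for a real trained model:

```python
# app.py — run with: streamlit run app.py
import streamlit as st

st.title("House price estimator")

# The simplest possible UI around the model.
surface = st.slider("Surface area (m²)", min_value=30, max_value=200, value=80)

# Placeholder logic; in a real app you would load your trained model here.
predicted_price = 3.0 * surface + 30.0
st.metric("Estimated price", f"{predicted_price:.0f} k€")
```

A slider and a number is not impressive technology, but it is something the customer can actually click on from day one, which is the whole point of the MVVP.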
In other words: data science projects need to be customer-centric, not technology-centric.
I had the chance to use this approach successfully in the past with important clients (a major international airline and a streaming platform), and I still use it in my current work every day. I hope this read was interesting; please do not hesitate to share your thoughts or experiences with me!
