What did COVID do to all our models?

An interview with Dean Abbott and John Elder about change management, complexity, interpretability, and the risk of AI taking over humanity.

After the KNIME Fall Summit, the dinosaurs went back home… well, switched off their laptops. Dean Abbott and John Elder, longstanding data science experts, were invited to the Fall Summit by Michael to join him in a discussion of The Future of Data Science: A Fireside Chat with Industry Dinosaurs. The result was a sparkling conversation about data science challenges and new trends. Since switching off the studio lights, Rosaria has distilled and expanded some of the highlights about change management, complexity, interpretability, and more in the data science world. Let’s see where it brought us.

What is your experience with change management in AI, when reality changes and models have to be updated? What did COVID do to all our models?

[Dean] Machine Learning (ML) algorithms assume consistency between past and future. When things change, the models fail. COVID has changed our habits, and therefore our data. Pre-COVID models struggle to deal with the new situation.

[John] A simple example would be the Traffic layer on Google Maps. After lockdowns hit country after country in 2020, Google Maps traffic estimates were very inaccurate for a while. It had been built on fairly stable training data but now that system was thrown completely out of whack.

How do you figure out when the world has changed and the models don't work anymore?

[Dean] Here’s a little trick I use: I partition my data by time and label records as “before” and “after”. I then build a classification model to discriminate the “after” vs. the “before” from the same inputs the model uses. If the discrimination is possible, then the “after” is different from the “before”, the world has changed, the data has changed, and the models must be retrained.

How complicated is it to retrain models in projects, especially after years of customization?

[John] Training models is usually the easiest step of all! The vast majority of otherwise successful projects die in the implementation phase. The greatest time is spent in the data cleansing and preparation phase. And the most problems are missed or made in the business understanding / project definition phase. So if you understand what the flaw is and can obtain new data and have the implementation framework in place, creating a new model is, by comparison, very straightforward.

Based on your decades-long experience, how complex is it to put together a really functioning Data Science application?

[John] It can vary of course, by complexity. Most of our projects get functioning prototypes at least in a few months. But for all, I cannot stress enough the importance of feedback: You have to talk to people much more often than you want to. And listen! We learn new things about the business problem, the data, or constraints, each time. Not all us quantitative people are skilled at speaking with humans, so it often takes a team. But the whole team of stakeholders has to learn to speak the same language.

[Dean] It is important to talk to our business counterpart. People fear change and don’t want to change the current status. One key problem really is psychological. The analysts are often seen as an annoyance. So, we have to build the trust between the business counterpart and the analytics geeks. The start of a project should always include the following step: Sync up domain experts / project managers, the analysts, and the IT and infrastructure (DevOps) team so everyone is clear on the objectives of the project and how it will be executed. Analysts are number 11 on the top 10 list of people they have to see every day! Let’s avoid embodying data scientist arrogance: “The business can’t understand us/our techniques, but we know what works best”. What we don’t understand, however, are the domains experts are actually experts in the domain we are working in! Translation of data science assumptions and approaches into language that is understood by the domain experts is key!

The latest trend now is deep learning, apparently it can solve everything. I got a question from a student lately, asking “why do we need to learn other ML algorithms if deep learning is the state of the art to solve data science problems”?

[Dean] Deep learning sucked a lot of the oxygen out of the room. It feels so much like the early 1990s when neural networks ascended with similar optimism! Deep Learning is a set of powerful techniques for sure, but they are hard to implement and optimize. XGBoost, Ensembles of trees, are also powerful but currently more mainstream. The vast majority of problems we need to solve using advanced analytics really don’t require complex solutions, so start simple; deep learning is overkill in these situations. It is best to use the Occam’s razor principle: if two models perform the same, adopt the simplest.

About complexity. The other trend, opposite to deep learning, is ML interpretability. Here, you greatly (excessively?) simplify the model in order to be able to explain it. Is interpretability that important?

[John] I often find myself fighting interpretability. It is nice, sure, but often comes at too high a cost of the most important model property: reliable accuracy. But many stakeholders believe interpretability is essential, so it becomes a barrier for acceptance. Thus, it is essential to discover what kind of interpretability is needed. Perhaps it is just knowing what the most important variables are? That’s doable with many nonlinear models. Maybe, as with explaining to credit applicants why they were turned down, one just needs to interpret outputs for one case at a time? We can build a linear approximation for a given point. Or, we can generate data from our black box model and build an “interpretable” model of any complexity to fit that data.

Lastly, research has shown that if users have the chance to play with a model – that is, to poke it with trial values of inputs and see its outputs, and perhaps visualize it – they get the same warm feelings of interpretability. Overall, trust – in the people and technology behind the model – is necessary for acceptance, and this is enhanced by regular communication and by including the eventual users of the model in the build phases and decisions of the modeling process.

[Dean] By the way KNIME Analytics Platform has a great feature to quantify the importance of the input variables in a Random Forest! The Random Forest Learner node outputs the statistics of candidate and splitting variables. Remember that, when you use the Random Forest Learner node.

There is an increase in requests for explanations of what a model does. For example, for some security classes, the European Union is demanding verification that the model doesn’t do what it’s not supposed to do. If we have to explain it all, then maybe Machine Learning is not the way to go. No more Machine Learning?

[Dean] Maybe full explainability is too hard to obtain, but we can achieve progress by performing a grid search on model inputs to create something like a score card describing what the model does. This is something like regression testing in hardware and software QA. If a formal proof what models are doing is not possible, then let’s test and test and test! Input Shuffling and Target Shuffling can help to achieve a rough representation of the model behavior.

[John] Talking about understanding what a model does, I would like to raise the problem of reproducibility in science. A huge proportion of journal articles in all fields -- 65 to 90% -- is believed to be unreplicable. This is a true crisis in science. Medical papers try to tell you how to reproduce their results. ML papers don’t yet seem to care about reproducibility. A recent study showed that only 15% of AI papers share their code.

Let’s talk about Machine Learning Bias. Is it possible to build models that don’t discriminate?

[John] (To be a nerd for a second, that word is unfortunately overloaded. To “discriminate” in the ML world word is your very goal: to make a distinction between two classes.) But to your real question, it depends on the data (and on whether the analyst is clever enough to adjust for weaknesses in the data): The models will pull out of the data the information reflected therein. The computer knows nothing about the world except for what’s in the data in front of it. So the analyst has to curate the data -- take responsibility for those cases reflecting reality. If certain types of people, for example, are under-represented then the model will pay less attention to them and won’t be as accurate on them going forward. I ask, “What did the data have to go through to get here?” (to get in this dataset) to think of how other cases might have dropped out along the way through the process (that is survivor bias). A skilled data scientist can look for such problems and think of ways to adjust/correct for them.

[Dean] The bias is not in the algorithms. The bias is in the data. If the data is biased, we’re working with a biased view of the world. Math is just math, it is not biased.

Will AI take over humanity?!

[John] I believe AI is just good engineering. Will AI exceed human intelligence? In my experience anyone under 40 believes yes, this is inevitable, and most over 40 (like me, obviously): no! AI models are fast, loyal, and obedient. Like a good German Shepherd dog, an AI model will go and get that ball, but it knows nothing about the world other than the data it has been shown. It has no common sense. It is a great assistant for specific tasks, but actually quite dimwitted.

[Dean] On that note, I would like to report two quotes made by Marvin Minsky in 1961 and 1970, from the dawn of AI, that I think describe well the future of AI.

“Within our lifetime some machines may surpass us in general intelligence” (1961)

“In three to eight years we’ll have a machine with the intelligence of a human being” (1970)

These ideas have been around for a long time. Here is one reason why AI will not solve all the problems: We’re judging its behavior based on one number, one number only! (Model error.) For example, predictions of stock prices over the next five years, predicted by building models using root mean square error as the error metric, cannot possibly paint the full picture of what the data are actually doing and severely hampers the model and its ability to flexibly uncover the patterns. We all know that RMSE is too coarse of a measure. Deep Learning algorithms will continue to get better, but we also need to get better at judging how good a model really is. So, no! I do not think that AI will take over humanity.

We have reached the end of this interview. We would like to thank Dean and John for their time and their pills of knowledge. Let’s hope we meet again soon!

--------

About Dean Abbott and John Elder