I recently read about how incredibly complex the gut microbiome is, with trillions of microbes interacting in ways we’re only beginning to understand. It’s exactly the kind of problem where machine learning can uncover patterns in data far too complex to analyze manually.
But in practice, even powerful machine learning models can become too focused on past data and miss the bigger picture.
This is called overfitting. Imagine a strategy that works perfectly for your current customers but fails with new ones, or a model that predicts gut health by memorizing just one person’s microbiome. Each looks accurate, but neither generalizes.
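To make this concrete, here is a minimal sketch in Python, using scikit-learn and synthetic data (the numbers and setup are illustrative, not from any real study). A degree-14 polynomial fits 15 noisy training points almost perfectly, then stumbles on fresh data drawn from the same underlying trend:

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)

# Small, noisy "past" dataset: 15 points around a simple linear trend.
X_train = rng.uniform(0, 1, size=(15, 1))
y_train = 2 * X_train.ravel() + rng.normal(0, 0.3, size=15)

# Fresh data from the same trend (the "new customers").
X_new = rng.uniform(0, 1, size=(200, 1))
y_new = 2 * X_new.ravel() + rng.normal(0, 0.3, size=200)

# With as many parameters as training points, the model can
# effectively memorize the past data, noise included.
model = make_pipeline(PolynomialFeatures(degree=14), LinearRegression())
model.fit(X_train, y_train)

print("R^2 on past data:", round(r2_score(y_train, model.predict(X_train)), 2))
print("R^2 on new data: ", round(r2_score(y_new, model.predict(X_new)), 2))
```

The first score sits near a perfect 1.0; the second is typically far lower, often negative, because the wiggles that memorized the training noise mislead the model on data it has never seen.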
The key takeaway: great results on past data don’t guarantee future success, especially when making decisions about new customers, markets, or scenarios. Watch this short video to learn why overfitting happens and how to avoid it.
From gut feeling to data: How machine learning makes sense of complexity
Rob Knight (computational microbiologist and professor at the University of California, San Diego) explains how scientists use data and machine learning to analyze and visualize complex systems like the microbiome, where massive, noisy datasets make pattern detection both powerful and risky.
For you, this is where overfitting becomes real. In complex datasets like these, models can easily pick up patterns that look meaningful but don’t generalize. Before trusting a strong result, ask: Would this still hold if the data changed?
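One practical way to ask that question is cross-validation: refit and rescore the model on several different slices of the data and see whether the result survives. A minimal sketch, again with scikit-learn and synthetic data (the setup is illustrative, not the method from the video):

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
X = rng.uniform(0, 1, size=(40, 1))
y = 2 * X.ravel() + rng.normal(0, 0.3, size=40)

for degree in (1, 14):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    # Each fold trains on 80% of the data and scores on the held-out 20%,
    # simulating "what if the data changed?" five times over.
    scores = cross_val_score(model, X, y, cv=5, scoring="r2")
    print(f"degree {degree:2d}: mean R^2 = {scores.mean():.2f}, "
          f"spread = {scores.std():.2f}")
```

A pattern worth trusting should earn similar scores on every fold; an overfit model tends to score erratically, or collapse outright, on the slices it never saw.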
Why one way of thinking isn’t enough

In The Great Mental Models: General Thinking Concepts, Shane Parrish introduces simple frameworks for thinking more clearly about the world and making better decisions.
A core idea in the book is that no single way of thinking is enough. Relying too heavily on one perspective can lead you to see patterns that aren’t really there, like using the same tool for every problem. I like this idea because it’s easy to fall into the trap of trusting what seems to work, without questioning whether it applies more broadly.
Overfitting is a similar trap in data. It happens when we rely too heavily on patterns found in one dataset and assume they will hold everywhere, mistaking a narrow view for a general truth.
