Data Science Explained – Pronto! Season 2

December 1, 2022 — by Casiana Rîmbu & Roberto Cadili

Short but intense, Season 2 of “Data Science Pronto!” says goodbye to the data science community. After Season 1, we continued to ask hot questions and share insights through seven short videos, focusing on the whats and whys behind all things data science.

Warning: This post contains spoilers.

In this year’s collection, we looked closely at neural networks and deep learning architectures. We investigated different types of learning rate strategies, looked behind the scenes of the Backpropagation algorithm, and illustrated how convolutional neural networks detect key image features. We populated the two existing sections about “Data Preparation” and “Time Series Analysis” with videos on the importance of feature scaling and weak stationarity, respectively.

But that’s not all. Season 2 introduced “Natural Language Processing,” zooming in on TF-IDF to evaluate how relevant a word is to a document, as well as context-aware word embeddings such as Word2Vec.

We promise you’ll learn a thing or two in 20 minutes. Let’s revisit the topics we tackled this season.

Data Preparation

Data preparation is a vital step in any data science project. Cleaning, reshaping, and refining raw data into a usable dataset that can be leveraged for analytics can take up 50% to 80% of a data scientist’s time and effort. After data leakage and bootstrapping, this season’s video investigates what seems like an easy data preparation step.

Feature Scaling

You’ve likely heard of feature scaling in a data science class. Many algorithms are based on variance (like PCA) or distance (like clustering). Because variance and distance are heavily influenced by the range of each feature, feature scaling is needed so that every feature plays an equal role when training an algorithm. Could you tell the difference between two of the most popular techniques, normalization and standardization? Do you know the reasons one or the other may be needed? Check out Ali’s explanation.
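
To make the distinction concrete, here is a minimal sketch in plain Python, not taken from the video. It assumes the common definitions: normalization as min-max rescaling to the range [0, 1], and standardization as subtracting the mean and dividing by the standard deviation. The function names and the `heights_cm` sample are made up for illustration.

```python
from statistics import mean, pstdev


def min_max_normalize(values):
    """Rescale values linearly so the smallest maps to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        # A constant feature carries no spread; map everything to 0.
        return [0.0 for _ in values]
    return [(v - lo) / span for v in values]


def standardize(values):
    """Shift values to mean 0 and scale them to (population) standard deviation 1."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Illustrative sample only (hypothetical heights in centimeters).
heights_cm = [150.0, 160.0, 170.0, 180.0, 190.0]
normalized = min_max_normalize(heights_cm)
standardized = standardize(heights_cm)

print(normalized)
print(standardized)
```

Normalization bounds the output range, which makes it sensitive to outliers; standardization leaves the range unbounded but centers the feature, which is often what variance-based methods expect.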

Time Series Analysis

There is never enough discussion, educational content, or data flows about one of the hottest subfields in data science. It’s fascinating to observe how introducing time to analyze a sequence of data points requires ad-hoc methods and algorithms to extract meaningful insights or make predictions. After explaining why R^2 should be avoided to measure model performance in time series analysis, this season we discuss a crucial assumption many time series forecasting models rely on.

Weak Stationarity

If you don’t know what weak stationarity is, this is your chance to learn. This is especially relevant if you model stock prices, retail sales, or meteorological data for forecasting. By definition, a time series is stationary if the joint probability distribution of all its random variables, X_t, does not change over time. So how does adding “weak” change this? And why is it so important in forecasting algorithms? Check out this video for the details.
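
As a rough illustration, and not a substitute for the video, the sketch below assumes the usual textbook reading of weak stationarity: the mean and variance stay constant over time, and the autocovariance depends only on the lag. It checks only the first two conditions by comparing statistics across non-overlapping windows. The helper `window_stats` and both toy series are hypothetical, and this is an eyeball diagnostic rather than a formal statistical test.

```python
from statistics import mean, pvariance


def window_stats(series, window):
    """Return (mean, variance) for each non-overlapping window of the series."""
    stats = []
    for start in range(0, len(series) - window + 1, window):
        chunk = series[start:start + window]
        stats.append((mean(chunk), pvariance(chunk)))
    return stats


# Toy series: one oscillates around a fixed level, the other has a linear trend.
oscillating = [1.0, -1.0] * 6
trending = [float(t) for t in range(12)]

oscillating_stats = window_stats(oscillating, window=4)
trending_stats = window_stats(trending, window=4)

print(oscillating_stats)  # same mean and variance in every window
print(trending_stats)     # the mean drifts upward from window to window
```

The trending series keeps the same variance in every window, yet its mean shifts, which is already enough to violate the constant-mean condition.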

Deep Learning

Deep learning is the cool kid in school — everybody talks about it and wants to be friends with it. There’s hardly anything you cannot do with advanced deep learning architectures, from generating digital images from natural language descriptions to autonomous vehicle systems. Because of the complexity of these architectures, deep learning models are often regarded as black boxes. They produce useful information without revealing any information about their internal workings. So now is a good opportunity to shed light on a few critical aspects of the field.


Backpropagation

Without the Backpropagation algorithm, there would be no learning, and without learning, neural architectures are simply huge algorithmic overheads. In combination with Gradient Descent (one of the most popular algorithms for model optimization), Backpropagation updates the weights of each neuron in a layer so that it passes meaningful information on to the next layer. But how does that happen, exactly? Is it just magic, or is there a mathematical explanation? Don’t miss Dionysios and Casiana’s video.
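
For a taste of the math, here is a deliberately tiny sketch, not the example from the video. It assumes a network with one input, one hidden unit, and one output, no activation functions, and a squared-error loss, so that the chain rule stays readable. All names and starting values are hypothetical.

```python
def train_step(w1, w2, x, target, learning_rate):
    """Run one forward pass, one backward pass, and one gradient descent update."""
    # Forward pass: input -> hidden -> output.
    hidden = w1 * x
    output = w2 * hidden
    loss = (output - target) ** 2

    # Backward pass: apply the chain rule from the loss back to each weight.
    dloss_doutput = 2 * (output - target)
    dloss_dw2 = dloss_doutput * hidden   # output = w2 * hidden
    dloss_dhidden = dloss_doutput * w2   # error flowing back to the hidden unit
    dloss_dw1 = dloss_dhidden * x        # hidden = w1 * x

    # Gradient descent: move each weight against its gradient.
    w1 -= learning_rate * dloss_dw1
    w2 -= learning_rate * dloss_dw2
    return w1, w2, loss


# Arbitrary starting weights and a single training example.
w1, w2 = 0.5, 0.5
losses = []
for _ in range(50):
    w1, w2, loss = train_step(w1, w2, x=1.0, target=2.0, learning_rate=0.1)
    losses.append(loss)

print(losses[0], losses[-1])
```

Real networks do the same thing with matrices and nonlinear activations, but the pattern is identical: compute the error at the output, then reuse it layer by layer on the way back.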

Learning Rate

Closely related to the Gradient Descent and Backpropagation algorithms is the concept of learning rate, a configurable hyperparameter used in optimization algorithms to train many machine learning models, such as neural networks. It controls how much a model will change in response to the estimated error each time its parameters are updated. All clear? If not, this video will explain learning rates, how to choose the optimal one, and strategies for implementing them.
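
The effect of the learning rate is easy to see on a toy problem. The sketch below is an illustration under simple assumptions, not the video's material: it minimizes f(x) = x^2 with plain gradient descent and three hand-picked learning rates, then shows step decay as one possible scheduling strategy. The function names and all constants are hypothetical.

```python
def gradient_descent(learning_rate, steps=20, start=1.0):
    """Minimize f(x) = x**2 (gradient 2 * x) and return the final x."""
    x = start
    for _ in range(steps):
        gradient = 2 * x
        x -= learning_rate * gradient
    return x


def step_decay(initial_rate, step, drop_every=10, factor=0.5):
    """Multiply the learning rate by `factor` once every `drop_every` steps."""
    return initial_rate * factor ** (step // drop_every)


too_small = gradient_descent(learning_rate=0.01)  # barely moves toward 0
reasonable = gradient_descent(learning_rate=0.4)  # converges quickly
too_large = gradient_descent(learning_rate=1.1)   # overshoots and diverges

print(too_small, reasonable, too_large)
print([step_decay(0.1, step) for step in (0, 10, 25)])
```

A schedule such as step decay starts with larger updates and shrinks them later, trading fast early progress for stability near the minimum.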

Convolutional Neural Networks

Convolutional neural networks are most commonly used to analyze visual imagery, with applications in image recognition, video analysis, drug discovery, and much more. But could you explain how a convolutional layer works? What’s the difference between 1D, 2D, and 3D convolutions? Surprised to hear that there are convolutions of different dimensions? If so, you should definitely check out Emilio’s explanation.
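
To see what a 2D convolution does, consider the small sketch below. It is a hand-rolled illustration, not how a deep learning library implements the layer: it assumes no padding, a stride of 1, and a fixed kernel rather than a learned one. As in most deep learning frameworks, the kernel is not flipped, so strictly speaking this is a cross-correlation. The image and kernel values are made up.

```python
def conv2d_valid(image, kernel):
    """Slide the kernel over the image (no padding, stride 1) and sum the products."""
    kernel_h, kernel_w = len(kernel), len(kernel[0])
    out_h = len(image) - kernel_h + 1
    out_w = len(image[0]) - kernel_w + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            total = 0
            for di in range(kernel_h):
                for dj in range(kernel_w):
                    total += image[i + di][j + dj] * kernel[di][dj]
            row.append(total)
        output.append(row)
    return output


# Toy image: dark on the left (0), bright on the right (1).
image = [
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 1],
]

# Hand-picked kernel that responds to dark-to-bright vertical edges.
vertical_edge_kernel = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]

feature_map = conv2d_valid(image, vertical_edge_kernel)
print(feature_map)
```

The feature map is zero over the flat regions and positive where the window straddles the edge. A 1D convolution slides along a single axis in the same way, and a 3D convolution adds a third axis such as depth or time.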

Natural Language Processing

An ever-increasing share of human interaction, communication, and culture is recorded as digital text. This data gives unprecedented insights into fundamental questions in the social sciences, humanities, and industry. Meanwhile, new machine learning models are rapidly transforming the way science and business are conducted, opening a wide range of possibilities to transform text into value. In this video, we start with an essential question for any text-driven application: How do we represent texts for predictive analytics?


TF-IDF

While TF-IDF, a numerical statistic that reflects how important a word is to a document in a collection or corpus, is surely familiar to any data scientist working in text analysis and information retrieval, understanding the proper interaction between TF and IDF is not so trivial. Think you already know it all? Check out Lisa’s quick but exhaustive explanation.
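
The interplay between the two factors can be sketched in a few lines. The code below assumes one common textbook variant: term frequency as the share of a document's tokens, and inverse document frequency as the natural log of the corpus size divided by the number of documents containing the term. Libraries often add smoothing or normalization, so their numbers may differ. The toy corpus and function names are hypothetical.

```python
import math


def term_frequency(term, document):
    """Share of the document's tokens that are equal to the term."""
    return document.count(term) / len(document)


def inverse_document_frequency(term, corpus):
    """Log of (number of documents / number of documents containing the term)."""
    containing = sum(1 for document in corpus if term in document)
    if containing == 0:
        # Convention for this sketch only: unseen terms get a score of 0.
        return 0.0
    return math.log(len(corpus) / containing)


def tf_idf(term, document, corpus):
    return term_frequency(term, document) * inverse_document_frequency(term, corpus)


# Toy corpus of pre-tokenized documents.
corpus = [
    ["the", "cat", "sat"],
    ["the", "dog", "barked"],
    ["the", "cat", "purred"],
]

the_score = tf_idf("the", corpus[0], corpus)  # appears in every document
cat_score = tf_idf("cat", corpus[0], corpus)  # appears in two of three documents
sat_score = tf_idf("sat", corpus[0], corpus)  # appears in one document only

print(the_score, cat_score, sat_score)
```

All three words have the same term frequency in the first document, so the ranking comes entirely from IDF: the rarer the word across the corpus, the higher its score.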

Word Embeddings

Effective text representation is not easy. Natural languages are articulated and interdependent organisms rich in phrasing ambiguities, misspellings, words with multiple meanings, phrases with multiple intentions, and much more. One possible solution? Use word embeddings — a dense numerical representation for words which is generated with the help of neural networks. But how are word embeddings actually constructed? And how can we generate context-aware word embeddings? In less than three minutes, Aline unravels the mystery.
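
One piece of the construction is easy to show without any neural network. The sketch below assumes the skip-gram flavor of Word2Vec, in which a model learns to predict the neighbors of each word; it only generates the (center word, context word) training pairs, and leaves the network and its training out. The function name, window size, and sample sentence are hypothetical.

```python
def skipgram_pairs(tokens, window=1):
    """Pair every token with each neighbor at most `window` positions away."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


pairs = skipgram_pairs(["data", "science", "pronto"], window=1)
print(pairs)
```

Words that keep showing up with the same neighbors produce similar training pairs, which is why they tend to end up with similar vectors once a network has been trained on them.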

Coming soon

That was the entire second season of “Data Science Pronto!” Did you know the answers already? If you did, maybe you should star in the next season! Otherwise, stay tuned to learn more!

Email us with your burning questions about obscure best practices, incomprehensible parts of algorithms, or frequently mentioned use cases, and we will try to answer.

We will now pause for a few months to recharge and find more data science topics to address. We’ll be back sooner than you think!

The “Data Science Pronto” team thanks you for watching!
