Two decades into the AI revolution, deep learning is becoming a standard part of the analytics toolkit. Here’s what it means.
As first published in InfoWorld.
Pick up a magazine, scroll through the tech blogs, or simply chat with your peers at an industry conference. You’ll quickly notice that almost everything coming out of the technology world seems to have some element of artificial intelligence or machine learning to it. The way artificial intelligence is discussed, it’s starting to sound almost like propaganda. Here is the one true technology that can solve all of your needs! AI is here to save us all!
While it’s true that we can do amazing things with AI-based techniques, we generally aren’t embodying the full meaning of the term “intelligence.” Intelligence implies a system with which humans can have a creative conversation—a system that has ideas and that can develop new ones. At issue is the terminology. “Artificial intelligence” today commonly describes the implementation of some aspects of human abilities, such as object or speech recognition, but certainly not the entire potential for human intelligence.
Thus “artificial intelligence” is probably not the best way to describe the “new” machine learning technology we’re using today, but that train has left the station. In any case, while machine learning is not yet synonymous with machine intelligence, it certainly has become more powerful, more capable, and easier to use. AI—meaning neural networks or deep learning as well as “classic” machine learning—is finally on its way to becoming a standard part of the analytics toolkit.
Now that we are well into the AI revolution (or rather evolution), it’s important to look at how the concept of artificial intelligence has been co-opted, why, and what it will mean in the future. Let’s dive deeper to investigate why artificial intelligence, even some slightly misconstrued version of it, has attracted the present level of attention.
The AI promise: Why now?
In the current hype cycle, artificial intelligence and machine learning are often depicted as relatively new technologies that have suddenly matured, only recently moving from the concept stage to integration in applications. There is a general belief that stand-alone machine learning products have emerged only over the last few years. In reality, the important developments in artificial intelligence are not new. The AI of today is a continuation of advances achieved over the past couple of decades. What has changed, and the reason we are seeing artificial intelligence appear in so many more places, is not so much the AI technologies themselves but the technologies that surround them: namely, data generation and processing power.
I won’t bore you with citing how many zettabytes of data we are going to store soon (how many zeros does a zettabyte have anyway?). We all know that our ability to generate and collect data is growing phenomenally. At the same time, we’ve seen a mind-boggling increase in available computing power. The shift from single-core processors to multi-core as well as the development and adoption of general-purpose graphics processing units (GPGPUs) provide enough power for deep learning. We don’t even need to handle compute in-house anymore. We can simply rent the processing power somewhere in the cloud.
With so much data and plenty of compute resources, data scientists are finally in a position to use the methods developed in past decades at a totally different scale. In the 1990s, it took days to train a neural network to recognize numbers on tens of thousands of examples with handwritten digits. Today, we can train a much more complex (i.e. “deep”) neural network on tens of millions of images to recognize animals, faces, and other complex objects. And we can deploy deep learning models to automate tasks and decisions in mainstream business applications, such as detecting and forecasting the ripeness of produce or routing incoming calls.
This may sound suspiciously like building real intelligence, but it is important to note that underneath these systems, we are simply tuning parameters of a mathematical dependency, albeit a pretty complex one. Artificial intelligence methods aren’t good at acquiring “new” knowledge; they only learn from what is presented to them. Put differently, artificial intelligence doesn’t ask “why” questions. Systems don’t operate like the children who persistently question their parents as they try to understand the world around them. The system only knows what it was fed. It will not recognize anything it was not previously made aware of.
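To make "tuning parameters of a mathematical dependency" concrete, here is a minimal sketch (all names and numbers are illustrative, not from any real system): a one-parameter model fitted by gradient descent. The "knowledge" the system acquires is nothing more than the final value of the parameter, derived entirely from the examples it was shown.

```python
# "Learning" as parameter tuning: fit w in the model y = w * x
# by gradient descent on squared error. Illustrative sketch only.

def train(xs, ys, lr=0.01, steps=1000):
    """Adjust w so that w * x approximates y over the training examples."""
    w = 0.0
    for _ in range(steps):
        # Gradient of mean squared error with respect to w
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
        w -= lr * grad
    return w

# Training data generated by the rule y = 2x; the system never "knows"
# that rule, it only ends up with a number close to 2.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]
w = train(xs, ys)
print(round(w, 3))  # prints 2.0
```

A deep network does exactly this, only with millions of parameters and a far more complex dependency; the principle, and the limitation, is the same: the model cannot produce anything beyond what the data implicitly contains.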
In other, “classic” machine learning scenarios, it’s important to know our data and have an idea of how we want the system to find patterns. For example, we know that birth year is not a useful fact about our customers unless we convert that number to the customer’s age. We also know about the effect of seasonality: we shouldn’t expect a system to learn fashion buying patterns independently of the season. Further, we may want to inject a few other things into the system to learn on top of what it already knows. Unlike deep learning, this type of machine learning, which businesses have been using for decades, has progressed at a steadier pace.
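The birth-year and seasonality examples above are instances of feature engineering, the step where domain knowledge enters a classic machine learning pipeline before any model is trained. A minimal sketch, in which the record fields and the season mapping are illustrative assumptions rather than any real schema:

```python
# Feature engineering sketch: turn raw facts (birth year, purchase date)
# into features a classic ML model can actually use (age, season).
from datetime import date

SEASONS = {12: "winter", 1: "winter", 2: "winter",
           3: "spring", 4: "spring", 5: "spring",
           6: "summer", 7: "summer", 8: "summer",
           9: "autumn", 10: "autumn", 11: "autumn"}

def engineer_features(record, today):
    """Derive model-ready features from a raw customer record."""
    return {
        # Raw birth year is not useful; age is.
        "age": today.year - record["birth_year"],
        # Encode the seasonality we already know matters for fashion buying.
        "season": SEASONS[record["purchase_date"].month],
    }

customer = {"birth_year": 1985, "purchase_date": date(2018, 7, 14)}
print(engineer_features(customer, today=date(2018, 7, 14)))
# prints {'age': 33, 'season': 'summer'}
```

The point is that the data scientist, not the algorithm, decides that age and season are the meaningful representations; the model only finds patterns within what it is handed.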
Recent advances in artificial intelligence have come primarily in areas where data scientists are able to mimic human recognition abilities, such as recognizing objects in images or words in acoustic signals. Learning to recognize patterns in complex signals, such as audio streams or images, is extremely powerful—powerful enough that many people wonder why we aren’t using deep learning techniques everywhere.
The AI promise: What now?
Organizational leadership may be asking when they should use artificial intelligence. Well, AI-based research has made massive progress when it comes to neural networks solving problems that are related to mimicking what humans do well (object recognition and speech recognition being the two most prominent examples). Whenever one asks, “What’s a good object representation?” and can’t come up with an answer, then a deep learning model may be worth trying. However, when data scientists are able to construct a semantically rich object representation, then classic machine learning methods are probably a better choice (and yes, it’s worth investing a bit of serious thought into trying to find a good object representation).
In the end, one simply wants to try out different techniques within the same platform and not be limited by some software vendor’s choice of methods or inability to catch up with the current progress in the field. This is why open source platforms are leaders in this market; they allow practitioners to combine current state-of-the-art technologies with the latest bleeding-edge developments.
Moving forward, as teams become aligned in their goals and methods for using machine learning to achieve them, deep learning will become part of every data scientist’s toolbox. For many tasks, adding deep learning methods to the mix will provide great value. Think about it. We will be able to include object recognition in a system, making use of a pre-trained artificial intelligence system. We will be able to incorporate existing voice or speech recognition components because someone else has gone through the trouble of collecting and annotating enough data. But in the end, we will realize that deep learning, just like classic machine learning before it, is really just another tool to use when it makes sense.
The AI promise: What next?
One of the roadblocks that will surface, just as it did two decades ago, is the extreme difficulty of understanding what artificial intelligence systems have learned and how they arrive at their predictions. This may not be critical when it comes to predicting whether a customer may or may not like a particular product. But issues will arise when it comes to explaining why a system interacting with humans behaved in an unexpected way. Humans are willing to accept “human failure”; we don’t expect humans to be perfect. But we will not accept failure from an artificial intelligence system, especially if we can’t explain why it failed (and correct it).
As we become more familiar with deep learning, we will realize—just as we did for machine learning two decades ago—that despite the complexity of the system and the volume of data on which it was trained, understanding patterns is impossible without domain knowledge. Human speech recognition works as well as it does because we can often fill in a hole by knowing the context of the current conversation.
Today’s artificial intelligence systems don’t have that deep understanding. What we see now is shallow intelligence, the capacity to mimic isolated human recognition abilities and sometimes outperform humans on those isolated tasks. Training a system on billions of examples is just a matter of having the data and getting access to enough compute resources—not a deal-breaker anymore.
Chances are, the usefulness of artificial intelligence will ultimately fall somewhere short of the “save the world” propaganda. Perhaps all we’ll get is an incredible tool for practitioners to use to do their jobs faster and better.