“Most of human and animal learning is unsupervised learning. If intelligence was a cake, unsupervised learning would be the cake, supervised learning would be the icing on the cake, and reinforcement learning would be the cherry on the cake. We know how to make the icing and the cherry, but we don’t know how to make the cake. We need to solve the unsupervised learning problem before we can even think of getting to true AI.”
— Yann LeCun
The field of machine learning has two major branches: supervised learning and unsupervised learning.
In supervised learning, the AI agent has access to labels, which it can use to improve its performance on a task. In the email spam filter problem, we have a dataset of emails with all their text, and we know which of these emails are spam and which are not. These known answers are the labels. They are very valuable in helping the supervised learning system separate the spam emails from the rest.
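To make the role of labels concrete, here is a minimal sketch of a supervised spam filter. It assumes a tiny, made-up set of labeled emails and uses a simple naive Bayes word-count classifier; the data, the function names, and the choice of algorithm are illustrative assumptions, not something the discussion above prescribes.

```python
import math
from collections import Counter

# Hypothetical toy dataset: (email text, label). The labels are the supervision.
TRAIN = [
    ("win free money now", "spam"),
    ("free prize claim now", "spam"),
    ("cheap money offer win", "spam"),
    ("meeting agenda for monday", "ham"),
    ("project report attached", "ham"),
    ("lunch on monday with team", "ham"),
]

def train_naive_bayes(examples):
    """Count words separately for each label."""
    word_counts = {}          # label -> Counter of word frequencies
    label_counts = Counter()  # label -> number of emails with that label
    vocab = set()
    for text, label in examples:
        label_counts[label] += 1
        counts = word_counts.setdefault(label, Counter())
        for word in text.split():
            counts[word] += 1
            vocab.add(word)
    return word_counts, label_counts, vocab

def predict(model, text):
    """Return the label with the highest log-probability for this text."""
    word_counts, label_counts, vocab = model
    total_emails = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, n in label_counts.items():
        score = math.log(n / total_emails)  # prior for this label
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in text.split():
            if word in vocab:  # words never seen in training are ignored
                # Add-one smoothing avoids zero probabilities.
                score += math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

model = train_naive_bayes(TRAIN)
print(predict(model, "claim your free money"))
print(predict(model, "monday project meeting"))
```

Without the second element of each training pair, the per-label word counts could not be built at all, which is the sense in which the labels guide the learner.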
The Strengths and Weaknesses of Supervised Learning
Supervised learning excels at optimizing performance in well-defined tasks with plenty of labels. For example, consider a very large dataset of object images, all of them labeled.
If the dataset is sufficiently large and we train using the right machine learning algorithms, we can build a very good supervised learning-based image classification system.
However, the cost of manually labeling an image dataset is high, and even the best-curated image datasets have labels for only thousands of object categories.
This is a problem because supervised learning systems are very good at classifying images of objects for which they have labels, but they can perform poorly on images of objects for which they have none.
In unsupervised learning, labels are not available. Therefore, the task of the AI agent is not well-defined, and performance cannot be clearly measured. Consider the email spam filter problem again, this time without labels. Now, the AI agent will attempt to understand the underlying structure of the emails, separating the dataset into different groups.
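As a rough illustration of grouping emails without labels, the sketch below clusters a handful of made-up emails with a small k-means implementation over word-count vectors. The emails, the helper names, the choice of k-means, and the deterministic farthest-point initialization are all assumptions made for this example.

```python
# Hypothetical unlabeled emails: no spam/ham labels are provided anywhere.
EMAILS = [
    "win free money now",
    "free prize claim now",
    "cheap money offer win free",
    "meeting agenda for monday",
    "project report for monday meeting",
    "team meeting agenda and project report",
]

def vectorize(texts):
    """Turn each text into a vector of word counts over a shared vocabulary."""
    vocab = sorted({w for t in texts for w in t.split()})
    return [[float(t.split().count(w)) for w in vocab] for t in texts]

def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(points):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(points) for col in zip(*points)]

def kmeans(points, k, iterations=10):
    # Deterministic start: the first point, then repeatedly the point
    # farthest from the centroids chosen so far.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        )
    assignments = []
    for _ in range(iterations):
        # Assign every point to its nearest centroid.
        assignments = [
            min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points
        ]
        # Move each centroid to the mean of the points assigned to it.
        for j in range(k):
            members = [p for p, a in zip(points, assignments) if a == j]
            if members:
                centroids[j] = mean(members)
    return assignments, centroids

assignments, centroids = kmeans(vectorize(EMAILS), k=2)
print(assignments)
```

On this toy data the first three emails end up in one group and the last three in the other, but the algorithm only reports group indices. Whether a group means "spam" is something a person still has to decide, which is part of why performance is harder to measure here.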
Unsupervised learning problems are less clearly defined than supervised learning problems and are harder for the AI agent to solve. But if handled well, the solution can be powerful.
Because it is not tied to a fixed set of labels, an unsupervised system can be better than a supervised system at finding new patterns in future data. This makes the unsupervised solution more adaptable as the data changes. This is the power of unsupervised learning.
The Strengths and Weaknesses of Unsupervised Learning
Instead of being guided by labels, unsupervised learning works by learning the underlying structure of the data it is trained on. It does this by trying to represent that data with a set of parameters far smaller in number than the examples available in the dataset. By performing this representation learning, unsupervised learning is able to identify distinct patterns in the dataset.
Unsupervised learning makes previously intractable problems more solvable and is much more nimble at finding hidden patterns both in the historical data and in future data. Moreover, we now have an AI approach for the huge troves of unlabeled data that exist in the world.