Try writing rules that recognize a cat in a photo. Pointy ears won't work because some breeds fold them flat, fur won't work because sphynxes barely have any, and whiskers match otters too. Rule lists for problems like this collapse under their own exceptions, which is why machine learning flips traditional programming on its head. Instead of writing rules for the computer to apply to data, you show it data and let it work out the rules. The result is a model: behavior learned from examples instead of spelled out by hand.
That one idea powers search ranking, spam filters, recommendations, and the language models everyone is talking about. The machinery underneath is surprisingly consistent: define a measure of wrong, then nudge the model to be less wrong, millions of times.
Before any algorithm, there is the workflow around it. Data gets cleaned and turned into features, split so the model is tested on examples it has never seen, and scored with metrics that tell you more than raw accuracy does. Gradient descent does the actual learning, and regularization keeps it honest.
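The point about metrics can be made concrete with a small sketch. The spam counts below are invented purely for illustration, and the helper names (`confusion_counts`, `precision`, `recall`) are hypothetical, not from any library. It shows a do-nothing model scoring 95% accuracy while catching no spam at all.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for one positive class."""
    tp = fp = fn = tn = 0
    for truth, guess in zip(y_true, y_pred):
        if guess == positive and truth == positive:
            tp += 1
        elif guess == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def accuracy(y_true, y_pred):
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + fp + fn + tn)


def precision(y_true, y_pred):
    tp, fp, _, _ = confusion_counts(y_true, y_pred)
    # Assumed convention: report 0.0 when nothing was flagged as positive.
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(y_true, y_pred):
    tp, _, fn, _ = confusion_counts(y_true, y_pred)
    return tp / (tp + fn) if (tp + fn) else 0.0


# Made-up data: 95 legitimate emails (0) followed by 5 spam emails (1).
y_true = [0] * 95 + [1] * 5

# A "model" that never flags anything.
lazy_pred = [0] * 100

# A model that flags 2 legitimate emails and catches 4 of the 5 spam.
model_pred = [0] * 93 + [1] * 2 + [1] * 4 + [0] * 1

for name, pred in [("lazy", lazy_pred), ("model", model_pred)]:
    print(name, accuracy(y_true, pred), precision(y_true, pred), recall(y_true, pred))
```

The lazy model looks fine on accuracy alone, but its recall of zero says it never found a single spam message. Precision and recall expose what the single number hides.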
Supervised learning is learning from labeled examples: inputs paired with correct answers. Regression predicts numbers, classification predicts categories, and the classics here run from a straight line through data to ensembles of hundreds of trees. Most intro courses open with linear regression, and for good reason: a neural network layer is the same weighted sum, passed through a nonlinearity and stacked.
Unsupervised learning works without labels: the algorithm gets raw data and has to find the structure on its own. Clustering groups similar points, dimensionality reduction squeezes thousands of features down to the few that matter, and anomaly detection flags the points that don't belong.
Stack enough simple neurons and you can learn almost any pattern. This tier covers the network itself and backpropagation, the chain-rule procedure that makes training possible. Then come the classic architectures: CNNs for images, recurrent networks for sequences. It ends right where the AI Research page picks up, because transformers grew straight out of sequence models.
Here is the whole path, tier by tier, in the order the big intro courses teach it. Each topic will get its own page soon, but until then, use this as the map.
None of the famous algorithms matter if the workflow around them is broken. Everything in this tier exists to answer one question honestly: is the model actually learning, or just memorizing?
From raw data to deployed model, and every step between.
Turning messy real-world data into something a model can learn from.
Hold data back so you know the model generalizes instead of memorizing.
When models memorize instead of learn, and how to stop them.
Precision, recall, and why accuracy alone can lie to you.
How models learn: follow the error downhill, one step at a time.
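The last item above, following the error downhill, can be sketched in a few lines. This is a minimal illustration, not a production optimizer: it fits a line by gradient descent on mean squared error. The data, learning rate, step count, and function names (`fit_line`, `mse`) are all assumptions chosen for the example.

```python
def mse(w, b, xs, ys):
    """Mean squared error: the 'measure of wrong' for the line y = w*x + b."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def fit_line(xs, ys, learning_rate=0.05, steps=2000):
    """Fit y = w*x + b by repeatedly stepping against the gradient of the MSE."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        # Partial derivatives of the MSE with respect to w and b.
        grad_w = (2 / n) * sum(e * x for e, x in zip(errors, xs))
        grad_b = (2 / n) * sum(errors)
        # One small step downhill.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Toy data generated from y = 2x + 1, so the right answer is known in advance.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

w, b = fit_line(xs, ys)
print(round(w, 4), round(b, 4), mse(w, b, xs, ys))
```

A learning rate that is too large makes the steps overshoot and diverge, and one that is too small makes training crawl. The value here was picked by hand to suit this tiny dataset.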
With the workflow in place, start where the labels are. Every method here is the same bet in a different shape: examples with known answers can predict the next unknown one.
Fits a straight line through data to predict numbers.
A linear model bent into predicting yes-or-no probabilities.
Predicts by asking the closest examples what they are.
A flowchart of yes/no questions, learned from data.
Hundreds of trees voting together usually beat any single tree.
Trees built one at a time, each fixing the last one's mistakes.
Draws the widest possible gap between two classes.
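Of the methods above, asking the closest examples what they are is the shortest to write out. The sketch below is a bare-bones nearest-neighbor classifier. The two-number "features" and the labels are invented for illustration, and `knn_predict` is a hypothetical helper name.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Label a query point by majority vote among its k nearest training points."""
    # Pair every training point's distance to the query with its label.
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    # The most common label among the k nearest neighbors wins.
    return Counter(nearest_labels).most_common(1)[0][0]


# Made-up 2-D features for six labeled examples.
train_points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5), (6.0, 6.0), (6.5, 7.0), (7.0, 6.5)]
train_labels = ["cat", "cat", "cat", "dog", "dog", "dog"]

print(knn_predict(train_points, train_labels, (1.8, 1.6)))
print(knn_predict(train_points, train_labels, (6.2, 6.4)))
```

There is no training step at all: the labeled examples are the model, and the only choices are `k` and the distance function.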
Labels are expensive, and most of the world's data has none. These algorithms work with what's actually abundant: raw data, and the structure hiding in it.
Groups data around k center points without any labels.
Builds a tree of clusters, from single points to one big group.
Finds clusters by density, and calls the leftovers noise.
Squeezes many features into the few directions that matter most.
Flattens high-dimensional data so you can actually look at it.
Learns what normal looks like, then flags what isn't.
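Grouping data around k center points, the first item above, comes down to two alternating steps: assign each point to its nearest center, then move each center to the mean of its points. Here is a minimal sketch of that loop, often called Lloyd's algorithm. The points and starting centers are made up, and passing the starting centers in explicitly is a simplification to keep the example deterministic; real implementations usually choose them randomly.

```python
import math


def assign(points, centers):
    """Put each point in the cluster of its nearest center."""
    clusters = [[] for _ in centers]
    for p in points:
        nearest = min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))
        clusters[nearest].append(p)
    return clusters


def kmeans(points, initial_centers, max_iterations=100):
    """Alternate assignment and center updates until the centers stop moving."""
    centers = list(initial_centers)
    for _ in range(max_iterations):
        clusters = assign(points, centers)
        new_centers = [
            # Mean of each coordinate; keep the old center if a cluster is empty.
            tuple(sum(coord) / len(members) for coord in zip(*members)) if members else old
            for old, members in zip(centers, clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers, assign(points, centers)


# Two obvious blobs, with no labels anywhere.
points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5), (8.0, 8.0), (8.5, 9.0), (9.0, 8.5)]

# Deliberately poor start: both initial centers sit in the same blob.
centers, clusters = kmeans(points, initial_centers=[points[0], points[1]])
print(centers)
```

Even from that poor start, the centers drift apart within a few rounds on this toy data. On harder data, the result can depend heavily on where the centers begin.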
Every model so far needed humans to choose its features. A deep network learns the features itself, and that one step made images, audio, and language learnable.
Layers of simple units that together learn almost anything.
The chain rule, applied backwards, that trains every network.
The architecture that taught computers to see.
Networks with memory, and the road that led to transformers.
Turns words and items into vectors where distance means similarity.
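The chain rule applied backwards is easier to believe once it has been checked against brute force. The sketch below uses a deliberately tiny, hypothetical network (one input, one tanh hidden unit, one linear output) with arbitrary parameter values. It computes the gradients by hand with the chain rule, then compares them to finite-difference estimates.

```python
import math


def forward(params, x, y):
    """Run the tiny network and return the squared-error loss plus intermediates."""
    w1, b1, w2, b2 = params
    h = math.tanh(w1 * x + b1)   # hidden activation
    y_hat = w2 * h + b2          # linear output
    loss = (y_hat - y) ** 2
    return loss, (h, y_hat)


def backward(params, x, y):
    """Gradients of the loss w.r.t. each parameter, working from output to input."""
    w1, b1, w2, b2 = params
    _, (h, y_hat) = forward(params, x, y)
    d_yhat = 2 * (y_hat - y)     # dL/dy_hat
    d_w2 = d_yhat * h            # y_hat = w2*h + b2
    d_b2 = d_yhat
    d_h = d_yhat * w2            # pass the error back through the output layer
    d_z = d_h * (1 - h ** 2)     # derivative of tanh(z) is 1 - tanh(z)^2
    d_w1 = d_z * x               # z = w1*x + b1
    d_b1 = d_z
    return [d_w1, d_b1, d_w2, d_b2]


def numerical_gradient(params, x, y, eps=1e-6):
    """Estimate each gradient by nudging one parameter up and down."""
    grads = []
    for i in range(len(params)):
        up, down = list(params), list(params)
        up[i] += eps
        down[i] -= eps
        grads.append((forward(up, x, y)[0] - forward(down, x, y)[0]) / (2 * eps))
    return grads


params = [0.5, -0.3, 1.2, 0.1]   # arbitrary values for w1, b1, w2, b2
x, y = 0.8, 1.0                  # one made-up training example

analytic = backward(params, x, y)
numeric = numerical_gradient(params, x, y)
for a, n in zip(analytic, numeric):
    print(f"{a:.8f}  {n:.8f}")
```

The two columns should agree to several decimal places. Backpropagation gets all four gradients from one forward and one backward pass, while the brute-force check needs two forward passes per parameter, which is why only the former scales to large networks.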