Netflix Uses 76,897 Unique Movie Categories for Its Recommendation Algorithm

Alexis Madrigal and Ian Bogost did some truly incredible work to reverse engineer Netflix’s recommendation algorithm:

If you use Netflix, you’ve probably wondered about the specific genres that it suggests to you. Some of them just seem so specific that it’s absurd. Emotional Fight-the-System Documentaries? Period Pieces About Royalty Based on Real Life? Foreign Satanic Stories from the 1980s?

If Netflix can show such tiny slices of cinema to any given user, and they have 40 million users, how vast did their set of “personalized genres” need to be to describe the entire Hollywood universe?

This idle wonder turned to rabid fascination when I realized that I could capture each and every microgenre that Netflix’s algorithm has ever created.

Through a combination of elbow grease and spam-level repetition, we discovered that Netflix possesses not several hundred genres, or even several thousand, but 76,897 unique ways to describe types of movies.

There are so many that just loading, copying, and pasting all of them took the little script I wrote more than 20 hours.

This is probably the most fascinating article you’ll read all weekend. I’m almost surprised that Netflix agreed to be interviewed for the piece, given how much of its competitive advantage the piece reveals.

To wit:

As the hours ticked by, the Netflix grammar—how it pieced together the words to form comprehensible genres—began to become apparent as well.

If a movie was both romantic and Oscar-winning, Oscar-winning always went to the left: Oscar-winning Romantic Dramas. Time periods always went at the end of the genre: Oscar-winning Romantic Dramas from the 1950s.

The single-word adjectives (such as romantic) could basically just pile up, though, at least to a point: Oscar-winning Romantic Forbidden-Love Movies.

And the content-area categories were generally tacked onto the end: Oscar-winning Romantic Movies about Marriage.

In fact, there was a hierarchy for each category of descriptor. Generally speaking, a genre would be formed out of a subset of these components:

Region + Adjectives + Noun Genre + Based On… + Set In… + From the… + About… + For Age X to Y

There were a few wildcards, too, like everyone’s favorite, “With a Strong Female Lead” and “For Hopeless Romantics.”

And, of course, there were all the genres that are for movies or TV shows starring or directed by certain individuals.
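To make the ordering rule concrete, here is a minimal sketch of how a genre label could be assembled from optional components in the fixed order quoted above. It is only an illustration, not Netflix's actual code: the names `GenreParts` and `compose_genre`, the lowercase connecting words, and the "for ages" wording are all assumptions made for the example. Adjective order (such as Oscar-winning going leftmost) is left to the caller.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class GenreParts:
    """Hypothetical container for the components named in the article."""
    noun_genre: str                      # e.g. "Dramas", "Movies"
    region: Optional[str] = None         # e.g. "Foreign"
    adjectives: List[str] = field(default_factory=list)  # already in display order
    based_on: Optional[str] = None       # e.g. "Real Life"
    set_in: Optional[str] = None         # a place or setting
    from_the: Optional[str] = None       # e.g. "1950s"
    about: Optional[str] = None          # e.g. "Marriage"
    for_ages: Optional[str] = None       # e.g. "8 to 10"


def compose_genre(parts: GenreParts) -> str:
    """Join whichever components are present, in the article's stated order:
    Region + Adjectives + Noun Genre + Based On + Set In + From the + About + For Age.
    """
    pieces: List[str] = []
    if parts.region:
        pieces.append(parts.region)
    pieces.extend(parts.adjectives)
    pieces.append(parts.noun_genre)
    # Trailing modifiers; the connecting words are an assumed rendering.
    if parts.based_on:
        pieces.append(f"based on {parts.based_on}")
    if parts.set_in:
        pieces.append(f"set in {parts.set_in}")
    if parts.from_the:
        pieces.append(f"from the {parts.from_the}")
    if parts.about:
        pieces.append(f"about {parts.about}")
    if parts.for_ages:
        pieces.append(f"for ages {parts.for_ages}")
    return " ".join(pieces)


if __name__ == "__main__":
    print(compose_genre(GenreParts(
        noun_genre="Dramas",
        adjectives=["Oscar-winning", "Romantic"],
        from_the="1950s",
    )))
    print(compose_genre(GenreParts(
        noun_genre="Movies",
        adjectives=["Oscar-winning", "Romantic"],
        about="Marriage",
    )))
```

Because every slot except the noun genre is optional, a handful of values per slot multiplies quickly, which gives some intuition for how a total in the tens of thousands could arise.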

If you want to understand why Netflix is the only streaming service that matters, now and in the future, the answer is in this piece by Madrigal. As with Apple, or any great company for that matter, the little details matter a lot.
