Search and recommendations with embeddings

We built a system that serves tens of millions of song recommendations to tens of millions of customers at low latency.

This multi-part series explains how we built it, starting from core concepts and working our way toward practice. It isn't the only way to build recommendation systems based on machine learning, since the state of the art is always evolving.

To train the machine learning model, we used customer playback history to learn from the "wisdom of the crowd". The model is trained to predict whether a customer would listen to a song they haven't played before. We then "productionize" the model by making it work for live, always-on, ever-changing customer traffic. Our system uses the Elasticsearch query language to match songs and a custom KNN plugin to score those matches.
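
To make the match-then-score flow above concrete, here is a minimal sketch in plain Python. It is not our production code: the list filter only stands in for an Elasticsearch query, the brute-force loop stands in for the KNN plugin, and cosine similarity is an assumed scoring function. The names `recommend`, `cosine`, and `SONGS`, along with the toy vectors and genres, are hypothetical and made up for illustration.

```python
import heapq
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def recommend(customer_vec, songs, played, genre=None, k=2):
    """Match first, then score: return the ids of the k best unplayed songs."""
    # Matching step: a plain filter, standing in for a search query.
    # Songs the customer has already played are excluded.
    matches = [
        s for s in songs
        if s["id"] not in played and (genre is None or s["genre"] == genre)
    ]
    # Scoring step: exact (brute-force) KNN over the matched songs only.
    # Cosine similarity is an assumption, not necessarily the real scorer.
    scored = [(cosine(customer_vec, s["vec"]), s["id"]) for s in matches]
    return [song_id for _, song_id in heapq.nlargest(k, scored)]


# Toy catalog with 2-dimensional embeddings (hypothetical data).
SONGS = [
    {"id": "a", "genre": "rock", "vec": [1.0, 0.0]},
    {"id": "b", "genre": "rock", "vec": [0.8, 0.6]},
    {"id": "c", "genre": "jazz", "vec": [0.0, 1.0]},
    {"id": "d", "genre": "rock", "vec": [-1.0, 0.0]},
]

print(recommend([1.0, 0.1], SONGS, played={"a"}, genre="rock"))  # ['b', 'd']
```

The point of the split is that the matching step cheaply narrows the catalog, so the more expensive vector scoring only runs over songs that already satisfy the query.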

Because this is hard to do, these posts aim to explain how the system works, from the conceptual to the practical. My hope is that once readers understand the concepts, they will understand at a deeper level how most recommendation systems work today.

Parts 1-3 explain fundamentals:

Parts 4-6 are a practicum, describing what we actually built:

  • Part 4 - ANN vs KNN Tuning for Quality and Latency (coming soon)
  • Part 5 - How to create embeddings/Machine Learning to build models (coming soon)
  • Part 6 - Architecture and infrastructure (coming soon)