A Step-by-Step Guide to Building a Recommendation System with Scikit-learn

Recommendation systems power our daily digital experiences, from suggesting movies on streaming platforms to recommending products in online stores. In this tutorial, we’ll dive into the practical steps for building a basic recommendation system using Python’s Scikit-learn library. We’ll focus on collaborative filtering, one of the most popular techniques for personalized recommendations.

Understanding Recommendation Systems

Recommendation systems use algorithms to suggest items based on user preferences and behavior. The two main types are:

Content-based filtering: Recommends items similar to those the user liked in the past.
Collaborative filtering: Suggests items based on the interactions and preferences of many users.

This tutorial will implement a simple collaborative filtering system using Scikit-learn.

Preparing Your Dataset

The first step in building a recommendation system is gathering your data. Most collaborative filtering systems use a user-item interaction matrix with ratings or implicit feedback (like clicks or purchases).
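Real interaction data often arrives as a long log with one row per user-item event rather than as a ready-made matrix. The following is a minimal sketch of reshaping such a log with pandas; the `interactions` frame and its column names are hypothetical and not part of the tutorial dataset.

```python
import pandas as pd

# Hypothetical long-format log: one row per (user, item, rating) event.
interactions = pd.DataFrame({
    "user": ["Alice", "Alice", "Bob", "Carol", "Carol"],
    "item": ["Movie_A", "Movie_B", "Movie_A", "Movie_B", "Movie_C"],
    "rating": [5, 3, 4, 4, 1],
})

# Pivot into a user-item matrix: rows are users, columns are items.
# User-item pairs that never occur in the log become NaN.
user_item = interactions.pivot_table(index="user", columns="item", values="rating")
print(user_item)
```

By default, `pivot_table` averages duplicate (user, item) rows, which may or may not be the right choice for a given dataset.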

For demonstration, let’s create a small dataset representing user ratings of movies:

import pandas as pd

# Sample data: Users and movie ratings
ratings_data = {
    'User': ['Alice', 'Bob', 'Carol', 'Dave', 'Eve'],
    'Movie_A': [5, 4, None, 2, None],
    'Movie_B': [3, None, 4, None, 5],
    'Movie_C': [None, 2, 1, 5, 4],
    'Movie_D': [4, None, 2, 3, None]
}
# None is stored as NaN and marks a movie the user has not rated
ratings_df = pd.DataFrame(ratings_data).set_index('User')
print(ratings_df)

Handling Missing Values

Recommendation datasets often contain missing values, since not all users rate all items. Most Scikit-learn estimators do not accept missing values (NaN), so we need to handle these entries before fitting a model.

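The SimpleImputer class can fill each missing value with the mean of its column, which here is the average rating for that movie. This is a simplification: an imputed value is a placeholder rather than a real preference, so it pulls every user toward the average.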

from sklearn.impute import SimpleImputer

imputer = SimpleImputer(strategy='mean')
rating_matrix = imputer.fit_transform(ratings_df)
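To see what the imputer actually filled in, it can help to wrap the result back into a DataFrame. This is a small sketch that rebuilds the tutorial's ratings table so it runs on its own; the name `imputed_df` is introduced here for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Same ratings as the tutorial dataset, with NaN for unrated movies.
ratings_df = pd.DataFrame(
    {
        "Movie_A": [5, 4, np.nan, 2, np.nan],
        "Movie_B": [3, np.nan, 4, np.nan, 5],
        "Movie_C": [np.nan, 2, 1, 5, 4],
        "Movie_D": [4, np.nan, 2, 3, np.nan],
    },
    index=["Alice", "Bob", "Carol", "Dave", "Eve"],
)

# fit_transform returns a plain NumPy array, so restore the labels.
imputer = SimpleImputer(strategy="mean")
imputed_df = pd.DataFrame(
    imputer.fit_transform(ratings_df),
    index=ratings_df.index,
    columns=ratings_df.columns,
)
print(imputed_df.round(2))
```

Each former NaN now holds its column mean; for example, Movie_C has ratings 2, 1, 5 and 4, so Alice's missing Movie_C entry becomes 3.0.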

Building the Collaborative Filtering Model

Collaborative filtering often involves finding similar users (user-based) or similar items (item-based). For simplicity, we'll use cosine similarity to measure user similarity and recommend movies based on nearest neighbors.
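As a quick illustration of the metric itself, cosine similarity is the dot product of two rating vectors divided by the product of their lengths, and the cosine distance used by nearest-neighbor search is one minus that value. The sketch below checks this by hand for two users; the vectors are Alice's and Bob's rows after mean imputation.

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_distances, cosine_similarity

# Rating vectors for two users (after mean imputation).
alice = np.array([5.0, 3.0, 3.0, 4.0])
bob = np.array([4.0, 4.0, 2.0, 3.0])

# Cosine similarity computed by hand: dot product over product of norms.
manual_similarity = alice @ bob / (np.linalg.norm(alice) * np.linalg.norm(bob))

# The same quantity from Scikit-learn, which expects 2D inputs.
sk_similarity = cosine_similarity(alice.reshape(1, -1), bob.reshape(1, -1))[0, 0]
sk_distance = cosine_distances(alice.reshape(1, -1), bob.reshape(1, -1))[0, 0]

print(f"manual similarity:  {manual_similarity:.4f}")
print(f"sklearn similarity: {sk_similarity:.4f}")
print(f"sklearn distance:   {sk_distance:.4f}")  # equals 1 - similarity
```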

Let’s use Scikit-learn’s NearestNeighbors to implement user-based collaborative filtering:

from sklearn.neighbors import NearestNeighbors

# Fit the nearest neighbors model
model = NearestNeighbors(metric='cosine', algorithm='brute')
model.fit(rating_matrix)

Generating Recommendations

To recommend movies for a user, we find their nearest neighbors and aggregate their ratings:

import numpy as np

# Choose a target user (e.g., index 0 for 'Alice')
target_user_index = 0

# Find the 2 nearest neighbors; request 3 because the target user is returned as her own closest match
distances, indices = model.kneighbors([rating_matrix[target_user_index]], n_neighbors=3)
print('Nearest neighbors:', indices)

# Aggregate ratings for movies not yet rated by the target user
rated_movies = ~np.isnan(ratings_df.iloc[target_user_index].values)
unrated_indices = np.where(~rated_movies)[0]

neighbor_indices = indices[0][indices[0] != target_user_index]  # Exclude the target user herself
neighbor_ratings = rating_matrix[neighbor_indices]

recommendations = neighbor_ratings[:, unrated_indices].mean(axis=0)
movie_names = ratings_df.columns[unrated_indices]

for i, movie in enumerate(movie_names):
    print(f"Predicted rating for {movie}: {recommendations[i]:.2f}")

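This script finds the users most similar to the target user and averages their ratings to predict how much she might like each movie she has not rated. For Alice, the only such movie is Movie_C. Because the neighbors’ ratings come from the imputed matrix, a prediction can include item means wherever a neighbor has not rated the movie either.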
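The same steps can be wrapped into a reusable function. The sketch below is one possible variation, not the only way to do it: `recommend_for_user` is a hypothetical helper name, it weights each neighbor by cosine similarity instead of taking a plain mean, and it assumes every movie has at least one rating (otherwise SimpleImputer would drop that column).

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.neighbors import NearestNeighbors


def recommend_for_user(ratings_df, user, n_neighbors=2):
    """Predict ratings for the movies `user` has not rated (hypothetical helper)."""
    imputed = SimpleImputer(strategy="mean").fit_transform(ratings_df)
    model = NearestNeighbors(metric="cosine", algorithm="brute").fit(imputed)

    user_pos = ratings_df.index.get_loc(user)
    # Ask for one extra neighbor because the user matches herself.
    distances, indices = model.kneighbors(
        imputed[[user_pos]], n_neighbors=n_neighbors + 1
    )

    # Drop the target user by position instead of assuming she comes first.
    keep = indices[0] != user_pos
    neighbor_pos = indices[0][keep][:n_neighbors]
    weights = 1.0 - distances[0][keep][:n_neighbors]  # cosine similarity

    # Columns the user has not rated in the original (non-imputed) data.
    unrated = ratings_df.columns[ratings_df.loc[user].isna()]
    unrated_pos = ratings_df.columns.get_indexer(unrated)

    # Similarity-weighted average of the neighbors' (imputed) ratings.
    predicted = np.average(
        imputed[neighbor_pos][:, unrated_pos], axis=0, weights=weights
    )
    return pd.Series(predicted, index=unrated).sort_values(ascending=False)


ratings_df = pd.DataFrame(
    {
        "Movie_A": [5, 4, np.nan, 2, np.nan],
        "Movie_B": [3, np.nan, 4, np.nan, 5],
        "Movie_C": [np.nan, 2, 1, 5, 4],
        "Movie_D": [4, np.nan, 2, 3, np.nan],
    },
    index=["Alice", "Bob", "Carol", "Dave", "Eve"],
)

for name in ["Alice", "Eve"]:
    print(name)
    print(recommend_for_user(ratings_df, name).round(2))
```

Weighting by similarity lets closer neighbors count for more, which tends to matter once the neighbor count grows beyond two or three.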

Tips for Improving Your Recommendation System

Use larger datasets for more accurate results; with only five users, the imputed values strongly influence which neighbors are chosen.
Experiment with different similarity metrics (e.g., Pearson correlation).
Try item-based collaborative filtering for alternative approaches.
Explore advanced models like matrix factorization (e.g., SVD) for scalability.
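Two of these tips can be tried with Scikit-learn alone. The sketch below is a starting point under stated assumptions rather than a tuned model: it hard-codes the mean-imputed matrix from this tutorial, computes item-item cosine similarity by transposing it, and uses TruncatedSVD with an arbitrarily chosen two components as a simple matrix factorization.

```python
import numpy as np
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

movies = ["Movie_A", "Movie_B", "Movie_C", "Movie_D"]
movie_a_mean = 11 / 3  # mean of the observed Movie_A ratings (5, 4, 2)

# Mean-imputed user-item matrix (rows: Alice, Bob, Carol, Dave, Eve).
imputed = np.array([
    [5.0, 3.0, 3.0, 4.0],
    [4.0, 4.0, 2.0, 3.0],
    [movie_a_mean, 4.0, 1.0, 2.0],
    [2.0, 4.0, 5.0, 3.0],
    [movie_a_mean, 5.0, 4.0, 3.0],
])

# Item-based view: transpose so that each row is a movie, then compare movies.
item_similarity = cosine_similarity(imputed.T)
print("Item-item similarity:")
print(np.round(item_similarity, 2))

# Matrix factorization: approximate the matrix with 2 latent factors.
svd = TruncatedSVD(n_components=2, random_state=0)
user_factors = svd.fit_transform(imputed)  # one row of factors per user
item_factors = svd.components_             # one column of factors per movie
approx_ratings = user_factors @ item_factors
print("Low-rank approximation of the ratings:")
print(np.round(approx_ratings, 2))
```

The number of components is a hyperparameter; on real data it would normally be chosen by validating predictions against held-out ratings.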

Conclusion

Building a basic recommendation system with Scikit-learn is straightforward and powerful for prototyping. Collaborative filtering is a cornerstone of personalization in AI, and knowing how to implement it can unlock new possibilities for your projects. For production use, consider frameworks like Surprise or TensorFlow Recommenders for more scalability and flexibility.

With this step-by-step guide, you’re equipped to start experimenting and enhancing recommendation systems in your own AI and machine learning projects!
