Recommendation systems power our daily digital experiences, from suggesting movies on streaming platforms to recommending products in online stores. In this tutorial, we’ll dive into the practical steps for building a basic recommendation system using Python’s Scikit-learn library. We’ll focus on collaborative filtering, one of the most popular techniques for personalized recommendations.
Understanding Recommendation Systems
Recommendation systems use algorithms to suggest items based on user preferences and behavior. The two main types are:
- Content-based filtering: Recommends items similar to those the user liked in the past.
- Collaborative filtering: Suggests items based on the interactions and preferences of many users.
This tutorial will build a simple collaborative filtering system using Scikit-learn.
Preparing Your Dataset
The first step in building a recommendation system is gathering your data. Most collaborative filtering systems use a user-item interaction matrix with ratings or implicit feedback (like clicks or purchases).
For demonstration, let’s create a small dataset representing user ratings of movies:
import pandas as pd
# Sample data: Users and movie ratings
ratings_data = {
    'User': ['Alice', 'Bob', 'Carol', 'Dave', 'Eve'],
    'Movie_A': [5, 4, None, 2, None],
    'Movie_B': [3, None, 4, None, 5],
    'Movie_C': [None, 2, 1, 5, 4],
    'Movie_D': [4, None, 2, 3, None]
}
ratings_df = pd.DataFrame(ratings_data)
ratings_df.set_index('User', inplace=True)
print(ratings_df)
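Real rating data usually arrives as a log with one row per rating rather than as a ready-made matrix. The sketch below shows one way to reshape such a log into a user-item matrix with pandas. The `records` table and its column names are made up for illustration.

```python
import pandas as pd

# Hypothetical long-format log: one row per (user, movie, rating) event
records = pd.DataFrame({
    'User': ['Alice', 'Alice', 'Bob', 'Carol', 'Carol'],
    'Movie': ['Movie_A', 'Movie_B', 'Movie_A', 'Movie_B', 'Movie_C'],
    'Rating': [5, 3, 4, 4, 1],
})

# Pivot into a user-item matrix: rows are users, columns are movies.
# User/movie pairs that never appear in the log become NaN.
user_item = records.pivot(index='User', columns='Movie', values='Rating')
print(user_item)
```

Note that `pivot` raises an error if the same user rates the same movie twice, so a real log may need deduplicating or aggregating first.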
Handling Missing Values
Recommendation datasets often contain missing values, since not all users rate all items. Most Scikit-learn estimators, including the NearestNeighbors model used below, reject input that contains NaN, so we need to handle these missing entries first.
The SimpleImputer class can fill missing values with the average rating for each item:
from sklearn.impute import SimpleImputer
imputer = SimpleImputer(strategy='mean')
rating_matrix = imputer.fit_transform(ratings_df)
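To see what the imputer actually did, it can help to inspect the fill values and the completed matrix. The following is a minimal sketch that rebuilds the toy ratings so it runs on its own. The names `fill_values` and `imputed_df` are introduced here for illustration only.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Same toy ratings as above; None becomes NaN inside the DataFrame
ratings_df = pd.DataFrame({
    'User': ['Alice', 'Bob', 'Carol', 'Dave', 'Eve'],
    'Movie_A': [5, 4, None, 2, None],
    'Movie_B': [3, None, 4, None, 5],
    'Movie_C': [None, 2, 1, 5, 4],
    'Movie_D': [4, None, 2, 3, None],
}).set_index('User')

imputer = SimpleImputer(strategy='mean')
rating_matrix = imputer.fit_transform(ratings_df)

# statistics_ holds the per-column (per-movie) means used as fill values
fill_values = pd.Series(imputer.statistics_, index=ratings_df.columns)
print(fill_values.round(2))

# Wrap the NumPy result in a labeled DataFrame for easier inspection
imputed_df = pd.DataFrame(
    rating_matrix, index=ratings_df.index, columns=ratings_df.columns
)
print(imputed_df.round(2))
```

Mean imputation treats every unrated movie as "average", which makes users look more alike than they really are. That is acceptable for a small demonstration, but worth keeping in mind when interpreting the similarities.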
Building the Collaborative Filtering Model
Collaborative filtering often involves finding similar users (user-based) or similar items (item-based). For simplicity, we'll use cosine similarity to measure user similarity and recommend movies based on nearest neighbors.
Let’s use Scikit-learn’s NearestNeighbors to implement user-based collaborative filtering:
from sklearn.neighbors import NearestNeighbors
# Fit the nearest neighbors model
model = NearestNeighbors(metric='cosine', algorithm='brute')
model.fit(rating_matrix)
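The distances that NearestNeighbors reports with metric='cosine' are cosine distances, that is, 1 minus the cosine similarity. The sketch below checks this relationship on the imputed matrix from the previous step. The matrix is written out by hand so the block runs on its own, and the `users` list and `similarity` name are illustrative.

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.neighbors import NearestNeighbors

users = ['Alice', 'Bob', 'Carol', 'Dave', 'Eve']

# Imputed rating matrix from the previous step (missing values replaced
# by each movie's mean: 11/3 for Movie_A, 4 for Movie_B, 3 for C and D)
rating_matrix = np.array([
    [5.0,    3.0, 3.0, 4.0],  # Alice
    [4.0,    4.0, 2.0, 3.0],  # Bob
    [11 / 3, 4.0, 1.0, 2.0],  # Carol
    [2.0,    4.0, 5.0, 3.0],  # Dave
    [11 / 3, 5.0, 4.0, 3.0],  # Eve
])

# Pairwise user-user cosine similarity (5 x 5, ones on the diagonal)
similarity = cosine_similarity(rating_matrix)

model = NearestNeighbors(metric='cosine', algorithm='brute')
model.fit(rating_matrix)
distances, indices = model.kneighbors(rating_matrix[[0]], n_neighbors=3)

# Each reported distance should equal 1 - similarity for that neighbor
for dist, idx in zip(distances[0], indices[0]):
    print(f"{users[idx]}: distance={dist:.4f}, similarity={similarity[0, idx]:.4f}")
```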
Generating Recommendations
To recommend movies for a user, we find their nearest neighbors and aggregate their ratings:
import numpy as np
# Choose a target user (e.g., index 0 for 'Alice')
target_user_index = 0
# Request 3 neighbors: the target user herself plus the 2 most similar users
distances, indices = model.kneighbors([rating_matrix[target_user_index]], n_neighbors=3)
print('Nearest neighbors:', indices)
# Aggregate ratings for movies not yet rated by the target user
rated_movies = ~np.isnan(ratings_df.iloc[target_user_index].values)
unrated_indices = np.where(~rated_movies)[0]
neighbor_indices = indices[0][1:] # Exclude the user herself
neighbor_ratings = rating_matrix[neighbor_indices]
recommendations = neighbor_ratings[:, unrated_indices].mean(axis=0)
movie_names = ratings_df.columns[unrated_indices]
for i, movie in enumerate(movie_names):
    print(f"Recommended rating for {movie}: {recommendations[i]:.2f}")
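This script finds the users most similar to the target user and averages their ratings to predict how much she might like each movie she has not rated yet. Note that the neighbors’ ratings come from the imputed matrix, so a prediction may include filled-in item means as well as real ratings. The movies with the highest predicted ratings are the ones to recommend.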
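A common refinement of the plain average is to weight each neighbor by how similar they are to the target user, and to count only neighbors who actually rated the movie. The sketch below is one possible way to do that and is not part of the script above. The helper `predict_rating` and its parameter `k` are hypothetical names chosen for this example.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.metrics.pairwise import cosine_similarity

ratings_df = pd.DataFrame({
    'User': ['Alice', 'Bob', 'Carol', 'Dave', 'Eve'],
    'Movie_A': [5, 4, None, 2, None],
    'Movie_B': [3, None, 4, None, 5],
    'Movie_C': [None, 2, 1, 5, 4],
    'Movie_D': [4, None, 2, 3, None],
}).set_index('User')

# Similarities are still computed on the mean-imputed matrix
rating_matrix = SimpleImputer(strategy='mean').fit_transform(ratings_df)
similarity = cosine_similarity(rating_matrix)

raw = ratings_df.to_numpy(dtype=float)  # original ratings, NaN where missing
observed = ~np.isnan(raw)               # True where a real rating exists


def predict_rating(user_idx, item_idx, k=2):
    """Similarity-weighted mean over the k most similar users who rated the item."""
    candidates = [
        u for u in range(raw.shape[0])
        if u != user_idx and observed[u, item_idx]
    ]
    if not candidates:
        return float('nan')  # nobody else rated this item
    top = sorted(candidates, key=lambda u: similarity[user_idx, u], reverse=True)[:k]
    weights = similarity[user_idx, top]
    return float(np.dot(weights, raw[top, item_idx]) / weights.sum())


alice = ratings_df.index.get_loc('Alice')
movie_c = ratings_df.columns.get_loc('Movie_C')
prediction = predict_rating(alice, movie_c)
print(f"Predicted rating for Movie_C: {prediction:.2f}")
```

Because the weights are normalized, the prediction always falls between the lowest and highest rating given by the chosen neighbors.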
Tips for Improving Your Recommendation System
- Use larger datasets for more accurate results.
- Experiment with different similarity metrics (e.g., Pearson correlation).
- Try item-based collaborative filtering for alternative approaches.
- Explore advanced techniques like matrix factorization (e.g., SVD) for scalability.
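Two of the tips above need only small changes to the code. As a rough sketch, item-based filtering can be approximated by comparing the columns of the rating matrix instead of its rows, and Pearson correlation between users can be computed with NumPy. The matrix is the mean-imputed one from earlier, written out so the block runs on its own. The variable names are illustrative.

```python
import numpy as np
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity

users = ['Alice', 'Bob', 'Carol', 'Dave', 'Eve']
movies = ['Movie_A', 'Movie_B', 'Movie_C', 'Movie_D']

# Mean-imputed rating matrix (rows = users, columns = movies)
rating_matrix = np.array([
    [5.0,    3.0, 3.0, 4.0],
    [4.0,    4.0, 2.0, 3.0],
    [11 / 3, 4.0, 1.0, 2.0],
    [2.0,    4.0, 5.0, 3.0],
    [11 / 3, 5.0, 4.0, 3.0],
])

# Item-based view: transpose so that each row is a movie, then compare movies
item_similarity = pd.DataFrame(
    cosine_similarity(rating_matrix.T), index=movies, columns=movies
)
most_similar = item_similarity['Movie_A'].drop('Movie_A').idxmax()
print(f"Movie most similar to Movie_A: {most_similar}")

# Pearson correlation between users: np.corrcoef treats each row as a variable
user_pearson = pd.DataFrame(np.corrcoef(rating_matrix), index=users, columns=users)
print(user_pearson.round(2))
```

Unlike cosine similarity, Pearson correlation subtracts each user's own mean first, so it is less affected by users who rate everything high or everything low.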
Conclusion
Building a basic recommendation system with Scikit-learn is straightforward and powerful for prototyping. Collaborative filtering is a cornerstone of personalization in AI, and knowing how to implement it can unlock new possibilities for your projects. For production use, consider frameworks like Surprise or TensorFlow Recommenders for more scalability and flexibility.
With this guide, you’re equipped to start experimenting and enhancing recommendation systems in your own AI and machine learning projects!