Building A Recommendation System

Title: Building a Recommendation System in Python: A Comprehensive Guide



Introduction

One of the most popular applications of machine learning that we interact with daily is the recommendation system. Have you ever wondered how Netflix suggests movies to your liking? Or how Amazon recommends products you might be interested in? It’s all thanks to these systems. In this tutorial, we will discuss building a recommendation system using Python. We’re going to start from the basics, suitable for beginners, and move on to address topics for experienced Python programmers.

Table of Contents

  1. What is a Recommendation System?
  2. Types of Recommendation Systems
  3. Building a Recommendation System in Python
  4. Evaluating a Recommendation System

1. What is a Recommendation System?

A recommendation system is an application of machine learning that predicts users’ preferences for items such as movies, music, blog posts, or products. Recommendation systems are used extensively across internet applications and have become a core topic in machine learning and data science.

2. Types of Recommendation Systems

There are two primary types of recommendation systems: Collaborative Filtering and Content-Based Filtering.

Collaborative Filtering

Collaborative Filtering (CF) uses the behavior of other users to recommend items. In simple terms, if person A likes items 1, 2, and 3 and person B likes items 2, 3, and 4, their tastes overlap, so person A is likely to enjoy item 4 and person B is likely to enjoy item 1.
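
The person A and person B example can be sketched in a few lines of plain Python. This is only an illustration of the intuition, not the method used later in this article: the likes dictionary, the jaccard helper, and the candidates helper are hypothetical names, and set overlap (Jaccard similarity) stands in for whatever similarity measure a real system would use.

```python
# Toy "likes" data from the example above: user -> set of liked item ids.
likes = {
    "A": {1, 2, 3},
    "B": {2, 3, 4},
}

def jaccard(a, b):
    """Overlap between two sets of liked items, from 0 (disjoint) to 1 (identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def candidates(user, other):
    """Items the other user liked that this user has not interacted with yet."""
    return sorted(likes[other] - likes[user])

# A and B share 2 items out of 4 distinct items overall.
similarity = jaccard(likes["A"], likes["B"])
print(f"similarity(A, B) = {similarity:.2f}")
print("suggest to A:", candidates("A", "B"))
print("suggest to B:", candidates("B", "A"))
```

Because the two users overlap substantially, each one's unseen items become candidates for the other.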

Content-Based Filtering

On the other hand, Content-Based Filtering uses the properties of the items to recommend other items similar to what the user likes, based on their previous actions or explicit feedback.
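
To make the idea concrete, here is a minimal sketch in plain Python. Everything in it is made up for illustration: the items catalogue, the tag sets, and the build_profile and recommend helpers are hypothetical, and counting shared tags is just one simple way to match items to a user's history.

```python
from collections import Counter

# Hypothetical catalogue: item -> set of descriptive tags (here, genres).
items = {
    "Movie 1": {"action", "sci-fi"},
    "Movie 2": {"action", "thriller"},
    "Movie 3": {"romance", "comedy"},
    "Movie 4": {"sci-fi", "thriller"},
}

def build_profile(liked):
    """Count how often each tag appears among the items the user liked."""
    profile = Counter()
    for title in liked:
        profile.update(items[title])
    return profile

def recommend(liked, top_n=2):
    """Rank unseen items by how strongly their tags match the user's profile."""
    profile = build_profile(liked)
    scores = {
        title: sum(profile[tag] for tag in tags)
        for title, tags in items.items()
        if title not in liked
    }
    # Sort by descending score, breaking ties alphabetically.
    return sorted(scores, key=lambda t: (-scores[t], t))[:top_n]

print(recommend(["Movie 1", "Movie 2"]))
```

Note that no other users are involved: the ranking depends only on item properties and this one user's history.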

3. Building a Recommendation System in Python

Now, let’s dive into the main event. We start by loading the data. For this article, we will be using a simplified version of the MovieLens dataset, stored in two files: movies.csv (movie metadata, including genres) and ratings.csv (one row per user rating).

Loading dataset

import pandas as pd

movies_df = pd.read_csv('movies.csv')
ratings_df = pd.read_csv('ratings.csv')
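
Before modelling, it helps to check what the data looks like. The sketch below uses tiny made-up stand-ins for the two CSV files so that it runs on its own. The userId, movieId, rating, and genres columns are the ones used later in this article; the title column and all row values are assumptions for illustration.

```python
import io
import pandas as pd

# Tiny stand-ins for movies.csv and ratings.csv (made-up rows).
movies_csv = io.StringIO(
    "movieId,title,genres\n"
    "1,Movie One,Adventure|Comedy\n"
    "2,Movie Two,Drama\n"
)
ratings_csv = io.StringIO(
    "userId,movieId,rating\n"
    "1,1,4.0\n"
    "1,2,3.5\n"
    "2,1,5.0\n"
)

sample_movies_df = pd.read_csv(movies_csv)
sample_ratings_df = pd.read_csv(ratings_csv)

# Basic checks worth running on the real files too.
print(sample_movies_df.head())
print(sample_ratings_df["rating"].describe())

# Sparsity: the share of user-movie pairs that have no rating.
n_users = sample_ratings_df["userId"].nunique()
n_movies = sample_ratings_df["movieId"].nunique()
sparsity = 1 - len(sample_ratings_df) / (n_users * n_movies)
print(f"{n_users} users, {n_movies} movies, sparsity = {sparsity:.2f}")
```

Real rating data is usually far sparser than this toy example, which is worth keeping in mind when unrated entries are later filled with zeros.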

Collaborative Filtering Recommendation

Let’s build a collaborative filtering recommendation system using Python’s powerful libraries pandas and scikit-learn.

from sklearn.metrics.pairwise import cosine_similarity

# Build the user-movie matrix: one row per user, one column per movie.
# Movies a user has not rated are filled with 0.
matrix = ratings_df.pivot(index='userId', columns='movieId', values='rating').fillna(0)

# Compute the cosine similarity between every pair of users (rows)
cosine_sim_matrix = cosine_similarity(matrix.values)

# Wrap the result in a DataFrame labelled by userId on both axes
cosine_sim_matrix_df = pd.DataFrame(cosine_sim_matrix, index=matrix.index, columns=matrix.index)

cosine_sim_matrix_df is a user-to-user similarity matrix: each entry measures how closely two users’ ratings agree. We can use it to find the users most similar to a given user and recommend movies those users rated highly.
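
A similarity matrix between users does not produce recommendations by itself. One common way to use it, sketched below under stated assumptions, is to score each movie a user has not rated by the similarity-weighted ratings of that user's nearest neighbours. The ratings are made up, recommend_for_user and its k and top_n parameters are hypothetical names, and treating 0 as "not rated" inside the weighted sum is a simplification that pulls scores towards zero.

```python
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity

# Made-up ratings; 0 means "not rated", mirroring the fillna(0) step.
matrix = pd.DataFrame(
    [[5, 4, 0, 0],
     [5, 5, 4, 0],
     [0, 1, 0, 5]],
    index=pd.Index([1, 2, 3], name="userId"),
    columns=pd.Index([10, 20, 30, 40], name="movieId"),
    dtype=float,
)
user_sim = pd.DataFrame(
    cosine_similarity(matrix), index=matrix.index, columns=matrix.index
)

def recommend_for_user(user_id, k=2, top_n=3):
    """Score unseen movies by the similarity-weighted ratings of the k nearest users."""
    # The k most similar users, excluding the user themselves.
    neighbours = user_sim[user_id].drop(user_id).nlargest(k)
    # Weighted sum of neighbour ratings, normalised by total similarity.
    # Assumes at least one neighbour has non-zero similarity.
    scores = matrix.loc[neighbours.index].T.dot(neighbours) / neighbours.sum()
    # Keep only movies the user has not rated.
    unseen = matrix.loc[user_id] == 0
    return scores[unseen].sort_values(ascending=False).head(top_n)

print(recommend_for_user(1))
```

Here user 2 is far more similar to user 1 than user 3 is, so the movie that user 2 rated (movieId 30) ranks above the one only user 3 rated (movieId 40).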

Content-Based Filtering Recommendation

Our content-based filtering approach will be based on movie genres.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Replace missing genres with an empty string so every row can be vectorized
movies_df['genres'] = movies_df['genres'].fillna('')

# Turn each movie's genre string into a TF-IDF feature vector
tfidf = TfidfVectorizer(stop_words='english')
tfidf_matrix = tfidf.fit_transform(movies_df['genres'])

# Compute the cosine similarity between every pair of movies
cosine_sim2 = cosine_similarity(tfidf_matrix, tfidf_matrix)

We now have a pairwise cosine similarity matrix for all the movies in our dataset.
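
To turn a movie-to-movie similarity matrix into recommendations, we can look up a movie's row and return the highest-scoring other movies. The sketch below is self-contained and uses a made-up catalogue. The title column, the similar_movies helper, and its top_n parameter are assumptions for illustration, and the lookup assumes titles are unique.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Made-up catalogue using MovieLens-style pipe-separated genres.
movies = pd.DataFrame({
    "title": ["Movie One", "Movie Two", "Movie Three", "Movie Four"],
    "genres": ["Adventure|Comedy", "Adventure|Comedy|Fantasy", "Drama|Romance", "Comedy"],
})

tfidf_matrix = TfidfVectorizer().fit_transform(movies["genres"])
sim = cosine_similarity(tfidf_matrix)

# Map each title to its row position in the similarity matrix.
title_to_idx = pd.Series(movies.index, index=movies["title"])

def similar_movies(title, top_n=2):
    """Return the top_n titles whose genre vectors are closest to the given title."""
    idx = title_to_idx[title]
    # Similarity of the chosen movie to every movie, minus the movie itself.
    scores = pd.Series(sim[idx], index=movies["title"]).drop(title)
    return scores.nlargest(top_n).index.tolist()

print(similar_movies("Movie One"))
```

Movies that share no genres with the query, such as "Movie Three" here, get a similarity of zero and fall to the bottom of the ranking.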

4. Evaluating a Recommendation System

Evaluating a recommendation system is as crucial as building one. Two of the most common metrics for rating prediction are RMSE (Root Mean Squared Error) and MAE (Mean Absolute Error). The Surprise library (imported as surprise) makes both easy to compute.

from surprise import Dataset, Reader, SVD
from surprise.model_selection import cross_validate

# Load data from the DataFrame into a Surprise Dataset
# (adjust rating_scale to match the lowest and highest ratings in your data)
reader = Reader(rating_scale=(0, 5))
data = Dataset.load_from_df(ratings_df[['userId', 'movieId', 'rating']], reader)

# Instantiate the SVD recommender; cross_validate fits it on each fold
recommender = SVD()

# Perform 5-fold cross-validation and print RMSE and MAE for each fold
cross_validate(recommender, data, measures=['RMSE', 'MAE'], cv=5, verbose=True)

This code evaluates an SVD (Singular Value Decomposition) model with 5-fold cross-validation and reports the RMSE and MAE of its rating predictions on each fold. Lower values of both metrics mean the predicted ratings are closer to the actual ones. Building a good recommender also requires a solid understanding of the problem at hand, so treat the snippets above as a hands-on starting point rather than a finished system.
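
Both metrics are simple to compute by hand, which helps when interpreting the numbers a library reports. The sketch below uses made-up actual and predicted ratings; the variable names are illustrative only.

```python
import math

# Hypothetical true ratings and model predictions for five (user, movie) pairs.
actual = [4.0, 3.0, 5.0, 2.0, 4.5]
predicted = [3.5, 3.0, 4.0, 3.0, 4.5]

errors = [p - a for p, a in zip(predicted, actual)]

# MAE: average size of the error, ignoring its sign.
mae = sum(abs(e) for e in errors) / len(errors)

# RMSE: square root of the average squared error.
mse = sum(e ** 2 for e in errors) / len(errors)
rmse = math.sqrt(mse)

print(f"MAE  = {mae:.3f}")
print(f"RMSE = {rmse:.3f}")
```

Because RMSE squares each error before averaging, it penalises large misses more heavily than MAE does, so RMSE is never smaller than MAE on the same predictions.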

Conclusion

In summary, recommendation systems play an integral role in how online platforms surface relevant content and products to their users. They are here to stay, and their importance will only grow as more services move online.

This guide covered the two main approaches, collaborative filtering and content-based filtering, and showed how to evaluate a model with RMSE and MAE. Although the examples used movies, the same techniques apply to any user-item setting.

Happy coding!

References:

  1. F. Maxwell Harper and Joseph A. Konstan. 2015. The MovieLens Datasets: History and Context. ACM Transactions on Interactive Intelligent Systems (TiiS) 5, 4: 19:1–19:19.
  2. Scikit-learn: Machine Learning in Python
  3. Building and Testing Recommender Systems With Surprise, Step-by-Step