Building a Linear Regression Model from Scratch in Python

Linear regression is one of the foundational algorithms of machine learning and data science. It is a simple yet powerful predictive modeling technique for understanding relationships between variables. In the context of machine learning, linear regression uses the learned relationship to forecast or predict an output.



In this article, we aim to lay a solid foundation for understanding linear regression and show how to implement it from scratch in Python. We assume the reader has a basic understanding of Python and elementary statistics.

Table of Contents

  1. Introduction
  2. Understanding Linear Regression
    • Simple Linear Regression
    • Multiple Linear Regression
  3. Mathematical Overview
  4. Building a Linear Regression Model from Scratch
  5. Implementation in Python
  6. Testing the Model
  7. Conclusion

Introduction

Python is a versatile language with a rich data science ecosystem. It provides an extensive array of libraries and tools for machine learning and data wrangling, but nothing quite hones your understanding of a subject like building from the ground up.

Let’s start by discussing what linear regression is.

Understanding Linear Regression

Linear regression is a type of regression analysis in which the relationship between the dependent variable and the independent variable(s) is modeled as linear. Here:

  • Dependent Variable: The main factor that you are trying to understand or predict.
  • Independent Variables: The factors that you leverage to understand or predict the dependent variable.

Simple Linear Regression

Simple linear regression uses a single independent variable to explain or predict the outcome of the dependent variable Y; more complex models include two or more independent variables.

Multiple Linear Regression

Multiple linear regression models the relationship between two or more features and a response by fitting a linear equation to the observed data. In other words, several independent variables jointly explain the dependent variable.
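To sketch what this looks like in code (this example is not part of the original walkthrough; the data, coefficients, and use of NumPy's least-squares solver are illustrative assumptions), a multiple regression with two synthetic features can be fit in one call:

```python
import numpy as np

# Hypothetical synthetic data: two features with known true coefficients
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 2))                       # 200 samples, 2 features
true_coef = np.array([1.5, -3.0])
y = X @ true_coef + 4.0 + rng.normal(0, 0.1, 200)   # intercept 4.0, small noise

# Append a column of ones so the intercept is estimated alongside the slopes
X_design = np.hstack([X, np.ones((200, 1))])

# Solve the least-squares problem min ||X_design @ w - y||^2
w, *_ = np.linalg.lstsq(X_design, y, rcond=None)
print(w)  # approximately [1.5, -3.0, 4.0]
```

With little noise and 200 samples, the recovered weights should land very close to the true coefficients and intercept.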

Mathematical Overview

The foundation of simple linear regression is the equation y = mx + c, where m is the slope and c is the intercept.

In multiple linear regression, the equation becomes y = m1x1 + m2x2 + m3x3 + ... + c, where y is the dependent variable, each mi is the model coefficient for feature xi, and c is the constant (intercept).

Linear regression's primary objective is to minimize the sum of the squared differences between the actual values y and the values predicted by the model, ŷ.
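That objective can be written down as a small Python sketch (not from the original article): the mean squared error of the line, together with its partial derivatives with respect to m and c, which are exactly the quantities the gradient-descent loop below uses.

```python
import numpy as np

def mse_loss(m, c, x, y):
    """Mean squared error of the line y = m*x + c over the data."""
    y_pred = m * x + c
    return np.mean((y - y_pred) ** 2)

def gradients(m, c, x, y):
    """Partial derivatives of the MSE with respect to m and c."""
    n = len(x)
    y_pred = m * x + c
    d_m = (-2 / n) * np.sum(x * (y - y_pred))
    d_c = (-2 / n) * np.sum(y - y_pred)
    return d_m, d_c

# On perfectly linear data the loss and gradients at the true parameters are zero
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 2.0 * x + 1.0
print(mse_loss(2.0, 1.0, x, y))  # 0.0
```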

Building a Linear Regression Model from Scratch

Let’s walk through the steps to build a simple linear regression model from scratch.

Step 1 – Import the Required Libraries

Start by importing the required Python libraries. We will need NumPy for mathematical computations and matplotlib for plotting graphs.

import numpy as np
import matplotlib.pyplot as plt

Step 2 – Create Fake Data

For illustration purposes, let us create a synthetic dataset using NumPy. (Note that the data below follows a noisy cubic curve, so a straight line will only be a rough approximation.)

np.random.seed(0)
x = 2 - 3 * np.random.normal(0, 1, 100)
y = x - 2 * (x ** 2) + 0.5 * (x ** 3) + np.random.normal(-3, 3, 100)
plt.scatter(x,y, s=10)
plt.show()

Step 3 – Transform the Dataset

To apply linear regression, we convert x and y into column vectors, which we do here with np.newaxis (equivalent to calling reshape(-1, 1)).

x = x[:, np.newaxis]
y = y[:, np.newaxis]
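A quick illustrative check of what this shape change does (the small array here is just for demonstration):

```python
import numpy as np

x = np.arange(5.0)          # shape (5,)
x_col = x[:, np.newaxis]    # shape (5, 1), a column vector
print(x_col.shape)          # (5, 1)

# np.newaxis is equivalent to reshape(-1, 1)
assert np.array_equal(x_col, x.reshape(-1, 1))
```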

Step 4 – Building the Model

Our task is to find the best-fit line for the given dataset. We need two parameters, m (slope) and c (intercept), to define that line.

m = 0.0  # slope
c = 0.0  # intercept

L = 0.01       # the learning rate
epochs = 1000  # number of gradient-descent iterations

n = float(len(x))  # number of samples

# Performing gradient descent
for i in range(epochs):
    y_predicted = m * x + c                         # current predictions
    D_m = (-2 / n) * np.sum(x * (y - y_predicted))  # derivative w.r.t. m
    D_c = (-2 / n) * np.sum(y - y_predicted)        # derivative w.r.t. c
    m = m - L * D_m  # update m
    c = c - L * D_c  # update c

print(m, c)
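As a sanity check (not part of the original walkthrough), the same best-fit line can be obtained in closed form with np.polyfit on the raw one-dimensional arrays; gradient descent should converge to roughly the same slope and intercept.

```python
import numpy as np

# Regenerate the article's synthetic data
np.random.seed(0)
x = 2 - 3 * np.random.normal(0, 1, 100)
y = x - 2 * (x ** 2) + 0.5 * (x ** 3) + np.random.normal(-3, 3, 100)

# Degree-1 polynomial fit: returns [slope, intercept] of the least-squares line
m_closed, c_closed = np.polyfit(x, y, 1)
print(m_closed, c_closed)
```

If the learning rate and epoch count are reasonable, the gradient-descent estimates should be close to this closed-form solution.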

Step 5 – Making Predictions

Now we use the fitted line to predict values.

y_predicted = m*x + c

Testing the Model

To see how well our model has performed, and to visually inspect the line of best fit, we plot the data together with the predictions. (Plotting x against y_predicted directly avoids mispairing the endpoints, which can happen when matching min(x) with min(y_predicted).)

plt.scatter(x, y)
plt.plot(x, y_predicted, color='red')  # regression line
plt.show()
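Beyond the visual inspection, numeric metrics quantify the fit. Here is a small sketch (not in the original article) that computes the mean squared error and the coefficient of determination R² from predictions:

```python
import numpy as np

def evaluate(y, y_pred):
    """Return (MSE, R^2) for predictions against true targets."""
    mse = np.mean((y - y_pred) ** 2)
    ss_res = np.sum((y - y_pred) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    r2 = 1 - ss_res / ss_tot
    return mse, r2

# A perfect fit has MSE 0 and R^2 of 1
y = np.array([1.0, 2.0, 3.0])
print(evaluate(y, y))  # (0.0, 1.0)
```

An R² near 1 indicates the line explains most of the variance; for the cubic data used here, expect a noticeably lower value.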

Conclusion

And there you have it! You have built a simple linear regression model completely from scratch in Python. It is crucial to remember that linear regression is a starting point in predictive modeling, and linear relationships may not always exist in real-world scenarios. But when they do, it is a wonderfully simple predictive tool.

Remember: becoming a better data scientist is about understanding the gearing that powers the algorithms, not just knowing how to call sklearn's built-in functions. Happy coding!
