Time Series Forecasting With ARIMA

Time Series Forecasting with ARIMA in Python

Time series forecasting is a crucial component of many important fields, from predicting stock prices to weather forecasting and beyond. One of the most popular time series forecasting methods is known as ARIMA, an acronym for AutoRegressive Integrated Moving Average. In this tutorial, we will explore the fundamentals of ARIMA and how to implement it using Python.



Table of Contents

  1. Understanding Time Series
  2. What is ARIMA?
  3. How ARIMA Works
  4. Implementing ARIMA in Python
  5. Model Evaluation
  6. Conclusion

Understanding Time Series

A time series is a sequence of data points recorded at regular time intervals. The observations are time-dependent, so their order matters, unlike in many machine learning problems where the rows of a dataset can be shuffled freely.

Let’s quickly understand the three main components of time series data:

  • Trend: Shows a consistent upward or downward slope of your data points over time.
  • Seasonality: Shows clear patterns of changes at regular intervals.
  • Noise: Fluctuations in the data that don’t seem to follow a pattern or regularity.
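To make the three components concrete, here is a minimal sketch that builds a synthetic monthly series by adding them together. The slope, amplitude, and noise level are arbitrary values chosen for illustration, and the additive form is an assumption (real data may combine the components multiplicatively).

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is repeatable

n_months = 48
t = np.arange(n_months)

trend = 100 + 2.0 * t                                   # steady upward slope
seasonality = 10 * np.sin(2 * np.pi * t / 12)           # pattern repeating every 12 steps
noise = rng.normal(loc=0.0, scale=3.0, size=n_months)   # irregular fluctuations

# Additive model (an assumption of this sketch): the series is the sum of the parts.
series = trend + seasonality + noise

# The seasonal term repeats after one full period of 12 steps.
print(np.allclose(seasonality[:12], seasonality[12:24]))
```

Plotting `series` alongside `trend` and `seasonality` is a quick way to see how each part contributes to the overall shape.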

What is ARIMA?

ARIMA stands for Autoregressive Integrated Moving Average. This statistical method is widely used in time series analysis to understand the data or predict future points in the series. ARIMA is traditionally applied where the data show evidence of non-stationarity that differencing can remove.

Here’s the breakdown of ARIMA:

AR: Autoregression. The model uses the dependent relationship between an observation and a number of its preceding (lagged) observations.

I: Integrated. Raw observations are differenced (each value has the previous value subtracted from it) to make the time series stationary.

MA: Moving Average. The model uses the dependency between an observation and the residual errors of a moving average model applied to lagged observations.
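The three parts can be illustrated by simulating a series by hand. The sketch below is not how statsmodels estimates a model; it only generates data with an AR term, an MA term, and one level of integration. The coefficients `phi` and `theta` are hypothetical values chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
phi, theta = 0.6, 0.4          # hypothetical AR and MA coefficients
errors = rng.normal(size=n)    # white-noise residual errors

# AR + MA: each value depends on the previous value and the previous error.
arma = np.zeros(n)
arma[0] = errors[0]
for i in range(1, n):
    arma[i] = phi * arma[i - 1] + errors[i] + theta * errors[i - 1]

# I: a cumulative sum "integrates" the stationary series into a non-stationary
# one, and differencing undoes that step.
integrated = np.cumsum(arma)
recovered = np.diff(integrated)

print(np.allclose(recovered, arma[1:]))
```

Differencing the integrated series gives back the stationary ARMA values, which is the role the "I" step plays before the AR and MA parts are fitted.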

How ARIMA Works

An ARIMA forecast is a linear equation in past values of the (differenced) series and past forecast errors. Its form is set by the three parameters (p, d, q) of the model:

  • p: The order of the autoregressive part.
  • d: The number of non-seasonal differences.
  • q: The order of the moving average part.
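Written out, the model is commonly expressed in the following textbook form. The symbols are notation introduced here rather than taken from the tutorial: y'_t is the series after differencing d times, c is a constant, the phi terms are the p autoregressive coefficients, the theta terms are the q moving average coefficients, and epsilon_t is the error at time t.

```latex
y'_t = c + \phi_1 y'_{t-1} + \dots + \phi_p y'_{t-p}
         + \theta_1 \varepsilon_{t-1} + \dots + \theta_q \varepsilon_{t-q}
         + \varepsilon_t
```

With the order (1, 1, 1), for example, y'_t is the first difference y_t - y_{t-1} and the equation keeps a single phi term and a single theta term.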

The key prerequisite for applying ARIMA is a stationary time series, i.e., one whose statistical properties (such as mean and variance) do not depend on the time at which the series is observed. To get there, we can apply transformations such as differencing and taking logarithms.
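As a rough illustration of those two transformations, the sketch below applies a log transform and a first difference to a made-up series with exponential growth. The growth rate and seasonal swing are assumptions chosen only to make the effect visible.

```python
import numpy as np

t = np.arange(120)
# Hypothetical series: exponential growth with a multiplicative seasonal swing.
series = 100 * np.exp(0.01 * t) * (1 + 0.1 * np.sin(2 * np.pi * t / 12))

log_series = np.log(series)        # stabilizes the growing amplitude
diff_series = np.diff(log_series)  # removes the trend: y[t] - y[t-1]

# Compare the mean of the first and second half before and after transforming.
half = len(series) // 2
print("raw halves:        ", series[:half].mean(), series[half:].mean())
print("differenced halves:", diff_series[:half].mean(), diff_series[half:].mean())
```

The raw halves have clearly different means, while the differenced log series has roughly the same mean in both halves. A formal check would use a statistical test rather than this eyeball comparison.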

In a nutshell, an ARIMA model aims to describe the autocorrelations in the time series, that is, how strongly each observation is correlated with the observations that came before it.
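To show what "autocorrelation" measures, here is a small sketch that computes the sample autocorrelation at a few lags for a simulated series. The helper `autocorr` and the coefficient `phi` are hypothetical names introduced for this example; statsmodels ships its own, more complete tools for the same job.

```python
import numpy as np

def autocorr(x, lag):
    """Sample autocorrelation of a 1-D array at a given non-negative lag."""
    x = np.asarray(x, dtype=float)
    centered = x - x.mean()
    if lag == 0:
        return 1.0
    return float(np.sum(centered[lag:] * centered[:-lag]) / np.sum(centered ** 2))

rng = np.random.default_rng(2)
n, phi = 2000, 0.8   # phi is a hypothetical AR(1) coefficient
x = np.zeros(n)
for i in range(1, n):
    x[i] = phi * x[i - 1] + rng.normal()

for lag in (1, 2, 3):
    print(lag, round(autocorr(x, lag), 2))
```

For a series generated this way, the autocorrelation shrinks as the lag grows. That decay pattern is the kind of structure an ARIMA model tries to capture.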

Implementing ARIMA in Python

We will use the Python statsmodels package to fit the ARIMA model. The dataset we’ll be using is the Air Passengers dataset, which provides monthly totals of international airline passengers from 1949 to 1960.

Here are the steps to follow:

1. Import Libraries and Load Dataset

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.tsa.arima.model import ARIMA
from sklearn.metrics import mean_squared_error

df = pd.read_csv('AirPassengers.csv')

2. Convert Month Column to DateTime

df['Month'] = pd.to_datetime(df['Month'])
df.set_index('Month', inplace=True)
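The same conversion can be done while reading the file. The sketch below uses a few made-up rows in place of the real CSV, and the column name `Passengers` is an assumption (check the header of your copy of the file). Declaring the monthly frequency up front is optional, but it spares statsmodels from having to infer it from the index.

```python
import io
import pandas as pd

# Made-up rows standing in for the CSV file; only the layout matters here.
csv_text = "Month,Passengers\n1949-01,100\n1949-02,110\n1949-03,120\n"

sample = pd.read_csv(io.StringIO(csv_text), parse_dates=["Month"], index_col="Month")
sample = sample.asfreq("MS")  # declare month-start frequency explicitly

print(sample.index.dtype)
print(sample.index.freqstr)
```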

3. Plotting the Data

plt.plot(df)
plt.show()

4. Applying ARIMA

First, we’ll create the model by passing the training data and the (p, d, q) order to the ARIMA class. Then, we’ll call its fit() method to estimate the model parameters from the training data.

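# Hold out 1958-1960 for testing; fit on the earlier observations only
train = df.loc[:'1957-12-01']
model = ARIMA(train, order=(1, 1, 1))
model_fit = model.fit()  # this ARIMA class does not accept a disp argument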

5. Predicting the Future Data

Finally, we’ll call the predict() method of the fitted model to produce predictions for the test period, January 1958 to December 1960.

start_index = df.index.get_loc('1958-01-01')
end_index = df.index.get_loc('1960-12-01')
forecast = model_fit.predict(start=start_index, end=end_index)

Model Evaluation

We are using Root Mean Square Error (RMSE) to evaluate the performance of the ARIMA model. RMSE is the square root of the average of the squared differences between the forecasts and the actual values.

# Actual values for the same period that was forecast
test = df.loc['1958-01-01':'1960-12-01']
mse = mean_squared_error(test, forecast)
rmse = np.sqrt(mse)  # same units as the passenger counts
print('RMSE: ', rmse)
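To see what the metric computes, here is RMSE worked by hand on three made-up values. The helper name `rmse` is introduced for this sketch only and mirrors the mean_squared_error plus square root steps above.

```python
import numpy as np

def rmse(actual, predicted):
    """Root mean square error between two equal-length sequences."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))

actual = [3.0, 5.0, 7.0]      # made-up values for illustration
predicted = [2.0, 5.0, 9.0]

# Errors are 1, 0, -2; squared: 1, 0, 4; mean: 5/3; root: about 1.29.
print(round(rmse(actual, predicted), 2))  # 1.29
```

Because the errors are squared before averaging, a single large miss raises RMSE more than several small ones.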

Conclusion

ARIMA is a powerful and flexible model for time series forecasting. It is generally best suited to short-term forecasts of series whose trend can be removed by differencing. A plain ARIMA(p, d, q) model has no seasonal terms, so strongly seasonal data such as the Air Passengers series usually calls for a seasonal extension (SARIMA). ARIMA can also struggle with sudden, unexpected variations, or when the structure of the time series abruptly changes.

Bear in mind that it’s always crucial to fully understand the underlying assumptions behind any statistical model and ensure that the data you’re working with satisfy those assumptions. Happy Time Series Forecasting!

This concludes our tutorial on Time Series Forecasting with ARIMA in Python for PythonTimes.com. Whether you are a beginner striving to delve into the world of time series analysis or an experienced data analyst seeking to hone your skills, we hope you found this tutorial informative and easy to follow. For any queries or feedback, feel free to drop us a line in the comment section below.
