Time Series Forecasting with ARIMA in Python
Time series forecasting is a crucial component of many important fields, from predicting stock prices to weather forecasting and beyond. One of the most popular time series forecasting methods is known as ARIMA, an acronym for AutoRegressive Integrated Moving Average. In this tutorial, we will explore the fundamentals of ARIMA and how to implement it using Python.

Table of Contents
- Understanding Time Series
- What is ARIMA?
- How ARIMA Works
- Implementing ARIMA in Python
- Model Evaluation
- Conclusion
Understanding Time Series
A time series is a sequence of data points collected or recorded at regular time intervals. These data points are time-dependent, so their order matters, unlike in many machine learning problems where observations are treated as independent of one another.
Time series data is commonly described in terms of three main components:
- Trend: a consistent upward or downward movement in the data over time.
- Seasonality: a pattern of changes that repeats at regular intervals.
- Noise: fluctuations in the data that follow no apparent pattern or regularity.
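To make the three components concrete, here is a minimal sketch that builds a synthetic monthly series by adding them together. The numbers (slope, amplitude, noise level) are arbitrary choices for illustration, and the additive combination is an assumption; real data may combine the components differently.

```python
import numpy as np

rng = np.random.default_rng(0)                   # fixed seed so the run is repeatable
t = np.arange(48)                                # 48 monthly time steps (4 years)

trend = 100 + 2.0 * t                            # steady upward slope
seasonality = 10 * np.sin(2 * np.pi * t / 12)    # pattern repeating every 12 steps
noise = rng.normal(scale=3.0, size=t.size)       # irregular fluctuations

series = trend + seasonality + noise             # additive combination of the three

# The seasonal term repeats: the first 12 values match the next 12.
print(np.allclose(seasonality[:12], seasonality[12:24]))
```

Plotting series with matplotlib would show the upward drift, the yearly wave and the jitter on top of both.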
What is ARIMA?
ARIMA stands for Autoregressive Integrated Moving Average. This statistical method is widely used in time series analysis, both to understand the data and to predict future points in the series. It is particularly useful when the data show evidence of non-stationarity, because the differencing step (the “Integrated” part) can be used to remove it.
Here’s the breakdown of ARIMA:
- AR (Autoregression): the model uses the dependent relationship between an observation and a number of preceding observations.
- I (Integrated): the raw observations are differenced (each value has the previous value subtracted from it) so that the time series becomes stationary.
- MA (Moving Average): the model uses the dependency between an observation and the residual errors from a moving average model applied to preceding observations.
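The differencing behind the “Integrated” part can be seen on a tiny hand-made series. This is only a sketch using NumPy; the quadratic series is a made-up example chosen so that the arithmetic is easy to check.

```python
import numpy as np

t = np.arange(6)
y = t ** 2                       # 0, 1, 4, 9, 16, 25: a curved (quadratic) trend

first_diff = np.diff(y)          # y[t] - y[t-1]
second_diff = np.diff(y, n=2)    # differencing applied twice

print(first_diff)    # [1 3 5 7 9]  -> still trending upward
print(second_diff)   # [2 2 2 2]    -> constant, the trend is gone
```

In this toy case one round of differencing is not enough and two are, which is the kind of choice the d parameter of ARIMA encodes.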
How ARIMA Works
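For a stationary time series, the ARIMA forecasting equation is linear: the predictors are lagged values of the series and lagged forecast errors. How many of each are used is set by the three parameters (p, d, q) of the ARIMA model:
- p: the order of the autoregressive part (the number of lagged observations).
- d: the number of non-seasonal differences applied to the series.
- q: the order of the moving average part (the number of lagged forecast errors).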
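Written out, one common textbook form of the model looks as follows. The symbols c (a constant), \phi_i (autoregressive coefficients), \theta_j (moving average coefficients), \varepsilon_t (the error at time t) and B (the backshift operator, B y_t = y_{t-1}) are notation introduced here for illustration, not taken from the tutorial.

```latex
y'_t = (1 - B)^d \, y_t, \qquad
y'_t = c + \sum_{i=1}^{p} \phi_i \, y'_{t-i} + \sum_{j=1}^{q} \theta_j \, \varepsilon_{t-j} + \varepsilon_t
```

For the order (1, 1, 1), this reduces to y_t - y_{t-1} = c + \phi_1 (y_{t-1} - y_{t-2}) + \theta_1 \varepsilon_{t-1} + \varepsilon_t.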
A key requirement for applying ARIMA is that the time series be stationary, i.e., the series’ statistical properties (such as its mean and variance) do not depend on the time at which the series is observed. Transformations such as differencing and taking logarithms can help achieve this.
In short, an ARIMA model aims to find the best fit that describes the autocorrelations in the time series data.
Implementing ARIMA in Python
We will use the Python statsmodels package to apply ARIMA. The dataset we’ll be using is the Air Passengers dataset, which provides monthly totals of international airline passengers from 1949 to 1960.
Here are the steps to follow:
1. Import Libraries and Load Dataset
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.tsa.arima.model import ARIMA
from sklearn.metrics import mean_squared_error
df = pd.read_csv('AirPassengers.csv')
2. Convert Month Column to DateTime
df['Month'] = pd.to_datetime(df['Month'])
df.set_index('Month', inplace=True)
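It can help to confirm that the dates were parsed as expected and to tell pandas explicitly that the data are monthly. The sketch below uses a few sample rows in the layout the CSV is assumed to have; the '#Passengers' column name is an assumption, so check the header of your own file.

```python
import io
import pandas as pd

# A few sample rows in the assumed layout of the CSV.
# The '#Passengers' column name is an assumption, not confirmed by the tutorial.
csv_text = """Month,#Passengers
1949-01,112
1949-02,118
1949-03,132
1949-04,129
"""

df = pd.read_csv(io.StringIO(csv_text))
df['Month'] = pd.to_datetime(df['Month'])   # '1949-01' is parsed as 1949-01-01
df = df.set_index('Month').asfreq('MS')     # 'MS' = month start; declares the spacing

print(df.index[0])
print(df.index.freqstr)
```

Declaring the frequency is optional, but it spares statsmodels from having to infer it from the index later.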
3. Plotting the Data
plt.plot(df)
plt.show()
4. Applying ARIMA
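First, we’ll hold out the last three years (1958 to 1960) as the test set and keep the earlier data for training. Then we’ll create the model by passing the training data and the (p, d, q) order to ARIMA, and call the fit() function to fit the model to the training data.
train = df.loc[:'1957-12-01']  # everything before the 1958 to 1960 test period
model = ARIMA(train, order=(1,1,1))
model_fit = model.fit()  # fit() in statsmodels.tsa.arima.model takes no disp argument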
5. Predicting the Future Data
Finally, we’ll use the predict() function of the fitted model to make predictions for the test period, January 1958 to December 1960.
start_index = df.index.get_loc('1958-01-01')
end_index = df.index.get_loc('1960-12-01')
forecast = model_fit.predict(start=start_index, end=end_index)
Model Evaluation
We use the Root Mean Squared Error (RMSE) to evaluate the performance of the ARIMA model. RMSE is the square root of the average of the squared differences between the forecasts and the actual values, so it is expressed in the same units as the data.
test = df['1958-01-01':'1960-12-01']
mse = mean_squared_error(test, forecast)
rmse = np.sqrt(mse)
print('RMSE: ', rmse)
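To see what the RMSE calculation does step by step, here is a small worked example with made-up numbers (they are not results from the Air Passengers model).

```python
import numpy as np

actual = np.array([100.0, 110.0, 120.0])     # made-up observed values
predicted = np.array([98.0, 113.0, 119.0])   # made-up forecasts

errors = actual - predicted                  # 2, -3, 1
mse = np.mean(errors ** 2)                   # (4 + 9 + 1) / 3
rmse = np.sqrt(mse)

print(round(rmse, 3))   # 2.16
```

Because the errors are squared before averaging, the single largest miss (3 units) contributes more to the result than the other two combined.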
Conclusion
ARIMA is a powerful and flexible model for time series forecasting. It is generally best suited to short-term forecasts of series whose trend can be removed by differencing. Data with strong seasonal behavior, such as the Air Passengers series, usually call for a seasonal extension such as SARIMA. ARIMA can also struggle with sudden, unexpected variations, or when the structure of the time series changes abruptly.
Bear in mind that it’s always crucial to fully understand the underlying assumptions behind any statistical model and ensure that the data you’re working with satisfy those assumptions. Happy Time Series Forecasting!
This concludes our tutorial on Time Series Forecasting with ARIMA in Python for PythonTimes.com. Whether you are a beginner striving to delve into the world of time series analysis or an experienced data analyst seeking to hone your skills, we hope you found this tutorial informative and easy to follow.