Time Series Forecasting with Prophet in Python
Time series forecasting can be challenging, as there are many different methods you could use and many different hyperparameters for each method.
The Prophet library is an open-source library designed for making forecasts for univariate time series datasets. It is easy to use and is designed to automatically find a good set of hyperparameters for the model, in an effort to make skillful forecasts for data with trends and seasonal structure by default.
In this tutorial, you will discover how to use the Facebook Prophet library for time series forecasting.
After completing this tutorial, you will know:
- Prophet is an open-source library developed by Facebook and designed for automatic forecasting of univariate time series data.
- How to fit Prophet models and use them to make in-sample and out-of-sample forecasts.
- How to assess a Prophet model on a hold-out dataset.
Tutorial Overview
This tutorial is divided into three parts:
- Prophet Forecasting Library
- Car Sales Dataset
    - Load and Summarize Dataset
    - Load and Plot Dataset
- Forecast Car Sales with Prophet
    - Fit Prophet Model
    - Make an In-Sample Forecast
    - Make an Out-of-Sample Forecast
    - Manually Assess Forecast Model
Prophet Forecasting Library
Prophet, or “Facebook Prophet,” is an open-source library for univariate (one variable) time series forecasting developed by Facebook.
Prophet implements what its authors refer to as an additive time series forecasting model, and the implementation supports trends, seasonality, and holidays.
It implements a procedure for forecasting time series data based on an additive model where non-linear trends are fit with yearly, weekly, and daily seasonality, plus holiday effects.
It is designed to be easy and completely automatic, e.g. point it at a time series and get a forecast. As such, it is intended for internal company use, such as forecasting sales, capacity, etc.
For a great overview of Prophet and its capabilities, see the post:
Prophet – forecasting at scale, 2017
The library provides two interfaces, R and Python. We will focus on the Python interface in this tutorial.
The first step is to install the Prophet library using pip. This tutorial uses the older fbprophet package name; releases from 1.0 onward are published as prophet. The install command is as follows:
sudo pip install fbprophet
Next, we can confirm that the library was installed correctly.
To do this, we can import the library and print the version number in Python. The complete example is listed below.
# check prophet version
import fbprophet
# print version number
print('Prophet %s' % fbprophet.__version__)
Running the example prints the installed version of Prophet.
You should have the same version or higher.
Prophet 0.5
Now that we have Prophet installed, let’s select a dataset we can use to explore the library.
Car Sales Dataset
We will use the monthly car sales dataset.
It is a standard univariate time series dataset that contains both a trend and seasonality. The dataset has 108 months of data, and a naive persistence forecast can achieve a mean absolute error of about 3,235 sales, providing a baseline error that a skillful model should beat.
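To make the baseline concrete, here is a minimal sketch of a one-step persistence forecast, in which each month is predicted to equal the previous month. It uses only the first five sales values of the dataset for illustration, so its error is not the 3,235 figure quoted above, and the exact evaluation setup behind that figure is not stated here. The helper name persistence_mae is hypothetical.

```python
# one-step persistence baseline: predict each value as the previous observation
# (persistence_mae is a hypothetical helper, not part of Prophet or Pandas)

def persistence_mae(series):
    # pair each observation with the one immediately before it
    errors = [abs(actual - previous)
              for previous, actual in zip(series[:-1], series[1:])]
    # average the absolute differences
    return sum(errors) / len(errors)

# first five monthly sales values from the dataset, for illustration only
sales = [6550, 8728, 12026, 14395, 14587]
print('Persistence MAE: %.2f' % persistence_mae(sales))
```

Any forecasting model worth keeping should achieve a lower error than this kind of baseline on the same evaluation period.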
There is no need to download the dataset, as we will download it automatically as part of each example.
Monthly car sales dataset (CSV)
Monthly car sales dataset description
Load and Summarize Dataset
First, let’s load and summarize the dataset.
Prophet requires data to be in Pandas DataFrames. Therefore, we will load and summarize the data using Pandas.
We can load the data directly from the URL by calling the read_csv() Pandas function, then summarize the shape (number of rows and columns) of the data and view the first few rows.
The complete example is listed below.
# load the car sales dataset
from pandas import read_csv
# load data
path = 'https://raw.githubusercontent.com/jbrownlee/Datasets/master/monthly-car-sales.csv'
df = read_csv(path, header=0)
# summarize shape
print(df.shape)
# show first few rows
print(df.head())
Running the example first reports the number of rows and columns, then lists the first five rows of data.
We can see that, as expected, there are 108 months (nine years) of data and two columns. The first column is the date and the second is the number of sales.
Note that the first column in the output is a row index and is not part of the dataset; it is just a helpful tool that Pandas uses to order rows.
(108, 2)
Month Sales
0 1960-01 6550
1 1960-02 8728
2 1960-03 12026
3 1960-04 14395
4 1960-05 14587
Load and Plot Dataset
A time series dataset does not make much sense to us until we plot it.
Plotting a time series helps us see whether there is a trend, a seasonal cycle, outliers, and more. It gives us a feel for the data.
We can plot the data easily in Pandas by calling the plot() function on the DataFrame.
The complete example is listed below.
# load and plot the car sales dataset
from pandas import read_csv
from matplotlib import pyplot
# load data
path = 'https://raw.githubusercontent.com/jbrownlee/Datasets/master/monthly-car-sales.csv'
df = read_csv(path, header=0)
# plot the time series
df.plot()
pyplot.show()
Running the example creates a plot of the time series.
We can clearly see the trend in sales over time and a seasonal pattern in the monthly sales. These are patterns we expect the forecast model to take into account.
Now that we are familiar with the dataset, let’s explore how we can use the Prophet library to make forecasts.
Forecast Car Sales with Prophet
In this section, we will explore using Prophet to forecast the car sales dataset.
Let’s start by fitting a model on the dataset.
Fit Prophet Model
To use Prophet for forecasting, first a Prophet() object is defined and configured, then it is fit on the dataset by calling the fit() function and passing the data.
The Prophet() object takes arguments to configure the type of model you want, such as the type of growth, the type of seasonality, and more. By default, the model will work hard to figure out almost everything automatically.
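As a rough sketch of what configuring the model might look like, the snippet below collects a few constructor arguments in a dictionary and passes them to Prophet(). The specific values are illustrative assumptions, not settings recommended by this tutorial. The import is wrapped in a fallback because newer releases ship the package as prophet rather than fbprophet, and the model is only created if one of them is installed.

```python
# illustrative Prophet configuration (values are assumptions, not tuned settings)
config = {
    'growth': 'linear',              # linear trend (the default)
    'yearly_seasonality': True,      # model a yearly cycle
    'weekly_seasonality': False,     # monthly data has no weekly cycle to learn
    'daily_seasonality': False,      # nor a daily one
    'seasonality_mode': 'additive',  # the default; 'multiplicative' is the alternative
}

# the package was renamed from fbprophet to prophet in version 1.0
try:
    from prophet import Prophet
except ImportError:
    try:
        from fbprophet import Prophet
    except ImportError:
        Prophet = None

# define the model only if the library is available
model = Prophet(**config) if Prophet is not None else None
print(config)
```

The rest of this tutorial relies on the defaults by calling Prophet() with no arguments.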
The fit() function takes a DataFrame of time series data. The DataFrame must have a specific format. The first column must be named ‘ds’ and contain the date-times. The second column must be named ‘y’ and contain the observations.
This means we must change the column names in the dataset. It also requires that the first column be converted to date-time objects, if it is not already (e.g. this can be done as part of loading the dataset with the right arguments to read_csv).
For example, we can modify our loaded car sales dataset to have this expected structure, as follows:
…
# prepare expected column names
df.columns = ['ds', 'y']
df['ds'] = to_datetime(df['ds'])
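As an alternative sketch, the renaming and date parsing can be done at load time through arguments to read_csv(). To stay self-contained, the example reads a small inline sample rather than the URL; the sample text is an assumption about the file layout (a header row followed by month and sales values), using the first three rows shown earlier.

```python
# load a CSV with the Prophet column names and parsed dates in one step
from io import StringIO
from pandas import read_csv

# small inline sample standing in for the car sales file
csv_text = 'Month,Sales\n1960-01,6550\n1960-02,8728\n1960-03,12026\n'

# header=0 discards the file's own header row so that names= can replace it,
# and parse_dates converts the renamed 'ds' column to date-time objects
df = read_csv(StringIO(csv_text), header=0, names=['ds', 'y'], parse_dates=['ds'])
print(df.dtypes)
print(df.head())
```

The same arguments could be passed when reading from the dataset URL in place of the StringIO object.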
The complete example of fitting a Prophet model on the car sales dataset is listed below.
# fit prophet model on the car sales dataset
from pandas import read_csv
from pandas import to_datetime
from fbprophet import Prophet
# load data
path = 'https://raw.githubusercontent.com/jbrownlee/Datasets/master/monthly-car-sales.csv'
df = read_csv(path, header=0)
# prepare expected column names
df.columns = ['ds', 'y']
df['ds'] = to_datetime(df['ds'])
# define the model
model = Prophet()
# fit the model
model.fit(df)
Running the example loads the dataset, prepares the DataFrame in the expected format, and fits a Prophet model.
By default, the library produces a lot of verbose output during the fit process. We think this is bad practice in general, as it trains developers to ignore output.
Nevertheless, the output summarizes what happened during the model fitting process, specifically the optimization procedures that ran.
We will not reproduce this output in subsequent sections when we fit the model.
Next, let’s make a forecast.
Make an In-Sample Forecast
It can be useful to make a forecast on historical data.
That is, we can make a forecast on data used as input to train the model. Ideally, the model has seen the data before and would make a perfect prediction.
Nevertheless, this is not the case, as the model tries to generalize across all cases in the data.
This is called making an in-sample (in training set sample) forecast, and reviewing the results can give insight into how good the model is, that is, how well it learned the training data.
A forecast is made by calling the predict() function and passing a DataFrame that contains one column named ‘ds’ and rows with date-times for all the intervals to be predicted.
There are many ways to create this “forecast” DataFrame. In this case, we will loop over one year of dates, e.g. the last 12 months in the dataset, and create a string for each month. We will then convert the list of dates into a DataFrame and convert the string values into date-time objects.
…
# define the period for which we want a prediction
future = list()
for i in range(1, 13):
    date = '1968-%02d' % i
    future.append([date])
future = DataFrame(future)
future.columns = ['ds']
future['ds'] = to_datetime(future['ds'])
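As one of the other ways to build this DataFrame, the sketch below uses the Pandas date_range() function instead of a loop over formatted strings. The 'MS' (month start) frequency is chosen on the assumption that the parsed dates fall on the first day of each month, as they do in the forecasts printed later in this tutorial.

```python
# build the 12-month forecast frame with a date range instead of a loop
from pandas import DataFrame, date_range

# 'MS' is the month-start frequency, giving 1968-01-01, 1968-02-01, ...
future = DataFrame({'ds': date_range(start='1968-01-01', periods=12, freq='MS')})
print(future.head())
print(future.shape)
```

Prophet also offers a make_future_dataframe() helper on the model for extending dates beyond the training data; it is not used in this tutorial.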
This DataFrame can then be passed to the predict() function to calculate a forecast.
The result of the predict() function is a DataFrame that contains many columns. Perhaps the most important columns are the forecast date-time (‘ds’), the forecasted value (‘yhat’), and the lower and upper bounds on the predicted value (‘yhat_lower’ and ‘yhat_upper’) that indicate the uncertainty of the forecast.
For example, we can print the first few forecasts as follows:
…
# summarize the forecast
print(forecast[['ds', 'yhat', 'yhat_lower', 'yhat_upper']].head())
Prophet also provides a built-in tool for visualizing the forecast in the context of the training dataset.
This can be achieved by calling the plot() function on the model and passing it the forecast DataFrame. It will create a plot of the training dataset and overlay the forecast with the upper and lower bounds for the forecast dates.
…
print(forecast[['ds', 'yhat', 'yhat_lower', 'yhat_upper']].head())
# plot forecast
model.plot(forecast)
pyplot.show()
Tying this all together, a complete example of making an in-sample forecast is listed below.
# make an in-sample forecast
from pandas import read_csv
from pandas import to_datetime
from pandas import DataFrame
from fbprophet import Prophet
from matplotlib import pyplot
# load data
path = 'https://raw.githubusercontent.com/jbrownlee/Datasets/master/monthly-car-sales.csv'
df = read_csv(path, header=0)
# prepare expected column names
df.columns = ['ds', 'y']
df['ds'] = to_datetime(df['ds'])
# define the model
model = Prophet()
# fit the model
model.fit(df)
# define the period for which we want a prediction
future = list()
for i in range(1, 13):
    date = '1968-%02d' % i
    future.append([date])
future = DataFrame(future)
future.columns = ['ds']
future['ds'] = to_datetime(future['ds'])
# use the model to make a forecast
forecast = model.predict(future)
# summarize the forecast
print(forecast[['ds', 'yhat', 'yhat_lower', 'yhat_upper']].head())
# plot forecast
model.plot(forecast)
pyplot.show()
Running the example forecasts the last 12 months of the dataset.
The first five months of the forecast are reported, and we can see that the values are not too different from the actual sales values in the dataset.
ds yhat yhat_lower yhat_upper
0 1968-01-01 14364.866157 12816.266184 15956.555409
1 1968-02-01 14940.687225 13299.473640 16463.811658
2 1968-03-01 20858.282598 19439.403787 22345.747821
3 1968-04-01 22893.610396 21417.399440 24454.642588
4 1968-05-01 24212.079727 22667.146433 25816.191457
Next, a plot is created. We can see that the training data are shown as black dots and the forecast is a blue line with upper and lower bounds in a blue shaded area.
We can see that the forecasted 12 months are a good match for the real observations, especially when the bounds are taken into account.
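To put a number on the bounds, the sketch below computes the width of the uncertainty interval (yhat_upper minus yhat_lower) for the five forecast rows printed above. The values are copied from that output; they come from one run and may differ slightly on your machine. By default Prophet reports an 80% interval, controlled by its interval_width argument.

```python
# width of the uncertainty interval for the five forecast rows printed above
rows = [
    # (ds, yhat_lower, yhat_upper)
    ('1968-01-01', 12816.266184, 15956.555409),
    ('1968-02-01', 13299.473640, 16463.811658),
    ('1968-03-01', 19439.403787, 22345.747821),
    ('1968-04-01', 21417.399440, 24454.642588),
    ('1968-05-01', 22667.146433, 25816.191457),
]

# upper bound minus lower bound for each month
widths = [upper - lower for _, lower, upper in rows]
for (ds, _, _), width in zip(rows, widths):
    print('%s width=%.1f' % (ds, width))
print('mean width: %.1f' % (sum(widths) / len(widths)))
```

An interval of roughly 3,000 sales around each forecast gives a sense of how much slack the phrase "when the bounds are taken into account" allows.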
Make an Out-of-Sample Forecast
In practice, we really want a forecast model to make a prediction beyond the training data.
This is called an out-of-sample forecast.
We can achieve this in the same way as an in-sample forecast and simply specify a different forecast period.
In this case, we use a period beyond the end of the training dataset, starting at 1969-01.
…
# define the period for which we want a prediction
future = list()
for i in range(1, 13):
    date = '1969-%02d' % i
    future.append([date])
future = DataFrame(future)
future.columns = ['ds']
future['ds'] = to_datetime(future['ds'])
Tying this together, the complete example is listed below.
# make an out-of-sample forecast
from pandas import read_csv
from pandas import to_datetime
from pandas import DataFrame
from fbprophet import Prophet
from matplotlib import pyplot
# load data
path = 'https://raw.githubusercontent.com/jbrownlee/Datasets/master/monthly-car-sales.csv'
df = read_csv(path, header=0)
# prepare expected column names
df.columns = ['ds', 'y']
df['ds'] = to_datetime(df['ds'])
# define the model
model = Prophet()
# fit the model
model.fit(df)
# define the period for which we want a prediction
future = list()
for i in range(1, 13):
    date = '1969-%02d' % i
    future.append([date])
future = DataFrame(future)
future.columns = ['ds']
future['ds'] = to_datetime(future['ds'])
# use the model to make a forecast
forecast = model.predict(future)
# summarize the forecast
print(forecast[['ds', 'yhat', 'yhat_lower', 'yhat_upper']].head())
# plot forecast
model.plot(forecast)
pyplot.show()
Running the example makes an out-of-sample forecast for the car sales data.
The first five rows of the forecast are printed, although it is hard to get an idea of whether they are sensible or not.
ds yhat yhat_lower yhat_upper
0 1969-01-01 15406.401318 13751.534121 16789.969780
1 1969-02-01 16165.737458 14486.887740 17634.953132
2 1969-03-01 21384.120631 19738.950363 22926.857539
3 1969-04-01 23512.464086 21939.204670 25105.341478
4 1969-05-01 25026.039276 23544.081762 26718.820580
A plot is created to help us evaluate the forecast in the context of the training data.
The new one-year forecast does look sensible, at least by eye.
Manually Assess Forecast Model
It is critical to develop an objective estimate of a forecast model’s performance.
This can be achieved by holding some data back from the model, such as the last 12 months. Then, we fit the model on the first portion of the data, use it to make forecasts for the held-back portion, and calculate an error measure, such as the mean absolute error across the forecasts. That is, a simulated out-of-sample forecast.
We can do this with the sample data by creating a new DataFrame for training with the last 12 months removed.
…
# create test dataset, remove last 12 months
train = df.drop(df.index[-12:])
print(train.tail())
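The same split can also be written with positional slicing, which makes the hold-out portion explicit as its own DataFrame. The sketch below uses a synthetic stand-in with the same 108-month span as the car sales data; the y values are placeholders rather than real sales, and the name test is an assumption introduced for illustration.

```python
# split a monthly series into a training set and a 12-month hold-out set
from pandas import DataFrame, date_range

# synthetic stand-in with the same 108-month span as the car sales data
# (the y values are placeholders, not real sales)
df = DataFrame({
    'ds': date_range(start='1960-01-01', periods=108, freq='MS'),
    'y': range(108),
})

# everything except the last 12 rows is used for training
train = df.iloc[:-12]
# the last 12 rows are held back for evaluation
test = df.iloc[-12:]

print(train.shape, test.shape)
print(train['ds'].max(), test['ds'].min())
```

Because the split is by position, it preserves time order: every training row precedes every hold-out row, which is what a simulated out-of-sample forecast requires.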
A forecast can then be made for the last 12 months of date-times.
We can then retrieve the forecast values and the expected values from the original dataset and calculate a mean absolute error metric using the scikit-learn library.
…
# calculate MAE between expected and predicted values for the last 12 months
y_true = df['y'][-12:].values
y_pred = forecast['yhat'].values
mae = mean_absolute_error(y_true, y_pred)
print('MAE: %.3f' % mae)
It can also be helpful to plot the expected vs. predicted values to see how well the out-of-sample forecast matches the known values.
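For readers who want to see what the metric computes, here is a minimal plain-Python sketch of the mean absolute error: the average of the absolute differences between expected and predicted values. The function name mean_abs_error and the toy numbers are assumptions for illustration; in the tutorial itself the scikit-learn function is used.

```python
# mean absolute error written out by hand
# (mean_abs_error is a hypothetical helper; the tutorial uses scikit-learn)

def mean_abs_error(y_true, y_pred):
    # both sequences must describe the same time steps
    if len(y_true) != len(y_pred):
        raise ValueError('y_true and y_pred must have the same length')
    # average of the absolute differences
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

# toy values chosen for illustration only
expected = [100.0, 200.0, 300.0]
predicted = [110.0, 190.0, 330.0]
print('MAE: %.3f' % mean_abs_error(expected, predicted))
```

The result is in the same units as the data, so for the car sales dataset an MAE can be read directly as a number of sales.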
…
# plot expected vs actual
pyplot.plot(y_true, label='Actual')
pyplot.plot(y_pred, label='Predicted')
pyplot.legend()
pyplot.show()
Tying this together, the example below demonstrates how to evaluate a Prophet model on a hold-out dataset.
# evaluate prophet time series forecasting model on hold out dataset
from pandas import read_csv
from pandas import to_datetime
from pandas import DataFrame
from fbprophet import Prophet
from sklearn.metrics import mean_absolute_error
from matplotlib import pyplot
# load data
path = 'https://raw.githubusercontent.com/jbrownlee/Datasets/master/monthly-car-sales.csv'
df = read_csv(path, header=0)
# prepare expected column names
df.columns = ['ds', 'y']
df['ds'] = to_datetime(df['ds'])
# create test dataset, remove last 12 months
train = df.drop(df.index[-12:])
print(train.tail())
# define the model
model = Prophet()
# fit the model
model.fit(train)
# define the period for which we want a prediction
future = list()
for i in range(1, 13):
    date = '1968-%02d' % i
    future.append([date])
future = DataFrame(future)
future.columns = ['ds']
future['ds'] = to_datetime(future['ds'])
# use the model to make a forecast
forecast = model.predict(future)
# calculate MAE between expected and predicted values for the last 12 months
y_true = df['y'][-12:].values
y_pred = forecast['yhat'].values
mae = mean_absolute_error(y_true, y_pred)
print('MAE: %.3f' % mae)
# plot expected vs actual
pyplot.plot(y_true, label='Actual')
pyplot.plot(y_pred, label='Predicted')
pyplot.legend()
pyplot.show()
Running the example first reports the last few rows of the training dataset.
It confirms that the training data ends in the last month of 1967 and that 1968 will be used as the hold-out dataset.
ds y
91 1967-08-01 13434
92 1967-09-01 13598
93 1967-10-01 17187
94 1967-11-01 16119
95 1967-12-01 13713
Next, a mean absolute error is calculated for the forecast period.
In this case, we can see that the error is about 1,336 sales, which is much lower (better) than a naive persistence model that achieves an error of 3,235 sales over the same period.
MAE: 1336.814
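As a quick piece of arithmetic, the sketch below expresses the Prophet error as a relative reduction compared with the persistence baseline. Both numbers are taken from this tutorial; the persistence figure is quoted there only approximately, so the result should be read as a rough indication.

```python
# relative improvement of the Prophet MAE over the naive persistence MAE
prophet_mae = 1336.814  # reported by the hold-out evaluation above
naive_mae = 3235.0      # approximate persistence error quoted in this tutorial

# fraction of the baseline error that the model removes
improvement = 1.0 - prophet_mae / naive_mae
print('Error reduction vs. persistence: %.1f%%' % (improvement * 100))
```

A value above zero means the model beats the baseline, and a value near zero or below would suggest the model adds little over simply repeating the last observation.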
Finally, a plot is created comparing the actual vs. predicted values. In this case, we can see that the forecast is a good fit. The model has skill and produces a forecast that looks sensible.
The Prophet library also provides tools to automatically evaluate models and plot results, although those tools don’t seem to work well with data coarser than one day in resolution.
Further Reading
This section provides more resources on the topic if you are looking to go deeper.
- Prophet Homepage
- Prophet GitHub Project
- Prophet API Documentation
- Prophet: forecasting at scale, 2017
- Forecasting at scale, 2017
- Car Sales Dataset
- Package ‘prophet’, R Documentation
Conclusion
In this tutorial, you discovered how to use the Facebook Prophet library for time series forecasting.
Specifically, you learned:
- Prophet is an open-source library developed by Facebook and designed for automatic forecasting of univariate time series data.
- How to fit Prophet models and use them to make in-sample and out-of-sample forecasts.
- How to assess a Prophet model on a hold-out dataset.