Time Series Forecasting with Prophet in Python

Time series forecasting can be challenging because there are many different methods you could use and many different hyperparameters for each method.

The Prophet library is an open-source library designed for making forecasts for univariate time series datasets. It is easy to use and designed to automatically find a good set of hyperparameters for the model, so that by default it makes skillful forecasts for data with trends and seasonal structure.

In this guide, you will discover how to use the Facebook Prophet library for time series forecasting.

After completing this guide, you will know:

  • Prophet is an open-source library developed by Facebook and designed for automatic forecasting of univariate time series data.
  • How to fit Prophet models and use them to make in-sample and out-of-sample forecasts.
  • How to evaluate a Prophet model on a hold-out dataset.

Tutorial Overview

This guide is divided into three parts:

  1. Prophet Forecasting Library
  2. Car Sales Dataset
    1. Load and summarize dataset
    2. Load and plot dataset
  3. Forecast car sales with Prophet
    1. Fit Prophet Model
    2. Make an in-sample forecast
    3. Make an out-of-sample forecast
    4. Manually assess forecast model

Prophet Forecasting Library

Prophet, or “Facebook Prophet”, is an open-source library for univariate (single-variable) time series forecasting developed by Facebook.

Prophet implements what its authors refer to as an additive time series forecasting model, and the implementation supports trends, seasonality, and holidays.

It implements a procedure for forecasting time series data based on an additive model where non-linear trends are fit with yearly, weekly, and daily seasonality, plus holiday effects.
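To make the idea of an additive model concrete, the sketch below sums a trend component and a seasonal component to produce a value for each time step. It is a toy illustration only and not Prophet's internal implementation: the function names, the linear trend, the single sine wave, and all the numbers are assumptions chosen for readability.

```python
import math

def trend(t, base=10000.0, slope=100.0):
    """Hypothetical linear trend component g(t)."""
    return base + slope * t

def seasonality(t, period=12, amplitude=2000.0):
    """Hypothetical seasonal component s(t): one sine cycle every `period` steps."""
    return amplitude * math.sin(2.0 * math.pi * t / period)

def additive_forecast(t):
    """Additive model: the forecast is the sum of the components."""
    return trend(t) + seasonality(t)

# evaluate the toy model for two years of monthly time steps
series = [additive_forecast(t) for t in range(24)]
print(['%.1f' % value for value in series[:4]])
```

Because the components are simply added, each one can be inspected or changed on its own, which is part of what makes this family of models easy to interpret.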

It is designed to be easy and completely automatic, e.g. point it at a time series and get a forecast. As such, it is intended for internal company use, like forecasting sales, capacity, etc.

For a great overview of Prophet and its capabilities, see the post:

Prophet – forecasting at scale, 2017

The library provides two interfaces, R and Python. We will focus on the Python interface in this tutorial.

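The first step is to install the Prophet library using pip. Note that releases from version 1.0 onward are published under the package name prophet rather than fbprophet; the examples in this guide use the original fbprophet name, as follows: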

sudo pip install fbprophet

 

Next, we can confirm that the library was installed correctly.

To do this, we can import the library and print the version number in Python. The complete example is listed below.

# check prophet version
import fbprophet
# print version number
print('Prophet %s' % fbprophet.__version__)

 

Running the example prints the installed version of Prophet.

You should have the same version or higher.

Prophet 0.5

 

Now that we have Prophet installed, let’s select a dataset we can use to explore the library.

Car Sales Dataset

We will use the monthly car sales dataset.

It is a standard univariate time series dataset that contains both a trend and seasonality. The dataset has 108 months of data, and a naive persistence forecast can achieve a mean absolute error of about 3,235 sales, providing a baseline that a skillful model should improve upon.

There is no need to download the dataset manually, as each example downloads it automatically.

Monthly car sales dataset (CSV)

Monthly car sales dataset description
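As a rough illustration of what a naive persistence baseline means, the sketch below predicts each month as the value of the previous month and scores the result with the mean absolute error. The helper names are our own, and the data are only the first five observations of the dataset (as printed in the next section), so the resulting number is not the 3,235 figure quoted above; how that figure was computed is not stated in this guide.

```python
def persistence_forecast(series):
    """Predict each observation as the value observed one step earlier."""
    return series[:-1]

def mean_absolute_error(y_true, y_pred):
    """Average of the absolute differences between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(y_true, y_pred)) / len(y_true)

# first five monthly observations of the car sales dataset
sales = [6550, 8728, 12026, 14395, 14587]

y_pred = persistence_forecast(sales)   # forecasts for months 2 to 5
y_true = sales[1:]                     # actual values for months 2 to 5
mae = mean_absolute_error(y_true, y_pred)
print('Persistence MAE: %.2f' % mae)
```

Any model worth using should achieve a lower error than a baseline of this kind on the same data.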

Load and summarize dataset

First, let’s load and summarize the dataset.

Prophet requires data to be in Pandas DataFrames. Therefore, we will load and summarize the data using Pandas.

We can load the data directly from the URL by calling the read_csv() Pandas function, then summarize the shape (number of rows and columns) of the data and view the first few rows.

The complete example is listed below.

# load the car sales dataset
from pandas import read_csv
# load data
path = 'https://raw.githubusercontent.com/jbrownlee/Datasets/master/monthly-car-sales.csv'
df = read_csv(path, header=0)
# summarize shape
print(df.shape)
# show first few rows
print(df.head())

 

Running the example first reports the number of rows and columns, then lists the first five rows of data.

We can see that, as expected, there are 108 rows (nine years of monthly data) and two columns. The first column is the date and the second is the number of sales.

Note that the first column in the output is a row index and is not part of the dataset, just a helpful tool that Pandas uses to order rows.

(108, 2)
     Month  Sales
0  1960-01   6550
1  1960-02   8728
2  1960-03  12026
3  1960-04  14395
4  1960-05  14587

 

Load and Plot Dataset

A time series dataset does not make much sense to us until we plot it.

Plotting a time series helps us to actually see if there is a trend, a seasonal cycle, outliers, and more. It gives us a feel for the data.

We can plot the data easily in Pandas by calling the plot() function on the DataFrame.

The complete example is listed below.

# load and plot the car sales dataset
from pandas import read_csv
from matplotlib import pyplot
# load data
path = 'https://raw.githubusercontent.com/jbrownlee/Datasets/master/monthly-car-sales.csv'
df = read_csv(path, header=0)
# plot the time series
df.plot()
pyplot.show()

 

Running the example creates a plot of the time series.

We can clearly see the trend in sales over time and a seasonal pattern that repeats from year to year. These are patterns we expect the forecast model to take into account.

Now that we are familiar with the dataset, let’s explore how we can use the Prophet library to make forecasts.

Forecast car sales with Prophet

In this section, we will explore using Prophet to forecast the car sales dataset.

Let’s begin by fitting a model on the dataset.

Fit Prophet Model

To use Prophet for forecasting, first, a Prophet() object is defined and configured, then it is fit on the dataset by calling the fit() function and passing the data.

The Prophet() object takes arguments to configure the type of model you want, such as the type of growth, the type of seasonality, and more. By default, the model will work hard to figure out almost everything automatically.

The fit() function takes a DataFrame of time series data. The DataFrame must have a specific format. The first column must have the name ‘ds’ and contain the date-times. The second column must have the name ‘y’ and contain the observations.

This means we must change the column names in the dataset. It also requires that the first column be converted to date-time objects, if they are not already (e.g. this can be done as part of loading the dataset with the right arguments to read_csv()).

For example, we can modify our loaded car sales dataset to have this expected structure, as follows:

# prepare expected column names
df.columns = ['ds', 'y']
df['ds'] = to_datetime(df['ds'])
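To see what this date conversion amounts to, the sketch below parses a few of the dataset's ‘YYYY-MM’ strings using only the Python standard library. It is an analogy for the to_datetime() call rather than a replacement for it; the parse_month() helper is a name we made up for illustration.

```python
from datetime import datetime

def parse_month(label):
    """Parse a 'YYYY-MM' string; the missing day defaults to the 1st."""
    return datetime.strptime(label, '%Y-%m')

# the first three month labels from the car sales dataset
months = ['1960-01', '1960-02', '1960-03']
parsed = [parse_month(label) for label in months]
print([d.strftime('%Y-%m-%d') for d in parsed])
```

Each month string becomes a full date on the first day of that month, which is why the ‘ds’ values shown in the forecasts later in this guide look like 1968-01-01.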

 

The complete example of fitting a Prophet model on the car sales dataset is listed below.

# fit prophet model on the car sales dataset
from pandas import read_csv
from pandas import to_datetime
from fbprophet import Prophet
# load data
path = 'https://raw.githubusercontent.com/jbrownlee/Datasets/master/monthly-car-sales.csv'
df = read_csv(path, header=0)
# prepare expected column names
df.columns = ['ds', 'y']
df['ds'] = to_datetime(df['ds'])
# define the model
model = Prophet()
# fit the model
model.fit(df)

 

Running the example loads the dataset, prepares the DataFrame in the expected format, and fits a Prophet model.

By default, the library produces a lot of verbose output during the fitting process. We think this is bad practice in general, as it trains developers to ignore output.

Nevertheless, the output summarizes what happened during the model fitting process, specifically the optimization procedures that ran.

We will not reproduce this output in subsequent sections when we fit the model.

Now, let’s make a forecast.

Make an In-Sample Forecast

It can be useful to make a prediction on historical data.

That is, we can make a forecast on data used as input to train the model. Ideally, the model has seen the data before and would make a perfect prediction.

Nevertheless, this is not the case, as the model tries to generalize across all cases in the data.

This is called making an in-sample (in training set sample) forecast, and reviewing the results can give insight into how good the model is. That is, how well it learned the training data.

A forecast is made by calling the predict() function and passing a DataFrame that contains one column named ‘ds’ and rows with date-times for all the intervals to be predicted.

There are many ways to create this “forecast” DataFrame. In this case, we will loop over one year of dates, e.g. the last 12 months in the dataset, and create a string for each month. We will then convert the list of dates into a DataFrame and convert the string values into date-time objects.

# define the period for which we want a prediction
future = list()
for i in range(1, 13):
    date = '1968-%02d' % i
    future.append([date])
future = DataFrame(future)
future.columns = ['ds']
future['ds'] = to_datetime(future['ds'])
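The loop above only covers months within a single calendar year. If a forecast period needs to cross a year boundary, the month strings could be generated with a small helper like the one below. This is a sketch using only plain Python; month_labels() is a hypothetical helper of our own and is not part of Prophet or Pandas.

```python
def month_labels(start_year, start_month, periods):
    """Return `periods` consecutive 'YYYY-MM' strings, rolling over year ends."""
    labels = []
    for offset in range(periods):
        index = (start_month - 1) + offset   # months counted from January of start_year
        year = start_year + index // 12
        month = index % 12 + 1
        labels.append('%04d-%02d' % (year, month))
    return labels

# the same 12 months as the loop above
print(month_labels(1968, 1, 12))
# four months that span the end of 1968
print(month_labels(1968, 11, 4))
```

The resulting list of strings can be wrapped in a DataFrame and converted to date-times in the same way as before.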

 

The future DataFrame can then be provided to the predict() function to calculate a forecast.

The result of the predict() function is a DataFrame that contains many columns. Perhaps the most important columns are the forecast date-time (‘ds’), the forecasted value (‘yhat’), and the lower and upper bounds on the predicted value (‘yhat_lower’ and ‘yhat_upper’) that express the uncertainty of the forecast.

For example, we can print the first few forecasts as follows:

# summarize the forecast
print(forecast[['ds', 'yhat', 'yhat_lower', 'yhat_upper']].head())

 

Prophet also provides a built-in tool for visualizing the forecast in the context of the training dataset.

This can be achieved by calling the plot() function on the model and passing it the forecast DataFrame returned by predict(). It will create a plot of the training dataset and overlay the forecast with the upper and lower bounds for the forecast dates.

print(forecast[['ds', 'yhat', 'yhat_lower', 'yhat_upper']].head())
# plot forecast
model.plot(forecast)
pyplot.show()

 

Tying this all together, a complete example of making an in-sample forecast is listed below.

# make an in-sample forecast
from pandas import read_csv
from pandas import to_datetime
from pandas import DataFrame
from fbprophet import Prophet
from matplotlib import pyplot
# load data
path = 'https://raw.githubusercontent.com/jbrownlee/Datasets/master/monthly-car-sales.csv'
df = read_csv(path, header=0)
# prepare expected column names
df.columns = ['ds', 'y']
df['ds'] = to_datetime(df['ds'])
# define the model
model = Prophet()
# fit the model
model.fit(df)
# define the period for which we want a prediction
future = list()
for i in range(1, 13):
    date = '1968-%02d' % i
    future.append([date])
future = DataFrame(future)
future.columns = ['ds']
future['ds'] = to_datetime(future['ds'])
# use the model to make a forecast
forecast = model.predict(future)
# summarize the forecast
print(forecast[['ds', 'yhat', 'yhat_lower', 'yhat_upper']].head())
# plot forecast
model.plot(forecast)
pyplot.show()

 

Running the example forecasts the last 12 months of the dataset.

The first five months of the forecast are reported, and we can see that the values are not too different from the actual sales values in the dataset.

          ds          yhat    yhat_lower    yhat_upper
0 1968-01-01  14364.866157  12816.266184  15956.555409
1 1968-02-01  14940.687225  13299.473640  16463.811658
2 1968-03-01  20858.282598  19439.403787  22345.747821
3 1968-04-01  22893.610396  21417.399440  24454.642588
4 1968-05-01  24212.079727  22667.146433  25816.191457
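One way to read the bounds in the rows above is to compute the width of each interval and compare it with the forecast value. The sketch below does this for the five printed rows using plain Python; the helper names are our own, and within_interval() shows how an observed value could be checked against the bounds once it is known.

```python
# first five in-sample forecast rows as printed above:
# (ds, yhat, yhat_lower, yhat_upper)
rows = [
    ('1968-01-01', 14364.866157, 12816.266184, 15956.555409),
    ('1968-02-01', 14940.687225, 13299.473640, 16463.811658),
    ('1968-03-01', 20858.282598, 19439.403787, 22345.747821),
    ('1968-04-01', 22893.610396, 21417.399440, 24454.642588),
    ('1968-05-01', 24212.079727, 22667.146433, 25816.191457),
]

def interval_width(lower, upper):
    """Width of the uncertainty interval, in sales."""
    return upper - lower

def within_interval(value, lower, upper):
    """True if a value falls inside the interval bounds (inclusive)."""
    return lower <= value <= upper

for ds, yhat, lower, upper in rows:
    width = interval_width(lower, upper)
    print('%s width=%.1f (%.1f%% of yhat)' % (ds, width, 100.0 * width / yhat))

widths = [interval_width(lower, upper) for _, _, lower, upper in rows]
```

For these rows, each interval spans roughly 3,000 sales, which gives a sense of how much spread the model attaches to each monthly forecast.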

 

Next, a plot is created. We can see that the training data are represented as black dots and the forecast is a blue line with upper and lower bounds in a blue shaded area.

We can see that the forecasted 12 months are a good match for the real observations, especially when the bounds are taken into account.

Make an Out-of-Sample Forecast

In practice, we really want a forecast model to make a prediction beyond the training data.

This is called an out-of-sample forecast.

We can achieve this in the same way as an in-sample forecast and simply specify a different forecast period.

In this case, we use a period beyond the end of the training dataset, starting 1969-01.

# define the period for which we want a prediction
future = list()
for i in range(1, 13):
    date = '1969-%02d' % i
    future.append([date])
future = DataFrame(future)
future.columns = ['ds']
future['ds'] = to_datetime(future['ds'])

 

Tying this together, the complete example is listed below.

# make an out-of-sample forecast
from pandas import read_csv
from pandas import to_datetime
from pandas import DataFrame
from fbprophet import Prophet
from matplotlib import pyplot
# load data
path = 'https://raw.githubusercontent.com/jbrownlee/Datasets/master/monthly-car-sales.csv'
df = read_csv(path, header=0)
# prepare expected column names
df.columns = ['ds', 'y']
df['ds'] = to_datetime(df['ds'])
# define the model
model = Prophet()
# fit the model
model.fit(df)
# define the period for which we want a prediction
future = list()
for i in range(1, 13):
    date = '1969-%02d' % i
    future.append([date])
future = DataFrame(future)
future.columns = ['ds']
future['ds'] = to_datetime(future['ds'])
# use the model to make a forecast
forecast = model.predict(future)
# summarize the forecast
print(forecast[['ds', 'yhat', 'yhat_lower', 'yhat_upper']].head())
# plot forecast
model.plot(forecast)
pyplot.show()

 

Running the example makes an out-of-sample forecast for the car sales data.

The first five rows of the forecast are printed, although it is hard to get an idea of whether they are sensible or not.

          ds          yhat    yhat_lower    yhat_upper
0 1969-01-01  15406.401318  13751.534121  16789.969780
1 1969-02-01  16165.737458  14486.887740  17634.953132
2 1969-03-01  21384.120631  19738.950363  22926.857539
3 1969-04-01  23512.464086  21939.204670  25105.341478
4 1969-05-01  25026.039276  23544.081762  26718.820580

 

A plot is created to help us evaluate the forecast in the context of the training data.

The new one-year forecast does look sensible, at least by eye.

Manually Assess Forecast Model

It is critical to develop an objective estimate of a forecast model’s performance.

This can be achieved by holding some data back from the model, such as the last 12 months. The model is then fit on the first portion of the data, used to make predictions on the held-back portion, and scored with an error measure, such as the mean absolute error across the forecasts. In effect, this is a simulated out-of-sample forecast.

We can do this with the sample data by creating a new DataFrame for training with the last 12 months removed.

# create training dataset, remove last 12 months
train = df.drop(df.index[-12:])
print(train.tail())
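The same hold-out idea can be sketched on a plain Python list, which makes the slicing easy to check. Here holdout_split() is a hypothetical helper of our own, and the list of integers simply stands in for the 108 row indices of the dataset.

```python
def holdout_split(series, n_test):
    """Split a sequence into a training part and the last `n_test` items."""
    if not 0 < n_test < len(series):
        raise ValueError('n_test must be between 1 and len(series) - 1')
    return series[:-n_test], series[-n_test:]

# stand-in for the 108 monthly rows: row indices 0..107
rows = list(range(108))
train_rows, test_rows = holdout_split(rows, 12)
print(len(train_rows), len(test_rows), train_rows[-1], test_rows[0])
```

With 108 rows and a 12-month hold-out, the training portion ends at row index 95 and the hold-out begins at row index 96. The split must respect time order; shuffling the rows first would leak future observations into the training data.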

 

A forecast can then be made for the date-times of the last 12 months.

We can then retrieve the forecast values and the expected values from the original dataset and calculate a mean absolute error metric using the scikit-learn library.

# calculate MAE between expected and predicted values for the hold-out period
y_true = df['y'][-12:].values
y_pred = forecast['yhat'].values
mae = mean_absolute_error(y_true, y_pred)
print('MAE: %.3f' % mae)
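For reference, the mean absolute error computed in the snippet above follows the standard definition below, where y_i are the held-back observations, \hat{y}_i are the corresponding forecasts, and n is the number of forecast steps (12 in this guide).

```latex
\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| y_i - \hat{y}_i \right|
```

Because the errors are not squared, the result is in the same units as the data, in this case the number of car sales.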

 

It can also be helpful to plot the expected vs. predicted values to see how well the out-of-sample prediction matches the known values.

# plot expected vs predicted
pyplot.plot(y_true, label='Actual')
pyplot.plot(y_pred, label='Predicted')
pyplot.legend()
pyplot.show()

 

Tying this together, the example below demonstrates how to evaluate a Prophet model on a hold-out dataset.

# evaluate prophet time series forecasting model on hold out dataset
from pandas import read_csv
from pandas import to_datetime
from pandas import DataFrame
from fbprophet import Prophet
from sklearn.metrics import mean_absolute_error
from matplotlib import pyplot
# load data
path = 'https://raw.githubusercontent.com/jbrownlee/Datasets/master/monthly-car-sales.csv'
df = read_csv(path, header=0)
# prepare expected column names
df.columns = ['ds', 'y']
df['ds'] = to_datetime(df['ds'])
# create training dataset, remove last 12 months
train = df.drop(df.index[-12:])
print(train.tail())
# define the model
model = Prophet()
# fit the model
model.fit(train)
# define the period for which we want a prediction
future = list()
for i in range(1, 13):
    date = '1968-%02d' % i
    future.append([date])
future = DataFrame(future)
future.columns = ['ds']
future['ds'] = to_datetime(future['ds'])
# use the model to make a forecast
forecast = model.predict(future)
# calculate MAE between expected and predicted values for the hold-out period
y_true = df['y'][-12:].values
y_pred = forecast['yhat'].values
mae = mean_absolute_error(y_true, y_pred)
print('MAE: %.3f' % mae)
# plot expected vs predicted
pyplot.plot(y_true, label='Actual')
pyplot.plot(y_pred, label='Predicted')
pyplot.legend()
pyplot.show()

 

Running the example first reports the last few rows of the training dataset.

It confirms that the training data ends in the last month of 1967 and that 1968 will be used as the hold-out dataset.

 

           ds      y
91 1967-08-01  13434
92 1967-09-01  13598
93 1967-10-01  17187
94 1967-11-01  16119
95 1967-12-01  13713

 

Then, a mean absolute error is calculated for the forecast period.

In this case, we can see that the error is approximately 1,336 sales, which is much lower (better) than a naive persistence model that achieves an error of 3,235 sales over the same period.

MAE: 1336.814
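To put the two error values side by side, we can express the improvement as a relative reduction in error. The short calculation below uses only the two MAE figures quoted in this guide; the variable names are our own.

```python
prophet_mae = 1336.814   # MAE of the Prophet model on the 1968 hold-out data
naive_mae = 3235.0       # MAE quoted for the naive persistence model

# fraction of the naive error that the Prophet model removes
reduction = 1.0 - prophet_mae / naive_mae
print('Error reduction vs. naive: %.1f%%' % (reduction * 100))
```

This works out to a reduction of about 58.7 percent relative to the naive baseline.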

 

Finally, a plot is created comparing the actual vs. predicted values. In this case, we can see that the forecast is a good fit. The model has skill and makes forecasts that look sensible.

The Prophet library also provides tools to automatically evaluate models and plot results, although those tools don’t appear to work well with data at a coarser resolution than one day.

Further Reading

This section provides more resources on the topic if you are looking to go deeper.

  • Prophet Homepage
  • Prophet GitHub Project
  • Prophet API Documentation
  • Prophet: forecasting at scale, 2017
  • Forecasting at scale, 2017
  • Car Sales Dataset
  • Package ‘prophet’, R Documentation

Conclusion

In this guide, you discovered how to use the Facebook Prophet library for time series forecasting.

Specifically, you learned:

  • Prophet is an open-source library developed by Facebook and designed for automatic forecasting of univariate time series data.
  • How to fit Prophet models and use them to make in-sample and out-of-sample forecasts.
  • How to evaluate a Prophet model on a hold-out dataset.