### Time Series Prediction with Deep Learning in Keras

Time series prediction is a tough problem both to frame and to tackle within machine learning.

In this blog article by AICorespot, you will find out how to develop neural network models for time series prediction in Python using the Keras deep learning library.

After going through this post, you will know:

- The airline passengers univariate time series prediction problem.
- How to phrase time series prediction as a regression problem and develop a neural network model for it.
- How to frame time series prediction with a time lag and develop a neural network model for it.

**Problem Description**

The problem we are taking a look at in this post is the international airline passengers prediction problem.

Given a year and a month, the task is to forecast the number of international airline passengers in units of 1,000. The data ranges from January 1949 to December 1960, or 12 years, with 144 observations.

Below is a sample of the first few lines of the file.

```
"Month","Passengers"
"1949-01",112
"1949-02",118
"1949-03",132
"1949-04",129
```

We can load this dataset easily using the Pandas library. We are not interested in the date, given that every observation is separated by the same interval of one month. Thus, when we load the dataset we can exclude the first column.

Upon loading, we can easily plot the entire dataset. The code to load and plot the dataset is detailed below.

```python
import pandas
import matplotlib.pyplot as plt
# load only the passenger counts (column 1), skipping the date column
dataset = pandas.read_csv('airline-passengers.csv', usecols=[1], engine='python')
plt.plot(dataset)
plt.show()
```

You can observe an upward trend in the plot.

You can additionally observe some periodicity to the dataset that likely corresponds to the northern hemisphere summer holiday period.

We are going to keep things simple and operate with the data as-is.

Typically, it is a good idea to look into several data prep strategies to rescale the data and make it stationary.
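As a rough illustration of what such preparation could look like, the sketch below applies min-max rescaling and first-order differencing to the four sample values shown earlier. It is not used anywhere else in this post; the variable names are chosen here for illustration, and in practice the scaling parameters would normally be computed from the training portion only.

```python
import numpy as np

# First few monthly passenger counts from the sample shown earlier (thousands).
series = np.array([112.0, 118.0, 132.0, 129.0], dtype="float32")

# Min-max rescaling to the range [0, 1].
lo, hi = series.min(), series.max()
scaled = (series - lo) / (hi - lo)

# First-order differencing, a common way to reduce trend: d[t] = y[t] - y[t-1].
diffed = np.diff(series)

# Both transforms can be inverted, so forecasts can be mapped back to passengers.
restored_from_scaled = scaled * (hi - lo) + lo
restored_from_diff = np.concatenate(([series[0]], series[0] + np.cumsum(diffed)))

print(scaled)
print(diffed)
```

Differencing shortens the series by one value, so the first observation has to be kept aside in order to undo the transform.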

**Multilayer Perceptron Regression**

We wish to phrase the time series prediction problem as a regression problem.

That is, given the number of passengers (in units of thousands) this month, what is the number of passengers next month?

We can write a simple function to convert our single column of data into a two-column dataset. The first column contains this month's (t) passenger count and the second column contains next month's (t+1) passenger count, to be predicted.

Before we get started, let's first import all of the functions and classes we intend to use. This assumes a working SciPy environment with the Keras deep learning library installed.

```python
import math  # used later to convert MSE scores to RMSE
import numpy
import matplotlib.pyplot as plt
import pandas
from keras.models import Sequential
from keras.layers import Dense
```

We can also use the code from the prior section to load the dataset as a Pandas dataframe. We can then extract the NumPy array from the dataframe and convert the integer values to floating point values, which are more suitable for modelling with a neural network.

```python
...
# load the dataset
dataframe = pandas.read_csv('airline-passengers.csv', usecols=[1], engine='python')
dataset = dataframe.values
dataset = dataset.astype('float32')
```

After we model our data and estimate the skill of our model on the training dataset, we are required to get an idea of the skill of the model on new unobserved data. For a normal classification or regression problem we would perform this leveraging cross validation.

With time series data, the sequence of values is critical. A simple strategy is to split the ordered dataset into train and test datasets. The code below calculates the index of the split point and separates the data into a training dataset with 67% of the observations, which we can use to train our model, leaving the remaining 33% for evaluating the model.

```python
...
# split into train and test sets
train_size = int(len(dataset) * 0.67)
test_size = len(dataset) - train_size
train, test = dataset[0:train_size,:], dataset[train_size:len(dataset),:]
print(len(train), len(test))
```

Now we can define a function to create a new dataset as described above. The function takes two arguments: the dataset, which is a NumPy array that we wish to convert into a dataset, and the look_back, which is the number of prior time steps to use as input variables to forecast the next time period, in this case defaulted to 1.

This default will create a dataset where X is the number of passengers at a given time (t) and Y is the number of passengers at the next time (t+1).

It can be configured, and we will look at creating a differently shaped dataset in the subsequent section.

```python
...
# convert an array of values into a dataset matrix
def create_dataset(dataset, look_back=1):
    dataX, dataY = [], []
    for i in range(len(dataset)-look_back-1):
        a = dataset[i:(i+look_back), 0]
        dataX.append(a)
        dataY.append(dataset[i + look_back, 0])
    return numpy.array(dataX), numpy.array(dataY)
```

Let's take a look at the effect of this function on the first few rows of the dataset.

```
X       Y
112     118
118     132
132     129
129     121
121     135
```
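The rows above can be reproduced with a small self-contained check. The sketch below repeats the logic of create_dataset() with slightly different variable names and runs it on the first seven observations (the values that appear in the samples in this post); it is meant only as a quick sanity check, not as part of the modelling code.

```python
import numpy as np

def create_dataset(dataset, look_back=1):
    """Same logic as the function above: X holds values at t, Y the value at t+1."""
    data_x, data_y = [], []
    for i in range(len(dataset) - look_back - 1):
        data_x.append(dataset[i:(i + look_back), 0])
        data_y.append(dataset[i + look_back, 0])
    return np.array(data_x), np.array(data_y)

# First seven observations, shaped as a single-column array (values in thousands).
sample = np.array([112, 118, 132, 129, 121, 135, 148], dtype="float32").reshape(-1, 1)

X, Y = create_dataset(sample, look_back=1)
for x_row, y_val in zip(X, Y):
    print(int(x_row[0]), int(y_val))
```

Note that seven observations yield five pairs rather than six: the loop bound `len(dataset)-look_back-1` leaves the final available pair (135, 148) unused.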

If you contrast these first five rows to the original dataset sample listed in the prior section, you can observe the X=t and Y=t+1 pattern in the numbers.

Let's use this function to prepare the train and test datasets for modelling.

```python
...
# reshape into X=t and Y=t+1
look_back = 1
trainX, trainY = create_dataset(train, look_back)
testX, testY = create_dataset(test, look_back)
```

We can now fit a Multilayer Perceptron Model to the training data.

We use a simple network with 1 input, 1 hidden layer with 8 neurons, and an output layer. The model is fit using mean squared error; taking its square root gives an error score in the units of the dataset (thousands of passengers).

We attempted a few rough parameters and settled on the configuration below, but by no means is the network listed optimized.

```python
...
# create and fit Multilayer Perceptron model
model = Sequential()
model.add(Dense(8, input_dim=look_back, activation='relu'))
model.add(Dense(1))
model.compile(loss='mean_squared_error', optimizer='adam')
model.fit(trainX, trainY, epochs=200, batch_size=2, verbose=2)
```

After the model is fit, we can estimate its performance on the train and test datasets. This will give us a point of comparison for new models.

```python
...
# Estimate model performance
trainScore = model.evaluate(trainX, trainY, verbose=0)
print('Train Score: %.2f MSE (%.2f RMSE)' % (trainScore, math.sqrt(trainScore)))
testScore = model.evaluate(testX, testY, verbose=0)
print('Test Score: %.2f MSE (%.2f RMSE)' % (testScore, math.sqrt(testScore)))
```

Lastly, we can produce forecasts leveraging the model for both the train and test dataset to obtain a visual indication of the ability of the model.

Because of how the dataset was prepared, we must shift the forecasts so that they align on the x-axis with the original dataset. Once prepared, the data is plotted, showing the original dataset in blue, the forecasts for the train dataset in green, and the predictions on the unseen test dataset in red.

```python
...
# generate predictions for training
trainPredict = model.predict(trainX)
testPredict = model.predict(testX)
# shift train predictions for plotting
trainPredictPlot = numpy.empty_like(dataset)
trainPredictPlot[:, :] = numpy.nan
trainPredictPlot[look_back:len(trainPredict)+look_back, :] = trainPredict
# shift test predictions for plotting
testPredictPlot = numpy.empty_like(dataset)
testPredictPlot[:, :] = numpy.nan
testPredictPlot[len(trainPredict)+(look_back*2)+1:len(dataset)-1, :] = testPredict
# plot baseline and predictions
plt.plot(dataset)
plt.plot(trainPredictPlot)
plt.plot(testPredictPlot)
plt.show()
```
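The index arithmetic in the plotting code above can be easier to follow on a toy series. The sketch below is an illustration only: it uses a made-up series 0..9, a made-up split point, and stand-in "predictions" that are simply the true targets, so that each value should land at the index of the observation it forecasts.

```python
import numpy as np

look_back = 1
dataset = np.arange(10, dtype="float32").reshape(-1, 1)  # toy series 0..9
train_size = 6                                           # illustrative split point

# create_dataset() above yields len(part) - look_back - 1 samples per part.
n_train_pred = train_size - look_back - 1
n_test_pred = (len(dataset) - train_size) - look_back - 1

# Stand-in predictions: pretend the model reproduces each target exactly.
train_pred = dataset[look_back:look_back + n_train_pred]
test_start = n_train_pred + (look_back * 2) + 1  # same offset as the plotting code
test_pred = dataset[test_start:test_start + n_test_pred]

# NaN-filled arrays with predictions written at their aligned positions.
train_plot = np.full_like(dataset, np.nan)
train_plot[look_back:n_train_pred + look_back, :] = train_pred
test_plot = np.full_like(dataset, np.nan)
test_plot[test_start:len(dataset) - 1, :] = test_pred

print(train_plot.ravel())
print(test_plot.ravel())
```

The offset `len(trainPredict)+(look_back*2)+1` works out to `train_size + look_back`, which is the index of the first target in the test portion.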

Putting this all together, the complete example is listed below.

```python
# Multilayer Perceptron to Predict International Airline Passengers (t+1, given t)
import numpy
import matplotlib.pyplot as plt
from pandas import read_csv
import math
from keras.models import Sequential
from keras.layers import Dense

# convert an array of values into a dataset matrix
def create_dataset(dataset, look_back=1):
    dataX, dataY = [], []
    for i in range(len(dataset)-look_back-1):
        a = dataset[i:(i+look_back), 0]
        dataX.append(a)
        dataY.append(dataset[i + look_back, 0])
    return numpy.array(dataX), numpy.array(dataY)

# load the dataset
dataframe = read_csv('airline-passengers.csv', usecols=[1], engine='python')
dataset = dataframe.values
dataset = dataset.astype('float32')
# split into train and test sets
train_size = int(len(dataset) * 0.67)
test_size = len(dataset) - train_size
train, test = dataset[0:train_size,:], dataset[train_size:len(dataset),:]
# reshape into X=t and Y=t+1
look_back = 1
trainX, trainY = create_dataset(train, look_back)
testX, testY = create_dataset(test, look_back)
# create and fit Multilayer Perceptron model
model = Sequential()
model.add(Dense(8, input_dim=look_back, activation='relu'))
model.add(Dense(1))
model.compile(loss='mean_squared_error', optimizer='adam')
model.fit(trainX, trainY, epochs=200, batch_size=2, verbose=2)
# Estimate model performance
trainScore = model.evaluate(trainX, trainY, verbose=0)
print('Train Score: %.2f MSE (%.2f RMSE)' % (trainScore, math.sqrt(trainScore)))
testScore = model.evaluate(testX, testY, verbose=0)
print('Test Score: %.2f MSE (%.2f RMSE)' % (testScore, math.sqrt(testScore)))
# generate predictions for training
trainPredict = model.predict(trainX)
testPredict = model.predict(testX)
# shift train predictions for plotting
trainPredictPlot = numpy.empty_like(dataset)
trainPredictPlot[:, :] = numpy.nan
trainPredictPlot[look_back:len(trainPredict)+look_back, :] = trainPredict
# shift test predictions for plotting
testPredictPlot = numpy.empty_like(dataset)
testPredictPlot[:, :] = numpy.nan
testPredictPlot[len(trainPredict)+(look_back*2)+1:len(dataset)-1, :] = testPredict
# plot baseline and predictions
plt.plot(dataset)
plt.plot(trainPredictPlot)
plt.plot(testPredictPlot)
plt.show()
```

Running the example reports model performance.

Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and comparing the average outcome.

Taking the square root of the performance estimates, we can see that the model has a root mean squared error of about 23 thousand passengers on the training dataset and about 49 thousand passengers on the test dataset.

```
...
Epoch 195/200
0s - loss: 535.3075
Epoch 196/200
0s - loss: 551.2694
Epoch 197/200
0s - loss: 543.7834
Epoch 198/200
0s - loss: 538.5886
Epoch 199/200
0s - loss: 539.1434
Epoch 200/200
0s - loss: 533.8347
Train Score: 531.71 MSE (23.06 RMSE)
Test Score: 2355.06 MSE (48.53 RMSE)
```
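The RMSE figures in this output follow directly from the reported MSE values. As a quick arithmetic check (using the two scores printed above as plain numbers, with variable names chosen here for illustration):

```python
import math

# MSE scores copied from the output above (units: thousands of passengers, squared).
train_mse = 531.71
test_mse = 2355.06

# RMSE is the square root of MSE, which brings the score back to the data's units.
train_rmse = math.sqrt(train_mse)
test_rmse = math.sqrt(test_mse)

print("Train RMSE: %.2f" % train_rmse)
print("Test RMSE: %.2f" % test_rmse)
```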

From the plot, we can see that the model did a fairly poor job of fitting both the training and the test datasets. It essentially forecasted the same input value as the output.
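One way to judge whether a model has learned more than copying its input is to compare it against a persistence baseline, where the forecast for t+1 is simply the value at t. This comparison is not part of the original example; the sketch below only shows the calculation on the handful of values that appear in the samples in this post, so its score is not comparable with the scores above, which use the full series.

```python
import numpy as np

# A short stretch of the series taken from the samples in this post (thousands).
series = np.array([112, 118, 132, 129, 121, 135, 148, 148], dtype="float64")

# Persistence forecast: the prediction for t+1 is simply the value at t.
actual = series[1:]
persistence = series[:-1]

# Root mean squared error of the naive forecast, in thousands of passengers.
rmse = np.sqrt(np.mean((actual - persistence) ** 2))
print("Persistence RMSE: %.2f" % rmse)
```

If a trained network scores about the same as this baseline on the same data, it is probably doing little more than echoing its input.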

**Multilayer Perceptron Using the Window Method**

We can also phrase the problem so that several recent time steps can be leveraged to make the forecast for the next time step.

This is referred to as the window strategy, and the size of the window is a parameter that can be tuned for every problem.

For instance, given the current time (t), we want to forecast the value at the next time in the sequence (t+1). We can use the current time (t) as well as the two prior times (t-1 and t-2) as inputs.

When phrased as a regression problem, the input variables are t-2, t-1, t and the output variable is t+1.

The create_dataset() function we wrote in the prior section allows us to create this formulation of the time series problem by increasing the look_back argument from 1 to 3.

A sample of the dataset with this formulation looks as follows:

```
X1      X2      X3      Y
112     118     132     129
118     132     129     121
132     129     121     135
129     121     135     148
121     135     148     148
```
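The same windowed layout can also be produced without an explicit loop. The sketch below is an alternative to create_dataset(), not the method used in this post; it assumes NumPy 1.20 or later, which provides `sliding_window_view`, and uses the first eight observations from the table above.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

# First eight observations from the table above (thousands of passengers).
series = np.array([112, 118, 132, 129, 121, 135, 148, 148], dtype="float32")
look_back = 3

# Each window holds look_back inputs followed by the one target value.
windows = sliding_window_view(series, look_back + 1)
X, y = windows[:, :look_back], windows[:, look_back]

for x_row, y_val in zip(X, y):
    print(x_row.astype(int).tolist(), int(y_val))
```

Unlike create_dataset(), this version keeps the final available window, so it returns one more sample for the same input.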

We can re-run the example from the prior section with the larger window size. We will also increase the network capacity to handle the extra information. The first hidden layer is increased to 12 neurons and a second hidden layer is added with 8 neurons. The number of epochs is also increased to 400.

The whole code listing, with the window size and network changes, is provided below for completeness.

```python
# Multilayer Perceptron to Predict International Airline Passengers (t+1, given t, t-1, t-2)
import numpy
import matplotlib.pyplot as plt
from pandas import read_csv
import math
from keras.models import Sequential
from keras.layers import Dense

# convert an array of values into a dataset matrix
def create_dataset(dataset, look_back=1):
    dataX, dataY = [], []
    for i in range(len(dataset)-look_back-1):
        a = dataset[i:(i+look_back), 0]
        dataX.append(a)
        dataY.append(dataset[i + look_back, 0])
    return numpy.array(dataX), numpy.array(dataY)

# load the dataset
dataframe = read_csv('airline-passengers.csv', usecols=[1], engine='python')
dataset = dataframe.values
dataset = dataset.astype('float32')
# split into train and test sets
train_size = int(len(dataset) * 0.67)
test_size = len(dataset) - train_size
train, test = dataset[0:train_size,:], dataset[train_size:len(dataset),:]
# reshape dataset
look_back = 3
trainX, trainY = create_dataset(train, look_back)
testX, testY = create_dataset(test, look_back)
# create and fit Multilayer Perceptron model
model = Sequential()
model.add(Dense(12, input_dim=look_back, activation='relu'))
model.add(Dense(8, activation='relu'))
model.add(Dense(1))
model.compile(loss='mean_squared_error', optimizer='adam')
model.fit(trainX, trainY, epochs=400, batch_size=2, verbose=2)
# Estimate model performance
trainScore = model.evaluate(trainX, trainY, verbose=0)
print('Train Score: %.2f MSE (%.2f RMSE)' % (trainScore, math.sqrt(trainScore)))
testScore = model.evaluate(testX, testY, verbose=0)
print('Test Score: %.2f MSE (%.2f RMSE)' % (testScore, math.sqrt(testScore)))
# generate predictions for training
trainPredict = model.predict(trainX)
testPredict = model.predict(testX)
# shift train predictions for plotting
trainPredictPlot = numpy.empty_like(dataset)
trainPredictPlot[:, :] = numpy.nan
trainPredictPlot[look_back:len(trainPredict)+look_back, :] = trainPredict
# shift test predictions for plotting
testPredictPlot = numpy.empty_like(dataset)
testPredictPlot[:, :] = numpy.nan
testPredictPlot[len(trainPredict)+(look_back*2)+1:len(dataset)-1, :] = testPredict
# plot baseline and predictions
plt.plot(dataset)
plt.plot(trainPredictPlot)
plt.plot(testPredictPlot)
plt.show()
```

Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and comparing the average outcome.

Running the example produces the following output.

```
Epoch 395/400
0s - loss: 485.3482
Epoch 396/400
0s - loss: 479.9485
Epoch 397/400
0s - loss: 497.2707
Epoch 398/400
0s - loss: 489.5670
Epoch 399/400
0s - loss: 490.8099
Epoch 400/400
0s - loss: 493.6544
Train Score: 564.03 MSE (23.75 RMSE)
Test Score: 2244.82 MSE (47.38 RMSE)
```

We can see that the error was not significantly reduced compared with that of the prior section.

Looking at the graph, we can see more structure in the predictions.

Again, the window size and the network architecture were not tuned. This is only a demonstration of how to frame a prediction problem.

Taking the square root of the performance scores, we can see that the root mean squared error on the training dataset was about 24 thousand passengers per month and the error on the unseen test set was about 47 thousand passengers per month.

**Conclusion**

In this blog article, you found out how to develop a neural network model for a time series forecasting problem using the Keras deep learning library.

After going through this guide, you now know:

- The international airline passenger prediction time series dataset.
- How to frame time series prediction problems as regression problems and develop a neural network model.
- How to use the window strategy to frame a time series prediction problem and develop a neural network model.