Time Series Prediction with deep learning in Keras

Time series prediction is a tough problem both to frame and to tackle within machine learning.

In this blog article by AICorespot, you will discover how to develop neural network models for time series prediction in Python using the Keras deep learning library.

After reading this post, you will know:

• About the airline passengers univariate time series prediction problem.
• How to phrase time series prediction as a regression problem and develop a neural network model for it.
• How to frame time series prediction with a time lag and develop a neural network model for it.

Problem Description

The problem we are looking at in this post is the international airline passengers prediction problem.

This is a problem where, given a year and a month, the task is to predict the number of international airline passengers in units of 1,000. The data ranges from January 1949 to December 1960, or 12 years, with 144 observations.

Below is a sample of the first few lines of the file.

```
"Month","Passengers"
"1949-01",112
"1949-02",118
"1949-03",132
"1949-04",129
```

We can load this dataset easily using the Pandas library. We are not interested in the date, given that each observation is separated by the same interval of one month. Therefore, when we load the dataset we can exclude the first column.

Once loaded, we can easily plot the whole dataset. The code to load and plot the dataset is listed below.

```python
import pandas
import matplotlib.pyplot as plt

# load only the "Passengers" column (index 1); the "Month" column is skipped
dataset = pandas.read_csv('airline-passengers.csv', usecols=[1], engine='python')
plt.plot(dataset)
plt.show()
```

You can observe an upward trend in the plot.

You can also see some periodicity in the dataset that probably corresponds to the Northern Hemisphere summer holiday period. We are going to keep things simple and work with the data as-is.

Normally, it is a good idea to investigate various data preparation techniques to rescale the data and to make it stationary.
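Neither step is applied in this post, but as a rough illustration, the NumPy sketch below rescales a series to the range [0, 1] with min-max scaling and removes a trend with first-order differencing. It also inverts the differencing, since forecasts made on the differenced scale must be mapped back to passenger counts. The helper names `min_max_scale`, `difference`, and `invert_difference` are hypothetical, and the input is simply the first eight monthly counts quoted in this post.

```python
import numpy as np

def min_max_scale(series):
    """Rescale values linearly to [0, 1]; also return (min, max) so the scaling can be undone."""
    lo, hi = series.min(), series.max()
    return (series - lo) / (hi - lo), (lo, hi)

def difference(series, lag=1):
    """Differencing: value[t] - value[t - lag], which removes a linear trend for lag=1."""
    return series[lag:] - series[:-lag]

def invert_difference(first_values, diffs, lag=1):
    """Rebuild the original series from its first `lag` values and the differences."""
    out = list(first_values[:lag])
    for d in diffs:
        out.append(out[-lag] + d)
    return np.array(out)

# first eight monthly passenger counts (thousands) quoted in this post
series = np.array([112, 118, 132, 129, 121, 135, 148, 148], dtype='float32')
scaled, (lo, hi) = min_max_scale(series)
diffs = difference(series)
print(scaled)
print(diffs)
print(invert_difference(series, diffs))
```

Setting `lag=12` instead would difference each month against the same month of the previous year, which targets the yearly pattern mentioned above.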

Multilayer Perceptron Regression

We want to phrase the time series prediction problem as a regression problem.

That is, given the number of passengers (in units of thousands) this month, what is the number of passengers next month?

We can write a simple function to convert our single column of data into a two-column dataset: the first column containing this month's (t) passenger count and the second column containing next month's (t+1) passenger count, to be predicted.

Before we get started, let's first import all of the functions and classes we intend to use. This assumes a working SciPy environment with the Keras deep learning library installed.

```python
import math
import numpy
import matplotlib.pyplot as plt
import pandas
from keras.models import Sequential
from keras.layers import Dense
```

We can also use the code from the previous section to load the dataset as a Pandas dataframe. We can then extract the NumPy array from the dataframe and convert the integer values to floating point values, which are more suitable for modelling with a neural network.

```python
...
# load the dataset (passenger counts only)
dataframe = pandas.read_csv('airline-passengers.csv', usecols=[1], engine='python')
dataset = dataframe.values
dataset = dataset.astype('float32')
```

After we model our data and estimate the skill of our model on the training dataset, we need to get an idea of the skill of the model on new unseen data. For a normal classification or regression problem, we would do this using cross validation.

With time series data, the sequence of values is important, so we cannot simply shuffle the observations into random folds: the model would then be trained on months that come after the ones it is tested on. A simple method that we can use instead is to split the ordered dataset into train and test datasets. The code below calculates the index of the split point and separates the data into a training dataset with 67% of the observations, which we use to train our model, leaving the remaining 33% for testing the model.

```python
...
# split into train and test sets
train_size = int(len(dataset) * 0.67)
test_size = len(dataset) - train_size
train, test = dataset[0:train_size,:], dataset[train_size:len(dataset),:]
print(len(train), len(test))
```
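To make the split arithmetic concrete, the sketch below runs the same split on a stand-in array of 144 rows (placeholder values equal to the month index, not the real passenger counts). With 144 observations, `int(144 * 0.67)` is `int(96.48)`, so the first 96 months go to training and the last 48 to testing.

```python
import numpy as np

# stand-in for the loaded data: 144 rows x 1 column (placeholder values, not real counts)
dataset = np.arange(144, dtype='float32').reshape(-1, 1)

# split into train and test sets without shuffling
train_size = int(len(dataset) * 0.67)   # int(96.48) == 96
test_size = len(dataset) - train_size   # 144 - 96 == 48
train, test = dataset[0:train_size, :], dataset[train_size:len(dataset), :]

print(len(train), len(test))            # 96 48
# because the split is positional, the last training row precedes the first test row
print(train[-1, 0] < test[0, 0])        # True
```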

Now we can define a function to create a new dataset, as described above. The function takes two arguments: the dataset, which is a NumPy array that we want to convert into a dataset, and the look_back, which is the number of previous time steps to use as input variables to predict the next time period (in this case, defaulted to 1).

This default will create a dataset where X is the number of passengers at a given time (t) and Y is the number of passengers at the next time (t+1).

It can be configured, and we will look at constructing a differently shaped dataset in the next section.

```python
...
# convert an array of values into a dataset matrix
def create_dataset(dataset, look_back=1):
    dataX, dataY = [], []
    # the loop stops one step early, so the final observation is never used as a target
    for i in range(len(dataset)-look_back-1):
        a = dataset[i:(i+look_back), 0]
        dataX.append(a)
        dataY.append(dataset[i + look_back, 0])
    return numpy.array(dataX), numpy.array(dataY)
```

Let's take a look at the effect of this function on the first few rows of the dataset.

```
X      Y
112    118
118    132
132    129
129    121
121    135
```

If you compare these first five rows to the original dataset sample listed in the previous section, you can see the X=t and Y=t+1 pattern in the numbers.
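A quick way to check the transformation is to run `create_dataset()` on a handful of values. The sketch below repeats the function unchanged and applies it to the first eight monthly counts that appear in this post, shaped as a single-column array like the loaded dataset.

```python
import numpy

# convert an array of values into a dataset matrix (same function as above)
def create_dataset(dataset, look_back=1):
    dataX, dataY = [], []
    for i in range(len(dataset)-look_back-1):
        a = dataset[i:(i+look_back), 0]
        dataX.append(a)
        dataY.append(dataset[i + look_back, 0])
    return numpy.array(dataX), numpy.array(dataY)

# first eight monthly passenger counts (thousands), as one column
sample = numpy.array([112, 118, 132, 129, 121, 135, 148, 148], dtype='float32').reshape(-1, 1)
X, Y = create_dataset(sample, look_back=1)
for x, y in zip(X[:, 0], Y):
    print(int(x), int(y))
```

X comes back with shape (6, 1) and Y with shape (6,). Eight values could yield seven (t, t+1) pairs, but only six are produced because the loop stops one step early.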

Let's use this function to prepare the train and test datasets for modelling.

```python
...
# reshape into X=t and Y=t+1
look_back = 1
trainX, trainY = create_dataset(train, look_back)
testX, testY = create_dataset(test, look_back)
```

We can now fit a Multilayer Perceptron model to the training data.

We use a simple network with 1 input, 1 hidden layer with 8 neurons, and an output layer. The model is fit using mean squared error, which, if we take the square root, gives us an error score in the units of the dataset.

We tried a few rough parameters and settled on the configuration below, but the network is by no means optimized.

```python
...
# create and fit Multilayer Perceptron model
model = Sequential()
model.add(Dense(8, input_dim=look_back, activation='relu'))
model.add(Dense(1))
model.compile(loss='mean_squared_error', optimizer='adam')
model.fit(trainX, trainY, epochs=200, batch_size=2, verbose=2)
```
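For intuition about what `Dense(8, activation='relu')` followed by `Dense(1)` computes, here is a plain NumPy sketch of the same forward pass. It is not Keras code and does no training: the weights are random normal draws (Keras uses its own initialiser, Glorot uniform by default for Dense kernels), and it only shows the shapes involved and why this network has 25 trainable parameters.

```python
import numpy as np

rng = np.random.default_rng(0)
look_back = 1

# hidden layer: look_back inputs -> 8 ReLU units
W1 = rng.normal(size=(look_back, 8))
b1 = np.zeros(8)
# output layer: 8 units -> 1 linear output (no activation, as with Dense(1))
W2 = rng.normal(size=(8, 1))
b2 = np.zeros(1)

def forward(X):
    """X has shape (n_samples, look_back); returns predictions of shape (n_samples, 1)."""
    hidden = np.maximum(0.0, X @ W1 + b1)   # ReLU
    return hidden @ W2 + b2

# 1*8 weights + 8 biases in the hidden layer, 8*1 weights + 1 bias in the output layer
n_params = W1.size + b1.size + W2.size + b2.size
X = np.array([[112.0], [118.0], [132.0]])
print(forward(X).shape, n_params)
```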

Once the model is fit, we can estimate the performance of the model on the train and test datasets. This will give us a point of comparison for new models.

```python
...
# Estimate model performance
trainScore = model.evaluate(trainX, trainY, verbose=0)
print('Train Score: %.2f MSE (%.2f RMSE)' % (trainScore, math.sqrt(trainScore)))
testScore = model.evaluate(testX, testY, verbose=0)
print('Test Score: %.2f MSE (%.2f RMSE)' % (testScore, math.sqrt(testScore)))
```
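`model.evaluate()` returns the compiled loss, here the mean squared error, and the RMSE printed next to it is its square root. The stdlib-only sketch below computes both by hand on hypothetical predictions that are each off by exactly 5 (thousand passengers), showing that the RMSE comes back in the same units as the data.

```python
import math

def mse(y_true, y_pred):
    """Mean of the squared errors."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

y_true = [112, 118, 132, 129]
y_pred = [117, 113, 137, 124]   # hypothetical predictions, each off by 5

score = mse(y_true, y_pred)
print('%.2f MSE (%.2f RMSE)' % (score, math.sqrt(score)))   # 25.00 MSE (5.00 RMSE)
```

The same conversion applies to the scores reported later in this post: `math.sqrt(531.71)` is about 23.06.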

Finally, we can generate predictions using the model for both the train and test datasets to get a visual indication of the skill of the model.

Because of how the dataset was prepared, we must shift the predictions so that they align on the x-axis with the original dataset. Once prepared, the data is plotted as three lines: the original dataset, the predictions for the training dataset, and the predictions on the unseen test dataset. With Matplotlib's default color cycle (version 2.0 and later), these appear in blue, orange, and green respectively.

```python
...
# generate predictions for training
trainPredict = model.predict(trainX)
testPredict = model.predict(testX)
# shift train predictions for plotting
trainPredictPlot = numpy.empty_like(dataset)
trainPredictPlot[:, :] = numpy.nan
trainPredictPlot[look_back:len(trainPredict)+look_back, :] = trainPredict
# shift test predictions for plotting
testPredictPlot = numpy.empty_like(dataset)
testPredictPlot[:, :] = numpy.nan
testPredictPlot[len(trainPredict)+(look_back*2)+1:len(dataset)-1, :] = testPredict
# plot baseline and predictions
plt.plot(dataset)
plt.plot(trainPredictPlot)
plt.plot(testPredictPlot)
plt.show()
```
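The offsets in the plotting code are easy to get wrong, so here is a stand-alone check of the arithmetic. It assumes the 144-month dataset, the 67/33 split, and the `create_dataset()` behaviour described above (which yields `len(data) - look_back - 1` samples). The helper `plot_slices` is hypothetical and only computes slice bounds; no model is involved.

```python
def plot_slices(n_total, train_frac=0.67, look_back=1):
    """Return (train_slice, test_slice) into the full series for the shifted predictions."""
    train_size = int(n_total * train_frac)
    test_size = n_total - train_size
    # create_dataset() produces len(data) - look_back - 1 samples
    n_train_pred = train_size - look_back - 1
    n_test_pred = test_size - look_back - 1
    train_slice = slice(look_back, n_train_pred + look_back)
    test_slice = slice(n_train_pred + (look_back * 2) + 1, n_total - 1)
    # each slice must be exactly as long as the predictions written into it
    assert train_slice.stop - train_slice.start == n_train_pred
    assert test_slice.stop - test_slice.start == n_test_pred
    return train_slice, test_slice

for look_back in (1, 3):
    print(look_back, plot_slices(144, look_back=look_back))
```

For both window sizes, the test predictions start at index `train_size + look_back`, which is exactly where the first test target sits in the full series.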

Tying this all together, the complete example is listed below.

```python
# Multilayer Perceptron to Predict International Airline Passengers (t+1, given t)
import math
import numpy
import matplotlib.pyplot as plt
from pandas import read_csv
from keras.models import Sequential
from keras.layers import Dense

# convert an array of values into a dataset matrix
def create_dataset(dataset, look_back=1):
    dataX, dataY = [], []
    for i in range(len(dataset)-look_back-1):
        a = dataset[i:(i+look_back), 0]
        dataX.append(a)
        dataY.append(dataset[i + look_back, 0])
    return numpy.array(dataX), numpy.array(dataY)

# load the dataset (passenger counts only)
dataframe = read_csv('airline-passengers.csv', usecols=[1], engine='python')
dataset = dataframe.values
dataset = dataset.astype('float32')
# split into train and test sets
train_size = int(len(dataset) * 0.67)
test_size = len(dataset) - train_size
train, test = dataset[0:train_size,:], dataset[train_size:len(dataset),:]
# reshape into X=t and Y=t+1
look_back = 1
trainX, trainY = create_dataset(train, look_back)
testX, testY = create_dataset(test, look_back)
# create and fit Multilayer Perceptron model
model = Sequential()
model.add(Dense(8, input_dim=look_back, activation='relu'))
model.add(Dense(1))
model.compile(loss='mean_squared_error', optimizer='adam')
model.fit(trainX, trainY, epochs=200, batch_size=2, verbose=2)
# Estimate model performance
trainScore = model.evaluate(trainX, trainY, verbose=0)
print('Train Score: %.2f MSE (%.2f RMSE)' % (trainScore, math.sqrt(trainScore)))
testScore = model.evaluate(testX, testY, verbose=0)
print('Test Score: %.2f MSE (%.2f RMSE)' % (testScore, math.sqrt(testScore)))
# generate predictions for training
trainPredict = model.predict(trainX)
testPredict = model.predict(testX)
# shift train predictions for plotting
trainPredictPlot = numpy.empty_like(dataset)
trainPredictPlot[:, :] = numpy.nan
trainPredictPlot[look_back:len(trainPredict)+look_back, :] = trainPredict
# shift test predictions for plotting
testPredictPlot = numpy.empty_like(dataset)
testPredictPlot[:, :] = numpy.nan
testPredictPlot[len(trainPredict)+(look_back*2)+1:len(dataset)-1, :] = testPredict
# plot baseline and predictions
plt.plot(dataset)
plt.plot(trainPredictPlot)
plt.plot(testPredictPlot)
plt.show()
```

Running the example reports model performance.

Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and comparing the average outcome.

Taking the square root of the performance estimates, we can see that the model has an RMSE of about 23 passengers (in thousands) on the training dataset and about 48.5 passengers (in thousands) on the test dataset.

```
...
Epoch 195/200
0s - loss: 535.3075
Epoch 196/200
0s - loss: 551.2694
Epoch 197/200
0s - loss: 543.7834
Epoch 198/200
0s - loss: 538.5886
Epoch 199/200
0s - loss: 539.1434
Epoch 200/200
0s - loss: 533.8347
Train Score: 531.71 MSE (23.06 RMSE)
Test Score: 2355.06 MSE (48.53 RMSE)
```
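A useful yardstick for scores like these is the persistence forecast, which simply predicts that next month's count equals this month's. It is not computed in the original example; the stdlib-only helper `persistence_rmse` below is a hypothetical sketch, run here on the first eight monthly counts quoted in this post. Applying it to the test portion of the series would give a rough baseline for the test RMSE above.

```python
import math

def persistence_rmse(series):
    """RMSE of the naive forecast 'next value equals current value' over a list of values."""
    errors = [series[t + 1] - series[t] for t in range(len(series) - 1)]
    return math.sqrt(sum(e * e for e in errors) / len(errors))

# first eight monthly counts (thousands) from the post, used only as a toy input
sample = [112, 118, 132, 129, 121, 135, 148, 148]
print(round(persistence_rmse(sample), 2))
```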

From the plot, we can see that the model did a rather poor job of fitting both the training and the test datasets. It essentially predicted the input value as the output.

Multilayer Perceptron Using the Window Method

We can also phrase the problem so that multiple recent time steps can be used to make the prediction for the next time step.

This is called the window method, and the size of the window is a parameter that can be tuned for each problem.

For example, given the current time (t), to predict the value at the next time in the sequence (t+1), we can use the current time (t) as well as the two prior times (t-1 and t-2).

When phrased as a regression problem the input variables are t-2, t-1, t and the output variable is t+1.
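Written out in general form, with a window of k = look_back past values and the series indexed as x_0, x_1, ..., x_{n-1}, the supervised pairs built by create_dataset() are as follows (the upper limit on i reflects that its loop stops one step before the last possible pair):

$$
\mathbf{x}_i = \left(x_i,\; x_{i+1},\; \dots,\; x_{i+k-1}\right), \qquad y_i = x_{i+k}, \qquad 0 \le i \le n - k - 2
$$

Setting t = i + k - 1, the network learns a mapping $\hat{x}_{t+1} = f(x_{t-k+1}, \dots, x_t)$. With k = 3 this is $\hat{x}_{t+1} = f(x_{t-2}, x_{t-1}, x_t)$, the framing described above; with k = 1 it reduces to $\hat{x}_{t+1} = f(x_t)$, the model from the previous section.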

The create_dataset() function we wrote in the previous section allows us to create this formulation of the time series problem by increasing the look_back argument from 1 to 3.

A sample of the dataset with this formulation looks as follows:

```
X1     X2     X3     Y
112    118    132    129
118    132    129    121
132    129    121    135
129    121    135    148
121    135    148    148
```
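The same windows can also be built without an explicit loop. The sketch below is an alternative to create_dataset(), not the code used in this post: it assumes NumPy 1.20 or later for `numpy.lib.stride_tricks.sliding_window_view`, and `make_windows` is a hypothetical helper name. Applied to the first eight monthly counts, it reproduces the five rows of the sample table.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def make_windows(series, look_back):
    """Vectorised window framing: each row holds look_back inputs followed by the target."""
    windows = sliding_window_view(series, look_back + 1)   # shape (n - look_back, look_back + 1)
    return windows[:, :-1], windows[:, -1]

# first eight monthly passenger counts (thousands), as a 1-D array
sample = np.array([112, 118, 132, 129, 121, 135, 148, 148], dtype='float32')
X, y = make_windows(sample, look_back=3)
for row, target in zip(X, y):
    print(row.astype(int).tolist(), int(target))
```

This returns `len(series) - look_back` rows, one more than create_dataset() produces for the same input, because it also keeps the final (inputs, target) pair. The windows are read-only views into the original array, so call `.copy()` before modifying them.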

We can re-run the example from the previous section with the larger window size. We will increase the network capacity to handle the additional information: the first hidden layer is increased to 12 neurons and a second hidden layer is added with 8 neurons. The number of epochs is also increased to 400.
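To put the added capacity in numbers, the stdlib-only sketch below counts the trainable parameters (weights plus biases) of a stack of fully connected layers. It compares the first model (1 input, 8 hidden units, 1 output) with the window model as defined by the Dense layers in the code listing (3 inputs, hidden layers of 12 and 8 units, 1 output). The helper `dense_param_count` is hypothetical; in Keras itself, `model.summary()` reports the same totals for Dense layers.

```python
def dense_param_count(layer_sizes):
    """Trainable parameters of stacked fully connected layers: n_in * n_out weights + n_out biases each."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

print(dense_param_count([1, 8, 1]))       # 25  (first model)
print(dense_param_count([3, 12, 8, 1]))   # 161 (window model)
```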

The whole code listing, with the window size and network changes, is listed below for completeness.

```python
# Multilayer Perceptron to Predict International Airline Passengers (t+1, given t, t-1, t-2)
import math
import numpy
import matplotlib.pyplot as plt
from pandas import read_csv
from keras.models import Sequential
from keras.layers import Dense

# convert an array of values into a dataset matrix
def create_dataset(dataset, look_back=1):
    dataX, dataY = [], []
    for i in range(len(dataset)-look_back-1):
        a = dataset[i:(i+look_back), 0]
        dataX.append(a)
        dataY.append(dataset[i + look_back, 0])
    return numpy.array(dataX), numpy.array(dataY)

# load the dataset (passenger counts only)
dataframe = read_csv('airline-passengers.csv', usecols=[1], engine='python')
dataset = dataframe.values
dataset = dataset.astype('float32')
# split into train and test sets
train_size = int(len(dataset) * 0.67)
test_size = len(dataset) - train_size
train, test = dataset[0:train_size,:], dataset[train_size:len(dataset),:]
# reshape dataset
look_back = 3
trainX, trainY = create_dataset(train, look_back)
testX, testY = create_dataset(test, look_back)
# create and fit Multilayer Perceptron model
model = Sequential()
model.add(Dense(12, input_dim=look_back, activation='relu'))
model.add(Dense(8, activation='relu'))
model.add(Dense(1))
model.compile(loss='mean_squared_error', optimizer='adam')
model.fit(trainX, trainY, epochs=400, batch_size=2, verbose=2)
# Estimate model performance
trainScore = model.evaluate(trainX, trainY, verbose=0)
print('Train Score: %.2f MSE (%.2f RMSE)' % (trainScore, math.sqrt(trainScore)))
testScore = model.evaluate(testX, testY, verbose=0)
print('Test Score: %.2f MSE (%.2f RMSE)' % (testScore, math.sqrt(testScore)))
# generate predictions for training
trainPredict = model.predict(trainX)
testPredict = model.predict(testX)
# shift train predictions for plotting
trainPredictPlot = numpy.empty_like(dataset)
trainPredictPlot[:, :] = numpy.nan
trainPredictPlot[look_back:len(trainPredict)+look_back, :] = trainPredict
# shift test predictions for plotting
testPredictPlot = numpy.empty_like(dataset)
testPredictPlot[:, :] = numpy.nan
testPredictPlot[len(trainPredict)+(look_back*2)+1:len(dataset)-1, :] = testPredict
# plot baseline and predictions
plt.plot(dataset)
plt.plot(trainPredictPlot)
plt.plot(testPredictPlot)
plt.show()
```

Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and comparing the average outcome.

Running the example provides the following output.

```
Epoch 395/400
0s - loss: 485.3482
Epoch 396/400
0s - loss: 479.9485
Epoch 397/400
0s - loss: 497.2707
Epoch 398/400
0s - loss: 489.5670
Epoch 399/400
0s - loss: 490.8099
Epoch 400/400
0s - loss: 493.6544
Train Score: 564.03 MSE (23.75 RMSE)
Test Score: 2244.82 MSE (47.38 RMSE)
```

We can see that the error was not significantly reduced compared to that of the previous section: the test RMSE dropped slightly (from 48.53 to 47.38) while the training RMSE rose slightly (from 23.06 to 23.75).

Looking at the graph, we can see more structure in the predictions.

Again, the window size and the network architecture were not tuned; this is just a demonstration of how to frame a prediction problem.

Taking the square root of the performance scores, we can see the average error on the training dataset was about 24 passengers (in thousands per month) and the average error on the unseen test set was about 47 passengers (in thousands per month).

Conclusion

In this post, you discovered how to develop a neural network model for a time series prediction problem using the Keras deep learning library.

After working through this tutorial, you now know:

• About the international airline passenger prediction time series dataset.
• How to frame time series prediction problems as a regression problem and develop a neural network model.
• How to use the window method to frame a time series prediction problem and develop a neural network model.