How to Get Started with Deep Learning for Computer Vision (Seven-Day Byte-Sized Course)
Our servers are bombarded with digital images from photos, video content, Instagram, YouTube, and, more recently, live video streaming. Going ‘live’ has taken on new meaning, as the intimacy and flexibility afforded by livestreaming has been put to uses both good and, at times unfortunately, nefarious.
Processing image data is hard because it requires drawing on knowledge from diverse domains such as digital signal processing, machine learning, statistical methods, and, more recently, deep learning.
Deep learning methods are displacing traditional and statistical methods on some challenging computer vision problems, often with single, simpler models.
In this brief guide, you will discover how to get started and confidently apply deep learning to computer vision problems using Python within seven days.
Who is the target audience for this crash course?
Before we begin, let’s make sure you are in the right place.
The list below provides some general guidelines as to who this course was designed for.
Don’t worry if you don’t match these points exactly; you might just need to brush up in one area or another to keep up the pace.
You need to know:
- Your way around basic Python, NumPy, and Keras for deep learning.
You do NOT need to be:
- A mathematical wiz!
- A deep learning expert!
- A computer vision researcher.
This crash course assumes you have a working Python 2 or 3 SciPy environment with at least NumPy, Pandas, scikit-learn, and Keras 2 installed.
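If you need help checking your environment, a minimal version-check sketch (it simply prints the installed versions of the libraries listed above) looks like this:

```python
# check the versions of key libraries
import numpy
import pandas
import sklearn
import keras
print('numpy: %s' % numpy.__version__)
print('pandas: %s' % pandas.__version__)
print('scikit-learn: %s' % sklearn.__version__)
print('keras: %s' % keras.__version__)
```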
Crash-Course Overview
This crash course is subdivided into seven lessons.
You could complete one lesson each day (recommended) or complete all of the lessons in a single day (hardcore). It really depends on the time you have available and your level of enthusiasm.
Below are the seven lessons that will get you started and productive with deep learning for computer vision in Python:
- Lesson 01: Deep Learning and Computer Vision
- Lesson 02: Prepping Image Data
- Lesson 03: Convolutional Neural Networks
- Lesson 04: Image Classification
- Lesson 05: Train Image Classification Model
- Lesson 06: Image Augmentation
- Lesson 07: Face Detection
Each lesson could take you anywhere from 60 seconds up to 30 minutes. Take your time and complete the lessons at your own pace. Be sure to ask questions and even post your results in the discussion section.
The lessons may expect you to go off and find out how to do things. We will give you hints, but part of the point of each lesson is to force you to learn where to look for help on deep learning, computer vision, and the best-of-breed tools in Python.
Lesson 01: Deep Learning and Computer Vision
In this lesson, you will discover the promise of deep learning methods for computer vision.
Computer Vision
Computer Vision, or CV for short, is broadly defined as helping computers “see” or extract meaning from digital images such as photographs and videos.
Researchers have been working on the problem of helping computers see for more than half a century, and some great successes have been achieved, such as the face detection available in modern cameras and smartphones.
The problem of understanding images is not solved, and may never be, mainly because the world is complex and messy. There are few rules. And yet we can easily and effortlessly recognize objects, people, and context.
Deep Learning
Deep learning is a subfield of machine learning concerned with algorithms inspired by the structure and function of the brain, called artificial neural networks.
A property of deep learning is that the performance of this type of model improves when it is trained with more examples and when its depth, or representational capacity, is increased.
In addition to scalability, another often-cited benefit of deep learning models is their ability to perform automatic feature extraction from raw data, also called feature learning.
Promise of Deep Learning for Computer Vision
Deep learning methods are popular for computer vision, primarily because they are delivering on their promise.
Some of the first large demonstrations of the power of deep learning were in computer vision, specifically image classification, and more recently in object detection and face recognition.
The three key promises of deep learning for computer vision are as follows:
- The Promise of Feature Learning: That is, that deep learning methods can automatically learn the features from image data required by the model, rather than requiring that the feature detectors be handcrafted and specified by an expert.
- The Promise of Continued Improvement: That is, that the performance of deep learning in computer vision is based on real results, and that the improvements appear to be continuing and perhaps accelerating.
- The Promise of End-to-End Models: That is, that large end-to-end deep learning models can be fit on large datasets of images or video, providing a more general and better-performing approach.
Computer vision is not “solved”, but deep learning is required to reach the state of the art on many challenging problems in the field.
Your Task
Your task in this lesson is to research and list five impressive applications of deep learning methods in the field of computer vision. In the next lesson, you will discover how to prepare image data for modelling.
Lesson 02: Prepping Image Data
In this lesson, you will discover how to prepare image data for modelling.
Images are made up of matrices of pixel values.
Pixel values are usually unsigned integers in the range 0 to 255. Although these pixel values can be presented directly to neural network models in their raw format, this can result in challenges during modelling, such as slower-than-expected training.
Instead, there can be great benefit in preparing the image pixel values before modelling, from simply scaling the values to the range 0-1, to centering, and even standardizing them.
Scaling to the range 0-1 is referred to as normalization and can be performed directly on a loaded image. The example below uses the Pillow library (the standard image handling library in Python) to load an image and normalize its pixel values.
First, let’s confirm that you have the Pillow library installed.
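If not, assuming a pip-based setup (matching the install style used later in this course), it can be installed as follows:

sudo pip install Pillow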
Next, download a photograph of Bondi Beach in Sydney, Australia, taken by Isabell Schulz and released under a permissive license. Save it in your current working directory with the filename ‘bondi_beach.jpg’.
We can then use the Pillow library to load the photograph, confirm the min and max pixel values, normalize the values, and confirm that the normalization was performed.
```python
# example of pixel normalization
from numpy import asarray
from PIL import Image
# load image
image = Image.open('bondi_beach.jpg')
pixels = asarray(image)
# confirm pixel range is 0-255
print('Data Type: %s' % pixels.dtype)
print('Min: %.3f, Max: %.3f' % (pixels.min(), pixels.max()))
# convert from integers to floats
pixels = pixels.astype('float32')
# normalize to the range 0-1
pixels /= 255.0
# confirm the normalization
print('Min: %.3f, Max: %.3f' % (pixels.min(), pixels.max()))
```
Your Task
Your task in this lesson is to run the example code on the provided photograph and report the min and max pixel values before and after normalization.
For bonus points, you can update the example to standardize the pixel values; a sketch is given below.
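As a hint, a minimal sketch of standardization (centering the pixel values to zero mean and unit variance, reusing the same photograph as above) might look like this:

```python
# sketch: standardize pixel values to zero mean, unit variance
from numpy import asarray
from PIL import Image
# load image and convert to floats
image = Image.open('bondi_beach.jpg')
pixels = asarray(image).astype('float32')
# calculate the global mean and standard deviation
mean, std = pixels.mean(), pixels.std()
# standardize the pixel values
pixels = (pixels - mean) / std
# confirm the standardization
print('Mean: %.3f, Std: %.3f' % (pixels.mean(), pixels.std()))
```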
In the next lesson, you will discover convolutional neural network models.
Lesson 03: Convolutional Neural Networks
In this lesson, you will discover how to construct a convolutional neural network using a convolutional layer, a pooling layer, and a fully connected output layer.
Convolutional Layers
A convolution is the simple application of a filter to an input that results in an activation. Repeated application of the same filter to an input produces a map of activations called a feature map, indicating the locations and strength of a detected feature in an input, such as an image.
A convolutional layer can be created by specifying both the number of filters to learn and the fixed size of each filter, often called the kernel size.
Pooling Layers
Pooling layers provide an approach to downsampling feature maps by summarizing the presence of features in patches of the feature map.
Maximum pooling, or max pooling, is a pooling operation that calculates the maximum, or largest, value in each patch of each feature map.
Classifier Layer
After the features have been extracted, they can be interpreted and used to make a prediction, such as classifying the type of object in a photograph.
This can be achieved by first flattening the two-dimensional feature maps and then adding a fully connected output layer. For a binary classification problem, the output layer would have a single node that predicts a value between 0 and 1 for the two classes.
Convolutional Neural Network
The example below creates a convolutional neural network that expects grayscale images with a square size of 256×256 pixels, with one convolutional layer with 32 filters, each with a size of 3×3 pixels, a max pooling layer, and a binary classification output layer.
```python
# cnn with single convolutional, pooling and output layer
from keras.models import Sequential
from keras.layers import Conv2D
from keras.layers import MaxPooling2D
from keras.layers import Flatten
from keras.layers import Dense
# create model
model = Sequential()
# add convolutional layer
model.add(Conv2D(32, (3,3), input_shape=(256, 256, 1)))
model.add(MaxPooling2D())
model.add(Flatten())
model.add(Dense(1, activation='sigmoid'))
model.summary()
```
Your Task
Your task in this lesson is to run the example and describe how the shape of an input image would be changed by the convolutional and pooling layers.
For bonus points, you could try adding more convolutional or pooling layers and describe the effect on the image shape as it flows through the model; a sketch is given below.
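As a hint for the bonus, a minimal sketch of a two-block model might look like the following (the filter count of 64 in the second block is an illustrative assumption, not a recommendation):

```python
# sketch: two convolutional/pooling blocks before the classifier
from keras.models import Sequential
from keras.layers import Conv2D
from keras.layers import MaxPooling2D
from keras.layers import Flatten
from keras.layers import Dense
model = Sequential()
model.add(Conv2D(32, (3, 3), input_shape=(256, 256, 1)))
model.add(MaxPooling2D())
# a second block further shrinks the feature maps
model.add(Conv2D(64, (3, 3)))
model.add(MaxPooling2D())
model.add(Flatten())
model.add(Dense(1, activation='sigmoid'))
# the summary reports the output shape after each layer
model.summary()
```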
In the next lesson, you will discover how to use a deep convolutional network to classify photographs of objects.
Lesson 04: Image Classification
In this lesson, you will discover how to use a pre-trained model to classify photographs of objects. Deep convolutional neural network models may take days, or even weeks, to train on very large datasets.
A way to short-cut this process is to reuse the model weights from pre-trained models that were developed for standard computer vision benchmark datasets, such as the ImageNet image recognition tasks.
The example below uses the VGG-16 pre-trained model to classify photographs of objects into one of 1,000 known classes.
Download this photograph of a dog taken by Justin Morgan and released under a permissive license. Save it in your current working directory with the filename ‘dog.jpg’.
The example below will load the photograph and output a prediction, classifying the object in the photograph.
The first time you run the example, the pre-trained model will have to be downloaded, which is a few hundred megabytes and may take a few minutes depending on the speed of your internet connection.
```python
# example of using a pre-trained model as a classifier
from keras.preprocessing.image import load_img
from keras.preprocessing.image import img_to_array
from keras.applications.vgg16 import preprocess_input
from keras.applications.vgg16 import decode_predictions
from keras.applications.vgg16 import VGG16
# load an image from file
image = load_img('dog.jpg', target_size=(224, 224))
# convert the image pixels to a numpy array
image = img_to_array(image)
# reshape data for the model
image = image.reshape((1, image.shape[0], image.shape[1], image.shape[2]))
# prepare the image for the VGG model
image = preprocess_input(image)
# load the model
model = VGG16()
# predict the probability across all output classes
yhat = model.predict(image)
# convert the probabilities to class labels
label = decode_predictions(yhat)
# retrieve the most likely result, e.g. highest probability
label = label[0][0]
# print the classification
print('%s (%.2f%%)' % (label[1], label[2]*100))
```
Your Task
Your task in this lesson is to run the example and report the result.
For bonus points, try running the example on another photograph of a common object.
In the next lesson, you will discover how to fit and evaluate a model for image classification.
Lesson 05: Train Image Classification Model
In this lesson, you will discover how to train and evaluate a convolutional neural network for image classification.
The Fashion-MNIST clothing classification problem is a new standard dataset used in computer vision and deep learning.
It is a dataset comprising 60,000 small square 28×28 pixel grayscale images of items from 10 types of clothing, such as shoes, t-shirts, dresses, and more.
The example below loads the dataset, scales the pixel values, then fits a convolutional neural network on the training dataset and evaluates the performance of the network on the test dataset.
The example should run in just a few minutes on a modern CPU; no GPU is required.
```python
# fit a cnn on the fashion mnist dataset
from keras.datasets import fashion_mnist
from keras.utils import to_categorical
from keras.models import Sequential
from keras.layers import Conv2D
from keras.layers import MaxPooling2D
from keras.layers import Dense
from keras.layers import Flatten
# load dataset
(trainX, trainY), (testX, testY) = fashion_mnist.load_data()
# reshape dataset to have a single channel
trainX = trainX.reshape((trainX.shape[0], 28, 28, 1))
testX = testX.reshape((testX.shape[0], 28, 28, 1))
# convert from integers to floats
trainX, testX = trainX.astype('float32'), testX.astype('float32')
# normalize to range 0-1
trainX, testX = trainX / 255.0, testX / 255.0
# one hot encode target values
trainY, testY = to_categorical(trainY), to_categorical(testY)
# define model
model = Sequential()
model.add(Conv2D(32, (3, 3), activation='relu', kernel_initializer='he_uniform', input_shape=(28, 28, 1)))
model.add(MaxPooling2D())
model.add(Flatten())
model.add(Dense(100, activation='relu', kernel_initializer='he_uniform'))
model.add(Dense(10, activation='softmax'))
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
# fit model
model.fit(trainX, trainY, epochs=10, batch_size=32, verbose=2)
# evaluate model
loss, acc = model.evaluate(testX, testY, verbose=0)
print(loss, acc)
```
Your Task
Your task in this lesson is to run the example and report the performance of the model on the test dataset.
For bonus points, try varying the configuration of the model, or try saving the model and later loading it and using it to make a prediction on new grayscale photographs of clothing; a sketch of saving and loading is given below.
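As a hint for the bonus, a minimal sketch of saving and reloading the fit model (reusing `model` and `testX` from the example above; the filename ‘fashion_mnist.h5’ is an illustrative assumption) might look like this:

```python
# sketch: save the fit model, then reload it for prediction
from keras.models import load_model
# save the architecture and weights to a single file
model.save('fashion_mnist.h5')
# later: load the model and predict the class of the first test image
loaded = load_model('fashion_mnist.h5')
yhat = loaded.predict(testX[0:1])
print('Predicted class: %d' % yhat.argmax())
```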
Post your findings in the discussion section. In the next lesson, you will discover how to use image augmentation on training data.
Lesson 06: Image Augmentation
In this lesson, you will discover how to use image augmentation.
Image data augmentation is a technique that can be used to artificially expand the size of a training dataset by creating modified versions of the images in the dataset.
Training deep learning neural network models on more data can result in more skillful models, and the augmentation techniques can create variations of the images that improve the ability of the fit models to generalize what they have learned to new images.
The Keras deep learning library provides the ability to fit models using image data augmentation via the ImageDataGenerator class.
Download a photograph of a bird taken by AndYaDontStop and released under a permissive license. Save it in your current working directory with the filename ‘bird.jpg’.
The example below will load the photograph as a dataset and use image augmentation to create flipped and rotated versions of the image that can be used to train a convolutional neural network model.
```python
# example using image augmentation
from numpy import expand_dims
from keras.preprocessing.image import load_img
from keras.preprocessing.image import img_to_array
from keras.preprocessing.image import ImageDataGenerator
from matplotlib import pyplot
# load the image
img = load_img('bird.jpg')
# convert to numpy array
data = img_to_array(img)
# expand dimension to one sample
samples = expand_dims(data, 0)
# create image data augmentation generator
datagen = ImageDataGenerator(horizontal_flip=True, vertical_flip=True, rotation_range=90)
# prepare iterator
it = datagen.flow(samples, batch_size=1)
# generate samples and plot
for i in range(9):
    # define subplot
    pyplot.subplot(330 + 1 + i)
    # generate batch of images
    batch = it.next()
    # convert to unsigned integers for viewing
    image = batch[0].astype('uint8')
    # plot raw pixel data
    pyplot.imshow(image)
# show the figure
pyplot.show()
```
Your Task
Your task in this lesson is to run the example and report the effect that the image augmentation has had on the original image.
For bonus points, try additional types of image augmentation supported by the ImageDataGenerator class; a sketch of a few options is given below.
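As a hint, a minimal sketch of a generator configured with a few other supported augmentation types (the specific ranges here are illustrative assumptions) might look like this:

```python
# sketch: shift, zoom and brightness augmentation
from keras.preprocessing.image import ImageDataGenerator
# shift images by up to 20% of their width/height, zoom by up to 30%,
# and randomly darken or brighten each image
datagen = ImageDataGenerator(width_shift_range=0.2,
    height_shift_range=0.2,
    zoom_range=0.3,
    brightness_range=[0.5, 1.5])
```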
In the next lesson, you will discover how to use a deep convolutional network to detect faces in photographs.
Lesson 07: Face Detection
In this lesson, you will discover how to use a convolutional neural network for face detection.
Face detection is a trivial problem for humans to solve, and it has been solved reasonably well by classical feature-based techniques, such as the cascade classifier.
More recently, deep learning methods have achieved state-of-the-art results on standard face detection datasets. One example is the Multi-Task Cascaded Convolutional Neural Network, or MTCNN for short.
The ipazc/MTCNN project provides an open source implementation of the MTCNN that can be installed as follows:
sudo pip install mtcnn
Download a photograph of a person on the street taken by Holland and released under a permissive license. Save it in your current working directory with the filename ‘street.jpg’.
The example below will load the photograph, use the MTCNN model to detect faces, then plot the photograph and draw a box around the first detected face.
```python
# face detection with mtcnn on a photograph
from matplotlib import pyplot
from matplotlib.patches import Rectangle
from mtcnn.mtcnn import MTCNN
# load image from file
pixels = pyplot.imread('street.jpg')
# create the detector, using default weights
detector = MTCNN()
# detect faces in the image
faces = detector.detect_faces(pixels)
# plot the image
pyplot.imshow(pixels)
# get the context for drawing boxes
ax = pyplot.gca()
# get coordinates from the first face
x, y, width, height = faces[0]['box']
# create the shape
rect = Rectangle((x, y), width, height, fill=False, color='red')
# draw the box
ax.add_patch(rect)
# show the plot
pyplot.show()
```
Your Task
Your task in this lesson is to run the example and describe the result.
For bonus points, try the model on another photograph with multiple faces and update the code example to draw a box around each detected face; a sketch of the loop is given below.
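As a hint for the bonus, a minimal sketch of the loop over all detected faces (reusing the `faces` list and the `ax` drawing context from the example above) might look like this:

```python
# sketch: draw a box around every detected face
for face in faces:
    x, y, width, height = face['box']
    rect = Rectangle((x, y), width, height, fill=False, color='red')
    ax.add_patch(rect)
```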
Conclusion
Pat yourself on the back and take a moment to appreciate how far you have come.
You discovered:
- What computer vision is, and the promise and impact that deep learning is having on the field.
- How to scale the pixel values of image data in order to make them ready for modelling.
- How to develop a convolutional neural network model from scratch.
- How to use a pre-trained model to classify photographs of objects.
- How to train a model from scratch to classify photographs of clothing.
- How to use image augmentation to create modified copies of the photographs in your training dataset.
- How to use a pre-trained deep learning model to detect people’s faces in photographs.
This is just the beginning of your journey with deep learning for computer vision. Keep practicing and developing your skills.