How to begin with Deep Learning for Computer Vision (Seven day byte-sized course)

Our servers are bombarded with digital images from photos, video content, Instagram, YouTube, and more recently, live video streaming. Going ‘live’ has taken on new meaning, as the intimacy and flexibility afforded by livestreaming has been used for purposes both good and, at times, unfortunately, nefarious. 

Processing image data is difficult because it requires drawing on knowledge from diverse domains such as digital signal processing, machine learning, statistical methods, and more recently, deep learning. 

Deep learning methods are displacing traditional and statistical techniques on some challenging computer vision problems with single, simpler models. 

In this brief guide, you will discover how to get started and confidently work through deep learning for computer vision problems using Python within the span of seven days. 

Who is the target audience for this crash course? 

Prior to beginning, let’s ensure you’re in the right place. 

The list below provides some general guidelines on who this course was designed for. 

Don’t worry if you don’t match these points precisely; you might just need to brush up in one area or another to keep pace. 

You need to know: 

  • How to navigate your way around basic Python, NumPy, and Keras for deep learning. 

You do NOT need to be: 

  • A mathematical whiz! 
  • A deep learning expert! 
  • A computer vision researcher. 

This crash course assumes you have a working Python 2 or 3 SciPy environment with at least NumPy, Pandas, scikit-learn, and Keras 2 installed. 
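
If you are unsure about your environment, the short script below is a minimal sanity check (not part of the course material) that prints the version of each required library: 

# check the versions of the required libraries
import numpy
import pandas
import sklearn
import keras
print('numpy: %s' % numpy.__version__)
print('pandas: %s' % pandas.__version__)
print('scikit-learn: %s' % sklearn.__version__)
print('keras: %s' % keras.__version__)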

Crash-Course Overview 

This crash course is subdivided into seven lessons. 

You could finish one lesson each day (recommended) or finish all of the lessons in a single day (hardcore). It really depends on the time you have available and your level of enthusiasm. 

Listed here are the seven lessons that will get you started and productive with deep learning for computer vision in Python: 

  • Lesson 01: Deep Learning and Computer Vision 
  • Lesson 02: Prepping Image Data 
  • Lesson 03: Convolutional Neural Networks 
  • Lesson 04: Image Classification 
  • Lesson 05: Train Image Classification Model 
  • Lesson 06: Image Augmentation 
  • Lesson 07: Face Detection 

Each lesson could take you anywhere from sixty seconds up to half an hour. Take your time and finish the lessons at your own pace. Be sure to ask questions and even post your results in our discussion section. 

The lessons might expect you to go off and discover how to do things. We will give you hints, but part of the point of each lesson is to force you to learn where to look for help on deep learning, computer vision, and the best-of-breed tools in Python. 

Lesson 01: Deep Learning and Computer Vision 

In this part of the course, you will discover the promise of deep learning methods for computer vision. 

Computer Vision 

Computer Vision, or CV for short, is broadly defined as helping computers “see” or extract meaning from digital images such as photographs and videos. 

Researchers have been working on the problem of helping computers see for more than half a century, and some amazing accomplishments have been made, such as the face detection available in modern cameras and smartphones. 

The problem of understanding images is not solved, and may never be. This is mainly because the world is complicated and messy. There are very few rules. And yet we can easily and effortlessly recognize objects, people, and context. 

Deep Learning 

Deep learning is a subfield of machine learning concerned with algorithms inspired by the structure and function of the brain, called artificial neural networks. 

A property of deep learning is that the performance of this type of model improves by training it with more examples and by increasing its depth or representational capacity. 

On top of scalability, another often-cited benefit of deep learning models is their ability to perform automatic feature extraction from raw data, also called feature learning. 

Promise of Deep Learning for Computer Vision 

Deep learning methods are popular for computer vision, primarily because they are delivering on their promise. 

Some of the first large demonstrations of the power of deep learning were in computer vision, specifically image classification, and more recently in object detection and face recognition. 

The three key promises of deep learning for computer vision are as follows: 

  • The Promise of Feature Learning: That is, that deep learning methods can automatically learn the features from image data required by the model, rather than requiring that the feature detectors be handcrafted and specified by an expert. 
  • The Promise of Continued Improvement: That is, that the performance of deep learning in computer vision is based on real results and that the improvements appear to be continuing and perhaps speeding up. 
  • The Promise of End-to-End Models: That is, that large end-to-end deep learning models can be fit on large datasets of images or video, providing a more general and better-performing approach. 

Computer vision is not “solved,” but deep learning is required to reach the state-of-the-art on many challenging problems in the field. 

Your Task 

For this part of the course, you must research and list five impressive applications of deep learning methods in the field of computer vision. In the next lesson, you will discover how to prepare image data for modelling. 

Lesson 02: Prepping Image Data 

In this part of the course, you will discover how to prepare image data for modelling. 

Images are made up of matrices of pixel values. 

Pixel values are usually unsigned integers in the range 0 to 255. Although these pixel values can be fed directly to neural network models in their raw format, this can result in challenges during modelling, such as slower-than-expected training of the model. 

Rather, there can be great benefit in preparing the image pixel values before modelling, from merely scaling pixel values to the range 0-1, to centering and even standardizing the values. 

This is referred to as normalization and can be carried out directly on a loaded image. The example below uses the Pillow library (the standard image handling library in Python) to load an image and normalize its pixel values. 

To start with, confirm that you have the Pillow library installed. 

Next, download a photograph of Bondi Beach in Sydney, Australia, taken by Isabell Schulz and released under a permissive license. Save the image in your current working directory with the filename 'bondi_beach.jpg'. 

We can then use the Pillow library to load the photograph, confirm the min and max pixel values, normalize the values, and confirm the normalization was carried out. 

# example of pixel normalization
from numpy import asarray
from PIL import Image
# load image
image = Image.open('bondi_beach.jpg')
pixels = asarray(image)
# confirm pixel range is 0-255
print('Data Type: %s' % pixels.dtype)
print('Min: %.3f, Max: %.3f' % (pixels.min(), pixels.max()))
# convert from integers to floats
pixels = pixels.astype('float32')
# normalize to the range 0-1
pixels /= 255.0
# confirm the normalization
print('Min: %.3f, Max: %.3f' % (pixels.min(), pixels.max()))

 

Your Task 

Your task in this part of the course is to run the example code on the provided photograph and report the min and max pixel values before and after normalization. 

For bonus points, you can update the example to standardize the pixel values; one approach is sketched below. 
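
One possible approach (a minimal sketch, not the only solution) is to center the pixel values on zero and divide by their standard deviation: 

# sketch: standardize pixel values to zero mean and unit variance
from numpy import asarray
from PIL import Image
image = Image.open('bondi_beach.jpg')
pixels = asarray(image).astype('float32')
# calculate the global mean and standard deviation
mean, std = pixels.mean(), pixels.std()
# standardize the pixel values
pixels = (pixels - mean) / std
print('Mean: %.3f, Std: %.3f' % (pixels.mean(), pixels.std()))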

In the next lesson, you will discover more about convolutional neural network models. 

Lesson 03: Convolutional Neural Networks 

In this part of the course, you will discover how to construct a convolutional neural network using a convolutional layer, a pooling layer, and a fully connected output layer. 

Convolutional Layers 

A convolution is the simple application of a filter to an input that results in an activation. Repeated application of the same filter to an input produces a map of activations called a feature map, indicating the locations and strength of a detected feature in an input, such as an image. 

A convolutional layer can be created by specifying both the number of filters to learn and the fixed size of each filter, often called the kernel size. 

Pooling Layers 

Pooling layers provide a way to downsample feature maps by summarizing the presence of features in patches of the feature map. 

Maximum pooling, or max pooling, is a pooling operation that calculates the maximum, or largest, value in each patch of each feature map. 

Classifier Layer 

After the features have been extracted, they can be interpreted and used to make a prediction, such as classifying the type of object in a photograph. 

This can be achieved by first flattening the two-dimensional feature maps and then adding a fully connected output layer. For a binary classification problem, the output layer would have a single node predicting a value between 0 and 1 for the two classes. 

Convolutional Neural Network 

The example below creates a convolutional neural network that expects grayscale images with the square shape of 256×256 pixels, with one convolutional layer with 32 filters, each with the size of 3×3 pixels, a max pooling layer, and a binary classification output layer. 

 

# cnn with single convolutional, pooling and output layer
from keras.models import Sequential
from keras.layers import Conv2D
from keras.layers import MaxPooling2D
from keras.layers import Flatten
from keras.layers import Dense
# create model
model = Sequential()
# add convolutional layer
model.add(Conv2D(32, (3,3), input_shape=(256, 256, 1)))
model.add(MaxPooling2D())
model.add(Flatten())
model.add(Dense(1, activation='sigmoid'))
model.summary()

 

Your Task 

Your task in this lesson is to run the example and describe how the shape of an input image would be changed by the convolutional and pooling layers. 

For bonus points, you could try adding more convolutional or pooling layers and describe the effect on the image shape as it flows through the model; a sketch of one variation follows. 
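
As a hint, a minimal sketch of one such variation (the second layer's filter count of 64 is an arbitrary choice) stacks a second convolutional and pooling layer; the comments note the expected output shape of each layer: 

# sketch: a cnn with two convolutional and pooling layers
from keras.models import Sequential
from keras.layers import Conv2D
from keras.layers import MaxPooling2D
from keras.layers import Flatten
from keras.layers import Dense
model = Sequential()
model.add(Conv2D(32, (3,3), input_shape=(256, 256, 1)))  # -> (254, 254, 32)
model.add(MaxPooling2D())                                # -> (127, 127, 32)
model.add(Conv2D(64, (3,3)))                             # -> (125, 125, 64)
model.add(MaxPooling2D())                                # -> (62, 62, 64)
model.add(Flatten())
model.add(Dense(1, activation='sigmoid'))
model.summary()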

In the next lesson, you will learn how to use a deep convolutional network to classify photographs of objects. 

Lesson 04: Image Classification 

In this part of the course, you will discover how to use a pre-trained model to classify photographs of objects. Deep convolutional neural network models may take days, or even weeks, to train on very large datasets. 

A way to short-cut this process is to reuse the model weights from pre-trained models that were developed for standard computer vision benchmark datasets, such as the ImageNet image recognition tasks. 

The example below uses the pre-trained VGG-16 model to classify photographs of objects into one of 1,000 known classes. 

Download this photo of a dog taken by Justin Morgan and released under a permissive license. Save it in your current working directory with the name 'dog.jpg'. 

The example below will load the photo and make a prediction, classifying the object in the photo. 

The first time you run the example, the pre-trained model will have to be downloaded, which is a few hundred megabytes and may take a few minutes depending on the speed of your internet connection. 

# example of using a pre-trained model as a classifier
from keras.preprocessing.image import load_img
from keras.preprocessing.image import img_to_array
from keras.applications.vgg16 import preprocess_input
from keras.applications.vgg16 import decode_predictions
from keras.applications.vgg16 import VGG16
# load an image from file
image = load_img('dog.jpg', target_size=(224, 224))
# convert the image pixels to a numpy array
image = img_to_array(image)
# reshape data for the model
image = image.reshape((1, image.shape[0], image.shape[1], image.shape[2]))
# prepare the image for the VGG model
image = preprocess_input(image)
# load the model
model = VGG16()
# predict the probability across all output classes
yhat = model.predict(image)
# convert the probabilities to class labels
label = decode_predictions(yhat)
# retrieve the most likely result, e.g. highest probability
label = label[0][0]
# print the classification
print('%s (%.2f%%)' % (label[1], label[2]*100))

 

Your Task 

Your task in this lesson is to run the example and report the result. 

For bonus points, try running the example on another photograph of a common object. 

In the next lesson, you will discover how to fit and evaluate a model for image classification. 

Lesson 05: Train Image Classification Model 

In this part of the course, you will discover how to train and evaluate a convolutional neural network for image classification. 

The Fashion-MNIST clothing classification problem is a new standard dataset used in computer vision and deep learning. 

It is a dataset comprised of 60,000 small square 28×28 pixel grayscale images of items from 10 classes of clothing, such as shoes, t-shirts, dresses, and more. 

The example below loads the dataset, scales the pixel values, then fits a convolutional neural network on the training dataset and evaluates its performance on the test dataset. 

The example will run in just a few minutes on a modern CPU; no GPU is required. 

 

# fit a cnn on the fashion mnist dataset
from keras.datasets import fashion_mnist
from keras.utils import to_categorical
from keras.models import Sequential
from keras.layers import Conv2D
from keras.layers import MaxPooling2D
from keras.layers import Dense
from keras.layers import Flatten
# load dataset
(trainX, trainY), (testX, testY) = fashion_mnist.load_data()
# reshape dataset to have a single channel
trainX = trainX.reshape((trainX.shape[0], 28, 28, 1))
testX = testX.reshape((testX.shape[0], 28, 28, 1))
# convert from integers to floats
trainX, testX = trainX.astype('float32'), testX.astype('float32')
# normalize to range 0-1
trainX, testX = trainX / 255.0, testX / 255.0
# one hot encode target values
trainY, testY = to_categorical(trainY), to_categorical(testY)
# define model
model = Sequential()
model.add(Conv2D(32, (3, 3), activation='relu', kernel_initializer='he_uniform', input_shape=(28, 28, 1)))
model.add(MaxPooling2D())
model.add(Flatten())
model.add(Dense(100, activation='relu', kernel_initializer='he_uniform'))
model.add(Dense(10, activation='softmax'))
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
# fit model
model.fit(trainX, trainY, epochs=10, batch_size=32, verbose=2)
# evaluate model
loss, acc = model.evaluate(testX, testY, verbose=0)
print(loss, acc)

 

Your Task 

Your task in this part of the course is to run the example and report the performance of the model on the test dataset. 

For bonus points, try varying the configuration of the model, or try saving the model and later loading it to make a prediction on new grayscale photographs of clothing; a sketch of the save/load approach follows. 
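
As a hint for the save/load bonus, a minimal sketch might look like the following (the filename 'fashion_cnn.h5' is an arbitrary choice, and the code assumes it runs after the training example above): 

# sketch: save the fit model, then load it and make a prediction
from keras.models import load_model
# save the fit model to file
model.save('fashion_cnn.h5')
# later: load the model and predict the class of one test image
model = load_model('fashion_cnn.h5')
yhat = model.predict(testX[0].reshape(1, 28, 28, 1))
print('Predicted class: %d' % yhat.argmax())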

Post your findings in the discussion section. In the next lesson, you will discover how to use image augmentation on training data. 

Lesson 06: Image Augmentation 

In this part of the course, you will discover how to use image augmentation. 

Image data augmentation is a technique that can be used to artificially expand the size of a training dataset by creating modified versions of the images in the dataset. 

Training deep learning neural network models on more data can result in more skillful models, and the augmentation techniques can create variations of the images that improve the ability of the fit models to generalize what they have learned to new images. 

The Keras deep learning library provides the ability to fit models using image data augmentation via the ImageDataGenerator class. 

Download a photo of a bird taken by AndYaDontStop and released under a permissive license. Save it in your current working directory with the name 'bird.jpg'. 


The example below will load the photo as a dataset and use image augmentation to create flipped and rotated versions of the image that can be used to train a convolutional neural network model. 

 

# example using image augmentation
from numpy import expand_dims
from keras.preprocessing.image import load_img
from keras.preprocessing.image import img_to_array
from keras.preprocessing.image import ImageDataGenerator
from matplotlib import pyplot
# load the image
img = load_img('bird.jpg')
# convert to numpy array
data = img_to_array(img)
# expand dimension to one sample
samples = expand_dims(data, 0)
# create image data augmentation generator
datagen = ImageDataGenerator(horizontal_flip=True, vertical_flip=True, rotation_range=90)
# prepare iterator
it = datagen.flow(samples, batch_size=1)
# generate samples and plot
for i in range(9):
    # define subplot
    pyplot.subplot(330 + 1 + i)
    # generate batch of images
    batch = it.next()
    # convert to unsigned integers for viewing
    image = batch[0].astype('uint8')
    # plot raw pixel data
    pyplot.imshow(image)
# show the figure
pyplot.show()

 

Your Task 

Your task in this part of the course is to run the example and report the effect that the image augmentation has had on the original image. 

For bonus points, try additional types of image augmentation supported by the ImageDataGenerator class; a few options are sketched below. 
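
For example, a minimal sketch of a generator with a few more augmentation types (the specific ranges are arbitrary choices): 

# sketch: a generator with shift, zoom, and brightness augmentation
from keras.preprocessing.image import ImageDataGenerator
datagen = ImageDataGenerator(
    width_shift_range=0.2,        # shift horizontally by up to 20%
    height_shift_range=0.2,       # shift vertically by up to 20%
    zoom_range=0.2,               # zoom in or out by up to 20%
    brightness_range=[0.5, 1.5])  # darken or brighten the image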

In the next lesson, you will discover how to use a deep convolutional network to detect faces in photographs. 

Lesson 07: Face Detection 

In this part of the course, you will discover how to use a convolutional neural network for face detection. 

Face detection is a trivial problem for humans to solve, and it has been solved reasonably well by classical feature-based techniques, such as the cascade classifier. 
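
For comparison, a cascade classifier is available in the OpenCV library. The sketch below (assuming the opencv-python package is installed, and using the same 'street.jpg' photo introduced later in this lesson) shows the general approach: 

# sketch: classical face detection with an OpenCV cascade classifier
import cv2
# load the pre-trained frontal face cascade bundled with OpenCV
classifier = cv2.CascadeClassifier(cv2.data.haarcascades + 'haarcascade_frontalface_default.xml')
# load the photo and convert it to grayscale
pixels = cv2.imread('street.jpg')
gray = cv2.cvtColor(pixels, cv2.COLOR_BGR2GRAY)
# each detected face is an (x, y, width, height) bounding box
faces = classifier.detectMultiScale(gray)
print(faces)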

More recently, deep learning methods have achieved state-of-the-art results on standard face detection datasets. One example is the Multi-Task Cascaded Convolutional Neural Network, or MTCNN for short. 

The ipazc/MTCNN project provides an open source implementation of the MTCNN that can be installed as follows: 

sudo pip install mtcnn 

Download a photo of a person on the street taken by Holland and released under a permissive license. Save it in your current working directory with the name 'street.jpg'. 


The example below will load the photo, use the MTCNN model to detect faces, then plot the photo with a box drawn around the first detected face. 

# face detection with mtcnn on a photograph
from matplotlib import pyplot
from matplotlib.patches import Rectangle
from mtcnn.mtcnn import MTCNN
# load image from file
pixels = pyplot.imread('street.jpg')
# create the detector, using default weights
detector = MTCNN()
# detect faces in the image
faces = detector.detect_faces(pixels)
# plot the image
pyplot.imshow(pixels)
# get the context for drawing boxes
ax = pyplot.gca()
# get coordinates from the first face
x, y, width, height = faces[0]['box']
# create the shape
rect = Rectangle((x, y), width, height, fill=False, color='red')
# draw the box
ax.add_patch(rect)
# show the plot
pyplot.show()

 

Your Task 

Your task in this lesson is to run the example and describe the result. 

For bonus points, try the model on another photograph with several faces and update the code example to draw a box around each detected face; a sketch follows. 
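
As a hint for the bonus, a minimal sketch of the change (replacing the single-box code at the end of the example above): 

# sketch: draw a box around every detected face, not just the first
for face in faces:
    x, y, width, height = face['box']
    ax.add_patch(Rectangle((x, y), width, height, fill=False, color='red'))
# show the plot
pyplot.show()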

Conclusion 

Pat yourself on the back and congratulate yourself on how far you have come. 

You learned: 

  • What computer vision is, and the promise and impact that deep learning is having on the field. 
  • How to scale the pixel values of image data in order to make them ready for modelling. 
  • How to develop a convolutional neural network model from scratch. 
  • How to use a pre-trained model to classify photographs of objects. 
  • How to train a model from scratch to classify photographs of clothing. 
  • How to use image augmentation to create modified copies of photographs in your training dataset. 
  • How to use a pre-trained deep learning model to detect people's faces in photographs. 

This is just the start of your journey with deep learning for computer vision. Keep practicing and developing your skills. 
