Lab 02: Image Recognition¶
In this lab session, we will use the gesture classification task as an example to demonstrate how to process image data with deep learning networks. This lab session includes:

Dataset preparation
    Downloading
    Analysis and visualization
    Data augmentation

CNN model building
    From scratch
    Transfer learning

Training process
    Early stopping
    Understanding the learning curve
    Layer freezing

Set up TensorFlow¶

First, we import TensorFlow along with some libraries for image processing and utilities. Note that the function image_dataset_from_directory is needed later to build a tf.data.Dataset from the downloaded image folders.

import os

import matplotlib.pyplot as plt
import numpy as np
import tensorflow as tf

from tensorflow.keras.preprocessing import image_dataset_from_directory

# Set the seed value for experiment reproducibility.
seed = 42  # arbitrary value, chosen here only as an example
tf.random.set_seed(seed)
np.random.seed(seed)

Import the Gesture dataset¶
Download and extract the zip file containing the datasets with tf.keras.utils.get_file.

Tip: Here are more datasets available for you to try.

# Download our dataset used for training
TRAIN_SET_URL = 'https://storage.googleapis.com/learning-datasets/rps.zip'
path_to_zip = tf.keras.utils.get_file('rps.zip', origin=TRAIN_SET_URL, extract=True, cache_dir='/content')
train_dir = os.path.join(os.path.dirname(path_to_zip), 'rps')

# As well as the validation dataset
VAL_SET_URL = 'https://storage.googleapis.com/learning-datasets/rps-test-set.zip'
path_to_zip2 = tf.keras.utils.get_file('rps-test-set.zip', origin=VAL_SET_URL, extract=True, cache_dir='/content')
validation_dir = os.path.join(os.path.dirname(path_to_zip2), 'rps-test-set')

Then we can generate a tf.data.Dataset from the image files in each directory with image_dataset_from_directory.

BATCH_SIZE = 32
IMG_SIZE = (96, 96) # why? what is the original image size?

train_dataset = image_dataset_from_directory(train_dir,
                                             shuffle=True,
                                             batch_size=BATCH_SIZE,
                                             image_size=IMG_SIZE)

validation_dataset = image_dataset_from_directory(validation_dir,
                                                  shuffle=True,
                                                  batch_size=BATCH_SIZE,
                                                  image_size=IMG_SIZE)

Let’s display some images from our dataset, together with their class names.

class_names = train_dataset.class_names
num_classes = len(class_names)
print("Class names:", class_names)
print("Number of classes:", num_classes)

plt.figure(figsize=(9, 9))
for images, labels in train_dataset.take(1):
    for i in range(9):
        ax = plt.subplot(3, 3, i + 1)
        plt.imshow(images[i].numpy().astype("uint8"))
        plt.title(class_names[labels[i]])
        plt.axis("off")

Split test set and validation set¶
We now take one fifth of the validation data set to use as a test set. The validation data set is used to observe whether overfitting occurred during training, while the test data set is used for the final test after training:

val_batches = tf.data.experimental.cardinality(validation_dataset)

test_dataset = validation_dataset.take(val_batches // 5)
validation_dataset = validation_dataset.skip(val_batches // 5)

print('Number of validation batches: %d' % tf.data.experimental.cardinality(validation_dataset))
print('Number of test batches: %d' % tf.data.experimental.cardinality(test_dataset))
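
The split above works on whole batches, not on single images. The plain-Python sketch below redoes that arithmetic. It assumes, purely for illustration, that the validation directory holds 372 images (the lab text does not state the count), and `split_batches` is a hypothetical helper, not a TensorFlow function.

```python
import math


def split_batches(num_images, batch_size=32, denominator=5):
    """Return (validation_batches, test_batches) for a take/skip split."""
    # The last batch may be only partially filled, hence the ceiling.
    total_batches = math.ceil(num_images / batch_size)
    test_batches = total_batches // denominator
    return total_batches - test_batches, test_batches


# 372 is an assumed image count, used only for illustration.
val_batches, test_batches = split_batches(372)
print(val_batches, test_batches)  # 10 2
```

Because of the integer division, the test set is "one fifth" only approximately; with few batches the share can be noticeably smaller.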

Configure the dataset for performance¶
Use buffered prefetching to load images from the disk without having I/O become blocking. To learn more about this method see the data performance guide.

AUTOTUNE = tf.data.AUTOTUNE

train_dataset = train_dataset.prefetch(buffer_size=AUTOTUNE)
validation_dataset = validation_dataset.prefetch(buffer_size=AUTOTUNE)
test_dataset = test_dataset.prefetch(buffer_size=AUTOTUNE)

The First Model¶

Create a CNN model¶
Let’s define a simple convolutional neural network (CNN) model with three convolutional layers, each followed by a max pooling layer, and two dense layers at the end.

Tips: Information about parameters of Conv2D and Dense layers.

from tensorflow.keras import layers
from tensorflow.keras.models import Sequential

# Create the cnn model
IMG_SHAPE = IMG_SIZE + (3,)
model = Sequential([
    layers.InputLayer(input_shape=IMG_SHAPE),
    layers.Conv2D(16, 3, padding='same', activation='relu'),
    layers.MaxPooling2D(),
    layers.Conv2D(32, 3, padding='same', activation='relu'),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, padding='same', activation='relu'),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(64, activation='relu'),
    layers.Dense(num_classes)  # outputs raw logits, one per class
])

Compile the model¶
Compile the model before training it. Here we choose the optimizer and its learning rate, the loss function, and the metrics to display during training:

base_learning_rate = 0.001
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=base_learning_rate),
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])

model.summary()
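
To make the numbers printed by model.summary() easier to read, the following plain-Python sketch recomputes the feature map sizes and parameter counts by hand. It assumes 96x96 RGB inputs, three classes, 'same' padding and the default 2x2 max pooling; `conv2d_params` and `dense_params` are illustrative helpers, not Keras functions.

```python
def conv2d_params(kernel, in_channels, filters):
    # One kernel x kernel x in_channels weight block plus one bias per filter.
    return kernel * kernel * in_channels * filters + filters


def dense_params(inputs, units):
    # One weight per input-unit pair plus one bias per unit.
    return inputs * units + units


height, width, channels = 96, 96, 3
total = 0
for filters in (16, 32, 64):
    total += conv2d_params(3, channels, filters)
    channels = filters
    # 'same' padding keeps the size; default MaxPooling2D halves each side.
    height, width = height // 2, width // 2

flat = height * width * channels
total += dense_params(flat, 64) + dense_params(64, 3)
print((height, width, channels), flat, total)  # (12, 12, 64) 9216 613667
```

Almost all parameters sit in the first dense layer, which is one reason small CNNs like this one overfit easily.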

Train the model¶
Now we should train the model for 10 epochs and see if it works:

history = model.fit(train_dataset,
                    epochs=10,
                    validation_data=validation_dataset)

Learning curves¶
Let’s take a look at the learning curves of the training and validation accuracy/loss of our model:

# Define a function so we can reuse it later
def draw_learning_curves(history):
    acc = history.history['accuracy']
    val_acc = history.history['val_accuracy']

    loss = history.history['loss']
    val_loss = history.history['val_loss']

    plt.figure(figsize=(8, 8))
    plt.subplot(2, 1, 1)
    plt.plot(acc, label='Training Accuracy')
    plt.plot(val_acc, label='Validation Accuracy')
    plt.legend(loc='lower right')
    plt.ylabel('Accuracy')
    plt.title('Training and Validation Accuracy')

    plt.subplot(2, 1, 2)
    plt.plot(loss, label='Training Loss')
    plt.plot(val_loss, label='Validation Loss')
    plt.legend(loc='upper right')
    plt.ylabel('Cross Entropy')
    plt.title('Training and Validation Loss')
    plt.xlabel('epoch')
    plt.show()

draw_learning_curves(history)

You can also check the performance of the model against new data by using the test set:

loss, accuracy = model.evaluate(test_dataset)
print('Test accuracy :', accuracy)

Your simple CNN model achieves 100% accuracy on the training set, so it is learning! On the validation set and the test set, however, the model does not perform as well as on the training set. Why?

The model fits the training set too closely, which makes it hard for it to generalize to new examples that were not in the training set. It recognizes specific images in the training set instead of general patterns. This is called overfitting.

Improvement¶
Some strategies that can help to overcome overfitting:

Increase the size of the training set
    Add more data
    Data augmentation

Add dropout layers
Stop early to avoid overtraining
Use model architectures that generalize well

Data augmentation¶
We want to add some random flips and rotations to the input images to get a more “varied” range of inputs. To do this, we define two pre-processing layers that the inputs run through in order:

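# Newer TensorFlow releases also expose these layers as
# tf.keras.layers.RandomFlip and tf.keras.layers.RandomRotation.
data_augmentation = tf.keras.Sequential([
    tf.keras.layers.experimental.preprocessing.RandomFlip('horizontal'),
    tf.keras.layers.experimental.preprocessing.RandomRotation(0.2),
])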

You can add other layers, such as one that randomly zooms in or out.

Note: These layers are active only during training, when you call model.fit. They are inactive when the model is used in inference mode in model.evaluate or model.predict.

Let’s repeatedly apply these layers to the same image and see the result.

for image, _ in train_dataset.take(1):
    plt.figure(figsize=(10, 10))
    first_image = image[0]
    for i in range(9):
        ax = plt.subplot(3, 3, i + 1)
        augmented_image = data_augmentation(tf.expand_dims(first_image, 0))
        plt.imshow(tf.dtypes.cast(augmented_image[0], tf.uint8))
        plt.axis('off')
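
To see what a random horizontal flip does to the pixel data itself, here is a minimal plain-Python sketch on a tiny 2x3 "image". It only mimics the idea of such an augmentation; `random_horizontal_flip` is a hypothetical helper and not how the Keras layer is implemented.

```python
import random


def random_horizontal_flip(image, rng):
    """Flip a 2-D image (list of rows) left to right with probability 0.5."""
    if rng.random() < 0.5:
        return [row[::-1] for row in image]
    return [row[:] for row in image]  # unchanged copy


image = [[1, 2, 3],
         [4, 5, 6]]
rng = random.Random(0)  # fixed seed so the run is repeatable
for _ in range(3):
    print(random_horizontal_flip(image, rng))
```

Each call returns either the original rows or their mirror image, so the model sees both orientations over the course of training while the stored dataset stays unchanged.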

Embedding data augmentation and dropout in the CNN model¶

model = Sequential([
    layers.InputLayer(input_shape=IMG_SHAPE),
    data_augmentation,
    layers.Conv2D(16, 3, padding='same', activation='relu'),
    layers.MaxPooling2D(),
    layers.Conv2D(32, 3, padding='same', activation='relu'),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, padding='same', activation='relu'),
    layers.MaxPooling2D(),
    layers.Dropout(0.2),
    layers.Flatten(),
    layers.Dense(64, activation='relu'),
    layers.Dense(num_classes)
])

# Compile the model
base_learning_rate = 0.001
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=base_learning_rate),
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])

model.summary()

Model training and learning curve¶
Note: This time we train the model with the early-stop strategy. Read more about tf.keras.callbacks.EarlyStopping here.

history = model.fit(train_dataset,
                    epochs=100,
                    callbacks=[tf.keras.callbacks.EarlyStopping(verbose=1, patience=2)],
                    validation_data=validation_dataset)

draw_learning_curves(history)

loss, accuracy = model.evaluate(test_dataset)
print('Test accuracy :', accuracy)
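
The effect of patience=2 can be illustrated without TensorFlow. The sketch below is a simplified model of early stopping on the validation loss (no minimum delta, no weight restoring); `early_stopping_epoch` is a hypothetical helper and the loss values are made up.

```python
def early_stopping_epoch(val_losses, patience=2):
    """Return the zero-based index of the epoch at which training stops."""
    best = float("inf")
    wait = 0  # epochs since the last improvement
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return len(val_losses) - 1  # never triggered: all epochs run


val_losses = [1.00, 0.80, 0.70, 0.75, 0.72, 0.60]  # made-up values
print(early_stopping_epoch(val_losses))  # 4
```

In this made-up run, training stops before the later improvement to 0.60 is reached, which is the trade-off of choosing a small patience.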

Transfer learning¶
This part is adapted from a tutorial from TensorFlow. You will create the base model from the MobileNet V2 model developed at Google. This is pre-trained on the ImageNet dataset, a large dataset consisting of 1.4M images and 1000 classes. ImageNet is a research training dataset with a wide variety of categories like jackfruit and syringe. This base of knowledge will help us classify gestures from our specific dataset.

Rescale pixel values¶
In a moment, you will download tf.keras.applications.MobileNetV2 for use as your base model. This model expects pixel values in [-1, 1], but at this point, the pixel values in your images are in the range of [0, 255]. To adapt to our images we use the preprocessing method included with the model.

preprocess_input = tf.keras.applications.mobilenet_v2.preprocess_input
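
As a rough illustration of what this rescaling does to a single pixel value, the mapping from [0, 255] to [-1, 1] can be written as a one-line function. `scale_to_minus_one_one` is a hypothetical name used only for this sketch; in the lab itself you should keep using preprocess_input.

```python
def scale_to_minus_one_one(pixel):
    """Map a pixel value from [0, 255] to [-1, 1]."""
    return pixel / 127.5 - 1.0


for value in (0, 127.5, 255):
    print(value, "->", scale_to_minus_one_one(value))
```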

Create the base model from the pre-trained convnets¶
First, you need to pick which layer of MobileNet V2 you will use for feature extraction. The very last classification layer (on “top”, as most diagrams of machine learning models go from bottom to top) is not very useful. Instead, you will follow the common practice to depend on the very last layer before the flatten operation. This layer is called the “bottleneck layer”. The bottleneck layer features retain more generality as compared to the final/top layer.

Now, instantiate a MobileNet V2 model pre-loaded with weights trained on ImageNet. By specifying the include_top=False argument, you load a network that doesn’t include the classification layers at the top, which is ideal for feature extraction.

IMG_SHAPE = IMG_SIZE + (3,)
base_model = tf.keras.applications.MobileNetV2(input_shape=IMG_SHAPE,
                                               include_top=False,
                                               weights='imagenet',
                                               alpha=0.35)

Let’s see what it does to an example batch of images:

image_batch, label_batch = next(iter(train_dataset))
feature_batch = base_model(image_batch)

print(feature_batch.shape)

The output shape shows that the feature extractor converts each 96x96x3 image into a 3x3x1280 block of features.

Freeze the convolutional base¶
It is important to freeze the convolutional base before you compile and train the model. Freezing (by setting layer.trainable = False) prevents the weights in a given layer from being updated during training. MobileNet V2 has many layers, so setting the entire model’s trainable flag to False will freeze all of them.

base_model.trainable = False

Let’s take a look at the base model architecture:

base_model.summary()

Add a classification head¶
To generate predictions from the block of features, average over the 3x3 spatial locations using a tf.keras.layers.GlobalAveragePooling2D layer, which converts the features to a single 1280-element vector per image.

global_average_layer = tf.keras.layers.GlobalAveragePooling2D()
feature_batch_average = global_average_layer(feature_batch)
print(feature_batch_average.shape)
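
Global average pooling simply takes the mean of each channel over all spatial positions. The plain-Python sketch below shows this on a tiny 2x2 feature map with two channels; `global_average_pool` is an illustrative helper, not the Keras implementation.

```python
def global_average_pool(feature_map):
    """feature_map is a nested list [height][width][channels]."""
    height = len(feature_map)
    width = len(feature_map[0])
    channels = len(feature_map[0][0])
    sums = [0.0] * channels
    for row in feature_map:
        for cell in row:
            for c, value in enumerate(cell):
                sums[c] += value
    return [s / (height * width) for s in sums]


tiny = [[[1.0, 10.0], [2.0, 20.0]],
        [[3.0, 30.0], [4.0, 40.0]]]
print(global_average_pool(tiny))  # [2.5, 25.0]
```

For the lab's feature block, the same operation reduces 3x3 positions with 1280 channels to one 1280-element vector.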

Apply a tf.keras.layers.Dense layer to convert these features into one prediction per class for each image. You don’t need an activation function here because these predictions will be treated as logits, or raw prediction values. The class with the largest logit is the predicted class.

prediction_layer = tf.keras.layers.Dense(3)
prediction_batch = prediction_layer(feature_batch_average)
print(prediction_batch.shape)
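
Logits are not probabilities, but they can be turned into probabilities with a softmax, which is also what a cross-entropy loss with from_logits=True does internally. The following plain-Python sketch uses made-up logits for three classes; the helper names are hypothetical.

```python
import math


def softmax(logits):
    """Convert raw logits into probabilities that sum to 1."""
    largest = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - largest) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sparse_categorical_crossentropy(logits, label):
    """Loss for one example whose true class index is `label`."""
    return -math.log(softmax(logits)[label])


logits = [2.0, 1.0, 0.1]  # made-up values for three classes
probs = softmax(logits)
predicted = max(range(len(probs)), key=probs.__getitem__)
print(probs, predicted)
print(sparse_categorical_crossentropy(logits, label=0))
```

The predicted class is the index of the largest logit, and the loss is small when the true class receives a high probability.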

Build and Compile the stacked model¶
Build a model by chaining together the data augmentation, rescaling, base_model and classification head layers using the Keras Functional API. Call the base model with training=False because it contains BatchNormalization layers, which should stay in inference mode.

inputs = tf.keras.Input(shape=IMG_SHAPE)
x = data_augmentation(inputs)
x = preprocess_input(x)  # rescale the augmented images, not the raw inputs
x = base_model(x, training=False)
x = global_average_layer(x)
x = tf.keras.layers.Dropout(0.2)(x)
outputs = prediction_layer(x)
model = tf.keras.Model(inputs, outputs)

base_learning_rate = 0.0001
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=base_learning_rate),
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])

model.summary()

The 0.4M parameters in MobileNet are frozen, but there are 3.8K trainable parameters in the Dense layer. These are divided between two tf.Variable objects, the weights and biases.
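
The trainable parameter count can be checked by hand. This short sketch assumes a 1280-element feature vector and three output classes, as used above.

```python
features, classes = 1280, 3

kernel_params = features * classes  # one weight per feature-class pair
bias_params = classes               # one bias per class
trainable = kernel_params + bias_params
print(kernel_params, bias_params, trainable)  # 3840 3 3843
```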

Train the model¶
After training for 10 epochs, you should see ~94% accuracy on the validation set.

initial_epochs = 10

loss0, accuracy0 = model.evaluate(validation_dataset)

print("initial loss: {:.2f}".format(loss0))
print("initial accuracy: {:.2f}".format(accuracy0))

history = model.fit(train_dataset,
                    epochs=initial_epochs,
                    validation_data=validation_dataset)

draw_learning_curves(history)

Fine tuning¶
Now we also try to train some layers in the base model to improve the model’s performance.

Let’s take a look to see how many layers are in the base model:

print("Number of layers in the base model: ", len(base_model.layers))

OK, we freeze the first 100 layers and fine-tune the weights in the remaining 54 layers:

base_model.trainable = True

# Fine-tune from this layer onwards
fine_tune_at = 100

# Freeze all the layers before the `fine_tune_at` layer
for layer in base_model.layers[:fine_tune_at]:
    layer.trainable = False

model.compile(loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              optimizer=tf.keras.optimizers.RMSprop(learning_rate=base_learning_rate / 10),
              metrics=['accuracy'])

model.summary()
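
The slicing used to freeze layers can be tried out on stand-in objects. The sketch below assumes a base model with 154 layers (an assumption for illustration; check the number printed in your own run), and `FakeLayer` is a hypothetical placeholder for a Keras layer.

```python
class FakeLayer:
    """Minimal stand-in for a Keras layer with a trainable flag."""

    def __init__(self, name):
        self.name = name
        self.trainable = True


# 154 is an assumed layer count, used only for illustration.
fake_layers = [FakeLayer(f"layer_{i}") for i in range(154)]

fine_tune_at = 100
for layer in fake_layers[:fine_tune_at]:  # indices 0..99
    layer.trainable = False

frozen = sum(not layer.trainable for layer in fake_layers)
trainable = sum(layer.trainable for layer in fake_layers)
print(frozen, trainable)  # 100 54
```

Only the layers closest to the output stay trainable, because they hold the most task-specific features.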

Continue training the model:

fine_tune_epochs = 10
total_epochs = initial_epochs + fine_tune_epochs

history_fine = model.fit(train_dataset,
                         epochs=total_epochs,
                         initial_epoch=history.epoch[-1] + 1,  # resume after the last completed epoch
                         validation_data=validation_dataset)
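
When initial_epoch is set, Keras treats epochs as the index of the final epoch rather than as a count, so the number of epochs actually run is the difference between the two. The sketch below illustrates this with a hypothetical helper, `epochs_run`.

```python
def epochs_run(initial_epoch, epochs):
    """Zero-based indices of the epochs that a resumed fit would run."""
    return list(range(initial_epoch, epochs))


# Resuming at index 10 with epochs=20 runs exactly 10 more epochs.
print(len(epochs_run(10, 20)))  # 10
# Resuming at index 9 (the last completed epoch) runs 11 epochs instead.
print(len(epochs_run(9, 20)))   # 11
```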

acc = history.history['accuracy'] + history_fine.history['accuracy']
val_acc = history.history['val_accuracy'] + history_fine.history['val_accuracy']

loss = history.history['loss'] + history_fine.history['loss']
val_loss = history.history['val_loss'] + history_fine.history['val_loss']

plt.figure(figsize=(8, 8))
plt.subplot(2, 1, 1)
plt.plot(acc, label='Training Accuracy')
plt.plot(val_acc, label='Validation Accuracy')
plt.ylim([0.8, 1])
plt.plot([initial_epochs - 1, initial_epochs - 1],
         plt.ylim(), label='Start Fine Tuning')
plt.legend(loc='lower right')
plt.title('Training and Validation Accuracy')

plt.subplot(2, 1, 2)
plt.plot(loss, label='Training Loss')
plt.plot(val_loss, label='Validation Loss')
plt.ylim([0, 1.0])
plt.plot([initial_epochs - 1, initial_epochs - 1],
         plt.ylim(), label='Start Fine Tuning')
plt.legend(loc='upper right')
plt.title('Training and Validation Loss')
plt.xlabel('epoch')
plt.show()

Evaluation and prediction¶
Finally, you can verify the performance of the model on new data using the test set.

loss, accuracy = model.evaluate(test_dataset)
print('Test accuracy :', accuracy)

And now you are all set to use this model to predict the input gesture.

# Retrieve a batch of images from the test set
image_batch, label_batch = next(test_dataset.as_numpy_iterator())
predictions = model.predict_on_batch(image_batch)

# Get the predicted class index (largest logit) for each image
predictions = np.argmax(predictions, axis=1)

print('Predictions:\n', predictions)
print('Labels:\n', label_batch)

# Draw the first 9 gestures
plt.figure(figsize=(10, 10))
for i in range(9):
    ax = plt.subplot(3, 3, i + 1)
    plt.imshow(image_batch[i].astype("uint8"))
    plt.title(class_names[predictions[i]])
    plt.axis("off")

What is next?¶
Train a CNN on another dataset
    Dog and cat: https://storage.googleapis.com/mledu-datasets/cats_and_dogs_filtered.zip
    Horse and human: https://storage.googleapis.com/laurencemoroney-blog.appspot.com/horse-or-human.zip

Apply all strategies introduced in this notebook to train the CNN
You can save your well-trained gesture model and use it in the next lab

# dataset loading and preparing, try more augmentation methods

# Creating your CNN model(either from scratch or transfer learning)

# Train the CNN model

# Test the trained CNN model and

# Store it to the hard disk
model.save('/content/model_rps.h5')

from google.colab import files
files.download('/content/model_rps.h5')