Computer Vision

ACTL3143 & ACTL5111 Deep Learning for Actuaries

Author

Patrick Laub

Show the package imports

import random
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

import keras
from keras.models import Sequential
from keras.layers import Dense, Input
from keras.callbacks import EarlyStopping
from keras.utils import plot_model
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

Computer vision is a field of Artificial Intelligence (AI) that focuses on extracting meaningful information from visual data (images and videos). One of the primary goals of computer vision is to correctly identify and classify visual data. Convolutional Neural Networks (CNNs) are the most commonly used neural network architectures for computer vision tasks.

Images

Shapes of data

Paying attention to the shapes of data is important in CNN architectures, because CNNs have special types of layers (e.g. convolution and pooling) which require explicit specification of array dimensions.

Illustration of tensors of different rank.
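As a minimal sketch (using NumPy, with array sizes chosen purely for illustration), the rank of a tensor is the number of axes it has, which NumPy exposes as `ndim`:

```python
import numpy as np

scalar = np.array(5.0)              # rank 0: a single number
vector = np.array([1.0, 2.0, 3.0])  # rank 1: a list of numbers
matrix = np.zeros((2, 3))           # rank 2: rows and columns
image = np.zeros((16, 16, 3))       # rank 3: height, width, channels
batch = np.zeros((10, 16, 16, 3))   # rank 4: a stack of 10 images

for name, tensor in [("scalar", scalar), ("vector", vector),
                     ("matrix", matrix), ("image", image), ("batch", batch)]:
    print(f"{name}: rank {tensor.ndim}, shape {tensor.shape}")
```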

Shapes of photos

Since the position of each value in a photo (one small square) can be specified with three indices (row, column, and colour channel), we call the photo a rank 3 tensor.

How the computer sees them

from matplotlib.image import imread
img1 = imread('pu.gif'); img2 = imread('pl.gif')
img3 = imread('pr.gif'); img4 = imread('pg.bmp')
f"Shapes are: {img1.shape}, {img2.shape}, {img3.shape}, {img4.shape}."

'Shapes are: (16, 16, 3), (16, 16, 3), (16, 16, 3), (16, 16, 3).'

img1

array([[[  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0]],

       [[  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0]],

       [[  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0],
        [255, 255,   0],
        [  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0],
        [255, 255,   0],
        [  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0]],

       [[  0,   0,   0],
        [  0,   0,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [  0,   0,   0],
        [  0,   0,   0]],

       [[  0,   0,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [  0,   0,   0]],

       [[  0,   0,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [  0,   0,   0]],

       [[255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0]],

       [[255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [  0,   0,   0],
        [  0,   0,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [  0,   0,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0]],

       [[255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [  0,   0,   0],
        [  0,   0,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0]],

       [[255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0]],

       [[  0,   0,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [  0,   0,   0]],

       [[  0,   0,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [  0,   0,   0]],

       [[  0,   0,   0],
        [  0,   0,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [  0,   0,   0],
        [  0,   0,   0]],

       [[  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0]],

       [[  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0]],

       [[  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0]]], dtype=uint8)

img2

array([[[  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0]],

       [[  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0]],

       [[  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0]],

       [[  0,   0,   0],
        [  0,   0,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [  0,   0,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [  0,   0,   0],
        [  0,   0,   0]],

       [[  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [  0,   0,   0]],

       [[  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [  0,   0,   0]],

       [[  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0]],

       [[  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0]],

       [[  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0]],

       [[  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0]],

       [[  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [  0,   0,   0]],

       [[  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [  0,   0,   0]],

       [[  0,   0,   0],
        [  0,   0,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [  0,   0,   0],
        [  0,   0,   0]],

       [[  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0]],

       [[  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0]],

       [[  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0]]], dtype=uint8)

img3

array([[[  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0]],

       [[  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0]],

       [[  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0]],

       [[  0,   0,   0],
        [  0,   0,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [  0,   0,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [  0,   0,   0],
        [  0,   0,   0]],

       [[  0,   0,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0]],

       [[  0,   0,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0]],

       [[255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0]],

       [[255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0]],

       [[255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0]],

       [[255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0]],

       [[  0,   0,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0]],

       [[  0,   0,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0]],

       [[  0,   0,   0],
        [  0,   0,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [  0,   0,   0],
        [  0,   0,   0]],

       [[  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0]],

       [[  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0]],

       [[  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [255, 255,   0],
        [  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0]]], dtype=uint8)

img4

array([[[  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0],
        [255, 163, 177],
        [255, 163, 177],
        [255, 163, 177],
        [255, 163, 177],
        [255, 163, 177],
        [255, 163, 177],
        [  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0]],

       [[  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0],
        [255, 163, 177],
        [255, 163, 177],
        [255, 163, 177],
        [255, 163, 177],
        [255, 163, 177],
        [255, 163, 177],
        [255, 163, 177],
        [255, 163, 177],
        [255, 163, 177],
        [255, 163, 177],
        [  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0]],

       [[  0,   0,   0],
        [  0,   0,   0],
        [255, 163, 177],
        [255, 163, 177],
        [255, 163, 177],
        [255, 163, 177],
        [255, 163, 177],
        [255, 163, 177],
        [255, 163, 177],
        [255, 163, 177],
        [255, 163, 177],
        [255, 163, 177],
        [255, 163, 177],
        [255, 163, 177],
        [  0,   0,   0],
        [  0,   0,   0]],

       [[  0,   0,   0],
        [255, 163, 177],
        [255, 163, 177],
        [255, 163, 177],
        [255, 255, 255],
        [255, 255, 255],
        [255, 163, 177],
        [255, 163, 177],
        [255, 163, 177],
        [255, 163, 177],
        [255, 255, 255],
        [255, 255, 255],
        [255, 163, 177],
        [255, 163, 177],
        [255, 163, 177],
        [  0,   0,   0]],

       [[  0,   0,   0],
        [255, 163, 177],
        [255, 163, 177],
        [255, 255, 255],
        [255, 255, 255],
        [255, 255, 255],
        [255, 255, 255],
        [255, 163, 177],
        [255, 163, 177],
        [255, 255, 255],
        [255, 255, 255],
        [255, 255, 255],
        [255, 255, 255],
        [255, 163, 177],
        [255, 163, 177],
        [  0,   0,   0]],

       [[255, 163, 177],
        [255, 163, 177],
        [255, 163, 177],
        [255, 255, 255],
        [255, 255, 255],
        [255, 255, 255],
        [255, 255, 255],
        [255, 163, 177],
        [255, 163, 177],
        [255, 255, 255],
        [255, 255, 255],
        [255, 255, 255],
        [255, 255, 255],
        [255, 163, 177],
        [255, 163, 177],
        [255, 163, 177]],

       [[255, 163, 177],
        [255, 163, 177],
        [255, 163, 177],
        [ 51,   0, 255],
        [ 51,   0, 255],
        [255, 255, 255],
        [255, 255, 255],
        [255, 163, 177],
        [255, 163, 177],
        [ 51,   0, 255],
        [ 51,   0, 255],
        [255, 255, 255],
        [255, 255, 255],
        [255, 163, 177],
        [255, 163, 177],
        [255, 163, 177]],

       [[255, 163, 177],
        [255, 163, 177],
        [255, 163, 177],
        [ 51,   0, 255],
        [ 51,   0, 255],
        [255, 255, 255],
        [255, 255, 255],
        [255, 163, 177],
        [255, 163, 177],
        [ 51,   0, 255],
        [ 51,   0, 255],
        [255, 255, 255],
        [255, 255, 255],
        [255, 163, 177],
        [255, 163, 177],
        [255, 163, 177]],

       [[255, 163, 177],
        [255, 163, 177],
        [255, 163, 177],
        [255, 163, 177],
        [255, 255, 255],
        [255, 255, 255],
        [255, 163, 177],
        [255, 163, 177],
        [255, 163, 177],
        [255, 163, 177],
        [255, 255, 255],
        [255, 255, 255],
        [255, 163, 177],
        [255, 163, 177],
        [255, 163, 177],
        [255, 163, 177]],

       [[255, 163, 177],
        [255, 163, 177],
        [255, 163, 177],
        [255, 163, 177],
        [255, 163, 177],
        [255, 163, 177],
        [255, 163, 177],
        [255, 163, 177],
        [255, 163, 177],
        [255, 163, 177],
        [255, 163, 177],
        [255, 163, 177],
        [255, 163, 177],
        [255, 163, 177],
        [255, 163, 177],
        [255, 163, 177]],

       [[255, 163, 177],
        [255, 163, 177],
        [255, 163, 177],
        [255, 163, 177],
        [255, 163, 177],
        [255, 163, 177],
        [255, 163, 177],
        [255, 163, 177],
        [255, 163, 177],
        [255, 163, 177],
        [255, 163, 177],
        [255, 163, 177],
        [255, 163, 177],
        [255, 163, 177],
        [255, 163, 177],
        [255, 163, 177]],

       [[255, 163, 177],
        [255, 163, 177],
        [255, 163, 177],
        [255, 163, 177],
        [255, 163, 177],
        [255, 163, 177],
        [255, 163, 177],
        [255, 163, 177],
        [255, 163, 177],
        [255, 163, 177],
        [255, 163, 177],
        [255, 163, 177],
        [255, 163, 177],
        [255, 163, 177],
        [255, 163, 177],
        [255, 163, 177]],

       [[255, 163, 177],
        [255, 163, 177],
        [255, 163, 177],
        [255, 163, 177],
        [255, 163, 177],
        [255, 163, 177],
        [255, 163, 177],
        [255, 163, 177],
        [255, 163, 177],
        [255, 163, 177],
        [255, 163, 177],
        [255, 163, 177],
        [255, 163, 177],
        [255, 163, 177],
        [255, 163, 177],
        [255, 163, 177]],

       [[255, 163, 177],
        [255, 163, 177],
        [255, 163, 177],
        [  0,   0,   0],
        [255, 163, 177],
        [255, 163, 177],
        [255, 163, 177],
        [255, 163, 177],
        [255, 163, 177],
        [  0,   0,   0],
        [255, 163, 177],
        [255, 163, 177],
        [255, 163, 177],
        [255, 163, 177],
        [255, 163, 177],
        [  0,   0,   0]],

       [[255, 163, 177],
        [255, 163, 177],
        [  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0],
        [255, 163, 177],
        [255, 163, 177],
        [255, 163, 177],
        [  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0],
        [255, 163, 177],
        [255, 163, 177],
        [255, 163, 177],
        [  0,   0,   0],
        [  0,   0,   0]],

       [[255, 163, 177],
        [  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0],
        [255, 163, 177],
        [  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0],
        [255, 163, 177],
        [  0,   0,   0],
        [  0,   0,   0],
        [  0,   0,   0]]], dtype=uint8)

The code above reads four images and shows how the computer stores them. Each image is stored as a rank 3 tensor of shape (16, 16, 3): 16 pixels high, 16 pixels wide, and 3 colour channels.
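To see how the three indices work, here is a small sketch that builds a synthetic (16, 16, 3) array rather than loading the image files above. The array `img` and the pixel location are made up for illustration.

```python
import numpy as np

# A synthetic all-black 16x16 RGB image (assumption: stands in for a loaded file).
img = np.zeros((16, 16, 3), dtype=np.uint8)

# Paint the pixel in row 2, column 3 yellow (red and green on, blue off).
img[2, 3] = [255, 255, 0]

pixel = img[2, 3]              # all three channel values of one pixel
red_value = img[2, 3, 0]       # a single number: row 2, column 3, channel 0
green_channel = img[:, :, 1]   # one full channel, a rank 2 tensor

print(pixel)                # [255 255   0]
print(red_value)            # 255
print(green_channel.shape)  # (16, 16)
```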

How we see them

from matplotlib.pyplot import imshow

imshow(img1);

imshow(img2);

imshow(img3);

imshow(img4);

Why is 255 special?

Each pixel’s colour intensity is stored in one byte.

One byte is 8 bits, so in binary that is 00000000 to 11111111.

The largest unsigned number this can be is 2^8-1 = 255.

np.array([0, 1, 255, 256]).astype(np.uint8)

array([  0,   1, 255,   0], dtype=uint8)

If you had signed numbers, this would go from -128 to 127.

np.array([-128, 1, 127, 128]).astype(np.int8)

array([-128,    1,  127, -128], dtype=int8)

Alternatively, hexadecimal numbers are used. E.g. 10100001 is split into 1010 0001, and 1010=A, 0001=1, so combined it is 0xA1.
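The binary-to-hexadecimal conversion can be checked directly in Python. This is just a sketch using built-in formatting; the yellow colour code at the end is an added illustration.

```python
value = 0b10100001             # the byte from the example, written in binary

print(value)                   # 161
print(hex(value))              # 0xa1
print(f"{value:08b}")          # 10100001

# Split the byte into its two 4-bit halves.
high, low = value >> 4, value & 0x0F
print(high, low)               # 10 1
print(f"{high:X}{low:X}")      # A1

# One byte per channel means an RGB colour fits in six hex digits.
r, g, b = 255, 255, 0          # yellow
print(f"#{r:02X}{g:02X}{b:02X}")  # #FFFF00
```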

Image editing with kernels

Take a look at https://setosa.io/ev/image-kernels/.

An example of an image kernel in action.
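A rough sketch of what an image kernel does is given below, using plain NumPy loops. The helper `apply_kernel`, the tiny test image, and the choice of a vertical-edge kernel are all illustrative assumptions, not taken from the linked page. As is common in deep learning, the kernel is not flipped before being applied.

```python
import numpy as np

def apply_kernel(image, kernel):
    """Slide `kernel` over a 2D `image`, keeping only positions where it fits entirely."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Multiply the patch by the kernel elementwise, then add everything up.
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# A 4x5 image: dark on the left (0), bright on the right (1).
image = np.array([[0, 0, 1, 1, 1]] * 4)

# A kernel that responds to dark-to-bright changes from left to right.
kernel = np.array([[-1, 0, 1]] * 3)

edges = apply_kernel(image, kernel)
print(edges)
# [[3. 3. 0.]
#  [3. 3. 0.]]
```

The output is large where the patch straddles the dark-to-bright boundary and zero where the patch is uniformly bright.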

Convolutional Layers

‘Convolution’ not ‘complicated’

Say X_1, X_2 \sim f_X are i.i.d., and we look at S = X_1 + X_2.

The density for S is then

f_S(s) = \int_{x_1=-\infty}^{\infty} f_X(x_1) \, f_X(s-x_1) \,\mathrm{d}x_1 .

This is the convolution operation, f_S = f_X \star f_X.
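The discrete analogue replaces the integral with a sum. As an illustrative sketch (the fair dice are an assumed example, not from the lecture), the distribution of the sum of two independent dice is the convolution of their probability mass functions, which `np.convolve` computes:

```python
import numpy as np

# P(X = k) for a fair six-sided die, k = 1, ..., 6.
pmf_die = np.full(6, 1 / 6)

# P(S = s) for S = X_1 + X_2, s = 2, ..., 12 (index 0 corresponds to s = 2).
pmf_sum = np.convolve(pmf_die, pmf_die)

for s, p in zip(range(2, 13), pmf_sum):
    print(f"P(S = {s:2d}) = {p:.4f}")
```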

Images are rank 3 tensors

Height, width, and number of channels.

An image can be represented using a rank 3 tensor, since it has 3 dimensions: height, width, and number of channels. The number of channels is also known as the ‘depth’. The left-hand side of the picture shown below is a tensor with height = 5, width = 5 and depth = 3.

A grayscale image has 1 channel. An RGB image has 3 channels.

Each colour can be represented as a combination of three primary colours: red, green and blue.

Example: Yellow = Red + Green.

Example: Detecting yellow

Suppose we wish to detect if a picture has yellow colour in it. One option would be to apply a neuron over each pixel and see if it detects the colour yellow. We know that each pixel is represented by 3 numerical values that correspond to red, green and blue. Higher numeric values for red and green indicate higher chances of detecting yellow. Higher values for blue indicate lower chances of detecting yellow. Utilising this information, we can assign RGB weights to be 1, 1, -1 respectively.

Next, each pixel's three values are multiplied by the corresponding weights, and the weighted sum is passed through the neuron.

Apply a neuron to each pixel in the image.

If red/green \nearrow or blue \searrow then yellowness \nearrow.

Set RGB weights to 1, 1, -1.
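As a small sketch of this weighting, the weighted sum can be computed for a few hand-picked pixels. The pixel colours are chosen for illustration, and no bias or activation function is applied.

```python
import numpy as np

weights = np.array([1, 1, -1])  # weights for (red, green, blue)

pixels = {
    "yellow": [255, 255, 0],
    "white": [255, 255, 255],
    "blue": [0, 0, 255],
    "black": [0, 0, 0],
}

# Weighted sum of each pixel's RGB values.
scores = {name: int(np.dot(rgb, weights)) for name, rgb in pixels.items()}
print(scores)
# {'yellow': 510, 'white': 255, 'blue': -255, 'black': 0}
```

Yellow gets the highest score. White also contains red and green, but its blue component pulls the score down.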

Example: Detecting yellow II

Scan the 3-channel input (colour image) with the neuron to produce a 1-channel output (grayscale image).

The output is produced by sweeping the neuron over the input. This is called convolution.
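A minimal sketch of this sweep, assuming a made-up 4x4 image and the weights 1, 1, -1 for red, green and blue: applying the same weights at every pixel turns the (4, 4, 3) input into a (4, 4) output.

```python
import numpy as np

# A synthetic 4x4 RGB image: a yellow square in the middle, one blue corner.
img = np.zeros((4, 4, 3), dtype=np.uint8)
img[1:3, 1:3] = [255, 255, 0]   # yellow
img[0, 0] = [0, 0, 255]         # blue

weights = np.array([1, 1, -1])  # weights for (red, green, blue)

# Convert to a wider integer type first so sums above 255 do not wrap around,
# then take the weighted sum over the channel axis at every pixel.
yellowness = img.astype(int) @ weights

print(yellowness.shape)  # (4, 4)
print(yellowness)
```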

Example: Detecting yellow III

The following picture demonstrates how yellow areas in the colour picture become white in the grayscale picture. This is a result of the way we assigned the weights. Since we assigned weights of +1 to red and green, and -1 to blue, the weighted sum is large and positive for pixels in the yellow areas. Large positive values correspond to white in the grayscale picture, so the areas which were yellow in the colour picture appear white. In practice, we do not assign the weights manually; instead, we let the neural network learn suitable weights during training.

The more yellow the pixel in the colour image (left), the more white it is in the grayscale image.

The neuron or its weights is called a filter. We convolve the image with a filter, i.e. a convolutional filter.

Terminology

The same neuron is used to sweep over the image, so we can store the weights in some shared memory and process the pixels in parallel. We say that the neurons are weight sharing.
In the previous example, the neuron only takes one pixel as input. Usually a larger filter containing a block of weights is used to process not only a pixel but also its neighboring pixels all at once.
The weights are called the filter kernels.
The cluster of pixels that forms the input of a filter is called its footprint.

Spatial filter

When a filter’s footprint is > 1 pixel, it is a spatial filter.

The above spatial filter is a 3x3 filter. Hence, there are 9 weights to learn.

Multidimensional convolution

In a multidimensional filter, the number of channels of the input must be equal to the number of channels in the filter (depths must be the same).

Need \# \text{ Channels in Input} = \# \text{ Channels in Filter}.

Example: a 3x3 filter with 3 channels, containing 27 weights.

Example: 3x3 filter over RGB input

Each channel is multiplied separately & then added together.

The above figure shows how we pick a 3x3x3 block from the image and then apply the 3x3 filter. The multiplication is carried out channel-wise, i.e. we take the first channel of the filter and the first channel of the image block and multiply them element-wise, and likewise for the other channels. Once the element-wise multiplications for the three pairs of channels are complete, we sum all the products and pass the result through the neuron.
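The channel-wise calculation can be sketched as follows, using a random 3x3x3 block and random filter weights (both made up for illustration). The final step of passing the sum through the neuron's activation is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)
block = rng.integers(0, 256, size=(3, 3, 3)).astype(float)  # 3x3 block, 3 channels
kernel = rng.normal(size=(3, 3, 3))                         # 3x3 weights per channel

# Multiply each channel of the block by the matching channel of the filter...
per_channel = [np.sum(block[:, :, c] * kernel[:, :, c]) for c in range(3)]
# ...then add the three channel results together.
weighted_sum = sum(per_channel)

# The same number comes from one weighted sum over all 27 weights.
assert np.isclose(weighted_sum, np.sum(block * kernel))
print(kernel.size)  # 27
```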

Input-output relationship

Matching the original image footprints against the output location.

The above figure shows how 9 inputs are transformed into one output. As a result, the dimensions of the output matrix are smaller than those of the input matrix. There are some options we can use if we wish to keep the input and output matrices the same size.

Convolutional Layer Options

Padding

What happens when filters go off the edge of the input?

The question is how to avoid the filter's receptive field falling off the side of the input.
If we only scan the filter over places of the input where the filter fits perfectly, the output shrinks and information near the edges is lost, especially after many successive convolutions.

Padding

Add a border of extra elements around the input, called padding. Normally we place zeros in all the new elements, called zero padding.

Padded values can be added to the outside of the input.
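A small sketch of zero padding with NumPy's np.pad is shown below. The 4x4 input is made up, and the output sizes assume a 3x3 filter moved one pixel at a time.

```python
import numpy as np

image = np.arange(1, 17, dtype=float).reshape(4, 4)  # illustrative 4x4 input

# One ring of zeros lets a 3x3 filter be centred on every original pixel.
padded = np.pad(image, pad_width=1, mode="constant", constant_values=0)

kernel_size = 3
valid_size = image.shape[0] - kernel_size + 1   # output size without padding
same_size = padded.shape[0] - kernel_size + 1   # output size with padding
print(padded.shape, valid_size, same_size)  # (6, 6) 2 4
```

With the padding in place, the output has the same 4x4 size as the original input.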

Convolution layer

Multiple filters are bundled together in one layer.
The filters are applied simultaneously and independently to the input.
Filters can have different footprints, but in practice we almost always use the same footprint for every filter in a convolution layer.
Number of channels in the output will be the same as the number of filters.

The motivation behind applying filters simultaneously and independently is to let the filters learn different patterns in the input-output relationship. The idea is quite similar to using many neurons in one Dense layer (in a Dense layer, we would use multiple neurons so that different neurons can capture different patterns in the input-output relationship).

Example

In the image:

6-channel input tensor
input pixels
four 3x3 filters
four output tensors
final output tensor.

Example network highlighting that the number of output channels equals the number of filters.

The above picture shows how we take in an image with 6 channels, select a 3x3 block (in pink), apply 4 different filters of the same dimensions (in pink, green, blue and yellow), obtain a 1-channel output from each filter and finally stack these together to create one output tensor. Note that the number of channels in the output tensor is 4, which is equal to the number of spatial filters used.
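The same idea can be sketched in NumPy. The 8x8 spatial size and the random values are assumptions for illustration, and biases, activations and padding are left out. The weights are stored as one array of shape (3, 3, 6, 4), i.e. four 3x3 filters with 6 channels each.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=(8, 8, 6))           # illustrative 8x8 input with 6 channels
filters = rng.normal(size=(3, 3, 6, 4))  # four 3x3 filters, each with 6 channels

out_h, out_w = x.shape[0] - 2, x.shape[1] - 2
output = np.zeros((out_h, out_w, filters.shape[-1]))
for i in range(out_h):
    for j in range(out_w):
        block = x[i:i + 3, j:j + 3, :]   # 3x3x6 footprint
        # Each filter gives one number; together they fill the 4 output channels.
        output[i, j, :] = np.tensordot(block, filters, axes=3)
print(output.shape)  # (6, 6, 4)
```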

1x1 convolution

Feature reduction: Reduce the number of channels in the input tensor (removing correlated features) by using fewer filters than the number of channels in the input. This works because the number of channels in the output is always the same as the number of filters.
1x1 convolution: Convolution using 1x1 filters.
When the channels are correlated, 1x1 convolution can be very effective at reducing the number of channels with little loss of information.

Example of 1x1 convolution

Input tensor contains 300 channels.
Use 175 1x1 filters in the convolution layer (300 weights each).
Each filter produces a 1-channel output.
Final output tensor has 175 channels.
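Since a 1x1 filter only looks at one pixel, this example amounts to a matrix multiplication over the channel axis. The sketch below assumes a 10x10 spatial size and random values, and ignores biases and activations.

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(size=(10, 10, 300))     # illustrative 10x10 input, 300 channels
weights = rng.normal(size=(300, 175))  # 175 filters of size 1x1, 300 weights each

# A 1x1 convolution mixes the channels at each pixel independently.
y = x @ weights
print(y.shape)       # (10, 10, 175)
print(weights.size)  # 52500 weights (biases not counted)
```

The spatial size is unchanged; only the number of channels drops from 300 to 175.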

Striding

The striding option lets us modify how the filter moves across the image. Instead of moving one step at a time (either horizontally or vertically), we can increase the step size using the stride.

We don’t have to go one pixel across/down at a time.

Example: Use a stride of three horizontally and two vertically.

Dimension of output will be smaller than input.

Choosing strides

When a filter scans the input step by step, it processes the same input elements multiple times. Even with larger strides, this can still happen (left image).

If we want to save time, we can choose strides that prevent input elements from being used more than once. Example (right image): 3x3 filter, stride 3 in both directions.
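The effect of the stride on the output size can be checked with a small helper. This is a sketch: the function name output_size and the 11x9 input are made up for illustration.

```python
def output_size(input_size, filter_size, stride=1, padding=0):
    """Number of positions a filter can take along one axis."""
    return (input_size + 2 * padding - filter_size) // stride + 1

# Illustrative input that is 11 wide and 9 high, scanned by a 3x3 filter.
width = output_size(11, 3, stride=3)   # stride of three horizontally
height = output_size(9, 3, stride=2)   # stride of two vertically
print(height, width)  # 4 3

# A 3x3 filter with stride 3 tiles a 9-pixel axis without overlap.
print(output_size(9, 3, stride=3))  # 3
```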

Specifying a convolutional layer

Need to choose:

number of filters,
their footprints (e.g. 3x3, 5x5, etc.),
activation functions,
padding & striding (optional).

All the filter weights are learned during training.

Convolutional Neural Networks

Definition of CNN

A neural network that uses convolution layers is called a convolutional neural network.

Architecture

A standard CNN architecture has the following components: an input layer, a sequence of feature extraction layers (which alternate convolution and pooling operations), a sequence of classification layers (flattening followed by fully connected layers) and a final output layer. Convolution layers extract meaningful patterns from the input using spatial filters. Pooling layers reduce the spatial dimensions of the feature maps produced by the convolution layers. The purpose of the feature extraction layers is to learn complex but meaningful high-level patterns in the data. The classification layers take these learned patterns and turn them into decisions for the classification task at hand.

Architecture #2

At a high level, the idea is to keep increasing the number of channels (depth) while decreasing the spatial dimensions of the feature maps. We can see how the depth increases and the spatial dimensions shrink from the first convolution layer to the second pooling layer.

Pooling

Pooling, or downsampling, is a technique to blur a tensor.

(a): Input tensor (b): Subdivide input tensor into 2x2 blocks (c): Average pooling (d): Max pooling (e): Icon for a pooling layer

Pooling for multiple channels

Input tensor: 6x6 with 1 channel, zero padding.
Convolution layer: Three 3x3 filters.
Convolution layer output: 6x6 with 3 channels.
Pooling layer: apply max pooling to each channel.
Pooling layer output: 3x3, 3 channels.
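A minimal NumPy sketch of 2x2 pooling applied to each channel separately is given below. The helper name pool2x2 and the input values are hypothetical, and the height and width are assumed to be even.

```python
import numpy as np

def pool2x2(x, reduce):
    """Apply `reduce` (e.g. np.max or np.mean) to non-overlapping 2x2 blocks."""
    h, w, c = x.shape
    blocks = x.reshape(h // 2, 2, w // 2, 2, c)
    return reduce(blocks, axis=(1, 3))

x = np.arange(6 * 6 * 3, dtype=float).reshape(6, 6, 3)  # illustrative 6x6, 3 channels
max_pooled = pool2x2(x, np.max)    # max pooling
avg_pooled = pool2x2(x, np.mean)   # average pooling
print(max_pooled.shape)  # (3, 3, 3)
```

The number of channels is untouched; only the height and width are halved.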

Why/why not use pooling?

Why? Pooling reduces the size of tensors, therefore reduces memory usage and execution time (recall that 1x1 convolution reduces the number of channels in a tensor).

Why not?

What do the CNN layers learn?

Demo: Character Recognition

MNIST Dataset

Mandarin Characters Dataset

57 poorly written Mandarin characters, each written 7 times (57 \times 7 = 399 images).

Dataset of notes when learning/practising basic characters.

Downloading the dataset

The data is zipped (6.9 MB) and stored on my GitHub homepage.

# Download the dataset if it hasn't already been downloaded.
from pathlib import Path
if not Path("mandarin").exists():
  print("Downloading dataset...")
  !wget https://laub.au/data/mandarin.zip
  !unzip mandarin.zip
else:
  print("Already downloaded.")

Already downloaded.
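The cell above relies on the notebook shell commands wget and unzip. Where those are not available, a similar result could be obtained with the Python standard library alone. The helper fetch_and_unzip below is a hypothetical sketch, not part of the original notebook.

```python
from pathlib import Path
from urllib.request import urlretrieve
from zipfile import ZipFile

def fetch_and_unzip(url, zip_path, extract_to="."):
    """Download `url` to `zip_path` (if missing) and extract it into `extract_to`."""
    zip_path = Path(zip_path)
    if not zip_path.exists():
        urlretrieve(url, zip_path)
    with ZipFile(zip_path) as archive:
        archive.extractall(extract_to)

# Usage (not run here), mirroring the cell above:
# if not Path("mandarin").exists():
#     fetch_and_unzip("https://laub.au/data/mandarin.zip", "mandarin.zip")
```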

Tip

Remember, the Jupyter notebook associated with your final report should either download your dataset when it is run, or you should supply the data separately.

Directory structure

Inspect directory structure

!pip install directory_tree

from directory_tree import display_tree
display_tree("mandarin")

tree = display_tree("mandarin", string_rep=True).split("\n")
print("\n".join(tree[:12]))
print("...")
print("\n".join(tree[-4:]))

mandarin/
├── bai/
│   ├── bai-1.png
│   ├── bai-2.png
│   ├── bai-3.png
│   ├── bai-4.png
│   ├── bai-5.png
│   ├── bai-6.png
│   └── bai-7.png
├── ben/
│   ├── ben-1.png
│   ├── ben-2.png
...
    ├── zhuo-5.png
    ├── zhuo-6.png
    └── zhuo-7.png

Splitting into train/val/test sets

!pip install split-folders

import splitfolders
splitfolders.ratio("mandarin", output="mandarin-split",
    seed=1337, ratio=(5/7, 1/7, 1/7))

display_tree("mandarin-split", max_depth=1)

Copying files: 399 files [00:00, 1944.80 files/s]

mandarin-split/
├── test/
├── train/
└── val/
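The ratio (5/7, 1/7, 1/7) was chosen to match the 7 images per character. As a quick sanity check, and assuming each class folder is split separately, the expected number of images in each set is:

```python
images_per_class = 7
num_classes = 57
ratio = (5/7, 1/7, 1/7)  # train, val, test

per_class = [round(images_per_class * r) for r in ratio]
totals = [n * num_classes for n in per_class]
print(per_class)  # [5, 1, 1]
print(totals)     # [285, 57, 57]
```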

Directory structure II

display_tree("mandarin-split")

train/
├── bai/
│   ├── bai-1.png
│   ├── bai-2.png
│   ├── bai-3.png
│   ├── bai-4.png
│   └── bai-6.png
...
val/
├── bai/
│   └── bai-7.png
├── ben/
│   └── ben-7.png
...
test/
├── bai/
│   └── bai-5.png
├── ben/
│   └── ben-5.png
...

Keras image dataset loading

from keras.utils import image_dataset_from_directory  # (1)

data_dir = "mandarin-split"          # (2)
batch_size = 32                      # (3)
img_height = 80                      # (4)
img_width = 80                       # (5)
img_size = (img_height, img_width)   # (6)

train_ds = image_dataset_from_directory(  # (7)
    data_dir + "/train",
    image_size=img_size,
    batch_size=batch_size,
    shuffle=False,
    color_mode='grayscale')

val_ds = image_dataset_from_directory(    # (8)
    data_dir + "/val",
    image_size=img_size,
    batch_size=batch_size,
    shuffle=False,
    color_mode='grayscale')

test_ds = image_dataset_from_directory(   # (9)
    data_dir + "/test",
    image_size=img_size,
    batch_size=batch_size,
    shuffle=False,
    color_mode='grayscale')

1: Imports the image_dataset_from_directory function from keras.utils
2: Specifies the name of the folder containing the split dataset
3: Specifies the batch size, i.e. the number of images grouped together in each batch
4: Specifies the height (in pixels) that each image is resized to
5: Specifies the width (in pixels) that each image is resized to
6: Combines the height and width into the image size
7: Creates a dataset object for the training set. Note that color_mode='grayscale' tells Keras to load the images with a single greyscale channel instead of three RGB channels, and shuffle=False keeps the images in a fixed order.
8: Creates a dataset object for the validation set
9: Creates a dataset object for the test set

Found 285 files belonging to 57 classes.
Found 57 files belonging to 57 classes.
Found 57 files belonging to 57 classes.

Inspecting the datasets

print(train_ds.class_names)

['bai', 'ben', 'chong', 'chu', 'chuan', 'cong', 'da', 'dan', 'dong', 'fei', 'fu', 'fu2', 'gao', 'gong', 'guo', 'hu', 'huo', 'kou', 'ku', 'lin', 'ma', 'ma2', 'ma3', 'mei', 'men', 'ming', 'mu', 'nan', 'niao', 'niu', 'nu', 'nuan', 'peng', 'quan', 'ren', 'ri', 'rou', 'sen', 'shan', 'shan2', 'shui', 'tai', 'tian', 'wang', 'wen', 'xian', 'xuan', 'yan', 'yang', 'yin', 'yu', 'yu2', 'yue', 'zhong', 'zhu', 'zhu2', 'zhuo']

# NB: Need shuffle=False earlier for these X & y to line up.
X_train = np.concatenate(list(train_ds.map(lambda x, y: x)))
y_train = np.concatenate(list(train_ds.map(lambda x, y: y)))

X_val = np.concatenate(list(val_ds.map(lambda x, y: x)))
y_val = np.concatenate(list(val_ds.map(lambda x, y: y)))

X_test = np.concatenate(list(test_ds.map(lambda x, y: x)))
y_test = np.concatenate(list(test_ds.map(lambda x, y: y)))

X_train.shape, y_train.shape, X_val.shape, y_val.shape, X_test.shape, y_test.shape

((285, 80, 80, 1), (285,), (57, 80, 80, 1), (57,), (57, 80, 80, 1), (57,))

Plotting some characters (setup)

def plot_mandarin_characters(ds, plot_char_label=0):
    """Plot the images in `ds` whose label equals `plot_char_label`."""
    num_plotted = 0
    for images, labels in ds:
        for i in range(images.shape[0]):
            label = labels[i]
            if label == plot_char_label:
                # One row with up to five panels (five training images per class).
                plt.subplot(1, 5, num_plotted + 1)
                plt.imshow(images[i].numpy().astype("uint8"), cmap="gray")
                plt.title(ds.class_names[label])
                plt.axis("off")
                num_plotted += 1
    plt.show()

Plotting some training characters

Plotting some val/test characters

bai_val = X_val[y_val == 0][0]
plt.imshow(bai_val, cmap="gray");

bai_test = X_test[y_test == 0][0]
plt.imshow(bai_test);

Demo: Character Recognition II

Make the CNN

from keras.layers import Rescaling, Conv2D, MaxPooling2D, Flatten  # (1)

num_classes = np.unique(y_train).shape[0]  # (2)
random.seed(123)

model = Sequential([
  Input((img_height, img_width, 1)),
  Rescaling(1./255),  # (3)
  Conv2D(16, 3, padding="same", activation="relu", name="conv1"),  # (4)
  MaxPooling2D(name="pool1"),  # (5)
  Conv2D(32, 3, padding="same", activation="relu", name="conv2"),
  MaxPooling2D(name="pool2"),
  Conv2D(64, 3, padding="same", activation="relu", name="conv3"),
  MaxPooling2D(name="pool3"),
  Flatten(), Dense(128, activation="relu"), Dense(num_classes)  # (6)
])

1: Imports the CNN-specific layers (and the Rescaling preprocessing layer) from keras.layers
2: Counts the number of unique categories in the training set
3: Rescales the pixel intensities from the range [0, 255] into the range [0, 1]
4: Applies a convolution layer with 16 filters of size 3x3. Here padding="same" ensures that the height and width of the output match those of the input
5: Applies max pooling, which reduces the spatial dimensions by carrying forward the maximum value over each input window
6: Applies the Flatten layer to convert the 3D tensor (from the last pooling layer) into a single vector, and passes it through a couple of Dense layers to train the neural network for the specific classification problem. Note that the output layer has a number of neurons equal to num_classes, which corresponds to the number of unique classes in the output.

Tip

The Rescaling layer will rescale the intensities to [0, 1].

Inspect the model

model.summary()

Model: "sequential"

┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Layer (type)                    ┃ Output Shape           ┃       Param # ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ rescaling (Rescaling)           │ (None, 80, 80, 1)      │             0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ conv1 (Conv2D)                  │ (None, 80, 80, 16)     │           160 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ pool1 (MaxPooling2D)            │ (None, 40, 40, 16)     │             0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ conv2 (Conv2D)                  │ (None, 40, 40, 32)     │         4,640 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ pool2 (MaxPooling2D)            │ (None, 20, 20, 32)     │             0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ conv3 (Conv2D)                  │ (None, 20, 20, 64)     │        18,496 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ pool3 (MaxPooling2D)            │ (None, 10, 10, 64)     │             0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ flatten (Flatten)               │ (None, 6400)           │             0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dense (Dense)                   │ (None, 128)            │       819,328 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dense_1 (Dense)                 │ (None, 57)             │         7,353 │
└─────────────────────────────────┴────────────────────────┴───────────────┘

 Total params: 849,977 (3.24 MB)

 Trainable params: 849,977 (3.24 MB)

 Non-trainable params: 0 (0.00 B)
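The parameter counts in the summary can be reproduced by hand. The sketch below assumes the usual formulas (one weight per kernel entry per input channel per filter, plus one bias per filter or neuron); the helper names are hypothetical.

```python
def conv2d_params(kernel_size, in_channels, filters):
    """Weights of a square-kernel Conv2D layer plus one bias per filter."""
    return kernel_size * kernel_size * in_channels * filters + filters


def dense_params(inputs, units):
    """Weights of a Dense layer plus one bias per neuron."""
    return inputs * units + units


counts = {
    "conv1": conv2d_params(3, 1, 16),         # 1 greyscale channel in
    "conv2": conv2d_params(3, 16, 32),
    "conv3": conv2d_params(3, 32, 64),
    "dense": dense_params(10 * 10 * 64, 128),  # 6400 flattened inputs
    "dense_1": dense_params(128, 57),          # 57 classes
}
total = sum(counts.values())

for name, n in counts.items():
    print(f"{name:8s} {n:>8,}")
print(f"{'total':8s} {total:>8,}")
```

Almost all of the parameters sit in the first Dense layer after Flatten, not in the convolutional layers.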

Plot the CNN

plot_model(model, show_shapes=True)

Fit the CNN

loss = keras.losses.SparseCategoricalCrossentropy(from_logits=True)  # (1)
topk = keras.metrics.SparseTopKCategoricalAccuracy(k=5)  # (2)
model.compile(optimizer='adam', loss=loss, metrics=['accuracy', topk])  # (3)

epochs = 100
es = EarlyStopping(patience=15, restore_best_weights=True,
    monitor="val_accuracy", verbose=2)

hist = model.fit(train_ds.shuffle(1000), validation_data=val_ds,
  epochs=epochs, callbacks=[es], verbose=0)

1: Defines the loss function with the extra argument from_logits=True. The output Dense layer then has no softmax activation and the loss applies the softmax internally, which is expected to be more numerically stable
2: Specifies an additional metric, the top-5 accuracy. For each input image, it checks whether the true class is among the 5 classes to which the model assigns the highest scores
3: Compiles the model as usual with an optimizer, a loss function and metrics to monitor
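As an illustration of the top-k metric, here is a small pure-Python sketch. The function names and the toy logits are invented for this example; it is not how Keras implements SparseTopKCategoricalAccuracy, only the same idea.

```python
def in_top_k(logits, true_class, k=5):
    """Check whether true_class is among the k highest-scoring classes."""
    ranked = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
    return true_class in ranked[:k]


def top_k_accuracy(batch_logits, labels, k=5):
    """Fraction of observations whose true class is in the top k."""
    hits = [in_top_k(logits, label, k)
            for logits, label in zip(batch_logits, labels)]
    return sum(hits) / len(hits)


# Two hypothetical observations with four classes each.
batch_logits = [
    [2.0, 0.5, 1.5, -1.0],  # classes ranked 0, 2, 1, 3
    [0.1, 0.2, 3.0, 0.4],   # classes ranked 2, 3, 1, 0
]
labels = [2, 1]

top1 = top_k_accuracy(batch_logits, labels, k=1)
top2 = top_k_accuracy(batch_logits, labels, k=2)
top3 = top_k_accuracy(batch_logits, labels, k=3)
print(top1, top2, top3)  # 0.0 0.5 1.0
```

Top-k accuracy can never be lower than ordinary (top-1) accuracy, which matches the metrics reported for this model.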

Epoch 41: early stopping
Restoring model weights from the end of the best epoch: 26.
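The two numbers in this message are consistent with patience=15: the best epoch was 26, and training stopped 15 epochs later at epoch 41. The following is a simplified simulation of that patience logic, not Keras's actual implementation; the validation accuracies are made up so that the best value occurs at epoch 26.

```python
def simulate_early_stopping(val_scores, patience):
    """Return (best_epoch, stopped_epoch), both counted from 1.

    stopped_epoch is None if training runs through every epoch.
    """
    best_score = float("-inf")
    best_epoch = None
    wait = 0  # Epochs since the last improvement.
    for epoch, score in enumerate(val_scores, start=1):
        if score > best_score:
            best_score, best_epoch, wait = score, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return best_epoch, epoch
    return best_epoch, None


# Hypothetical curve: improves for 26 epochs, then never beats its best.
val_accuracy = [0.50 + 0.01 * e for e in range(26)] + [0.70] * 30

best_epoch, stopped_epoch = simulate_early_stopping(val_accuracy, patience=15)
print(best_epoch, stopped_epoch)  # 26 41
```

With restore_best_weights=True, the model that is kept is the one from the best epoch rather than from the last epoch trained.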

Tip

Instead of using a softmax activation on the output layer, we pass from_logits=True to the loss function; this is more numerically stable.
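To see why working with logits can be more stable, the sketch below compares a naive cross-entropy (softmax first, then log) with one common alternative, the log-sum-exp trick. Both functions are written for this illustration and are only an assumption about the kind of computation a library does internally.

```python
import math


def cross_entropy_from_logits(logits, label):
    """Cross-entropy computed directly from logits (log-sum-exp trick)."""
    m = max(logits)  # Subtracting the max keeps every exponent <= 0.
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


def naive_cross_entropy(logits, label):
    """Softmax first, then log; overflows when a logit is very large."""
    exps = [math.exp(z) for z in logits]
    return -math.log(exps[label] / sum(exps))


moderate = [2.0, 1.0, 0.1]
extreme = [1000.0, 0.0, -1000.0]

# On moderate logits the two versions agree.
print(cross_entropy_from_logits(moderate, 0), naive_cross_entropy(moderate, 0))

# On extreme logits only the log-sum-exp version survives.
stable_loss = cross_entropy_from_logits(extreme, 0)
try:
    naive_cross_entropy(extreme, 0)
    naive_failed = False
except OverflowError:
    naive_failed = True
print(stable_loss, naive_failed)
```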

Plot the loss/accuracy curves (setup)

def plot_history(hist):
    epochs = range(len(hist.history["loss"]))

    plt.subplot(1, 2, 1)
    plt.plot(epochs, hist.history["accuracy"], label="Train")
    plt.plot(epochs, hist.history["val_accuracy"], label="Val")
    plt.legend(loc="lower right")
    plt.title("Accuracy")

    plt.subplot(1, 2, 2)
    plt.plot(epochs, hist.history["loss"], label="Train")
    plt.plot(epochs, hist.history["val_loss"], label="Val")
    plt.legend(loc="upper right")
    plt.title("Loss")
    plt.show()

Plot the loss/accuracy curves

plot_history(hist)

Look at the metrics

print(model.evaluate(train_ds, verbose=0))
print(model.evaluate(val_ds, verbose=0))
print(model.evaluate(test_ds, verbose=0))

[0.003947618883103132, 1.0, 1.0]
[1.1530916690826416, 0.7894737124443054, 0.8771929740905762]
[0.7490242123603821, 0.8070175647735596, 0.9473684430122375]

Predict on the test set

model.predict(X_test[17], verbose=0);

Exception encountered when calling MaxPooling2D.call().

Given input size: (16x80x1). Calculated output size: (16x40x0). Output size is too small

Arguments received by MaxPooling2D.call():
  • inputs=torch.Tensor(shape=torch.Size([32, 80, 1, 16]), dtype=float32)

X_test[17].shape, X_test[17][np.newaxis, :].shape, X_test[[17]].shape

((80, 80, 1), (1, 80, 80, 1), (1, 80, 80, 1))

model.predict(X_test[[17]], verbose=0)

array([[ 1.58e+00, -3.46e+01, -1.50e+01, -3.93e+00, -5.65e+00, -1.67e+01,
        -2.39e+01, -1.30e+01, -1.94e+01, -1.58e+01, -2.02e+01,  4.44e+00,
        -1.49e+01, -3.72e+01,  1.61e+00, -4.32e+00, -1.74e+01,  1.50e+01,
        -2.05e+01,  9.54e-01, -3.52e+00, -9.11e+00, -6.04e+00, -2.22e+01,
         9.75e+00,  1.41e+00, -2.01e+01,  5.28e+00,  6.37e-03, -2.64e+01,
        -1.74e+01, -3.94e+00, -4.01e+00, -2.16e+01, -2.82e+01,  1.33e+01,
        -4.73e+00, -9.43e+00,  7.47e-01,  4.22e-01, -2.37e+01, -2.93e+01,
        -2.85e+01, -1.70e+01, -5.74e+00, -7.77e+00, -6.82e+00, -9.45e+00,
        -1.25e+01,  4.61e+00, -9.30e+00, -1.19e+01,  7.49e-01, -2.01e+01,
        -8.95e+00, -1.47e+01, -1.39e+01]], dtype=float32)

Predict on the test set II

model.predict(X_test[[17]], verbose=0).argmax()

test_ds.class_names[model.predict(X_test[[17]], verbose=0).argmax()]

'kou'

plt.imshow(X_test[17], cmap="gray");

Error Analysis

Error analysis (setup)

def plot_error_analysis(X_train, y_train, X_test, y_test, y_pred, class_names):
  plt.figure(figsize=(4, 10))

  num_errors = np.sum(y_pred != y_test)
  err_num = 0
  for i in range(X_test.shape[0]):
      if y_pred[i] != y_test[i]:
          ax = plt.subplot(num_errors, 2, 2*err_num + 1)
          plt.imshow(X_test[i].astype("uint8"), cmap="gray")
          plt.title(f"Guessed '{class_names[y_pred[i]]}' True '{class_names[y_test[i]]}'")
          plt.axis("off")
          
          # Find the first image in this set whose true class is the (wrong) predicted class.
          actual_pred_char_ind = np.argmax(y_test == y_pred[i])
          ax = plt.subplot(num_errors, 2, 2*err_num + 2)
          plt.imshow(X_test[actual_pred_char_ind].astype("uint8"), cmap="gray")
          plt.title(f"A real '{class_names[y_pred[i]]}'")
          plt.axis("off")
          err_num += 1

Error analysis I

Extract from first assessment of test errors.

Error analysis II

Extract from second assessment of test errors.

Error analysis III

y_pred = model.predict(X_val, verbose=0).argmax(axis=1)
plot_error_analysis(X_train, y_train, X_val, y_val, y_pred, val_ds.class_names)

Error analysis IV

y_pred = model.predict(X_test, verbose=0).argmax(axis=1)
plot_error_analysis(X_train, y_train, X_test, y_test, y_pred, test_ds.class_names)

Confidence of predictions

y_pred = keras.activations.softmax(model(X_test))
y_pred_class = keras.ops.argmax(y_pred, axis=1)
y_pred_prob = y_pred[np.arange(y_pred.shape[0]), y_pred_class]

y_pred_class = keras.ops.convert_to_numpy(y_pred_class)
y_pred_prob = keras.ops.convert_to_numpy(y_pred_prob)

confidence_when_correct = y_pred_prob[y_pred_class == y_test]
confidence_when_wrong = y_pred_prob[y_pred_class != y_test]

plt.hist(confidence_when_correct);

plt.hist(confidence_when_wrong);
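The same idea can be written in plain Python for a handful of observations. The logits and labels below are invented for illustration; the confidence of a prediction is taken to be the softmax probability of the predicted class, as in the code above.

```python
import math


def softmax(logits):
    """Convert a list of logits into probabilities that sum to one."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits for three observations and three classes.
logits_batch = [[4.0, 0.0, 0.0], [0.2, 0.1, 0.0], [0.0, 3.0, 0.0]]
y_true = [0, 1, 1]

conf_correct, conf_wrong = [], []
for logits, label in zip(logits_batch, y_true):
    probs = softmax(logits)
    pred = max(range(len(probs)), key=probs.__getitem__)  # argmax
    if pred == label:
        conf_correct.append(probs[pred])
    else:
        conf_wrong.append(probs[pred])

print("Confidence when correct:", [round(p, 3) for p in conf_correct])
print("Confidence when wrong:", [round(p, 3) for p in conf_wrong])
```

In this toy example the wrong prediction comes from nearly equal logits, so its confidence is much lower than that of the correct predictions.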

Hyperparameter tuning

Trial & error

Frankly, a lot of this is just ‘enlightened’ trial and error.

Keras Tuner

!pip install keras-tuner

import keras_tuner as kt

def build_model(hp):
    model = Sequential()
    model.add(
        Dense(
            hp.Choice("neurons", [4, 8, 16, 32, 64, 128, 256]),
            activation=hp.Choice("activation",
                ["relu", "leaky_relu", "tanh"]),
        )
    )
  
    model.add(Dense(1, activation="exponential"))
    
    learning_rate = hp.Float("lr",
        min_value=1e-4, max_value=1e-2, sampling="log")
    opt = keras.optimizers.Adam(learning_rate=learning_rate)

    model.compile(optimizer=opt, loss="poisson")
    
    return model
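The sampling="log" option means the learning rate is drawn uniformly on a log scale rather than on the raw scale. A rough sketch of that idea, assuming a simple uniform draw of log(lr) (KerasTuner's exact implementation may differ), is:

```python
import math
import random


def sample_log_uniform(rng, min_value, max_value):
    """Draw a value whose logarithm is uniform between the two bounds."""
    log_value = rng.uniform(math.log(min_value), math.log(max_value))
    return math.exp(log_value)


rng = random.Random(0)  # Fixed seed so the sketch is reproducible.
samples = [sample_log_uniform(rng, 1e-4, 1e-2) for _ in range(10_000)]

# 1e-3 is the midpoint on the log scale, so about half the draws fall
# below it; uniform sampling on the raw scale would give only about 9%.
share_below_1e3 = sum(s < 1e-3 for s in samples) / len(samples)
print(f"Share of samples below 1e-3: {share_below_1e3:.2f}")
```

This spreads the trials evenly across orders of magnitude, which is usually what we want for a learning rate.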

Do a random search

tuner = kt.RandomSearch(
  build_model,
  objective="val_loss",
  max_trials=10,
  directory="random-search")

es = EarlyStopping(patience=3,
  restore_best_weights=True)

tuner.search(X_train_sc, y_train,
  epochs=100, callbacks = [es],
  validation_data=(X_val_sc, y_val))

best_model = tuner.get_best_models()[0]

Reloading Tuner from random-search/untitled_project/tuner0.json

/Users/plaub/miniconda3/envs/ai2024/lib/python3.11/site-packages/keras/src/saving/saving_lib.py:418: UserWarning: Skipping variable loading for optimizer 'adam', because it has 2 variables whereas the saved optimizer has 10 variables. 
  trackable.load_own_variables(weights_store.get(inner_path))

tuner.results_summary(1)

Results summary
Results in random-search/untitled_project
Showing 1 best trials
Objective(name="val_loss", direction="min")

Trial 02 summary
Hyperparameters:
neurons: 8
activation: tanh
lr: 0.0021043482724264983
Score: 0.3167361915111542

Tune layers separately

def build_model(hp):
    model = Sequential()

    for i in range(hp.Int("numHiddenLayers", 1, 3)):
      # Tune number of units in each layer separately.
      model.add(
          Dense(
              hp.Choice(f"neurons_{i}", [8, 16, 32, 64]),
              activation="relu"
          )
      )
    model.add(Dense(1, activation="exponential"))

    opt = keras.optimizers.Adam(learning_rate=0.0005)
    model.compile(optimizer=opt, loss="poisson")
    
    return model

Do a Bayesian search

tuner = kt.BayesianOptimization(
  build_model,
  objective="val_loss",
  directory="bayesian-search",
  max_trials=10)

es = EarlyStopping(patience=3,
  restore_best_weights=True)

tuner.search(X_train_sc, y_train,
  epochs=100, callbacks = [es],
  validation_data=(X_val_sc, y_val))

best_model = tuner.get_best_models()[0]

Reloading Tuner from bayesian-search/untitled_project/tuner0.json

/Users/plaub/miniconda3/envs/ai2024/lib/python3.11/site-packages/keras/src/saving/saving_lib.py:418: UserWarning: Skipping variable loading for optimizer 'adam', because it has 2 variables whereas the saved optimizer has 18 variables. 
  trackable.load_own_variables(weights_store.get(inner_path))

tuner.results_summary(1)

Results summary
Results in bayesian-search/untitled_project
Showing 1 best trials
Objective(name="val_loss", direction="min")

Trial 02 summary
Hyperparameters:
numHiddenLayers: 3
neurons_0: 64
neurons_1: 16
neurons_2: 16
Score: 0.3142806887626648
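It is worth counting how large this search space is compared with the 10 trials we allowed. The sketch below enumerates every distinct architecture that build_model can produce, using that hp.Int("numHiddenLayers", 1, 3) includes both endpoints (the best trial above has 3 layers).

```python
from itertools import product

neuron_choices = [8, 16, 32, 64]

configs = []
for num_layers in range(1, 4):  # 1, 2 or 3 hidden layers
    # Each hidden layer picks its width independently.
    for widths in product(neuron_choices, repeat=num_layers):
        configs.append(widths)

num_configs = len(configs)
print(num_configs)                # 4 + 16 + 64 = 84
print((64, 16, 16) in configs)    # The best trial found above.
print(f"10 trials cover at most {10 / num_configs:.0%} of the space")
```

With so few trials relative to the size of the space, a search strategy that learns from earlier trials (like Bayesian optimisation) can be more efficient than purely random sampling, though neither is guaranteed to find the best configuration.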

Leveraging Solutions From Benchmark Problems

Demo: Object classification

How does that work?

… these models use a technique called transfer learning. There’s a pretrained neural network, and when you create your own classes, you can sort of picture that your classes are becoming the last layer or step of the neural net. Specifically, both the image and pose models are learning off of pretrained mobilenet models …

Teachable Machine FAQ

Benchmarks

CIFAR-10 / CIFAR-100 dataset from Canadian Institute for Advanced Research

10 classes: 60000 32x32 colour images
100 classes: 60000 32x32 colour images

ImageNet and the ImageNet Large Scale Visual Recognition Challenge (ILSVRC); originally 1,000 synsets.

In 2021: 14,197,122 labelled images from 21,841 synsets.
See Keras applications for downloadable models.

LeNet-5 (1998)

Layer	Type	Channels	Size	Kernel size	Stride	Activation
In	Input	1	32×32	–	–	–
C1	Convolution	6	28×28	5×5	1	tanh
S2	Avg pooling	6	14×14	2×2	2	tanh
C3	Convolution	16	10×10	5×5	1	tanh
S4	Avg pooling	16	5×5	2×2	2	tanh
C5	Convolution	120	1×1	5×5	1	tanh
F6	Fully connected	–	84	–	–	tanh
Out	Fully connected	–	10	–	–	RBF

Note

MNIST images are 28×28 pixels, and with zero-padding (for a 5×5 kernel) that becomes 32×32.
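The sizes in the LeNet table follow from the standard formula for an unpadded ("valid") convolution or pooling layer: output size = (input size − kernel size) / stride + 1, rounded down. A short sketch that traces the image size through the convolution and pooling layers, with a hypothetical helper name:

```python
def output_size(input_size, kernel_size, stride):
    """Output width/height of a convolution or pooling layer without padding."""
    return (input_size - kernel_size) // stride + 1


# (description, kernel size, stride) for each layer before the dense layers.
layers = [
    ("conv 5x5", 5, 1),
    ("avg pool 2x2", 2, 2),
    ("conv 5x5", 5, 1),
    ("avg pool 2x2", 2, 2),
    ("conv 5x5", 5, 1),
]

size = 28 + 2 * 2  # A 28x28 MNIST image padded with 2 zeros on each side.
sizes = []
for description, kernel_size, stride in layers:
    size = output_size(size, kernel_size, stride)
    sizes.append(size)
    print(f"{description:13s} -> {size}x{size}")
```

The final convolution reduces each feature map to a single value, so its 120 channels act like a fully connected layer.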

AlexNet (2012)

Layer	Type	Channels	Size	Kernel	Stride	Padding	Activation
In	Input	3	227×227	–	–	–	–
C1	Convolution	96	55×55	11×11	4	valid	ReLU
S2	Max pool	96	27×27	3×3	2	valid	–
C3	Convolution	256	27×27	5×5	1	same	ReLU
S4	Max pool	256	13×13	3×3	2	valid	–
C5	Convolution	384	13×13	3×3	1	same	ReLU
C6	Convolution	384	13×13	3×3	1	same	ReLU
C7	Convolution	256	13×13	3×3	1	same	ReLU
S8	Max pool	256	6×6	3×3	2	valid	–
F9	Fully conn.	–	4,096	–	–	–	ReLU
F10	Fully conn.	–	4,096	–	–	–	ReLU
Out	Fully conn.	–	1,000	–	–	–	Softmax
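The Size column of the AlexNet table can be checked with the usual output-size rules: "valid" padding gives (input − kernel) / stride + 1 rounded down, while "same" padding gives input / stride rounded up. A minimal sketch, with the helper name chosen for this example:

```python
import math


def conv_output_size(input_size, kernel_size, stride, padding):
    """Output width/height for "valid" or "same" padding."""
    if padding == "same":
        return math.ceil(input_size / stride)
    return (input_size - kernel_size) // stride + 1


# (kernel size, stride, padding) for the convolution and pooling layers,
# in the order they appear in the table.
steps = [
    (11, 4, "valid"), (3, 2, "valid"),
    (5, 1, "same"), (3, 2, "valid"),
    (3, 1, "same"), (3, 1, "same"), (3, 1, "same"),
    (3, 2, "valid"),
]

size = 227
sizes = []
for kernel_size, stride, padding in steps:
    size = conv_output_size(size, kernel_size, stride, padding)
    sizes.append(size)

flattened = size * size * 256  # 256 channels after the last pooling layer.
print(sizes)      # [55, 27, 27, 13, 13, 13, 13, 6]
print(flattened)  # 9216 inputs to the first fully connected layer
```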

Data Augmentation

Inception module (2014)

Used in ILSVRC 2014 winning solution (top-5 error < 7%).

VGGNet was the runner-up.

GoogLeNet / Inception_v1 (2014)

Schematic of the GoogLeNet architecture.

Depth is important for image tasks

Deeper models aren’t just better because they have more parameters. Model depth given in the legend. Accuracy is on the Street View House Numbers dataset.

Residual connection

ResNet (2015)

ResNet won the ILSVRC 2015 challenge (top-5 error 3.6%), developed by Kaiming He et al.
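A residual connection adds a block's input back onto its output, so the block computes F(x) + x rather than F(x). The toy sketch below uses a single ReLU layer as F and hypothetical helper names; real ResNet blocks use convolutions and batch normalisation.

```python
def relu_layer(x, weights, biases):
    """A small fully connected layer with ReLU activation."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, biases)]


def residual_block(x, weights, biases):
    """The same layer with a skip connection: F(x) + x."""
    fx = relu_layer(x, weights, biases)
    return [f + xi for f, xi in zip(fx, x)]


x = [1.0, -2.0, 3.0]
zero_weights = [[0.0] * 3 for _ in range(3)]
zero_biases = [0.0] * 3

plain = relu_layer(x, zero_weights, zero_biases)
skipped = residual_block(x, zero_weights, zero_biases)
print(plain)    # [0.0, 0.0, 0.0]  -> the signal is lost
print(skipped)  # [1.0, -2.0, 3.0] -> the input passes straight through
```

Even when the layer contributes nothing, the block still passes its input through unchanged, which is one intuition for why very deep networks with residual connections are easier to train.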

Transfer Learning

Pretrained model

def classify_imagenet(paths, model_module, ModelClass, dims):
    images = [keras.utils.load_img(path, target_size=dims) for path in paths]
    image_array = np.array([keras.utils.img_to_array(img) for img in images])
    inputs = model_module.preprocess_input(image_array)
   
    model = ModelClass(weights="imagenet")
    Y_proba = model.predict(inputs, verbose=0)
    top_k = model_module.decode_predictions(Y_proba, top=3)

    for image_index in range(len(images)):
        print(f"Image #{image_index}:")
        for class_id, name, y_proba in top_k[image_index]:
            print(f" {class_id} - {name} {int(y_proba*100)}%")
        print()

Predicted classes (MobileNet)

Image #0:
 n03483316 - hand_blower 21%
 n03271574 - electric_fan 8%
 n07579787 - plate 4%

Image #1:
 n03942813 - ping-pong_ball 88%
 n02782093 - balloon 3%
 n04023962 - punching_bag 1%

Image #2:
 n04557648 - water_bottle 31%
 n04336792 - stretcher 14%
 n03868863 - oxygen_mask 7%

Predicted classes (MobileNetV2)

Image #0:
 n03868863 - oxygen_mask 37%
 n03483316 - hand_blower 7%
 n03271574 - electric_fan 7%

Image #1:
 n03942813 - ping-pong_ball 29%
 n04270147 - spatula 12%
 n03970156 - plunger 8%

Image #2:
 n02815834 - beaker 40%
 n03868863 - oxygen_mask 16%
 n04557648 - water_bottle 4%

Predicted classes (InceptionV3)

Image #0:
 n02815834 - beaker 19%
 n03179701 - desk 15%
 n03868863 - oxygen_mask 9%

Image #1:
 n03942813 - ping-pong_ball 87%
 n02782093 - balloon 8%
 n02790996 - barbell 0%

Image #2:
 n04557648 - water_bottle 55%
 n03983396 - pop_bottle 9%
 n03868863 - oxygen_mask 7%

Transfer learning

# Pull in the base model we are transferring from.
base_model = keras.applications.Xception(
    weights='imagenet',  # Load weights pre-trained on ImageNet.
    input_shape=(150, 150, 3),
    include_top=False)  # Discard the ImageNet classifier at the top.

# Tell it not to update its weights.
base_model.trainable = False

# Make our new model on top of the base model.
inputs = keras.Input(shape=(150, 150, 3))
x = base_model(inputs, training=False)
# Average each feature map over its height and width.
x = keras.layers.GlobalAveragePooling2D()(x)
# A single output neuron (a logit) for binary classification.
outputs = keras.layers.Dense(1)(x)
model = keras.Model(inputs, outputs)

# Compile and fit on our data.
model.compile(optimizer=keras.optimizers.Adam(),
              loss=keras.losses.BinaryCrossentropy(from_logits=True),
              metrics=[keras.metrics.BinaryAccuracy()])
model.fit(new_dataset, epochs=19, callbacks=..., validation_data=...)

Fine-tuning

# Unfreeze the base model
base_model.trainable = True

# It's important to recompile your model after you make any changes
# to the `trainable` attribute of any inner layer, so that your changes
# are taken into account
model.compile(
    optimizer=keras.optimizers.Adam(1e-5),  # Very low learning rate
    loss=keras.losses.BinaryCrossentropy(from_logits=True),
    metrics=[keras.metrics.BinaryAccuracy()])

# Train end-to-end. Be careful to stop before you overfit!
model.fit(new_dataset, epochs=9, callbacks=..., validation_data=...)

Caution

Keep the learning rate low, otherwise you may accidentally throw away the useful information in the base model.

Package Versions

from watermark import watermark
print(watermark(python=True, packages="keras,matplotlib,numpy,pandas,seaborn,scipy,torch,tensorflow,tf_keras"))

Python implementation: CPython
Python version       : 3.11.8
IPython version      : 8.23.0

keras     : 3.2.0
matplotlib: 3.8.4
numpy     : 1.26.4
pandas    : 2.2.1
seaborn   : 0.13.2
scipy     : 1.11.0
torch     : 2.2.2
tensorflow: 2.16.1
tf_keras  : 2.16.0

Glossary

AlexNet
benchmark problems
channels
CIFAR-10 / CIFAR-100
computer vision
convolutional layer
convolutional network
error analysis
filter

GoogLeNet & Inception
ImageNet challenge
fine-tuning
flatten layer
kernel
max pooling
MNIST
stride
transfer learning