This is the notes of first lesson of the list of lessons in the Part 1 of Fast AI V3 course. Though I went though few lessons on Fast AI course of last year, I was sure to do this course. This is the first course that comes after the first version release of Fast AI library itself. Last year it was in Beta Version (think 0.7 when I took it online) and it is now into version 1.0.14.

What’s your pet

In this lesson we will build our first image classifier from scratch, and see if we can achieve world-class results. Let’s dive in!

The lines in jupyter notebook that starts with ‘%’ are called Line Magics. These are not instructions for Python to execute, but to Jupyter notebook.

They ensure that any edits to libraries you make are reloaded here automatically, and also that any charts or images displayed are shown in this notebook.

The reload_ext autoreload reloads modules automatically before entering the execution of code typed at the IPython prompt.

The next line autoreload 2 imports all modules before executing the typed code.

The documentation on autoreload is available here

The next line is to plot the graphs inside the jupyter notebook. We use matplotlib inline.

%reload_ext autoreload
%autoreload 2
%matplotlib inline

We import all the necessary packages. We are going to work with the fastai V1 library which sits on top of Pytorch 1.0. The fastai library provides many useful functions that enable us to quickly and easily build neural networks and train our models.

from fastai import *
from fastai.vision import *

Looking at the data

We are going to use the Oxford-IIIT Pet Dataset by O. M. Parkhi et al., 2012 which features 12 cat breeds and 25 dogs breeds. Our model will need to learn to differentiate between these 37 distinct categories. According to their paper, the best accuracy they could get in 2012 was 59.21%, using a complex model that was specific to pet detection, with separate “Image”, “Head”, and “Body” models for the pet photos. Let’s see how accurate we can be using deep learning!

We are going to use the untar_data function to which we must pass a URL as an argument and which will download and extract the data.

help(untar_data)
Help on function untar_data in module fastai.datasets:

untar_data(url: str, fname: Union[pathlib.Path, str] = None, dest: Union[pathlib.Path, str] = None, data=True)
    Download `url` if doesn't exist to `fname` and un-tgz to folder `dest`

The untar_data is great idea for downloading the URL provided and download and untar.

path = untar_data(URLs.PETS); path
PosixPath('/home/nbuser/courses/fast-ai/course-v3/nbs/data/oxford-iiit-pet')

In Python, we can extend and add new functionalities to the existing python modules. I found this link useful for creating one.

path.ls()
[PosixPath('/home/nbuser/courses/fast-ai/course-v3/nbs/data/oxford-iiit-pet/images'),
 PosixPath('/home/nbuser/courses/fast-ai/course-v3/nbs/data/oxford-iiit-pet/annotations')]

The pathlib thats part of Python 3 has the notation / which is useful to navigate into the directory as in the actual directory. We use to create Path variables with the new location.

path_anno = path/'annotations'
path_img = path/'images'

Lets check all the images in the image directory

fnames = get_image_files(path_img)
fnames[:5]
[PosixPath('/home/nbuser/courses/fast-ai/course-v3/nbs/data/oxford-iiit-pet/images/leonberger_34.jpg'),
 PosixPath('/home/nbuser/courses/fast-ai/course-v3/nbs/data/oxford-iiit-pet/images/pug_203.jpg'),
 PosixPath('/home/nbuser/courses/fast-ai/course-v3/nbs/data/oxford-iiit-pet/images/Siamese_203.jpg'),
 PosixPath('/home/nbuser/courses/fast-ai/course-v3/nbs/data/oxford-iiit-pet/images/scottish_terrier_98.jpg'),
 PosixPath('/home/nbuser/courses/fast-ai/course-v3/nbs/data/oxford-iiit-pet/images/beagle_76.jpg')]
np.random.seed(2)
pat = r'/([^/]+)_\d+.jpg$'

The first thing we do when we approach a problem is to take a look at the data. We always need to understand very well what the problem is and what the data looks like before we can figure out how to solve it. Taking a look at the data means understanding how the data directories are structured, what the labels are and what some sample images look like.

The main difference between the handling of image classification datasets is the way labels are stored. In this particular dataset, labels are stored in the filenames themselves. We will need to extract them to be able to classify the images into the correct categories. Fortunately, the fastai library has a handy function made exactly for this, ImageDataBunch.from_name_re gets the labels from the filenames using a regular expression. A detailed explaination found on an interesting read about the same lines of code in this tutorial is here.

Loading Images:

ImageDataBunch is used to do classification based on images. We use the method from_name_re to represent that the name of the classification is to be got from the name of the file using a regular expression. It takes the following parameters:

  • path_img the path of the images directory
  • fnames the list of files in that directory
  • pat the regex pattern that is used to extract label from the file name
  • ds_tfms the transformations that are needed for the image. This includes centering, cropping and zooming of the images.
  • size the size to which the image is to be resized. This is usually a square image. This is done because of the limitation in the GPU that the GPU performs faster only when it has to do similar computations (such as matrix multiplication, addition and so on) on all the images.

Data Normalisation:

This is done to ensure that the images are easy to do mathematical calculations that we are looking for after that. This includes changing the range of values of RGB from 0-255 to -1 to 1.This is because we have 3 color channels namely Red, Green and Blue. For the pixel values we might have some color channels that varies slightly and some that doesn’t. So, we need to normalise the images with mean as 0 and standard deviation as 1.

One another thing to note is that we are using the residual network, which is pretrained. So, we must use the same normalisation that the residual network is using in order to use the best of the pretrained model.

data = ImageDataBunch.from_name_re(path_img, fnames, pat, ds_tfms=get_transforms(), size=224)
data.normalize(imagenet_stats)
<fastai.vision.data.ImageDataBunch at 0x7f21d8961eb8>

Now, we can look into the image samples along with the classification name to check if everything that we have done thus far is doing great.

Its important to check this as we may understand some images might have some kind of issue over the other, like rotated in odd ways, just text on it, 2 different categories of classificatiers on it and so on.

doc(ImageDataBunch)
data.show_batch(rows=3, figsize=(7,6))

We use the data.classes to indicate the total number of distinct labels that were found. In our case, since wwe have extracted the labels from the regular expression, it indicates the number of distinct labels that were extracted from the regular expression.

data.c gives the total number of classifications that were found in the dataset

print(data.classes)
len(data.classes),data.c
['leonberger', 'pug', 'Siamese', 'scottish_terrier', 'beagle', 'Birman', 'Abyssinian', 'great_pyrenees', 'chihuahua', 'havanese', 'japanese_chin', 'yorkshire_terrier', 'Persian', 'Ragdoll', 'pomeranian', 'newfoundland', 'Bombay', 'shiba_inu', 'german_shorthaired', 'Bengal', 'samoyed', 'boxer', 'wheaten_terrier', 'miniature_pinscher', 'english_cocker_spaniel', 'Maine_Coon', 'Sphynx', 'British_Shorthair', 'staffordshire_bull_terrier', 'keeshond', 'saint_bernard', 'american_pit_bull_terrier', 'Russian_Blue', 'american_bulldog', 'english_setter', 'Egyptian_Mau', 'basset_hound']





(37, 37)

Training: resnet34

Now we will start training our model. We will use a convolutional neural network backbone and a fully connected head with a single hidden layer as a classifier. Since we are using Residual Network with 34 hidden units, all we have to do is to add a layer at the end on the residual network to transform the dimension of the residual network to the required output. In our case, it is to the 37 possible outputs.

We will train for 4 epochs (4 cycles through all our data).

We create a learner object that takes the data, network and the metrics . The metrics is just used to print out how the training is performing. We choose to print out the error_rate.

learn = create_cnn(data, models.resnet34, metrics=error_rate)
Downloading: "https://download.pytorch.org/models/resnet34-333f7ec4.pth" to /home/nbuser/.torch/models/resnet34-333f7ec4.pth
100%|██████████| 87306240/87306240 [00:02<00:00, 29535503.58it/s]
learn.fit_one_cycle(4)
Total time: 03:19
epoch  train_loss  valid_loss  error_rate
1      1.156555    0.291909    0.091151    (00:53)
2      0.505356    0.249506    0.077179    (00:48)
3      0.312138    0.212065    0.074518    (00:49)
4      0.240234    0.198288    0.069195    (00:48)
learn.save('stage-1')

Results

Let’s see what results we have got.

We will first see which were the categories that the model most confused with one another. We will try to see if what the model predicted was reasonable or not. In this case the mistakes look reasonable (none of the mistakes seems obviously naive). This is an indicator that our classifier is working correctly.

Furthermore, when we plot the confusion matrix, we can see that the distribution is heavily skewed: the model makes the same mistakes over and over again but it rarely confuses other categories. This suggests that it just finds it difficult to distinguish some specific categories between each other; this is normal behaviour.

interp = ClassificationInterpretation.from_learner(learn)
interp.plot_top_losses(9, figsize=(15,11))

png

doc(interp.plot_top_losses)
interp.plot_confusion_matrix(figsize=(12,12), dpi=60)

png

interp.most_confused(min_val=2)
[('Ragdoll', 'Birman', 6),
 ('staffordshire_bull_terrier', 'american_pit_bull_terrier', 5),
 ('american_pit_bull_terrier', 'staffordshire_bull_terrier', 5),
 ('Egyptian_Mau', 'Bengal', 5),
 ('Birman', 'Ragdoll', 4),
 ('British_Shorthair', 'Russian_Blue', 4),
 ('Bengal', 'Egyptian_Mau', 3),
 ('english_cocker_spaniel', 'english_setter', 3),
 ('Maine_Coon', 'Ragdoll', 3),
 ('american_pit_bull_terrier', 'american_bulldog', 3),
 ('american_bulldog', 'staffordshire_bull_terrier', 3)]

Unfreezing, fine-tuning, and learning rates

Since our model is working as we expect it to, we will unfreeze our model and train some more.

learn.unfreeze()
learn.fit_one_cycle(1)
Total time: 01:07
epoch  train_loss  valid_loss  error_rate
1      1.070711    0.524961    0.168995    (01:07)

Since the Model underperformed while training after unfreeze, we would like to move to our previous best model that we have saved stage-1. We will finetune to improve from here.

learn.load('stage-1')

We use the lr_find method to find the optimum learning rate. Learning Rate is an important hyper-parameter to look for. We traditionally use $\alpha$ to denote this parameter. If the Learning rate is too slow, we take more time to reach the most accurate result. If it is too high, we might not even end up reaching the accurate result. Learning Rate Finder was idea of automatically getting the magic number (which is near perfect), to get this optimum learning rate. This was introducted in last year’s Fast AI course and continues to be useful.

learn.lr_find()
LR Finder complete, type {learner_name}.recorder.plot() to see the graph.

After we run the finder, we plot the graph between loss and learning rate. We see a graph and typically choose a higher learning rate for which the loss is minimal. The higher learning rate makes sure that the machine ends up learning faster.

learn.recorder.plot()

png

We see around $1e^{-4}$ mark, we have a optimum learning rate. Now that we know the optimum learning rate.

Considering the fact that we are using a pretrained model of resnet-34, we know for sure that our previous layers of this neural network would learn to detect the edges and the later layers would learn complicated shapes such as the dogs and cats itself. We don’t want to ruin out the earlier layers which presumably does a good job of detecting the edges. But would like to improve the model in narrowing down classifying the image of dogs and cats to our needs, which is done in the later layers.

So, we will set a lower learning rate for earlier layers and higher one for the last layers.

The slice is used to provide the learning rate wherein, we just provide the range of learning rates (its min and max). The learning rate is set gradually higher as we move from the earlier layer to the latest layers.

learn.unfreeze()
learn.fit_one_cycle(2, max_lr=slice(1e-6,1e-4))
Total time: 02:14
epoch  train_loss  valid_loss  error_rate
1      0.199850    0.185868    0.064538    (01:07)
2      0.194062    0.182656    0.065203    (01:07)

That’s a pretty accurate model!

Training: resnet50

Now we will train in the same way as before but with one caveat: instead of using resnet34 as our backbone we will use resnet50 (resnet34 is a 34 layer residual network while resnet50 has 50 layers. Later in the course you can learn the details in the resnet paper).

Basically, resnet50 usually performs better because it is a deeper network with more parameters. Let’s see if we can achieve a higher performance here.

data = ImageDataBunch.from_name_re(path_img, fnames, pat, ds_tfms=get_transforms(), size=299, bs=48)
data.normalize(imagenet_stats)
<fastai.vision.data.ImageDataBunch at 0x7f21d89b3860>
learn = create_cnn(data, models.resnet50, metrics=error_rate)
learn.fit_one_cycle(5)
Total time: 16:40
epoch  train_loss  valid_loss  error_rate
1      0.646226    0.257891    0.085036    (03:53)
2      0.348598    0.244830    0.082399    (03:11)
3      0.236005    0.192446    0.061964    (03:11)
4      0.149788    0.147233    0.044825    (03:11)
5      0.100550    0.138161    0.048121    (03:11)
learn.save('stage-1-50')

It’s astonishing that it’s possible to recognize pet breeds so accurately! Let’s see if full fine-tuning helps:

learn.unfreeze()
learn.fit_one_cycle(1, max_lr=slice(1e-6,1e-4))
Total time: 04:27
epoch  train_loss  valid_loss  error_rate
1      0.088951    0.151667    0.050758    (04:27)

In this case it doesn’t, so let’s go back to our previous model.

learn.load('stage-1-50')

We now load the previous best model and would like to improve upon that model.

interp = ClassificationInterpretation.from_learner(learn)
interp.most_confused(min_val=2)
[('american_pit_bull_terrier', 'staffordshire_bull_terrier', 6),
 ('Ragdoll', 'Birman', 5),
 ('Egyptian_Mau', 'Bengal', 5),
 ('Ragdoll', 'Persian', 4),
 ('staffordshire_bull_terrier', 'american_bulldog', 4),
 ('Maine_Coon', 'Ragdoll', 3)]
learn.lr_find()
LR Finder complete, type {learner_name}.recorder.plot() to see the graph.
learn.recorder.plot()

png

learn.unfreeze()
learn.fit_one_cycle(2, max_lr=slice(1e-6,3e-4))
Total time: 08:26
epoch  train_loss  valid_loss  error_rate
1      0.116429    0.138112    0.049440    (04:17)
2      0.084291    0.129927    0.044166    (04:08)
learn.save('stage-2-50')

Save the model as it seems a little more accurate

Other data formats

path = untar_data(URLs.MNIST_SAMPLE); path
PosixPath('/home/jhoward/.fastai/data/mnist_sample')
tfms = get_transforms(do_flip=False)
data = ImageDataBunch.from_folder(path, ds_tfms=tfms, size=26)
data.show_batch(rows=3, figsize=(5,5))

png

learn = ConvLearner(data, models.resnet18, metrics=accuracy)
learn.fit(2)
VBox(children=(HBox(children=(IntProgress(value=0, max=2), HTML(value='0.00% [0/2 00:00<00:00]'))), HTML(value…


Total time: 00:11
epoch  train loss  valid loss  accuracy
1      0.108823    0.025363    0.991168  (00:05)
2      0.061547    0.020443    0.994112  (00:05)
df = pd.read_csv(path/'labels.csv')
df.head()
name label
0 train/3/7463.png 0
1 train/3/21102.png 0
2 train/3/31559.png 0
3 train/3/46882.png 0
4 train/3/26209.png 0
data = ImageDataBunch.from_csv(path, ds_tfms=tfms, size=28)
data.show_batch(rows=3, figsize=(5,5))
data.classes
[0, 1]

png

data = ImageDataBunch.from_df(path, df, ds_tfms=tfms, size=24)
data.classes
[0, 1]
fn_paths = [path/name for name in df['name']]; fn_paths[:2]
[PosixPath('/home/jhoward/.fastai/data/mnist_sample/train/3/7463.png'),
 PosixPath('/home/jhoward/.fastai/data/mnist_sample/train/3/21102.png')]
pat = r"/(\d)/\d+\.png$"
data = ImageDataBunch.from_name_re(path, fn_paths, pat=pat, ds_tfms=tfms, size=24)
data.classes
['3', '7']
data = ImageDataBunch.from_name_func(path, fn_paths, ds_tfms=tfms, size=24,
        label_func = lambda x: '3' if '/3/' in str(x) else '7')
data.classes
['3', '7']
labels = [('3' if '/3/' in str(x) else '7') for x in fn_paths]
labels[:5]
['3', '3', '3', '3', '3']
data = ImageDataBunch.from_lists(path, fn_paths, labels=labels, ds_tfms=tfms, size=24)
data.classes
['3', '7']