India has a variety of classical dances, and each one is culture specific. For example, Tamil Nadu follows a classical dance called Bharathanatyam, whereas its neighbouring state, Kerala, has the classical dance of Kathakali.

Not being particularly knowledgeable about any of the classical dances, I was triggered by Jeremy Howard's idea that we might be able to solve deep learning problems even if we aren't domain experts ourselves. So, I decided to try it out myself and see if I could build a model that performs well at classifying classical dance forms.

I have done this before, but this time it is on the latest version of fastai (v2), as part of the ongoing course on it that is going to be public by June 2020. If you have checked out the previous blog post on this, you may consider this just a version-updated post.

Steps

To build a classifier on our own that can be run end to end, we need to follow the steps below.

  • Download the dataset
  • Do preprocessing of data, if any
  • Create a model to detect the classical dance
  • Deploy it using a simple GUI

Downloading the Dataset

To download a dataset, we need an image search API. Bing Image Search is one such API, and it provides good image downloading capabilities. But we have to register for Bing Image Search to get the access key needed to use the API.

from azure.cognitiveservices.search.imagesearch import ImageSearchClient as api
from msrest.authentication import CognitiveServicesCredentials as auth
from fastai2.vision.all import *
from fastai2.vision.widgets import *

# Query Bing Image Search and return the result objects as a fastai L list
def search_images_bing(key, term, min_sz=128, count=150):
    client = api('https://api.cognitive.microsoft.com', auth(key))
    return L(client.images.search(query=term, count=count, min_height=min_sz, min_width=min_sz).value)

After signing up, add the key in the cell below and list the classes that you want to classify.

key='xxx'
classes = ['Bharathanatyam','Kathakali','Kathak','jagoi dance']

Select a path where you would like to store the images for classification.

path=Path('./data/image-classification')

Download the images for each class and put them in separate folders, where the folder name specifies the class name.

path.mkdir(exist_ok=True)
for o in classes:
    dest = (path/o)
    dest.mkdir(exist_ok=True)
    results = search_images_bing(key, o)
    download_images(dest, urls=results.attrgot('content_url'))

After downloading the images, verify that the files were generated as expected.

fns = get_image_files(path)
fns
(#591) [Path('data/image-classification/Bharathanatyam/00000080.jpg'),Path('data/image-classification/Bharathanatyam/00000001.jpg'),Path('data/image-classification/Bharathanatyam/00000125.jpg'),Path('data/image-classification/Bharathanatyam/00000062.jpg'),Path('data/image-classification/Bharathanatyam/00000124.jpg'),Path('data/image-classification/Bharathanatyam/00000015.jpg'),Path('data/image-classification/Bharathanatyam/00000144.jpg'),Path('data/image-classification/Bharathanatyam/00000033.jpg'),Path('data/image-classification/Bharathanatyam/00000140.jpg'),Path('data/image-classification/Bharathanatyam/00000071.jpg')...]

Now that we have downloaded the images, our next step is to train the model to identify classical dance images.

Look into the images

After downloading, we need to check whether some of the images are corrupt. These images will not open for some reason and hence break the pipeline when we try to open them. We don't want to train our network on these images.

fastai comes with a way to check for these files: the verify_images method.

failed = verify_images(fns)
failed
(#2) [Path('data/image-classification/Kathak/00000069.jpg'),Path('data/image-classification/Kathak/00000117.jpg')]

Delete the corrupt images using the unlink method of Path.

failed.map(Path.unlink)
(#2) [None,None]

Creating DataBlock

Creating a DataBlock is the next step. The DataBlock is where we specify essentially everything that has to be done to the data before giving it to the model.

classical_dances_db = DataBlock(
    blocks=(ImageBlock, CategoryBlock), 
    get_items=get_image_files, 
    splitter=RandomSplitter(valid_pct=0.3, seed=42),
    get_y=parent_label,
    item_tfms=Resize(128))

In the above code:

  • blocks is used to specify the way to generate the dependent and independent variables
  • get_items provides a way to get each item from the path. In this case, it gets the individual image paths as files
  • splitter provides a way to split the data into training and validation sets. By fixing the seed, we make sure the same set of images ends up in each split every time we randomly divide the dataset
  • get_y is a method to get the y value out of an individual path. Since getting y from the path is a little tricky, i.e. looking at the parent folder name, we specify parent_label as the way to get it
  • item_tfms specifies transformations that are to be done at the item level, such as resizing to a 128x128 image (usually done through centre cropping; see the sketch below for alternatives)

There are some transforms, like moving data to the GPU and normalisation, that are done implicitly.
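
Resize crops the image by default, but fastai also ships alternative resize strategies. Here is a minimal sketch of rebuilding the DataBlock with other item transforms (ResizeMethod and RandomResizedCrop are standard fastai2 options; whether they help on this dataset is untested):

# Squish or pad instead of cropping, or take a fresh random crop each epoch
squished = classical_dances_db.new(item_tfms=Resize(128, ResizeMethod.Squish))
padded = classical_dances_db.new(item_tfms=Resize(128, ResizeMethod.Pad, pad_mode='zeros'))
random_crop = classical_dances_db.new(item_tfms=RandomResizedCrop(128, min_scale=0.3))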

Now that we have specified the shell of the data, we load the data for training using dataloaders. The dataloaders conveniently collect all the items and apply all the specified transformations so that the data can be fed into the model for training.

dls = classical_dances_db.dataloaders(path)

Now that we have transformed the data, we should look at a batch to check that it was transformed correctly.

dls.show_batch(max_n=4, nrows=1)
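
We can also peek at the validation set, and at the transforms fastai attached implicitly. A minimal sketch (dls.valid and the after_batch pipeline are standard fastai2 attributes):

# Show a batch from the validation split instead of the training split
dls.valid.show_batch(max_n=4, nrows=1)
# Inspect the batch-level transforms applied implicitly (e.g. IntToFloatTensor)
dls.train.after_batch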

After this quick inspection, the images look OK. So, let's get started with training the model.

Training the model

Train the model with ResNet-18 and fine-tune it for 4 epochs.

As we are all set for training, we can now train the model to learn the type of dance from an image.

learn = cnn_learner(dls, resnet18, metrics=error_rate)
learn.fine_tune(4)
epoch train_loss valid_loss error_rate time
0 1.740276 2.458594 0.522727 00:04
epoch train_loss valid_loss error_rate time
0 0.900825 0.744960 0.244318 00:04
1 0.693868 0.367972 0.102273 00:04
2 0.556724 0.302496 0.102273 00:04
3 0.457315 0.283369 0.096591 00:04
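
The two tables above are a result of how fine_tune works: it first trains just the newly added head for one epoch while the pretrained body stays frozen, and then unfreezes everything and trains for the requested number of epochs. A rough, simplified sketch of the equivalent steps (the real fine_tune also applies discriminative learning rates):

# Roughly what learn.fine_tune(4) does, simplified
learn.freeze()      # train only the new head first
learn.fit_one_cycle(1)
learn.unfreeze()    # then train the whole network
learn.fit_one_cycle(4)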

Here we created a cnn_learner which takes:

  • dls - the dataloaders we built above
  • arch - a pretrained ResNet model (resnet18), so that we can train faster via transfer learning
  • metrics - the metrics to be displayed during training
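
The same call works with other backbones and extra metrics. A minimal sketch (resnet34 and accuracy are standard fastai2 names; whether a deeper model helps on this small dataset is untested):

# Sketch: a deeper pretrained backbone plus an additional metric
learn34 = cnn_learner(dls, resnet34, metrics=[error_rate, accuracy])
learn34.fine_tune(4)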

Understanding the model

We will analyse the model so that we can get more information out of it.

A confusion matrix tabulates the actual vs. predicted outputs. If both are the same, then our model did a great job of identifying the image. Let's see how our model performed.

Look at the confusion matrix to see which classes go wrong most often.

interp = ClassificationInterpretation.from_learner(learn)
interp.plot_confusion_matrix()

The confusion is mostly around Kathak: the model often mistakes Kathak for Bharathanatyam and jagoi dance.
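
To see the same information as a compact list instead of a plot, the interpretation object also has most_confused (min_val just hides one-off mistakes):

# List (actual, predicted, count) combinations, skipping rare confusions
interp.most_confused(min_val=2)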

Look at the most confused images

interp.plot_top_losses(4, nrows=2)

Cleanup

There can be cases where some images are messy, like a search result showing a dancer who is famous in, say, Bharathanatyam performing Kathak. Such images can confuse the model during learning. Let's look at whether that's the case.

This is also a place where we should probably consult domain experts. They might help us know whether the images we call Kathak are actually of the Kathak dance per se.

fastai comes with a very cool widget that is very useful in these cases.

cleaner = ImageClassifierCleaner(learn)
cleaner

The output of the previous cell is an interactive widget that allows you to re-label images that were wrongly classified, or mark images for deletion.

Some common images that had to be removed include:

  • Photos of famous dancers taken during some interview or award ceremony
  • The agenda of a dance festival
  • The ornaments and dresses that are associated with a dance
  • Images of artists doing their makeup or dressing while preparing for the dance

We had one image of each type. Note that the change and delete methods only return the items marked in the widget; we still have to apply those changes ourselves, as shown after the output below.

cleaner.change()
cleaner.delete()
(#1) [7]
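
To actually apply those decisions, the usual fastai2 idiom is to unlink the files marked for deletion and move the re-labelled ones into their new class folders:

import shutil
# Remove images marked "Delete" in the widget
for idx in cleaner.delete():
    cleaner.fns[idx].unlink()
# Move re-labelled images to the folder of their new class
for idx, cat in cleaner.change():
    shutil.move(str(cleaner.fns[idx]), path/cat)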

Rerunning the model after cleaning the data

Now that the confusing images are removed, the model might be better at understanding the data. So we essentially repeat the steps we did earlier.

classical_dances_db_2 = DataBlock(
    blocks=(ImageBlock, CategoryBlock), 
    get_items=get_image_files, 
    splitter=RandomSplitter(valid_pct=0.3, seed=42),
    get_y=parent_label,
    item_tfms=Resize(128))
dls = classical_dances_db_2.dataloaders(path)
dls.show_batch(max_n=4, nrows=1)
learn = cnn_learner(dls, resnet18, metrics=error_rate)
learn.fine_tune(4)
epoch train_loss valid_loss error_rate time
0 1.949091 1.575333 0.369318 00:04
epoch train_loss valid_loss error_rate time
0 0.945574 0.684837 0.238636 00:04
1 0.711181 0.396144 0.119318 00:04
2 0.528489 0.341853 0.090909 00:04
3 0.423833 0.317606 0.102273 00:04
learn.fine_tune(4)
epoch train_loss valid_loss error_rate time
0 0.113944 0.557393 0.096591 00:04
epoch train_loss valid_loss error_rate time
0 0.058908 0.532337 0.113636 00:04
1 0.060650 0.552075 0.125000 00:04
2 0.040418 0.566424 0.119318 00:04
3 0.031581 0.566204 0.102273 00:04
interp = ClassificationInterpretation.from_learner(learn)
interp.plot_confusion_matrix()

Add Augmentations

Next we add data augmentations. As this is fairly basic work, we use the default augmentations provided by aug_transforms as batch transforms, and try the process one more time.

classical_dances_db_3 = DataBlock(
    blocks=(ImageBlock, CategoryBlock), 
    get_items=get_image_files, 
    splitter=RandomSplitter(valid_pct=0.3, seed=42),
    get_y=parent_label,
    item_tfms=Resize(128),
    batch_tfms=aug_transforms())
dls = classical_dances_db_3.dataloaders(path)
dls.show_batch(max_n=4, nrows=1)
learn = cnn_learner(dls, resnet18, metrics=error_rate)
learn.fine_tune(4)
epoch train_loss valid_loss error_rate time
0 1.992708 1.989776 0.454545 00:04
epoch train_loss valid_loss error_rate time
0 0.966205 0.912835 0.278409 00:04
1 0.805900 0.455987 0.147727 00:04
2 0.669330 0.356659 0.113636 00:04
3 0.567247 0.332295 0.107955 00:04
learn.fine_tune(4)
epoch train_loss valid_loss error_rate time
0 0.317089 0.318796 0.107955 00:04
epoch train_loss valid_loss error_rate time
0 0.392914 0.311143 0.096591 00:04
1 0.337256 0.339022 0.113636 00:04
2 0.318557 0.297601 0.096591 00:04
3 0.292514 0.286991 0.090909 00:04
interp = ClassificationInterpretation.from_learner(learn)
interp.plot_top_losses(4, nrows=2)
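
To see what aug_transforms is actually doing, we can show the same image several times with unique=True, which draws a fresh random augmentation each time:

# One image, several randomly drawn augmentations
dls.train.show_batch(max_n=8, nrows=2, unique=True)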

Export the model

There are obviously more things that can be done here, but with some good defaults we get around 91% accuracy. Since we are satisfied with that, let's export the model by calling learn.export().

learn.export()

Loading the Model and Predicting

learn.export saves the trained model to a file named export.pkl. Let's confirm it is in the working directory:

path = Path()
path.ls(file_exts='.pkl')
(#1) [Path('export.pkl')]

We can load the model using the load_learner method.

learn_inf = load_learner(path/'export.pkl')
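
The class vocabulary travels with the exported learner, so we can check which labels it knows about and in what order (this order matches the probability tensor returned by predict):

# The classes the exported model can predict, in tensor order
learn_inf.dls.vocab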

Now we can load any image; it can even be from outside our dataset. Just for convenience, I am choosing one from the dataset itself.

path=Path('./data/image-classification')
paths=(path/'Bharathanatyam').ls()
paths[0]
Path('data/image-classification/Bharathanatyam/00000080.jpg')

Now we predict by calling the predict method, passing in the path of the image.

learn_inf.predict(str(paths[0]))
('Bharathanatyam',
 tensor(0),
 tensor([9.9981e-01, 8.2153e-05, 8.2317e-05, 2.8997e-05]))

As you can see, the file was in the Bharathanatyam folder, and the prediction was also Bharathanatyam with a confidence of 99.98%. Hence we got a correct prediction.
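
Since predict returns the label, its index, and the per-class probabilities, we can unpack them directly instead of reading the raw tensor:

# Map the winning probability back to its class name
pred_class, pred_idx, probs = learn_inf.predict(str(paths[0]))
print(f'{pred_class}: {probs[pred_idx]:.4f}')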

If you have real-world data and want to see how the model performs on your own set of images, test it here.

It might take some time for the app to start, though, as we are using Binder to get it running.
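
For reference, the app behind that link is essentially a small notebook GUI wrapped around learn_inf.predict. Below is a minimal sketch of such an app, following the ipywidgets approach used in the fastai course (the widget names here are my own):

from fastai2.vision.all import *
from fastai2.vision.widgets import *

learn_inf = load_learner('export.pkl')
btn_upload = widgets.FileUpload()
out_pl = widgets.Output()
lbl_pred = widgets.Label()

def on_upload(change):
    # Show a thumbnail of the uploaded image and display the prediction
    img = PILImage.create(btn_upload.data[-1])
    out_pl.clear_output()
    with out_pl: display(img.to_thumb(128, 128))
    pred, pred_idx, probs = learn_inf.predict(img)
    lbl_pred.value = f'Prediction: {pred}; Probability: {probs[pred_idx]:.04f}'

btn_upload.observe(on_upload, names=['data'])
display(VBox([widgets.Label('Select your dance image!'), btn_upload, out_pl, lbl_pred]))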