Create an image dataset from Google Images and classify the images using Fast.ai

Plaban Nayak
Jul 6, 2020

Deep learning with images can be a fascinating field to work in, and such projects most likely involve convolutional neural networks (CNNs). Whether a project is about image classification or image recognition, there is always one common factor: images. Most of the time you need lots of them to train a deep learning model properly.

We want our model neither to overfit nor to underfit, and we don’t want it to misclassify images. The most effective remedy is usually more image data. But it is not always easy to get suitable images from a website, so here we build our own image dataset for a deep learning project.

1. First, head to Google Images and search for ‘forests satellite images’. You will find plenty of relevant images.

2. Scroll down until you have all the relevant images you need, or until no more images load. Then open the browser’s developer tools by right-clicking on the page and choosing Inspect.

3. Click the Console tab.

Copy and paste the following code into the console window.

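// Collect the URL of every thumbnail ('.rg_i' is the thumbnail class Google Images used when this was written; it may have changed since)
urls = Array.from(document.querySelectorAll('.rg_i')).map(el => el.hasAttribute('data-src') ? el.getAttribute('data-src') : el.getAttribute('data-iurl')).filter(url => url);

// Trigger a download of the URLs, one per line, as a CSV file
window.open('data:text/csv;charset=utf-8,' + encodeURIComponent(urls.join('\n')));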

After you press Enter, a file should download. It contains the URLs of all the images, one per line.
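Before downloading anything, it can help to clean the URL file: thumbnails that carried neither attribute end up as blank lines, and the same image can appear more than once. A minimal plain-Python sketch, assuming one URL per line; `clean_urls` is a hypothetical helper, not part of fast.ai:

```python
from urllib.parse import urlparse


def clean_urls(lines):
    """Return unique http(s) URLs in their original order.

    Blank lines and anything that is not an absolute http/https URL are dropped.
    """
    seen = set()
    cleaned = []
    for line in lines:
        url = line.strip()
        # Skip blanks and other schemes (urlparse("").scheme is "")
        if urlparse(url).scheme not in ("http", "https"):
            continue
        if url not in seen:
            seen.add(url)
            cleaned.append(url)
    return cleaned
```

Usage might look like `with open("urls_forest.csv") as f: urls = clean_urls(f)`, where the file name is only an example.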

Create a directory and upload the URL file to your server.

Choose an appropriate name for each directory of labeled images. You can repeat these steps to collect images for each label.
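fast.ai can derive each image’s label from the name of the folder it sits in, so a convenient layout is one sub-folder per class under a shared root. A minimal sketch of that setup, assuming each label’s URL file has already been uploaded; the function name, the `data` root and the labels `forest` and `urban` are hypothetical:

```python
import shutil
from pathlib import Path


def prepare_label_folders(root, url_files):
    """Create root/<label>/ for every label and copy that label's URL file into it.

    url_files maps a label (which becomes the folder name) to its URL file path.
    Returns the created label directories.
    """
    root = Path(root)
    created = []
    for label, url_file in url_files.items():
        label_dir = root / label
        label_dir.mkdir(parents=True, exist_ok=True)  # safe to re-run
        shutil.copy(url_file, label_dir / Path(url_file).name)
        created.append(label_dir)
    return created
```

For example, `prepare_label_folders("data", {"forest": "urls_forest.csv", "urban": "urls_urban.csv"})` would produce `data/forest/` and `data/urban/`.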

Download the images listed in each URL file into its respective folder.
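fast.ai ships its own `download_images` and `verify_images` helpers for this step (the latter can remove files that cannot be opened as images). To show roughly what happens under the hood, here is a standard-library sketch; `fetch_images` and its parameters are hypothetical, every file gets a `.jpg` suffix for simplicity, and unlike `verify_images` it does not check that the bytes are a valid image:

```python
import http.client
import urllib.request
from pathlib import Path


def fetch_images(url_file, dest, max_pics=200, timeout=10):
    """Download up to max_pics images listed in url_file (one URL per line) into dest.

    Files are named 00000000.jpg, 00000001.jpg, ... by position in the URL list;
    URLs that fail to download are skipped. Returns the paths that were written.
    """
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    urls = [line.strip() for line in Path(url_file).read_text().splitlines() if line.strip()]
    saved = []
    for i, url in enumerate(urls[:max_pics]):
        target = dest / f"{i:08d}.jpg"  # simplification: every file gets a .jpg suffix
        try:
            with urllib.request.urlopen(url, timeout=timeout) as response:
                data = response.read()
        except (OSError, ValueError, http.client.HTTPException):
            continue  # unreachable host, HTTP error, timeout or malformed URL
        target.write_bytes(data)
        saved.append(target)
    return saved
```

Skipping failures instead of raising matters here, because a scraped URL list almost always contains a few dead links.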

View data:

After loading the data, we transform it. In this step, with just two lines of code, we apply image augmentation, cropping, padding and other transforms, then prepare and normalize the data.
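fast.ai performs these steps with its own transforms on tensors. To make the ideas concrete, here is a plain-Python sketch on a tiny image stored as rows of RGB tuples in [0, 1]: center crop or pad to a fixed size, a random horizontal flip as augmentation, and per-channel normalization with the standard ImageNet mean and standard deviation (the usual choice for ImageNet-pretrained models). The function names are hypothetical:

```python
import random

# Per-channel ImageNet statistics commonly used to normalize inputs to pretrained models
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)
BLACK = (0.0, 0.0, 0.0)


def crop_or_pad(img, size, fill=BLACK):
    """Center-crop and/or pad an image (a list of rows of RGB tuples) to size x size."""
    h, w = len(img), len(img[0])
    top, left = max((h - size) // 2, 0), max((w - size) // 2, 0)
    cropped = [list(row[left:left + size]) for row in img[top:top + size]]
    ch, cw = len(cropped), len(cropped[0])
    pad_top, pad_left = (size - ch) // 2, (size - cw) // 2
    rows = [[fill] * pad_left + row + [fill] * (size - cw - pad_left) for row in cropped]
    return ([[fill] * size for _ in range(pad_top)] + rows
            + [[fill] * size for _ in range(size - ch - pad_top)])


def random_hflip(img, p=0.5, rng=random):
    """Augmentation: mirror the image left-to-right with probability p."""
    return [row[::-1] for row in img] if rng.random() < p else img


def normalize(img, mean=IMAGENET_MEAN, std=IMAGENET_STD):
    """Subtract the per-channel mean and divide by the per-channel standard deviation."""
    return [[tuple((c - m) / s for c, m, s in zip(px, mean, std)) for px in row] for row in img]
```

Flipping is applied only to training images, so the model sees slightly different versions of each picture every epoch; cropping or padding gives every image the same size so they can be batched.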

Train the Model

Model architecture

Here we use transfer learning to train our model.

We use ResNet50, a powerful CNN pretrained on the ImageNet dataset.
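In fast.ai, `cnn_learner` loads the pretrained body, replaces the final layer with a new head for our classes, and by default freezes the body so that only the head is trained at first (`learn.unfreeze()` can later release it for fine-tuning). The toy below illustrates only that core idea in plain Python: a fixed feature map stands in for the frozen ResNet50 body, and a small logistic-regression head is the only thing trained. Everything here (`frozen_backbone`, the 2-D inputs, the data) is hypothetical and chosen for illustration; a real backbone is a deep CNN:

```python
import math


def frozen_backbone(x):
    """Stand-in for the pretrained body: a fixed feature map that is never updated."""
    return (x[0] + x[1], x[0] - x[1])


def train_head(samples, labels, epochs=200, lr=0.5):
    """Train only a new logistic-regression head on top of the frozen features."""
    feats = [frozen_backbone(x) for x in samples]  # body is frozen, so features are computed once
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for f, y in zip(feats, labels):
            p = 1.0 / (1.0 + math.exp(-(w[0] * f[0] + w[1] * f[1] + b)))
            g = p - y  # gradient of the binary cross-entropy loss w.r.t. the logit
            w = [w[0] - lr * g * f[0], w[1] - lr * g * f[1]]
            b -= lr * g
    return w, b


def predict(w, b, x):
    """Class 1 if the head's logit is positive, else class 0."""
    f = frozen_backbone(x)
    return int(w[0] * f[0] + w[1] * f[1] + b > 0)


SAMPLES = [(0.1, 0.2), (0.2, 0.1), (0.3, 0.3), (0.9, 0.8), (0.7, 0.9), (1.0, 0.6)]
LABELS = [0, 0, 0, 1, 1, 1]
```

Because only the head's few parameters are learned, transfer learning needs far less data and compute than training the whole network from scratch.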

Model Interpretation

One of the handiest tools in the fast.ai library is “most_confused”, which tells us which classes the model confused with each other most often during validation.
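The idea behind “most_confused” can be sketched from two plain lists of actual and predicted validation labels: count every misclassified (actual, predicted) pair and sort by frequency. The function name and labels below are hypothetical; the real method is called on fast.ai’s `ClassificationInterpretation` object, not on lists:

```python
from collections import Counter


def most_confused_pairs(actual, predicted, min_val=1):
    """List (actual, predicted, count) for misclassified class pairs, most frequent first.

    Pairs seen fewer than min_val times are omitted.
    """
    pairs = Counter((a, p) for a, p in zip(actual, predicted) if a != p)
    ranked = [(a, p, n) for (a, p), n in pairs.items() if n >= min_val]
    return sorted(ranked, key=lambda t: t[2], reverse=True)
```

For example, with actual labels `["forest", "forest", "urban", "urban", "urban", "forest"]` and predictions `["urban", "forest", "forest", "forest", "urban", "forest"]`, urban images mistaken for forest come first with a count of 2.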

Visualize the top 9 images where the model was most confused
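In fast.ai this view is usually produced with `plot_top_losses`, which ranks validation images by their loss. The ranking itself is easy to sketch: the cross-entropy loss of one image is the negative log of the probability the model gave to its true class, so confident mistakes score highest. A plain-Python sketch with a hypothetical function name, assuming per-image probability lists and integer class targets:

```python
import heapq
import math


def top_losses(probs, targets, k=9):
    """Return (loss, index) for the k samples with the highest cross-entropy loss.

    probs[i] is the probability vector for sample i; targets[i] is its true class index.
    """
    # Clamp to avoid log(0) when the model gave the true class zero probability
    losses = ((-math.log(max(p[t], 1e-12)), i) for i, (p, t) in enumerate(zip(probs, targets)))
    return heapq.nlargest(k, losses)
```

`heapq.nlargest` avoids sorting the whole validation set when only the top k are needed.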

Confusion Matrix:

The accuracy is 93.75%, achieved with minimal coding and hyperparameter tuning.
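A confusion matrix counts, for each actual class (row), how often each class was predicted (column); accuracy is the share of counts on the diagonal. As a sanity check on the figure above, 93.75% corresponds to 15 correct predictions out of every 16 images. The sketch below uses hypothetical class names and made-up labels chosen to reproduce that ratio; it is not the article’s validation set:

```python
def confusion_matrix(actual, predicted, classes):
    """Rows are actual classes, columns are predicted classes."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for a, p in zip(actual, predicted):
        matrix[index[a]][index[p]] += 1
    return matrix


def accuracy(matrix):
    """Fraction of samples on the diagonal, i.e. correctly classified."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total
```

With 8 forest and 8 urban images, where one urban image is predicted as forest, the matrix is `[[8, 0], [1, 7]]` and the accuracy is 15/16 = 0.9375.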

Use the trained model to make predictions on new images

Image 1:

The model correctly predicted that the image belongs to class 1, i.e. Urban.

Image 2:

The model correctly predicted this image as Forest Cover.
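fast.ai’s `learn.predict` returns the predicted class, its index and the per-class probabilities, which is how a result such as “class 1, i.e. Urban” is read off. Under the hood the network produces one raw score (logit) per class; a softmax turns the scores into probabilities and the largest one wins. A plain-Python sketch, assuming the class order below (index 1 is Urban, as in the examples above); the names and scores are hypothetical:

```python
import math

CLASSES = ["Forest Cover", "Urban"]  # assumed order: index 1 is Urban


def softmax(scores):
    """Turn raw model outputs (logits) into probabilities that sum to 1."""
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def decode_prediction(scores, classes=CLASSES):
    """Return (class name, class index, probabilities) for one image's logits."""
    probs = softmax(scores)
    idx = max(range(len(probs)), key=probs.__getitem__)
    return classes[idx], idx, probs
```

For example, `decode_prediction([-1.0, 2.0])` returns `"Urban"` with index 1, because the second score is larger.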
