Deep Dive Into Image Detection using Tensorflow


Your brain is faster than you think: scientists have found that it takes only 13 milliseconds to see an image. But for a computer, detecting images is quite a hard problem.

img 1. Eyes can detect images very easily, but for computers it’s not that easy

In the last few years, mind-blowing progress in Machine Learning algorithms has addressed these problems. The use of Neural Networks has brought tremendous progress in this area, especially the use of Convolutional Neural Networks. A Convolutional Neural Network (CNN) is a class of deep, feed-forward artificial neural network that has been successfully applied to analyzing visual imagery.

img 2. A high-level picture of a CNN with hidden layers

Today I am going to share my experience with TensorFlow for image detection. In my case I had a number of pill images. Here is a sample image:

img 3. sample pill image
Our goal here is to train the machine with the given pill images, so that TensorFlow can recognise any image of a pill that was used to train it.
At first this seems quite easy, but actually – it’s not!!!
There are two main steps involved in image detection using TensorFlow:
  1. Create a proper dataset. Suppose you have a very good machine learning algorithm running: as long as you do not provide proper input to it, your algorithm will not provide proper output. Here the input is the dataset, so we need a proper one. I will discuss how to create the dataset later in this post.
    img 4. In machine learning proper input is important in order to get proper output

  2. The next step is to train your model properly. The training will use the dataset created in the previous step.

In this blog I will only be talking about how to make the dataset.

Creating the dataset itself involves several steps; let’s cover them one by one.

Collecting the Images: It is recommended to have at least 100 images of the same class to build a good dataset. In my case I had only two images provided by the client, so I needed to create synthetic images by rotating the image to several different angles, blurring it, degrading it and resizing it.


The above images show how I created many images from one image, just by rotating it to various angles. I used OpenCV to perform these transformations.

Annotating the Images: The next very important step is annotating (labelling) the images; this is the step where you identify the object in each image. I used labelImg for this. It is a very handy tool, and annotations are created in the Pascal VOC format. Using labelImg is quite easy: clone the project and run it, and a window will open for labelling. You label the object in the image by drawing a box around it and naming it. When you save the label, it is written to an .xml file which contains the name, path and folder of the image file, along with the max and min values on both the x and y axes for the object.
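To give an idea of how such an .xml file can be consumed later, here is a minimal sketch that reads a Pascal VOC annotation with Python's standard library. The embedded XML is a hand-written illustration of the layout (its file name, path and coordinates are made up), and `parse_voc` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Hand-written illustration of a Pascal VOC annotation; all values are made up.
SAMPLE_XML = """<annotation>
  <folder>images</folder>
  <filename>img25.jpg</filename>
  <path>/data/images/img25.jpg</path>
  <size>
    <width>300</width>
    <height>200</height>
    <depth>3</depth>
  </size>
  <object>
    <name>pill5</name>
    <bndbox>
      <xmin>60</xmin>
      <ymin>40</ymin>
      <xmax>240</xmax>
      <ymax>160</ymax>
    </bndbox>
  </object>
</annotation>"""


def parse_voc(xml_text):
    """Return the image file name and the labelled boxes from one annotation."""
    root = ET.fromstring(xml_text)
    objects = []
    for obj in root.findall("object"):
        box = obj.find("bndbox")
        objects.append({
            "name": obj.findtext("name"),
            "xmin": int(box.findtext("xmin")),
            "ymin": int(box.findtext("ymin")),
            "xmax": int(box.findtext("xmax")),
            "ymax": int(box.findtext("ymax")),
        })
    return {"filename": root.findtext("filename"), "objects": objects}


annotation = parse_voc(SAMPLE_XML)
print(annotation["filename"], annotation["objects"])
```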

Annotation has been a challenge for me, as this task is purely manual and I have still not found a way to automate it. Once I come up with a solution I will post it for sure. Here is an example of a sample annotation file:



Creating pbtxt file: The label_map.pbtxt file is mainly used to give a numerical value to each object. It contains the unique object names, each with a unique id associated with it. In my case I wrote a simple Python script to create this file. Here is a sample pbtxt file:

item {
  id: 1
  name: 'pill5'
}
item {
  id: 2
  name: 'pill6'
}


Here ‘name’ specifies the object name, and ‘id’ is the unique number associated with it.
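A script for this file can be very small. The sketch below is one possible version, not the original script: `build_label_map` is a hypothetical name, the ids are simply assigned in sorted name order starting from 1, and the `item { ... }` layout is assumed to be the one used by label maps in the TensorFlow Object Detection API.

```python
def build_label_map(class_names):
    """Return label-map text with ids assigned from 1 in sorted name order."""
    entries = []
    for index, name in enumerate(sorted(set(class_names)), start=1):
        entries.append("item {\n  id: %d\n  name: '%s'\n}\n" % (index, name))
    return "\n".join(entries)


# Duplicate names are collapsed, so the list can come straight from the annotations.
label_map_text = build_label_map(["pill6", "pill5", "pill5"])
print(label_map_text)

# Writing the file is then a single call (the output path is an assumption):
# with open("label_map.pbtxt", "w") as handle:
#     handle.write(label_map_text)
```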

Creating train.txt file: This is another vital file. The train.txt file is used to train as well as evaluate our model. It is recommended to use at least 70% of the data in this file for training the model, and the rest (up to 30%) for evaluating it.
This file is nothing but a mapping between the image name and the existence of a particular object in that image. The number of train.txt files is equal to the number of classes (unique types of objects) you have in your images.
Suppose you have images of 5 unique types of pills, so you have 5 unique classes; in this case you will have 5 train.txt files. Each file contains the names of all image files, with 1 or -1 beside each name depending on whether the particular object is present in that image or not. The preferred name for a train.txt file is ObjectName_train.txt. I wrote a Python script to create these files. Here is a sample train.txt file:

img202.jpg -1
img241.jpg -1
img226.jpg -1
img233.jpg -1
img25.jpg 1
img199.jpg -1

Please make sure that there are enough examples of each object in both the training and the evaluation part, otherwise training will fail.
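The per-class files described above can be generated along these lines. This is a rough sketch rather than the original script: `build_presence_lists` is a hypothetical name, and it assumes each image contains exactly one kind of pill, given as a dictionary from image name to object name.

```python
def build_presence_lists(image_to_object):
    """Map each class name to the lines of its ObjectName_train.txt file."""
    classes = sorted(set(image_to_object.values()))
    lists = {}
    for cls in classes:
        # Every image is listed in every file: 1 if it shows this class, else -1.
        lists[cls] = [
            "%s %d" % (image, 1 if obj == cls else -1)
            for image, obj in sorted(image_to_object.items())
        ]
    return lists


# Illustrative input; the image and class names are placeholders.
example = {"img25.jpg": "pill5", "img202.jpg": "pill6", "img241.jpg": "pill6"}
for cls, lines in build_presence_lists(example).items():
    print(cls + "_train.txt")
    print("\n".join(lines))
```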

Creating Image name to Object name mapping file: This file contains each image name with the object name beside it. You can write a simple Python script to create this file. It will be used while creating the dataset. Here is a sample mapping file:


Creating TFR dataset: Two record sets are created, train.record and eval.record. The first record file is used to train the system, and the second is used to evaluate it. I used an existing script for creating the TFR dataset, but this script dealt with only one object whereas I had more than one object to process, so I needed to repeat the same record-creation process for each object. Once this process finishes successfully you will have the train.record and eval.record files created in the specified directory; the data in the files is in binary format.
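Before the two record files are written, the images have to be divided into a training part and an evaluation part. The sketch below shows one way to do a 70/30 split class by class, so that every object appears on both sides. It is an illustration only: `split_per_class`, the fixed seed and the dictionary input (image name to object name, one object per image) are assumptions, and writing the actual records is left to the record-creation script.

```python
import random
from collections import defaultdict


def split_per_class(image_to_object, train_fraction=0.7, seed=0):
    """Split image names into train and eval lists, class by class."""
    by_class = defaultdict(list)
    for image, obj in sorted(image_to_object.items()):
        by_class[obj].append(image)

    rng = random.Random(seed)  # fixed seed so the split is repeatable
    train, evaluation = [], []
    for obj in sorted(by_class):
        images = by_class[obj]
        rng.shuffle(images)
        cut = round(len(images) * train_fraction)
        # Keep at least one image on each side when the class has two or more.
        if len(images) > 1:
            cut = min(max(cut, 1), len(images) - 1)
        train.extend(images[:cut])
        evaluation.extend(images[cut:])
    return train, evaluation


# Illustrative input: 10 images for each of two placeholder classes.
example = {"img%d.jpg" % i: ("pill5" if i < 10 else "pill6") for i in range(20)}
train_images, eval_images = split_per_class(example)
print(len(train_images), "training images,", len(eval_images), "evaluation images")
```

The images in the first list would go into train.record and those in the second into eval.record.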

Now I am going to talk about the challenges that I faced during this process:

  • The first challenge was that I did not have sufficient pill images; for each pill I had only two images, the front and the back side of the pill. For TensorFlow to detect an object properly it is recommended to have at least 100 images of the same object; the images can be from different angles and can have multiple resolutions. So in my case I needed to create multiple images from one image, as described above. But these were all synthetic images, so for better results it is recommended to get real images of the same pill from different angles and in different resolutions.
  • Annotation is quite a tedious task as long as it is manual, so this has also been a challenge for me.
  • Training on the dataset is quite a compute-intensive task, and it is better accomplished using a GPU (Graphics Processing Unit). So for training it is highly recommended to use a GPU-enabled machine.
March 16th, 2018