STEP BY STEP INSTRUCTIONS FOR BUILDING A DEEP LEARNING MODEL TO DETECT TABLES FROM IMAGES

There are articles available online that explain how to detect and extract tables from images using deep learning, but I realized that many beginners like myself have trouble building this system because of minor environment or version issues.

So I decided to write this guide which will provide step by step instructions on how to build a deep learning model to detect and extract tables from images.

At the end of this article, I have also provided a small program that can be used to predict the coordinates of tables using the model we have built. You can refer to this if you would like to integrate this model into your own Python applications. This model is built using the publicly available UNLV dataset.

If you follow all 84 steps as specified here, you will end up with a deep learning system that can detect and extract tables from images and give you their X, Y coordinates.

TOOLS REQUIRED

Here are the tools required to build a model to detect and extract tables from images.

  • Google Cloud Platform
  • NVidia CUDA 9.0
  • NVidia cuDNN v7.3.0 Libraries
  • Python 3.5
  • Luminoth API
  • TensorFlow
  • Java SDK

SETTING UP ENVIRONMENT

Since we are going to build a deep learning model, we need a system with GPU support. I have chosen Google Cloud Platform for this.


CAUTION: As of this writing, Google Cloud provides $300 of free credit for 12 months. You still need to provide your payment details. Please read the complete details of this free tier program to understand how it works and the costs involved (https://cloud.google.com/free/docs/gcp-free-tier). Make sure to stop/delete your VM instances when you are not using them; otherwise, Google may charge your credit card.

1) Visit http://cloud.google.com

2) Click on the “Get started for free” button

3) Sign in with your Gmail credentials

4) Provide your payment details

5) Once done, you will be redirected to your Google Cloud console.

6) Select “IAM & Admin” and then “Quotas” from the menu.

7) Click on the “Upgrade account” button and then on “UPGRADE”

8) After upgrading your account, select “IAM & Admin” and then “Quotas” from the menu again.

9) Filter the quotas with the parameters below (as shown in the image below),

            Quota type = All quotas

            Service = All services

            Metric = GPUs (all regions)

            Location = Global

10) Select the quota by clicking on the checkbox and click on the “EDIT QUOTAS” link

11) Fill the details as shown below and click on the “Next” button,

12) Specify the “New quota limit” as 8 and specify a “Request description” as shown below. Click on the “Submit request” button.

13) You will get a confirmation as shown below. Now wait for the confirmation email. (Google says it may take 2 business days to process; I got mine in about two hours.)

14) Once you receive the approval email from Google, proceed with the next steps. Select “Compute Engine” and then “VM instances” from the menu.

15) Click on the “Create” button

16) The “Create an instance” form will open.

17) Enter “Name” as “table-detection-deep-learning-system”

18) Select the “Custom” option in the “Machine type” select box.

19) Select 4 Core CPUs and 16GB Memory.

20) Specify “Number of GPUs” as 4 and “GPU type” as “NVIDIA Tesla K80”

21) Select “Ubuntu 16.04 LTS” as the “Boot disk” and specify the disk size as 20 GB

22) Click on the “Create” button.

Note: Sometimes Google may ask you to try a different zone or try again later if no resources are available in the selected zone.

23) The VM instance will be created as shown below,

CONNECTING TO THE GOOGLE CLOUD VM USING PUTTY

Please follow the instructions below to connect to the VM instance we created above.

24) Download putty.exe and puttygen.exe from the URLs below,

http://the.earth.li/~sgtatham/putty/latest/x86/putty.exe

http://the.earth.li/~sgtatham/putty/latest/x86/puttygen.exe

25) Execute puttygen.exe

26) Click on the “Generate” button and keep moving the cursor as instructed.

27) Enter your desired username under “Key comment”

28) Enter your desired password under “Key passphrase”

29) Click on “Save private key” button and save the private key file as “private_key.ppk” in your desired location

30) Copy the entire public key text highlighted below (you should copy the whole text as-is)

31) Go to Google Cloud console and click on “Compute Engine” -> “VM instances” from the menu

32) Start your VM instance if it’s not running.

33) Copy the IP address (External IP) of the VM instance as shown below,

34) Click on the VM instance name,

35) Click on “EDIT” link,

36) Under the “SSH Keys” section click on the “Show and edit” link,

37) Paste the public key we copied above as shown below,

38) Click on the “Save” button

39) Execute “putty.exe”

40) Click on the “Connection” -> “SSH” -> “Auth” option.

41) Select the “private_key.ppk” saved from above.

42) Click on “Connection” -> “Data” option.

43) Enter the username specified above under “Auto-login username”

44) Goto “Session” option

45) Enter the IP Address of the VM instance under “Host Name”

46) Enter Port as 22

47) Select “Connection type” as “SSH”

48) Click on “Open” button

49) The warning below will be displayed. Click on the “Yes” button.

50) Enter the passphrase specified above, and you will be logged into the system,

51) Switch to the root user by typing the following command,

sudo su -

Note: Make sure you always switch to root user when executing commands in this article.

INSTALLING NVIDIA CUDA

Log into the Ubuntu instance using putty and switch to the root user,

sudo su -

52) Install Nvidia drivers by executing the following commands,

add-apt-repository ppa:graphics-drivers/ppa


apt-get update


apt-get install nvidia-modprobe


apt-get install nvidia-418

53) Verify that everything is working by running,

nvidia-smi

You will get output similar to the following,

54) Install CUDA 9.0 by executing the following commands,

cd /tmp


wget https://developer.nvidia.com/compute/cuda/9.0/Prod/local_installers/cuda_9.0.176_384.81_linux-run


chmod +x cuda_9.0.176_384.81_linux-run


./cuda_9.0.176_384.81_linux-run --extract=/tmp


rm NVIDIA-Linux-x86_64-384.81.run


./cuda-linux.9.0.176-22781540.run

This command will prompt you with the following questions; respond as shown below (keep pressing the space bar to read through the agreement):

Do you accept the previously read EULA?
accept

Enter install path
Press Enter

Would you like to add desktop menu shortcuts?

y

Would you like to create a symbolic link /usr/local/cuda pointing to /usr/local/cuda-9.0?
y

55) Execute the following commands to set the environment variables,

echo 'export CUDA_HOME=/usr/local/cuda' >> ~/.bashrc
echo 'export PATH=$PATH:$CUDA_HOME/bin' >> ~/.bashrc
echo 'export LD_LIBRARY_PATH=$CUDA_HOME/lib64' >> ~/.bashrc
source ~/.bashrc

56) Execute the following command to make sure the setup is successful. It will display the available GPUs,

nvidia-smi

INSTALLING cuDNN LIBRARIES

57) Register yourself at https://developer.nvidia.com/cudnn

58) Click on “Download cuDNN” button

59) Accept the “cuDNN Software License Agreement”

60) Click on “Download cuDNN v7.3.0 (Sept 19, 2018), for CUDA 9.0” link

61) Download the following:

cuDNN v7.3.0 Runtime Library for Ubuntu16.04 (Deb)

cuDNN v7.3.0 Developer Library for Ubuntu16.04 (Deb)

62) The following files will be downloaded.

libcudnn7_7.3.0.29-1+cuda9.0_amd64.deb
libcudnn7-dev_7.3.0.29-1+cuda9.0_amd64.deb

63) To add the cuDNN libraries, transfer the above 2 files to the /tmp directory of the Ubuntu server (using FTP software such as FileZilla or WinSCP)

64) Execute the following commands to install the cuDNN libraries,

cd /tmp


dpkg -i libcudnn7_7.3.0.29-1+cuda9.0_amd64.deb


dpkg -i libcudnn7-dev_7.3.0.29-1+cuda9.0_amd64.deb

65) Set environment variables by executing the commands,

echo 'export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:/usr/local/cuda-9.0/lib64' >> ~/.bashrc


echo 'export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/extras/CUPTI/lib64' >> ~/.bashrc


source ~/.bashrc

UPGRADING PYTHON FROM VERSION 2.7 TO 3.5

66) Execute the following commands to upgrade python,

add-apt-repository ppa:deadsnakes/ppa


apt-get update


apt-get install python3.6


rm /usr/bin/python


ln -s /usr/bin/python3 /usr/bin/python


python -V

INSTALLING THE REQUIRED PYTHON PACKAGES

67) Execute the following commands to install the required python packages,

apt install unzip

apt install python3-pip

pip3 install pillow

pip3 install numpy==1.17.0

apt-get install python3-pandas

pip3 install opencv-python

pip3 install tensorflow-gpu==1.8.0

pip3 install luminoth
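After installing the packages above, it can help to confirm that they all import cleanly before moving on. Here is a minimal sketch; note that the import names can differ from the pip package names (for example, opencv-python imports as cv2, and pillow imports as PIL):

```python
import importlib


def importable(module_name):
    """Return True if the named module can be imported."""
    try:
        importlib.import_module(module_name)
        return True
    except ImportError:
        return False


# Import names for the packages installed above
for name in ["PIL", "numpy", "pandas", "cv2", "tensorflow", "luminoth"]:
    status = "OK" if importable(name) else "MISSING"
    print(name + ": " + status)
```

If any package prints MISSING, re-run the corresponding install command before continuing.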

SETTING UP DEVELOPMENT ENVIRONMENT & GETTING DATA

I have shared the UNLV training dataset and code in my GitHub so that it can be easily downloaded inside the VM instance.

68) Execute the following commands to download training data & code into the VM.

cd /usr/local

wget https://github.com/rajeshkumarraj82/table-detection-from-images-using-deep-learning/archive/master.zip

unzip master.zip

EXPLORING THE DATASET

Let’s have a look at our dataset before proceeding further. The dataset is available here,

https://github.com/rajeshkumarraj82/table-detection-from-images-using-deep-learning/tree/master/data

The “images” directory contains 403 image files. Each image contains one or more tables along with other content such as paragraphs.

The “train.csv” and “val.csv” files contain the x, y coordinates of the tables in the images. Here are the columns of these CSV files,

  • image_id = Name of the image file
  • xmin = Top left x coordinate of table
  • ymin = Top left y coordinate of table
  • xmax = Bottom right x coordinate of table
  • ymax = Bottom right y coordinate of table
  • label = the object defined by the above coordinates. The value will always be “table”

Please note, the “train.csv” will be used for training the model and “val.csv” will be used for testing the accuracy.
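To get a feel for the annotation format, the bounding-box columns can be read with the csv module from the standard library. The sketch below uses a couple of illustrative rows in the same shape as “train.csv” (the values are made up for demonstration, not taken from the dataset) and derives each table’s width and height from its coordinates:

```python
import csv
import io

# Sample rows in the same shape as train.csv (values are illustrative only)
sample = """image_id,xmin,ymin,xmax,ymax,label
9549_009.png,1237,301,2277,920,table
9541_001.png,100,200,900,750,table
"""


def table_sizes(csv_text):
    """Yield (image_id, width, height) for each annotated table."""
    reader = csv.DictReader(io.StringIO(csv_text))
    for row in reader:
        width = int(row["xmax"]) - int(row["xmin"])
        height = int(row["ymax"]) - int(row["ymin"])
        yield (row["image_id"], width, height)


for image_id, width, height in table_sizes(sample):
    print(image_id, width, height)
```

To inspect the real files, replace `sample` with the contents of “data/train.csv” (after the header has been added in the steps below).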

PREPROCESSING THE IMAGES

69) Create the following directories,

mkdir /usr/local/table-detection-from-images-using-deep-learning-master/data/train

mkdir /usr/local/table-detection-from-images-using-deep-learning-master/data/val

70) Execute the following command to preprocess the images,

cd /usr/local/table-detection-from-images-using-deep-learning-master/

python preprocess.py

GENERATING TENSORFLOW DATA

71) Open the “/usr/local/table-detection-from-images-using-deep-learning-master/data/train.csv” file and add the following header to the first line.

image_id,xmin,ymin,xmax,ymax,label

72) Save and close the editor.

73) Open the “/usr/local/table-detection-from-images-using-deep-learning-master/data/val.csv” file and add the following header to the first line.

image_id,xmin,ymin,xmax,ymax,label

74) Save and close the editor
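Instead of editing the two CSV files by hand, the header line can also be prepended with a short script. Here is a minimal sketch (the paths match the steps above; the script skips files that are missing or that already start with the header, so it is safe to run more than once):

```python
import os

HEADER = "image_id,xmin,ymin,xmax,ymax,label"


def ensure_header(path, header=HEADER):
    """Prepend the CSV header to the file if it is not already the first line."""
    with open(path, "r") as f:
        content = f.read()
    if not content.startswith(header):
        with open(path, "w") as f:
            f.write(header + "\n" + content)


base = "/usr/local/table-detection-from-images-using-deep-learning-master/data/"
for name in ("train.csv", "val.csv"):
    path = base + name
    if os.path.exists(path):
        ensure_header(path)
```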

75) Execute the following command to generate TensorFlow data,

cd /usr/local/table-detection-from-images-using-deep-learning-master/

lumi dataset transform --type csv --data-dir data/ --output-dir tfdata/ --split train --split val --only-classes=table

76) Start the training process,

sudo su -

cd /usr/local/table-detection-from-images-using-deep-learning-master/

chmod -R 777 *

lumi train -c config.yml

77) When the loss gets close to 1.0, you can stop the training with <ctrl + c>.

Please note, this training process can take 30+ hours for the train_loss value to reach 1.0. As you can see above, I stopped the training at around 1.7 given my time and budget constraints. This may reduce the accuracy of the model, so I recommend continuing the training until the train_loss value gets near 1.0. (Note: the train loss value gives a rough measure of how well the model fits the training data; the lower the train loss, the better.)
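Since the printed train_loss values are noisy from step to step, one way to decide when to stop is to watch a moving average rather than a single value. The helper below is a hypothetical illustration of that stopping rule (the simulated loss curve is made up; in practice you would feed in the train_loss values printed by the training run):

```python
from collections import deque


def should_stop(losses, window=50, target=1.0):
    """Stop once the mean of the last `window` loss values drops below target."""
    recent = list(losses)[-window:]
    if len(recent) < window:
        return False
    return sum(recent) / len(recent) < target


# Illustrative loss curve: values trending down from 3.0
history = deque(maxlen=1000)
for step in range(200):
    loss = 3.0 - step * 0.012  # pretend the loss decays linearly
    history.append(loss)
    if should_stop(history):
        print("stop at step", step)
        break
```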

CREATE CHECKPOINT

78) Execute the below command to create a checkpoint,

lumi checkpoint create config.yml

The response will be like,

Creating checkpoint for given configuration…

Checkpoint 73689cee5da2 created successfully.

79) Take a note of the Checkpoint number above.

80) Execute the below command to predict the location of the table in a random image (Use the Checkpoint number noted in the above section)

lumi predict --checkpoint 73689cee5da2 data/images/9549_009.png

81) The response will contain the coordinates of the identified tables,

Predicting data/images/9549_009.png… done.
{"file": "data/images/9549_009.png", "objects": [{"label": "table", "bbox": [1237, 301, 2277, 920], "prob": 1.0}]}
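The response line is a JSON object, so it can be parsed with the standard json module if you want the coordinates programmatically. A minimal sketch, using the values from the run above:

```python
import json

# The JSON line printed by `lumi predict` for the run above
response_line = ('{"file": "data/images/9549_009.png", "objects": '
                 '[{"label": "table", "bbox": [1237, 301, 2277, 920], "prob": 1.0}]}')

prediction = json.loads(response_line)
for obj in prediction["objects"]:
    xmin, ymin, xmax, ymax = obj["bbox"]
    print(obj["label"], xmin, ymin, xmax, ymax)
```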

PREDICTING TABLE LOCATION USING A PYTHON PROGRAM

82) Here’s the Python code that calls the Luminoth API to predict the table location in an image. (Please note you need to specify the checkpoint number and image file in this code.)

File Name: predict_table_location_from_image.py

from luminoth.tools.checkpoint import get_checkpoint_config
from luminoth.utils.predicting import PredictorNetwork
from PIL import Image as pilimage

# This program will predict the location of the tables in an image
# and output their coordinates. Using these coordinates we can cut the table portion of the image and use it for further processing

input_file = '/usr/local/table-detection-from-images-using-deep-learning-master/data/images/9549_009.png'
# Specify the luminoth checkpoint here
checkpoint = '73689cee5da2'

config = get_checkpoint_config(checkpoint)
network = PredictorNetwork(config)
image = pilimage.open(input_file).convert('RGB')
objects = network.predict_image(image)

print("NO OF TABLES IDENTIFIED BY LUMINOTH = " + str(len(objects)))
print('-' * 100)

for table_counter, table_dictionary in enumerate(objects, start=1):
    coordinate_list = table_dictionary["bbox"]
    xminn = coordinate_list[0]
    yminn = coordinate_list[1]
    xmaxx = coordinate_list[2]
    ymaxx = coordinate_list[3]
    print('TABLE ' + str(table_counter) + ':')
    print('-' * 100)
    print("xminn = " + str(xminn))
    print("yminn = " + str(yminn))
    print("xmaxx = " + str(xmaxx))
    print("ymaxx = " + str(ymaxx))

83) Execute the above python program by,

python predict_table_location_from_image.py

84) The output of the above program will be like,

NO OF TABLES IDENTIFIED BY LUMINOTH = 1

TABLE 1:

xminn = 1237
yminn = 301
xmaxx = 2277
ymaxx = 920

CODE TO CROP THE TABLE PORTION

The following Java code can be used to crop the table portion from an image using the coordinates predicted by the above Python program.

import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;

import javax.imageio.ImageIO;

public class GetSubImage {

	public static void main(String[] args) {

		BufferedImage img;
		try {
			img = ImageIO.read(new File("/usr/local/table-detection-from-images-using-deep-learning-master/data/images/9549_009.png"));
			int xminn = 1237;
			int yminn = 301;
			int xmaxx = 2277;
			int ymaxx = 920;
			
			BufferedImage subimage = img.getSubimage(xminn, yminn, (xmaxx-xminn), (ymaxx-yminn));
			ImageIO.write(subimage, "png", new File("/usr/local/table-detection-from-images-using-deep-learning-master/data/images/cropped_image.png"));
		} catch (IOException e) {
			e.printStackTrace();
		}

	}

}
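Since Pillow is already installed in our environment, the same crop can also be done in Python. Here is a minimal sketch using the paths and coordinates from the Java example (the file-existence check just makes the script safe to run on a machine without the dataset):

```python
import os
from PIL import Image


def crop_table(input_path, output_path, xmin, ymin, xmax, ymax):
    """Crop the table bounding box out of the image and save it as a PNG."""
    image = Image.open(input_path)
    # Pillow's crop box is (left, upper, right, lower)
    table = image.crop((xmin, ymin, xmax, ymax))
    table.save(output_path, "PNG")
    return table.size  # (width, height) of the cropped region


input_path = "/usr/local/table-detection-from-images-using-deep-learning-master/data/images/9549_009.png"
output_path = "/usr/local/table-detection-from-images-using-deep-learning-master/data/images/cropped_image.png"

if os.path.exists(input_path):
    print(crop_table(input_path, output_path, 1237, 301, 2277, 920))
```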

CONCLUSION

Let’s quickly analyze the outcome of the above programs. Here’s the test image (9549_009.png) we used for identifying table locations,

Here’s the cropped image using the coordinates predicted by the model,

From the cropped image above we can see the model predicted the table location properly but it should have reported 2 separate tables instead of one combined table.

So the accuracy of this model can be improved by,

  • Continuing the training until the train loss reaches 1.0
  • Adding more training data

So, using this deep learning model we are able to detect and extract tables from images. These cropped table images can be further processed with APIs like Tabula to extract table text content.
