Google Colab for a Chest X-Ray Deep Learning Project
The last 8 months of the Data Science Immersive bootcamp taught me a lot. Now I have reached the final project, which is about deep learning; specifically, I chose to work on the Chest X-Ray Images (Pneumonia) dataset on Kaggle. The main goal of the project is to predict whether an X-ray image belongs to a healthy person or a pneumonia patient by applying neural network models. So how do we build large deep learning models without a local GPU or lots of RAM?
The answer is Google Colab! It’s an awesome browser-based platform that is free of charge and lets us train our models on remote machines instead of our own. When I first heard about it I couldn’t believe it, but it turned out to be true. We can work with large datasets, build complex models, and even share our work with others.
The most important thing about Google Colab is that you get free GPUs and TPUs. Training models, especially deep learning ones, can take many hours on a CPU; I have faced this issue on my local machine many times. Having GPUs or TPUs that can train these models in minutes, or even seconds, sounds like magic.
Whenever possible, use a GPU rather than a CPU because of its sheer computational power and speed of execution.
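Before training, it is worth confirming that the notebook actually has a GPU attached (in Colab, this is chosen under Runtime > Change runtime type). Below is a minimal sketch, assuming a TensorFlow/Keras workflow since the post does not name its framework; `list_gpus` is a hypothetical helper name.

```python
"""Check whether a GPU is visible to TensorFlow (for example, in a Colab notebook)."""


def list_gpus():
    """Return the GPU devices TensorFlow can see, or [] if TensorFlow is not installed."""
    try:
        import tensorflow as tf  # assumption: TensorFlow is the framework in use
    except ImportError:
        return []
    return tf.config.list_physical_devices("GPU")


if __name__ == "__main__":
    gpus = list_gpus()
    if gpus:
        print(f"GPU available: {gpus}")
    else:
        print("No GPU found; in Colab, pick a GPU under Runtime > Change runtime type.")
```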
So let’s come back to the project.
In this project, we try to push the recall for pneumonia images (sensitivity) above 90% and the recall for normal images (specificity) above 90%.
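With pneumonia treated as the positive class, and TP, FN, TN and FP denoting the true positive, false negative, true negative and false positive counts, the two targets can be written as:

```latex
\text{Sensitivity} = \frac{TP}{TP + FN} \geq 0.90,
\qquad
\text{Specificity} = \frac{TN}{TN + FP} \geq 0.90
```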
The data obtained from Kaggle has 5,860 images, divided into three folders: train, validation and test. The validation folder had only 16 images, far too few to be useful, so we created our own validation set by splitting off 15% of the training data. The data is also highly imbalanced: pneumonia images outnumber normal images nearly 3 to 1.
After manually redistributing the images into three folders, the training set contains 70% of the data and the validation and test sets contain 15% each, with balanced shares of normal and pneumonia images in every folder.
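The redistribution was done by hand, but the same 70/15/15 split can be sketched in code. This assumes that "balanced shares" means keeping the same normal-to-pneumonia ratio in every folder (a stratified split) and that each image is represented as a `(path, label)` pair; `stratified_split` is a hypothetical helper, and copying the files into folders is not shown.

```python
import random
from collections import defaultdict


def stratified_split(samples, train=0.70, val=0.15, seed=42):
    """Split (path, label) pairs into train/val/test while keeping class ratios.

    Whatever remains after the train and validation shares goes to test
    (0.15 with the defaults).
    """
    # Group samples by label so each class is split separately.
    by_label = defaultdict(list)
    for path, label in samples:
        by_label[label].append((path, label))

    rng = random.Random(seed)  # fixed seed makes the split reproducible
    splits = {"train": [], "val": [], "test": []}
    for items in by_label.values():
        rng.shuffle(items)
        n_train = round(len(items) * train)
        n_val = round(len(items) * val)
        splits["train"].extend(items[:n_train])
        splits["val"].extend(items[n_train:n_train + n_val])
        splits["test"].extend(items[n_train + n_val:])
    return splits
```

Each resulting list could then be copied into its own `train/`, `val/` and `test/` folder (for example with `shutil.copy`), so that directory-based loaders keep working unchanged.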
Notably, the initial accuracy of the deep learning models increased visibly after this balanced redistribution of the images into the train, validation and test folders.
To compare accuracy and recall scores across models, seven models were trained:
- Basic neural network model with 2 layers
- Regularized basic neural network model with dropout
- Convolutional neural network (CNN) model
- Deep convolutional neural network model
- Xception
- VGG3
- VGG5
Recall, accuracy and F1 score are used as evaluation metrics. Because the data is highly imbalanced, data augmentation was also applied to the dataset to improve the performance of the last model.
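The post does not say which augmentations were used, so the following is only a sketch of the idea in NumPy, assuming grayscale images scaled to [0, 1]. Each call returns a randomly shifted copy with slightly jittered brightness, so the model sees a new variant of each image every epoch. The function `augment` and its parameters are hypothetical.

```python
import numpy as np


def augment(image, rng, max_shift=10, brightness=0.1):
    """Return a randomly shifted, brightness-jittered copy of a 2-D grayscale image.

    Assumes pixel values lie in [0, 1]; pixels shifted in from outside the
    frame are filled with zeros.
    """
    h, w = image.shape
    dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)
    shifted = np.zeros_like(image)
    # Copy the region that overlaps between the original and the shifted frame.
    src_y = slice(max(0, -dy), min(h, h - dy))
    dst_y = slice(max(0, dy), min(h, h + dy))
    src_x = slice(max(0, -dx), min(w, w - dx))
    dst_x = slice(max(0, dx), min(w, w + dx))
    shifted[dst_y, dst_x] = image[src_y, src_x]
    # Scale intensities by a random factor near 1, then clip back into [0, 1].
    factor = rng.uniform(1 - brightness, 1 + brightness)
    return np.clip(shifted * factor, 0.0, 1.0)


rng = np.random.default_rng(0)
xray = rng.random((224, 224))  # stand-in for a real, normalized X-ray
augmented = augment(xray, rng)
```

Horizontal flips are left out on purpose here, since they would mirror the chest's left-right anatomy. In a Keras pipeline, the same idea is usually expressed with preprocessing layers such as `tf.keras.layers.RandomTranslation`.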
As the models became more complex, accuracy, sensitivity and specificity tended to increase. Data augmentation also raised the recall for each label and improved overall model performance.
Among all trained models:
- CNN achieved the highest accuracy score,
- Baseline and CNN models achieved the highest specificity score,
- Deep CNN and VGG models achieved the highest sensitivity score,
- VGG3 model has the lowest specificity and accuracy,
- CNN model has the lowest sensitivity.
Overall, the best initial results among all trained models came from the CNN model.
The model predicted:
- 109 False positives,
- 125 True positives,
- 384 True negatives,
- 6 False negatives.
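Here is a small sketch that turns confusion-matrix counts into the metrics used in this project. It assumes pneumonia is the positive class and takes the four counts exactly as listed; `binary_metrics` is a hypothetical helper, not part of any library. Because swapping the positive class also swaps sensitivity and specificity, it is worth confirming the labeling convention before comparing these ratios with the percentages quoted for the CNN.

```python
def binary_metrics(tp, fp, tn, fn):
    """Return common binary-classification metrics from confusion-matrix counts.

    Assumes pneumonia is the positive class and normal the negative class.
    """
    sensitivity = tp / (tp + fn)  # recall on pneumonia images
    specificity = tn / (tn + fp)  # recall on normal images
    precision = tp / (tp + fp)    # share of pneumonia predictions that are correct
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "accuracy": accuracy,
        "f1": f1,
    }


# Counts reported for the CNN model, taken at face value.
metrics = binary_metrics(tp=125, fp=109, tn=384, fn=6)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```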
With a 98% recall for specificity, the CNN model partially achieved the set thresholds. However, it still underperforms on the sensitivity threshold.
This model should be used as a tool by medical experts and specialists to support their diagnosis and choice of treatment. To reach higher accuracy and recall, oversampling techniques will be applied next; balancing the number of images per label may also lead to higher accuracy and recall scores.
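Random oversampling, the simplest of the oversampling methods mentioned here, duplicates minority-class images until every class is as large as the biggest one. Below is a minimal sketch, assuming the training set is a list of `(path, label)` pairs; `random_oversample` is a hypothetical helper. It should be applied to the training split only, so that duplicated images never leak into validation or test.

```python
import random
from collections import defaultdict


def random_oversample(samples, seed=0):
    """Duplicate minority-class (path, label) pairs until all classes match the largest."""
    by_label = defaultdict(list)
    for sample in samples:
        by_label[sample[1]].append(sample)

    target = max(len(items) for items in by_label.values())
    rng = random.Random(seed)
    balanced = []
    for items in by_label.values():
        balanced.extend(items)
        # Draw extra copies with replacement to close the gap to the largest class.
        balanced.extend(rng.choices(items, k=target - len(items)))
    rng.shuffle(balanced)  # avoid long runs of a single class
    return balanced
```

Combined with data augmentation, the duplicated images are transformed differently each epoch, which reduces the risk of the model simply memorizing the repeated normal images.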
In future work, we can augment the dataset and balance the target classes with oversampling methods, and apply other transfer learning models to compare accuracy, F1 and recall scores. The model could also be trained on a COVID-19 dataset in order to detect whether the cause of pneumonia is a virus or bacteria.