Distracted Driver Detection Using ConvNets (<100 lines of code, no GPUs)
In this post we train a CNN (convolutional neural network) to detect distracted drivers. We will use the dataset from the Kaggle State Farm Distracted Driver Detection competition. You can download the data by creating a Kaggle account and following the instructions.
A few points to highlight
- The entire code (data loading, model training, some exploration) is less than 100 lines, thanks to the behind-the-scenes heavy-lifting by Keras!
- The model that I train is also relatively small by modern standards (“only” about 2.5 million parameters), and yet achieves good accuracy (>99%).
- Even though images in the dataset have reasonably high resolution (640x480 pixels), I downsample them to 64x64 pixels. This is just to demonstrate that we don’t need high resolution images for this task.
Exploring the dataset
The training dataset is relatively modest by modern standards, consisting of 22,424 labeled images across 10 classes. Each image is a 640x480-pixel color image. The dataset is very evenly balanced across the 10 classes, which cover the most common forms of driver distraction (see table below).
Given the well-balanced data, accuracy should be a good metric for evaluating model performance.
Loading The Data
Loading the data and creating the labels is a breeze, thanks to the directory structure and Keras. The code snippet below loads the data, resizes the images to 64x64 pixels, rescales/shifts the data, creates an 80/20 train/validation split, and creates iterators with the correct batch size! Pretty neat!
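A minimal sketch of that loading step, using Keras’s `ImageDataGenerator`. The directory path and the batch size of 40 are assumptions (the batch size is inferred from the 449 steps per epoch over roughly 17,940 training images shown later):

```python
import os
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Rescale pixel values to [0, 1] and reserve 20% of the images for validation.
datagen = ImageDataGenerator(rescale=1.0 / 255, validation_split=0.2)

# The Kaggle archive unpacks into imgs/train/<class_name>/ folders, so
# flow_from_directory can infer the 10 labels from the directory names.
# (The path and batch size are assumptions; the guard lets the sketch
# run even before the data has been downloaded.)
if os.path.isdir("imgs/train"):
    train_gen = datagen.flow_from_directory(
        "imgs/train", target_size=(64, 64), batch_size=40,
        class_mode="categorical", subset="training")
    val_gen = datagen.flow_from_directory(
        "imgs/train", target_size=(64, 64), batch_size=40,
        class_mode="categorical", subset="validation")
```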
Model Creation
We use a very simple architecture (literally the first thing that I tried) with 4 conv/max_pool layer combos, one fully connected layer, and a final softmax output. For good measure, we add two dropout layers towards the end. The following simple code snippet does it! Go Keras!!
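A sketch of such an architecture in Keras. The exact filter counts and dense-layer width below are assumptions, chosen to land close to (though not exactly at) the roughly 2.5 million parameters reported below:

```python
from tensorflow.keras import layers, models

# Four conv/max-pool blocks, one fully connected layer, and a softmax
# output, with two dropout layers towards the end. The specific filter
# counts (32/64/128/256) and the 512-unit dense layer are assumptions.
model = models.Sequential([
    layers.Conv2D(32, (3, 3), padding="same", activation="relu",
                  input_shape=(64, 64, 3)),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), padding="same", activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(128, (3, 3), padding="same", activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(256, (3, 3), padding="same", activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dropout(0.5),
    layers.Dense(512, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```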
The last lines of the output from the above code will show that the total number of parameters in the model is slightly north of 2.5 million. And just like that, a small model is born!
Total params: 2,510,538
Trainable params: 2,510,538
Non-trainable params: 0
Model Training
Training the model is almost comically easy as well. The snippet below does it. To be fancy, we add early stopping as well. This runs for about 10 epochs and takes less than 20 minutes on my laptop (MacBook Pro, 32 GB RAM).
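The training call can be sketched as a small function so it stands on its own; the patience value is an assumption, and `train_gen`/`val_gen` are the iterators from the data-loading step:

```python
from tensorflow.keras.callbacks import EarlyStopping

def train(model, train_gen, val_gen, epochs=20):
    """Fit the model, stopping early once validation loss stops improving.

    The patience value here is an assumption; restore_best_weights keeps
    the best checkpoint rather than the last one.
    """
    early_stop = EarlyStopping(monitor="val_loss", patience=2,
                               restore_best_weights=True)
    return model.fit(train_gen, validation_data=val_gen,
                     epochs=epochs, callbacks=[early_stop])
```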
The final lines of output here show that the validation accuracy is > 99%!!
Epoch 9/20
449/449 [==============================] - 143s 319ms/step - loss: 0.0452 - accuracy: 0.9871 - val_loss: 0.0402 - val_accuracy: 0.9915
Exploring the results
It is remarkable that this almost-out-of-the-box model achieves an accuracy of 99.15% on this reasonably complex multiclass classification task. Below we show one example where the model works well, and one where it doesn’t. For the case where the model result is incorrect, it is understandable why the model thinks that the driver is adjusting their makeup.
The data is so well balanced, and the model accuracy so high, that it almost does not make sense to explore the confusion matrix, F1 score, etc. Still, the confusion matrix might be useful for understanding the common errors (such as a hand near the head being tagged as “hair and makeup”).
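If one did want to look at those common errors, a confusion matrix is only a few lines. A plain-NumPy sketch (the usage comment assumes the model and validation iterator from the earlier steps):

```python
import numpy as np

def confusion_matrix(y_true, y_pred, num_classes=10):
    """Rows are true classes, columns are predicted classes."""
    cm = np.zeros((num_classes, num_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm

# Hypothetical usage with the validation iterator (created with
# shuffle=False so labels and predictions line up):
#   y_pred = model.predict(val_gen).argmax(axis=1)
#   print(confusion_matrix(val_gen.classes, y_pred))
```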
A Word On Computational Performance
My admittedly lousy benchmark (MacBook Pro, 32 GB RAM) shows the following average times for making predictions at various batch sizes. The sublinear scaling is clear: Keras takes advantage of optimized code for larger batches, while overhead dominates for smaller ones. The saved model is only 203 KB in size.
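A sketch of how such a measurement can be made (the batch sizes and repeat count in the usage comment are arbitrary choices, not the ones from my benchmark):

```python
import time
import numpy as np

def seconds_per_image(model, batch_size, repeats=5):
    """Average wall-clock prediction time per image at a given batch size."""
    batch = np.random.rand(batch_size, 64, 64, 3).astype("float32")
    model.predict(batch, verbose=0)  # warm-up: exclude one-time graph setup
    start = time.perf_counter()
    for _ in range(repeats):
        model.predict(batch, verbose=0)
    return (time.perf_counter() - start) / (repeats * batch_size)

# Hypothetical usage, sweeping batch sizes to see the sublinear scaling:
#   for bs in (1, 8, 64, 256):
#       print(bs, seconds_per_image(model, bs))
```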
The bottom line is that this model could almost be used for real-time applications running on a small device.
Conclusion
That’s it! I trained and tested a “small” convnet for distracted driver detection. Thanks to Google (TensorFlow) and Facebook (PyTorch) putting their weight behind open-source implementations of various deep learning frameworks, it’s very simple to train models for fairly complex tasks in just a few hours!