Distracted Driver Detection Using ConvNets ( < 100 lines of code, no GPUs)

Ashivni Shekhawat · 4 min read · Jan 12, 2021


Sample distracted driver image from the dataset (credits and copyrights to Kaggle)

In this post we train a CNN (convolutional neural network) for detecting distracted drivers. We will use the dataset from the Kaggle State Farm Distracted Driver Detection competition. You can download the data by creating a Kaggle account and following the instructions.

A few points to highlight

  • The entire code (data loading, model training, some exploration) is less than 100 lines, thanks to the behind-the-scenes heavy-lifting by Keras!
  • The model that I train is also relatively small by modern standards (“only” about 2.5 million parameters), and yet achieves good accuracy (>99%).
  • Even though images in the dataset have reasonably high resolution (640x480 pixels), I downsample them to 64x64 pixels. This is just to demonstrate that we don’t need high resolution images for this task.

Exploring the dataset

The training dataset is relatively modest by modern standards, consisting of 22,424 labeled images across 10 classes. Each image is a 640x480 pixel color image. The dataset is very evenly balanced across the 10 classes. The classes cover the most common forms of driver distraction (see the class list below).

Class labels and descriptions for the input images:

  • c0: safe driving
  • c1: texting - right
  • c2: talking on the phone - right
  • c3: texting - left
  • c4: talking on the phone - left
  • c5: operating the radio
  • c6: drinking
  • c7: reaching behind
  • c8: hair and makeup
  • c9: talking to passenger

Given the well-balanced data, accuracy should be a good metric for evaluating model performance.
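As a quick sanity check on that balance, a few lines of Python can count the images per class straight from the directory structure (a sketch; it assumes the Kaggle data was unzipped so that imgs/train/ contains one sub-directory per class, as used in the loading step below):

```python
import os

# Count training images per class by listing the per-class sub-directories.
train_dir = "imgs/train"
counts = {
    cls: len(os.listdir(os.path.join(train_dir, cls)))
    for cls in sorted(os.listdir(train_dir))
}
for cls, n in counts.items():
    print(f"{cls}: {n} images")
print(f"Total: {sum(counts.values())} images")
```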

Loading The Data

Loading the data and creating the labels is a breeze, thanks to the directory structure and Keras. The code snippet below loads the data, resizes the images to 64x64 pixels, rescales/shifts the pixel values, creates an 80/20 train/validate split, and creates iterators with the correct batch size. Pretty neat!

Code snippet for loading the data. We assume that the “imgs/train/” directory has one sub-directory for each class: c0, c1, …, c9.
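A minimal sketch of that loading step with Keras’ ImageDataGenerator is below. The batch size of 40 is my assumption (it is consistent with the 449 steps per epoch in the training log further down), and the exact arguments in the original snippet may differ.

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

IMG_SIZE = (64, 64)   # downsample the 640x480 originals
BATCH_SIZE = 40       # assumed batch size

# Rescale pixel values to [0, 1] and hold out 20% of the data for validation.
datagen = ImageDataGenerator(rescale=1.0 / 255, validation_split=0.2)

train_it = datagen.flow_from_directory(
    "imgs/train/",
    target_size=IMG_SIZE,
    batch_size=BATCH_SIZE,
    class_mode="categorical",
    subset="training",
)
val_it = datagen.flow_from_directory(
    "imgs/train/",
    target_size=IMG_SIZE,
    batch_size=BATCH_SIZE,
    class_mode="categorical",
    subset="validation",
)
```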

Model Creation

We use a very simple architecture (literally the first thing that I tried) with 4 conv/max_pool layer combos, one fully connected layer, and a final softmax output. For good measure, we add two dropout layers towards the end. The following simple code snippet does it! Go Keras!!
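A sketch of what such a model can look like in Keras is below. The specific filter counts, dense layer width, and dropout rates are my assumptions, so the parameter count of this sketch will differ slightly from the figure quoted next.

```python
from tensorflow.keras import layers, models

# A small convnet in the spirit of the article: four conv/max-pool blocks,
# one fully connected layer, two dropout layers, and a 10-way softmax.
model = models.Sequential([
    layers.Input(shape=(64, 64, 3)),
    layers.Conv2D(32, (3, 3), activation="relu", padding="same"),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation="relu", padding="same"),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(128, (3, 3), activation="relu", padding="same"),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(256, (3, 3), activation="relu", padding="same"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dropout(0.5),
    layers.Dense(512, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(10, activation="softmax"),
])

model.compile(
    optimizer="adam",
    loss="categorical_crossentropy",
    metrics=["accuracy"],
)
model.summary()
```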

The last lines of the output from the above code will show that the total number of parameters in the model is slightly north of 2.5 million. And just like that, a small model is born!

Total params: 2,510,538
Trainable params: 2,510,538
Non-trainable params: 0

Model Training

Training the model is almost comically easy as well. The snippet below does it; to be fancy, we add early stopping. This runs for about 10 epochs and takes less than 20 minutes on my laptop (a MacBook Pro with 32 GB of RAM).
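A sketch of the training call with early stopping follows; the patience value and the saved file name are my choices, while the 20-epoch cap matches the “Epoch 9/20” line in the log below.

```python
from tensorflow.keras.callbacks import EarlyStopping

# Stop once the validation loss stops improving, and keep the best weights.
early_stop = EarlyStopping(
    monitor="val_loss",
    patience=2,
    restore_best_weights=True,
)

history = model.fit(
    train_it,
    validation_data=val_it,
    epochs=20,
    callbacks=[early_stop],
)

model.save("distracted_driver_model.h5")  # hypothetical file name
```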

The final lines of output here show that the validation accuracy is > 99%!!

Epoch 9/20
449/449 [==============================] - 143s 319ms/step - loss: 0.0452 - accuracy: 0.9871 - val_loss: 0.0402 - val_accuracy: 0.9915

Exploring the results

It is remarkable that the almost out-of-the-box model achieves an accuracy of 99.15% on this reasonably complex multiclass classification task. Below we show one example where the model works well, and one where it doesn’t. For the case where the model result is incorrect, it is understandable why the model thinks that the driver is adjusting their makeup.

Correctly classified (True label: c1, “texting — right”). Left: Original full res image, Right: Low res image used by the convnet.
Incorrectly classified (True label: c0, “safe driving”, predicted label: c9, “hair and makeup”) Left: Original full res image, Right: Low res image used by the convnet.
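To reproduce this kind of single-image check, one can load an image at the downsampled resolution and ask the model for its class probabilities. A rough sketch (the file path is hypothetical):

```python
import numpy as np
from tensorflow.keras.preprocessing import image

# Load one image at the 64x64 resolution the model expects and predict its class.
img = image.load_img("imgs/train/c1/img_100029.jpg",  # hypothetical path
                     target_size=(64, 64))
x = image.img_to_array(img) / 255.0   # same rescaling as during training
x = np.expand_dims(x, axis=0)         # add the batch dimension

probs = model.predict(x)[0]
class_names = sorted(train_it.class_indices, key=train_it.class_indices.get)
print(class_names[np.argmax(probs)], probs.max())
```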

The data is so well balanced, and the model accuracy so high, that it almost does not make sense to explore the confusion matrix, F1-score, and so on. Still, the confusion matrix can be useful for understanding the common errors (such as a hand near the head being tagged as “hair and makeup”).
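If you do want the confusion matrix, it takes only a few lines with scikit-learn. A sketch, assuming the data generator from above; the validation iterator is re-created without shuffling so predictions and labels stay aligned:

```python
import numpy as np
from sklearn.metrics import classification_report, confusion_matrix

# Re-create the validation iterator without shuffling so that predictions
# and true labels stay in the same order.
eval_it = datagen.flow_from_directory(
    "imgs/train/",
    target_size=(64, 64),
    batch_size=40,
    class_mode="categorical",
    subset="validation",
    shuffle=False,
)

probs = model.predict(eval_it)
y_pred = np.argmax(probs, axis=1)
y_true = eval_it.classes

print(confusion_matrix(y_true, y_pred))
print(classification_report(y_true, y_pred,
                            target_names=sorted(eval_it.class_indices)))
```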

A Word On Computational Performance

My admittedly crude benchmark (MacBook Pro, 32 GB RAM) gives the following average times for making predictions at various batch sizes. The sublinear scaling is clear: per-call overhead dominates at small batch sizes, while larger batches let Keras amortize that overhead over optimized batched computation. The saved model is only 203 KB in size.

The average time, in milliseconds, for making predictions on various batch sizes.
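One way to collect such numbers is to wrap model.predict in a simple timer. A minimal sketch on random inputs (the batch sizes are illustrative):

```python
import time

import numpy as np

# Time model.predict on random 64x64 inputs for a few batch sizes.
for batch_size in (1, 10, 100, 1000):
    batch = np.random.rand(batch_size, 64, 64, 3).astype("float32")
    model.predict(batch)                  # warm-up call
    start = time.perf_counter()
    model.predict(batch)
    elapsed_ms = (time.perf_counter() - start) * 1000
    print(f"batch={batch_size}: {elapsed_ms:.1f} ms total, "
          f"{elapsed_ms / batch_size:.2f} ms per image")
```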

The bottom line is that this model could almost be used for real-time applications running on a small device.

Conclusion

That’s it! I trained and tested a “small” convnet for distracted driver detection. Thanks to Google (TensorFlow) and Facebook (PyTorch) putting their weight behind open-source deep learning frameworks, it’s very simple to train models for fairly complex tasks in just a few hours!
