Using a deep learning approach, I trained a neural network to process image data. The goal was to identify whether an image depicted a specific product (a chocolate marshmallow), where in the image the product was located, and whether the chocolate coating was damaged. The network had to work under multiple constraints:
Initially, this seemed like a pretty hard task. ML-based video processing is very compute-intensive and usually requires dedicated hardware such as a GPU or at least a server-class CPU. However, the Coral USB Accelerator promised to solve this problem. Its Edge TPU, an ASIC for lightweight ML applications, can run a specific set of TensorFlow operations very fast. So in theory, a standard TensorFlow machine learning pipeline could be used to train an object detection model and then run it on the TPU.
Unfortunately, the TPU does not run regular TensorFlow models; it only runs models crafted specifically for its platform. Regular models can be converted using a process called quantization, which converts all network operations to 8-bit operands. As a result, converted models are much less accurate.
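To make the conversion step concrete, here is a minimal sketch of full-integer post-training quantization with the TensorFlow Lite converter. The SavedModel path, input shape, and representative dataset below are placeholders rather than the project's actual values; the quantized model additionally has to be compiled for the Edge TPU with Google's `edgetpu_compiler` tool.

```python
import numpy as np
import tensorflow as tf

def representative_dataset():
    # Yield a few hundred sample inputs so the converter can calibrate
    # the 8-bit value ranges of each tensor. Placeholder data here;
    # real, preprocessed camera frames should be used instead.
    for _ in range(100):
        yield [np.random.rand(1, 224, 224, 3).astype(np.float32)]

converter = tf.lite.TFLiteConverter.from_saved_model("saved_model")
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
# Restrict the graph to int8 ops so the result can be compiled for the Edge TPU.
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.uint8
converter.inference_output_type = tf.uint8

tflite_model = converter.convert()
with open("model_quant.tflite", "wb") as f:
    f.write(tflite_model)

# Afterwards, compile for the TPU on the command line:
#   edgetpu_compiler model_quant.tflite
```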
After gathering lots of training data, I re-trained a MobileNetV2-based model to recognize the product and potential damage, and to reject non-product images of similar shape or colour.
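The re-training itself follows the usual transfer-learning recipe. The sketch below uses tf.keras with a hypothetical three-class head (intact product, damaged product, background); the actual model also localizes the product, which requires a detection head (e.g. SSD) on the same MobileNetV2 backbone, so treat this purely as an illustration of the idea.

```python
import tensorflow as tf

# Pre-trained MobileNetV2 backbone, frozen so only the new head is trained.
base = tf.keras.applications.MobileNetV2(
    input_shape=(224, 224, 3), include_top=False, weights="imagenet")
base.trainable = False

model = tf.keras.Sequential([
    tf.keras.Input(shape=(224, 224, 3)),
    tf.keras.layers.Rescaling(1.0 / 127.5, offset=-1),  # MobileNetV2 expects [-1, 1]
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(3, activation="softmax"),  # intact / damaged / background
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Hypothetical directory layout: one sub-folder per class.
train_ds = tf.keras.utils.image_dataset_from_directory(
    "data/train", image_size=(224, 224), batch_size=32)
model.fit(train_ds, epochs=10)

# Export as a SavedModel for the quantization step above
# (older TF versions use model.save("saved_model") instead).
model.export("saved_model")
```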
I then assembled a real-time inference pipeline in which a camera acquires a video stream. The stream is fed through the Coral device running the neural network, which outputs inference observations. Those observations are then used to annotate the frames and re-assemble them into a video stream for presentation.
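A minimal version of such a pipeline could look like the following sketch, assuming OpenCV for capture and annotation and the pycoral library for running the compiled detection model. The model filename, camera index, and score threshold are placeholders for the actual setup.

```python
import cv2
from pycoral.adapters import common, detect
from pycoral.utils.edgetpu import make_interpreter

# Hypothetical model file compiled with edgetpu_compiler.
interpreter = make_interpreter("model_quant_edgetpu.tflite")
interpreter.allocate_tensors()

cap = cv2.VideoCapture(0)  # camera index depends on the machine
while True:
    ok, frame = cap.read()
    if not ok:
        break
    # The model was trained on RGB images; OpenCV delivers BGR frames.
    rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    # Resize the frame into the model's input tensor and run it on the Edge TPU.
    _, scale = common.set_resized_input(
        interpreter, (rgb.shape[1], rgb.shape[0]),
        lambda size: cv2.resize(rgb, size))
    interpreter.invoke()
    # Draw each detection back onto the original frame.
    for obj in detect.get_objects(interpreter, score_threshold=0.5, image_scale=scale):
        box = obj.bbox
        cv2.rectangle(frame, (int(box.xmin), int(box.ymin)),
                      (int(box.xmax), int(box.ymax)), (0, 255, 0), 2)
        cv2.putText(frame, f"{obj.id}: {obj.score:.2f}",
                    (int(box.xmin), int(box.ymin) - 5),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 1)
    cv2.imshow("inference", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break

cap.release()
cv2.destroyAllWindows()
```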
The project was a success, albeit with the following limitations: