Using a deep learning approach, I trained a neural network to process image data. The goal was to identify whether an image depicted a specific product (a chocolate marshmallow), where in the image the product was located, and whether the chocolate coating was damaged. The network had to work under multiple constraints:
Initially, this seemed like a pretty hard task. ML-based video processing is very compute-intensive and usually requires dedicated hardware such as a GPU or at least a server-class CPU. However, the Coral USB Accelerator promised to solve this problem. Its Edge TPU, an ASIC for lightweight ML applications, can run a specific set of TensorFlow operations very fast. So in theory, a standard TensorFlow machine learning pipeline could be used to train an object detection model and then run it on the TPU.
Unfortunately, the TPU does not run regular TensorFlow models; it only runs models crafted specifically for its platform. Regular models can be converted using a process called quantization, which converts all network operations to 8-bit operands. As a result, converted models are much less accurate.
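To make the conversion step concrete, here is a minimal sketch of full-integer post-training quantization with the TensorFlow Lite converter. The SavedModel path, input shape, and representative dataset below are placeholders rather than the project's actual values; the quantized model additionally has to be compiled for the Edge TPU with Google's `edgetpu_compiler` tool.

```python
import numpy as np
import tensorflow as tf

def representative_dataset():
    # Yield a few hundred sample inputs so the converter can calibrate
    # the 8-bit value ranges of each tensor. Placeholder data here;
    # real, preprocessed camera frames should be used instead.
    for _ in range(100):
        yield [np.random.rand(1, 224, 224, 3).astype(np.float32)]

converter = tf.lite.TFLiteConverter.from_saved_model("saved_model")
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
# Restrict the graph to int8 ops so the result can be compiled for the Edge TPU.
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.uint8
converter.inference_output_type = tf.uint8

tflite_model = converter.convert()
with open("model_quant.tflite", "wb") as f:
    f.write(tflite_model)

# Afterwards, compile for the TPU on the command line:
#   edgetpu_compiler model_quant.tflite
```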
After gathering lots of training data, I re-trained a MobileNetV2-based model to recognize the product and potential damage, and to reject non-product images of similar shape or colour.
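The re-training itself follows the usual transfer-learning recipe. The sketch below uses tf.keras with a hypothetical three-class head (intact product, damaged product, background); the actual model also localizes the product, which requires a detection head (e.g. SSD) on the same MobileNetV2 backbone, so treat this purely as an illustration of the idea.

```python
import tensorflow as tf

# Pre-trained MobileNetV2 backbone, frozen so only the new head is trained.
base = tf.keras.applications.MobileNetV2(
    input_shape=(224, 224, 3), include_top=False, weights="imagenet")
base.trainable = False

model = tf.keras.Sequential([
    tf.keras.Input(shape=(224, 224, 3)),
    tf.keras.layers.Rescaling(1.0 / 127.5, offset=-1),  # MobileNetV2 expects [-1, 1]
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(3, activation="softmax"),  # intact / damaged / background
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Hypothetical directory layout: one sub-folder per class.
train_ds = tf.keras.utils.image_dataset_from_directory(
    "data/train", image_size=(224, 224), batch_size=32)
model.fit(train_ds, epochs=10)

# Export as a SavedModel for the quantization step above
# (older TF versions use model.save("saved_model") instead).
model.export("saved_model")
```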
I then assembled a real-time inference pipeline in which a camera acquires a video stream. The stream is fed through the Coral device running the neural network, which outputs inference observations. Those observations are then used to annotate the frames and re-assemble them into a video stream for presentation.
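A minimal version of such a pipeline could look like the following sketch, assuming OpenCV for capture and annotation and the pycoral library for running the compiled detection model. The model filename, camera index, and score threshold are placeholders for the actual setup.

```python
import cv2
from pycoral.adapters import common, detect
from pycoral.utils.edgetpu import make_interpreter

# Hypothetical model file compiled with edgetpu_compiler.
interpreter = make_interpreter("model_quant_edgetpu.tflite")
interpreter.allocate_tensors()

cap = cv2.VideoCapture(0)  # camera index depends on the machine
while True:
    ok, frame = cap.read()
    if not ok:
        break
    # The model was trained on RGB images; OpenCV delivers BGR frames.
    rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    # Resize the frame into the model's input tensor and run it on the Edge TPU.
    _, scale = common.set_resized_input(
        interpreter, (rgb.shape[1], rgb.shape[0]),
        lambda size: cv2.resize(rgb, size))
    interpreter.invoke()
    # Draw each detection back onto the original frame.
    for obj in detect.get_objects(interpreter, score_threshold=0.5, image_scale=scale):
        box = obj.bbox
        cv2.rectangle(frame, (int(box.xmin), int(box.ymin)),
                      (int(box.xmax), int(box.ymax)), (0, 255, 0), 2)
        cv2.putText(frame, f"{obj.id}: {obj.score:.2f}",
                    (int(box.xmin), int(box.ymin) - 5),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 1)
    cv2.imshow("inference", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break

cap.release()
cv2.destroyAllWindows()
```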
The project was a success, albeit with the following limitations: