REAL-TIME EMBEDDED SYSTEM OF MULTI-TASK CNN FOR ADVANCED DRIVING ASSISTANCE

Authors

  • Masayuki Miyama

DOI:

https://doi.org/10.29284/ijasis.9.2.2023.29-45

Keywords:

Object detection, semantic segmentation, disparity estimation, multi-task CNN, advanced driving assistance, embedded GPU, multi-object tracking.

Abstract

In this research, we engineered a real-time embedded system for advanced driving assistance. Our approach employs a multi-task Convolutional Neural Network (CNN) that simultaneously executes three tasks: object detection, semantic segmentation, and disparity estimation. To cope with the resource constraints of edge computing, the tasks share a common encoder and decoder. To further improve computational efficiency, we replaced the conventional transposed convolution in the decoder with a combination of depth-wise separable convolution and bilinear interpolation. This change reduced the multiply-accumulate operations to 23.3% and the convolution parameters to 16.7% of the original. Our experimental results show that this decoder simplification not only preserves recognition accuracy but actually improves it. Furthermore, we adopted a semi-supervised learning approach to raise network accuracy when the target deployment domain differs from the source domain used during training. Specifically, manually annotated ground truth is used only for object detection to train the whole network in the target domain. For the foreground object categories, we generate pseudo ground truth for semantic segmentation from the object-detection bounding boxes and refine it iteratively. For the background categories, we use the initial inference results as pseudo ground truth without further adjustment. Because this method conveys the rough position, size, and shape of each object to the segmentation task, it enables semantic segmentation of object classes with widely different appearances. Our experimental results confirm that this semi-supervised learning technique improves both object detection and semantic segmentation accuracy.
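The decoder simplification above can be illustrated with a back-of-the-envelope parameter count. The sketch below compares a standard transposed convolution against a depth-wise separable convolution followed by parameter-free bilinear upsampling. The channel counts and kernel size are hypothetical placeholders, not the paper's actual decoder configuration, so the resulting ratio only illustrates the trend rather than reproducing the reported 23.3% / 16.7% figures.

```python
# Back-of-the-envelope cost model (hypothetical layer sizes, not the paper's).

def transposed_conv_params(c_in: int, c_out: int, k: int) -> int:
    """Weights of a k x k transposed convolution (bias ignored)."""
    return k * k * c_in * c_out

def dws_conv_params(c_in: int, c_out: int, k: int) -> int:
    """Depth-wise k x k filter per input channel plus 1x1 point-wise mixing.
    The bilinear upsampling that follows it adds no parameters."""
    return k * k * c_in + c_in * c_out

# Hypothetical decoder stage: 64 -> 64 channels, 3x3 kernels.
c_in, c_out, k = 64, 64, 3
p_tc = transposed_conv_params(c_in, c_out, k)   # 9 * 64 * 64 = 36864
p_dw = dws_conv_params(c_in, c_out, k)          # 9 * 64 + 64 * 64 = 4672

print(f"transposed conv params:       {p_tc}")
print(f"dws conv + bilinear params:   {p_dw}")
print(f"parameter ratio:              {p_dw / p_tc:.1%}")
```

Since both layers apply their weights at every output pixel, the multiply-accumulate count scales by the same per-layer factor times the output resolution, which is why the MAC and parameter reductions move together.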
We implemented this multi-task CNN on an embedded Graphics Processing Unit (GPU) board, added multi-object tracking functionality, and achieved a throughput of 18 fps at a power consumption of 26 W.
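The pseudo-label scheme described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the class ids, image size, and box format are assumptions, and the iterative refinement step is reduced to a single box-filling pass. Pixels inside detection boxes receive the detected foreground class; all remaining pixels keep the network's initial segmentation result.

```python
# Minimal pseudo-ground-truth sketch (assumed class ids and box format).
# initial_seg: H x W grid of class ids from the network's first inference.
# boxes: list of (x0, y0, x1, y1, class_id) tuples from the object detector.

def make_pseudo_labels(initial_seg, boxes):
    """Foreground: paint each detection box with its detected class,
    giving the segmentation task a rough position/size/shape cue.
    Background: keep the initial inference result untouched."""
    pseudo = [row[:] for row in initial_seg]      # copy the initial result
    for x0, y0, x1, y1, cls in boxes:
        for y in range(y0, y1):
            for x in range(x0, x1):
                pseudo[y][x] = cls
    return pseudo

# Toy 4x6 scene: background class 0 everywhere, one "car" (class 1) box.
seg = [[0] * 6 for _ in range(4)]
labels = make_pseudo_labels(seg, [(1, 1, 4, 3, 1)])
for row in labels:
    print(row)
```

In the paper's setting the box-filled foreground labels are then refined iteratively using the network's own predictions; this sketch stops at the initial box-filling step.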

References

K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” Proceedings of the International Conference on Learning Representations, 2015, pp. 1-14.

W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C.Y. Fu and A.C. Berg, “SSD: Single shot multibox detector,” 14th European Conference on Computer Vision, 2016, pp. 21-37.

O. Ronneberger, P. Fischer and T. Brox, “U-net: Convolutional networks for biomedical image segmentation,” 18th International Conference on Medical Image Computing and Computer-Assisted Intervention, 2015, pp. 234-241.

A. Dosovitskiy, P. Fischer, E. Ilg, P. Hausser, C. Hazirbas and V. Golkov, “FlowNet: Learning optical flow with convolutional networks,” IEEE International Conference on Computer Vision, 2015, pp. 2758-2766.

A.G. Howard, M. Zhu, B. Chen, D. Kalenichenko, W. Wang, T. Weyand, M. Andreetto and H. Adam, “MobileNets: Efficient convolutional neural networks for mobile vision applications,” arXiv:1704.04861 [cs.CV], 2017, pp. 1-9.

A. Kendall, Y. Gal and R. Cipolla, “Multi-task learning using uncertainty to weigh losses for scene geometry and semantics,” IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 7482-7491.

T.H. Vu, H. Jain, M. Bucher, M. Cord and P. Perez, “ADVENT: Adversarial entropy minimization for domain adaptation in semantic segmentation,” IEEE Conference on Computer Vision and Pattern Recognition, 2019, pp. 2512-2521.

U. Michieli, M. Biasetton, G. Agresti and P. Zanuttigh, “Adversarial learning and self-teaching techniques for domain adaptation in semantic segmentation,” IEEE Transactions on Intelligent Vehicles, vol. 5, no. 3, 2020, pp. 508-518.

S. Sankaranarayanan, Y. Balaji, A. Jain, S.N. Lim and R. Chellappa, “Learning from synthetic data: Addressing domain shift for semantic segmentation,” IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 3752-3761.

G. Li, W. Kang, Y. Liu, Y. Wei and Y. Yang, “Content-consistent matching for domain adaptive semantic segmentation,” 16th European Conference on Computer Vision, 2020, pp. 440-456.

P. Zhang, B. Zhang, T. Zhang, D. Chen, Y. Wang and F. Wen, “Prototypical pseudo label denoising and target structure learning for domain adaptive semantic segmentation,” IEEE Conference on Computer Vision and Pattern Recognition, 2021, pp. 12409-12419.

M. Miyama, “Robust inference of multi-task convolutional neural network for advanced driving assistance by embedding coordinates,” 8th World Congress on Electrical Engineering and Computer Systems and Sciences, 2022, pp. 1-9.

A. Gordon, H. Li, R. Jonschkowski and A. Angelova, “Depth from videos in the wild: Unsupervised monocular depth learning from unknown cameras,” IEEE International Conference on Computer Vision, 2019, pp. 8976-8985.

B. Ummenhofer and H. Zhou, “DeMoN: Depth and motion network for learning monocular stereo,” IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 5622-5631.

C. Godard, O. M. Aodha and G. J. Brostow, “Unsupervised monocular depth estimation with left-right consistency,” IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 6602-6611.

S. Ruder, “An overview of multi-task learning in deep neural networks,” arXiv:1706.05098 [cs.LG], 2017, pp. 1-14.

M. Crawshaw, “Multi-task learning with deep neural networks: a survey,” arXiv:2009.09796 [cs.LG], 2020, pp. 1-43.

Y. Zhang and Q. Yang, “An overview of multi-task learning,” National Science Review, vol. 5, no. 1, 2018, pp. 30-43.

L. Liebel and M. Korner, “Auxiliary tasks in multi-task learning,” arXiv:1805.06334 [cs.CV], 2018, pp. 1-8.

S. Chennupati, G. Sistu, S. Yogamani and S. Rawashdeh, “AuxNet: Auxiliary tasks enhanced semantic segmentation for automated driving,” 14th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications, 2019, pp. 645-652.

H. Liu, D. Li, J.Z. Peng, Q. Zhao, L. Tian and Y. Shan, “MTNAS: Search multi-task networks for autonomous driving,” 15th Asian Conference on Computer Vision, 2020, pp. 670-687.

P. Guo, C. Y. Lee and D. Ulbricht, “Learning to branch for multi-task learning,” 37th International Conference on Machine Learning, vol. 198, 2020, pp. 1-12.

M. Reginthala, Y. Iwahori, M.K. Bhuyan, Y. Hayashi, W. Achariyaviriya and B. Kijsirikul, “Interdependent multi-task learning for simultaneous segmentation and detection,” 9th International Conference on Pattern Recognition Applications and Methods, 2020, pp. 1-9.

W.H. Li, X. Liu and H. Bilen, “Learning multiple dense prediction tasks from partially annotated data,” IEEE Conference on Computer Vision and Pattern Recognition, 2022, pp. 18879-18889.

Y. Wang, Y. H. Tsai, W. C. Hung, W. Ding, S. Liu and M. H. Yang, “Semi-supervised multi-task learning for semantics and depth,” IEEE Winter Conference on Applications of Computer Vision, 2022, pp. 2505-2514.

F. Lu, H. Yu and J. Oh, “Domain adaptive monocular depth estimation with semantic information,” arXiv:2104.05764v1 [cs.CV], 2021, pp. 1-7.

M. Cordts, M. Omran, S. Ramos, T. Rehfeld, M. Enzweiler, R. Benenson, U. Franke, S. Roth and B. Schiele, “The cityscapes dataset for semantic urban scene understanding,” IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 3213-3223.

https://www.cityscapes-dataset.com/

F. Yu, H. Chen, X. Wang, W. Xian, Y. Chen, F. Liu, V. Madhavan and T. Darrell, “BDD100K: A diverse driving dataset for heterogeneous multitask learning,” arXiv:1805.04687 [cs.CV], 2020, pp. 1-10.

https://www.bdd100k.com/

https://github.com/Xilinx/Vitis-AI/tree/master/models/AI-Model-Zoo, Performance on ZCU104, 103 multi_task_v3.

Published

2023-12-30

How to Cite

Masayuki Miyama. (2023). REAL-TIME EMBEDDED SYSTEM OF MULTI-TASK CNN FOR ADVANCED DRIVING ASSISTANCE. INTERNATIONAL JOURNAL OF ADVANCES IN SIGNAL AND IMAGE SCIENCES, 9(2), 29–45. https://doi.org/10.29284/ijasis.9.2.2023.29-45

Issue

Section

Articles