A Focus on the Deep Learning-based Intelligent Video Surveillance System

  • Weigang Zhang, Xi'an Mingde Institute of Technology, Xi'an, Shaanxi, 710124, China
  • Youzi Li, Xi'an Mingde Institute of Technology, Xi'an, Shaanxi, 710124, China

Abstract

This paper examines the application of a deep learning-based intelligent video surveillance system, with particular emphasis on the YOLOv7 model for object detection. Reviewing the development of intelligent video surveillance technology, we highlight the importance of deep learning in computer vision. The structure and characteristics of the YOLOv7 model are detailed, including its input layer, backbone network, feature fusion layer, and output layer. To validate the model's performance, we conducted experiments on the VOC dataset, on which the YOLOv7 model achieved an average detection accuracy of 0.89. These results demonstrate the efficiency, accuracy, and robustness of YOLOv7 in intelligent video surveillance, providing a useful reference for optimizing and deploying intelligent video surveillance systems.
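The average detection accuracy reported above is conventionally derived from per-class average precision, which in turn depends on intersection-over-union (IoU) matching between predicted and ground-truth boxes. As an illustrative sketch only (this code is not from the paper), the IoU of two axis-aligned boxes in `(x1, y1, x2, y2)` form can be computed as:

```python
def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    # Coordinates of the intersection rectangle.
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    # Clamp to zero when the boxes do not overlap.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# Two partially overlapping 2x2 boxes: intersection 1, union 7.
print(iou((0, 0, 2, 2), (1, 1, 3, 3)))
```

In the classic VOC evaluation protocol, a prediction counts as a true positive when its IoU with an unmatched ground-truth box of the same class exceeds 0.5; precision-recall curves built from these matches yield the per-class average precision that is then averaged into the reported score.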

Published
2024-12-26
How to Cite
ZHANG, Weigang; LI, Youzi. A Focus on the Deep Learning-based Intelligent Video Surveillance System. Journal of Digital Information Management (JDIM), [S.l.], v. 22, n. 4, dec. 2024. ISSN 0972-7272. Available at: <https://dline.info/ojs/index.php/jdim/article/view/387>. Date accessed: 21 apr. 2026.