LiVeDet: Lightweight Density-Guided Adaptive Transformer for Online On-Device Vessel Detection

Abstract

Vision-based online vessel detection boosts the automation of waterway monitoring, transportation management, and navigation safety. However, a significant gap in on-device deployment exists between general high-performance PCs/servers and embedded AI processors. Existing state-of-the-art (SOTA) online vessel detectors lack sufficient accuracy and are prone to high latency on the edge AI camera, especially in scenarios with dense vessels and diverse distributions. To address these issues, a novel lightweight framework with a density-guided adaptive Transformer (LiVeDet) is proposed for the edge AI camera to achieve online on-device vessel detection. Specifically, a new instance-aware representation extractor is designed to suppress cluttered background noise and capture instance-aware content information. Additionally, an innovative vessel distribution estimator is developed to guide feature representation learning by focusing on local regions with varying vessel density. Moreover, a novel dynamic region embedding is presented to integrate hierarchical features of multi-scale vessels. A new benchmark comprising 100 high-definition, high-framerate video sequences from vessel-intensive scenarios is established to evaluate the efficacy of vessel detectors under the challenging conditions prevalent in dynamic waterways. Extensive evaluations on this challenging benchmark demonstrate the robustness and efficiency of LiVeDet, which achieves 32.9 FPS on the edge AI camera. Furthermore, real-world applications confirm the practicality of the proposed method.

Publication
IEEE Robotics and Automation Letters, 2025 (JCR Q2, IF = 4.6)

Overview of our LiVeDet.

Zijie Zhang
PhD in Mechanical Engineering