In this paper, we develop an online learning-based visual tracking framework that optimizes the target model and estimates scale variation for object tracking. We propose a recommender-based tracker that autonomously selects representative convolutional neural network (CNN) layers and feature maps. A sub-network is extracted from the pre-trained CNN to streamline the computation of convolutional features. In addition, the proposed recommender computes the weights of the selected layers and feature maps. A discriminative target percept for each recommended layer is reconstructed as the weighted sum of its recommended feature maps. The target model of the correlation filter is then updated with the weighted sum of the target percepts. To handle scale changes, we propose a spatio-temporal min-channel method that estimates the variation of the target size over time. Experimental results on 50 benchmark datasets and on video data from a rescue drone demonstrate that the proposed tracker is competitive with state-of-the-art CNN-based trackers in terms of accuracy, scale adaptation, and robustness for UAV-related applications.
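The two weighted sums described above can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `weighted_sum` and `build_target_model`, the toy 2x2 maps, and the weight values are all hypothetical, and the sketch assumes that the recommender has already produced the weights and that all percepts share one spatial size.

```python
from typing import List, Sequence, Tuple

Map2D = List[List[float]]


def weighted_sum(maps: Sequence[Map2D], weights: Sequence[float]) -> Map2D:
    """Element-wise weighted sum of equally sized 2-D maps."""
    if not maps or len(maps) != len(weights):
        raise ValueError("need at least one map and exactly one weight per map")
    rows, cols = len(maps[0]), len(maps[0][0])
    out = [[0.0] * cols for _ in range(rows)]
    for fmap, w in zip(maps, weights):
        for r in range(rows):
            for c in range(cols):
                out[r][c] += w * fmap[r][c]
    return out


def build_target_model(
    layers: Sequence[Tuple[Sequence[Map2D], Sequence[float]]],
    layer_weights: Sequence[float],
) -> Map2D:
    """Combine recommended feature maps into percepts, then percepts into a model.

    Each entry of `layers` is (feature_maps, map_weights) for one recommended
    layer. Assumption: the weights come from the recommender, which is not
    modelled here.
    """
    # Step 1: one target percept per recommended layer.
    percepts = [weighted_sum(fmaps, w) for fmaps, w in layers]
    # Step 2: the target model is the weighted sum of the percepts.
    return weighted_sum(percepts, layer_weights)


# Toy data: two recommended layers with hypothetical maps and weights.
layer_a = ([[[1.0, 0.0], [0.0, 1.0]], [[0.0, 2.0], [2.0, 0.0]]], [0.5, 0.5])
layer_b = ([[[4.0, 4.0], [4.0, 4.0]]], [1.0])
target_model = build_target_model([layer_a, layer_b], [0.5, 0.5])
print(target_model)  # [[2.25, 2.5], [2.5, 2.25]]
```

In a real tracker the maps would be CNN activations cropped around the target, and the resulting model would be passed to the correlation filter update; both steps are outside the scope of this sketch.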
Overview of the SiamAPN tracker. It is composed of four subnetworks: a feature extraction network, a feature fusion network, an anchor proposal network (APN), and a multi-classification&regression network.