Visual object tracking with onboard cameras has greatly advanced the automation of unmanned aerial vehicles (UAVs). However, the random and complex real noise produced by these cameras seriously degrades the performance of state-of-the-art (SOTA) UAV trackers, especially in low-illumination environments. To address this issue, this work proposes an efficient plug-and-play cascaded denoising Transformer (CDT) that suppresses cluttered and complex real noise, thereby boosting UAV tracking performance. Specifically, a novel U-shaped cascaded denoising network is designed with a streamlined structure for efficient computation. In addition, a shallow feature deepening (SFD) encoder and a multi-feature collaboration (MFC) decoder are built on multi-head transposed self-attention (MTSA) and multi-head transposed cross-attention (MTCA), respectively. A nested residual feed-forward network (NRFN) is developed to focus more on the high-frequency information that noise represents. Extensive evaluations and tests demonstrate that the proposed CDT has a remarkable denoising effect and improves UAV nighttime tracking performance.
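
To make the term "transposed" attention concrete, the following is a minimal NumPy sketch of multi-head self-attention computed across channels rather than across pixels, which is the usual meaning of the term. It is not the MTSA module of this work: the function name `transposed_self_attention`, the dense matrices `w_q`, `w_k`, `w_v` (hypothetical stand-ins for learned 1x1 projections), the L2 normalisation, and the `temperature` scale are all illustrative assumptions. The sketch only shows why the attention map has size (C/heads) x (C/heads) instead of (HW) x (HW), which is what keeps the cost linear in the number of pixels.

```python
import numpy as np


def softmax(z, axis=-1):
    """Numerically stable softmax along the given axis."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def transposed_self_attention(x, w_q, w_k, w_v, num_heads, temperature=1.0):
    """Channel-wise ("transposed") multi-head self-attention (illustrative only).

    x             : feature map of shape (C, H, W)
    w_q, w_k, w_v : (C, C) matrices, assumed stand-ins for 1x1 convolutions
    Returns the attended feature map (C, H, W) and the attention maps
    of shape (num_heads, C // num_heads, C // num_heads).
    """
    c, h, w = x.shape
    assert c % num_heads == 0, "channels must divide evenly across heads"
    d = c // num_heads

    # Flatten the spatial dimensions: each channel becomes a vector of HW values.
    tokens = x.reshape(c, h * w)

    # Project, then split the channels into heads: (heads, d, HW).
    q = (w_q @ tokens).reshape(num_heads, d, h * w)
    k = (w_k @ tokens).reshape(num_heads, d, h * w)
    v = (w_v @ tokens).reshape(num_heads, d, h * w)

    # Assumption: L2-normalise each channel vector so the dot products are bounded.
    q = q / (np.linalg.norm(q, axis=-1, keepdims=True) + 1e-8)
    k = k / (np.linalg.norm(k, axis=-1, keepdims=True) + 1e-8)

    # Attention between channels: (heads, d, d), independent of H and W.
    attn = softmax((q @ k.transpose(0, 2, 1)) * temperature, axis=-1)

    # Each output channel is a weighted mix of the value channels.
    out = (attn @ v).reshape(c, h, w)
    return out, attn


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feat = rng.standard_normal((8, 5, 6))            # C=8, H=5, W=6
    weights = [rng.standard_normal((8, 8)) for _ in range(3)]
    y, a = transposed_self_attention(feat, *weights, num_heads=2)
    print(y.shape, a.shape)
```

Under the same assumptions, a cross-attention variant would take the queries from one feature map and the keys and values from another. How MTCA actually combines features in the MFC decoder is not specified here.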