Abstract: Single-photon 3D imaging based on Single-Photon Avalanche Diodes (SPADs) has developed rapidly, yet recovering depth information under strong noise remains challenging, particularly because the synchronous triggering mode of the devices further amplifies noise interference. This paper constructs a photon detection probability response model that incorporates error functions. The model can characterize complex imaging environments and thereby enables the creation of large-scale single-photon datasets with strong noise. We propose a robust approach designed specifically for single-photon 3D imaging, the Spatial-Temporal Enhancement Network (STE-Net). Its core innovation is the Spatial and Temporal Information Boosting Strategy (STIBS), which uses 3D convolutional kernels of diverse geometric configurations to exploit the potential of three-dimensional convolutional feature learning more fully. Building upon STIBS, we design an efficient feature enhancement module that serves as a universal preprocessing component. We further construct a lightweight feature fusion backbone network, inspired by STIBS and incorporating large-kernel convolution concepts, that integrates both shallow and deep features. Extensive experiments on both simulated and real-world datasets demonstrate that STE-Net performs strongly across scenarios with different Signal-to-Background Ratios (SBRs). Quantitatively, under conditions of 0.02 mean signal photons and 0.5 mean noise photons, STE-Net improves PSNR by 0.55 dB and reduces RMSE by 7.2% compared with other state-of-the-art methods.
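
As a rough illustration of the noise regime quoted in the abstract, the following Python sketch builds the expected photon-count histogram for a single pixel. It is not the paper's response model: the Gaussian pulse shape, the uniform background, the Poisson detection assumption, and all names and parameter values (`expected_counts`, `detection_probability`, `depth_bin`, `pulse_sigma`, 1024 time bins) are assumptions made purely for illustration. The error function appears here only because integrating a Gaussian pulse over a time bin yields a difference of `erf` terms.

```python
import math


def expected_counts(num_bins, depth_bin, pulse_sigma, signal_photons, noise_photons):
    """Expected photon count per time bin for one pixel (hypothetical model).

    Assumes a Gaussian return pulse centred at `depth_bin` (in bin units)
    and background photons spread uniformly over all bins.
    """
    def gauss_cdf(t):
        # Cumulative distribution of the assumed Gaussian pulse, via erf.
        z = (t - depth_bin) / (pulse_sigma * math.sqrt(2.0))
        return 0.5 * (1.0 + math.erf(z))

    background = noise_photons / num_bins
    rates = []
    for b in range(num_bins):
        # Pulse energy falling inside bin [b, b + 1).
        signal = signal_photons * (gauss_cdf(b + 1) - gauss_cdf(b))
        rates.append(signal + background)
    return rates


def detection_probability(rate):
    """Probability of at least one detection for a Poisson mean `rate`."""
    return 1.0 - math.exp(-rate)


signal_photons = 0.02  # mean signal photons, as quoted in the abstract
noise_photons = 0.5    # mean noise photons, as quoted in the abstract

rates = expected_counts(
    num_bins=1024,      # assumed histogram length
    depth_bin=300.5,    # assumed true depth, in bin units
    pulse_sigma=2.0,    # assumed pulse width, in bin units
    signal_photons=signal_photons,
    noise_photons=noise_photons,
)

sbr = signal_photons / noise_photons
peak_bin = rates.index(max(rates))
print(f"SBR = {sbr:.2f}, peak bin = {peak_bin}")
print(f"P(detect in peak bin) = {detection_probability(rates[peak_bin]):.5f}")
```

Under these assumptions the SBR is 0.02 / 0.5 = 0.04, so the total background expectation is 25 times the signal expectation, which is the kind of strong-noise setting the abstract refers to.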