Efficient image segmentation via asynchronous dual-stream self-supervised neural radiance fields
王政钦,侯明
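Abstract:
Current image segmentation methods driven by 3D reconstruction are typically constrained by high computational complexity and a reliance on manual annotation. To address this, the paper proposes a segmentation framework that combines a fast-slow dual-stream vision transformer with a self-supervised neural radiance field (NeRF) model. An asynchronous dual-stream feature encoding mechanism jointly models the image representation at two levels: high-frequency local detail and low-frequency global semantics. A contrastive distillation strategy constrains the features of the two streams to be consistent in semantic space, which improves the model's multi-scale representation capability, feature discriminability, and training stability. Experiments on public datasets show that, compared with mainstream self-supervised and NeRF-family algorithms, the proposed method achieves significant improvements in both segmentation accuracy and 3D reconstruction quality. The results confirm the synergy between the fast-slow transformer and self-supervised transfer learning, balance segmentation generalization against computational efficiency, and offer a new paradigm for efficient image segmentation in complex scenes.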
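The abstract does not give the form of the contrastive distillation loss, so the following is only an illustrative sketch of one common choice: an InfoNCE-style consistency term between the two streams. The function name `contrastive_distillation_loss`, the `temperature` value, and the toy feature vectors are assumptions made for illustration and are not taken from the paper.

```python
import math
import random


def normalize(vector):
    """Scale a vector to unit L2 norm (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vector)) or 1.0
    return [x / norm for x in vector]


def contrastive_distillation_loss(fast_feats, slow_feats, temperature=0.1):
    """InfoNCE-style consistency loss between two feature streams.

    Hypothetical sketch: row i of fast_feats and row i of slow_feats are
    assumed to describe the same image location (the positive pair); all
    other rows of slow_feats act as negatives.
    """
    fast = [normalize(v) for v in fast_feats]
    slow = [normalize(v) for v in slow_feats]
    total = 0.0
    for i, f in enumerate(fast):
        # Cosine similarity of this fast feature to every slow feature.
        logits = [sum(a * b for a, b in zip(f, s)) / temperature for s in slow]
        # Numerically stable log-sum-exp over all candidates.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(z - m) for z in logits))
        # Negative log-probability assigned to the matching slow feature.
        total += log_denominator - logits[i]
    return total / len(fast)


# Toy data: 8 locations with 16-dimensional features (assumed sizes).
rng = random.Random(0)
slow = [[rng.gauss(0, 1) for _ in range(16)] for _ in range(8)]
aligned_fast = [[x + 0.01 * rng.gauss(0, 1) for x in row] for row in slow]
random_fast = [[rng.gauss(0, 1) for _ in range(16)] for _ in range(8)]

loss_aligned = contrastive_distillation_loss(aligned_fast, slow)
loss_random = contrastive_distillation_loss(random_fast, slow)
print(loss_aligned < loss_random)
```

In this sketch the loss is small when the two streams agree on each location and grows when they do not, which is the behavior a consistency constraint in semantic space would need. A real training setup would compute it on learned features inside an automatic-differentiation framework.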
Keywords: neural radiance field (NeRF); asynchronous dual-stream feature encoding; contrastive distillation strategy
Foundation: General Science and Technology Project of the Beijing Municipal Education Commission Scientific Research Program (KM202411232007)
DOI: 10.16508/j.cnki.11-5866/n.2026.02.003