Journal of Liaoning Petrochemical University ›› 2024, Vol. 44 ›› Issue (6): 89-96. DOI: 10.12422/j.issn.1672-6952.2024.06.012

• Information and Control Engineering •

Optimizing Bird's-Eye-View Object Detection from Multi-Camera Images via Channel Attention and Temporal Transformers

Weijie LI1, Jun QI2, Bin PAN3

  1. School of Artificial Intelligence and Software Engineering, Liaoning Petrochemical University, Fushun, Liaoning 113001, China
  2. School of Information and Control Engineering, Liaoning Petrochemical University, Fushun, Liaoning 113001, China
  3. Graduate School, Liaoning Petrochemical University, Fushun, Liaoning 113001, China
  • Received: 2024-03-11 Revised: 2024-04-20 Published: 2024-12-25 Online: 2024-12-24
  • Corresponding author: Jun QI
  • About the author: Weijie LI (b. 1997), male, master's student, working on computer vision; E-mail: 17760703375@163.com
  • Funding:
    National Natural Science Foundation of China (61602228); General Project of the Education Department of Liaoning Province (L2020018)

Abstract:

Camera-based perception and detection systems achieve object detection at relatively low cost and high resolution. Bird's-eye-view (BEV) features generated from six monocular cameras can be used for object detection; because they encode the position and scale of objects, they suit a range of autonomous driving tasks. BEV detectors are typically combined with deep pre-trained image backbones, but connecting the two directly does not bring out the correspondence between 2D and 3D features. To address this, channel attention is applied to the output feature map to reweight the proposal feature channels, and it is combined with a depth estimation module to emphasize the relationship between 2D and 3D features. In addition, a temporal stacking fusion scheme replaces inheritance-style fusion, in which past information is gradually lost, so that the model can make full use of historical information. Extensive experiments on the nuScenes dataset show that the model reaches a nuScenes Detection Score (NDS) of 0.604, which is 0.035 higher than BEVFormer, validating the effectiveness of the proposed approach.

Key words: Autonomous driving, Bird's-eye-view detection, Channel attention, Object detection, Attention mechanism, Spatiotemporal encoder
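
The channel reweighting mentioned in the abstract can be illustrated with a small sketch. The abstract does not specify the attention design, so this assumes a squeeze-and-excitation style block, which is one common form of channel attention and not necessarily the one used in the paper. The function name `channel_attention`, the weights `w1` and `w2`, and the reduction ratio `r` are hypothetical, and the weights are random stand-ins for learned parameters.

```python
import numpy as np


def sigmoid(x):
    """Elementwise logistic function."""
    return 1.0 / (1.0 + np.exp(-x))


def channel_attention(feat, w1, w2):
    """SE-style channel attention on a (C, H, W) feature map.

    Assumption: this is a generic squeeze-and-excitation block, used only
    to illustrate per-channel reweighting; it is not the paper's module.
    w1 has shape (C // r, C) and w2 has shape (C, C // r).
    """
    # Squeeze: global average pooling gives one descriptor per channel.
    squeezed = feat.mean(axis=(1, 2))            # shape (C,)
    # Excitation: bottleneck projection with ReLU, then sigmoid gating.
    hidden = np.maximum(w1 @ squeezed, 0.0)      # shape (C // r,)
    weights = sigmoid(w2 @ hidden)               # shape (C,), each in (0, 1)
    # Scale: multiply every channel by its attention weight.
    return feat * weights[:, None, None], weights


# Toy input: 8 channels on a 4x4 grid, reduction ratio r = 2 (illustrative).
rng = np.random.default_rng(0)
C, H, W, r = 8, 4, 4, 2
feat = rng.standard_normal((C, H, W))
w1 = rng.standard_normal((C // r, C)) * 0.1     # stand-in for learned weights
w2 = rng.standard_normal((C, C // r)) * 0.1     # stand-in for learned weights

out, weights = channel_attention(feat, w1, w2)
print(out.shape, weights.shape)
```

The output keeps the input shape; only the relative magnitude of each channel changes, which is how a block of this kind can emphasize the channels most relevant to a downstream module such as depth estimation.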

CLC number:

Cite this article

Weijie LI, Jun QI, Bin PAN. Optimizing Bird's-Eye-View Object Detection from Multi-Camera Images via Channel Attention and Temporal Transformers[J]. Journal of Liaoning Petrochemical University, 2024, 44(6): 89-96.


Link to this article: https://journal.lnpu.edu.cn/CN/10.12422/j.issn.1672-6952.2024.06.012

               https://journal.lnpu.edu.cn/CN/Y2024/V44/I6/89