1 |
XU Y,ZHAI C Y,WANG G L.Vehicle detection system based on adversarial learning and depth estimation[J].Journal of Liaoning Petrochemical University,2020,40(3):83⁃90.
2 |
LI Z Q,WANG W H,LI H Y,et al.BEVFormer:Learning bird's⁃eye⁃view representation from multi⁃camera images via spatiotemporal transformers[C]//Computer Vision⁃ECCV 2022.Cham:Springer,2022:1⁃18.
3 |
NG H M,RADIA K,CHEN J F,et al.BEV⁃Seg:Bird's eye view semantic segmentation using geometry and semantic point cloud[EB/OL].(2020⁃06⁃19)[2023⁃11⁃20].https://arxiv.org/abs/2006.11436.
4 |
PHILION J,FIDLER S.Lift,splat,shoot:Encoding images from arbitrary camera rigs by implicitly unprojecting to 3D[C]//Computer Vision⁃ECCV 2020.Cham:Springer,2020:194⁃210.
5 |
HU A,MUREZ Z,MOHAN N,et al.FIERY:Future instance prediction in bird's⁃eye view from surround monocular cameras[C]//2021 IEEE/CVF International Conference on Computer Vision (ICCV).Montreal:IEEE,2021:15273⁃15282.
6 |
BRAZIL G,PONS⁃MOLL G,LIU X M,et al.Kinematic 3D object detection in monocular video[C]//Computer Vision⁃ECCV 2020.Cham:Springer,2020:135⁃152.
7 |
MA X Z,OUYANG W L,SIMONELLI A,et al.3D object detection from images for autonomous driving:A survey[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence,2024,46(5):3537⁃3556.
8 |
LUO W J,YANG B,URTASUN R.Fast and furious:Real time end⁃to⁃end 3D detection, tracking and motion forecasting with a single convolutional net[C]//2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.Salt Lake City:IEEE,2018:3569⁃3577.
9 |
QI C R,ZHOU Y,NAJIBI M,et al.Offboard 3D object detection from point cloud sequences[C]//2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).Nashville:IEEE,2021:6130⁃6140.
10 |
KANG K,OUYANG W L,LI H S,et al.Object detection from video tubelets with convolutional neural networks[C]//2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).Las Vegas:IEEE,2016:817⁃825.
11 |
YANG C Y,CHEN Y T,TIAN H,et al.BEVFormer v2:Adapting modern image backbones to bird's⁃eye⁃view recognition via perspective supervision[C]//2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Vancouver:IEEE,2023:17830⁃17839.
12 |
HU J,SHEN L,SUN G.Squeeze⁃and⁃excitation networks[C]//2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.Salt Lake City:IEEE,2018:7132⁃7141.
13 |
RODDICK T,KENDALL A,CIPOLLA R.Orthographic feature transform for monocular 3D object detection[C]//30th British Machine Vision Conference 2019.Cardiff:BMVA Press,2019:285.
14 |
WANG Y,CHAO W L,GARG D,et al.Pseudo⁃LiDAR from visual depth estimation:Bridging the gap in 3D object detection for autonomous driving[C]//2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).Long Beach:IEEE,2019:8437⁃8445.
15 |
PAN B W,SUN J K,LEUNG H Y T,et al.Cross⁃view semantic segmentation for sensing surroundings[J].IEEE Robotics and Automation Letters,2020,5(3):4867⁃4873.
16 |
ZHOU B,KRÄHENBÜHL P.Cross⁃view transformers for real⁃time map⁃view semantic segmentation[C]//2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).New Orleans:IEEE,2022:13760⁃13769.
17 |
LIU Y F,WANG T C,ZHANG X Y,et al.PETR:Position embedding transformation for multi⁃view 3D object detection[C]//Computer Vision⁃ECCV 2022.Cham:Springer,2022:531⁃548.
18 |
LI Y H,BAO H,GE Z,et al.BEVStereo:Enhancing depth estimation in multi⁃view 3D object detection with dynamic temporal stereo[J].Proceedings of the AAAI Conference on Artificial Intelligence,2023,37(2):1486⁃1494.
19 |
WANG Z R,MIN C,GE Z,et al.STS:Surround⁃view temporal stereo for multi⁃view 3D detection[EB/OL].(2022⁃08⁃22)[2023⁃12⁃22].https://arxiv.org/abs/2208.10145.
20 |
HUANG J J,HUANG G,ZHU Z,et al.BEVDet:High⁃performance multi⁃camera 3D object detection in bird⁃eye⁃view[EB/OL].(2021⁃12⁃22)[2023⁃12⁃22].https://arxiv.org/abs/2112.11790.
21 |
JIANG Y Q,ZHANG L,MIAO Z W,et al.PolarFormer:Multicamera 3D object detection with polar transformers[EB/OL].(2022⁃06⁃30)[2023⁃12⁃24].https://arxiv.org/abs/2206.15398.
22 |
VASWANI A,SHAZEER N,PARMAR N,et al.Attention is all you need[C]//Proceedings of the 31st International Conference on Neural Information Processing Systems.Red Hook:Curran Associates Inc.,2017:6000⁃6010.
23 |
MIECH A,LAPTEV I,SIVIC J.Learnable pooling with context gating for video classification[EB/OL].(2017⁃06⁃21)[2023⁃12⁃25].https://arxiv.org/abs/1706.06905.
24 |
CAO C S,LIU X M,YANG Y,et al.Look and think twice:Capturing top⁃down visual attention with feedback convolutional neural networks[C]//2015 IEEE International Conference on Computer Vision (ICCV).Santiago:IEEE,2015:2956⁃2964.
25 |
WANG F,JIANG M Q,QIAN C,et al.Residual attention network for image classification[C]//2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).Honolulu:IEEE,2017:6450⁃6458.
26 |
NEWELL A,YANG K Y,DENG J.Stacked hourglass networks for human pose estimation[C]//Computer Vision⁃ECCV 2016.Cham:Springer,2016:483⁃499.
27 |
WOO S,PARK J,LEE J Y,et al.CBAM:Convolutional block attention module[C]//Computer Vision⁃ECCV 2018. Cham:Springer,2018:3⁃19.
28 |
ZHU X Z,SU W J,LU L W,et al.Deformable DETR:Deformable transformers for end⁃to⁃end object detection[C]//International Conference on Learning Representations 2021.Vienna:ICLR,2021:1⁃16.
29 |
CAESAR H,BANKITI V,LANG A H,et al.nuScenes:A multimodal dataset for autonomous driving[C]//2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).Seattle:IEEE,2020:11618⁃11628.
30 |
HE K M,ZHANG X Y,REN S Q,et al.Deep residual learning for image recognition[C]//2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).Las Vegas:IEEE,2016:770⁃778.
31 |
LEE Y W,HWANG J W,LEE S,et al.An energy and GPU⁃computation efficient backbone network for real⁃time object detection[C]//2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).Long Beach:IEEE,2019:752⁃760.
32 |
LIN T S,MAIRE M,BELONGIE S,et al.Microsoft COCO:Common objects in context[C]//Computer Vision⁃ECCV 2014.Cham:Springer,2014:740⁃755.
33 |
LOSHCHILOV I,HUTTER F.Decoupled weight decay regularization[C]//International Conference on Learning Representations 2019.New Orleans:ICLR,2019:1⁃8.
34 |
WANG T,ZHU X G,PANG J M,et al.FCOS3D: Fully convolutional one⁃stage monocular 3D object detection[C]//2021 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW).Montreal:IEEE,2021:913⁃922.
35 |
WANG T,ZHU X G,PANG J M,et al.Probabilistic and geometric depth:Detecting objects in perspective[C]//Proceedings of the 5th Conference on Robot Learning.New York:PMLR,2022:1475⁃1485.
36 |
WANG Y,GUIZILINI V C,ZHANG T Y,et al.DETR3D:3D object detection from multi⁃view images via 3D⁃to⁃2D queries[C]//Proceedings of the 5th Conference on Robot Learning.New York:PMLR,2022:180⁃191.