Journal of Liaoning Petrochemical University ›› 2026, Vol. 46 ›› Issue (2): 88-96. DOI: 10.12422/j.issn.1672-6952.2026.02.010

• Information and Control Engineering •

Self-Learning PID Control Based on DDPG: Optimization of UAV Obstacle Avoidance in 3D Environments

Xinyue GAO1, Ruiyuan ZOU2, Jinna LI1

  1. School of Information and Control Engineering, Liaoning Petrochemical University, Fushun, Liaoning 113001, China
  2. Aircraft Maintenance and Engineering Corporation, Beijing 100621, China
  • Received: 2025-11-25  Revised: 2026-01-01  Published: 2026-04-25  Online: 2026-04-21
  • Contact: Jinna LI

  • Author profile: Xinyue GAO (2000-), female, M.S. candidate, researching UAV flight control and reinforcement learning; E-mail: 13619252281@163.com
  • Supported by: National Natural Science Foundation of China (62073158); Basic Scientific Research Project of the Education Department of Liaoning Province (LJKZ0401)

Abstract:

Navigation and obstacle avoidance are critical to the successful completion of UAV missions. Traditional autonomous flight systems, however, face limitations in complex environments, prompting researchers to explore alternative frameworks such as deep reinforcement learning (DRL). This paper proposes a novel DRL-based autonomous control algorithm for UAVs that uses the Deep Deterministic Policy Gradient (DDPG) algorithm to self-learn an optimal Proportional-Integral-Derivative (PID) controller. The performance of the proposed algorithm is evaluated through simulations in the Gazebo 3D robotic simulator to validate its effectiveness under complex conditions. The results indicate that the proposed method outperforms numerous existing methods in dynamic environments, particularly in terms of improved stability, faster response, and higher success rates.
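The idea described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the DDPG actor's bounded action is mapped to PID gains, which then drive a discrete PID controller. All class and function names, gain bounds, and the toy plant are assumptions introduced for illustration only.

```python
# Illustrative sketch (NOT the paper's code): a DDPG action in [-1, 1]^3
# is mapped to bounded PID gains, and the resulting controller is stepped.

class PID:
    """Discrete PID controller with gains (kp, ki, kd) and step size dt."""
    def __init__(self, kp, ki, kd, dt=0.05):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = 0.0

    def step(self, error):
        # Accumulate the integral term and estimate the derivative.
        self.integral += error * self.dt
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


def gains_from_action(action, low=(0.0, 0.0, 0.0), high=(5.0, 1.0, 2.0)):
    """Map a tanh-bounded actor output in [-1, 1]^3 to (kp, ki, kd).
    The gain ranges here are arbitrary illustrative choices."""
    return tuple(lo + (a + 1.0) * 0.5 * (hi - lo)
                 for a, lo, hi in zip(action, low, high))


# Example: one actor action fixes the gains; the PID then tracks a setpoint
# on a toy first-order altitude plant (stand-in for the simulated UAV).
kp, ki, kd = gains_from_action((0.2, -0.8, -0.5))
pid = PID(kp, ki, kd)
altitude, setpoint = 0.0, 10.0
for _ in range(200):
    thrust = pid.step(setpoint - altitude)
    altitude += 0.02 * thrust  # toy plant dynamics
```

In a full DDPG loop, the tracking error and overshoot observed during such a rollout would form the reward signal used to update the actor and critic networks.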

Key words: Obstacle avoidance, Deep reinforcement learning, Self-learning PID control, Gazebo



Cite this article

Xinyue GAO, Ruiyuan ZOU, Jinna LI. Self-Learning PID Control Based on DDPG: Optimization of UAV Obstacle Avoidance in 3D Environments[J]. Journal of Liaoning Petrochemical University, 2026, 46(2): 88-96.
