Journal of Liaoning Petrochemical University ›› 2026, Vol. 46 ›› Issue (2): 88-96. DOI: 10.12422/j.issn.1672-6952.2026.02.010

• Information and Control Engineering •

Self-Learning PID Control Based on DDPG: Optimization of UAV Obstacle Avoidance in 3D Environments

Xinyue GAO1, Ruiyuan ZOU2, Jinna LI1

  1. School of Information and Control Engineering, Liaoning Petrochemical University, Fushun, Liaoning 113001, China
  2. Aircraft Maintenance and Engineering Corporation, Beijing 100621, China
  • Received: 2025-11-25  Revised: 2026-01-01  Published: 2026-04-25  Online: 2026-04-21
  • Contact: Jinna LI

  • Author profile: Xinyue GAO (2000-), female, M.S. candidate, researching UAV flight control and reinforcement learning; E-mail: 13619252281@163.com
  • Supported by: National Natural Science Foundation of China (62073158); Basic Scientific Research Project of the Education Department of Liaoning Province (LJKZ0401)

Abstract:

Navigation and obstacle avoidance are critical to the successful completion of UAV missions. Traditional autonomous flight systems, however, face limitations in complex environments, prompting researchers to explore alternative frameworks such as deep reinforcement learning (DRL). This paper proposes a novel DRL-based autonomous control algorithm for UAVs that uses the Deep Deterministic Policy Gradient (DDPG) algorithm to self-learn an optimal Proportional-Integral-Derivative (PID) controller. The performance of the proposed algorithm is evaluated through simulations in the Gazebo 3D robotic simulator to validate its effectiveness under complex conditions. The results indicate that the proposed method outperforms numerous existing methods in dynamic environments, particularly in terms of improved stability, faster response, and higher success rates.
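The idea described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the DDPG actor's bounded action is mapped to PID gains, which then drive a discrete PID controller. All class and function names, gain bounds, and the toy plant are assumptions introduced for illustration only.

```python
# Illustrative sketch (NOT the paper's code): a DDPG action in [-1, 1]^3
# is mapped to bounded PID gains, and the resulting controller is stepped.

class PID:
    """Discrete PID controller with gains (kp, ki, kd) and step size dt."""
    def __init__(self, kp, ki, kd, dt=0.05):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = 0.0

    def step(self, error):
        # Accumulate the integral term and estimate the derivative.
        self.integral += error * self.dt
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


def gains_from_action(action, low=(0.0, 0.0, 0.0), high=(5.0, 1.0, 2.0)):
    """Map a tanh-bounded actor output in [-1, 1]^3 to (kp, ki, kd).
    The gain ranges here are arbitrary illustrative choices."""
    return tuple(lo + (a + 1.0) * 0.5 * (hi - lo)
                 for a, lo, hi in zip(action, low, high))


# Example: one actor action fixes the gains; the PID then tracks a setpoint
# on a toy first-order altitude plant (stand-in for the simulated UAV).
kp, ki, kd = gains_from_action((0.2, -0.8, -0.5))
pid = PID(kp, ki, kd)
altitude, setpoint = 0.0, 10.0
for _ in range(200):
    thrust = pid.step(setpoint - altitude)
    altitude += 0.02 * thrust  # toy plant dynamics
```

In a full DDPG loop, the tracking error and overshoot observed during such a rollout would form the reward signal used to update the actor and critic networks.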

Key words: Obstacle avoidance, Deep reinforcement learning, Self-learning PID control, Gazebo



Cite this article

Xinyue GAO, Ruiyuan ZOU, Jinna LI. Self-Learning PID Control Based on DDPG: Optimization of UAV Obstacle Avoidance in 3D Environments[J]. Journal of Liaoning Petrochemical University, 2026, 46(2): 88-96.
