
Journal of Liaoning Petrochemical University ›› 2022, Vol. 42 ›› Issue (4): 59-67. DOI: 10.3969/j.issn.1672-6952.2022.04.011

• Special Column on Intelligent Control of Complex Systems •

Optimal Consensus of Heterogeneous Multi-Agent Systems Based on Q-Learning

Weiran Cheng, Jinna Li

  1. School of Information and Control Engineering, Liaoning Petrochemical University, Fushun 113001, Liaoning, China
  • Received: 2022-06-08  Revised: 2022-07-25  Published: 2022-08-25  Online: 2022-09-26
  • Corresponding author: Jinna Li
  • About the authors: Weiran Cheng (1996-), female, master's student, researching optimal control, reinforcement learning, and data-driven methods; E-mail: WeiRanCheng@163.com
    Jinna Li, doctoral supervisor, IEEE Senior Member, Innovative Talent of Liaoning Provincial Higher Education Institutions, and Distinguished Young Scholar of Liaoning Provincial Higher Education Institutions. She has led two General Program projects and one Young Scientists Fund project of the National Natural Science Foundation of China, as well as more than ten provincial- and ministerial-level projects. She has published over 80 academic papers, including 15 in IEEE Transactions, one ESI highly cited paper, and 9 in journals ranked in Zone I of the Chinese Academy of Sciences classification, such as IEEE Transactions on Neural Networks and Learning Systems, IEEE Transactions on Cybernetics, and IEEE Transactions on Industrial Electronics, and has published one monograph with Science Press. Her awards include a Third Prize of the Liaoning Provincial Natural Science Award, a Second Prize of the Technological Invention Award of the China Instrument and Control Society, an Industry-University-Research Collaboration Innovation Achievement Award of the China Industry-University-Research Collaboration Innovation and Promotion Awards, and other provincial- and ministerial-level awards (one second prize and two third prizes).
  • Funding: National Natural Science Foundation of China (62073158); Liaoning Provincial Key Field Joint Open Fund (2019-KF-03-06); Basic Scientific Research Project of the Education Department of Liaoning Province (LJKZ0401); Research Fund of Liaoning Petrochemical University (2018XJJ-005)

Abstract:

This paper proposes a model-free control protocol design method based on off-policy reinforcement learning for the optimal consensus problem of heterogeneous discrete-time multi-agent systems with a leader. Because the state matrices of the agents differ, the dynamics of the local neighborhood errors are complicated; compared with existing observer-based distributed control schemes for multi-agent systems, the proposed approach of solving for the global neighborhood error dynamics reduces computational complexity. First, the global neighborhood error dynamics of the multi-agent system are constructed from augmented variables. Second, the coupled Bellman equations and Hamilton-Jacobi-Bellman (HJB) equations are derived from a value function of quadratic form. Third, the Nash equilibrium solution for multi-agent optimal consensus is obtained by solving the coupled HJB equations, and a proof of Nash equilibrium is given. Fourth, a model-free off-policy Q-learning algorithm is proposed to learn this Nash equilibrium solution. The algorithm is then implemented with a critic neural network structure trained by gradient descent. Finally, a simulation example verifies the effectiveness of the proposed algorithm.
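For context, the following is a minimal sketch, in LaTeX, of the quadratic value function and coupled Bellman equation underlying the approach summarized above, together with the Q-learning critic update. The notation (local neighborhood error \delta_i, control u_i, weights Q_{ii} \succeq 0 and R_{ii} \succ 0, critic matrix H_i, learning rate \alpha) is generic and illustrative, not necessarily the paper's exact formulation.

% Quadratic value function of agent i over the local neighborhood error:
V_i\big(\delta_i(k)\big) = \sum_{t=k}^{\infty} \Big( \delta_i^{\top}(t)\, Q_{ii}\, \delta_i(t) + u_i^{\top}(t)\, R_{ii}\, u_i(t) \Big)

% Equivalent one-step (coupled) Bellman equation:
V_i\big(\delta_i(k)\big) = \delta_i^{\top}(k)\, Q_{ii}\, \delta_i(k) + u_i^{\top}(k)\, R_{ii}\, u_i(k) + V_i\big(\delta_i(k+1)\big)

% Q-learning instead learns a state-action value function; with a quadratic
% critic \hat{Q}_i(z) = z^{\top} H_i\, z over the stacked vector
% z = [\delta_i^{\top}\;\; u_i^{\top}]^{\top}, the Bellman residual is
e_i(k) = \delta_i^{\top}(k)\, Q_{ii}\, \delta_i(k) + u_i^{\top}(k)\, R_{ii}\, u_i(k) + \hat{Q}_i\big(z(k+1)\big) - \hat{Q}_i\big(z(k)\big)

% and the gradient-descent critic update on \tfrac{1}{2} e_i^{2}(k) is
H_i \leftarrow H_i + \alpha\, e_i(k)\, z(k)\, z^{\top}(k)

In this standard formulation, once H_i is learned, the control protocol follows by minimizing \hat{Q}_i over u_i, which for a quadratic critic is a linear feedback in \delta_i computed from the blocks of H_i; no model of the agent dynamics is required, consistent with the model-free claim of the abstract.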

Key words: Multi-agent system, Neural network, Reinforcement learning, Optimal consensus


Cite this article

Weiran Cheng, Jinna Li. Optimal Consensus of Heterogeneous Multi-Agent Systems Based on Q-Learning[J]. Journal of Liaoning Petrochemical University, 2022, 42(4): 59-67.


Link to this article: https://journal.lnpu.edu.cn/CN/10.3969/j.issn.1672-6952.2022.04.011
                      https://journal.lnpu.edu.cn/CN/Y2022/V42/I4/59