
Journal of Liaoning Petrochemical University ›› 2022, Vol. 42 ›› Issue (4): 59-67. DOI: 10.3969/j.issn.1672-6952.2022.04.011

• Special Column on Intelligent Control of Complex Systems •

Optimal Consensus of Heterogeneous Multi-Agent Systems Based on Q-Learning

Weiran Cheng, Jinna Li

  1. School of Information and Control Engineering, Liaoning Petrochemical University, Fushun 113001, Liaoning, China
  • Received: 2022-06-08  Revised: 2022-07-25  Published: 2022-08-25  Online: 2022-09-26
  • Corresponding author: Jinna Li
  • About the authors: Weiran Cheng (1996-), female, master's student, researching optimal control, reinforcement learning, and data-driven methods; E-mail: WeiRanCheng@163.com
    Jinna Li, doctoral supervisor, IEEE Senior Member, Innovative Talent of Liaoning Provincial Higher Education Institutions, and Distinguished Young Scholar of Liaoning Provincial Higher Education Institutions. She has led two General Program projects and one Young Scientists Fund project of the National Natural Science Foundation of China, as well as more than ten provincial- and ministerial-level projects. She has published over 80 academic papers, including 15 in IEEE Transactions, one ESI highly cited paper, and 9 in journals ranked in Zone I of the Chinese Academy of Sciences classification, such as IEEE Transactions on Neural Networks and Learning Systems, IEEE Transactions on Cybernetics, and IEEE Transactions on Industrial Electronics, and has published one monograph with Science Press. Her awards include a Third Prize of the Liaoning Provincial Natural Science Award, a Second Prize of the Technological Invention Award of the China Instrument and Control Society, an Industry-University-Research Collaboration Innovation Achievement Award of the China Industry-University-Research Collaboration Innovation and Promotion Awards, and other provincial- and ministerial-level awards (one second prize and two third prizes).
  • Funding: National Natural Science Foundation of China (62073158); Liaoning Provincial Key Field Joint Open Fund (2019-KF-03-06); Basic Scientific Research Project of the Education Department of Liaoning Province (LJKZ0401); Research Fund of Liaoning Petrochemical University (2018XJJ-005)

Abstract:

This paper proposes a model-free control protocol design method based on off-policy reinforcement learning for the optimal consensus problem of heterogeneous discrete-time multi-agent systems with a leader. Because the state matrices of the agents differ, the dynamics of the local neighborhood errors are complicated; compared with existing observer-based distributed control schemes for multi-agent systems, the proposed approach of solving for the global neighborhood error dynamics reduces computational complexity. First, the global neighborhood error dynamics of the multi-agent system are constructed from augmented variables. Second, the coupled Bellman equations and Hamilton-Jacobi-Bellman (HJB) equations are derived from a value function of quadratic form. Third, the Nash equilibrium solution for multi-agent optimal consensus is obtained by solving the coupled HJB equations, and a proof of Nash equilibrium is given. Fourth, a model-free off-policy Q-learning algorithm is proposed to learn this Nash equilibrium solution. The algorithm is then implemented with a critic neural network structure trained by gradient descent. Finally, a simulation example verifies the effectiveness of the proposed algorithm.
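For context, the following is a minimal sketch, in LaTeX, of the quadratic value function and coupled Bellman equation underlying the approach summarized above, together with the Q-learning critic update. The notation (local neighborhood error \delta_i, control u_i, weights Q_{ii} \succeq 0 and R_{ii} \succ 0, critic matrix H_i, learning rate \alpha) is generic and illustrative, not necessarily the paper's exact formulation.

% Quadratic value function of agent i over the local neighborhood error:
V_i\big(\delta_i(k)\big) = \sum_{t=k}^{\infty} \Big( \delta_i^{\top}(t)\, Q_{ii}\, \delta_i(t) + u_i^{\top}(t)\, R_{ii}\, u_i(t) \Big)

% Equivalent one-step (coupled) Bellman equation:
V_i\big(\delta_i(k)\big) = \delta_i^{\top}(k)\, Q_{ii}\, \delta_i(k) + u_i^{\top}(k)\, R_{ii}\, u_i(k) + V_i\big(\delta_i(k+1)\big)

% Q-learning instead learns a state-action value function; with a quadratic
% critic \hat{Q}_i(z) = z^{\top} H_i\, z over the stacked vector
% z = [\delta_i^{\top}\;\; u_i^{\top}]^{\top}, the Bellman residual is
e_i(k) = \delta_i^{\top}(k)\, Q_{ii}\, \delta_i(k) + u_i^{\top}(k)\, R_{ii}\, u_i(k) + \hat{Q}_i\big(z(k+1)\big) - \hat{Q}_i\big(z(k)\big)

% and the gradient-descent critic update on \tfrac{1}{2} e_i^{2}(k) is
H_i \leftarrow H_i + \alpha\, e_i(k)\, z(k)\, z^{\top}(k)

In this standard formulation, once H_i is learned, the control protocol follows by minimizing \hat{Q}_i over u_i, which for a quadratic critic is a linear feedback in \delta_i computed from the blocks of H_i; no model of the agent dynamics is required, consistent with the model-free claim of the abstract.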

Key words: Multi-agent system, Neural network, Reinforcement learning, Optimal consensus


Cite this article

Weiran Cheng, Jinna Li. Optimal Consensus of Heterogeneous Multi-Agent Systems Based on Q-Learning[J]. Journal of Liaoning Petrochemical University, 2022, 42(4): 59-67.


Link to this article: https://journal.lnpu.edu.cn/CN/10.3969/j.issn.1672-6952.2022.04.011
                      https://journal.lnpu.edu.cn/CN/Y2022/V42/I4/59