Software Engineering

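Cite this article:

SU Gang, YE Baolin, YAO Qing, CHEN Bin, ZHANG Yijia. Traffic Signal Coordination Control Based on Improved Multi-Agent Nash Q Learning[J]. Software Engineering, 2024, 27(10): 43-49.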


Traffic Signal Coordination Control Based on Improved Multi-Agent Nash Q Learning

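SU Gang¹,², YE Baolin², YAO Qing¹, CHEN Bin², ZHANG Yijia¹

(1. School of Information Science and Engineering, Zhejiang Sci-Tech University, Hangzhou, Zhejiang 310018, China;
2. Jiaxing Key Laboratory of Smart Transportations, Jiaxing University, Jiaxing, Zhejiang 314001, China)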
1018478742@qq.com; yebaolin@zjxu.edu.cn; q-yao@zstu.edu.cn; chenbin@zjxu.edu.cn; waiting@zstu.edu.cn

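Abstract: To optimize regional traffic signal timing plans and improve regional traffic efficiency, this paper proposes a regional traffic signal coordination control method based on improved multi-agent Nash Q Learning. First, a discretized encoding method is adopted that divides the state space into cells, converting continuous state information into discrete form. Second, a Long Short Term Memory (LSTM) module is integrated into the algorithm to mine more hidden information from the state data and enrich the state data in the Q-value table. Finally, simulation tests in the microscopic traffic simulation software SUMO (Simulation of Urban Mobility) show that, compared with the original Nash Q Learning traffic signal control method, the proposed method reduces the average vehicle waiting time by 11.5%, 16.2%, and 10.0% under low, medium, and high traffic flows, respectively; the average queue length by 9.1%, 8.2%, and 7.6%; and the average number of stops by 18.3%, 16.1%, and 10.0%. These results indicate that the algorithm achieves better control performance.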
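For reference, the original Nash Q Learning baseline that the abstract compares against is commonly stated with the update rule below. This is the standard general-sum form, not the paper's improved variant; the symbols $\alpha$ (learning rate), $\gamma$ (discount factor), $r^i$ (reward of agent $i$), and $\pi^j$ (equilibrium strategy of agent $j$ in the stage game at the next state $s'$) are notation assumed here and do not appear in the abstract.

```latex
% Standard Nash Q Learning update for agent i among n agents (assumed notation).
\begin{align}
Q^i(s, a^1, \dots, a^n) &\leftarrow (1 - \alpha)\, Q^i(s, a^1, \dots, a^n)
  + \alpha \left[ r^i + \gamma\, \mathrm{NashQ}^i(s') \right], \\
\mathrm{NashQ}^i(s') &= \sum_{a^1, \dots, a^n}
  \left( \prod_{j=1}^{n} \pi^j(a^j \mid s') \right) Q^i(s', a^1, \dots, a^n).
\end{align}
```

Each agent therefore keeps a Q-value over joint actions and bootstraps from its expected payoff under a Nash equilibrium of the next-state stage game, rather than from its own maximum Q-value.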

Keywords: regional traffic signal coordination control; Markov decision; multi-agent Nash Q Learning; LSTM; SUMO

CLC number: TP181    Document code: A

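Funding: National Natural Science Foundation of China (61603154); Zhejiang Provincial Natural Science Foundation of China (LTGS23F030002); Jiaxing Applied Basic Research Project (2023AY11034); Open Research Project of the State Key Laboratory of Industrial Control Technology (ICT2022B52)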

Traffic Signal Coordination Control Based on Improved Multi-Agent Nash Q Learning

SU Gang¹,², YE Baolin², YAO Qing¹, CHEN Bin², ZHANG Yijia¹

(1.School of Information Science and Engineering, Zhejiang Sci-Tech University, Hangzhou 310018, China;
2.Jiaxing Key Laboratory of Smart Transportations, Jiaxing University, Jiaxing 314001, China)
1018478742@qq.com; yebaolin@zjxu.edu.cn; q-yao@zstu.edu.cn; chenbin@zjxu.edu.cn; waiting@zstu.edu.cn

Abstract: To optimize the coordinated timing scheme of regional traffic signals and improve regional traffic efficiency, this paper proposes a regional traffic signal coordination control method based on improved multi-agent Nash Q Learning. First, a discretized encoding method is employed that divides the state space into cells, converting continuous state information into discrete form. Second, a Long Short Term Memory (LSTM) module is incorporated into the algorithm to mine more hidden information from the state data and enrich the state data in the Q-value table. Finally, simulation tests based on the microscopic traffic simulation software SUMO (Simulation of Urban Mobility) show that, compared with the original Nash Q Learning traffic signal control method, the proposed method reduces the average vehicle waiting time by 11.5%, 16.2%, and 10.0% under low, medium, and high traffic flows, respectively. It also decreases the average queue length by 9.1%, 8.2%, and 7.6%, and the average number of stops by 18.3%, 16.1%, and 10.0%. The results demonstrate that the algorithm achieves better control performance.
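The cell-based discretization mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the encoding details, so everything here is an assumption: the function names `encode_lane` and `state_key`, the use of vehicle positions (in meters from the start of the lane) as the continuous input, the binary occupancy value per cell, and the lane and cell lengths in the usage lines are all hypothetical.

```python
import math
from typing import Iterable, List, Sequence, Tuple


def encode_lane(positions_m: Sequence[float],
                lane_length_m: float,
                cell_length_m: float) -> List[int]:
    """Map continuous vehicle positions on one lane to a binary occupancy vector.

    Hypothetical encoding: cell k covers [k * cell_length_m, (k + 1) * cell_length_m)
    and is set to 1 if at least one vehicle lies inside it.
    """
    if lane_length_m <= 0 or cell_length_m <= 0:
        raise ValueError("lane and cell lengths must be positive")
    n_cells = math.ceil(lane_length_m / cell_length_m)
    cells = [0] * n_cells
    for pos in positions_m:
        # Ignore vehicles outside the observed stretch of the lane.
        if 0 <= pos < lane_length_m:
            cells[int(pos // cell_length_m)] = 1
    return cells


def state_key(lanes: Iterable[Sequence[int]]) -> Tuple[int, ...]:
    """Flatten per-lane occupancy vectors into a hashable key for a tabular Q-table."""
    return tuple(bit for lane in lanes for bit in lane)


# Usage with made-up numbers: two 50 m lanes split into 10 m cells.
north = encode_lane([3.0, 12.5, 13.0, 49.9], lane_length_m=50.0, cell_length_m=10.0)
east = encode_lane([60.0], lane_length_m=50.0, cell_length_m=10.0)
key = state_key([north, east])
```

Because the key is a tuple of integers, it can index a dictionary-backed Q-table directly; a finer cell length gives a more precise state at the cost of a larger table.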

Keywords: regional traffic signal coordination control; Markov decision; multi-agent Nash Q Learning; LSTM; SUMO
