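Cite this article: LI Lijun, ZHANG Haiqing, LI Daiwei, XIANG Xiaoming, YU Xi. Improved ReliefF Feature Selection Algorithm Based on Analysis of Redundancy[J]. Software Engineering, 2023, 26(11): 48-51.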
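Improved ReliefF Feature Selection Algorithm Based on Analysis of Redundancy
LI Lijun1,4, ZHANG Haiqing1,4, LI Daiwei1,4, XIANG Xiaoming2, YU Xi3
(1. School of Software Engineering, Chengdu University of Information Technology, Chengdu 610225, China;
2. Sichuan Meteorological Observation and Data Centre, Chengdu 610072, China;
3. Stirling College, Chengdu University, Chengdu 610106, China;
4. Sichuan Province Engineering Technology Research Center of Support Software of Informatization Application, Chengdu 610255, China)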
2432094015@qq.com; zhanghq@cuit.edu.cn; ldwcuit@cuit.edu.cn; micxiang@foxmail.com; yuxi@cdu.edu.cn
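Abstract: The ReliefF algorithm has two weaknesses: its random sampling can draw unrepresentative samples, and it does not consider correlations between features. To address both, this paper proposes a ReliefF feature selection algorithm based on redundancy analysis. First, the sampling strategy of ReliefF is improved. Then the feature weight sequence is divided into several subsets; the maximal information coefficient (MIC) and the Pearson correlation coefficient are used jointly to measure correlation between features, and corresponding sampling ratios are set to remove redundant features. The improved algorithm was compared with other feature selection algorithms. Relative to the traditional ReliefF, it raises classification accuracy by 0.63% to 12.10% with LightGBM (Light Gradient Boosting Machine) and by 0.92% to 9.06% with SVM (Support Vector Machine), and its accuracy is clearly higher than that of the other feature selection algorithms compared. While taking the correlation between features and labels into account, it effectively removes redundant information.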
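For reference, the baseline that the paper modifies can be sketched as follows. This is a minimal sketch of the classic ReliefF weight update for numeric features, not the authors' code. The abstract does not describe the improved sampling strategy, so the sketch keeps the original uniform random sampling. The function name `relieff` and its parameters `m`, `k` and `seed` are illustrative choices. A feature gains weight when it differs between an instance and its nearest misses (nearest neighbours from other classes) and loses weight when it differs between the instance and its nearest hits (same class).

```python
import random
from collections import Counter


def relieff(X, y, m=None, k=3, seed=0):
    """Classic ReliefF feature weights for numeric features (illustrative sketch).

    X: list of rows (lists of floats); y: list of class labels.
    m: number of sampled instances (None = use every instance).
    k: number of nearest hits / nearest misses per class.
    """
    n, d = len(X), len(X[0])
    # Per-feature value range, used to scale diff() into [0, 1].
    span = []
    for a in range(d):
        col = [row[a] for row in X]
        span.append((max(col) - min(col)) or 1.0)

    def diff(a, i, j):
        return abs(X[i][a] - X[j][a]) / span[a]

    def dist(i, j):
        # Manhattan distance on range-normalised features.
        return sum(diff(a, i, j) for a in range(d))

    prior = {c: cnt / n for c, cnt in Counter(y).items()}
    m = n if m is None else m
    # Original ReliefF: instances are drawn uniformly at random.
    # The paper replaces this step; its rule is not given in the abstract.
    samples = random.Random(seed).sample(range(n), m)

    w = [0.0] * d
    for i in samples:
        for c in prior:
            # k nearest neighbours of instance i inside class c (excluding i).
            pool = [j for j in range(n) if y[j] == c and j != i]
            nearest = sorted(pool, key=lambda j: dist(i, j))[:k]
            if not nearest:
                continue
            if c == y[i]:
                scale = -1.0  # near hits: differences lower the weight
            else:
                # near misses: differences raise the weight, scaled by class prior
                scale = prior[c] / (1.0 - prior[y[i]])
            for j in nearest:
                for a in range(d):
                    # average over the neighbours actually found and over m samples
                    w[a] += scale * diff(a, i, j) / (m * len(nearest))
    return w
```

With `X = [[0.0, 0.3], [0.1, 0.9], [0.9, 0.2], [1.0, 0.8]]`, `y = [0, 0, 1, 1]` and `k=1`, feature 0 (which separates the two classes) receives weight 0.8 and feature 1 (noise) about -0.71. Sorting features by these weights gives the "feature weight sequence" that the abstract then splits into subsets.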
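Keywords: feature selection; ReliefF algorithm; maximal information coefficient; redundancy analysis
CLC number: TP181    Document code: A
Funding: European Union project (598649-EPP-1-2018-1-FR-EPPKA2-CBHE-JP); National Natural Science Foundation of China (61602064); Science and Technology Department of Sichuan Province projects (2021YFH0107, 2022YFS0544, 2022NSFSC0571); Chengdu University of Information Technology Science and Technology Innovation Capability Improvement Program project "Optimization of Disease Risk Assessment and Prediction for Large-Scale Medical Data" (KYQN202223)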
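The redundancy-analysis step in the abstract can also be sketched, under heavier assumptions. The abstract states only that the weight-ranked feature sequence is split into several subsets, that MIC and the Pearson coefficient jointly measure feature-to-feature correlation, and that per-subset sampling ratios are used to drop redundant features. Everything else below is hypothetical: the function `select_non_redundant`, the equal-size split, the rule that a feature is redundant when max(|Pearson r|, score) reaches `threshold` against any feature already kept, and the `keep_ratios` quotas. In addition, `binned_nmi` is a swapped-in simplification, not MIC: it computes mutual information on a single equal-width grid and divides by log(bins), whereas MIC maximises a similarly normalised score over many grid sizes.

```python
import math
from collections import Counter


def pearson(u, v):
    """Pearson correlation coefficient; 0.0 if either column is constant."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return 0.0 if su == 0 or sv == 0 else cov / (su * sv)


def binned_nmi(u, v, bins=4):
    """Mutual information on ONE equal-width bins x bins grid, divided by
    log(bins) so it lies in [0, 1]. A simplified stand-in for MIC, which
    instead maximises a normalised score over many grid sizes."""
    def to_bins(x):
        lo, hi = min(x), max(x)
        width = (hi - lo) / bins or 1.0
        return [min(int((a - lo) / width), bins - 1) for a in x]

    bu, bv = to_bins(u), to_bins(v)
    n = len(u)
    joint, cu, cv = Counter(zip(bu, bv)), Counter(bu), Counter(bv)
    mi = sum(c / n * math.log(c * n / (cu[a] * cv[b]))
             for (a, b), c in joint.items())
    return mi / math.log(bins)


def select_non_redundant(columns, weights, n_subsets=3,
                         keep_ratios=(1.0, 0.6, 0.3), threshold=0.8):
    """HYPOTHETICAL redundancy filter (not the paper's exact rule).

    columns: one list of values per feature; weights: ReliefF weight per feature.
    Features are ranked by weight and cut into n_subsets equal chunks; chunk s
    may keep at most ceil(keep_ratios[s] * len(chunk)) features. A feature is
    skipped as redundant if max(|Pearson|, binned NMI) >= threshold against
    any feature already kept.
    """
    order = sorted(range(len(columns)), key=lambda f: weights[f], reverse=True)
    size = math.ceil(len(order) / n_subsets)
    kept = []
    for s in range(n_subsets):
        chunk = order[s * size:(s + 1) * size]
        quota = math.ceil(keep_ratios[s] * len(chunk))
        taken = 0
        for f in chunk:
            if taken >= quota:
                break
            redundant = any(
                max(abs(pearson(columns[f], columns[g])),
                    binned_nmi(columns[f], columns[g])) >= threshold
                for g in kept)
            if not redundant:
                kept.append(f)
                taken += 1
    return kept
```

Columns are passed feature by feature, so a row-major dataset `X` would be transposed first with `[list(c) for c in zip(*X)]`. Taking the maximum of the two scores lets the linear measure (Pearson) and the nonlinear one catch different kinds of redundancy; on small samples the binned score is biased upward, so `threshold` and `bins` would need tuning.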


Copyright: Software Engineering Magazine
Address: No. 2 Xinxiu Street, Hunnan District, Shenyang, Liaoning Province; postal code 110179
Tel: 0411-84767887  Fax: 0411-84835089  Email: semagazine@neusoft.edu.cn