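Cite this article: ZHANG Yongqiang, KONG Junjun, CUI Yao, LI Xiangnan. Research on Hard Disk Fault Rate Prediction Based on Random Forest[J]. Software Engineering, 2024, 27(3): 74-78.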
Research on Hard Disk Fault Rate Prediction Based on Random Forest
ZHANG Yongqiang1,4, KONG Junjun1, CUI Yao2, LI Xiangnan3,4
(1. School of Information Science and Engineering, Hebei University of Science and Technology, Shijiazhuang 050018, China;
2. Shijiazhuang Changhong Intelligent Technology Co., Ltd., Shijiazhuang 050004, China;
3. Shijiazhuang Chunxiao Internet Information Technology Co., Ltd., Shijiazhuang 050061, China;
4. Hebei Technology Innovation Center of Intelligent IoT, Shijiazhuang 050018, China)
zyq@hebust.edu.cn; kjunjun555@163.com; cuiyao@changhong.cc; xiangnan.li@chunxiao.net
Abstract: To prevent the large-scale data loss caused by hard disk failures, this paper proposes a Random Forest-based method for predicting hard disk faults. First, the data are preprocessed with feature mapping. Second, a Random Forest prediction model is built by constructing and selecting decision trees; the model predicts the interval in which the hard disk fault rate falls from the selected feature attributes, and changes in those attributes reflect the trend of the fault rate. Finally, the model's parameters are tuned by testing different values of n_estimators. Experimental results show that, compared with XGBoost (Extreme Gradient Boosting) and LSTM (Long Short-Term Memory), the proposed method improves the F1 value (F-Measure) by 0.93% and 1.84%, respectively. After testing different parameter values, the final accuracy reaches 98.18%, which is 1.23% higher than that obtained with the default value. These results indicate that the proposed method predicts the hard disk fault rate more accurately and reflects how the fault rate changes with the feature attributes.
Keywords: Random Forest; hard disk fault rate; fault rate prediction; feature mapping; S.M.A.R.T. attribute
CLC number: TP391    Document code: A
Funding: Natural Science Foundation of Hebei Province (F2022208002); Key Project of Science and Technology Research of Higher Education Institutions of Hebei Province (ZD2021048)
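
The tuning step described in the abstract (predicting a fault-rate interval with a Random Forest and comparing several n_estimators values) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes scikit-learn and NumPy, uses purely synthetic data in place of S.M.A.R.T. attributes, and the interval edges in `RATE_BIN_EDGES`, the helper names `make_synthetic_data` and `sweep_n_estimators`, and the candidate values are all hypothetical. The paper's feature mapping step is omitted because the abstract does not specify it.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split

# Hypothetical interval edges for the fault rate (not taken from the paper):
# class 0: rate < 1%, class 1: 1% <= rate < 5%, class 2: rate >= 5%.
RATE_BIN_EDGES = [0.01, 0.05]


def make_synthetic_data(n_samples=600, n_features=5, seed=0):
    """Generate stand-in features and fault-rate interval labels (synthetic)."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n_samples, n_features))
    # Assumed relationship: the rate depends on the first two features plus noise.
    rate = 0.03 + 0.02 * X[:, 0] + 0.01 * X[:, 1]
    rate += rng.normal(scale=0.005, size=n_samples)
    rate = np.clip(rate, 0.0, None)
    # Map each continuous rate to the index of the interval it falls in.
    y = np.digitize(rate, RATE_BIN_EDGES)
    return X, y


def sweep_n_estimators(X, y, candidates=(10, 50, 100, 200), seed=0):
    """Fit one forest per candidate n_estimators and score it on a held-out split."""
    X_train, X_val, y_train, y_val = train_test_split(
        X, y, test_size=0.3, random_state=seed, stratify=y
    )
    results = {}
    for n in candidates:
        model = RandomForestClassifier(n_estimators=n, random_state=seed)
        model.fit(X_train, y_train)
        pred = model.predict(X_val)
        results[n] = {
            "accuracy": accuracy_score(y_val, pred),
            "f1_macro": f1_score(y_val, pred, average="macro"),
        }
    return results


if __name__ == "__main__":
    X, y = make_synthetic_data()
    for n, metrics in sweep_n_estimators(X, y).items():
        print(n, metrics)
```

The sketch scores each candidate on a single held-out split for brevity; cross-validation would likely give a more stable comparison. Scores obtained on this synthetic data say nothing about the figures reported in the abstract.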

