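Cite this article: XU Hongyan, SUN Yunshan, QIN Qilin, ZHU Mingtao. Comparative Analysis of the Performance of Interpolation Methods for Missing Data[J]. Software Engineering, 2021, 24(11): 11-14.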
Comparative Analysis of the Performance of Interpolation Methods for Missing Data
XU Hongyan1, SUN Yunshan2, QIN Qilin1, ZHU Mingtao2
(1. School of Science, Tianjin University of Commerce, Tianjin 300134, China;
2. School of Information Engineering, Tianjin University of Commerce, Tianjin 300134, China)
2552727224@qq.com; sunyunshan@tjcu.edu.cn; 3099141857@qq.com; 648191948@qq.com
Abstract: Missing data are unavoidable in practice. To preserve the completeness of the information for subsequent statistical analysis, it is important to predict and fill in missing values as accurately as possible. Using two simulated data sets, one following a Gaussian distribution and one following a Gamma distribution, together with a real data set of life expectancy in some African countries, three missing ratios of 5%, 10%, and 20% are preset, and the results of four interpolation methods are compared using computer software. The experimental results show that, on the simulated data, autoregressive modeling interpolation and mean interpolation perform slightly better overall than K-nearest neighbor interpolation and linear regression interpolation. On the real data, when the missing ratio is low, K-nearest neighbor interpolation and linear regression interpolation outperform the former two; when the missing ratio is high, the results do not differ noticeably from those on the simulated data.
Keywords: missing data; interpolation method; autoregressive modeling
CLC number: TP399    Document code: A
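
The comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the sample size, the distribution parameters, the error metric, or the exact form of each method, so the values below (500 samples, a standard Gaussian, a Gamma with shape 2 and scale 2, RMSE as the score, a position-based nearest neighbor, and a regression of value on position) are all assumptions chosen for illustration. Autoregressive modeling interpolation is omitted to keep the sketch short.

```python
import math
import random


def mask_at_random(values, ratio, rng):
    """Return a copy with None at randomly chosen positions, plus those positions."""
    n_missing = max(1, round(len(values) * ratio))
    missing = sorted(rng.sample(range(len(values)), n_missing))
    masked = list(values)
    for i in missing:
        masked[i] = None
    return masked, missing


def impute_mean(masked):
    """Fill each missing entry with the mean of the observed entries."""
    observed = [v for v in masked if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in masked]


def impute_nearest(masked):
    """Fill each missing entry with the observed value closest in position.

    Assumption: "nearest" is measured by index distance (ties go to the
    earlier position); the paper may define neighbors differently.
    """
    observed = [i for i, v in enumerate(masked) if v is not None]
    filled = list(masked)
    for i, v in enumerate(masked):
        if v is None:
            nearest = min(observed, key=lambda j: abs(j - i))
            filled[i] = masked[nearest]
    return filled


def impute_linear_regression(masked):
    """Fit value = a + b * position by least squares on observed entries."""
    xs = [i for i, v in enumerate(masked) if v is not None]
    ys = [masked[i] for i in xs]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = y_mean - b * x_mean
    return [a + b * i if v is None else v for i, v in enumerate(masked)]


def rmse(truth, filled, missing):
    """Root-mean-square error over the positions that were masked."""
    return math.sqrt(sum((truth[i] - filled[i]) ** 2 for i in missing) / len(missing))


def compare(values, ratios=(0.05, 0.10, 0.20), seed=0):
    """Score each method at each missing ratio; returns {ratio: {method: rmse}}."""
    rng = random.Random(seed)
    methods = {
        "mean": impute_mean,
        "nearest": impute_nearest,
        "linear_regression": impute_linear_regression,
    }
    results = {}
    for ratio in ratios:
        masked, missing = mask_at_random(values, ratio, rng)
        results[ratio] = {
            name: rmse(values, fn(masked), missing) for name, fn in methods.items()
        }
    return results


if __name__ == "__main__":
    rng = random.Random(42)
    datasets = {
        "gaussian": [rng.gauss(0.0, 1.0) for _ in range(500)],       # assumed parameters
        "gamma": [rng.gammavariate(2.0, 2.0) for _ in range(500)],   # assumed parameters
    }
    for label, data in datasets.items():
        for ratio, scores in compare(data).items():
            print(label, ratio, scores)
```

Because the simulated samples here are independent draws, position carries no information, which is one plausible reason a simple mean fill could be competitive on such data; the sketch is not meant to reproduce the paper's reported rankings.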


