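Cite this article: LAN Xiaofang, LIU Zhuo, XU Zhihao, XIAO Yi. A Chinese Text Keyword Extraction Method Based on the Combination of TF-IDF and TextRank: A Case Study of Sports News[J]. Software Engineering, 2023, 26(8): 6-10.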
CLC number: TP391.1    Document code: A
A Chinese Text Keyword Extraction Method Based on the Combination of TF-IDF and TextRank: A Case Study of Sports News
LAN Xiaofang1, LIU Zhuo2, XU Zhihao1, XIAO Yi2
(1.Oriental College of Science and Technology, Hunan Agricultural University, Changsha 410128, China;
2.College of Information and Intelligent Science and Technology, Hunan Agricultural University, Changsha 410128, China)
lanxf@stu.hunau.edu.cn; fetty_max@163.com; guapideyouxiang@stu.hunau.edu.cn; xiaoyi@hunau.edu.cn
Abstract: Applying text mining techniques to sports hot-topic analysis can provide useful information for the development of the sports field. This paper proposes a Chinese text keyword extraction method based on TF-IDF (Term Frequency-Inverse Document Frequency) and TextRank. The method first preprocesses the text through word segmentation and stop-word removal. It then computes the importance of each word with the TF-IDF algorithm and normalizes the values; in parallel, it uses the TextRank algorithm to weigh the relationships between words and computes a score for each word, which is also normalized. Finally, the TF-IDF values and TextRank scores are combined in a weighted sum to give each word a comprehensive weight, and the N words with the highest weights are taken as keywords. The combined method achieved better F1 values when 5 keywords were selected; compared with using TF-IDF alone or TextRank alone, keyword extraction accuracy improved by about 40% and 32%, respectively. The method effectively improves the accuracy and efficiency of keyword extraction.
Keywords: TF-IDF; TextRank; sports news; keyword extraction
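
The pipeline summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the exact formulas, so the smoothed IDF, the min-max normalization, the co-occurrence window, the damping factor, and the weight `alpha` are all assumptions, and every function name is hypothetical. Input documents are assumed to be non-empty token lists that have already been segmented and stripped of stop words.

```python
import math
from collections import Counter, defaultdict


def tfidf_scores(doc, corpus):
    """TF-IDF of each word in `doc` (token list) against `corpus` (list of token lists)."""
    tf = Counter(doc)
    n_docs = len(corpus)
    scores = {}
    for word, count in tf.items():
        df = sum(1 for d in corpus if word in d)
        # Smoothed IDF: an assumption, the abstract does not specify the variant.
        idf = math.log((1 + n_docs) / (1 + df)) + 1
        scores[word] = (count / len(doc)) * idf
    return scores


def textrank_scores(doc, window=3, damping=0.85, iterations=50):
    """Unweighted TextRank on an undirected co-occurrence graph of `doc`."""
    neighbors = defaultdict(set)
    for i, word in enumerate(doc):
        # Link each word to the words that follow it inside the window.
        for other in doc[i + 1 : i + window]:
            if other != word:
                neighbors[word].add(other)
                neighbors[other].add(word)
    scores = {word: 1.0 for word in set(doc)}
    for _ in range(iterations):
        # S(v) = (1 - d) + d * sum over neighbors u of S(u) / degree(u)
        scores = {
            word: (1 - damping)
            + damping * sum(scores[n] / len(neighbors[n]) for n in neighbors[word])
            for word in scores
        }
    return scores


def min_max(scores):
    """Rescale scores to [0, 1]; min-max is an assumed normalization."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {w: 0.0 for w in scores}
    return {w: (s - lo) / (hi - lo) for w, s in scores.items()}


def top_keywords(doc, corpus, n=5, alpha=0.5):
    """Top-n words by alpha * TF-IDF + (1 - alpha) * TextRank (alpha is hypothetical)."""
    tfidf = min_max(tfidf_scores(doc, corpus))
    rank = min_max(textrank_scores(doc))
    combined = {w: alpha * tfidf[w] + (1 - alpha) * rank[w] for w in tfidf}
    return sorted(combined, key=combined.get, reverse=True)[:n]


def f1_score(predicted, gold):
    """F1 between extracted keywords and reference keywords."""
    hits = len(set(predicted) & set(gold))
    if hits == 0:
        return 0.0
    precision = hits / len(set(predicted))
    recall = hits / len(set(gold))
    return 2 * precision * recall / (precision + recall)
```

Calling `top_keywords(doc, corpus, n=5)` returns the five highest-weighted words of `doc`, and `f1_score` compares them with a manually labeled keyword list. Normalizing both score sets before the weighted sum keeps either method from dominating simply because its raw scores sit on a larger scale.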

