Cite this article: WANG Yunyun, ZHANG Yunhua. Research on BTM Topic Model Based on Two-Word Meaning Enhancement[J]. Software Engineering, 2020, 23(4): 1-6.
CLC number: TP391.1    Document code: A
Research on BTM Topic Model Based on Two-Word Meaning Enhancement
WANG Yunyun¹, ZHANG Yunhua²
(School of Information Science and Technology, Zhejiang Sci-Tech University, Hangzhou 310018, China) ¹1528723134@qq.com; ²605498519@qq.com
Abstract: To address the weak semantic association between co-occurring word pairs (biterms) when short texts are modeled with the Biterm Topic Model (BTM), this paper proposes an improved BTM topic model that incorporates the cw2vec word vector model (cw2vec-BTM). The cw2vec model is trained on a short-text corpus to obtain word vectors, and the similarity between word vectors is computed. A sampling threshold is then set to modify how BTM samples co-occurring word pairs, raising the probability that semantically related words are sampled. Experimental results show that the proposed model effectively raises both the topic coherence and the KL divergence of the topic model.
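The abstract does not say exactly how the sampling threshold changes BTM's biterm sampling. The sketch below is one hypothetical reading, assuming pre-trained word vectors are available as a plain dict (the toy vectors stand in for cw2vec output; cw2vec itself is not implemented here): every unordered word pair in a short text becomes a biterm, pairs whose cosine similarity reaches a threshold receive a larger weight, and biterms are then drawn in proportion to that weight. The names `extract_weighted_biterms`, `sample_biterms`, `threshold` and `boost` are illustrative assumptions, not the authors' code.

```python
import math
import random
from itertools import combinations


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def extract_weighted_biterms(doc, vectors, threshold=0.5, boost=2.0):
    """Turn one short text (a list of tokens) into weighted biterms.

    Hypothetical weighting scheme: a pair whose word vectors have cosine
    similarity >= threshold gets weight `boost`; every other pair, including
    pairs with an out-of-vocabulary word, gets weight 1.0.
    """
    biterms = []
    for w1, w2 in combinations(doc, 2):  # all unordered token pairs in the text
        weight = 1.0
        if w1 in vectors and w2 in vectors:
            if cosine(vectors[w1], vectors[w2]) >= threshold:
                weight = boost
        biterms.append((w1, w2, weight))
    return biterms


def sample_biterms(weighted_biterms, n, seed=0):
    """Draw n biterms with replacement, with probability proportional to weight."""
    rng = random.Random(seed)
    pairs = [(w1, w2) for w1, w2, _ in weighted_biterms]
    weights = [weight for _, _, weight in weighted_biterms]
    return rng.choices(pairs, weights=weights, k=n)


# Toy 2-d vectors standing in for trained cw2vec embeddings (hypothetical values).
vectors = {"apple": [1.0, 0.0], "fruit": [0.9, 0.1], "car": [0.0, 1.0]}
weighted = extract_weighted_biterms(["apple", "fruit", "car"], vectors, threshold=0.8)
print(weighted)  # [('apple', 'fruit', 2.0), ('apple', 'car', 1.0), ('fruit', 'car', 1.0)]
print(sample_biterms(weighted, n=5))
```

With these weights, the pair 'apple'/'fruit' is twice as likely to be drawn as either pair involving 'car'. Weighting, rather than discarding low-similarity pairs, keeps every co-occurrence in the model while tilting sampling toward semantically related words; whether the paper boosts, filters or re-weights biterms is not stated in the abstract.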
Keywords: short text; BTM topic model; word vector; Gibbs sampling
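Gibbs sampling, listed among the keywords, is the usual way to infer BTM. For reference, the following is a minimal collapsed Gibbs sampler for the standard BTM, plus a helper that averages the KL divergence over ordered pairs of topic-word distributions. This helper illustrates one common way to read the abstract's claim that a higher KL divergence indicates better-separated topics. It does not include the cw2vec-based threshold: biterms are plain `(word_id, word_id)` pairs, and the function names, hyperparameter defaults (`alpha=1.0`, `beta=0.01`) and the averaging scheme are assumptions rather than the paper's settings.

```python
import math
import random


def btm_gibbs(biterms, vocab_size, num_topics, alpha=1.0, beta=0.01, iters=100, seed=0):
    """Collapsed Gibbs sampling for the standard Biterm Topic Model.

    biterms: list of (wi, wj) word-id pairs pooled from the whole corpus.
    Returns (theta, phi, z): corpus-level topic proportions, topic-word
    distributions, and the final topic assignment of each biterm.
    """
    rng = random.Random(seed)
    K, M = num_topics, vocab_size
    n_k = [0] * K                          # biterms assigned to topic k
    n_wk = [[0] * M for _ in range(K)]     # times word w is assigned to topic k
    z = []
    for wi, wj in biterms:                 # random initial assignment
        k = rng.randrange(K)
        z.append(k)
        n_k[k] += 1
        n_wk[k][wi] += 1
        n_wk[k][wj] += 1

    for _ in range(iters):
        for b, (wi, wj) in enumerate(biterms):
            k = z[b]                       # remove biterm b from the counts
            n_k[k] -= 1
            n_wk[k][wi] -= 1
            n_wk[k][wj] -= 1
            # p(z_b = t | rest) ∝ (n_t + α)(n_{wi|t} + β)(n_{wj|t} + β) / (2 n_t + Mβ)^2
            weights = []
            for t in range(K):
                denom = 2 * n_k[t] + M * beta    # each biterm contributes two words
                weights.append((n_k[t] + alpha) * (n_wk[t][wi] + beta)
                               * (n_wk[t][wj] + beta) / (denom * denom))
            k = rng.choices(range(K), weights=weights)[0]
            z[b] = k                       # add biterm b back under its new topic
            n_k[k] += 1
            n_wk[k][wi] += 1
            n_wk[k][wj] += 1

    theta = [(n_k[t] + alpha) / (len(biterms) + K * alpha) for t in range(K)]
    phi = [[(n_wk[t][w] + beta) / (2 * n_k[t] + M * beta) for w in range(M)]
           for t in range(K)]
    return theta, phi, z


def mean_pairwise_kl(phi):
    """Average KL(phi_i || phi_j) over all ordered topic pairs i != j."""
    K = len(phi)
    total, pairs = 0.0, 0
    for i in range(K):
        for j in range(K):
            if i != j:
                # beta smoothing keeps every entry > 0, so the logs are defined
                total += sum(p * math.log(p / q) for p, q in zip(phi[i], phi[j]))
                pairs += 1
    return total / pairs if pairs else 0.0


biterms = [(0, 1), (0, 1), (2, 3), (2, 3), (0, 2)]
theta, phi, z = btm_gibbs(biterms, vocab_size=4, num_topics=2, iters=50, seed=1)
print(theta, mean_pairwise_kl(phi))
```

Averaging over ordered pairs accounts for the asymmetry of KL divergence; a symmetrized KL or the Jensen-Shannon divergence would be an equally reasonable choice, and the abstract does not say which variant the authors used.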


Copyright: Software Engineering Journal Office