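Cite this article: XIAN Cuiqiong, QIN Xue, ZHU Daoheng, CAO Shumin. Design and Optimization of a Similarity Algorithm for Combination of Graphics and Text[J]. Software Engineering, 2020, 23(8): 9-12.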
Design and Optimization of a Similarity Algorithm for Combination of Graphics and Text
XIAN Cuiqiong, QIN Xue, ZHU Daoheng, CAO Shumin
(College of Big Data and Information Engineering, Guizhou University, Guiyang 550025, China)
1243139443@qq.com; 16702755@qq.com; dhzhu911@163.com; 982913044@qq.com
Abstract: Documents that contain both text and images are carriers of information and greatly enrich the ways in which information can be presented. To address the low efficiency of traditional algorithms for computing text-and-image similarity, this paper proposes a combined text-and-image similarity algorithm. The Jaccard similarity coefficient is introduced into cosine similarity, and the similarity of two texts is computed as a weighted combination of the two measures. Next, a perceptual hash (PHash) algorithm computes the similarity between images in the documents and finds the maximum value; the mean image similarity over all images in a single document is then computed and weighted together with the text similarity to obtain the text-and-image similarity of the documents. Finally, a document similarity checking system is used to verify that the algorithm quantifies the similarity between documents accurately and efficiently, and that the optimized similarity algorithm greatly improves the running efficiency of the system.
Keywords: cosine similarity algorithm; Jaccard similarity coefficient; perceptual hash algorithm; text similarity
CLC number: TP391.1    Document code: A
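
The abstract describes the text step only at a high level: the Jaccard similarity coefficient is combined with cosine similarity by weighting. The following is a minimal sketch of one way this could look, not the authors' implementation. It assumes the texts are already split into tokens (Chinese text would first need a word segmenter), that cosine similarity is taken over raw term-frequency vectors, and that the two scores are mixed linearly; the function names and the weight `alpha` are hypothetical, since the abstract does not give the weighting scheme.

```python
import math
from collections import Counter


def cosine_similarity(tokens_a, tokens_b):
    """Cosine similarity between the term-frequency vectors of two token lists."""
    tf_a, tf_b = Counter(tokens_a), Counter(tokens_b)
    # Only terms present in both texts contribute to the dot product.
    dot = sum(tf_a[t] * tf_b[t] for t in tf_a.keys() & tf_b.keys())
    norm_a = math.sqrt(sum(c * c for c in tf_a.values()))
    norm_b = math.sqrt(sum(c * c for c in tf_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def jaccard_similarity(tokens_a, tokens_b):
    """Jaccard coefficient: shared distinct terms over all distinct terms."""
    set_a, set_b = set(tokens_a), set(tokens_b)
    union = set_a | set_b
    if not union:
        return 0.0
    return len(set_a & set_b) / len(union)


def text_similarity(tokens_a, tokens_b, alpha=0.5):
    """Weighted mix of cosine and Jaccard similarity.

    alpha is a hypothetical weight in [0, 1]; the abstract does not state
    the value or the exact form of the weighting.
    """
    cos = cosine_similarity(tokens_a, tokens_b)
    jac = jaccard_similarity(tokens_a, tokens_b)
    return alpha * cos + (1 - alpha) * jac


if __name__ == "__main__":
    a = "the cat sat on the mat".split()
    b = "the cat lay on the mat".split()
    print(round(text_similarity(a, b), 3))
```

For the two sample sentences, the cosine term is 7/8 (the term-frequency vectors share a dot product of 7 and each has squared norm 8) and the Jaccard term is 4/6 (four shared distinct words out of six), so the mixed score falls between the two.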

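
The image step and the final weighting can be sketched in the same spirit. The code below is an illustrative reading of the abstract, not the published algorithm. It assumes each image has already been converted to a square grayscale matrix (for example 32x32, which would normally be done with an image library and is omitted here), uses a common DCT-based perceptual hash with a median threshold, takes for each image in one document its maximum similarity against the images of the other document, averages those maxima, and mixes the result with the text similarity using a hypothetical weight `beta`. All function names are made up for illustration.

```python
import math
from statistics import median


def phash(gray, hash_size=8):
    """Perceptual hash (list of bits) of a square grayscale matrix (list of rows)."""
    n = len(gray)
    # Cosine table for the lowest `hash_size` frequencies of an unnormalised DCT-II.
    cos = [[math.cos((2 * x + 1) * u * math.pi / (2 * n)) for x in range(n)]
           for u in range(hash_size)]
    # Separable 2D DCT: transform each row first, then the columns.
    row_dct = [[sum(row[x] * cos[v][x] for x in range(n)) for v in range(hash_size)]
               for row in gray]
    coeffs = [sum(row_dct[y][v] * cos[u][y] for y in range(n))
              for u in range(hash_size) for v in range(hash_size)]
    # One bit per low-frequency coefficient: above the median or not.
    med = median(coeffs)
    return [1 if c > med else 0 for c in coeffs]


def hash_similarity(hash_a, hash_b):
    """1 minus the normalised Hamming distance between two equal-length hashes."""
    distance = sum(x != y for x, y in zip(hash_a, hash_b))
    return 1.0 - distance / len(hash_a)


def document_image_similarity(hashes_a, hashes_b):
    """Mean over images in A of the best match found among images in B.

    Returning 0.0 when either document has no images is an assumption.
    """
    if not hashes_a or not hashes_b:
        return 0.0
    best = [max(hash_similarity(ha, hb) for hb in hashes_b) for ha in hashes_a]
    return sum(best) / len(best)


def combined_similarity(text_sim, image_sim, beta=0.7):
    """Weighted text-and-image similarity; beta is a hypothetical text weight."""
    return beta * text_sim + (1 - beta) * image_sim


if __name__ == "__main__":
    gradient = [[x + y for x in range(32)] for y in range(32)]
    stripes = [[255 if (x // 4) % 2 else 0 for x in range(32)] for y in range(32)]
    doc_a = [phash(gradient)]
    doc_b = [phash(gradient), phash(stripes)]
    image_sim = document_image_similarity(doc_a, doc_b)
    print(combined_similarity(0.8, image_sim))
```

Taking the maximum per image means an image counts as matched if any image in the other document resembles it, and the mean keeps a single matching image from dominating a document with many images. Whether the original system averages in exactly this way is not stated in the abstract.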
