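Cite this article: JIAO Shuheng, ZHANG Shanqing. Research on a Dual-stream Document Image Quality Assessment Algorithm Based on Transformer[J]. Software Engineering, 2025, 28(2): 42-45.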
Research on a Dual-stream Document Image Quality Assessment Algorithm Based on Transformer
JIAO Shuheng, ZHANG Shanqing
(School of Computer Science, Hangzhou Dianzi University, Hangzhou 310018, China)
yidachuanshuzi@hdu.edu.cn; sqzhang@hdu.edu.cn
Abstract: To address insufficient image feature extraction and inappropriate evaluation metrics in document image quality assessment networks, this paper proposes a Transformer-based Dual-stream Document Image Quality Assessment (DSDIQA) algorithm. First, a Transformer extracts image features and computes attention across feature channels. Second, a weighting module predicts the OCR (Optical Character Recognition) accuracy of the document image as the document image quality score, while a CNN (Convolutional Neural Network) extracts global features of the document and, after a fully connected layer, predicts the natural image score. Finally, the two scores are combined into the predicted quality score of the image. Experimental results show that the algorithm achieves a Pearson Linear Correlation Coefficient (PLCC) of 0.9045 and a Spearman Rank Order Correlation Coefficient (SROCC) of 0.8775 on the dataset, demonstrating that it can predict document image quality scores that align more closely with human visual standards.
Keywords: image quality assessment; document image; Transformer; neural network
CLC number: TP391    Document code: A
Funding: National Natural Science Foundation of China (62172132)
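
The PLCC and SROCC values quoted in the abstract are standard correlation measures between predicted quality scores and reference scores. As a minimal sketch, assuming only NumPy, they could be computed as below. The function names `plcc`, `average_ranks`, and `srocc` and the sample scores are illustrative assumptions and are not taken from the paper.

```python
import numpy as np


def plcc(pred, target):
    """Pearson linear correlation between predicted and reference scores."""
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    p = pred - pred.mean()      # center both score vectors
    t = target - target.mean()
    return float((p @ t) / np.sqrt((p @ p) * (t @ t)))


def average_ranks(values):
    """1-based ranks; tied values share the mean of their ranks."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values, kind="mergesort")
    ranks = np.empty(len(values), dtype=float)
    ranks[order] = np.arange(1, len(values) + 1)
    for v in np.unique(values):
        mask = values == v
        ranks[mask] = ranks[mask].mean()
    return ranks


def srocc(pred, target):
    """Spearman rank-order correlation: Pearson correlation of the ranks."""
    return plcc(average_ranks(pred), average_ranks(target))


if __name__ == "__main__":
    # Hypothetical scores: predictions are monotonic in the reference
    # but not linear, so SROCC is 1 while PLCC stays below 1.
    reference = [1.0, 2.0, 3.0, 4.0]
    predicted = [1.0, 4.0, 9.0, 16.0]
    print("PLCC :", plcc(predicted, reference))
    print("SROCC:", srocc(predicted, reference))
```

PLCC measures how linearly the predictions track the reference scores, while SROCC only measures whether they preserve the ranking, which is why the two are usually reported together.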

