Software Engineering

Cite this article:

ZHOU Gang, LI Handong, CHEN Yeye. Text-to-Image Generation Based on Contrastive Learning[J]. Software Engineering, 2025, 28(2): 38-41.


Text-to-Image Generation Based on Contrastive Learning

ZHOU Gang, LI Handong, CHEN Yeye

(School of Electrical Engineering, Guizhou University, Guiyang 550025, China)
1101808591@qq.com; 470394668@qq.com; zgsrkl@126.com

Abstract: In experiments on the CUB dataset, text-to-image generation from multi-object, semantically closely related descriptions produced many bird images with "multiple heads" or "multiple feet". To address this, the paper adds contrastive learning to the MA-GAN (Multi-stage Attention Mechanism Generative Adversarial Network) model to optimize image generation. In addition, a feature interpolation method is used to enhance certain image features, thereby improving semantic consistency and text recognizability. Experiments on the CUB and COCO datasets show that the improved model increases the Inception Score (IS) by 0.11 and 2.58, respectively, and the R-precision (R score) by 1.98 and 1.37, respectively, demonstrating that the improved model can address the problems of image quality and semantic consistency.

Keywords: text-to-image generation; contrastive learning; text feature representation; feature interpolation

CLC number: TP393 Document code: A
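
The abstract reports R-precision gains without defining the metric. As a rough illustration only, the sketch below shows one common way top-1 R-precision is computed in text-to-image work: each generated image is compared by cosine similarity against its own caption and a set of mismatched captions, and it counts as a hit when its own caption ranks first. The function names, the default of 99 distractors, and the use of precomputed embeddings are assumptions made for this example; this is not the authors' evaluation code.

```python
import math
import random


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def r_precision(image_feats, text_feats, num_distractors=99, seed=0):
    """Top-1 R-precision: fraction of images whose own caption ranks first.

    Assumption: image_feats[i] and text_feats[i] are embeddings of the
    i-th generated image and of the caption it was generated from.
    """
    rng = random.Random(seed)
    n = len(image_feats)
    hits = 0
    for i in range(n):
        # Draw mismatched captions (all others if fewer are available).
        others = [j for j in range(n) if j != i]
        distractors = rng.sample(others, min(num_distractors, len(others)))
        true_score = cosine(image_feats[i], text_feats[i])
        # A hit requires the matching caption to beat every distractor.
        if all(true_score > cosine(image_feats[i], text_feats[j])
               for j in distractors):
            hits += 1
    return hits / n
```

Under this convention the score is a fraction between 0 and 1; papers usually report it as a percentage, which is presumably the scale of the 1.98 and 1.37 improvements quoted above.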
