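Cite this article: WANG Yiru, ZHENG Jianli, ZHOU Haoran. Research on a Medical Term Alignment Method Based on the PubMedBERT Pre-trained Model[J]. Software Engineering, 2023, 26(11): 39-42.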
Research on a Medical Term Alignment Method Based on the PubMedBERT Pre-trained Model
WANG Yiru, ZHENG Jianli, ZHOU Haoran
(School of Health Science and Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China)
e_wangyiru@163.com; zhengjianli163@163.com; zhouhaoran1908@163.com
Abstract: With the arrival of the digital era of Internet-based health care, health data is growing massively. To address terminology standardization for heterogeneous data in medical data integration applications, this paper proposes a technique that aligns medical terms by computing semantic similarity with PubMedBERT. The method uses a pre-trained model specific to the medical domain, combined with abbreviation expansion to enrich semantic information, and is compared with traditional similarity calculation models and with BERT (Bidirectional Encoder Representations from Transformers) and its variants. Experiments on the test corpus show that abbreviation expansion improves the TOP1 accuracy of the PubMedBERT pre-trained model by 18.79%, and that the PubMedBERT model reaches TOP1, TOP3, TOP5, and TOP10 accuracies of 78.49%, 85.69%, 87.44%, and 89.54%, respectively, outperforming the other models compared. The method can provide an intelligent solution for medical term alignment.
Keywords: semantic similarity; term alignment; abbreviation expansion; PubMedBERT
CLC number: TP391.1    Document code: A
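The abstract names three ingredients: abbreviation expansion, similarity-based ranking of candidate standard terms, and TOP-k accuracy. The following is a minimal sketch of how these pieces could fit together, not the authors' implementation. The abbreviation table, the example terms, and all function names are hypothetical. To keep the sketch runnable without downloading a model, `embed` is a character-trigram stand-in; in the setting the abstract describes, it would be replaced by an embedding produced by PubMedBERT.

```python
import math
from collections import Counter

# Hypothetical abbreviation table; the abstract does not say which resource was used.
ABBREVIATIONS = {
    "mi": "myocardial infarction",
    "htn": "hypertension",
    "dm": "diabetes mellitus",
}


def expand_abbreviations(term, table=ABBREVIATIONS):
    """Lower-case the term and replace each token found in the table by its long form."""
    return " ".join(table.get(tok.lower(), tok.lower()) for tok in term.split())


def embed(term):
    """Stand-in encoder (assumption): a bag of character trigrams.

    A PubMedBERT-based pipeline would return a dense vector here instead.
    """
    padded = f"  {term.lower()} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(v * b[k] for k, v in a.items() if k in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def rank_candidates(query, standard_terms, expand=True):
    """Return the standard terms sorted by decreasing similarity to the query."""
    text = expand_abbreviations(query) if expand else query
    query_vec = embed(text)
    return sorted(standard_terms, key=lambda t: cosine(query_vec, embed(t)), reverse=True)


def top_k_accuracy(pairs, standard_terms, k, expand=True):
    """Fraction of (query, gold) pairs whose gold term appears among the top k candidates."""
    hits = sum(gold in rank_candidates(q, standard_terms, expand)[:k] for q, gold in pairs)
    return hits / len(pairs)


# Toy data, invented for illustration only.
standard_terms = ["hypertension", "diabetes mellitus", "myocardial infarction"]
pairs = [
    ("acute MI", "myocardial infarction"),
    ("HTN", "hypertension"),
    ("type 2 DM", "diabetes mellitus"),
]

for k in (1, 3):
    with_exp = top_k_accuracy(pairs, standard_terms, k, expand=True)
    without_exp = top_k_accuracy(pairs, standard_terms, k, expand=False)
    print(f"TOP{k}: with expansion={with_exp:.2f}, without expansion={without_exp:.2f}")
```

Keeping `embed` as a separate function means the encoder can be swapped without touching the ranking or the metric, so the same TOP-k evaluation can be run with and without abbreviation expansion. The numbers this toy prints say nothing about the results reported in the abstract.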


