Cite this article: YANG Jinrui, LIU Ji. A Grid-based Semi-supervised Density Peak Clustering Algorithm[J]. Software Engineering, 2024, 27(5): 1-6.
A Grid-based Semi-supervised Density Peak Clustering Algorithm
YANG Jinrui1, LIU Ji1,2
(1.School of Statistics & Data Science, Xinjiang University of Finance & Economics, Urumqi 830012, China;
2.Xinjiang Social & Economic Statistics & Big Data Application Research Center, Xinjiang University of Finance & Economics, Urumqi 830012, China)
1519188386@qq.com; Liuji5000@126.com
Abstract: To cluster data quickly while making effective use of known information, a Grid-based Semi-supervised Density Peak Clustering (GS-DPC) algorithm is proposed. The algorithm partitions the dataset with a statistical information grid, takes the number of data points that fall in each grid cell as that cell's local density value, and computes a representative point for each cell. Cluster centers are then determined from the local density values and the relative distance values, and a pairwise constraint set guides the clustering process to produce the final result. Experimental results show that the average time consumed by GS-DPC for data clustering is 32 percentage points lower than that of the density peak clustering (DPC) algorithm. On six datasets, GS-DPC achieves an average accuracy (ACC) of about 0.84, an average Adjusted Mutual Information (AMI) of about 0.68, and an average Adjusted Rand Index (ARI) of about 0.67, indicating that it clusters data quickly and effectively with good clustering results.
Keywords: density peak clustering; grid; semi-supervised; STING; pairwise constraint
CLC number: TP399    Document code: A
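
The grid-density and center-selection steps summarized in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the fixed `cell_size`, the use of the cell mean as the representative point, and ranking candidates by density times relative distance are all assumptions made for illustration. The hierarchical STING structure and the pairwise-constraint step are omitted because the abstract does not specify them.

```python
from collections import defaultdict
from math import dist, floor


def grid_cells(points, cell_size):
    """Group points by the uniform grid cell they fall into (assumed flat grid)."""
    cells = defaultdict(list)
    for p in points:
        key = tuple(floor(x / cell_size) for x in p)
        cells[key].append(p)
    return cells


def representatives(cells):
    """Return a (representative point, local density) pair per non-empty cell.

    Local density is the number of points in the cell, as in the abstract.
    Using the cell mean as the representative point is an assumption.
    """
    reps = []
    for members in cells.values():
        n = len(members)
        mean = tuple(sum(coords) / n for coords in zip(*members))
        reps.append((mean, n))
    return reps


def relative_distances(reps):
    """Distance from each representative to the nearest one of higher density."""
    deltas = []
    for p, rho in reps:
        higher = [dist(p, q) for q, r in reps if r > rho]
        if higher:
            deltas.append(min(higher))
        else:
            # Densest representative: use its largest distance to any other one.
            deltas.append(max((dist(p, q) for q, _ in reps), default=0.0))
    return deltas


def pick_centers(reps, k):
    """Pick the k representatives with the largest density * relative distance."""
    deltas = relative_distances(reps)
    scores = [rho * d for (_, rho), d in zip(reps, deltas)]
    order = sorted(range(len(reps)), key=lambda i: scores[i], reverse=True)
    return [reps[i][0] for i in order[:k]]
```

Because distances are computed between cell representatives rather than between all pairs of raw points, the quadratic step runs over the number of occupied cells, which is one plausible source of the time savings the abstract reports.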