Performance evaluation of generative artificial intelligence model in responding to chronic kidney disease-related queries

Zheng Li; Fu Changhai; Peng Xudong; Huang Houyuan; Song Jiangman; Yang Songtao

Article abstract

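Zheng L, Fu CH, Peng XD, et al. Performance evaluation of generative artificial intelligence model in responding to chronic kidney disease-related queries[J]. Chinese Journal of Clinical Healthcare, 2026, 29(2): 231-236.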

Performance evaluation of generative artificial intelligence model in responding to chronic kidney disease-related queries

Submitted: 2025-11-10

DOI: 10.3969/J.issn.1672-6790.2026.02.017

Keywords: Kidney diseases; Health education; Generative artificial intelligence model; Reproducibility of results; Outcome assessment, health care

Fund program: Capital Health Development Research Special Project

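Author	Affiliation	E-mail
Zheng Li	Department of Pharmacy, Beijing General Aerospace Hospital, Beijing 100074
Fu Changhai	Department of Nephrology, Beijing General Aerospace Hospital, Beijing 100074
Peng Xudong	Health Care Office, Guard Bureau of the Joint Staff Department, Beijing 100017
Huang Houyuan	Health Care Office, Guard Bureau of the Joint Staff Department, Beijing 100017
Song Jiangman	Department of General Practice, Beijing General Aerospace Hospital, Beijing 100074
Yang Songtao	Department of Nephrology, Beijing General Aerospace Hospital, Beijing 100074	songtaoyang@aliyun.com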

Abstract:

Objective: To evaluate the accuracy and reliability of a Chinese generative artificial intelligence (GAI) model (DeepSeek-R1) in answering chronic kidney disease (CKD)-related questions, thereby informing optimised patient education strategies. Methods: Common CKD-related questions were systematically collated from social media platforms, online sources, and self-reported outpatient queries. DeepSeek-R1 was used to generate responses, which were independently assessed by three senior nephrologists on a five-point scale (1 = fully accurate; 5 = completely inaccurate). The reproducibility of the model's responses was also tested. Results: Of 101 common CKD-related questions screened, 70 were included. Sixty responses were rated fully accurate (score 1); 5 were accurate but not comprehensive (score 2); 4 scored 3; 1 scored 4; and no response was rated completely inaccurate (score 5). The similarity of answers to repeated questions ranged from 86% to 100% across question categories. Conclusions: The Chinese GAI model provides relatively accurate responses to inquiries related to chronic kidney disease and shows high reproducibility.
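As a quick arithmetic check on the reported results, the following minimal sketch converts the score counts given in the abstract into percentages of the 70 included questions and a mean score. The variable names are illustrative and do not come from the study, and the sketch assumes each included question carries a single final score, which is how the abstract reports the data.

```python
# Score counts as reported in the abstract
# (1 = fully accurate, 5 = completely inaccurate).
score_counts = {1: 60, 2: 5, 3: 4, 4: 1, 5: 0}

# Total number of included questions (should match the 70 reported).
total = sum(score_counts.values())

# Share of questions at each score, as a percentage rounded to one decimal.
percentages = {
    score: round(100 * count / total, 1)
    for score, count in score_counts.items()
}

# Mean score across all included questions (lower is better on this scale).
mean_score = sum(score * count for score, count in score_counts.items()) / total

for score, pct in percentages.items():
    print(f"score {score}: {score_counts[score]} questions ({pct}%)")
print(f"mean score: {mean_score:.2f}")
```

These derived figures only restate the published counts; they are not additional results from the study.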
