Academic Paper

结肠镜下计算机辅助息肉检测系统的构建 / Construction of a computer-assisted polyp detection system under colonoscopy
Document Type
Academic Journal
Source
中华消化杂志 / Chinese Journal of Digestion. 38(7):473-478
Subject
Colonoscopes
Computer-assisted polyp detection
Classification evaluation indexes
Target detection evaluation indexes
Language
Chinese
ISSN
0254-1432
Abstract
Objective: To construct a computer-assisted polyp detection system for colonoscopy and to preliminarily verify its effectiveness.

Methods: The system was built on the Faster R-CNN algorithm, using the open source framework TensorFlow and an open source implementation of Faster R-CNN. Five test groups were set up according to the size and difficulty of the training set: test groups 1, 2, 3 and 4 contained 1,000, 2,000, 4,000 and 6,000 training samples, respectively, and test group 5 used 6,000 training samples with an increased probability of selecting difficult samples. For each training set, the system's classification evaluation indexes (sensitivity, specificity and others) and target detection evaluation indexes (recall, precision and others) were calculated.

Results: For the classification evaluation indexes, the sensitivities of test groups 1, 2, 3, 4 and 5 were 90.1%, 93.3%, 93.3%, 93.3% and 93.5%, respectively, and the difference among groups was statistically significant (χ² = 25.324, P < 0.01). The sensitivities of test groups 2, 3, 4 and 5 were each higher than that of test group 1, and the differences were statistically significant (χ² = 13.964, 13.508, 13.508 and 13.386; all P < 0.00625). There were no statistically significant differences in specificity or positive predictive value among the test groups (both P > 0.05). The negative predictive values of test groups 1, 2, 3, 4 and 5 were 90.4%, 93.3%, 93.3%, 93.3% and 93.5%, respectively, and the difference among groups was statistically significant (χ² = 21.862, P < 0.01). The negative predictive values of test groups 2, 3, 4 and 5 were each higher than that of test group 1, and the differences were statistically significant (χ² = 11.447, 11.564, 11.755 and 13.760; all P < 0.00625). When the training sample size increased from 1,000 to 2,000, the area under the curve (AUC) increased by 2%; when it increased further to 6,000, the AUC increased by less than 1%. Keeping the sample size at 6,000 while increasing the proportion of difficult samples raised the AUC by a further 0.4%.

For the target detection evaluation indexes, the recall rates of test groups 1, 2, 3, 4 and 5 were 73.6%, 79.8%, 79.5%, 79.8% and 83.3%, respectively, and the difference among groups was statistically significant (χ² = 71.936, P < 0.01). The recall rates of test groups 2, 3 and 4 were each higher than that of test group 1 (χ² = 25.960, 23.492 and 25.960; all P < 0.00625), and the recall rate of test group 5 was higher than those of test groups 1, 2, 3 and 4 (χ² = 67.361, 9.899, 11.527 and 9.899; all P < 0.00625). The precision rates of test groups 1, 2, 3, 4 and 5 were 87.9%, 85.3%, 90.2%, 91.4% and 89.2%, respectively, and the difference among groups was statistically significant (χ² = 48.194, P < 0.01). The precision rates of test groups 3 and 5 were each higher than that of test group 2 (χ² = 24.508 and 15.223; both P < 0.00625), and the precision rate of test group 4 was higher than those of test groups 1 and 2 (χ² = 13.524 and 39.120; both P < 0.00625). As the sample size and training difficulty increased, the F1 score and the mean average precision rose steadily.

Conclusions: This study preliminarily constructed a computer-assisted polyp detection system for colonoscopy. Its sensitivity currently reaches up to 93.5% and its recall rate up to 83.3%. Enlarging the training set improves polyp detection performance to a certain degree but eventually reaches a bottleneck; at that point, appropriately increasing the training difficulty can further improve detection performance, especially the recall rate.
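The abstract reports that the F1 score rose steadily across the test groups but does not list the values. As a rough illustration only, the sketch below recomputes F1 as the harmonic mean of the precision and recall percentages quoted above. It assumes the standard definition F1 = 2PR / (P + R) and uses the rounded figures from the abstract, so the results may differ slightly from those in the full paper. The names `f1_score`, `reported` and `f1_by_group` are illustrative and do not come from the source.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall, both given as fractions in [0, 1]."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision %, recall %) per test group, copied from the rounded values
# quoted in the abstract; the underlying counts are not given there.
reported = {
    1: (87.9, 73.6),
    2: (85.3, 79.8),
    3: (90.2, 79.5),
    4: (91.4, 79.8),
    5: (89.2, 83.3),
}

# Convert percentages to fractions and compute F1 for each group.
f1_by_group = {
    group: f1_score(p / 100, r / 100) for group, (p, r) in reported.items()
}

for group, f1 in f1_by_group.items():
    print(f"test group {group}: F1 = {f1:.3f}")
```

Under these assumptions the recomputed F1 rises monotonically from about 0.80 in test group 1 to about 0.86 in test group 5, which is consistent with the steady increase described in the abstract.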