Academic Article

Exploring the Possible Use of AI Chatbots in Public Health Education: Feasibility Study
Document Type
article
Source
JMIR Medical Education, Vol 9, p e51421 (2023)
Subject
Special aspects of education
LC8-6691
Medicine (General)
R5-920
Language
English
ISSN
2369-3762
Abstract
Background: Artificial intelligence (AI) is a rapidly developing field with the potential to transform various aspects of health care and public health, including medical training. During the “Hygiene and Public Health” course for fifth-year medical students, a practical training session on vaccination was conducted using AI chatbots as an educational support tool. Before receiving specific training on vaccination, the students took a web-based test extracted from the Italian National Medical Residency Test. After they completed the test, each question was critically reviewed and corrected with the assistance of AI chatbots.

Objective: The main aim of this study was to identify whether AI chatbots can be considered educational support tools for training in public health. The secondary objective was to assess the performance of different AI chatbots on complex multiple-choice medical questions in the Italian language.

Methods: A test composed of 15 multiple-choice questions on vaccination was extracted from the Italian National Medical Residency Test using targeted keywords and administered to medical students via Google Forms and to different AI chatbot models (Bing Chat, ChatGPT, Chatsonic, Google Bard, and YouChat). The test was corrected in the classroom, with a focus on critically evaluating the explanations provided by the chatbots. A Mann-Whitney U test was conducted to compare the performance of medical students and AI chatbots. Student feedback was collected anonymously at the end of the training experience.

Results: In total, 36 medical students and 5 AI chatbot models completed the test. The students achieved an average score of 8.22 (SD 2.65) out of 15, while the AI chatbots scored an average of 12.22 (SD 2.77). The results indicated a statistically significant difference in performance between the 2 groups (U=49.5, P