Assessing 4 artificial intelligence systems’ knowledge of a subspecialty of emergency medicine: clinical toxicology

Santiago Nogué-Xarau1, Montserrat Amigó-Tadín2, José Ríos-Guillermo3

Author affiliations

1Fundación Española de Toxicología Clínica, Barcelona, Spain. 2Área de Urgencias, Hospital Clínic, Barcelona, Spain. 3Departamento de Farmacología Clínica, Hospital Clínic, Barcelona, Spain.

DOI

How to cite this article

Nogué-Xarau S, Amigó-Tadín M, Ríos-Guillermo J. Assessing 4 artificial intelligence systems’ knowledge of a subspecialty of emergency medicine: clinical toxicology. Rev Esp Urg Emerg. 2024;3:15–9.

Summary

BACKGROUND AND OBJECTIVE. Artificial intelligence (AI) is a branch of computer technology that develops systems able to perform tasks associated with human intelligence. The main objective of this study was to evaluate AI answers to questions related to clinical toxicology.
MATERIALS AND METHODS. We evaluated 4 AI applications: ChatGPT, Bing, LuzIA, and Bard. Thirty multiple-choice test questions in Spanish about various aspects of clinical toxicology were presented to the applications, and the answers were assessed. Each question included 5 possible answers, 1 of which was correct. In addition to correctness, we evaluated the bibliographic support each application provided. If the application gave an incorrect answer, we rephrased the question, presented it again, and reevaluated the new answer to detect whether question quality influenced performance. Data were recorded for analysis with SPSS. The level of statistical significance was set at P < .05.
RESULTS. The scores achieved by the AI applications were as follows: Bing, 70%; ChatGPT and LuzIA, 67% each; and Bard, 57% (P > .05). The scores improved after the incorrect questions were rephrased, but the differences were not significant. Bing provided direct access to 3 references per question and Bard to 4. However, only 7.2% and 0.85% of those references, respectively, were to PubMed-indexed sources.
CONCLUSIONS. All 4 AI applications were able to correctly answer more than half the questions about clinical toxicology. After rephrasing some questions, each system achieved more correct answers. The supporting references the applications provided were few and of poor quality.

