Introduction

This blog is about medical education in the US and around the world. My interest is in education research and the process of medical education.



The lawyers have asked that I add a disclaimer that makes it clear that these are my personal opinions and do not represent any position of any University that I am affiliated with including the American University of the Caribbean, the University of Kansas, the KU School of Medicine, Florida International University, or the FIU School of Medicine. Nor does any of this represent any position of the Northeast Georgia Medical Center or Northeast Georgia Health System.



Wednesday, July 23, 2025

AI, Superintelligence, and the Future of Clinical Reasoning

By John E. Delzell Jr., MD, MSPH, MBA, FAAFP

In medical education, we often talk about transformation. Competency-based education, interprofessional learning, simulation, and evidence-based practice have all changed how we prepare the next generation of physicians. But the greatest transformation is still on the horizon, and it is being driven by artificial intelligence (AI).

Over the past few years, AI has moved rapidly from theoretical potential to practical application. Large language models (LLMs) like GPT-4 have already demonstrated the ability to pass the USMLE, summarize research articles, and assist in clinical decision-making. As King and Nori caution (1), by "reducing medicine to one-shot answers on multiple-choice questions," benchmarks such as the USMLE "overstate the apparent competence of AI systems and obscure their limitations." Even so, it is clear that we are entering a new era in which AI will not just assist doctors; it will outperform them in specific domains.

What does this mean for medical education? To answer that, we have to look at where the technology is headed and how it intersects with human learning, reasoning, and clinical judgment.

The Promise and Peril of Medical Superintelligence

In June 2025, Microsoft AI (1) released a visionary statement on the trajectory toward "medical superintelligence": a form of AI that not only surpasses human performance on standardized medical benchmarks but also demonstrates generalizable clinical reasoning across specialties. Their goal is to build a system that operates with "generalist-like" breadth and "specialist-level" depth, grounded in real-time reasoning and safety. This is not science fiction.

In a recent preprint (2), Microsoft researchers presented an experimental diagnostic system built on large language models. The team created a set of 304 digital clinical cases derived from New England Journal of Medicine clinicopathological conference (NEJM-CPC) cases. (2) These are stepwise diagnostic encounters in which the physician iteratively asks questions and orders tests. As new information becomes available, the physician updates their reasoning, eventually narrowing toward a best final diagnosis. That final diagnosis can then be compared to the "correct" diagnosis published in the journal. When I was a student and resident, I loved reading these. I almost never got the correct diagnosis, but I learned a lot from the process.
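For readers who want a concrete picture of this stepwise format, the evaluation loop described above can be sketched in a few lines of code. This is only an illustration of the idea, not the benchmark's actual software: the `Case` class, the scripted policies, and the example findings are all hypothetical.

```python
# A minimal sketch of a stepwise NEJM-CPC style diagnostic encounter:
# the agent starts from a vignette, iteratively orders tests, and then
# commits to a final diagnosis that is scored against the published answer.
# All names and data here are hypothetical illustrations.
from dataclasses import dataclass, field

@dataclass
class Case:
    vignette: str                      # initial presentation shown to the agent
    findings: dict[str, str]           # test name -> result, revealed only on request
    final_diagnosis: str               # the published "correct" answer
    revealed: list[str] = field(default_factory=list)

    def order_test(self, test: str) -> str:
        """Reveal one piece of information, mimicking an iterative workup."""
        self.revealed.append(test)
        return self.findings.get(test, "no abnormality found")

def run_encounter(case, choose_next_test, make_diagnosis, max_steps=5):
    """Ask for tests until the agent stops, then score the final diagnosis."""
    evidence = {"presentation": case.vignette}
    for _ in range(max_steps):
        test = choose_next_test(evidence)   # agent picks the next question/test
        if test is None:                    # agent is confident enough to stop
            break
        evidence[test] = case.order_test(test)
    diagnosis = make_diagnosis(evidence)
    return diagnosis, diagnosis == case.final_diagnosis

# Toy usage with scripted (non-AI) policies standing in for the model:
case = Case(
    vignette="28-year-old with fever and a new murmur",
    findings={"blood cultures": "Streptococcus viridans",
              "echo": "mitral valve vegetation"},
    final_diagnosis="infective endocarditis",
)
order = iter(["blood cultures", "echo", None])
dx, correct = run_encounter(case,
                            choose_next_test=lambda ev: next(order),
                            make_diagnosis=lambda ev: "infective endocarditis")
```

The point of the sketch is the interaction pattern: information is revealed only when requested, so the system (or the student) is scored not just on the final answer but on how efficiently it narrows the differential.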

In structured evaluations, the new AI model outperformed existing models on dozens of medical tasks, from radiology interpretation to treatment planning. And importantly, when tested against human physician performance on the NEJM cases, it demonstrated reasoning that mimics human diagnostic thinking: considering differential diagnoses, weighing risks and benefits, and accounting for uncertainty.

This level of performance hints at the possibility of AI not only augmenting care but becoming a form of clinical intelligence in its own right. In short, we are no longer talking about tools. We are talking about future colleagues.

Cognitive Load, Expertise, and What Makes a Good Doctor

If AI systems can increasingly perform tasks once limited to trained physicians, what remains uniquely human in the physician role? One answer lies in how we process complexity.

A 2020 paper by Tschandl et al (3) explored how AI and human physicians interact in diagnostic decision-making. Their findings are fascinating: when AI is presented as a peer or assistant, it improves physician accuracy; but when it is given too much credibility (i.e., treated as an oracle), physicians defer too quickly, losing the benefits of independent judgment. In essence, the relationship between humans and AI is dynamic, shaped by trust, communication, and cognitive calibration.

This has major implications for education. Medical students and residents must not only learn the traditional content of medicine; they must learn how to work with AI systems—to question, validate, and contextualize recommendations. That means we must teach not just clinical knowledge but metacognition: the ability to understand how we think, and how machines think differently.

And we must recognize that human expertise is not obsolete. As Microsoft notes in its roadmap to superintelligence, there are still many domains where AI falls short—especially in interpreting nuance, assessing values, and navigating ethical complexity.(1) These are precisely the areas where medical educators must continue to lead.

A New Role for the Medical Educator

So how should medical educators respond?

First, we must integrate AI literacy into the curriculum. Just as we teach evidence-based medicine, we now need to teach “AI-based medicine.” Students should understand how these models are trained, what their limitations are, and how to critically appraise their output. This isn’t just informatics—it’s foundational clinical reasoning in the 21st century.

Second, we need to reimagine assessment. Traditional exams measure knowledge recall and algorithmic thinking. But AI can now generate textbook answers on command. Instead, we should assess higher-order skills: contextual judgment, empathy, shared decision-making, and the ability to synthesize information across disciplines. We are not trying to train machines—we are trying to train humans to be the kind of doctors AI can’t be.

Third, we must prepare for a changing scope of practice. As AI takes on more diagnostic and administrative tasks, physicians may find themselves able to focus more on the human aspects of care—narrative, empathy, ethics, and meaning. This is not a diminishment of the physician’s role. It is a refinement. We are moving from being knowledge providers to wisdom facilitators.

The Human-AI Team

One of the most powerful concepts in Microsoft’s vision is the idea of the human-AI team. This is not about replacing doctors with algorithms. It’s about creating a partnership where each party brings unique strengths. AI can process terabytes of data, recognize subtle patterns, and recall every guideline ever published. Humans can listen, connect, and weigh values in the face of uncertainty.

As educators, we must train our learners to be effective members of this team. That means not just accepting AI, but shaping it—participating in its development, informing its design, and advocating for systems that reflect the realities of clinical care. This will not be easy. There will be challenges around bias, privacy, overreliance, and professional identity. But the alternative—ignoring these changes or resisting them—is no longer tenable.

Looking Ahead

Medical education is entering a new frontier. In the coming years, we will need to train learners who are not only competent clinicians, but also agile learners, critical thinkers, and collaborative partners with AI.

This is not the end of the physician. It is the beginning of a new kind of doctor—one who uses technology not as a crutch, but as an amplifier of what makes us human.

And that, to me, is the real promise of medical superintelligence: not that it will replace us, but that it will help us become even better at what we are meant to do—care for people in all their complexity.


References

(1) King D, Nori H. The path to medical superintelligence. Microsoft AI. Accessed July 8, 2025. https://microsoft.ai/new/the-path-to-medical-superintelligence

(2) Nori H, Daswani M, Kelly C, Lundberg S, et al. Sequential diagnosis with language models. arXiv preprint arXiv:2506.22405. Accessed July 7, 2025. https://arxiv.org/abs/2506.22405

(3) Tschandl P, Rinner C, Apalla Z, et al. Human–computer collaboration for skin cancer recognition. Nat Med. 2020;26:1229–1234. https://doi.org/10.1038/s41591-020-0942-0

