Introduction

This blog is about medical education in the US and around the world. My interest is in education research and the process of medical education.



The lawyers have asked that I add a disclaimer that makes it clear that these are my personal opinions and do not represent any position of any University that I am affiliated with including the American University of the Caribbean, the University of Kansas, the KU School of Medicine, Florida International University, or the FIU School of Medicine. Nor does any of this represent any position of the Northeast Georgia Medical Center or Northeast Georgia Health System.



Wednesday, July 23, 2025

AI, Superintelligence, and the Future of Clinical Reasoning


By John E. Delzell Jr., MD, MSPH, MBA, FAAFP

In medical education, we often talk about transformation. Competency-based education, interprofessional learning, simulation, and evidence-based practice have all changed how we prepare the next generation of physicians. But the greatest transformation is still on the horizon, and it is being driven by artificial intelligence (AI).

Over the past few years, AI has moved rapidly from theoretical potential to practical application. Large language models (LLMs) like GPT-4 have already demonstrated the ability to pass the USMLE, summarize research articles, and assist in clinical decision-making. As King and Nori (1) caution, by “reducing medicine to one-shot answers on multiple-choice questions, such (USMLE) benchmarks overstate the apparent competence of AI systems and obscure their limitations.” Even so, it is clear that we are entering a new era in which AI may not just assist doctors but outperform them in specific domains.

What does this mean for medical education? To answer that, we have to look at where the technology is headed and how it intersects with human learning, reasoning, and clinical judgment.

The Promise and Peril of Medical Superintelligence

In June 2025, Microsoft AI (1) released a visionary statement on the trajectory toward “medical superintelligence”: a form of AI that not only surpasses human performance on standardized medical benchmarks but demonstrates generalizable clinical reasoning across specialties. Their goal is to build a system that operates with “generalist-like” breadth and “specialist-level” depth, grounded in real-time reasoning and safety. This is not science fiction.

In a recent preprint (2), Microsoft AI researchers presented an experimental system, built on large language models, for sequential clinical diagnosis. The team created a set of 304 digital clinical cases drawn from New England Journal of Medicine clinicopathological conference (NEJM-CPC) cases. (2) The cases are stepwise diagnostic encounters in which physicians can iteratively ask questions and order tests. As new information becomes available, the physician updates their reasoning, eventually narrowing toward their best final diagnosis. The final diagnosis can then be compared to the “correct” diagnosis that was published in the journal. When I was a student and resident, I loved reading these. I almost never got the correct diagnosis, but I learned a lot from the process.
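For readers who like to see the mechanics, here is a minimal sketch of that stepwise format: an agent (human or AI) repeatedly chooses an action, receives only the information it asked for, and updates its differential until it commits to a diagnosis. The gateway and agent interfaces are hypothetical placeholders I made up for illustration, not the actual harness from the preprint.

```python
# Minimal sketch of a sequential (stepwise) diagnostic encounter, mirroring the
# NEJM-CPC style cases described above. The `gateway` and `agent` objects are
# hypothetical interfaces, not the benchmark harness from the cited preprint.

from dataclasses import dataclass, field

@dataclass
class DiagnosticState:
    findings: list[str] = field(default_factory=list)      # information revealed so far
    differential: list[str] = field(default_factory=list)  # ranked working diagnoses

def run_sequential_case(gateway, agent, max_steps: int = 10) -> str:
    """Iteratively ask questions or order tests, updating the differential each step."""
    state = DiagnosticState()
    for _ in range(max_steps):
        action = agent.choose_action(state)        # e.g., "ask about travel", "order troponin"
        if action == "commit":                     # the agent is ready to name a final diagnosis
            break
        state.findings.append(gateway.reveal(action))           # only now is that result disclosed
        state.differential = agent.update_differential(state)   # re-rank the hypotheses
    # The top-ranked diagnosis is then scored against the published "correct" answer.
    return state.differential[0] if state.differential else "undifferentiated"
```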

In structured evaluations, the new system outperformed existing models on dozens of medical tasks, from radiology interpretation to treatment planning. And importantly, when tested against human physician performance on the NEJM cases, it demonstrated reasoning that mimics human diagnostic thinking: considering differential diagnoses, weighing risks and benefits, and accounting for uncertainty.

This level of performance hints at the possibility of AI not only augmenting care but becoming a form of clinical intelligence in its own right. In short, we are no longer talking about tools. We are talking about future colleagues.

Cognitive Load, Expertise, and What Makes a Good Doctor

If AI systems can increasingly perform tasks once limited to trained physicians, what remains uniquely human in the physician role? One answer lies in how we process complexity.

A 2020 paper by Tschandl et al. (3) explored how AI and human physicians interact in diagnostic decision-making. Their findings are fascinating: when AI is presented as a peer or assistant, it improves physician accuracy; but when it is given too much credibility (i.e., treated as an oracle), physicians defer too quickly, losing the benefits of independent judgment. In essence, the relationship between humans and AI is dynamic, shaped by trust, communication, and cognitive calibration.

This has major implications for education. Medical students and residents must not only learn the traditional content of medicine; they must learn how to work with AI systems—to question, validate, and contextualize recommendations. That means we must teach not just clinical knowledge but metacognition: the ability to understand how we think, and how machines think differently.

And we must recognize that human expertise is not obsolete. As Microsoft notes in its roadmap to superintelligence, there are still many domains where AI falls short—especially in interpreting nuance, assessing values, and navigating ethical complexity.(1) These are precisely the areas where medical educators must continue to lead.

A New Role for the Medical Educator

So how should medical educators respond?

First, we must integrate AI literacy into the curriculum. Just as we teach evidence-based medicine, we now need to teach “AI-based medicine.” Students should understand how these models are trained, what their limitations are, and how to critically appraise their output. This isn’t just informatics—it’s foundational clinical reasoning in the 21st century.

Second, we need to reimagine assessment. Traditional exams measure knowledge recall and algorithmic thinking. But AI can now generate textbook answers on command. Instead, we should assess higher-order skills: contextual judgment, empathy, shared decision-making, and the ability to synthesize information across disciplines. We are not trying to train machines—we are trying to train humans to be the kind of doctors AI can’t be.

Third, we must prepare for a changing scope of practice. As AI takes on more diagnostic and administrative tasks, physicians may find themselves able to focus more on the human aspects of care—narrative, empathy, ethics, and meaning. This is not a diminishment of the physician’s role. It is a refinement. We are moving from being knowledge providers to wisdom facilitators.

The Human-AI Team

One of the most powerful concepts in Microsoft’s vision is the idea of the human-AI team. This is not about replacing doctors with algorithms. It’s about creating a partnership where each party brings unique strengths. AI can process terabytes of data, recognize subtle patterns, and recall every guideline ever published. Humans can listen, connect, and weigh values in the face of uncertainty.

As educators, we must train our learners to be effective members of this team. That means not just accepting AI, but shaping it—participating in its development, informing its design, and advocating for systems that reflect the realities of clinical care. This will not be easy. There will be challenges around bias, privacy, overreliance, and professional identity. But the alternative—ignoring these changes or resisting them—is no longer tenable.

Looking Ahead

Medical education is entering a new frontier. In the coming years, we will need to train learners who are not only competent clinicians, but also agile learners, critical thinkers, and collaborative partners with AI.

This is not the end of the physician. It is the beginning of a new kind of doctor—one who uses technology not as a crutch, but as an amplifier of what makes us human.

And that, to me, is the real promise of medical superintelligence: not that it will replace us, but that it will help us become even better at what we are meant to do—care for people in all their complexity.


References

(1) King D, Nori H. “The path to medical superintelligence”  Microsoft AI. Accessed 7/8/25. Retrieved from https://microsoft.ai/new/the-path-to-medical-superintelligence 

(2) Nori H, Daswani M, Kelly C, Lundberg S, et al. Sequential Diagnosis with Language Models. Cornell University arXiv.  Accessed 7/7/25. Retrieved from https://arxiv.org/abs/2506.22405 

(3) Tschandl P, Rinner C, Apalla Z. et al. Human–computer collaboration for skin cancer recognition. Nat Med 2020; 26: 1229–1234. https://doi.org/10.1038/s41591-020-0942-0


Monday, July 14, 2025

Running the Distance: Joy, Risk, and Why I Keep Lacing Up

John E. Delzell Jr., MD, MSPH, MBA, FAAFP

I still remember the first time I crossed the finish line of a marathon.

It was hot. We were in Orlando (the Disney Marathon). My legs were toast. The crowds were cheering. I definitely cried in those final few meters. Finishing 26.2 miles doesn’t just test your body. It tests your commitment, your mind, your pain threshold, and sometimes your relationship with your toenails.

I’ve run a lot of races since then. Some were fast. Some were slow. Some were surprisingly fun. Others… let’s just say I was glad they ended. But each one taught me something—not just about pacing or hydration, but about myself. About resilience. About joy. About being present in motion.

So, when I recently came across two very different—but equally important—articles on running, I felt compelled to dig a little deeper.

Why We Run (And Why I Still Do)

Let’s start with the why. In 2021, Hugo Vieira Pereira and colleagues (1) published a systematic review in Frontiers in Psychology that asked a simple but profound question: What drives people to run for fun?

Not surprisingly, it’s not just about weight loss or physical health, although those show up plenty. They found that psychological and behavioral factors play just as large a role, especially for recreational runners. Things like stress relief, mood elevation, a sense of achievement, or just the pure enjoyment of the run itself. I do enjoy the bling; there is nothing like posting that picture of your finisher medal on your favorite social media site, but there is much more to the joy of running than a medal. Interestingly, runners with more experience tend to internalize the joy, shifting away from extrinsic motivations (like medals or fitness) toward more intrinsic ones (like emotional well-being or identity).

I get that. Running has long been a reset button. It’s where I process tough days, pray, think, unwind. It’s where I go when I need space, and oddly enough, also where I go when I need community. The running community is incredibly supportive. Long runs with friends have a way of cutting through small talk. You learn a lot about someone at mile 16.

The Pereira review also highlights how consistent runners tend to have high self-regulation skills—planning, goal-setting, time management, and the ability to push through discomfort. That sounds right. You don’t finish a marathon on motivation alone. You finish it because you ran all the invisible miles in the dark before sunrise, when no one was cheering.

The Hidden Risk No One Talks About on Race Day

But running isn’t all runner’s highs and finish-line photos. Every time I pin on a bib number, especially at marathons or half marathons, I know I’m also assuming a small but real risk. And that brings us to the second article. Published in 2025 in JAMA, the study by Kim et al (2) tackled the sobering topic of cardiac arrest during long-distance running races. The researchers reviewed over a decade’s worth of data and identified several critical insights:

- Cardiac arrest during organized races is really rare, occurring in about 1 per 100,000 participants, but considering the number of race participants (>23M) it is not negligible (see the quick calculation after this list).
- Most cases occurred during marathons (not shorter distances), and more often near the end of the race.
- Interestingly, the incidence of cardiac arrest is stable (compared to 2000-2009), but there has been a significant decline in mortality.
- Bystander CPR and the presence of automated external defibrillators (AEDs) significantly improved survival.
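To put “rare but not negligible” in perspective, here is the arithmetic implied by the rough figures above (about 1 arrest per 100,000 participants across more than 23 million participants). The exact counts are in the paper; treat this as an order-of-magnitude sketch.

```python
# Back-of-the-envelope estimate using the approximate figures quoted above
# (illustrative only; the paper reports the exact counts).
participants = 23_000_000      # >23 million race participants over the study period
incidence_per_100k = 1.0       # roughly 1 cardiac arrest per 100,000 participants

expected_arrests = participants * incidence_per_100k / 100_000
print(f"Expected cardiac arrests over the study period: ~{expected_arrests:.0f}")  # ~230 events
```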

As a physician, I’ve always known running carries cardiovascular risk, especially if there’s underlying heart disease, electrolyte imbalances, or unrecognized genetic issues like hypertrophic cardiomyopathy. But reading this paper hit me a little differently—because it’s about my people. My tribe. Ordinary folks pushing themselves to extraordinary limits. As a runner, it reminded me that health screening and preparation matter—even when you’re “fit.” It’s easy to assume that crossing the start line means you're healthy enough. But racing is different from running. The adrenaline, the intensity, the heat, the dehydration—all of it combines into a stress test with real consequences.

Running Smarter, Running Longer

So how do I reconcile the joy of running with the risk it carries?  Honestly, it’s the same way I’ve practiced medicine for 30 years: with a clear-eyed look at the data and a respect for human experience.

First, I take precautions seriously. Regular checkups. Listening to my body. Hydration. Electrolytes. And yes, even slowing down when needed. No PR is worth collapsing for.

Second, I keep running for the same reasons that I started running. I’m not chasing record times anymore. I’m chasing clarity. Fellowship. Flow. Those long runs that leave your muscles sore but your spirit full.

And third, I encourage others, especially new runners, to train smart and listen to their body. Get checked out by your primary care doctor if you are over 40 and new to endurance sports. Don’t ignore chest discomfort, dizziness, or feeling “off” on race day. Carry ID. Know where the aid stations and AEDs are. Be the person who knows CPR.

The truth is, running can be one of the most powerful mental and physical health interventions we have—when done right.

My Finish Lines and What They Taught Me

Each marathon I’ve run has carried its own story. The one where it rained the whole time. The one where I cramped at mile 18. The one I ran with my best friend from high school cheering me at the finish line. Each race reminded me that finishing isn’t about being fast—it’s about being faithful to the training, the effort, the journey.

I’ve been lucky. I’ve stayed healthy, mostly. I’ve never DNF’d. But I’ve seen people collapse. In our first half marathon, a man collapsed and got bystander chest compressions on the course (he lived!). I’ve slowed down to walk someone to the medical tent. And I’ve always been thankful to cross the line—upright, tired, and deeply grateful.

Final Thoughts

Whether you’re finishing first or finishing last, there’s something sacred about committing your body and mind to something hard and seeing it through. Something human.  Running, like life, holds both joy and risk. We run to feel alive, to cope, to connect, to challenge ourselves. And while the road can be unpredictable—especially over 26.2 miles—it’s also where I’ve found some of my clearest moments.

So yes, I’ll keep lacing up. I’ll keep being smart. I’ll keep showing up.

The finish line may only last a few seconds, but the lessons from the road last a lifetime.

References

(1)  Pereira HV, et al. Systematic Review of Psychological and Behavioral Correlates of Recreational Running. Front. Psychol., 06 May 2021; Volume 12  https://doi.org/10.3389/fpsyg.2021.624783  

(2)  Kim JH, Rim AJ, Miller JT, et al. Cardiac Arrest During Long-Distance Running Races. JAMA. 2025;333(19):1699–1707. doi:10.1001/jama.2025.3026           

Friday, July 11, 2025

Revisiting Gender Bias in Learner Evaluations


If you’ve ever served as a program director, clerkship coordinator, or faculty evaluator in graduate medical education, you’ve likely wrestled with one of the most uncomfortable truths in our field: evaluation is never entirely objective. As much as we strive to be fair and evidence-based, our feedback—both quantitative and narrative—is filtered through a lens of human perception, shaped by culture, context, and yes, bias.

Two studies, published 21 years apart, can help us see just how persistent and nuanced those biases can be—especially around gender.

In 2004, my colleagues and I published a study in Medical Education titled “Evaluation of interns by senior residents and faculty: is there any difference?” (1) We were curious about how interns were assessed by two very different evaluator groups, senior residents and attending physicians. We found something interesting: the ratings given by residents were consistently higher than those from faculty, and senior faculty were, surprisingly, significantly less likely to make negative comments. More than that, the comments from senior residents were often more positive and personal. We speculated about why; perhaps residents were more empathic, closer to the intern experience, or more generous in peer evaluation.

But what we didn’t find in that study—and what medical educators are still working to unpack—was how factors like gender influence evaluations. We did not find any differences in the comments based on the gender of the evaluators, but the numbers were small enough that it was not clear if that had meaning.

That’s where a new article by Jessica Hane and colleagues in the Journal of Graduate Medical Education (2) makes a significant contribution.

The Gender Lens: a distortion or an added quality?

Hane et al. examined nearly 10,000 faculty evaluations of residents and fellows at a large academic medical center over five years. They looked at both numerical ratings and narrative comments, and they did something smart: they parsed out gender differences in both the evaluator and the evaluated. The findings? Gender disparities persist—but in subtle, revealing ways.

On average, women trainees received slightly lower numerical scores than their male counterparts, despite no evidence of performance differences. More strikingly, the language used in narrative comments showed clear patterns: male trainees were more likely to be described with competence-oriented language (“knowledgeable,” “confident,” “leader”), while women were more often praised for warmth and diligence (“caring,” “hard-working,” “team player”).
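To make that pattern concrete, here is a deliberately naive sketch of how a program might screen narrative comments for competence- versus warmth-oriented language. The word lists and sample comments are invented for illustration; the tools cited later in this post for scanning applications and letters use far more sophisticated methods.

```python
# Naive keyword-based audit of narrative evaluation comments, illustrating the
# competence- vs. warmth-oriented language patterns described above.
# Word lists and sample comments are invented for illustration only.

from collections import Counter
import re

COMPETENCE_WORDS = {"knowledgeable", "confident", "leader", "sharp", "assertive"}
WARMTH_WORDS = {"caring", "hard-working", "team player", "kind", "dependable"}

def tag_comment(comment: str) -> Counter:
    """Count competence- vs. warmth-oriented terms in one narrative comment."""
    text = comment.lower()
    counts = Counter(competence=0, warmth=0)
    for word in COMPETENCE_WORDS:
        counts["competence"] += len(re.findall(re.escape(word), text))
    for word in WARMTH_WORDS:
        counts["warmth"] += len(re.findall(re.escape(word), text))
    return counts

comments_by_group = {
    "men": ["Knowledgeable and confident; a natural leader on rounds."],
    "women": ["Caring and hard-working; a true team player."],
}

for group, comments in comments_by_group.items():
    totals = sum((tag_comment(c) for c in comments), Counter())
    print(group, dict(totals))   # e.g., men {'competence': 3, 'warmth': 0}
```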

These aren’t new stereotypes, but their persistence in our evaluation systems is troubling. When we unconsciously associate men with competence and women with effort or empathy, we risk reinforcing old hierarchies. Even well-intentioned praise can pigeonhole trainees in ways that affect advancement, self-perception, and professional identity.

When bias feels like familiarity…

What’s particularly interesting is how these newer findings echo (and contrast with) what we saw back in 2004. Our study didn’t find any differences with gender specifically, but we did notice that evaluators closer to the front lines (senior residents) tended to focus more on relationships, encouragement, and potential. Faculty, and particularly senior faculty, on the other hand, leaned toward more critical or objective assessments.

What happens, then, when those lenses intersect with gender? Are residents more likely to relate to and uplift women colleagues in the same way they uplift peers generally? Or does bias show up even in peer feedback, especially in high-stakes environments like residency? Hane’s study doesn’t fully answer that, but it opens the door for future research—and introspection.

The Myth of the “Objective” Evaluation

One of the biggest myths in medical education is that our evaluations are merit-based and free from bias. We put a lot of stock in numerical ratings, milestone checkboxes, and structured forms. But as both of these studies remind us, the numbers are only part of the story—and even they are shaped by deeper cultural narratives.

If you’ve ever read a stack of end-of-rotation evaluations, you know how much weight narrative comments can carry. One well-written paragraph can influence a Clinical Competency Committee discussion more than a dozen Likert-scale boxes. So when those comments are subtly gendered—when one resident is “sharp and assertive” and another is “kind and dependable”—we’re not just describing; we’re defining their potential.  And that’s a problem.

What Can We Do About It?

Fortunately, awareness is the first step to addressing bias, and there are concrete steps we can take. Here are a few that I think are worth highlighting:

1. Train faculty and residents on implicit bias in evaluations. The research is clear: we all carry unconscious biases. But bias awareness training—when done well—can reduce the influence of those biases, especially in high-stakes assessments.

2. Structure narrative feedback to reduce ambiguity. Ask evaluators to comment on specific competencies (e.g., clinical reasoning, professionalism, communication) rather than open-ended impressions. This can shift focus from personal attributes to observable behaviors.

3. Use language analysis tools to monitor patterns. Some residency programs are now using AI tools to scan applications for gendered language (3) and to look at letters of recommendation for concerning language (4). It’s not about punishing faculty—it’s about reflection and improvement.

4. Encourage multiple perspectives. A single evaluation can reflect a single bias. Triangulating feedback from residents, peers, patients, and faculty can provide a fuller, fairer picture of a learner’s strengths and areas for growth.

5. Revisit how we use evaluations in decisions. Promotion and remediation decisions should weigh context. A low rating from one evaluator might reflect bias more than performance. Committees need to be trained to interpret evaluations with a critical eye.

We’re All Still Learning

As someone who’s worked in medical education for decades, I can say with humility that I’ve probably written my fair share of biased evaluations. Not intentionally, but unavoidably. Like most educators, I want to be fair, supportive, and accurate—but we’re all products of our environments. Recognizing that is not an indictment. It’s an invitation.

The Hane study reminds us that even as our systems evolve, old habits linger. The Ringdahl, Delzell & Kruse study showed that who does the evaluating matters. Put those together, and the message is clear: we need to continuously examine how—and by whom—assessments are being made.

Because in the end, evaluations are not just about feedback. They’re about opportunity, identity, and trust. If we want our learning environments to be truly inclusive and equitable, then we have to be willing to see where our blind spots are—and do the hard work of correcting them.

References

(1) Ringdahl, E.N., Delzell, J.E. and Kruse, R.L. Evaluation of interns by senior residents and faculty: is there any difference? Medical Education 2004; 38: 646-651. https://doi.org/10.1111/j.1365-2929.2004.01832.x

(2) Hane J, Lee V, Zhou Y, Mustapha T, et al.  Examining Gender-Based Differences in Quantitative Ratings and Narrative Comments in Faculty Assessments by Residents and Fellows. J Grad Med Educ  2025; 17 (3): 338–346. doi: https://doi.org/10.4300/JGME-D-24-00627.1.   

(3) Sumner MD, Howell TC, Soto AL, Kaplan S, et al.  The Use of Artificial Intelligence in Residency Application Evaluation-A Scoping Review. J Grad Med Educ. 2025; 17 (3): 308-319. doi: 10.4300/JGME-D-24-00604.1. Epub 2025 Jun 16. PMID: 40529251; PMCID: PMC12169010.

(4) Sarraf D, Vasiliu V, Imberman B, Lindeman B. Use of artificial intelligence for gender bias analysis in letters of recommendation for general surgery residency candidates. Am J Surg. 2021; 222 (6): 1051-1059. doi: 10.1016/j.amjsurg.2021.09.034. Epub 2021 Oct 2. PMID: 34674847. 


Tuesday, July 8, 2025

Medical Student Reflection Exercises Created Using AI: Can We Tell the Difference, and Does It Matter?

 


If you’ve spent any time in medical education recently—whether in lectures, clinical supervision, or curriculum design—you’ve likely been a part of the growing conversation around student (and resident/fellow) use of generative AI. From drafting SOAP notes to summarizing journal articles, AI tools like ChatGPT are rapidly becoming ubiquitous. But now we’re seeing them show up in more personal activities such as reflective assignments. A new question has emerged: can educators really tell the difference between a student’s genuine reflection and something written by AI?

The recent article in Medical Education by Wraith et al (1) took a shot at this question. They conducted an elegant, slightly disconcerting study: faculty reviewers were asked to distinguish between reflective writing submitted by actual medical students and reflections generated by AI. The results? Only modestly better than flipping a coin: accuracy was between 64% and 75%, regardless of the faculty member’s experience or confidence. They did seem to get better as they read more reflections.
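How much better than chance 64% to 75% really is depends on how many reflections each reviewer rated. Here is a minimal sketch with purely hypothetical review counts (the real denominators are in the paper):

```python
# Illustration only: how distinguishable from a 50/50 coin flip is 64-75% accuracy?
# The review counts below are hypothetical, not taken from Wraith et al.
from scipy.stats import binomtest

for n_reviews in (20, 50, 200):                  # hypothetical number of reflections rated
    for accuracy in (0.64, 0.75):
        n_correct = round(accuracy * n_reviews)
        result = binomtest(n_correct, n_reviews, p=0.5, alternative="greater")
        print(f"n={n_reviews:>3}, accuracy={accuracy:.0%}: p={result.pvalue:.3f} vs. chance")
```

With small hypothetical denominators, 64% is hard to distinguish from a coin flip at all; only with many reviews does it become clearly better than chance.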

I’ll admit, when I first read this, I had a visceral reaction. Something about the idea that we can’t tell what’s “real” from what’s machine-generated in a genre that is supposed to be deeply personal—reflective writing—felt jarring. Aren’t we trained to pick up on nuance, empathy, sincerity? But as I sat with it, I realized the issue goes much deeper than just our ability to “spot the fake.” It forces us to confront how we define authenticity, the purpose of reflection in medical education, and how we want to relate to the tools that are now part of our students’ daily workflows.

What Makes a Reflection Authentic?

We often emphasize reflection as a professional habit: a way to develop clinical insight, emotional intelligence, and lifelong learning. But much of that hinges on the assumption that the act of writing the reflection is what promotes growth. If a student bypasses that internal process and asks an AI to “write a reflection on breaking bad news to a patient,” I worry that the learning opportunity is lost.

But here’s the rub: the Wraith study didn’t test whether students were using AI to replace reflection or to aid it. It simply asked whether educators could tell the difference. And they could not do that reliably. This suggests that AI can replicate the tone, structure, and emotional cadence that we expect a medical student to provide in a reflective essay. That is both fascinating and problematic.

If AI can mimic reflective writing well enough to fool seasoned educators, then maybe it is time to reevaluate how we assess reflection in the first place. Are we grading sincerity? Emotional language? The presence of keywords like “empathy,” “growth,” or “uncertainty”? If we do not have a robust framework for evaluating whether reflection is actually happening (as an internal, cognitive-emotional process), then it shouldn’t surprise us that AI can fake it by just checking the boxes.

Faculty Attitudes: Cautious Curiosity

Another recent study, this one in the Journal of Investigative Medicine by Cervantes et al (2), explored how medical educators are thinking about generative AI more broadly. They did a survey of 250 allopathic and osteopathic medical school faculty at Nova Southeastern University. Their results revealed a mix of excitement and unease. Most saw potential for improving education, particularly for more efficient research, tutoring, task automation, and increased content accessibility, but they were also deeply concerned about professionalism, academic integrity, the removal of human interaction from important feedback, and overreliance on AI-generated content.

Interestingly, one of the biggest predictors of positive attitudes toward AI was prior use. Faculty who had experimented with ChatGPT or similar tools were more likely to see educational value and less likely to view it as a threat. That tracks with my own anecdotal experience: once people see what AI can do—and just as importantly, what it can’t do—they develop a more nuanced, measured perspective.

Still, the discomfort lingers. If students can generate polished reflections without deep thought, is the assignment still worth doing? Should we redesign reflective writing tasks to include oral defense or peer feedback? Or should we simply accept that AI will be part of the process and shift our focus toward cultivating meaningful inputs rather than fixating on outputs?

What about using AI-augmented reflection?

Let me propose a middle path. What if we reframe AI not as a threat to reflective writing, but as a catalyst? Imagine a student who types out some thoughts after a tough patient encounter, then asks an AI to help clarify or expand them. They read what the AI produces, agree with some parts, reject others, revise accordingly. The final product is stronger—not because AI did the work, but because it facilitated a richer internal dialogue.

That’s not cheating. That’s collaboration. And it’s arguably closer to how most of us write in real life—drafting, editing, bouncing ideas off others (human or machine). Of course, this assumes we teach students to use AI ethically and reflectively, which means we need to model that ourselves. Faculty development around AI literacy is no longer optional. We must move beyond fear-based policies and invest in practical training, guidelines, and conversations that encourage responsible use.

So, where do we go from here?

A few concrete steps seem worth considering:

1. Redesign reflective assignments. Move beyond short essays. Try audio reflections, peer feedback, or structured prompts that emphasize personal growth over polished prose.

2. Focus on process, not just product. Ask students to document how they engaged with the reflection: did they use AI? Did they discuss it with a peer or preceptor? Did it change their thinking?

3. Embrace transparency. Normalize the use of AI in education and ask students to disclose when and how they used it. Make that part of the learning conversation from the beginning.

4. Invest in AI literacy. Faculty need space and time to learn what these tools can and can’t do. The more familiar we are as faculty, the better we can guide our students.

5. Stay curious. The technology isn’t going away. The sooner we stop wringing our hands and start asking deeper pedagogical questions, the better positioned we’ll be to adapt with purpose.

In the end, the real question isn’t “Can we tell if a reflection is AI-generated?” It’s “Are we creating learning environments where authentic reflection is valued, supported, and developed—whether or not AI is in the room?” 

If we can answer yes to that, then maybe it doesn’t matter so much who—or what—wrote the first draft.

References

(1)    Wraith C,  Carnegy A,  Brown C,  Baptista A,  Sam AH.  Can educators distinguish between medical student and generative AI-authored reflections? Med Educ.  2025; 1-8. doi:10.1111/medu.15750

(2)    Cervantes J, Smith B, Ramadoss T, D'Amario V, Shoja MM, Rajput V. Decoding medical educators' perceptions on generative artificial intelligence in medical education. J Invest Med. 2024; 72(7): 633-639. doi:10.1177/10815589241257215

Saturday, July 5, 2025

Reimagining Wellness Curricula: Lessons from the Data and the Frontlines

 


John E Delzell Jr MD MSPH MBA FAAFP

Residency is a crucible. It is where idealism meets reality, where the long hours and emotional toll of clinical training can either forge resilience or fuel burnout. Over the past decade, wellness curricula have emerged as a hopeful antidote to the rising tide of physician distress. But are they working? Two key studies—Coutinho et al’s 2025 longitudinal analysis and Raj’s 2016 systematic review—offer sobering insights and a call to recalibrate our approach.

What the Data Tells Us

Coutinho and colleagues (1) conducted a national longitudinal study (using the CERA protocol) linking wellness curricula in family medicine residency programs to burnout three years post-graduation. Their findings? No significant association between the presence or type of wellness curricula and reduced burnout in early career physicians. That’s a tough pill to swallow, especially given the time and resources invested in these initiatives.

Raj’s systematic review (2) echoes this complexity. While interventions like mindfulness and stress management show promise, the evidence base is thin—limited by small sample sizes, single-site studies, and inconsistent definitions of “well-being”. Autonomy, competence-building, and social connectedness emerged as key predictors of resident well-being, but translating these into curricular components remains elusive.

Beyond Bubble Baths and Burnout Bingo

Let’s be honest: some wellness efforts feel performative. A yoga session here, a gratitude journal there—well-intentioned, but often disconnected from the structural realities of residency. What residents crave isn’t just self-care tips; it’s systemic change. They want protected time, psychological safety, and leadership that models vulnerability and balance.

Coutinho’s study found that working fewer than 60 hours per week during PGY-1 was associated with lower burnout. That’s not a wellness module—it’s a workload adjustment. It suggests that the most impactful “curriculum” might be embedded in scheduling, staffing, and culture, not just in didactics.

Building a Curriculum That Matters

So where do we go from here? First, we need to redefine what wellness curricula actually mean. It’s not just about teaching coping strategies—it’s about embedding well-being into the DNA of training programs. That includes:

- Longitudinal design: One-off workshops don’t cut it. Wellness must be woven throughout the entire residency experience. The curriculum must be CORE to the residency training model.

- Faculty champions: Programs need leaders who advocate for well-being and model it authentically. Leaders must be the Program Directors and APDs and not just a wellness “Champion”. If your program needs a single champion, it is not serious about resident well-being.

- Safe spaces for disclosure: Residents must feel empowered to share struggles without fear of stigma or retaliation. Some of these spaces must be outside of their own program and peer group, ideally including other specialties.

- Feedback loops: Curricula should be dynamic, shaped by resident input and evolving needs.

Raj’s review highlights the importance of autonomy and competence-building. That means giving residents meaningful roles in shaping their learning environment, not just asking them to meditate between consults. Institutions must have a wellness council that is resident-led.

The Missing Piece: Measurement

One of the biggest barriers to progress is the lack of standardized metrics. How do we define and measure “well-being”? Raj calls for a clear definition and validated scale—a crucial step if we want to compare interventions and track outcomes over time.

Without robust data, we risk chasing wellness trends without knowing what actually works. It’s time to move beyond anecdote and toward evidence-informed design. This includes robust studies of curricular models and longitudinal RCTs comparing models.

Culture Eats Curriculum for Breakfast

Ultimately, wellness isn’t a syllabus—it’s a culture. Programs that prioritize psychological safety, mentorship, and humane workloads will outperform those that rely solely on curricular fixes. Residents don’t just learn from lectures; they absorb the ethos of their environment.

If a program teaches mindfulness but punishes vulnerability, the curriculum is moot. If it offers resilience training but ignores toxic hierarchies, it’s window dressing. True wellness requires alignment between values, behaviors, and systems.

A Call to Action

The takeaway from these studies isn’t that wellness curricula are futile—it’s that they must evolve. We need to shift from checkbox interventions to transformative experiences. That means:

- Integrating wellness into core competencies

- Evaluating curricula with rigorous, longitudinal data

- Centering resident voices in design and delivery

- Addressing structural drivers of burnout, not just symptoms

Residency will always be demanding. But it doesn’t have to be depleting. With intentional design, courageous leadership, and a commitment to culture change, we can build training environments that nurture both clinical excellence and human flourishing.

 

References

(1) Coutinho AJ, Weidner AKH, Cronholm PF. A National Longitudinal Study of Wellness Curricula in US Family Medicine Residency Programs and Association With Early Career Physician Burnout. J Grad Med Educ 2025; 17 (3): 320–329. https://doi.org/10.4300/JGME-D-24-00515.1

(2) Raj KS. Well-Being in Residency: A Systematic Review. J Grad Med Educ 2016; 8 (5): 674–684. https://doi.org/10.4300/JGME-D-15-00764.1

Thursday, July 3, 2025

Impact of the USMLE

 

A couple of recent articles got me thinking about USMLE again.

In recent years we have seen some big changes in the role and structure of the USMLE exams. Central to this transformation was the shift to pass/fail Step 1 reporting. As residency programs have grown more used to this change, discussions have continued about the impact and about how programs should use medical school and licensing data for residency selection and progression.

Liu et al. (1), in an article published in 2025, propose that decoupling the USMLE from existing curricula would significantly reduce stress and promote authentic learning. The authors argue that when licensing exams are integrated too tightly into daily teaching, they dominate both students’ time and mental energy, detracting from professional identity development.

This view echoes longstanding concerns. The intense, singular focus on mastering exam content (and specifically multiple-choice exam content) probably diverts student attention from broader curricular goals like clinical reasoning, patient communication, and reflective practice. Liu et al. suggest that when high-stakes testing is decoupled, the curriculum can prioritize formative assessment, early patient care, and professional socialization, key components of identity development rooted in self-reflection, mentorship, and values-driven practice. I am not sure that I agree with that statement. While in theory students would agree, in practice, in my experience as a clinical clerkship director and as a basic science course director in Years 1 and 2, students complain about content that is not directed at Step 1, their clinical shelf exams, or Step 2.

So, what do students think about the changes to USMLE Step 1?

Cangialosi et al. (2), writing in Academic Medicine, provide a student-authored narrative on the transition of Step 1 to pass/fail scoring. They identify profound ripple effects:

- Reduced anxiety but shifted pressure. While removing the numerical score alleviated individual stress, students voiced concerns that focus would simply shift toward Step 2 CK, clerkship grades, and institutional reputation.

- Unintended inequities. The loss of a standardized score may advantage students at prestigious institutions while limiting opportunities for others, especially DO students or international medical graduates (IMGs), to differentiate themselves objectively.

- Program director adaptation lag. Residency directors will need time to recalibrate application screening methods, placing greater emphasis on qualitative evaluations, letters of recommendation, and narrative assessments.

- Logistical challenges. Scheduling Step 2 CK earlier for competitive specialties while allowing sufficient clinical exposure emerged as a thorny constraint.

Students emphasized the importance of proactively addressing these challenges to preserve the positive intent of Step 1’s reform. There were also some good thoughts on the impact on medical schools, specifically linked to Liu et al.’s concerns about teaching to the test. Maybe a pass/fail USMLE Step 1 allows medical schools to think more broadly about success in the preclinical curriculum.

Both papers converge on a critical theme: the alignment of assessment structure with educational values. Educators can reclaim core learning space—embedding reflective practice, mentorship, interprofessional learning, communication skills, and early patient engagement.

- Students need to learn medicine as a profession, not just a test.
- Faculty and peers can engage in formative feedback, modeling professional behavior.
- Wellness becomes integral, not merely a box to check.

The curriculum should focus on what doctors actually do—work at the bedside, listen to patients, reflect on ethical dilemmas—not just what we know and how we answer a multiple-choice question.

Translating these insights into educational reform requires action on multiple fronts:

- Medical schools: Shift exams like Step 1 to external, remote formats; prioritize formative assessments, reflective portfolios, and structured mentorship.

- Residency programs: Expand holistic review by integrating narrative evaluations, trainee wellness, and clinical performance rather than numeric cutoffs; provide guidelines and training for new selection criteria.

- Students: Diversify focus (early involvement in clinical teams, scholarly projects, reflection activities) to shape a well-rounded professional identity.

- Licensing boards: Consider alternative evaluation models that emphasize ongoing competence review and real-world skills (e.g., tele-simulated assessments, recertification modules).

There are and will be challenges:

- Removing high-stakes pressure should not cultivate complacency or uneven learning rigor.

- Schools may struggle to consistently develop reliable assessments and ensure faculty development.

- Institutions need mechanisms to support nontraditional learners and protect diversity within residency applicant pools.

- Ongoing research is needed on outcomes: do changes in the national assessment system improve student well-being, patient care, or professional longevity?

Moreover, true reform demands culture change—deprioritizing “test scores as identity” in favor of holistic measures of compassion, resilience, and collaboration.

The considered proposals of Liu et al. and the reflective commentary of Cangialosi et al. together signal an opportunity to rethink medical assessment. We can—and should—reorient from “passing the exam” to “becoming a doctor.”  But to realize this promise requires strategic alignment: from curriculum design to residency selection, to licensing evaluation. It is time to reaffirm that the goal of medical education is not just what students know—but who they become.

 

References

(1) Liu L, Chachad N, Tadjalli A, Rajput V. Decoupling the United States Medical Licensing Examinations (USMLEs) From the Medical Curriculum to Promote Student Well-Being and Professional Identity Development. Cureus. 2025; 17 (5): e83335. doi: 10.7759/cureus.83335. PMID: 40458346; PMCID: PMC12127707

(2) Cangialosi PT, Chung BC, Thielhelm TP, Camarda ND, Eiger DS. Medical Students’ Reflections on the Recent Changes to the USMLE Step Exams. Academic Medicine. 2021; 96 (3): 343-348. doi: 10.1097/ACM.0000000000003847