Introduction

This blog is about medical education in the US and around the world. My interest is in education research and the process of medical education.



The lawyers have asked that I add a disclaimer that makes it clear that these are my personal opinions and do not represent any position of any University that I am affiliated with including the American University of the Caribbean, the University of Kansas, the KU School of Medicine, Florida International University, or the FIU School of Medicine. Nor does any of this represent any position of the Northeast Georgia Medical Center or Northeast Georgia Health System.



Wednesday, September 3, 2025

Residency Selection in the Age of Artificial Intelligence: Promise and Peril


Residency selection has always been a high-stakes, high-stress process. For applicants, it can feel like condensing years of study and service into a few fleeting data points. For programs, it is like drinking from a firehose—thousands of applications to sift through with limited time and resources, and an abiding fear of missing the “right fit.” In recent years, the pressures have only grown: more applications, Step 1 shifting to pass/fail, and increased calls for holistic review in the name of equity and mission alignment. 

Into this crucible comes artificial intelligence (AI). Advocates promise that AI can tame the flood of applications, find overlooked gems, and help restore a measure of balance to an overloaded system. Critics worry that it will encode and amplify existing biases, creating new blind spots behind the sheen of algorithmic authority. A set of recent papers provides a window into this crossroads, with one central question: will AI be a tool for fairness, or just a faster way of making the same mistakes?

What We Know About Interviews   

Before even considering AI, it helps to step back and look at the traditional residency interview process. Lin and colleagues (1) recently published a systematic review of evidence-based practices for interviewing residency applicants in Journal of Graduate Medical Education. Their review of nearly four decades of research is sobering: most studies are low to moderate quality, and many of our cherished traditions—long unstructured interviews, interviewer “gut feelings”—have little evidence behind them. What does work? Structure helps. The multiple mini interview (MMI) shows validity and reliability. Interviewer training improves consistency. Applicants prefer shorter, one-on-one conversations, and they value time with current residents. Even virtual interviews, despite mixed reviews, save money and broaden access. 

In other words, structure beats vibe. If interviews are going to continue as a central part of residency selection, they need to be thoughtfully designed and consistently delivered.

The Scoping Review: AI Arrives 

The most important new contribution to this debate is Sumner and colleagues’ scoping review in JGME (2). They examined the small but growing literature on AI in residency application review. Of the twelve studies they found, three-quarters focused on predicting interview offers or rank list positions using machine learning.  Three articles used natural language processing (NLP) to review and analyze letters of recommendation. 

The results are promising but fragmented. Some models could replicate or even predict program decisions with decent accuracy. Others showed how NLP might highlight subtle patterns in narrative data, such as differences in the language of recommendation letters. But strikingly, only a quarter of the studies explicitly modeled bias. Most acknowledged it as a limitation but stopped short of systematically addressing it. The authors conclude that AI in residency recruitment is here, but it is underdeveloped, under-regulated, and under-evaluated. Without common standards for reporting accuracy, fairness, and transparency, we risk building shiny black boxes that give an illusion of precision while quietly perpetuating inequity.

Early Prototypes in Action

Several studies give us a glimpse of what AI might look like in practice. Burk-Rafel and colleagues at NYU (3) developed a machine learning–based decision support tool, trained on over 8,000 applications across three years of internal medicine interview cycles. The training data comprised 61 features, including demographics, time since graduation, medical school location, USMLE scores or score status, awards (such as AOA membership), and publications, among many others. Their model achieved an area under the receiver operating characteristic curve (AUROC) of 0.95 and performed nearly as well (0.94) without USMLE scores. Interestingly, when deployed prospectively, it identified twenty applicants for interview who had been overlooked by human reviewers, many of whom later proved strong candidates. Here, AI wasn’t replacing judgment but augmenting it, catching “diamonds in the rough” that busy faculty reviewers had missed.
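
For readers curious what this kind of model looks like under the hood, here is a minimal sketch in Python using scikit-learn and entirely synthetic data. The feature names, labels, and the gradient boosting classifier below are illustrative stand-ins, not the NYU group's actual variables or pipeline; the point is simply how a screening classifier is trained and how an AUROC like the one reported above is computed.

```python
# Minimal, illustrative sketch of an interview-screening classifier.
# All data here are synthetic stand-ins, not the study's real features.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n = 8000  # roughly the size of the training set described in the study

# Hypothetical applicant features: exam score, publication count,
# years since graduation, and a U.S. medical graduate indicator.
X = np.column_stack([
    rng.normal(230, 15, n),   # licensing-exam score (synthetic)
    rng.poisson(3, n),        # publication count (synthetic)
    rng.integers(0, 6, n),    # years since graduation (synthetic)
    rng.integers(0, 2, n),    # U.S. graduate indicator (synthetic)
])
# Synthetic label: whether the program offered an interview.
y = (0.02 * X[:, 0] + 0.3 * X[:, 1] + rng.normal(0, 1, n)) > 6.5

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = GradientBoostingClassifier().fit(X_train, y_train)

# AUROC summarizes how well the model ranks interviewed applicants above
# non-interviewed ones; the study reported 0.95 on its real data.
print("AUROC:", roc_auc_score(y_test, model.predict_proba(X_test)[:, 1]))
```

In practice, the value of such a tool depends far more on the quality and representativeness of the historical decisions it learns from than on the modeling step itself.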

Rees and Ryder’s work (4), published in Teaching and Learning in Medicine, took a different angle, building random forest machine learning models to predict ranked applicants and matriculants in internal medicine. Their models could predict with high accuracy (AUROC 0.925) who would be ranked, but struggled to predict who would ultimately matriculate (AUROC 0.597). The lesson: AI may be able to mimic program decisions, but it is far less certain whether those decisions correlate with outcomes that matter—like performance, retention, or alignment with mission.
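
The gap between those two AUROCs has an intuitive explanation that a toy simulation can make concrete: a program's own ranking decisions are largely a function of the application data the model sees, while matriculation also depends on applicant-side factors (competing offers, geography, personal preferences) that never appear in the file. The sketch below is a made-up illustration of that idea, not the authors' code or data.

```python
# Toy illustration: the same features can predict "ranked" well but
# "matriculated" poorly when matriculation hinges on unobserved factors.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
n = 3000
X = rng.normal(size=(n, 10))             # synthetic application features

ranked = X[:, :3].sum(axis=1) > 0        # driven entirely by observed features
hidden_preference = rng.normal(size=n)   # applicant-side factors the model never sees
matriculated = ranked & (hidden_preference > 0.8)

rf = RandomForestClassifier(n_estimators=200, random_state=0)
for name, y in [("ranked", ranked), ("matriculated", matriculated)]:
    auc = cross_val_score(rf, X, y, cv=5, scoring="roc_auc").mean()
    print(f"{name}: AUROC ~ {auc:.2f}")
```

In this toy setup the first AUROC is near perfect and the second is markedly lower, echoing the pattern Rees and Ryder observed.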

Finally, Hassan and colleagues in the Journal of Surgical Education (5) directly compared AI with manual selection of surgical residency applicants. Their findings were provocative: the two applicant lists (AI selected vs program director selected) overlapped by only 7.4%. AI identified high-performing applicants with efficiency comparable to traditional manual selection, but there were significant differences. The AI-selected applicants were more frequently white/Hispanic (p<0.001), more often US medical graduates (p=0.027), younger (p=0.024), and had more publications (p<0.001). This raises questions about both list-generation processes, as well as about transparency and acceptance by faculty. Program faculty trust their own collective wisdom, but will they trust a machine learning process that highlights candidates they initially passed over?

Where AI Could Help

Taken together, these studies suggest that AI could help in several ways:

- Managing volume: AI tools can quickly sort thousands of applications, highlighting candidates who meet baseline thresholds or who might otherwise be filtered out by crude metrics.
- Surfacing hidden talent: By integrating many data points, AI may identify applicants overlooked because of a single weak metric, such as a lower Step score or an atypical background.
- Standardizing review: Algorithms can enforce consistency, reducing the idiosyncrasies of individual reviewers.
- Exposing bias: When designed well, AI can make explicit the patterns of selection, shining light on where programs may unintentionally disadvantage certain groups.

Where AI Could Harm

But the risks are equally real:

- Amplifying bias: Models trained on past decisions will replicate the biases of those decisions. If a program historically favored certain schools or demographics, the algorithm will “learn” to do the same.
- False precision: High AUROC scores may mask the reality that models are only as good as their training data. Predicting interviews is not the same as predicting good residents.
- Transparency and trust: Faculty may resist adopting tools they don’t understand, and applicants may lose faith in a process that feels automated and impersonal.
- Gaming the system: When applicants learn which features are weighted, they may tailor applications to exploit those cues—turning AI from a tool for fairness into just another hoop to jump through.

Broad Reflections: The Future of Recruitment

What emerges from these studies is less a roadmap and more a set of crossroads. Residency recruitment is under enormous pressure. AI offers tantalizing relief, but also real danger.

For programs, the key is humility and intentionality. AI should never completely replace human judgment, but it can augment it. Program directors can use AI to help manage scale, to catch outliers, and to audit their own biases. But the human values—commitment to service, the value of diversity, and the mission of training compassionate physicians—cannot be delegated to an algorithm.

For applicants, transparency matters most. A process already viewed as opaque will only grow more fraught if decisions are seen as coming from a black box. Clear communication about how AI is being used, and ongoing study of its impact on residency selection, are essential.

For the medical education community, the moment calls for leadership. We need reporting standards for AI models, fairness audits, and shared best practices. Otherwise, each program will reinvent the wheel—and the mistakes.

Residency recruitment has always been an imperfect science, equal parts art and data. AI does not change that. What it does offer is a new lens—a powerful, potentially distorting one. Our task is not to embrace it blindly nor to reject it out of fear, but to use it wisely, always remembering that behind every application is a human being hoping for a chance to serve.

References

(1) Lin JC, Hu DJ, Scott IU, Greenberg PB. Evidence-based practices for interviewing graduate medical education applicants: A systematic review. J Grad Med Educ. 2024; 16 (2): 151-165.

(2) Sumner MD, Howell TC, Soto AL, et al. The use of artificial intelligence in residency application evaluation: A scoping review. J Grad Med Educ. 2025; 17 (3): 308-319.

(3) Burk-Rafel J, Reinstein I, Feng J, et al. Development and validation of a machine learning–based decision support tool for residency applicant screening and review. Acad Med. 2021; 96 (11S): S54-S61.

(4) Rees CA, Ryder HF. Machine learning for the prediction of ranked applicants and matriculants to an internal medicine residency program. Teach Learn Med. 2022; 35 (3): 277-286.

(5) Hassan S, et al. Artificial intelligence compared to manual selection of prospective surgical residents. J Surg Educ. 2025; 82 (1): 103308.

Monday, August 25, 2025

The Future of Simulation in Medical Education: From Novelty to Necessity


Medical education has always wrestled with the challenge of teaching complex, high-stakes skills in an environment where mistakes can carry real consequences. Historically, students learned at the bedside, often relying on apprenticeship models where experience came in unpredictable bursts. While this “see one, do one, teach one” tradition had its strengths, it also left gaps. Simulation-based training (SBT) emerged to fill those gaps, and it is no longer a niche tool—it is a core component of medical education. A recent article describes simulation-based research and innovation. The authors suggest that the next decade will transform simulation from a supplemental experience into a foundational pillar of how we prepare physicians.

Why Simulation Matters

Simulation provides a safe space where learners can make mistakes, reflect, and try again—without putting patients at risk. Elendu and colleagues’ 2024 review (1) highlights several key benefits: learners gain clinical competence more quickly, retain knowledge longer, and demonstrate improved patient safety outcomes. Equally important, simulation supports deliberate practice, structured feedback, and team-based scenarios that mirror the realities of modern healthcare. In an era where patient safety is paramount and medical knowledge is expanding faster than ever, the controlled environment of simulation offers a vital buffer between the classroom and the clinic. 

Emerging Technologies Driving Change

The next wave of simulation training will be shaped by technology. In an article posted by Education Management Solutions (2), artificial intelligence (AI) is poised to revolutionize how scenarios are created and adapted. Instead of static, one-size-fits-all cases, AI can generate patient interactions tailored to a learner’s level, performance, and even biases. Imagine a resident who consistently misses subtle diagnostic cues being repeatedly exposed to cases that hone that specific skill. Adaptive learning, powered by AI, promises to accelerate mastery and personalize education in ways we’ve only begun to imagine.

Another major trend is the improvement in simulation technologies such as high-fidelity mannequins (SimMan and Harvey), virtual endoscopy and ultrasound simulators, and surgical simulators. Virtual reality (VR) and augmented reality (AR) have moved from gaming into the world of education. (3) VR headsets are smaller, more affordable, and more accessible. For medical schools committed to widening access to education and reducing disparities, portability is a game-changer.
These tools allow learners to step into highly realistic, immersive scenarios. VR can recreate the chaos of a mass casualty event or the precision of an operating room, while AR overlays digital information onto the real world—imagine seeing a patient with anatomy labeled in real time. The potential for engagement and realism is enormous. Still, VR/AR must avoid becoming flashy gimmicks. Their power lies in creating experiences that are both immersive and educationally sound, rooted in clear learning objectives.

Feeling is Believing: the Role of Haptics 

Simulation has long been strong in visual and auditory fidelity, but haptics—the sense of touch—has lagged behind. That is changing. New advances in haptic feedback allow learners to “feel” the resistance of tissue during a procedure, the snap of a joint during reduction, or the subtle give of a vessel wall during cannulation. For skill-based specialties like surgery, obstetrics, and emergency medicine, this tactile realism can shorten the learning curve and increase confidence before performing procedures on patients. A recent systematic review in the Journal of Surgical Education (4) identified a central challenge for surgical simulation: feedback transmitted through an instrument, as in minimally invasive techniques such as laparoscopy, is easier to simulate than the feel of soft tissue handled directly. The review identified nine studies of haptics, but the evidence remains inconsistent.

Competency Tracking

Perhaps one of the most exciting—and potentially controversial—advances is the integration of data analytics into simulation. Systems are emerging that can measure everything from the angle of a needle insertion to the response time in a code scenario. These metrics can provide real-time feedback and generate longitudinal reports of a learner’s progress. For competency-based medical education (CBME), which emphasizes outcomes over time served, such analytics could provide the objective measures we have long struggled to capture. Of course, this raises important questions about how such data are used in assessment, promotion, and even remediation. Transparency and fairness will be critical if analytics are to fulfill their promise without creating new inequities.

Challenges Ahead  

Despite its promise, simulation faces hurdles. Costs are significant—high-fidelity mannequins, VR systems, and haptic devices are expensive, and simulation centers require space, staff, and upkeep. Faculty development is another challenge: effective simulation requires skilled facilitators who can guide debriefings, not just operate the technology. Finally, while simulation improves competence, translating those skills into clinical performance is not automatic. More research, like that synthesized by Elendu et al., is needed to understand how best to integrate simulation into curricula to maximize transfer to patient care. 

Implications for Medical Education

For medical schools (and residency training programs), the message is clear: simulation is not optional. Schools that fail to invest in simulation risk graduating physicians less prepared for the realities of modern healthcare. The most forward-thinking institutions will not only build simulation centers but also embed simulation across the curriculum—from preclinical years through residency. This requires leadership willing to make strategic investments and faculty committed to weaving simulation into teaching, assessment, and remediation. It also requires attention to equity, ensuring that students across campuses and resource levels have access to the same opportunities.

Looking Forward

As simulation matures, its role will expand beyond technical training. It will increasingly serve as a platform for teaching professionalism, interprofessional teamwork, cultural humility, and even resilience. The “hidden curriculum” of medicine—the values, habits, and attitudes we pass on—can be intentionally addressed in simulated spaces. AI-driven avatars may even help address bias, exposing learners to diverse patient populations in ways that are not possible in traditional settings.

In short, the future of simulation is bright. What began as a supplemental tool is becoming the backbone of modern medical education. The convergence of education and technology is creating a learning ecosystem that is safer, smarter, and more responsive to individual learners. The challenge for medical educators is not whether to adopt simulation, but how to do so thoughtfully, equitably, and in ways that truly enhance patient care.

 

References

(1)   Elendu C, Amaechi DC, Okatta AU, et al. The impact of simulation-based training in medical education: A review. Medicine  2024; 103 (27): e38813. doi: 10.1097/MD.0000000000038813. PMID: 38968472; PMCID: PMC11224887.

(2)   https://ems-works.com/blog/content/7-future-trends-in-healthcare-simulation-training/

(3)   Dhar E, Upadhyay U, Huang Y, Uddin M, Manias G, Kyriazis D, Wajid U, AlShawaf H, Syed Abdul S. A scoping review to assess the effects of virtual reality in medical education and clinical care. Digit Health. 2023; 9: 20552076231158022. doi: 10.1177/20552076231158022. PMID: 36865772; PMCID: PMC9972057.

(4)   Rangarajan K, Davis H, Pucher PH. Systematic review of virtual haptics in surgical simulation: a valid educational tool? J Surg Educ. 2020; 77 (2): 337-347. https://doi.org/10.1016/j.jsurg.2019.09.006

Thursday, August 7, 2025

More Than a Prayer: How Chaplaincy Services Shape and Improve Patient Experience


By John E. Delzell Jr., MD, MSPH, MBA, FAAFP

As physicians and educators, we often talk about the patient experience as if it's only tied to clinical outcomes, nursing care, timely communication, or the cleanliness of the hospital. But a recent study in the Journal of Healthcare Management challenges us to widen that lens. White and colleagues (1) examined a less discussed—yet profoundly impactful—hospital service: the chaplaincy department. Their research poses a simple but powerful question: Does having a chaplaincy department improve hospital patient experience scores? The answer is a strong yes, and the implications are far-reaching for how we think about team-based care and holistic healing.

Study at a Glance

This was a large, multi-year observational study using American Hospital Association (AHA) data and Hospital Consumer Assessment of Healthcare Providers and Systems (HCAHPS) scores. The researchers looked at a sample of 1,215 hospitals between 2016 and 2019, using rigorous multivariate regression modeling to adjust for variables like hospital size, ownership, status as a teaching hospital, and patient demographics.

The key variable of interest? Whether a hospital had a chaplaincy department, according to its answer on the AHA Annual Survey. The dependent variables were drawn from the HCAHPS patient experience domains: specifically, the percentage of patients giving a global hospital rating of 9 or 10 and the percentage of patients who would definitely recommend the hospital. These variables roll up many of the things that patients (and families) care about, including communication, emotional support, and nursing care. These are the types of things patients remember after discharge, even when they forget the name of their antibiotic.

What Did They Find

Hospitals with chaplaincy departments consistently scored higher on the two primary outcome measures. The differences were small but statistically significant. In the bivariate analysis, hospitals with a chaplaincy department had 1.6% more patients giving a top-box score of 9 or 10 (t = –5.04, P < 0.001) than those without one, and 3.1% more respondents said they would definitely recommend a hospital with a chaplaincy department (t = –8.91, P < 0.001). The results remained statistically significant in the multivariable regression model: hospitals with a chaplaincy department had 1.5% (standard error [SE] = 0.58, P < 0.05) more patients giving a top-box rating and 2.2% (SE = 0.66, P < 0.001) more patients who would definitely recommend the hospital.
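
For readers who want to see the shape of this kind of analysis, here is a brief sketch in Python using statsmodels and simulated data. The variable names and effect sizes are invented for illustration; the actual study used AHA and HCAHPS files and a richer set of covariates.

```python
# Illustrative sketch (simulated data, not the AHA/HCAHPS files) of the kind
# of adjusted model behind the estimates above: regress the hospital-level
# top-box percentage on a chaplaincy indicator plus covariates.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(42)
n = 1215  # hospitals, matching the sample size reported above

df = pd.DataFrame({
    "chaplaincy": rng.integers(0, 2, n),   # 1 if the hospital has a chaplaincy department
    "beds": rng.integers(25, 900, n),      # hospital size (synthetic)
    "teaching": rng.integers(0, 2, n),     # teaching status (synthetic)
    "nonprofit": rng.integers(0, 2, n),    # ownership (synthetic)
})
# Simulated outcome: percent of patients rating the hospital 9 or 10.
df["top_box"] = (70 + 1.5 * df["chaplaincy"] + 2 * df["teaching"]
                 - 0.002 * df["beds"] + rng.normal(0, 5, n))

model = smf.ols("top_box ~ chaplaincy + beds + teaching + nonprofit", data=df).fit()
# The chaplaincy coefficient is the adjusted percentage-point difference,
# analogous to the 1.5% (SE 0.58) estimate in the paper.
print(model.params["chaplaincy"], model.bse["chaplaincy"])
```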

They also found that chaplaincy presence was associated with a more significant impact in nonprofit and teaching hospitals, where patient acuity and complexity tend to be higher. This nuance adds an important layer to the conversation: chaplaincy services may be especially beneficial in the very environments where patients are most vulnerable.

Why Does It Matter

This study adds quantitative muscle to what many of us have long known anecdotally—that spiritual care is more than a “nice to have” in the hospital setting. It’s part of the fabric of compassionate, patient-centered care. In a former role, I had the privilege of having administrative oversight for our chaplaincy program, the VC & Mary Puckett Center for Spiritual Care (https://www.nghs.com/spiritual-care). The Center includes hospital chaplain services, pastoral care education for chaplain interns and chaplain residents, a live therapeutic music program, and a center for clinical bioethics. A busy and amazing group.

As someone who has spent years in academic medicine and hospital leadership, I’ve seen how easily chaplains can be overlooked during strategic planning or budget decisions. They often operate in the background, providing emotional and spiritual care, facilitating difficult family conversations, or simply sitting with a patient in silence when nothing else seems to help. But this research tells us that those seemingly quiet contributions reverberate in powerful ways. Chaplains may not prescribe medications or write orders, but their presence has measurable effects on how patients feel about their care. And in a healthcare environment increasingly focused on value-based metrics, that matters.

Implications for Medical Education

One area where I think this research really resonates is in how we prepare students and residents to think about interprofessional care. We train them in evidence-based medicine, population health, and systems-based practice. But how often do we talk about spiritual care as an evidence-based part of the clinical care team?

This study opens the door to that conversation. If chaplaincy departments contribute to a better patient experience—and therefore better outcomes, clinically and financially—then students and residents should be taught how to collaborate with chaplains just like they learn to work with pharmacists or nurses or case managers. Incorporating chaplain shadowing, discussions of spiritual assessments, and interprofessional simulations into our curricula could make a real difference. It’s not just about preparing future physicians to treat disease—it’s about preparing them to treat people.

Budget vs. Benefit

Of course, none of this comes without cost. Many hospitals, especially smaller, for-profit, and rural hospitals, have cut back on “non-essential” services like a chaplaincy department (2). But this research challenges that decision. If chaplaincy services drive improvements in HCAHPS scores, then investment in these programs may actually support hospital finances through higher reimbursement rates tied to value-based purchasing. Hospital leaders and CFOs may want to reframe how they see chaplaincy—less as a soft service and more as a strategic investment. This is especially true in teaching hospitals, where patients often face extended stays, complex illnesses, and existential crises that stretch beyond the reach of medicine.

A Broader View of Healing

In the end, this study reminds us that healing isn’t confined to the body. The hospital is not just a place of procedures and prescriptions; it’s also a place of fear, hope, grief, and meaning. Chaplains walk with patients through all of that, often in moments when medical interventions have nothing left to offer. So, the next time you see a chaplain walking the halls—or hear their name mentioned during a family meeting—take a moment to recognize the vital role they play. And maybe even ask yourself: Are we doing enough to support this essential part of our care team?

Because healing doesn’t happen in isolation. It happens in community. And chaplains, it turns out, are part of the reason patients feel seen, heard, and cared for.

 

(1) White KB, McClelland LE, Jennings JAC, Karimi S, Fitchett G. The Impact of Chaplaincy Departments on Hospital Patient Experience Scores. Journal of Healthcare Management 2025; 70 (3): 220-234. DOI: 10.1097/JHM-D-24-00143

(2) White KB, Lee SYD, Jennings JAC, Karimi S, Johnson CE, Fitchett G. Provision of chaplaincy services in U.S. hospitals: A strategic conformity perspective. Health Care Management Review 2023: 48 (4): 342-351. DOI: 10.1097/HMR.0000000000000382

Tuesday, August 5, 2025

Expanding the Pipeline: how do we get more physicians into underserved areas?

 


John E Delzell Jr MD MSPH MBA FAAFP

In the face of ongoing primary-care workforce shortages, it is important to identify ways to expand the number of physicians who choose to practice in health professional shortage areas. This also means that more graduates will need to choose primary care specialties, as many rural and underserved communities cannot easily support subspecialty practice. Several recent articles enrich this discussion and illustrate some of the challenges and successes.

Targeted Admissions Strategy

A study by Evans and colleagues (1) from 2020 examined the medical school admissions process by surveying all 185 US allopathic and osteopathic medical schools. Their premise was that all medical schools have an inherent social mission to meet the health needs of the public. I am not sure that all medical schools would agree with that statement, but it led the authors to ask how each school targets applicants, specifically through its admissions strategy.

The authors had an impressive 72% response rate. Schools were grouped by their targeting strategy: 69% used a rural targeting strategy and 67% used an urban-underserved targeting strategy. The strategies used characteristics such as graduation from a rural high school, growing up in a rural community, growing up in an underserved area, and a stated interest in practicing in an underserved area. Interestingly, only 20% of the schools reserved admission slots for students with these characteristics.

Holistic Review for Admissions

Ballejos and colleagues (2) in an article published in 2025 looked at data from a single medical school (the University of New Mexico School of Medicine) over a period of eight years from 2006 to 2013. In this study, the authors looked at all the students matriculating each year and used practice data to identify their post-residency practice location. They were trying to identify objective attributes that might predict in-state practice location. The admissions committee at UNM SOM uses a holistic review to consider objective data (MCAT, GPA), personal attributes (gender, ethnicity), goals (practice plans), and experiences (graduating from rural high school). This is done, at least in part, to assess whether a given applicant is likely to practice in New Mexico and advance the school’s mission.

They performed univariate and multiple regression analyses to compare the graduates. The authors used in-state versus out-of-state practice location identified by NPI number and medical licensure data. They found that only 41.7% of graduates during that time period practiced in New Mexico. Older students and those who graduated from an urban high school were more likely to practice in-state after training. Importantly, most of the variables that they looked at were not significant and the three that were significant only explained 6.4% of the variance.

Not much help to inform the Admissions process.

Admissions impact on Primary Care choice?

Raleigh and colleagues (3) undertook a narrative synthesis to systematically evaluate the existing literature on how various medical school admissions practices, including prematriculation programs, are associated with graduates eventually entering primary care specialties. The purpose of their study was to review literature describing admissions practices and to determine the impact on the number and percentage of graduates entering primary care. They performed a comprehensive search of English-language peer-reviewed research with outcomes related to primary care choice and identified 34 qualifying articles, mainly single-institution observational studies. They used narrative synthesis as their evaluation method for two reasons: it allowed them to evaluate and summarize data from a wide variety of methodologies, and it allowed them to provide a narrative description of the identified studies.

The authors found that pre-matriculation programs were consistently associated with higher rates of graduates entering primary care compared to peers. Other predictive factors included self-identified interest in primary care, rural background, and being older at matriculation. They found that not very many schools explicitly prioritize primary care in their admissions criteria. The authors did note that some of the studies looked at primary care in rural environments, and those results were consistent with the overall group of studies.

Raleigh et al. argue that medical schools should consider pre-matriculation programs targeting students already oriented toward primary care. They also emphasize active recruitment of applicants expressing a primary care commitment and call for more rigorous prospective research. These results suggest that structured prematriculation programs can influence specialty choice outcomes, beyond self-selection effects. The program’s design elements — mentorship, academic support, and clear pathways — likely contributed to success.

There are some great programs out there that target students who come from rural and underserved areas, get them into medical school, and then encourage them to pursue practice in underserved areas typically in primary care. Let’s look at a couple of these programs…

University of Missouri—Columbia: Bryant Scholars Pre-Admissions Program

The Bryant Scholars Program (4) guarantees medical school admission to qualified rural students committed to primary care and rural practice. Students must be from Missouri and from a rural county. The program targets rural and under‑resourced applicants, offering tailored support and conditional admission pathways. Students are committed to the MU Rural Scholars Program after matriculation into the medical school.

University of Kansas School of Medicine: Scholars in Health

The Scholars in Health program (5) has two tracks for students interested in underserved practice areas: rural and urban. This is an early-admit conditional acceptance program targeting students who come from rural and underserved backgrounds and intend to return to practice in those areas after graduation. The program offers academic and career mentoring, with guaranteed admission contingent on performance milestones.

Florida State University College of Medicine: Bridge to Clinical Medicine

The Bridge Program (6) is designed to expand the pool of successful medical school applicants who come from rural and urban underserved areas. FSU COM’s Bridge Program is a 12‑month Master’s in Biomedical Sciences for applicants selected from those not initially admitted to FSU COM. Completion with a B or higher and meeting professionalism standards leads to direct consideration for admission.

Common Threads and Best Practices

Across MU’s Bryant Scholars, KU SOM’s Scholars in Health, and FSU COM’s Bridge Program:
1. Targeted recruitment of students from rural and underrepresented backgrounds.
2. Conditional admission or master’s bridging with performance thresholds.
3. Integrated academic and clinical exposure.
4. Ongoing mentorship and professionalism assessment.

Prematriculation programs like MU’s Bryant Scholars, KU SOM’s Scholars in Health, and FSU COM’s Bridge Program embody the strategies highlighted in the literature as high impact. They identify likely applicants, reduce barriers, support readiness, foster diversity, and strengthen the rural and underserved practice pipeline. Expanding these models nationally offers a promising route to a more effective, equitable physician workforce.

References

(1)   Evans DV, Jopson AD, Andrilla CA, Longenecker RL, Patterson DG. Targeted Medical School Admissions: A Strategic Process for Meeting Our Social Mission. Fam Med. 2020;52(7):474-482. https://doi.org/10.22454/FamMed.2020.470334.

(2)   Ballejos MP, Riera J, Williams R, Sapién RE. Objective Admissions Data and In-State Practice: What Can We Really Predict? Fam Med. 2025;57(6):435-438. https://doi.org/10.22454/FamMed.2025.503525.

(3)   Raleigh MF, Seehusen DA, Phillips JP, Prunuske J, Morley CP, Polverento ME, Kovar-Gough I, Wendling AL. Influences of Medical School Admissions Practices on Primary Care Career Choice. Fam Med. 2022; 54 (7): 536-541. https://doi.org/10.22454/FamMed.2022.260434. 

(4)   https://medicine.missouri.edu/offices-programs/admissions/bryant-pre-admissions-program

(5)   https://www.kumc.edu/school-of-medicine/academics/premedical-programs/scholars-in-health.html

(6)   https://med.fsu.edu/outreach/masters-bridge-program


Wednesday, July 23, 2025

AI, Superintelligence, and the Future of Clinical Reasoning


By John E. Delzell Jr., MD, MSPH, MBA, FAAFP

In medical education, we often talk about transformation. Competency-based education, interprofessional learning, simulation, and evidence-based practice have all changed how we prepare the next generation of physicians. But the greatest transformation is still on the horizon—and it is being driven by artificial intelligence (AI).

Over the past few years, AI has moved rapidly from theoretical potential to practical application. Large language models (LLMs) like GPT-4 have already demonstrated the ability to pass the USMLE, summarize research articles, and assist in clinical decision-making. As King and Nori (1) point out, “reducing medicine to one-shot answers on multiple-choice questions” means that such benchmarks “overstate the apparent competence of AI systems and obscure their limitations.” Even so, it is clear that we are entering a new era where AI may not just assist doctors—it may outperform them in specific domains.

What does this mean for medical education? To answer that, we have to look at where the technology is headed and how it intersects with human learning, reasoning, and clinical judgment.

The Promise and Peril of Medical Superintelligence

In June 2025, Microsoft AI (1) released a visionary statement on the trajectory toward “medical superintelligence”—a form of AI that not only surpasses human performance on standardized medical benchmarks but demonstrates generalizable clinical reasoning across specialties. Their goal is to build a system that operates with “generalist-like” breadth and “specialist-level” depth, grounded in real-time reasoning and safety. This is not science fiction.

In a recent preprint (2), Microsoft researchers described an experimental AI diagnostic system evaluated on a new benchmark for sequential diagnosis. The team created a set of 304 digital clinical cases drawn from New England Journal of Medicine clinicopathological conference (NEJM-CPC) cases. (2) The cases are stepwise diagnostic encounters in which a physician can iteratively ask questions and order tests. As new information becomes available, the physician updates their reasoning, eventually narrowing toward a best final diagnosis. That final diagnosis can then be compared with the “correct” diagnosis published in the journal. When I was a student and resident, I loved reading these. I almost never got the correct diagnosis, but I learned a lot from the process.
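
To make the benchmark's structure concrete, here is a schematic sketch of how such stepwise cases can be scored. It is not the paper's actual evaluation harness; `Case`, `run_case`, and the `diagnostic_agent` callable are hypothetical stand-ins for whatever model or human is being tested.

```python
# Schematic sketch of scoring a sequential diagnosis case: the agent
# iteratively requests information, then commits to a final diagnosis
# that is compared with the published answer.
from dataclasses import dataclass, field

@dataclass
class Case:
    presentation: str                 # initial vignette shown to the agent
    findings: dict[str, str]          # results revealed only when requested
    final_diagnosis: str              # the published "correct" answer
    revealed: list[str] = field(default_factory=list)

def run_case(case: Case, diagnostic_agent, max_steps: int = 10) -> bool:
    """Return True if the agent's final answer matches the published diagnosis."""
    context = case.presentation
    for _ in range(max_steps):
        action = diagnostic_agent(context)   # e.g. "order: troponin" or "diagnose: ..."
        if action.startswith("diagnose:"):
            answer = action.removeprefix("diagnose:").strip().lower()
            return answer == case.final_diagnosis.lower()
        test = action.removeprefix("order:").strip()
        result = case.findings.get(test, "not available")
        case.revealed.append(test)
        context += f"\n{test}: {result}"     # agent updates its reasoning next turn
    return False
```

The same loop works for a human physician or an AI system, which is what makes head-to-head comparisons on these cases possible.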

In structured evaluations, the new AI model outperformed existing models on dozens of medical tasks, from radiology interpretation to treatment planning. And importantly, when tested against human physician performance on the NEJM cases, it demonstrated reasoning that mimics human diagnostic thinking: considering differential diagnoses, weighing risks and benefits, and accounting for uncertainty.

This level of performance hints at the possibility of AI not only augmenting care but becoming a form of clinical intelligence in its own right. In short, we are no longer talking about tools. We are talking about future colleagues.

Cognitive Load, Expertise, and What Makes a Good Doctor

If AI systems can increasingly perform tasks once limited to trained physicians, what remains uniquely human in the physician role? One answer lies in how we process complexity.

A 2020 paper by Tschandl et al (3) explored how AI and human physicians interact in diagnostic decision-making. Their findings are fascinating: when AI is presented as a peer or assistant, it improves physician accuracy; but when it is given too much credibility (i.e., treated as an oracle), physicians defer too quickly, losing the benefits of independent judgment. In essence, the relationship between humans and AI is dynamic—shaped by trust, communication, and cognitive calibration.

This has major implications for education. Medical students and residents must not only learn the traditional content of medicine; they must learn how to work with AI systems—to question, validate, and contextualize recommendations. That means we must teach not just clinical knowledge but metacognition: the ability to understand how we think, and how machines think differently.

And we must recognize that human expertise is not obsolete. As Microsoft notes in its roadmap to superintelligence, there are still many domains where AI falls short—especially in interpreting nuance, assessing values, and navigating ethical complexity.(1) These are precisely the areas where medical educators must continue to lead.

A New Role for the Medical Educator

So how should medical educators respond?

First, we must integrate AI literacy into the curriculum. Just as we teach evidence-based medicine, we now need to teach “AI-based medicine.” Students should understand how these models are trained, what their limitations are, and how to critically appraise their output. This isn’t just informatics—it’s foundational clinical reasoning in the 21st century.

Second, we need to reimagine assessment. Traditional exams measure knowledge recall and algorithmic thinking. But AI can now generate textbook answers on command. Instead, we should assess higher-order skills: contextual judgment, empathy, shared decision-making, and the ability to synthesize information across disciplines. We are not trying to train machines—we are trying to train humans to be the kind of doctors AI can’t be.

Third, we must prepare for a changing scope of practice. As AI takes on more diagnostic and administrative tasks, physicians may find themselves able to focus more on the human aspects of care—narrative, empathy, ethics, and meaning. This is not a diminishment of the physician’s role. It is a refinement. We are moving from being knowledge providers to wisdom facilitators.

The Human-AI Team

One of the most powerful concepts in Microsoft’s vision is the idea of the human-AI team. This is not about replacing doctors with algorithms. It’s about creating a partnership where each party brings unique strengths. AI can process terabytes of data, recognize subtle patterns, and recall every guideline ever published. Humans can listen, connect, and weigh values in the face of uncertainty.

As educators, we must train our learners to be effective members of this team. That means not just accepting AI, but shaping it—participating in its development, informing its design, and advocating for systems that reflect the realities of clinical care. This will not be easy. There will be challenges around bias, privacy, overreliance, and professional identity. But the alternative—ignoring these changes or resisting them—is no longer tenable.

Looking Ahead

Medical education is entering a new frontier. In the coming years, we will need to train learners who are not only competent clinicians, but also agile learners, critical thinkers, and collaborative partners with AI.

This is not the end of the physician. It is the beginning of a new kind of doctor—one who uses technology not as a crutch, but as an amplifier of what makes us human.

And that, to me, is the real promise of medical superintelligence: not that it will replace us, but that it will help us become even better at what we are meant to do—care for people in all their complexity.


References

(1) King D, Nori H. “The path to medical superintelligence”  Microsoft AI. Accessed 7/8/25. Retrieved from https://microsoft.ai/new/the-path-to-medical-superintelligence 

(2) Nori H, Daswani M, Kelly C, Lundberg S, et al. Sequential Diagnosis with Language Models. Cornell University arXiv.  Accessed 7/7/25. Retrieved from https://arxiv.org/abs/2506.22405 

(3) Tschandl P, Rinner C, Apalla Z. et al. Human–computer collaboration for skin cancer recognition. Nat Med 2020; 26: 1229–1234. https://doi.org/10.1038/s41591-020-0942-0


Monday, July 14, 2025

Running the Distance: Joy, Risk, and Why I Keep Lacing Up

John E. Delzell Jr., MD, MSPH, MBA, FAAFP

I still remember the first time I crossed the finish line of a marathon.

It was hot. We were in Orlando (the Disney Marathon). My legs were toast. The crowds were cheering. I definitely cried in those final few meters. Finishing 26.2 miles doesn’t just test your body. It tests your commitment, your mind, your pain threshold, and sometimes your relationship with your toenails.

I’ve run a lot of races since then. Some were fast. Some were slow. Some were surprisingly fun. Others… let’s just say I was glad they ended. But each one taught me something—not just about pacing or hydration, but about myself. About resilience. About joy. About being present in motion.

So, when I recently came across two very different—but equally important—articles on running, I felt compelled to dig a little deeper.

Why We Run (And Why I Still Do)

Let’s start with the why. In 2021, Hugo Vieira Pereira and colleagues (1) published a systematic review in Frontiers in Psychology that asked a simple but profound question: What drives people to run for fun?

Not surprisingly, it’s not about weight loss or physical health—although those show up plenty. They found that psychological and behavioral factors play just as large a role, especially for recreational runners. Things like stress relief, mood elevation, a sense of achievement, or just the pure enjoyment of the run itself. I do enjoy the bling (nothing like posting that picture of your finisher medal on your favorite social media site), but there is much more to the joy of running than a medal. Interestingly, runners with more experience tend to internalize the joy—shifting away from extrinsic motivations (like medals or fitness) toward more intrinsic ones (like emotional well-being or identity).

I get that. Running has long been a reset button. It’s where I process tough days, pray, think, unwind. It’s where I go when I need space, and oddly enough, also where I go when I need community. The running community is incredibly supportive. Long runs with friends have a way of cutting through small talk. You learn a lot about someone at mile 16.

The Pereira review also highlights how consistent runners tend to have high self-regulation skills—planning, goal-setting, time management, and the ability to push through discomfort. That sounds right. You don’t finish a marathon on motivation alone. You finish it because you ran all the invisible miles in the dark before sunrise, when no one was cheering.

The Hidden Risk No One Talks About on Race Day

But running isn’t all runner’s highs and finish-line photos. Every time I pin on a bib number, especially at marathons or halfs, I know I’m also assuming a small—but real—risk. And that brings us to the second article.  Published in 2025 in JAMA, the study by Kim et al (2) tackled the sobering topic of cardiac arrest during long-distance running races. The researchers reviewed over a decade’s worth of data and identified several critical insights:

- Cardiac arrest during organized races is rare—occurring in about 1 per 100,000 participants—but given the number of race participants (more than 23 million), it is not negligible.
- Most cases occurred during marathons (not shorter distances), and more often near the end of the race.
- Interestingly, the incidence of cardiac arrest was stable (compared with 2000-2009), but there has been a significant decline in mortality.
- Bystander CPR and the presence of automated external defibrillators (AEDs) significantly improved survival.

As a physician, I’ve always known running carries cardiovascular risk, especially if there’s underlying heart disease, electrolyte imbalances, or unrecognized genetic issues like hypertrophic cardiomyopathy. But reading this paper hit me a little differently—because it’s about my people. My tribe. Ordinary folks pushing themselves to extraordinary limits. As a runner, it reminded me that health screening and preparation matter—even when you’re “fit.” It’s easy to assume that crossing the start line means you're healthy enough. But racing is different from running. The adrenaline, the intensity, the heat, the dehydration—all of it combines into a stress test with real consequences.

Running Smarter, Running Longer

So how do I reconcile the joy of running with the risk it carries?  Honestly, it’s the same way I’ve practiced medicine for 30 years: with a clear-eyed look at the data and a respect for human experience.

First, I take precautions seriously. Regular checkups. Listening to my body. Hydration. Electrolytes. And yes, even slowing down when needed. No PR is worth collapsing for.

Second, I keep running for the same reasons that I started running. I’m not chasing record times anymore. I’m chasing clarity. Fellowship. Flow. Those long runs that leave your muscles sore but your spirit full.

And third, I encourage others, especially new runners, to train smart and listen to their body. Get checked out by your primary care doctor if you are over 40 and new to endurance sports. Don’t ignore chest discomfort, dizziness, or feeling “off” on race day. Carry ID. Know where the aid stations and AEDs are. Be the person who knows CPR.

The truth is, running can be one of the most powerful mental and physical health interventions we have—when done right.

My Finish Lines and What They Taught Me

Each marathon I’ve run has carried its own story. The one where it rained the whole time. The one where I cramped at mile 18. The one I ran with my best friend from high school cheering me at the finish line. Each race reminded me that finishing isn’t about being fast—it’s about being faithful to the training, the effort, the journey.

I’ve been lucky. I’ve stayed healthy, mostly. I’ve never DNF’d. But I’ve seen people collapse. In our first half marathon, a man collapsed and got bystander chest compressions on the course (he lived!). I’ve slowed down to walk someone to the medical tent. And I’ve always been thankful to cross the line—upright, tired, and deeply grateful.

Final Thoughts

Whether you’re finishing first or finishing last, there’s something sacred about committing your body and mind to something hard and seeing it through. Something human.  Running, like life, holds both joy and risk. We run to feel alive, to cope, to connect, to challenge ourselves. And while the road can be unpredictable—especially over 26.2 miles—it’s also where I’ve found some of my clearest moments.

So yes, I’ll keep lacing up. I’ll keep being smart. I’ll keep showing up.

The finish line may only last a few seconds, but the lessons from the road last a lifetime.

References

(1)  Pereira HV, et al. Systematic review of psychological and behavioral correlates of recreational running. Front Psychol. 2021; 12: 624783. https://doi.org/10.3389/fpsyg.2021.624783

(2)  Kim JH, Rim AJ, Miller JT, et al. Cardiac Arrest During Long-Distance Running Races. JAMA. 2025;333(19):1699–1707. doi:10.1001/jama.2025.3026           

Friday, July 11, 2025

Revisiting Gender Bias in Learner Evaluations


If you’ve ever served as a program director, clerkship coordinator, or faculty evaluator in graduate medical education, you’ve likely wrestled with one of the most uncomfortable truths in our field: evaluation is never entirely objective. As much as we strive to be fair and evidence-based, our feedback—both quantitative and narrative—is filtered through a lens of human perception, shaped by culture, context, and yes, bias.

Two studies, published 21 years apart, can help us see just how persistent and nuanced those biases can be—especially around gender.

In 2004, my colleagues and I published a study in Medical Education titled “Evaluation of interns by senior residents and faculty: is there any difference?” (1) We were curious about how interns were assessed by two very different evaluator groups—senior residents and attending physicians. We found something interesting: the ratings given by residents were consistently higher than those from faculty. Senior faculty were, surprisingly, significantly less likely to make negative comments. And more than that, the comments from senior residents were often more positive and personal. We speculated about why—perhaps residents were more empathic, closer to the intern experience, or more generous in peer evaluation.

But what we didn’t find in that study—and what medical educators are still working to unpack—was how factors like gender influence evaluations. We did not find any differences in the comments based on the gender of the evaluators, but the numbers were small enough that it was not clear if that had meaning.

That’s where a new article by Jessica Hane and colleagues in the Journal of Graduate Medical Education (2) makes a significant contribution.

The Gender Lens: a distortion or an added quality?

Hane et al. examined nearly 10,000 faculty evaluations of residents and fellows at a large academic medical center over five years. They looked at both numerical ratings and narrative comments, and they did something smart: they parsed out gender differences in both the evaluator and the evaluated. The findings? Gender disparities persist—but in subtle, revealing ways.

On average, women trainees received slightly lower numerical scores than their male counterparts, despite no evidence of performance differences. More strikingly, the language used in narrative comments showed clear patterns: male trainees were more likely to be described with competence-oriented language (“knowledgeable,” “confident,” “leader”), while women were more often praised for warmth and diligence (“caring,” “hard-working,” “team player”).

These aren’t new stereotypes, but their persistence in our evaluation systems is troubling. When we unconsciously associate men with competence and women with effort or empathy, we risk reinforcing old hierarchies. Even well-intentioned praise can pigeonhole trainees in ways that affect advancement, self-perception, and professional identity.

When bias feels like familiarity…

What’s particularly interesting is how these newer findings echo—and contrast with—what we saw back in 2004. Our study didn’t find any differences with gender specifically, but we did notice that evaluators closer to the front lines (senior residents) tended to focus more on relationships, encouragement, and potential. Faculty, and particularly senior faculty, on the other hand, leaned toward more critical or objective assessments.

What happens, then, when those lenses intersect with gender? Are residents more likely to relate to and uplift women colleagues in the same way they uplift peers generally? Or does bias show up even in peer feedback, especially in high-stakes environments like residency? Hane’s study doesn’t fully answer that, but it opens the door for future research—and introspection.

The Myth of the “Objective” Evaluation

One of the biggest myths in medical education is that our evaluations are merit-based and free from bias. We put a lot of stock in numerical ratings, milestone checkboxes, and structured forms. But as both of these studies remind us, the numbers are only part of the story—and even they are shaped by deeper cultural narratives.

If you’ve ever read a stack of end-of-rotation evaluations, you know how much weight narrative comments can carry. One well-written paragraph can influence a Clinical Competency Committee discussion more than a dozen Likert-scale boxes. So when those comments are subtly gendered—when one resident is “sharp and assertive” and another is “kind and dependable”—we’re not just describing; we’re defining their potential.  And that’s a problem.

What Can We Do About It?

Fortunately, awareness is the first step to addressing bias, and there are concrete steps we can take. Here are a few that I think are worth highlighting:

1. Train faculty and residents on implicit bias in evaluations. The research is clear: we all carry unconscious biases. But bias awareness training—when done well—can reduce the influence of those biases, especially in high-stakes assessments.

2. Structure narrative feedback to reduce ambiguity. Ask evaluators to comment on specific competencies (e.g., clinical reasoning, professionalism, communication) rather than open-ended impressions. This can shift focus from personal attributes to observable behaviors.

3. Use language analysis tools to monitor patterns. Some residency programs are now using AI tools to scan applications for gendered language (3) and to examine letters of recommendation for concerning language (4); a simple sketch of this kind of screen appears after this list. It’s not about punishing faculty—it’s about reflection and improvement.

4. Encourage multiple perspectives. A single evaluation can reflect a single bias. Triangulating feedback from residents, peers, patients, and faculty can provide a fuller, fairer picture of a learner’s strengths and areas for growth.

5. Revisit how we use evaluations in decisions. Promotion and remediation decisions should weigh context. A low rating from one evaluator might reflect bias more than performance. Committees need to be trained to interpret evaluations with a critical eye.
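
As promised in item 3, here is a minimal sketch of what such a language screen can look like, written in Python. The agentic and communal word lists below are short illustrative samples drawn from the kinds of terms mentioned above, not a validated lexicon, and real tools use far more sophisticated natural language processing.

```python
# Minimal sketch of a gendered-language screen for narrative comments:
# count stereotypically "agentic" vs "communal" descriptors.
import re
from collections import Counter

AGENTIC = {"knowledgeable", "confident", "leader", "decisive", "assertive"}
COMMUNAL = {"caring", "hard-working", "team player", "kind", "dependable", "warm"}

def language_profile(comment: str) -> Counter:
    """Count agentic vs communal descriptors in one evaluation comment."""
    text = comment.lower()
    counts = Counter()
    for term in AGENTIC | COMMUNAL:
        hits = len(re.findall(r"\b" + re.escape(term) + r"\b", text))
        if hits:
            counts["agentic" if term in AGENTIC else "communal"] += hits
    return counts

print(language_profile("She is caring, hard-working, and a real team player."))
# Counter({'communal': 3})
print(language_profile("He is a confident, knowledgeable leader."))
# Counter({'agentic': 3})
```

Aggregated over a program's evaluations and stratified by trainee gender, even a simple count like this can reveal the patterns described above and prompt a useful conversation.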

We’re All Still Learning

As someone who’s worked in medical education for decades, I can say with humility that I’ve probably written my fair share of biased evaluations. Not intentionally, but unavoidably. Like most educators, I want to be fair, supportive, and accurate—but we’re all products of our environments. Recognizing that is not an indictment. It’s an invitation.

The Hane study reminds us that even as our systems evolve, old habits linger. The Ringdahl, Delzell & Kruse study showed that who does the evaluating matters. Put those together, and the message is clear: we need to continuously examine how—and by whom—assessments are being made.

Because in the end, evaluations are not just about feedback. They’re about opportunity, identity, and trust. If we want our learning environments to be truly inclusive and equitable, then we have to be willing to see where our blind spots are—and do the hard work of correcting them.

References

(1) Ringdahl EN, Delzell JE, Kruse RL. Evaluation of interns by senior residents and faculty: is there any difference? Med Educ. 2004; 38: 646-651. doi: 10.1111/j.1365-2929.2004.01832.x

(2) Hane J, Lee V, Zhou Y, Mustapha T, et al. Examining Gender-Based Differences in Quantitative Ratings and Narrative Comments in Faculty Assessments by Residents and Fellows. J Grad Med Educ. 2025; 17(3): 338-346. doi: 10.4300/JGME-D-24-00627.1

(3) Sumner MD, Howell TC, Soto AL, Kaplan S, et al. The Use of Artificial Intelligence in Residency Application Evaluation-A Scoping Review. J Grad Med Educ. 2025; 17(3): 308-319. doi: 10.4300/JGME-D-24-00604.1

(4) Sarraf D, Vasiliu V, Imberman B, Lindeman B. Use of artificial intelligence for gender bias analysis in letters of recommendation for general surgery residency candidates. Am J Surg. 2021; 222(6): 1051-1059. doi: 10.1016/j.amjsurg.2021.09.034


Tuesday, July 8, 2025

Medical Student Reflection Exercises Created Using AI: Can We Tell the Difference, and Does It Matter?

 


If you’ve spent any time in medical education recently—whether in lectures, clinical supervision, or curriculum design—you’ve likely been a part of the growing conversation around student (and resident/fellow) use of generative AI. From drafting SOAP notes to summarizing journal articles, AI tools like ChatGPT are rapidly becoming ubiquitous. But now we’re seeing them show up in more personal activities such as reflective assignments. A new question has emerged: can educators really tell the difference between a student’s genuine reflection and something written by AI?

The recent article in Medical Education by Wraith et al (1) took a shot at this question. They conducted an elegant, slightly disconcerting study: faculty reviewers were asked to distinguish reflective writing submitted by actual medical students from reflections generated by AI. The results? Better than flipping a coin, but not by much. Accuracy ranged from 64% to 75%, regardless of the faculty member's experience or confidence, although reviewers did seem to improve as they read more reflections.
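To put those percentages in perspective, here is a quick back-of-the-envelope check. The numbers below (one reviewer, 50 reflections, 32 correct) are hypothetical values chosen purely for illustration; they are not taken from the Wraith study, which should be consulted for the actual design and sample sizes.

```python
from scipy.stats import binomtest

# Hypothetical illustration: a reviewer classifies 50 reflections as
# human- or AI-written and gets 32 correct (64% accuracy).
n_reflections = 50
n_correct = 32

result = binomtest(n_correct, n_reflections, p=0.5, alternative="greater")
print(f"Observed accuracy: {n_correct / n_reflections:.0%}")
print(f"One-sided p-value against pure guessing: {result.pvalue:.3f}")
```

Run with numbers in that range, the test suggests performance is detectably better than guessing, yet it still leaves roughly one reflection in three misclassified, which is the part that should give us pause.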

I’ll admit, when I first read this, I had a visceral reaction. Something about the idea that we can’t tell what’s “real” from what’s machine-generated in a genre that is supposed to be deeply personal—reflective writing—felt jarring. Aren’t we trained to pick up on nuance, empathy, sincerity? But as I sat with it, I realized the issue goes much deeper than just our ability to “spot the fake.” It forces us to confront how we define authenticity, the purpose of reflection in medical education, and how we want to relate to the tools that are now part of our students’ daily workflows.

What Makes a Reflection Authentic?

We often emphasize reflection as a professional habit: a way to develop clinical insight, emotional intelligence, and lifelong learning. But much of that hinges on the assumption that the act of writing the reflection is what promotes growth. If a student bypasses that internal process and asks an AI to “write a reflection on breaking bad news to a patient,” I worry that the learning opportunity is lost.

But here’s the rub: the Wraith study didn’t test whether students were using AI to replace reflection or to aid it. It simply asked whether educators could tell the difference. And they could not do that reliably. This suggests that AI can replicate the tone, structure, and emotional cadence that we expect a medical student to provide in a reflective essay. That is both fascinating and problematic.

If AI can mimic reflective writing well enough to fool seasoned educators, then maybe it is time to reevaluate how we assess reflection in the first place. Are we grading sincerity? Emotional language? The presence of keywords like “empathy,” “growth,” or “uncertainty”? If we do not have a robust framework for evaluating whether reflection is actually happening—as an internal, cognitive-emotional process—then it shouldn’t surprise us that AI can fake it by simply checking the boxes.

Faculty Attitudes: Cautious Curiosity

Another recent study, this one in the Journal of Investigative Medicine by Cervantes et al (2), explored how medical educators are thinking about generative AI more broadly. They surveyed 250 allopathic and osteopathic medical school faculty at Nova Southeastern University. Their results revealed a mix of excitement and unease. Most saw potential for improving education—particularly for more efficient research, tutoring, task automation, and increased content accessibility—but they were also deeply concerned about professionalism, academic integrity, the loss of human interaction in feedback, and overreliance on AI-generated content.

Interestingly, one of the biggest predictors of positive attitudes toward AI was prior use. Faculty who had experimented with ChatGPT or similar tools were more likely to see educational value and less likely to view it as a threat. That tracks with my own anecdotal experience: once people see what AI can do—and just as importantly, what it can’t do—they develop a more nuanced, measured perspective.

Still, the discomfort lingers. If students can generate polished reflections without deep thought, is the assignment still worth doing? Should we redesign reflective writing tasks to include oral defense or peer feedback? Or should we simply accept that AI will be part of the process and shift our focus toward cultivating meaningful inputs rather than fixating on outputs?

What about using AI-augmented reflection?

Let me propose a middle path. What if we reframe AI not as a threat to reflective writing, but as a catalyst? Imagine a student who types out some thoughts after a tough patient encounter, then asks an AI to help clarify or expand them. They read what the AI produces, agree with some parts, reject others, revise accordingly. The final product is stronger—not because AI did the work, but because it facilitated a richer internal dialogue.
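For readers who want to picture the mechanics, here is a minimal sketch of that kind of workflow using the OpenAI Python client. The draft text, the system prompt, and the model name are all illustrative choices of mine, not a recommendation of a particular product or prompt; the key design decision is asking the model for questions rather than a finished reflection.

```python
from openai import OpenAI

client = OpenAI()  # assumes an API key is already configured in the environment

# A hypothetical student's rough first draft after a difficult encounter.
student_draft = (
    "Today I had to tell a patient that the biopsy showed cancer. "
    "I felt unprepared, and I am not sure I said the right things."
)

# Ask the model to coach the reflection, not to write it: the student
# still does the thinking, the drafting, and the revising.
response = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative model choice
    messages=[
        {
            "role": "system",
            "content": (
                "You are a reflective-writing coach for medical students. "
                "Do not write the reflection for the student. Ask three short "
                "questions that push the student to examine their own reaction."
            ),
        },
        {"role": "user", "content": student_draft},
    ],
)

print(response.choices[0].message.content)
```

Used this way, the tool serves the student's internal dialogue rather than replacing it.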

That’s not cheating. That’s collaboration. And it’s arguably closer to how most of us write in real life—drafting, editing, bouncing ideas off others (human or machine). Of course, this assumes we teach students to use AI ethically and reflectively, which means we need to model that ourselves. Faculty development around AI literacy is no longer optional. We must move beyond fear-based policies and invest in practical training, guidelines, and conversations that encourage responsible use.

So, where do we go from here?

A few concrete steps seem worth considering:

1. Redesign reflective assignments. Move beyond short essays. Try audio reflections, peer feedback, or structured prompts that emphasize personal growth over polished prose.

2. Focus on process, not just product. Ask students to document how they engaged with the reflection—did they use AI? Did they discuss it with a peer or preceptor? Did it change their thinking?

3. Embrace transparency. Normalize the use of AI in education and ask students to disclose when and how they used it. Make that part of the learning conversation from the beginning.

4. Invest in AI literacy. Faculty need space and time to learn what these tools can and can’t do. The more familiar we are as faculty, the better we can guide our students.

5. Stay curious. The technology isn’t going away. The sooner we stop wringing our hands and start asking deeper pedagogical questions, the better positioned we’ll be to adapt with purpose.

In the end, the real question isn’t “Can we tell if a reflection is AI-generated?” It’s “Are we creating learning environments where authentic reflection is valued, supported, and developed—whether or not AI is in the room?” 

If we can answer yes to that, then maybe it doesn’t matter so much who—or what—wrote the first draft.

References

(1) Wraith C, Carnegy A, Brown C, Baptista A, Sam AH. Can educators distinguish between medical student and generative AI-authored reflections? Med Educ. 2025; 1-8. doi: 10.1111/medu.15750

(2) Cervantes J, Smith B, Ramadoss T, D'Amario V, Shoja MM, Rajput V. Decoding medical educators' perceptions on generative artificial intelligence in medical education. J Invest Med. 2024; 72(7): 633-639. doi: 10.1177/10815589241257215