Introduction

This blog is about medical education in the US and around the world. My interest is in education research and the process of medical education.



The lawyers have asked that I add a disclaimer making it clear that these are my personal opinions and do not represent any position of any university I am affiliated with, including the American University of the Caribbean, the University of Kansas, the KU School of Medicine, Florida International University, or the FIU School of Medicine. Nor does any of this represent any position of the Northeast Georgia Medical Center or Northeast Georgia Health System.



Tuesday, July 8, 2025

Medical Student Reflection Exercises Created Using AI: Can We Tell the Difference, and Does It Matter?


If you’ve spent any time in medical education recently—whether in lectures, clinical supervision, or curriculum design—you’ve likely been a part of the growing conversation around student (and resident/fellow) use of generative AI. From drafting SOAP notes to summarizing journal articles, AI tools like ChatGPT are rapidly becoming ubiquitous. But now we’re seeing them show up in more personal activities such as reflective assignments. A new question has emerged: can educators really tell the difference between a student’s genuine reflection and something written by AI?

The recent article in Medical Education by Wraith et al (1) took a shot at this question. They conducted an elegant, slightly disconcerting study: faculty reviewers were asked to distinguish reflections submitted by actual medical students from those generated by AI. The results? Better than flipping a coin, but not by much: accuracy ranged from 64% to 75%, regardless of the faculty member’s experience or confidence. Reviewers did seem to improve as they read more reflections.

I’ll admit, when I first read this, I had a visceral reaction. Something about the idea that we can’t tell what’s “real” from what’s machine-generated in a genre that is supposed to be deeply personal—reflective writing—felt jarring. Aren’t we trained to pick up on nuance, empathy, sincerity? But as I sat with it, I realized the issue goes much deeper than just our ability to “spot the fake.” It forces us to confront how we define authenticity, the purpose of reflection in medical education, and how we want to relate to the tools that are now part of our students’ daily workflows.

What Makes a Reflection Authentic?

We often emphasize reflection as a professional habit: a way to develop clinical insight, emotional intelligence, and lifelong learning. But much of that hinges on the assumption that the act of writing the reflection is what promotes growth. If a student bypasses that internal process and asks an AI to “write a reflection on breaking bad news to a patient,” I worry that the learning opportunity is lost.

But here’s the rub: the Wraith study didn’t test whether students were using AI to replace reflection or to aid it. It simply asked whether educators could tell the difference. And they could not do that reliably. This suggests that AI can replicate the tone, structure, and emotional cadence that we expect a medical student to provide in a reflective essay. That is both fascinating and problematic.

If AI can mimic reflective writing well enough to fool seasoned educators, then maybe it is time to reevaluate how we assess reflection in the first place. Are we grading sincerity? Emotional language? The presence of keywords like “empathy,” “growth,” or “uncertainty”? If we do not have a robust framework for evaluating whether reflection is actually happening—as an internal, cognitive-emotional process—then it shouldn’t surprise us that AI can fake it simply by checking the boxes.

Faculty Attitudes: Cautious Curiosity

Another recent study, this one in the Journal of Investigative Medicine by Cervantes et al (2), explored how medical educators are thinking about generative AI more broadly. They surveyed 250 allopathic and osteopathic medical school faculty at Nova Southeastern University. Their results revealed a mix of excitement and unease. Most saw potential for improving education—particularly through more efficient research, tutoring, task automation, and increased content accessibility—but they were also deeply concerned about professionalism, academic integrity, the loss of human interaction in important feedback, and overreliance on AI-generated content.

Interestingly, one of the biggest predictors of positive attitudes toward AI was prior use. Faculty who had experimented with ChatGPT or similar tools were more likely to see educational value and less likely to view it as a threat. That tracks with my own anecdotal experience: once people see what AI can do—and just as importantly, what it can’t do—they develop a more nuanced, measured perspective.

Still, the discomfort lingers. If students can generate polished reflections without deep thought, is the assignment still worth doing? Should we redesign reflective writing tasks to include oral defense or peer feedback? Or should we simply accept that AI will be part of the process and shift our focus toward cultivating meaningful inputs rather than fixating on outputs?

What about using AI-augmented reflection?

Let me propose a middle path. What if we reframe AI not as a threat to reflective writing, but as a catalyst? Imagine a student who types out some thoughts after a tough patient encounter, then asks an AI to help clarify or expand them. They read what the AI produces, agree with some parts, reject others, revise accordingly. The final product is stronger—not because AI did the work, but because it facilitated a richer internal dialogue.

That’s not cheating. That’s collaboration. And it’s arguably closer to how most of us write in real life—drafting, editing, bouncing ideas off others (human or machine). Of course, this assumes we teach students to use AI ethically and reflectively, which means we need to model that ourselves. Faculty development around AI literacy is no longer optional. We must move beyond fear-based policies and invest in practical training, guidelines, and conversations that encourage responsible use.

So, where do we go from here?

A few concrete steps seem worth considering:

1. Redesign reflective assignments. Move beyond short essays. Try audio reflections, peer feedback, or structured prompts that emphasize personal growth over polished prose.

2. Focus on process, not just product. Ask students to document how they engaged with the reflection—did they use AI? Did they discuss it with a peer or preceptor? Did it change their thinking?

3. Embrace transparency. Normalize the use of AI in education and ask students to disclose when and how they used it. Make that part of the learning conversation from the beginning.

4. Invest in AI literacy. Faculty need space and time to learn what these tools can and can’t do. The more familiar we are as faculty, the better we can guide our students.

5. Stay curious. The technology isn’t going away. The sooner we stop wringing our hands and start asking deeper pedagogical questions, the better positioned we’ll be to adapt with purpose.

In the end, the real question isn’t “Can we tell if a reflection is AI-generated?” It’s “Are we creating learning environments where authentic reflection is valued, supported, and developed—whether or not AI is in the room?” 

If we can answer yes to that, then maybe it doesn’t matter so much who—or what—wrote the first draft.

References

(1) Wraith C, Carnegy A, Brown C, Baptista A, Sam AH. Can educators distinguish between medical student and generative AI-authored reflections? Med Educ. 2025;1-8. doi:10.1111/medu.15750

(2) Cervantes J, Smith B, Ramadoss T, D'Amario V, Shoja MM, Rajput V. Decoding medical educators' perceptions on generative artificial intelligence in medical education. J Investig Med. 2024;72(7):633-639. doi:10.1177/10815589241257215
