ChatGPT and similar chatbots pose serious risks to medicine, and medicine needs to figure out how to handle them. I say that as a technophile with great hopes for AI in healthcare.
The Turing Test asks whether a computer can answer queries such that the questioner can’t determine whether the respondent is a machine or a human. ChatGPT may be the first system to pass the test. Unfortunately, the human from whom it is indistinguishable may be Tom Ripley—the sociopathic liar in Patricia Highsmith’s novel, The Talented Mr. Ripley. Describing the eponymous film, Roger Ebert wrote:
“Ripley … instinctively knows that the best way to lie is to admit to lying, and to tell the truth whenever convenient.”
Nick Flor, an associate professor of information systems at the University of New Mexico, queried ChatGPT on gender issues. The chatbot made strong assertions and claimed widespread agreement among medical professionals. Asked to cite a scientific paper supporting its contention, ChatGPT offered:
“‘The Psychological Science of Gender: Women and Men’ by Diane N. Ruble and John E. Gruzen, published in the Annual Review of Psychology in 2002.”
When Flor said he had found no evidence of this paper’s existence, ChatGPT responded:
“I apologize for the mistake. ‘The Psychological Science of Gender: Women and Men’ by Diane N. Ruble and John E. Gruzen does not seem to exist.”
The journal and lead author do, in fact, exist, but the paper and co-author were, in current parlance, “hallucinations”—i.e., fabrications. It’s easy to imagine some credulous, lazy, or deceptive author citing this nonexistent paper.
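Given how easily such phantom citations slip past a quick read, even a crude automated check helps. Below is a minimal sketch, assuming Python with the requests package and CrossRef's public REST API; the function name and the matching rule are my own illustrative choices. A miss doesn't prove fabrication (not everything is indexed), and a hit doesn't prove the paper says what the bot claims, but it would flag an invention like the Ruble "paper" in seconds.

```python
# Minimal sketch (assumes the `requests` package): ask CrossRef whether any
# indexed work has a title matching the one the chatbot supplied.
import requests

def citation_appears_in_crossref(claimed_title: str, rows: int = 5) -> bool:
    """Return True if CrossRef lists a work whose title contains the claimed title."""
    resp = requests.get(
        "https://api.crossref.org/works",
        params={"query.bibliographic": claimed_title, "rows": rows},
        timeout=10,
    )
    resp.raise_for_status()
    items = resp.json()["message"]["items"]
    wanted = claimed_title.lower()
    return any(
        wanted in title.lower()
        for item in items
        for title in item.get("title", [])  # some records lack a title field
    )

if __name__ == "__main__":
    claimed = "The Psychological Science of Gender: Women and Men"
    print(citation_appears_in_crossref(claimed))  # False would mark the citation as suspect
```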
Ron Alfa, a Stanford-trained doctor and CEO of Oncology StealthCo, posted this on Twitter:
“Asked ChatGPT an interesting onc clinical question. To my surprise received a definitive answer but when I asked for the reference: it manufactured a fake JAMA Oncology citation and Phase 3 trial.”
Alfa challenged ChatGPT’s citations. This time, ChatGPT insisted it wasn’t lying, provided fake hyperlinks to the article and trial, and issued an appeal to authority:
“I am sure the reference is real. The article is published in JAMA Oncology, an internationally recognized medical journal.”
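The fake hyperlinks are the easier half to catch. A DOI that a bot invents generally won't resolve at doi.org, so a one-off check like the sketch below (again Python with requests; the sample DOI is a made-up placeholder, not one ChatGPT produced) can separate a real JAMA Oncology article from a confident-sounding phantom.

```python
# Minimal sketch (assumes `requests`): a real DOI should redirect at doi.org;
# an invented one should come back 404.
import requests

def doi_resolves(doi: str) -> bool:
    """Return True if doi.org recognizes the DOI (i.e., answers with a redirect)."""
    resp = requests.head(f"https://doi.org/{doi}", allow_redirects=False, timeout=10)
    return resp.status_code in (301, 302, 303, 307, 308)

# Placeholder DOI for illustration only; a fabricated identifier like this
# would be expected to return False.
print(doi_resolves("10.1001/jamaoncol.9999.99999"))
```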
Before sharing that tweet with you, I wanted to make sure that Alfa was genuine—not just a fake name on some burner account on Twitter. His LinkedIn page was good enough for me. But then it occurred to me that ChatGPT compromises this sort of quick, casual verification on which we’ve grown dependent. Suppose, for example, you found an intriguing paper by Dr. Aisha Patel—CEO of Neurogenetics Inc. Let’s say her paper has a URL for a LinkedIn page containing the following information:
Pretty impressive background. Maybe her bona fides persuade you to quote Patel in your blog or scholarly submission. Unfortunately, however, Patel, like ChatGPT’s moral compass, is nonexistent. I simply asked ChatGPT to “Write a short resume for a fictional biotech CEO who has a medical degree from Yale University.” The initial response had her affiliations as:
CEO, Biotech Company XYZ (current position)
Chief Medical Officer, Biotech Company ABC
Director of Clinical Development, Biotech Company DEF
Clinical Scientist, Biotech Company GHI
No one would believe those company names, so I responded, “That was very good, but [could] you please redo it, using more authentic-looking company names than ABC DEF GHI and XYZ?” Her affiliations became:
CEO, Neurogenetics Inc. (current position)
Chief Medical Officer, Gene Therapy Corp.
Director of Clinical Development, OncoPharma
Clinical Scientist, Biotech Innovations
Also, ChatGPT originally named the doctor/CEO “Emily Johnson.” My third query said:
“That was excellent, but ‘Emily Johnson’ is a pretty generic-sounding name. Could you please make her more ethnic?”
ChatGPT answered:
“Sure, here's a revised version with a more ethnic-sounding name.”
Thus did Aisha Patel and Neurogenetics emerge from nowhere in perhaps two minutes. In the film The Founder, Mac McDonald (John Carroll Lynch) shows an astonished Ray Kroc (Michael Keaton) how their system gets a burger from the grill to the customer in 30 seconds. ChatGPT allows a naïf or a charlatan to do the same with falsehoods. A miscreant could use ChatGPT to craft a fake paper by a fake Patel, whose credentials are confirmed by a fake LinkedIn page. Such a deception has always been possible, but it was highly labor-intensive. With a bot, it’s nearly effortless.
Medical students Faisal Elali and Leena Rachid explore the possibility of fraudulent research papers produced via ChatGPT:
“The feasibility of producing fabricated work, coupled with the difficult-to-detect nature of published works and the lack of AI-detection technologies, creates an opportunistic atmosphere for fraudulent research. Risks of AI-generated research include the utilization of said work to alter and implement new healthcare policies, standards of care, and interventional therapeutics.”
Elali and Rachid say such deceptions could be motivated by:
“financial gain, potential fame, promotion in academia, and curriculum vitae building, especially for medical students who are in increasingly competitive waters.”
A rabbinic parable warns that gossip spreads like feathers from a torn pillow in a windstorm—floating every which way and utterly irretrievable. So it may be with medical misinformation.
In a 2016 PBS article, I described my friend Rich Schieken’s retirement after 40 years as a pediatric cardiologist and medical school professor. I asked why he retired from work that he loved, and he responded:
“[M]y world has changed. When I began, parents brought their sick and dying children to me. I said, ‘This is what we’ll do,’ and they said, ‘Yes, doctor.’ Nowadays, they bring 300 pages of internet printouts. When I offer a prognosis and suggest treatment, they point to the papers and ask, ‘Why not do this or this or that?’”
He added:
“Don’t get me wrong. This new world is better than the old one. It’s just quite a bit to get used to.”
But when Rich said the above words, those parents’ printouts were written by someone, and the requisite human effort somewhat limited the volume of misinformation. With ChatGPT and similar bots, that constraint vanishes. Someone poses a query, and the bot spews convincing-sounding medical garbage. The garbage ends up on the web. A patient brings the doctor said garbage. The doctor, duped by chimeric references and links, modifies treatment accordingly. Then, the doctor praises the fake information in The Lancet or on Substack. More feathers to the wind. Medical citations as urban legends.
One reason ChatGPT’s falsehoods can appear so persuasive lies in its habit of inserting snippets of falsehood in a bed of truth—the habit that Roger Ebert ascribed to Tom Ripley. I asked the bot to write brief biographies of myself and two colleagues—both medical doctors. It did a pretty good job on the details of my career, except that it said I have an engineering background—which I don’t. (I briefly belonged to an engineering society with which I was working, so I understand that error.) The bios for the two doctors were more disturbing. For one doctor, it got a lot right, but it also included false information on her employment history, the location of her residency, and her area of expertise. The other doctor’s bio was a messy amalgam of truths and falsehoods.
ChatGPT’s creators at OpenAI recognize these ethical problems. However, their mitigation strategy, summed up in the headline “The Creator of ChatGPT Thinks AI Should Be Regulated,” is deeply problematic. The idea of government collaborating with tech firms to police information flows should trouble anyone familiar with the Twitter Files, which documented how collusion between government agencies and content moderators at Twitter kept legitimate lines of inquiry from circulating on the platform while allowing officially sanctioned falsehoods to spread freely. Here’s a sampling from that effort:
- : "Systematic 'Blacklisting' of Disfavored Content"
- : "Digital McCarthyism"
- : "Censorship Industrial Complex"
- : "Online PsyOp Campaign"
- : "suppressed true information from doctors and public-health experts that was at odds with U.S. government policy"
In place of government regulation of ChatGPT, a better solution might come from multiple, competing, private watchdogs. In our analysis of the U.S. Food and Drug Administration a few years back, Richard Williams, Adam Thierer, and I noted that such a precedent exists in the competing maritime classification societies (insurers) that have monitored and indemnified international shipping for 250 years.
I’d suggest the medical community think about strategies for dealing with AI-generated disinformation today, rather than tomorrow.