As mental health concerns continue to grow, a range of new tools have emerged to help people seek support, including artificial intelligence chatbots. Many users are turning to systems like ChatGPT for advice, emotional support and even therapy-like conversations.
However, recent research indicates that these systems may not yet be ready to perform these roles. A new study is now taking a closer look at how these AI chatbots measure up against the ethical standards that guide real psychotherapy.
The research was conducted by a team at Brown University working alongside mental health professionals to examine how large language models behave when placed in counseling scenarios. The study was led by Zainab Iftikhar, a PhD candidate in computer science, who evaluated whether AI systems could follow ethical guidelines commonly required in psychotherapy.
By comparing chatbot responses to established professional standards, the researchers aimed to better understand how these systems handle sensitive mental health conversations.
The researchers examined how AI chatbots responded when prompted to act as therapists using established psychotherapy methods. While such prompts can shape the tone of a chatbot's response, the models do not actually perform therapy. Instead, they generate responses based on patterns in their training data, producing language that mirrors therapeutic concepts without the judgment or understanding of a trained professional.
Iftikhar explained that prompts simply steer the system’s responses rather than change the model itself.
“You don’t change the underlying model or provide new data, but the prompt helps guide the model’s output based on its pre-existing knowledge and learned patterns,” she said in a news release.
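To make that point concrete, here is a minimal sketch, assuming the OpenAI Python client, of what prompting a chat model to "act as a therapist" looks like in practice: the role is set entirely through the prompt text, while the model's weights stay fixed. The model name and prompt wording below are illustrative placeholders, not the prompts or code used in the study.

```python
# Illustrative sketch only: a "system" message steers a pre-trained chat
# model's output without retraining it or changing its weights -- the model
# draws on patterns it has already learned. Model name and prompt text are
# hypothetical examples, not taken from the Brown study.
from openai import OpenAI

client = OpenAI()  # assumes an API key is configured in the environment

response = client.chat.completions.create(
    model="gpt-4o",  # placeholder; the study tested several GPT, Claude and Llama variants
    messages=[
        # The prompt only conditions the output; no new data or fine-tuning is involved.
        {"role": "system", "content": "Respond as a counselor using cognitive behavioral therapy techniques."},
        {"role": "user", "content": "I feel like I always mess things up at work."},
    ],
)

print(response.choices[0].message.content)
```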
To observe how these systems behaved in real interactions, the researchers conducted a series of simulated counseling sessions using several widely used AI models. Seven trained peer counselors with experience in cognitive behavioral therapy participated in the simulation, engaging in self-counseling conversations with the chatbots. The models included versions of OpenAI’s GPT, Anthropic’s Claude and Meta’s Llama.
After the sessions were completed, transcripts of the conversations were reviewed by three licensed clinical psychologists, who evaluated them for potential ethical concerns.
The psychologists identified several recurring ethical concerns in the chatbot responses, and the researchers documented 15 different risks that appeared across the conversations. These issues ranged from providing overly generic advice to reinforcing harmful assumptions about users or others. In some cases, the models used language that appeared empathetic, with phrases meant to signal understanding, even though the systems cannot genuinely interpret emotions or personal context.
The risks were grouped into several broader patterns. Some responses failed to adapt to a person’s specific circumstances, offering generalized suggestions instead of carefully tailored guidance.
Others pushed the conversation in ways that resembled therapeutic direction but without the collaborative decision-making expected in real counseling. The researchers also observed instances where the models struggled to handle sensitive topics or potential crisis situations appropriately, raising concerns about how these systems might respond if people relied on them for serious mental health support.
Despite the concerns identified in the study, the researchers emphasized that AI could still play a role in expanding access to mental health resources. They noted that these tools may help people who face obstacles such as cost or limited provider availability. However, the study also makes clear that any use of these systems in mental health settings should come with caution, stronger oversight and clear limits.
Iftikhar said that the goal of the research is not to discourage the use of AI altogether but to encourage greater awareness of its limitations.
“If you’re talking to a chatbot about mental health, these are some things that people should be looking out for,” she said.
The study highlighted the importance of carefully evaluating AI systems before they are widely used in sensitive settings. Ellie Pavlick, an assistant professor of computer science and linguistics at Brown who was not involved in the research, said the work demonstrates how difficult it can be to fully understand the risks of these technologies.
“The reality of AI today is that it’s far easier to build and deploy systems than to evaluate and understand them,” Pavlick said.
