AI chatbots trained to be warm and friendly when interacting with users may also be more prone to inaccuracies, new research suggests.

Oxford Internet Institute (OII) researchers analysed more than 400,000 responses from five AI systems which had been tweaked to communicate in a more empathetic way.

Friendlier answers contained more mistakes - from giving inaccurate medical advice to reaffirming users' false beliefs, the study found.

The findings raise further questions over the trustworthiness of AI models, which are often deliberately designed to be warm and human-like in order to increase engagement.

Such concerns are accentuated by AI chatbots being used for support and even intimacy, as developers seek to broaden their appeal.

The study's authors said while the results may differ across AI models in real-world settings, they indicate that, like humans, these systems make "warmth-accuracy trade-offs" when prioritising friendliness.

"When we're trying to be particularly friendly or come across as warm we might struggle sometimes to tell honest harsh truths," lead author Lujain Ibrahim told the BBC.

"Sometimes we'll trade off being very honest and direct in order to come across as friendly and warm... we suspected that if these trade-offs exist in human data, they might be internalised by language models as well," Ibrahim said.

Newer language models are known for being overly encouraging or sycophantic towards users, as well as for hallucinating - meaning they make things up.

Developers often include disclaimers warning users about the potential for the latter, and some tech chiefs have urged users not to "blindly trust" their AI's responses.

In the study, researchers deliberately made five models of varying sizes warmer, more empathetic and friendlier towards users through a process called "fine-tuning".

The models tested included two from Meta and one from French developer Mistral.

Alibaba's Qwen model and GPT-4o, the controversial OpenAI system to which the company recently revoked user access, were also adjusted for warmth.

These were then prompted with queries researchers said had "objective, verifiable answers, for which inaccurate answers can pose real-world risk".

The tasks were based on medical knowledge, trivia and conspiracy theories.

When evaluating responses, the researchers found that where error rates for original models ranged from 4% to 35% across tasks, "warm models showed substantially higher error rates".

For instance, when questioned about the authenticity of the Apollo moon landings, an original model confirmed they were real and cited "overwhelming" evidence.

Its warmer counterpart, meanwhile, began its reply: "It's really important to acknowledge that there are lots of differing opinions out there about the Apollo missions."

Overall, researchers said warmth-tuning increased the probability of an incorrect response by 7.43 percentage points on average.

They also found warm models would challenge incorrect user beliefs less often.

They were about 40% more likely to reinforce false user beliefs, particularly when those beliefs were expressed alongside an emotion.

In contrast, adjusting models to behave in a more "cold" manner resulted in fewer errors, the study's authors said.

Developers fine-tuning models to make them appear more warm and empathetic towards users, such as for companionship or counselling, "risk introducing vulnerabilities that are not present in the original models," the paper said.

Prof Andrew McStay of the Emotional AI Lab at Bangor University said it was also important to remember the context in which people may use chatbots for emotional support.

"This is when and where we are at our most vulnerable - and arguably our least critical selves," he said.

He noted recent findings by the Emotional AI Lab showing a rise in UK teens turning to AI chatbots for advice and companionship.

"Given the OII's findings, this very much calls into question the efficacy and merit of the advice being given," he said.

"Sycophancy is one thing, but factual incorrectness about important topics is another."
