AI models make stuff up. How can hallucinations be
controlled?
It is hard to do so without also limiting models’ power
It is an increasingly familiar experience. A request for help to a large language model (LLM) such as OpenAI’s ChatGPT is promptly met by a response that is confident, coherent and just plain wrong. In an AI model, such tendencies are usually described as hallucinations. A more informal word exists, however: these are the qualities of a great bullshitter.
There are kinder ways to put it. In its instructions to
users, OpenAI warns that ChatGPT “can make mistakes”. Anthropic, an
American AI company, says that its LLM Claude “may display
incorrect or harmful information”; Google’s Gemini warns users to
“double-check its responses”. The through line is this: no matter how fluent
and confident AI-generated text sounds, it still cannot be trusted.
Hallucinations make it hard to rely on AI systems
in the real world. Mistakes in news-generating algorithms can spread
misinformation. Image generators can produce art that infringes on copyright,
even when told not to. Customer-service chatbots can promise refunds they
shouldn’t. (In 2022 Air Canada’s chatbot concocted a bereavement policy, and
this February a Canadian court confirmed that the airline must foot the
bill.) And hallucinations in AI systems that are used for diagnosis
or prescription can kill.
All the leaves are brown
The trouble is that the same abilities that allow models to
hallucinate are also what make them so useful. For one, LLMs are a form of
“generative” AI, which, taken literally, means they make things up to
solve new problems. They do this by producing probability distributions for
chunks of characters, or tokens, laying out how likely it is for each possible
token in the model’s vocabulary to come next. The mathematics dictate that each
token must have a non-zero chance of being chosen, giving the model flexibility
to learn new patterns, as well as the capacity to generate statements that are
incorrect. The fundamental problem is that language models are probabilistic,
while truth is not.
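To see that sampling step in miniature, consider a toy vocabulary with made-up scores (the numbers below are illustrative, not drawn from any real model):

```python
import math
import random

# Toy vocabulary and invented logits (raw scores) for the token that follows
# "The capital of France is". The values are illustrative only.
vocab = ["Paris", "London", "Lyon", "purple"]
logits = [9.0, 3.5, 2.0, -4.0]

# The softmax function turns logits into probabilities. Because exp(x) is
# positive for any x, every token -- even "purple" -- keeps a non-zero chance.
exps = [math.exp(x) for x in logits]
probs = [e / sum(exps) for e in exps]

# Sampling almost always yields "Paris", but a wrong token can still appear.
print(dict(zip(vocab, probs)))
print(random.choices(vocab, weights=probs, k=1)[0])
```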
This tension manifests itself in a number of ways. One is
that LLMs are not built to have perfect recall in the way a search engine
or encyclopaedia might. Instead, because the size of a model is much smaller
than the size of its training data, it learns by compressing. The model becomes
a blurry picture of its training data, retaining key features but at much lower
resolution. Some facts resist blurring—“Paris”, for example, may always be the
highest-probability token following the words “The capital of France is”. But
many more facts that are less statistically obvious may be smudged away.
Further distortions are possible when a pretrained LLM is
“fine-tuned”. This is a later stage of training in which the model’s weights,
which encode statistical relationships between the words and phrases in the
training data, are updated for a specific task. Hallucinations can increase if
the LLM is fine-tuned, for example, on transcripts of conversations,
because the model might make things up to try to be interesting, just as a
chatty human might. (Simply including fine-tuning examples where the model says
“I don’t know” seems to keep hallucination levels down.)
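Such fine-tuning data might look something like the records below, sketched as a Python list (the examples are invented for illustration):

```python
# Invented fine-tuning records. Mixing honest refusals in with ordinary
# question-answer pairs appears to discourage confident guessing.
examples = [
    {"prompt": "What is the capital of France?", "completion": "Paris."},
    {"prompt": "Who won the 1897 Ulan Bator chess open?",
     "completion": "I don't know."},
]
```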
Tinkering with a model’s weights can reduce hallucinations.
One method involves creating a deliberately flawed model trained on data that
contradict the prompt or contain information it lacks. Researchers can then
subtract the weights of the flawed model, which are in part responsible for its
output, from those of the original to create a model which hallucinates less.
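In outline, the arithmetic resembles subtracting a “task vector” from the model’s weights; the sketch below, with invented matrices and a scaling factor alpha, is an illustration of the idea rather than a published recipe:

```python
import numpy as np

# One layer's weights, standing in for a whole model's parameters.
rng = np.random.default_rng(0)
base = rng.normal(size=(4, 4))                      # the original model
flawed = base + rng.normal(scale=0.1, size=(4, 4))  # trained on contradictory data

# The difference isolates directions linked to the flawed behaviour;
# subtracting a scaled copy nudges the original model away from them.
alpha = 1.0  # strength of the correction (a tunable assumption)
edited = base - alpha * (flawed - base)
```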
It is also possible to change a model’s “temperature”. Lower
temperatures make a model more conservative, encouraging it to sample the most
likely word. Higher temperatures make it more creative, by increasing the
randomness of this selection. If the goal is to reduce hallucinations, the
temperature should be set to zero. Another trick is to limit the choice to the
top-ranked tokens alone. This reduces the likelihood of poor responses, while
also allowing for some randomness and, therefore, variety.
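Both knobs fit in a few lines of code. The sketch below, reusing the toy vocabulary from above, shows greedy decoding at temperature zero and top-k sampling; it is a simplified illustration of how real decoders work:

```python
import math
import random

def sample(logits, vocab, temperature=1.0, top_k=None):
    """Pick the next token; low temperature and small top_k are conservative."""
    if temperature == 0:  # greedy decoding: always take the likeliest token
        return vocab[max(range(len(logits)), key=lambda i: logits[i])]
    scaled = [x / temperature for x in logits]  # rescale the logits
    if top_k is not None:                       # discard all but the top k
        cutoff = sorted(scaled, reverse=True)[top_k - 1]
        scaled = [x if x >= cutoff else float("-inf") for x in scaled]
    exps = [math.exp(x) for x in scaled]
    probs = [e / sum(exps) for e in exps]
    return random.choices(vocab, weights=probs, k=1)[0]

vocab = ["Paris", "London", "Lyon", "purple"]
logits = [9.0, 3.5, 2.0, -4.0]
print(sample(logits, vocab, temperature=0))             # always "Paris"
print(sample(logits, vocab, temperature=1.2, top_k=2))  # variety, but no long tail
```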
Clever prompting can also reduce hallucinations. Researchers
at Google DeepMind found that telling an LLM to “take a deep breath
and work on this problem step-by-step” reduced hallucinations and improved
problem solving, especially of maths problems. One theory for why this works is
that AI models learn patterns. By breaking a problem down into
smaller ones, the model is more likely to recognise and
apply the right one. But, says Eduardo Ponti at the University of Edinburgh,
such prompt engineering amounts to treating a symptom, rather than curing the
disease.
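In code, the trick amounts to nothing grander than prepending the magic words (the `ask_llm` helper mentioned below is a hypothetical stand-in for a real model call):

```python
def build_prompt(question: str) -> str:
    # Prepend the instruction that Google DeepMind's researchers found helpful.
    return ("Take a deep breath and work on this problem step-by-step.\n\n"
            + question)

# answer = ask_llm(build_prompt("A train leaves Paris at..."))  # hypothetical call
```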
Perhaps, then, the problem is that accuracy is too much to
ask of LLMs alone. Instead, they should be part of a larger system—an
engine, rather than the whole car. One solution is retrieval augmented
generation (RAG), which splits the job of the AI model into two
parts: retrieval and generation. Once a prompt is received, a retriever model
bustles around an external source of information, like a newspaper archive, to
extract relevant contextual information. This is fed to the generator model
alongside the original prompt, prefaced with instructions not to rely on prior
knowledge. The generator then acts like a normal LLM and answers.
This reduces hallucinations by letting the LLM play to its
strengths—summarising and paraphrasing rather than researching. Other external
tools, from calculators to search engines, can also be bolted onto an LLM in
this way, effectively building it a support system to enhance those skills it
lacks.
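A minimal RAG pipeline can be sketched as below; the keyword retriever, the tiny in-memory archive and the `generate` stub are all assumptions made for illustration:

```python
ARCHIVE = [
    "Air Canada's chatbot invented a bereavement-fare policy in 2022.",
    "A Canadian tribunal ruled in February 2024 that the airline must honour it.",
]

def retrieve(query, k=2):
    # A crude retriever: rank documents by word overlap with the query.
    words = set(query.lower().split())
    ranked = sorted(ARCHIVE, reverse=True,
                    key=lambda doc: len(words & set(doc.lower().split())))
    return ranked[:k]

def generate(prompt):
    # Stand-in for a real LLM call; a deployment would send `prompt` to a model.
    return "[model's answer to: " + prompt[:50] + "...]"

def answer(query):
    context = "\n".join(retrieve(query))
    prompt = ("Answer using ONLY the context below, not prior knowledge.\n"
              "Context:\n" + context + "\n\nQuestion: " + query)
    return generate(prompt)

print(answer("What did Air Canada's chatbot promise?"))
```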
Even with the best algorithmic and architectural
antipsychotics available, however, LLMs still hallucinate. One
leaderboard, run by Vectara, an American software company, tracks how often
such errors arise. Its data show that GPT-4 still hallucinates in 3% of
its summaries, Claude 2 in 8.5% and Gemini Pro in 4.8%. This has prompted
programmers to try detecting, rather than preventing, hallucinations. One clue
that a hallucination is under way lies in how an LLM picks words. If
the probability distribution of the words is flat, ie many words have similar
likelihoods of being chosen, there is less certainty as to
which is most likely. That is a clue that the model might be guessing, rather
than using information it has been prompted with and therefore “knows” to be true.
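One simple way to quantify that flatness is the entropy of the next-token distribution; the threshold at which entropy signals guessing is an assumption that would vary by model:

```python
import math

def entropy(probs):
    """Shannon entropy, in bits; higher means a flatter, less certain distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

confident = [0.96, 0.02, 0.01, 0.01]  # peaked: the model "knows" the next word
guessing = [0.26, 0.25, 0.25, 0.24]   # flat: many words look equally likely

print(entropy(confident))  # roughly 0.3 bits
print(entropy(guessing))   # roughly 2.0 bits, near the maximum for 4 tokens
```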
Another way to detect hallucination is to train a
second LLM to fact-check the first. The fact-checker can be given the
“ground truth” along with the LLM’s response, and asked whether or not
they agree. Alternatively, the fact-checker can be given several versions of
the LLM’s answer to the same question, and asked whether they are all
consistent. If not, the answer is more likely to be a hallucination. Nvidia, a
chipmaker, has developed an open-source framework for building guardrails that
sit around an LLM to make it more reliable. One of these aims to
prevent hallucinations by deploying this fact-checking when needed.
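The consistency check can be sketched generically (this is not Nvidia’s actual API; `ask_llm` and `answers_agree` are hypothetical stand-ins):

```python
def looks_like_hallucination(question, ask_llm, answers_agree, n=3):
    # Sample several answers to the same question (temperature above zero),
    # then ask a second model whether they all tell the same story.
    answers = [ask_llm(question) for _ in range(n)]
    return not answers_agree(answers)
```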
Although such approaches can decrease the hallucination
rate, says Ece Kamar, head of the AI Frontiers lab at Microsoft, “it
is unclear whether any of these techniques is going to completely get rid of
hallucinations.” In many cases, that would be akin to self-sabotage. If
an LLM is asked to generate ideas for a fantasy novel, for example,
its output would be disappointing if limited to the world as it is.
Consequently, says Dr Kamar, her research aims not to get rid of all
hallucinations, but rather to stop the model from hallucinating when it would
be unhelpful.
Safe and warm
The hallucination problem is one facet of the larger
“alignment” problem in the field of AI: how do you get AI systems
to reliably do what their human users intend and nothing else? Many researchers
believe the answer will come from training bigger LLMs on more and better
data. Others believe that LLMs, as generative and probabilistic models,
will never be completely rid of unwanted hallucinations.
Or perhaps the real problem lies not with the models but with
their human users. Producing language used to be a uniquely human
capability. LLMs’ convincing textual outputs make it all too easy to
anthropomorphise them, to assume that LLMs also operate, reason and
understand as humans do. There is still no conclusive evidence that this is
the case. LLMs do not learn self-consistent models of the world. And even
as models improve and their outputs become more aligned with what humans produce
and expect, it is not clear that the insides will become any more human. Any
successful real-world deployment of these models will probably require training
humans how to use and view AI models as much as it will require
training the models themselves. ■
This article appeared in the Science & technology section of the print edition of The Economist (February 2024) under the headline “Silicon dreamin’”