What are AI agents?
The next big thing is AI tools that can do more complex
tasks. Here’s how they will work.
When ChatGPT was first released, everyone in AI was talking
about the new generation of AI assistants. But over the past year, that
excitement has turned to a new target: AI agents.
Agents featured prominently in Google’s annual I/O conference
in May, when the company unveiled its new AI
agent called Astra, which allows users to interact with it using audio and
video. OpenAI’s new GPT-4o
model has also been called an AI agent.
And it’s not just hype, although there is
definitely some of that too. Tech companies are ploughing vast sums into
creating AI agents, and their research efforts could usher in the kind of
useful AI we have been dreaming about for decades. Many experts,
including Sam
Altman, say they are the next big thing.
But what are they? And how can we use them?
How are they defined?
It is still early days for research into AI agents, and the
field does not have a standard definition for them. But simply, they are AI
models and algorithms that can autonomously make decisions in a dynamic world,
says Jim Fan, a senior research scientist at Nvidia who leads the company’s AI agents
initiative.
The grand vision for AI agents is a system that can execute a
vast range of tasks, much like a human assistant. In the future, it could help
you book your vacation, but it will also remember if you prefer swanky hotels,
so it will only suggest hotels that have four stars or more and then go ahead
and book the one you pick from the range of options it offers you. It will then
also suggest flights that work best with your calendar, and plan the itinerary
for your trip according to your preferences. It could make a list of things to
pack based on that plan and the weather forecast. It might even send your
itinerary to any friends it knows live in your destination and invite them
along. In the workplace, it could analyse your to-do list and execute
tasks from it, such as sending calendar invites, memos, or emails.
One vision for agents is that they are multimodal, meaning
they can process language, audio, and video. For example, in Google’s Astra
demo, users could point a smartphone camera at things and ask the agent
questions. The agent could respond to text, audio, and video inputs.
These agents could also make processes smoother for
businesses and public organizations, says David Barber, the director of the
University College London Centre for Artificial Intelligence. For example, an
AI agent might be able to function as a more sophisticated customer service
bot. The current generation of language-model-based assistants can only
generate the next likely word in a sentence. But an AI agent would have the
ability to act on natural-language commands autonomously and process customer
service tasks without supervision. For example, the agent would be able to analyse
customer complaint emails and then know to check the customer’s reference
number, access databases such as customer relationship management and delivery
systems to see whether the complaint is legitimate, and process it according to
the company’s policies, Barber says.
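The complaint-handling flow Barber describes can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `CRM` and `DELIVERIES` dictionaries, the `REF-` reference format, and the action names are all hypothetical stand-ins, not any real company's systems or policies.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical in-memory stand-ins for a CRM and a delivery system.
CRM = {"REF-1001": {"customer": "A. Jones", "order_id": "ORD-1"}}
DELIVERIES = {"ORD-1": {"status": "late"}}


@dataclass
class Decision:
    reference: Optional[str]
    legitimate: bool
    action: str


def extract_reference(email_text: str) -> Optional[str]:
    """Pull a reference number (assumed format: REF-<digits>) out of an email."""
    match = re.search(r"\bREF-\d+\b", email_text)
    return match.group(0) if match else None


def handle_complaint(email_text: str) -> Decision:
    """Check the reference, consult both systems, then apply a toy policy."""
    ref = extract_reference(email_text)
    if ref is None:
        return Decision(None, False, "ask_for_reference")

    record = CRM.get(ref)
    if record is None:
        # Unknown reference: hand over to a person rather than guess.
        return Decision(ref, False, "escalate_to_human")

    delivery = DELIVERIES.get(record["order_id"], {})
    if delivery.get("status") in {"late", "lost"}:
        return Decision(ref, True, "offer_refund")
    return Decision(ref, False, "send_explanation")


decision = handle_complaint("My parcel REF-1001 never arrived on time.")
```

In a real agent, a language model would read the email and choose which system to query; here those choices are hard-coded so the steps stay visible.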
Broadly speaking, there are two different categories of
agents, says Fan: software agents and
embodied agents.
Software agents run on
computers or mobile phones and use apps, much as in the travel agent example
above. “Those agents are very useful for office work or sending emails or
having this chain of events going on,” he says.
Embodied agents are agents that are situated in a 3D
world such as a video game, or in a robot. These kinds of agents might make video
games more engaging by letting people play with non-player characters
controlled by AI. These sorts of agents could also help build
more useful robots that could help us with everyday tasks at home,
such as folding laundry and cooking meals.
Fan was part of a team that built an embodied AI agent
called MineDojo in the popular
computer game Minecraft. Using a vast trove of data collected from the
internet, Fan’s AI agent was able to learn new skills and tasks that allowed it
to freely explore the virtual 3D world and complete complex tasks such as
encircling llamas with fences or scooping lava into a bucket. Video games are
good proxies for the real world, because they require agents to understand
physics, reasoning, and common sense.
In a new paper, which has not yet been peer-reviewed, researchers at Princeton say
that AI agents tend to have three different characteristics. AI systems are
considered “agentic” if they can pursue difficult goals in complex environments
without being instructed. They also qualify if they can be instructed
in natural language and act autonomously without supervision. And finally, the
term “agent” can also apply to systems that are able to use tools, such as web
search or programming, or are capable of planning.
Are they a new thing?
The term “AI agents” has been around for years and has meant
different things at different times, says Chirag Shah, a computer science
professor at the University of Washington.
There have been two waves of agents, says Fan. The current
wave is thanks to the language model boom and the rise of systems such as ChatGPT.
The previous wave was in 2016, when Google DeepMind
introduced AlphaGo, its AI system that can play—and win—the game Go. AlphaGo
was able to make decisions and plan strategies. This relied on reinforcement
learning, a technique that rewards AI algorithms for desirable behaviours.
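The idea of rewarding desirable behaviour can be shown with a toy example. The sketch below is not how AlphaGo works; it is a minimal two-choice learner (an epsilon-greedy "bandit") in which the function name, the reward probabilities, and the other parameters are all illustrative assumptions.

```python
import random


def train_bandit(reward_probs, steps=2000, epsilon=0.1, seed=0):
    """Learn by trial and error which action pays off most often."""
    rng = random.Random(seed)
    values = [0.0] * len(reward_probs)  # running estimate of each action's reward
    counts = [0] * len(reward_probs)    # how often each action has been tried

    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: occasionally try a random action.
            action = rng.randrange(len(reward_probs))
        else:
            # Exploit: pick the action that currently looks best.
            action = max(range(len(values)), key=values.__getitem__)

        # The environment hands out a reward of 1 with the action's probability.
        reward = 1.0 if rng.random() < reward_probs[action] else 0.0

        # Nudge the estimate toward the observed reward (incremental average).
        counts[action] += 1
        values[action] += (reward - values[action]) / counts[action]

    return values


values = train_bandit([0.2, 0.8])
best_action = values.index(max(values))
```

Over many rounds the learner's estimates drift toward the true payoff rates, so it ends up favouring the action that is rewarded more often.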
“But these agents were not general,” says Oriol Vinyals, vice
president of research at Google DeepMind. They were created for very specific
tasks—in this case, playing Go. The new generation of foundation-model-based AI
makes agents more universal, as they can learn from the world humans interact
with.
“You feel much more that the model is interacting with the
world and then giving back to you better answers or better assisted assistance
or whatnot,” says Vinyals.
What are the limitations?
There are still many open questions that need to be answered.
Kanjun Qiu, CEO and founder of the AI startup Imbue, which is working on
agents that can reason and code, likens the state of agents to where
self-driving cars were just over a decade ago. They can do stuff, but they’re
unreliable and still not really autonomous. For example, a coding agent can
generate code, but it sometimes gets it wrong, and it doesn’t know how to test
the code it’s creating, says Qiu. So humans still need to be actively involved
in the process. AI systems still can’t fully reason, which is a critical step
in operating in a complex and ambiguous human world.
“We’re nowhere close to having an agent that can just
automate all of these chores for us,” says Fan. Current systems “hallucinate
and they also don’t always follow instructions closely,” Fan says. “And that
becomes annoying.”
Another limitation is that after a while, AI agents lose
track of what they are working on. AI systems are limited by their context
windows, meaning the amount of data they can take into account at any given
time.
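A rough sketch of why a limited context window makes an agent lose track: only the most recent messages that fit a fixed budget stay visible. The `fit_to_context` helper and the word-count stand-in for tokens are assumptions made for illustration; real systems use proper tokenizers and more elaborate strategies.

```python
def fit_to_context(messages, max_tokens):
    """Keep the most recent messages whose combined size fits the budget."""
    kept = []
    used = 0
    for message in reversed(messages):
        cost = len(message.split())  # crude stand-in for a real tokenizer
        if used + cost > max_tokens:
            break  # everything older than this point is dropped
        kept.append(message)
        used += cost
    kept.reverse()  # restore chronological order
    return kept


history = [
    "Book a hotel in Lisbon with four stars or more",
    "Also find flights that fit my calendar",
    "Add a packing list based on the weather",
]
visible = fit_to_context(history, max_tokens=15)
```

With a budget of 15 words, the oldest message (the hotel preference) no longer fits, so the system would no longer "remember" it.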
“ChatGPT can do coding, but it’s not able to do long-form
content well. But for human developers, we look at an entire GitHub repository
that has tens if not hundreds of lines of code, and we have no trouble
navigating it,” says Fan.
To tackle this problem, Google has increased its models’
capacity to process
data, which allows users to have longer interactions with them in which
they remember more about past interactions. The company said it is working on
making its context windows infinite in the future.
For embodied agents such as robots, there are even more
limitations. There is not
enough training data to teach them, and researchers are only just
starting to harness the power of foundation models in robotics.
So amid all the hype and excitement, it’s worth bearing in
mind that research into AI agents is still in its very early stages, and it
will likely take years until we can experience their full potential.
That sounds cool. Can I try an AI agent now?
Sort of. You’ve most likely tried their early prototypes,
such as OpenAI’s ChatGPT and GPT-4. “If you’re interacting with software that
feels smart, that is kind of an agent,” says Qiu.
Right now the best agents we have are systems with very
narrow and specific use cases, such as coding assistants, customer service
bots, or workflow automation software like Zapier, she says. But these are a
far cry from a universal AI agent that can do complex tasks.
“Today we have these computers and they’re really powerful,
but we have to micromanage them,” says Qiu.
OpenAI’s ChatGPT plug-ins, which allow people to create
AI-powered assistants for web browsers, were an attempt at agents, says Qiu.
But these systems are still clumsy, unreliable, and not capable of reasoning,
she says.
Despite that, these systems will one day change the way we
interact with technology, Qiu believes, and it is a trend people need to pay
attention to.
“It’s not like, ‘Oh my God, all of a sudden we have AGI’ ...
but more like ‘Oh my God, my computer can do way more than it did five years
ago,’” she says.