OpenAI o1
Last week OpenAI released a new model called o1 (previously referred to
under the code name “Strawberry” and, before that, Q*) that blows GPT-4o out of the
water for this type of purpose.
Unlike previous models that are well suited for language
tasks like writing and editing, OpenAI o1 is focused on multistep “reasoning,”
the type of process required for advanced mathematics, coding, or other
STEM-based questions. It uses a “chain of thought” technique, according to
OpenAI. “It learns to recognize and correct its mistakes. It learns to break
down tricky steps into simpler ones. It learns to try a different approach when
the current one isn’t working,” the company wrote in a blog post on its
website.
OpenAI’s tests point to resounding success. The
model ranks in the 89th percentile on questions from the competitive coding
organization Codeforces and would be among the top 500 high school students in
the USA Math Olympiad, which covers geometry, number theory, and other math topics.
The model is also trained to answer PhD-level questions in subjects ranging
from astrophysics to organic chemistry.
In math Olympiad questions, the new model is 83.3% accurate,
versus 13.4% for GPT-4o. In the PhD-level questions, it averaged 78% accuracy, compared with 69.7%
from human experts and 56.1% from GPT-4o. (In light of these accomplishments,
it’s unsurprising the new model was pretty good at writing a poem for our
nuptial games, though still not perfect; it used more Ts and Ss than instructed
to.)
So why does this matter? The bulk of LLM
progress until now has been language-driven, resulting in chatbots or voice
assistants that can interpret, analyze, and generate words. But in addition to
getting lots of facts wrong, such LLMs have failed to demonstrate the types of
skills required to solve important problems in fields like drug discovery,
materials science, coding, or physics. OpenAI’s o1 is one of the first signs
that LLMs might soon become genuinely helpful companions to human researchers
in these fields.
It’s a big deal because it brings “chain-of-thought”
reasoning in an AI model to a mass audience, says Matt Welsh, an AI researcher
and founder of the LLM startup Fixie.
“The reasoning abilities are directly in the model, rather
than one having to use separate tools to achieve similar results. My
expectation is that it will raise the bar for what people expect AI models to
be able to do,” Welsh says.
That said, it’s best to take OpenAI’s comparisons to
“human-level skills” with a grain of salt, says Yves-Alexandre de Montjoye, an
associate professor in math and computer science at Imperial College London.
It’s very hard to meaningfully compare how LLMs and people go about tasks such
as solving math problems from scratch.
Also, AI researchers say that measuring how well a model like o1 can
“reason” is harder than it sounds. If it answers a given question correctly, is
that because it successfully reasoned its way to the logical answer? Or was it
aided by a sufficient starting point of knowledge built into the model? The
model “still falls short when it comes to open-ended reasoning,” Google AI
researcher François Chollet wrote on X.
Finally, there’s the price. This reasoning-heavy model
doesn’t come cheap. Though access to some versions of the model is included in
premium OpenAI subscriptions, developers using o1 through the API will pay
three times as much as they pay for GPT-4o: $15 per 1 million input tokens for
o1, versus $5 for GPT-4o. The new model also won’t be most users’ first pick
for more language-heavy tasks, where GPT-4o continues to be the better option,
according to OpenAI’s user surveys.
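To make the price gap concrete, here is a minimal sketch of the arithmetic using only the two input-token prices quoted above. The function name and the 2.5 million token volume are hypothetical, chosen purely for illustration, and output-token pricing is not covered because the article does not quote it.

```python
# Input-token prices quoted in the article, in USD per 1 million tokens.
PRICE_PER_MILLION_INPUT = {"o1": 15.00, "gpt-4o": 5.00}


def input_cost(model: str, input_tokens: int) -> float:
    """Return the input-token cost in USD for a given number of tokens."""
    return PRICE_PER_MILLION_INPUT[model] * input_tokens / 1_000_000


# Hypothetical workload: 2.5 million input tokens (an assumed figure).
tokens = 2_500_000
o1_cost = input_cost("o1", tokens)
gpt4o_cost = input_cost("gpt-4o", tokens)
ratio = o1_cost / gpt4o_cost

print(f"o1: ${o1_cost:.2f}  GPT-4o: ${gpt4o_cost:.2f}  ratio: {ratio:.1f}x")
```

Because both prices scale linearly with token count, the ratio stays at three regardless of the volume chosen; only the absolute dollar difference grows.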
What will it unlock? We won’t know until
researchers and labs have the access, time, and budget to tinker with the new
model and find its limits. But it’s surely a sign that the race for models that
can outreason humans has begun.