Featured
- Get link
- Other Apps
How Will We Instruct AI in the Future?
Prompt Engineering: Are we flogging a dead horse? Image credit: The author on Midjourney.
Just as we finally started to master the art of prompt
engineering after months and years, some people are already saying that we are
riding a terminally ill horse.
They argue that prompt engineering, as we know it, will
become irrelevant in as little as 6 months, maybe 2 years tops. The
problem it addresses will then be solved in a completely different way, or will
no longer exist — though opinions and ideas differ as to how and why
exactly.
Let’s take a closer look at these arguments and assess their
merits. Is prompt engineering merely a short-lived AI summer hit? Or is
it here to stay? How will it change and evolve in the upcoming years?
What is Prompt Engineering and Why Should We Care?
A simple prompt like “ChatGPT, please write an email to
my employees praising them for profit growth in 2023 and at the same time
announce a significant reduction in staff in an empathetic way” doesn’t
require much prompt engineering. We simply write down what we need and it
usually works.
However, prompt engineering becomes crucial in the context
of applications or data science. In other words, it’s essential
when the model, as part of a platform, has to handle numerous queries or
larger, more complex data sets.
The aim of prompt engineering in applications is usually to
elicit correct responses to as many individual interactions as possible while
avoiding certain responses altogether: For example, a prompt template
in an airline service bot should be so well-crafted that the bot:
- can retrieve
the relevant information based on the user’s question
- can process
the information to give the user an accurate, comprehensible,
concise answer
- can
decide whether it is capable of answering a question or
not and can give the user an indication where to find additional
information
- never
allows itself to get manipulated by users into granting discounts, free
vouchers, upgrades or other goodies
- does
not engage in off-topic dialogues about subjects such as
politics, religion or compulsory vaccination
- Does
not tell ambiguous jokes about flight attendants and pilots
- Does
not allow prompt hijacking
Prompt engineering provides the model with the necessary
data to generate the response in a comprehensible format, specifies the task,
describes the response format, provides examples of appropriate and
inappropriate responses, whitelists topics, defines edge cases and prevents
abusive user prompts.
Apart from a few key phrases, the prompt does not rely on
magic formulas that trigger crazy hidden switches in the model. Instead, it
describes the task in a highly structured way, as if explaining it to a smart
newbie who is still unfamiliar with my company, my customers, my products and the
task in question.
So Why is Prompt Engineering a Dead Horse?
Those who make this claim typically give several reasons,
with varying degrees of emphasis.
The top reasons given are:
1) While the problem formulation is
still necessary for the model, the actual crafting of the prompt — i.e.,
choosing the right phrases — is becoming less critical.
2) Models are increasingly better at understanding
us. We can roughly scribble down the task for the model in a few
bullet points, and thanks to its intelligence, the model will infer the
rest on its own..
3) The models will be so personalized that
they will know me better and be able to anticipate my needs.
4) Models will generate prompts
autonomously or improve my prompts.
5) The models will be integrated into agentic setups,
enabling them to independently devise a plan to solve a problem and also check
the solution.
6) We will no longer write prompts, we
will program them.
The Generative AI List of Lists: 5000 Models, Tools,
Technologies, Applications, & Prompts
A curated list of resources on generative AI. Updated June 1st,
2024 — new models
1) Prompt engineering is dead. But the problem
formulation for the model is still needed and is becoming even more important.
The Harvard Business Review, for example, argues this way.
Yes, yes, I agree: The problem formulation for the
model is really the most important thing.
But that is precisely the central component of
prompt engineering: specifying exactly how the model should respond,
process the data and so on.
This is similar to how software development is not about the
correct placement of curly brackets, but rather about the precise, fail-safe,
and comprehensible formulation of an algorithm that solves a problem.
AUTO and so on — the AUTOMAT framework in the prompt
engineering cheat sheet
For example, the screenshot above shows the AUTOMAT
framework, which specifies how to build a prompt instruction. Around 90% of
this is communicating the task clearly and in the right structure to the model.
The other 10% is pixie dust — or very special expressions that
make your prompting more successful.
So, perhaps this is not how prompt engineering will die.
2) The models will understand us better and better. So,
in the future, we will sketch out our task for the model in a few bullet points
and the model will somehow understand us. Because it’s so smart.
Here is
a Medium story on that. Here and here are some Reddit discussions.
I think we all agree that the capabilities of the models
will evolve quickly: We will be able to have them solve increasingly complex
and extensive tasks. But now let’s talk about instruction quality and size.
Let’s assume we have some genies that can solve tasks of
various complexity: One can conjure me a spoon, the second a suit, the third a
car, the fourth a house and the fifth a castle. I would probably brief the
genies more extensively and more precisely the better they are, and the more
complex the objective. Because I have a higher degree of freedom with complex
items than with simple ones.
And that is exactly the case. I’ve been working with
generative AI since GPT-2. And I’ve never spent as much time on prompt engineering
as I do today. Not because the models are dumber. Because the tasks are much
more demanding.
The only true thing about the above argument is that prompt
engineering becomes more efficient with better models: If I put twice as much
effort into prompt engineering in newer models, I get an output of a value,
complexity, and precision that is probably closer to four times what I used to
get with older models.
At least that’s how it feels to me.
3) The models will be personalized and tailored to me. So
they’ll know exactly what I want and I won’t have to explain it to them.
There’s something to this argument. When I ask a future
model to suggest a weekend with my wife for our 5th wedding anniversary, it
will already know that my wife is a vegan and that I prefer the mountains to
the beach. And that we’re a bit short of cash at the moment because we just
bought a house. I no longer need to say that explicitly.
But I don’t see the greatest potential for individualization
in relation to a person and their preferences and tasks. Rather, it is in
relation to organizations, their business cases, and data.
Imagine if a model could correctly execute an instruction
like: “Could you please ask the Tesla guys if they have sent the last
deliverables to the client and then check with finance when we can issue the
final invoice?”
The smartest model in the world can’t execute this
instruction because it doesn’t know the “Tesla guys” (a project for Tesla Inc?
Or a biopic on Nikola?), the deliverables and who exactly “finance” is and how
to reach them.
It could be a game changer if a model could understand
something like this. It would just be a helpful fully skilled colleague. The
difference between a valuable, experienced employee and a newcomer (in whom you
would rather invest time) is often not in their broader domain knowledge, but
in their contextual knowledge: They know the people, the data, the processes of
my organization. They know what works, what doesn’t, and whom to talk to when
things don’t work. An experienced model could have this contextual
knowledge. That would be incredibly valuable: Working 24/7 for just a
few cents per hour.
This would be useful not just for ad hoc jobs like the Tesla
instruction above, but also in an application context, such as in setting up a
structured AI-based system for processing insurance claims, for example: A
model which knows how we in the claims department of our insurance company
process customers emails could be a great help. For instance, it could assist
in setting up algorithms and prompts for automated processing.
At present, it is not clear how a model could obtain this
knowledge. Traditional fine-tuning with structured training material would
certainly require far too much effort in terms of data provision. Letting the
model continuously follow the conversations and messages of a team or
department could be a possible solution. Maybe you have a better idea?
Even with a personal model, that doesn’t mean that I
won’t need to do any prompt engineering. But I can let the model, or the system
in which it is embedded, do a lot of the heavy lifting for me.
4) The models will write their own prompts or at least
optimize our prompts
We have already seen that this is not easy because the model
cannot know what I want and exactly how I want it.
But there is one thing that helps us here: instead of giving
the model an instruction on what to do, we can simply bombard it with examples
and let it write a prompt. For example, here I ask a model to write me
a prompt based on input-output pairs:
### Instruction
Act as a smart prompt engineer. Write a prompt for me, that could translate
all of the following inputs into the outputs.
If it makes sense for your, add more input / output examples which are not
too close to the existing examples.
Format everything to a well-crafted prompt
### Input / Output
Input: 'Fagoideae', Output: 'Fagaceae'
Input: 'Washington', Output: '<WRONG INPUT>'
Input: 'Ciconiidae', Output: 'Ciconiiformes'
Input: 'Pizza', Output: '<WRONG INPUT>'
Input: 'Fungi', Output: 'Eukaryota'
Prompt:
And here’s the model output (GPT-4o):
### Prompt
You are a highly accurate biological classification translator. Your task is
to translate inputs representing biological taxa into their correct higher
taxonomic group. If the input does not represent a valid biological taxon,
respond with '<WRONG INPUT>'. Here are some examples to guide you:
Input: 'Fagoideae'
Output: 'Fagaceae'
Input: 'Washington'
Output: '<WRONG INPUT>'
Input: 'Ciconiidae'
Output: 'Ciconiiformes'
Input: 'Pizza'
Output: '<WRONG INPUT>'
Input: 'Fungi'
Output: 'Eukaryota'
Additional Examples:
Input: 'Felidae'
Output: 'Carnivora'
Input: 'Hominidae'
Output: 'Primates'
Input: 'Canis'
Output: 'Carnivora'
Input: 'Table'
Output: '<WRONG INPUT>'
Input: 'Chordata'
Output: 'Animalia'
Translate the following input accordingly:
Input:
The prompt is pretty good. For example, the first
instruction contains something that I, as a prompt engineer, may not know and
don’t need to know — the model here generalizes my examples in a clever way:
Your task is to translate inputs representing biological
taxa into their
correct higher taxonomic group. If the input does not represent a valid
biological taxon, respond with '<WRONG INPUT>'.
The model only inferred this information from the examples —
I didn’t provide this generalization explicitly. And it found some better and
more striking examples than I provided.
However, the prompt still needs some improvement. For
example, you wouldn’t start it with “### Prompt” because the model you send it
to always knows that this is the prompt. Also, a headline like “Additional
Examples” makes little sense in the prompt, but only in the discussion between
the prompt engineer and the PE assistant. And the examples are too numerous and
too similar for few-shot prompting.
Beyond automated prompt engineering (APE), Automatic Prompt Optimization (APO),
which can improve an existing prompt, is very interesting. However, I
typically need evaluation data for this, meaning the system must be able to
test prompts against input and output data. Of course, this is something that I
usually don’t have in large quantities.
Last remark (I know, some of you will ask that):
Why does the model have to write a prompt or optimize a prompt, if it is so
smart? Couldn’t it just internally use the better prompt to directly give the
response?
Yes — if we have a single use of the prompt. But usually, in
an application context, we will use the prompt a million times or maybe send it
to a different model, so it makes sense to optimize it.
5) The models will be integrated into an agentic setup
Agentic approaches in AI go one step further than automated
prompt engineering.
As a user, I can just enter 1 or 2 sentences, such as:
“Give me three suggestions for a gourmet weekend in
Copenhagen, including travel from London, with exact dates and prices. I have a
budget of around EUR 800 for 2 people.”
The autonomous agent then makes a plan of what information
it needs, where to get it from, how it processes and checks it, and finally
packages it into suggestions. In the process of finding a solution, it
writes itself a series of prompts, search queries or API calls.
In turn, it sends these to itself or to other models, search
engines or applications. At the moment, most of these agents are pretty
bad, or at least not as good as current models, which can often solve such
a query quite decently in a single call — but the potential is huge.
Here you can try two of them out and understand how an agent
works: AgentGPT or AIAgent.app
In the future, it may be possible for such a system to find
a solution to a complex problem iteratively, trying out different approaches
with little input.
But — you knew there would be a but — tasks in a
business context that go beyond the gourmet weekend require much more
specification. For example, I would have to write a detailed agent prompt,
specifying the exact way an agent has to process insurance claims in my
company. The agent is not able to just find this out by trial and
error.
So, no, this will not eliminate the need for prompt engineering.
6) We will program prompts
There are numerous approaches to natively integrate prompts
into programming languages or to program them directly (see here: dspy, APPL). Some
say that this will eliminate the need for prompt engineering.
Here are some examples, how a prompt could look like. In
Dspy, a chain-of-thought prompt would be written as follows:
class CoT(dspy.Module):
def __init__(self):
super().__init__()
self.prog = dspy.ChainOfThought("question
-> answer")
def forward(self, question):
return
self.prog(question=question)
In APPL, a simple prompt looks like this:
import appl
from appl import gen, ppl
appl.init() # initialize APPL
@ppl # the @ppl decorator marks the
function as an `APPL function`
def greeting(name: str):
f"Hello World! My name is {name}." # Add text to the prompt
return gen() # call the default LLM with the current
prompt
print(greeting("APPL")) # call
`greeting` as a normal Python function
One of the core motives for using prompt programming is the
ability to evaluate and optimize prompts in an automated way. In
these frameworks, this can be an easy task.
On the other hand, programming makes prompt
engineering much more complex for the average user because it
makes the whole process more opaque. One reason prompt engineering gained a lot
of popularity was because prompts are written in a natural language —
and not in code.
Programmed prompts with self-optimization functions can
lead to excellent and completely unexpected results. Some models can solve math
problems better when
the prompt embeds them in a Star Trek scenario:
Stark Trek Prompting, source: Battle, Gollapudi: The
Unreasonable Effectiveness of Eccentric Automatic Prompts.
With good, very rich ground truth data available for
optimization, this can perfectly work (at least for simple tasks like
8th grade math problems). Unfortunately, for various reasons, this data usually
does not exist, and the business problems are often are significantly more
complex. With chatbots or assistants, it is incredibly difficult to
systematically measure whether an answer is correct — there can be any number
of correct answers to many questions.
But still, I actually believe that a significant portion of
prompt engineering in the future could involve prompt programming.
Nevertheless, circling back to our main question: Will prompt engineering be
dead? No, but maybe it will involve even more engineering!
So, what is the future of prompt engineering: Is it dead?
Still alive, but changing drastically? How will it evolve?
Here are my four take-aways: Prompt engineering will undergo
massive changes across several dimensions.
I am absolutely convinced that models will greatly extend
their capabilities in the future. At the same time, this will make prompting
even more challenging. The smarter the genies get, the more they can do, the
more degrees of freedom the possible solution has. So I have to specify what I
want and how I want it in a detailed and structured way.
Personalized models will know a lot about me, my interests,
my work, my team, my organization. They will contribute the knowledge to the
solution of the task. I will be able to prompt a model like I can brief an
experienced colleague with only references to entities and processes of our
team or organization. This will massively simplify prompting.
Prompt engineering and deterministic algorithms will grow
together. I can describe both the prompt and the deterministic
algorithms in which it is integrated — the APIs that it calls, the
multi-prompting processes — into a single language.
Everything required for this scenario is actually already
available in various fragmented solutions, the only thing missing is
the ChatGPT of hybrid prompting.
The good news: Prompt Engineering will become much,
much more powerful!
And the bad: It’s going to get a lot more complicated.
Using my model as a prompt assistant helps me to develop
prompts and evaluation procedures. The models will observe and monitor
their own responses and those of other models, warning me of hallucinations,
drift or error and suggesting solutions.
A prompt will no longer be just text, but an entity
consisting of many prompt candidates, test data, evaluation procedures, and
other criteria.
This whole process will be not being a coding task — the
briefing to set up such a system could be in natural language, a meta prompt.
Once again, a lot is already available — it is just not here
in a reasonably easy-to-use framework.
My dear AI aficionados and fellow prompt engineers: I am
looking forward to an incredibly exciting future — one of which we do not yet
know exactly what it will bring, except, of course, that it will continue to be
extremely thrilling as well challenging.
- Get link
- Other Apps
Popular Posts
- Get link
- Other Apps
- Get link
- Other Apps
Comments
Post a Comment