Known Public Domain

How Will We Instruct AI in the Future?

Prompt Engineering: Are we flogging a dead horse? Image credit: The author on Midjourney.

Just as we finally started to master the art of prompt engineering after months and years, some people are already saying that we are riding a terminally ill horse.

They argue that prompt engineering, as we know it, will become irrelevant in as little as 6 months, maybe 2 years tops. The problem it addresses will then be solved in a completely different way, or will no longer exist — though opinions and ideas differ as to how and why exactly.

Let’s take a closer look at these arguments and assess their merits. Is prompt engineering merely a short-lived AI summer hit? Or is it here to stay? How will it change and evolve in the upcoming years?

What is Prompt Engineering and Why Should We Care?

A simple prompt like “ChatGPT, please write an email to my employees praising them for profit growth in 2023 and at the same time announce a significant reduction in staff in an empathetic way” doesn’t require much prompt engineering. We simply write down what we need and it usually works.

However, prompt engineering becomes crucial in the context of applications or data science. In other words, it’s essential when the model, as part of a platform, has to handle numerous queries or larger, more complex data sets.

The aim of prompt engineering in applications is usually to elicit correct responses to as many individual interactions as possible while avoiding certain responses altogether: For example, a prompt template in an airline service bot should be so well-crafted that the bot:

can retrieve the relevant information based on the user’s question
can process the information to give the user an accurate, comprehensible, concise answer
can decide whether it is capable of answering a question or not and can give the user an indication where to find additional information
never allows itself to get manipulated by users into granting discounts, free vouchers, upgrades or other goodies
does not engage in off-topic dialogues about subjects such as politics, religion or compulsory vaccination
Does not tell ambiguous jokes about flight attendants and pilots
Does not allow prompt hijacking

Prompt engineering provides the model with the necessary data to generate the response in a comprehensible format, specifies the task, describes the response format, provides examples of appropriate and inappropriate responses, whitelists topics, defines edge cases and prevents abusive user prompts.

Apart from a few key phrases, the prompt does not rely on magic formulas that trigger crazy hidden switches in the model. Instead, it describes the task in a highly structured way, as if explaining it to a smart newbie who is still unfamiliar with my company, my customers, my products and the task in question.

So Why is Prompt Engineering a Dead Horse?

Those who make this claim typically give several reasons, with varying degrees of emphasis.

The top reasons given are:

1) While the problem formulation is still necessary for the model, the actual crafting of the prompt — i.e., choosing the right phrases — is becoming less critical.

2) Models are increasingly better at understanding us. We can roughly scribble down the task for the model in a few bullet points, and thanks to its intelligence, the model will infer the rest on its own..

3) The models will be so personalized that they will know me better and be able to anticipate my needs.

4) Models will generate prompts autonomously or improve my prompts.

5) The models will be integrated into agentic setups, enabling them to independently devise a plan to solve a problem and also check the solution.

6) We will no longer write prompts, we will program them.

The Generative AI List of Lists: 5000 Models, Tools, Technologies, Applications, & Prompts

A curated list of resources on generative AI. Updated June 1st, 2024 — new models

1) Prompt engineering is dead. But the problem formulation for the model is still needed and is becoming even more important.

The Harvard Business Review, for example, argues this way.

Yes, yes, I agree: The problem formulation for the model is really the most important thing.

But that is precisely the central component of prompt engineering: specifying exactly how the model should respond, process the data and so on.

This is similar to how software development is not about the correct placement of curly brackets, but rather about the precise, fail-safe, and comprehensible formulation of an algorithm that solves a problem.

AUTO and so on — the AUTOMAT framework in the prompt engineering cheat sheet

For example, the screenshot above shows the AUTOMAT framework, which specifies how to build a prompt instruction. Around 90% of this is communicating the task clearly and in the right structure to the model. The other 10% is pixie dust — or very special expressions that make your prompting more successful.

So, perhaps this is not how prompt engineering will die.

2) The models will understand us better and better. So, in the future, we will sketch out our task for the model in a few bullet points and the model will somehow understand us. Because it’s so smart.

Here is a Medium story on that. Here and here are some Reddit discussions.

I think we all agree that the capabilities of the models will evolve quickly: We will be able to have them solve increasingly complex and extensive tasks. But now let’s talk about instruction quality and size.

Let’s assume we have some genies that can solve tasks of various complexity: One can conjure me a spoon, the second a suit, the third a car, the fourth a house and the fifth a castle. I would probably brief the genies more extensively and more precisely the better they are, and the more complex the objective. Because I have a higher degree of freedom with complex items than with simple ones.

And that is exactly the case. I’ve been working with generative AI since GPT-2. And I’ve never spent as much time on prompt engineering as I do today. Not because the models are dumber. Because the tasks are much more demanding.

The only true thing about the above argument is that prompt engineering becomes more efficient with better models: If I put twice as much effort into prompt engineering in newer models, I get an output of a value, complexity, and precision that is probably closer to four times what I used to get with older models.

At least that’s how it feels to me.

3) The models will be personalized and tailored to me. So they’ll know exactly what I want and I won’t have to explain it to them.

There’s something to this argument. When I ask a future model to suggest a weekend with my wife for our 5th wedding anniversary, it will already know that my wife is a vegan and that I prefer the mountains to the beach. And that we’re a bit short of cash at the moment because we just bought a house. I no longer need to say that explicitly.

But I don’t see the greatest potential for individualization in relation to a person and their preferences and tasks. Rather, it is in relation to organizations, their business cases, and data.

Imagine if a model could correctly execute an instruction like: “Could you please ask the Tesla guys if they have sent the last deliverables to the client and then check with finance when we can issue the final invoice?”

The smartest model in the world can’t execute this instruction because it doesn’t know the “Tesla guys” (a project for Tesla Inc? Or a biopic on Nikola?), the deliverables and who exactly “finance” is and how to reach them.

It could be a game changer if a model could understand something like this. It would just be a helpful fully skilled colleague. The difference between a valuable, experienced employee and a newcomer (in whom you would rather invest time) is often not in their broader domain knowledge, but in their contextual knowledge: They know the people, the data, the processes of my organization. They know what works, what doesn’t, and whom to talk to when things don’t work. An experienced model could have this contextual knowledge. That would be incredibly valuable: Working 24/7 for just a few cents per hour.

This would be useful not just for ad hoc jobs like the Tesla instruction above, but also in an application context, such as in setting up a structured AI-based system for processing insurance claims, for example: A model which knows how we in the claims department of our insurance company process customers emails could be a great help. For instance, it could assist in setting up algorithms and prompts for automated processing.

At present, it is not clear how a model could obtain this knowledge. Traditional fine-tuning with structured training material would certainly require far too much effort in terms of data provision. Letting the model continuously follow the conversations and messages of a team or department could be a possible solution. Maybe you have a better idea?

Even with a personal model, that doesn’t mean that I won’t need to do any prompt engineering. But I can let the model, or the system in which it is embedded, do a lot of the heavy lifting for me.

4) The models will write their own prompts or at least optimize our prompts

We have already seen that this is not easy because the model cannot know what I want and exactly how I want it.

But there is one thing that helps us here: instead of giving the model an instruction on what to do, we can simply bombard it with examples and let it write a prompt. For example, here I ask a model to write me a prompt based on input-output pairs:

### Instruction
Act as a smart prompt engineer. Write a prompt for me, that could translate
all of the following inputs into the outputs.
If it makes sense for your, add more input / output examples which are not
too close to the existing examples.
Format everything to a well-crafted prompt

### Input / Output
Input: 'Fagoideae', Output: 'Fagaceae'
Input: 'Washington', Output: '<WRONG INPUT>'
Input: 'Ciconiidae', Output: 'Ciconiiformes'
Input: 'Pizza', Output: '<WRONG INPUT>'
Input: 'Fungi', Output: 'Eukaryota'
Prompt:

And here’s the model output (GPT-4o):

### Prompt
You are a highly accurate biological classification translator. Your task is
to translate inputs representing biological taxa into their correct higher
taxonomic group. If the input does not represent a valid biological taxon,
respond with '<WRONG INPUT>'. Here are some examples to guide you:

Input: 'Fagoideae'
Output: 'Fagaceae'

Input: 'Washington'
Output: '<WRONG INPUT>'

Input: 'Ciconiidae'
Output: 'Ciconiiformes'

Input: 'Pizza'
Output: '<WRONG INPUT>'

Input: 'Fungi'
Output: 'Eukaryota'

Additional Examples:

Input: 'Felidae'
Output: 'Carnivora'

Input: 'Hominidae'
Output: 'Primates'

Input: 'Canis'
Output: 'Carnivora'

Input: 'Table'
Output: '<WRONG INPUT>'

Input: 'Chordata'
Output: 'Animalia'

Translate the following input accordingly:

Input:

The prompt is pretty good. For example, the first instruction contains something that I, as a prompt engineer, may not know and don’t need to know — the model here generalizes my examples in a clever way:

Your task is to translate inputs representing biological taxa into their
correct higher taxonomic group. If the input does not represent a valid
biological taxon, respond with '<WRONG INPUT>'.

The model only inferred this information from the examples — I didn’t provide this generalization explicitly. And it found some better and more striking examples than I provided.

However, the prompt still needs some improvement. For example, you wouldn’t start it with “### Prompt” because the model you send it to always knows that this is the prompt. Also, a headline like “Additional Examples” makes little sense in the prompt, but only in the discussion between the prompt engineer and the PE assistant. And the examples are too numerous and too similar for few-shot prompting.

Beyond automated prompt engineering (APE), Automatic Prompt Optimization (APO), which can improve an existing prompt, is very interesting. However, I typically need evaluation data for this, meaning the system must be able to test prompts against input and output data. Of course, this is something that I usually don’t have in large quantities.

Last remark (I know, some of you will ask that): Why does the model have to write a prompt or optimize a prompt, if it is so smart? Couldn’t it just internally use the better prompt to directly give the response?

Yes — if we have a single use of the prompt. But usually, in an application context, we will use the prompt a million times or maybe send it to a different model, so it makes sense to optimize it.

5) The models will be integrated into an agentic setup

Agentic approaches in AI go one step further than automated prompt engineering.

As a user, I can just enter 1 or 2 sentences, such as:

“Give me three suggestions for a gourmet weekend in Copenhagen, including travel from London, with exact dates and prices. I have a budget of around EUR 800 for 2 people.”

The autonomous agent then makes a plan of what information it needs, where to get it from, how it processes and checks it, and finally packages it into suggestions. In the process of finding a solution, it writes itself a series of prompts, search queries or API calls.

In turn, it sends these to itself or to other models, search engines or applications. At the moment, most of these agents are pretty bad, or at least not as good as current models, which can often solve such a query quite decently in a single call — but the potential is huge.

Here you can try two of them out and understand how an agent works: AgentGPT or AIAgent.app

In the future, it may be possible for such a system to find a solution to a complex problem iteratively, trying out different approaches with little input.

But — you knew there would be a but — tasks in a business context that go beyond the gourmet weekend require much more specification. For example, I would have to write a detailed agent prompt, specifying the exact way an agent has to process insurance claims in my company. The agent is not able to just find this out by trial and error.

So, no, this will not eliminate the need for prompt engineering.

6) We will program prompts

There are numerous approaches to natively integrate prompts into programming languages or to program them directly (see here: dspy, APPL). Some say that this will eliminate the need for prompt engineering.

Here are some examples, how a prompt could look like. In Dspy, a chain-of-thought prompt would be written as follows:

class CoT(dspy.Module):
    def __init__(self):
        super().__init__()
        self.prog = dspy.ChainOfThought("question -> answer")

    def forward(self, question):
        return self.prog(question=question)

In APPL, a simple prompt looks like this:

import appl
from appl import gen, ppl

appl.init() # initialize APPL

@ppl # the @ppl decorator marks the function as an `APPL function`
def greeting(name: str):
f"Hello World! My name is {name}." # Add text to the prompt
return gen() # call the default LLM with the current prompt

print(greeting("APPL")) # call `greeting` as a normal Python function

One of the core motives for using prompt programming is the ability to evaluate and optimize prompts in an automated way. In these frameworks, this can be an easy task.

On the other hand, programming makes prompt engineering much more complex for the average user because it makes the whole process more opaque. One reason prompt engineering gained a lot of popularity was because prompts are written in a natural language — and not in code.

Programmed prompts with self-optimization functions can lead to excellent and completely unexpected results. Some models can solve math problems better when the prompt embeds them in a Star Trek scenario:

Stark Trek Prompting, source: Battle, Gollapudi: The Unreasonable Effectiveness of Eccentric Automatic Prompts.

With good, very rich ground truth data available for optimization, this can perfectly work (at least for simple tasks like 8th grade math problems). Unfortunately, for various reasons, this data usually does not exist, and the business problems are often are significantly more complex. With chatbots or assistants, it is incredibly difficult to systematically measure whether an answer is correct — there can be any number of correct answers to many questions.

But still, I actually believe that a significant portion of prompt engineering in the future could involve prompt programming. Nevertheless, circling back to our main question: Will prompt engineering be dead? No, but maybe it will involve even more engineering!

So, what is the future of prompt engineering: Is it dead? Still alive, but changing drastically? How will it evolve?

Here are my four take-aways: Prompt engineering will undergo massive changes across several dimensions.

I am absolutely convinced that models will greatly extend their capabilities in the future. At the same time, this will make prompting even more challenging. The smarter the genies get, the more they can do, the more degrees of freedom the possible solution has. So I have to specify what I want and how I want it in a detailed and structured way.

Personalized models will know a lot about me, my interests, my work, my team, my organization. They will contribute the knowledge to the solution of the task. I will be able to prompt a model like I can brief an experienced colleague with only references to entities and processes of our team or organization. This will massively simplify prompting.

Prompt engineering and deterministic algorithms will grow together. I can describe both the prompt and the deterministic algorithms in which it is integrated — the APIs that it calls, the multi-prompting processes — into a single language.

Everything required for this scenario is actually already available in various fragmented solutions, the only thing missing is the ChatGPT of hybrid prompting.

The good news: Prompt Engineering will become much, much more powerful!

And the bad: It’s going to get a lot more complicated.

Using my model as a prompt assistant helps me to develop prompts and evaluation procedures. The models will observe and monitor their own responses and those of other models, warning me of hallucinations, drift or error and suggesting solutions.

A prompt will no longer be just text, but an entity consisting of many prompt candidates, test data, evaluation procedures, and other criteria.

This whole process will be not being a coding task — the briefing to set up such a system could be in natural language, a meta prompt.

Once again, a lot is already available — it is just not here in a reasonably easy-to-use framework.

My dear AI aficionados and fellow prompt engineers: I am looking forward to an incredibly exciting future — one of which we do not yet know exactly what it will bring, except, of course, that it will continue to be extremely thrilling as well challenging.

Known Public Domain - Bytes

Search This Blog

Comments

Post a Comment