Markus Reiter-Haas

University Assistant
@Graz University of Technology

Retracing and Returning from Prompt Engineering

Date: 2023, Dec 08

Nowadays, there are calls to go “beyond” prompt engineering. By retracing the steps of prompt engineering, we will see that going beyond it is a return rather than a novel direction. However, that return might still be the way forward.

Zero-Shot Predictions To better understand prompt engineering, a good place to start is zero-shot learning. Zero-shot learning tries to generalize predictions to cases the model was not explicitly trained on, e.g., classes unobserved during training. To do so, zero-shot models use auxiliary data and exploit patterns in the training data. For instance, in zero-shot classification, a label name (or description) can be embedded and its similarity to the samples computed. Furthermore, it might be beneficial to add extra information, such as a prompt template. The idea is nice if executed well: using a template such as <sentence> The previous sentence is about <label> together with a pretrained textual entailment model is a valid approach. Here, the model judges whether the second sentence (the hypothesis) is entailed by the first (the premise). Therefore, the same model can be used to deal with multiple (even unseen) tasks.
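To make this concrete, here is a minimal sketch using the Hugging Face zero-shot-classification pipeline with a pretrained entailment model; the specific model choice and example data are illustrative assumptions, and the hypothesis template plays the role of the prompt template described above.

```python
# Minimal sketch: zero-shot classification via textual entailment,
# assuming the Hugging Face `transformers` library and an NLI model
# such as facebook/bart-large-mnli (illustrative choice).
from transformers import pipeline

classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

sentence = "The central bank raised interest rates to curb inflation."
labels = ["economics", "sports", "politics"]

# The hypothesis template corresponds to the prompt template from the text:
# each label is inserted into the hypothesis, which is then scored for
# entailment against the input sentence (the premise).
result = classifier(
    sentence,
    candidate_labels=labels,
    hypothesis_template="The previous sentence is about {}.",
)
print(result["labels"][0], result["scores"][0])  # highest-scoring label
```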

Optimizing Prompts The intuitive next question is: “What makes a good template, or a good prompt in general?” Welcome to prompt engineering. However, while humans tend toward natural text (e.g., when interacting with chatbots), deep learning models never see the actual text. In deep learning, everything is converted to tokens (represented as numerical IDs) and subsequently to embeddings (i.e., numerical floating-point vectors). Hence, there is no need to construct textual prompts; for machines, inserting special tokens whose meaning is conveyed by dedicated embeddings is an even better approach. The next logical endeavor is thus to create good prompt tokens. To do so, we need to evaluate the performance of prompts, which requires data. Now we arrive at Prompt Tuning, where the argument goes full circle, as these tokens can again be optimized with machine learning. Given enough data and computing power, good prompts can easily be found. However, as data might be hard to come by, the amount of data required should ideally be small; otherwise, a custom model could just as well be trained from scratch.
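As a rough illustration of prompt tuning, the sketch below freezes a pretrained model and optimizes only a few soft prompt embeddings that are prepended to the input. The backbone, prompt length, and toy batch are illustrative assumptions rather than a prescribed setup; libraries such as Hugging Face PEFT provide ready-made implementations of the same idea.

```python
# Minimal sketch of prompt tuning (soft prompts), assuming PyTorch and a
# frozen Hugging Face model; model and hyperparameter choices are illustrative.
import torch
from torch import nn
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "bert-base-uncased"  # illustrative backbone
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

# Freeze the pretrained model; only the prompt embeddings are trained.
for p in model.parameters():
    p.requires_grad = False

num_prompt_tokens = 20
embed_dim = model.config.hidden_size
prompt = nn.Parameter(torch.randn(num_prompt_tokens, embed_dim) * 0.02)

def forward_with_prompt(input_ids, attention_mask):
    # Look up the regular token embeddings and prepend the learned prompt.
    token_embeds = model.get_input_embeddings()(input_ids)
    batch_size = input_ids.size(0)
    prompt_embeds = prompt.unsqueeze(0).expand(batch_size, -1, -1)
    inputs_embeds = torch.cat([prompt_embeds, token_embeds], dim=1)
    prompt_mask = torch.ones(batch_size, num_prompt_tokens, dtype=attention_mask.dtype)
    attention_mask = torch.cat([prompt_mask, attention_mask], dim=1)
    return model(inputs_embeds=inputs_embeds, attention_mask=attention_mask)

# Only the prompt parameters are passed to the optimizer.
optimizer = torch.optim.AdamW([prompt], lr=1e-3)

batch = tokenizer(["a toy training example"], return_tensors="pt")
labels = torch.tensor([1])
outputs = forward_with_prompt(batch["input_ids"], batch["attention_mask"])
loss = nn.functional.cross_entropy(outputs.logits, labels)
loss.backward()
optimizer.step()
```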

Relating back to fine-tuning For inspiration on how to use small amounts of data efficiently with large pretrained models, fine-tuning is a good place to look. Fine-tuning is well-established and supported by many good ideas, such as contrastive learning. But wait: if there is a specific goal and data to learn from, then the initial model could simply be adapted to it. Predict the target and optimize the model, no extra steps required. Moreover, in some cases, such as when the target is a number or score, a fine-tuned model does not require post-processing, like extracting the actual result from the generated text, because it predicts the target directly. Therefore, we also do not require guardrails (e.g., in the form of additional instruction prompts) to ensure that the output is in the correct format. Even with guardrails, the process might fail, e.g., when the model returns the result in the wrong format nonetheless or refuses to answer at all (maybe add more “please” to the prompt).
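As a sketch of this direct-prediction setup, the example below fine-tunes a small pretrained model with a single-output regression head, so the score is predicted directly and no text generation, parsing, or format guardrails are involved. The backbone and toy data are assumptions for illustration.

```python
# Minimal sketch of fine-tuning for direct score prediction (regression),
# assuming the Hugging Face Trainer API; model and data are illustrative.
import torch
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

model_name = "distilbert-base-uncased"  # illustrative backbone
tokenizer = AutoTokenizer.from_pretrained(model_name)
# num_labels=1 yields a regression head: the model outputs the score directly,
# so there is no generated text to parse afterwards.
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=1)

texts = ["great product", "terrible experience"]  # toy training data
scores = [4.5, 1.0]                               # numeric targets

class ScoreDataset(torch.utils.data.Dataset):
    def __init__(self, texts, scores):
        self.enc = tokenizer(texts, truncation=True, padding=True)
        self.scores = scores
    def __len__(self):
        return len(self.scores)
    def __getitem__(self, i):
        item = {k: torch.tensor(v[i]) for k, v in self.enc.items()}
        item["labels"] = torch.tensor(self.scores[i], dtype=torch.float)
        return item

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="out", num_train_epochs=1),
    train_dataset=ScoreDataset(texts, scores),
)
trainer.train()
```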

Lessons learned from IR There are several lessons from information retrieval (IR) that are relevant to making human-AI interaction more meaningful. In many IR systems, the relationship (i.e., relevance) between queries (similar to prompts) and documents (similar to generated text) is learned, e.g., from user behavior (a toy sketch of this follows below). Therefore, even novice users can use such systems without having to carefully refine their queries. In my opinion, today’s models have a long way to go in that regard. IR systems also support their users with interface components rather than relying solely on the query language, e.g., filters to limit the results to recent dates rather than the daterange: operator. In a similar vein, AI interfaces could offer common personas as options rather than having users specify them via prompts. First strides are already being made here, e.g., Copilot’s options for creative, balanced, or precise responses. IR systems have also become very good at handling different modalities, e.g., text vs. image search. In AI systems, you have to state the desired modality in the prompt and hope the model understands your request and obeys. IR systems also support query formulation by suggesting corrections (if a query is deemed potentially incorrect), auto-completions (which improve user efficiency), as well as alternative queries. AI systems might ask for additional information if a query is too ambiguous, but are otherwise not comparable in this respect. Finally, modern IR systems no longer just return documents, but information represented in many different forms (e.g., knowledge panels, extractive Q/A boxes). Increasingly, plugins help the AI bridge this gap, but they are by definition not part of the base system yet.
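Returning to the first point above, that relevance can be learned from user behavior, here is a toy pairwise learning-to-rank sketch on click data; the features, data, and scoring network are placeholders meant only to illustrate the idea, not any particular production system.

```python
# Toy sketch: learning query-document relevance from user behavior
# (pairwise learning-to-rank on click data), assuming plain PyTorch.
import torch
from torch import nn

feature_dim = 8  # e.g., query-document match features (illustrative)

# Each training pair: for one query, a clicked document is preferred
# over a skipped document shown alongside it.
clicked = torch.randn(32, feature_dim)   # features of clicked documents
skipped = torch.randn(32, feature_dim)   # features of skipped documents

scorer = nn.Sequential(nn.Linear(feature_dim, 16), nn.ReLU(), nn.Linear(16, 1))
optimizer = torch.optim.Adam(scorer.parameters(), lr=1e-3)

for _ in range(100):
    # Pairwise logistic loss: push the clicked score above the skipped one.
    margin = scorer(clicked) - scorer(skipped)
    loss = nn.functional.softplus(-margin).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```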

TLDR I see lots of potential for AI systems to better satisfy user needs. At the moment, prompt engineering acts as a sound intermediate solution. However, I see the trend of going beyond prompt engineering, or rather returning from it to well-established approaches, as inevitable. I believe that by learning from patterns again, rather than adapting prompts, we will get closer to the true potential of AI systems.

Version 0.1; Errata: -