The Main Idea
Guidance on effective generative AI prompting will smooth the path to full adoption for your teams. Weak prompting and wacky responses can lead to a negative experience and turn people off AI tools. It's important to acknowledge and frame the unexpected tendencies of these systems. Understanding that their quirks are, in some cases, useful features gives us the patience and adaptability to harness them for the benefit of our teams.
“Expect the unexpected.” - Oscar Wilde, 1895
“Keep your hands on the steering wheel.” - Tesla autopilot, 2024
These two quotes capture the essence of our challenge in reining in AI. We benefit from its wild side, but we need to keep our eyes on the road to get the most out of it.
Summary
We live in a world where adapting to unpredictable change is essential, and we should expect some of those surprises to come from our AI agents and bots. Learning to adjust our work to use them as assistants introduces a new type of uncertainty. These AI assistants are both refreshing and frustrating. They beam with helpful confidence, only to leave you wondering moments later whether they are sputtering nonsense. Just when you expect them to follow a particular train of thought, they pivot and reveal a new vista that may yield novel opportunities. Or they get stuck in a tarpit of boxed-in thinking that forces you to start from scratch.
They delight but we must wonder: how can we adjust to their uncertainty?
Guidance:
1. Prompt Adaptation
While researchers debate and iterate on varied approaches to prompting, we might make the mistake of assuming that what's right for the researcher is a good fit for everyone else. There is much to learn and understand about prompt research. However, for most folks, mastering it is more than they need to be productive and reap the rewards of generative AI. As I suggested in my previous article, the learning curve is not steep thanks to the natural language interface. Folks can start prompting and adapt as they learn.
2. Prompting for Prompts
On complex tasks requiring more structured and detailed prompts, you can ask the LLM to generate the prompt for you. Have it suggest a series of different prompt steps.
Example:
“I need your help writing a prompt. I want to develop a custom GPT for reviewing employee 360 peer performance feedback. Evaluate the content through several lenses: bias, appropriate examples, balance, and tone. Provide me with prompt templates that I can use. For each template, cite research that explains what this prompt structure may or may not best align with the task at hand. Use the enclosed company performance guidelines to enhance the prompt instructions.”
3. Prompt Libraries
For teams that are using LLMs to assist in repetitive tasks, a shared prompt library is a good way to help spread ideas for structuring instructions to LLMs. This is where your early adopters can significantly help the majority by laying down learning related to the specific types of task a functional team has to accomplish. For example, the Custom GPT feature offered by ChatGPT has limitations on instruction length. Knowing how to create effective instructions and templates for common prompt response output formats can help teams ramp up on this feature.
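A prompt library doesn't need to be fancy; even a small module of named templates gets the sharing started. Here's a minimal sketch of the idea (the template names and wording are illustrative, not from any real library):

```python
# Minimal shared prompt library: named templates with placeholders.
# Template names and wording are illustrative examples only.
PROMPT_LIBRARY = {
    "360_feedback_review": (
        "Review the following 360 peer feedback for bias, balance, "
        "appropriate examples, and tone. Feedback: {feedback}"
    ),
    "data_feed_summary": (
        "Summarize this partner data feed and flag any anomalies. "
        "Feed: {feed}"
    ),
}

def render_prompt(name: str, **kwargs) -> str:
    """Look up a template by name and fill in its placeholders."""
    return PROMPT_LIBRARY[name].format(**kwargs)
```

Even this much gives early adopters a place to deposit what works, and gives everyone else a consistent starting point instead of a blank prompt box.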
4. Hallucination Antidotes
An area of frustration and risk when working with LLMs is spotting and mitigating hallucinations—when the model generates information that seems plausible but is incorrect or fabricated.
In this section, let's start with a prompt that is rife with hallucination potential, despite appearances to the contrary:
“Count the number of cells that are not blank between A2:A122”
Clarify Expectations and Create Context
When asking the LLM to “count the number of cells that are not blank between A2:A122” my unstated hope was that it would tell me the formula to enter into my spreadsheet. Instead, it cheerfully told me “there are 3 non-blank cells between A2 and A122.” After I got done giggling about this hallucination, I cracked open the analysis and discovered that ChatGPT had created mock data under the covers and gave me the correct answer based on what it assumed I was asking.
A better prompt would have been to “provide the Google Sheet formula to count the number of cells that are not blank between A2:A122.”
Six more words added to the prompt makes a world of difference.
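For the record, the formula the clarified prompt should produce is =COUNTA(A2:A122), which counts non-blank cells. If you ever want to sanity-check the model's counting outside the spreadsheet, the same logic is a few lines of code (a sketch, not tied to any spreadsheet API):

```python
def count_non_blank(cells):
    """Mirror of Google Sheets' =COUNTA(range): count cells that are
    not blank (here, treating None and empty string as blank)."""
    return sum(1 for c in cells if c not in (None, ""))

# e.g. count_non_blank(["alpha", "", None, "beta"]) -> 2
```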
Set the Tone and Expectations
Many hallucinations are the result of assumptions the model is making by filling in the blanks. Rather than a hallucination where the model is making up a response, it’s actually making assumptions based on gaps in the prompt. By asking the LLM to hold you responsible for clarifying ambiguities, you can avoid some hallucination situations. For example, "Don't make assumptions about ambiguities in my requests. Ask for clarification rather than filling in the gaps yourself.”
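One way to make that instruction stick is to send it as a system message with every request. Here's a sketch in the common chat-completions message shape (the instruction text is from the example above; the helper function itself is illustrative):

```python
# Instruction from the example above, sent as a standing system message.
NO_ASSUMPTIONS = (
    "Don't make assumptions about ambiguities in my requests. "
    "Ask for clarification rather than filling in the gaps yourself."
)

def build_messages(user_prompt: str) -> list[dict]:
    """Prepend the clarification instruction so it applies to every turn."""
    return [
        {"role": "system", "content": NO_ASSUMPTIONS},
        {"role": "user", "content": user_prompt},
    ]
```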
Request Citations
Asking the LLM to cite sources, or specifying that the information be drawn from a provided dataset or document, helps keep the AI grounded in reality. For example, start a conversation with: "Cite any sources for your responses, including links to web documentation where I can confirm the accuracy of your response." A friend of mine had ChatGPT review a stack of documents related to a legal proceeding. It fabricated contractual points that didn't exist in the documents. Imagine the awkward lawyer exchange that might have followed had the responses not been checked!
When requesting citations with my non-blank-cells prompt, I got the right response the first time. Interestingly, no content was cited, but putting the model on notice changed the context enough for it to not flub the answer.
Iterative Checking
After receiving a response, don’t take it at face value. Use follow-up prompts to verify and cross-check critical pieces of information. For instance, ask the AI to explain how it arrived at a particular conclusion or to provide supporting data.
Purge the Watermark
Some model vendors are developing or deploying digital watermarks in their responses: proprietary methods that subtly alter generated text, code, or images so the output can be programmatically identified. Coverage and reliability vary by vendor, but it's safest to assume a long response may be traceable to an AI. Where it works, watermarking can help spot lazy effort in school homework assignments or plagiarism in research. It's also a good nudge to take ownership of your work by ditching the copy-pasta routine and using the prompt response as a resource from which you write in your own voice. That transcription effort has the side effect of making you quality-check the response for potential hallucinations.
Be the Human
Don’t take your hands off the steering wheel. Scrutinize every response.
5. The Joys and Frustrations of Non-Determinism
LLMs may give different responses to the same question each time they're asked, due to randomness in their decision-making process. This variability is a feature, not a bug. Think of it like brainstorming with a creative partner who comes up with different ideas each time you ask for input. It can help you explore a problem, iterating and getting different ideas without getting stuck in a particular line of thinking. It's vital for creativity and for evaluating options as you dive deep into a problem, and it keeps LLMs from being rather boring and predictable. But this feature can drive analysts and developers mad with frustration. We're accustomed to a world where a set of instructions provided to a software platform will be followed the same way every time. Let's explore mitigations for problems where precision and predictability are crucial.
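One knob worth knowing up front: most vendor APIs expose a temperature parameter that scales this sampling randomness. A sketch of a chat-style request body with temperature pinned low (the model name is illustrative; low temperature reduces variation but vendors don't guarantee bit-identical outputs):

```python
def build_request(prompt: str, temperature: float = 0.0) -> dict:
    """Build a chat-completions-style request body. A temperature of 0.0
    asks for the least random sampling the vendor supports."""
    return {
        "model": "gpt-4o",  # illustrative model name
        "temperature": temperature,
        "messages": [{"role": "user", "content": prompt}],
    }

# In practice you would pass this body to the vendor's client library or
# POST it to their API endpoint; the builder itself is vendor-agnostic.
```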
A Deterministic Necessity in Data Ops: Ingestion, Analysis, and Transformation
Let's say you operate a part of the business responsible for bringing in new partner data feeds. You get a new feed, analyze it, gather statistics, identify errors, cleanse it, then output it in another format. Every week, you do the same thing. Quality and consistency are imperative! An LLM can handle this challenge quickly, but without guardrails, it will do so inconsistently.
Before considering options, let’s keep in mind what these systems are really good at. They excel at creative iteration and exploration. They will help you look at something in a different way. We should distinguish the methods employed during the research and ideation phase of a problem from the operational phase of its solution. In one case, we don’t care as much about inconsistency. In the other case, this is a huge problem.
Options for Combating Non-Determinism In Data Ops
Just as in life, better results come from more initial effort on the part of us humans. Here’s an increasingly rigorous set of consistency guardrails.
Weak: In the prompt, tell the AI to “provide a consistent and precise answer.”
Better: In the prompt, tell the AI to “provide a consistent and precise answer in the following format [describe steps to take and types of info you’d like to see from each step].”
Good: Write code against the LLM's API and use a low temperature setting to reduce randomness.
Really Good: Try OpenAI’s Structured Outputs to enforce a rigid and reliable prompt response structure.
Best: Have the LLM write a program for you that accomplishes your requirements. For example:
“Write me a Python program that will open a Google Sheet, analyze it, look for statistical anomalies, and output the clean version to a Snowflake database following the enclosed schema. Design and support a JSON-based column-mapping config file (between the Google Sheet and Snowflake) that is read in from an environment variable.”
Why is this the best? Because LLMs are fantastic at writing data transformation programs. Once the code is written, it’s 100% deterministic and doesn’t rely on the AI to operate it. It is the lowest cost and most flexible option. It also decouples your company’s operations from LLM vendor lock-in.
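To make the "Best" option concrete, here's a stripped-down sketch of the kind of program the LLM might hand back, with in-memory rows standing in for the real Google Sheets and Snowflake connections so the deterministic core is visible (all names and thresholds are illustrative):

```python
import json
import statistics

def apply_mapping(row: dict, mapping: dict) -> dict:
    """Rename source columns to target columns per the JSON config,
    dropping any columns not present in the mapping."""
    return {mapping[k]: v for k, v in row.items() if k in mapping}

def flag_anomalies(values: list[float], threshold: float = 3.0) -> list[bool]:
    """Flag values more than `threshold` standard deviations from the mean."""
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values) or 1.0  # guard against zero spread
    return [abs(v - mean) / stdev > threshold for v in values]

# A column-mapping config as it might arrive via an environment variable:
config = json.loads('{"Amount": "amount_usd", "Partner": "partner_id"}')
row = {"Amount": 19.99, "Partner": "acme", "Notes": "ignore me"}
# apply_mapping(row, config) -> {"amount_usd": 19.99, "partner_id": "acme"}
```

Run this program a thousand times on the same feed and you get the same answer a thousand times, which is exactly the property the weekly data-ops workflow demands.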
On a Leash: Playful But Under Control
We want our AI assistants to have many attributes: creative, fresh, accurate, and predictable. But these attributes can't all coexist in many business cases. Adapt your approach for each workflow, striking the right balance based on the need. By continuously refining your methods and sharing the most effective ones with your team, you can harness AI's full potential while minimizing its pitfalls.