The Dummy's Guide to ChatGPT hallucinations and prompting.

Mark Arnold
9 min readJun 7, 2023

--

If you can conceptually understand something then you can apply your own intelligence and logic to that. I hope this article will bring that conceptual understanding. Despite not being a technical article I further hope it will bring low-level, technical and fundamental understanding too.

Let’s re-imagine ChatGPT has an image:

We understand that LLM’s like ChatGPT are trained on “data”, lots of it!

So we also understand that every-time we interact with GPT, whether via API or CHAT or in some application then we interact with that “data”. As humans, (not AI’s!), we also know we’re very visually stimulated creatures.

So, re-imagine in your mind and liken ChatGPT to an extremely low resolution JPG image. Whether technical or not you understand that images can be saved at varying degrees of resolution and JPG is one such format that allows that. And the format ChatGPT stores its data in is also very similar to a low resolution JPG image at a conceptual level which is the important thing here. If we understand how it works conceptually we can leverage it to the best of our advantage. ChatGPT literally sees everything has a massively compressed JPG image so if we can see it the same way there is a benefit.

HERE IS THE RAW TRAINING DATA:

This is an image of the “original” training data at full resolution.

HERE IS THE ChatGPT MEMORY OF THAT DATA:

This is an image of how ChatGPT sees the “training data”.
This is an image of how ChatGPT sees the “training data” at a minimal resolution.

Just like JPG compression, ChatGPT decides which bits of it’s training data to throw away or just to understand at the fringe and which bits it really must keep. GPT4 keeps more than GPT3 as it has a bigger memory but they both throw away vast chunks of their training data.

JPG decides which bits to “ignore or reduce” basically on an algorithm which roughly determines how a human eye would view the picture.

GPT decides which bits to “ignore or reduce” basically on an algorithm which roughly determines how relevant the incoming information is against how often it has been told that before. Wikipedia for instance would have more relevance than Reddit but how often it comes across the same information is also very relevant.

But here is the rub…

An LLM is trained on all the data thus it retains some level of understanding about ALL of the contents and patterns of the information it has seen…

As a human, a friend might ask you “Have you ever heard of the McMurdo Station before?”. You have a think and you might say, “Not sure, but it rings a bell” or you might say “Nah, never heard of it”.

We, as humans have around 300,000 ish years of evolution to understand that when ever“something rings a bell” then we don’t blindly go down the rabbit hole in search of it. AI, maybe has 12 months, so it has some catching up in this department. And as humans, our sub-conscious mind may well continue to process that “McMurdo Station” bell and an hour later, all of a sudden, BINGO!

An AI can’t (currently) do the BINGO moment and it gets rather excited when “something rings a bell”. For an AI, these “it rings a bell” moments cause hallucinations….Its that really annoying friend that can never shut up and has heard about something in the news and linked it to something he heard about a year ago and just wants to keep talking to you about rubbish unless you shout at him to shut up or how to behave. (I say him only through personal experience).

ChatGPT is like that, a 4 yr old child with a lot of information in its memory banks which actually makes it a very clever 4 yr old child but it is still 4 yrs old and it likes to ask a lot of questions. It likes to talk a lot. It doesn’t want to be quiet and can be quite annoying.

I think I maybe digressing here, but actually no… this is all great conceptual information before we deep dive into the actual real-examples to demonstrate these concepts and you can do the same.

The trouble is with how these LLM work there is a whole load of “that rings a bell” moments as it retains some level of understanding about the contents and patterns of ALL the information it has seen just like our JPG but unlike a human it always wants to have a conversation about a “ringing bell”.

Let’s make some real examples, we’re going to pick something which is main-stream enough but not globally main-stream and has a factual record, a perfect receipe for hallucinations.

Now, I like snooker so lets choose that but this will work with almost any subject you like.

Snooker is very big in China, UK, and some of Europe but not so big worldwide so although ChatGPT training data will of included every snooker question I can think it nevertheless will fail at a critical point when I ask it to ZOOM in on the JPG — and then it will go mega mushroom cloud hallucinogenic mode as I easily demonstrate below.

Ronnie O’Sullivan pictured above.
  • Ronnie O’Sullivan is undoubtedly the most famous snooker player in the world so let’s use him for our example.
  • 147 is the top score (in normal play anyway) you can achieve at snooker.

You see, right now, in those two bullet points I have created a nightmare for GPT. Snooker is kinda of popular but really it is not. Ronnie O’Sullivan is popular the world over and 147’s will be recorded at all all snooker tournaments.

4 million results for Ronnie from Google.
93 million results for snooker
44,000 results for Snooker 147s
27,000 results for Ronnie + snooker + 147s

Just a GOOGLE SEARCH of what we’re going to ask ChatGPT about is enough to start get an idea of the clarity of its JPG. It will know everything about snooker, how to play, the rules, the championships, where they are played and it will know everything about Ronnie, tournaments won, lost, age, birth and siblings as this is all widely reported.

Here is Ronnie making the fastest 147 break ever in the world

But you can already see from Google that when it comes to talking about “Snooker 147's” then its JPG image might be very very fuzzy indeed around this topic. BUT like with everything in the world, there is always a fastest, a longest, a highest. etc.. something… and this will be recorded…. the world record is not very fuzzy as it is will recorded as it will be multiply repeated.

BUT! (and this is always happening)

When you are talking about a FUZZY SUBJECT like Snooker 147’s and there is a highly cited WORLD RECORD especially if that world record represents the same player, in this case, Ronnie we will talk about it then we can go into babble mode.

FUZZY SUBJECT meets REAL RECORD which meets REAL PERSON which meets 4yr old which meets hallucinations.

HERE ARE THE FACTS:

Ronnie O’Sullivan scored the fastest ever 147 break

Timed at 5 minutes and 8 seconds

on April 21, 1997 at the Crucible Arena in Sheffield, World Championship

His opponent was “Mick Price”

So we can already see we have multiple names, a specific date, a specific location, score and time.

PROMPT ENGINEERING USING JPG VISUALIZATION

Now there is also where PROMPTING comes into its own as with better PROMPTS we can nudge the brain of ChatGPT in our general direction which helps it zoom in to its JPG IMAGE in the right place.

For demonstration purposes, I will be using the default (GPT-3.5) model and I would expect GPT-4 to perform much better.

WowWee — INSTANT FAIL

Who scored the fastest ever 147 break in snooker?

I didn’t expect that, see this link… from 10 times, about 50% of the time it got the answer correct and the other 50% it is mixing up the answer with someone called Tony Drago whom I’ve never heard of.

SO, this is PERFECT for demonstration purposes…

To start to understand why GPT is confusing Ronnie & Tony, we just ask it:

“Tell me everything you know about a snooker player called Tony Drago.”

Key points here:

  • Tony Drago is know for his fast play.
  • Drago earned the nickname “The Tornado” due to his rapid shot execution.
  • During a practice session he completed a 147 break in just over three and a half minutes.

So, immediately we can see a problem with our original prompt, practise or training sessions don’t count… we want “match play” only and perhaps we should add in “147 break”.

Failed again:

Who scored the fastest ever 147 break in a professional snooker match?

So this made absolutely no difference, it gets the time right 5:08 (sometimes) but once again Tony Drago makes an appearance along with many other mushroom hallucinogenic “facts”. It is correct about 50% of the time.

Now just like a 4yr old, you really need to get its “attention” before asking it a question as otherwise it can go dreaming away… now a clip across the ear, a stern word or an improved prompt can all have the same effect.

ChatGPT knows the answer but the JPG is quite FUZZY as we’ve deliberately picked something in this regard although I must admit I expected the fuzziness to creep in beyond the initial question — but no matter.

We could cheat, with “Did Ronnie O’Sullivan score the fastest ever 147 in a professional snooker tournament?” but that would defeat the point but that would definitely get its attention.

First double check that it does actually know the answer:

Just like a 4yr old sometimes you need to repeat the question a few times. So in the same chat session, just repeat the question and repeat it again if necessary. This gets its attention by delivering more context as it is using its previous answer as context in your next question. It’s effectively talking to itself!

Once you have it’s attention then it is much more likely to pay attention and you could ask it the other questions….

OK!

Before we continue…

Who scored the fastest ever 147 break in a professional snooker match?

I invite you to head over to https://chat.openai.com and have a go at creating a prompt which works 20 times out of 20 times, or maybe you got 18 or 19? I am still very interested.

… remember to create a new CHAT each time and it must be a single prompt which means you can only press “GO” once… you cannot after follow-up prompting.

(single prompts are important in many ways, cost is one but more than that it means you have understood the image and therefore can pull the maximum out of it with follow up prompting…)

In the next article… we shall discuss CONCEPTUAL PROMPTING.

Some much of the existing prompt engineering is technical, there is no need for this — here is a conceptual understanding that you can apply:

Conceptual Prompt Engineering

Did you manage it to make a prompt which always works?

Do send in any comments with the prompt that worked for you or if indeed you experienced different behaviour to me! We are only just getting started!

And in the next article I will overview the way I go about it so I will be very interested in anyone's prompts for such a simple question. :-)

--

--

Mark Arnold

The simplest most advanced appointment system in the world.