WTF ChatGPT?
March 16, 2025 7:31 AM

What happened here?

Please explain it like I don’t know anything about how LLMs work, because I don’t.

I’ve seen the dingus do good work (its answer to “Why did Bishop give back the gun?” not only correctly inferred the context but gave as thorough and reasoned an answer as one could have asked for) and bad (very confused on who killed whom in The Godfather). But this seems like an example of the thing being “bullied” into a hallucination—despite my not being very assertive. (Which I think it can pick up on?)

So confused.
posted by Lemkin to Technology (15 answers total) 1 user marked this as a favorite
 
Here's my take, as someone who knows just enough about LLMs to be dangerous. You didn't ask a question; you made a declarative statement that you believed she'd done a commercial, and the LLM took this as a given and expanded on it. (Though they will also sometimes hallucinate when answering questions.)

Have you heard the term "stochastic parrot"? LLMs are mimics that give statistically plausible responses to human inputs, and they're trained to be helpful - so it's really easy for them to hallucinate when given user inputs that are ambiguous or misleading.
posted by toastedcheese at 7:42 AM on March 16 [6 favorites]


LLMs don’t “know” anything. They look at a bunch of data on how humans talk and try to sound like a human. They have a lot of examples to draw from, and they use bits of those examples to respond to you. Sometimes the information is correct. Other times it’s not. The model isn’t designed to give correct information—it’s designed to sound like a human. It might agree with something that is totally incorrect and make up details to support it. It doesn’t matter that none of this happened. It achieves the goal of sounding like a human having a conversation—and that’s all it was designed to do.
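
If it helps, here is a toy sketch of what "designed to sound like a human" means mechanically (deliberately oversimplified, in Python, with made-up numbers; real models are neural networks trained on billions of words, but the job is the same: continue the text plausibly):

    import random

    # Toy "language model": counts of which word tended to follow which word
    # in some example text. All numbers invented for illustration.
    follows = {
        "Kim":    {"Gordon": 9, "Deal": 1},
        "Gordon": {"did": 5, "sang": 3, "voiced": 2},
        "did":    {"a": 7, "the": 3},
        "a":      {"commercial": 6, "concert": 4},
    }

    def continue_text(words, n_more=3):
        words = list(words)
        for _ in range(n_more):
            options = follows.get(words[-1])
            if not options:
                break
            # Pick a next word, weighted by how often it followed before.
            next_word = random.choices(list(options), weights=list(options.values()))[0]
            words.append(next_word)
        return " ".join(words)

    print(continue_text(["Kim", "Gordon"]))  # e.g. "Kim Gordon did a commercial"

Notice that no step in that loop ever asks "is this true?" Sounding plausible is the only criterion.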
posted by brook horse at 7:49 AM on March 16 [19 favorites]


To further clarify, it’s building off of your response—that’s why you got what you did. If you give it a totally incorrect premise, it will sometimes contradict it, but other times it just continues the conversation regardless of its basis in reality because the goal is simply to sound like a reasonable response—and it has no way to “check” if something is actually true or not.
posted by brook horse at 7:56 AM on March 16 [8 favorites]


I asked chatgpt:
"Did Kim Gordon ever provide voiceover work for a Cap'n Crunch commercial?"
Reply:
"There is no verifiable evidence that Kim Gordon, co-founder of the band Sonic Youth, ever provided voiceover work for a Cap'n Crunch commercial. This notion appears to have originated from a discussion on MetaFilter, where a user shared an interaction with an AI language model that incorrectly affirmed such a commercial existed.
This incident highlights the potential for AI-generated misinformation. After reviewing available sources, it seems that any connection between Kim Gordon and Cap'n Crunch is unfounded."

If you think it is wrong, just challenge it
posted by canoehead at 8:01 AM on March 16 [5 favorites]


It can't "pick up on" things like how assertive you are being.

You say that you don't know anything about how LLMs work. Although in general most of the LLM products you encounter will also have been grafted onto other components that help shape and guide their output, I think you'll find it helpful to keep in mind that the central LLM is a predictive text generator. The Large Language Model is essentially a large and detailed map of statistical patterns in the text on which the model has been trained. From that map, it generates plausible output.

That output is conditioned on the context you provide with your prompt, but again as a matter of statistical patterns -- what words plausibly follow as responses, given the starting point of the linguistic context of your prompt?

The core capacity of these systems to produce language that feels natural and fluent relies on this predictive process. Whatever other constraints are put on the system, the language generation always works as a predictive text generator.
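
If you want to see that bare predictive-text behavior with nothing grafted on, you can poke at a small open model yourself. A minimal sketch, assuming the Hugging Face transformers library (and its model download) is available; the output will differ on every run:

    from transformers import pipeline  # pip install transformers torch

    # GPT-2 is a small, old model with no chat tuning and no guardrails,
    # so you see the raw "continue this text" behavior directly.
    generator = pipeline("text-generation", model="gpt2")

    prompt = "I thought Kim Gordon did a Cap'n Crunch commercial."
    out = generator(prompt, max_new_tokens=40, do_sample=True)
    print(out[0]["generated_text"])
    # It will cheerfully elaborate on the premise, because its only job is
    # to produce text that statistically tends to follow the prompt.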
posted by redfoxtail at 8:03 AM on March 16 [5 favorites]


Response by poster: it has no way to “check” if something is actually true or not

Is that an insuperable problem given today's technology? Seems like someone could let it google, find nothing on point, then say at least something like "I haven't come across that before."
posted by Lemkin at 9:09 AM on March 16 [1 favorite]


If you think it is wrong, just challenge it

The other thing is that these LLMs are designed to be agreeable. When Bing Chat was first turned on, the personality was, uh, a bit too feisty.

my theory is that LLM makers had two options: either have their chatbots double down on their statements or meekly accept correction from users. the logic looks like this to me:
                   | LLM is mistaken | LLM is correct
-------------------+-----------------+---------------
deny correction    | LLM's error     | user error
accept correction  | all good        | user error
They have only vague and aspirational control over whether the LLM is correct or not, but they do have direct control over whether the LLM is meek or not, and taking the bottom row means that the LLM appears to make fewer mistakes, or at least fewer mistakes that can't be blamed on user error.

And when an LLM is actually wrong and accedes to the correction, the user then feels like they've taught the LLM something, which makes them more positively disposed towards it, even though the LLM got it totally wrong. People are totally willing to accidentally Clever Hans themselves into thinking these LLMs are more clever than they really are, and it's way easier to do this when the LLM is agreeable than when it's not, even when it's not actually any more clever or correct.
posted by BungaDunga at 9:12 AM on March 16 [6 favorites]


ChatGPT does have a "search" mode that lets it, basically, summarize Google search results for you. It's still easily confused, and of course these days Google search results are polluted with LLM output, so ymmv.
posted by BungaDunga at 9:14 AM on March 16 [3 favorites]


(correcting an LLM makes the user feel clever, the LLM meekly agreeing makes the user feel like the LLM must be clever because it "recognized" that the user is right. So tweaking an LLM to be meek can endear it to users. people often mistake "agrees with me using longer words" for intelligence in other people so it's not surprising that it works for LLMs)
posted by BungaDunga at 9:20 AM on March 16 [5 favorites]


Yes, LLMs now have access to the open web, but as I understand it they're not "searching" it as such - their primary functionality is producing text, not finding or analyzing information.

I suspect an LLM would really struggle to, in this case, prove a negative - that Kim Gordon *didn't* do these commercials. There are no websites dedicated to all of the commercials members of Sonic Youth *didn't* do, so the only relevant source is you stating that she did.
posted by toastedcheese at 9:21 AM on March 16 [3 favorites]


If you think ChatGPT and related things are giving bad information now, imagine what it will be like when the internet is flooded with LLM generated nonsense.

There is a theory that Russia is already doing that. They generate millions of pages of propaganda which isn't easily accessible to people, but can get slurped up by LLMs, thus influencing the next generation of models.
posted by It's Never Lurgi at 9:22 AM on March 16 [4 favorites]


Is that an insuperable problem given today's technology?

Mostly. You can't point an LLM at a web search and have it understand what it gets, because LLMs don't really understand things.*

What it would normally do if you type "I thought Kim Gordon did a Cap'n Crunch commercial" is generate some text that's statistically likely to follow that sentence.

Say you let it search the web, so it generates a screen full of search results. To a tenth-assed approximation, now it's going to generate some text that's statistically likely to follow that prompt and screenful of search results. Maybe "I haven't seen that before" would be likely to follow that big bolus of textvomit, but depending on the training etc it might turn out that "What you're thinking of is that Kim Gordon is the real Archduchess Chocula" is even more likely text.

*But a truly huge pile of the statistical associations in English text sometimes results in close approximations of a thing that looks like understanding, which is where you start getting Very Interesting questions about how much we're understanding and how much we're doing a big statistical associations thing that approximates understanding.
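
To make the "screen full of search results" part concrete, the plumbing is roughly this. A sketch only; web_search and llm_complete are made-up stand-ins for whatever real search API and model call a given product wires together:

    def answer_with_search(question):
        # Hypothetical helpers, standing in for a real search API and model call.
        snippets = web_search(question)  # list of result snippets (strings)

        # The "search-enabled" answer is still predictive text generation; the
        # results are just pasted into the prompt as extra context, and the
        # model generates whatever plausibly follows that whole blob of text.
        prompt = (
            "Search results:\n" + "\n".join(snippets) + "\n\n"
            "Question: " + question + "\n"
            "Answer:"
        )
        return llm_complete(prompt)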
posted by GCU Sweet and Full of Grace at 10:08 AM on March 16 [3 favorites]


Best answer: Is that an insuperable problem given today's technology? Seems like someone could let it google, find nothing on point, then say at least something like "I haven't come across that before."

This sort of process would all be handled by the kind of thing I mentioned in my answer above as "other components" that can be combined with, or grafted on to, an LLM for a specific tool. But the generative LLM component itself doesn't google, find things, assess whether it had found something "on point," or anything else of that kind. As long as you are keeping an LLM in the loop, what it is contributing is the capacity to generate predictive text on the basis of a combination of what it has been trained on and the context provided to it.

So, broadly, the elements of "today's technology" that help with this part of the process are separate from, and in many ways actively in tension with, the underlying technology that uses the LLM to generate natural-sounding text.

==

Bonus content:

That said, LLMs (again, remember that this stands for "Large Language Model" -- the detailed model of statistical relationships between words in the training data) can also be used to create automated systems for classifying text-based data.

This kind of system can use the language model in combination with other components to create an algorithm to predict what category a new piece of data belongs to. So the existence of LLMs can help programmers make more effective classifier systems for certain kinds of data. One possible use case could be a tool for estimating the likelihood that a given set of google results does or does not contain information that is relevant to a given search question.

THAT said,

- It's far from a foregone conclusion that anyone has developed (or will develop) a purpose-built classifier of this kind that is reliable or useful, especially for the very wide range of content types that come up in google searches.
- This sort of thing would be a separate module, not part of the chat function itself. Its contribution could be to determine when the generative text component should be bypassed altogether, or to pass certain constraints to that component.
- The generative text part will still function as described above, producing text on the basis of what is statistically plausible.
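
For a rough sense of what such a separate relevance-classifier module could look like, here is one possible sketch using an off-the-shelf zero-shot classifier; the model choice and the labels are placeholders for illustration, not a recommendation:

    from transformers import pipeline  # pip install transformers torch

    # A language-model-based classifier used as its own component,
    # separate from the chat model that generates the reply.
    relevance = pipeline("zero-shot-classification",
                         model="facebook/bart-large-mnli")

    question = "Did Kim Gordon ever do voiceover work for a Cap'n Crunch commercial?"
    snippet = "Cap'n Crunch is a breakfast cereal made by Quaker Oats."

    result = relevance(
        "Question: " + question + " Passage: " + snippet,
        candidate_labels=["the passage answers the question",
                          "the passage is irrelevant to the question"],
    )
    print(result["labels"][0], round(result["scores"][0], 2))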
posted by redfoxtail at 10:12 AM on March 16 [3 favorites]


Is that an insuperable problem given today's technology? Seems like someone could let it google

An LLM isn’t designed to analyze factual information. It’s designed to analyze language patterns. Basically, you would need to create an entire new type of program that analyzes factual information, and then you could attach it to an LLM to get a natural-sounding response. But current LLMs aren’t attached to anything like that. So we have to come up with that technology first.
posted by brook horse at 10:30 AM on March 16 [6 favorites]


A lot of people assume, like you do, that the "point" of ChatGPT is to retrieve information, but its actual point is to chat to you.

I work with a data model that has been given some selective information about certain concepts and then just a shitload of really objective business data - how many years in business, how many users of a specific software, which modules they own, what general industry, how many support tickets, just like really normal business operations data - and I still have to tell it "Do not hallucinate, do not give answers that cannot be explicitly derived from the data, do not make assumptions, do not suggest alternate outcomes" and on and on to tell IT how to tell ME what I'm trying to find out.

I ended up asking "why can't we just tell it not to do that stuff?" and the data scientists were like "we could, but it might unduly skew the data, plus also then it wouldn't be able to if we wanted it to" and I was like "you mean hallucinate" and they were like "well, be really creative in its responses" and then I had to go get under the covers for a while because I'm pretty worried about the world right now. Like, basically they told me that if I engineered the absolutely perfect prompt, with limitations all users of this model agreed were (probably) correct, they would build it in for me, but it's easier to waste my time coming up with and applying those limitations every single time, so I probably shouldn't ask.
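
For what it's worth, that kind of boilerplate usually just ends up as a reusable system message wrapped around every question. A rough sketch with the OpenAI Python client; the model name and the wording of the constraints are placeholders, not what any particular team actually runs:

    from openai import OpenAI  # pip install openai

    client = OpenAI()  # expects an API key in the environment

    # The "perfect prompt" becomes a canned system message like this one.
    GUARDRAILS = (
        "Answer only from the data provided. Do not make assumptions, "
        "do not invent details, and say 'not in the data' when unsure."
    )

    def ask(question, data):
        response = client.chat.completions.create(
            model="gpt-4o-mini",  # placeholder model name
            messages=[
                {"role": "system", "content": GUARDRAILS},
                {"role": "user",
                 "content": "Data:\n" + data + "\n\nQuestion: " + question},
            ],
        )
        return response.choices[0].message.content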

And this is apparently how the monsters of tech do it too.
posted by Lyn Never at 4:47 PM on March 16 [4 favorites]

