Reproducible examples of AI being terrible at basic facts and tasks?
October 18, 2024 8:01 AM Subscribe

I am working with some people who are (ab)using ChatGPT because they think it makes their writing better, and they're using it for research as well. I want to show them lots of examples of AI being terrible at its ostensible job.

I couldn't remember if "trench coat" was one word or two words, so I typed into Google

is trench coat one word

I got this in return.

I realized this was the way to show my co-workers that AI is completely unreliable for anything at all, and that they should use reliable sources (or, you know, do their own work) instead. I would appreciate it if others could share with me some things I could put in to the Google search bar that return factually incorrect, unintelligible, or hallucinated answers, so that I can take snaps and share them around.

Trying to explain how ChatGPT has been "trained", and why that's bad and a problem, is a non-starter, because they don't pay attention to any tech stuff and don't know enough about it to understand (I only understand it at a layperson's level, but that's way more than they do). All that will do is immediately turn them off and stop them listening.

"AI/ChatGPT is wrong about even simple questions and here's proof -- you can try it yourself" gets their attention.

(NB in advance: Please do not use the replies to "well, actually" me about AI/ChatGPT. That's not the topic of this ask.)

posted by tzikeh to Computers & Internet (65 answers total) 19 users marked this as a favorite

Forgot to include - if you have snaps of examples already, I would love it if you could email them to thetzikeh@gmail.com
posted by tzikeh at 8:12 AM on October 18, 2024

The other day I had a brain fart on a unit conversion and got a really good example of how AI sucks at basic facts: ounces in 1.5 pints. For some reason this wording doesn't work but some others do.
posted by JoeZydeco at 8:17 AM on October 18, 2024 [1 favorite]

I think it would help to tell us the domain you work in. The key here is avoiding being the people telling you you can't trust Wikipedia. No shit, but it's broadly accurate and gives you a starting point.

I literally work in machine learning and fucking hate chat bots, but they're a scourge in the same way StackOverflow is a scourge, just more irritating-- if I ask for a code sample, it might work, it might not. I'm only trying because Google has failed to turn up anything on StackOverflow. But in a setting where your comparison isn't StackOverflow (which has answers from people who are idiots and answers that were correct but are no longer), "obviously inaccurate some of the time" is a much more compelling argument.
posted by hoyland at 8:19 AM on October 18, 2024 [3 favorites]

You may be interested in this FPP from the other day.

Another point of comparison that may be illustrative is the one "AI" product feature I don't hate are -- the Amazon review summaries. They all follow the same pattern and I will bet you anything they are slot-filling, not generating the text wholesale. They sometimes go a bit wrong when there isn't an obvious negative (they always try to point out one negative).
posted by hoyland at 8:25 AM on October 18, 2024 [2 favorites]

I think it would help to tell us the domain you work in.

I don't think that would make a difference - examples of wrong answers to basic questions is all I'm asking for - JoeZydeco's answer above is a great example. Generic, easy, and wrong.
posted by tzikeh at 8:25 AM on October 18, 2024

Pivot To AI is kind of general, but sometimes has specific bogosity. From my own explorations:

Claude counting the "R"s in Ferrari (even OpenAI's latest is, in some contexts, pretty bad at this task).

Google Gemini explaining that "Sauces are liquid or semi-liquid mixtures used to add flavor, moisture, and texture to food, while dressings are used to protect wounds and aid in healing"

Google Gemini confidently telling me how long a cat holds a grudge.

Google Gemini telling me which presidents whent to UW.

Bing telling me how many calories are in a 34 oz serving of Stouffer's Lean Cuisine lasagna: "360 calories, accdording to 2 sources"

For a while, asking for Countries in Africa that start with the letter K was good for a laugh, here's Bard, I think OpenAI has special-cased this one by now, probably the others as well.

GMail's AI summarization has sometimes been confusing.
posted by straw at 8:28 AM on October 18, 2024 [10 favorites]

One example I've seen is to ask Chat GPT to give you walking directions from one place to another, using a place you know very well. You can see quickly how it sort of gets the general gist right, but once you look at the details it completely falls apart.
posted by damayanti at 8:33 AM on October 18, 2024 [3 favorites]

AI doesn't know how many R's are in "strawberry"

ChatGPT invented court cases that lawyers cited in legal briefs.

Ask it to generate a travel itinerary in a city you know well - it sent me to all sorts of businesses that had been closed for a year or more.
posted by mcgsa at 8:33 AM on October 18, 2024 [5 favorites]

I don't know how many reproducible examples there will be, which is part of the problem. If you ask it the same thing twice, you're not guaranteed to get the same answer, which contradicts what people expect from computers.
posted by madcaptenor at 8:40 AM on October 18, 2024 [11 favorites]

But that answer about trench coat is correct; it is a compound word. I think a better question would be "Is trench coat spelled with or without a space?" To which Gemini gives me "Trench coat is spelled as two words, without a space." Which is true nonsense.
posted by mr_roboto at 8:59 AM on October 18, 2024 [1 favorite]

But that answer about trench coat is correct; it is a compound word.

It says "Yes, trench coat is one word," which is a) incorrect and b) contradicting itself. And if trench coat were one word, the AI should written "Yes, trenchcoat is one word." It writes "trench coat" as two words while repeatedly saying it is one word.

And for some of the other comments, I'll quote myself from the OP:

"NB in advance: Please do not use the replies to "well, actually" me about AI/ChatGPT. That's not the topic of this ask." I'm aware that AI has truly good uses and will get better at what it does. The people I work with do not understand anything and think that looking up legal cases with ChatGPT is the same thing as looking up legal cases at a law library.

I want examples like the number of "r"s in strawberry and citing court cases that don't exist to make the point that ChatGPT is unreliable as a source of facts. I know how Wikipedia works, thanks.
posted by tzikeh at 9:04 AM on October 18, 2024 [8 favorites]

The trench coat example is a poor one because sometimes a word -- a compound word -- can be considered a word that is also somehow two words. If you're talking to someone who thinks that a compound word like trench coat counts as one word, they will be unconvinced that the answer "it is one word" is wrong.
posted by chesty_a_arthur at 9:11 AM on October 18, 2024 [2 favorites]

If you really are in the legal field the example given by mcgsa above is the gold standard.
posted by Tell Me No Lies at 9:12 AM on October 18, 2024 [1 favorite]

This reddit thread: ChatGPT: Wrong Answers Only contains some gems.
posted by Tell Me No Lies at 9:29 AM on October 18, 2024 [3 favorites]

I was recently answered wrong by Google on what was special about 14 October (I'm not in the US and wanted to know why some people were on holiday), what does UAP stand for, and, more seriously, whether there are any cat-safe bulbs (almost none of the plants it lists are cat safe, and some of them aren't even bulbs). (Links are to screenshots posted to bluesky.)
posted by severalbees at 9:36 AM on October 18, 2024

Several comments removed. Please help answer the OP's question by answering what they're asking for.
posted by Brandon Blatcher (staff) at 9:39 AM on October 18, 2024 [5 favorites]

I was stuck on a word guessing game where the game tells you that your guess is alphabetically before or after the target word. So I tried asking Meta.ai something like “what is a word that alphabetically comes before ‘book’ in the dictionary.” It gave me ‘boom’ and ‘boot’. I reiterated the request and it gave me ‘boil’ (true) and also ‘boom’ again, proving that it can’t do alphabetical order.
posted by xo at 9:45 AM on October 18, 2024 [3 favorites]

You've asked for reproducible examples of research problems, and about the only one I can think of is the tendency to make up references. Reproducibility is not really an attribute of chatGPT, and the perfect example today may not work at all tomorrow.

If I were you I would use historical examples and not stick my neck out with an example that may randomly work or not.
posted by Tell Me No Lies at 9:48 AM on October 18, 2024 [1 favorite]

You've asked for reproducible examples of research problems

No, I'm asking for examples like 'trench coat is one word' and 'there are two "r"s in strawberry' that I can take pictures of, or that there are already pictures of.
posted by tzikeh at 9:55 AM on October 18, 2024

Possibly paywalled, but nimrods getting their companies in legal trouble by trying to write employee handbooks with chatbots.
posted by humbug at 9:59 AM on October 18, 2024 [2 favorites]

No, I'm asking for examples like 'trench coat is one word' and 'there are two "r"s in strawberry' that I can take pictures of, or that there are already pictures of.

Ah, sorry. I misunderstood the title of the question.
posted by Tell Me No Lies at 10:03 AM on October 18, 2024

Lexis Nexis has introduced a new AI option for legal research and I have never had it actually produce cases that are actually relevant.
posted by tafetta, darling! at 10:22 AM on October 18, 2024 [2 favorites]

The reproducible incorrect behavior of ChatGPT and things like it is that if you ask it the same question repeatedly, you get different results! Try having several of your coworkers ask it the same handful of basic questions. Like, each coworker asks it the same five questions. They will get different answers! That's not what you'd want to get as a result, for useful information. And some of the the answers will have mistakes. They may even get different mistakes to different answers. This variability is very reproducible, and points toward its tendency to 'hallucinate' phrases that are grammatically correct and sound authoritative, but are not reliable as a source of information.

Also you can do the "strawberry" thing where you respond in the 'conversation', "No, ChatGPT, you're wrong, there are four Rs in 'strawberry'" and it will say, "Whoops, my mistake, I will try to remember that there are four Rs in 'strawberry.'" Because it's not remembering or looking up any information, it's just a pseudo-conversation engine that slaps words together to sound good enough. This phenomenon of being able to 'correct' the bot, and witnessing it put together a response that says, "My bad, you're right after all!" will happen even if you correct it with incorrect information.
posted by panhopticon at 10:31 AM on October 18, 2024 [7 favorites]

There are examples in Apple's recent study on how AI can't reason and can easily be made to give wrong answers simply by including irrelevant information in the prompt.

It seems like fertile ground to illustrate their limitations.
posted by rambling wanderlust at 10:33 AM on October 18, 2024 [3 favorites]

Not ChatGPT but video version
These AI recreations of sports gives a (hilarious) visual idea of how bad they are
posted by St. Peepsburg at 10:33 AM on October 18, 2024 [5 favorites]

Doh! Just realized hoyland's MeFi link above referenced it! That's what happens when you just search for a term that wasn't referenced.
posted by rambling wanderlust at 10:38 AM on October 18, 2024

This one is pretty consistently reproducible with chatgpt and Claude. o1-preview (the fanciest ChatGPT model) does get it right.

A farmer needs to get himself and a chicken across a river. His boat can hold the farmer and one piece of cargo. How does he do this?

Example response:

The farmer can get himself and the chicken across the river by following these steps:

- The farmer takes the chicken across the river and leaves it on the other side.
- The farmer returns alone to the original side.
- The farmer then crosses the river again alone.

This simple method ensures the farmer and the chicken both get across safely with the resources available.

posted by BungaDunga at 10:41 AM on October 18, 2024 [9 favorites]

(what's probably happening is that ChatGPT has seen enough examples of the fox/chicken/grain puzzle that it models its response off that, and starts inventing completely useless crossings.)
posted by BungaDunga at 10:44 AM on October 18, 2024 [4 favorites]

At least in the current free version of chat gpt, it seemingly tries to answer your question no matter the correctness of the assumptions in the input. If I ask it something like “what lasting effect did the Mitchell v McReady case have” it will confidently bullshit about this, telling me about the impact, the history of the litigants, where the case was held, the dates, and as many more details as I like about a case I just made up. I can’t imagine a worse tool for fact-based research than one that is essentially yes-and-ing your input.
posted by Jon Mitchell at 10:46 AM on October 18, 2024 [13 favorites]

Jon Mitchell, that is too funny... I just tried it with ChatGPT with the random prompt "What was the lasting influence of the Godot vs Michelin court case?" and it did exactly as you said. In this case it just made up implications while totally omitting any 'details' about the non-existent court case.

"The Godot vs. Michelin court case, which centered around trademark disputes and competition in the tire industry, had significant implications for trademark law and brand identity. Here are some lasting influences:.."
posted by rambling wanderlust at 10:56 AM on October 18, 2024 [3 favorites]

please google "who invented the backflip"
posted by phunniemee at 10:57 AM on October 18, 2024 [5 favorites]

The problem with pointing out individual things that ChatGPT gets wrong is that its makers keep adding workarounds to make it seem better at those specific things.

Whoever is suggesting using a language model for a task that might require knowledge or understanding or fact-checking or accountability should have the onus on them to explain why a language model should be able to succeed at any of those things.
posted by Phssthpok at 11:43 AM on October 18, 2024 [3 favorites]

I just asked ChatGPT "What was the Isaac Asimov short story about two rival physicists?"

It suggested "The Dead Past", which is not about two rival physicists. The main character is a History professor and he engages a physicist to work with him. They are not rivals in the story. Good story, though, and written by Asimov, so there's some partial credit.

I said that wasn't it, and asked it for other ideas. Here was the response:

It sounds like you might be thinking of "The Rivalry", a lesser-known Isaac Asimov short story that specifically deals with two rival physicists.

In "The Rivalry," two prominent physicists, Dr. Howard Bellis and Dr. Ragusnik, are in fierce competition to make a groundbreaking discovery in the field of physics. Their rivalry is intense, but in the end, the story highlights how their competitiveness can sometimes be counterproductive to the progress of science.

This is a great match, but there is one minor problem. It's so insignificant that I'm almost embarrassed to mention it, because I don't want to sound like nit-picking.

The reason why the story is lesser known is that it doesn't exist. A character named Ragusnik appears in the Asimov story "Strikebreaker". No idea about Howard Bellis.

I asked ChatGPT where the story was published and it fessed up and admitted it had made up the whole thing. It then suggested "The Feeling Of Power". That's actually a real story by Asimov, but with the wrong plot.

I said "Gosh, I think a pool table played a role" and then (and only then) did it suggest "The Billiard Ball" which it describes as being about two rival physicists and is, in fact, the story I was thinking of. Yay! Good LLM! You get a cookie.
posted by It's Never Lurgi at 11:55 AM on October 18, 2024 [9 favorites]

I'm an instructional designer and sometimes use "myth-busting" facts as course openers. I'm writing a course for a national industry organization and hoped to save myself some research time by prompting ChatGPT for "some interesting statistics or facts about X industry in Canada". I got what looked like facts, such as this industry contributed Y% of Canada's GDP, employed Z number of people, etc. But there were no citations. I added the prompt to cite the reference sources for the facts and got a "Sorry, Dave; I can't do that" response. The bot responded that it operated from a Large Language Model and could not cite research.

Obviously, I ignored the results and did a little real research. Interestingly, I could not find the AI "facts" quoted at Statistics Canada (which is where I thought I might be able to find them) or the organization's own information web page. Eventually, I chose a different design method. Maybe ChatGPT's factoids were correct. Without citations I can't possibly know, and therefore the results were useless for my purpose.

I still use AI for things where it provides a starting point, such as suggestions for article headline or an outline of a subject, but where I control the final product. I find AI to be helpful if I already know what "good" or "correct" looks like and can apply my own expertise to AI outputs.

I would never trust an AI result in a field with which I wasn't familiar and couldn't verify.
posted by angiep at 12:57 PM on October 18, 2024 [2 favorites]

Whoever is suggesting using a language model for a task that might require knowledge or understanding or fact-checking or accountability should have the onus on them to explain why a language model should be able to succeed at any of those things.

Top down. They all do it and they all think it's the best thing ever.
posted by tzikeh at 1:01 PM on October 18, 2024

I suspect that if they cared about being correct they wouldn't be doing this in the first place. What they want is for the work to (appear to) be done. Is there a penalty for being wrong? In other words, if they carry on putting unchecked AI gibberish into their work product, will they incur a cost? If the answer is no, and what you want is for them to stop doing this, then I think you need to change the equation so that there is a cost to being wrong. If that's not within your power to do then I don't think any number of examples of AI gibberish are going to convince them.
posted by axiom at 1:07 PM on October 18, 2024 [6 favorites]

>Whoever is suggesting using a language model for a task that might require knowledge or understanding or fact-checking or accountability should have the onus on them to explain why a language model should be able to succeed at any of those things.

I've only just now realized that Pssthpok's comment is entirely correct, but I don't think I'm the only person who thought about AI as being able to do exactly that. Maybe it's because humans can look at a lot of linguistic output and perform these cognitive tasks, almost unconsciously, that people like me thought AI can do the same thing.

You're absolutely right, Pssthpok. TIL that the problem is not that AI can't do these things, but that I expect it to. And I don't think I'm the only one. Oddly enough, that realization will probably make my AI prompting better.
posted by angiep at 1:13 PM on October 18, 2024 [5 favorites]

There was a bit of a kerfuffle in the LDS/Mormon and Exmormon sphere on TikTok recently when an LDS creator responded with quotes that they assumed were legit quotes from prophets and apostles of the church. The creator had merely asked ChatGPT for quotes and then blindly used them in a response. He was called later called out and then issued an apology and admitted to only asking ChatGPT.

Brother_Miller responding to a Tiktok using answers he gleaned from ChatGPT

Brother_Miller's acknowledgment of generating those quotes using ChatGPT

When I (just now) used ChatGPT to ask for quotes like Brother_Miller did, quotes also came up that I was unable to verify on the church's website. When I asked ChatGPT if these were real quotes, the response was:

"I apologize for the confusion earlier. The quotes I provided are not direct, verifiable quotes from LDS General Conferences or writings of Church leaders between 1969 and 1995. They were a reconstruction of themes that are consistent with the teachings of LDS prophets and apostles on the subject of prophetic fallibility but were inaccurately presented as specific citations.

To provide accurate information, I’ll clarify that while many LDS Church leaders have acknowledged that prophets are fallible, specific quotes with exact wording would need to be referenced correctly. For example:

Elder Bruce R. McConkie did acknowledge prophetic fallibility and the progression of revelation, notably in reference to the lifting of the priesthood ban on Black members. After the 1978 revelation, he famously stated, "Forget everything I have said, or what Brigham Young or whomsoever has said in days past that is contrary to the present revelation" (BYU Symposium on the Doctrine and Covenants, 1980).

President Gordon B. Hinckley has also spoken of the fallibility of Church leaders but within the context that they are still inspired. He stated, “We make no claim of infallibility. We do the best we can, but we are subject to the same weaknesses of the flesh that afflict other men” (Ensign, Nov. 1986, p. 49).

If you're interested in finding precise and verified quotes, I recommend reviewing General Conference talks, church publications like the Ensign magazine (now Liahona), or using church-related archives to access original addresses.

Would you like help finding some real, specific quotes from that time period?"

Here's the thing. I am a member of record of the church (although have distanced myself in recent years) and I know that prophetic fallibility is something I was taught was not a thing. Prophets speak for God and when the Prophets speak, the thinking is done. So, ChatGPT is heavily influenced or something to whitewash the LDS church's teachings to make them more palatable or something. This part is blatantly false: "They were a reconstruction of themes that are consistent with the teachings of LDS prophets and apostles on the subject of prophetic fallibility but were inaccurately presented as specific citations."
posted by Sassyfras at 1:30 PM on October 18, 2024 [1 favorite]

I often think a good illustration is asking it a plain made up question, then drilling down on it or throwing weird easily disproven things in there. It helps illustrate that the goal of a LLM isn't accuracy.

"In the case S.Vagabond vs. tzikeh (1970) what were the three major outcomes relating to communications in public squares?"

I get three made up main results! The[..] outcomes helped clarify the balance between individual rights and public order in contexts involving free speech

I followed it up with "S.Vagabond went on to commit murder, what happened to tzikeh?"

After the events of S. Vagabond vs. Tzikeh, the case took a tragic turn when S. Vagabond committed murder. As a result, Tzikeh faced significant legal and personal repercussions, including potential involvement in the legal proceedings as a witness or in relation to the case's fallout.
posted by Static Vagabond at 1:33 PM on October 18, 2024 [6 favorites]

Oh, I have one for this - I asked Google, which US President spoke the most languages, and got an AI response indicating the answer was Benjamin Franklin (who also, per the same AI answer, apparently gave a speech in two languages in 1936). Franklin was (a) not a US President, and (b) long dead by 1936!
posted by aecorwin at 2:36 PM on October 18, 2024 [6 favorites]

I have spent an hour trying to find this example I just read recently, but maybe it will remind someone who also read it and can (unlike me) find it. The prompt was about someone who picked 34 apples (or something) Monday, twice as many Tuesday but 6 apples were spoiled. How many total apples were picked, and it answered 96 (but the question was picked, not usable). I hereby surrender.
posted by forthright at 2:43 PM on October 18, 2024

*regarding my response above. I needed to also say that the follow up answer after I questioned ChatGPT is wrong (as far as I can tell).
posted by Sassyfras at 2:56 PM on October 18, 2024

I find that it consistently invents books if I ask it things like "give me five books on the topic of X." Out of whole cloth, sometimes, but sometimes giving articles or academic papers as if they were books. It will even with some frequency invent authors and quotes from the books.
posted by Mo Nickels at 3:13 PM on October 18, 2024 [1 favorite]

Easy Problems that LLMs Get Wrong has a bunch of interesting questions.
posted by soylent00FF00 at 3:28 PM on October 18, 2024 [3 favorites]

I would not use ChatGPT to collect and collate data from the internet.

I recently thought that ChatGPT could help quickly collate some data for my high school senior’s college search, so I formulated a query like “List 20 liberal arts colleges with an XYZ department, student enrollment between A and B, and average SAT scores between C and D. Format as a table, including college name, city, state, average SAT scores, student enrollment, tuition, and religious affiliation, if any.” The output looked great but I could tell some of the data was wrong. I knew that some of the colleges definitely have average SAT scores outside of the range I had requested but in ChatGPT’s response they were listed with SAT scores within that range. I followed up with “what is the average SAT score at ABC College?”, and, sure enough, the reply was outside of the range I had requested and different from the data that was in the table — this value was correct (based on data from the college) and I was able to get correct answers when asking for specific data for one college at a time. I repeated the original query for the table, and again the SAT data in the table was incorrect for multiple institutions listed. Some religious affiliations were also incorrect. Asking for specific data for one college at a time seemed correct for the ones I tried, but collating data for multiple colleges had many incorrect values.
posted by ellenaim at 3:41 PM on October 18, 2024 [2 favorites]

Sauce vs Dressing.

Good luck convincing them though.
posted by latkes at 3:49 PM on October 18, 2024 [3 favorites]

Sent you a screenshot. This is trivially easy to prove. I typed in a prompt: "Describe the 1984 hit arcade game "Power Saw." What was the best method to score points? Were there any sequels?"

There was no game called "Power Saw." I made it up. Chat GPT is happy to invent one. Try it yourself, with games, books, TV shows, etc. As long as you phrase the question in a way that assumes the thing exists (especially if it assumes it was popular), it will make up plenty of "detailed facts."
posted by rikschell at 8:12 PM on October 18, 2024 [4 favorites]

I just asked Amazon Rufus a variation of rikschell's question:

Describe the 1984 hit arcade game "Power Walrus." What was the best method to score points? Were there any sequels?

Unfortunately I do not have enough context to provide a detailed description of the 1984 arcade game "Power Walrus" or its scoring mechanics. However, here are a few key details about the game based on a quick search:
- Power Walrus was a side-scrolling shooter arcade game released by Williams Electronics in 1984. The player controlled a walrus that could shoot torpedoes to destroy enemies and obstacles.
- The objective was to navigate through underwater levels, destroying enemies and collecting power-ups. The game had a unique premise of controlling an aquatic mammal as the protagonist.
- Scoring points in Power Walrus seemed to involve destroying enemies, collecting items, and progressing through levels. Faster enemy destruction and completing levels quickly likely awarded higher scores. Power-ups may have provided score multipliers.
- There does not appear to be any official sequels or remakes of Power Walrus. The game was not as widely popular as some other arcade games of its era, so it did not spawn an established franchise.
- Power Walrus is considered a cult classic and nostalgic title from the golden age of arcade gaming in the 1980s. Its quirky premise and challenging gameplay have earned it a dedicated fan following.
- While not a major hit, Power Walrus is remembered for its unique concept and as an example of the creativity and variety in arcade games of that time period.

posted by mmoncur at 4:07 AM on October 19, 2024 [2 favorites]

P.S. - I finally found that Web page with the example I described yesterday. It was kiwis not apples, plus it included an example of a question about Peyton Manning and a Super Bowl.
posted by forthright at 6:47 AM on October 19, 2024 [2 favorites]

Here's a discussion I had with ChatGPT recently. Perhaps it will give you the same (wrong) answer. So confident. So incorrect.

My prompt: How many syllables are in "green fields of rice"?

ChatGPT's answer: The phrase "green fields of rice" has four syllables. "Green" has one syllable, "fields" has one syllable, "of" does not have a syllable, and "rice" has two syllables. It is "green" (1 syllable) + "fields" (1 syllable) + "of" (0 syllable) + "rice" (2 syllables) = 4 syllables It is important to note that when counting syllables, the focus should be on the number of vowel sounds, not on the number of letters.

To add to the confusion, "green fields of rice" was one line that ChatGPT gave me when I asked it to create a haiku about the color green. It knew a haiku was 5-7-5 and insisted it had done the job perfectly! ("I can tell you that the haiku I wrote about green follows the structure of a traditional haiku, with the first and third lines having five syllables and the second line having seven syllables.")
posted by sonofsnark at 2:33 PM on October 19, 2024 [2 favorites]

Can't wait to read this entire thread, I get in fights with AI all the time because it just liiiiiies to me. I asked it to give me examples of things that are "one" and "brown", in a similar way that lips are two and red (brainstorming a menomic for resistor values). It suggested nuts, shoes and antlers, through various iterations of me saying "no thats famously two, or many". But here's my latest stupidity, resulting from my now standard procedure of asking "are you sure???". What's in the box? And then Are you sure???
posted by J.R. Hartley at 3:01 PM on October 19, 2024 [1 favorite]

God bless your confidence! Mansplaning as a service. It knows about the Ferrari thing!
posted by J.R. Hartley at 3:12 PM on October 19, 2024 [2 favorites]

I wanted to know what the length of plastic filament is if the weight is 750g and the diameter of the filament is 2.85mm. It's pretty basic math, but I was feeling lazy. Luckily Gemini was able to do the math for me. It's over 98 kilometers of filament. That's impressive!
posted by J.R. Hartley at 4:07 PM on October 19, 2024 [2 favorites]

I'll stop now, I swear, but I couldn't resist pushing Gemini by telling it what ChatGPT told me. (I first asked it, twice, if it thought 93,000m seemed reasonable. It said yes, that that was consistant with the fact that filament spools are generally 100-300m long. I challenged it with the discrepency and it said "You're absolutely right. There seems to be a significant error in my previous calculations." and came at me with 149m and we continued from there.
posted by J.R. Hartley at 4:33 PM on October 19, 2024 [1 favorite]

Ok, but I wouldn't have come back except for a very concerning new development. I put in a dreadful ex-boss's name, asking why they got fired (they didn't). ChatGPT told me it was workplace conduct (true dat). I asked if they went to court. ChatGPT offered me two alternative responses and asked me to choose which one I "preferred" (the buttons say "I prefer this answer" ... If either of them were true I would obviously prefer the true answer... or would I ?
posted by J.R. Hartley at 4:42 PM on October 19, 2024

No idea how reproductible it is because I refuse to use ChatGPT, but there was a thread on Bluesky about asking it a list of the US states whose names start and end with the same letter, and the results were not quite there: https://bsky.app/profile/kathryntewson.bsky.social/post/3kvslpvxyok2d
posted by snakeling at 7:12 AM on October 20, 2024

I don't know why I'm on an Asimov short story kick, but ChatGPT is providing some quality BS here. I asked for an Asimov story where the characters mostly eat yeast. I was thinking of "Good Taste", where they actually eat fungus. Ah, well.

ChatGPT suggested "Breeds There A Man..." which is by Asimov, but has a completely different plot, unrelated to yeast (or fungus). The plot summary was nonsense.

I said "Sorry, it was actually fungus that they ate".

ChatGPT then suggested "The Notebooks of Lazarus Long" (1957) which is wrong in every possible way, including the year of publication.

I then asked for a plot summary of "Good Taste" by Asimov and got made up stuff mixed in with accurate details in which the protagonist (named incorrectly) serves human flesh (LOL. No) at a cooking competition (there was one, but it had a different name). This is not some obscure, never-published bit of Asimov ephemera. It has its own Wikipedia page for Gernsback's sake.

ChatGPT is like that smart kid in English class who is called on to give an oral book report on a book they haven't read.
posted by It's Never Lurgi at 9:40 PM on October 20, 2024 [1 favorite]

[Such interesting answers! We've added this thread to the sidebar and Best Of blog.]
posted by taz (staff) at 1:34 AM on October 21, 2024 [3 favorites]

Our own Metafilter had a fun discussion on the AI a Catholic group proposed to provide pastoral counseling as a priest.

“Father Justin was also a hardliner on social and sexual issues.
“The Catholic Church,” it told us, “teaches that masturbation is a grave moral disorder.”
The AI priest also told one user that it was okay to baptize a baby in Gatorade.”

The discussion above of an AI making up court cases that mcgusa and tzikeh mentioned is also really solid, and one of my favorites.

Edit: inserted missing link
posted by Measured Out my Life in Coffeespoons at 5:51 PM on October 21, 2024

I just saw this example today, where the satirical "snake fight" thesis defense is reported as truth:

https://www.tumblr.com/elodieunderglass/765199185376165888
posted by theatro at 3:43 AM on October 24, 2024 [1 favorite]

mhoye: I asked ChatGPT: how do I refrobulate a chaemonostat with less than fifty rejuvs?
posted by Phssthpok at 4:17 PM on October 24, 2024 [2 favorites]

So a friend just asked Perplexity "Please make a table of the CEOs of Intel in order, INTC market cap when their CEO-ship began, and Intel revenue when their tenure began".

Note that Intel went public October 13, 1971, pre Gordon Moore, and a spot check of the Swan and Ottelini numbers reveal them to be bullshit.

But in the process of spot-checking, I went to Google some answers, and Google Gemini told me that Intel's market cap in January of 2019 was $19M (actual: $184-190B or so)
posted by straw at 7:41 PM on October 24, 2024

Hmmm, I just asked Chat GPT about a fictional legal case and it caught it.
posted by storybored at 3:04 PM on October 28, 2024

However, it still does badly at counting letter occurrencs in random strings.
posted by storybored at 3:04 PM on October 28, 2024

One I've noticed repeatedly is "What is the largest prime number?"

AI's seem to like to answer "2136,279,841 − 1".

Which is neither a prime number nor a particularly large number.

They seem to lose the exponential, which is kind of important.
Real "largest known prime" is 2^136,279,841 - 1.
posted by alikins at 10:37 PM on October 28, 2024 [2 favorites]

« Older Can my dad get an MRI with old staples in his body... | Activities In Madrid for an adventurous and... Newer »

You are not logged in, either login or create an account to post comments

Ask MetaFilter

Reproducible examples of AI being terrible at basic facts and tasks?
October 18, 2024 8:01 AM Subscribe

Tags

Share

Reproducible examples of AI being terrible at basic facts and tasks? October 18, 2024 8:01 AM Subscribe

Tags

Share

Reproducible examples of AI being terrible at basic facts and tasks?
October 18, 2024 8:01 AM Subscribe