How can I compare the frequency of two words in English?
July 7, 2023 5:04 AM
I want to compare two words to see which is more commonly used. ("Shard" vs. "shart") I don't want to compare searches, because people don't necessarily search on a word they know well. Google Ngram seems to stop at 2012, and only measures uses in books. Is there a tool that compares how common words are across the Internet?
I came to recommend the same corpus, which I've used in the past, when it didn't require registration. It used to just be a corpus of American words, but now includes a British corpus as well.
posted by pangolin party at 6:11 AM on July 7
Maybe I'm missing something, but "shart" seems to be a recent coinage meaning the expelling of feces along with a fart. MW gives 2003 as the first recorded usage in print. "Shard" may have that meaning in some circles (e.g., Urban Dictionary does have that sense, as the third entry, after two for crystal meth, dated 2012), but it is also a common English word meaning a piece or fragment, going back to Old English, with the modern spelling occurring for a few hundred years at least.
My point being, using corpora as listed above will give you a literal comparison of frequency of those strings, but it will be largely about two completely different words, because while you can be pretty sure "shart" is about poop, you have no idea how many "shard"s are about that.
Apologies if this is obvious, but it seemed you may be looking for which synonym is more common, and I don't think you can do that this way.
posted by SaltySalticid at 6:22 AM on July 7 [7 favorites]
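For anyone with a plain-text corpus on hand, the literal string comparison described above can be sketched in a few lines of Python. This is only a rough illustration, not how any of the corpus sites actually compute their numbers: the function name `per_million`, the tokenizing rule, and the sample text are all made up for the example. As the comment says, it counts spellings, not meanings.

```python
import re
from collections import Counter

def per_million(text, words):
    """Return occurrences per million tokens for each word in `words`.

    Hypothetical helper: counts exact spellings only, so inflected forms
    ("shards") and different senses of a word are not merged or separated.
    """
    # Lowercase the text and pull out alphabetic tokens,
    # keeping internal apostrophes ("don't" stays one token).
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())
    counts = Counter(tokens)
    total = len(tokens)
    if total == 0:
        # Avoid dividing by zero on an empty corpus.
        return {w: 0.0 for w in words}
    return {w: counts[w.lower()] * 1_000_000 / total for w in words}

# Made-up sample standing in for a real corpus.
sample = "A shard of glass. Another shard, and shards everywhere. Nobody said shart."
freqs = per_million(sample, ["shard", "shart"])
print(freqs)
```

Normalizing per million tokens makes results from corpora of different sizes comparable, but the output still cannot say which sense of "shard" a given hit was using.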
Maybe I'm missing something in your question, but why don't you just do a search?
Bing says "shard" gives "About 387,000 results" and "shart" gives "About 180,000 results". These are theoretically how many times the word in quotes appears on web pages that are crawled.
A Google search produces higher numbers but a similar ratio.
posted by Umami Dearest at 6:35 AM on July 7
I think Google Ngram goes up to 2019, if that timeline makes a difference.
posted by crazy with stars at 7:19 AM on July 7
I would ask an LLM---they're trained on this stuff.
posted by MisantropicPainforest at 7:30 AM on July 7
I would ask an LLM---they're trained on this stuff.
Just because it's seen a lot of words doesn't mean that it can meaningfully give you answers about their statistics. The words an LLM produces are based on words that it's seen, so it will state facts (and falsehoods) that are similar to the words it's seen. It's like how, if someone asks me about how my brain works, I have to just say stuff based on other things I've read about brains; I can't introspect my own brain very much, even though I have one.
There's a chance it might be able to guess whether "dog" or "antediluvian" is a more common word, but I wouldn't trust it with anything that's not already obvious.
posted by BungaDunga at 7:54 AM on July 7 [5 favorites]
Response by poster: For clarity, I want to name something "shard" in a work of fiction, and I'm hearing, "Oh, that sounds too much like 'shart.'" If "shart" is a rare usage, then that's an argument for not worrying about it.
Number of search results doesn't work because people might be looking up "shart" because they've never heard it before and they don't know what it is, while they're not looking up "shard" because they don't need the Internet to tell them what a shard is.
posted by musofire at 7:54 AM on July 7 [1 favorite]
The number of search results has nothing to do with the number of people looking up a term.
posted by Umami Dearest at 8:11 AM on July 7 [4 favorites]
I wouldn't worry the slightest that your readers will be reminded of "shart" in connection with a thing named "shard" in your fiction. "Shart" is a word that many people likely have heard of, but not a word that anyone who isn't into South Park-esque juvenile humor hears with any regularity. It would never occur to me. Honestly, I wonder about the people who are telling you it does.
posted by slkinsey at 8:22 AM on July 7 [7 favorites]
"Shard" is a common enough usage that the tallest building in London is called that.
posted by LionIndex at 8:26 AM on July 7 [6 favorites]
I'm not sure that how common a word is makes that much of a difference. "Fuck" is a very common word but I wouldn't associate "duck" with it at all. Nobody looks askance at "DuckTales".
Also, "Shards of Earth" is pretty good precedent for it working just fine in a name for something.
posted by BungaDunga at 8:43 AM on July 7 [3 favorites]
I came in here because I'm a word nerd and your question was intriguing...but also professionally I'm a book editor, so with your update my answer is emphatically to go ahead and name something shard. It's a Beavis & Butthead level joke to imagine a reader going "heh heh that's like shart." Shard is a word; use it.
posted by BlahLaLa at 9:10 AM on July 7 [4 favorites]
I'm wondering who you're getting your feedback on this association from. I'm aware of both words, but it would never occur to me to make this association unless I were led there by the author by a pun, the general tone of the piece, etc. I think the best thing to do here is to consider the audience you're writing for and how likely that particular audience is to know the word "shard".
posted by epj at 12:35 PM on July 7 [1 favorite]
Best answer: I think this search on Google Trends is what you want? Also, Google Books Ngram viewer goes up to 2019, here's the results for that.
Shard has far more usage in both results.
posted by Aleyn at 8:16 PM on July 7
It seems you have to register as a user to run a search; otherwise I would have checked your words myself.
posted by paper chromatographologist at 6:01 AM on July 7 [1 favorite]