Help me catch the online content thieves.
July 29, 2009 10:29 PM
Is there a good online plagiarism tool that will work for writing that isn't term papers? It's for online content that I suspect was stolen word for word from other sources without citation.
Anonymous because it's work related. I was hired by the online branch of a retail firm to spruce up some blog content they had produced primarily for SEO purposes. A lot of it was poorly written and made some excessive claims about the products, so they wanted me to make it all conform to the house tone and style and soften the claims.
About halfway through the project, while fact-checking something, I discovered that a significant chunk of the text had been lifted directly from another website. Further random poking around found numerous other examples. It appears to come primarily from other corporate websites and low quality impersonal blogs (the kind that probably function just for SEO purposes themselves).
I can't search every sentence of every article, but I need to figure out how much of any given piece is original. Is there an effective tool (preferably free, but I'm less picky every minute this goes on) for detecting stolen content that works like the paper plagiarism checkers teachers use?
Copyscape (it is designed to work on web pages, so you will need to upload the suspect content to a web page before you scan it).
posted by phoenixy at 10:38 PM on July 29, 2009 [2 favorites]
As voltairemodern says, if you have only one or a few documents, then just googling a short phrase as a whole, i.e. with "around it", gives a surprisingly small number of results. I do this occasionally when suspicious of students, and once you have a hit you can break a document down pretty rapidly.
posted by biffa at 1:55 AM on July 30, 2009
Seconding Googling suspicious chunks of text. If it's cobbled together from a lot of sources, you may have trouble with a plagiarism-checking tool.
posted by Jilder at 3:27 AM on July 30, 2009
Is there a good online plagiarism tool that will work for writing that isn't term papers? It's for online content...
Yeah, there are many tools that do this: Google, Yahoo, etc. Enter a portion of the text you're wondering about, and see if any other sources come up.
posted by Jaltcoh at 6:37 AM on July 30, 2009
follow-up from the OP
Thanks for the responses so far. Even though the content isn't live anywhere yet, Copyscape and FairShare both look promising.
Just googling isn't really an effective option because I have 50 1,000-word articles that all need to be checked. Google cuts off searches at 32 words. I've discovered what I have through random suspicious phrase searching, but it's not practical for all that content. If I don't choose the right 32 words, I think an article is safe when it isn't. That's why I'm looking for something that searches in a more comprehensive way.
posted by jessamyn at 10:47 AM on July 30, 2009
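To put a rough number on the coverage problem described in the follow-up, here is a minimal Python sketch. It assumes the 32-word search limit quoted above and counts words with a plain whitespace split; the function name `query_windows` and the placeholder article are made up for illustration.

```python
def query_windows(text, max_words=32):
    """Split text into consecutive quoted phrases of at most max_words words."""
    words = text.split()
    return [
        '"' + " ".join(words[i:i + max_words]) + '"'
        for i in range(0, len(words), max_words)
    ]


# A stand-in for one 1,000-word article (placeholder words, not real content).
article = " ".join(f"word{i}" for i in range(1000))
windows = query_windows(article)

per_article = len(windows)   # 1000 words in 32-word pieces, rounded up
total = per_article * 50     # across the 50 articles in the project
print(per_article, total)    # prints: 32 1600
```

Searching every word exactly once this way would mean about 32 searches per article and 1,600 overall, which is why hand-searching the full text does not scale.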
I think that a 32 word match is excessive. A plagiarizer can change one word of the 32 and you will not find a match in your search. Searching for between 6 and 10 words together in quotes will reveal what you want to know. Pick a chunk of 6-10 words (good words with as few common words as possible) from each article, put them in quotes, plug them into a google search. Then peruse the resulting matches. Check the articles for each chunk.
I once had a statistic about the chances of having 6 words be identical, which I've since forgotten. It's very low, unless you're picking common or cliched phrases. You'll have to do this 50 times, but it shouldn't take you very long.
posted by Barry B. Palindromer at 11:01 AM on July 30, 2009
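The phrase-picking advice above could be sketched roughly like this in Python. Everything here is an assumption for illustration: the function name `distinctive_phrases`, the tiny hand-picked list of common words, and the choice to score each window by how many common words it contains. Punctuation is dropped, so a phrase may span a sentence boundary.

```python
import re

# A small, hand-picked list of common words; an assumption, not a standard list.
COMMON_WORDS = {
    "a", "an", "and", "are", "as", "at", "be", "but", "by", "for", "in",
    "is", "it", "of", "on", "or", "that", "the", "this", "to", "was", "with",
}


def distinctive_phrases(text, size=8, count=3):
    """Return up to `count` non-overlapping quoted phrases of `size` words,
    preferring the windows that contain the fewest common words."""
    words = re.findall(r"[A-Za-z0-9']+", text)
    windows = []
    for start in range(0, len(words) - size + 1):
        chunk = words[start:start + size]
        common = sum(1 for w in chunk if w.lower() in COMMON_WORDS)
        windows.append((common, start))
    windows.sort()  # fewest common words first, then earliest position

    chosen = []
    for _, start in windows:
        # Skip any window that overlaps one already chosen.
        if all(abs(start - s) >= size for s in chosen):
            chosen.append(start)
        if len(chosen) == count:
            break
    return ['"' + " ".join(words[s:s + size]) + '"' for s in sorted(chosen)]


sample = (
    "The quick brown fox jumps over the lazy dog and then it is off to "
    "the river for a drink of water with the other animals of the forest"
)
for phrase in distinctive_phrases(sample, size=6, count=2):
    print(phrase)
```

Each printed line is already wrapped in quotes, so it can be pasted straight into a search box. A text shorter than `size` words simply yields no phrases.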
Just as a quick example. Taken from a chunk in my 3rd sentence above.
Searching "together in quotes" yields 4,520,000 search results.
Searching "words together in quotes" yields 4 search results.
Searching "words together in quotes will" yields 1 result.
Searching "words together in quotes will reveal" yields 0 results.
If I had paid better attention in my computational linguistics course, I could give you some technical details about this. Unfortunately I didn't, but the important thing to remember is that you don't need to look at large chunks of text to find plagiarizers. Relatively small chunks will do.
posted by Barry B. Palindromer at 11:11 AM on July 30, 2009
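Once a search has turned up a candidate source, the same small-chunk idea can give a rough answer to "how much of this piece is original". The Python sketch below is only an illustration under stated assumptions: it compares runs of six consecutive words (the figure mentioned above), ignores case and punctuation, and the names `shingles` and `copied_fraction` are invented here. It is not how Copyscape or any other tool named in this thread works.

```python
import re


def shingles(text, size=6):
    """Return the set of all runs of `size` consecutive words in text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def copied_fraction(suspect, source, size=6):
    """Fraction of the suspect's word runs that also occur in the source."""
    suspect_runs = shingles(suspect, size)
    if not suspect_runs:
        return 0.0
    return len(suspect_runs & shingles(source, size)) / len(suspect_runs)


source = (
    "Searching for between 6 and 10 words together in quotes "
    "will reveal what you want to know."
)
suspect = (
    "As a rule, searching for between 6 and 10 words together in quotes "
    "will reveal plenty."
)
print(copied_fraction(suspect, source))
```

A result near 1.0 means almost every six-word run in the suspect text also appears in the source; a result of 0.0 means none do. Changing a single word breaks only the runs that contain it, so lightly edited copies still score well above zero.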
I find this useful for checking copyvios on Wikipedia. It's not a magic wand, but it helps.
posted by dirtynumbangelboy at 7:34 PM on July 30, 2009
Just googling isn't really an effective option because I have 50 1,000-word articles that all need to be checked. Google cuts off searches at 32 words. I've discovered what I have through random suspicious phrase searching, but it's not practical for all that content.
Oh, I don't think that sounds impossible. You could try several distinctive phrases per essay. If each search takes a few seconds, then each essay could take about a minute. You could do all 50 essays within an hour.
posted by Jaltcoh at 7:24 AM on August 3, 2009
Depending on the answers to these questions, a simple google search might be your best bet. After all, it's likely that if the articles are plagiarized, a simple internet search is how the plagiarist found the originals.
posted by voltairemodern at 10:37 PM on July 29, 2009