History Filter
August 14, 2013 4:06 PM Subscribe
Is there a statistical measure that I can rely on in evaluating papers or articles in a historical subject? If you teach history, what measure would you rely on in evaluating student papers and take-home exam essays for plagiarism?
I am evaluating short papers or articles (from 100 to 1,500 words) that cover a historical era; some of these are narrative or biographical essays. I suspect that a few of the contributors are rewriting Wikipedia articles or other web content. I've been using plagiarism-detection software (not Turnitin, which I can't afford) and find very few verbatim matches, but I am also trying to measure similarity.
Between the fuzzy-logic method and the Levenshtein-distance method, which metric is more reliable at detecting similarity? If the two methods report different similarity percentages, which should I rely on? My impression so far is that anything over 30% on either method is bad, but would a more stringent threshold help me?
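For what it's worth, the two metrics measure different things, so they will rarely agree. A minimal Python sketch (assuming the "fuzzy" method is something like difflib's block-matching ratio, which is what many free tools use) shows how each can be turned into a 0–100% similarity score:

```python
from difflib import SequenceMatcher

def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert/delete/substitute)."""
    if len(a) < len(b):
        a, b = b, a
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]

def levenshtein_similarity(a: str, b: str) -> float:
    """1.0 = identical, 0.0 = nothing shared; normalized by the longer text."""
    if not a and not b:
        return 1.0
    return 1 - levenshtein(a, b) / max(len(a), len(b))

def fuzzy_similarity(a: str, b: str) -> float:
    """difflib's ratio: driven by longest matching blocks, not edit count."""
    return SequenceMatcher(None, a, b).ratio()

# A reordered sentence scores very differently on the two metrics:
a = "In 1848, revolutions swept across Europe."
b = "Revolutions swept across Europe in 1848."
print(f"Levenshtein: {levenshtein_similarity(a, b):.0%}")
print(f"Fuzzy:       {fuzzy_similarity(a, b):.0%}")
```

The design point: edit distance punishes reordering heavily (every shifted word costs edits), while block matching rewards any long shared run wherever it sits, so paraphrase-by-reshuffling tends to score low on Levenshtein but high on the fuzzy ratio. Neither percentage is a plagiarism verdict on its own; they flag passages worth a human read.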
I'm aware that similarity testing will also pick up the standard information these articles must cover, including names, titles, dates, events, and the sequence of events.
I am familiar with the secondary sources for the period but I suspect that I will have to re-read them all, make note of stylistic features that might be reproduced by authors of derivative content, and rely on "this doesn't feel right."
When I do find certifiably derivative work, I intend to have the author rewrite it or drop him or her from the project.
Please MeMail me if you want specifics about the project because I don't want to hang it and my own difficulties out to dry on a public forum. The project is a compendium rather than original research, it will be published, and the publisher will also evaluate the submissions for derivative content.
I assume this is not an option or you would have used it, but...do you work for a university? Do they subscribe to any kind of service such as Turnitin?
posted by epanalepsis at 5:54 AM on August 15, 2013
To my understanding, identifying a paper as plagiarized is less about the percentage of the paper that's drawn from another source, and more about whether or not that source is clearly identified. It's more of a yes-or-no question ("This paper has been plagiarized") than something to be determined in degrees ("This paper is 40% plagiarized"). If significant chunks of the paper come from somewhere else but are cited as such, that would definitely call the author's capabilities into question, but it's a different kind of issue.
So, I suppose my answer is that if you're finding anything at all that sets off the "doesn't feel right" alarm and can be traced to Wikipedia, etc., and the original source isn't credited, that's plagiarism. It doesn't matter if it's 5% or 50% of the text.
If you're more concerned about how much derivative content feels okay in a historical project (assuming the sources are credited), that probably depends on whatever understanding you have with the publisher about this project, at least in part, no? If you get to set those parameters yourself, I'd probably err on the side of allowing fairly little derivative content (like, a couple of sentences, at most). A 1,500-word article is really not that long. In even shorter pieces - like, under 500 words - I would hope not to find any, or what is that author really contributing to the project?
(FWIW, I have eleven semesters of grading for university art history courses under my belt, which, at 40 to 120 students per class, works out to... a percentage of my life spent grading papers that I'd rather not think about. In that time, I don't think I've had even three real cases of calculated plagiarism, but there have been countless authors who simply didn't understand how to acknowledge sources properly. Clarifying expectations about that seems most important to me in cases like these.)
posted by Austenite at 5:15 PM on August 14, 2013