How can I have the community help tag hundreds of text documents?
April 18, 2011 11:41 AM

Looking for a web-based open source tool that lets a community tag text documents.

I'm looking to get some community-tagged input on a series of text documents.

For example, if a paragraph mentioned a forest, a user could tag the document with "nature", "forest", and "green".

Ideally the user could tag or highlight the specific sentence or paragraph that the word applies to.

At the end we can look for all documents with certain tags.

People would have logins, and could have their tags removed if they were abusive.
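To make the requirements above concrete (tags tied to a sentence or paragraph, tags owned by a login, removal of an abusive user's tags, lookup by tag), here is a minimal sketch of one possible data model. The names `Tag`, `TagStore`, `remove_user`, and `docs_with` are hypothetical, and the in-memory list stands in for whatever database a real tool would use.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Tag:
    doc_id: str   # which document the tag belongs to
    user: str     # login of the person who added it
    label: str    # e.g. "forest"
    start: int    # character offset where the highlighted span begins
    end: int      # character offset just past the end of the span


class TagStore:
    """In-memory store; a real site would back this with a database."""

    def __init__(self):
        self.tags = []

    def add(self, doc_id, user, label, start, end):
        # Normalize labels so "Forest" and "forest " count as the same tag.
        tag = Tag(doc_id, user, label.strip().lower(), start, end)
        self.tags.append(tag)
        return tag

    def remove_user(self, user):
        """Drop every tag added by one user; return how many were removed."""
        before = len(self.tags)
        self.tags = [t for t in self.tags if t.user != user]
        return before - len(self.tags)

    def docs_with(self, *labels):
        """Return ids of documents that carry all of the given labels."""
        by_doc = defaultdict(set)
        for t in self.tags:
            by_doc[t.doc_id].add(t.label)
        wanted = {label.strip().lower() for label in labels}
        return sorted(d for d, have in by_doc.items() if wanted <= have)


store = TagStore()
text = "We walked through the forest at dawn."
start = text.index("forest")
store.add("doc1", "alice", "forest", start, start + len("forest"))
store.add("doc1", "alice", "nature", 0, len(text))
store.add("doc2", "mallory", "forest", 0, 5)

print(store.docs_with("forest"))            # documents tagged "forest"
print(store.remove_user("mallory"))         # number of tags removed
print(store.docs_with("forest", "nature"))  # documents carrying both tags
```

Storing character offsets rather than the highlighted text itself keeps each tag small, but it assumes the document text does not change after tagging.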

There would be between 300 and 3,000 documents, perhaps up to the low tens of thousands.

Certain users would be able to add new documents.

A feature to help find redundant or similar user-added documents would be a plus.
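One common way to flag redundant or near-identical documents is to compare sets of overlapping word sequences (shingles) with Jaccard similarity. The sketch below is only an illustration of that technique, not a feature of any particular tool; the function names, the shingle size `k=3`, and the `0.5` threshold are all assumptions to be tuned.

```python
import re
from itertools import combinations


def shingles(text, k=3):
    """Return the set of k-word sequences in the text, lowercased."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Intersection size divided by union size (0.0 if both sets are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similar_pairs(docs, threshold=0.5, k=3):
    """Yield (id_a, id_b, score) for every pair at or above the threshold."""
    sets = {doc_id: shingles(text, k) for doc_id, text in docs.items()}
    for a, b in combinations(sorted(sets), 2):
        score = jaccard(sets[a], sets[b])
        if score >= threshold:
            yield a, b, score


docs = {
    "a": "The old forest was green and quiet in the morning light.",
    "b": "The old forest was green and quiet in the evening light.",
    "c": "Quarterly revenue rose on strong demand for widgets.",
}
for a, b, score in similar_pairs(docs):
    print(a, b, round(score, 2))
```

This compares every pair, which is fine for a few thousand documents; at tens of thousands, an approximate method such as MinHash would likely be a better fit.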

We could then show all the tags for the entire corpus of data or on a document by document basis.
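Showing tags for the whole corpus or for a single document comes down to counting labels at two levels. A minimal sketch, assuming the tags are available as hypothetical `(doc_id, label)` pairs:

```python
from collections import Counter

# Hypothetical (doc_id, label) pairs, one per tag a user applied.
applied = [
    ("doc1", "nature"), ("doc1", "forest"), ("doc1", "forest"),
    ("doc2", "forest"), ("doc2", "green"),
]

# Tag counts across the entire corpus.
corpus_counts = Counter(label for _, label in applied)

# Tag counts on a document-by-document basis.
per_doc = {}
for doc_id, label in applied:
    per_doc.setdefault(doc_id, Counter())[label] += 1

print(corpus_counts.most_common())
print(per_doc["doc1"].most_common())
```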
posted by bottlebrushtree to Computers & Internet (4 answers total)
It might be overkill for your task, but Document Cloud is worth a look.
posted by tmcw at 1:04 PM on April 18, 2011

Which formats do you mean by "text documents"? Office files as well as PDFs? What about HTML?

Do you need to maintain the original source files, or can you just convert everything to HTML and tag it there?
posted by holloway at 3:34 PM on April 18, 2011

This sounds like something you might be able to submit to Mechanical Turk, if you could break up your tasks into small chunks.
posted by KDj82kao at 5:18 PM on April 18, 2011

Response by poster: > Which formats do you mean by "text documents"? Office files as well as PDFs? What about HTML?

These would be plain text files, or possibly pasted-in or uploaded HTML-formatted text.
We have the ability to reformat them to be pure text or unicode as desired.

We can convert to HTML and tag that as needed.

We'd like part of the interface to be that people can see similar documents by tag.
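"Similar documents by tag" could be as simple as ranking other documents by how much their tag sets overlap with the one being viewed. This is a rough sketch under that assumption; `rank_by_shared_tags` and the sample `doc_tags` mapping are made up for illustration.

```python
def rank_by_shared_tags(target, doc_tags):
    """Rank other documents by Jaccard overlap of their tag sets with target's."""
    mine = doc_tags[target]
    scored = []
    for doc_id, tags in doc_tags.items():
        if doc_id == target:
            continue
        shared = mine & tags
        if shared:
            score = len(shared) / len(mine | tags)
            scored.append((score, doc_id, sorted(shared)))
    # Highest overlap first; ties broken by document id for stable output.
    scored.sort(key=lambda row: (-row[0], row[1]))
    return scored


doc_tags = {
    "doc1": {"nature", "forest", "green"},
    "doc2": {"nature", "forest"},
    "doc3": {"green", "finance"},
    "doc4": {"finance"},
}
for score, doc_id, shared in rank_by_shared_tags("doc1", doc_tags):
    print(doc_id, round(score, 2), shared)
```

Documents with no tags in common are left out entirely, so the list stays short for sparsely tagged documents.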

We may have people come from Mechanical Turk to do the tagging work, but we still need the interface described above for consumers of the data.
posted by bottlebrushtree at 6:14 PM on April 18, 2011
