How can I annotate and tag text within multiple documents, so that I can categorize and search through my annotations later?
January 20, 2008 7:41 PM

I would like to do an academic project that involves investigating how the meaning of a particular phrase has changed over time. Basically, there is a particular phrase that is used in legal settings (e.g. court), and I suspect that its meaning has changed over the years (i.e. people are throwing it around more loosely these days, and it has come to mean several different things).

I want to analyze the way this term has evolved, using several thousand pages of court documents as my data set. Ideally, I (or a friend) would go through the documents and "tag" every occurrence of this term with a label that describes how the term was used. Later, I want to be able to filter this list so that I can see all occurrences in one category, then the next, and so on. So basically I want something along the lines of del.icio.us, except I'm tagging text snippets instead of URLs. Make sense?
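
To make the idea concrete, here is a minimal Python sketch of the data model this implies: each tagged snippet is stored with its document, its character offsets, and a label, and can later be pulled back out by label. Every name in it (`Annotation`, `AnnotationStore`, the document ID, the tag names) is hypothetical and only for illustration; it is not how any existing tool does it.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    doc_id: str   # which document the snippet came from (hypothetical ID scheme)
    start: int    # character offset where the snippet begins
    end: int      # character offset just past the end of the snippet
    snippet: str  # the tagged text itself
    tag: str      # label describing how the term was used


class AnnotationStore:
    """In-memory index of tagged snippets, grouped by tag."""

    def __init__(self):
        self._by_tag = defaultdict(list)

    def add(self, doc_id, text, start, end, tag):
        """Tag text[start:end] from document doc_id with the given label."""
        ann = Annotation(doc_id, start, end, text[start:end], tag)
        self._by_tag[tag].append(ann)
        return ann

    def with_tag(self, tag):
        """Return all annotations carrying this tag, in insertion order."""
        return list(self._by_tag.get(tag, []))

    def tags(self):
        """Return every tag used so far, sorted alphabetically."""
        return sorted(self._by_tag)


if __name__ == "__main__":
    store = AnnotationStore()
    text = "The phrase appears here and the phrase appears there."
    start = text.find("the phrase")
    store.add("case_001.txt", text, start, start + len("the phrase"), "loose")
    for ann in store.with_tag("loose"):
        print(ann.doc_id, ann.start, ann.snippet)
```

Keeping the offsets next to the snippet means each occurrence can be traced back to its place in the original document, which is what tagging text rather than whole URLs would need.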

Is there any existing solution for this that you guys can think of? I dug around and found WordSmith/Concordance, TextSTAT, and a few other similar concordance programs, but they don't have the flexibility with tagging and so on that I'm looking for. I know how to program, but want to make sure first that there's no existing solution to this.

Any further ideas? (Feel free to suggest a better general approach, too!)
posted by lunchbox to Computers & Internet (5 answers total)
 
Try this recent but similar thread to get a start on a couple of good programs.
posted by Phire at 9:38 PM on January 20, 2008


Thanks Phire. What I'm looking for is conceptually similar to the tagging features of those sites (e.g. Connotea, CiteULike), except I want to tag/flag bits of the text itself instead of entire documents.

To restate the problem in a simpler way, imagine I had a sprawling text file hundreds of pages long. I want to find the quickest way to tag all the sentences that use "green" in the sense of the color green, and all those that use "green" as in "environmentally friendly". (Basically, it consecutively shows me each sentence with the word "green", and each time I click one of two labels to categorize it.) Then, later, I would be able to show my results to other people by letting them filter by one of the two categories, and see all the sentences in context where green means "environmentally friendly".
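
As a rough illustration of that workflow, the Python sketch below steps through every sentence containing a word, asks a callback for a label, and then filters by label. The function names, the `choose_label` callback, and the sample labels are all assumptions made for this example, and the sentence splitter is deliberately naive (it breaks after `.`, `!` or `?`), so abbreviations and legal citations would trip it up.

```python
import re


def sentences_with(text, word):
    """Yield (index, sentence) for each sentence containing `word` as a whole word."""
    # Naive splitter: break after ., ! or ? when followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
    for index, sentence in enumerate(sentences):
        if pattern.search(sentence):
            yield index, sentence


def label_sentences(text, word, choose_label):
    """Pass each matching sentence to `choose_label` and record the label it returns."""
    return [(index, sentence, choose_label(sentence))
            for index, sentence in sentences_with(text, word)]


def filter_by_label(labeled, label):
    """Return only the sentences that were given this label."""
    return [sentence for _, sentence, assigned in labeled if assigned == label]


if __name__ == "__main__":
    sample = "The door was green. Green energy is cheaper now. The sky is blue."

    def ask(sentence):
        # Interactive stand-in for clicking one of two labels.
        return input(f"{sentence}\n[color/eco]? ").strip()

    labeled = label_sentences(sample, "green", ask)
    print(filter_by_label(labeled, "eco"))
```

The sentence index is kept alongside each label so that the neighbouring sentences can be looked up later when showing a match in context.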

I'm thinking that somebody must have implemented an in-document tagging system before, since I figure historical linguists and other people who do systematic analyses on large bodies of text would need this.
posted by lunchbox at 10:22 PM on January 20, 2008


The thing you're looking for is called "qualitative analysis". There are several pieces of software that will allow you to tag text (and even sound and images) and then explore how and where the tags occur. Many years ago, I worked on a program called "NUD*ist", and there was a competitor called "Ethnograph". I haven't kept up with the field, but those should give you a start on the current possibilities.
posted by outlier at 11:58 PM on January 20, 2008


That's it, outlier! I downloaded Atlas.ti, and that is pretty much what I am looking for. Also of use is OneNote, which allows tagging of text. Thanks for the help, guys.
posted by lunchbox at 6:04 PM on January 21, 2008


Open source: Weft QDA.
posted by idb at 8:48 PM on January 21, 2008

