How do I create a searchable index of comic strips?
July 12, 2010 9:53 PM

What is an appropriate method to create a searchable index of comic strips?

As a bit of an intellectual exercise (but possibly to implement in real life), I'm wondering how I would go about creating a searchable index of comic strips (multiple strips from one series, not multiple different comics). I'd like to be able to search on the text within each strip (commentary or the characters' speech bubbles) as well as on the subject matter (i.e. the strip features a treehouse, a canyon, etc.). The general idea is that if you were trying to locate a specific strip to illustrate a situation, report or blog post, and all you could remember was the general context or a fragment of text, you could search this index and find it.

So the first step is to create the index of terms. My current idea is to scan the comic strips and then use a service such as Amazon's Mechanical Turk to have people transcribe the text and describe the general 'concepts' of each strip.
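A minimal sketch of turning the transcription results into an index of terms, assuming the Mechanical Turk (or other) output has been exported to a CSV with one row per strip. The column names `strip_id`, `transcript` and `concepts`, and the sample rows, are hypothetical; the index is a plain in-memory inverted index (word to set of strip IDs), not any particular service's format.

```python
import csv
import io
import re
from collections import defaultdict

# Hypothetical CSV layout: one row per strip with its ID, the transcribed
# text, and comma-separated concept tags supplied by the transcriber.
SAMPLE_CSV = """strip_id,transcript,concepts
1987-03-02,"Let's build a treehouse!","treehouse, tree, summer"
1987-03-09,"The canyon echoes back.","canyon, echo, hiking"
"""

TOKEN_RE = re.compile(r"[a-z0-9']+")

def tokenize(text):
    """Lower-case the text and split it into simple word tokens."""
    return TOKEN_RE.findall(text.lower())

def build_index(rows):
    """Map every word in a transcript or concept tag to the set of strip IDs containing it."""
    index = defaultdict(set)
    for row in rows:
        strip_id = row["strip_id"]
        for word in tokenize(row["transcript"]):
            index[word].add(strip_id)
        for tag in row["concepts"].split(","):
            for word in tokenize(tag):
                index[word].add(strip_id)
    return index

def search(index, query):
    """Return the sorted IDs of strips that contain every word in the query."""
    words = tokenize(query)
    if not words:
        return []
    # Intersect the posting sets so only strips matching all words survive.
    matches = set.intersection(*(index.get(w, set()) for w in words))
    return sorted(matches)

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
index = build_index(rows)
print(search(index, "treehouse"))      # ['1987-03-02']
print(search(index, "canyon echoes"))  # ['1987-03-09']
```

Matching here is on exact words with no stemming, so a search for "echo" only finds the canyon strip because "echo" was also entered as a concept tag; the tags help cover the gap between what a strip literally says and what it is about.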

The second step is to make the index searchable. One way would be to put these terms in a database and write a search front end. Another might be to put all the terms on web pages (one per strip) and let Google index them, on the principle that Google's search will be far superior to any search function I could code up.
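For the one-page-per-strip approach, the page generator can be quite small. This sketch assumes each strip is a dict with hypothetical `strip_id`, `transcript` and `concepts` keys, and writes one HTML page per strip plus an `index.html` linking to all of them so a crawler can reach every page. Whether and how well Google actually indexes the pages depends on hosting and crawling, which the code doesn't address.

```python
import html
import tempfile
from pathlib import Path

def render_strip_page(strip):
    """Build a minimal HTML page holding one strip's transcript and concept tags."""
    title = html.escape(f"Strip {strip['strip_id']}")
    transcript = html.escape(strip["transcript"])
    concepts = ", ".join(html.escape(c) for c in strip["concepts"])
    return (
        "<!DOCTYPE html>\n"
        f"<html><head><title>{title}</title></head>\n"
        "<body>\n"
        f"<h1>{title}</h1>\n"
        f"<p>{transcript}</p>\n"
        f"<p>Concepts: {concepts}</p>\n"
        "</body></html>\n"
    )

def write_site(strips, out_dir):
    """Write one page per strip, plus an index.html linking to every page so a crawler can find them."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    links = []
    for strip in strips:
        filename = f"strip-{strip['strip_id']}.html"
        (out / filename).write_text(render_strip_page(strip), encoding="utf-8")
        links.append(
            f'<li><a href="{html.escape(filename)}">{html.escape(strip["strip_id"])}</a></li>'
        )
    index_page = (
        "<!DOCTYPE html>\n<html><head><title>All strips</title></head>\n<body><ul>\n"
        + "\n".join(links)
        + "\n</ul></body></html>\n"
    )
    (out / "index.html").write_text(index_page, encoding="utf-8")

# Example: build the site for one made-up strip in a temporary directory.
strips = [
    {"strip_id": "1987-03-02",
     "transcript": "Bread & butter in the treehouse",
     "concepts": ["treehouse", "lunch"]},
]
with tempfile.TemporaryDirectory() as tmp:
    write_site(strips, tmp)
    example_page = (Path(tmp) / "strip-1987-03-02.html").read_text(encoding="utf-8")
    example_index = (Path(tmp) / "index.html").read_text(encoding="utf-8")
print(example_page)
```

Escaping the transcript with `html.escape` matters because speech-bubble text can contain characters such as `&` or `<` that would otherwise break the markup.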

Does anyone have any other suggestions on how to do this? (assume that I can't go to the publisher and magically get an index from them :)
posted by tobtoh to Technology (4 answers total)
 
Best answer: Oh No Robot is a service that crowdsources comic transcription to readers, kind of like a specialized Mechanical Turk.
posted by jedicus at 9:56 PM on July 12, 2010


I did a comic strip search engine for an advanced WWW class. We did a quick investigation of available OCR, but it was apparent that the state of the art and our target were not compatible. It turns out handwritten text has many of the same properties as hand-drawn cartoons, and it confuses the hell out of OCR.

We just pulled in all the data we could from the web (date, title, text tags) and automated the pull via RSS. Search was done with MySQL full-text search, since a collection like yours really isn't that large. We were able to load about ten million fake comic rows, populated from a dictionary, without much slowdown.
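The RSS pull only applies to comics published online, but the parsing step is short. This sketch uses Python's standard-library `xml.etree.ElementTree` and assumes an RSS 2.0 feed whose items carry `title`, `link`, `pubDate` and `category` elements; the feed content is made up, and fetching a real feed over the network is left out.

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime

# Made-up RSS 2.0 feed standing in for a real comic's feed.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example Comic</title>
    <item>
      <title>The Treehouse</title>
      <link>https://example.com/comics/101</link>
      <pubDate>Mon, 12 Jul 2010 09:00:00 +0000</pubDate>
      <category>treehouse</category>
      <category>summer</category>
    </item>
  </channel>
</rss>"""

def parse_feed(xml_text):
    """Extract title, link, date and tags from every <item> in an RSS 2.0 feed."""
    root = ET.fromstring(xml_text)
    records = []
    for item in root.iter("item"):
        pub = item.findtext("pubDate")
        records.append({
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            # RSS dates use the RFC 822 format, which email.utils can parse.
            "date": parsedate_to_datetime(pub).date().isoformat() if pub else None,
            "tags": [c.text.strip() for c in item.findall("category") if c.text],
        })
    return records

for record in parse_feed(SAMPLE_FEED):
    print(record)
```

Each record could then be inserted as a row in a table with a full-text index, as was done here with MySQL; that step is not shown.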

But if getting the job done is more important than following plagiarism guidelines, just use Oh No Robot.
posted by pwnguin at 1:32 AM on July 13, 2010


Best answer: It will not help you describe the scene, but I've heard of people using Evernote for this, since its text recognition automatically makes any text in an image searchable. (The use case was people putting every Dilbert strip into their notebooks so they could call one up based on a phrase a character said.)
posted by MCMikeNamara at 8:22 AM on July 13, 2010


Response by poster: Thanks for the suggestions so far. The comic is paper-based and not online, so I can't use Oh No Robot (at least from what I have read, since it assumes the comic is web-based).

The Evernote idea sounds promising though!
posted by tobtoh at 12:46 PM on July 13, 2010

