Training a machine learning/neural network on comic book images
November 13, 2017 11:05 AM

I have what seems like it should be a fairly straightforward machine learning/image feature identification problem. I want to identify speech bubbles and captions in comic book pages. Am I out of my mind for thinking that this is something an amateur programmer can take a crack at, and if not, how should I approach it?

I’m a self-taught amateur programmer; the only time I write code is when I’m trying to automate some drudgery away, but I’ve got a pretty good understanding of basic programming and have written fairly involved scripts, well over 100 lines long, that do actually useful things in narrow contexts.

Something I spend a lot of time doing in my other life is drawing text boxes around unlettered speech bubbles in Adobe InDesign. It seems like figuring out which parts of an image are likely to be speech bubbles is the kind of thing modern machine learning/computer vision techniques should be pretty good at.

(For what it’s worth, the endpoint I’m trying to get to would be a tool that can take a directory full of images, process them, and generate an InDesign script that will automatically place text boxes in the appropriate places in the layout document.)
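For anyone picturing that last step: a minimal sketch, assuming each page image fills its InDesign page exactly, the pages are in the same order as the image files, and the images were exported at a known resolution (300 dpi here is a guess). `Box`, `px_to_pt` and `make_indesign_script` are hypothetical names. The script it writes is ExtendScript (InDesign's JavaScript dialect) that adds one empty text frame per detected box and sets its `geometricBounds`, which InDesign orders as [top, left, bottom, right]. Check the property names against the InDesign scripting reference for your version before relying on it.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Box:
    """A detected region in pixels, origin at the top-left of the page image."""
    left: int
    top: int
    width: int
    height: int


def px_to_pt(px: float, dpi: float) -> float:
    # 72 points per inch; assumes the page image was exported at `dpi`.
    return px * 72.0 / dpi


def make_indesign_script(boxes_by_page: Dict[int, List[Box]], dpi: float = 300.0) -> str:
    """Build ExtendScript source that adds an empty text frame for every box.

    Assumes dict key i corresponds to app.activeDocument.pages[i] and that
    the page image fills the InDesign page with no offset or scaling.
    """
    lines = [
        "var doc = app.activeDocument;",
        # Work in points, measured from each page's top-left corner.
        "doc.viewPreferences.horizontalMeasurementUnits = MeasurementUnits.POINTS;",
        "doc.viewPreferences.verticalMeasurementUnits = MeasurementUnits.POINTS;",
        "doc.viewPreferences.rulerOrigin = RulerOrigin.PAGE_ORIGIN;",
        "doc.zeroPoint = [0, 0];",
    ]
    for page_index in sorted(boxes_by_page):
        for box in boxes_by_page[page_index]:
            top = px_to_pt(box.top, dpi)
            left = px_to_pt(box.left, dpi)
            bottom = px_to_pt(box.top + box.height, dpi)
            right = px_to_pt(box.left + box.width, dpi)
            # geometricBounds is [top, left, bottom, right] in InDesign.
            lines.append(
                f"doc.pages[{page_index}].textFrames.add().geometricBounds = "
                f"[{top:.2f}, {left:.2f}, {bottom:.2f}, {right:.2f}];"
            )
    return "\n".join(lines) + "\n"
```

For example, `make_indesign_script({0: [Box(300, 600, 150, 300)]})` yields a frame one inch from the left and two inches from the top of the first page. Saving the returned string as a `.jsx` file lets you run it from InDesign's Scripts panel. Keeping the detector and the script generator separate means you can swap in a better detector later without touching this part.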

I’ve got access to a corpus of tens of thousands of pages. If necessary, I’m willing to spend some time cropping out examples of what speech bubbles & other recurring features look like, to facilitate training. But how, concretely, should I go about this? Are there particular frameworks or libraries I should use?
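If you do go the route of cropping out examples, it is worth keeping the box coordinates alongside the cut-out images, since the finished tool has to output boxes rather than a yes/no answer. A minimal sketch, assuming you have hand-drawn boxes as `(x, y, w, h)` pixel tuples from whatever annotation tool you use; `crop_examples` is a hypothetical helper, and the default padding is arbitrary.

```python
from typing import List, Tuple

import numpy as np


def crop_examples(page: np.ndarray,
                  boxes: List[Tuple[int, int, int, int]],
                  pad: int = 4) -> List[np.ndarray]:
    """Cut each (x, y, w, h) box out of a page image with a little padding.

    Works for grayscale (H, W) or colour (H, W, C) arrays; the padded box is
    clipped so it never runs off the edge of the page.
    """
    page_h, page_w = page.shape[:2]
    crops = []
    for x, y, w, h in boxes:
        x0, y0 = max(x - pad, 0), max(y - pad, 0)
        x1, y1 = min(x + w + pad, page_w), min(y + h + pad, page_h)
        # Copy so the crop does not keep the whole page alive as a view.
        crops.append(page[y0:y1, x0:x1].copy())
    return crops
```

Each crop can then be written out with `cv2.imwrite`, with the original `(x, y, w, h)` saved next to it so the same annotations can later train a detector rather than only a classifier.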

N.B.: The network can’t just guess whether a given image contains speech bubbles—it needs to be able to derive a bounding box or something similar anywhere it thinks it’s found a speech bubble.

Please help me automate my own job away, lol
posted by Sokka shot first to Computers & Internet (3 answers total) 6 users marked this as a favorite
I have used OpenCV to do face detection with Haar cascades. The hard part is generally building the training set. I don't know if Haar cascades are the right place to start for speech bubbles, but OpenCV rocks, and there are a gazillion tutorials out there for using it.

And a quick search for "opencv recognizing comic book speech bubbles" shows that you're not the only person looking at this problem either.
posted by straw at 11:10 AM on November 13, 2017 [2 favorites]

Short answer: no you're not out of your mind. You can do it!
First, I would advise you to set machine learning aside and try low-tech methods, such as the one suggested in this Stack Overflow answer, using an image-processing library in Python, e.g. OpenCV or scikit-image.

Now, if you haven't used OpenCV before, there will be a bit of a learning curve, since it is all too clearly a Python interface tacked on top of a C++ library. Use Jupyter and learn just the functions you really need, by trial and error on sample images.

Coming back to your project: the advantage you have is that speech bubbles should (hopefully) have well-defined edges. Look up cv2.Canny and cv2.findContours, and try them on grayscale images. Then comes the problem of identifying which contours actually correspond to bubbles. This can be tricky if there are other large contours in the comic, or if a bubble merges with the panel border.

In the answer I linked to, the asker already has a way to detect regions containing text (remember, you only need detection, *not* recognition). He is probably using a simple method such as erosion and dilation in succession to get white blobs around thin black lines that are likely to be text. Again, trial and error is your friend. The good news is that once you have the parameters tuned correctly, they should apply to all pages in the comic and to others with a 'similar' design.

Good luck!

posted by tirutiru at 6:00 AM on November 14, 2017

Here's a similar project. (Airbnb designers using machine learning to recognise drawings as particular interface elements.) See whether they've published more details about how they've done it, or perhaps reach out to some of the developers.
posted by snarfois at 6:19 AM on November 14, 2017
