Create a list from multiple green answers
July 27, 2010 10:35 AM

I've favorited xx questions on the green. Most of the answers are book titles and/or authors. I would like to compile a single list from all the threads.

Is there an easier way to do this than going through each thread, copying and pasting the information into a spreadsheet, and weeding out the dupes?
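Whatever ends up collecting the titles, the dupe-weeding step at least doesn't need a spreadsheet. Here is a minimal shell sketch. It assumes the titles have already been gathered into a plain-text file, one per line, and `dedupe_titles` is a hypothetical name, not an existing tool:

```shell
# dedupe_titles (hypothetical helper): read one title (or "Title by Author")
# per line on stdin and print each distinct entry once.
# Steps: trim leading/trailing whitespace, collapse internal runs of
# whitespace to one space, drop empty lines, then sort case-insensitively
# (-f) keeping one line per case-insensitive match (-u).
# LC_ALL=C makes the sort order the same on every machine.
dedupe_titles() {
  sed -e 's/^[[:space:]]*//' \
      -e 's/[[:space:]]*$//' \
      -e 's/[[:space:]][[:space:]]*/ /g' |
    grep -v '^$' |
    LC_ALL=C sort -f -u
}
```

Usage: `dedupe_titles < titles.txt > unique.txt`. "Dune" and "  dune " collapse into one line, but near-duplicates such as "Dune" versus "Dune (Frank Herbert)" still need a human eye.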
posted by bleucube to Computers & Internet (11 answers total) 3 users marked this as a favorite
 
Response by poster: Here are a few of the favorited posts:

http://ask.metafilter.com/159939/I-loved-Inception-Can-you-recommend-any-books-that-would-be-similar
http://ask.metafilter.com/157637/Mess-with-my-head
http://ask.metafilter.com/157617/Conan-drops-the-Fbomb-fantasy-books-for-adults
http://ask.metafilter.com/153563/And-no-not-Ayn-Rand
http://ask.metafilter.com/143751/Help-make-me-uncomfortable
http://ask.metafilter.com/141316/Please-tell-me-what-to-read
http://ask.metafilter.com/129895/Mindblowing-literature
http://ask.metafilter.com/112610/SciFiFilter-Im-not-usually-a-fan-of-SciFi-and-yet-I-loved-Enders-Game-Where-to-next
http://ask.metafilter.com/95595/Recommend-some-ghostly-mystery-books-please
http://ask.metafilter.com/97098/Creep-me-out-literally
http://ask.metafilter.com/96398/SciFi-novels-on-unusual-planets
http://ask.metafilter.com/88705/Is-Cory-Doctorow-Worth-a-damn
http://ask.metafilter.com/87449/Singularity-SciFi-My-Nerdy-Request
http://ask.metafilter.com/87272/Stories-that-take-place-in-Hell-Purgatory-comas-nightmares-memory-etc-etc
http://ask.metafilter.com/82321/Thoughprovoking-scifi
http://ask.metafilter.com/54756/Similar-books-to-The-Terror
posted by bleucube at 10:38 AM on July 27, 2010




Well, to expand a bit on that link: you could use favorites to mark the answers you'd like to see, then use that URL to get them all on one page. Then press Ctrl+A, Ctrl+C, and Ctrl+V into your favorite text editor or spreadsheet app.
posted by carsonb at 10:40 AM on July 27, 2010


Response by poster: Carson, thanks for the responses so far. Actually what I want to do is parse the answers and pull out all the book titles and authors in each of the links. Going through and marking each answer as a favorite would probably take as long as copying and pasting the answers.

Wondering if there is an online tool that will pull all the text out of hyperlink contents.
posted by bleucube at 10:46 AM on July 27, 2010


I think using Python to scrape these pages for the information you're looking for would probably be pretty trivial, but I don't know how to do it yet.
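Python would work, but the scraping step can also be roughed out with standard shell tools. Here is a minimal sketch. It assumes each thread has already been saved locally (e.g. with `curl -sL URL > page.html`). `html_to_text` is a hypothetical helper, and its regex tag-stripping is a crude stand-in for a real HTML parser:

```shell
# html_to_text (hypothetical helper): read a saved HTML page on stdin and
# print rough plain text, one line per <br> or closing </p>, </li>, </div>.
# Crude regex tag stripping, NOT a real HTML parser: tags split across lines,
# <script>/<style> contents and most entities are not handled.
# Pass 1 turns line-break and block-closing tags into real newlines.
# Pass 2 drops the remaining tags, decodes a few common entities (&amp; last,
# so "&amp;lt;" is not double-decoded), and trims surrounding whitespace.
# Finally, the blank lines left behind are dropped.
html_to_text() {
  sed -E -e 's#<[Bb][Rr][[:space:]]*/?>#\
#g' -e 's#</([Pp]|[Ll][Ii]|[Dd][Ii][Vv])>#\
#g' |
    sed -e 's/<[^>]*>//g' \
        -e 's/&lt;/</g' -e 's/&gt;/>/g' -e 's/&quot;/"/g' -e "s/&#39;/'/g" \
        -e 's/&amp;/\&/g' \
        -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' |
    grep -v '^$'
}
```

Usage: `html_to_text < page.html > page.txt`, then skim or grep the text file. Because every comment ends up as ordinary lines, this gets you readable text, not a list of books.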
posted by proj at 10:54 AM on July 27, 2010


LibraryThing can, I believe, take an Amazon wishlist and turn it into a list of books. If you have threads that are fairly heavy with Amazon links, you may be able to feed in the HTML page and get a list of books that can then be exported in a number of formats. You'd need an account, and the free accounts are somewhat limited, but you can at least see if it works.
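To check how many Amazon links a thread actually contains before trying LibraryThing, a rough shell sketch can pull out the product IDs. It assumes Amazon links of the form `.../dp/ID` or `.../gp/product/ID` with a 10-character ID. Other link formats are not matched. `amazon_asins` is a hypothetical name. For printed books that ID is often, but not always, the ISBN-10, and whether any importer will accept the resulting list is an assumption worth testing:

```shell
# amazon_asins (hypothetical helper): read HTML on stdin and print each
# distinct 10-character product ID found in Amazon links shaped like
#   amazon.<tld>/[optional-slug/]dp/ID   or   amazon.<tld>/gp/product/ID
# (assumed URL shapes; other Amazon link formats are not matched).
amazon_asins() {
  grep -oE 'amazon\.[a-z.]+/([^"/]+/)?(dp|gp/product)/[A-Z0-9]{10}' |
    sed -E 's#.*/##' |
    LC_ALL=C sort -u
}
```

Usage: `amazon_asins < thread-157637 > ids.txt`. A short `ids.txt` relative to the thread length is a quick sign that most recommendations were typed as plain text rather than linked.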
posted by jessamyn at 11:10 AM on July 27, 2010


If my memory serves (always a risk), someone did this for a few of the past book threads.

How does this wiki page suit your needs?
posted by inigo2 at 11:20 AM on July 27, 2010


Ah crap, that doesn't really help, now that I'm digging into that page. Sorry.
posted by inigo2 at 11:22 AM on July 27, 2010


You might also consider creating a custom RSS feed and seeing if Yahoo Pipes can do anything with it. Just a stab.
posted by proj at 11:26 AM on July 27, 2010


Actually what I want to do is parse the answers and pull out all the book titles and authors in each of the links...Wondering if there is an online tool that will pull all the text out of hyperlink contents.

It would be fairly easy to feed the pages through an HTML parser to get a list of href link targets, but that isn't going to give you a list of books.
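The href-list step can be sketched with grep and sed. This version is regex-based, so it only stands in for a real HTML parser. `extract_hrefs` is a hypothetical name, and it assumes double-quoted `href` attributes with absolute http(s) URLs. Relative and single-quoted links are skipped:

```shell
# extract_hrefs (hypothetical helper): read HTML on stdin and print each
# distinct absolute http(s) URL found in href="..." attributes.
# Regex-based: assumes double-quoted attributes; relative links are skipped.
extract_hrefs() {
  grep -oE 'href="https?://[^"]+"' |
    sed -e 's/^href="//' -e 's/"$//' |
    LC_ALL=C sort -u
}
```

Usage: `extract_hrefs < thread-157637 > links.txt`. As noted above, the result is a list of links, not books: a bare Wikipedia or author-homepage URL still has to be turned into a title by hand.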

The problem is that there isn't one standard way people write book references. Some use Book, Author; some Author "Book"; some Book by Author; and then there's Book, Author'sInitial; AuthorFirstName, AuthorLastName, etc. to contend with as well. Trying to account for all of these would likely give you a mess of regular expressions that looks like an invocation of Cthulhu and still doesn't give you a list of books.
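To see the problem concretely, here is a sketch that handles exactly one convention, "Title by Author", and silently drops everything else. `split_by_author` is a hypothetical name. It assumes one recommendation per line of plain text with the tags already stripped, and an author name that starts with a capital letter:

```shell
# split_by_author (hypothetical helper): handle ONE convention only,
# "Title by Author", printing "Title | Author" for matching lines.
# Lines containing "posted by" (comment bylines) are removed first, any line
# without " by <Capitalized...>" is silently skipped, the author stops at the
# first comma, semicolon or parenthesis, and trailing spaces/periods are trimmed.
split_by_author() {
  grep -v 'posted by' |
    sed -n -E 's/^[[:space:]]*(.*[^[:space:]])[[:space:]]+by[[:space:]]+([A-Z][^,;(]*).*$/\1 | \2/p' |
    sed -E 's/[[:space:].]+$//'
}
```

A line like `Dune, Frank Herbert` produces nothing, which is exactly the gap described above: every additional convention needs its own pattern, and the patterns start to conflict.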

You might have some luck simplifying the cutting and pasting by doing some crude pre-processing, e.g.:
curl -sL http://ask.metafilter.com/157637/Mess-with-my-head | grep amazon
curl -sL http://ask.metafilter.com/157637/Mess-with-my-head | grep by | grep -v posted
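Each of those commands downloads the page again. A small variation, assuming the thread is saved once first, applies both filters in one pass. `candidate_lines` is a hypothetical name, and the `-i` makes the match case-insensitive, which goes slightly beyond the original greps:

```shell
# candidate_lines (hypothetical helper): given a saved thread file as $1,
# print lines that mention Amazon or contain " by ", then drop the
# "posted by" bylines that every comment carries.
# Save the page once first, e.g.:
#   curl -sL http://ask.metafilter.com/157637/ > thread-157637
candidate_lines() {
  grep -iE 'amazon| by ' "$1" | grep -v 'posted by'
}
```

Usage: `candidate_lines thread-157637 > candidates.txt`. Working from a local copy also means you can rerun and tweak the filters without hitting the site each time.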
posted by tallus at 12:57 PM on July 27, 2010


Here's a list of all the URLs from those threads with dupes removed:

for F in 159939 157637 157617 153563 143751 141316 129895 112610 95595 97098 96398 88705 87449 87272 82321 54756; do
  curl -sL "http://ask.metafilter.com/$F/" > "thread-$F"
done
perl -0777 -ne '
  # collect every absolute link, skipping MetaFilter and FeedBurner URLs
  while (m,href="(https?://[^"]+)",g) {
    my $url = $1;
    $urls{$url} = 1 unless $url =~ m,(metafilter\.com/|feedburner\.com/),;
  }
  }{ print "$_\n" for sort keys %urls;
' thread-* > list

output here
posted by Rhomboid at 2:59 PM on July 27, 2010 [1 favorite]


This thread is closed to new comments.