can i download a copy of someone else's blog?
July 7, 2013 4:32 PM

i need to pull out all the content from someone else's wordpress blog. help me do this without going through each page individually!

i'm doing my first qualitative research project (yay!), and need to pull all the text out of this blog so i can pull it into nvivo. going through the archives means pages and pages of click through and save as. ideally i would like to save these as pdfs (i think?).

a twitter friend suggested wget. i'm on windows 7, and downloaded and installed gnuwin32 successfully... but now i don't know how to run wget, or whether it even came with gnuwin32. all the stuff i'm finding on wget assumes a basic level of understanding i clearly do not possess, and most of it isn't for windows. i'm just going around in sad, tiny circles in command prompt.

basically i'm lost. while ask mefi isn't a tech-specific forum, my googler skills are not finding me the magical forum-for-newbies that i need.
bonus points and gold stars for tipping me off to more appropriate help resources. step-by-step instructions would be holy heck appreciated!
posted by tamarack to Computers & Internet (5 answers total)
 
Did you see this thread from a couple of days ago?
posted by Admiral Haddock at 4:48 PM on July 7, 2013


I don't know if this helps, but have you noticed that they use some kind of infinite scrolling functionality there? So all of the content is effectively on the front page, and maybe you could capture that as a single file, and then break it up however you want.
posted by unknowncommand at 4:52 PM on July 7, 2013


Response by poster: thanks, @admiral haddock! funnily, that thread didn't seem to appear in my search results for 'wget'. from reading those responses it looks like the answer there is save as pdf, page by sad page? doesn't look like wget is necessarily what i want...? or at least it doesn't really clarify wget for me. i've tried running similar commands and only got error messages, so i don't know if i'm in the wrong directory, don't have wget properly installed, or what.

@unknowncommand -- i've tried this. it crashes pretty consistently before it will load the entire blog. it is way too long! but thank you.


all is appreciated.
posted by tamarack at 4:57 PM on July 7, 2013


Best answer: wget doesn't have anything to do with the PDF side; it just makes a local copy of the blog by following links. Another way to do it is described in An Introduction to Compassionate Screen Scraping. Neil Caren's Big Data tutorials might also be useful.
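
To make "a local copy of the blog by following links" a bit more concrete, here is a rough Python sketch of the same idea. It is not how wget itself is implemented, just a minimal illustration using only the standard library. The names parse_page and crawl, the page limit, and the one-second delay are all assumptions made up for this example, and it has no error handling for pages that fail to load.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse, urldefrag
from urllib.request import urlopen
import time


class PageParser(HTMLParser):
    """Collects href links and visible text from one HTML page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.text_parts = []
        self._skip_depth = 0  # >0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep non-empty text that is not inside <script>/<style>.
        if self._skip_depth == 0 and data.strip():
            self.text_parts.append(data.strip())


def parse_page(html, base_url):
    """Return (text, same-site absolute links) for one page."""
    parser = PageParser()
    parser.feed(html)
    site = urlparse(base_url).netloc
    links = []
    for href in parser.links:
        # Resolve relative links and drop any "#fragment" part.
        absolute, _fragment = urldefrag(urljoin(base_url, href))
        if urlparse(absolute).netloc == site and absolute not in links:
            links.append(absolute)
    return "\n".join(parser.text_parts), links


def crawl(start_url, max_pages=50, delay_seconds=1.0):
    """Breadth-first crawl of one site; returns {url: text}."""
    pages, queue = {}, [start_url]
    while queue and len(pages) < max_pages:
        url = queue.pop(0)
        if url in pages:
            continue
        with urlopen(url) as response:
            html = response.read().decode("utf-8", errors="replace")
        text, links = parse_page(html, url)
        pages[url] = text
        queue.extend(link for link in links if link not in pages)
        time.sleep(delay_seconds)  # pause between requests to be polite
    return pages
```

Calling crawl() with the blog's front-page address would return a dictionary mapping each visited page to its text, which could then be written out as plain .txt files for import. On a blog with infinite scrolling, the older posts may only be reachable through archive or "older posts" links, so the starting page matters.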

There's also the NVivo add-in NCapture which might do everything you need.
posted by scruss at 5:04 PM on July 7, 2013


Response by poster: thank you @scruss! the nvivo plugin looks like it might be the magic bullet :) much appreciated! i'm completely new to nvivo, so didn't think of looking to the software for a solution. definitely going to check out the big data posts as well. thank you!
posted by tamarack at 5:09 PM on July 7, 2013

