How Do I Save an Entire Blog Offline?
January 18, 2011 3:01 PM

Is there a good way to save a blog offline including archived posts?

Despite quite a bit of searching, I have not found an easy way to do this. There are several blogs that I would like to peruse at my leisure when I don't have access to the internet. Rather than copying and pasting or right-click saving each page, is there a tool or gadget that will take a blog URL and extract all of the posts (and hopefully the images as well)? It would be nice, though not necessary, if the links were preserved too.

Most of the information I have found is for people who want to export their own blogs. Is this possible if I am not the blog owner?
posted by amicamentis to Technology (9 answers total) 7 users marked this as a favorite
You can use wget to do this. Assuming you are on Windows... wget will already be in your Linux distro. Not sure about Macs, but since OS X is based on Unix, I would guess you can get it for that too.
posted by COD at 3:30 PM on January 18, 2011

There are command-line HTTP clients that can do this -- for instance, cURL and wget. Both are available on all Unix-like systems (MacOS, Linux, etc.).
posted by phliar at 3:32 PM on January 18, 2011

SiteSucker is a fantastic free app for this (for Mac and now iOS (?)), especially if you find the command line intimidating.
posted by misterbrandt at 3:38 PM on January 18, 2011

Windows alternative: httrack
posted by Su at 3:43 PM on January 18, 2011 [1 favorite]

At work I use this to archive a semester's worth of student blogging (we use WP, hence the reject line):

wget --recursive --convert-links --page-requisites --level inf --reject 'wp-login.php,xmlrpc.php' --cut-dirs=1 --adjust-extension --no-parent$SEMESTER/

Note that the last '/' is important, otherwise you end up with more on that server than you bargained for.
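
For anyone adapting this, here is a rough sketch of the same command wrapped in a shell function with a dry-run switch. The function name mirror_blog, the DRY_RUN variable, and the address http://blog.example.com with directory spring2011 are all made-up placeholders, not anything from my actual setup; the wget options are the ones in the command above.

```shell
# Sketch only: mirror_blog, DRY_RUN, and the example URL are hypothetical.
mirror_blog() {
  # $1 = base URL of the blog, $2 = directory to stay inside.
  # Replace the positional parameters with the full wget command line.
  set -- wget --recursive --convert-links --page-requisites \
         --level inf --reject 'wp-login.php,xmlrpc.php' \
         --cut-dirs=1 --adjust-extension --no-parent "$1/$2/"
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$@"   # show the command instead of running it
  else
    "$@"        # actually start the download
  fi
}

# Dry run first: prints the command without contacting the server.
DRY_RUN=1
mirror_blog "http://blog.example.com" "spring2011"
```

The function always appends the trailing '/', so --no-parent treats the last path component as the directory to stay inside rather than as a file in its parent. Set DRY_RUN=0 once the printed command looks right.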
posted by sbutler at 5:02 PM on January 18, 2011 [1 favorite]

Any of the above is probably a better solution, but I occasionally use Acrobat Professional to scrape entire blogs and wikis for offline reading.
posted by coolguymichael at 6:29 PM on January 18, 2011

If you're using Firefox, then Scrapbook.
posted by SuperSquirrel at 6:34 AM on January 19, 2011

I have tried httrack and Scrapbook and must be doing something wrong, as each captures the main blog page but none of the 'previous entries' pages. Is there a quick fix for this?
posted by amicamentis at 8:25 AM on January 21, 2011

In Scrapbook, do you have the "in-depth capture" settings set high enough? You can indicate how many levels you want the links saved.

0 - just the current page you're on
1 - the current page, plus the page at each link
2 - the current page, plus the page at each link and the page at each link on THAT page
3 - etc., etc.

Once you set the level high enough, you should see a list of all the pages it will save. At that point, you can deselect pages that you don't want (links contained in ads, for example).
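
If it helps to see why the depth matters for 'previous entries' pages, here is a toy sketch in shell. It is not how Scrapbook works internally; the page names (index, older, post1, and so on) and the functions links_of and pages_within are invented purely to illustrate the level counting described above.

```shell
# Toy link graph with hypothetical page names, standing in for a blog.
links_of() {
  case "$1" in
    index) echo "post1 post2 older" ;;
    older) echo "post3 index" ;;
    post1) echo "index" ;;
    *)     echo "" ;;
  esac
}

# Print every page reachable from page $1 within $2 link hops, one per line.
# The body runs in a subshell ( ... ) so its variables stay private.
pages_within() (
  seen=" $1 "
  frontier=$1
  level=0
  while [ "$level" -lt "$2" ]; do
    next=""
    for page in $frontier; do
      for link in $(links_of "$page"); do
        case "$seen" in
          *" $link "*) ;;                              # already captured
          *) seen="$seen$link "; next="$next$link " ;;
        esac
      done
    done
    frontier=$next
    level=$((level + 1))
  done
  printf '%s\n' $seen
)

pages_within index 0   # prints: index
pages_within index 2   # prints: index, post1, post2, older, post3 (one per line)
```

In this toy graph, post3 is linked only from the 'older' page, so it is picked up at level 2 but missed at level 1. wget has a similar setting in its --level option.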
posted by SuperSquirrel at 10:08 AM on January 22, 2011
