How Do I Save an Entire Blog Offline?
January 18, 2011 3:01 PM
Is there a good way to save a blog offline including archived posts?
Despite quite a bit of searching, I have not found an easy way to do this. There are several blogs that I would like to peruse at my leisure when I don't have access to the internet. Rather than copying and pasting or right-click saving each page, is there a tool or gadget that will take a blog URL and extract all of the posts (and hopefully images as well)? It would be nice if the links were preserved too, but that's not necessary.
Most of the information I have found is for people who want to export their own blogs. Is this possible if I am not the blog owner?
There are command-line HTTP clients that can do this -- for instance, curl and wget. Both are available on all Unix-like systems (macOS, Linux, etc.).
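For example, a minimal wget invocation for mirroring a blog might look like this (a sketch, not a definitive recipe; the URL is a placeholder and the flags assume GNU wget):

```shell
# Recursively mirror the blog, fetch the images/CSS each page needs,
# and rewrite links so the saved copy works offline.
wget --mirror --convert-links --page-requisites --no-parent \
     http://example.org/blog/
```

--no-parent keeps wget from wandering up into the rest of the site, and --page-requisites pulls in the images and stylesheets each page references.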
posted by phliar at 3:32 PM on January 18, 2011
SiteSucker is a fantastic free app for this (for Mac, and now iOS?), especially if you find a command line intimidating.
posted by misterbrandt at 3:38 PM on January 18, 2011
At work I use this to archive a semester's worth of student blogging (we use WP, hence the reject line):
wget --recursive --convert-links --page-requisites --level inf --reject 'wp-login.php,xmlrpc.php' --cut-dirs=1 --adjust-extension --no-parent http://example.org/$SEMESTER/
Note that the last '/' is important, otherwise you end up with more on that server than you bargained for.
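(A note for readers unfamiliar with the syntax: $SEMESTER in the command above is an ordinary shell variable, not a wget feature. Set it before running the command, e.g.:)

```shell
# $SEMESTER expands into the URL; set it to the directory you want
# to mirror (the value here is just an example).
SEMESTER=2011-spring
echo "http://example.org/$SEMESTER/"
```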
posted by sbutler at 5:02 PM on January 18, 2011 [1 favorite]
Any of the above is probably a better solution, but I occasionally use Acrobat Professional to scrape entire blogs and wikis for offline reading.
posted by coolguymichael at 6:29 PM on January 18, 2011
Response by poster: I have tried HTTrack and ScrapBook and must be doing something wrong, as they capture the main blog page but none of the 'previous entries' pages. Is there a quick fix for this?
posted by amicamentis at 8:25 AM on January 21, 2011
In Scrapbook, do you have the "in-depth capture" settings set high enough? You can indicate how many levels you want the links saved.
0 - just the current page you're on
1 - the current page, plus the page at each link
2 - the current page, plus the page at each link and the page at each link on THAT page
3 - etc., etc.
Once you set the level high enough, you should see a list of all the pages it will save. At that point, you can deselect pages that you don't want (links contained in ads, for example).
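The same idea carries over to wget's recursive mode, where --level plays the role of Scrapbook's capture depth (a sketch with a placeholder URL):

```shell
# --level=2 roughly matches Scrapbook's level 2: the start page,
# the pages it links to, and the pages those link to.
wget --recursive --level=2 --convert-links --page-requisites \
     http://example.org/blog/
```

For a blog whose archive is reachable only through a chain of 'previous entries' links, the depth needs to be at least as long as that chain (or unlimited, as in sbutler's --level inf).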
posted by SuperSquirrel at 10:08 AM on January 22, 2011
This thread is closed to new comments.