How can I automatically archive web pages I visit?
June 20, 2012 10:20 AM
Is there an easy way to archive all the content of web pages I visit, saving the pages to my hard drive?
For example, sometimes I'm viewing pages of old transactions, and it would be nice to simply be able to click the "Next" link without also having to click "Save As" on every page.
I'm familiar with wget, but I think I need something that will simply permanently cache the pages I browse to - mostly because I'm interested in pages I have to log in to and that require cookies ... and I'm not always interested in whole sites, but rather just the parts I browse to. Also, some pages use JavaScript and frames, which I'm not sure I could wrangle in wget. I want to capture ALL the text and graphics displayed in my browser.
I've looked at the Firefox extension Slogger, which looks like it would be perfect, but (a) it hasn't been updated since 2006 and (b) it didn't work when I tried it with an ancient version of Firefox.
I made a quick attempt at using Polipo (by itself, without Tor), but my first attempt at configuring it didn't work (it wasn't caching anything).
I would vastly prefer saving actual HTML and images rather than PDFs or screenshots.
I'd REALLY like to avoid anything that requires an online service; I want this to live and run on my computer.
It would be nice if it worked completely automatically, so I didn't even have to click a button to archive each page.
It would be especially nice if it maintained the structure and naming of the original web pages, much as wget does.
Is there some easy, minimal-config way to turn on an extension or add-on, fully archive the content of the web pages I visit, and then turn archiving off?
Thanks!
Best answer: You could use a proxy for this. I can't vouch for it, but try WebAssistant (Proxy Offline Browser).
From the site:
"By passing all your web traffic through WebAssistant, you instantly and transparently build a copy of all the pages you visit - so they're yours to surf offline whenever you like."
posted by SNACKeR at 12:11 PM on June 20, 2012 [3 favorites]
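For anyone curious how a proxy-style archiver could keep wget-like structure and naming, here is a minimal Python sketch. It is not how WebAssistant works internally (that is not described in this thread); `url_to_path`, `save_page`, and the `archive` root directory are names made up for illustration. It only covers one step: mapping a requested URL to a file on disk and writing the response body there.

```python
import os
from urllib.parse import urlsplit, unquote


def url_to_path(url, root="archive"):
    """Map a URL to a local file path, roughly like a wget mirror layout."""
    parts = urlsplit(url)
    path = unquote(parts.path)
    # Directory-style URLs get a default file name.
    if path == "" or path.endswith("/"):
        path += "index.html"
    # Drop empty, "." and ".." segments so nothing escapes the archive root.
    segments = [s for s in path.split("/") if s not in ("", ".", "..")]
    if not segments:
        segments = ["index.html"]
    # Keep different query strings apart. "@" is used instead of "?" because
    # "?" is not allowed in Windows file names (an assumption of this sketch).
    if parts.query:
        segments[-1] += "@" + parts.query.replace("/", "%2F")
    return os.path.join(root, parts.hostname or "unknown-host", *segments)


def save_page(url, body, root="archive"):
    """Write one response body (bytes) under root and return the path used."""
    target = url_to_path(url, root)
    os.makedirs(os.path.dirname(target), exist_ok=True)
    with open(target, "wb") as fh:
        fh.write(body)
    return target
```

A real proxy would call something like `save_page(request_url, response_bytes)` for every response it relays. One known gap in this sketch: if `/a` is saved as a file and `/a/b` is requested later, the two collide on disk.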
HTTrack - Windows, Mac OS X, Linux
SiteSucker - Mac OS X
posted by yclipse at 1:46 PM on June 20, 2012
Internet Explorer has "File > Save As", and one of the choices is "Web Archive, single file (*.mht)", which can be opened and edited with Word. In reality it is just a MIME-encoded (MHTML) file containing the page, images, etc.
But... it's not automatic.
... Or - if you have OneNote installed, there is an Internet Explorer extension to send the webpage to OneNote...
posted by jkaczor at 5:46 PM on June 20, 2012
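Since an .mht file is a MIME multipart message (MHTML), Python's standard `email` module can read one. The sketch below lists the parts bundled in such a file. `list_mht_parts` and the tiny `SAMPLE` archive are made up for illustration; a file saved by Internet Explorer will be larger and may use other encodings, but it should follow the same overall layout.

```python
from email import message_from_bytes


def list_mht_parts(data):
    """Return (content type, Content-Location, decoded size) for each part."""
    msg = message_from_bytes(data)
    parts = []
    for part in msg.walk():
        if part.is_multipart():
            continue  # skip the multipart/related container itself
        payload = part.get_payload(decode=True) or b""
        parts.append(
            (part.get_content_type(), part.get("Content-Location"), len(payload))
        )
    return parts


# A hand-written two-part archive: one HTML page and one base64-encoded image.
SAMPLE = (
    b"MIME-Version: 1.0\r\n"
    b'Content-Type: multipart/related; boundary="XX"; type="text/html"\r\n'
    b"\r\n"
    b"--XX\r\n"
    b"Content-Type: text/html; charset=utf-8\r\n"
    b"Content-Location: http://example.com/\r\n"
    b"\r\n"
    b"<html>hi</html>\r\n"
    b"--XX\r\n"
    b"Content-Type: image/png\r\n"
    b"Content-Transfer-Encoding: base64\r\n"
    b"Content-Location: http://example.com/a.png\r\n"
    b"\r\n"
    b"AAEC\r\n"
    b"--XX--\r\n"
)

if __name__ == "__main__":
    for content_type, location, size in list_mht_parts(SAMPLE):
        print(content_type, location, size)
```

To inspect a real file, read it with `open("page.mht", "rb").read()` (the file name is a placeholder) and pass the bytes to `list_mht_parts`.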
Response by poster: SNACKeR - Thank you! WebAssistant (Proxy Offline Browser) seems to be doing exactly what I need - and as a bonus, it works with any browser, not just Firefox.
Thank you so much!
posted by kristi at 10:27 AM on June 22, 2012
This thread is closed to new comments.
posted by SuperSquirrel at 11:36 AM on June 20, 2012