How do I download all the pages in a website?
July 1, 2013 8:16 AM
This particular site has about 50 pages of good reference material. I would like to download it all before it goes away. Is there a program anyone can recommend that does this? I'm using Windows 7. Thanks!
If you're using Firefox, the Scrapbook extension allows downloading via wildcards. So if the pages you're interested in have similar URLs, or if the links you want are all listed on one page, you can download them all at once.
posted by SuperSquirrel at 8:23 AM on July 1, 2013
wget is what you want:
wget --mirror -p --convert-links -P ./[local-directory] http://website-to-mirror.com
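For what it's worth, --mirror is shorthand for -r -N -l inf --no-remove-listing per the GNU wget manual, so the fully spelled-out equivalent (with [local-directory] still just a placeholder) should be:
wget --recursive --timestamping --level=inf --no-remove-listing --page-requisites --convert-links --directory-prefix=./[local-directory] http://website-to-mirror.com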
posted by usonian at 8:25 AM on July 1, 2013
HTTrack Website Copier not only downloads the site but also stores it on your computer, so you can browse it offline as if you were on the actual site.
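If you'd rather run it from the command line, httrack also ships as a CLI; something along these lines should work (the output directory name is just a placeholder):
httrack "http://website-to-mirror.com/" -O ./local-copy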
posted by 1367 at 8:52 AM on July 1, 2013
Along with the other wget command-line options in usonian's answer, I would add --no-check-certificate to forestall the occasional failure due to SSL certificate issues. FWIW, I also find that --mirror often brings in too much. The following is enough in my experience:
wget --no-parent --no-check-certificate -rkKp http://website-to-mirror.com
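(If the bundled short options are cryptic: -rkKp is -r -k -K -p, which in GNU wget's long form is --recursive --convert-links --backup-converted --page-requisites, so the spelled-out equivalent should be:)
wget --no-parent --no-check-certificate --recursive --convert-links --backup-converted --page-requisites http://website-to-mirror.com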
(Not arguing with usonian's answer, just offering a possible alternative.)
posted by StrawberryPie at 9:54 AM on July 1, 2013
Wget is the old reliable of website-spidering utilities, useful for those occasions when you want to grab something without launching a web browser. Personally, I like the httrack GUI under both Windows and Linux.
posted by endotoxin at 10:14 AM on July 1, 2013
I like the grabmybooks addon for Firefox. It turns the website into an EPUB file so you can read it on your e-reader. It has a 'grab tabs' feature, so you can have 20 tabs open at once and it will grab all the text in them in sequence. There are some issues when there are a lot of tables, frames, etc., but it works pretty well 95% of the time.
posted by jyorraku at 7:09 PM on July 1, 2013
I ended up using HTTrack Website Copier. It worked perfectly and was very easy to get started with. Thanks to all.
posted by Rad_Boy at 8:43 PM on July 1, 2013
It can also depend on how the website you want to grab is structured. Most extensions can follow links to grab sub-pages, but they can run into trouble if JavaScript or other scripting is used to generate the links.
posted by Captain_Science at 8:22 AM on July 1, 2013
This thread is closed to new comments.