How do I download all the pages in a website?
July 1, 2013 8:16 AM Subscribe
This particular site has about 50 pages of good reference material. I would like to download it all before it goes away. Is there a program anyone can recommend that does this? I'm using Windows 7. Thanks!
It can also depend on how the website you're looking to grab is structured. Most extensions can follow links to grab subpages, but they can have issues if JavaScript or other scripting is used.
posted by Captain_Science at 8:22 AM on July 1, 2013
If you're using Firefox, the Scrapbook extension allows downloading via wildcards. So if the pages you're interested in have similar URLs, or if the links you want are all listed on one page, you can download them all at once.
posted by SuperSquirrel at 8:23 AM on July 1, 2013
I've used BlackWidow to grab copies of sites.
posted by belladonna at 8:24 AM on July 1, 2013 [1 favorite]
wget is what you want:
wget --mirror -p --convert-links -P ./[local-directory] http://website-to-mirror.com
posted by usonian at 8:25 AM on July 1, 2013 [1 favorite]
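For reference, a rough breakdown of what that command does; [local-directory] and the URL are placeholders to substitute with your own values:
# Sketch of usonian's wget invocation, with each option spelled out:
#   --mirror          turn on recursive downloading with timestamping and unlimited depth
#   -p                also fetch page requisites (images, CSS, etc.) needed to display each page
#   --convert-links   rewrite links in the saved pages so they work for offline viewing
#   -P DIR            save everything under DIR instead of the current directory
wget --mirror -p --convert-links -P ./[local-directory] http://website-to-mirror.com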
Best answer: HTTrack Website Copier not only downloads the site but stores it on your computer so you can browse it as if you were on the actual site.
posted by 1367 at 8:52 AM on July 1, 2013 [3 favorites]
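HTTrack also has a command-line mode alongside the WinHTTrack GUI; a minimal sketch of an invocation looks roughly like this, with example.com and the output folder as placeholders:
# Mirror a site into ./example-mirror; the "+" filter keeps the crawl on the same domain,
# -O sets the output path, and -v prints progress. Adjust the URL and filter to suit.
httrack "http://www.example.com/" -O "./example-mirror" "+*.example.com/*" -v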
Along with the other wget command-line options in usonian's answer, I would add --no-check-certificate to forestall the occasional failure due to SSL certificate issues. FWIW, I also find that --mirror often brings in too much. The following is enough in my experience:
wget --no-parent --no-check-certificate -rkKp http://website-to-mirror.com
(Not arguing with usonian's answer, just offering a possible alternative.)
posted by StrawberryPie at 9:54 AM on July 1, 2013 [1 favorite]
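If the site really is only about 50 pages, it can also be worth throttling the crawl a little so you don't hammer the server; something along these lines (the URL is a placeholder, and the exact numbers are a matter of taste):
# Gentler variant of the same command: -r recursive, -k convert links, -K keep backups of
# converted files, -p fetch page requisites; --wait pauses between requests (seconds) and
# -l caps the recursion depth.
wget --no-parent --no-check-certificate -rkKp --wait=1 -l 3 http://website-to-mirror.com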
Wget is the old reliable of web site spidering utils. Useful for those occasions when you want to grab something and don't want to launch a web browser to do it. Personally I like the GUI on httrack under both Windows and Linux.
posted by endotoxin at 10:14 AM on July 1, 2013 [1 favorite]
I like the grabmybooks addon for Firefox. It turns the website into an EPUB file so you can read it on your e-reader. It has a 'grab tabs' feature, so you can have 20 tabs open at once and it will grab all the text in them in sequence. There are some issues when there are a lot of tables, frames, etc., but it works pretty well 95% of the time.
posted by jyorraku at 7:09 PM on July 1, 2013
Response by poster: I ended up using the HTTrack Website Copier. It worked perfectly and was very easy to get started with. Thanks to all.
posted by Rad_Boy at 8:43 PM on July 1, 2013
This thread is closed to new comments.