How do I download all the pages in a website?
July 1, 2013 8:16 AM

This particular site has about 50 pages of good reference material. I would like to download it all before it goes away. Is there a program anyone can recommend that does this? I'm using Windows 7. Thanks!
posted by Rad_Boy to Computers & Internet (9 answers total) 24 users marked this as a favorite
 
There are browser-based extensions (Chrome/Firefox) that can download some or all of a website. DownThemAll is one that I have used in the past on Firefox, and it worked pretty well.

It can also depend on how the website you are looking to grab is structured. Most extensions can follow links to grab subpages but can have issues if JavaScript or other scripting is used.
posted by Captain_Science at 8:22 AM on July 1, 2013


If you're using Firefox, the Scrapbook extension allows downloading via wildcards. So if the pages you're interested in have similar URLs, or if the links you want are all listed on one page, you can download them all at once.
posted by SuperSquirrel at 8:23 AM on July 1, 2013


I've used BlackWidow to grab copies of sites.
posted by belladonna at 8:24 AM on July 1, 2013 [1 favorite]


wget is what you want:

wget --mirror -p --convert-links -P ./[local-directory] http://website-to-mirror.com
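
For reference, here is a sketch of the same command spelled out with long option names. MIRROR_URL, MIRROR_DIR, and mirror_command are illustrative names I made up, not part of wget, and the function only prints the command so you can look it over before running it.

```shell
#!/bin/sh
# Placeholders (assumptions): substitute the real site and target folder.
MIRROR_URL="http://website-to-mirror.com"
MIRROR_DIR="./local-directory"

# Print the wget command instead of running it, so it can be reviewed first.
#   --mirror            recursion plus timestamping, with unlimited depth
#   --page-requisites   long form of -p: also fetch images, CSS, and scripts
#   --convert-links     rewrite links so the copy can be browsed offline
#   --directory-prefix  long form of -P: save everything under this folder
mirror_command() {
    printf 'wget --mirror --page-requisites --convert-links --directory-prefix=%s %s\n' \
        "$MIRROR_DIR" "$MIRROR_URL"
}

mirror_command
```

Paste the printed line into a terminal to start the actual download.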
posted by usonian at 8:25 AM on July 1, 2013 [1 favorite]


HTTrack Website Copier not only downloads the site but also stores it on your computer so that you can browse it as if you were on the actual site.
posted by 1367 at 8:52 AM on July 1, 2013 [3 favorites]


Along with the other wget command-line options in usonian's answer, I would add --no-check-certificate to forestall the occasional failure due to SSL certificate issues. FWIW, I also find that --mirror often brings in too much. The following is enough in my experience:

wget --no-parent --no-check-certificate -rkKp http://website-to-mirror.com

(Not arguing with usonian's answer, just offering a possible alternative.)
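
In case the bundled -rkKp is hard to read, here is a small sketch that expands it into wget's long option names. expand_flags is a hypothetical helper for illustration only (it is not part of wget) and it knows just the four letters used above.

```shell
#!/bin/sh
# Expand a cluster of wget short flags into long option names.
# Only the four flags from the command above are mapped:
#   r  --recursive         follow links and download what they point to
#   k  --convert-links     rewrite links so the copy works offline
#   K  --backup-converted  keep the unconverted file with a .orig suffix
#   p  --page-requisites   also fetch images, CSS, and scripts
expand_flags() {
    cluster=$1
    out=""
    while [ -n "$cluster" ]; do
        rest=${cluster#?}          # everything after the first character
        flag=${cluster%"$rest"}    # the first character on its own
        case $flag in
            r) long="--recursive" ;;
            k) long="--convert-links" ;;
            K) long="--backup-converted" ;;
            p) long="--page-requisites" ;;
            *) long="-$flag" ;;    # leave unknown flags untouched
        esac
        out="${out:+$out }$long"
        cluster=$rest
    done
    printf '%s\n' "$out"
}

expand_flags rkKp
# prints: --recursive --convert-links --backup-converted --page-requisites
```

The two remaining options are already in long form: --no-parent keeps wget from climbing above the starting directory, and --no-check-certificate skips validation of the server's certificate.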
posted by StrawberryPie at 9:54 AM on July 1, 2013 [1 favorite]


Wget is the old reliable of web site spidering utils. It's useful for those occasions when you want to grab something and don't want to launch a web browser to do it. Personally, I like the GUI on HTTrack under both Windows and Linux.
posted by endotoxin at 10:14 AM on July 1, 2013 [1 favorite]


I like the GrabMyBooks add-on for Firefox. It turns the website into an EPUB file so you can read it on your e-reader. It has a 'grab tabs' feature so you can have 20 tabs open at once and it will grab all the text in them in sequence. There are some issues when there are a lot of tables, frames, etc., but it works pretty well 95% of the time.
posted by jyorraku at 7:09 PM on July 1, 2013


I ended up using the HTTrack Website Copier. It worked perfectly and was very easy to get started with. Thanks to all.
posted by Rad_Boy at 8:43 PM on July 1, 2013

