Downloading all files from a webserver
April 3, 2007 8:47 PM   Subscribe

Downloading every file from a website

I have to download multiple small files (over 2,000) from a website (which uses autoindex for its listings). There are folders and subfolders that I need to download while keeping the files in their respective folders (it is far too much work to arrange them into folders after downloading). I tried the DownThemAll extension in Firefox, but it doesn't handle subfolders well (I have to click into each folder and restart DownThemAll). The website also requires authentication via a form (username and password).

I need a program that can do this for me, keep the files in their respective folders, and download selectively by extension.
I tried SurfOffline and BackStreet Browser, and they don't work (or I couldn't make them work).

Any suggestions are appreciated.
posted by victorashul to Computers & Internet (9 answers total) 6 users marked this as a favorite
 
wget would be the tool of choice, though it seems like you're on Windows. I dunno if there's a Windows version (I'd be shocked if there wasn't), but the syntax wget -m http://www.yoururlgoeshere.fake would give you the whole site.
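Something along these lines should keep the folder structure and only grab the extensions you care about (untested, and the URL and extension list are just placeholders):

wget -m -np -A "pdf,doc,txt" http://www.yoururlgoeshere.fake/files/

-m mirrors recursively and recreates the directory tree locally, -np stops it from climbing above the starting folder, and -A is a comma-separated list of extensions to accept.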
posted by SansPoint at 8:55 PM on April 3, 2007


There's a DOS version of wget.
posted by pompomtom at 9:00 PM on April 3, 2007


Best answer: You can get wget for Windows as part of the Cygwin distribution (just select wget from the long list of available packages in the installer), or you can get a standalone version based on the Cygwin one.

A quick Google search turned up this GUI wrapper for wget for Windows, which might also interest you.
posted by gmarceau at 9:19 PM on April 3, 2007


(too slow but) There are also a handful of Windows GUIs for wget.
posted by misterbrandt at 9:22 PM on April 3, 2007


Response by poster: Does wget support sites that need authentication?
And will I be able to get all the files and keep the folder structure?
posted by victorashul at 9:28 PM on April 3, 2007


Yes and yes.
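For plain HTTP auth it's just --user and --password; for a login form, you can usually POST the credentials once, save the session cookie, and reuse it for the mirror run (the field names and URLs below are guesses, check the login page's HTML):

wget --save-cookies cookies.txt --keep-session-cookies --post-data "username=you&password=secret" http://example.fake/login.php
wget --load-cookies cookies.txt -m -np http://example.fake/files/

The -m run recreates the site's folder hierarchy on disk by default.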
posted by flaterik at 9:37 PM on April 3, 2007


I second Cygwin + wget (or curl, your choice), but if installing Cygwin is too much, there's a standalone Windows binary for curl here. Use all appropriate caution; I didn't test that file or even try to download it (I'm at home on my Mac right now), so don't blame me if you hose a production system. It looks fairly legit, though.

A decent introductory tutorial for curl is available that you might want to read. It's a pretty powerful program; used correctly it can crawl through just about anything, even PHP-generated pages and other kinds of dynamic HTML.
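For the login form, curl can POST the credentials once and store the session cookie for later requests; the field names and URL here are only guesses at what the site's form expects:

curl -c cookies.txt -d "username=you&password=secret" http://example.fake/login.php

After that, pass -b cookies.txt on the download requests so the site still thinks you're logged in.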

More to the point in your case, I think, curl makes it pretty trivial to download sequentially numbered files. See this page, about two-thirds of the way down, for the syntax. You can specify different parts of the URL to increment, to crawl sequentially numbered folders and get sequentially numbered documents inside them. I'm not totally sure about grabbing the directory hierarchy, but I think you'll want the -O (use the remote filename when saving) and --create-dirs (create the local directory structure) options. Not sure how the Windows binary lets you pass options; hopefully it has a nice GUI for you.
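Something like this, assuming the folders and files really are numbered sequentially (the names and ranges are just placeholders):

curl "http://example.fake/folder[1-20]/doc[1-100].pdf" -o "folder#1/doc#2.pdf" --create-dirs

curl expands the [1-20] and [1-100] ranges itself, #1 and #2 in the output name refer back to them, and --create-dirs makes the local folders as needed.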

And finally, here's the manpage for the "official" reference; you might just want to glance over it.
posted by Kadin2048 at 9:44 PM on April 3, 2007


Best answer: I use httrack for this purpose. It's simple and does what you want.
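From the command line it's roughly this (untested; the URL is a placeholder and the "+*.pdf" filter just restricts it to PDFs):

httrack "http://example.fake/files/" -O ./mirror "+*.pdf"

-O sets the local output folder, and HTTrack keeps the site's directory structure under it by default.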
posted by Mitheral at 10:05 PM on April 3, 2007


Response by poster: @Mitheral:
HTTrack does not support pages that require form authentication (or I didn't find the setting).
posted by victorashul at 11:21 AM on April 4, 2007


This thread is closed to new comments.