Learning how to automate downloads
December 19, 2010 5:00 AM

Please teach me how to automate the downloading of a large number of files from a website.

I need to download a large number of files. I have a list of URLs for the files (in the form http://server.org/filenameXXXX.mov). The files are quicktime movies, in case that matters.

I want an automated way to go through my list of files and download each one of them to a particular local folder. I can do this on windows or linux.

Bonus points for explaining how to do this in simple words or pointing me toward good resources for learning about this.

I'm sure that doing this is quite simple, but I couldn't find the right combination of terms to google to get some basic instructions.
posted by medusa to Computers & Internet (12 answers total) 9 users marked this as a favorite
If you can get a directory listing (something like this), then you can use Firefox and the DownThemAll extension.
posted by monkeymadness at 5:05 AM on December 19, 2010


In Linux you would use wget to download a single file:
wget -P /local/path/ http://server.org/filenameXXXX.mov
(The -P flag sets the directory the file is saved into.)
posted by XMLicious at 5:08 AM on December 19, 2010


wget -i list_of_URLs.txt
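
(Where list_of_URLs.txt is just a plain text file with one URL per line, e.g.:

http://server.org/filename0001.mov
http://server.org/filename0002.mov

wget will fetch each one in turn into the current directory.)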
posted by nicwolff at 5:30 AM on December 19, 2010


You don't need Linux to use wget; it works just fine on Windows. Put your list of URLs in a text file and run nicwolff's command and it will download all of them into the current directory. (So set the current directory to wherever you want them to reside.)
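
For example, from a Windows command prompt (the folder names here are just placeholders):

cd C:\movies
wget -i list_of_URLs.txt

Or skip the cd and point wget at a destination folder directly with its -P flag:

wget -i list_of_URLs.txt -P C:\movies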
posted by Rhomboid at 5:42 AM on December 19, 2010


Don't you need to install cygwin to run wget on windows?
posted by bardophile at 5:45 AM on December 19, 2010


I use the DownThemAll extension that monkeymadness mentions. An easy way to turn a text list of urls into clickable urls is to copy/paste them into a gmail message and send it to yourself. Open the message, and the urls will show up as links. Right-click, select DownThemAll, and in the pop-up choose "Videos" (if necessary -- perhaps all the file types will already show up selected), and you'll see a list of all the .mov urls selected. Just choose your destination folder ("Save Files In") and click "Start."
posted by taz at 5:48 AM on December 19, 2010


Never mind, I guess that's just the way I learned to do it. Lifehacker ran a feature on Mastering Wget a while ago. The instructions work, and are pretty detailed.
posted by bardophile at 5:50 AM on December 19, 2010


Or, because we all love commandlinefu, you could generate the sequence with seq and then parallelise the downloads for more remote server abuse.

seq -w 0000 0009 | xargs -P 20 -I {} wget "http://server.org/filename{}.mov"

(There may be a wget switch for processing urls in parallel, but I use xargs so much it flows off the fingers.)
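
(seq can also just generate the whole URL list up front for wget -i, if you'd rather download one at a time -- a sketch, assuming GNU seq's printf-style -f flag:)

seq -f "http://server.org/filename%04g.mov" 0 9999 > list_of_URLs.txt
wget -i list_of_URLs.txt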

Everyone loves wget, but I'm duty bound to suggest the alternative of curl.

curl -O http://server.org/filename[0000-9999].mov

Will download 10,000 movies, assuming the names are sequential numbers.

curl -O http://server.org/filename{file,other,boo,foo,bar}.mov

Will download filenamefile.mov, filenameother.mov etc etc.
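
(And if you want to rename files as you go, curl can substitute the matched part of the glob wherever #1 appears in the output name -- the "saved_" prefix here is made up:)

curl -o "saved_#1.mov" "http://server.org/filename{file,other,boo,foo,bar}.mov"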

If there is an index file available on the site then you may be better off using that to generate the list of URLs.

lynx --dump http://www.google.com/ | grep -E "\.mov$" | awk '{print $2}' | xargs -P 20 -n 1 wget

This is, of course, the road to command-line madness. You normally end up using a pipeline of 20 commands to do something there is almost certainly a built-in for already. And there are bound to be quirks (like, in this case, trying to download lines of text that happen to end in ".mov")... but regardless, this land of madness is where I'm happiest.
posted by samworm at 6:11 AM on December 19, 2010


Another option is to use curl. It's similar to wget but has some advantages when you want to download many files with systematically formed filenames. If xxxx in your example is a number, you could use:

curl -O http://server.org/filename[0000-9999].mov
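
Note that -O saves each file under its remote name in the current directory, so cd to your destination folder first (the path is just a placeholder):

cd /local/path
curl -O "http://server.org/filename[0000-9999].mov"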
posted by brorfred at 6:14 AM on December 19, 2010


Opera can download all links from a single page (Links panel, select several, right-click, open). CuteFTP can also open single HTML files on a server, find all links and download them.
posted by oxit at 6:15 AM on December 19, 2010


Thanks metafilter, you are awesome!

For now, I'm marking nicwolff's answer as best because it is very easy and will let me get started.

I'll study the other options you suggested to learn more.
posted by medusa at 6:27 AM on December 19, 2010


Taz, there's also the 'linkify' bookmarklet that turns text into clickable links. That is, if you run into a list online for some reason or another. Handy on the iPad. Here's the bookmarklet code, if MeFi lets me link it.

linkify (drag to bookmark bar).
posted by monkeymadness at 10:46 AM on December 19, 2010

