Learning how to automate downloads
December 19, 2010 5:00 AM
Please teach me how to automate the downloading of a large number of files from a website.
I need to download a large number of files. I have a list of URLs for the files (in the form http://server.org/filenameXXXX.mov). The files are quicktime movies, in case that matters.
I want an automated way to go through my list of files and download each one of them to a particular local folder. I can do this on windows or linux.
Bonus points for explaining how to do this in simple words or pointing me toward good resources for learning about this.
I'm sure that doing this is quite simple, but I couldn't find the right combination of terms to google to get some basic instructions.
In Linux you would use wget to download a single file:
wget -P /local/path/ http://server.org/filenameXXXX.mov
posted by XMLicious at 5:08 AM on December 19, 2010 [1 favorite]
Best answer: wget -i list_of_URLs.txt
posted by nicwolff at 5:30 AM on December 19, 2010 [1 favorite]
You don't need linux to use wget; it works just fine on Windows. Put your list of URLs in a text file and run nicwolff's command, and it will download all of them into the current directory. (So set the current directory to wherever you want them to reside.)
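A minimal sketch of the whole setup, with made-up file names (server.org and the numbering are stand-ins following the question's pattern; any URL list in this format will do):

```shell
# Build a URL list, one per line:
for n in $(seq -w 1 4); do
  echo "http://server.org/filename000${n}.mov"
done > list_of_URLs.txt

# Then a single command downloads every listed file; -P picks the
# destination folder instead of the current directory (commented out
# here because it needs a live server):
#   wget -P /local/path/ -i list_of_URLs.txt

# Sanity check: the list holds 4 URLs.
grep -c '' list_of_URLs.txt
```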
posted by Rhomboid at 5:42 AM on December 19, 2010 [1 favorite]
Don't you need to install cygwin to run wget on windows?
posted by bardophile at 5:45 AM on December 19, 2010
I use DownThemAll that monkeymadness mentions. An easy way to turn a text list of urls into clickable urls is to copy/paste them into a gmail message and send it to yourself. Open the message, and the urls will show up as links. Right-click, select DownThemAll, and in the pop-up choose "Videos" (if necessary -- perhaps all the file types will already show up selected), and you'll see a list of all the .mov urls selected. Just choose your destination folder ("Save Files In") and click "Start."
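If you'd rather skip the email step, the same effect can be had locally: wrap each URL from the list in an anchor tag and open the result in a browser, then point DownThemAll at that page. A sketch, with stand-in URLs in place of the real list:

```shell
# Two example URLs standing in for the real list:
printf '%s\n' \
  'http://server.org/filename0001.mov' \
  'http://server.org/filename0002.mov' > list_of_URLs.txt

# "&" in sed's replacement means "the whole matched line", so this
# wraps every URL in an <a> tag:
sed 's|.*|<a href="&">&</a><br>|' list_of_URLs.txt > links.html
# Open links.html in Firefox and run DownThemAll on it.
```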
posted by taz at 5:48 AM on December 19, 2010
Never mind, I guess that's just the way I learned to do it. Lifehacker ran a feature on Mastering Wget a while ago. The instructions work, and are pretty detailed.
posted by bardophile at 5:50 AM on December 19, 2010 [1 favorite]
Or, because we all love commandlinefu, you could generate the sequence with seq and then parallelise the downloads for more remote server abuse.
seq -w 0000 0009 | xargs -P 20 -n 1 -I \{\} wget http://server.org/filename\{\}.mov
(There may be a wget switch for processing URLs in parallel, but I use xargs so much it flows off the fingers.)
Everyone loves wget, but I'm duty bound to suggest the alternative of curl.
curl -O http://server.org/filename[0000-9999].mov
Will download 10,000 movies, assuming the names are sequential numbers.
curl -O http://server.org/filename{file,other,boo,foo,bar}.mov
Will download filenamefile.mov, filenameother.mov etc etc.
If there is an index file available on the site then you may be better off using that to generate the list of URLs.
lynx --dump http://www.google.com/ | grep -E "\.mov$" | awk '{print $2}' | xargs -P 20 -n 1 wget
This is, of course, the road to command-line madness. You normally end up using a pipeline of 20 commands to do something there is almost certainly a built-in for already. And there are bound to be quirks, like trying to download lines of text that happen to end in "mov" in this case... but regardless, this land of madness is where I'm happiest.
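That particular quirk is easy to guard against: instead of keeping any line that ends in "mov", extract only tokens that actually look like .mov URLs before handing them to wget. A sketch, with canned input standing in for the lynx output:

```shell
# Demo input: one real-looking URL plus a trap line.
printf '%s\n' \
  '1. http://server.org/filename0001.mov' \
  'this sentence also happens to end in mov' \
  | grep -Eo 'https?://[^[:space:]]+\.mov'
# Only the URL survives the filter.
```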
posted by samworm at 6:11 AM on December 19, 2010 [3 favorites]
Another option is to use curl. It's similar to wget but has some advantages when you want to download many files with systematically formed filenames. If XXXX in your example is a number, you could use:
curl -O http://server.org/filename[0000-9999].mov
posted by brorfred at 6:14 AM on December 19, 2010 [1 favorite]
Opera can download all links from a single page (Links panel, select several, right-click, open). CuteFTP can also open single HTML files on a server, find all the links, and download them.
posted by oxit at 6:15 AM on December 19, 2010
Response by poster: Thanks metafilter, you are awesome!
For now, I'm marking nicwolff's answer as best because it is very easy and will let me get started.
I'll study the other options you suggested to learn more.
posted by medusa at 6:27 AM on December 19, 2010
Taz, there's also the 'linkify' bookmarklet that turns text into clickable links. That is if you run into a list online for some reason or another. Handy on the iPad. Here's the bookmarklet code, if MeFi lets me link it.
linkify (drag to bookmark bar).
posted by monkeymadness at 10:46 AM on December 19, 2010
This thread is closed to new comments.