What wget configuration do I need to grab only mp3s, courteously?
July 2, 2004 9:36 AM

mp3 blogs: Say I'm reading an mp3 blog that links to several external sites hosting legal mp3s. I'm running OS X. What wget configuration do I need to use to grab the mp3s in a given entry that are hosted elsewhere, do so in a courteous, non-hogging manner, and not grab the html or graphics?

I've been trying:

wget -r -l2 wc -A mp3, Mp3 -R html, jpg http://mp3blog.com

But it isn't working. I'd like to throttle the bandwidth down so I don't crush anyone's servers. Thanks for any tips.
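(From skimming the man page, wget's --limit-rate and --wait options look like the polite-downloading knobs; what I'm aiming for is something like

wget -r -l2 -A mp3,Mp3 -R html,jpg --limit-rate=20k --wait=2 http://mp3blog.com

with mp3blog.com standing in for whatever blog I'm reading, but I haven't gotten it working yet.)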
posted by mecran01 to Computers & Internet (7 answers total)
 
1. Use Firefox.
2. Load the Spiderzilla extension, which is a wget graphical interface.
3. Profit!
posted by Fupped Duck at 9:52 AM on July 2, 2004


Or,

Download Deep Vacuum, which is a GUI for wget [wget baffles me also, and I'm not afraid to admit it] and, oh yeah,

3. Profit!
posted by plemeljr at 9:56 AM on July 2, 2004


Response by poster: Oddly, Deep Vacuum isn't getting much, even when I tweak the settings, so I suppose it's a robots.txt issue. Oh well, DV will be useful for other things, thanks.
posted by mecran01 at 11:00 AM on July 2, 2004


In general, I've found excluding html usually doesn't work, since it won't get the actual homepage either. So it never even gets the page that has the links to the mp3s.
posted by smackfu at 11:29 AM on July 2, 2004


Try using -erobots=off on the command line for wget.
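That makes wget ignore the site's robots.txt, which may be what was stopping Deep Vacuum. Folded into a recursive grab, it would look something like this (a sketch; adjust the other flags to taste, with mp3blog.com as a stand-in again):

wget -r -l1 -A.mp3 -erobots=off http://mp3blog.com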
posted by kindall at 11:50 AM on July 2, 2004


I use the command:
wget -r -l 2 -H --no-parent -A.mp3 -R.txt,.html,.htm,.php --follow-tags=a -nd http://www.sitename.com/mp3directory/page.html

(I think your wget command is written wrong: the stray 'wc' shouldn't be in there, the -A/-R lists can't have spaces after the commas, and you might need the . in .mp3, etc.)

This will pull only the mp3s on a page (and on any other page within a click); change the '-l 2' to '-l 1' if you only want the mp3s on that page.

I set up that command as a little shell script so I only have to type 'suckmp3s http://www.sitename.com/mp3directory/page.html'. Email me if you need more info about shell scripts... or google for "basic bash scripting".
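In case it helps, here's roughly what that script looks like (a sketch, assuming you save it as 'suckmp3s' somewhere in your PATH and chmod +x it):

#!/bin/bash
# suckmp3s: grab just the mp3s linked from the page given as the first argument
wget -r -l 2 -H --no-parent -A.mp3 -R.txt,.html,.htm,.php --follow-tags=a -nd "$1"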
posted by maniactown at 5:56 PM on July 2, 2004


Response by poster: Very cool, and thanks. I just ran across Jeffrey Veen's version, which is as follows:

http://www.veen.com/jeff/archives/000573.html

wget -r -l1 -H -t1 -nd -N -np -A.mp3 -erobots=off -i ~/mp3blogs.txt

Where "mp3blogs.txt" is a list of mp3 blogs. Seems to work. I will put it in a Bash script using the directions above, thanks to all.
posted by mecran01 at 10:33 AM on July 8, 2004

