What wget configuration do I need to grab only mp3s, courteously?
July 2, 2004 9:36 AM Subscribe
mp3 blogs: Say I'm reading an mp3 blog that links to several external sites hosting legal mp3s. I'm running os x. What wget configuration do I need to use to grab the mp3s in a given entry that are hosted elsewhere, do so in a courteous, non-hogging manner, and not grab the html or graphics?
I've been trying:
wget -r -l2 wc -A mp3, Mp3 -R html, jpg http://mp3blog.com
But it isn't working. I'd like to throttle the bandwidth down so I don't crush anyone's servers. Thanks for any tips.
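For reference, a sketch of what a corrected, throttled version of that attempt might look like (mp3blog.com is the placeholder host from the question, and the rate and wait numbers are arbitrary examples): -A wants a comma-separated suffix list with no spaces, -H lets wget follow links out to the external hosts, and --limit-rate plus --wait/--random-wait keep the load on those hosts gentle.

# rate limit and delay are example values only
wget -r -l2 -H -A mp3,MP3 \
     --limit-rate=50k --wait=2 --random-wait \
     http://mp3blog.com/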
2. load the Spiderzilla extension, which is a wget graphical interface.
3. Profit!
posted by Fupped Duck at 9:52 AM on July 2, 2004
Or,
Download Deep Vacuum, which is a GUI for wget [wget baffles me also, and I'm not afraid to admit it] and, oh yeah,
3. Profit!
posted by plemeljr at 9:56 AM on July 2, 2004
Response by poster: Oddly, deep vacuum isn't getting much, even when I tweak the settings, so I suppose it's a robots.txt issue. Oh well, DV will be useful for other things, thanks.
posted by mecran01 at 11:00 AM on July 2, 2004
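If robots.txt is in fact what's stopping the underlying wget, the -e robots=off switch (which also turns up in the command quoted later in the thread) tells wget to ignore it; whether to do that on someone else's server is a judgment call. A minimal sketch, with mp3blog.com again standing in for the real site:

wget -r -l1 -H -A mp3,MP3 -e robots=off --wait=2 http://mp3blog.com/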
In general, I've found excluding html usually doesn't work, since it won't get the actual homepage either. So it never even gets the page that has the links to the mp3s.
posted by smackfu at 11:29 AM on July 2, 2004
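That said, the wget manual notes that in recursive mode HTML pages are still downloaded and parsed so their links can be followed, and are only deleted afterwards if they match the reject list. So a command along these lines (placeholder URL again) should still reach the linked mp3s even though the pages themselves aren't kept:

wget -r -l1 -A mp3 -R html,htm http://mp3blog.com/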
I use the command:
wget -r -l 2 -H --no-parent -A.mp3 -R.txt,.html,.htm,.php --follow-tags=a -nd http://www.sitename.com/mp3directory/page.html
(I think your wget might be written wrong-- you might need the . in .mp3, etc.)
This will pull only the mp3s on a page (and on any other page within a click-- change the '-l 2' to '-l 1' if you only want the mp3s on that page).
I set up that command as a little shell script so I only have to type 'suckmp3s http://www.sitename.com/mp3directory/page.html'. Email me if you need more info about shell scripts... or google for "basic bash scripting".
posted by maniactown at 5:56 PM on July 2, 2004
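A minimal sketch of that kind of wrapper, assuming the name suckmp3s from the comment and bash as the shell:

#!/bin/bash
# suckmp3s -- grab the mp3s linked from the page given as the first argument
wget -r -l 2 -H --no-parent -A.mp3 -R.txt,.html,.htm,.php --follow-tags=a -nd "$1"

Saved somewhere on the PATH and marked executable (chmod +x suckmp3s), it runs as suckmp3s http://www.sitename.com/mp3directory/page.html.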
Response by poster: Very cool, and thanks. I just ran across Jeffrey Veen's version, which is as follows:
http://www.veen.com/jeff/archives/000573.html
wget -r -l1 -H -t1 -nd -N -np -A.mp3 -erobots=off -i ~/mp3blogs.txt
Where "mp3blogs.txt" is a list of mp3 blogs. Seems to work. I will put it in a Bash script using the directions above, thanks to all.
posted by mecran01 at 10:33 AM on July 8, 2004
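A sketch of what that Bash script might look like, following the same wrapper pattern as above (the script name is made up; ~/mp3blogs.txt, holding one blog URL per line, is from the comment):

#!/bin/bash
# getmp3blogs -- pull new mp3s from every blog listed in ~/mp3blogs.txt
wget -r -l1 -H -t1 -nd -N -np -A.mp3 -erobots=off -i ~/mp3blogs.txt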
This thread is closed to new comments.