Website Mirroring Tools
March 28, 2004 4:27 PM

Website Mirroring Tools: I usually use wget, but find that it seems to have at least one nasty flaw: it doesn't fetch stylesheets, or images named in stylesheets (it may also have other flaws I'm not aware of). Is there something better out there that I can just give a homepage URL and have it suck down an entire site?
posted by namespan to Computers & Internet (7 answers total)
 
omt: "OMT is a simple script for mirroring Web pages for off-line/mirror reading. It rewrites the content of the pages to make a complete and functional mirror. It has a number of options to specify what files should be mirrored and what renaming should occur."

httrack: "It allows you to download a World Wide Web site from the Internet to a local directory, building recursively all directories, getting HTML, images, and other files from the server to your computer."

w3mir: "w3mir supports HTML4, and has partial support for CSS, Java and ActiveX. And it should work on Win32 machines."

Generally, I just use wget -m -np -t 0 -c -p URL, but I don't really care that much if I'm missing a couple of stylesheets.
posted by majick at 5:35 PM on March 28, 2004
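
For readers decoding the short flags in majick's command, here is a minimal sketch of the same invocation spelled out with wget's long option names. The function name mirror_site and the WGET override variable are hypothetical conveniences, not part of wget or of the original comment. The long options are the documented equivalents of -m, -np, -t 0, -c and -p.

```shell
# mirror_site URL
# Long-option spelling of: wget -m -np -t 0 -c -p URL
# WGET is a hypothetical override so the command can be swapped out
# (for example with "echo") to inspect the arguments without downloading.
mirror_site() {
    # --mirror           recursion plus time-stamping, unlimited depth
    # --no-parent        never ascend above the starting directory
    # --tries=0          retry failed downloads indefinitely
    # --continue         resume partially downloaded files
    # --page-requisites  also fetch images, stylesheets, etc. needed to render pages
    "${WGET:-wget}" --mirror --no-parent --tries=0 --continue --page-requisites "$1"
}
```

Something like `mirror_site http://example.com/` would then start the mirror. Setting `WGET=echo` first only prints the arguments, which may be a safer way to check the options before pointing it at a real site.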


I'm using wget version 1.8.2, and it has a -p or --page-requisites option. According to the man page:
This option causes Wget to download all the files that are necessary to properly display a given HTML page. This includes such things as inlined images, sounds, and referenced stylesheets.

Ordinarily, when downloading a single HTML page, any requisite documents that may be needed to display it properly are not downloaded. Using -r together with -l can help, but since Wget does not ordinarily distinguish between external and inlined documents, one is generally left with "leaf documents" that are missing their requisites.
Perhaps that'll do what you want? I don't know if it will get images mentioned in a .css though. But hey, I bet it will get embedded MIDI files, and that's almost as good.
posted by mragreeable at 5:40 PM on March 28, 2004
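
The man page passage quoted above suggests two ways to use -p: alone, for a single page plus whatever it needs to display, or together with -r and -l so that the leaf documents of a limited crawl still get their requisites. A rough sketch of both, assuming a hypothetical helper named fetch_with_requisites and a hypothetical WGET override variable (neither comes from wget itself):

```shell
# fetch_with_requisites URL [DEPTH]
# DEPTH omitted or 0: fetch one page and its requisites (wget -p URL).
# DEPTH > 0: recurse to that depth and still fetch requisites of the
#            leaf pages (wget -r -l DEPTH -p URL).
# WGET is a hypothetical override, e.g. WGET=echo to print the arguments.
fetch_with_requisites() {
    url=$1
    depth=${2:-0}
    if [ "$depth" -gt 0 ]; then
        "${WGET:-wget}" --recursive --level="$depth" --page-requisites "$url"
    else
        "${WGET:-wget}" --page-requisites "$url"
    fi
}
```

Whether images referenced only from inside a .css file get picked up is the open question from the comment above, and this sketch does nothing to settle it.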


<rueful caveat>
just be damn careful about trying to use wget on, say, a geocities site, especially if you don't fully understand the ludicrous plethora of options, each of which is expressible in alternate ways, not to mention the geocities habit of indecipherably spreading various site bits across multiple hosts.
posted by quonsar at 6:57 PM on March 28, 2004


install Firefox, then install Spiderzilla.
posted by trondant at 7:28 PM on March 28, 2004


So what are everybody's favorite pr0n sites?
posted by coolgeek at 7:54 PM on March 28, 2004


coolgeek: I'm pretty sure you're just being snarky or trying to be funny, but the question's been answered already.
posted by majick at 8:27 PM on March 28, 2004


How do I use wget to just strip mine .mp3s from a site?
posted by mecran01 at 6:44 AM on March 29, 2004
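
The thread closed without an answer to the .mp3 question. One possible approach, offered only as a sketch, is wget's accept list: recurse through the site but keep only files whose names end in the listed suffix. The function name grab_mp3s and the WGET override variable are hypothetical. The options are the long forms of -r, -np, -nd and -A.

```shell
# grab_mp3s URL
# Long-option spelling of: wget -r -np -nd -A mp3 URL
# WGET is a hypothetical override, e.g. WGET=echo to print the arguments.
grab_mp3s() {
    # --recursive       follow links (wget's default depth limit applies)
    # --no-parent       stay at or below the starting directory
    # --no-directories  save everything into the current directory
    # --accept=mp3      keep only files whose names end in "mp3"
    "${WGET:-wget}" --recursive --no-parent --no-directories --accept=mp3 "$1"
}
```

wget still has to download the HTML pages to find the links. With an accept list in place it deletes those pages once it has parsed them, so only the .mp3 files should remain. Files hosted on a different server than the starting URL would not be followed without further options.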

