Daily webpage snapshot?
October 14, 2006 8:09 PM

Is there any software/service which will let me save a daily snapshot of a webpage?

I'm looking for a way that every day - say at midnight - I can take a copy of a webpage and save it as a new version. This page belongs to a 3rd party. It is data intensive and each day the data gets refreshed and the old info lost. I want to be able to track it without having to physically save it each night. I'd like a service/software that does this automatically.
posted by missbossy to computers & internet (10 comments total)
If you have a Mac or Linux/Unix system, wget can be set up to mirror a given website in its entirety (curl can fetch individual pages). If you are comfortable with a little scripting, you could set up a script that runs the command once a day, each time saving the result to a different directory.
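For anyone who would rather not script wget directly, here is a minimal Python sketch of the "different directory each day" idea. It is only an illustration: the function names, the snapshots folder, and the index.html file name are all made up for the example, and it grabs a single page rather than mirroring a whole site.

```python
from datetime import date
from pathlib import Path
from urllib.request import urlopen


def snapshot_path(root: Path, day: date, filename: str = "index.html") -> Path:
    """Return root/YYYY-MM-DD/filename, i.e. one directory per day."""
    return root / day.isoformat() / filename


def save_snapshot(url: str, root: Path, day: date) -> Path:
    """Download url and write it under that day's directory."""
    target = snapshot_path(root, day)
    target.parent.mkdir(parents=True, exist_ok=True)
    with urlopen(url) as response:
        target.write_bytes(response.read())
    return target


# Example call (placeholder URL and folder, swap in the page you want to track):
# save_snapshot("http://example.com/", Path("snapshots"), date.today())
```

Because the directory name comes from the date, running it twice on the same day overwrites that day's copy instead of piling up duplicates.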
posted by drmarcj at 8:29 PM on October 14, 2006


wget is the tool you want.

This Lifehacker article gives a pretty good explanation of wget, and a brief tutorial on how to use it.

It can be run on a schedule using the Task Scheduler on Windows, or using crontab on a Mac or Linux.
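If setting up Task Scheduler or crontab is the sticking point, the same "run once a day at midnight" behaviour can be roughly sketched in Python. This is a stand-in for a real scheduler, not a replacement: it assumes the script is simply left running, it ignores daylight-saving shifts, and the names seconds_until_midnight and run_daily are invented for the example.

```python
import time
from datetime import datetime, timedelta


def seconds_until_midnight(now: datetime) -> float:
    """Seconds from now until the next local midnight."""
    tomorrow = (now + timedelta(days=1)).date()
    next_midnight = datetime.combine(tomorrow, datetime.min.time())
    return (next_midnight - now).total_seconds()


def run_daily(job) -> None:
    """Call job() once per day, just after midnight, forever."""
    while True:
        time.sleep(seconds_until_midnight(datetime.now()))
        job()
```

Here job would be whatever function fetches and saves the page. A proper scheduler is still the sturdier choice, since it survives reboots.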
posted by chrisamiller at 8:32 PM on October 14, 2006


The Wayback Machine does this.
posted by blue_beetle at 8:43 PM on October 14, 2006


Sites can block the Wayback Machine's archiving via a robots.txt file (see http://www.archive.org/about/exclude.php).
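To see how that kind of block works, here is a small sketch using Python's standard urllib.robotparser. The robots.txt text is hypothetical, and the crawler name ia_archiver is an assumption on my part; check the linked page for the name the archive actually honours.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt a site might publish to opt out of archiving.
# The crawler name "ia_archiver" is an assumption, not taken from this thread.
ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /
"""


def is_blocked(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt forbids user_agent from fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)
```

With the rules above, the named crawler is shut out of every page while other user agents are unaffected, which is why a page can be missing from the archive yet still be fetchable with your own script.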
posted by SirStan at 10:24 PM on October 14, 2006


drmarcj, wget and curl work just fine on Windows too.
posted by Rhomboid at 10:37 PM on October 14, 2006


The Wayback Machine doesn't track the page I'm interested in.

wget might be the ticket but it takes more programming ability than I have. I followed the instructions at Lifehacker and for the life of me cannot get it to produce unique file names with a timestamp.
posted by missbossy at 12:08 AM on October 15, 2006


What kind of a system are you on?

If it's Windows, this might help: WGetGUI. It looks like it has a timestamping option available.
posted by AmbroseChapel at 3:54 AM on October 15, 2006


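If you're on a system where you can use Perl, it's a pretty simple script. Something like
use strict; use warnings;
use LWP::Simple;
# time() returns seconds since the epoch, so every run gets a new file name
my $timestamp = time();
getstore('http://ask.metafilter.com', "metafilter-$timestamp.html");
would do it.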
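If Perl isn't available, roughly the same thing can be sketched in Python. The one deliberate difference is that the file name carries a readable date and time instead of raw epoch seconds; the helper names timestamped_name and fetch are invented for this example.

```python
from datetime import datetime
from urllib.request import urlretrieve


def timestamped_name(prefix: str, now: datetime) -> str:
    """Build a file name of the form prefix-YYYYMMDD-HHMMSS.html."""
    return f"{prefix}-{now:%Y%m%d-%H%M%S}.html"


def fetch(url: str, prefix: str) -> str:
    """Save url to a new timestamped file and return the file name."""
    name = timestamped_name(prefix, datetime.now())
    urlretrieve(url, name)
    return name


# Example call, using the URL from the Perl snippet:
# fetch("http://ask.metafilter.com", "metafilter")
```

The timestamp avoids colons and slashes on purpose, since those are not allowed in Windows file names.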
posted by AmbroseChapel at 4:00 AM on October 15, 2006


If the look of the page is more important than the data, then look into Paparazzi. It's a great (free) Mac app to take page snapshots.
posted by wackybrit at 5:49 AM on October 15, 2006


Thanks for that additional suggestion, Ambrose. It made wget much more manageable... unfortunately the timestamp still doesn't work.
posted by missbossy at 9:08 AM on October 16, 2006

