Website modifications
April 21, 2006 6:55 PM

Is there a way to determine when a page (or a URL, for that matter) was last modified? I've seen a couple of Java suggestions, but they only give me the current date and time (and assume the page in question does not have a "this page modified on...." tag).
posted by astorias to Computers & Internet (16 answers total)
 
How about the header intended for just that?

% telnet localhost 80
GET / HTTP/1.0

HTTP/1.1 200 OK
Date: Sat, 22 Apr 2006 01:54:06 GMT
Server: Apache
Last-Modified: Sun, 01 May 2005 00:28:13 GMT
posted by kcm at 6:58 PM on April 21, 2006


Response by poster: kcm...thanks for the reply, but I'm a bit of a novice. What exactly do I do with the header?
posted by astorias at 7:00 PM on April 21, 2006


Look at the last line,

Last-Modified: Sun, 01 May 2005 00:28:13 GMT
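If you'd rather not pick the line out by eye, here is a rough Python sketch of the same lookup. It assumes you have already captured the raw response text (for example, pasted from the telnet session); the function name find_last_modified is made up for illustration.

```python
from typing import Optional


def find_last_modified(raw_response: str) -> Optional[str]:
    """Return the Last-Modified value from raw HTTP response text, or None."""
    for line in raw_response.splitlines():
        if not line.strip():
            # A blank line ends the header section; the page body follows.
            break
        name, sep, value = line.partition(":")
        # Header names are case-insensitive, so compare in lowercase.
        if sep and name.strip().lower() == "last-modified":
            return value.strip()
    return None


# The response shown earlier in this thread, as one string.
sample = (
    "HTTP/1.1 200 OK\r\n"
    "Date: Sat, 22 Apr 2006 01:54:06 GMT\r\n"
    "Server: Apache\r\n"
    "Last-Modified: Sun, 01 May 2005 00:28:13 GMT\r\n"
    "\r\n"
)
print(find_last_modified(sample))
```

If the server sends no such header, the function returns None rather than guessing.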
posted by (lambda (x) x) at 7:02 PM on April 21, 2006


Response by poster: Maybe I didn't explain my issue in the best way. I have no idea how to check when a web page was modified, and I'm trying to determine how to do so. Thanks!
posted by astorias at 7:06 PM on April 21, 2006


Astorias, the question is clear and answered. Where it says "localhost" in the example, just substitute a URL. Or if you prefer, the info can be viewed right from your browser instead of opening a command-line session. Just get yourself Mozilla or Firefox, plus LiveHTTPHeaders.
posted by nakedcodemonkey at 7:13 PM on April 21, 2006


You're not being specific about what, but I'll guess Java as above. Look here.
posted by kcm at 7:16 PM on April 21, 2006


Uh, actually don't just substitute a URL. Duh. Put its domain name where 'localhost' is, then the remainder where that solo slash is on the second line.

telnet www.metafilter.com 80
GET /mefi/36789 HTTP/1.0


Then hit return twice, and you'll see the headers.
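If splitting the URL by hand is error-prone, Python's standard urllib.parse can do it. This is only a sketch: the function name telnet_parts is invented here, and it assumes a plain http:// URL, since a bare telnet session cannot speak https.

```python
from urllib.parse import urlsplit


def telnet_parts(url):
    """Split a URL into the (host, port, path) needed for a manual request."""
    parts = urlsplit(url)
    host = parts.hostname
    # Port 80 is the default for plain HTTP when the URL names no port.
    port = parts.port or 80
    # An empty path means the site root, written as a single slash.
    path = parts.path or "/"
    if parts.query:
        # Keep the query string; it is part of what goes on the GET line.
        path += "?" + parts.query
    return host, port, path


host, port, path = telnet_parts("http://www.metafilter.com/mefi/36789")
print("telnet", host, port)
print("GET", path, "HTTP/1.0")
```

The first printed line is what goes after the prompt, and the second is what you type once connected.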
posted by nakedcodemonkey at 7:20 PM on April 21, 2006


Response by poster: Sorry folks...I know you are trying to help, but I don't get it. I don't know html or java or anything like that, so how about if I try this: How can I determine when this URL/web page was last modified:

http://www.chicagotribune.com/news/specials/chi-nepal-specialpackage,1,3634847.special?coll=chi-newsspecials-hed

What exactly do I do? I did try typing in the site instead of "localhost" in the address bar, but that didn't seem to work! Sorry to be a bother!
posted by astorias at 7:28 PM on April 21, 2006


Here's the problem with everyone's answer: "Last-Modified" is optional. If a server doesn't send it (like MetaFilter or the Chicago Tribune) then it's because the server doesn't know when the content was last modified. You see this a lot with dynamically generated content. Even if developers know how to determine the "Last-Modified" date, they often don't bother to put it in the header.
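To make that concrete, here is a minimal Python sketch that asks a server for its headers only and returns nothing when "Last-Modified" is absent. The function name fetch_last_modified and its opener parameter are made up for this illustration; opener is there only so the function can be tried without a network connection. Some servers reject HEAD requests or answer with an error status, in which case urlopen raises an exception instead of returning headers.

```python
import urllib.request
from email.utils import parsedate_to_datetime


def fetch_last_modified(url, opener=urllib.request.urlopen):
    """Return the page's Last-Modified time as a datetime, or None if unsent."""
    # HEAD asks for the headers without downloading the page body.
    request = urllib.request.Request(url, method="HEAD")
    with opener(request, timeout=10) as response:
        value = response.headers.get("Last-Modified")
    if value is None:
        # The header is optional; many dynamic sites simply leave it out.
        return None
    return parsedate_to_datetime(value)
```

Calling fetch_last_modified with a real URL needs network access. A None result means the server did not say, not that the page never changed.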
posted by sbutler at 7:35 PM on April 21, 2006


astorias: They're not talking about typing into your browser address bar; they're talking about typing into the windows command line. Go to the "Start" menu and click "Run...". Enter "command" into the box that appears and hit "OK". An old-school terminal window will appear; you can use the commands people are giving you there.

And yeah, there's no "Last-Modified" in the MetaFilter headers, at least.
posted by mr_roboto at 7:43 PM on April 21, 2006


LiveHTTPHeaders will make this less difficult. But some pages are a pain to view headers for, for various reasons. For example, the page may be dynamically generated each time, so "modification date" has no real meaning. In the Tribune example, the site wants to intercept the request first to get user authentication and set a cookie.

Note that you'll also need to convert the header's GMT timestamp to your local time, which requires finding out what your local timezone's offset is from GMT.
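A small Python sketch of that conversion, using only the standard library. The function name to_local is invented for this example, and the UTC-5 offset below is just a sample value; passing no offset uses whatever timezone the computer is set to.

```python
from datetime import timedelta, timezone
from email.utils import parsedate_to_datetime


def to_local(http_date, tz=None):
    """Convert an HTTP date string such as a Last-Modified value.

    With tz=None the result is in the computer's own local timezone.
    """
    moment = parsedate_to_datetime(http_date)  # timezone-aware, in GMT
    return moment.astimezone(tz)


# Example with a fixed offset of five hours behind GMT.
five_behind = timezone(timedelta(hours=-5))
print(to_local("Sun, 01 May 2005 00:28:13 GMT", five_behind))
# prints 2005-04-30 19:28:13-05:00
```

Note that the calendar date can shift too: just after midnight GMT on May 1 is still the evening of April 30 five hours to the west.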
posted by nakedcodemonkey at 7:43 PM on April 21, 2006


Here's a web-based tool for viewing headers. Though the caveats mentioned above still apply. Sorry.
posted by nakedcodemonkey at 7:51 PM on April 21, 2006


Astorias, the question is clear and answered.

Well... answered. It's not hard to see how someone not comfy with the nitty-gritty of HTTP might find things unclear at this point.

LiveHTTPHeaders will make this less difficult.

If you're using Firefox, the really easy way to get this information (if it is available) is by selecting Tools : Page Info under the menus.

I just tried it on the URL you gave me, and it gave me the current time. That likely means the page itself is being dynamically generated (therefore, the server reports its last change time to Firefox as the time of access).

In this case, I think the only thing you can really do is set up some way of saving the page at periodic intervals (say, every day, or week, or month), and then compare the cached versions with one another.
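As a sketch of the comparison step only (how you fetch and store the copies is up to you), one could fingerprint each saved copy and report when the fingerprint changes. The names fingerprint and change_dates are made up here. One caveat: a dynamically generated page may embed the current time or rotating ads, so the raw bytes can differ on every fetch even when the article itself has not changed.

```python
import hashlib


def fingerprint(page_bytes):
    """Return a short, comparable digest of a saved page."""
    return hashlib.sha256(page_bytes).hexdigest()


def change_dates(snapshots):
    """Given (label, page_bytes) pairs in chronological order, return the
    labels of snapshots whose content differs from the one before."""
    changed = []
    previous = None
    for label, page_bytes in snapshots:
        current = fingerprint(page_bytes)
        if previous is not None and current != previous:
            changed.append(label)
        previous = current
    return changed


# Made-up snapshots standing in for daily saved copies of a page.
saved = [
    ("2006-04-21", b"<html>first version</html>"),
    ("2006-04-22", b"<html>first version</html>"),
    ("2006-04-23", b"<html>second version</html>"),
]
print(change_dates(saved))
```

This only narrows the modification down to the interval between two saved copies, so more frequent saves give a tighter answer.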

Or, you could look at services that do just that kind of periodic caching, like Google or archive.org, but it looks like neither of them has a cached version of that specific page.

Or, ask the Chicago Tribune and trust their answer.
posted by weston at 7:53 PM on April 21, 2006


By the way, if you're going to type HTTP by hand with telnet you really need to include the "Host:" header. If you don't do this, you break "virtual hosting" which is how the vast majority of sites on the web are hosted. The Host: header is non-optional in HTTP 1.1, too.
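For illustration, here is a short Python sketch that builds the text of such a request with the Host: header included. The function name build_request is hypothetical, and the "Connection: close" line is my addition so that an HTTP/1.1 server hangs up after answering instead of keeping the session open.

```python
def build_request(host, path="/"):
    """Return the bytes of a minimal HTTP/1.1 GET request for host and path."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",       # required in HTTP/1.1; selects the virtual host
        "Connection: close",   # ask the server to close after one response
        "",                    # the blank line that ends the headers
        "",
    ]
    # HTTP lines end with carriage return plus line feed.
    return "\r\n".join(lines).encode("ascii")


print(build_request("www.metafilter.com", "/mefi/36789").decode("ascii"))
```

Typed by hand in telnet, this is the same as entering the three lines and then pressing return on an empty line.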
posted by Rhomboid at 8:32 PM on April 21, 2006


Geez, talk about your nerdy answers! I think astorias is looking for something simpler.

In Firefox, right click on the page and select "View Page Info". One of the things in the "General" tab will be a "Modified:" field. If the web server is set up correctly, that's your answer.

But not all web servers will tell you the last modified date on a page. The chicagotribune.com URL you gave us, for instance, doesn't. In that case there's really nothing you can do to find when the page was last modified. Sorry.
posted by Nelson at 7:13 AM on April 22, 2006


Archive.org materials are (intentionally) 6-12 months behind the times, so nothing current would be in there anyway.

Weston, reassuring the poster that "Your question is clear" was in no way a comment about the clarity of answers.
posted by nakedcodemonkey at 12:16 PM on April 22, 2006

