How do I determine what the 'real' age of a blog-post is?
September 25, 2010 9:25 PM


So I was contacted recently by a literature researcher at a university in India who is studying how the web is changing the way Indian languages are used. Specifically, she is looking at how people have taken to blogs, including the creation of (Telugu) memes and acronyms and how they've spread. It quickly became apparent that, to reliably study the growth and popularity of the terms and phrases in question, the study needs a way to find the 'true' age of a page.

Restricting ourselves to blogs here (blogs hosted by blog providers, not, say, a frequently updated GeoCities site), is there any way to find out when a post was "really" made? Blogspot.com and WordPress allow posts to be given a past date. We tried playing with Google's timeline settings, but I'm not in a position to tell whether there were false positives.

Any ideas would be appreciated.
posted by the cydonian to Computers & Internet (7 answers total)
 
Well, there's always the Wayback Machine.
posted by mumkin at 9:28 PM on September 25, 2010


Is there some reason to think that people would falsify the dates on their posts? I think this might not be as much of a problem as you're assuming it is.
posted by madcaptenor at 10:22 PM on September 25, 2010 [2 favorites]


Google only stores one copy of the page, so the only thing it can tell you is whether the post was made before or after X, where X is the date Googlebot crawled the site. The Wayback Machine could help, assuming it crawls the site often enough, but sites can opt out of being archived, and there's a significant delay between when content is crawled and when it becomes available in the archive (between 6 and 24 months!), so it won't help for recent changes.
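To see what the Wayback Machine holds for a given post, its CDX query endpoint can be asked for the oldest capture of a URL. What follows is a minimal Python sketch, assuming the endpoint and parameters as publicly documented at the time of writing (`url`, `output=json`, `limit`) and that results come back oldest first; the function names are hypothetical. The earliest capture only shows that the post existed by then, not when it was written.

```python
import json
import urllib.parse
import urllib.request
from datetime import datetime, timezone

# Assumed public CDX endpoint of the Wayback Machine.
CDX_ENDPOINT = "http://web.archive.org/cdx/search/cdx"


def cdx_query_url(page_url: str) -> str:
    """Build a CDX query asking for a single capture of page_url.

    Assumes captures are returned oldest first, so limit=1 yields the oldest.
    """
    params = {"url": page_url, "output": "json", "limit": "1"}
    return CDX_ENDPOINT + "?" + urllib.parse.urlencode(params)


def parse_earliest_capture(body: str):
    """Return the capture time (UTC) from a CDX JSON body, or None if empty.

    With output=json the first row is a header naming the fields;
    data rows follow it.
    """
    rows = json.loads(body) if body.strip() else []
    if len(rows) < 2:
        return None
    header, first = rows[0], rows[1]
    stamp = first[header.index("timestamp")]  # 14 digits: YYYYMMDDhhmmss
    return datetime.strptime(stamp, "%Y%m%d%H%M%S").replace(tzinfo=timezone.utc)


def earliest_capture(page_url: str):
    """Network call: fetch and parse the oldest capture (may return None)."""
    with urllib.request.urlopen(cdx_query_url(page_url), timeout=30) as resp:
        return parse_earliest_capture(resp.read().decode("utf-8"))
```

A `None` result means the archive has no capture at all, which (as noted above) is common for blog posts.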

Otherwise, there's the option of taking regular archive copies of the site yourself, but that means you need to establish a list of blogs to study beforehand, and it obviously doesn't help for past posts.
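If you do take your own snapshots, the useful thing to record is when each post URL first appears. This is a minimal sketch, assuming you already extract the list of post URLs from each fetched copy of a blog's pages or feed (that step is not shown); `update_first_seen` is a hypothetical helper name.

```python
from datetime import datetime, timezone
from typing import Dict, Iterable, List


def update_first_seen(first_seen: Dict[str, datetime],
                      post_urls: Iterable[str],
                      fetched_at: datetime) -> List[str]:
    """Record fetched_at for every post URL not seen in an earlier snapshot.

    Mutates first_seen in place and returns the URLs new in this snapshot.
    """
    new = []
    for url in post_urls:
        if url not in first_seen:
            first_seen[url] = fetched_at
            new.append(url)
    return new
```

If every snapshot covers all posts, a URL absent at one fetch and present at the next went public between those two fetch times, whatever date it displays. A back-dated post may be filed deep in the archive rather than on the front page or in the feed, so crawling only the newest entries would miss it.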
posted by Rhomboid at 10:53 PM on September 25, 2010


You're not going to get at the "actual" time of writing in any reliable way. Under the hood, tools like WordPress and Blogger.com assign numerical IDs to posts. If you had a database of all posts in the universe of blogs, I suppose you could get a sense of "creation time". But even then, a post that is still a draft and not yet published may not have been "completed" in the sense you mean.
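If you can collect post IDs alongside displayed dates for one blog, a simple consistency check is to look for posts whose ID order disagrees with their date order. This is a sketch under a loud assumption: that the platform hands out IDs in creation order, which you would need to confirm for each platform. The `Post` type and `out_of_order` function are hypothetical.

```python
from datetime import datetime
from typing import List, NamedTuple


class Post(NamedTuple):
    post_id: int          # numeric ID exposed by the platform (ASSUMED sequential)
    shown_date: datetime  # date displayed on the post


def out_of_order(posts: List[Post]) -> List[Post]:
    """Flag posts dated earlier than some post with a smaller ID.

    Walks posts in ID order, keeping the latest displayed date so far;
    a post dated before that was created later yet claims an earlier date.
    """
    flagged = []
    latest = None
    for post in sorted(posts, key=lambda p: p.post_id):
        if latest is not None and post.shown_date < latest:
            flagged.append(post)
        latest = post.shown_date if latest is None else max(latest, post.shown_date)
    return flagged
```

As the draft caveat above suggests, this is noisy: a draft started early (small ID) but published late raises the running maximum, so ordinary posts written after it can be flagged falsely.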

I think the only thing you can rely on is the publish date, the moment the blog actually shared the item with the world. Any site that crawls blogs is useful here: Google Blog Search is obviously huge, and Technorati, FriendFeed, and other blog search engines will help too. The Wayback Machine is great, assuming it has the sites you're looking at.
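Since each crawler only tells you when it happened to see a post, the practical move is to take the earliest sighting across all of them. A minimal sketch; the source labels are placeholders, and the dates fed in are assumed to be independent sightings (crawl or capture times), not the date the post itself displays.

```python
from datetime import datetime
from typing import Dict, Optional, Tuple


def earliest_observation(
        seen: Dict[str, Optional[datetime]]) -> Optional[Tuple[str, datetime]]:
    """Return (source, time) of the earliest sighting, ignoring sources with none.

    All datetimes must be either all naive or all timezone-aware (ideally UTC);
    Python refuses to compare the two kinds.
    """
    hits = [(when, source) for source, when in seen.items() if when is not None]
    if not hits:
        return None
    when, source = min(hits)
    return source, when
```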
posted by artlung at 2:54 AM on September 26, 2010


Not only can you post-date a post (schedule it to be published in the future), you can also back-date it (change the date to an earlier one). In practice, bloggers rarely post-date, and rarely more than one or two weeks into the future. Back-dating is almost never done.
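Once you have an independent first-seen time, the displayed date can be checked against it. Because a post cannot be crawled before it is public, the first sighting is an upper bound on when it really went out. This is a sketch with a hypothetical `check_date` function and an arbitrary two-day `tolerance` to absorb time zones and crawl lag.

```python
from datetime import datetime, timedelta


def check_date(claimed: datetime, first_seen: datetime,
               tolerance: timedelta = timedelta(days=2)) -> str:
    """Compare a post's displayed date with its earliest independent sighting."""
    if first_seen < claimed - tolerance:
        # Visible before its own date: the date was likely edited after publishing.
        return "seen-before-claimed-date"
    if first_seen <= claimed + tolerance:
        return "consistent"
    # Sighted well after its date: back-dated, or simply crawled late.
    return "unconfirmed"
```

An "unconfirmed" result does not mean the post was back-dated; given how patchily crawlers cover blogs, it will often just mean nobody looked until later.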

However, if you need to be 100% academically rigorous, your only option is to contact the author and ask.

I would put together a form email that says "Hi! I'd like to include your post [URL] in my academic study on the usage of Indian languages. I need to verify that this post was actually written on [date]. If you wrote this post ahead of time, or back-dated it, please let me know."
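If you end up writing to many authors, the form email can be filled in per post. A minimal sketch using Python's standard `string.Template`; the post list and signature are placeholders you would supply, with the signature being where your credentials and real name go. `build_requests` is a hypothetical helper.

```python
from string import Template
from typing import Iterable, List, Tuple

# The form email from above, with placeholders for the per-post details.
REQUEST = Template(
    "Hi! I'd like to include your post $url in my academic study on the usage "
    "of Indian languages. I need to verify that this post was actually written "
    "on $date. If you wrote this post ahead of time, or back-dated it, please "
    "let me know.\n\n$signature"
)


def build_requests(posts: Iterable[Tuple[str, str]], signature: str) -> List[str]:
    """Fill the form email once per (url, date) pair."""
    return [REQUEST.substitute(url=url, date=date, signature=signature)
            for url, date in posts]
```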

Be sure to include your academic credentials and your real name at the end so that they can verify you're legit. Unfortunately, a lot of people will assume you're either a spammer or a scammer and won't reply to your email.
posted by ErikaB at 9:36 AM on September 26, 2010


Data point: some of my blog posts are written months ahead.
posted by acoutu at 10:55 AM on September 26, 2010


Response by poster: Thanks everyone for the responses! Some quick replies:

Well, there's always the Wayback Machine

I should have mentioned this in the question itself, but the Wayback Machine was where I checked first. It doesn't seem to have spidered blog post pages that well; lots of pages are missing.

Is there some reason to think that people would falsify the dates on their posts? I think this might not be as much of a problem as you're assuming it is.

No specific reason. If it were up to me, I'd have taken the dates as they are, if only to see trends as opposed to specific data points. In the course of our conversation, though, we began to wonder whether there's any technical way to be sure about the dates.

However, if you need to be 100% academically rigorous, your only option is to contact the author and ask.

Which is exactly how the researcher contacted me. :)
posted by the cydonian at 3:07 AM on September 27, 2010

