How do I bypass registration-only webpages like the googlebot seems to?
November 12, 2004 5:27 PM

Sometimes when I click on a link, the content inside isn't the content promised by google, but, rather, a registration page. And often, the cached copy isn't the cached copy at all, but the same registration page. The first link here is an example. One very wise acquaintance explains that I need to modify my user agent to resemble a Googlebot. How? Is there a better way.
I know about bugmenot, and that's not what I am looking for.
posted by trharlan to computers & internet (13 comments total)
Also, I know that it is appropriate to employ a question mark when ending a sentence.
posted by trharlan at 5:32 PM on November 12, 2004


There is a Firefox extension called "User Agent Switcher". You could try that and set your User Agent to "Googlebot/2.1" and see what happens. Opera can do this as well.
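
For anyone curious what the extension changes under the hood, here is a minimal sketch in Python, assuming the only difference is the User-Agent header sent with the request. The function name `build_request` and the example URL are made up for illustration, and as noted elsewhere in the thread, sites that check the hostname or IP instead won't be fooled by this.

```python
import urllib.request

# The user agent string suggested above; real crawlers may send a longer one.
GOOGLEBOT_UA = "Googlebot/2.1"


def build_request(url, user_agent=GOOGLEBOT_UA):
    """Return a Request that announces itself with the given User-Agent."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


if __name__ == "__main__":
    # Hypothetical URL, used only to show the call.
    req = build_request("http://example.com/some-article")
    print(req.get_header("User-agent"))
    # To actually fetch the page: urllib.request.urlopen(req).read()
```

Building the request doesn't touch the network; only the commented-out `urlopen` call would.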
posted by Voivod at 6:12 PM on November 12, 2004


Brilliant theory, Voivod, but I just tested and it doesn't seem to have worked.
posted by rafter at 7:27 PM on November 12, 2004


I suspect the sites are actually allowing Google to get the content page instead of the login page, either by user agent (unlikely) or by hostname/IP. The reason the cached page doesn't work either is probably that they put javascript in it to redirect to the login page: try switching javascript off and going to the cached page. If it still manages to give you the login page, then please give an example. (It could be a meta redirect; I don't know how badly behaved browsers are in accepting those outside of HEAD.)
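
If you'd rather check than guess, here is a rough sketch in Python that scans a saved copy of a page for the two mechanisms mentioned above: inline script that touches `location`, and a meta refresh (reporting whether it sits in HEAD or not). The class and function names are invented for this example, and the `location` check is a crude heuristic that will miss scripts loaded from external files.

```python
from html.parser import HTMLParser


class RedirectHintParser(HTMLParser):
    """Collects signs that a page may redirect the browser elsewhere."""

    def __init__(self):
        super().__init__()
        self.in_head = False
        self.in_script = False
        self.hints = []  # tuples of (kind, where, detail)

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "head":
            self.in_head = True
        elif tag == "script":
            self.in_script = True
        elif tag == "meta" and (attrs.get("http-equiv") or "").lower() == "refresh":
            where = "head" if self.in_head else "body"
            self.hints.append(("meta-refresh", where, attrs.get("content") or ""))

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False
        elif tag == "script":
            self.in_script = False

    def handle_data(self, data):
        # Heuristic: inline script mentioning "location" may be a redirect.
        if self.in_script and "location" in data:
            self.hints.append(("script", "inline", data.strip()))


def find_redirect_hints(html_text):
    parser = RedirectHintParser()
    parser.feed(html_text)
    parser.close()
    return parser.hints
```

An empty list doesn't prove the page is clean; it only means neither of these two patterns showed up in the markup.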
posted by fvw at 7:31 PM on November 12, 2004


I'm thinking this wasn't the best test case, rafter. If you cut and paste the link (dropping the referrer), you get a page not found error. So I'm now looking for a similar site.
posted by trharlan at 7:31 PM on November 12, 2004


Turning off javascript seems to have done the trick for both the original and cached pages. Nice work, fvw.

Also, Voivod, the extension you suggested creates google-textad-free browsing.
posted by trharlan at 7:57 PM on November 12, 2004


What fvw said. Also, you can always use BugMeNot for free sites that demand registration.
posted by revgeorge at 9:25 PM on November 12, 2004


It's a mixed bag. The javascript trick doesn't work for the Kansas City Star. Some sites have special URLs which end in something like "/?Googlebot", and other sites just seem to allow Googlebot through by IP block.

I just use the bugmenot extension for firefox and have it auto-fill the reg page for me.
posted by skallas at 11:01 PM on November 12, 2004


The user agent trick does work for the Kansas City Star.
posted by skallas at 11:06 PM on November 12, 2004


Bugmenot has become almost useless of late. It can't get me into many sites I go to now even after 10+ different attempts. The sites are killing the logins very quickly.
posted by rushmc at 9:03 AM on November 13, 2004


bugmenot is spotty for me...but still useful.

If you block startribune.com from setting cookies, you can browse all you want and never have to register. You didn't hear that from me.
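
Outside a browser, the same idea can be sketched in Python with the standard library's cookie policy, assuming all you need is to refuse cookies from one domain. The helper name `build_opener_without_cookies_from` is made up for this example, and whether a given site still lets you read without cookies is up to the site.

```python
import http.cookiejar
import urllib.request


def build_opener_without_cookies_from(domain):
    """Return (opener, policy) that never accepts or returns cookies for domain."""
    # A leading dot matches subdomains; the bare name matches the domain itself.
    policy = http.cookiejar.DefaultCookiePolicy(
        blocked_domains=[domain, "." + domain]
    )
    jar = http.cookiejar.CookieJar(policy=policy)
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, policy


if __name__ == "__main__":
    opener, policy = build_opener_without_cookies_from("startribune.com")
    print(policy.is_blocked("www.startribune.com"))
    # To fetch a page through this opener: opener.open(url).read()
```

Cookies from every other domain are still handled normally, which is roughly what a per-site cookie block in the browser does.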
posted by gimonca at 9:45 AM on November 13, 2004


gimonca: oooo! Thanks for the tip; I live in the Twin Cities and it really bugged me when they switched to registration.
posted by neckro23 at 4:04 PM on November 14, 2004


I just hit back and pick another server. If a site wants to piss me off, I don't want to use the site. All the best news comes from sites that WANT you to read them.
posted by krisjohn at 12:35 AM on November 15, 2004

