Download Google Web History?
July 22, 2010 1:50 PM

How can I download my Google web history in a way that maximizes the data's usability?

I have recently been examining my Google Web History and have seen that there is a ridiculous amount of information in there. I would like to erase it from the cloud, but would also like to save the data before I do so.

What are my options?

Thanks!
posted by Popcorn to Computers & Internet (5 answers total)
Response by poster: Also, it would be really nice if I could still use Google's interface to browse the downloaded history after I have erased it. I have a feeling that this is impossible, but is there any other way to navigate this data aside from having to work it over in Excel?
posted by Popcorn at 1:52 PM on July 22, 2010


Best answer: According to this, there is an RSS feed URL for the web history that has a 'num' parameter controlling how many results are included, so in theory you could crank that up to a large number and get the entire history as an XML file. There's bound to be a limit somewhere, though, so if you have hundreds of thousands of queries it probably won't return them all; for that you might have to screen-scrape.
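A minimal Python sketch of building that feed URL. It assumes the base URL and the q, output and num parameters from the feed URL quoted later in this thread, and that the endpoint still behaves as described there (you would likely need to be signed in to the Google account in question). The function name is hypothetical.

```python
from urllib.parse import urlencode

# Base URL taken from the thread; assumed, not a documented API.
HISTORY_URL = "https://www.google.com/history/lookup"

def history_feed_url(num: int) -> str:
    """Build a Web History RSS URL asking for `num` entries.

    q=""         -> empty query, i.e. all history
    output="rss" -> ask for the RSS (XML) form
    num          -> entries per feed; the server may cap this
    """
    params = {"q": "", "output": "rss", "num": num}
    return HISTORY_URL + "?" + urlencode(params)
```

Opening `history_feed_url(1000)` in a signed-in browser and saving the page gives you one XML file.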

Once you have it as RSS (XML), you can transform it into whatever form you want; that's the point of XML. For example, you could view it in your browser with the default RSS stylesheet, or, using Perl and XML::RSS, it would take only a few lines of code to output it formatted however you wanted.
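As an illustration of "a few lines of code", here is a Python sketch (standing in for the Perl/XML::RSS route) that flattens the feed into CSV, which also answers the Excel question above. It assumes plain RSS 2.0 `<item>` elements with `title`, `link` and `pubDate` children; any other fields Google includes are ignored. The function name is hypothetical.

```python
import csv
import io
import xml.etree.ElementTree as ET

def rss_to_csv(xml_text: str) -> str:
    """Turn each RSS <item> into one CSV row: title, link, pubDate."""
    root = ET.fromstring(xml_text)
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["title", "link", "pubDate"])
    # iter() finds <item> elements at any depth (normally under <channel>).
    for item in root.iter("item"):
        writer.writerow([
            item.findtext("title", default=""),
            item.findtext("link", default=""),
            item.findtext("pubDate", default=""),
        ])
    return buf.getvalue()
```

Usage, assuming the feed was saved as UTF-8: `print(rss_to_csv(pathlib.Path("foo.xml").read_text(encoding="utf-8")))` (with `import pathlib`).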
posted by Rhomboid at 2:22 PM on July 22, 2010


Response by poster: Thanks Rhomboid, that seems to work, but I can only get the first 1000 or so, even if I change the num parameter to something higher. Any idea how to get it to show the previous thousand? I have about 10k entries, so it wouldn't be terrible to make 10 XML files if I can find a way to output the older stuff. Any ideas?
posted by Popcorn at 2:48 PM on July 22, 2010


Response by poster: Never mind, I found it in the comments of that page: you append &start=### to the URL, like this, to tell it how far back to start.

https://www.google.com/history/lookup?q=&output=rss&num=1000&start=950
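A Python sketch of paging through roughly 10k entries and recombining the saved pages. It assumes `start` is a zero-based offset, `num` is capped near 1000, and each page is plain RSS 2.0. With those assumptions, the example URL above (start=950, num=1000) would repeat entries already in a start=0 page, so the merge drops duplicates keyed on (link, pubDate). Both function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# URL pattern copied from the thread; parameter behaviour is assumed.
URL_TEMPLATE = "https://www.google.com/history/lookup?q=&output=rss&num={num}&start={start}"

def page_urls(total: int, page_size: int = 1000) -> list[str]:
    """One URL per page, stepping `start` by page_size until `total` is covered."""
    return [URL_TEMPLATE.format(num=page_size, start=s)
            for s in range(0, total, page_size)]

def merge_feeds(xml_pages: list[str]) -> list[dict]:
    """Combine <item>s from several saved RSS pages, skipping repeats."""
    seen = set()
    merged = []
    for text in xml_pages:
        root = ET.fromstring(text)
        for item in root.iter("item"):
            entry = {
                "title": item.findtext("title", default=""),
                "link": item.findtext("link", default=""),
                "pubDate": item.findtext("pubDate", default=""),
            }
            # Overlapping pages repeat entries; keep only the first copy.
            key = (entry["link"], entry["pubDate"])
            if key not in seen:
                seen.add(key)
                merged.append(entry)
    return merged
```

For 10k entries, `page_urls(10000)` yields ten URLs (start=0, 1000, ..., 9000); save each page, then pass their contents to `merge_feeds`.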

Still looking for ideas on how this could be navigated offline.
posted by Popcorn at 2:53 PM on July 22, 2010


Best answer: As for navigating it offline, one way to do it is to specify an XSLT stylesheet. This page gives an example of how to do that. Say you saved your search history RSS as 'foo.xml'. You'd then take the first CSS snippet on that page (the one that's 69 lines long) and save it as 'foo.css', take the XSLT snippet (the one that's 49 lines) and save it as 'foo.xsl', and put both files in the same directory as 'foo.xml'. Then edit 'foo.xml' to add these two lines after the initial <?xml line:
<?xml-stylesheet type="text/xsl" href="foo.xsl" ?>
<?xml-stylesheet type="text/css" href="foo.css" ?>
Now if you open foo.xml in your browser you should see the feed styled as HTML instead of raw XML trees.
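If you end up with several saved pages, the manual edit can be scripted. A minimal Python sketch, assuming the file names foo.xsl and foo.css from the answer and that each feed starts with a standard `<?xml ...?>` declaration; the helper name is hypothetical.

```python
# The two processing instructions from the answer; foo.xsl and foo.css
# must sit in the same directory as the feed file.
STYLESHEET_PIS = (
    '<?xml-stylesheet type="text/xsl" href="foo.xsl" ?>\n'
    '<?xml-stylesheet type="text/css" href="foo.css" ?>\n'
)

def add_stylesheets(xml_text: str) -> str:
    """Insert the stylesheet PIs right after the XML declaration."""
    if 'href="foo.xsl"' in xml_text:
        return xml_text  # already added; keep the function idempotent
    if xml_text.startswith("<?xml "):
        # The declaration must stay the very first thing in the file.
        end = xml_text.index("?>") + 2
        return xml_text[:end] + "\n" + STYLESHEET_PIS + xml_text[end:].lstrip("\r\n")
    # No declaration: the PIs can simply go first.
    return STYLESHEET_PIS + xml_text
```

Usage: read foo.xml as text, pass it through `add_stylesheets`, and write the result back to the same file.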
posted by Rhomboid at 3:42 PM on July 22, 2010

