Download Google Web History?
July 22, 2010 1:50 PM
How can I download my Google web history in a way that maximizes the data's usability?
I have recently been examining my Google web history and have seen that there is a ridiculous amount of information in there. I would like to erase it from the cloud, but I would also like to save the data before I do so.
What are my options?
Thanks!
Best answer: According to this there is an RSS feed URL for the web history with a 'num' parameter that controls how many results are included, so in theory you could crank that up to a large number and get the entire history as one XML file. There's bound to be a limit somewhere, though, so if you have hundreds of thousands of queries it probably won't return them all -- for that you might have to screen-scrape.
Once you have it as RSS (XML) you can transform it into whatever form you want; that's the point of XML. For example, you could view it in your browser with the default RSS stylesheet, or with Perl and XML::RSS it would take only a few lines of code to output it in whatever format you wanted.
posted by Rhomboid at 2:22 PM on July 22, 2010
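As a rough illustration of the "few lines of code" idea, here is a minimal sketch in Python rather than Perl (a substitution, not what the answer proposed). It assumes the saved feed is ordinary RSS 2.0 with <item> elements carrying <title>, <link> and <pubDate>; the actual layout of the web history feed is not confirmed in this thread, and the function names and sample data are made up for the example.

```python
import csv
import io
import xml.etree.ElementTree as ET


def rss_to_rows(xml_text):
    """Return one (title, link, pubDate) tuple per <item> in an RSS 2.0 feed."""
    root = ET.fromstring(xml_text)
    rows = []
    for item in root.iter("item"):
        rows.append((
            item.findtext("title", default=""),
            item.findtext("link", default=""),
            item.findtext("pubDate", default=""),
        ))
    return rows


def rows_to_csv(rows):
    """Render the rows as CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["title", "link", "pubDate"])
    writer.writerows(rows)
    return buf.getvalue()


# Hypothetical sample feed, standing in for a saved history file.
SAMPLE = """<?xml version="1.0"?>
<rss version="2.0"><channel><title>History</title>
<item><title>example query</title><link>http://example.com/</link><pubDate>Thu, 22 Jul 2010 13:50:00 GMT</pubDate></item>
</channel></rss>"""

if __name__ == "__main__":
    print(rows_to_csv(rss_to_rows(SAMPLE)))
```

A CSV file opens in any spreadsheet, which is one way to sort and search the history offline.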
Response by poster: Thanks Rhomboid, that seems to work, but I can only get the first 1000 or so, even if I change the num parameter to something higher. Any idea how to get it to show the previous thousand? I have about 10k entries, so it wouldn't be terrible to make 10 XML files if I can find a way to output the older stuff.
posted by Popcorn at 2:48 PM on July 22, 2010
Response by poster: Never mind, found it in the comments of that page: you append &start=### to the URL, like this, to tell it how far back to start.
https://www.google.com/history/lookup?q=&output=rss&num=1000&start=950
Still looking for ideas on how this could be navigated offline.
posted by Popcorn at 2:53 PM on July 22, 2010
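For roughly 10k entries, the list of page URLs can be generated rather than typed by hand. This is only a sketch: it assumes 'start' is a zero-based offset and that each request returns up to 'num' entries, neither of which is confirmed here, and the function name is made up. It only builds the URLs; the feed presumably needs a logged-in session, so opening each URL in the browser and saving the result may be the simplest route.

```python
from urllib.parse import urlencode

# Endpoint as quoted in the thread.
BASE_URL = "https://www.google.com/history/lookup"


def page_urls(total_entries, page_size=1000):
    """Build one feed URL per page of history.

    Assumes 'start' is a zero-based offset into the history (unverified).
    """
    urls = []
    for start in range(0, total_entries, page_size):
        query = urlencode({
            "q": "",
            "output": "rss",
            "num": page_size,
            "start": start,
        })
        urls.append(f"{BASE_URL}?{query}")
    return urls


if __name__ == "__main__":
    for url in page_urls(10000):
        print(url)
```

If the pages turn out to drop entries at the boundaries, a smaller step than the page size (as in the start=950 example) gives overlapping pages, at the cost of duplicates to remove later.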
Best answer: As for navigating it offline, one way to do it is to specify an XSLT stylesheet. This page gives an example of how to do that. Say you saved your search history RSS as 'foo.xml'. You'd then take the first CSS snippet on that page (the one that's 69 lines long) and save it as 'foo.css', take the XSLT snippet (the one that's 49 lines) and save it as 'foo.xsl', and put both in the same dir as 'foo.xml'. Then edit 'foo.xml' to add these two lines after the initial <?xml line:
<?xml-stylesheet type="text/xsl" href="foo.xsl" ?>
<?xml-stylesheet type="text/css" href="foo.css" ?>
Now if you open foo.xml in your browser you should see the feed styled as HTML instead of raw XML trees.
posted by Rhomboid at 3:42 PM on July 22, 2010
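With ten or so saved files, editing each one by hand gets tedious. A minimal Python sketch of the same edit follows, assuming each file starts with an <?xml ...?> declaration; the function name and its defaults are made up for illustration, with the file names taken from the answer above.

```python
import re

# Matches the XML declaration at the very start of the text, if there is one.
XML_DECL = re.compile(r"^<\?xml\s.*?\?>", re.DOTALL)


def add_stylesheets(xml_text, xsl_href="foo.xsl", css_href="foo.css"):
    """Insert the two xml-stylesheet lines right after the XML declaration."""
    pis = (
        f'<?xml-stylesheet type="text/xsl" href="{xsl_href}" ?>\n'
        f'<?xml-stylesheet type="text/css" href="{css_href}" ?>\n'
    )
    match = XML_DECL.match(xml_text)
    if match is None:
        # No declaration: the stylesheet lines simply go first.
        return pis + xml_text
    head = xml_text[:match.end()]
    rest = xml_text[match.end():].lstrip("\r\n")
    return head + "\n" + pis + rest


if __name__ == "__main__":
    sample = '<?xml version="1.0"?>\n<rss version="2.0"></rss>'
    print(add_stylesheets(sample))
```

To apply it, read each saved file into a string, pass it through add_stylesheets, and write the result back out next to foo.xsl and foo.css.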
This thread is closed to new comments.
posted by Popcorn at 1:52 PM on July 22, 2010