Google Chrome stores complete text of pages. How do I access this cache?
August 18, 2010 4:21 PM

Do you use the Google Chrome browser? Try this: Open the history page, and search for a term. The resulting list will include not just the pages that have the term in their page's title or URL; it will also include pages that have the term in the actual text. This indicates that somewhere, Chrome is actually storing the text of all web pages you visit. (This is backed up by the fact that the search-results page shows you snippets from the results.) So here's the thing: I want to see the whole page of cached text, not just snippets.

This would be useful for offline access, webpages that have been deleted, and probably other things as well.

I assume that Chrome is storing the text in some kind of lightweight database. So it must be trivially easy to extract and view this text, perhaps by means of an extension. And yet I haven't been able to figure out a way to do this. Does anyone have any suggestions?
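
A minimal sketch of how one might start poking at such a store, assuming (as the question does) that it is a lightweight database and, more specifically, an SQLite file. The helper name `list_tables` and the file name `history_copy.db` are hypothetical; nothing here asserts what tables Chrome actually uses. It only lists whatever tables a given SQLite file contains.

```python
import sqlite3
from pathlib import Path

def list_tables(db_path):
    """Return the names of the tables in an SQLite file, sorted alphabetically."""
    # Open read-only so the file being inspected is never modified.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]
```

Calling `list_tables("history_copy.db")` on a copy of a profile file (made while the browser is closed) would show whether any table looks like it holds page text. Working on a copy avoids fighting the browser for a lock on the live file.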
posted by Alaska Jack to Computers & Internet (10 answers total) 2 users marked this as a favorite
 
I don't know how to do what you want, but if you type about:cache in the address bar, you'll get a list of the indexed pages. If you click on one, you will see the bytes and text (a hex dump) for that page.
posted by zippy at 4:35 PM on August 18, 2010 [2 favorites]


Windows only but should get you going. http://unlockforus.blogspot.com/2008/09/how-opening-google-chrome-files-history.html
posted by bitdamaged at 4:40 PM on August 18, 2010


Are you sure it's searching cached data and not just using Google itself to deep-search pages in history? I mean, it is a Google product, after all.
posted by restless_nomad at 5:56 PM on August 18, 2010


Pretty sure it's searching locally cached pages. When I did a search on the topic, I found an article about users finding excerpts of password-protected https bank pages containing their credit card numbers.
posted by zippy at 7:28 PM on August 18, 2010


This indicates that somewhere, Chrome is actually storing the text of all web pages you visit.

Not exactly. Chrome is storing an index which is why it's searchable. This is equivalent in a mathematical sense but not the same thing to a human.

a 16, 25
an 5
but 19
chrome 3
equivalent 14
exactly 2
human 26
in 15
index 6
is 4, 8, 13
it's 10
mathematical 17
not 1, 20
same 22
searchable 11
sense 18
storing
the 21
thing 23
this 12
to 24
which 7
why 9

See what I'm saying? A real index has that sorted list and the document positions. You can recreate the original document from that but it isn't the original document.
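
To make the idea concrete, here is a rough sketch of a positional index in Python. It is an illustration of the general technique only, not a claim about how Chrome stores anything; the function names `build_positional_index` and `rebuild_text` are made up for this example.

```python
import re
from collections import defaultdict

def build_positional_index(text):
    """Map each lowercased word to the 1-based positions where it occurs."""
    index = defaultdict(list)
    words = re.findall(r"[a-z']+", text.lower())
    for position, word in enumerate(words, start=1):
        index[word].append(position)
    return dict(index)

def rebuild_text(index):
    """Recover the word sequence by sorting every (position, word) pair."""
    pairs = [(pos, word) for word, positions in index.items() for pos in positions]
    return " ".join(word for _, word in sorted(pairs))

index = build_positional_index("Chrome is storing an index which is why it's searchable.")
print(index["is"])          # [2, 7]
print(rebuild_text(index))  # chrome is storing an index which is why it's searchable
```

The rebuilt text has the words in the right order but has lost capitalization and punctuation, which is the sense in which the index can recreate the document without being the document.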

Dammit. Now I can't edit those two sentences. What I'm saying is Chrome possibly has the full body of the documents in original order but it doesn't necessarily.
posted by chairface at 7:36 PM on August 18, 2010


Response by poster: Chairface -

I see. Hm. This sounds like a great idea for a Chrome extension. Something that, once installed, would recreate the full pages from the index and let you see the full text of pages you'd previously visited.
posted by Alaska Jack at 8:06 PM on August 18, 2010


Googling around got me this, which lets you access the contents of Chrome's cache and copy / view files out of it. It seems a little clunky, but it works.

For reference, Chrome's cache is stored in the %localappdata%\Google\Chrome\User Data\Default\Cache folder.
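
A small sketch for looking at what is in that folder, assuming Windows and the path quoted above. The helper `list_cache_files` is hypothetical. It only lists file names and sizes; it does not decode anything, and the entries should not be expected to be plain HTML files.

```python
import os

def list_cache_files(cache_dir):
    """Return (filename, size_in_bytes) pairs for files in cache_dir, largest first."""
    entries = []
    with os.scandir(cache_dir) as it:
        for entry in it:
            if entry.is_file():
                entries.append((entry.name, entry.stat().st_size))
    return sorted(entries, key=lambda item: item[1], reverse=True)

# Path as given in the post; %localappdata% is only expanded on Windows.
CACHE_DIR = os.path.expandvars(r"%localappdata%\Google\Chrome\User Data\Default\Cache")

if os.path.isdir(CACHE_DIR):
    # Show the ten largest entries.
    for name, size in list_cache_files(CACHE_DIR)[:10]:
        print(f"{size:>10}  {name}")
```

On any other system, or if the folder does not exist, the script prints nothing.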
posted by poq at 9:43 PM on August 18, 2010


If it's stored the way that chairface proposes, it might not be possible to recreate the full pages. Unless the order of the words was somehow stored, you'd just get back word salad.
posted by XMLicious at 6:07 AM on August 19, 2010 [1 favorite]


@XMLicious - didn't you notice the numbers next to the words? Those are word locations. He did mess up by putting a 5 next to "an" and leaving the location of "storing" blank, but essentially the number next to the word is the position where it occurs. Try it yourself: 1 = Not, 2 = exactly, 3 = chrome, 4 = is, 5 = storing (or it's supposed to be), 6 = index, 7 = which, 8 = is, 9 = why, etc.

That's how an index works. :)
posted by BigBenInLondon at 12:19 PM on October 1, 2010


Oops, not exactly in the way that chairface proposes, then. But you see that you don't have to know the position of the word in the page to know which words appear in which pages, right? At least not for individual words.
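
A quick sketch of that kind of index, one that records only which pages contain each word and no positions. It is enough to answer a history search for individual words, but there is no way to rebuild a page from it. The page names and text below are made-up sample data, and `build_page_index` and `search` are hypothetical helper names.

```python
import re
from collections import defaultdict

def build_page_index(pages):
    """Map each lowercased word to the set of page names that contain it."""
    index = defaultdict(set)
    for name, text in pages.items():
        for word in re.findall(r"[a-z']+", text.lower()):
            index[word].add(name)
    return dict(index)

def search(index, *terms):
    """Return the sorted names of pages containing every search term."""
    sets = [index.get(term.lower(), set()) for term in terms]
    return sorted(set.intersection(*sets)) if sets else []

pages = {  # hypothetical sample pages
    "page1": "Chrome stores an index of page text",
    "page2": "The cache folder holds raw files",
    "page3": "An index maps words to pages",
}
index = build_page_index(pages)
print(search(index, "index"))            # ['page1', 'page3']
print(search(index, "index", "chrome"))  # ['page1']
```

Searching for an exact phrase is where this falls short, since that needs the word positions as well.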
posted by XMLicious at 1:56 AM on October 7, 2010

