Use up to ___ MB of space for the cache
May 10, 2008 9:08 AM Subscribe
What is the ideal size for a browser's disk cache?
I've searched for the answer to this but have never gotten a straight answer. Is there an ideal size for the disk cache? For example, if a browser sets the cache to 50 MB by default, should I leave it like that? If I have a big, fast hard drive, why not allocate 500 GB to the cache? Or does the cache get fragmented or otherwise difficult to search through, so that a smaller size is best?
I suspect the answer is probably, "It doesn't matter much." But, dang it, I'm curious!
There must be some kind of limit where the browser would spend more, or at least as much, time searching the cache for data as it would take to just fetch it from teh webz. I doubt browsers are smart enough to do that yet. I suspect that the faster your connection to the internet is, the less the cache matters. On dialup, I would think that bigger is almost always better.
But I also suspect that as broadband becomes more and more ubiquitous, the browser designers aren't spending a whole lot of time on the issue. It would be nice if there were some kind of HTML tag that held the "last update time" so that as a browser loaded the index.html page, it would instantly know whether to bother checking the cache or comparing cached files. Maybe there is already?
Slightly related, I was recently looking to improve my own computers' performance on the internet. I have a router that has a DNS caching feature, and Windows does that too. I experimented with turning off one or the other, and it seemed faster to let the router do it and turn off Windows' client. How that relates is that I know some ISPs have their own caching of files to reduce their outside traffic; maybe it's "cheaper" to let them do the caching and not worry about it at all.
posted by gjc at 10:20 AM on May 10, 2008
Response by poster: gjc, I believe there is such a feature in HTTP via the Expires header.
posted by wastelands at 10:59 AM on May 10, 2008
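For the curious, that machinery already exists in HTTP: a server can send Expires and Last-Modified headers, and a browser can revalidate its cached copy with If-Modified-Since, getting back a tiny 304 Not Modified instead of the whole file. A minimal sketch in Python (the URL is just a placeholder, and not every server sends these headers):

```python
import urllib.request
import urllib.error

url = "http://example.com/"  # placeholder; substitute any static resource

# First fetch: note the caching-related response headers, if present.
resp = urllib.request.urlopen(url)
print("Expires:      ", resp.headers.get("Expires"))        # freshness deadline
print("Last-Modified:", resp.headers.get("Last-Modified"))  # revalidation token
resp.read()

# Revalidation: echo Last-Modified back as If-Modified-Since.
# A 304 reply means "your cached copy is still good" and carries no body.
last_mod = resp.headers.get("Last-Modified")
req = urllib.request.Request(url)
if last_mod:
    req.add_header("If-Modified-Since", last_mod)
try:
    resp2 = urllib.request.urlopen(req)
    print("Got", resp2.status, "- server re-sent the full body")
except urllib.error.HTTPError as e:
    if e.code == 304:
        print("Got 304 Not Modified - cache hit, nothing downloaded")
    else:
        raise
```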
Best answer: Most modern operating systems' DNS clients actually do domain name caching. A Web browser typically doesn't know and doesn't care who or what is resolving DNS queries. Caching DNS in your router may or may not be smart. If you run torrent clients with tons of streams, and don't do port forwarding, your router will generally want to claim as much of its memory for NAT table entries as it can. If you stick to Web browsers, it probably doesn't matter much. Running your own DNS server can be nice, if you script it to pull down a DNS database update during off hours, at least for your top 1000 sites, and your ISP doesn't mind that. Otherwise, just relying on surfing behavior to force DNS updates on your local server is only a slight improvement in average DNS resolution time. If you're going to run DNS for your own address-caching purposes, I highly recommend DLZ. This is not noob stuff, but it is very nice if you've got the time to get it going. You definitely notice the "snap" you get, and people who visit your network think you have a much faster connection than you actually do, simply because they are never waiting on DNS.
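A quick way to see that DNS "snap" for yourself is to time a cold lookup against an immediate repeat. A rough Python sketch; the hostname is arbitrary, Python itself caches nothing here, and any speedup on the second call comes from whatever OS, router, or ISP resolver cache sits in between:

```python
import socket
import time

def lookup_ms(host):
    """Resolve host and return the elapsed time in milliseconds."""
    start = time.perf_counter()
    socket.getaddrinfo(host, 80)
    return (time.perf_counter() - start) * 1000

host = "example.com"  # pick something you haven't resolved recently
print(f"cold lookup: {lookup_ms(host):7.1f} ms")  # may go to a remote resolver
print(f"warm lookup: {lookup_ms(host):7.1f} ms")  # usually answered by a cache
```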
Disk caching isn't a huge win for machines with a decent, modern OS and a broadband connection. Disk caching was all that made graphical browsers tolerable on dial-up connections in the mid-90s; today it's a bit of a boost, but not a lot. It's more of a help to your ISP, which doesn't have to re-serve all the logos, favicons, and page-branding elements of your favorite Web sites, even if only from its own Akamai edge cache, for every page load of CNN.com by each of its patrons.
Disk caches work in conjunction with memory caching, through your OS memory manager, on your local machine. Most OS memory managers are predictive about swapping RAM contents out to disk, and with the recent trend toward multi-gigabyte PC memory specs, 95% of the time the machine finds what it needs already in memory. The disk isn't often hit directly if the OS is working correctly and there is plenty of free memory for it to use as disk cache.
Frankly, the value of a disk cache also has to do with how cache-friendly a Web site is. As a user, there isn't much you can do about whether the sites you visit are cache-friendly, but if you spend significant time on dynamically generated sites (Web forums, news sites, DHTML, .NET, .asp, .php, or other database-driven sites), a big disk cache setting isn't doing you much good. Frankly, 50 MB is a pretty generous cache for an individual machine. I've run Squid proxies on networks of 25 to 50 users for years, in the 2 to 4 GB range, and there is very little gain above 2 GB for shared caches, even with that many users. 100 MB is probably overkill for most individual Web users, but it's your disk space and your RAM. If you typically have plenty of RAM for disk caches even under your most loaded conditions, bumping up the browser cache won't hurt anything; but if your operating system is working right, 95 to 97% of the time it won't help one bit, either.
posted by paulsc at 11:49 AM on May 10, 2008 [1 favorite]
I have kept my browser cache at 5 MB since approximately 1997. I have always had high-speed cable internet, and this size has never caused a problem for me. I set friends' and family's caches to 25 MB just in case, and they never complain, either.
posted by chudmonkey at 11:55 AM on May 10, 2008
I've never set mine. There's no reason to think there's a realistic 'theoretical' point where it would take longer to search a cache than to download something off the internet. Assume the lookup is a binary search, so finding one of F cached files takes about log2(F) steps. If one 'step' of a disk cache lookup took time T, and downloading a fresh page off the internet took time T*N, then you would need 2^N files in your cache for the lookup to be slower. So if it took 1,000 times as long to download something as to find it in the cache, you would need 2^1000, or about 1.07*10^301, files in your cache (a 1 followed by 301 more digits).
posted by delmoi at 12:13 PM on May 10, 2008
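Spelling out that arithmetic (under the same assumption: a binary-search lookup costs about log2(F) steps for F cached files, so it only loses to a download costing N steps once F exceeds 2^N):

```python
import math

N = 1000     # suppose a download costs 1,000x one lookup step
F = 2 ** N   # break-even cache size, in files

print(f"2**{N} is about {F:.2e}")    # ~1.07e+301
print("digits:", len(str(F)))        # 302 digits: a 1 followed by 301 more
print("lookup steps at break-even:", math.log2(F))  # backs out N = 1000.0
```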
Firefox users need not debate matters: install the Cache Status extension and watch your cache stats to your heart's content. Bump up your disk cache, cut it down, browse around. Whatever benefit a cache can give you, you'll know.
posted by paulsc at 12:17 PM on May 10, 2008
Best answer: Caches can quickly hit a clear point of diminishing returns.
Theoretically, doubling the size of a cache halves the cache miss rate. So, say a 50 MB cache has the needed file 10% of the time. That's a cache miss rate of 90% (100% - 10%). Doubling the cache to 100 MB would halve that to 45%, which would mean the hit rate goes up to 55% (100% - 45%). That's a better than 5x improvement, a huge win. The next doubling of cache size would halve the miss rate again, to 22.5%, leading to a hit rate of 77.5%. That's less than a 50% improvement, and the marginal utility of doubling the cache size again goes down even more while the cost (in disk space) keeps doubling.
You'd have to know the cache hit rate for your current cache size to see whether increasing it would really be worthwhile. If your hit rate is in the low single digits, say 2%, then going from a 50 MB to a 500 MB cache would be a huge win: ten times the size is a bit more than three doublings, so a 98% miss rate falls to roughly 10%. You'd end up with a hit rate of something like 90%, a 45x improvement.
The interplay between that, cache lookup times, and network latency & speed would be important for finding the optimum.
posted by Good Brain at 2:02 PM on May 10, 2008
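For anyone who wants to play with those numbers, here's a small sketch of the doubling rule above, folding in lookup and network costs; every figure (base hit rate, latencies) is invented purely for illustration:

```python
base_hit = 0.10      # assumed hit rate at a 50 MB cache
t_cache_ms = 1.0     # assumed cost of serving an object from cache
t_net_ms = 200.0     # assumed cost of fetching an object over the network

size_mb, miss = 50, 1.0 - base_hit
for _ in range(6):
    hit = 1.0 - miss
    # Expected cost per object is a weighted average of the two paths.
    avg_ms = hit * t_cache_ms + miss * t_net_ms
    print(f"{size_mb:5d} MB cache: hit rate {hit:6.1%}, avg fetch {avg_ms:6.1f} ms")
    size_mb, miss = size_mb * 2, miss / 2   # double the cache, halve the misses
```

The early doublings buy big reductions in average fetch time; past a few hundred MB the curve is nearly flat, which matches the diminishing returns described above.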
It comes at the cost of fragmentation and lots of disk reads/writes. IE6 used to allocate 10% of your disk for caches, which was kinda crazy; just deleting the cache took forever. Microsoft has moved to between 50 and 250 MB in IE7, and I haven't noticed any difference on the machines I manage. I believe 50 is the default for fresh installs. Perhaps MS's engineers agree that a large cache isn't helping anything.
I doubt the cache is very important nowadays, and putting in a 100 GB cache is probably a good way to encourage system slowdown via fragmentation for the off chance it might make Yahoo load .003 ms faster.
posted by damn dirty ape at 2:52 PM on May 10, 2008
Delmoi, care to prove why that formula is valid here? How can it be a valid reflection of reality when it doesn't account for, well, anything? And I see no reason to assume downloading a file takes 1,000 times as long as reading a cached copy off the disk.
posted by gjc at 4:38 PM on May 10, 2008
As disk sizes have grown, IE started to use silly amounts of space. I set it to no more than 50 MB.
posted by theora55 at 5:39 PM on May 10, 2008
If you only browse a few sites regularly, the cache can be smaller and still provide decent coverage, but if you browse more widely, the cache will have to be larger in order to be useful (by keeping static images etc. from all those sites)... the defaults are picked to provide a decent benefit for most users without surprising them by stealing 500 GB of their hard drive...
posted by russm at 9:31 AM on May 10, 2008
This thread is closed to new comments.