How frequently to refresh two data sources for optimal user experience?
February 4, 2013 1:29 PM

I'm finishing up a web reporting solution and am configuring the data-retrieval and data-caching parameters. There are three time variables involved: how often to retrieve new data, how long to cache that data on the server, and how long to tell clients to cache it. Help me figure out the optimal caching values for a fixed retrieval frequency.

Data caching in this tool works like this: "Each data result will be cached in memory on the server or client after the first time it is retrieved. Cached results will remain in memory and be used instead of querying the data sources for fresh results until the specified duration has expired."

There are two caching variables: "server data cache" and "client data cache." The tool is Silverlight-based.

If I fix the data-retrieval frequency to hourly (every hour on the hour), what values should I set the server and client data caches to? I'm hoping there is a clever solution along the lines of "set them to x and y and then your users are guaranteed up-to-date data z% of the time." If there is, how do I compute the solution for different values of data-retrieval frequency?

Or should I just forget this and set them both to cache for 60 minutes?
posted by ish__ to Computers & Internet (3 answers total)
 
If you're fetching every hour, then you want the server to cache for an hour (or longer, even forever, and have the fetch process write the new cache). You want the clients to cache until just after you next expect the data to be fresh, so their max-age shouldn't be constant. You don't want them all to refresh at the exact same moment though, so randomise their expiry time a little (introduce "jitter" to avoid a "thundering herd").
posted by gregjones at 1:57 PM on February 4, 2013
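A minimal Python sketch of the expiry calculation described in the answer above. The function name `client_max_age`, the 30-second grace period, and the 120-second jitter window are illustrative assumptions, not settings of the reporting tool; the sketch also assumes that refreshes land on even multiples of the refresh period counted from midnight.

```python
import random
from datetime import datetime


def client_max_age(now, refresh_minutes=60, grace_seconds=30,
                   max_jitter_seconds=120, rng=random):
    """Return how many seconds a client should cache a result fetched at `now`.

    Assumption: fresh data is available `grace_seconds` after each refresh
    boundary (e.g. 14:00:30 for an hourly refresh).
    """
    period = refresh_minutes * 60
    midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
    elapsed = (now - midnight).total_seconds()
    # Shift by the grace period so a client that fetches right on the
    # boundary retries shortly afterwards instead of waiting a full period.
    until_fresh = period - ((elapsed - grace_seconds) % period)
    # Spread the clients out so they do not all hit the server together.
    jitter = rng.uniform(0, max_jitter_seconds)
    return until_fresh + jitter


if __name__ == "__main__":
    print(client_max_age(datetime(2013, 2, 4, 13, 29)))
```

With these assumed defaults, a client that fetches at 1:29 PM would expire its copy somewhere between 2:00:30 PM and 2:02:30 PM.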


If you want the client to have the most up-to-date results, the client cache should be 0, which will cause it to query your server cache every time it wants to display the results. Turning this up will decrease the request load on your server, at the expense of the client receiving the latest results delayed by $client_cache_time + ($server_check_interval % $client_cache_time).

The server's cache time should be determined by business needs, since this interval will determine how fresh the freshest results will be. The server cache should expire itself when retrieving new results. Rather than have the app check when the last results arrived, run the retrieval separately (after which it expires the server cache) and rely on the client to hit the server when its cache has expired.

In other words, treat it as two separate things: maintaining the server result set and managing the query interval from clients. It sounds like no client can trigger an out-of-interval query of the backing data store, so the server is effectively polling at an interval to serve the same results to both initial-request and cache-expired clients.
posted by rhizome at 2:52 PM on February 4, 2013
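The separation described in the answer above might look roughly like the following Python sketch. `ServerCache`, `refresh`, and `get` are hypothetical names, not part of the reporting tool; the idea is only that a scheduled retrieval job replaces the cached results, and client requests read the cache without ever touching the data source.

```python
import threading
import time


class ServerCache:
    """Holds the latest result set; only the retrieval job writes to it."""

    def __init__(self):
        self._lock = threading.Lock()
        self._results = None
        self._fetched_at = None

    def refresh(self, fetch, clock=time.time):
        """Called by the scheduled retrieval job (e.g. hourly)."""
        fresh = fetch()  # query the data source outside the lock
        with self._lock:
            # Swapping in the new results is what "expires" the old ones.
            self._results = fresh
            self._fetched_at = clock()

    def get(self):
        """Called for client requests; never queries the data source."""
        with self._lock:
            return self._results, self._fetched_at


if __name__ == "__main__":
    cache = ServerCache()
    cache.refresh(lambda: ["row 1", "row 2"])
    print(cache.get())
```

Returning the fetch timestamp alongside the results is a design choice in this sketch: it lets the server tell each client how old the data is, which a client-side expiry calculation could use.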


If there's any way to ensure that the cache is always refreshed when the underlying data is updated, you're usually better off doing that and setting the cache expiration to some very long amount of time (if you set it at all).
posted by smoq at 3:38 PM on February 4, 2013

