How to detect network caching/filtering simply?
March 20, 2014 12:59 PM

I work for a company that develops, hosts, and licenses a web-based career guidance app that breaks when it is subjected to caching/filtering. How can we detect if this is happening?

The application depends on providing instant feedback to individual students, who each have a unique account. We've found that any caching on the school network results in actions not being saved correctly, so student progress is not recorded. One of the setup tasks for any school or district that licenses the application is to whitelist all of our hosts, but as is the way of such things, some IT departments do not or cannot completely exempt our traffic.

This is not related to browser caching; that can be handled by the application and is not an issue. We need to be able to detect when traffic between the student and our servers is being cached via a proxy or filter.

Is there a simple test that we can ask teachers to perform that would let us know if caching is still present? If such a test involves setting something up on our servers, we would be willing to do so; a reliable diagnosis would save a lot of time, headaches, and user frustration.
posted by Dipsomaniac to Computers & Internet (14 answers total) 1 user marked this as a favorite
 
Wouldn't switching to 100% HTTPS do the trick?
posted by ryanrs at 1:18 PM on March 20, 2014 [1 favorite]


One simple method would be to provide a URL that loads a page which has a fixed delay built into it - so there'd always be a 5 or 10 second delay before the page arrives.

If the page loads immediately, or in less time than you've set the delay for, it's being cached somewhere.

You can go more advanced than this and have a page which loads information from your delayed page via ajax, and times it for you. That way you can put a friendly front-end on your cache detector.
posted by pipeski at 1:20 PM on March 20, 2014
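
The fixed-delay test pipeski describes could be sketched roughly as below. This is only an illustration in Python: the 5-second delay, the handler class, and the function names are assumptions, and a real check would more likely run as JavaScript in the student's browser against whatever server stack the application uses.

```python
import time
from http.server import BaseHTTPRequestHandler
from typing import Callable

SERVER_DELAY = 5.0  # seconds; hypothetical value, matching the "5 or 10 second" idea


class DelayedHandler(BaseHTTPRequestHandler):
    """Origin-side page that never answers faster than SERVER_DELAY."""

    def do_GET(self):
        time.sleep(SERVER_DELAY)  # the built-in delay a cache cannot reproduce
        body = b"ok\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Cache-Control", "no-store")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def timed_fetch(fetch: Callable[[], object]) -> float:
    """Run fetch() once and return how long it took, in seconds."""
    start = time.monotonic()
    fetch()
    return time.monotonic() - start


def looks_cached(elapsed: float, server_delay: float, tolerance: float = 0.5) -> bool:
    """True if the response arrived sooner than the origin could have sent it."""
    return elapsed < server_delay - tolerance
```

Since a cache can only answer a request it has seen before, the meaningful measurement is probably the second load of the same URL, not the first.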


It seems like it'd be as easy or easier to make your application work properly in the presence of caches. Transfer all personalized information over HTTPS, as ryanrs suggests; or include a nonce in the URL (as a path segment or a query parameter); etc. (I'm assuming that you're already using cache control headers and they're being ignored by the proxy-caches or something.)
posted by hattifattener at 1:30 PM on March 20, 2014


Response by poster: We are already using 100% HTTPS; browsers won't cache those pages, but a network appliance is not bound by such protocols. Many schools have web filters that inspect all traffic routed through them, even if it's only source/destination that's being looked at, and many filters also cache content because it's an easier solution, or the school uses a basic caching proxy.
posted by Dipsomaniac at 1:38 PM on March 20, 2014


Seconding that detection is going to be harder than prevention. Using some combination of HTTPS, nonces and sending the right HTTP caching/expiration/Etags headers is your best bet.
posted by Aleyn at 1:44 PM on March 20, 2014


Response by poster: To reiterate: cache-control headers are not the answer. We are not experiencing caching issues with browsers. Much of the caching being done on school networks is at page level only, ignoring nonces and headers.
posted by Dipsomaniac at 1:55 PM on March 20, 2014


This sounds like a complicated application and networking problem that will not be solved over the internet. I agree with everything above, and will add the following:
- Cache control headers are ALSO a hint to caching proxies and cdns on how to cache your pages. Obviously they can choose to respect or not respect them as they see fit, but you should be doing this, as it is easy.
- You may be able to use a timestamp to detect pages that are extremely out of date. You will need another source of timestamp, since end-user machines may have arbitrary clock times.
- You may be able to use POST requests to get around caches.
- If you are already using HTTPS and setting cache-control headers, and are seeing inter-user pages showing up (i.e. a user with cookie_id1 gets data for the user with cookie_id2), then the school's caching configuration is borked, and you are going to have an annoying time. It sounds like you will have to make EVERY URL unique, at the very least unique per user. Having a loading page that people bookmark that then redirects to /the/same/page?timestamp=1234567 will be annoying but might help.
posted by Phredward at 2:14 PM on March 20, 2014
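
Two of Phredward's suggestions, the per-request unique URL and the staleness check against a server-supplied timestamp, might look something like this. It is a sketch under assumptions: the function names and the 60-second threshold are made up for illustration, and only the `timestamp` query parameter comes from the comment above.

```python
import time
from typing import Optional
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def add_cache_buster(url: str, now_ms: Optional[int] = None) -> str:
    """Append a timestamp query parameter so each request URL is unique."""
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    parts = urlsplit(url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.append(("timestamp", str(now_ms)))
    return urlunsplit(parts._replace(query=urlencode(query)))


def is_stale(page_rendered_at: float, server_now: float, max_age: float = 60.0) -> bool:
    """Compare two server-side times (seconds since the epoch).

    page_rendered_at is embedded in the page when the server renders it;
    server_now comes from a separate request to the server, so the
    end user's own clock never enters the comparison.
    """
    return server_now - page_rendered_at > max_age
```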


Well, assuming you have teachers at the far end of the system who can do the checking for you, a fairly simple thing would be to have a page that displays the current time from your server's internal clock in hours:minutes:seconds, maybe even milliseconds.

Have them load the page and then re-load it. If it remains the same then you've got caching.

The time is just an example of something that should change each time you reload. Another would be a 15 digit string of random numbers or letters, or give different randomly selected words or whatever. The point would be to have the teacher look at a page that should give you something different each and every time it is visited, then reload and note whether same or different.
posted by flug at 2:15 PM on March 20, 2014
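
A rough sketch of flug's reload test follows. The function names are hypothetical, and the page body format (server time plus a random hex string) is just one of the options the comment lists.

```python
import secrets
from datetime import datetime, timezone


def make_probe_body() -> str:
    """Server side: text that should differ on every single request."""
    # Server clock time as hours:minutes:seconds.milliseconds (UTC).
    now = datetime.now(timezone.utc).strftime("%H:%M:%S.%f")[:-3]
    # 16 random hex characters, so two loads within the same millisecond still differ.
    token = secrets.token_hex(8)
    return f"{now} {token}"


def reload_shows_caching(first_load: str, second_load: str) -> bool:
    """Teacher side: identical text after a reload suggests a cached copy."""
    return first_load == second_load
```

The teacher only has to load the page, reload it, and report whether the text changed.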


As others have said, one solution is to integrate a nonce into the client-server communications. Basically a nonce is a number that will always be different from any of its predecessors.

The client code would need to be updated to check the nonce. If the nonce has been repeated, the client knows it is dealing with cached content. From there you can alert the user, or take other actions to mitigate the caching.

The server generates the nonce (there are lots of libraries out there) and inserts it where appropriate. For example, the nonce could be added to the http header of responses that should not be cached.

That's the high-level picture; the details would depend on your specific scenario, but it sounds like using a nonce could be part of a solution for you.
posted by forforf at 2:37 PM on March 20, 2014
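
The nonce check forforf outlines could be sketched as follows, using Python's standard `secrets` module as one example of an existing library. The class and function names are assumptions, and in the real application the client half would presumably live in the browser, reading the nonce from a response header or the page body.

```python
import secrets


def new_nonce() -> str:
    """Server side: generate a value that is, in practice, never repeated."""
    return secrets.token_urlsafe(16)


class NonceChecker:
    """Client side: remember every nonce seen and flag any repeat."""

    def __init__(self) -> None:
        self._seen = set()

    def is_replayed(self, nonce: str) -> bool:
        """Return True if this nonce was already seen (likely cached content)."""
        if nonce in self._seen:
            return True
        self._seen.add(nonce)
        return False
```

A repeat only shows that some copy of the response was replayed; it does not say which device on the network did the caching.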


Forgot to mention that there are lots of libraries for generating nonces. I'm not sure what platform/language you use, but it's almost certain a library already exists.
posted by forforf at 2:41 PM on March 20, 2014


Cache control headers are ALSO a hint to caching proxies and cdns on how to cache your pages. Obviously they can choose to respect or not respect them as they see fit, but you should be doing this, as it is easy.

Cache control headers are no more a hint to the proxy than the URL you type is a hint to the browser. If a cache (either browser or shared) refuses to honour cache control instructions, then it's not actually an HTTP proxy; it's something else, and it needs to be fixed.

Asking how you can solve the issue is like saying "sometimes our client unplugs their upstream network link, which prevents access to our application. How do we fix this from our end?"
posted by russm at 3:40 PM on March 20, 2014 [1 favorite]


Response by poster: We don't want to "solve" the issue. We want to be able to detect caching so that we can eliminate issues with the app and tell the client "Stop caching, and it will work."

If the client is caching, the app will break; this is expected behaviour and we make clients aware of it from the start. We don't need to get around the caching, we simply need to be able to verify it - at that point the client has the responsibility of meeting the technical requirements that they were told of before the license was purchased.

We deal with many different school boards and districts; we can't work around every filter and proxy out there.
posted by Dipsomaniac at 7:33 PM on March 20, 2014


Okay, detection then. Here's what I would do to solve this problem, assuming that you can rely on JavaScript support. Put some sort of handler on your webserver that can be accessed with what looks like a random page URL, e.g. /cacheverifier/[random number]. Output some JavaScript on each page that contains a version number that increments each time you deploy new bits to your server, and use Math.random() to generate a random number you'll use to generate a URL to your cacheverifier endpoint. Make an AJAX request to that URL, sending along the version number associated with the page. If the server receives a version number it knows is out of date, send back an error response that you can use to show your notice and do whatever error handling you want to do.

Incidentally, those are some pretty deficient proxies if they don't respect caching headers or query-string nonces. I could see a more general solution using nonces with URL rewriting to force cache breaking, but if your initial page request could be cached that'll only get you so far.
posted by Aleyn at 9:46 PM on March 20, 2014
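
Aleyn's version-check scheme might be sketched like this. Everything except the /cacheverifier/ path is an assumption: the version constant, the function names, and the 409 status code are placeholders, and the URL-building half would be JavaScript using Math.random() in the actual page.

```python
import random

CURRENT_VERSION = 42  # hypothetical; incremented on every deployment


def verifier_url(rng=random) -> str:
    """Page side: build a verifier URL that looks like a random page."""
    return f"/cacheverifier/{rng.randrange(10**12)}"


def check_page_version(reported_version: int, current_version: int = CURRENT_VERSION):
    """Server side: compare the version the page reports with the deployed one.

    Returns a (status_code, message) pair. An out-of-date version means the
    page itself came from a cache filled before the latest deployment.
    """
    if reported_version < current_version:
        return 409, "stale page: likely served from a cache"
    return 200, "ok"
```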


If you want to be able to detect caching without requiring a deployment, you could do something similar to my earlier suggestion, but it would be a two-phase process. Basically, hit a URL that responds with a number, then hit the same URL again with the number in the request (e.g. in the query string or in a cookie), and have your server return that number plus 1 or something. If you get the same number as before back, you've just detected caching, assuming that the page serves all the right cache-breaking headers and so forth.
posted by Aleyn at 9:59 PM on March 20, 2014
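
The two-phase check Aleyn describes can be simulated without any network at all. In this sketch the `fetch` callable stands in for a request to the probe URL, and `make_caching_fetch` is a made-up stand-in for a proxy that replays its first response; all names here are hypothetical.

```python
import secrets
from typing import Callable, Optional


def server_reply(sent_number: Optional[int] = None) -> int:
    """Origin: hand out a starting number, or return the sent number plus 1."""
    if sent_number is None:
        return secrets.randbelow(1_000_000)
    return sent_number + 1


def detect_caching(fetch: Callable[[Optional[int]], int]) -> bool:
    """Client: two requests; getting the same number back means a cached reply."""
    first = fetch(None)
    second = fetch(first)
    return second == first


def make_caching_fetch(origin: Callable[[Optional[int]], int]):
    """Simulate a misbehaving proxy that replays its first response forever."""
    cache = []

    def fetch(sent_number: Optional[int]) -> int:
        if not cache:
            cache.append(origin(sent_number))
        return cache[0]

    return fetch
```

If the number travels in the query string, the second URL differs from the first, so a cache keyed on the full URL would likely not be caught; sending it in a cookie keeps the URL identical.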

