Why won't my web page reflect changes in linked data files?
November 6, 2010 1:59 PM

My supplier is providing data at the promised frequency. I'm fetching it, and it's changing as expected, but the rendering of the data on my website does not reflect the changes. Any ideas what's going on? Details inside...

My organization is working with a data supplier who updates a feed for us every 15 seconds. They gave us a standard URL-based interface that looks like this:


When I enter the URL in my browser's address bar and hit refresh a bunch of times, I can see that the information is changing at the desired frequency. Everything appears to be working properly on the supplier's end. Now we want to display this data on our website, but some part of the system I have set up is not working the way I expected it to.

There are two computers involved on our end. Let's call them Data and Web. Data, an Ubuntu Linux box, makes the requests by issuing curl commands (using the above URL) scheduled using cron. It stores the data returned (an HTML fragment) to a file in a directory that is then shared via samba. Web is our public web server, Windows Server 2008 with IIS7. I created a virtual directory pointing to the shared directory on Data. On a test page, I used a server-side include to render the HTML fragment in our layout.
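For reference, the fetch job on Data is essentially the following sketch. The feed URL and output directory here are placeholders, not our real values:

```shell
#!/bin/sh
# Sketch of the scheduled fetch on Data.  FEED_URL and OUT_DIR are
# placeholders, not the actual supplier URL or share path.
FEED_URL="${FEED_URL:-http://supplier.example.com/feed}"
OUT_DIR="${OUT_DIR:-/tmp/feed}"
mkdir -p "$OUT_DIR"

fetch_once() {
    # Download to a temp file, then rename into place, so nothing on the
    # Web side ever reads a half-written fragment.
    tmp="$(mktemp "$OUT_DIR/.fragment.XXXXXX")"
    if curl -fsS "$FEED_URL" -o "$tmp"; then
        mv "$tmp" "$OUT_DIR/data.txt"
    else
        rm -f "$tmp"
    fi
}

# fetch_once   # cron invokes the script, which runs this once
```

One wrinkle worth noting: cron's minimum granularity is one minute, so a 15-second cadence needs either a loop (`while true; do fetch_once; sleep 15; done` under a supervisor) or four offset crontab entries (`* * * * * fetch.sh`, `* * * * * sleep 15 && fetch.sh`, and so on).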

To some extent this works. I am definitely getting data from the supplier and it shows up on the page. I have verified that the data on Data is changing every 15 seconds. But when it comes time to render it, that 15-second frequency goes out the window. The weird thing is that eventually it does change. I'm not sure what the actual frequency is, but it is on the order of minutes (at least five, for sure) instead of seconds. You can hit refresh all you want and it stays the same, even though the source HTML fragment on Data has changed...until eventually, for some reason, the web display changes too.

I have no idea what is going on. Is something getting cached somewhere automatically? I tried adding a no-cache meta tag on the test web page. I tried disabling output caching in IIS7. Neither of those worked. Could it be something about SSI that I don't understand?

posted by Fred Mars to Computers & Internet (5 answers total)
What happens if you fetch the page with wget or curl instead of a web browser? Do you get the correct data?

Are you flushing your output to the fragment in your output code?
posted by bottlebrushtree at 2:31 PM on November 6, 2010

Write a quick script to run on Web that checks the data every 15 seconds. Run it through IIS, then run it outside IIS. Does it change? If both are fine, it's some weird problem with SSI. "Something in the pipeline is wrong" questions are very hard to debug; "this step in the pipeline is wrong" questions are much easier.
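A minimal sketch of that check (the fragment path and page URL below are placeholders for wherever Web sees the share and serves the test page):

```shell
#!/bin/sh
# Log a checksum of the raw fragment (as Web sees it on the share) and of
# the page IIS renders, side by side.  Both locations are placeholders.
FRAGMENT="${FRAGMENT:-/mnt/data-share/data.txt}"
PAGE_URL="${PAGE_URL:-http://web.example.com/test.shtml}"

check_once() {
    raw=$(md5sum "$FRAGMENT" | cut -d' ' -f1)
    rendered=$(curl -fsS "$PAGE_URL" | md5sum | cut -d' ' -f1)
    printf '%s raw=%s rendered=%s\n' "$(date +%T)" "$raw" "$rendered"
}

# while true; do check_once; sleep 15; done
```

If `raw` changes every 15 seconds but `rendered` lags, the problem is squarely on the IIS/SSI side.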
posted by devilsbrigade at 2:51 PM on November 6, 2010

Does the UI use Flex? Flex can be a pita about ditching cached info, even on a hard refresh. We used to trick it by adding a fake new parameter, to force it to treat it as a new page. Anything at all would work, &glooble=bahaha or whatever.
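In curl terms, the trick looks something like this (the base URL and parameter name are placeholders; anything unique per request works):

```shell
# Cache-busting: append a throwaway query parameter so every request looks
# unique to any cache along the way.  URL and parameter name are arbitrary.
base_url="http://supplier.example.com/feed"
bust_url="${base_url}?nocache=$(date +%s)"   # epoch seconds as the junk value
echo "$bust_url"
# curl -fsS "$bust_url" -o data.txt          # the real fetch would go here
```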
posted by L'Estrange Fruit at 3:25 PM on November 6, 2010

I would first check whether the file written to the shared directory is actually being updated. Check it directly rather than via the browser, and check from both 'Data' and 'Web'.

If that file is correct then you know to investigate further down the chain (in your code, or the browser - perhaps set the cache-control settings in the browser).

If that file is not correct then you need to look lower in the chain, L'Estrange Fruit's suggestion of adding a random parameter to the URL requested is a good one.
posted by Gomez_in_the_South at 11:17 PM on November 6, 2010

Response by poster: Thanks for the responses, and you're right--that is a good point about "something in the pipeline" questions. I believe I have narrowed the problem down to some bug, limitation, or misconfiguration of IIS7's SSI feature. Some observations:

* Manual curl command returns expected data.

* Scripted curl returns expected data at desired rate, and I have verified (by two methods: 'cat data.txt' every 15 sec, and 'ls --full-time' on the directory) that the data changes over time. I tried the random param idea for kicks, but it didn't change anything. I was always getting correctly updated data from the supplier. And there is no Flex involved.

* I enabled server-side includes in Apache 2 on Data, and the same include that fails under IIS on Web works fine there.

* In addition to the HTML fragment designed for the SSI, my scheduled curl process also grabs a few chart images. They change at the same rate as the HTML and are stored in the same directory. My test layout has the SSI call and a table of these images. Refreshing the page from the IIS7 server, I can see the images change while the SSI output does not (except at whatever slower, as-yet-unmeasured frequency).

I am stumped here. Hopefully this new info helps. I also tried turning off all output caching on IIS7, including sending no-cache Cache-Control headers with all responses.
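For what it's worth, the image-vs-SSI comparison can be scripted like the rest. This sketch (placeholder URLs) fetches both through IIS with a no-cache request header and prints a checksum of each:

```shell
#!/bin/sh
# Fetch the SSI test page and one chart image through IIS, defeating any
# client-side cache with a request header, and checksum each response.
# Both URLs are placeholders.
PAGE_URL="${PAGE_URL:-http://web.example.com/test.shtml}"
IMG_URL="${IMG_URL:-http://web.example.com/feed/chart1.png}"

snapshot() {
    page=$(curl -fsS -H 'Cache-Control: no-cache' "$PAGE_URL" | md5sum | cut -d' ' -f1)
    img=$(curl -fsS -H 'Cache-Control: no-cache' "$IMG_URL" | md5sum | cut -d' ' -f1)
    printf 'page=%s img=%s\n' "$page" "$img"
}

# snapshot; sleep 15; snapshot
```

If `img` changes between snapshots while `page` stays fixed, IIS itself is serving a stale SSI result; the browser is not the one caching it.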
posted by Fred Mars at 1:35 PM on November 8, 2010

This thread is closed to new comments.