Website updates without RSS
February 13, 2013 8:16 AM

How do I monitor changes to a website when the results vary based upon a selection from a drop-down menu? How can I find where they are pulling from?

I would like to track when the USDA updates a list of documents on this webpage, specifically for the Livestock Subcommittee.

On other pages where they don't provide automatic updates, I use the amazingly useful Firefox add-on Update Scanner. However, this page changes based upon what subcommittee I select from the drop-down menu, so I don't think Update Scanner will work.

Is there a way to set something up? Is there a way to "call" whatever database they are using for each record?

If there is a solution, I would also appreciate hearing how you arrived at it, as this is quite a common occurrence on USDA webpages.
posted by gagoumot to Technology (8 answers total) 2 users marked this as a favorite
 
I don't know of a site or tool that does this automatically, but it wouldn't be hard to write a script to check this. The subcommittee list and Go button make up a form. Submitting it makes a POST request to the server with the committee ID as a POST parameter, e.g. OptionalText2DropDown=STELPRDC5058027&Go=Go for Compliance, ...

To get that page, you could run:

wget --post-data "OptionalText2DropDown=STELPRDC5058027&Go=Go" http://www.ams.usda.gov/AMSv1.0/ams.fetchTemplateData.do\?template\=TemplateJ\&page\=NOSBCommitteeRecommendations

That gives you the page. To check for new documents, store the list of existing documents in a text file or database, then fetch and parse the page periodically and compare the result against the stored list. MeMail me if you want some help or further explanation.
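For anyone who would rather do the fetch-and-parse step in a script than with wget, here is a minimal Python sketch. The URL and form fields are taken from the wget command above. The names LinkCollector, extract_links and fetch_committee_page are made up for illustration, and the sketch assumes the documents show up as ordinary <a href="..."> links and that the page can be decoded as UTF-8. Neither assumption has been checked against the real page.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# URL copied from the wget command above.
PAGE_URL = ("http://www.ams.usda.gov/AMSv1.0/ams.fetchTemplateData.do"
            "?template=TemplateJ&page=NOSBCommitteeRecommendations")


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag, in page order (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def extract_links(html):
    """Return all link targets found in an HTML string."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


def fetch_committee_page(committee_id):
    """POST the same two form fields the wget command sends; return the HTML."""
    body = urlencode({"OptionalText2DropDown": committee_id, "Go": "Go"})
    request = Request(PAGE_URL, data=body.encode("ascii"))  # data= makes it a POST
    with urlopen(request, timeout=30) as response:
        # Assumption: UTF-8 is close enough; undecodable bytes are replaced.
        return response.read().decode("utf-8", errors="replace")


# Usage (needs network access):
#   links = extract_links(fetch_committee_page("STELPRDC5058027"))
```

You would probably want to filter the links further (for example, keep only those ending in .pdf), depending on how the page is actually laid out.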
posted by heliostatic at 8:34 AM on February 13, 2013 [1 favorite]


You can use this URL for Livestock:
http://www.ams.usda.gov/AMSv1.0/ams.fetchTemplateData.do?template=TemplateJ&page=NOSBCommitteeRecommendations&OptionalText2DropDown=STELPRDC5058030
The site accepts both POST and GET parameters, so that URL works fine with the Update Scanner add-on you mentioned.

You can change the last part according to the subcommittee you need:

STELPRDC5058027 = Compliance, Accreditation, and Certification Subcommittee
STELPRDC5058028 = Crops Subcommittee
STELPRDC5097829 = GMO Ad-Hoc Subcommittee
STELPRDC5058029 = Handling Subcommittee
STELPRDC5058030 = Livestock Subcommittee
STELPRDC5072878 = Materials Subcommittee
STELPRDC5058031 = Policy Development Subcommittee


How to:

View the HTML source of the USDA page (Ctrl+U in most browsers) and find the right select field (<select name="foo">). It has a name attribute ('foo' in this example). Append that name and the value of the <option value="xyz"> you want inside the <select> to the URL whose source you viewed, in the form foo=xyz, and you have your own URL to use with the add-on. Note that you have to put an & between the end of the existing URL and the parameters you append, as in the URL above.

This might not work on every page: it looks like several different cooks brewed the USDA website, so you might occasionally have bad luck when a page really only accepts the POST variant.
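If reading the source by hand gets tedious, the same steps can be sketched in Python: list every option of every <select> in a saved copy of the page, then append the one you want to the URL. SelectOptions, list_options and add_parameter are hypothetical helper names, and the sketch assumes the options carry a value attribute as in the example above.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode


class SelectOptions(HTMLParser):
    """Records (select name, option value, option label) for each <option>."""

    def __init__(self):
        super().__init__()
        self.options = []
        self._select = None  # name of the <select> we are inside, if any
        self._value = None   # value of the <option> we are inside, if any
        self._label = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "select":
            self._select = attrs.get("name")
        elif tag == "option" and self._select:
            self._close_option()  # HTML allows omitting </option>
            self._value = attrs.get("value")
            self._label = []

    def handle_data(self, data):
        if self._value is not None:
            self._label.append(data)

    def handle_endtag(self, tag):
        if tag in ("option", "select"):
            self._close_option()
        if tag == "select":
            self._select = None

    def _close_option(self):
        # Options without a value attribute are skipped.
        if self._value is not None:
            label = "".join(self._label).strip()
            self.options.append((self._select, self._value, label))
            self._value = None


def list_options(html):
    """Return every named select's options found in an HTML string."""
    parser = SelectOptions()
    parser.feed(html)
    parser.close()
    return parser.options


def add_parameter(url, name, value):
    """Append name=value to url, using '?' or '&' as appropriate."""
    separator = "&" if "?" in url else "?"
    return url + separator + urlencode({name: value})
```

Calling list_options on the saved page source should give you the name to use and the available values. add_parameter then builds the URL to hand to the add-on.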
posted by KMB at 9:41 AM on February 13, 2013


FYI, the correct wget to acquire the Livestock Subcommittee docs is:

wget --post-data "OptionalText2DropDown=STELPRDC5058030&Go=Go" http://www.ams.usda.gov/AMSv1.0/ams.fetchTemplateData.do\?template\=TemplateJ\&page\=NOSBCommitteeRecommendations


(Note that there should be no line breaks in this command.) The only difference is the code after the "DropDown=" part. This will work on most Unix/Linux boxes. If you have a Mac, you will need to use the curl command instead, since Macs do not come with wget:

curl -d "OptionalText2DropDown=STELPRDC5058030&Go=Go" http://www.ams.usda.gov/AMSv1.0/ams.fetchTemplateData.do\?template\=TemplateJ\&page\=NOSBCommitteeRecommendations

The key to making this useful is how you parse and store the data and compare it against the results of later checks. As heliostatic mentioned, this can be accomplished with a fairly simple script that runs on a regular schedule.
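One way the store-and-compare step might look in Python, as a rough sketch: keep the links seen so far in a small JSON file and report anything that was not there on the previous run. The function name find_new_documents and the state-file layout are made up for illustration, and how you get the list of links out of the page is left to you.

```python
import json
from pathlib import Path


def find_new_documents(current_links, state_file):
    """Return links not seen on earlier runs, then remember everything seen."""
    path = Path(state_file)
    # Load what earlier runs recorded; the first run starts from nothing.
    seen = set(json.loads(path.read_text())) if path.exists() else set()
    # dict.fromkeys drops duplicates while keeping page order.
    new_links = [link for link in dict.fromkeys(current_links)
                 if link not in seen]
    # Record the union so a link that disappears and comes back is not
    # reported again.
    path.write_text(json.dumps(sorted(seen | set(current_links)), indent=2))
    return new_links
```

On the very first run every link counts as new, so you may want to run it once and ignore the output. After that, schedule it with cron (Unix/Linux/Mac) or Task Scheduler (Windows) and have it print or email whatever it returns.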
posted by pmbuko at 9:43 AM on February 13, 2013


Wow, all these answers are great.

Does that mean that in a URL each section following a '&' is a parameter name and a value that is being passed back and forth?

Is wget/curl available on Windows?
posted by gagoumot at 10:33 AM on February 13, 2013


On your first question: yes. As stated above, the name is the name of the select and the value is the option value. As also stated, this might not work 100% of the time, because the form declares that it has to be "posted", but when you use it by URL you "get" it.

The other solutions work around that but are probably more advanced to set up.
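To see the name/value structure for yourself, here is a small Python sketch that splits the Livestock URL from above into its parts. Everything after the '?' is the query string; pairs are separated by '&' and each name is separated from its value by '='. It uses only the standard library, and the variable names are arbitrary.

```python
from urllib.parse import parse_qsl, urlsplit

url = ("http://www.ams.usda.gov/AMSv1.0/ams.fetchTemplateData.do"
       "?template=TemplateJ&page=NOSBCommitteeRecommendations"
       "&OptionalText2DropDown=STELPRDC5058030")

parts = urlsplit(url)            # separates host, path and query string
pairs = parse_qsl(parts.query)   # list of (name, value) tuples

for name, value in pairs:
    print(f"{name} = {value}")
# template = TemplateJ
# page = NOSBCommitteeRecommendations
# OptionalText2DropDown = STELPRDC5058030
```

The parameters only travel one way, from your browser to the server, which uses them to decide what page to send back.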

Also: cURL for Windows
posted by KMB at 10:51 AM on February 13, 2013


Would ChangeDetection not work for you? When there are changes, they email you a notification with details about the changes. There are various viewing options in the reports.
posted by Dansaman at 11:49 AM on February 13, 2013


Thanks everyone for your help!
posted by gagoumot at 2:03 PM on February 13, 2013


Page2RSS
posted by egk at 8:52 AM on March 18, 2013

