How can I search this list more effectively?
February 18, 2015 3:09 PM

Pennsylvania's malt beverage registration page lists every beer sold in the state, but it splits the list into pages of 25 entries, so I can't see the entire list in one place. It lets me search by individual brand name and manufacturer, but not by BC#. Got any clues on how I can extract the info?

Ultimately, I need to sort on that BC#. Can I download the entire list somehow?
posted by sixpack to Computers & Internet (7 answers total) 3 users marked this as a favorite
 
A web scraper will make short work of this.

Here's a Chrome extension that's fairly user-friendly for small jobs like this:
webscraper.io
posted by nedpwolf at 3:31 PM on February 18, 2015 [1 favorite]


I poked around a bit to see whether there's a behind-the-scenes way to just add the number of results to the server request; no dice. It shouldn't be hard with a scraper, though: Mechanize exists for Python and Perl in addition to the oh-so-handy Ruby gem in the link, in case webscraper.io can't pull what you need.
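The loop any such scraper runs is simple: request page 1, 2, 3, and so on, and collect table rows until a page comes back empty. Here is a minimal, stdlib-only Python sketch of that loop. It is not any specific tool's code, and it rests on assumptions I haven't checked against the PLCB site: the results sit in an HTML table with one `<td>` per field and properly closed tags, and `fetch_page` is a hypothetical function you would supply (with Mechanize, urllib, or similar) that returns the HTML for a given page number.

```python
from html.parser import HTMLParser
from typing import Callable, List, Optional


class TableRowParser(HTMLParser):
    """Collect the text of every <td> cell, grouped by <tr> row."""

    def __init__(self) -> None:
        super().__init__()
        self.rows: List[List[str]] = []
        self._row: Optional[List[str]] = None
        self._cell: Optional[List[str]] = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None and self._row is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:  # header rows built only from <th> cells end up empty and are skipped
                self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        # Text inside a <td>, including text in nested tags such as <a>, is kept.
        if self._cell is not None:
            self._cell.append(data)


def scrape_all_pages(fetch_page: Callable[[int], str], max_pages: int = 1000) -> List[List[str]]:
    """Fetch pages 1, 2, 3, ... and stop at the first page with no data rows.

    fetch_page is a hypothetical callable: page number in, that page's HTML out.
    max_pages guards against a site that keeps returning the last page for
    out-of-range page numbers instead of an empty one.
    """
    all_rows: List[List[str]] = []
    for page in range(1, max_pages + 1):
        parser = TableRowParser()
        parser.feed(fetch_page(page))
        parser.close()
        if not parser.rows:
            break
        all_rows.extend(parser.rows)
    return all_rows
```

Injecting `fetch_page` keeps the parsing testable offline and leaves the site-specific part (how the page number goes into the request) in one small function.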
posted by aspersioncast at 3:50 PM on February 18, 2015


You might also just ask them. When I worked in a state web job, I dumped data for people to save them from having to either scrape our apps or file requests for public data.
posted by advicepig at 6:24 PM on February 18, 2015


Agreeing with advicepig: dollars to donuts, a few phone calls will get you to someone who has it in an Excel spreadsheet they'd be willing to send you.
posted by GPF at 8:23 PM on February 18, 2015


Here you go: I used Kimono and linked the results to a Google spreadsheet, from which you can export an Excel file if you like.
posted by muta at 9:24 PM on February 18, 2015


Unfortunately, the PLCB is a bit slow to reply to data requests and has required FOIAs in the past. In any case, I want to glean this info repeatedly, so the web scrapers are perfect. Extra thanks to muta for my first spreadsheet.
posted by sixpack at 7:23 AM on February 19, 2015


Just to test myself, I went ahead and wrote a basic Ruby script for this and put it up on GitHub; I'm sure it could be improved, but it's here. You'll need Ruby > 1.8 and the mechanize and csv gems.
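Since the end goal is sorting on BC#, one pitfall is worth flagging: if BC# values are compared as text, "102" sorts before "7". The following is not the Ruby script from GitHub; it's a separate minimal Python sketch that assumes the scraped rows are lists of strings, that BC# sits in a known column (column 0 here is just an assumption), and that BC# values are mostly plain digits (I haven't verified the real format). It sorts those numerically and writes CSV that Excel or Google Sheets can open.

```python
import csv
import io
from typing import List, Tuple


def bc_sort_key(bc: str) -> Tuple[int, int, str]:
    """Sort all-digit BC# values numerically; anything else sorts after them as text."""
    value = bc.strip()
    if value.isdigit():
        return (0, int(value), value)
    return (1, 0, value)


def rows_to_sorted_csv(rows: List[List[str]], header: List[str], bc_column: int = 0) -> str:
    """Return CSV text (header first) with the rows ordered by their BC# column."""
    ordered = sorted(rows, key=lambda row: bc_sort_key(row[bc_column]))
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(header)
    writer.writerows(ordered)
    return out.getvalue()
```

To save the result, write the returned string to a file opened with `newline=""`, which keeps the csv module's line endings intact.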
posted by aspersioncast at 11:46 AM on February 23, 2015

