Help me automate web data entry for my job
June 4, 2018 5:20 AM

I'm looking for some insight on the best way to go about automating data entry into web fields.

I work for a division of a small state government agency, and we may soon be losing a longstanding contractor that annually enters about 3,000 reporting forms' worth of our data into federal websites that have no bulk upload option. Most of these forms are standardized, so I assume there has to be an easier way to go about this than just hiring someone else to type it in by hand.

Googling the question yielded this YouTube video along with a number of websites for software that could do this, but I'm looking for insight from people here who have actually tackled a similar problem.

Bonus points for less-complicated and more user-friendly solutions. My background is statistics so most of my programming experience is R and SQL, but that's significantly more than most of my coworkers. I can probably get some coding support from our IT staff, but they're quite siloed from our division, so I would like to be able to very precisely define what we need from them if we decide to do this in house vs. getting a vendor to do it for us.

(question is only anonymous due to username being close to real name and this being about my place of employment, but I can contact via memail if you need follow-up)
posted by anonymous to Computers & Internet (7 answers total) 5 users marked this as a favorite
 
I haven't done this sort of thing during the current century, but if you have any budget you may want to look at the test automation tools used by developers of web-based software. Software quality engineers have long sought ways to fob off this sort of tedious task on interns of uncertain skill level, so some tools are extremely simple to use once they're set up. See Wikipedia's list of web testing tools.

(And thinking about it further, you may want to make preliminary contact with your IT staff to make sure they don't already have something like this purchased and set up, lest you go through a whole evaluation process to pick a tool only to find you didn't need to.)
posted by XMLicious at 5:43 AM on June 4, 2018 [1 favorite]


How uniform is the data, and how bad are the consequences if a small portion of the forms get entered incorrectly? What I have often found when trying to script tasks like this is that, when the data is all over the place, handling all the special cases ends up being more work than just doing the entry by hand, especially if you are not a programmer.

If it is wonderfully uniform, and someone knows Python or you want to learn it, the Requests library makes working with websites relatively painless.
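For a rough idea of what that looks like, here is a minimal sketch. It assumes the form accepts a plain POST with no login or CAPTCHA. The URL, the CSV column names, and the "Submission received" confirmation text are all made up for illustration; the real values would come from inspecting the actual form.

```python
import csv
import io

# Hypothetical endpoint and confirmation text; replace with the real form's values.
FORM_URL = "https://example.gov/report/submit"
CONFIRMATION_TEXT = "Submission received"


def read_records(csv_text):
    """Parse CSV text into a list of dicts, one per form submission."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def submit_record(session, record):
    """POST one record and report whether the confirmation text came back."""
    response = session.post(FORM_URL, data=record, timeout=30)
    return response.status_code == 200 and CONFIRMATION_TEXT in response.text


def submit_all(session, records):
    """Submit every record and return the ones that did not confirm."""
    return [record for record in records if not submit_record(session, record)]


def main(csv_path):
    # Requests is a third-party package (pip install requests). It is imported
    # here so the helpers above can be exercised without it.
    import requests

    with open(csv_path, newline="") as handle:
        records = read_records(handle.read())
    with requests.Session() as session:
        failed = submit_all(session, records)
    print(f"{len(records) - len(failed)} submitted, {len(failed)} need a manual look")
```

Calling `main("forms.csv")` (file name hypothetical) would send one POST per CSV row. The CSV headers have to match the form's field names exactly, which is the sort of detail worth pinning down before handing this to IT.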

Selenium is the most common open source test automation software if you want to look at that route.
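If you go the Selenium route, the core loop is small. This is only a sketch, assuming every input on the form can be found by its `name` attribute and that the submit button is named `submit`; both are guesses, and real forms often need other locators, waits, or a login step first.

```python
# "name" is the locator strategy string that Selenium's By.NAME constant holds.
BY_NAME = "name"


def fill_and_submit(driver, form_url, record, submit_field="submit"):
    """Type each value into the input whose name matches the record key."""
    driver.get(form_url)
    for field_name, value in record.items():
        box = driver.find_element(BY_NAME, field_name)
        box.clear()
        box.send_keys(value)
    driver.find_element(BY_NAME, submit_field).click()
    # Return the resulting page so the caller can look for a confirmation.
    return driver.page_source


def run_in_firefox(form_url, records):
    # Selenium is third-party (pip install selenium) and needs Firefox installed.
    from selenium import webdriver

    driver = webdriver.Firefox()
    try:
        for record in records:
            fill_and_submit(driver, form_url, record)
    finally:
        driver.quit()
```

Because this drives a real browser, it copes with JavaScript-heavy forms that a plain HTTP request cannot, at the cost of being slower and more fragile.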
posted by Candleman at 7:04 AM on June 4, 2018 [3 favorites]


If the form is dumb and there's no authentication, your solution could be as simple as writing something in R to generate a URL like this:
http://fedsite.gov/formsubmission.php?breakfast=waffles&lunch=cheeseburgers&dinner=hamburgers
Then you'd use R to request that URL, wait and see if you get a confirmation page, and then you'd know that record was submitted.
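Here is the same idea sketched in Python rather than R, using only the standard library. The address is the placeholder from the example URL above, and the "Thank you" confirmation phrase is an assumption; the real page would have its own wording.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Placeholder address taken from the example URL above; not a real endpoint.
FORM_URL = "http://fedsite.gov/formsubmission.php"


def build_url(record):
    """Turn one record into a GET URL, escaping spaces and punctuation."""
    return FORM_URL + "?" + urlencode(record)


def submit(record, confirmation_text="Thank you"):
    """Request the URL and check the page for a (hypothetical) confirmation phrase."""
    with urlopen(build_url(record), timeout=30) as response:
        page = response.read().decode("utf-8", errors="replace")
    return confirmation_text in page


print(build_url({"breakfast": "waffles", "lunch": "cheeseburgers", "dinner": "hamburgers"}))
```

The `print` line reproduces the example URL exactly. The point of `urlencode` is that values with spaces or ampersands get escaped for you, so "mac & cheese" becomes `mac+%26+cheese` instead of breaking the query string.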

But if you have to log in to get to the form, or there's a CAPTCHA, or the form changes in response to what you enter, each of those would make this more annoying.

Really though, the issue is that it's going to be hard to test this without generating a bunch of potentially garbage submissions.
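One way to soften that problem is a dry-run switch, so the script can be exercised end to end without touching the live site, followed by a real run limited to a record or two that you check by hand. A minimal sketch, where `send` stands in for whatever function actually submits a record (hypothetical here):

```python
def submit_batch(records, send, dry_run=True, limit=None):
    """Send each record with `send`, or just list what would be sent.

    `send` is any function that takes one record and returns True on success.
    `limit` caps how many records are processed (None means all of them).
    """
    report = {"would_send": [], "sent": [], "failed": []}
    for record in records[:limit]:
        if dry_run:
            # Nothing leaves the machine; just record what would have gone out.
            report["would_send"].append(record)
        elif send(record):
            report["sent"].append(record)
        else:
            report["failed"].append(record)
    return report
```

Defaulting `dry_run` to `True` means a real submission only happens when someone asks for it explicitly, e.g. `submit_batch(records, send, dry_run=False, limit=1)` for a first live test.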

memail me if you want to chat about this.
posted by gregr at 8:27 AM on June 4, 2018 [1 favorite]


Selenium is definitely the best web automation tool, but these kinds of things are fundamentally brittle, especially so in this case since you do not control the interface that you are automating against. Have you reached out to the Fed person who runs the system? They may be interested in figuring out an alternative way of submitting that data, one that could be offered to other organizations as well.
posted by rockindata at 9:44 AM on June 4, 2018 [3 favorites]


The current term of art for this is Robotic Process Automation (RPA). I'm not super well versed in this space, but I've always seen Selenium as more focused on testing than on automating processes, although I am sure with a little programming you can make Selenium do what you want. The RPA vendors will definitely have offerings that speak to your need, although I can't say whether you can get the ease of use you want at a price point you can afford.
posted by mmascolino at 10:47 AM on June 4, 2018 [1 favorite]


You can do something similar to what gregr is saying by using Postman's advanced features. You'll want to look into the "Runner" feature, which will allow you to upload raw data as a CSV, JSON or even just raw text and make your web call on every unit of data you send.

You can set up environments with key/value pairs as well as global variables for data you need to persist. There's also a command-line tool called Newman that can be used if you need to keep the return data locally. Otherwise the runner will tell you if your request passed or failed.

Postman seems like a relatively simple app but I've been digging into it lately and there's a lot of power there.
posted by bendy at 8:47 PM on June 4, 2018


Are you comfortable with the Unix command line, in Linux or macOS for example? If so, the results of a Google search for form fill wget csv might be of interest. If the form is a conventional one without crap like all sorts of JavaScript or hidden fields, and the data is high quality and saved to a .csv file, it seems like it might be doable with a single-line command. xargs might be helpful too, but the Google results were better without it in the search.

This might qualify as "less-complicated" but it's only "more user-friendly" if you're already comfortable working in the command line. (Or if you have some additional motivation because you want to learn, I guess.)
posted by XMLicious at 9:15 PM on June 4, 2018

