Join 3,556 readers in helping fund MetaFilter (Hide)


Can computer geekery save me from data entry hell?
November 10, 2010 10:48 AM   Subscribe

Can you help me automate this big data entry project? There has got to be a way to harness the power of my computer to make the process easier than entering everything line by tedious line on a website!

My company keeps its files at an off-site storage facility. Over the years each person responsible for transmitting the storage data has had their own idea about what information should go in each field, so the inventory report is a giant mess. I'm trying to normalize it but I'm at a loss as to how to do so in the least painful way.

The inventory report from the storage facility comes in csv or xls format. There are a couple thousand records, each with 12 variables. I was hoping I could just correct data in the inventory report and send it back to them for upload, but no dice. My options are to go to their website, search each record, and then correct the data line by line, or to pay them an exhorbitant fee to do the same manual process themselves (so they say).

I'm thinking there has got to be a way to write a simple program that can at least take the data from my (corrected) spreadsheet and populate the individual fields on the website after I search each record and get to the edit screen. Or something. I'm not a computer programmer but I'm also not a dummy, and it seems like this wouldn't be the most elaborate programming to learn how to do. But I just don't know where to begin. Can anyone point me in a good direction? Thanks!
posted by Balonious Assault to Computers & Internet (18 answers total)
 
That's kind of ridiculous of them. I've worked with Iron Mountain and Recall in the past and they were both willing to let me email them spreadsheets with changes. I don't know how big your company is or how much pull you have but perhaps you could strongly hint that their help here would go a long way towards ensuring they get to keep your business in the future.
posted by ghharr at 10:56 AM on November 10, 2010


You could probably cobble something together using cURL.
posted by usonian at 10:58 AM on November 10, 2010


This is not super trivial actually. It is kind of difficult to write a program to interact with an existing web browser, so your best bet might be to automate the whole process.

I am a perl programer so this is what I would do (assuming they are using a web form rather then some sort of applet):

First you save the data in a csv or something like that.

Next you grab the source code of the page and find out what variables go where.

Then you have your program that:

Sets up an internet user agent (via LWP). The security might be a small issue at this point but it can be dealt with.

Creates and sends the request. If it is a GET then it is just sending the url. If it is a POST then you have to create the url and the payload.

Examines the reply to make sure there were no errors

Repeat till done.


It would be a not so bad program to use to learn how to program if that is what you want to do. There might be some non programatic ways to do this with web automation stuff. Automated QA programs might do the trick.

Good luck
posted by d4nj450n at 11:08 AM on November 10, 2010


You could probably do this with looped Ajax posts using Javascript or looped urlopen's in Python. But as d4nj450n said above, this isn't exactly trivial and there is some legwork that needs to be done so you know how to structure the posts
posted by jasondigitized at 11:42 AM on November 10, 2010


In principle you could use a tool like Watir to do this.

Whether it's really worth what it would take for you learn how to build a robust automated script to do the job is another matter.
posted by philipy at 11:47 AM on November 10, 2010


The best way to do stuff like this is Perl's WWW::Mechanize, but with the proviso that you're dealing mostly with HTML on the website where the data needs to be entered.

Often these days there's a ton of Javascript involved in the page and that's harder to interact with, you probably need to drive an actual browser in that case, using something like Selenium.

I'd offer to help -- email me -- but it's going to be difficult writing code to interact with their system without being able to log in to it.

Seconding ghharr too, they really ought to be able to do this for you at their end.
posted by AmbroseChapel at 11:52 AM on November 10, 2010


It's definitely possible that you won't save a whole lot of time when you factor in the time to learn to write a Perl or Ruby, and then debug your script. However, scripting is way more fun than data entry (at least to me, a professional programmer) and you're learning a useful skill.

I agree that LWP or Watir could both do the job. A debugger like Firebug may come in handy as well, to inspect the pages you're trying to automate and see what different HTML elements and form inputs are called. I think that using Watir could be easier for a beginner. It won't be as fast for bulk operation, but I think it let's you step through your script line by line to get some confidence that it's doing the right thing. (I've never used it, but my impression is that you're literally driving a browser and can see the results as they happen.)

The trickiest part will be making sure your script doesn't misbehave and make changes you didn't intend. Test a single entry very well, then a loop of just a few changes, before going all out. If you go the LWP route you should be able to have it write out the POST it would send without actually sending it, as a kind of dry-run. Program defensively, to abort the script if things aren't as it expects. If the website has an audit trail showing all the changes you've made then you can feel much more confident.

As to whether this is too ambitious for a beginner to programmer, this really depends on the kind of aptitude you have for it. There are plenty of folks with little or no programming education who do this level of scripting in their jobs.
posted by serathen at 12:17 PM on November 10, 2010


You might even consider writing your own log file, so you have some idea of what your script tried to do. Also, it's been a while since I've done much Perl coding, so I will defer to AmbroseChapel on which particular module you want.
posted by serathen at 12:28 PM on November 10, 2010


I would probably do this with a Greasemonkey script if I had to, but once again, you'd need to know javascript.
posted by pyro979 at 2:02 PM on November 10, 2010


Thanks for all the suggestions so far. I'm looking forward to researching them further. I'm not averse to learning some programming, but I do have to weigh time spent learning vs. time spend just doing it manually. I am certainly going to start wuth ghharr's suggestion of finding out how much they really value our business. It does seem ridiculous that they wouldn't be able to work with me if properly motivated.

I'm probably just exposing my ass for all to see, but I was thinking it would relatively easy to come up with something that's essentially a scripted set of copies, tabs and pastes, that could take the data from Excel and put it into the correct fields in IE, to help ease the tedium. An inelegant process something like this:

1. I search the website for a single record (manually)
2. I click 'Edit' (manually)
3. I run the Program (to copy the data for that record from my spreadsheet and paste it in the variable fields for that record -- the fields are all on the same webpage and I can tab between them)
4. I click 'Enter' (manually)
5. I repeat the process until I'm ready to shoot myself in the head.

Thanks again for all the help.
posted by Balonious Assault at 2:24 PM on November 10, 2010


While I'm not terribly familiar its capabilities, this sounds like the sort of thing yahoo pipes might be useful for.
posted by zen_spider at 4:09 PM on November 10, 2010


I think I have to retract my post if you have to manually search, i.e. if you can't programmatically get to the record you want.

If you know that item #142857 in the database needs updating in fields X, Y and Z, then you can probably do it all without a GUI browser. But if you need a human to steer the browser to a certain URL before the process can even begin, that's not so easy to automate.
posted by AmbroseChapel at 6:48 PM on November 10, 2010


Yeah, no, I'm just tossing out ideas as a non-programmer. Fully automated would be great and I'm going to try to go that route if I can. Just trying to get a feel for whether semi-automated might be a more realistic goal for me. I sincerely appreciate the ideas and the advice. At this point it's turned from a dreaded project into an interesting puzzle with a possible solution, so things are looking up for me at least for the time being. Luckily I don't have a deadline.
posted by Balonious Assault at 6:55 PM on November 10, 2010


The key thing you have to address is whether you can look up the records in the website by some kind of unique key which is also available on your end.

If you have a spreadsheet where Consolodated Widgets has an ID column with 142857 in it, and when you look Consolodated Widgets up on the website, the URL looks roughly like domain.com/dosomething.php?id=142857 then it might still be fairly simple.
posted by AmbroseChapel at 6:59 PM on November 10, 2010


Sounds like a job for Firefox + iMacros. iMacros runs a kind of pseudo-scripting language, so I would tackle this by using e.g. Perl to pull the fields from your spreadsheet and constructing an iMacros script, then running that script from within Firefox.

Is there anyone who's into programming in your business who you can ask for help? This is the sort of thing that a certain type of person (i.e. a geek like me) would be interested in tackling as a intellectual exercise, if you don't have the time/inclination to learn one and a half languages yourself (I am counting the iMacros scripts as a half-language!)
posted by primer_dimer at 2:28 AM on November 11, 2010


As everyone is posting sensible replies with respect to programming a solution to interact over HTTP etc., I wanted to come completely out of left field and suggest you use Sikuli:
Sikuli is a visual technology to search and automate graphical user interfaces (GUI) using images (screenshots).
An excerpt from an MIT press article:
In a new paper to be presented this spring at CHI, the premier conference on human-computer interactions, the researchers describe a new application of Sikuli, aimed at programmers working on large software development projects. On such projects, new code accumulates every day, and any line of it could cause a previously developed GUI to function improperly. Ideally, after a day’s work, testers would run through the entire application, clicking virtual buttons and making sure that the right windows or icons still pop up. Since that would be prohibitively time consuming, however, broken GUIs may not be detected until the application has begun the long and costly process of quality assurance testing.

The new Sikuli application, however, lets programmers create scripts that automatically test an application’s GUI components. Visually specifying both the GUI and the window it’s supposed to pull up makes writing the scripts much easier; and once written, they can be run every night without further modification.
I've used Sikuli only once; surprisingly it works. Unfortunately you'll need a better-than-beginner's knowledge of Python. Start at the "Demos" section, then read the "Documentation". Good luck!
posted by asymptotic at 5:42 AM on November 11, 2010


Along the lines of iMacros and Sikuli are chickenfoot and autohotkey.

Google have just released a data cleanup tool called Refine.
posted by Giant luck at 6:18 AM on November 11, 2010


I think you could use something like Selenium or Watir to implement your semi-automated solution pretty nicely. You keep a file of your changes (or just use the corrected CSV file). Then use the unique ID number of the item you're correcting as a parameter to the script.

Your workflow would be simply:
  1. Manually search for the item to correct
  2. In another window run your script with the item ID as the parameter
  3. The script will cause the browser to fill in each field in the form using your input file
  4. If the results look reasonable, you press submit and go on to the next item.
You'd still need Firebug to see what the names of form fields are, but if you're only using the script to autofill fields and not search for or submit anything then it becomes a much easier programming task. I think you can develop interactively, writing commands for the browser from the Ruby interactive shell and seeing the result immediately.
posted by serathen at 7:19 AM on November 11, 2010


« Older Please recommend some good wat...   |  Every week during Poetry Works... Newer »
This thread is closed to new comments.