Batch job screen capture from database help?
May 21, 2013 9:16 AM

Let's say there is a database that I can only access through a front-end tool, and that database cannot provide an extract of any sort. I also have no access to run reports off that database. And let's say that the only way to preserve that data is to print to PDF or take screenshots. By the way, this is all legit; you would not be helping me do something malicious.

Finally, the front end of this database requires multiple 'clicks' to move from form to form and field to field in order to 'see' all the data elements that need to be preserved. The multiple forms and fields must then be compiled into individual records and saved.

Since there might be several million entries to capture, tell me, oh hivemind, how you might do this. Is there a batch screen capture program that can do this work? Screen OCR? Other ideas?
posted by BrodieShadeTree to Technology (14 answers total) 4 users marked this as a favorite
 
Is the vendor of this app still in business? I can't believe they would not be willing to provide an export- even if they charge a reasonable (nominal) fee, it's worth well more than the time needed to view and extract the data record-by-record.
posted by mkultra at 9:23 AM on May 21, 2013


If the 'frontend tool' is in a web browser, that'd be very promising, but I'm assuming not. In that case, the first thing is to see if the PDF has text in it; Google "PDF text extractor" for a bunch of possibilities. The next thing is to throw Wireshark on your machine and see if there's unencrypted client-server chatter you can capture and parse.
posted by Monsieur Caution at 9:31 AM on May 21, 2013 [1 favorite]


mkultra: "Is the vendor of this app still in business? I can't believe they would not be willing to provide an export- even if they charge a reasonable (nominal) fee, it's worth well more than the time needed to view and extract the data record-by-record."

For what it's worth, I have first-hand experience with several of these systems - and yes, they often are unwilling to provide an export, for less than a truly staggering sum. It's a (disgusting) vendor lock-in strategy, but it's very real. I once spent a summer doing manual data entry from one system to another, because the former was intentionally designed to be inaccessible except through the provided front-end. After all, if you can't get your data out, you have no choice but to keep paying for support and upgrades...
posted by Tomorrowful at 9:31 AM on May 21, 2013


If printing shows you the whole result, in table form, print to a text file and then parse it in a spreadsheet (e.g., use Excel's Text to Columns feature).
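
If you'd rather script that step than do it by hand, here is a minimal sketch of the same idea in Python. It assumes (and this is only an assumption, since nobody here has seen the actual printout) that the columns are separated by runs of two or more spaces. The sample rows are made up for illustration.

```python
import csv
import io
import re

def parse_printed_table(text):
    """Split a printed, column-aligned text table into rows of cells.

    Assumption: columns are separated by two or more spaces, so a single
    space inside a value (e.g. a full name) is kept intact.
    """
    rows = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and rule lines such as "----  ------".
        if not line or set(line) <= set("-= "):
            continue
        rows.append(re.split(r"\s{2,}", line))
    return rows

def rows_to_csv(rows):
    """Serialize the rows as CSV text that a spreadsheet can open directly."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(rows)
    return buf.getvalue()

# Made-up sample standing in for a print-to-text-file dump.
sample = """\
ID    Name          City
----  ------------  ----------
1001  Ada Lovelace  London
1002  Alan Turing   Wilmslow
"""

rows = parse_printed_table(sample)
csv_text = rows_to_csv(rows)
print(csv_text)
```

If the real printout uses fixed column positions rather than wide gaps, slicing each line at those positions would be the safer variant.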
posted by ubiquity at 9:32 AM on May 21, 2013 [1 favorite]


Tomorrowful: "For what it's worth, I have first-hand experience with several of these systems - and yes, they often are unwilling to provide an export, for less than a truly staggering sum."

Yeah, I've had experiences on both sides of that fence. It actually reminded me that you (OP) should check your contract- there may be a clause about data ownership you can leverage with them to get this done.
posted by mkultra at 9:39 AM on May 21, 2013


Response by poster: This is that situation where the data is not available as mentioned above.

The data is accessed thru VPN to the data site. No chatter.

We are reviewing the contract now to be sure.

I don't think we can get a single text file of the comprehensive entry, and certainly not in a table. It's a good idea, though, and that's why I asked about the batch print screen/OCR idea- to get to that point.
posted by BrodieShadeTree at 10:00 AM on May 21, 2013


Best answer: Could you automate copy/pasting from the form into Excel using a GUI scripting engine?

It's been years since I've done something like this, so I don't know what products are good these days. Another askmefi thread (from 2006) recommended Automate, which is clearly still in business, so it might be mature enough for this job.
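
Whatever tool ends up doing the clicking, the loop it drives would probably look something like the sketch below. This is not tied to any particular product: `copy_field` and `next_record` are hypothetical hooks that the GUI scripting engine would have to supply, and `FakeFrontEnd` is only a stand-in so the loop can be tried without the real application.

```python
import csv
import io

def capture_records(field_names, copy_field, next_record, count, out):
    """Copy each field of `count` records and write one CSV row per record.

    copy_field(name) -> str : hypothetical hook that focuses a field in the
                              front end, copies it, and returns the text.
    next_record()           : hypothetical hook that clicks to the next entry.
    """
    writer = csv.DictWriter(out, fieldnames=field_names, lineterminator="\n")
    writer.writeheader()
    for i in range(count):
        writer.writerow({name: copy_field(name) for name in field_names})
        if i < count - 1:
            next_record()

class FakeFrontEnd:
    """Stand-in for the real application, used only to exercise the loop."""

    def __init__(self, records):
        self.records = records
        self.position = 0

    def copy_field(self, name):
        return self.records[self.position][name]

    def next_record(self):
        self.position += 1

fake = FakeFrontEnd([
    {"id": "1", "name": "Example A"},
    {"id": "2", "name": "Example B"},
])
out = io.StringIO()
capture_records(["id", "name"], fake.copy_field, fake.next_record, 2, out)
result = out.getvalue()
print(result)
```

The resulting CSV opens directly in Excel, which avoids having to script the spreadsheet side as well.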
posted by rouftop at 10:12 AM on May 21, 2013


You should be able to extract the text from a PDF without resorting to OCR. But I'm curious about the nature of your front-end program. Is it a local app, or a web page? If it's a local app, I'd first try watching its network traffic with Wireshark to see if I could figure out its communication protocol; that would allow you to write your own client for scraping the database, and it may or may not be feasible/practical depending on how much thought your vendor has given to security.
posted by qxntpqbbbqxl at 10:13 AM on May 21, 2013


Response by poster: qxntpqbbbqxl,

It is a non-local app accessed thru a VPN tunnel and run on the remote data store. There is no traffic to watch. It is protected data and would be considered non-sniffable/non-scrapable.
Some of the forms and fields that we would like to recover or preserve have no print or print-to-PDF option, so direct screen capture is the only option.
posted by BrodieShadeTree at 10:21 AM on May 21, 2013


Best answer: For the automation & screen capture end of it, have a look at Sikuli. It does visual automation beautifully and will work fine over a VPN. I could probably whip something together in a few hours for the screen capturing with Sikuli.
posted by Brent Parker at 10:23 AM on May 21, 2013 [2 favorites]


Oh and Sikuli has some basic OCR built in if you want to make some variables based on a screen capture while a script is running.
posted by Brent Parker at 10:24 AM on May 21, 2013


Best answer: If some prick were proposing to charge me multiple thousands of dollars for access to data I already own, I'd be swearing furiously and firing up the coffee maker and AutoIt and having an honest crack at writing a script to automate the application's controls and scrape its windows, even if doing it that way cost me more in my time than they were proposing to charge me, which it probably would. I hate vendor lock-in.

BrodieShadeTree: "It is protected data and would be considered non-sniffable/non-scrapable."

Can you copy from fields in the front end's windows and paste into a text file? If so, AutoIt can certainly grab the data and do whatever you like with it.
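
As for the "do whatever you like with it" part, here is a rough sketch (in Python rather than AutoIt) of how copied fields from several forms might be compiled into one record each and written out as plain text. The form names, field names, and values are invented for illustration, and one JSON object per line is just one convenient choice of text format.

```python
import json
from collections import defaultdict

def compile_records(captures):
    """Group (record_id, form_name, fields) captures into one dict per record.

    Field keys are prefixed with the form name so that two forms with a
    same-named field do not overwrite each other.
    """
    records = defaultdict(dict)
    for record_id, form_name, fields in captures:
        for field, value in fields.items():
            records[record_id][f"{form_name}.{field}"] = value
    return dict(records)

def to_json_lines(records):
    """Render one JSON object per line, a text format that is easy to append to."""
    return "".join(
        json.dumps({"id": rid, **fields}, sort_keys=True) + "\n"
        for rid, fields in sorted(records.items())
    )

# Invented captures: the same record seen through two different forms.
captures = [
    ("1001", "contact", {"name": "Example A"}),
    ("1001", "billing", {"name": "Example A Ltd", "terms": "net 30"}),
    ("1002", "contact", {"name": "Example B"}),
]

records = compile_records(captures)
text = to_json_lines(records)
print(text)
```

Keeping the form name in each key means nothing is lost when two forms reuse a label, which matters if the records have to be audited against the original screens later.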
posted by flabdablet at 10:46 AM on May 21, 2013 [1 favorite]


Response by poster: Yes, we can copy from fields!
posted by BrodieShadeTree at 11:18 AM on May 21, 2013


Best answer: I don't know how you're getting to the data through VPN, but if it's via a remote desktop viewer (e.g. Citrix), you may struggle.

From this Stack Overflow question, I saw this video about this app, Macro Scheduler, which looks as though it may do what you want.

On preview, though, it seems Sikuli, mentioned above, does a very similar thing. It seems if you can see it on screen, then you can retrieve it.
posted by ambrosen at 12:07 PM on May 21, 2013

