Can you transfer the info from a pdf to a web form?
April 28, 2022 7:00 AM

My work requires that I collect PDF forms from clients and transfer the info off those forms onto a web based form and then click submit.

The fields on the form are pretty standard: Name; Email, Phone, Business address, etc.

Though it doesn't take too long to cut and paste each field's info into the web form fields, I was wondering if there is a way or a program that would let me automatically copy all the info from the PDF I get and paste it all into the web form in one click and submit. ... Or something similar that would make the process faster. Speed is important for what I do.

The web based form is a third party company so we can't change anything on that end. But we can change the PDF we use and any programs on our end that would help us submit all the info quickly. Does something exist? Is this something I can program myself?
posted by fantasticness to Computers & Internet (11 answers total) 2 users marked this as a favorite
 
What you are looking for is called a PDF parser, Google that. Example.
posted by beagle at 7:07 AM on April 28, 2022


Response by poster: I just looked up PDF parser. It might work. Is there a way I can try something out as a personal account for free? We don't get the volume we used to before covid, so not sure my employer would be interested in anything that costs right now.
posted by fantasticness at 7:32 AM on April 28, 2022


Can you use the .pdf as an image, with the form's fields being actual input fields?
posted by theora55 at 9:31 AM on April 28, 2022


Is there a way I can try something out as a personal account for free?
Yes, for example, this one offers a free plan that lets you parse 30 pages a month. For $30 a month you can do 350 pages, which should not break your boss's budget. Investigate others, there may be better deals.

All that said, the output you get is going to be in spreadsheet format, .csv or .xlsx. So you'll need to be able to use a spreadsheet as input into whatever you are cutting and pasting the data into now. Also, you might look into whether that third party outfit where the data originates can supply spreadsheets instead of PDFs (which seems like an odd format for this kind of data).
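
For a rough idea of what "use a spreadsheet as input" could look like, here is a minimal Python sketch that reads parser-style CSV output into one record per row. The column headers and the form field names in COLUMN_TO_FIELD are made up for illustration; the real ones depend on the parser's export and on the web form.

```python
import csv
import io

# Hypothetical mapping: CSV column header -> name of the matching web form field.
COLUMN_TO_FIELD = {
    "Name": "full_name",
    "Email": "email",
    "Phone": "phone",
    "Business address": "address",
}


def rows_to_form_records(csv_text: str) -> list[dict[str, str]]:
    """Turn CSV text into one dict of form fields per data row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    records = []
    for row in reader:
        # Missing columns become empty strings instead of raising an error.
        record = {
            field: (row.get(column) or "").strip()
            for column, field in COLUMN_TO_FIELD.items()
        }
        records.append(record)
    return records


# Made-up sample data in the shape a parser export might have.
sample_csv = (
    "Name,Email,Phone,Business address\n"
    'Ada Lovelace,ada@example.com,555-0100,"1 Main St, Springfield"\n'
)
records = rows_to_form_records(sample_csv)
print(records)
```

Each record is then ready to be typed, pasted or posted into the form, whichever way that ends up happening.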
posted by beagle at 11:07 AM on April 28, 2022


Is this something I can program myself?

It seems like that should be doable. Mozilla's PDF.js library is baked into recent versions of Firefox and is also loadable from various handy CDNs as a script, and this PDF object browser proves that it can be used to load a PDF, parse it as an object tree and then Do Stuff with its pieces.

Try using that to open one of your form PDFs and have a poke about inside its Trailer.Root.AcroForm.Fields sub-object to see if you can find the fields you're interested in.
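
If poking at the fields from a script is more convenient than a browser, a rough Python equivalent might look like the sketch below. It swaps in the third-party pypdf package instead of PDF.js, and it assumes the PDFs are true fillable forms (AcroForm) rather than scans. The function names are made up for illustration.

```python
from typing import Any, Mapping, Optional


def field_values(fields: Optional[Mapping[str, Any]]) -> dict[str, str]:
    """Flatten {field name: field dictionary} into {field name: value}.

    "/V" is the key a PDF form field uses for its current value.
    """
    values = {}
    for name, field in (fields or {}).items():
        value = field.get("/V")
        # Fields that were left blank have no "/V" entry at all.
        values[name] = "" if value is None else str(value)
    return values


def read_pdf_fields(pdf_path: str) -> dict[str, str]:
    """Read the form fields of one PDF. Needs `pip install pypdf`."""
    from pypdf import PdfReader  # third-party, imported only when used

    reader = PdfReader(pdf_path)
    # get_fields() returns None when the PDF has no fillable form.
    return field_values(reader.get_fields())
```

Calling read_pdf_fields on one of the client PDFs and printing the result should show fairly quickly whether the field names are usable or whether the form needs renaming on your end.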

Shouldn't take a competent web dev (of whom I am emphatically not one) terribly long to hack up a little web page, probably built mostly from recycled pieces of that object browser, that lets you load a PDF in some convenient fashion and then marshals the specific fields you care about into a POST request directed at your web submission page. That would reduce your workflow to a single file selection operation per PDF. If you wanted to go in even harder you could probably make something you could point at a whole folder full of PDFs and have it automatically process all of them.
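
To make the "marshal the fields into a POST request" part concrete, here is a small sketch using only Python's standard library. The URL and field names are placeholders, and it assumes the third-party form accepts an ordinary URL-encoded form submission, which the browser's dev tools would confirm or refute.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint; the real one is whatever the web form submits to.
SUBMIT_URL = "https://forms.example.com/submit"


def build_form_post(url: str, fields: dict[str, str]) -> Request:
    """Package field values as a URL-encoded POST, like a plain HTML form."""
    body = urlencode(fields).encode("utf-8")
    return Request(
        url,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


request = build_form_post(
    SUBMIT_URL,
    {"name": "Ada Lovelace", "email": "ada@example.com"},
)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))

# Actually sending it is one more call, left commented out so that nothing
# gets submitted by accident:
# from urllib.request import urlopen
# with urlopen(request) as response:
#     print(response.status)
```

Building the request separately from sending it makes it easy to eyeball what would go over the wire before pointing it at the real service.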
posted by flabdablet at 1:30 PM on April 28, 2022 [1 favorite]


It's almost certainly possible, but I don't believe anything exists that will do what you want out of the box, especially if the web form is literally the only way to get data into this system. If they're smart (which is an assumption that may or may not be valid), it will have an authentication layer that would make it difficult to just POST data to their service, which would be the biggest issue to overcome in my mind.

Might look at something like AutoHotKey if you want to try creating shortcuts to automate some of the process of doing the copy & paste method (like, perhaps you're able to parse the PDF data into a spreadsheet and create an AHK script to copy & paste all of the fields for a single record into your browser at once), though it'd be on the hacky side.
posted by Aleyn at 3:31 PM on April 28, 2022


it will have an authentication layer that would make it difficult to just POST data to their service

Most such things are pretty easy to figure out by using the browser's web dev tools to watch what the official submission page actually does. Typically, the result of authentication will be that the browser gets handed some kind of session cookie, and it's usually easy enough to stick that into the right place in subsequent script-driven POSTs.
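
As a sketch of what sticking the cookie into the right place can mean in practice, assuming the site uses a single session cookie: the cookie name and value below are invented, and the real ones would be copied out of the dev tools after logging in normally.

```python
from urllib.request import Request


def attach_session_cookie(request: Request, cookie_name: str, cookie_value: str) -> Request:
    """Add a Cookie header so the server treats the request as logged in."""
    request.add_header("Cookie", f"{cookie_name}={cookie_value}")
    return request


# Placeholder URL, body and cookie, for illustration only.
request = Request(
    "https://forms.example.com/submit",
    data=b"name=Ada",
    method="POST",
)
attach_session_cookie(request, "sessionid", "abc123")
print(request.get_header("Cookie"))
```

Some sites also check a hidden anti-forgery token in the form itself, so it is worth watching for one in the captured request as well.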
posted by flabdablet at 4:24 PM on April 28, 2022 [1 favorite]


That said, building an AutoIt/AutoHotKey Rube Goldberg machine probably would be more fun. You might even be able to come up with one that uses the PDF.js object browser demo page as-is to do the PDF parsing, then scrape field values from that.
posted by flabdablet at 4:25 PM on April 28, 2022


Why not change the PDF that you send to clients? Make it POST the data directly to the third party?

Stack overflow: PDF form data submission
posted by Monochrome at 7:43 PM on April 28, 2022


Cutting and pasting all those form fields seems horribly tedious! I agree with Monochrome - if people are filling out this form on the web, it's not difficult to change the form so it can be filled out in a browser. Depending on your audience you may need to add some text to let people know that they can add their information in the browser.

Alternately:

You have three steps here:
1. Grab the data from the submitted PDFs
2. "Paste" the data into the web form
3. Submit the form

Are the submitted PDFs coming in on paper or emailed as attachments? Whichever way you get them there will be a way to get the text - OCR or ??

As you ingest the text, use some scripting language to build an array of objects (I'm a JavaScript/Node.js developer so that's how I think). That would be a relatively easy process.
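
Roughly what I mean, sketched in Python rather than JavaScript, and assuming the extracted text comes out as one "Label: value" pair per line (real OCR output is usually messier than this):

```python
def parse_labelled_text(text: str) -> dict[str, str]:
    """Turn lines like "Name: Ada Lovelace" into {"name": "Ada Lovelace"}."""
    record = {}
    for line in text.splitlines():
        # Split on the first colon only, so values such as URLs stay intact.
        label, separator, value = line.partition(":")
        if separator and label.strip():
            record[label.strip().lower()] = value.strip()
    return record


def parse_documents(texts: list[str]) -> list[dict[str, str]]:
    """Build the "array of objects": one dict per ingested document."""
    return [parse_labelled_text(text) for text in texts]


# Made-up sample text standing in for one extracted PDF.
sample_text = "Name: Ada Lovelace\nEmail: ada@example.com\nWebsite: https://example.com\n"
print(parse_documents([sample_text]))
```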

Submit the form using node-fetch or axios or? If there's an API it would be super easy.

Hello, I'm a nerd who likes to overthink things.
posted by bendy at 10:00 PM on April 28, 2022


It's built-in functionality of form-fillable pdfs.

You can create a submit button that either uploads the field entries to a web location, or through an email. If you have SharePoint and a (semi) public folder, you could have each pdf submission write a line to a csv file that you can work with as a spreadsheet. Or write a script in Excel to import the info from a specified folder of filled-out pdfs.
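
For the "one line per submission in a csv" idea, the receiving side could be as small as this Python sketch. The column names are hypothetical, and it assumes something on your end already hands each submission over as a dictionary of field values.

```python
import csv
import tempfile
from pathlib import Path

# Hypothetical column order for the spreadsheet.
FIELDNAMES = ["name", "email", "phone", "address"]


def append_submission(csv_path, submission: dict[str, str]) -> None:
    """Append one submission as a CSV row, writing the header on first use."""
    path = Path(csv_path)
    write_header = not path.exists() or path.stat().st_size == 0
    with path.open("a", newline="", encoding="utf-8") as handle:
        # Unknown keys are ignored; missing keys are written as empty cells.
        writer = csv.DictWriter(handle, fieldnames=FIELDNAMES, extrasaction="ignore")
        if write_header:
            writer.writeheader()
        writer.writerow(submission)


# Demo against a throwaway folder so no real file is touched.
with tempfile.TemporaryDirectory() as folder:
    target = Path(folder) / "submissions.csv"
    append_submission(target, {"name": "Ada Lovelace", "email": "ada@example.com"})
    append_submission(target, {"name": "Bob", "phone": "555-0100", "fax": "n/a"})
    demo_lines = target.read_text(encoding="utf-8").splitlines()
print(demo_lines)
```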
posted by porpoise at 6:58 PM on April 29, 2022

