From Concept to Code? Ingest checklist, present follow-up questionnaire.
February 6, 2021 1:51 PM

I'm a longtime dabbler in coding, and for the first time have a project that goes beyond the usual paint-by-numbers stuff I'm used to. I'd like to create a web app that ingests a checklist (as a PDF, either with the metadata intact or scanned), determines which boxes on the form are checked, and uses that to present appropriate follow-up questions for user input.

Then I need to take those answers and put them into a database so that the information can be used to fill out other documents, be merged into that user's CRM entry, etc.

My dabbling thus far has been mostly language-agnostic, so while I'm most comfortable with Python, I'll live if I should be doing this in Rails or some JavaScript-based thing.

This will begin life as an internal tool but it would be neat if it had potential to turn into something I could market to others in my line of work.

I've tried building what I need in existing universal form apps like Formstack and Wufoo, and none of them were up to the task. (Not interested in having to buy into something like Salesforce.)

The online database app builder things I've seen seem like they're intended more for intranet type apps than customer-facing tools, or are obtuse enough that it seems like I'd be better off just building my own web app.

So, how do I get my arms around a project like this for the first time?

I know this isn't a typical AskMe question, but there are lots of coding people here and this feels like too general of a question for StackExchange.
posted by Sockdown to Computers & Internet (9 answers total) 1 user marked this as a favorite
Best answer: Does it have to be PDF, or will something like Google Forms (with an ingestion API) or a web form suffice? I ask because the PDF part sounds orders of magnitude more painful than the rest of the requirements.
posted by gregglind at 3:22 PM on February 6, 2021

Response by poster: I could live without it. Especially with a prototype, I can just look at the submitted forms and check the boxes on a webform.
posted by Sockdown at 3:50 PM on February 6, 2021

Best answer: If you're comfortable enough with Python already, I'd suggest using a Python-based web framework like Flask. (Django is the other big popular one, but it's got an everything-and-the-kitchen-sink approach that can be kind of overwhelming if you're just learning.)

Depending on how comfortable you are with databases, you can start out with SQLite (which is built into Python) and then move on to something more robust (but still free) like MariaDB or PostgreSQL. Flask has an optional add-in that will let you interact with databases (Django has one included), or you can directly execute SQL queries from Python.
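To give a feel for the "directly execute SQL queries from Python" route, here is a minimal sketch using only the `sqlite3` module from the standard library. The table and column names (`answers`, `respondent`, `question`, `answer`) and the sample data are made up for illustration; they are not taken from the actual form.

```python
import sqlite3


def open_db(path=":memory:"):
    """Open (or create) the database and make sure the answers table exists."""
    conn = sqlite3.connect(path)
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS answers (
            id         INTEGER PRIMARY KEY,
            respondent TEXT NOT NULL,
            question   TEXT NOT NULL,
            answer     TEXT NOT NULL
        )
        """
    )
    return conn


def save_answers(conn, respondent, answers):
    """Store one respondent's answers; `answers` maps question id -> answer text."""
    rows = [(respondent, q, a) for q, a in answers.items()]
    # "?" placeholders let sqlite3 handle quoting, which avoids SQL injection.
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT INTO answers (respondent, question, answer) VALUES (?, ?, ?)",
            rows,
        )


def load_answers(conn, respondent):
    """Return the stored answers for one respondent as a dict."""
    cur = conn.execute(
        "SELECT question, answer FROM answers WHERE respondent = ? ORDER BY id",
        (respondent,),
    )
    return dict(cur.fetchall())


if __name__ == "__main__":
    db = open_db()  # in-memory database, handy while prototyping
    save_answers(db, "client-001", {"q1": "yes", "q2": "3 employees"})
    print(load_answers(db, "client-001"))
```

Passing a filename instead of `":memory:"` keeps the data between runs. If you later move to MariaDB or PostgreSQL, the SQL stays much the same, though the placeholder syntax depends on the driver.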

As for the PDF aspect, that might be a little trickier to work with, but there are Python PDF libraries available. I haven't used any myself, but the functionality is out there, and might be less trouble than messing around with Google forms. Or as gregglind says, a web form might suit your purposes.

I hope that all makes sense and is at least a little helpful. Let me know if you need any more detail.
posted by Mr. Bad Example at 4:14 PM on February 6, 2021 [1 favorite]

Response by poster: I think so, thanks.

It at least gives me a road to try to head down, instead of wondering how to get started.
posted by Sockdown at 4:16 PM on February 6, 2021

So you want to create a modern "scantron" type form except it looks like a regular form. Hmmm...

The problem here is there are all sorts of PDFs that look about the same, but have vastly different sizes due to their composition. Hypothetically, it'd be possible to wrap a JPEG in a PDF envelope and call it a PDF. It would not have any text to read, and you'd have to OCR it.

If you have a FILLABLE PDF then the matter is simple: just read the field contents.
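A rough sketch of what "just read the field contents" could look like. It assumes the third-party `pypdf` package for the actual file reading (an assumption; any PDF library that exposes form fields would do), and the function names are made up for illustration. In a fillable PDF, checkboxes are button fields (`/FT` of `/Btn`) whose value `/V` is `/Off` when unchecked and some other name, often `/Yes`, when checked.

```python
def checked_boxes(fields):
    """Map each button-type field name to True (checked) or False.

    `fields` is a mapping of field name -> dict-like object carrying the
    PDF keys "/FT" (field type) and "/V" (current value).
    """
    result = {}
    for name, field in (fields or {}).items():
        if str(field.get("/FT")) != "/Btn":  # skip text fields, dropdowns, etc.
            continue
        value = field.get("/V")
        # A missing or "/Off" value means unchecked; any other name is the
        # box's "on" state (often "/Yes", but the form author chooses it).
        result[name] = value is not None and str(value) != "/Off"
    return result


def read_checked_boxes(pdf_path):
    """Read the form fields from a fillable PDF (assumes `pip install pypdf`)."""
    from pypdf import PdfReader  # third-party; imported lazily on purpose

    # get_fields() returns None when the PDF has no form fields at all,
    # which checked_boxes() treats as "nothing checked".
    return checked_boxes(PdfReader(pdf_path).get_fields())
```

Note that `/Btn` also covers radio buttons and push buttons, so a real form may need a bit more filtering. A flattened or scanned PDF has no fields to read, so this comes back empty and you are in OCR territory.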
posted by kschang at 11:11 PM on February 6, 2021 [1 favorite]

Best answer: Eat the elephant one bite at a time by breaking the workflow into stages:
* Design the sheet layout and its matching database layout (e.g. a first database might hold the full question text and 1-5 scoring ranges, with the whole questionnaire in the DB, whereas a later iteration will just write question numbers and store what they're about somewhere else)
* Generate some sample PDFs to use as you build the rest of the system
* Web page that records an uploaded PDF
* Scanned image to data
* Read the PDF and process its data
* Prepare the data to insert into the database
* Query and report from your database

You can write the code that achieves each step in the workflow by writing tests of things the code should do, which initially fail and then pass, as your component gains code to do what it should. Then later when you need to change things, the tests keep working the bits that need to keep working while acting as scaffolding to your extensions.
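To make the test-first idea concrete, here is a small sketch for one stage (turning checked boxes into follow-up questions) using the standard `unittest` module. The checkbox names, the questions, and the `follow_up_questions` function are all invented for illustration. You would write the three tests first, watch them fail, then fill in the function until they pass.

```python
import unittest

# Hypothetical rules: checkbox name -> follow-up questions to ask if checked.
FOLLOW_UPS = {
    "has_pets": ["How many pets?", "What kind?"],
    "smoker": ["How many years have you smoked?"],
}


def follow_up_questions(checked, rules=FOLLOW_UPS):
    """Return the follow-up questions for every box that is checked."""
    questions = []
    for box, is_checked in checked.items():
        if is_checked:
            # Boxes with no rule simply contribute no questions.
            questions.extend(rules.get(box, []))
    return questions


class FollowUpTests(unittest.TestCase):
    def test_checked_box_yields_its_questions(self):
        self.assertEqual(
            follow_up_questions({"has_pets": True}),
            ["How many pets?", "What kind?"],
        )

    def test_unchecked_box_yields_nothing(self):
        self.assertEqual(follow_up_questions({"smoker": False}), [])

    def test_unknown_box_is_ignored(self):
        self.assertEqual(follow_up_questions({"mystery": True}), [])
```

Save it as a file and run it with `python -m unittest <filename>`. Each later stage (PDF reading, database insert) can get its own small test class in the same way.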
posted by k3ninho at 7:56 AM on February 7, 2021 [1 favorite]

Best answer: The "reading a scanned PDF" part is not going to be super easy, but is totally doable (within constraints; humans are extremely clever about finding new ways to fill out forms incorrectly). For example, here is one implementation from almost 5 years ago. I would probably start with a fillable web form and then go from there, because you will need the form and the data management parts anyways.

I would also recommend starting with Flask, and then build your actual forms with Flask-WTF, which links Flask with the excellent WTForms package.

For styling your forms, I would actually start with the US Web Design System. I build government websites for a living, and the form elements in the USWDS have been extensively user tested.
posted by rockindata at 2:20 PM on February 7, 2021 [1 favorite]

Are you wanting to read a *particular* PDF? Or like, any generic PDF with checkboxes? Or maybe a small known subset?

How are people filling out the PDF? Like in a PDF editor and giving you a file? Did they scan it from paper and upload it? Who is creating the document that you want to ingest? Is it you? (If so, then you can create the document in a way that is simplest to read electronically.) If it is not you, you have fewer options, but it's possibly still doable.

The PDF-to-data aspect is one part, the website is another, and storing the data is a third. You can and should break this down into smaller pieces and attack each of them separately, and then work on plumbing them all together.
posted by RustyBrooks at 2:26 PM on February 7, 2021

Response by poster: It's a standardized form, but I suppose there may be different fillable PDF versions of it floating around. Some people will be giving me a paper version with the checkboxes marked in ink, others a PDF scan of a physical copy they complete, others a filled PDF version with the metadata intact, still others a filled PDF version but flattened to an image.

I'm thinking that's the part I'll tackle last. For now, I can live with a temporary UI that requires me to input the selections myself.
posted by Sockdown at 4:05 PM on February 7, 2021

This thread is closed to new comments.