Using ChatGPT to review single and multiple documents
July 31, 2024 11:44 AM

I work alone writing and assessing planning applications (for landscapes and buildings). A job is typically a single document, but often at least two sets of documents: one set in report format and primarily text, the other set drawings with text as small blocks and in labels. I review them to identify and avoid inconsistencies and spelling (and number) errors. I also do it to find adjacency of common words and phrases, usually when seeking precedents. Suitably qualified external readers usually do not exist, or there is no funding for them.

Currently I use Agent Ransack in regex mode (or DocSearcher) to do this, but as this only works one word or string at a time it's very slow.

I'm hoping for an outcome that would show me a list of instances of inconsistencies and spelling (and number) errors across a single document, and across sets of documents.

My cases deal with Landscape Architecture, Assessments of Effect, and landscape Planning Permission / Resource Consents / Consents. Typical use cases outside of my specialities (in case it helps associate my query with ChatGPT use in other domains) would be:
• Engineers designing for, or reviewing others' work, in mechanical, civil works, industrial etc.
• Law, scientific reports
• Any other domain where ideas require assessment and review, especially for inconsistencies, precedents, and spelling errors.

All papers for review would be either on my desktop, on a local NAS, or on the in-office network. If relevant, my system is Windows 10 Home Premium 64-bit. Normally all documents are text-based .pdf files.
posted by unearthed to Computers & Internet (7 answers total) 1 user marked this as a favorite
 
Be careful about what your expectations are for LLMs vs the reality of them. I found "When ChatGPT summarises, it actually does nothing of the kind" to be very useful in trying to figure out what's actually happening (and understand that all outcomes are pretty random on top of that).
posted by straw at 11:52 AM on July 31, 2024 [8 favorites]


My guess is that you will not be able to use an LLM to do a one-shot analysis like you describe.

You could definitely use the multi-modal (image + text) capabilities of GPT4.0 or Claude 3 Opus to extract text blocks from an image, however. Correlating those items with items extracted from a textual source might then be possible (though an LLM may not be the best tool for it).

For example, I took a screenshot of a residential site plan PDF and posted it to ChatGPT (GPT-4o) with the prompt "Extract all of the text blocks from this image, producing a simple ordered list. When blocks are multi-line, combine them into a single line of text.".

That produced a reasonable, hallucination-free list of all the text that was present in the document. I then made up a quick list of items (like "Garage Slab", "Retaining Wall", and "Covered Porch") and posted a prompt of "I will give you two lists. Please identify items that are present in both lists, and produce a final list containing the text and indexes of the item in both lists. List 1: (list 1 items) List 2: (list 2 items)".

The final list it produced was not perfect; the indexes were off by one in two cases and it only included the text in some of them. While it might be possible to massage the prompt to get higher accuracy, LLMs are notably bad at repeated tasks like this.
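The list-matching step on its own does not need an LLM, which sidesteps the off-by-one index problem. A minimal sketch, assuming the two lists already exist as plain text (the sample labels and the whitespace/case normalisation are illustrative assumptions, not part of the experiment above):

```python
def normalise(label: str) -> str:
    """Lower-case and collapse whitespace so 'Garage  Slab' matches 'garage slab'."""
    return " ".join(label.lower().split())


def common_items(list1: list[str], list2: list[str]) -> list[tuple[str, int, int]]:
    """Return (text, index_in_list1, index_in_list2) for labels present in both lists.

    Indexes are 1-based, to match how an ordered list reads on the page.
    """
    # Map each normalised label in list2 to the position of its first occurrence.
    positions: dict[str, int] = {}
    for j, label in enumerate(list2, start=1):
        positions.setdefault(normalise(label), j)

    matches = []
    for i, label in enumerate(list1, start=1):
        key = normalise(label)
        if key in positions:
            matches.append((label, i, positions[key]))
    return matches


# Hypothetical example data: labels extracted from a drawing vs. a hand-made list.
drawing_labels = ["Covered Porch", "Driveway", "Garage Slab", "Retaining  wall"]
report_items = ["Garage Slab", "Retaining Wall", "Covered Porch", "Pool"]

for text, i, j in common_items(drawing_labels, report_items):
    print(f"{text!r}: list 1 #{i}, list 2 #{j}")
```

With this split, the model is only asked to do the extraction, and the comparison is repeatable.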

If the labels and numbers you are hoping to analyze follow a repeated pattern, you could probably get very high accuracy by providing a dozen examples as part of the prompt, followed by the new text extracted from the image. If the labels are stable and clear you could probably construct a joined table from two sources that way, and then eyeball it for whether it matches your expectations.
posted by graphweaver at 1:21 PM on July 31, 2024


Claude is slightly better than GPT-4(o) at summarization, parsing, and comparison tasks, but it's a matter of degree; neither is particularly amazing. Identifying "inconsistencies" (you didn't really specify what kind; factual? formatting? grammatical?) is a weak spot of pretty much all LLMs: they generate them more than they spot them.

This is true even of the billion-dollar-investment LLMs. They're not much better than something you could run on a beefy home or office computer. In fact, I thought Mixtral-8x7B, with the right sampling, did a pretty fine job of summarizing up to 32K tokens of text. That doesn't really solve your multi-modal needs, though.

Long story short, I would not expect just dumping a couple of docs into ChatGPT alongside even a carefully-composed prompt would save you much time over doing it by hand. You'd still have to cross-check everything it said, to find the lies. With much more advanced use of a GPT-4(o), Claude, or local model, you could get a little closer to the point of saving actual time... but maybe not so much time as to have made the effort of building the bespoke tooling worth it.

My suggestion is to consider not doing this.
posted by majick at 3:31 PM on July 31, 2024 [3 favorites]


I can tell you that there still isn't a good solution for this that uses LLMs even in relatively big law.

If you code or want to, you might be able to create something that would help you. It's a tricky one, though.

Can you take a look at it from another angle, in terms of your data and document structure? If you do have the same text and information in multiple places, can you use fields or other document automation to populate the document with that common information so that there's no possibility of such errors? Is it possible to adopt something like LaTeX?
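As a rough illustration of the "populate from one source" idea, here is a minimal sketch using Python's standard `string.Template`. The fact names, values, and wording are all hypothetical; real report and CAD software would need its own field or attribute mechanism.

```python
from string import Template

# Hypothetical project facts, defined once and reused everywhere.
facts = {
    "site_area": "4,250 m2",
    "wall_height": "1.2 m",
    "consent_ref": "RC-0000",
}

# The report paragraph and the drawing label both pull from the same facts,
# so the wall height cannot drift between the two.
report_para = Template(
    "The site covers $site_area. The retaining wall is $wall_height high "
    "(ref. $consent_ref)."
)
drawing_label = Template("RETAINING WALL, $wall_height HIGH")

# substitute() raises KeyError if a placeholder has no matching fact.
print(report_para.substitute(facts))
print(drawing_label.substitute(facts))
```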
posted by lookoutbelow at 6:01 PM on July 31, 2024 [1 favorite]


Response by poster: straw Thanks, it's good to be reminded that with these things there's no there there: it's not even 'intelligently' predictive, it 'knows' nothing, and it constantly invents weird things.

I had another look at Agent Ransack and its pro version, FileLocator Pro (only US$70). I'd never really looked at it, but it allows a much deeper level of multi-doc search than I thought, including multi-line regex, loading dictionaries/lists to search across multiple files, and several routes for scripting and programming. I think even at the regex level it should work well enough for my cases.

graphweaver Thanks for sleuthing! I had no idea it could do OCR, but the text in all the files I'm working with is vector fonts/glyphs. It sounds too inconsistent for reliable use. I trust regex far more (but still with care; at least if there's a problem, it's at my end in writing malformed regex).

majick Identifying "inconsistencies" I was thinking of strings with e.g. the same start and end but different inner parts. Also errors within the same pattern. I'm looking for instances of the same string but 'one letter wrong' type issues across documents for this case. FileLocator Pro will probably suffice.
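The 'one letter wrong' check can be sketched without an LLM. This is a minimal example, assuming the text has already been extracted from the PDFs into plain strings; the file names, sample text, lower-casing, and the minimum token length of 5 are all assumptions made for illustration. It flags every pair of distinct tokens that differ by exactly one inserted, deleted, or substituted character.

```python
import re
from collections import defaultdict
from itertools import combinations


def edit_distance(a: str, b: str) -> int:
    """Classic Levenshtein distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def near_misses(docs: dict[str, str], min_len: int = 5):
    """Find pairs of distinct tokens that differ by exactly one edit.

    docs maps a document name to its already-extracted plain text.
    Returns (token_a, token_b, docs_with_a, docs_with_b) tuples.
    """
    where = defaultdict(set)
    for name, text in docs.items():
        # Tokens are runs of letters/digits, optionally joined by . / or -
        for token in re.findall(r"[A-Za-z0-9]+(?:[./-][A-Za-z0-9]+)*", text):
            if len(token) >= min_len:
                where[token.lower()].add(name)

    results = []
    for a, b in combinations(sorted(where), 2):
        # Cheap length filter before computing the full distance.
        if abs(len(a) - len(b)) <= 1 and edit_distance(a, b) == 1:
            results.append((a, b, sorted(where[a]), sorted(where[b])))
    return results


# Hypothetical extracted text from a report and a drawing.
docs = {
    "report.txt": "The retaining wall along boundary RW-104 is 1.2 m high.",
    "drawing.txt": "RETAINNG WALL RW-105, see retaining wall detail",
}

for a, b, docs_a, docs_b in near_misses(docs):
    print(f"{a} ({', '.join(docs_a)})  vs  {b} ({', '.join(docs_b)})")
```

The output is a list of candidates for a human to check, not a list of errors: two legitimately different reference numbers such as RW-104 and RW-105 are flagged just like a misspelling. Comparing every pair of tokens is quadratic, so a large document set would need a smarter index.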

lookoutbelow can you use fields/automation to populate the document with that common information [to avoid] such errors?

That's got me thinking deeper about what I'm trying to do. Due to some corporate clients I don't think LaTeX is in my future, although if I could write everything once in LaTeX and point Word and my CAD at it, that might save some grief.

In 2002 I was looking at semi-automating drawing labels from prose report text, but that was with AutoCAD and MS Word; the former is not useful for what I do, and my current software makes this automation impossible.
posted by unearthed at 1:16 AM on August 1, 2024


Are you able to use LLM APIs with the data restrictions you have?

If so, I think you could still probably create a decent Python script that will divide up your document, potentially use images of pages in addition to extracted text, pair up the corresponding content and feed pairs of pages to the LLM with a prompt asking it to return a particular format. Even so, you may still need to come up with lists of types of concerns to check for. And even then, you'll still get missed errors and hallucinations - for example, I asked ChatGPT to check a French sentence for me, and it said it made one change (change to plural) and also edited it for general clarity and cohesiveness. The latter was completely false. You'd run into tons of these things, and unless you create your own evaluations (and even then) you could never be sure.
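To make the shape of such a script concrete, here is a minimal sketch under several assumptions: the text is already extracted, "corresponding content" is approximated by a crude shared-word test, and `ask_llm` is a hypothetical placeholder for whichever API client you end up using (no real API is called here). The prompt wording and output format are also only illustrative.

```python
import re
from typing import Callable

PROMPT_TEMPLATE = """You are checking two excerpts from the same planning application.
List every inconsistency between them (names, reference numbers, dimensions, spellings).
Answer with one line per issue in the form: ISSUE | excerpt A text | excerpt B text
If there are none, answer exactly: NONE

--- Excerpt A ({a_name}) ---
{a_text}

--- Excerpt B ({b_name}) ---
{b_text}
"""


def chunk(text: str, max_chars: int = 2000) -> list[str]:
    """Group paragraphs into chunks of roughly max_chars.

    A single paragraph longer than max_chars is kept whole.
    """
    chunks, current = [], ""
    for para in text.split("\n\n"):
        if current and len(current) + len(para) + 2 > max_chars:
            chunks.append(current)
            current = para
        else:
            current = f"{current}\n\n{para}" if current else para
    if current:
        chunks.append(current)
    return chunks


def shared_words(a: str, b: str, min_len: int = 5) -> set[str]:
    """Crude relevance test: longer words that appear in both texts."""
    def words(s: str) -> set[str]:
        return {w for w in re.findall(r"[a-z0-9-]+", s.lower()) if len(w) >= min_len}
    return words(a) & words(b)


def review(report_text: str, drawing_blocks: list[str],
           ask_llm: Callable[[str], str]) -> list[str]:
    """Send each related (report chunk, drawing block) pair to ask_llm.

    ask_llm is a hypothetical callable: prompt string in, answer string out.
    """
    findings = []
    for i, part in enumerate(chunk(report_text), start=1):
        for j, block in enumerate(drawing_blocks, start=1):
            if not shared_words(part, block):
                continue  # nothing in common, so skip the call
            prompt = PROMPT_TEMPLATE.format(
                a_name=f"report chunk {i}", a_text=part,
                b_name=f"drawing block {j}", b_text=block,
            )
            answer = ask_llm(prompt).strip()
            if answer != "NONE":
                findings.extend(
                    f"[chunk {i} / block {j}] {line}" for line in answer.splitlines()
                )
    return findings
```

Every finding still has to be verified by hand against the source, for the reasons given above; the script only narrows down where to look.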

Again if APIs were okay, you might want to try something like this to convert your document to a plain text visual representation of what it looks like: LLM Whisperer.

If you can use APIs, I'd use Claude (based on a recent experience where it did very well with understanding documents from extracted text compared to GPT-4). Even then, this is still an immense amount of work for an imperfect process. But it is also the kind of procrastination I'd probably do if I were in your situation, so I can't wholeheartedly discourage it! If you do decide to go for it, memail me and I can point you to libraries and resources.
posted by lookoutbelow at 6:33 PM on August 1, 2024 [1 favorite]


Response by poster: Thanks very much lookoutbelow, I'll go and think about those items, and your experience with Claude - I may take you up on your kind offer. Depending on a few meetings this month there may be a lot of (planning law) reading on my horizon, and a limited ability to parcel it out to others.
posted by unearthed at 1:31 AM on August 2, 2024

