can any existing AIs/LLMs effectively summarize an entire book?
March 22, 2024 4:36 PM

Assume these are not books that have been extensively written about before, they are nonfiction, about 250-400 pages long.

I'm curious for two reasons:

1. As an instructor, I'd like to know if this is something my students can do when they are assigned reading. (I know I can't really "prove" it either way but knowing is helpful)

2. For myself I see how this could be useful if I don't have time to read a book but want more than the info presented in a review of it - ideally bullet points for each chapter.

I'm also interested if this is something that average people without sophisticated tech skills can access, and if the existing models can do it well (with minimal errors).
posted by cboggs to Computers & Internet (9 answers total) 2 users marked this as a favorite
 
Do you have an example book you would like me to ask GPT-4 to summarize?
posted by phil at 5:08 PM on March 22


Does the model have access to the text of the book or are you wondering about the feasibility of a prompt like "please summarize $title"?
posted by hoyland at 5:15 PM on March 22


This is based on my understanding of what I've heard other authors discussing, NOT personal experience, so definitely check into it further on your own or listen to others with first-person experience...

I believe that in (paid) ChatGPT 4, authors have been able to upload single works and have it create summaries, book blurbs, and marketing materials based on the books. The quality seems to be such that in at least some cases the material is being used as-is or nearly so, and they've found it effective and accurate.

Given that there is now the ability to create and train variations that are more skilled at specific tasks, I suspect that it's going to be very difficult to determine - without having read the book yourself - whether or not a student has read a specific book, based on the details included in their writing. That speaks to your first reason.

As for your second, yeah, I'm pretty sure it can do that, at least with a paid account and decent prompts. Chances are, people have already developed trained variations that are even better than the basic ChatGPT 4 experience. (The benefit of paid, here, is partially the newer version (4 rather than 3.5) and partially the ability to upload longer texts for the AI to base its answer on.)

A smart student might even be able to train their own variation that could then spit out responses based on THEIR previous work, so that it has their specific voice... which is a step beyond telling it to respond with something like, "write like a typical 8th grader who would receive a B grade".

Granted, each individual student may not have the skill or inclination to put in the extra effort to "work smarter, not harder"... but many students will easily be able to find the methods that others have shared... if that makes sense?
posted by stormyteal at 6:05 PM on March 22 [1 favorite]


Response by poster: hoyland, the latter
posted by cboggs at 6:30 PM on March 22


Best answer: Where the model has been trained on text that includes plot summaries and analysis, and/or the actual text of the book (which shouldn't happen, but has), it is reasonably likely that you can get it to produce accurate content for notable works where a lot of general information is online. For lesser-known and less-discussed works, the likelihood of errors is higher, though results can sometimes be improved by asking the model to double-check.

If you have titles in mind, I can try it out and share a transcript. The quality of the results would depend on the level of detail and organization in the prompt. The answer you'd get for "summarize X" might be pretty obviously AI-generated, while more complex prompts would produce higher-quality output. Even so, I still find it pretty easy to spot the writing style unless it has been modified.

AI could also be used to evade plagiarism detection by pasting in online material and asking the model to summarize that.

The best way to understand these models' capabilities is to play around with them. Microsoft Copilot is free and has the latest model (GPT-4), so you could try asking about a given work.

For pasting text in: Google's new Gemini model has a context window (the amount of text the model can 'see' at once) of one million tokens. As a rule of thumb, I estimate that the number of words is about 75% of the number of tokens. Questions asked about text provided this way tend to receive pretty accurate answers.
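If you want a back-of-envelope sense of whether a whole book fits in a given window, here's a rough Python sketch using that rule of thumb (the 300 words per page is just a typical nonfiction assumption, not something from the books in question):

WORDS_PER_PAGE = 300  # assumption; varies with layout

def estimated_tokens(pages):
    # words are roughly 75% of tokens, so tokens ~= words / 0.75
    return round(pages * WORDS_PER_PAGE / 0.75)

for pages in (250, 400):
    print(pages, "pages is roughly", estimated_tokens(pages), "tokens")

# 250 pages comes out around 100,000 tokens and 400 pages around
# 160,000 tokens, so either fits comfortably in a one-million-token window.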
posted by lookoutbelow at 7:06 PM on March 22


As a software engineer, I wanted to chime in and highlight the significance of context window size when using AI tools for tasks like summarizing entire books. Having a model with a large context window is really key, as it allows the AI to maintain a broad understanding of the overall material and produce more coherent, insightful summaries.

From what I've seen, Anthropic's Claude model has a larger context window compared to GPT-4, which could give it an advantage for this type of work. However, if the book is particularly lengthy, one effective approach is to summarize each chapter individually, and then summarize all those chapter summaries together. This can help distill the core concepts while still fitting within the model's context length constraints.
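If you're comfortable with a bit of scripting, that two-pass approach might look something like this rough Python sketch. Note that call_llm is a hypothetical stand-in for whatever chat API you actually use (OpenAI, Anthropic, Gemini, etc.), not a real library function:

def call_llm(prompt):
    # hypothetical stand-in; replace with a real chat-completion call
    raise NotImplementedError

def summarize_book(chapters):
    # first pass: summarize each chapter on its own
    chapter_summaries = []
    for i, text in enumerate(chapters, start=1):
        summary = call_llm(
            f"Summarize chapter {i} of this nonfiction book "
            f"in 5-8 bullet points:\n\n{text}"
        )
        chapter_summaries.append(f"Chapter {i}:\n{summary}")
    # second pass: condense the per-chapter summaries into one overview,
    # keeping the combined input well under the context limit
    return call_llm(
        "Combine these chapter summaries into a single summary of the "
        "whole book, keeping the chapter structure:\n\n"
        + "\n\n".join(chapter_summaries)
    )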

It's worth noting that as these AI tools become more advanced and accessible, the potential for students to use them in ways that could be considered academic dishonesty is certainly a valid concern for instructors. While plagiarism detection may catch blatant copying, AI-generated summaries will likely be much harder to definitively identify.

Assessing genuine comprehension through other methods like class discussions, quizzes, etc. may become increasingly important. And assignment structures may need to adapt and evolve in thoughtful ways to account for the rapidly expanding capabilities of AI.

Overall, I believe summarizing books is an area where current AI tools can already produce quite impressive results, and this will only improve as context windows get even larger, as seen with models like Google's Gemini (1 million tokens). It's an exciting development in terms of quickly extracting key insights from texts, but one that will undoubtedly have complex implications in educational settings that are important to carefully consider.
posted by ben30 at 5:53 AM on March 23 [3 favorites]


Also note that many of the publicly available LLM interfaces have been trained to produce fairly fixed-size outputs, regardless of the length of the input. When summarizing an entire book, you will probably find that they run out of output tokens before they address everything in it. ben30's idea of summarizing each chapter (and then possibly feeding all of those summaries into a final summary) is definitely a way to address that.
posted by graphweaver at 2:07 PM on March 23


A corollary of what I just wrote, from a teaching point of view, is that it will probably be easier to get an LLM to perform question answering over an entire text (with citations, soon) than to get it to perform anything that approximates whole-text analysis.

So, “where did Frodo take the ring” is easy; “what does Rohan want from the post-war political situation” is harder.
posted by graphweaver at 2:10 PM on March 23 [1 favorite]


I'm also interested if this is something that average people without sophisticated tech skills can access, and if the existing models can do it well (with minimal errors).

Everyone knows about ChatGPT, but I'm not sure average people have even heard of Claude, which is the long-text king. ChatGPT's free tier handles a context window of tens of pages; Claude's free tier can handle between 150 and 200 pages. The most expensive ChatGPT tier can equal that, and the most expensive Claude tier can handle more than 300 pages.

I am guessing that non-sophisticated users will get worse results because they expect the AI model to have fully thought through an answer before giving it. That is not yet a reliable feature of AI models. A human answering a question may have a strategy for solving it (consciously or not), work through that strategy, and then report the answer. AI models, however, don't do much silent thinking to themselves (though doing some silent thinking is a relative strength of ChatGPT in particular).

Users should be aware that, for the most part, the AI model is thinking aloud and you as a user are collaborating with it. The model will do better if the discussion includes a strategy for solving the problem (either ask it to follow a strategy you specify, or ask it to come up with one). You may need to prompt the model iteratively to walk it through that strategy (for example, the chapter-by-chapter summary technique ben30 mentioned).

So, “where did Frodo take the ring” is easy; “what does Rohan want from the post-war political situation” is harder.

The latter question is something that might be in reach of an AI model if the user prompts it very patiently ("Who leads Rohan, and what key decisions do they make?"... "Consider the first decision you listed; what national interests of Rohan does it bear on?") to create an arc of questioning that builds towards an answer. This could work if the user understood the process of answering the deep question well and was leaning on the LLM to extract the specific factual components needed to progress through that process.
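As a sketch of what that arc might look like if you scripted it - where ask is a hypothetical helper around whatever chat API you use, and the Rohan questions are purely illustrative:

conversation = []

def ask(question):
    # hypothetical helper: append the question to the running history,
    # send the whole history to your chat API, record and return the answer
    conversation.append({"role": "user", "content": question})
    answer = "..."  # replace with a real chat-completion call
    conversation.append({"role": "assistant", "content": answer})
    return answer

ask("Who leads Rohan, and what key decisions do they make?")
ask("Consider the first decision you listed; what national "
    "interests of Rohan does it bear on?")
ask("Given those interests, what does Rohan want from the "
    "post-war political situation?")

Because the full history is resent each time, each later question can build on the model's earlier answers, which is what makes the arc work.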
posted by a faded photo of their beloved at 11:31 PM on March 23 [3 favorites]


