Where to upload large amounts of text to a webpage for AI to summarize?
June 18, 2024 2:05 AM

I have book chapters I need summarized by an AI assistant. I copied the text and tried uploading them to paste.ee and privatebin.net, and was able to view the text there myself, but the AI seemed unable to access it.

When I shared the links with the AI assistant, it replied,

"the link https://paste.ee/r/REMOVED/0 does not seem to contain any actual text content that I can access or analyze. When I visit that URL, it simply displays a blank page with no information."

and

Unfortunately, the link https://privatebin.net/?REMOVED does not seem to contain any actual text content that I can access or analyze. When I visit that URL, it displays the following message:
PrivateBin requires a modern browser to work. Firefox, Opera, Chrome...
For more information see this FAQ entry.
This website is using an insecure connection! Please only use it for testing.
For more information see this FAQ entry.
Your browser may require an HTTPS connection to support the WebCry


Are pastebin sites like these blocking AIs from accessing uploaded text? Or is there another reason the AI cannot access and analyze it?

Are there other fast and simple (and free!) sites like these for me to upload large bodies of text to be analyzed (for personal use) by AI?

Thanks!
posted by tenderly to Computers & Internet (13 answers total) 1 user marked this as a favorite
 
Response by poster: update:

privatebin is encrypted in the browser, so that might explain why the AI can't access it.

I tried two other text upload services, controlc.com and rentry.co, with the same result.
posted by tenderly at 3:23 AM on June 18


Can you paste the text directly into the submission box for the AI app? I've done that before, though not usually for chapter-length blocks of text. I don't know if there is a size limit.
posted by akk2014 at 4:44 AM on June 18


Have you uploaded the file into ChatGPT?
posted by NotMyselfRightNow at 4:50 AM on June 18


I would think that you could upload something like this to GitHub, unless the text is really really huge. Even then, you could probably put it in the repo's README.md.
posted by taltalim at 5:25 AM on June 18


Mod note: Comment removed. Please be kind and non-condescending to the OP when answering.
posted by Brandon Blatcher (staff) at 5:30 AM on June 18 [2 favorites]


Make absolutely sure the AI assistant you're using can actually access the internet.

A year ago Simon Willison warned that ChatGPT can give the impression of being able to access the internet, even though it cannot, but may hallucinate a convincing summary based on the URL.

But as he notes at the end of the article, ChatGPT and other LLMs have started adding this ability.

ChatGPT's release notes for 17 Oct 2023 show that "Browsing" is now available in GPT-4 for Plus and Enterprise users.

I don't know which AI you're using, but check its documentation to confirm what you're trying to do is actually supported.
posted by snarfois at 7:14 AM on June 18 [3 favorites]


I've used Google Drive for this sort of thing. Log into Google. Upload text to Google Drive. Under "Share" get a link that works only for users who get that link. Of course, when I looked this morning, I got menu options different from what I am used to, so maybe Google has reduced Drive functionality (again).

I have not looked into Microsoft OneDrive, but I'd guess it has similar capability. Apple, too, maybe.

Of course, there are security concerns. Google is sure to read anything you upload, excepting maybe if you have an option to opt out. You are asking the AI to read and analyze your upload and it's possible that it adds everything to the trillions of pages they use to train the AI.
posted by SemiSalt at 7:18 AM on June 18 [1 favorite]


Best answer: There's no real benefit to using (paid) ChatGPT's "browsing" feature here: it's just going to download the file in question to their servers and process it just the same as if you'd uploaded it directly.

Anthropic's Claude supports file upload, though uploading more than a handful of files requires a subscription. You can upload your text file and give it an instruction like "summarize this text." ChatGPT can also summarize uploaded documents if you log in first, though again it is limited unless you pay. It looks like it can pull documents out of Google Docs, but you can also upload them from your computer directly.

All of these tools have a limited "context window", meaning they can only process so much text at a time. If a chapter is too long it might either 1) refuse, or 2) only summarize some of it, I'm not sure which. More input costs more computation time, which is why Claude and ChatGPT don't offer to process large amounts of text for free.
posted by BungaDunga at 7:55 AM on June 18 [1 favorite]
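
To get a feel for the "context window" limit mentioned above before uploading anything, here is a minimal Python sketch. It assumes roughly 4 characters per token, which is only a rule of thumb for English prose, and the 8,000-token window is a made-up placeholder, not the limit of Claude, ChatGPT, or any other product. The function names are hypothetical too.

```python
# Rough check: will this chapter fit in a model's context window?
# ASSUMPTIONS: ~4 characters per token is only a rule of thumb for English
# prose, and CONTEXT_WINDOW_TOKENS is a placeholder, not any product's limit.

CHARS_PER_TOKEN = 4            # heuristic, not a real tokenizer
CONTEXT_WINDOW_TOKENS = 8_000  # hypothetical limit; check your tool's docs


def estimate_tokens(text: str) -> int:
    """Estimate the token count from the character count, rounding up."""
    return -(-len(text) // CHARS_PER_TOKEN)


def fits_in_window(text: str, reserve_for_reply: int = 1_000) -> bool:
    """True if the text plus room for the reply fits in the assumed window."""
    return estimate_tokens(text) + reserve_for_reply <= CONTEXT_WINDOW_TOKENS
```

If `fits_in_window(chapter_text)` comes back false, the chapter probably needs to be split up or sent to a tool with a larger limit. Real tokenizers count differently, so treat the result as a ballpark.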


If you need longer inputs, you can try Google's Gemini Advanced, also paid, which supports very long inputs. Or just summarize smaller chunks of the chapter at a time, though this strategy can be tricky, since the model will only be aware of one chunk at a time and may end up missing important context: it won't remember that you gave it other chunks previously, or their contents. I think (paid) ChatGPT has a sort of rudimentary "memory" where it remembers brief summaries of previous inputs, but this may or may not be sufficient for summarization tasks.
posted by BungaDunga at 8:19 AM on June 18
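
The chunk-at-a-time strategy described above can be sketched in Python. This is one possible approach, not something any of these tools do for you: it splits the text on blank lines between paragraphs, then builds a prompt that carries the summary so far along with each new chunk, to work around the model not remembering earlier chunks. The function names, the 12,000-character chunk size, and the prompt wording are all assumptions for illustration.

```python
def split_into_chunks(text: str, max_chars: int = 12_000) -> list[str]:
    """Group paragraphs into chunks of at most max_chars characters.

    A single paragraph longer than max_chars becomes its own oversized chunk.
    """
    chunks, current, size = [], [], 0
    for para in (p.strip() for p in text.split("\n\n")):
        if not para:
            continue
        extra = len(para) + (2 if current else 0)  # 2 for the blank-line joiner
        if current and size + extra > max_chars:
            # Current chunk is full: close it and start a new one.
            chunks.append("\n\n".join(current))
            current, size = [], 0
            extra = len(para)
        current.append(para)
        size += extra
    if current:
        chunks.append("\n\n".join(current))
    return chunks


def build_prompt(chunk: str, summary_so_far: str) -> str:
    """Build a prompt that carries earlier context along with the new chunk."""
    context = summary_so_far or "(nothing yet; this is the first part)"
    return (
        "Summary of the chapter so far:\n" + context + "\n\n"
        "Next part of the chapter:\n" + chunk + "\n\n"
        "Update the summary to cover everything so far."
    )
```

To use it, paste the prompt for the first chunk into the assistant, then pass its reply as `summary_so_far` when building the prompt for the next chunk, and so on. The final reply is the summary of the whole chapter, though details from early chunks may get compressed away along the way.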


For ChatGPT, under Settings > Data Controls, you can turn off "Improve the model for everyone," which asks OpenAI not to use your inputs for training. They could disregard that setting if they wanted to, but they can also mass-download copyrighted books from pirate sites and ingest them at will (and probably have).
posted by BungaDunga at 10:09 AM on June 18


Best answer: If you have the files as PDFs, you can try ChatPDF.
posted by Saxon Kane at 11:10 AM on June 18


"if you feed the information into ChatGPT you have trained it on that copyrighted data"

Nope. Training and usage are separate; hence "pre-trained transformer". ChatGPT learns nothing new from its prompts.
posted by flabdablet at 12:14 PM on June 18 [1 favorite]


Best answer: Here is a longer but very readable article explaining how your prompts are not "training" or being "memorised" by the model, as flabdablet correctly says.

It does list some reasons to worry anyway though, and what is meant by "memory" features. But as it concludes, people and policymakers have bad mental models around this stuff, which leads to bad decisions.
posted by snarfois at 3:28 AM on June 19

