Training ChatGPT on a specific corpus?
April 23, 2023 11:29 AM

I keep seeing chatbots that are based on ChatGPT but trained on a specific corpus to converse as if they are a specific fictional character etc. How?

I have been on the internet for 32 years and can access/recover enough of my old posts, comments, emails, instant messages, etc. that I should be able to pull together a 25+ million word corpus of my own writing. How would I go about feeding that into ChatGPT to create a chatbot that would allow me to have conversations with "myself"?

Do you need programming skills to make these custom chatbots or are there tools where you can just upload a corpus and tell ChatGPT to emulate it? I don't know enough about ChatGPT as a platform to even know what search terms to use to find instructions on how to do this.

(Yes, I know this sounds like the start of a Black Mirror episode, but my motto is "If you can't be a good example then at least you can be a cautionary tale.")

posted by Jacqueline to Computers & Internet (6 answers total) 8 users marked this as a favorite

OpenAI allows "fine tuning" of their base (non-chat) models via API or command line interface. But since it uses supervised learning, you can't just throw a corpus of text at it, you have to provide a set of prompt-answer pairs. So for best results you'd have to generate a set of simulated conversations with yourself.

Another method that might work is semantic search, where you put all of your emails/posts/etc into a database. When you ask the chatbot a question, you pull up the most relevant fragments of text and tell the AI "Write a response in this style."

I don't know of any user-friendly apps that do this stuff yet, but surely someone is working on it.
posted by credulous at 12:03 PM on April 23, 2023

I don't know of a way to just dump a ton of raw text into ChatGPT and have it mimic that. There are two ways I'm aware of to "customize" ChatGPT. The first is just prepending a custom prompt ("You are the fictional character Watson from Sherlock Holmes, respond to questions in character.") Custom chatbots might inject this invisibly, so the user doesn't see that prompt. That might be what you've seen and this probably won't work for you for obvious reasons.

The other way is by fine-tuning. With this, you show GPT a lot of examples of prompt/responses, and it tries to learn to mimic them. It's possible that might do something like what you want. I think it's only available for GPT-3. Fine-tuning doesn't require programming, but it does require that you get your data into some kind of regular format, like a spreadsheet:

https://platform.openai.com/docs/guides/fine-tuning
posted by justkevin at 12:05 PM on April 23, 2023

To clarify, the chatbots I'm thinking of aren't accessed through the main ChatGPT site. They're on some other website, but are supposedly running on ChatGPT as a platform.

The creators generally present them as "I fed X into ChatGPT to make a chatbot that pretends to be Y" and you can interact with the chatbot, not just read transcripts.

So it's like coming in mid-conversation after someone else has already primed ChatGPT with all the preliminary prompts to get it to act a certain way, if that makes sense?

I wish I could provide a link to an example but it's nearly impossible to find specific Tumblr posts again. :(
posted by Jacqueline at 1:00 PM on April 23, 2023

If it's actually using the ChatGPT model instead of fine-tuning one of their previous models (the latter is unlikely given the high cost), it would likely be injecting your chat input into some bigger prompt, or after another prompt. The OpenAI API has a special message known as a system prompt, which the AI is trained to pay special attention to. It's possible therefore that they're using something like that to provide more detailed instructions to the chatbot about how to interact.
posted by lookoutbelow at 1:05 PM on April 23, 2023

https://customgpt.ai/pricing/ appears to do this, for $49 per month. You upload documents in a variety of formats, and point it at websites you want it to get the content from.
Haven't used this service, but you don't need coding experience, it just involves pointing it at the required documents.

Things I have tried that give you an idea of how it might work:
If you have a unique name/username, and your content has been available on the public web for a long time, you can get ask https://chat.openai.com/ to pretend to be you, (if you ask it to roleplay, like in a script, and you reply with "yourowncharactername:" before your replies, it responds with its roleplay name on each line, and it seems to hang onto the character longer).

https://beta.character.ai/ is better at maintaining a consistent character.

That works fairly well for authors and bloggers who have a lot of content online. I tried an author Barbara Sher, and while I don't know what data has been included in the training models, it clearly has some of her 'personality' from the content of her websites, and a book she released online.

Also, https://www.chatpdf.com/ has a free trial with 120 page PDF's - so I have tried uploading part of a workbook, and asked it to guide me through the exercises, or asked questions based on the text. I'm assuming that's kind of what you would get with customgpt but on a much bigger scale.
posted by Elysum at 5:11 PM on April 23, 2023

If it's within the ChatGPT's knowledge, all you need to do is tell ChatGPT to "take on the role of X, and answer the following question..." where you replace X with a specific but well known character or role or position that ChatGPT may have within its existing corpus. It often does a pretty decent job.

But seeing that you want ChatGPT to emulate... yourself, you may need something similar to GPT4ALL, which is basically a locally installed chatbot similar to ChatGPT based on LLama. Populating it with custom data right now is a bit obscure, but I'm sure people are figuring it out as we go.
posted by kschang at 5:13 PM on April 23, 2023

« Older Alternative for the Apple Super Serial Card | Working offshore - how to build coping skills in... Newer »

This thread is closed to new comments.

Ask MetaFilter

Training ChatGPT on a specific corpus?
April 23, 2023 11:29 AM

Tags

Share

Training ChatGPT on a specific corpus? April 23, 2023 11:29 AM

Tags

Share

Training ChatGPT on a specific corpus?
April 23, 2023 11:29 AM