Visual AI: how do it do?
October 19, 2023 5:01 AM Subscribe
I still don’t really have a working mental image of how visual AI image generation works.
I kind of get how language models are a sort of n-dimensional word cloud of relationships where n is a big number (if that’s even remotely how it’s done now, I dunno), but I have no clue at all how AI can use a visual data set and smoosh up new variations. It’s not as if the images they use have minutely determined metadata about all conceivable aspects of the sources (or I guess not explicitly). Can anyone point to a layperson’s guide to the basic mechanisms at work that won’t make my poor old noggin ache too much?
https://www.pbs.org/newshour/science/how-ai-makes-images-based-on-a-few-words
posted by kschang at 7:35 AM on October 19, 2023
It's basically auto-complete for pixels.
posted by emelenjr at 10:45 AM on October 19, 2023 [1 favorite]
It’s quite ingenious actually, and is not at all copying and pasting, which is why there is no copyright infringement. Much like I don’t need to pay anyone to view public images of famous paintings and learn from them. Much like if I painted something in Dali’s style, that wouldn’t be copyright infringement either.
It works as a Generative Adversarial Network. There are two AI models. One is tasked with creating images and starts by producing them from random noise; the other detects whether an image is fake or real. If it is judged fake, the first model tries again. It’s an iterative loop in which the detector improves the generator. They are adversaries, but they work together.
What’s cool about diffusion models that do this is you can often see them ‘draw’ it out in real time starting from that random noise.
posted by PaulingL at 12:51 PM on October 19, 2023 [2 favorites]
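For anyone who wants to see the generator-versus-detector loop from the comment above in concrete form, here is a minimal sketch in plain Python. Everything in it is an assumption made for illustration: the "images" are single numbers drawn near 4, the generator is one adjustable number `mu` plus random noise, and the detector is a two-parameter logistic function. The names `ToyGAN`, `d_step` and `g_step` are hypothetical. Note also that the diffusion models mentioned in the same comment are trained differently (they learn to remove noise rather than to fool a detector), so this covers only the adversarial idea.

```python
import math
import random

# Assumption for illustration: "real images" are just numbers near 4.
REAL_MEAN, REAL_STD = 4.0, 0.5


def sigmoid(t):
    # Logistic function written to avoid overflow for large |t|.
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


class ToyGAN:
    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        self.mu = 0.0              # generator parameter: fake = mu + noise
        self.w, self.b = 0.0, 0.0  # detector (discriminator) parameters

    def generate(self, n):
        return [self.mu + self.rng.gauss(0.0, 1.0) for _ in range(n)]

    def real_batch(self, n):
        return [self.rng.gauss(REAL_MEAN, REAL_STD) for _ in range(n)]

    def discriminate(self, x):
        # Detector's estimate of the probability that x is real.
        return sigmoid(self.w * x + self.b)

    def d_step(self, n=64, lr=0.05):
        # Detector update: push outputs toward 1 on real, 0 on fake samples.
        gw = gb = 0.0
        for x in self.real_batch(n):
            err = self.discriminate(x) - 1.0
            gw += err * x
            gb += err
        for x in self.generate(n):
            err = self.discriminate(x)
            gw += err * x
            gb += err
        self.w -= lr * gw / (2 * n)
        self.b -= lr * gb / (2 * n)

    def g_step(self, n=64, lr=0.05):
        # Generator update: move mu so the detector rates fakes as more real
        # (gradient of -log D(fake) with respect to mu).
        fake = self.generate(n)
        grad = sum(-(1.0 - self.discriminate(x)) * self.w for x in fake) / n
        self.mu -= lr * grad

    def train(self, steps=2000):
        # Alternate the two updates; return the generator's final parameter.
        for _ in range(steps):
            self.d_step()
            self.g_step()
        return self.mu
```

Calling `ToyGAN().train()` runs the loop and returns the generator's final `mu`. On a toy like this the value tends to drift toward the real data, but adversarial training is known to oscillate, so treat the result as a demonstration of the mechanism rather than a guarantee.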
It's not quite what you asked for, but this 38-minute video, A Love Letter to AI Art, walks through the whole process of creating a piece of AI art. There's a lot more manual work than I thought: it's an iterative process of starting with a roughly correct image, then trimming and tweaking specific parts and specific aspects of it.
posted by TheophileEscargot at 4:40 AM on October 20, 2023
Response by poster: Thanks, everyone, for your very useful insights. As it happens, the article linked below fell into my lap. It describes how the physical phenomenon of diffusion is only one way of modeling the building-from-noise process, and says that other physical phenomena may be superior to diffusion when used to model learning algorithms. Your comments helped me enjoy this article a lot more! https://nautil.us/the-physical-process-that-powers-a-new-type-of-generative-ai-419953 It’s all a bit mind-boggling, but I have a teeny tiny grasp of it now.
posted by aesop at 4:33 PM on October 20, 2023
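The building-from-noise idea can be sketched in a few lines of plain Python. This is a toy under loud assumptions, not how any production image model is implemented: the "images" are single numbers drawn near 4, and instead of a trained neural network the sketch uses a formula (`score`) that is only available because the toy data is a simple bell curve. In a real system that function is the part that has to be learned from millions of images. The names `score`, `noise_levels` and `sample` are hypothetical.

```python
import math
import random

# Assumption for illustration: "real images" are just numbers near 4.
DATA_MEAN, DATA_STD = 4.0, 0.5


def score(x, sigma):
    # Stand-in for the trained network. For bell-curve data blurred by noise
    # of size sigma, the direction toward more plausible values is known
    # exactly; a real model has to learn this from examples.
    return -(x - DATA_MEAN) / (DATA_STD ** 2 + sigma ** 2)


def noise_levels(sigma_max=50.0, sigma_min=0.01, steps=100):
    # Noise sizes shrinking geometrically from sigma_max, ending at zero.
    ratio = (sigma_min / sigma_max) ** (1.0 / (steps - 1))
    return [sigma_max * ratio ** i for i in range(steps)] + [0.0]


def sample(rng, levels):
    # Start from pure noise, then remove a little of it at every step.
    x = rng.gauss(0.0, levels[0])
    for sigma, sigma_next in zip(levels, levels[1:]):
        # One Euler step of dx/dsigma = -sigma * score(x, sigma).
        x += (sigma_next - sigma) * (-sigma * score(x, sigma))
    return x
```

Something like `sample(random.Random(0), noise_levels())` returns one generated number. Drawing many of them should give values clustered around `DATA_MEAN`, even though each one began as a random number with a spread of 50.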
This thread is closed to new comments.
Generative systems take this just one step further, and instead of "cancer" it's "Picasso". They can tell you if a painting is like a Picasso, and then have the capacity to replicate the commonalities of "Picasso-ness" by pulling on those N-dimensional strings and mimicking them. Sometimes it gets things very wrong (if I ask for X like Picasso, I'm far more likely to get cubism than something from his blue period), and that's all just due to the weights and the prompt.
posted by griffey at 6:21 AM on October 19, 2023 [1 favorite]