Visual AI: how do it do?
October 19, 2023 5:01 AM

I still don’t really have a working mental image of how visual AI image generation works.

I kind of get how language models are a sort of n-dimensional word cloud of relationships, where n is a big number (if that’s even remotely how it’s done now, I dunno), but I have no clue at all how AI can use a visual data set and smoosh up new variations. It’s not as if the images they use have minutely determined metadata about all conceivable aspects of the sources (or I guess not explicitly). Can anyone point me to a layperson’s guide to the basic mechanisms at work that won’t make my poor old noggin ache too much?
posted by aesop to Computers & Internet (6 answers total) 2 users marked this as a favorite
 
The best way I've found to think about it is to remember that these ML systems don't "see" the way we do. Imagine a slightly different visual analysis system, like the ones used for radiology screening. Those ML systems have been fed millions of images, each labeled either "cancer" or "not cancer", and the system chugs away analyzing the weight of each pixel relative to every other pixel in the image, building that N-dimensional graph of "when groups of pixels look like this = cancer".

Generative systems take this just one step further: instead of "cancer" it's "Picasso". They can tell you whether a painting is like a Picasso, and they also have the capacity to replicate the commonalities of "Picasso-ness" by pulling on those N-dimensional strings and mimicking them. Sometimes it gets things very wrong (if I ask for X like Picasso, I'm far more likely to get cubism than something from his blue period), and that's all down to the weights and the prompt.
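To make the "weights of pixels" idea concrete, here's a toy Python sketch with made-up numbers: a "scan" is just four pixel values, and classifying means measuring distance to the average image of each label. Nothing like a real radiology model, but it shows the pixels-in, label-out shape of the thing.

```python
import math

# Tiny made-up training set: each "image" is 4 pixel brightness values.
train = {
    "cancer":     [[0.9, 0.8, 0.9, 0.7], [0.8, 0.9, 0.8, 0.9]],
    "not cancer": [[0.1, 0.2, 0.1, 0.3], [0.2, 0.1, 0.2, 0.1]],
}

def centroid(images):
    # The "average image" for a label: mean of each pixel position.
    n = len(images)
    return [sum(img[i] for img in images) / n for i in range(4)]

centroids = {label: centroid(imgs) for label, imgs in train.items()}

def classify(image):
    # Pick the label whose average image this one is closest to.
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return min(centroids, key=lambda label: dist(image, centroids[label]))

print(classify([0.85, 0.9, 0.8, 0.8]))  # bright patch -> "cancer"
```

Real systems learn far subtler relationships between pixels than "bright means cancer", but the training data still only ever pairs raw pixels with a label.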
posted by griffey at 6:21 AM on October 19, 2023 [1 favorite]




It's basically auto-complete for pixels.
posted by emelenjr at 10:45 AM on October 19, 2023 [1 favorite]


It’s quite ingenious actually, and it’s not at all copying and pasting, which is why many argue there’s no copyright infringement. Much like I don’t need to pay anyone to view public images of famous paintings and learn from them, and much like if I painted something in Dali’s style, that wouldn’t be copyright infringement either.

It works as a Generative Adversarial Network (GAN): there are two AI models. One is tasked with creating an image and generates candidates from random noise; the other tries to detect whether each image is fake or real. If the fake is caught, the first model adjusts and tries again, an iterative loop in which the detector improves the generator. They are adversaries, but working together.
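Here's that adversarial loop shrunk down to a Python toy: the "generator" is a single number (the mean of the values it fakes), and the "discriminator" just keeps a running picture of what real data looks like. Everything here is a made-up stand-in for the neural networks a real GAN uses; it only shows the back-and-forth structure.

```python
import random

random.seed(1)

REAL_MEAN = 5.0  # the "real data": numbers clustered around 5

def real_sample():
    return random.gauss(REAL_MEAN, 0.5)

d = 0.0   # discriminator's learned picture of what "real" looks like
g = -5.0  # generator's parameter: the mean of the numbers it fakes

for step in range(500):
    # The discriminator studies real data...
    d += 0.05 * (real_sample() - d)
    # ...then judges one of the generator's fakes.
    fake = random.gauss(g, 0.5)
    if abs(fake - d) > 1.0:   # caught! too far from what "real" looks like
        g += 0.1 * (d - g)    # so the generator adjusts and tries again

print(round(g, 1), round(d, 1))  # both end up near 5.0
```

The generator never sees the real data directly; it only learns from getting caught, which is the adversarial part.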

Most of today’s image generators (Stable Diffusion, DALL-E) are actually diffusion models, which take a different route: the model is trained to remove noise from images, and it generates by starting from pure random noise and denoising it step by step. What’s cool about diffusion models is you can often see them ‘draw’ the image out in real time, starting from that random noise.
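A drastically over-simplified sketch of that denoise-from-noise loop in Python: here the code cheats by hard-coding a TARGET, whereas in a real diffusion model a trained neural network supplies the denoising direction it learned from millions of images.

```python
import random

random.seed(0)

# Stand-in for "the image the model is heading toward". A real model
# has no such variable; its network predicts the noise to remove.
TARGET = [0.0, 0.5, 1.0, 0.5]

def denoise_step(x, t, steps):
    # Nudge each "pixel" a little toward the target, adding a bit of
    # fresh noise that shrinks as the steps run out.
    noise = (steps - t) / steps * 0.1
    return [xi + 0.2 * (ti - xi) + random.gauss(0, noise)
            for xi, ti in zip(x, TARGET)]

x = [random.gauss(0, 1) for _ in range(4)]  # start from pure noise
for t in range(50):
    x = denoise_step(x, t, 50)

print([round(v, 2) for v in x])  # ends up close to TARGET
```

Printing x after each step is exactly the "watch it draw itself" effect: early iterations are static, later ones are recognizable.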
posted by PaulingL at 12:51 PM on October 19, 2023 [2 favorites]


It's not quite what you asked for, but this 38-minute video, A Love Letter to AI Art, walks through the whole process of creating a piece of AI art. There's a lot more manual work than I thought: it's an iterative process of starting with a roughly correct image, then trimming and tweaking specific parts and specific aspects of it.
posted by TheophileEscargot at 4:40 AM on October 20, 2023


Response by poster: Thanks everyone for your very useful insights. As it happens, the article linked below fell into my lap, describing how the physical phenomenon of diffusion is only one way of modeling the building-from-noise process and that there are other physical phenomena that may be superior to diffusion when used to model learning algorithms. Your comments helped me enjoy this article a lot more! https://nautil.us/the-physical-process-that-powers-a-new-type-of-generative-ai-419953 It’s all a bit mind-boggling but I have a teeny tiny grasp of it now.
posted by aesop at 4:33 PM on October 20, 2023


This thread is closed to new comments.