Cheap options(?) to run local AI models
December 20, 2023 4:23 PM

I have been having fun learning about generative AI through cloud providers. I am curious about upgrading my computer to be able to run something locally, though I don't want to spend a ton. How much money is reasonable to spend here?

I am interested in running Stable Diffusion / Dreambooth, and possibly ollama or other LLMs.

I currently have a ThinkPad T490 with 16 GB of RAM and the base-level graphics card. I haven't actually tried to run anything locally, on the assumption that it would be extremely slow. I saw that you can get an external GPU, though I also saw some reports of headaches getting external GPUs up and running.

I am curious what a workstation might cost that could do a reasonable job running local models. I am not a huge gamer, and I don't have any other high-performance needs that aren't already served by the ThinkPad; I'm not sure I can justify a $3000 workstation just to make a few JPGs.

I would be happy to buy something secondhand, like if there was a good source of off-lease workstations.

Alternatively-- if you have a similar computer to the T490 and do run models locally, what sort of performance is reasonable to expect? Would it be enough to buy some more RAM for this laptop?

Thanks for any advice!
posted by beepbeepboopboop to Technology (3 answers total) 5 users marked this as a favorite
 
I run DiffusionBee just fine on my MacBook Air M1 with 8 GB; it takes about a minute per image.

Llama models take up more RAM, but a refurbished Mac mini M2 with 64 GB of RAM wouldn't break the bank.
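
As a very rough rule of thumb, a model's weights take about (parameters x bits per weight / 8) bytes of memory, plus overhead for context. Here's a tiny sketch of that arithmetic; the sizes and bit widths are illustrative, not tied to any particular model:

# Back-of-the-envelope memory estimate for quantized model weights.
# Real usage is higher once you add the KV cache and runtime overhead.
def approx_weight_gb(params_billion: float, bits_per_weight: float) -> float:
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9

print(approx_weight_gb(7, 5))   # ~4.4 GB: a 7B model at 5-bit fits in modest RAM
print(approx_weight_gb(70, 4))  # ~35 GB: why 64 GB starts to look attractive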

(Did I mention how nice it is not to have to mess with GPU drivers?)
posted by credulous at 5:03 PM on December 20, 2023 [1 favorite]


I run the 7-billion-parameter models in Ollama with an Nvidia GeForce RTX 2060; speed is fine, but I haven't tried any of the image-generation models.
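
In case it's useful to see what querying a local Ollama server looks like, here's a minimal sketch using its HTTP API from Python; it assumes the server is running on its default port (11434) and that you've already pulled a model, and the model name and prompt are just placeholders:

# Minimal, non-streaming request to a locally running Ollama server.
# Assumes `ollama pull llama2` (or similar) has already been run.
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "llama2", "prompt": "Why is the sky blue?", "stream": False},
    timeout=300,
)
print(resp.json()["response"])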

I have been thinking about getting something with more VRAM; it seems like you can get a used 3090 on eBay for $650.

I would get a refurb desktop from Newegg that can accommodate the graphics card you choose; it's generally a much easier upgrade path.
posted by gregr at 5:15 PM on December 20, 2023


I use a desktop machine with an AMD Ryzen 7 3700X, 32 GB of RAM, and a 12 GB RTX 3060. It's good for Stable Diffusion and LLM text generation from 7B models; I think with quantization it can do 13B models too. This is on Linux. Don't skimp too much on disk either, as each model you want to download will be multiple gigabytes.

Overall, I am not as impressed by the stuff I can self-host compared to GPT-4 and DALL-E 3 from OpenAI. And honestly, possibly aside from DALL-E 3 image generation (which I think is $0.12/image today), the API cost is so low that it would take a long time to equal the cost of a whole computer, plus my time setting it up, deciding among models, etc. (not to mention that chat with GPT-3.5 is free via the ChatGPT website).
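
To put rough numbers on that trade-off, here's a back-of-the-envelope sketch using the per-image price above; the $1,500 desktop price is just an assumed figure for a midrange local-AI build, not a quote:

# How many DALL-E 3 API images you could buy for the price of a hypothetical
# $1,500 desktop, at the ~$0.12/image price mentioned above.
desktop_cost = 1500.00
price_per_image = 0.12
print(desktop_cost / price_per_image)  # ~12,500 images before break-even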

It's no surprise that a 7B or ruthlessly quantized 14B model doesn't compare well with GPT-4. However, there seems to be a big jump in cost between this and hardware that will let you run the biggest models (30B and 60+B), which might compare better against GPT-4 for text generation. Ditto, from what I've read, for a system that can effectively train or fine-tune LLMs.

You can run llama.cpp on CPU only. I did that briefly on my 16 GB RAM ThinkPad (a T16 with a 12th-gen Intel Core i5-1235U). The model I was last testing with, mistralai_mistral-7b-instruct-v0.1 with Q5_K_M quantization, outputs 1.62 tokens per second on the T16. Compare that to my RTX 3060 system, which generates 48 tokens/second using the same model and quantization.
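
If you want to measure tokens/second yourself, here's a minimal sketch using the llama-cpp-python bindings (pip install llama-cpp-python); the GGUF filename and prompt are illustrative, and n_gpu_layers=0 keeps it CPU-only (set it higher, or to -1, to offload layers to a GPU):

# Time a short completion and report tokens/second.
import time
from llama_cpp import Llama

llm = Llama(
    model_path="mistral-7b-instruct-v0.1.Q5_K_M.gguf",  # illustrative filename
    n_gpu_layers=0,  # 0 = CPU only; -1 = offload all layers to the GPU
)

start = time.time()
out = llm("Write a haiku about winter.", max_tokens=128)
elapsed = time.time() - start

completion_tokens = out["usage"]["completion_tokens"]
print(f"{completion_tokens / elapsed:.2f} tokens/second")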

Stable Diffusion image generation on my RTX 3060 system runs at a couple of images per minute.
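
For reference, generating an image locally with Hugging Face diffusers looks roughly like the sketch below; loading in half precision keeps it comfortably within a 12 GB card. The model ID is the standard Stable Diffusion 1.5 checkpoint, and the prompt and output filename are made up:

# Minimal Stable Diffusion sketch with diffusers on an NVIDIA GPU.
# Requires: pip install diffusers transformers accelerate torch
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
)
pipe = pipe.to("cuda")  # half precision at 512x512 fits easily in 12 GB VRAM

image = pipe("a watercolor painting of a lighthouse at dusk").images[0]
image.save("lighthouse.png")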
posted by the antecedent of that pronoun at 6:12 PM on December 20, 2023 [1 favorite]

