Please help a Software Developer get up to speed on AI/ML/LLM
January 12, 2024 5:03 PM
I'm a Software Engineer who's been working for a couple of decades. Not quite a greybeard, but greying. I've been watching the Artificial Intelligence (AI) craze unfold for a while and so far haven't dug much into it out of skepticism. Well, my company just decided to go big on this tech and while I'm not completely sold, I'm on board. How can I get up to speed?
1) How can I best utilize AI development tools?
- I've played with GitHub Copilot, and just heard about JetBrains AI. The former seems ok. Can you recommend some forums or other resources that are good for learning how to leverage these tools?
2) How does this AI stuff work?
- I'd like to learn more about how the current set of AI tools were built, and why. Mostly looking for a 10,000 foot level overview. Are there any good books for this?
3) How does this stuff work, in detail?
- I'd also like to learn how these tools were built and the data analysis techniques they use. Think 1,000 foot level. Any book recommendations?
4) Even more detail
- I'm also interested in how this stuff is coded. Programming tricks and whatnot. Think 100 foot level. Any book recommendations?
5) Fundamentals
- This is where I could use the most help
- It's been 20 years since I've studied or thought much about statistics
- Can you recommend a good refresher book?
- And then a modern book that covers this stuff? Easy on the math, I don't need proofs.
- How about other prereqs?
Thanks!
For LLMs, Wolfram has a book explaining them. But what do you mean by AI, specifically? Logit models are well within the definition of AI.
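To illustrate how simple the "AI" umbrella can get: a logit (logistic regression) model fits in a few lines of plain Python. This is a hypothetical toy sketch on made-up data, just to show that the term covers far more than chatbots:

```python
import math

def train_logit(points, labels, lr=0.5, epochs=2000):
    """Fit a one-feature logistic regression p = sigmoid(w*x + b) by gradient descent."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        gw = gb = 0.0
        for x, y in zip(points, labels):
            p = 1.0 / (1.0 + math.exp(-(w * x + b)))  # predicted probability of class 1
            gw += (p - y) * x  # gradient of the log-loss w.r.t. w
            gb += (p - y)      # gradient w.r.t. b
        w -= lr * gw / len(points)
        b -= lr * gb / len(points)
    return w, b

def predict(w, b, x):
    """Classify: 1 if the predicted probability is at least 0.5."""
    return 1 if 1.0 / (1.0 + math.exp(-(w * x + b))) >= 0.5 else 0

# Toy, linearly separable data: points below 3 are class 0, above are class 1.
xs = [0.5, 1.0, 2.0, 2.5, 3.5, 4.0, 5.0, 5.5]
ys = [0, 0, 0, 0, 1, 1, 1, 1]
w, b = train_logit(xs, ys)
```

Everything past this point in the thread is about models many orders of magnitude bigger, but the train-by-gradient-descent loop is the same idea.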
posted by MisantropicPainforest at 5:22 PM on January 12, 2024 [1 favorite]
IBM appears to have a free AI fundamentals program. It's a 10 hour online course.
posted by SPrintF at 5:29 PM on January 12, 2024
Response by poster: By AI I mean whatever the business folks are currently talking about. So, mostly ChatGPT and whatever else is in the news.
posted by sl00 at 5:31 PM on January 12, 2024
Would you also include machine learning? Or mostly interested in LLMs?
posted by aramaic at 5:50 PM on January 12, 2024
Simon Willison has many smart things to say.
This post "The Rise of the AI Engineer" has thoughts on the skills needed to make use of foundation models and emerging ecosystems, and how much and what ML knowledge.
Ethan Mollick is a good source for discussion of potential business use cases, as well as clear-headed opinions on limitations and what are not currently good use cases to deploy (e.g. customer service chatbots).
posted by lookoutbelow at 7:09 PM on January 12, 2024 [1 favorite]
This is a bit dated but helped me understand some of the terminology and basic concepts:
http://karpathy.github.io/2015/05/21/rnn-effectiveness/
The datedness helps. I find a lot of what's out there now is very "bootcamp": it leans on a heavy library not so much to teach you as to make you feel like you can do things, when really the important part is learning to think in terms of what problems the technique can solve.
Hacker News, while a bit eye-roll-inducing sometimes, is good for keeping up with the latest lingo. I get caught up in the jargon and marketing a lot myself, and I question how many people understand what they're doing versus just talking about parameters, tokens, etc. It's like when I was a kid and wanted a dual-core processor but had no idea what dual-core meant.
I think at a certain point it's best to treat some of this the way I treat a compiler or a processor: at a high level I understand what it's doing, but at a certain point it's so optimized and complex that you have to trust things just work.
posted by geoff. at 1:57 AM on January 13, 2024
It sounds recursive, but since these are ultimately language models, one good way to learn about them is to have conversations with the tool itself about what it is, what it does, and what its limitations are for your use cases.
Over the past 8 months or so I've started always having a ChatGPT tab open in a browser while programming. Ask it about an error message. Ask it about a piece of someone else's code and how it works. Ask it what you should consider before building something to solve a particular problem. Ask it to refactor something. Ask it about the pros and cons of different machine learning approaches for a particular task. Ask it to take something you wrote in one language (or pseudocode, even) and write it in another. Evaluate the output. Heck, ask it to write you a test to evaluate the output. You can probably think of other things; I don't consider myself a software developer but I write a fair bit of code a few times a year.
Two things of note:
1. I pay $20/mo for ChatGPT-4 access, and when I ran through my query limit yesterday in the middle of some data analysis and had to switch to 3.5 temporarily, it was way worse at understanding my problems.
2. These LLM tools aren't so great at remembering the start of the conversation once the conversation gets long, so I've gotten in the habit of refreshing its memory about my code or data structures once its responses get dumber. And/or start a new conversation. Long conversations get laggy.
posted by deludingmyself at 10:24 AM on January 13, 2024
I know you asked for books, but for a high-level view I would recommend Karpathy's videos. He has also written some good blog posts, as mentioned above, but more recently he's posted a fair few videos on YouTube, mostly about LLMs.
He is an expert and does a good job explaining things imo. If you are interested in learning more about the details of machine learning, he also has some videos on building GPT "from scratch".
YouTube channel
Honestly there’s not a ton of statistics in ML. For a book that is oriented towards software engineers that want to learn about it I would recommend the fastai book. It’s not really going to teach you about ChatGPT but will definitely teach you the practical fundamentals of neural nets etc. It’s available as an ebook or online.
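To give a flavor of what "practical fundamentals of neural nets" means, here's a hypothetical minimal sketch (plain Python, no libraries, not from the fastai book): a single linear neuron learning y = 2x + 1 by minimizing squared error with gradient descent, which is the same training loop deep nets use, just scaled up.

```python
def train_neuron(samples, lr=0.05, epochs=1000):
    """Fit y_hat = w*x + b to (x, y) samples by gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        gw = gb = 0.0
        for x, y in samples:
            err = (w * x + b) - y  # prediction error on this sample
            gw += 2 * err * x      # d(err^2)/dw
            gb += 2 * err          # d(err^2)/db
        w -= lr * gw / len(samples)  # step down the averaged gradient
        b -= lr * gb / len(samples)
    return w, b

# Noise-free samples of the target function y = 2x + 1.
data = [(x, 2 * x + 1) for x in [-2, -1, 0, 1, 2, 3]]
w, b = train_neuron(data)
```

Swap the linear function for stacked layers with nonlinearities, and the hand-written gradients for autodiff, and you have the core of what frameworks like PyTorch automate.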
posted by colourlesssleep at 1:58 PM on January 13, 2024
I've just bought this book The Little Learner. Haven't started it yet but looks very promising.
posted by neuron at 4:01 PM on January 13, 2024 [1 favorite]
I found this paper by Professor Murray Shanahan (Imperial College London / DeepMind) to be helpful in getting past the LLM hype: Talking About Large Language Models
I'm constantly surprised by both the power and the limitation of LLM tech.
A fun way to directly experiment with the tech is to get an openai dev account, buy 10 bux worth of credits, and start messing with the completion API. Their documentation and cookbook examples are well written.
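As a concrete starting point, here's a hypothetical minimal sketch of calling the chat completions endpoint with nothing but the standard library. The endpoint URL and model name reflect what the API looked like at the time of this thread; check the current OpenAI docs before relying on them:

```python
import json
import os
import urllib.request

API_URL = "https://api.openai.com/v1/chat/completions"  # chat completions endpoint

def build_request(prompt, model="gpt-3.5-turbo"):
    """Build (but don't send) a chat completion request for the given prompt."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {os.environ.get('OPENAI_API_KEY', '')}",
        },
    )
    return req, payload

req, payload = build_request("Explain what a token is in one sentence.")

# Only hit the network if a key is actually configured.
if os.environ.get("OPENAI_API_KEY"):
    with urllib.request.urlopen(req) as resp:
        reply = json.loads(resp.read())
        print(reply["choices"][0]["message"]["content"])
```

In practice you'd use the official client library, but seeing the raw request makes it clear there's no magic: it's JSON in, JSON out.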
posted by stungeye at 6:23 PM on January 13, 2024
This thread is closed to new comments.