Learning math on the fly?
November 17, 2020 7:24 AM
I'm currently in grad school, and I am quickly realizing that the projects and research that seem most interesting to me require math skills that I don't have. For anyone who has been in similar situations, what recommendations would you have to improve math skills without literally taking five or six semester-long courses in the necessary subjects?
Some background: I studied mechanical engineering, so I've taken a few calculus courses and differential equations, but did not need to take linear algebra. I also took basic stats. I did fine in these courses, but all of this was many years ago, and after working in an unrelated field, I returned to get a masters in an engineering field. Virtually all of my math knowledge is gone. Many of the courses I'm taking are not math-intense, and I can keep up just fine. However, most of the projects/research that I read about, and a vast majority of the job postings I see at places I'm interested in emphasize skills in data science, machine learning, or modeling.
I'm feeling a bit overwhelmed, as I don't have a lot of free time for extra learning, but when I finish this program (in about a year), I'm worried that I'm going to have a hard time finding jobs or interesting opportunities, and that my graduate degree will be a bit useless because of my lack of hard-math and trendy data science skills.
So, I want to spend some time to try and catch up as much as I can. I never took linear algebra, which seems quite useful, and I also think refreshing on stats would be generally very useful. A refresher on calculus would be great, but calculus is a broad subject and frankly I'm not sure where to start.
I currently have two research projects which take up a good amount of my time, and several other courses for the next few months, so I can probably spare only an hour a day on learning one of the above topics. Does anyone have recommendations on courses or approaches? I know this is perhaps a very general question, but I've never gone about learning quantitative subjects completely independently, and when I jump into MIT courses and just start watching videos they quickly become complex and require pre-requisite knowledge that I simply don't have.
If you feel that your graduate degree isn't adequately preparing you for the work you intend to do afterwards, it may be worth talking to someone at the school. Whenever any of my classmates had issues like these, department heads and faculty usually did their best to fix the problem. One of my classmates went back and attended additional free classes after graduation to fill gaps that she felt our program didn't adequately address. I know this doesn't answer your question as asked, but it may be an avenue worth exploring. And at the very least, faculty should be able to point you towards important texts and resources.
posted by meows at 8:16 AM on November 17, 2020 [8 favorites]
Yes, linear algebra is key (among other things very important in mathematical modeling and statistics, design of experiments, understanding & simplifying data, etc.).
Whatever else you do, find a book or a few to keep by your side as reference. There are many (I have half a dozen). I like Mathematical Tools for Applied Multivariate Analysis (Carroll, Green, Chaturvedi). But look around for something that's at the right level: not so advanced you can't use it, but not so simple it's not much help. And perhaps one used by people in your field so it's most relevant; perhaps a book of math essentials for your field. Ask a professor or advanced student what they find essential when they forget something, or what's on their shelves. Certainly the web is useful for looking up specific things, finding papers, & the like.
Much of "data science" is about programming and learning the latest software. Best left for later -- these are tools you can pick up as needed. But start looking at something like R if you need to analyze data in classes.
posted by lathrop at 8:17 AM on November 17, 2020 [3 favorites]
I have the same issue, minus the grad school angle. Khan Academy seems great, any other suggestions?
posted by ®@ at 8:40 AM on November 17, 2020
For linear algebra, Lyryx is an open text that has a free homework tool for practicing and getting immediate feedback. I believe it is the official text at a few state schools in my area.
[On review, it may be the case that only instructors can request a demo course, but it looks like anyone can download the text.]
posted by klausman at 8:43 AM on November 17, 2020 [1 favorite]
Many of the courses I'm taking are not math-intense, and I can keep up just fine. However, most of the projects/research that I read about, and a vast majority of the job postings I see at places I'm interested in emphasize skills in data science, machine learning, or modeling.
Ok - it sounds like you don't need math to finish your degree. So put that aside. For the next bit - if you're interested in data science roles - a couple of points. I think the data science title is breaking into a) data engineer - getting all of the data reshaped, pipelined, automated, and clean so it can feed into an algorithm, b) machine-learning engineer - taking the pipeline inputs, implementing algorithms, and outputting/publishing results, and c) researcher/research scientist - developing and modifying algorithms to suit methodological issues not amenable to an off-the-shelf solution.
(a) and (b) are programming jobs, with (b) requiring lots of (a) plus conceptual understanding of the machine learning models and how they go right/wrong. I think (c) is a math job, with some ability to prototype in code and handoff to production, but pencil and chalkboard will be as important as code in working things out.
I think (roughly) that (a) needs a bachelors, (b) needs a masters, and (c) is PhD level. You really only need to fluidly "do math" for (c), and if you want to do (c), you'd be best served by taking the time and re-doing the work for all three calculus, linear algebra, and probably real analysis and then going into a PhD program. This is just my opinion, and there are likely exceptions, but I think this is how the near-term future is breaking / going to break.
posted by everythings_interrelated at 9:06 AM on November 17, 2020 [3 favorites]
Khan Academy, hands down.
posted by Amalie-Suzette at 9:12 AM on November 17, 2020
I found the book Practical Statistics for Data Scientists really useful. It's straightforward, easy to get through, and very explicitly discusses which elements of statistics are often used in data science work and which aren't. It's available through my university's library, so you might find it in yours as well!
posted by thebots at 9:12 AM on November 17, 2020 [3 favorites]
I'm a data scientist and I've been on the hiring panel for other data scientists, and I can tell you that VAST swaths of the applicant pool, maybe the majority for jobs at non-FAANG companies, could not do a dot product on tiny sample matrices on a whiteboard on the spot, so some perspective is good. Yes, you should probably be able to do that, but maybe you don't need six semesters' worth of more math to be better than the average applicant and to do the job (data science roles are also very heterogeneous these days and some places are mathier, some less). I agree with most of what everythings_interrelated said; these are programming roles and you need to know enough theory to have good instincts on algorithm selection and troubleshooting and to write good tests, but that's it. You will not touch the linear algebra much day-to-day (because the programming frameworks are abstracted at a much higher level) unless you are doing the kind of work where you are coming up with novel deep learning architectures, and that's 2% of the actual data science jobs out there. You should brush up on classical stats; most places lack analysts and data scientists who can choose appropriate tests and generate appropriate metrics, and if you understand when and when not to use a t-test, you are adding value. You should have some exposure to parametric and non-parametric approaches, maybe some exposure to time series and Bayesian stats. I got all that in grad school in a few semesters while also doing research, so I can't recommend other resources, but I know it doesn't need to be your exclusive focus.
posted by slow graffiti at 10:07 AM on November 17, 2020 [4 favorites]
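For readers who want to see what the whiteboard exercise mentioned above amounts to, here is a minimal sketch in plain Python. The function names `dot` and `matmul` and the sample matrices are illustrative assumptions, not anything from the thread; in practice a numerical library would do this for you.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return sum(a * b for a, b in zip(u, v))


def matmul(A, B):
    """Matrix product: entry (i, j) is the dot product of row i of A
    with column j of B."""
    if any(len(row) != len(B) for row in A):
        raise ValueError("number of columns of A must equal number of rows of B")
    columns_of_B = list(zip(*B))  # transpose B so its columns are easy to iterate
    return [[dot(row, col) for col in columns_of_B] for row in A]


# Hypothetical "tiny sample matrices" of the kind one might be handed.
A = [[1, 2],
     [3, 4]]
B = [[5, 6],
     [7, 8]]

print(dot([1, 2, 3], [4, 5, 6]))  # 1*4 + 2*5 + 3*6 = 32
print(matmul(A, B))               # [[19, 22], [43, 50]]
```

Working one or two of these by hand, then checking against the code, is probably enough practice for this particular skill.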
I would say put major focus now on learning linear algebra and don't worry about or get distracted by other math topics for a while longer. (Maybe basic calculus helps also, but not complicated stuff or solving any difficult integrals.)
Seriously, almost everything you need to do is just linear algebra. If you understand matrices and linear operators, eigenvalues and eigenvectors, a lot of the workhorse algorithms in data science like PCA become trivial to understand.
In addition to Khan Academy, Gilbert Strang is famously a great teacher of linear algebra. In addition to his main linear algebra course on OCW, you may enjoy Matrix Methods in Data Analysis, Signal Processing, and Machine Learning, a programming- and data-science-focused linear algebra course for beginners.
Also, general advice for learning math at your own pace -- just gloss over stuff you have a really hard time understanding, don't stress about it because there's no exam, and don't be afraid to bounce off things and come back to them weeks or months later.
posted by vogon_poet at 10:23 AM on November 17, 2020 [1 favorite]
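As a rough illustration of the point above about PCA following from eigenvalues and eigenvectors, here is a small sketch for two-dimensional data in plain Python. It assumes the textbook definition of PCA (eigen-decomposition of the sample covariance matrix) and uses the closed-form eigenvalues of a symmetric 2x2 matrix; the function name `pca_2d` and the data points are made up for the example. Real work would normally use a library routine instead.

```python
import math


def pca_2d(points):
    """Return (eigenvalues, eigenvectors) of the 2x2 sample covariance
    matrix, ordered so the direction of largest variance comes first."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n

    # Sample covariance matrix [[a, b], [b, c]] of the centered data.
    a = sum((x - mean_x) ** 2 for x, _ in points) / (n - 1)
    c = sum((y - mean_y) ** 2 for _, y in points) / (n - 1)
    b = sum((x - mean_x) * (y - mean_y) for x, y in points) / (n - 1)

    # Closed-form eigenvalues of a symmetric 2x2 matrix.
    mid = (a + c) / 2
    half_gap = math.hypot((a - c) / 2, b)
    eigenvalues = [mid + half_gap, mid - half_gap]

    if abs(b) > 1e-12:
        # Each eigenvector solves (a - lam) * x + b * y = 0.
        eigenvectors = []
        for lam in eigenvalues:
            x, y = b, lam - a
            norm = math.hypot(x, y)
            eigenvectors.append((x / norm, y / norm))
    elif a >= c:
        # Uncorrelated data: the principal directions are the axes.
        eigenvectors = [(1.0, 0.0), (0.0, 1.0)]
    else:
        eigenvectors = [(0.0, 1.0), (1.0, 0.0)]
    return eigenvalues, eigenvectors


# Hypothetical data lying close to the line y = x.
points = [(2.0, 1.9), (3.0, 3.2), (4.0, 3.9), (5.0, 5.1), (6.0, 5.9)]
eigenvalues, eigenvectors = pca_2d(points)
explained = eigenvalues[0] / sum(eigenvalues)
print("first principal direction:", eigenvectors[0])
print("share of variance explained:", round(explained, 3))
```

The first eigenvector is the direction along which the data varies most, and the ratio of its eigenvalue to the total is the share of variance that direction captures.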
All of Statistics (Larry Wasserman) is a terse but very helpful reference for data science. There’s some calculus and linear algebra, but speaking as, shall we say, a reluctant integrator, there’s a lot you can get out of it without being particularly good at either.
posted by en forme de poire at 4:41 PM on November 18, 2020
Nth'ing that either statistics (hypothesis testing) or linear algebra is probably your main thing here, depending on the role -- having both is good! (Also it cheers me to see that Gil Strang is still teaching!) From calculus you can expect to require the concept of the gradient, and not much else is likely critical.
I do think it's valuable before you get fancy in machine learning to have a solid sense of what you can get from plain old 19th-century linear regression, and what from logistic regression. Kernels maybe too.
posted by away for regrooving at 2:01 AM on November 19, 2020 [1 favorite]
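To connect the two ideas in the comment above, the gradient and plain linear regression, here is a minimal sketch that fits a line two ways: with the standard closed-form least-squares formula and by gradient descent on mean squared error. The function names, learning rate, step count, and data are illustrative assumptions chosen so this tiny example converges; they are not tuned recommendations.

```python
def fit_line_closed_form(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def fit_line_gradient_descent(xs, ys, learning_rate=0.05, steps=5000):
    """Minimize mean squared error by repeatedly stepping against its gradient."""
    n = len(xs)
    slope, intercept = 0.0, 0.0
    for _ in range(steps):
        residuals = [slope * x + intercept - y for x, y in zip(xs, ys)]
        # Partial derivatives of (1/n) * sum(residual**2)
        # with respect to slope and intercept.
        grad_slope = (2 / n) * sum(r * x for r, x in zip(residuals, xs))
        grad_intercept = (2 / n) * sum(residuals)
        slope -= learning_rate * grad_slope
        intercept -= learning_rate * grad_intercept
    return slope, intercept


# Hypothetical data generated exactly from y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

print(fit_line_closed_form(xs, ys))
print(fit_line_gradient_descent(xs, ys))
```

Both approaches should agree closely on data like this. The gradient version is the one that carries over to models with no closed-form solution, which is why the gradient is the piece of calculus worth keeping.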
This thread is closed to new comments.
posted by BekahVee at 7:42 AM on November 17, 2020 [4 favorites]