How to scrape audio from a course platform/LMS to a personal podcast?
September 18, 2024 1:48 AM
I'm currently doing some extended online training which uses an unwieldy browser-based LMS to deliver all the course material. I'd like to somehow scrape the audio from all the material here and create my own personal private podcast that I can access from my own mobile podcast player. What would be the most efficient way to do this?
The content on the course is a mixture of audio and video and varies in length from 10 minutes to over an hour. I like to listen to the longer ones when I am driving or walking, but I usually need to stop partway through, and then finding/remembering my spot is a huge pain - I end up having to log back into the LMS, find the section, find the spot etc. Doing this a few times is okay, but this is a course that runs for over a year, so I want to come up with a better system.
In a perfect world, I would somehow be able to scrape all the content from the page as audio files and then create my own private podcast where I can load all the content in order and listen to it on my podcast player, which lets me adjust speed, remove silence, maintain play position across devices etc. It looks like most of the video content is embedded on Vimeo without any native download button. The audio-only content seems to be hosted directly on the LMS.
I haven't investigated content scraping for a while - last time I looked it was full of dodgy plug-ins and complex workarounds. The custom podcast part of things is totally new to me.
Is there a way to accomplish this in a relatively straightforward way?
For the downloading part of your question: Downie is the macOS app we use at work to pull audio and video from archival sources, social media, news sites, and video hosting sites such as Vimeo (whatever clips we end up using are fully attributed and licenses are paid). It allows you to select the quality and format of the files and is pretty smart about downloading the specific ones you want while ignoring the rest. Videos can be automatically converted to audio-only files if you prefer that. It’s very easy to use and it has a fully functional free trial.
posted by theory at 2:39 AM on September 18 [2 favorites]
Response by poster: Forgot to mention I’m fully Mac/iOS based - thank you!
posted by LongDrive at 2:46 AM on September 18
I'd see if yt-dlp manages to download the files - it should work for the Vimeo ones at least. You can pass an argument to automatically extract the audio from the video: yt-dlp -x --audio-format mp3 [url]
If you need a username/password to access the content, you'll probably need to pass that as an argument to yt-dlp. I don't remember what the syntax for that is but you can look it up.
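For what it's worth, here is a rough Python sketch of how that command could be assembled for a batch of lesson URLs. It is only a sketch under assumptions: the function name build_ytdlp_command, the course_audio output folder, and the example URL are made up for illustration. The -x, --audio-format, -o, and --cookies-from-browser options are real yt-dlp flags; reusing your browser's logged-in cookies is one possible way to handle the login, but whether it works on any particular LMS is untested.

```python
import shlex


def build_ytdlp_command(url, audio_format="mp3", browser=None,
                        output_dir="course_audio"):
    """Return the yt-dlp argument list for an audio-only download.

    output_dir and this helper itself are hypothetical; only the
    yt-dlp flags are taken from the tool's documented options.
    """
    cmd = [
        "yt-dlp",
        "-x",                              # extract audio only
        "--audio-format", audio_format,    # convert to e.g. mp3
        "-o", f"{output_dir}/%(title)s.%(ext)s",  # name files by video title
    ]
    if browser:
        # Reuse the login session from a browser such as "safari" or "chrome"
        cmd += ["--cookies-from-browser", browser]
    cmd.append(url)
    return cmd


if __name__ == "__main__":
    # Placeholder URL; print the command so it can be reviewed before running
    command = build_ytdlp_command("https://vimeo.com/000000000", browser="safari")
    print(shlex.join(command))
```

To actually run it, the list can be handed to subprocess.run(command, check=True), one call per lesson URL.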
posted by trig at 2:47 AM on September 18 [1 favorite]
i’ll just add a suggestion to request downloadable audio files from the person(s) teaching the course. it’s quite possible they haven’t thought of offering this to students, and i think you can articulate a compelling explanation of why it would be helpful. (i also used to do this for myself as a grad student, though apologies i can’t recall exactly what steps i took). it doesn’t hurt to ask!
posted by tamarack at 7:03 AM on September 18 [1 favorite]
I did exactly this with an online course a few years ago, via the dumbest most brute-force method.
I used an app called Piezo by Rogue Amoeba software. It costs $25. There are no-cost ways too; read on if you want one.
Piezo runs on your Mac and records sounds from running apps. It can save in mp3 format. I set Piezo to record my browser, started the training, and let it run in the background while I did other work.
The recording happens in real time, so it takes as long as the content itself.
I renamed the MP3 files to 01 class, 02 class… etc. I then imported to Apple Music (formerly iTunes) and dumped them in a playlist. I plugged my iPhone into the Mac and set Music to sync that playlist to the iPhone’s Music app. Just like 2001.
While time consuming, this didn’t require any internet hosting or servers.
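If there are a lot of recordings, the renaming step could be scripted. This is a minimal Python sketch, assuming the files sit in one folder, were recorded in lesson order (so sorting by modification time gives the right sequence), and have not been numbered already. The function name number_recordings and the "class" label are just illustrative.

```python
from pathlib import Path


def number_recordings(folder, label="class"):
    """Rename every .mp3 in folder to '01 class.mp3', '02 class.mp3', ...

    Order is taken from file modification time, oldest first (an assumption
    about how the recordings were made). Returns the new file names.
    """
    folder = Path(folder)
    recordings = sorted(folder.glob("*.mp3"), key=lambda p: p.stat().st_mtime)
    # Pad to at least two digits so players sort 02 before 10
    width = max(2, len(str(len(recordings))))
    renamed = []
    for index, path in enumerate(recordings, start=1):
        target = folder / f"{index:0{width}d} {label}.mp3"
        if target.exists() and target != path:
            # Refuse to overwrite a different file that already has this name
            raise FileExistsError(f"{target} already exists")
        path.rename(target)
        renamed.append(target.name)
    return renamed
```

Zero-padding matters here because most players sort names as text, so "10 class" would otherwise land before "2 class".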
As an alternative to Piezo, you could also make a recording with the free, built-in app QuickTime Player. You will need software to create a “virtual sound card” that pipes the audio from your browser to a source that can be selected as input. There are many apps that do this, including paid ones like Loopback, but BlackHole is a free (suggested donation) option.
After making your browser's audio output appear as a virtual audio input, select that as the source in QuickTime Player and start an audio-only recording. Hit play on the training and record in real time.
This will require some fiddling and research, which is why I spent the $25 on Piezo.
posted by sol at 2:48 PM on September 18 [1 favorite]
I have done essentially the method suggested by sol, but using BlackHole to pipe the audio and Audacity to do the recording. So that's probably what I would use to do this task.
BlackHole is a little fiddly to use initially though -- if you are willing to spend the $25, I imagine Piezo is probably easier. But once you understand how BlackHole works, it works great.
posted by mekily at 8:15 PM on September 18
As far as the "private podcast" part goes - personally I might be more inclined to just put all the files into a directory on my Android device and then use any of a number of different music/audiobook/audio/video players that can keep track of position etc. to watch/listen.
If you really want to make a podcast, though, if you just put all your mp3/etc files into a single directory online somewhere (presuming you have some kind of online web space or whatever) you can use dir2cast to make the necessary .rss file needed to magically turn that directory into a podcast. A bunch more similar programs here.
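To give a sense of what such a tool produces, here is a minimal Python sketch that writes the same kind of feed by hand. It is not how dir2cast itself works; it just builds a bare RSS 2.0 document with one item and one enclosure per mp3. The function name build_feed, the feed title, and the example.com base URL are placeholders, and it assumes the files are already named so that alphabetical order is the listening order.

```python
import os
import xml.etree.ElementTree as ET
from email.utils import formatdate
from urllib.parse import quote


def build_feed(audio_dir, base_url, title="Course audio (private)"):
    """Return RSS 2.0 XML listing every .mp3 in audio_dir as an episode.

    base_url is wherever the files will be reachable online (hypothetical).
    """
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = base_url
    ET.SubElement(channel, "description").text = "Personal feed of course recordings"

    names = sorted(n for n in os.listdir(audio_dir) if n.lower().endswith(".mp3"))
    for name in names:
        path = os.path.join(audio_dir, name)
        # Percent-encode spaces and other special characters in the file name
        url = base_url.rstrip("/") + "/" + quote(name)
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = os.path.splitext(name)[0]
        ET.SubElement(item, "guid").text = url
        # Use the file's modification time as the publication date
        ET.SubElement(item, "pubDate").text = formatdate(
            os.path.getmtime(path), usegmt=True)
        # The enclosure element is what podcast players actually download
        ET.SubElement(item, "enclosure", url=url,
                      length=str(os.path.getsize(path)), type="audio/mpeg")
    return ET.tostring(rss, encoding="unicode")


if __name__ == "__main__":
    # Placeholder folder and URL; save the result next to the audio files
    with open("feed.xml", "w", encoding="utf-8") as f:
        f.write(build_feed("course_audio", "https://example.com/course/"))
```

Uploading feed.xml alongside the mp3 files and adding its URL to a podcast player that accepts custom feeds should be enough for a personal setup, though some players expect extra tags beyond this bare minimum.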
posted by flug at 2:21 AM on September 18 [1 favorite]