Would current technology allow someone to make an audio recording of their life?
December 4, 2007 2:02 AM
I was pondering the topic of augmented memory, and how someone with a bad memory might use technology to give themselves a leg up. It seems to me that with a tiny microphone and high-capacity flash memory a person could literally record everything they hear. There are open-source speech codecs (http://www.speex.org/) that can even detect when speech is present. Ethical implications aside, would this be possible? Is anyone doing this in a homebrew kind of way?
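Speex's built-in voice activity detection is more elaborate than this, but the basic idea can be sketched with a simple energy threshold. This is not Speex's algorithm or API; it is a minimal sketch assuming a list of 16-bit PCM sample values, 20 ms frames at 8 kHz (160 samples), and an arbitrary RMS threshold that would need tuning for a real microphone.

```python
import math


def frame_rms(samples, frame_len):
    """Split samples into non-overlapping frames and return each frame's RMS level."""
    levels = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        levels.append(math.sqrt(sum(s * s for s in frame) / frame_len))
    return levels


def detect_speech(samples, frame_len=160, threshold=500.0):
    """Return (start, end) sample ranges where frame RMS is at or above threshold.

    Assumes 16-bit PCM integer samples; 160 samples = 20 ms at 8 kHz.
    The threshold is a hypothetical value, not a Speex default.
    Any trailing partial frame is ignored.
    """
    levels = frame_rms(samples, frame_len)
    segments = []
    seg_start = None
    for i, level in enumerate(levels):
        if level >= threshold and seg_start is None:
            seg_start = i * frame_len                    # speech begins
        elif level < threshold and seg_start is not None:
            segments.append((seg_start, i * frame_len))  # speech ends
            seg_start = None
    if seg_start is not None:  # speech ran to the last full frame
        segments.append((seg_start, len(levels) * frame_len))
    return segments


# 40 ms of silence, 40 ms of a loud tone, 40 ms of silence (at 8 kHz).
samples = [0] * 320 + [1000, -1000] * 160 + [0] * 320
print(detect_speech(samples))  # [(320, 640)]
```

Only the returned ranges would need to be kept, which is where the storage saving of voice-activated recording comes from.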
Reliable computerised transcription would obviously make the recordings much easier to manage. It seems likely that someone is already making recordings with a view to running them all through a decent transcription program when one is developed in 10 or 20 years' time.
Best answer: Check out MyLifeBits. Similar idea, not limited to audio, though.
posted by suedehead at 2:27 AM on December 4, 2007
Best answer: You may be interested in this article.
posted by WPW at 3:46 AM on December 4, 2007 [1 favorite]
I had a friend who carried a voice-activated recorder around with him for a couple of weeks. It wouldn't record everything, just a few conversations at random. Sociologists then used the recordings to do some sort of study.
Anyway, my point is that with voice-activated recorders, you have a lot less data to store.
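A rough way to quantify the saving: storage per day scales with the fraction of the day the recorder is actually running. This sketch assumes the roughly 1 MB per minute MP3 figure used elsewhere in this thread and a purely hypothetical 25% speech fraction; a real duty cycle would depend entirely on the person.

```python
def daily_storage_mb(mb_per_minute, active_fraction):
    """Megabytes written per day when recording only active_fraction of the time."""
    minutes_per_day = 24 * 60
    return minutes_per_day * active_fraction * mb_per_minute


always_on = daily_storage_mb(1.0, 1.0)    # continuous recording
voice_only = daily_storage_mb(1.0, 0.25)  # hypothetical: speech 25% of the day
print(always_on, voice_only)  # 1440.0 360.0
```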
posted by grouse at 4:42 AM on December 4, 2007
I'm not convinced by that 10k number for audio. A 500 GB hard disc is £80. A 128 kbps MP3 stream is about 1 MB a minute, so that disc holds about 347 days of audio. 2 GB of flash will store a day's worth, and it can be transferred to the hard disc each night.
The difficulty is going to be in post-processing a signal that's chock-full of background noise to get something useful out of it.
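The storage arithmetic can be checked in a few lines. Note that 128 kbps works out to 0.96 MB per minute in decimal units, so rounding up to 1 MB per minute makes the 347-day figure slightly conservative. The function names are illustrative only.

```python
def mb_per_minute(kbps):
    """Constant bitrate in kilobits/s -> megabytes per minute (decimal units)."""
    bytes_per_second = kbps * 1000 / 8
    return bytes_per_second * 60 / 1_000_000


def days_of_audio(capacity_gb, mb_per_min):
    """Days of continuous audio that fit in capacity_gb gigabytes."""
    minutes = capacity_gb * 1000 / mb_per_min
    return minutes / (24 * 60)


print(round(mb_per_minute(128), 2))                      # 0.96
print(round(days_of_audio(500, 1.0), 1))                 # 347.2 (rounded 1 MB/min)
print(round(days_of_audio(500, mb_per_minute(128)), 1))  # 361.7 (exact 128 kbps)
print(round(days_of_audio(2, 1.0), 2))                   # 1.39 days on 2 GB of flash
```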
posted by Leon at 4:44 AM on December 4, 2007
Yeah, that article does not answer the question at all. You could buy a small CompactFlash recorder like the M-Audio MicroTrack II and a battery pack like the Tekkeon MyPower All, and with that combination and a large flash card you could record a continuous stream for days on end. I suppose you'd have to put the thing down at night for a few hours to charge it and transfer the files.
If you wanted VOX (voice-operated switching), you could just get one of the various VOX kits Radio Shack makes and wire it to the record button.
In other words, it's possible right now for around $700-800.
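A VOX trigger needs more than a bare threshold: if it stops the instant the level drops, it chops the recording between words, which is why VOX circuits commonly include a hang (delay) time. The sketch below shows that logic in software on a sequence of per-frame levels; the threshold and hang length are hypothetical parameters, not settings from any Radio Shack kit.

```python
def vox_gate(levels, threshold, hang_frames):
    """Per-frame record (True) / pause (False) decisions, VOX-style.

    Recording starts on any frame at or above threshold and keeps going
    for hang_frames quiet frames afterwards, bridging short pauses.
    """
    decisions = []
    quiet_run = hang_frames + 1  # start in the paused state
    for level in levels:
        quiet_run = 0 if level >= threshold else quiet_run + 1
        decisions.append(quiet_run <= hang_frames)
    return decisions


# Levels per frame (arbitrary units); threshold 5, hang time of 2 frames.
print(vox_gate([0, 9, 0, 0, 0, 9], threshold=5, hang_frames=2))
# [False, True, True, True, False, True]
```

A longer hang time keeps more of each conversation intact at the cost of storing more silence.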
posted by fake at 5:46 AM on December 4, 2007
Leon, my bad - Stross assumes in that article that you're going for a full 'lifelog' with video channel, audio channel and a record of keyboard clicks and mouse clicks. So, yeah, 10K is a bit much.
It is a very good think piece on the technology required and the ramifications of 'total history', where absolutely everything is recorded and searchable. That ties in with the OP's original application, a 'memory prosthesis' (Stross's concept) for people with bad memories (or, in the article, long-term Alzheimer's sufferers).
posted by Happy Dave at 6:19 AM on December 4, 2007
After some thought, I think a mobile phone is the right place to do this. The list of requirements isn't far from what a modern phone already does: run software, record telephone calls, record from an external mic (ideally Bluetooth), record to flash, upload data, and always be in your pocket.
Only thing missing is decent battery life.
posted by Leon at 6:27 AM on December 4, 2007
It doesn't answer the question, but it's worth noting that a lot of research is being done on what it will take to make "lifelogs" (video and audio) practical, and not just at Microsoft. This is clearly a development goal.
posted by adamrice at 6:45 AM on December 4, 2007
The real problem is what do you do with all that audio? Hire transcription secretaries to listen to it non-stop for the rest of their lives? Invent some mega-awesome heuristic natural language contextualizer to sort the audio into a searchable database?
You can record all you want these days; the problem is dealing with it afterwards. That's still a ways off.
posted by Aquaman at 8:25 AM on December 4, 2007
Having worked with documentary and archival audio and video, I can say the problems are not so much with storage as with making sense of it afterwards and finding what you want.
posted by KirkJobSluder at 8:27 AM on December 4, 2007
And to echo Aquaman, even transcripts are not much of a timesaver when you are dealing with transcriptions of candid, unstructured communication. The breakthroughs needed to make this work have less to do with storage or even speech recognition than with methods for meaningfully indexing and imposing structure on large, poorly structured datasets.
posted by KirkJobSluder at 8:34 AM on December 4, 2007
KirkJobSluder and Aquaman are right. You can't *deal* with that much data. Now, it's true that you're willing to be patient (10-20 years). So perhaps in that time all of artificial intelligence will be solved, and then you'll have a goldmine.
But seriously, it would take better-than-human-level intelligence to get "context" (whatever that might be) out of snippets of audio. If you gave a random 3-minute clip of your "life log" to your mom, could she create an accurate depiction of that moment? (Hint: No.)
posted by zpousman at 9:02 AM on December 4, 2007
You people are aware the NSA does something like this, but on a massive scale, right? They listen in on phone traffic for keywords, etc.
This is not that far-fetched. First of all, storing it is trivial. As Leon says, audio can be recorded to MP3 at 128 kbps at around 1 MB per minute. Assume an average lifespan of 75 years.
75 years * 365 days/year * 24 hours/day * 60 minutes/hour * 1MB/minute = 39,420,000 MB = 39.42 terabytes
And you only fill about 526 GB per year (525,600 minutes at 1 MB each), so if you buy only the storage you need for the upcoming year, the cost is pretty minimal.
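The lifetime and per-year totals can be checked directly, using the same rounded 1 MB per minute and 75-year assumptions (decimal units throughout):

```python
MB_PER_MINUTE = 1.0  # rounded 128 kbps MP3 figure from the thread
MINUTES_PER_YEAR = 365 * 24 * 60
LIFESPAN_YEARS = 75

per_year_gb = MINUTES_PER_YEAR * MB_PER_MINUTE / 1000
lifetime_tb = LIFESPAN_YEARS * MINUTES_PER_YEAR * MB_PER_MINUTE / 1_000_000

print(MINUTES_PER_YEAR, round(per_year_gb, 1), round(lifetime_tb, 2))
# 525600 525.6 39.42
```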
Secondly, you certainly can process that much data, and this is where what the NSA is generally understood to do comes into play. If you establish certain keywords (people's names, key events such as "birthday" or "meeting", etc.), software can listen to the stream in real time and, whenever it detects a keyword, simply store the timecode with the associated keyword. This would be much more practical as a service, i.e. your audio is recorded on the network somewhere, where supercomputers work on many streams at once.
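Storing a timecode per detected keyword amounts to building an inverted index. A minimal sketch, assuming some speech recognizer (hypothetical here, not any specific product) already emits (timecode, word) pairs; the index maps each keyword of interest to every time it was heard:

```python
from collections import defaultdict


def build_keyword_index(recognized_words, keywords):
    """Map each keyword to the timecodes (seconds) at which it was recognized.

    recognized_words: iterable of (timecode, word) pairs, standing in for the
    output of a hypothetical real-time speech recognizer.
    """
    wanted = {k.lower() for k in keywords}
    index = defaultdict(list)
    for timecode, word in recognized_words:
        token = word.lower().strip(".,!?")  # normalise case and punctuation
        if token in wanted:
            index[token].append(timecode)
    return dict(index)


stream = [(12.0, "Happy"), (12.4, "birthday!"), (300.5, "meeting"), (901.2, "Birthday")]
print(build_keyword_index(stream, ["birthday", "meeting"]))
# {'birthday': [12.4, 901.2], 'meeting': [300.5]}
```

The index is only as good as the recognizer feeding it and the keyword list chosen in advance.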
Mining a huge amount of that audio offline and after the fact would be difficult, but if what you are looking for is a fairly small set of keywords, it can be done now. (Dragon NaturallySpeaking is quite effective at taking dictation, so imagine the improvement in ten years.)
The real problem is the fact that much of the knowledge and information we process is visual, not auditory. Reading, viewing, etc. All that time would be recorded as silence.
posted by Pastabagel at 9:35 AM on December 4, 2007
Also, an interesting film on this subject is The Final Cut with Robin Williams. Everyone seems to have hated this movie, but the premise is interesting. Parents choose to implant a video recorder in their unborn child's head, which will record the child's entire life without their knowledge. When that person dies, the mortician, discovering the implant, notifies the next of kin, who hire a "cutter" to edit a highlight reel of the person's life. The plot is meh, but the creative implications explored in the film are interesting.
posted by Pastabagel at 9:39 AM on December 4, 2007
Pastabagel: You people are aware the NSA does something like this, but on a massive scale, right? They listen in on phone traffic for keywords, etc.
Certainly. And you are aware that the NSA considers its signal processing intelligence to be a military secret and has not been forthcoming about the details of its signal processing systems, right? And that the scale of that processing may involve investments in computing and human power considerably larger than any consumer computing application, right? Furthermore, indexing on a priori selected phenomena is of very limited use, because the most significant episodes are often the completely unexpected ones.
And to correct you, Dragon is effective at taking dictation from known voices, under exceptionally controlled audio conditions, using a very specific dictation protocol. Apples and oranges, apples and oranges.
posted by KirkJobSluder at 6:54 PM on December 4, 2007
Note that I'm not saying that it can't be done. What I'm saying is that making sense of large sets of video and audio data collected in naturalistic settings (and even in contrived settings) isn't just a matter of slapping down a code every time you see or hear a certain behavior.
To walk this methodology through: a priori coding schemes depend on some very fragile theoretical assumptions about what the data contains. That is why Dragon can work as dictation software: it places hard limits on the structure and kind of spoken language used to communicate with it. If you violate those limits (and I've found that just getting tired over the course of a dictation will do it), the accuracy goes down.
Qualitative research is hard work because blindly applying some theory of which events are significant and which are insignificant often produces garbage results. So you often need to look at the raw data to discover why that theory doesn't work for that particular dataset. I don't doubt that the NSA's signal intelligence also involves a lot of smart people developing evolving theories about the intelligence they target.
posted by KirkJobSluder at 7:28 PM on December 4, 2007
Response by poster: Lots of good answers, thanks. The consensus seems to be recording/storage = trivial; data processing = hard (no-one has mentioned it, but IMHO the privacy implications also = hard). I suppose some of the indexing concerns could be eased with other metadata (e.g. GPS: if I'm at work, tag any conversations with "work"). If each conversation were tagged with a timestamp, location and participants (just speculating here: can you distinguish between the voices of different people based on their frequency spectra?), maybe it would be useful even without reliable transcription.
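The metadata idea can be sketched as a simple record per conversation plus a geofence check. This is an illustration under stated assumptions: the "work" coordinates and 200 m radius are hypothetical, participant names would have to come from somewhere (manual entry or a speaker-recognition step not shown here), and distance uses the standard haversine great-circle formula.

```python
import math
from dataclasses import dataclass, field

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


@dataclass
class Conversation:
    start: str          # ISO 8601 timestamp
    lat: float
    lon: float
    participants: list  # names, however they were obtained
    tags: set = field(default_factory=set)


# Hypothetical named places: name -> (lat, lon, radius in metres)
PLACES = {"work": (51.5074, -0.1278, 200.0)}


def tag_by_location(conv, places=PLACES):
    """Add a tag for every named place whose radius contains the conversation."""
    for name, (lat, lon, radius) in places.items():
        if haversine_m(conv.lat, conv.lon, lat, lon) <= radius:
            conv.tags.add(name)
    return conv


def find(conversations, tag=None, participant=None):
    """Filter conversations by tag and/or participant."""
    return [c for c in conversations
            if (tag is None or tag in c.tags)
            and (participant is None or participant in c.participants)]


log = [
    tag_by_location(Conversation("2007-12-07T09:30:00", 51.5075, -0.1279, ["Alice"])),
    tag_by_location(Conversation("2007-12-07T19:00:00", 51.6000, -0.1278, ["Bob"])),
]
print([c.participants for c in find(log, tag="work")])  # [['Alice']]
```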
posted by primer_dimer at 1:55 AM on December 7, 2007
It's true that the data processing is hard, and I was alluding to that in my previous post. I've translated technical papers that discuss techniques for taking an audio or video clip as a search key and searching through a huge corpus of recorded audio/video. This has applications beyond life-log searching, of course, but that's an expected application.
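The papers aren't identified here, so this is not their method. As a much simpler illustration of query-by-example, the sketch below slides a short query clip along a longer recording and scores each position by Pearson correlation (normalized cross-correlation), which ignores differences in loudness. Real systems typically compare spectral features or fingerprints rather than raw samples, and a brute-force search like this costs O(n*m).

```python
import math


def _centered(xs):
    """Subtract the mean and return (deviations, Euclidean norm of deviations)."""
    mean = sum(xs) / len(xs)
    dev = [x - mean for x in xs]
    return dev, math.sqrt(sum(d * d for d in dev))


def best_match(corpus, query):
    """Return (offset, score) of the corpus window most similar to query.

    score is the Pearson correlation between query and each equal-length
    window (1.0 = same shape, regardless of loudness). Brute force, O(n*m).
    """
    m = len(query)
    if m == 0 or m > len(corpus):
        raise ValueError("query must be non-empty and no longer than corpus")
    q_dev, q_norm = _centered(query)
    if q_norm == 0:
        raise ValueError("query is flat; correlation is undefined")
    best_offset, best_score = None, -math.inf
    for offset in range(len(corpus) - m + 1):
        w_dev, w_norm = _centered(corpus[offset:offset + m])
        if w_norm == 0:
            continue  # flat (silent) window: correlation undefined, skip
        score = sum(a * b for a, b in zip(q_dev, w_dev)) / (q_norm * w_norm)
        if score > best_score:
            best_offset, best_score = offset, score
    return best_offset, best_score


recording = [0, 0, 1, 3, 2, 0, 0, 5, 1]
clip = [2, 6, 4]  # same shape as recording[2:5], twice as loud
offset, score = best_match(recording, clip)
print(offset, round(score, 3))  # 2 1.0
```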
posted by adamrice at 6:38 AM on December 7, 2007
Short answer, totally possible, for about 10,000 Euros, for a year. In ten years, it'll cost peanuts.
posted by Happy Dave at 2:25 AM on December 4, 2007 [2 favorites]