Voice analyzing software that can differentiate people?
May 9, 2016 10:57 PM

I am looking for voice-analysis software that can perform a rather particular function: distinguishing between different people's voices as they have a conversation. Is there any commercial software that can do this?

I am teaching a university ESL course in which the students have discussions in groups of three or four people. They are supposed to speak for an equal amount of time during discussions. I want to place a microphone on their table, connect it to my computer (Windows), and have the software recognize each student and track their speaking time, so that at the end of the discussion I can say to students: "Ok, Student A, you talked for 4 minutes, or 23% of the time. Student B, you talked for 2 minutes, or 12% of the time...[etc.]" Obviously the software should be able to do this in real time or something very close to it so I can give students feedback right away.

Actually recording the discussion isn't really necessary; I just want to be able to tell students the percentage of time they spent talking. I know that everyone has a unique voice, but I wonder if commercial software is sophisticated enough to detect it. So does anyone know software that can perform this task?
posted by zardoz to Computers & Internet (8 answers total) 5 users marked this as a favorite
Is this a semi-formal discussion i.e. speak in turns, or more like a casual conversation where people can interject in between?
posted by Gyan at 12:09 AM on May 10, 2016

I think your best approach will involve multiple microphones - one next to or attached to each student. Sit the students far enough apart (e.g. a couple of metres), and the microphone will quite clearly pick up the volume difference between the student it is next to, and other people speaking. Then you just need software that can tell you what % of the recording is over a certain volume level. That is going to be a LOT more realistically achievable than software that can distinguish individuals' voices based on other factors.
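
A minimal sketch of that "what % is over a certain volume level" step, assuming each student's microphone has already been read into its own list of float samples between -1 and 1. The function names, the frame length and the threshold are illustrative assumptions and would need tuning for a real room:

```python
import math

def frame_rms(samples, frame_len):
    """Return the RMS level of each consecutive full frame of `frame_len` samples."""
    levels = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        levels.append(math.sqrt(sum(x * x for x in frame) / frame_len))
    return levels

def talk_shares(channels, frame_len, threshold):
    """Map each speaker name to their share (0..1) of all loud frames.

    `channels` maps a name to the samples from that person's own microphone.
    A frame counts as speech when its RMS exceeds `threshold` (an assumed,
    hand-tuned value). Crosstalk counts for every channel that is loud enough.
    """
    active = {
        name: sum(1 for level in frame_rms(samples, frame_len) if level > threshold)
        for name, samples in channels.items()
    }
    total = sum(active.values())
    return {name: (count / total if total else 0.0) for name, count in active.items()}
```

Multiplying each share by 100 gives the percentage to read back to the group.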
posted by lollusc at 12:31 AM on May 10, 2016 [3 favorites]

Gyan--it's a semi-formal discussion, but interruptions and crosstalk are fairly common.
posted by zardoz at 1:28 AM on May 10, 2016

I don't believe there is any software that can do that.

The only way I can think of to do this is to make the recording, annotate it by hand in a program like ELAN or Praat with each speaker's name, and then pull the times out of the annotations.

"Then you just need software that can tell you what % of the recording is over a certain volume level."

To make this work you need to record on multiple channels, which practically speaking means multiple computers or an audio interface with multiple inputs.

Then you could use a program like Praat to automatically annotate the silences for each channel (you can choose the threshold), use a script to get the length of those silences, and then subtract the total length of the silences from the length of the recording. That will give you each person's talking time.
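
The subtraction itself is simple. Here is a rough sketch, assuming the silences for each channel have already been exported as (start, end) pairs in seconds; that input format and the function names are made up for illustration, not something Praat produces as-is:

```python
def talk_time(total_seconds, silences):
    """Recording length minus the summed length of the silent intervals."""
    silent = sum(end - start for start, end in silences)
    return total_seconds - silent

def talk_report(total_seconds, silences_by_student):
    """Map each student to (seconds spoken, percent of all speech).

    `silences_by_student` maps a name to that channel's list of
    (start, end) silence intervals, assumed not to overlap each other.
    """
    times = {name: talk_time(total_seconds, silences)
             for name, silences in silences_by_student.items()}
    spoken = sum(times.values())
    return {name: (t, 100.0 * t / spoken if spoken else 0.0)
            for name, t in times.items()}
```

The percentages are shares of the total speech across all channels, so they add up to 100 even when people talk over each other.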

But this doesn't seem like it's worth it.
posted by Kutsuwamushi at 8:34 AM on May 10, 2016

This is actually a Very Hard problem known as the cocktail party problem, and you for sure will be much better off with multiple recordings with multiple clip mics and analyzing how many seconds each recording spends above a certain level.
posted by Jairus at 9:25 AM on May 10, 2016 [1 favorite]

Speaker diarisation

Would it be acceptable to you to throw away the segments of the discussion where people are speaking simultaneously? You'd still get pretty good numbers, I'd think.
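
To illustrate what throwing away the overlapping segments would look like, here is a small sketch. It assumes some earlier step (a diarisation tool, or per-microphone thresholding) has already produced one true/false "is speaking" flag per time frame for each person; the names and input format are hypothetical:

```python
def solo_shares(activity):
    """Share of solo speaking time per person, ignoring overlapped frames.

    `activity` maps a name to a list of booleans, one per time frame,
    all lists the same length. Frames where nobody or more than one
    person is speaking are discarded.
    """
    names = list(activity)
    counts = {name: 0 for name in names}
    for flags in zip(*(activity[name] for name in names)):
        if sum(flags) == 1:  # exactly one active speaker in this frame
            counts[names[flags.index(True)]] += 1
    total = sum(counts.values())
    return {name: (c / total if total else 0.0) for name, c in counts.items()}
```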
posted by at at 11:14 AM on May 10, 2016

There is software that can do this? I'm a volunteer ESOL tutor - this would be really useful for me. (If it's free software)
posted by finding.perdita at 2:48 AM on May 11, 2016

If you can do multiple microphones routed to different channels, then ffmpeg with the silencedetect or histogram filter will spit out stats upon completion of recording.
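
As a rough sketch of the silencedetect route: running something like `ffmpeg -i student_a.wav -af silencedetect=noise=-30dB:d=0.5 -f null -` on one channel's recording writes `silence_duration: ...` lines to its log. The file name, the -30dB threshold and the 0.5 second minimum are assumptions to adjust, and the helper names below are made up. The script adds up the reported silences from saved log text and subtracts them from the recording length:

```python
import re

# Matches the duration that silencedetect logs at the end of each silence.
SILENCE_RE = re.compile(r"silence_duration:\s*([0-9.]+)")

def silent_seconds(ffmpeg_log):
    """Sum every silence_duration value found in the ffmpeg log text."""
    return sum(float(value) for value in SILENCE_RE.findall(ffmpeg_log))

def talk_seconds(total_seconds, ffmpeg_log):
    """Recording length minus detected silence, i.e. time above the threshold."""
    return total_seconds - silent_seconds(ffmpeg_log)
```

Doing this once per channel and dividing each result by the sum across channels gives each person's share of the talking.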


You'll need some sort of audio mixer which takes in multiple inputs and relays them over USB as a multi-channel stream.

Ideally, see if you can get a Microcone (a microphone array for groups, currently discontinued) from eBay or elsewhere.
posted by Gyan at 1:14 PM on May 11, 2016
