Preparing audio files for 8khz
March 2, 2010 10:02 AM
In preparing voice files for a final sample rate of 8kHz, what can I do to ensure that the voice remains as warm and untinny as possible?
I am saving down files for upload to a telephone system that only supports 8kHz WAV format. I recorded at 41kHz, and the files obviously lose a lot of quality in the down-sample. I realize the final format has obvious limitations and will eventually sound telephone-y, but is there any processing I should carry out before down-sampling (e.g. bass boost, equalization) so that the files sound as natural as possible even at this rate? I have already normalized all of the files and have good levels to work with.
It is similar but opposite to this question, which has already helped.
I am using Audacity - thanks!
Best answer: Okay, so let's talk terms here for a second.
"Tinny" refers to having a certain amount of high end. (Along with words like brittle, thin, sparkly, etc)
"Warm" refers to having a certain amount of low end. (Along with words like thick, muddy, etc)
With a sampling rate of 8kHz, the top end of your frequency response is going to be 4kHz (half the sampling rate), which is just above the frequency response of a POTS line. So, the probability of any of your files sounding "tinny" is pretty small. Not impossible, because it's all relative - if your files have no low end at all, they'll still sound thin, but I'd doubt that's the case.
My experience (radio engineer, I deal with phone audio several hours every day) is this - you're looking for clarity. To me, this generally means a cut somewhere around 300Hz. 300Hz is that muddy frequency - it makes your audio sound like it's on the other side of a wall. By cutting that (3-6dB, don't go crazy), you'll increase your clarity significantly. Typically I'd also boost the high end a little bit, but since most of that won't be present in your 8kHz file, that's pretty much unnecessary.
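For anyone who would rather see what a cut like that amounts to in numbers, here is a rough Python sketch of a peaking EQ set to cut at 300Hz. It uses the widely published "Audio EQ Cookbook" biquad formulas; the function names, the -4dB depth and the Q of 1.0 are illustrative assumptions, not settings taken from this thread or from Audacity.

```python
import cmath
import math


def peaking_eq_coeffs(sample_rate, center_hz, gain_db, q=1.0):
    """Biquad peaking-EQ coefficients (Audio EQ Cookbook form).

    A negative gain_db gives a cut centred on center_hz.
    Returns (b, a) normalised so that a[0] == 1.
    """
    amp = 10 ** (gain_db / 40.0)
    w0 = 2 * math.pi * center_hz / sample_rate
    alpha = math.sin(w0) / (2 * q)
    b0, b1, b2 = 1 + alpha * amp, -2 * math.cos(w0), 1 - alpha * amp
    a0, a1, a2 = 1 + alpha / amp, -2 * math.cos(w0), 1 - alpha / amp
    return (b0 / a0, b1 / a0, b2 / a0), (1.0, a1 / a0, a2 / a0)


def apply_biquad(samples, b, a):
    """Run float samples through the filter (direct form I)."""
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in samples:
        y = b[0] * x + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        x2, x1 = x1, x
        y2, y1 = y1, y
        out.append(y)
    return out


def gain_db_at(freq_hz, sample_rate, b, a):
    """Gain of the filter at one frequency, in dB."""
    z1 = cmath.exp(-1j * 2 * math.pi * freq_hz / sample_rate)  # z^-1
    h = (b[0] + b[1] * z1 + b[2] * z1 * z1) / (a[0] + a[1] * z1 + a[2] * z1 * z1)
    return 20 * math.log10(abs(h))
```

With `b, a = peaking_eq_coeffs(44100, 300.0, -4.0)`, `gain_db_at(300.0, 44100, b, a)` comes out at the requested -4dB, while frequencies a few octaves higher are left close to untouched. Applying the cut before down-sampling matches the order suggested above.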
A little bit of compression will help as well - widely fluctuating levels can make phone audio more difficult to understand, and evening that out can be very helpful. Compressors are hard to use well, though, and it's easy to turn your audio into total crap.
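To make "evening that out" a bit more concrete, here is a minimal sketch of a feed-forward compressor in Python. Everything in it is an assumption for illustration: the function name, the -18dB threshold, the 3:1 ratio and the attack/release times are starting points to experiment with, not recommended settings, and a real compressor plugin will behave more gracefully.

```python
import math


def compress(samples, sample_rate, threshold_db=-18.0, ratio=3.0,
             attack_ms=5.0, release_ms=100.0):
    """Very simple feed-forward compressor for float samples in [-1, 1].

    An envelope follower tracks the signal level; whatever exceeds
    threshold_db is scaled down according to ratio. No make-up gain.
    """
    attack = math.exp(-1.0 / (sample_rate * attack_ms / 1000.0))
    release = math.exp(-1.0 / (sample_rate * release_ms / 1000.0))
    envelope = 0.0
    out = []
    for x in samples:
        level = abs(x)
        # Follow rising levels quickly and falling levels slowly.
        coeff = attack if level > envelope else release
        envelope = coeff * envelope + (1.0 - coeff) * level
        env_db = 20.0 * math.log10(max(envelope, 1e-9))
        over_db = env_db - threshold_db
        # Above the threshold, keep only 1/ratio of the overshoot.
        gain_db = -over_db * (1.0 - 1.0 / ratio) if over_db > 0 else 0.0
        out.append(x * 10.0 ** (gain_db / 20.0))
    return out
```

Anything that stays under the threshold passes through unchanged, while louder passages are pulled down toward it. Pushing the ratio much higher or the release much shorter is where the audio tends to start sounding squashed, which is the failure mode described above.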
And of course, any of the above depends very strongly on the quality and character of your source material. If you want more details or more specific help, just memail me.
posted by god hates math at 11:18 AM on March 2, 2010 [2 favorites]
Best answer: The µ-law digital phone data format compands linear 14-bit samples down to 8 bits in a pseudo-logarithmic way that provides higher resolution for quiet sounds. So if you're feeding 8000Hz audio to something like Asterisk or FreeSWITCH, don't save to 8-bit, save to 16-bit, after running through a 300Hz high-pass filter as the previous poster suggests.
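As a rough illustration of why the companding matters, the sketch below uses the continuous µ-law curve, F(x) = sgn(x) ln(1 + µ|x|) / ln(1 + µ) with µ = 255, and compares it with plain linear 8-bit rounding. This is an assumption-laden simplification: real G.711 encoders use a segmented approximation of this curve, so this is not bit-exact, and the function names are made up for the example.

```python
import math

MU = 255.0


def mu_compress(x):
    """Map a sample in [-1, 1] through the continuous mu-law curve."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)


def mu_expand(y):
    """Inverse of mu_compress."""
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)


def quantize_8bit(y):
    """Round a value in [-1, 1] to evenly spaced codes -127..127."""
    return round(y * 127) / 127


def roundtrip(x):
    """Compress, store in 8 bits, then expand again."""
    return mu_expand(quantize_8bit(mu_compress(x)))
```

For a quiet sample such as 0.01, `roundtrip(0.01)` lands much closer to the original than `quantize_8bit(0.01)` does, because the companded codes are packed far more densely near zero than evenly spaced 8-bit codes are. That is also why a plain linear 8-bit file throws away detail that a 16-bit file would have let the µ-law encoder keep.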
posted by nicwolff at 12:08 PM on March 2, 2010
I don't know how you do it in Audacity, but if you have sox you can do something like
sox in.wav -r 8000 -c 1 -e mu-law out.wav highpass 300
to create a file that the telephony server won't have to recode at all, which should let it use less CPU to handle calls. Nicer for batch processing, too!
posted by nicwolff at 12:22 PM on March 2, 2010 [1 favorite]
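On the batch-processing point, a small Python wrapper around sox might look like the sketch below. It assumes sox is installed and on the PATH, and it places the output options (`-r`, `-c`, `-e mu-law`) after the input file name, since sox applies format options to the file that follows them. The function names and the folder layout are hypothetical.

```python
import subprocess
from pathlib import Path


def sox_command(src, dst, rate=8000, highpass_hz=300):
    """Build the sox argument list for one file (nothing is run here)."""
    return [
        "sox", str(src),               # input; format comes from its header
        "-r", str(rate),               # output sample rate
        "-c", "1",                     # mono output
        "-e", "mu-law",                # mu-law encoded samples
        str(dst),
        "highpass", str(highpass_hz),  # effect applied during conversion
    ]


def convert_folder(in_dir, out_dir, run=subprocess.run):
    """Convert every .wav in in_dir, writing same-named files to out_dir.

    `run` can be swapped for a stub to do a dry run without sox.
    Returns the list of commands that were issued.
    """
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)
    commands = []
    for src in sorted(Path(in_dir).glob("*.wav")):
        cmd = sox_command(src, out_path / src.name)
        run(cmd, check=True)  # raises CalledProcessError if sox fails
        commands.append(cmd)
    return commands
```

Calling `convert_folder("recordings", "for_phone_system")` would process a whole folder in one go; passing `run=lambda cmd, check: print(" ".join(cmd))` instead just prints the commands so they can be checked first.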
Response by poster: Thanks for the replies everyone and to GHM for taking a listen.
posted by clarkie666 at 9:20 AM on March 4, 2010
posted by Dragonness at 10:36 AM on March 2, 2010