Sound/Speech compression

Posted on September 18, 2010

Hi all,

This week I’m going to talk a bit about some very useful open source libraries we’ve used in developing some of our past client projects. These have mainly been used on the DS, which is a little more restricted in terms of space than other platforms.

Sounds interesting!

Sound has been the main component of the projects I’m covering today; audio probably made up 60%-80% of the final size of the games.

One of the titles had speech audio in 5 languages to play back during gameplay. We also had to deal with very little available RAM on the DS, which means we can’t precache much information, if any, and so data needs to be decompressed on the fly.

Looking at just the English speech for one of the titles, we had 187,199,378 bytes (178.5 MB) of source data stored at a sample rate of 44100 Hz with 16-bit mono samples. This was roughly 35 minutes of speech.
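As a quick sanity check on those figures, here is a small sketch that turns a raw PCM size into a playing time. It assumes the byte count is pure sample data with no file headers, and the function name is just something made up for illustration.

```python
def pcm_duration_seconds(num_bytes, sample_rate_hz, bits_per_sample=16, channels=1):
    """Playing time of headerless PCM data of the given size."""
    bytes_per_second = sample_rate_hz * (bits_per_sample // 8) * channels
    return num_bytes / bytes_per_second


ENGLISH_SPEECH_BYTES = 187_199_378  # source data size quoted above

seconds = pcm_duration_seconds(ENGLISH_SPEECH_BYTES, 44100)
megabytes = ENGLISH_SPEECH_BYTES / 2**20
print(f"{megabytes:.1f} MB is about {seconds / 60:.1f} minutes of 16-bit mono audio")
```

At 44100 Hz every second of 16-bit mono audio costs 88,200 bytes, which is where the "roughly 35 minutes" comes from.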

Storing speech at 44100 Hz is a little excessive to start with. The Nyquist-Shannon sampling theorem tells us that perfect reconstruction of a signal (a sound signal in this case) is possible as long as the sampling frequency is greater than twice the maximum frequency of the signal being sampled.
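To see what goes wrong when the sampling frequency is too low, here is a minimal sketch. The 8000 Hz rate and the two tone frequencies are picked purely for illustration: a 6000 Hz cosine sampled at 8000 Hz produces exactly the same samples as a 2000 Hz cosine, so after sampling there is no way to tell them apart.

```python
import math


def sample_cosine(freq_hz, sample_rate_hz, count):
    """Return `count` samples of a unit cosine at freq_hz."""
    return [math.cos(2 * math.pi * freq_hz * n / sample_rate_hz) for n in range(count)]


SAMPLE_RATE_HZ = 8000  # Nyquist limit is 4000 Hz
below_limit = sample_cosine(2000, SAMPLE_RATE_HZ, 16)
above_limit = sample_cosine(6000, SAMPLE_RATE_HZ, 16)

# The 6000 Hz tone "folds back" to 8000 - 6000 = 2000 Hz.
aliased = all(math.isclose(a, b, abs_tol=1e-9)
              for a, b in zip(below_limit, above_limit))
print("6000 Hz looks identical to 2000 Hz when sampled at 8000 Hz:", aliased)
```

This is why anything above half the target rate has to be filtered out before resampling down, otherwise it comes back as noise in the audible range.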

This means we can safely resample down based on the highest frequency in the sound. This can be automated in a tool that finds the maximum frequency in the audio and then resamples to a little over twice that.
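A tool like that could look something like the sketch below. This is not our actual tool: it uses a naive DFT (fine for a short test signal, far too slow for real files), and the 1% threshold for deciding what counts as "present" is an arbitrary assumption.

```python
import cmath
import math


def highest_significant_frequency(samples, sample_rate_hz, threshold=0.01):
    """Highest frequency (Hz) whose DFT magnitude reaches threshold * peak."""
    n = len(samples)
    magnitudes = []
    for k in range(n // 2 + 1):
        acc = sum(s * cmath.exp(-2j * math.pi * k * i / n)
                  for i, s in enumerate(samples))
        magnitudes.append(abs(acc))
    peak = max(magnitudes)
    if peak == 0:
        return 0.0  # pure silence
    top_bin = max(k for k, m in enumerate(magnitudes) if m >= threshold * peak)
    return top_bin * sample_rate_hz / n


SAMPLE_RATE_HZ = 44100
COUNT = 441  # gives exactly 100 Hz per DFT bin
# Test signal: a loud 500 Hz tone plus a quieter 3000 Hz tone.
signal = [math.sin(2 * math.pi * 500 * i / SAMPLE_RATE_HZ)
          + 0.2 * math.sin(2 * math.pi * 3000 * i / SAMPLE_RATE_HZ)
          for i in range(COUNT)]

top_hz = highest_significant_frequency(signal, SAMPLE_RATE_HZ)
lower_bound_hz = 2 * top_hz  # the chosen rate has to be above this
print(f"highest frequency {top_hz:.0f} Hz, sample rate must exceed {lower_bound_hz:.0f} Hz")
```

A real tool would use an FFT over windowed chunks of the file and add some headroom on top of the lower bound, but the idea is the same.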

Speech tends to sit in the 0-4000 Hz range, which is why most speech data is sampled at 8 kHz. Note this obviously doesn’t represent all possible sounds a person can make and is purely meant for speech, which is why listening to music (or singing) down the phone doesn’t exactly sound great!

Our speech data is kind of stylised and we actually used a sampling rate of 11025 Hz. If we were just storing the data as raw PCM this would already cut our data to roughly 1/4 of the size (46,215,824 bytes, 44.1 MB for our data set), but that’s not good enough for our needs.
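Going from 44100 Hz to 11025 Hz is a clean divide by 4, so the simplest possible version is to average every 4 samples into one. The sketch below does just that. Note the averaging is only a crude stand-in for the proper low-pass filter a real resampler would use, and the placeholder input is not real audio.

```python
def downsample_by_averaging(samples, factor):
    """Average each run of `factor` samples into one output sample."""
    usable = len(samples) - len(samples) % factor  # drop any ragged tail
    return [round(sum(samples[i:i + factor]) / factor)
            for i in range(0, usable, factor)]


SOURCE_RATE_HZ = 44100
TARGET_RATE_HZ = 11025
factor = SOURCE_RATE_HZ // TARGET_RATE_HZ  # 4

# One second of placeholder 16-bit samples (a slow ramp, not real speech).
one_second = [(i % 100) - 50 for i in range(SOURCE_RATE_HZ)]
resampled = downsample_by_averaging(one_second, factor)
print(len(one_second), "samples ->", len(resampled), "samples")
```

A quarter of the samples at the same 16 bits each is where the 1/4 size comes from.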

ADPCM

This is an alternative compression method for audio data and is supported in hardware on many platforms (meaning lower RAM storage overhead and no playback slowdown). The actual algorithm varies but most platforms use IMA ADPCM. Depending on the implementation the compression ratio varies, but generally it’s 4:1 for 16-bit samples (another 1/4 of the data size!). Note this is a fixed-ratio compression: the same amount of data is used to store a piece of silence in your audio as to store the most ‘interesting’ data.
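To make the fixed ratio concrete, here is a rough sketch of an IMA-style ADPCM codec that stores each 16-bit sample as a 4-bit code. The step and index tables are the standard IMA ones. The rest is assumption for illustration: no block headers, the state starting at zero and low nibble first, all of which differ between platforms.

```python
import math

STEP_TABLE = [
    7, 8, 9, 10, 11, 12, 13, 14, 16, 17, 19, 21, 23, 25, 28, 31,
    34, 37, 41, 45, 50, 55, 60, 66, 73, 80, 88, 97, 107, 118, 130, 143,
    157, 173, 190, 209, 230, 253, 279, 307, 337, 371, 408, 449, 494, 544,
    598, 658, 724, 796, 876, 963, 1060, 1166, 1282, 1411, 1552, 1707, 1878,
    2066, 2272, 2499, 2749, 3024, 3327, 3660, 4026, 4428, 4871, 5358, 5894,
    6484, 7132, 7845, 8630, 9493, 10442, 11487, 12635, 13899, 15289, 16818,
    18500, 20350, 22385, 24623, 27086, 29794, 32767,
]
INDEX_TABLE = [-1, -1, -1, -1, 2, 4, 6, 8]


def apply_code(code, predicted, index):
    """Advance the predictor by one 4-bit code (shared by encoder and decoder)."""
    step = STEP_TABLE[index]
    delta = step >> 3
    if code & 4:
        delta += step
    if code & 2:
        delta += step >> 1
    if code & 1:
        delta += step >> 2
    predicted += -delta if code & 8 else delta
    predicted = max(-32768, min(32767, predicted))
    index = max(0, min(88, index + INDEX_TABLE[code & 7]))
    return predicted, index


def encode(samples):
    """Pack 16-bit samples into 4-bit codes, two per byte, low nibble first."""
    predicted, index, codes = 0, 0, []
    for sample in samples:
        step = STEP_TABLE[index]
        diff = sample - predicted
        code = 8 if diff < 0 else 0  # bit 3 is the sign
        diff = abs(diff)
        if diff >= step:
            code |= 4
            diff -= step
        if diff >= step >> 1:
            code |= 2
            diff -= step >> 1
        if diff >= step >> 2:
            code |= 1
        predicted, index = apply_code(code, predicted, index)
        codes.append(code)
    if len(codes) % 2:
        codes.append(0)  # pad to a whole byte
    return bytes(lo | (hi << 4) for lo, hi in zip(codes[0::2], codes[1::2]))


def decode(data, count):
    """Unpack `count` samples from data produced by encode()."""
    predicted, index, out = 0, 0, []
    for byte in data:
        for code in (byte & 0x0F, byte >> 4):
            predicted, index = apply_code(code, predicted, index)
            out.append(predicted)
    return out[:count]


COUNT = 1000
silence = [0] * COUNT
tone = [int(10000 * math.sin(2 * math.pi * 200 * i / 11025)) for i in range(COUNT)]
print(len(encode(silence)), len(encode(tone)))  # same size for both inputs
```

Both inputs are 2000 bytes of PCM going in and both come out the same size, silence or not. That is the trade you make for something simple enough to decode in hardware.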

Most audio people I’ve worked with don’t really like ADPCM for general use. I’ve not really looked into how it compares to the source sample, but even I can hear some of the issues on certain sounds!

For our speech data, 4:1 ADPCM would result in 11,553,956 bytes (11 MB) of data at 11025 Hz.

Speex of the devil

There are some truly great open source / free software packages available and one of them is Speex from the same xiph.org who bring us Vorbis (mentioned briefly below) and Theora.

As you can probably guess from the name, Speex is intended for speech compression and therefore won’t perform as well with non-speech data.

For our project we implemented a decoder thread that, when notified, would stream the compressed data from disc and decode it either directly into a double-buffered audio stream (i.e. direct to the sound hardware or the sound API), or into a decompressed buffer if the speech was going to be used a few times in that particular part of the game (to save decompressing the same sound over and over).
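The shape of that setup can be sketched as below. This is not our DS code: `decode_frame` is a made-up stand-in for the real codec call, the frames come from a list rather than a disc, and a two-slot queue plays the part of the double buffer.

```python
import queue
import threading


def decode_frame(compressed_frame):
    """Hypothetical stand-in for a real codec call; just widens bytes to ints."""
    return [byte - 128 for byte in compressed_frame]


def decoder_worker(frames, buffers):
    """Decode frame by frame; put() blocks while both buffers are still queued."""
    for frame in frames:
        buffers.put(decode_frame(frame))
    buffers.put(None)  # end-of-stream marker


def stream_speech(frames, submit_to_audio):
    """Play straight from compressed data with at most two decoded buffers waiting."""
    buffers = queue.Queue(maxsize=2)
    worker = threading.Thread(target=decoder_worker, args=(frames, buffers))
    worker.start()
    while (pcm := buffers.get()) is not None:
        submit_to_audio(pcm)  # a real game would hand this to the sound API
    worker.join()


def decode_to_cache(frames):
    """Decode a whole line once so repeated playback skips the decoder."""
    pcm = []
    for frame in frames:
        pcm.extend(decode_frame(frame))
    return pcm


frames = [bytes([i] * 4) for i in range(5)]  # placeholder "compressed" data
played = []
stream_speech(frames, played.extend)
cached = decode_to_cache(frames)
print(len(played), "samples streamed,", len(cached), "samples cached")
```

The streaming path only ever holds a couple of decoded frames in RAM, while the cached path trades memory for not having to decode the same line again.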

To give an idea of the compression ratio: using quality 5, complexity 1 on our 11025 Hz mono speech data, we compressed all the data into only 4,401,832 bytes (4.2 MB). As you can see this is a massive saving on the original data, and the quality is still very good. During development we produced a tool that we sent to testers to help them evaluate the various quality settings so we could find a good trade-off between size and quality.
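Putting the sizes quoted so far side by side makes the saving easier to see. The snippet below only does arithmetic on the numbers from this post; the labels are my own shorthand.

```python
SIZES_BYTES = {
    "PCM 44100 Hz 16-bit": 187_199_378,
    "PCM 11025 Hz 16-bit": 46_215_824,
    "ADPCM 4:1, 11025 Hz": 11_553_956,
    "Speex q5, 11025 Hz": 4_401_832,
}

source_bytes = SIZES_BYTES["PCM 44100 Hz 16-bit"]
ratios = {name: source_bytes / size for name, size in SIZES_BYTES.items()}

for name, size in SIZES_BYTES.items():
    print(f"{name:22} {size / 2**20:6.1f} MB  {ratios[name]:5.1f}:1")
```

Against the original 44100 Hz data the Speex output works out at roughly 42.5:1, or about 10.5:1 against the 11025 Hz PCM.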

WavPack

Briefly I want to mention WavPack. This is a compression format we used for a really great but unfortunately unreleased project a few years ago.

This again was a DS title, but it featured instrument samples and the audio needed to be really high quality. WavPack produced some really impressive results for us and again worked very quickly even on the DS CPU for both compression and decompression.

Vorbis

A final brief mention for another quality compression format from xiph.org, this time Vorbis, which is a very decent competitor to MP3 and doesn’t come with any associated licensing overhead or concerns. Again we use this for decompressing on the fly on PC and newer console hardware, and we have also used it on iPhone in Aurifi.


Thanks

Hopefully some of you are looking at reducing your game’s footprint and these libraries may be of use. Even if install size is your only concern, you could use the above methods to compress the files and then decompress on first load.

Things we’ve been enjoying this week

  • http://cbloomrants.blogspot.com/
    • Charles Bloom is posting at the moment about data compression challenges. It’s an area that fascinates me but I’ve never had the chance to do much work in it other than tinkering on the periphery. If you’ve not read his blog before it’s well worth looking into, including some great posts on threading last year and many more before that. I’ve no idea how he gets so much time to write such quality posts.