Introduction to VoIP Codecs

With respect to voice over IP, a codec is an algorithm used to encode and decode the voice conversation. Since sound as we hear it is analogue, it needs to be converted (or encoded) to a digital format suitable for transmission over the Internet. Once at the other end, it needs to be decoded again so the other person can hear what you are saying. There are a variety of ways this encoding and decoding can be done, many of which utilise compression in order to reduce the bandwidth required by the conversation. A key thing to remember with VoIP is that encoding, particularly when heavy compression is used, takes time, which adds a delay to the conversation. Thus, the holy grail is a codec which not only maintains good quality under compression, but is also able to do the encoding and decoding in a minimal amount of time.

These pages attempt to demystify codecs and give a brief overview of the different codecs and when they are used. It is important to keep in mind that different VoIP clients support different codecs, and each VoIP provider will only support a subset of the codecs too. Generally, when a VoIP call is established, you will need to use a codec that both parties and the provider support. There is no need to worry though: this sort of negotiation is handled automatically, but knowing the details will enable you to force or encourage certain codecs to be used. Understanding codecs will also help you understand why some VoIP clients sound better than others, and why voice quality with some providers, or through certain ISPs, is better than with others.
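As a rough illustration of what that negotiation amounts to, the sketch below picks the first codec in the caller's preference order that the other party and the provider also support. It is a simplified model only, not how any particular client or signalling protocol implements it; the function name `choose_codec` and the example codec lists are hypothetical.

```python
def choose_codec(caller_prefs, callee_supported, provider_supported):
    """Return the first codec in the caller's preference order that
    the callee and the provider both support, or None if there is none."""
    callee = set(callee_supported)
    provider = set(provider_supported)
    for codec in caller_prefs:
        if codec in callee and codec in provider:
            return codec
    return None  # no codec in common


# Hypothetical example lists, for illustration only.
caller = ["G.729", "iLBC", "G.711"]    # caller's order of preference
callee = ["G.711", "iLBC"]             # what the other party supports
provider = ["G.711", "G.729", "iLBC"]  # what the provider supports

print(choose_codec(caller, callee, provider))
```

In this model, "encouraging" a codec simply means moving it towards the front of the preference list, and "forcing" one means leaving it as the only entry.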

If you would like to read up more about codecs with respect to VoIP, the following links may be of interest:

Codecs page on the VoIP Wiki

Find out which hardware supports which codecs.

Codec Comparison

The following table lists the various codecs used in voice over IP, and in particular SIP. Many codecs come in a few varieties, and we have attempted to list all such versions of each codec. If you would like to voice your opinion about a particular codec, or discuss the merits of one over another, feel free to do so in our voice over IP forums.

Codec	Sampling Rate (kHz)	Bandwidth (kbps)	Nominal Bandwidth (kbps)	Payload Size (ms)	License	Comments	Pros	Cons
DVI4	unknown	unknown	unknown			Not a very common codec.
G.711	8	64	87.2	20	Open Source	G.711u/a is often referred to as u-law/a-law, where a-law is the European version and u-law the US/Japanese version.	Designed to deliver precise transmission of speech. Very low processing overheads.	Including overheads, uses >64kbps, thus at least 128kbps bandwidth in each direction is required.
G.722	16	48	unknown		Open Source	An ITU standard codec.
	16	56	unknown	30
	16	64	unknown
G.723.1	8	5.3	20.8	30	Proprietary	Often used by dialup VoIP users for optimal quality.	Very high compression whilst maintaining high quality audio.	Requires a lot of processor power.
G.723.1	8	6.3	21.9	30	Proprietary	Often used by dialup VoIP users for optimal quality.		Requires a lot of processor power.
G.726	8	16	unknown		Open Source	An improved version of G.721 and G.723 (totally different from G.723.1)	CPU overhead is relatively low for the level of compression obtained.
	8	24	47.2	20
	8	32	55.2	20
	8	40	unknown
G.728	unknown	16	31.5		Open Source	An ITU standard codec.
G.729	8	8	31.2	20	Patented	An ITU standard codec.	Excellent bandwidth utilisation for toll quality speech. Performs well under random bit errors.	License required for use.
GSM	8	13	unknown		Proprietary	Same encoding as used in GSM mobile phones (though improved versions are often used nowadays).	Relatively high compression ratio. Royalty free means it is available in many hardware and software platforms.
iLBC	unknown	13.33	unknown	30	Free to use		High robustness to packet loss
iLBC	unknown	15	unknown	20	Free to use		High robustness to packet loss
Siren	unknown	unknown	unknown			Not much is known about this codec, and it does not appear to be commonly supported.
Speex	8	unknown	unknown		Open Source		Uses variable bit rate to minimise bandwidth usage
	16	unknown	unknown
	32	unknown	unknown

Notes

The information provided here is for information purposes only. If you find errors or omissions, please report them in the relevant discussion forum.

Bandwidth

Bandwidth values represent the amount of data in the payload of the IP packets.

Bandwidth values indicate the bandwidth in each direction - not the sum of upstream and downstream bandwidths.

Bandwidth values assume continuous transmission of voice in both directions with no silence suppression.

The 'nominal bandwidth' column indicates the typical Ethernet bandwidth one can expect the codec to use.
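The relationship between the payload bandwidth and the nominal bandwidth can be sketched as a short calculation. The per-packet overhead used here is an assumption that the article does not state: 58 bytes per packet, made up of Ethernet (18), IPv4 (20), UDP (8) and RTP (12) headers. Under that assumption the calculation gives the same nominal figures as the table for G.711, G.729, G.723.1 and G.726 at 32kbps. The function name `nominal_bandwidth_kbps` is illustrative only.

```python
import math

# Assumed per-packet overhead in bytes (not stated in the article):
# Ethernet header + FCS (18) + IPv4 (20) + UDP (8) + RTP (12) = 58
OVERHEAD_BYTES = 18 + 20 + 8 + 12


def nominal_bandwidth_kbps(codec_kbps, payload_ms, overhead_bytes=OVERHEAD_BYTES):
    """Estimate the one-way bandwidth on the wire, in kbps, for a codec
    bit rate (kbps) and a payload size (ms of audio per packet)."""
    # kbps * ms gives bits per packet; divide by 8 for bytes, round up
    # to whole bytes (the inner round guards against float noise).
    payload_bytes = math.ceil(round(codec_kbps * payload_ms / 8, 6))
    packets_per_second = 1000 / payload_ms
    bits_per_second = (payload_bytes + overhead_bytes) * 8 * packets_per_second
    return bits_per_second / 1000


# (codec, payload bandwidth in kbps, payload size in ms) taken from the table
for name, kbps, ms in [("G.711", 64, 20), ("G.729", 8, 20),
                       ("G.723.1", 6.3, 30), ("G.723.1", 5.3, 30)]:
    print(name, kbps, round(nominal_bandwidth_kbps(kbps, ms), 1))
```

Other link types (for example PPP or ATM) carry different header sizes, so the nominal figure for the same codec would differ there.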

Sampling Rate

The sampling rate is the rate at which the analogue audio signal is sampled. Nyquist's Theorem states that in order to record a certain frequency, the signal must be sampled at a rate of at least twice that frequency. Thus, the higher the sampling rate, the greater the frequency range in the encoded audio stream. The human ear is capable of hearing from about 20Hz to about 20,000Hz. Typically, speech is around 100-4,000Hz. Thus, a sampling rate of at least 8kHz is required to accurately encode the human voice. Greater sampling rates will capture higher frequencies (this is useful, for example, if you are playing music down the phone), but will also increase bandwidth as there are more samples to encode and transmit.
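The arithmetic behind these figures is small enough to sketch directly. The two helper names below are illustrative, and the 8 bits per sample used for the G.711 example is a property of that codec rather than something stated in this article.

```python
def max_frequency_hz(sampling_rate_khz):
    """Highest frequency that can be captured at a given sampling rate
    (half the sampling rate, per Nyquist's Theorem)."""
    return sampling_rate_khz * 1000 / 2


def uncompressed_kbps(sampling_rate_khz, bits_per_sample):
    """Bit rate of the sampled audio before any compression is applied."""
    # thousands of samples per second * bits per sample = kbps
    return sampling_rate_khz * bits_per_sample


for rate_khz in (8, 16, 32):
    print(rate_khz, "kHz sampling captures up to", max_frequency_hz(rate_khz), "Hz")

# G.711 samples at 8kHz with 8 bits per sample (assumption noted above),
# which accounts for its 64kbps payload bandwidth.
print(uncompressed_kbps(8, 8), "kbps")
```

Doubling the sampling rate doubles the number of samples to encode, which is why wideband codecs need either more bandwidth or heavier compression.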

Payload Size

The size of the payload of each encoded voice packet influences two things: lag and bandwidth. Every encoded packet that is sent incurs fixed bandwidth overheads (due to IP and other headers added to the data in the network). Larger payloads therefore incur a proportionately smaller overhead, reducing the nominal bandwidth utilisation. However, with larger payloads, more audio (i.e., a longer period of time) is required to construct a single packet, which in turn increases the time it takes for even the beginning of the packet to reach the other end and be decoded, and so increases the lag in the conversation. This is a typical trade-off in VoIP. Most codecs use payload sizes of 10-40ms.
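To make the trade-off concrete, the sketch below takes an 8kbps codec (the G.729 payload rate from the table) and varies the payload size. The fixed overhead of 58 bytes per packet is an assumption (Ethernet, IPv4, UDP and RTP headers combined), and `packet_tradeoff` is an illustrative name. The delay shown is only the time needed to fill one packet, not the total end-to-end lag.

```python
# Assumed fixed per-packet overhead in bytes (Ethernet + IPv4 + UDP + RTP).
OVERHEAD_BYTES = 58


def packet_tradeoff(codec_kbps, payload_ms, overhead_bytes=OVERHEAD_BYTES):
    """Return (nominal bandwidth in kbps, overhead as a share of the packet)
    for a given codec bit rate and payload size."""
    payload_bytes = codec_kbps * payload_ms / 8    # kbps * ms = bits
    packet_bytes = payload_bytes + overhead_bytes
    nominal_kbps = packet_bytes * 8 / payload_ms   # bits per ms equals kbps
    overhead_share = overhead_bytes / packet_bytes
    return nominal_kbps, overhead_share


for payload_ms in (10, 20, 30, 40):
    nominal, share = packet_tradeoff(8, payload_ms)
    print(f"{payload_ms}ms payload: at least {payload_ms}ms to fill a packet, "
          f"{nominal:.1f}kbps on the wire, {share:.0%} of it overhead")
```

Going from 10ms to 40ms payloads cuts the bandwidth on the wire by well over half for this codec, at the cost of an extra 30ms before each packet can even be sent.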