With respect to voice over IP, a codec is an algorithm used to
encode and decode the voice conversation. Since voice and sound as we hear them
are analogue, they need to be converted (or encoded) to a digital format suitable
for transmission over the Internet. Once at the other end, the data needs to be
decoded again so the other person can hear what you are saying. There are a
variety of different ways this encoding and decoding can be done, many of
which utilise compression in order to reduce the bandwidth the conversation
requires. A key thing to remember with VoIP
is that encoding, particularly when heavy compression is used, takes time,
which adds a delay to the conversation. Thus, the holy grail
is a codec which not only maintains good quality with compression, but is also able
to do the encoding and decoding in a minimal amount of time.
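To make "encoding" a little more concrete, here is a minimal sketch of µ-law companding, the scheme used by G.711u. It maps each 16-bit linear sample to a single byte using a roughly logarithmic scale, so quiet sounds keep fine resolution while loud ones are quantised more coarsely. The sketch follows the commonly used reference approach (a bias of 132 and clipping at 32635); the function names are our own, and it is an illustration rather than a production codec.

```python
"""Sketch of G.711 mu-law companding for 16-bit linear PCM samples.

Follows the commonly used reference approach (bias 0x84, clip at 32635).
Function names are illustrative only.
"""

BIAS = 0x84   # 132: guarantees every biased magnitude has a leading 1 at bit 7 or above
CLIP = 32635  # largest magnitude that still fits in 15 bits after adding BIAS


def linear_to_ulaw(sample: int) -> int:
    """Encode one 16-bit signed PCM sample as an 8-bit mu-law byte."""
    sign = 0x80 if sample < 0 else 0
    magnitude = min(abs(sample), CLIP) + BIAS
    # Segment (exponent) = position of the highest set bit above bit 7, 0..7.
    exponent = max(0, ((magnitude >> 7) & 0xFF).bit_length() - 1)
    # Keep the 4 bits just below the leading 1 as the mantissa.
    mantissa = (magnitude >> (exponent + 3)) & 0x0F
    # G.711 transmits the bitwise complement of sign|exponent|mantissa.
    return ~(sign | (exponent << 4) | mantissa) & 0xFF


def ulaw_to_linear(code: int) -> int:
    """Decode an 8-bit mu-law byte back to a 16-bit signed PCM sample."""
    code = ~code & 0xFF
    sign = code & 0x80
    exponent = (code >> 4) & 0x07
    mantissa = code & 0x0F
    # Rebuild the midpoint of the quantisation interval, then remove the bias.
    magnitude = (((mantissa << 3) + BIAS) << exponent) - BIAS
    return -magnitude if sign else magnitude
```

Because each sample is handled independently with a few shifts and masks, this kind of encoding adds almost no processing delay, which is one reason G.711 has very low processing overheads compared with the heavier compressing codecs.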
These pages attempt to demystify codecs and give a brief
overview of the different codecs and when they are used. It is important to
keep in mind that different
VoIP clients support different codecs, and each VoIP provider will only
support a subset of the codecs too. Generally, when a VoIP call is established,
you will need to use a codec that both parties and the provider support. No
need to worry though: this negotiation is handled automatically, but
knowing the details will enable you to force or encourage certain codecs to be
used. Understanding codecs will also help you understand why some VoIP clients
sound better than others, and why voice quality is better with some providers, or through
certain ISPs, than with others.
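The negotiation described above can be pictured as an intersection that keeps the caller's order of preference. In SIP this happens through the SDP offer/answer exchange; the sketch below is a deliberately simplified model that ignores payload types, sampling rates and the answerer's own preferences, and the codec lists in it are hypothetical.

```python
"""Simplified model of codec negotiation (not a full SDP implementation).

Assumption: the caller offers codecs in its order of preference, and the call
can only use codecs that the callee and the provider also support.
"""
from __future__ import annotations


def negotiate(offer: list[str], callee: set[str], provider: set[str]) -> list[str]:
    """Return the usable codecs, keeping the caller's order of preference."""
    return [codec for codec in offer if codec in callee and codec in provider]


def pick_codec(offer: list[str], callee: set[str], provider: set[str]) -> str | None:
    """Pick the caller's most preferred usable codec, or None if there is none."""
    common = negotiate(offer, callee, provider)
    return common[0] if common else None


if __name__ == "__main__":
    # Hypothetical capabilities, for illustration only.
    provider = {"G.711u", "G.711a", "G.729", "GSM"}
    callee = {"G.711u", "GSM", "iLBC"}
    # iLBC is skipped because the provider lacks it, so GSM is chosen.
    print(pick_codec(["iLBC", "GSM", "G.711u"], callee, provider))  # prints GSM
    # "Forcing" a codec amounts to offering only that codec.
    print(pick_codec(["G.711u"], callee, provider))  # prints G.711u
```

Reordering or trimming your client's codec list is, in this model, exactly how you "encourage" or "force" a particular codec.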
If you would like to read up more about codecs with respect to
VoIP, the following links may be of interest:
- Codecs page on the VoIP Wiki
- Find out which hardware supports which codecs
The following table lists the various codecs used in voice over
IP, and in particular SIP.
Many codecs come in a few varieties,
and we have attempted to list all such versions of each codec. If you would like
to voice your opinion about a particular codec, or discuss the merits of one over
another, feel free to do so in our voice over IP forums.
| Codec | Sampling rate (kHz) | Bit rate (kbps) | Nominal bandwidth (kbps) | Packet size (ms) | License | Comments | Pros | Cons |
|---|---|---|---|---|---|---|---|---|
| | unknown | unknown | unknown | | | Not a very common codec. | | |
| G.711 | 8 | 64 | 87.2 | 20 | Open Source | G.711u/a, often referred to as u-law (µ-law) and a-law, where a-law is the European version and u-law the US/Japanese version. | Designed to deliver precise transmission of speech. Very low processing overheads. | Including overheads, uses more than 64 kbps, thus at least 128 kbps bandwidth in each direction is required. |
| G.722 | 16 | 48 | unknown | | Open Source | An ITU standard codec. | | |
| G.722 | 16 | 56 | unknown | 30 | | | | |
| G.722 | 16 | 64 | unknown | | | | | |
| G.723.1 | 8 | 5.3 | 20.8 | 30 | Proprietary | Often used by dial-up VoIP users for optimal quality. | Very high compression whilst maintaining high-quality audio. | Requires a lot of processor power. |
| G.723.1 | 8 | 6.3 | 21.9 | 30 | | | | |
| G.726 | 8 | 16 | unknown | | Open Source | An improved version of G.721 and G.723 (totally different from G.723.1). | CPU overhead is relatively low for the level of compression obtained. | |
| G.726 | 8 | 24 | 47.2 | 20 | | | | |
| G.726 | 8 | 32 | 55.2 | 20 | | | | |
| G.726 | 8 | 40 | unknown | | | | | |
| G.728 | unknown | 16 | 31.5 | | Open Source | An ITU standard codec. | | |
| G.729 | 8 | 8 | 31.2 | 20 | Patented | An ITU standard codec. | Excellent bandwidth utilisation for toll-quality speech. Performs well under random bit errors. | License required for use. |
| GSM | 8 | 13 | unknown | | Proprietary | Same encoding as used in GSM mobile phones (though improved versions are often used nowadays). | Relatively high compression ratio. Royalty free, so it is available on many hardware and software platforms. | |
| iLBC | unknown | 13.33 | unknown | 30 | Free to use | | High robustness to packet loss. | |
| iLBC | unknown | 15 | unknown | 20 | | | | |
| Siren | unknown | unknown | unknown | | | Not much is known about this codec, and it does not appear to be commonly supported. | | |
| Speex | 8 | unknown | unknown | | Open Source | | Uses variable bit rate to minimise bandwidth usage. | |
| Speex | 16 | unknown | unknown | | | | | |
| Speex | 32 | unknown | unknown | | | | | |
The information provided here is for information purposes only. If
you find errors or omissions, please report them in the relevant discussion forum.
Bit rate values
represent the amount of voice data carried in the payload of the IP packets.
Bandwidth values indicate
the bandwidth in each direction, not the sum of upstream and downstream
bandwidths.
Bandwidth values
assume continuous transmission of voice in both directions
with no silence suppression.
The 'nominal bandwidth' column indicates the
typical Ethernet bandwidth one can expect the codec to use, including packet header overheads.
The sampling rate is the rate at which the analogue audio signal
is sampled. Nyquist's Theorem states that in order to record a certain frequency,
sampling must occur at a rate of at least twice that frequency. Thus, the higher the
sampling rate, the greater the frequency range in the
encoded audio stream. The human ear is capable of hearing from about 20Hz to
about 20,000Hz. Typically, speech is around 100-4,000Hz. Thus, a sampling rate
of at least 8kHz is required to accurately encode the
human voice. Greater sampling rates will capture higher frequencies (this is
useful, for example, if you are playing music down the phone), but will also
increase bandwidth, as there are more samples to encode and transmit.
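The arithmetic behind the sampling rate is simple enough to write down. This sketch assumes uncompressed (or sample-by-sample companded) audio at a fixed number of bits per sample; at 8 kHz and 8 bits per sample it gives the 64 kbps listed for G.711, while compressing codecs send far fewer bits than this. The function names are illustrative only.

```python
"""Sampling-rate arithmetic behind the table's 'sampling rate' column."""


def min_sampling_rate_hz(max_frequency_hz: int) -> int:
    """Nyquist: sample at (at least) twice the highest frequency to capture."""
    return 2 * max_frequency_hz


def pcm_bit_rate_bps(sampling_rate_hz: int, bits_per_sample: int) -> int:
    """Bit rate when every sample is sent with a fixed number of bits."""
    return sampling_rate_hz * bits_per_sample


def samples_per_packet(sampling_rate_hz: int, packet_ms: int) -> int:
    """How many samples one packet of `packet_ms` milliseconds carries."""
    return sampling_rate_hz * packet_ms // 1000


if __name__ == "__main__":
    # Speech up to about 4 kHz needs a sampling rate of at least 8 kHz.
    print(min_sampling_rate_hz(4000))
    # 8 kHz x 8 bits per sample = 64 kbps, the G.711 bit rate.
    print(pcm_bit_rate_bps(8000, 8))
    # Doubling the sampling rate doubles the samples to encode and send.
    print(samples_per_packet(8000, 20), samples_per_packet(16000, 20))
```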
The size of the payload of each encoded voice packet influences
two things: lag and bandwidth. Every encoded packet that is sent incurs a fixed
bandwidth overhead (due to the IP and other headers added to the data in the
network). Thus, larger payloads incur a proportionately smaller overhead,
reducing the nominal bandwidth utilisation. However, with larger payloads,
more audio (i.e., a longer period of time) must be collected to construct a single
packet, which in turn increases the amount of time it takes for even the
beginning of the packet to reach the other end and be decoded, thus increasing
the lag in the conversation. This is a typical trade-off in VoIP. Most codecs
use payload sizes of 10-40ms.
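The bandwidth side of this trade-off can be estimated with a short calculation. This sketch assumes 58 bytes of header overhead per packet (Ethernet 18 + IPv4 20 + UDP 8 + RTP 12), one direction only, and no silence suppression; under those assumptions it reproduces the nominal bandwidth figures listed in the table (for example 87.2 kbps for G.711 at 20 ms). Links with VLAN tags, PPP, tunnels or IPv6 add different overheads, so treat the results as rough estimates.

```python
"""Estimate nominal Ethernet bandwidth for a codec at a given packet size.

Assumption: 58 bytes of overhead per packet (Ethernet 18 + IPv4 20 + UDP 8 +
RTP 12), one direction only, no silence suppression.
"""
import math

OVERHEAD_BYTES = 18 + 20 + 8 + 12  # Ethernet + IPv4 + UDP + RTP headers


def nominal_bandwidth_kbps(bit_rate_kbps: float, packet_ms: float,
                           overhead_bytes: int = OVERHEAD_BYTES) -> float:
    """Bandwidth in one direction, including per-packet header overhead."""
    # kbps x ms = bits; the payload must be a whole number of bytes, so round up.
    payload_bytes = math.ceil(bit_rate_kbps * packet_ms / 8)
    packets_per_second = 1000 / packet_ms
    return (payload_bytes + overhead_bytes) * 8 * packets_per_second / 1000


if __name__ == "__main__":
    # G.711 (64 kbps) at different packet sizes: bigger packets save bandwidth,
    # but each one must wait for packet_ms of audio before it can be sent.
    for packet_ms in (10, 20, 30, 40):
        bw = nominal_bandwidth_kbps(64, packet_ms)
        print(f"{packet_ms:>2} ms payload: {bw:6.1f} kbps, "
              f"adds about {packet_ms} ms of packetisation delay")
```

Running the loop shows the overhead share falling as the payload grows, while the packetisation delay rises in step with the packet size: the trade-off described above.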