The Three C’s of VoIP: CODECS, Codes, and Convicts!

Anyone who has ever been engaged most certainly can list the four “Cs” of diamonds: Color, Clarity, Cut and Carat weight. But did you know that there are also three “Cs” of Voice over IP?

The shift from “traditional” telephony to packet telephony brought with it a new way of transmitting voice. In the old model, voice was coded into eight-, four-, three- or two-bit samples, with decreasing fidelity, at a rate of 8,000 samples per second. Those samples were then placed into a fixed circuit and transmitted at a fixed rate. This was just fine when there was no sharing within the circuits. With the shared connections of IP networks, however, there are advantages to not sending everything, and a further benefit to sending whatever is sent more efficiently. The “non-sending of silence” is accomplished with Voice Activity Detection (VAD), often known as Silence Suppression, which can free up as much as 75% of the shared bandwidth for use by other network traffic. The voice samples that are sent can be sent more efficiently too, and that is where the three C’s of VoIP come into play.
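To put numbers on the old model, here is a minimal Python sketch of the arithmetic. The sample sizes and the 8,000 samples-per-second rate come from the paragraph above; the function names are made up for illustration, and the 75% figure is simply the best case quoted above, not a measured value.

```python
SAMPLE_RATE_HZ = 8_000  # samples per second, as described in the text


def pcm_bit_rate(bits_per_sample: int, sample_rate_hz: int = SAMPLE_RATE_HZ) -> int:
    """Bit rate (bits per second) of a fixed circuit carrying these samples."""
    return bits_per_sample * sample_rate_hz


def rate_with_silence_suppression(bit_rate: float, silence_fraction: float) -> float:
    """Average bit rate when the silent fraction of the call is not sent."""
    if not 0.0 <= silence_fraction <= 1.0:
        raise ValueError("silence_fraction must be between 0 and 1")
    return bit_rate * (1.0 - silence_fraction)


# Fixed-circuit rates for each sample size mentioned in the article.
for bits in (8, 4, 3, 2):
    print(f"{bits}-bit samples: {pcm_bit_rate(bits) // 1000} kbit/s")

# Best case quoted in the article: 75% of the bandwidth freed by VAD.
print(rate_with_silence_suppression(pcm_bit_rate(8), 0.75), "bit/s on average")
```

The eight-bit case gives the familiar 64 kbit/s voice channel. The savings from silence suppression depend entirely on how much of a given call is actually silent.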

The first C of VoIP is CODECS: the implementing organization can choose traditional Pulse Code Modulation (PCM) coder/decoders, just like in the old networks; new wideband CODECS; or any of a long list of CODECS in the family known as Linear Predictive Coders (LPCs). LPCs produce a reasonably high quality of voice under the varying conditions encountered in a shared IP network, and they do so using far less bandwidth than the traditional PCM CODECS. VAD/Silence Suppression can also be applied to voice packets, and header compression can reduce packet overhead and, therefore, overall bandwidth requirements.
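A rough Python sketch shows why both the CODEC choice and header compression matter. Everything specific here is an assumption for illustration and not from the article: the 64 kbit/s rate stands in for a PCM CODEC such as G.711, the 8 kbit/s rate for an LPC-family CODEC such as G.729, packets carry 20 ms of voice, the uncompressed header is IPv4 + UDP + RTP (20 + 8 + 12 bytes), and the compressed header is taken to be 2 bytes. Link-layer overhead is ignored.

```python
IP_UDP_RTP_HEADER_BYTES = 20 + 8 + 12  # IPv4 + UDP + RTP, no options (assumed)
COMPRESSED_HEADER_BYTES = 2            # assumed best case for header compression


def voip_bandwidth_bps(codec_bps: float,
                       packet_ms: float = 20,
                       header_bytes: int = IP_UDP_RTP_HEADER_BYTES) -> float:
    """Bandwidth of one voice stream, in bits per second, including headers."""
    payload_bytes = codec_bps * packet_ms / 1000 / 8  # voice bytes per packet
    packets_per_second = 1000 / packet_ms
    return (payload_bytes + header_bytes) * 8 * packets_per_second


# Hypothetical comparison: PCM at 64 kbit/s versus an LPC CODEC at 8 kbit/s.
print("PCM, full headers:      ", voip_bandwidth_bps(64_000))
print("LPC, full headers:      ", voip_bandwidth_bps(8_000))
print("LPC, compressed headers:",
      voip_bandwidth_bps(8_000, header_bytes=COMPRESSED_HEADER_BYTES))
```

Under these assumptions the headers are twice the size of the LPC payload, which is why header compression pays off most with low-bit-rate CODECS.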

The second C of VoIP is Codes: voice wavelets are matched to specific codes at the source, the codes are transmitted to the destination, and there the wavelets are recreated and strung together to give the impression of continuous speech. That brings us to the third C of VoIP, which is for Convicts, because convicts, or prisoners, are a convenient way of describing, and remembering, how members of the Linear Predictive Coder (LPC) family do their voice coding job.

Consider a new prisoner. After the admission process, he was placed in his cell. He was told during admission that, due to the high-security nature of the prison, the prisoners were only allowed to communicate for one hour each day, from 4:00 to 5:00 pm, just before dinner. On the new prisoner’s first day, when the clock struck 4:00 pm, a buzzer sounded, the prisoners immediately jumped to their feet, and they began yelling numbers. The result was a cacophony of guffaws and belly-laughs such as might be heard in the best comedy club. The new prisoner asked his cell-mate what the deal was, and he was told that because the prisoners only had one hour, they had decided to make the most of it, so they had numbered the jokes. “Give me a good one!” said the new prisoner. “49 will have ’em rolling in the aisles,” was the reply. The new prisoner proudly stepped up to the bars, cupped his hands around his mouth and said confidently, “Forteeee-Nine!” The result was immediate silence on the cell block. “You bum!” the new prisoner said, “you gave me a bad joke!” “Hey, not my fault!” replied the old con, “It’s all in the delivery!”

Well, OK, maybe not the world’s greatest joke, but it does illustrate the point. The prisoners had limited resources: time and bandwidth, if you think about it. Instead of transmitting the entire lengthy joke, they transmitted a short code, in this case the joke’s number. An LPC CODEC does the same thing with speech. Instead of transmitting a complete wavelet, that is, hundreds of bits representing the height of the wavelet or the coefficients of a formula that could be used at the destination to recreate it, a single code, usually less than 256 bits, is transmitted. Does this work? You be the judge, but in most cases, especially when Packet Loss Recovery (PLR) or similar quality-enhancing schemes are employed, the results can be very good and can rival traditional voice quality while consuming less bandwidth and fewer other resources. So now you know the three “Cs” of VoIP.
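For readers who like to see the numbered-joke idea in code, here is a toy Python sketch of a shared codebook. It is only an illustration of the principle, not how any real LPC CODEC is implemented: the codebook of 16 sine-wave snippets, the 8-sample frame length, and all function names are invented for the example.

```python
import math

FRAME_LEN = 8  # samples per "wavelet" in this toy example (assumed)


def make_codebook(size: int = 16, frame_len: int = FRAME_LEN) -> list[list[float]]:
    """Build a toy codebook of sine snippets that both ends agree on in advance."""
    book = []
    for k in range(size):
        freq = (k + 1) / (4 * size)  # cycles per sample, a different one per entry
        book.append([math.sin(2 * math.pi * freq * n) for n in range(frame_len)])
    return book


def encode(frame: list[float], codebook: list[list[float]]) -> int:
    """Source side: return the number of the codebook entry closest to the frame."""
    def squared_error(entry: list[float]) -> float:
        return sum((a - b) ** 2 for a, b in zip(frame, entry))
    return min(range(len(codebook)), key=lambda i: squared_error(codebook[i]))


def decode(code: int, codebook: list[list[float]]) -> list[float]:
    """Destination side: look the wavelet up by its number."""
    return codebook[code]


codebook = make_codebook()
# A slightly distorted version of entry 5 stands in for real speech.
spoken = [sample + 0.01 for sample in codebook[5]]
code = encode(spoken, codebook)
bits_per_code = (len(codebook) - 1).bit_length()
print(f"sent code {code} using {bits_per_code} bits "
      f"instead of {FRAME_LEN * 8} bits of 8-bit samples")
```

As with the jokes, only the number crosses the network. The destination plays back its own copy of the wavelet, so what the listener hears is close to the original but not identical to it.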

PS: If you think about it, we really never have heard a human voice at a distance farther than a yell could carry. In original analog voice systems, the analog acoustic wave of the human speaker would be converted into an analogous electrical wave and that electrical wave – not the human’s voice wave – would be transmitted over the distance and cause the creation of a new acoustic wave very close to the original. PCM causes codes representing the heights of the input waves to be generated 8,000 times per second and transmitted in circuits, or, if used with VoIP, in packets. The subsequent wave that is generated at the destination – and played into the ear of the listener – is not the original wave either. So, new LPC-based systems are really no different in that regard, but the way that the translation is done is different.
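The point that the wave played at the destination is not the original can be seen in a small Python sketch. It assumes a simplified uniform 8-bit quantizer and a 1 kHz test tone. Real telephone PCM uses companding (mu-law or A-law) rather than uniform steps, so treat this as an illustration only, with function names invented for the example.

```python
import math

SAMPLE_RATE_HZ = 8_000  # 8,000 samples per second, as in the text
LEVELS = 256            # number of distinct 8-bit codes


def encode_sample(x: float) -> int:
    """Map a wave height in [-1, 1] to one of 256 codes (uniform steps, assumed)."""
    x = max(-1.0, min(1.0, x))
    return min(LEVELS - 1, int((x + 1.0) / 2.0 * LEVELS))


def decode_sample(code: int) -> float:
    """Recreate a wave height from a code (the midpoint of that code's step)."""
    return (code + 0.5) / LEVELS * 2.0 - 1.0


# One millisecond of a 1 kHz tone, sampled 8,000 times per second.
original = [math.sin(2 * math.pi * 1_000 * n / SAMPLE_RATE_HZ) for n in range(8)]
codes = [encode_sample(x) for x in original]
recreated = [decode_sample(c) for c in codes]

worst_error = max(abs(a - b) for a, b in zip(original, recreated))
print("codes sent:", codes)
print("largest difference from the original wave:", worst_error)
```

The recreated samples are close to the originals but never exactly equal to them, which is the sense in which the listener hears a new wave, not the speaker’s own.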

Editor: James teaches our VoIP courses. See the VoIP (Jan. 28, 2008) and Advanced VoIP (Jan 30, 2008) seminars scheduled in Washington, DC.