Posted by ganguly100
I have been wondering about this for a while and don't know what they are.
What are half rate and enhanced full rate?
And how do they work?
Posted by MikLSP
In relation to what?
Posted by HyperiaBlue
Mik, it's OK, I understand what he is referring to:
The transmission of speech from one point to another over a GSM mobile phone network is something that most of us take for granted. The complexity is usually perceived to be associated with the network infrastructure and management required in order to create the end-to-end connection, and not with the transmission of the payload itself. The real complexity, however, lies in the codec scheme used to encode voice traffic for transmission.
The GSM standard supports four different but similar compression technologies to analyse and compress speech. These include full-rate, enhanced full-rate (EFR), adaptive multi-rate (AMR), and half-rate. Despite all being lossy (i.e. some data is lost during the compression), these codecs have been optimized to accurately regenerate speech at the output of a wireless link.
In order to provide toll-quality voice over a GSM network, designers must understand how and when to implement these codecs. To help out, this article provides a look inside how each of these codecs works. We'll also examine how the codecs need to evolve in order to meet the demands of 2.5 and 3G wireless networks.
Speech Transmission Overview
When you speak into the microphone on a GSM phone, the speech is converted to a digital signal with a resolution of 13 bits, sampled at a rate of 8 kHz. The resulting 104 kbit/s stream (13 bits × 8,000 samples per second) forms the input signal to all the GSM speech codecs. The codec analyses the voice and builds up a bit-stream composed of a number of parameters that describe aspects of the voice. The output rate of the codec depends on its type (see Table 1), ranging from 4.75 kbit/s to 13 kbit/s.
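The arithmetic above can be checked with a few lines of Python. This is only a sketch: the 20 ms frame length and the 12.2 kbit/s EFR figure are taken from later in the article, and the names are made up for illustration.

```python
# Input side: 13-bit samples at 8 kHz, as described above.
SAMPLE_RATE_HZ = 8_000
BITS_PER_SAMPLE = 13
FRAME_MS = 20  # frame length used by the codecs (stated later in the article)

input_rate_bps = SAMPLE_RATE_HZ * BITS_PER_SAMPLE  # 104,000 bit/s

# Output rates mentioned in the article, in bit/s.
CODEC_RATES_BPS = {
    "full-rate": 13_000,
    "EFR": 12_200,
    "AMR (lowest mode)": 4_750,
}


def bits_per_frame(rate_bps, frame_ms=FRAME_MS):
    """Number of coded bits the codec emits for one speech frame."""
    return rate_bps * frame_ms // 1000


def compression_ratio(rate_bps):
    """How many times smaller the coded stream is than the raw input."""
    return input_rate_bps / rate_bps


if __name__ == "__main__":
    for name, rate in CODEC_RATES_BPS.items():
        print(name, bits_per_frame(rate), round(compression_ratio(rate), 1))
```

So the full-rate codec squeezes each 20 ms frame into 260 bits, an 8:1 reduction from the raw input.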
Table 1: Different Coding Rates
After coding, the bits are re-arranged, convolutionally coded, interleaved, and built into bursts for transmission over the air interface. Under extreme error conditions a frame erasure occurs and the data is lost; otherwise the original data is re-assembled, potentially with some errors in the less significant bits. The bits are arranged back into their parametric representation, and fed into the decoder, which uses the data to synthesise the original speech information.
The Full-Rate Codec
The full-rate codec is a regular pulse excitation, long-term prediction (RPE-LTP) linear predictive coder that operates on a 20-ms frame composed of one hundred sixty 13-bit samples.
The vocoder model consists of a tone generator (which models the vocal cords), and a filter that modifies the tone (which models the mouth and nasal cavity shape) [Figure 1]. The short-term analysis and filtering determine the filter coefficients and an error measurement, while the long-term analysis quantifies the harmonics of the speech.
Figure 1: Diagram of a full-rate vocoder model.
As the mathematical model for speech generation in a full-rate codec shows a gradual decay in power for an increase in frequency, the samples are fed through a pre-emphasis filter that enhances the higher frequencies, resulting in better transmission efficiency. An equivalent de-emphasis filter at the remote end restores the sound.
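A minimal sketch of such a filter pair, assuming a simple first-order design. The coefficient value 0.86 is an assumption chosen for illustration (the article does not give one), and the function names are hypothetical.

```python
def pre_emphasis(samples, beta=0.86):
    """First-order high-frequency boost: y[n] = x[n] - beta * x[n-1].

    beta is an assumed coefficient, not a value taken from the article.
    """
    out = []
    prev_in = 0.0
    for x in samples:
        out.append(x - beta * prev_in)
        prev_in = x
    return out


def de_emphasis(samples, beta=0.86):
    """Inverse filter used at the remote end: y[n] = x[n] + beta * y[n-1]."""
    out = []
    prev_out = 0.0
    for x in samples:
        prev_out = x + beta * prev_out
        out.append(prev_out)
    return out


if __name__ == "__main__":
    speech = [1.0, 2.0, 3.0, -1.0]
    print(de_emphasis(pre_emphasis(speech)))  # recovers the input
```

A constant (zero-frequency) input is strongly attenuated by the first filter, while rapid sample-to-sample changes pass through, which is the "enhances the higher frequencies" behaviour described above.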
The short-term analysis (linear prediction) performs autocorrelation and Schur recursion on the input signal to determine the filter ("reflection") coefficients. The reflection coefficients are converted into log area ratios (LARs), which offer more favourable companding characteristics, and are transmitted over the air as eight parameters totalling 36 bits of information. The reflection coefficients are then used to apply short-term filtering to the input signal, resulting in 160 samples of residual signal.
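The short-term analysis can be sketched as follows. Note the swap: this sketch uses the Levinson-Durbin recursion rather than the Schur recursion named above, because it is shorter to write down in floating point; both derive reflection coefficients from the same autocorrelation values. The sign convention and the log area ratio formula log10((1 + k) / (1 - k)) are assumptions, and quantisation to 36 bits is left out.

```python
import math


def autocorrelation(x, max_lag):
    """r[k] = sum of x[n] * x[n-k] for lags 0..max_lag."""
    return [sum(x[n] * x[n - k] for n in range(k, len(x)))
            for k in range(max_lag + 1)]


def reflection_coefficients(r, order):
    """Levinson-Durbin recursion (stand-in for the Schur recursion).

    Returns `order` reflection coefficients for the predictor
    A(z) = 1 + a1*z^-1 + ... ; the sign convention is an assumption.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    ks = []
    for i in range(1, order + 1):
        if err <= 0:
            # Silent or degenerate frame: pad with zeros.
            ks.extend([0.0] * (order - i + 1))
            break
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err
        ks.append(k)
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= (1.0 - k * k)
    return ks


def log_area_ratio(k):
    """Assumed LAR definition; requires |k| < 1."""
    return math.log10((1 + k) / (1 - k))


if __name__ == "__main__":
    r = [1.0, 0.5, 0.25]  # autocorrelation of a first-order process
    ks = reflection_coefficients(r, 2)
    print(ks, [log_area_ratio(k) for k in ks])
```

The full-rate codec would run this with an order of eight over each 160-sample frame, giving the eight parameters mentioned above.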
The residual signal from the short-term filtering is segmented into four sub-frames of 40 samples each. The long-term prediction (LTP) filter models the fine harmonics of the speech using a combination of current and previous sub-frames. The gain and lag (delay) parameters for the LTP filter are determined by cross-correlating the current sub-frame with previous residual sub-frames.
The peak of the cross-correlation determines the signal lag, and the gain is calculated by normalising the cross-correlation coefficients. The parameters are applied to the long-term filter, and a prediction of the current short-term residual is made. The error between the estimate and the real short-term residual signal—the long-term residual signal—is applied to the RPE analysis, which performs the data compression.
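The lag and gain search described in the last two paragraphs can be sketched like this. The lag range of 40 to 120 samples is an assumption (the article does not state it), the function name is hypothetical, and the quantisation of lag and gain is omitted.

```python
def ltp_parameters(current, history, min_lag=40, max_lag=120):
    """Estimate long-term prediction lag and gain for one sub-frame.

    current -- the sub-frame of short-term residual (40 samples here)
    history -- earlier residual samples, most recent last; needs at
               least max_lag samples
    The 40..120 lag range is an assumed search window.
    """
    n = len(current)
    best_lag, best_corr = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        start = len(history) - lag
        segment = history[start:start + n]
        corr = sum(c * s for c, s in zip(current, segment))
        if corr > best_corr:  # peak of the cross-correlation
            best_lag, best_corr = lag, corr
    start = len(history) - best_lag
    segment = history[start:start + n]
    energy = sum(s * s for s in segment)
    # Normalise the peak by the energy of the matched segment.
    gain = best_corr / energy if energy > 0 else 0.0
    return best_lag, gain


if __name__ == "__main__":
    # A pulse train with a 50-sample period should give lag 50, gain 1.
    history = [1.0 if i % 50 == 0 else 0.0 for i in range(120)]
    current = [1.0 if (120 + i) % 50 == 0 else 0.0 for i in range(40)]
    print(ltp_parameters(current, history))
```

The long-term residual is then `current[i] - gain * segment[i]` for the chosen lag, which is the signal handed on to the RPE stage.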
The Regular Pulse Excitation (RPE) stage reduces the 40 long-term residual samples to four candidate sub-sequences of 13 samples each through a combination of interleaving and sub-sampling. The optimum sub-sequence is determined as having the least error, and is coded using APCM (adaptive PCM) into 45 bits.
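A rough sketch of the sub-sequence selection, assuming each candidate takes every third sample at one of four starting offsets (13 samples each) and that "least error" is approximated by keeping the candidate with the most energy. Both are assumptions; the APCM coding step is not shown.

```python
def rpe_grid_select(residual):
    """Pick one of four decimated sub-sequences of a 40-sample sub-frame.

    Each candidate takes every third sample starting at offset 0..3,
    giving 13 samples. The candidate with the highest energy is kept,
    an assumed stand-in for the "least error" criterion.
    """
    candidates = [residual[offset:offset + 37:3] for offset in range(4)]
    energies = [sum(s * s for s in c) for c in candidates]
    best = max(range(4), key=lambda m: energies[m])
    return best, candidates[best]


if __name__ == "__main__":
    residual = [0.0] * 40
    residual[4] = 5.0  # falls on the grid starting at offset 1
    offset, pulses = rpe_grid_select(residual)
    print(offset, len(pulses))
```

If each of the 13 retained samples is coded with 3 bits plus a shared 6-bit scale factor, that gives 13 × 3 + 6 = 45 bits, which would match the figure quoted above.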
The resulting signal is fed back through an RPE decoder and mixed with the short-term residual estimate in order to source the long-term analysis filter for the next frame, thereby completing the feedback loop (Table 2).
Table 2 - Output Parameters from the Full Rate Codec
The Enhanced Full-Rate Codec
As processing power improved and power consumption decreased in digital signal processors (DSPs), more complex codecs could be used to give a better quality of speech. The EFR codec is capable of conveying more subtle detail in the speech, even though the output bit rate is lower than full rate.
The EFR codec is an algebraic code excitation linear prediction (ACELP) codec, which uses a set of similar principles to the RPE-LTP codec, but also has some significant differences. The EFR codec uses a 10th-order linear-predictive (short-term) filter and a long-term filter implemented using a combination of adaptive and fixed codebooks (sets of excitation vectors).
Figure 2: Diagram of the EFR vocoder model
The pre-processing stage for EFR consists of an 80 Hz high-pass filter, and some downscaling to reduce implementation complexity. Short-term analysis, on the other hand, occurs twice per frame and consists of autocorrelation with two different asymmetric windows of 30 ms in length concentrated around different sub-frames. The results are converted to short-term filter coefficients, then to line spectral pairs (for better transmission efficiency) and quantized to 38 bits.
In the EFR codec, the adaptive codebook contains excitation vectors that model the long-term speech structure. Open-loop pitch analysis is performed on half a frame, and this gives two estimates of the pitch lag (delay) for each frame.
The open-loop result is used to seed a closed-loop search for speed and reduced computation requirements. The pitch lag is applied to a synthesiser, and the results compared against the non-synthesised input (analysis-by-synthesis), and the minimum perceptually weighted error is found. The results are coded into 34 bits.
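The idea of seeding a closed-loop search from the open-loop estimate can be sketched as below. This is a simplified illustration: it uses plain squared error instead of a perceptually weighted error, only tries integer lags no shorter than the sub-frame, and the search radius of 3 is an arbitrary assumption.

```python
def closed_loop_lag(target, past_excitation, open_loop_lag, search_radius=3):
    """Refine an open-loop pitch lag by analysis-by-synthesis.

    For each candidate lag near the open-loop estimate, a prediction is
    synthesised from past excitation and compared with the target.
    Uses unweighted squared error; perceptual weighting is omitted.
    """
    n = len(target)
    best_lag, best_err = None, float("inf")
    first = max(n, open_loop_lag - search_radius)  # keep lag >= sub-frame
    for lag in range(first, open_loop_lag + search_radius + 1):
        start = len(past_excitation) - lag
        if start < 0:
            break  # not enough history for longer lags
        candidate = past_excitation[start:start + n]
        energy = sum(c * c for c in candidate)
        corr = sum(t * c for t, c in zip(target, candidate))
        gain = corr / energy if energy > 0 else 0.0
        err = sum((t - gain * c) ** 2 for t, c in zip(target, candidate))
        if err < best_err:
            best_lag, best_err = lag, err
    return best_lag, best_err


if __name__ == "__main__":
    past = [1.0 if i % 50 == 0 else 0.0 for i in range(120)]
    target = [1.0 if (120 + i) % 50 == 0 else 0.0 for i in range(40)]
    print(closed_loop_lag(target, past, open_loop_lag=52))
```

Searching only a handful of lags around the open-loop estimate, rather than the whole range, is where the saving in computation comes from.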
The residual signal remaining after quantization of the adaptive codebook search is modelled by the algebraic (fixed) codebook, again using an analysis-by-synthesis approach. The resulting codebook index is coded as 35 bits per sub-frame, and the gain as 5 bits per sub-frame.
The final stage for the encoder is to update the appropriate memory ready for the next frame.
Going Adaptive
The principle of the AMR codec is to use very similar computations for a set of codecs, to create outputs of different rates. In GSM, the quality of the received air-interface signal is monitored and the coding rate of speech can be modified. In this way, more protection is applied to poorer signal areas by reducing the coding rate and increasing the redundancy, and in areas of good signal quality, the quality of the speech is improved.
In terms of implementation, an ACELP coder is used. In fact, the 12.2 kbit/s AMR codec is computationally the same as the EFR codec. For rates lower than 12.2 kbit/s, the short-term analysis is performed only once per frame. For 5.15 kbit/s and lower, the open-loop pitch lag is estimated only once per frame. The result is that at lower output bit rates, there are fewer parameters to transmit, and fewer bits are used to represent them.
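A toy illustration of rate adaptation follows. The eight mode rates are the standard AMR narrowband set; the link-quality metric, the linear mapping from quality to mode, and the 22.8 kbit/s gross channel rate are assumptions made for the sake of the example, not how a real network decides.

```python
# AMR narrowband codec modes in kbit/s.
AMR_MODES_KBPS = [4.75, 5.15, 5.9, 6.7, 7.4, 7.95, 10.2, 12.2]
GROSS_CHANNEL_KBPS = 22.8  # assumed gross rate of a full-rate channel


def select_mode(link_quality):
    """Map a hypothetical 0..1 link-quality score to a codec mode.

    Poor links get a low speech rate, good links a high one.
    """
    q = min(max(link_quality, 0.0), 1.0)
    index = min(int(q * len(AMR_MODES_KBPS)), len(AMR_MODES_KBPS) - 1)
    return AMR_MODES_KBPS[index]


def redundancy_kbps(mode_kbps):
    """Channel capacity left over for error protection."""
    return GROSS_CHANNEL_KBPS - mode_kbps


def analyses_per_frame(mode_kbps):
    """(short-term analyses, open-loop pitch estimates) per frame,
    following the rules stated in the paragraph above."""
    short_term = 2 if mode_kbps >= 12.2 else 1
    open_loop_pitch = 1 if mode_kbps <= 5.15 else 2
    return short_term, open_loop_pitch


if __name__ == "__main__":
    for quality in (0.0, 0.5, 1.0):
        mode = select_mode(quality)
        print(quality, mode, round(redundancy_kbps(mode), 2),
              analyses_per_frame(mode))
```

The lower the speech rate chosen, the more of the fixed channel capacity is left for redundancy, which is the trade-off described under "Going Adaptive".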
The Half-Rate Codec
The air transmission specification for GSM allows the splitting of a voice channel into two sub-channels that can maintain separate calls. A voice coder that uses half the channel capacity would allow the network operators to double the capacity on a cell for very little investment.
The half-rate codec is a vector sum excitation linear prediction (VSELP) codec that operates on an analysis-by-synthesis approach similar to the EFR and AMR codecs. The resulting output is 5.6 kbit/s, which includes 100 bit/s of mode indicator bits specifying whether the frames are thought to contain voice or no voice. The mode indicator allows the codec to operate slightly differently to obtain the best quality.
Half-rate speech coding was first introduced in the mid-1990s, but the public perception of speech quality was so poor that it is not generally used today. However, due to the variable bit-rate output, AMR lends itself nicely to transmission over a half-rate channel. By limiting the output to the lowest six coding rates (4.75 to 7.95 kbit/s), the user can still experience the quality benefits of adaptive speech coding, and the network operator benefits from increased capacity. It is thought that with the introduction of AMR, use of the half-rate air-channel will start to become much more widespread.
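As a small illustration of that restriction, the sketch below filters the standard AMR mode list down to the six lowest rates when a half-rate channel is in use. The function name is hypothetical.

```python
# AMR narrowband codec modes in kbit/s.
AMR_MODES_KBPS = [4.75, 5.15, 5.9, 6.7, 7.4, 7.95, 10.2, 12.2]
HALF_RATE_MODE_COUNT = 6  # "the lowest six coding rates"


def allowed_modes(half_rate_channel):
    """Codec modes usable on the given channel type."""
    modes = sorted(AMR_MODES_KBPS)
    return modes[:HALF_RATE_MODE_COUNT] if half_rate_channel else modes


if __name__ == "__main__":
    print(allowed_modes(True))
    print(allowed_modes(False))
```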
Computational Complexity
Table 3 shows the time taken to encode and decode a random stream of speech-like data, and the speed of the operations relative to the GSM full-rate codec.
Table 3: General Encoding and Decoding Complexity
The full-rate encoder operates on a non-iterative analysis and filtering, which results in fast encoding and decoding. By comparison, the analysis-by-synthesis approach employed in the CELP codecs involves repetitive computation of synthesised speech parameters. The computational complexity of the EFR/AMR/half-rate codecs is therefore far greater than that of the full-rate codec, and is reflected in the time taken to compress and decompress a frame.
The output of the speech codecs is grouped into parameters (e.g. LARs) as they are generated (Figure 3). For transmission over the air interface, the bits are rearranged so the more important bits are grouped together. Extra protection can then be applied to the most significant bits of the parameters that will have the biggest effect on the speech quality if they are erroneous.
Figure 3: Diagram of vocoder parameter groupings.
The process of building the air transmission bursts involves adding redundancy to the data by convolution. During this process, the most important bits (Class 1a) are protected most while the least important bits (Class 2) have no protection applied.
This frame building process ensures that many errors occurring on the air interface will be either correctable (using the redundancy), or will have only a small impact on the speech quality.
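The effect of this unequal protection on frame size can be worked through for the full-rate codec. The class sizes (50, 132 and 78 bits), the 3 parity bits, the 4 tail bits and the rate-1/2 convolutional code are the usual figures for a GSM full-rate speech frame, but they are not given in this article, so treat them as assumptions.

```python
def channel_coded_bits(class_1a=50, class_1b=132, class_2=78,
                       parity_bits=3, tail_bits=4, code_rate_inverse=2):
    """Bits sent over the air for one 20 ms full-rate speech frame.

    Class 1a bits get a parity check plus convolutional coding,
    class 1b bits get convolutional coding only, and class 2 bits
    are sent unprotected. All default sizes are assumed values.
    """
    protected = class_1a + parity_bits + class_1b + tail_bits
    return protected * code_rate_inverse + class_2


if __name__ == "__main__":
    coded = channel_coded_bits()
    print(coded)             # bits per 20 ms frame
    print(coded / 0.02)      # gross channel rate in bit/s
```

Under these assumptions the 260 speech bits grow to 456 transmitted bits per frame, so well over a third of the air-interface capacity is spent on redundancy.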
Future Outlook
The current focus for speech codecs is to produce a result that has a perceptually high quality at very low data rates by attempting to mathematically simulate the mechanics of human voice generation. With the introduction of 2.5G and 3G systems, it is likely that two different applications of speech coding will be developed.
The first will be comparatively low bandwidth speech coding, most likely based on the current generation of CELP codecs. Wideband AMR codecs have already been standardised for use with 2G and 2.5G technologies and these will utilise the capacity gains from EDGE deployment.
The second will make more use of the wide bandwidth employing a range of different techniques which will probably be based on current psychoacoustic coding, a technique which is in widespread use today for MP3 audio compression.
There is no doubt that speech quality over mobile networks will improve, but it may be some time before wideband codecs are standardised and integrated with fixed wire-line networks, leading to potentially CD-quality speech communications worldwide.
OK, this was detailed enough... I could have explained it in my own words, but that takes too much time!
[PS: This came from a text book of mine].
Posted by dave_uk
I believe it has something to do with the codec used for voice traffic over GSM.
EFR (Enhanced Full Rate) is encoded at a higher rate and therefore provides better call quality but more battery drain.
HR (Half Rate) is a lower rate, and therefore enhances battery life.
Don't know how relevant it is anymore - used to hear it mentioned in conjunction with Nokias for some reason.
BTW, if this is total bull, somebody please correct me.
Dave
edit: Ok, having seen HyperiaBlue's post below - the above would now seem to be surplus to requirements!!
[ This Message was edited by: dave_uk on 2005-03-15 17:22 ]
Posted by sapibobo
How can I tell whether the phone is currently using EFR or HR? Heck, I don't even know whether the current band is 900, 1800 or 1900. I only have those bars and a triangle.
Posted by ganguly100
OK, thanks for the info.
So, how do we know what our phone is set on and how do we change it?
Posted by HyperiaBlue
Your phone is not set to any particular speech rate, the system is adaptive.
This setting is determined by the network and not by the user!
In other words, it changes between all three of them, should your mobile support all 3 modes.
You cannot force the handset to work in only one of those modes either (as far as I am aware), and even if you could... you may end up dropping 80% of your calls if it is unable to connect at the forced speech rate.
Posted by ganguly100
ok.
A quick search on the net shows this:
Nokia Code function
Press *3370# This Nokia code activates the Enhanced Full Rate Codec (EFR) - Your Nokia cell phone uses the best sound quality but talk time is reduced by approx. 5%
#3370# Deactivate Enhanced Full Rate Codec (EFR)
*4720# Activate Half Rate Codec - Your phone uses a lower quality sound but you should gain approx 30% more Talk Time
#4720# With this Nokia code you can deactivate the Half Rate Codec
Do these really work?
Posted by HyperiaBlue
I am sorry, but I am unable to answer your question there, since I have not used a Nokia phone... ever!
However, whoever said that needs to prove to me that you really get all those "resource saving" gains when you de-activate these features...
IMHO: don't mess with it ok?
Posted by aboveconnection
wowweeee
Haven't heard about half rate for YEARS!
Not sure if it's been posted above,
but you also save battery life
Posted by ganguly100
Personally I wouldn't touch or mess with it because it may mess up the phone at the expense of just saving a bit of battery life. It's not worth it I think
Posted by clank
A year back when I used to have a Nokia I came across these codes, and for the same reason as yours I never messed with them. Activated and deactivated them quickly once my curiosity was satisfied...
Posted by stuck_in_a_rutt
I've used all the different rates on my old Nokia and the sound quality goes up and all calls work...
Posted by dj_wolfinstein
so how does one activate the EFR/HR codecs on SE phones, Nokia did have an option but how does SE Phones choose this...
regards
Posted by whizkidd
I tried all these codes successfully on a Nokia 2100... And no, it didn't explode!! *weird* :-D
Posted by HyperiaBlue
LOL, nobody said your phone would explode... why would you want to manage how calls are encoded when you can let the network do it for you, since they know their network best?