Adaptive Transform Acoustic Coding

Adaptive Transform Acoustic Coding (ATRAC) is a family of lossy audio compression algorithms (codecs) developed by Sony. Based on psychoacoustic principles, it offers different compression rates depending on the desired sound quality. It is used to store audio on MiniDisc and on other proprietary Sony audio players.

Evolution

The first version of this codec, ATRAC1 or simply ATRAC, was developed in 1992 and compresses audio to approximately one-fifth of the data rate of a CD. The technology has continued to evolve: its later versions, ATRAC3 (1999) and ATRAC3plus (2002), offer higher compression, encoding audio at roughly 10% and 5% of the data rate of a CD, respectively.

Reducing the size of the audio data while maintaining sufficiently good quality has made it possible to store more songs on the various storage and playback media, changing the way music is listened to.

Versions
	PCM	ATRAC1	ATRAC3	ATRAC3plus
Bit rate	1411 kbit/s	292 kbit/s	132 kbit/s	64 kbit/s
Data size (4-minute song)	42.33 MB	8.77 MB	3.97 MB	1.94 MB
4-minute songs per CD-R (700 MB)	16	79	176	360
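The size and capacity rows follow directly from the bit rates. The sketch below recomputes them, assuming 1 kbit = 1000 bits, 1 MB = 1,000,000 bytes and no file or container overhead; the names `size_mb` and `songs_per_disc` are illustrative. Under these assumptions the PCM, ATRAC1 and ATRAC3 results agree with the table to within 0.01 MB, while ATRAC3plus comes out at 1.92 MB and 364 songs, so the table presumably uses some overhead or rounding that the source does not state. The printed ratios (about 20.7%, 9.4% and 4.5% of the CD rate) match the "one-fifth", "10%" and "5%" figures quoted above.

```python
# Recompute the table's size and capacity rows from the bit rates.
# Assumptions: 1 kbit = 1000 bits, 1 MB = 1,000,000 bytes, no container overhead.
BIT_RATES_KBPS = {"PCM": 1411, "ATRAC1": 292, "ATRAC3": 132, "ATRAC3plus": 64}


def size_mb(kbps: float, seconds: float) -> float:
    """Megabytes occupied by `seconds` of audio at `kbps` kilobits per second."""
    return kbps * 1000 * seconds / 8 / 1_000_000


def songs_per_disc(kbps: float, seconds: float = 240, disc_mb: float = 700) -> int:
    """Number of whole songs of length `seconds` that fit in `disc_mb` megabytes."""
    return int(disc_mb // size_mb(kbps, seconds))


pcm_rate = BIT_RATES_KBPS["PCM"]
for name, kbps in BIT_RATES_KBPS.items():
    # Size of a 4-minute song, songs per 700 MB disc, and fraction of the CD bit rate.
    print(f"{name:10s} {size_mb(kbps, 240):6.2f} MB "
          f"{songs_per_disc(kbps):4d} songs {kbps / pcm_rate:6.1%} of CD rate")
```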

Technology

ATRAC uses perceptual coding. Perceptual coding is based on psychoacoustics: it exploits the limitations of the human ear to avoid encoding sounds that a listener will hardly perceive.

As in many other codecs, e.g. MP3 or ePAC, only sounds above the masking threshold (audible sounds) are encoded. Most people cannot distinguish weak sounds that lie close in frequency to a dominant tone. These masked, barely audible sounds are discarded during encoding, so fewer bits are needed and the signal is compressed. However, some audiophiles may notice the loss of these mostly masked sounds.
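The idea of discarding components below the masking threshold can be illustrated with a toy model. This is not ATRAC's psychoacoustic model: it assumes the Zwicker-Terhardt approximation of the Bark scale and a hypothetical triangular spreading function (10 dB offset, 27 dB/Bark towards lower frequencies, 10 dB/Bark towards higher ones), and it ignores the absolute threshold of hearing. All function names and parameters are illustrative.

```python
import math


def hz_to_bark(f_hz: float) -> float:
    """Zwicker-Terhardt approximation of the Bark critical-band scale."""
    return 13 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500) ** 2)


def masking_threshold_db(masker, f_hz, offset_db=10.0, lower_slope=27.0, upper_slope=10.0):
    """Threshold (dB) that masker = (freq_hz, level_db) imposes at f_hz.

    Hypothetical triangular spreading: masking falls off steeply below the
    masker's frequency and more gently above it.
    """
    m_freq, m_level = masker
    dz = hz_to_bark(f_hz) - hz_to_bark(m_freq)
    slope = upper_slope if dz >= 0 else lower_slope
    return m_level - offset_db - slope * abs(dz)


def audible_components(components):
    """Keep only the (freq_hz, level_db) components above every other component's mask."""
    kept = []
    for i, (freq, level) in enumerate(components):
        thresholds = [masking_threshold_db(m, freq)
                      for j, m in enumerate(components) if j != i]
        if not thresholds or level > max(thresholds):
            kept.append((freq, level))
    return kept


# A loud 1 kHz tone masks a weaker tone 100 Hz above it, but not a distant 4 kHz tone.
print(audible_components([(1000, 80.0), (1100, 50.0), (4000, 40.0)]))
```

With these assumed parameters the 1.1 kHz component falls below the mask of the 1 kHz tone and would not be encoded, whereas the 4 kHz component, several Bark away, survives.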

ATRAC1

ATRAC1 (often known simply as ATRAC) was also the format used by Sony in its SDDS cinema sound system in the 1990s. This multichannel system encodes 8 channels with ATRAC, for an overall bit rate (all channels combined) of 1168 kbit/s, i.e. 146 kbit/s per channel. SDDS was one of the main competitors of Dolby Digital (AC-3) and other systems such as DTS.

Block diagram

Temporal-spectral analysis

ATRAC1 divides the signal into three bands:

Below 5.5 kHz
Between 5.5 and 11 kHz
Above 11 kHz

The three bands are obtained with Quadrature Mirror Filters (QMF). A first QMF splits the input signal into a high-frequency and a low-frequency half, and a second QMF splits the low half again, giving the three bands above. QMFs are designed so that the aliasing introduced by splitting and downsampling the signal into bands cancels out when the bands are recombined in the decoder. Each band is then analyzed independently with the modified discrete cosine transform (MDCT). The MDCT uses windows that overlap by 50% in the time domain, which improves frequency resolution while keeping the signal critically sampled (each block of N new samples yields exactly N coefficients); the time-domain aliasing introduced by the overlap is canceled when consecutive blocks are overlap-added in the decoder.

Next, the signal is analyzed according to psychoacoustic principles. This analysis determines which parts of the signal are critical and must be encoded with high precision so that no relevant information is lost, and which parts can tolerate more quantization noise (lower precision) without degrading the perceived sound quality. Based on this information, the available quantization bits are distributed among the time-frequency units according to their importance, and the spectral coefficients of each unit are quantized with the bits assigned to it. In the decoder, the quantized spectrum is reconstructed following the same bit allocation and then synthesized back into an audio signal.
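The two-stage QMF tree described above can be sketched as follows. For brevity this sketch swaps in the Haar filter pair, the shortest possible QMF pair; ATRAC1's actual filters are much longer and separate the bands far more sharply, so only the tree structure and the aliasing-free reconstruction carry over. Function names are illustrative. A 512-sample frame at 44.1 kHz yields 128 + 128 + 256 band samples, the same total as the input (critical sampling).

```python
import math

SQRT2 = math.sqrt(2.0)


def qmf_split(x):
    """Split x into (low, high) halves at half the sample rate (Haar QMF pair)."""
    if len(x) % 2:
        raise ValueError("input length must be even")
    half = len(x) // 2
    low = [(x[2 * i] + x[2 * i + 1]) / SQRT2 for i in range(half)]
    high = [(x[2 * i] - x[2 * i + 1]) / SQRT2 for i in range(half)]
    return low, high


def qmf_merge(low, high):
    """Inverse of qmf_split: upsample and recombine, cancelling the aliasing."""
    out = []
    for lo, hi in zip(low, high):
        out.append((lo + hi) / SQRT2)
        out.append((lo - hi) / SQRT2)
    return out


def atrac1_style_split(frame):
    """First split at half band, then split the low half again.

    At 44.1 kHz this yields roughly 0-5.5, 5.5-11 and 11-22 kHz bands.
    """
    low, high = qmf_split(frame)
    low_low, low_high = qmf_split(low)
    return low_low, low_high, high


def atrac1_style_merge(band0, band1, band2):
    """Decoder side: undo the two splits in reverse order."""
    return qmf_merge(qmf_merge(band0, band1), band2)


frame = [math.sin(0.05 * i) + 0.3 * math.sin(2.5 * i) for i in range(512)]
bands = atrac1_style_split(frame)
print([len(b) for b in bands])
```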

Beyond this process, which is common to most audio codecs, ATRAC applies psychoacoustics and concepts such as the masking threshold not only in the bit allocation algorithm but also in the way it divides the signal into time-frequency segments. The input signal is analyzed in non-uniform spectral divisions that emphasize the regions to which the human ear is most sensitive, which, according to experimental tests and equal-loudness contours, lie around 4 kHz. In addition, ATRAC includes a block that adapts the length of its windows to the input signal, so that bits are not wasted when encoding transitions or silences.

This adaptive-length block chooses the length of the windows according to the characteristics of the signal. There are two modes:

Short mode: uses 1.45 ms windows in the high-frequency band and 2.9 ms windows in the other two bands.
Long mode: uses 11.6 ms windows. This mode is normally used to provide good spectral (frequency) resolution.
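Long and short modes amount to running the MDCT with different block lengths. The sketch below implements a plain MDCT/IMDCT pair with a sine window and orthonormal sqrt(2/N) scaling, and checks that 50%-overlapped blocks reconstruct the input exactly once consecutive outputs are overlap-added (time-domain aliasing cancellation). It is an O(N^2) illustration, not ATRAC's implementation. The helper `block_ms` relates block lengths to the durations above under an assumption not stated in the source: that the QMF stage leaves the top band at 22.05 kHz and the two lower bands at 11.025 kHz, so that 256 or 128 samples give about 11.6 ms (long mode) and 32 samples give about 1.45 ms or 2.9 ms (short mode).

```python
import math


def sine_window(length):
    """Sine window of length 2N; satisfies w[n]^2 + w[n+N]^2 = 1 (Princen-Bradley)."""
    return [math.sin(math.pi * (i + 0.5) / length) for i in range(length)]


def _basis(n, i, k):
    """MDCT cosine kernel for block half-length n, sample i, coefficient k."""
    return math.cos(math.pi / n * (i + 0.5 + n / 2) * (k + 0.5))


def mdct(block, window):
    """Windowed MDCT: 2N input samples -> N coefficients (orthonormal scaling)."""
    n = len(block) // 2
    scale = math.sqrt(2.0 / n)
    return [scale * sum(window[i] * block[i] * _basis(n, i, k) for i in range(2 * n))
            for k in range(n)]


def imdct(coeffs, window):
    """Windowed IMDCT: N coefficients -> 2N samples that still contain time aliasing."""
    n = len(coeffs)
    scale = math.sqrt(2.0 / n)
    return [scale * window[i] * sum(coeffs[k] * _basis(n, i, k) for k in range(n))
            for i in range(2 * n)]


def mdct_roundtrip(x, n):
    """Analyse x in 2N-sample windows with hop N, then overlap-add the IMDCT outputs."""
    w = sine_window(2 * n)
    out = [0.0] * len(x)
    for start in range(0, len(x) - 2 * n + 1, n):
        for i, v in enumerate(imdct(mdct(x[start:start + 2 * n], w), w)):
            out[start + i] += v
    return out


def block_ms(samples, band_rate_hz):
    """Duration in milliseconds of `samples` samples at a band's sample rate."""
    return 1000.0 * samples / band_rate_hz


signal = [math.sin(0.3 * i) + 0.5 * math.cos(1.7 * i) for i in range(256)]
for n in (32, 16):  # a "long" and a "short" block length for the same signal
    rec = mdct_roundtrip(signal, n)
    # Only samples covered by two overlapping windows are fully reconstructed.
    print(n, max(abs(a - b) for a, b in zip(rec[n:-n], signal[n:-n])))
```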

However, adaptive-length windows do not make the system immune to pre-echo, one of the main problems of most lossy audio codecs. Pre-echo results from abrupt audio transitions, such as the sound of a glass breaking: the quantization noise caused by the transition spreads across all the samples in the window, including those before the attack. If the window is short enough, temporal masking can hide the noise added before and after the transition. Noise before the transition is the bigger concern, because temporal masking covers the time after a sound (post-masking) much more than the time before it (pre-masking).

To prevent pre-echo, ATRAC switches to short-window mode when it detects an attack (an abrupt transition). Only a short segment of noise then precedes the attack, and the rest is hidden by temporal post-masking. If, because of an error in the adaptive block, the window were not switched and the encoder stayed in long mode, the noise would spread across the whole long window ahead of the attack; since temporal masking before a sound lasts only a very short time, well below the length of the window, much of that noise would not be masked.
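The source does not describe how ATRAC detects an attack. One simple, commonly used approach is to compare the energy of consecutive sub-blocks and switch to short windows when the energy jumps sharply. The sketch below assumes 8 sub-blocks per frame and a hypothetical jump ratio of 10; both values and the function name are illustrative, not ATRAC parameters.

```python
import math


def choose_window_mode(frame, n_sub=8, attack_ratio=10.0, floor=1e-12):
    """Return "short" if the energy of any sub-block exceeds the previous one by
    more than `attack_ratio` (a likely attack), otherwise "long".

    Hypothetical detector: sub-block count and ratio are illustrative values.
    """
    size = len(frame) // n_sub
    energies = [sum(s * s for s in frame[i * size:(i + 1) * size]) for i in range(n_sub)]
    for prev, cur in zip(energies, energies[1:]):
        # `floor` avoids treating tiny fluctuations in digital silence as attacks.
        if cur > attack_ratio * max(prev, floor):
            return "short"
    return "long"


steady = [math.sin(2 * math.pi * 1000 * i / 44100) for i in range(512)]
attack = [0.001 * s for s in steady[:256]] + steady[256:]  # quiet, then a sudden onset
print(choose_window_mode(steady), choose_window_mode(attack))
```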

Spectral quantization

ATRAC stores all the information necessary to reconstruct the audio signal.

Spectral values are quantized using two parameters: word length and scale factor. The scale factor defines the full-scale range of the quantizer, while the word length defines the precision within that range. All coefficients in a unit share the same word length and scale factor, because the frequencies grouped in a unit are psychoacoustically similar. The scale factor is chosen from a fixed list of values and reflects the magnitude of the unit's spectral coefficients. The word length is determined by the bit allocation algorithm. For each sound frame (corresponding to 512 samples of the input signal), the following information is stored:

Length of the MDCT window (short or long).
Word length of each unit.
Scale factor of each unit.
Quantized spectral coefficients.
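Quantizing one unit with a scale factor and a word length can be sketched as below. The scale-factor table is an assumption: 64 values of the form 2^((i - 15) / 3), i.e. steps of about 2 dB. The sketch also assumes a symmetric quantizer with 2^(wl - 1) - 1 levels on each side of zero and treats a word length of 0 as "no bits for this unit". Function names are illustrative. Note that the decoder only needs the stored scale-factor index and word length, not the encoder's reasoning behind them.

```python
# Assumed scale-factor table: 64 values, ~2 dB apart (2 ** (1/3) per step).
SCALE_FACTORS = [2.0 ** ((i - 15) / 3.0) for i in range(64)]


def pick_scale_factor(coeffs):
    """Index of the smallest scale factor that covers the unit's peak magnitude."""
    peak = max(abs(c) for c in coeffs)
    for idx, sf in enumerate(SCALE_FACTORS):
        if sf >= peak:
            return idx
    return len(SCALE_FACTORS) - 1


def _levels(word_length):
    """Largest quantizer level for a word length (0 means the unit gets no bits)."""
    if word_length == 1:
        raise ValueError("this sketch supports word lengths of 0 or >= 2")
    return 0 if word_length == 0 else 2 ** (word_length - 1) - 1


def quantize_unit(coeffs, word_length):
    """Encoder: return (scale-factor index, integer levels) for one unit."""
    sf_index = pick_scale_factor(coeffs)
    q = _levels(word_length)
    if q == 0:
        return sf_index, [0] * len(coeffs)
    sf = SCALE_FACTORS[sf_index]
    # Clamp in case the peak exceeds the largest scale factor.
    return sf_index, [max(-q, min(q, round(c / sf * q))) for c in coeffs]


def dequantize_unit(sf_index, levels, word_length):
    """Decoder: rebuild coefficients from the stored index, word length and levels."""
    q = _levels(word_length)
    if q == 0:
        return [0.0] * len(levels)
    sf = SCALE_FACTORS[sf_index]
    return [v * sf / q for v in levels]


unit = [0.3, -0.12, 0.05, 0.29]
sf_idx, levels = quantize_unit(unit, 4)
print(sf_idx, levels, dequantize_unit(sf_idx, levels, 4))
```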

To guarantee correct reconstruction of the signal, the most important information is stored redundantly, and the amount of redundant data is recorded as well.

Bit Assignment

The bit allocation algorithm divides the available bits among the different units. Units that receive many bits have little quantization noise, whereas units with few or no bits have a large error.

This algorithm must ensure that the critical or relevant units receive enough bits while keeping the noise in the less relevant units perceptually insignificant. ATRAC also uses entropy coding, which assigns shorter codes to the most frequent values and thus considerably reduces the size of the audio file.
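A common way to implement this trade-off is a greedy loop that keeps giving one more bit per coefficient to the unit whose quantization noise is currently most audible. The sketch below is such a generic allocator, not Sony's algorithm. It assumes each unit is described by a signal-to-mask ratio in dB, uses the rule of thumb that each extra bit lowers quantization noise by about 6.02 dB, and stops when the budget runs out or all noise is below the mask. Names and parameters are illustrative; unlike ATRAC1, it also allows a word length of 1.

```python
def allocate_bits(smr_db, unit_sizes, budget, max_wl=16):
    """Greedy bit allocation.

    smr_db[i]     -- signal-to-mask ratio of unit i in dB (assumed input)
    unit_sizes[i] -- number of coefficients in unit i (cost of +1 bit of word length)
    budget        -- total bits available for the frame
    Returns a word length per unit.
    """
    word_lengths = [0] * len(smr_db)
    remaining = budget

    def noise_to_mask(i):
        # Each extra bit lowers quantization noise by roughly 6.02 dB.
        return smr_db[i] - 6.02 * word_lengths[i]

    while True:
        candidates = [i for i in range(len(smr_db))
                      if word_lengths[i] < max_wl and unit_sizes[i] <= remaining]
        if not candidates:
            break  # budget exhausted
        worst = max(candidates, key=noise_to_mask)
        if noise_to_mask(worst) <= 0:
            break  # all remaining noise is already below the masking threshold
        word_lengths[worst] += 1
        remaining -= unit_sizes[worst]
    return word_lengths


# Three 4-coefficient units: very audible, mildly audible, fully masked.
print(allocate_bits([30.0, 10.0, -5.0], [4, 4, 4], budget=40))
```

The masked unit receives no bits at all, which is exactly the "few or no bits, a lot of error" case described above, except that here its error is assumed to be inaudible.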

As the block diagram of the codec shows, the decoder is completely independent of the bit allocation algorithm: it simply reads the word lengths and scale factors stored for each unit. This allows the encoder to evolve without having to change the playback devices.

ATRAC3

This version follows the same operating principles as its predecessor and adds several improvements. It doubles the compression of ATRAC with almost no loss in sound quality, encoding audio at about 10% of the CD bit rate.

It divides the signal into four bands:

Below 2.75625 kHz
From 2.75625 to 5.5125 kHz
From 5.5125 to 11.025 kHz
Above 11.025 kHz (up to the upper limit of the audio band)

It also classifies the sound more efficiently than the previous version, separating pure tones (such as those of violins) and sounds with a high sound pressure level from the rest of the signal. This is similar to the distinction between critical and non-critical units described earlier.
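The source does not say how ATRAC3 identifies tonal components. A generic illustration is simple peak picking on the magnitude spectrum: local maxima that stand well above the average level are treated as tones and coded separately, and the remaining noise-like residual is coded as usual. The threshold factor, the peak width and the function names below are hypothetical.

```python
def find_tonal_peaks(magnitudes, min_ratio=8.0):
    """Indices of local maxima exceeding `min_ratio` times the mean magnitude."""
    mean = sum(magnitudes) / len(magnitudes)
    peaks = []
    for k in range(1, len(magnitudes) - 1):
        m = magnitudes[k]
        if m > magnitudes[k - 1] and m >= magnitudes[k + 1] and m > min_ratio * mean:
            peaks.append(k)
    return peaks


def split_tonal(coeffs, peaks, width=1):
    """Move the coefficients within +-width bins of each peak into a tonal part.

    Returns (tonal, residual); their element-wise sum equals coeffs.
    """
    tonal = [0.0] * len(coeffs)
    residual = list(coeffs)
    for k in peaks:
        for j in range(max(0, k - width), min(len(coeffs), k + width + 1)):
            tonal[j] = coeffs[j]
            residual[j] = 0.0
    return tonal, residual


spectrum = [0.1] * 64
spectrum[10], spectrum[11], spectrum[40] = 5.0, 2.0, 3.0  # two tones plus a low noise floor
peaks = find_tonal_peaks(spectrum)
tonal, residual = split_tonal(spectrum, peaks)
print(peaks)
```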

In addition, it uses an allocation algorithm combined with entropy coding that greatly reduces the number of bits needed, and thus the size of the audio file. However, the most relevant or intense components must still be encoded with more bits so that their quantization noise does not become perceptible.

ATRAC3 has two modes:

LP2: uses a bit rate of 132 kbit/s, with quality similar to MP3 encoded at the same rate.
LP4: halves the bit rate of LP2 (66 kbit/s) by using a technique similar to joint stereo coding together with a low-pass filter at around 13.5 kHz.

Both techniques reduce the effect of pre-echo, with better results than in the previous version.

The low-pass filter removes content, including abrupt transitions, above a certain frequency. Coding similar to joint stereo sums the left and right channels above a certain frequency while keeping the difference between them; during decoding, the high-frequency content of each channel is reconstructed by recombining the common information with that difference.
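Both ideas can be sketched on per-channel spectra. The sketch assumes mid/side (sum/difference) coding with the side signal dropped above a split index, which is one simple form of joint stereo and not necessarily LP4's exact scheme, plus a low-pass filter implemented by zeroing spectral coefficients above the cutoff. It also assumes a single full-band spectrum of N coefficients spanning 0 to 22.05 kHz. All names are illustrative.

```python
def ms_encode(left, right):
    """Mid/side transform: mid = (L + R) / 2, side = (L - R) / 2."""
    mid = [(l + r) / 2 for l, r in zip(left, right)]
    side = [(l - r) / 2 for l, r in zip(left, right)]
    return mid, side


def ms_decode(mid, side):
    """Inverse transform: L = mid + side, R = mid - side."""
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right


def joint_stereo_encode(left, right, split):
    """Keep the side (difference) only below coefficient `split`; above it, both
    channels are rebuilt from the common mid signal alone."""
    mid, side = ms_encode(left, right)
    return mid, side[:split] + [0.0] * (len(side) - split)


def low_pass_coeffs(coeffs, cutoff_hz, sample_rate=44100):
    """Zero the coefficients above cutoff_hz, assuming they span 0 to sample_rate / 2."""
    n = len(coeffs)
    keep = int(cutoff_hz / (sample_rate / 2) * n)
    return coeffs[:keep] + [0.0] * (n - keep)


left = [1.0, 2.0, 3.0, 4.0]
right = [1.0, 0.0, 3.0, 2.0]
print(ms_decode(*joint_stereo_encode(left, right, split=2)))
print(sum(low_pass_coeffs([1.0] * 1024, 13500)))  # coefficients kept below 13.5 kHz
```

Below the split the channels are recovered exactly; above it both channels receive the same (averaged) content, so the stereo image at high frequencies is lost in exchange for fewer coefficients to encode.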

In this way, quantization noise is reduced and is less likely to spread across an entire window.

ATRAC3plus

Improvements it introduces:

Divides the audio signal into 16 bands, giving better resolution and more precision.
Offers more window-length options in the adaptive block, allowing longer windows of up to 4096 samples.
Introduces new bit allocation algorithms, addressing a weakness of earlier versions with certain musical or vocal passages, where the final quality varied with the type of audio signal.

Bit Assignment

It introduces two coding rules suited to different types of audio signals:

Rule A:

00 → 0 | 01 → 10 | 10 → 110 | 11 → 111 |

Rule B:

00 → 0 | 01 → 110 | 10 → 111 | 11 → 10 |

Rule A is suited to signals that use the entire dynamic range and have a high probability of values with a high sound pressure level.

Rule B, by contrast, is suited to audio signals with relatively few values, or few samples, with a high sound pressure level.
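Both rules are prefix-free codes: no codeword is the beginning of another, so a bitstream can be decoded symbol by symbol without separators. The sketch below encodes and decodes 2-bit symbols with each rule and picks whichever gives the shorter output. The source does not say which quantized values the 2-bit symbols stand for, so they are treated as opaque symbols here, and the per-block selection in `cheaper_rule` is a hypothetical illustration of how an encoder could choose between the rules.

```python
RULE_A = {"00": "0", "01": "10", "10": "110", "11": "111"}
RULE_B = {"00": "0", "01": "110", "10": "111", "11": "10"}


def encode(symbols, table):
    """Concatenate the codeword of each 2-bit symbol."""
    return "".join(table[s] for s in symbols)


def decode(bits, table):
    """Read bits until they match a codeword; valid because the code is prefix-free."""
    inverse = {code: sym for sym, code in table.items()}
    out, buf = [], ""
    for b in bits:
        buf += b
        if buf in inverse:
            out.append(inverse[buf])
            buf = ""
    if buf:
        raise ValueError("bitstream ends in the middle of a codeword")
    return out


def cheaper_rule(symbols):
    """Return ("A" or "B", encoded length) for whichever rule needs fewer bits."""
    len_a = len(encode(symbols, RULE_A))
    len_b = len(encode(symbols, RULE_B))
    return ("A", len_a) if len_a <= len_b else ("B", len_b)


block1 = ["00", "01", "01", "11", "10", "00"]
block2 = ["11"] * 5 + ["00"]
print(cheaper_rule(block1), cheaper_rule(block2))
```

Rule A is cheaper when "01" dominates, rule B when "11" dominates; both spend a single bit on "00".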

Combining all these techniques reduces the bit rate considerably, down to approximately 5% of the data rate of a CD (linear PCM).

Applications

Hi-MD recorders ("Hi-LP" and "Hi-SP" modes)
Flash memory storage and playback devices
PSP console

Initially, because of its high compression ratio, ATRAC was not used for professional audio. ATRAC encoders have improved considerably since the first generation, and current ATRAC versions can produce audio that sounds identical to the original source.

In practice, this encoding is used almost exclusively in Sony's portable audio devices: CD players (bundled with software to burn CDs in this format) and MiniDiscs.

Equivalences

According to tests carried out by Intertek Testing Services (United Kingdom) and TESTFactory (Germany), commissioned by Sony, it is possible to establish approximate quality equivalences between the different evolutions of the ATRAC format:

ATRAC3plus at 256 kbit/s is equivalent to CD quality; the previous encoders did not reach a comparable level.
ATRAC3plus at 64 kbit/s is equivalent to ATRAC3 at 132 kbit/s and to ATRAC at 292 kbit/s (the quality of the original MiniDisc). It is also similar to a 128 kbit/s MP3 file.
ATRAC3plus at 48 kbit/s is equivalent to ATRAC3 at 66 kbit/s, with sound quality similar to FM radio.