Data compression


In computer science, data compression is the reduction in the volume of data needed to represent a given piece of information, so that it occupies less space. The process of reducing data in this way is called "compression", and the reverse process "decompression".

The space occupied by uncompressed encoded information (data, a digital signal, etc.) is the product of the sampling frequency and the resolution. Thus, the more bits used, the larger the file. However, the resolution is imposed by the digital system in use and the number of bits cannot be altered at will; compression is therefore used to transmit the same amount of information, which at full resolution would require a large number of bits, using fewer bits.

Compression is a particular case of encoding, whose main characteristic is that the resulting code is smaller than the original.

Data compression is fundamentally based on searching for repetitions in a series of data and then storing only the data together with the number of times it is repeated. For example, if a sequence such as "AAAAAA" appears in a file, occupying 6 bytes, it could simply be stored as "6A", which occupies only 2 bytes, using the RLE algorithm.
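As a rough illustration of this idea (a minimal sketch, not the encoding used by any particular production format), a run-length encoder can be written in a few lines of Python:

```python
def rle_compress(data: str) -> str:
    """Run-length encoding: each run of identical characters is
    replaced by its length followed by the character itself."""
    if not data:
        return ""
    out = []
    run_char, run_len = data[0], 1
    for ch in data[1:]:
        if ch == run_char:
            run_len += 1
        else:
            out.append(f"{run_len}{run_char}")
            run_char, run_len = ch, 1
    out.append(f"{run_len}{run_char}")
    return "".join(out)

print(rle_compress("AAAAAA"))  # "6A": 6 bytes reduced to 2
print(rle_compress("AAABBC"))  # "3A2B1C": repetitive data shrinks, varied data does not
```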

In reality, the process is much more complex, since it is rarely possible to find such exact repeating patterns (except in some images). Several families of compression algorithms are used:

  • Some look for long repeated series and then encode them in shorter forms.
  • Others, such as the Huffman algorithm, examine which characters occur most often and assign them the shortest codes (a minimal sketch follows this list).
  • Others, like LZW, build a dictionary of the patterns found and then replace each occurrence with a reference to the dictionary.
  • Byte pair encoding is another simple compression algorithm that is very easy to understand.
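As an illustration of the frequency-based approach, the following sketch (assuming nothing beyond the standard Huffman construction, and not taken from any particular library) builds a code table in which the most repeated characters receive the shortest bit strings:

```python
import heapq
from collections import Counter

def huffman_code_table(text: str) -> dict:
    """Build a Huffman code table: frequent symbols get short codes."""
    # Each heap entry: (frequency, tie-breaker, {symbol: code_so_far}).
    heap = [(freq, i, {sym: ""}) for i, (sym, freq) in enumerate(Counter(text).items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        # Merge the two least frequent subtrees, prefixing their codes with 0/1.
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (f1 + f2, tie, merged))
        tie += 1
    return heap[0][2]

table = huffman_code_table("abracadabra")
encoded = "".join(table[ch] for ch in "abracadabra")
print(table)          # e.g. {'a': '0', 'c': '100', 'd': '101', 'b': '110', 'r': '111'}
print(len(encoded))   # 23 bits, versus 88 bits (11 characters x 8 bits) uncompressed
```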

When talking about compression, two concepts must be kept in mind:

  1. Redundancy: data that is repetitive or predictable.
  2. Entropy: the new or essential information, defined as the difference between the total amount of data in a message and its redundancy (see the sketch below).
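The informal definition of entropy above can be made quantitative with Shannon's formula H = -Σ p_i log2 p_i (bits per symbol). The short sketch below is a simplified illustration of that formula, not part of any specific compressor; it estimates how much of a message is redundancy:

```python
import math
from collections import Counter

def shannon_entropy_bits(message: str) -> float:
    """Average information content per symbol, in bits (Shannon entropy)."""
    counts = Counter(message)
    n = len(message)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

msg = "AAAAABBBC"                     # highly redundant message
entropy = shannon_entropy_bits(msg)   # about 1.35 bits per symbol
raw_bits = len(msg) * 8               # 72 bits when stored as plain bytes
essential = entropy * len(msg)        # about 12.2 bits of essential information
print(f"redundancy ~ {raw_bits - essential:.1f} of {raw_bits} bits")
```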

The information transmitted by the data can be of three types:

  • Redundant: repetitive or predictable information.
  • Irrelevant: information that we cannot perceive and whose elimination therefore does not affect the content of the message. For example, since the frequencies the human ear can perceive lie roughly between 16/20 Hz and 16 000/20 000 Hz, frequencies below or above those values would be irrelevant.
  • Basic: the relevant information, that which is neither redundant nor irrelevant and which must be transmitted so that the signal can be reconstructed.

Taking these three types of information into account, three types of information compression are established:

  • Without real loss: all the entropy of the message is transmitted (all the basic and irrelevant information), and only the redundant information is eliminated.
  • Subjectively without loss: that is, in addition to eliminating redundant information, the irrelevant is also eliminated.
  • Subjectively with losses: a certain amount of basic information is deleted, so the message will be rebuilt with perceptible but tolerable errors (e.g. videoconference).

Differences between lossless and lossy compression

The objective of compression is always to reduce the size of the information, while trying to ensure that this reduction does not affect its content. However, the reduction may or may not affect the quality of the information:

  • Lossless compression: the data before compression and after decompression are identical. With lossless compression, greater compression only requires more processing time. The bit rate is always variable in lossless compression. It is mainly used for text compression.
  • Lossy compression: a lossy algorithm can discard data to reduce the size further, thereby reducing quality. In lossy compression the bit rate can be constant or variable. Once compression has been performed, the original signal cannot be recovered; only an approximation is obtained, whose similarity to the original depends on the type of compression. This kind of compression is mainly applied to images, video and sound. Lossy algorithms can discard redundant image information: the JPG format, for example, uses techniques that smooth edges and areas of similar color, so that the missing information is imperceptible to the naked eye. This allows a high degree of lossy compression in images, where the loss is often visible only when zooming in.

Uses

Image

Entropy coding originated in the 1940s with the introduction of Shannon-Fano coding, the basis of Huffman coding which was developed in 1950. Transform coding dates back to the late 1960s, with the introduction of the Fast Fourier transform (FFT) in 1968 and the Hadamard transform in 1969.

An important image compression technique is the discrete cosine transform (DCT), a technique developed in the early 1970s. DCT is the basis for JPEG, a lossy compression format that was introduced by the Joint Photographic Experts Group (JPEG) in 1992. JPEG greatly reduces the amount of data needed to represent an image at the cost of a relatively small reduction in image quality, and has become the most widely used image file format. Its highly efficient DCT-based compression algorithm was largely responsible for the widespread proliferation of digital images and digital photos.

Lempel-Ziv-Welch (LZW) is a lossless compression algorithm developed in 1984. It is used in the GIF format, introduced in 1987. DEFLATE, a lossless compression algorithm specified in 1996, is used in the Portable Network Graphics (PNG) format.

Wavelet compression, the use of wavelets in image compression, began after the development of DCT encoding. The JPEG 2000 standard was introduced in 2000. Unlike the DCT algorithm used by the original JPEG format, JPEG 2000 uses Discrete Wavelet Transform (DWT) algorithms instead. JPEG 2000 technology, which includes the Motion JPEG 2000 extension, was selected as the video coding standard for digital cinema in 2004.

Audio

Audio data compression, not to be confused with dynamic range compression, has the potential to reduce the bandwidth and storage requirements of audio data. Audio compression algorithms are implemented in software as audio codecs. In both lossy and lossless compression, information redundancy is reduced, using methods such as coding, quantization, DCT and linear prediction to reduce the amount of information used to represent the uncompressed data.

Lossy audio compression algorithms provide higher compression and are used in many audio applications, such as Vorbis and MP3. These algorithms are almost all based on psychoacoustics to remove or reduce the fidelity of less audible sounds, thus reducing the space required to store or transmit them.

The acceptable trade-off between loss of audio quality and size of transmission or storage depends on the application. For example, a 640 MB compact disc (CD) contains approximately one hour of uncompressed Hi-Fi music, less than 2 hours of lossless compressed music, or 7 hours of compressed music in the medium bitrate MP3 format. A digital sound recorder can typically store about 200 hours of clearly intelligible speech on 640 MB.

Lossless audio compression produces a digital representation of data that can be decoded into an exact digital duplicate of the original. Compression ratios are around 50-60% of the original size, which is similar to generic lossless data compression. Lossless codecs use curve fitting or linear prediction as the basis for estimating the signal. Parameters describing the estimate and the difference between the estimate and the actual signal are coded separately.
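A minimal sketch of the linear-prediction idea, assuming FLAC-style fixed difference predictors (real lossless codecs are considerably more elaborate and entropy-code the residual afterwards):

```python
import numpy as np

def prediction_residual(samples: np.ndarray, order: int = 2) -> np.ndarray:
    """Estimate each sample from the previous ones by repeated differencing
    (a fixed polynomial predictor) and keep only the prediction error."""
    residual = samples.astype(np.int64).copy()
    for k in range(order):
        # The first `order` warm-up samples are kept verbatim.
        residual[k + 1:] = residual[k + 1:] - residual[k:-1]
    return residual

# A slowly varying signal leaves tiny residuals, which are far cheaper to
# entropy-code than the raw sample values; the process is exactly reversible.
signal = np.array([1000, 1010, 1021, 1033, 1046, 1060], dtype=np.int64)
print(prediction_residual(signal))   # [1000   10    1    1    1    1]
```

Decoding simply reverses the differencing with cumulative sums, which is why no information is lost.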

There are several lossless audio compression formats. See list of lossless codecs for a list. Some formats are associated with a different system, such as Direct Stream Transfer, used on Super Audio CDs, and Meridian Lossless Packing, used on DVD-Audio, Dolby TrueHD, Blu-ray, and HD DVD.

Some audio file formats combine a lossy format with a lossless correction layer; stripping the correction layer easily yields a lossy file. These formats include MPEG-4 SLS (Scalable to Lossless), WavPack, and OptimFROG DualStream.

When audio files are to be processed, whether for further compression or for editing, it is desirable to work from an unaltered original (uncompressed or losslessly compressed). Processing a lossily compressed file for some purpose often produces an end result inferior to creating the same compressed file from an uncompressed original. In addition to sound editing or mixing, lossless audio compression is often used for archival storage, or as master copies.

Lossy audio compression

Comparison of audio spectrograms of an uncompressed format and of several lossy formats. The lossy spectrograms show bandlimiting of the higher frequencies, a common technique associated with lossy audio compression.

Lossy audio compression is used in a wide range of applications. In addition to stand-alone file playback applications on MP3 players or computers, digitally compressed audio streams are used in most DVD-Video, digital television, Internet streaming media, satellite and cable radio, and increasingly in terrestrial radio broadcasts. Lossy compression typically achieves much higher compression than lossless compression by discarding less critical data based on psychoacoustic optimizations.

Psychoacoustics recognizes that not all data in an audio stream can be perceived by the human auditory system. Most lossy compression reduces redundancy by first identifying perceptually irrelevant sounds, that is, sounds that are very difficult to hear. Typical examples are high frequencies or sounds that occur at the same time as louder sounds. Those irrelevant sounds are encoded less precisely or not at all.

Due to the nature of lossy algorithms, audio quality suffers digital generation loss when a file is repeatedly decompressed and recompressed. This makes lossy compression unsuitable for storing intermediate results in professional audio engineering applications such as sound editing and multitrack recording. However, lossy formats such as MP3 are very popular among end users, since the file size is reduced to 5-20% of the original and a megabyte can store about a minute of music at adequate quality.

Encoding methods

To determine what information in an audio signal is perceptually irrelevant, most lossy compression algorithms use transforms such as the modified discrete cosine transform (MDCT) to convert time-domain sampled waveforms into a transform domain, typically the frequency domain. Once transformed, the component frequencies can be prioritized according to their audibility. The audibility of the spectral components is assessed using the absolute threshold of hearing and the principles of simultaneous masking (the phenomenon in which one signal is masked by another signal separated from it in frequency) and, in some cases, temporal masking (in which one signal is masked by another signal separated from it in time). Equal-loudness contours can also be used to weight the perceptual importance of components. Models of the combined human ear and brain that incorporate these effects are often called psychoacoustic models.
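The following toy sketch illustrates that transform-and-discard step. It uses a plain DCT as a stand-in for the MDCT and a crude fixed threshold as a stand-in for a real psychoacoustic model, so it should be read as a simplified illustration rather than an actual codec:

```python
import numpy as np
from scipy.fft import dct, idct

def discard_quiet_components(block: np.ndarray, threshold_db: float = -60.0) -> np.ndarray:
    """Transform a block of samples to the frequency domain and zero out
    components far below the loudest one (treated here as 'inaudible')."""
    spectrum = dct(block, norm="ortho")
    peak = np.max(np.abs(spectrum)) + 1e-12
    level_db = 20 * np.log10(np.abs(spectrum) / peak + 1e-12)
    spectrum[level_db < threshold_db] = 0.0
    return spectrum

t = np.linspace(0, 1, 1024, endpoint=False)
block = np.sin(2 * np.pi * 440 * t) + 0.001 * np.random.randn(1024)   # tone plus faint noise
kept = discard_quiet_components(block)
approx = idct(kept, norm="ortho")    # reconstruction without the discarded detail
print(np.count_nonzero(kept), "of", kept.size, "coefficients kept")
```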

Other types of lossy compressors, such as linear predictive coding used with speech, are source-based coders. LPC uses a model of the human vocal tract to analyze speech sounds and infer the parameters used by the model to produce them moment by moment. These changing parameters are transmitted or stored and used to drive another model in the decoder that plays the sound.

Lossy formats are often used for streaming audio distribution or interactive communication (such as over mobile phone networks). In these applications, data must be decompressed as it flows, rather than after the entire data stream has been transmitted. Not all audio codecs can be used for streaming applications.

Latency is introduced by the methods used to encode and decode the data. Some codecs analyze a longer segment of data, called a frame, to optimize efficiency, and then encode it in such a way that a larger segment of data is required at once in order to decode it. The inherent latency of the encoding algorithm can be critical; for example, when there is two-way transmission of data, as in a telephone conversation, significant delays can seriously degrade perceived quality.

Unlike compression speed, which is proportional to the number of operations required by the algorithm, here latency refers to the number of samples that need to be analyzed before processing a block of audio. In the minimal case, the latency is zero samples (for example, if the encoder/decoder simply reduces the number of bits used to quantize the signal). Time domain algorithms such as LPC also tend to have low latencies, hence their popularity in speech coding for telephony. However, in algorithms such as MP3, a large number of samples have to be analyzed to implement a psychoacoustic model in the frequency domain, and the latency is on the order of 23 ms.

Speech coding

Speech coding is an important category of audio data compression. The perceptual models used to estimate which aspects of speech can be heard by the human ear are often somewhat different from those used for music. The frequency range needed to transmit the sounds of a human voice is typically much narrower than that needed for music, and the sound is typically less complex. Therefore, speech can be encoded with high quality using a relatively low bit rate.

This is typically achieved by a combination of two approaches:

  • Encoding only the sounds that a single human voice can produce.
  • Discarding most of the data in the signal, keeping just enough to reconstruct an "intelligible" voice rather than the full frequency range of human hearing.

The first algorithms used in speech coding (and audio data compression in general) were the A-law algorithm and the μ-law algorithm.
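Both are companding schemes. The sketch below shows the standard μ-law curve (the textbook formula, not any particular library's implementation) applied before coarse 8-bit-style quantization:

```python
import numpy as np

def mu_law_encode(x: np.ndarray, mu: int = 255) -> np.ndarray:
    """Compress the dynamic range of a signal in [-1, 1]: small amplitudes,
    which dominate speech, keep fine resolution after quantization."""
    return np.sign(x) * np.log1p(mu * np.abs(x)) / np.log1p(mu)

def mu_law_decode(y: np.ndarray, mu: int = 255) -> np.ndarray:
    """Inverse expansion back to (approximately) the original amplitude."""
    return np.sign(y) * ((1 + mu) ** np.abs(y) - 1) / mu

x = np.array([-0.5, -0.01, 0.0, 0.01, 0.5])
quantized = np.round(mu_law_encode(x) * 127) / 127    # 8-bit style quantization
print(mu_law_decode(quantized))    # close to x; quantization error grows with amplitude
```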

History

Solidyne 922: the first commercial sound card with audio bit compression for PC, 1990

Early research on audio was done at Bell Laboratories. There, in 1950, C. Chapin Cutler filed a patent for differential pulse code modulation (DPCM). In 1973, P. Cummiskey, Nikil S. Jayant, and James L. Flanagan introduced adaptive DPCM (ADPCM).

Perceptual coding was first used for compression of speech coding, with Linear Predictive Coding (LPC). Early concepts of LPC date back to the work of Fumitada Itakura (Nagoya University) and Shuzo Saito (Nippon Telegraph and Telephone) in 1966. During the 1970s, Bishnu S. Atal and Manfred R. Schroeder at Bell Labs developed a form of LPC called adaptive predictive coding (APC), a perceptual coding algorithm that exploited the masking properties of the human ear, followed in the early 1980s with the Code Excited Linear Prediction (CELP) algorithm, which achieved a significant compression ratio for its time. Perceptual encoding is used by modern audio compression formats such as MP3 and AAC.

The discrete cosine transform (DCT), developed by Nasir Ahmed, T. Natarajan, and K. R. Rao in 1974, provided the basis for the modified discrete cosine transform (MDCT) used by modern audio compression formats such as MP3, Dolby Digital, and AAC. MDCT was proposed by J. P. Princen, A. W. Johnson, and A. B. Bradley in 1987, following earlier work by Princen and Bradley in 1986.

The world's first commercial audio broadcast automation system was developed by Oscar Bonello, an engineering professor at the University of Buenos Aires. In 1983, using the psychoacoustic principle of the masking of critical bands, first published in 1967, he began to develop a practical application based on the recently developed IBM PC, and the broadcast automation system was released in 1987 under the name Audicom. Twenty years later, almost every radio station in the world was using similar technology made by various companies.

In February 1988, a bibliographical compendium on a wide variety of audio coding systems was published in the IEEE Journal on Selected Areas in Communications (JSAC). Although there were a few articles from before, this collection documented a variety of finished and working audio encoders, almost all of them using perceptual techniques and some form of frequency analysis and coding without background noise.

Video

Uncompressed video requires a very high data rate. Although lossless video compression codecs work with a compression factor of 5 to 12, typical H.264 lossy compression video has a compression factor of between 20 and 200.

The two key video compression techniques used in video coding standards are Discrete Cosine Transform (DCT) and Motion Compensation (MC). Most video coding standards, such as H.26x and MPEG formats, typically use DCT video coding with motion compensation (block motion compensation).

Most video codecs are used in conjunction with audio compression techniques to store separate but complementary data streams as a combined package using so-called container formats.

Coding Theory

Video data can be represented as a series of still image frames. These data often contain copious amounts of spatial and temporal redundancy. Video compression algorithms attempt to reduce redundancy and store information more compactly.

Most video compression formats and codecs exploit spatial and temporal redundancy (for example, through motion-compensated difference coding). Similarities can be encoded by storing only the differences between, for example, temporally adjacent frames (interframe coding) or spatially adjacent pixels (intraframe coding). Inter-frame compression (a temporal delta encoding) (re)uses data from one or more frames before or after a sequence to describe the current frame. Intra-frame encoding, on the other hand, uses only data from the current frame, and is actually image compression.

The intraframe video coding formats used in camcorders and video editing employ simpler compression that uses only intra-frame prediction. This simplifies video editing software, as it prevents a compressed frame from referring to data that the editor has deleted.

Typically, video compression also employs lossy compression techniques such as quantization that reduce aspects of the source data that are (more or less) irrelevant to human visual perception by exploiting perceptual features of human vision. For example, small differences in color are more difficult to perceive than changes in brightness. Compression algorithms can average a color across these similar areas in a manner similar to those used in JPEG image compression. As with all lossy compression, there is a trade-off between video quality and bit rate, the cost of processing compression and decompression, and system requirements. Highly compressed videos may have noticeable or annoying compression artifacts.

Methods other than DCT-based transform formats, such as fractal compression, matching pursuit, and the use of a discrete wavelet transform (DWT), have been the subject of some research, but are not commonly used in practical products. Wavelet compression is used in still image encoders and in video encoders without motion compensation. Interest in fractal compression seems to be waning, due to recent theoretical analyses showing a lack of comparative efficacy of such methods.

Interframe coding

In interframe coding, individual frames of a video sequence are compared from one frame to the next, and the video compression codec records the differences from the reference frame. If the frame contains areas where nothing has moved, the system can simply issue a short command that copies that part of the previous frame into the next one. If there are sections of the frame that move in a simple way, the compressor can issue a (slightly longer) command that tells the decompressor to shift, rotate, lighten, or darken the copy. This longer command is still much shorter than the data generated by intraframe compression. Typically, the encoder also transmits a residue signal that describes the remaining finer differences from the reference images. Using entropy coding, these residue signals have a more compact representation than the full signal. In areas of the video with more motion, the compression must encode more data to keep up with the larger number of pixels that change. Typically, during explosions, flames, flocks of animals, and in some panning shots, the high-frequency detail leads either to a drop in quality or to an increase in the variable bitrate.
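A toy sketch of the pure difference-coding step (ignoring motion compensation, rotation and brightness adjustment, which real codecs add on top) might look like this:

```python
import numpy as np

def encode_interframe(reference: np.ndarray, current: np.ndarray, dead_zone: int = 2) -> np.ndarray:
    """Keep only the differences from the reference frame; differences too
    small to matter are dropped, so unchanged areas cost almost nothing."""
    residual = current.astype(np.int16) - reference.astype(np.int16)
    residual[np.abs(residual) < dead_zone] = 0
    return residual

def decode_interframe(reference: np.ndarray, residual: np.ndarray) -> np.ndarray:
    return np.clip(reference.astype(np.int16) + residual, 0, 255).astype(np.uint8)

ref = np.full((4, 4), 100, dtype=np.uint8)   # previous frame: flat grey block
cur = ref.copy()
cur[1, 1] = 180                              # only one pixel really changed
res = encode_interframe(ref, cur)
print(np.count_nonzero(res), "non-zero residual values out of", res.size)   # 1 of 16
print(np.array_equal(decode_interframe(ref, res), cur))                     # True
```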

Block-based hybrid transformation formats

Processing stages of a typical video encoder.

Today, almost all commonly used video compression methods (for example, those found in standards approved by the ITU-T or ISO) share the same basic architecture, which dates back to H.261, standardized in 1988 by the ITU-T. They rely mainly on the DCT, applied to rectangular blocks of neighboring pixels, and on temporal prediction using motion vectors, as well as, nowadays, a loop filtering stage.

In the prediction stage, various deduplication and difference coding techniques are applied to help decorrelate the data and describe new data based on already transmitted data.

Next, the remaining rectangular blocks of pixel data are transformed into the frequency domain. In the main stage of lossy processing, frequency domain data is quantized to reduce information that is irrelevant to human visual perception.
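In miniature, and assuming a plain 8x8 DCT with a single uniform quantization step (real standards use per-coefficient quantization matrices and rate control), this stage can be sketched as:

```python
import numpy as np
from scipy.fft import dctn, idctn

def quantize_block(block: np.ndarray, q_step: float = 16.0) -> np.ndarray:
    """2-D DCT of an 8x8 block followed by uniform quantization; small
    high-frequency coefficients round to zero and are effectively discarded."""
    coeffs = dctn(block.astype(np.float64), norm="ortho")
    return np.round(coeffs / q_step)

def dequantize_block(q_coeffs: np.ndarray, q_step: float = 16.0) -> np.ndarray:
    return idctn(q_coeffs * q_step, norm="ortho")

block = np.tile(np.linspace(50, 120, 8), (8, 1))    # smooth gradient block
q = quantize_block(block)
print(np.count_nonzero(q), "of 64 coefficients survive quantization")
print(np.max(np.abs(dequantize_block(q) - block)))  # reconstruction error stays small
```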

In the last stage, statistical redundancy is largely removed by an entropy encoder that typically applies some form of arithmetic coding.

In a further loop filtering stage, various filters can be applied to the reconstructed image signal. By computing these filters also within the coding loop they can aid compression because they can be applied to the reference material before it is used in the prediction process and can be guided using the original signal. The most popular example is deblocking filters that remove blocking artifacts from quantization discontinuities at transform block boundaries.

History

In 1967, A.H. Robinson and C. Cherry proposed a run-length coding bandwidth compression scheme for the transmission of analog television signals. The discrete cosine transform (DCT), which is fundamental to modern video compression, was introduced by Nasir Ahmed, T. Natarajan and K. R. Rao in 1974.

H.261, which debuted in 1988, commercially introduced the prevalent basic architecture of video compression technology. It was the first video coding format based on DCT compression. H.261 was developed by several companies, including Hitachi, PictureTel, NTT, BT and Toshiba.

The most popular video coding standards used for codecs have been the MPEG standards. MPEG-1 was developed by the Motion Picture Experts Group (MPEG) in 1991, and was designed to compress VHS-quality video. It was succeeded in 1994 by MPEG-2/H.262, which was developed by several companies, mainly Sony, Thomson, and Mitsubishi Electric. MPEG-2 became the standard video format for DVD and SD digital television. In 1999 it was followed by MPEG-4/H.263, which was also developed by several companies, mainly Mitsubishi Electric, Hitachi and Panasonic.

H.264/MPEG-4 AVC was developed in 2003 by several organizations, primarily Panasonic, Godo Kaisha IP Bridge, and LG Electronics. AVC commercially introduced the modern context-adaptive binary arithmetic coding (CABAC) and context-adaptive variable-length coding (CAVLC) algorithms. AVC is the primary video encoding standard for Blu-ray discs, and is widely used by video-sharing websites and Internet streaming services such as YouTube, Netflix, Vimeo, and the iTunes Store, web software such as Adobe Flash Player and Microsoft Silverlight, and various HDTV broadcasts on terrestrial and satellite television.

Genetics

Genomic compression algorithms are the latest generation of lossless algorithms that compress data (usually nucleotide sequences) using both conventional compression algorithms and genetic algorithms adapted to the specific data type. In 2012, a team of scientists from Johns Hopkins University published a genetic compression algorithm that does not use a reference genome for compression. HAPZIPPER was tailored to HapMap data and achieves over 20-fold compression (a 95% reduction in file size), providing 2- to 4-fold better compression, with less computational effort, than the leading general-purpose compression utilities. To do this, Chanda, Elhaik, and Bader introduced MAF-based encoding (MAFE), which reduces the heterogeneity of the data set by sorting SNPs by their minor allele frequency, thus homogenizing the data set. Other algorithms developed in 2009 and 2013 (DNAZip and GenomeZip) have compression ratios of up to 1200-fold, allowing 6 billion base pairs of diploid human genomes to be stored in 2.5 megabytes (relative to a reference genome or averaged over many genomes).
