Joint Photographic Experts Group

Joint Photographic Experts Group (JPEG) is the name of a committee of experts that created a standard for the compression and encoding of still images. The resulting format is currently one of the most widely used for photographs.

From the beginning, the committee was formed by merging several groups in an attempt to share and develop their experience in image digitization. The ISO had begun its research in the area three years earlier.

In addition to being a compression method, JPEG is often regarded as a file format. JPEG/Exif is the most common image format used by digital cameras and other image-capture devices, while JPEG/JFIF is the most common format for storing and transmitting photographic images on the Web. These format variants are often not distinguished from each other and are simply called "JPEG". Files of this type usually have the .JPG extension.

JPEG compression

Quality comparison between the original image, a lossy JPEG version and a lossy WebP version.

The JPEG format typically uses a lossy compression algorithm to reduce the size of image files. This means that when the image is decompressed or viewed, the result is not exactly the same as the image before compression. There are also three variants of the JPEG standard that can compress images without data loss: JPEG 2000, JPEG-LS and Lossless JPEG.

The JPEG compression algorithm relies on two properties of human vision. First, the eye is much more sensitive to changes in luminance than in chrominance; that is, it perceives changes in brightness more clearly than changes in color. Second, it notices small changes in brightness more easily in homogeneous areas than in areas with strong variation, such as the edges of objects.

One of JPEG's characteristics is the flexibility to adjust the degree of compression. A very high degree of compression produces a small file at the cost of a significant loss of quality. A low degree of compression yields an image quality close to the original, but a larger file.

Quality loss accumulates over successive compressions. Compressing and decompressing an image loses some quality; recompressing the already compressed image loses even more, and each further compression cycle degrades it further. Lossy compression is not suitable for images or graphics with very sharp text, lines or borders, but it is suitable for images containing large areas of solid color.

The JPEG format does not support an alpha channel, which defines the opacity of each pixel in an image. Therefore, unlike formats such as PNG, or variants of the standard such as JPEG 2000, standard JPEG cannot store transparency.

Encoding

Many features of the JPEG standard are rarely used. What follows is a brief description of one of the many common ways to compress an input image with 24 bits per pixel (eight each for red, green and blue, or "8 bits per channel"). This particular option is a lossy compression method.

Color space transformation

Diagram of the RGB color model.

Diagram of the YUV color model.

The first step converts the image from the RGB color model to another model called YUV or YCbCr. This color space is similar to the one used by the PAL and NTSC color television systems, but much closer to the MAC (Multiplexed Analog Component) television system.

This color space (YUV) has three components:

The Y component, or luminance (brightness information); that is, the grayscale image.
The U or Cb and V or Cr components, the blue-difference and red-difference signals respectively (Cb is based on the difference B − Y and Cr on R − Y); together they are called chrominance (color information).

The equations for this change of basis from RGB to YUV are:

Y = 0.257 * R + 0.504 * G + 0.098 * B + 16
Cb = U = -0.148 * R - 0.291 * G + 0.439 * B + 128
Cr = V = 0.439 * R - 0.368 * G - 0.071 * B + 128

Solving these equations for R, G and B gives the inverse transform:

B = 1.164 * (Y - 16) + 2.018 * (U - 128)
G = 1.164 * (Y - 16) - 0.813 * (V - 128) - 0.391 * (U - 128)
R = 1.164 * (Y - 16) + 1.596 * (V - 128)

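NOTE: Several versions of these equations exist for different standards and value ranges, so equations with slightly different coefficients can be found in books and online. For example, JFIF uses a full-range variant in which all three components span 0 to 255.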

Analyzing the first set of equations shows that all three components have a minimum value of 16. The luminance channel (Y) has a maximum value of 235, while the chrominance channels reach 240. All these values fit in one byte once rounded to the nearest integer. This phase causes no significant loss of information, although rounding introduces a small error that is imperceptible to the human eye.
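A minimal Python sketch of both transforms, using exactly the coefficients given above (the studio-range variant, in which Y spans 16 to 235). The function names `rgb_to_ycbcr`, `ycbcr_to_rgb` and `to_byte` are illustrative, not from any library. It checks the stated ranges by converting the eight corners of the RGB cube, where these linear equations reach their extreme values.

```python
def to_byte(value):
    """Round to the nearest integer and clamp to the 0..255 range of one byte."""
    return max(0, min(255, round(value)))


def rgb_to_ycbcr(r, g, b):
    """Forward transform with the coefficients given in the text."""
    y = 0.257 * r + 0.504 * g + 0.098 * b + 16
    cb = -0.148 * r - 0.291 * g + 0.439 * b + 128
    cr = 0.439 * r - 0.368 * g - 0.071 * b + 128
    return to_byte(y), to_byte(cb), to_byte(cr)


def ycbcr_to_rgb(y, cb, cr):
    """Inverse transform with the coefficients given in the text."""
    r = 1.164 * (y - 16) + 1.596 * (cr - 128)
    g = 1.164 * (y - 16) - 0.813 * (cr - 128) - 0.391 * (cb - 128)
    b = 1.164 * (y - 16) + 2.018 * (cb - 128)
    return to_byte(r), to_byte(g), to_byte(b)


# Linear functions over the RGB cube take their extreme values at its corners.
corners = [(r, g, b) for r in (0, 255) for g in (0, 255) for b in (0, 255)]
converted = [rgb_to_ycbcr(*c) for c in corners]
print(min(y for y, _, _ in converted), max(y for y, _, _ in converted))  # → 16 235
print(min(min(cb, cr) for _, cb, cr in converted),
      max(max(cb, cr) for _, cb, cr in converted))  # → 16 240
```

A round trip through both functions returns each channel to within about two levels of its original value, which is the small rounding error mentioned above.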

Subsampling

Visual explanation of subsampling. The top-left image is the original; the others have been aggressively subsampled to give an idea of the effects of this technique.

The JPEG algorithm divides the image into 8×8 blocks and stores each block as a linear combination (a weighted sum) of 64 basic patterns; this makes it possible to remove details selectively. For example, if the weight of a pattern is very close to 0, it can be dropped without much effect on quality.

One option when saving the image is to reduce the color information relative to the brightness information (because of the property of human vision mentioned above). There are several methods. If this step is skipped, the image keeps the full resolution of its YUV color space (referred to as 4:4:4 subsampling), so no color information is lost at this stage. Color information can be halved with 4:2:2 (reduced by a factor of 2 in the horizontal direction), giving color half the horizontal resolution while brightness remains intact. Another widely used method, 4:2:0, reduces color information to a quarter by halving it both horizontally and vertically. If the source image is grayscale (black and white), the color information can be removed completely, which is written 4:0:0.

Some programs that can save JPEG images (such as GIMP) describe these methods by the sampling factors of each component: 1×1,1×1,1×1 for YUV 4:4:4 (no color loss), 2×1,1×1,1×1 for YUV 4:2:2 and 2×2,1×1,1×1 for the last method, YUV 4:2:0.
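The subsampling options can be sketched in Python as a block average over a chroma channel, assuming plain averaging as the downsampling filter (the text does not specify one, and encoders differ). The helper `subsample` and its parameters `fx` and `fy` (horizontal and vertical reduction factors) are hypothetical names: `fx=2, fy=1` corresponds to 4:2:2 and `fx=2, fy=2` to 4:2:0.

```python
def subsample(channel, fx, fy):
    """Average every fy-row by fx-column block of a chroma channel (list of rows).

    fx=1, fy=1 keeps 4:4:4; fx=2, fy=1 gives 4:2:2; fx=2, fy=2 gives 4:2:0.
    """
    height, width = len(channel), len(channel[0])
    result = []
    for y0 in range(0, height, fy):
        row = []
        for x0 in range(0, width, fx):
            # Collect the samples of this block (smaller at the right/bottom edge).
            block = [channel[y][x]
                     for y in range(y0, min(y0 + fy, height))
                     for x in range(x0, min(x0 + fx, width))]
            row.append(round(sum(block) / len(block)))
        result.append(row)
    return result


cb = [
    [0, 2, 4, 6],
    [2, 4, 6, 8],
    [10, 10, 20, 20],
    [10, 10, 20, 20],
]
print(subsample(cb, 2, 1))  # 4:2:2 → [[1, 5], [3, 7], [10, 20], [10, 20]]
print(subsample(cb, 2, 2))  # 4:2:0 → [[2, 6], [10, 20]]
```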

The techniques typically used to rebuild full-resolution color from the subsampled data are bilinear interpolation, nearest neighbor, cubic convolution, Bézier, B-spline and Catmull-Rom.
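When decoding, the reduced chroma channel must be enlarged back to full size. Nearest neighbor, the simplest technique listed, repeats each stored sample; the hypothetical helper `upsample_nearest` below sketches that technique only. Bilinear, cubic and spline methods instead interpolate between samples and give smoother color transitions.

```python
def upsample_nearest(channel, fx, fy):
    """Nearest-neighbor upsampling: repeat each chroma sample over an fy×fx area."""
    height, width = len(channel), len(channel[0])
    return [[channel[y // fy][x // fx] for x in range(width * fx)]
            for y in range(height * fy)]


small = [[2, 6], [10, 20]]  # a chroma channel stored at 4:2:0
for row in upsample_nearest(small, 2, 2):
    print(row)
```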

Discrete Cosine Transform (DCT)

Each component of the image is divided into small blocks of 8×8 pixels, which are processed almost independently; this significantly reduces computation time. It also causes the typical grid pattern that becomes visible in heavily compressed images. If the color was subsampled, the color blocks in the final image cover 16×8 pixels (16 wide by 8 high) for 4:2:2, or 16×16 pixels for 4:2:0.
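Splitting a component into 8×8 blocks can be sketched as follows. Image sizes are not always multiples of 8, so this sketch assumes a common approach of padding the right and bottom edges by repeating the last column and the last row; the function name `split_into_blocks` is illustrative.

```python
def split_into_blocks(channel, size=8):
    """Split a 2-D channel (list of rows) into size×size blocks in raster order.

    Incomplete blocks at the right and bottom edges are filled by repeating
    the last column and the last row (an assumed padding choice).
    """
    height, width = len(channel), len(channel[0])
    padded_h = -(-height // size) * size  # round up to a multiple of size
    padded_w = -(-width // size) * size
    padded = [[channel[min(y, height - 1)][min(x, width - 1)]
               for x in range(padded_w)]
              for y in range(padded_h)]
    return [[row[bx:bx + size] for row in padded[by:by + size]]
            for by in range(0, padded_h, size)
            for bx in range(0, padded_w, size)]


image = [[y * 12 + x for x in range(12)] for y in range(10)]  # 10 rows × 12 columns
blocks = split_into_blocks(image)
print(len(blocks))   # → 4
print(blocks[1][0])  # → [8, 9, 10, 11, 11, 11, 11, 11]
```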

Then, each small block is converted to the frequency domain via the discrete cosine transform, called DCT for short.
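For reference, the forward DCT that the JPEG standard applies to each 8×8 block can be written as below, where $s_{yx}$ is the level-shifted sample in row $y$ and column $x$, and $S_{vu}$ is the coefficient for vertical frequency $v$ and horizontal frequency $u$. Setting $u = v = 0$ gives $S_{00} = \frac{1}{8}\sum_{x,y} s_{yx}$, eight times the block's mean value, which is why that corner holds the DC (average) coefficient.

```latex
S_{vu} = \frac{1}{4} C_u C_v \sum_{x=0}^{7} \sum_{y=0}^{7} s_{yx}
         \cos\frac{(2x+1)u\pi}{16} \cos\frac{(2y+1)v\pi}{16},
\qquad
C_k = \begin{cases} 1/\sqrt{2} & k = 0 \\ 1 & k > 0 \end{cases}
```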

An example of one of these initial 8×8 blocks:

\begin{bmatrix}
52 & 55 & 61 & 66 & 70 & 61 & 64 & 73 \\
63 & 59 & 55 & 90 & 109 & 85 & 69 & 72 \\
62 & 59 & 68 & 113 & 144 & 104 & 66 & 73 \\
63 & 58 & 71 & 122 & 154 & 106 & 70 & 69 \\
67 & 61 & 68 & 104 & 126 & 88 & 68 & 70 \\
79 & 65 & 60 & 70 & 77 & 68 & 58 & 75 \\
85 & 71 & 64 & 59 & 55 & 61 & 65 & 83 \\
87 & 79 & 69 & 68 & 65 & 76 & 78 & 94
\end{bmatrix}

The next step is to subtract 128 from every value so that the numbers are centered on 0, in the range -128 to 127:

\begin{bmatrix}
-76 & -73 & -67 & -62 & -58 & -67 & -64 & -55 \\
-65 & -69 & -73 & -38 & -19 & -43 & -59 & -56 \\
-66 & -69 & -60 & -15 & 16 & -24 & -62 & -55 \\
-65 & -70 & -57 & -6 & 26 & -22 & -58 & -59 \\
-61 & -67 & -60 & -24 & -2 & -40 & -60 & -58 \\
-49 & -63 & -68 & -58 & -51 & -60 & -70 & -53 \\
-43 & -57 & -64 & -69 & -73 & -67 & -63 & -45 \\
-41 & -49 & -59 & -60 & -63 & -52 & -50 & -34
\end{bmatrix}

The matrix is transformed by DCT, and each element is rounded to the nearest integer.

\begin{bmatrix}
-415 & -30 & -61 & 27 & 56 & -20 & -2 & 0 \\
4 & -22 & -61 & 10 & 13 & -7 & -9 & 5 \\
-47 & 7 & 77 & -25 & -29 & 10 & 5 & -6 \\
-49 & 12 & 34 & -15 & -10 & 6 & 2 & 2 \\
12 & -7 & -13 & -4 & -2 & 2 & -3 & 3 \\
-8 & 3 & 2 & -6 & -2 & 1 & 4 & 2 \\
-1 & 0 & 0 & -2 & -1 & -3 & 4 & -1 \\
0 & 0 & -1 & -4 & -1 & 0 & 1 & 2
\end{bmatrix}

Note that the element with the largest magnitude in the whole matrix is in the upper-left corner; this is the DC coefficient.
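The following Python sketch reproduces the transform step for the example block. It subtracts 128, applies the forward DCT formula of the JPEG standard (written as a plain quadruple loop for clarity rather than speed) and rounds each coefficient to the nearest integer, with halves rounded away from zero (an assumption; the text only says "nearest integer"). The names `dct_8x8` and `round_half_away` are hypothetical.

```python
import math

# The example 8×8 block from the text: BLOCK[y][x], y = row, x = column.
BLOCK = [
    [52, 55, 61, 66, 70, 61, 64, 73],
    [63, 59, 55, 90, 109, 85, 69, 72],
    [62, 59, 68, 113, 144, 104, 66, 73],
    [63, 58, 71, 122, 154, 106, 70, 69],
    [67, 61, 68, 104, 126, 88, 68, 70],
    [79, 65, 60, 70, 77, 68, 58, 75],
    [85, 71, 64, 59, 55, 61, 65, 83],
    [87, 79, 69, 68, 65, 76, 78, 94],
]


def round_half_away(x):
    """Round to the nearest integer, rounding halves away from zero."""
    return int(math.copysign(math.floor(abs(x) + 0.5), x))


def dct_8x8(block):
    """Level-shift an 8×8 block by -128 and apply the forward 2-D DCT.

    Returns out[v][u]: v is the vertical frequency (row), u the horizontal one (column).
    """
    shifted = [[p - 128 for p in row] for row in block]
    out = [[0.0] * 8 for _ in range(8)]
    for v in range(8):
        for u in range(8):
            cu = 1 / math.sqrt(2) if u == 0 else 1.0
            cv = 1 / math.sqrt(2) if v == 0 else 1.0
            total = 0.0
            for y in range(8):
                for x in range(8):
                    total += (shifted[y][x]
                              * math.cos((2 * x + 1) * u * math.pi / 16)
                              * math.cos((2 * y + 1) * v * math.pi / 16))
            out[v][u] = 0.25 * cu * cv * total
    return out


coeffs = [[round_half_away(c) for c in row] for row in dct_8x8(BLOCK)]
print(coeffs[0])  # → [-415, -30, -61, 27, 56, -20, -2, 0]
```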

Quantization

"Before": an 8×8 block (enlarged ×16).

"After": the same 8×8 block; errors relative to the first image are noticeable, for example in the lower-left corner, which is lighter.

The human eye is very good at detecting small changes in brightness over relatively large areas, but not at detecting rapid brightness changes within small areas (high-frequency variation). Because of this, high frequencies can be removed without excessive loss of visual quality. This is done by dividing each component in the frequency domain by a constant specific to that component and rounding the result to the nearest integer. This is the step in which most of the information (and quality) is lost. As a result, the high-frequency components tend to become zero, while many of the others become small positive or negative numbers.

A typical quantization matrix is the Lohscheller matrix, which the JPEG standard provides as an optional example:

\begin{bmatrix}
16 & 11 & 10 & 16 & 24 & 40 & 51 & 61 \\
12 & 12 & 14 & 19 & 26 & 58 & 60 & 55 \\
14 & 13 & 16 & 24 & 40 & 57 & 69 & 56 \\
14 & 17 & 22 & 29 & 51 & 87 & 80 & 62 \\
18 & 22 & 37 & 56 & 68 & 109 & 103 & 77 \\
24 & 35 & 55 & 64 & 81 & 104 & 113 & 92 \\
49 & 64 & 78 & 87 & 103 & 121 & 120 & 101 \\
72 & 92 & 95 & 98 & 112 & 100 & 103 & 99
\end{bmatrix}

Dividing each coefficient of the transformed block by the corresponding coefficient of the quantization matrix and rounding to the nearest integer gives the quantized matrix:

\begin{bmatrix}
-26 & -3 & -6 & 2 & 2 & -1 & 0 & 0 \\
0 & -2 & -4 & 1 & 1 & 0 & 0 & 0 \\
-3 & 1 & 5 & -1 & -1 & 0 & 0 & 0 \\
-4 & 1 & 2 & -1 & 0 & 0 & 0 & 0 \\
1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0
\end{bmatrix}

For example, quantizing the first element, the DC coefficient, would look like this:

\mathrm{round}\left(\frac{-415}{16}\right) = \mathrm{round}(-25.9375) = -26
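Quantization of the whole block can be sketched in Python as an element-by-element division followed by rounding. The matrix above is reproduced only if halves are rounded away from zero: for example, -20/40 = -0.5 must become -1 and 13/26 = 0.5 must become 1. Python's built-in `round` rounds halves to the nearest even number, so this sketch uses its own hypothetical `round_half_away` helper; the input is the rounded DCT matrix from the text.

```python
import math

# Rounded DCT coefficients of the example block, as given in the text.
DCT = [
    [-415, -30, -61, 27, 56, -20, -2, 0],
    [4, -22, -61, 10, 13, -7, -9, 5],
    [-47, 7, 77, -25, -29, 10, 5, -6],
    [-49, 12, 34, -15, -10, 6, 2, 2],
    [12, -7, -13, -4, -2, 2, -3, 3],
    [-8, 3, 2, -6, -2, 1, 4, 2],
    [-1, 0, 0, -2, -1, -3, 4, -1],
    [0, 0, -1, -4, -1, 0, 1, 2],
]

# The example luminance quantization matrix from the text.
QUANT_TABLE = [
    [16, 11, 10, 16, 24, 40, 51, 61],
    [12, 12, 14, 19, 26, 58, 60, 55],
    [14, 13, 16, 24, 40, 57, 69, 56],
    [14, 17, 22, 29, 51, 87, 80, 62],
    [18, 22, 37, 56, 68, 109, 103, 77],
    [24, 35, 55, 64, 81, 104, 113, 92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103, 99],
]


def round_half_away(x):
    """Round to the nearest integer, rounding halves away from zero."""
    return int(math.copysign(math.floor(abs(x) + 0.5), x))


def quantize(coeffs, table):
    """Divide each coefficient by its quantization step and round."""
    return [[round_half_away(c / q) for c, q in zip(crow, qrow)]
            for crow, qrow in zip(coeffs, table)]


quantized = quantize(DCT, QUANT_TABLE)
print(quantized[0])  # → [-26, -3, -6, 2, 2, -1, 0, 0]
```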

Entropy coding

Entropy coding is a special form of lossless data compression. The elements of the matrix are read in a zig-zag pattern that groups similar frequencies together, runs of zeros are encoded compactly, and Huffman coding is applied to the result. Arithmetic coding can be used instead; it is more efficient than Huffman coding but has rarely been used because it was covered by patents. It produces files about 5% smaller, at the cost of longer encoding and decoding times. This small gain can also be spent on applying a lower degree of compression to the image, giving more quality for a similar file size.

In the previous matrix, the zig-zag sequence is this:
−26, −3, 0, −3, −2, −6, 2, −4, 1 −4, 1, 1, 5, 1, 2, −1, 1, −1, 2, 0, 0, 0, 0, 0, −1, −1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0

JPEG has a special Huffman code, end of block (EOB), that cuts the sequence off at the point where all remaining coefficients are zero, thus saving space:
−26, −3, 0, −3, −2, −6, 2, −4, 1, −4, 1, 1, 5, 1, 2, −1, 1, −1, 2, 0, 0, 0, 0, 0, −1, −1, EOB
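The zig-zag scan and the end-of-block cut can be sketched in Python. This is a simplified illustration, not the exact JPEG bitstream: a real encoder codes the DC coefficient as the difference from the previous block's DC value, and codes each non-zero AC coefficient together with the number of zeros before it as a Huffman symbol (runs longer than 15 zeros need a special symbol). The helpers `zigzag`, `trim_with_eob` and `run_length_pairs` are hypothetical.

```python
QUANTIZED = [
    [-26, -3, -6, 2, 2, -1, 0, 0],
    [0, -2, -4, 1, 1, 0, 0, 0],
    [-3, 1, 5, -1, -1, 0, 0, 0],
    [-4, 1, 2, -1, 0, 0, 0, 0],
    [1, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
]


def zigzag(matrix):
    """Read a square matrix along its anti-diagonals, alternating direction."""
    n = len(matrix)
    order = sorted(
        ((i, j) for i in range(n) for j in range(n)),
        # Even diagonals run bottom-left to top-right (column ascending);
        # odd diagonals run top-right to bottom-left (row ascending).
        key=lambda p: (p[0] + p[1], p[1] if (p[0] + p[1]) % 2 == 0 else p[0]),
    )
    return [matrix[i][j] for i, j in order]


def trim_with_eob(sequence):
    """Keep the DC value, drop trailing zero AC values and append an EOB marker."""
    dc, ac = sequence[0], sequence[1:]
    end = len(ac)
    while end > 0 and ac[end - 1] == 0:
        end -= 1
    return [dc] + ac[:end] + ["EOB"]


def run_length_pairs(ac_values):
    """Simplified (zero_run, value) pairs for the non-zero AC values."""
    pairs, run = [], 0
    for value in ac_values:
        if value == 0:
            run += 1
        else:
            pairs.append((run, value))
            run = 0
    return pairs


sequence = zigzag(QUANTIZED)
coded = trim_with_eob(sequence)
print(coded)
print(run_length_pairs(coded[1:-1]))
```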

Noise produced by compression

After compression, blocky artifacts like these sometimes remain; here they are shown in a small, enlarged part of an image.

The result after compression depends on how aggressive the divisors in the quantization matrix are: the larger the divisors, the more coefficients become zero and the more the image is compressed. Higher compression, however, produces more noise in the image and lowers its quality. An image saved with strong compression (for example, a quality setting of 1% to 15%) may have a much smaller file size, but it will have so many artifacts that it may not be usable. Very low compression (a quality setting of 98% to 100%) produces a very high-quality image, but the file will be so large that a lossless format such as PNG may be a better choice.
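The JPEG standard itself does not define a "quality" percentage. The widely used libjpeg library from the Independent JPEG Group maps a quality setting from 1 to 100 to a scaling of the base quantization table; the sketch below follows that convention as we understand it from libjpeg's `jpeg_quality_scaling`, but the Python function `scaled_table` is a hypothetical re-implementation, not part of any library. Quality 50 keeps the base table, lower values enlarge the divisors and higher values shrink them.

```python
BASE = [
    [16, 11, 10, 16, 24, 40, 51, 61],
    [12, 12, 14, 19, 26, 58, 60, 55],
    [14, 13, 16, 24, 40, 57, 69, 56],
    [14, 17, 22, 29, 51, 87, 80, 62],
    [18, 22, 37, 56, 68, 109, 103, 77],
    [24, 35, 55, 64, 81, 104, 113, 92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103, 99],
]


def scaled_table(base, quality, force_baseline=True):
    """Scale a base quantization table with a libjpeg-style quality setting (1..100)."""
    quality = max(1, min(100, quality))
    # Percentage scale factor: 5000/quality below 50, linear above.
    scale = 5000 // quality if quality < 50 else 200 - 2 * quality
    # Baseline JPEG stores 8-bit table entries, so divisors are capped at 255.
    limit = 255 if force_baseline else 32767
    return [[max(1, min(limit, (q * scale + 50) // 100)) for q in row]
            for row in base]


for quality in (10, 50, 90):
    print(quality, scaled_table(BASE, quality)[0])
```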

Most Internet users are familiar with these artifacts, which are the price of strong compression. To avoid them, reduce the compression level or use lossless compression, at the cost of larger files.

Decoding

Decoding follows the same process in reverse. Because information was lost, the final values will not match the original ones exactly.

The coefficient sequence is decoded and each value is placed back in its position in the matrix. Each value is then multiplied by the corresponding entry of the quantization matrix that was used. Since many values are zero, only the coefficients in the upper-left corner are recovered, and only approximately.
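Multiplying back by the quantization matrix is a single element-wise product. This short Python sketch, with the hypothetical helper `dequantize`, rebuilds the dequantized matrix (the one with -416 in the corner) from the quantized matrix and the quantization matrix given earlier.

```python
QUANTIZED = [
    [-26, -3, -6, 2, 2, -1, 0, 0],
    [0, -2, -4, 1, 1, 0, 0, 0],
    [-3, 1, 5, -1, -1, 0, 0, 0],
    [-4, 1, 2, -1, 0, 0, 0, 0],
    [1, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
]

QUANT_TABLE = [
    [16, 11, 10, 16, 24, 40, 51, 61],
    [12, 12, 14, 19, 26, 58, 60, 55],
    [14, 13, 16, 24, 40, 57, 69, 56],
    [14, 17, 22, 29, 51, 87, 80, 62],
    [18, 22, 37, 56, 68, 109, 103, 77],
    [24, 35, 55, 64, 81, 104, 113, 92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103, 99],
]


def dequantize(quantized, table):
    """Multiply each quantized coefficient by its quantization step."""
    return [[c * q for c, q in zip(crow, qrow)]
            for crow, qrow in zip(quantized, table)]


restored = dequantize(QUANTIZED, QUANT_TABLE)
print(restored[0])  # → [-416, -33, -60, 32, 48, -40, 0, 0]
```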

This multiplication gives the first matrix below; undoing the DCT (inverse DCT, rounded to the nearest integer) gives the second:

\begin{bmatrix}
-416 & -33 & -60 & 32 & 48 & -40 & 0 & 0 \\
0 & -24 & -56 & 19 & 26 & 0 & 0 & 0 \\
-42 & 13 & 80 & -24 & -40 & 0 & 0 & 0 \\
-56 & 17 & 44 & -29 & 0 & 0 & 0 & 0 \\
18 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0
\end{bmatrix}

\begin{bmatrix}
-68 & -65 & -73 & -70 & -58 & -67 & -70 & -48 \\
-70 & -72 & -72 & -45 & -20 & -40 & -65 & -57 \\
-68 & -76 & -66 & -15 & 22 & -12 & -58 & -61 \\
-62 & -72 & -60 & -6 & 28 & -12 & -59 & -56 \\
-59 & -66 & -63 & -28 & -8 & -42 & -69 & -52 \\
-60 & -60 & -67 & -60 & -50 & -68 & -75 & -50 \\
-54 & -46 & -61 & -74 & -65 & -64 & -63 & -45 \\
-45 & -32 & -51 & -72 & -58 & -45 & -45 & -39
\end{bmatrix}

Finally, 128 is added to each entry:

\begin{bmatrix}
60 & 63 & 55 & 58 & 70 & 61 & 58 & 80 \\
58 & 56 & 56 & 83 & 108 & 88 & 63 & 71 \\
60 & 52 & 62 & 113 & 150 & 116 & 70 & 67 \\
66 & 56 & 68 & 122 & 156 & 116 & 69 & 72 \\
69 & 62 & 65 & 100 & 120 & 86 & 59 & 76 \\
68 & 68 & 61 & 68 & 78 & 60 & 53 & 78 \\
74 & 82 & 67 & 54 & 63 & 64 & 65 & 83 \\
83 & 96 & 77 & 56 & 70 & 83 & 83 & 89
\end{bmatrix}
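The inverse step can be sketched the same way: the inverse DCT as defined in the JPEG standard, rounding to the nearest integer (halves away from zero, an assumption), adding 128 and clamping to one byte. Applied to the dequantized matrix, it should reproduce the reconstructed block above up to rounding differences. The names `idct_8x8` and `round_half_away` are hypothetical.

```python
import math

DEQUANTIZED = [
    [-416, -33, -60, 32, 48, -40, 0, 0],
    [0, -24, -56, 19, 26, 0, 0, 0],
    [-42, 13, 80, -24, -40, 0, 0, 0],
    [-56, 17, 44, -29, 0, 0, 0, 0],
    [18, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
]


def round_half_away(x):
    """Round to the nearest integer, rounding halves away from zero."""
    return int(math.copysign(math.floor(abs(x) + 0.5), x))


def idct_8x8(coeffs):
    """Inverse 2-D DCT of coeffs[v][u], then +128 level shift; returns pixels[y][x]."""
    pixels = [[0] * 8 for _ in range(8)]
    for y in range(8):
        for x in range(8):
            total = 0.0
            for v in range(8):
                for u in range(8):
                    cu = 1 / math.sqrt(2) if u == 0 else 1.0
                    cv = 1 / math.sqrt(2) if v == 0 else 1.0
                    total += (cu * cv * coeffs[v][u]
                              * math.cos((2 * x + 1) * u * math.pi / 16)
                              * math.cos((2 * y + 1) * v * math.pi / 16))
            # Round, undo the level shift and clamp to the 0..255 range of one byte.
            pixels[y][x] = max(0, min(255, round_half_away(total / 4) + 128))
    return pixels


reconstructed = idct_8x8(DEQUANTIZED)
print(reconstructed[0])
```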

To compare the original block with the compressed one, subtract one from the other (original minus reconstructed); the average of the absolute differences gives a rough idea of the quality lost:

\begin{bmatrix}
-8 & -8 & 6 & 8 & 0 & 0 & 6 & -7 \\
5 & 3 & -1 & 7 & 1 & -3 & 6 & 1 \\
2 & 7 & 6 & 0 & -6 & -12 & -4 & 6 \\
-3 & 2 & 3 & 0 & -2 & -10 & 1 & -3 \\
-2 & -1 & 3 & 4 & 6 & 2 & 9 & -6 \\
11 & -3 & -1 & 2 & -1 & 8 & 5 & -3 \\
11 & -11 & -3 & 5 & -8 & -3 & 0 & 0 \\
4 & -17 & -8 & 12 & -5 & -7 & -5 & 5
\end{bmatrix}

The largest differences are near the bright spot and along the bottom, between the left corner and the center. The latter are more noticeable because a light spot that used to be closer to the corner has shifted. The mean of the absolute differences is 4.8125, although the error is higher in some areas.
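The error measure used above, the mean of the absolute differences (often called the mean absolute error), takes only a few lines of Python; the two matrices are the original and reconstructed blocks from the text.

```python
ORIGINAL = [
    [52, 55, 61, 66, 70, 61, 64, 73],
    [63, 59, 55, 90, 109, 85, 69, 72],
    [62, 59, 68, 113, 144, 104, 66, 73],
    [63, 58, 71, 122, 154, 106, 70, 69],
    [67, 61, 68, 104, 126, 88, 68, 70],
    [79, 65, 60, 70, 77, 68, 58, 75],
    [85, 71, 64, 59, 55, 61, 65, 83],
    [87, 79, 69, 68, 65, 76, 78, 94],
]

RECONSTRUCTED = [
    [60, 63, 55, 58, 70, 61, 58, 80],
    [58, 56, 56, 83, 108, 88, 63, 71],
    [60, 52, 62, 113, 150, 116, 70, 67],
    [66, 56, 68, 122, 156, 116, 69, 72],
    [69, 62, 65, 100, 120, 86, 59, 76],
    [68, 68, 61, 68, 78, 60, 53, 78],
    [74, 82, 67, 54, 63, 64, 65, 83],
    [83, 96, 77, 56, 70, 83, 83, 89],
]

# Element-wise difference: original minus reconstructed.
diff = [[o - r for o, r in zip(orow, rrow)]
        for orow, rrow in zip(ORIGINAL, RECONSTRUCTED)]
mae = sum(abs(d) for row in diff for d in row) / 64
print(mae)  # → 4.8125
```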
