Joint Photographic Experts Group
Joint Photographic Experts Group (JPEG) is the name of a committee of experts that created a standard for the compression and encoding of still image files, which is currently one of the most widely used formats for photographs.
The committee was formed from the outset by merging several groups that sought to share and build on their experience in digitizing images. ISO had begun its own research in the area three years earlier.
In addition to being a compression method, JPEG is often considered a file format. JPEG/Exif is the most common image format used by digital cameras and other image capture devices, alongside JPG/JFIF, another format used for the storage and transmission of photographic images on the Web. These format variations are often not distinguished and are simply called "JPEG". Files of this type are usually named with the .JPG extension.
JPEG compression
The JPEG format typically uses a lossy compression algorithm to reduce the size of image files. This means that when decompressing or viewing the image, you do not get exactly the same image that you started with before compression. There are also three variants of the JPEG standard that compress the image without data loss: JPEG 2000, JPEG-LS, and Lossless JPEG.
The JPEG compression algorithm is based on two properties of human vision. First, the eye is much more sensitive to changes in luminance than in chrominance; that is, it perceives changes in brightness more clearly than changes in color. Second, it notices small changes in brightness more easily in homogeneous areas than in areas with large variation, such as the edges of objects.
One of the characteristics of JPEG is the flexibility in adjusting the degree of compression. Too high a degree of compression will result in a small file size, at the cost of significant loss of quality. With a low compression rate, you get an image quality that is close to that of the original, but with a larger file size.
The loss of quality when successive compressions are performed is cumulative. This means that if an image is compressed and then decompressed, image quality will be lost, but if an already compressed image is recompressed, the loss will be even greater. Each successive compression will cause further loss of quality. Lossy compression is not suitable for images or graphics that have very sharp text, lines, or borders, but it is suitable for files that contain large areas of solid colors.
The JPEG format does not include in its encoding the management of the alpha channel, which is what defines the opacity of a pixel in an image. Therefore, unlike formats such as PNG, or variants of the standard such as JPEG 2000, the standard JPEG format is not capable of managing transparency.
Encoding
Many of the features of the JPEG standard are rarely used. This is a brief description of one of the most commonly used methods for compressing an input image with 24 bits per pixel (eight each for red, green, and blue, or "8 bits per channel"). This particular option is a lossy compression method.
Color space transformation
It starts by converting the image from its RGB color model to another called YUV or YCbCr. This color space is similar to that used by the PAL and NTSC television color systems, but is much closer to the MAC (Multiplexed Analog Component) television system.
This color space (YUV) has three components:
- The Y component, or luminance (brightness information); that is, the image in grayscale.
- The U or Cb and V or Cr components, respectively the blue difference (which expresses the image relative to blue and red) and the red difference (which expresses the image relative to green and red); both signals are known as chrominance (color information).
The equations that perform this base change from RGB to YUV are the following:
Y = 0.257 * R + 0.504 * G + 0.098 * B + 16
Cb = U = -0.148 * R - 0.291 * G + 0.439 * B + 128
Cr = V = 0.439 * R - 0.368 * G - 0.071 * B + 128
The equations for the inverse transformation are obtained by solving the previous ones for R, G, and B:
B = 1.164 * (Y - 16) + 2.018 * (U - 128)
G = 1.164 * (Y - 16) - 0.813 * (V - 128) - 0.391 * (U - 128)
R = 1.164 * (Y - 16) + 1.596 * (V - 128)
NOTE: These equations are still being refined, so different equations with very similar coefficients can be found in books and online.
If we analyze the first trio of equations, we see that all three components have a minimum value of 16. The luminance channel (Y) has a maximum value of 235, while the chrominance channels have a maximum of 240. All these values fit in one byte after rounding to the nearest integer. During this phase there is no significant loss of information, although the rounding introduces a small margin of error that is imperceptible to the human eye.
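The two sets of equations can be tried out with a minimal Python sketch. It uses only the coefficients quoted above; the function names and the clamping of the inverse result to 0..255 are assumptions added for illustration.

```python
def rgb_to_ycbcr(r, g, b):
    """Convert one 8-bit RGB pixel to (Y, Cb, Cr) with the coefficients quoted above."""
    y = 0.257 * r + 0.504 * g + 0.098 * b + 16
    cb = -0.148 * r - 0.291 * g + 0.439 * b + 128
    cr = 0.439 * r - 0.368 * g - 0.071 * b + 128
    # Round to the nearest integer so that each component fits in one byte.
    return round(y), round(cb), round(cr)


def ycbcr_to_rgb(y, cb, cr):
    """Approximate inverse. Clamping to 0..255 is an added assumption."""
    r = 1.164 * (y - 16) + 1.596 * (cr - 128)
    g = 1.164 * (y - 16) - 0.813 * (cr - 128) - 0.391 * (cb - 128)
    b = 1.164 * (y - 16) + 2.018 * (cb - 128)

    def clamp(value):
        return max(0, min(255, round(value)))

    return clamp(r), clamp(g), clamp(b)


black = rgb_to_ycbcr(0, 0, 0)        # lowest values of all three channels
white = rgb_to_ycbcr(255, 255, 255)  # highest luminance
blue = rgb_to_ycbcr(0, 0, 255)       # highest Cb
print(black, white, blue)
```

Pure black gives the minimum of 16 for Y, pure white the maximum of 235, and pure blue pushes Cb to 240, which matches the ranges described above.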
Subsampling
An option that can be applied when saving the image is to reduce the color information relative to the brightness information (exploiting the visual phenomenon mentioned above). There are several methods. If this step is not applied, the image keeps its full YUV color information (this is known as 4:4:4 subsampling), so nothing is lost at this step. Color information can be halved, 4:2:2 (reduced by a factor of 2 in the horizontal direction), giving color half the horizontal resolution while brightness remains intact. Another widely used method reduces the color to a quarter, 4:2:0, in which color is reduced by a factor of 2 in both the horizontal and vertical directions. If the starting image was in grayscale (black and white), the color information can be removed completely, leaving it as 4:0:0.
Some programs that support saving images in JPEG (such as GIMP) refer to these methods as 1×1,1×1,1×1 for YUV 4:4:4 (no color loss), 2×1,1×1,1×1 for YUV 4:2:2 and 2×2,1×1,1×1 for the last method, YUV 4:2:0.
The algorithmic techniques used for this step (more precisely, for reconstructing the subsampled color) are usually bilinear interpolation, nearest neighbor, cubic convolution, Bézier, B-spline and Catmull-Rom.
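As an illustration of the 4:2:2 and 4:2:0 schemes, the following Python sketch reduces a single chroma plane. The text does not say how the reduced samples are computed, so averaging neighboring samples is an assumption here, and the function names and sample values are hypothetical.

```python
def subsample_422(plane):
    """Halve the horizontal resolution by averaging each pair of samples."""
    out = []
    for row in plane:
        pairs = [row[x:x + 2] for x in range(0, len(row), 2)]
        out.append([sum(pair) / len(pair) for pair in pairs])
    return out


def subsample_420(plane):
    """Halve both resolutions by averaging each 2x2 group of samples."""
    height, width = len(plane), len(plane[0])
    out = []
    for y in range(0, height, 2):
        row = []
        for x in range(0, width, 2):
            # Groups at the right or bottom edge may hold fewer than 4 samples.
            group = [plane[yy][xx]
                     for yy in range(y, min(y + 2, height))
                     for xx in range(x, min(x + 2, width))]
            row.append(sum(group) / len(group))
        out.append(row)
    return out


# Hypothetical 4x4 Cb plane used only for demonstration.
cb_plane = [
    [100, 110, 120, 130],
    [102, 112, 122, 132],
    [90, 90, 80, 80],
    [90, 90, 80, 80],
]
half = subsample_422(cb_plane)     # 4 rows x 2 columns
quarter = subsample_420(cb_plane)  # 2 rows x 2 columns
print(half)
print(quarter)
```

The luminance plane would be left untouched in both cases; only the Cb and Cr planes are reduced.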
Discrete Cosine Transform (DCT)
Each component of the image is divided into small blocks of 8×8 pixels, which are processed almost independently, significantly reducing computation time. This produces the typical grid pattern that becomes visible in images saved with high compression. If the color was subsampled, the colors remain in the final image in blocks of 8×16 or 16×16 pixels, depending on whether 4:2:2 or 4:2:0 was used.
Then, each small block is converted to the frequency domain via the discrete cosine transform, called DCT for short.
An example of one of those little initial 8x8 blocks is this:
$$
\begin{bmatrix}
52 & 55 & 61 & 66 & 70 & 61 & 64 & 73 \\
63 & 59 & 55 & 90 & 109 & 85 & 69 & 72 \\
62 & 59 & 68 & 113 & 144 & 104 & 66 & 73 \\
63 & 58 & 71 & 122 & 154 & 106 & 70 & 69 \\
67 & 61 & 68 & 104 & 126 & 88 & 68 & 70 \\
79 & 65 & 60 & 70 & 77 & 68 & 58 & 75 \\
85 & 71 & 64 & 59 & 55 & 61 & 65 & 83 \\
87 & 79 & 69 & 68 & 65 & 76 & 78 & 94
\end{bmatrix}
$$
The next process is to subtract 128 so that there are numbers around 0, between -128 and 127.
$$
\begin{bmatrix}
-76 & -73 & -67 & -62 & -58 & -67 & -64 & -55 \\
-65 & -69 & -73 & -38 & -19 & -43 & -59 & -56 \\
-66 & -69 & -60 & -15 & 16 & -24 & -62 & -55 \\
-65 & -70 & -57 & -6 & 26 & -22 & -58 & -59 \\
-61 & -67 & -60 & -24 & -2 & -40 & -60 & -58 \\
-49 & -63 & -68 & -58 & -51 & -60 & -70 & -53 \\
-43 & -57 & -64 & -69 & -73 & -67 & -63 & -45 \\
-41 & -49 & -59 & -60 & -63 & -52 & -50 & -34
\end{bmatrix}
$$
The matrix is transformed by DCT, and each element is rounded to the nearest integer.
$$
\begin{bmatrix}
-415 & -30 & -61 & 27 & 56 & -20 & -2 & 0 \\
4 & -22 & -61 & 10 & 13 & -7 & -9 & 5 \\
-47 & 7 & 77 & -25 & -29 & 10 & 5 & -6 \\
-49 & 12 & 34 & -15 & -10 & 6 & 2 & 2 \\
12 & -7 & -13 & -4 & -2 & 2 & -3 & 3 \\
-8 & 3 & 2 & -6 & -2 & 1 & 4 & 2 \\
-1 & 0 & 0 & -2 & -1 & -3 & 4 & -1 \\
0 & 0 & -1 & -4 & -1 & 0 & 1 & 2
\end{bmatrix}
$$
Notice that the largest element in the entire array appears in the upper left corner; this is the DC coefficient.
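The level shift and the transform can be reproduced with a short Python sketch. The text does not give the DCT formula, so the usual orthonormal two-dimensional type-II definition is assumed here; the function name and the `out[v][u]` layout (row index for vertical frequency, column index for horizontal frequency) are choices made for this example. The direct four-loop form is used for clarity, not speed.

```python
import math

# The 8x8 sample block from the example above.
BLOCK = [
    [52, 55, 61, 66, 70, 61, 64, 73],
    [63, 59, 55, 90, 109, 85, 69, 72],
    [62, 59, 68, 113, 144, 104, 66, 73],
    [63, 58, 71, 122, 154, 106, 70, 69],
    [67, 61, 68, 104, 126, 88, 68, 70],
    [79, 65, 60, 70, 77, 68, 58, 75],
    [85, 71, 64, 59, 55, 61, 65, 83],
    [87, 79, 69, 68, 65, 76, 78, 94],
]


def dct_8x8(block):
    """Orthonormal 2D DCT-II of an 8x8 block (assumed definition)."""
    def alpha(k):
        return 1 / math.sqrt(2) if k == 0 else 1.0

    out = [[0.0] * 8 for _ in range(8)]
    for v in range(8):          # vertical frequency (row of the result)
        for u in range(8):      # horizontal frequency (column of the result)
            total = 0.0
            for y in range(8):
                for x in range(8):
                    total += (block[y][x]
                              * math.cos((2 * x + 1) * u * math.pi / 16)
                              * math.cos((2 * y + 1) * v * math.pi / 16))
            out[v][u] = 0.25 * alpha(u) * alpha(v) * total
    return out


# Subtract 128 so that the samples are centered around zero, then transform.
shifted = [[sample - 128 for sample in row] for row in BLOCK]
coeffs = dct_8x8(shifted)
print(round(coeffs[0][0]), round(coeffs[0][1]), round(coeffs[1][0]))
```

The DC coefficient `coeffs[0][0]` is simply the sum of the shifted samples divided by 8, which is why it dominates the rest of the matrix.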
Digital quantization
The human eye is very good at detecting small changes in brightness over relatively large areas, but not at detecting rapid changes in brightness within small areas (high-frequency variation). Because of this, high frequencies can be removed without excessive loss of visual quality. This is done by dividing each component in the frequency domain by a constant for that component and rounding to the nearest integer. This is the step where most of the information (and quality) is lost when an image is processed by this algorithm. As a result, the high-frequency components tend to become zero, while many of the others become small positive and negative numbers.
A typical quantization matrix is the Lossheller matrix which is optionally used in the JPEG standard:
$$
\begin{bmatrix}
16 & 11 & 10 & 16 & 24 & 40 & 51 & 61 \\
12 & 12 & 14 & 19 & 26 & 58 & 60 & 55 \\
14 & 13 & 16 & 24 & 40 & 57 & 69 & 56 \\
14 & 17 & 22 & 29 & 51 & 87 & 80 & 62 \\
18 & 22 & 37 & 56 & 68 & 109 & 103 & 77 \\
24 & 35 & 55 & 64 & 81 & 104 & 113 & 92 \\
49 & 64 & 78 & 87 & 103 & 121 & 120 & 101 \\
72 & 92 & 95 & 98 & 112 & 100 & 103 & 99
\end{bmatrix}
$$
Dividing each coefficient of the matrix of the transformed image by each coefficient of the quantization matrix, this matrix is obtained, already quantized:
$$
\begin{bmatrix}
-26 & -3 & -6 & 2 & 2 & -1 & 0 & 0 \\
0 & -2 & -4 & 1 & 1 & 0 & 0 & 0 \\
-3 & 1 & 5 & -1 & -1 & 0 & 0 & 0 \\
-4 & 1 & 2 & -1 & 0 & 0 & 0 & 0 \\
1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0
\end{bmatrix}
$$
For example, quantizing the first element, the DC coefficient, would look like this:
$$
\mathrm{round}\left(\frac{-415}{16}\right) = \mathrm{round}(-25.9375) = -26
$$
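The element-by-element division can be sketched in Python as follows. The tie-breaking rule is an assumption: exact halves such as -20/40 and 13/26 are rounded away from zero because that reproduces the quantized matrix of this example, whereas Python's built-in `round` sends ties to the nearest even integer. The function names are hypothetical.

```python
import math

# Rounded DCT coefficients of the example block.
DCT = [
    [-415, -30, -61, 27, 56, -20, -2, 0],
    [4, -22, -61, 10, 13, -7, -9, 5],
    [-47, 7, 77, -25, -29, 10, 5, -6],
    [-49, 12, 34, -15, -10, 6, 2, 2],
    [12, -7, -13, -4, -2, 2, -3, 3],
    [-8, 3, 2, -6, -2, 1, 4, 2],
    [-1, 0, 0, -2, -1, -3, 4, -1],
    [0, 0, -1, -4, -1, 0, 1, 2],
]

# Quantization matrix of the example.
Q = [
    [16, 11, 10, 16, 24, 40, 51, 61],
    [12, 12, 14, 19, 26, 58, 60, 55],
    [14, 13, 16, 24, 40, 57, 69, 56],
    [14, 17, 22, 29, 51, 87, 80, 62],
    [18, 22, 37, 56, 68, 109, 103, 77],
    [24, 35, 55, 64, 81, 104, 113, 92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103, 99],
]


def round_half_away(value):
    """Round to the nearest integer; ties go away from zero (-3.5 -> -4)."""
    return int(math.copysign(math.floor(abs(value) + 0.5), value))


def quantize(coeffs, table):
    """Divide each coefficient by its divisor and round the result."""
    return [[round_half_away(c / q) for c, q in zip(coeff_row, table_row)]
            for coeff_row, table_row in zip(coeffs, table)]


quantized = quantize(DCT, Q)
for row in quantized:
    print(row)
```

Because the divisors grow toward the lower right corner, the high-frequency coefficients are the ones that collapse to zero.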
Entropy coding
Entropy coding is a special form of lossless data compression. The elements of the matrix are read in a zig-zag pattern, which groups similar frequencies together; runs of zeros are run-length encoded, and Huffman coding is applied to what remains. Arithmetic coding can also be used. It is superior to Huffman coding but rarely used because it is covered by patents. It produces files about 5% smaller, at the cost of longer encoding and decoding times. This small gain can instead be used to apply a lower degree of compression to the image and obtain more quality for a similar size.
In the previous matrix, the zig-zag sequence is this:
−26, −3, 0, −3, −2, −6, 2, −4, 1, −4, 1, 1, 5, 1, 2, −1, 1, −1, 2, 0, 0, 0, 0, 0, −1, −1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
JPEG has a special Huffman code word, EOB (end of block), that cuts off the sequence at the point where all remaining coefficients are zero, thus saving space:
−26, −3, 0, −3, −2, −6, 2, −4, 1, −4, 1, 1, 5, 1, 2, −1, 1, −1, 2, 0, 0, 0, 0, 0, −1, −1, EOB
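The zig-zag reading and the cut at the last non-zero coefficient can be sketched in Python as below. This is a simplification: it only produces the symbol sequence, and leaves out the run-length and Huffman stages that a real encoder applies afterwards. The function names and the `"EOB"` string marker are hypothetical.

```python
# Quantized matrix of the example.
QUANTIZED = [
    [-26, -3, -6, 2, 2, -1, 0, 0],
    [0, -2, -4, 1, 1, 0, 0, 0],
    [-3, 1, 5, -1, -1, 0, 0, 0],
    [-4, 1, 2, -1, 0, 0, 0, 0],
    [1, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
]


def zigzag(block):
    """Read a square block along its anti-diagonals, alternating direction."""
    n = len(block)
    sequence = []
    for s in range(2 * n - 1):  # s = row + column, one value per diagonal
        rows = range(max(0, s - n + 1), min(s, n - 1) + 1)
        if s % 2 == 0:
            rows = reversed(rows)  # even diagonals run from bottom-left upward
        sequence.extend(block[r][s - r] for r in rows)
    return sequence


def truncate_with_eob(sequence):
    """Drop the trailing zeros and mark the end of the block."""
    end = len(sequence)
    while end > 0 and sequence[end - 1] == 0:
        end -= 1
    return sequence[:end] + ["EOB"]


scan = zigzag(QUANTIZED)
encoded = truncate_with_eob(scan)
print(encoded)
```

Of the 64 values in the scan, only the first 26 need to be stored before the end-of-block marker.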
Noise produced by compression
The result after compression varies with the aggressiveness of the divisors in the quantization matrix: the higher the divisors, the more coefficients become zero and the more the image is compressed. But higher compression produces more noise in the image, worsening its quality. An image with strong compression (a quality setting of 1%-15%) may have a much smaller file size, but will have so many blemishes that it is of little use. Very low compression (98%-100%) produces a very high quality image, but the file will be so large that a lossless format such as PNG may be a better choice.
Most internet users will be familiar with these blemishes, which are the price of achieving good compression. To avoid them, you have to reduce the compression level or use lossless compression, both of which produce larger files.
Decoding
The decoding process is similar to the one followed so far, only in reverse. In this case, having lost information, the final values will not match the initial ones.
The information is read and decoded, and each value is placed in its corresponding position in the matrix. Each of these values is then multiplied by the corresponding value of the quantization matrix that was used. Since many values are zero, only the values in the upper left corner are recovered (and only approximately).
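The multiplication step can be sketched in Python as follows, using the quantized values and the quantization matrix of the example. The function name is hypothetical.

```python
# Quantized matrix and quantization matrix of the example.
QUANTIZED = [
    [-26, -3, -6, 2, 2, -1, 0, 0],
    [0, -2, -4, 1, 1, 0, 0, 0],
    [-3, 1, 5, -1, -1, 0, 0, 0],
    [-4, 1, 2, -1, 0, 0, 0, 0],
    [1, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
]
Q = [
    [16, 11, 10, 16, 24, 40, 51, 61],
    [12, 12, 14, 19, 26, 58, 60, 55],
    [14, 13, 16, 24, 40, 57, 69, 56],
    [14, 17, 22, 29, 51, 87, 80, 62],
    [18, 22, 37, 56, 68, 109, 103, 77],
    [24, 35, 55, 64, 81, 104, 113, 92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103, 99],
]


def dequantize(levels, table):
    """Multiply each quantized level by the divisor used during encoding."""
    return [[level * q for level, q in zip(level_row, table_row)]
            for level_row, table_row in zip(levels, table)]


restored = dequantize(QUANTIZED, Q)
for row in restored:
    print(row)
```

The restored DC coefficient is -416 rather than the original -415: the rounding performed during quantization cannot be undone, and every coefficient that was rounded to zero stays zero.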
Then the DCT is undone. The coefficients restored by multiplying by the quantization matrix are:
$$
\begin{bmatrix}
-416 & -33 & -60 & 32 & 48 & -40 & 0 & 0 \\
0 & -24 & -56 & 19 & 26 & 0 & 0 & 0 \\
-42 & 13 & 80 & -24 & -40 & 0 & 0 & 0 \\
-56 & 17 & 44 & -29 & 0 & 0 & 0 & 0 \\
18 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0
\end{bmatrix}
$$
Applying the inverse DCT to them and rounding gives:
$$
\begin{bmatrix}
-68 & -65 & -73 & -70 & -58 & -67 & -70 & -48 \\
-70 & -72 & -72 & -45 & -20 & -40 & -65 & -57 \\
-68 & -76 & -66 & -15 & 22 & -12 & -58 & -61 \\
-62 & -72 & -60 & -6 & 28 & -12 & -59 & -56 \\
-59 & -66 & -63 & -28 & -8 & -42 & -69 & -52 \\
-60 & -60 & -67 & -60 & -50 & -68 & -75 & -50 \\
-54 & -46 & -61 & -74 & -65 & -64 & -63 & -45 \\
-45 & -32 & -51 & -72 & -58 & -45 & -45 & -39
\end{bmatrix}
$$
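The inverse transform can be sketched in Python in the same direct style. As with the forward transform, the text does not state the formula, so the usual orthonormal inverse (type-III DCT) is assumed; the function name and the `coeffs[v][u]` layout are choices made for this example.

```python
import math

# Coefficients restored by multiplying by the quantization matrix.
DEQUANTIZED = [
    [-416, -33, -60, 32, 48, -40, 0, 0],
    [0, -24, -56, 19, 26, 0, 0, 0],
    [-42, 13, 80, -24, -40, 0, 0, 0],
    [-56, 17, 44, -29, 0, 0, 0, 0],
    [18, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
]


def idct_8x8(coeffs):
    """Inverse of the orthonormal 8x8 DCT-II (assumed definition)."""
    def alpha(k):
        return 1 / math.sqrt(2) if k == 0 else 1.0

    out = [[0.0] * 8 for _ in range(8)]
    for y in range(8):
        for x in range(8):
            total = 0.0
            for v in range(8):      # vertical frequency
                for u in range(8):  # horizontal frequency
                    total += (alpha(u) * alpha(v) * coeffs[v][u]
                              * math.cos((2 * x + 1) * u * math.pi / 16)
                              * math.cos((2 * y + 1) * v * math.pi / 16))
            out[y][x] = 0.25 * total
    return out


# Round each reconstructed sample to the nearest integer.
pixels = [[round(value) for value in row] for row in idct_8x8(DEQUANTIZED)]
print(pixels[0])

# Sanity check: a block with only a DC coefficient decodes to a flat block.
dc_only = [[8 if (u == 0 and v == 0) else 0 for u in range(8)] for v in range(8)]
flat = idct_8x8(dc_only)
```

The samples produced here are still centered around zero, so the level shift has to be reversed before they can be displayed.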
Finally, 128 is added to each entry:
$$
\begin{bmatrix}
60 & 63 & 55 & 58 & 70 & 61 & 58 & 80 \\
58 & 56 & 56 & 83 & 108 & 88 & 63 & 71 \\
60 & 52 & 62 & 113 & 150 & 116 & 70 & 67 \\
66 & 56 & 68 & 122 & 156 & 116 & 69 & 72 \\
69 & 62 & 65 & 100 & 120 & 86 & 59 & 76 \\
68 & 68 & 61 & 68 & 78 & 60 & 53 & 78 \\
74 & 82 & 67 & 54 & 63 & 64 & 65 & 83 \\
83 & 96 & 77 & 56 & 70 & 83 & 83 & 89
\end{bmatrix}
$$
To compare the original block with the decompressed one, the difference between the two matrices is computed; the average of the absolute values gives a rough idea of the quality lost:
$$
\begin{bmatrix}
-8 & -8 & 6 & 8 & 0 & 0 & 6 & -7 \\
5 & 3 & -1 & 7 & 1 & -3 & 6 & 1 \\
2 & 7 & 6 & 0 & -6 & -12 & -4 & 6 \\
-3 & 2 & 3 & 0 & -2 & -10 & 1 & -3 \\
-2 & -1 & 3 & 4 & 6 & 2 & 9 & -6 \\
11 & -3 & -1 & 2 & -1 & 8 & 5 & -3 \\
11 & -11 & -3 & 5 & -8 & -3 & 0 & 0 \\
4 & -17 & -8 & 12 & -5 & -7 & -5 & 5
\end{bmatrix}
$$
The largest differences are near the bright spot and along the bottom, between the left corner and the center. The latter is more noticeable, since a light patch that was previously closer to the corner has spread. The mean of the absolute values of the differences is 4.8125, although in some areas it is higher.
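The comparison can be reproduced with a few lines of Python, using the original block and the decompressed block of the example. The variable names are hypothetical.

```python
# Original block and the block recovered after decompression.
ORIGINAL = [
    [52, 55, 61, 66, 70, 61, 64, 73],
    [63, 59, 55, 90, 109, 85, 69, 72],
    [62, 59, 68, 113, 144, 104, 66, 73],
    [63, 58, 71, 122, 154, 106, 70, 69],
    [67, 61, 68, 104, 126, 88, 68, 70],
    [79, 65, 60, 70, 77, 68, 58, 75],
    [85, 71, 64, 59, 55, 61, 65, 83],
    [87, 79, 69, 68, 65, 76, 78, 94],
]
DECODED = [
    [60, 63, 55, 58, 70, 61, 58, 80],
    [58, 56, 56, 83, 108, 88, 63, 71],
    [60, 52, 62, 113, 150, 116, 70, 67],
    [66, 56, 68, 122, 156, 116, 69, 72],
    [69, 62, 65, 100, 120, 86, 59, 76],
    [68, 68, 61, 68, 78, 60, 53, 78],
    [74, 82, 67, 54, 63, 64, 65, 83],
    [83, 96, 77, 56, 70, 83, 83, 89],
]

# Element-by-element difference between the two blocks.
diff = [[o - d for o, d in zip(orig_row, dec_row)]
        for orig_row, dec_row in zip(ORIGINAL, DECODED)]

# Mean and maximum of the absolute differences over the 64 samples.
mean_abs = sum(abs(value) for row in diff for value in row) / 64
max_abs = max(abs(value) for row in diff for value in row)
print(mean_abs, max_abs)  # 4.8125 17
```

The largest single error, 17, is in the bottom row, consistent with the observation that the lower left part of the block is the most affected.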