In part 1 of this series (a walkthrough of progressive JPEGs), we saw that the progressive loading, or decoding, of a JPEG image can be customized using a scan script. The scan script tells the encoder which data from each image component (by component we mean a color space component, e.g. R, G, or B) goes into each scan. In this post, we dig deeper into the scan scripts used by libjpeg/MozJPEG to customize progressive scans, in an attempt to decipher their syntax.

Scan script syntax:

According to the MozJPEG docs, a scan script is made of one or more scan definitions separated by semicolons (;). Each definition has up to 2 parts separated by a colon (:):

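  1. The (color space) component indexes. Indexes start at 0, so a typical three-component image uses 0, 1, and 2, and a four-component image (e.g. CMYK) also has 3. A scan can list one to four components, separated by commas or spaces. For example, in an RGB (Red Green Blue) color space, 0 would refer to the Red component of a pixel, 1 to the Green component, and 2 to the Blue component. Keep in mind that cjpeg converts RGB input to YCbCr by default, in which case 0 is the luminance (Y) component and 1 and 2 are the chroma (Cb, Cr) components.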

  2. The optional second part holds four parameters that specify which coefficient data of those components is sent in this scan: Ss, Se, Ah, Al.

[Figure: scan script syntax]
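To make the syntax concrete, here is a minimal Python sketch of a parser for it. It is an approximation written for this post, not MozJPEG code: the function name `parse_scan_script` is hypothetical, and it assumes libjpeg's documented conventions (';' ends a scan, ':' separates components from Ss Se Ah Al, '#' starts a comment, and spaces, commas, and dashes are interchangeable separators). At this stage the four parameters are just integers; their meaning is explained next.

```python
import re

# Used when a scan definition has no ": Ss Se Ah Al" part:
# the whole block (coefficients 0..63), no successive approximation.
DEFAULT_PARAMS = (0, 63, 0, 0)


def parse_scan_script(text):
    """Parse a scan script into (component_indexes, (Ss, Se, Ah, Al)) tuples.

    Simplified sketch: ';' ends a scan definition, ':' separates the
    component indexes from the four parameters, '#' starts a comment, and
    any other non-digit characters (spaces, commas, dashes, newlines) are
    treated as plain separators.
    """
    text = re.sub(r"#[^\n]*", "", text)  # drop comments up to end of line
    scans = []
    for entry in text.split(";"):
        if not entry.strip():
            continue  # empty entry, e.g. after the optional final ';'
        comp_part, colon, param_part = entry.partition(":")
        components = [int(n) for n in re.findall(r"\d+", comp_part)]
        if not 1 <= len(components) <= 4:
            raise ValueError(f"expected 1 to 4 component indexes in {entry.strip()!r}")
        if colon:
            params = tuple(int(n) for n in re.findall(r"\d+", param_part))
            if len(params) != 4:
                raise ValueError(f"expected Ss Se Ah Al in {entry.strip()!r}")
        else:
            params = DEFAULT_PARAMS
        scans.append((components, params))
    return scans


print(parse_scan_script("0;\n1 2;"))
# [([0], (0, 63, 0, 0)), ([1, 2], (0, 63, 0, 0))]
print(parse_scan_script("0,1,2: 0-0, 0, 0;"))
# [([0, 1, 2], (0, 0, 0, 0))]
```

Treating every non-digit as a separator keeps the sketch short; cjpeg's real parser is stricter about the overall entry format.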

Ss and Se are, respectively, the indexes of the first and last DCT (Discrete Cosine Transform) coefficients, within a JPEG block, that are included in the scan. A JPEG image is encoded as blocks of 8x8 pixels [1], with a separate set of blocks for each color component. Each block is therefore described by 64 values, called coefficients, which carry the color information. The coefficients are the output of a multi-stage process: the DCT is applied to the block, the result is quantized, and the quantized values are then entropy-encoded. The coefficients are ordered in a zig-zag fashion rather than row by row or column by column.

[Figure: DCT coefficients of an 8x8 block]

Source: Figure 5 (The International Telegraph and Telephone Consultative Committee, 1993) [1]

The illustration below shows the zig-zag order in which the coefficients are written:

[Figure: zig-zag sequence of quantized DCT coefficients]

Source: Figure A.6 (The International Telegraph and Telephone Consultative Committee, 1993) [1]

Now that we have the big picture, we can see why Ss and Se are described as zig-zag indexes: they are the positions, in zig-zag order, of the first and last coefficients included in the scan. It’s also worth noting the distinction between two important kinds of coefficients within the 8x8 block: the DC and the AC coefficients. The DC coefficient is the first one (index 0), and the remaining 63 are called AC coefficients. More details in the next section.
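The zig-zag order can also be generated programmatically. The Python sketch below (helper names are our own, for illustration) walks the anti-diagonals of an 8x8 block, alternating direction on each one, and then maps an Ss..Se range to the (row, col) positions it selects. Row 0 is the top row, so index 0 (the DC coefficient) is the top-left cell.

```python
def zigzag_order(n=8):
    """Return the (row, col) positions of an n x n block in zig-zag order."""
    order = []
    for d in range(2 * n - 1):  # d = row + col identifies one anti-diagonal
        cells = [(r, d - r) for r in range(n) if 0 <= d - r < n]  # row increasing
        # Even diagonals run from bottom-left to top-right (row decreasing),
        # odd diagonals from top-right to bottom-left (row increasing).
        order.extend(reversed(cells) if d % 2 == 0 else cells)
    return order


def zigzag_grid(n=8):
    """Return an n x n grid where each cell holds its zig-zag index."""
    grid = [[0] * n for _ in range(n)]
    for k, (r, c) in enumerate(zigzag_order(n)):
        grid[r][c] = k
    return grid


def positions_in_scan(ss, se):
    """Return the (row, col) positions of the coefficients selected by Ss..Se."""
    return zigzag_order()[ss:se + 1]


for row in zigzag_grid():
    print(" ".join(f"{k:2d}" for k in row))
print(positions_in_scan(1, 2))  # [(0, 1), (1, 0)]
```

The first printed row is `0 1 5 6 14 15 27 28`, matching the zig-zag figure, and a scan with Ss=1, Se=2 covers the two lowest-frequency AC coefficients right next to the DC.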

Ah and Al are related to successive approximation, a method used to code the DCT coefficients [1]. With this method, the coefficients are coded with reduced precision in the first scan, and precision is increased by one bit with each succeeding scan. This is an advanced topic that we will cover in a future post. For now, let’s just assume that Ah and Al are 0.
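Although the details are left for a future post, a tiny worked example may help build intuition. The Python sketch below is a simplification (the helper `refinement_scans` is hypothetical, it only handles non-negative values, and it ignores how the bits are actually entropy-coded): the first scan drops the lowest Al bits of a coefficient, and each following scan restores one more bit. The pattern "Ah = previous Al, Al = Ah - 1" mirrors the check libjpeg applies to refinement scans.

```python
def refinement_scans(value, first_al):
    """Show how successive approximation refines one non-negative coefficient.

    Simplified sketch: signs and the actual bit-level coding are ignored. The
    first scan uses Ah=0 and Al=first_al; each following scan uses Ah equal to
    the previous Al and Al = Ah - 1, until Al reaches 0.
    Returns (Ah, Al, value known to the decoder after that scan) triples.
    """
    steps = []
    ah = 0
    for al in range(first_al, -1, -1):
        known = (value >> al) << al  # the low `al` bits are still missing
        steps.append((ah, al, known))
        ah = al
    return steps


# 45 is 0b101101: 44 after the first scan, still 44 after the second
# (that bit is 0), and the exact value 45 after the third.
for ah, al, known in refinement_scans(45, first_al=2):
    print(f"Ah={ah} Al={al}: decoder knows {known}")
```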

So what in the world are the DC and AC coefficients?

According to P. Venkat Rangan [2]:

The DC coefficient represents the average color of the 8x8 region. The 63 AC coefficients represent color change across the block. Low-numbered coefficients represent low-frequency color change, or gradual color change across the region. High-numbered coefficients represent high-frequency color change, or color which changes rapidly from one pixel to another within the block.
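To see why the DC coefficient is tied to the block average, here is a direct (slow) Python translation of the forward DCT formula given in T.81 [1]. With C(0) = 1/√2, the DC term reduces to 1/4 · 1/2 · (sum of the 64 samples), i.e. exactly 8 times the block mean. For simplicity the sketch skips the level shift (subtracting 128) that JPEG applies before the DCT; the helper names are our own.

```python
import math


def c(k):
    """Normalization factor from the DCT formula: 1/sqrt(2) for k = 0, else 1."""
    return 1 / math.sqrt(2) if k == 0 else 1.0


def fdct_8x8(block):
    """Forward DCT of an 8x8 block indexed block[y][x]; returns F[v][u].

    Direct (slow) translation of
    F(u, v) = 1/4 C(u) C(v) sum_x sum_y f(x, y) cos((2x+1)u*pi/16) cos((2y+1)v*pi/16).
    The level shift by -128 that JPEG applies first is skipped here.
    """
    coeffs = [[0.0] * 8 for _ in range(8)]
    for v in range(8):
        for u in range(8):
            total = 0.0
            for y in range(8):
                for x in range(8):
                    total += (block[y][x]
                              * math.cos((2 * x + 1) * u * math.pi / 16)
                              * math.cos((2 * y + 1) * v * math.pi / 16))
            coeffs[v][u] = 0.25 * c(u) * c(v) * total
    return coeffs


# A block that brightens smoothly from left to right.
gradient = [[100 + 4 * x for x in range(8)] for _ in range(8)]
coeffs = fdct_8x8(gradient)
mean = sum(map(sum, gradient)) / 64
print(f"DC = {coeffs[0][0]:.2f}, 8 x mean = {8 * mean:.2f}")
# The color only changes horizontally, so every row v > 0 is (numerically) zero.
print(max(abs(coeffs[v][u]) for v in range(1, 8) for u in range(8)) < 1e-9)
```

A flat block would leave all 63 AC coefficients at zero; any change across the block shows up in the AC terms, low-frequency ones for gradual changes.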

Building on Rangan’s definition, we can conclude that the DC coefficient is the most important one, because it carries the average color of the whole 8x8 block. As a consequence, it is always the first coefficient loaded in progressive scans. Here is an example of an image in which only the DC coefficient has been loaded for all components:

[Figure: image decoded from the DC coefficients only]

As you can see, the DC coefficients alone preserve most of the image’s visual identity.
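A rough way to reproduce such a DC-only preview is to replace every 8x8 block with its average: with all 63 AC coefficients at zero, the inverse DCT of a block is a constant equal to DC / 8, which is the block mean. The Python sketch below (hypothetical helper, grayscale only, dimensions assumed to be multiples of 8, quantization ignored) does exactly that. Real decoders may look smoother, since libjpeg can apply block smoothing to early progressive scans.

```python
def dc_only_preview(image):
    """Approximate what a decoder shows when only DC coefficients are known.

    image: grayscale samples as a list of rows. Width and height are assumed
    to be multiples of 8 to keep the sketch short. Each 8x8 block is replaced
    by its average, which is what the inverse DCT yields when all 63 AC
    coefficients are zero (ignoring quantization).
    """
    height, width = len(image), len(image[0])
    out = [row[:] for row in image]
    for by in range(0, height, 8):
        for bx in range(0, width, 8):
            block = [image[y][x] for y in range(by, by + 8) for x in range(bx, bx + 8)]
            mean = sum(block) / 64
            for y in range(by, by + 8):
                for x in range(bx, bx + 8):
                    out[y][x] = mean
    return out


# Hypothetical 8x16 image: two 8x8 blocks side by side.
image = [[x + 16 * y for x in range(16)] for y in range(8)]
preview = dc_only_preview(image)
print(preview[0][0], preview[0][8])  # 59.5 67.5
```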

Now that we know what the DC and AC coefficients are, we are ready to understand some ground rules for the Ss and Se parameters:

  1. Each (color) component must load its DC coefficient first. In other words, Ss and Se must be 0-0 (or 0 0) for a component before any of its AC scans. The DC scan doesn’t have to be the first line of the script, but a component whose DC has not been loaded cannot proceed to AC scans.

  2. A scan cannot mix DC and AC coefficients. For example, Ss and Se cannot be 0-5 or 0-42, because that would mix the DC coefficient (index 0) with AC coefficients. The one exception is 0-63, which loads the whole block of a component at once and is the default value assumed when the part after the : is omitted. However, libjpeg decides the mode from the first scan: if it is 0-63, the file is sequential and every scan must be 0-63; otherwise the file is progressive and a later 0-63 is rejected like any other DC/AC mix.

  3. AC coefficients can be loaded for only one color component at a time. If Ss and Se are between 1 and 63, the left side of the colon cannot contain more than one component index; for example, 1 2: 20-45, 0, 0 would lead to a validation error.

If you compress an image using the -scans flag with a script that violates one of these rules, you will get a validation error similar to:

Invalid progressive parameters at scan script entry 3

More details about script validation can be found in the validate_script function (in jcmaster.c) of the MozJPEG source code.
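The rules above can be summarized in a short Python sketch. It is a simplified model of libjpeg’s validate_script(), written for this post: the function name and the (components, (Ss, Se, Ah, Al)) input format are our own, and the Ah/Al checks for successive approximation are deliberately omitted. Like libjpeg, it treats a script whose first scan is 0-63 as sequential and reports 1-based entry numbers.

```python
def validate_scan_script(scans, num_components=3):
    """Check parsed scans against the Ss/Se rules; return the resulting mode.

    scans: list of (component_indexes, (Ss, Se, Ah, Al)) tuples.
    Simplified sketch modeled on libjpeg's validate_script(); Ah/Al checks
    are omitted. Raises ValueError naming the 1-based scan entry at fault.
    """
    if not scans:
        raise ValueError("empty scan script")
    # As in libjpeg, the first scan decides the mode: 0-63 means sequential.
    progressive = tuple(scans[0][1][:2]) != (0, 63)
    seen = [False] * num_components  # DC sent (progressive) / component sent (sequential)
    for entry, (components, (ss, se, ah, al)) in enumerate(scans, start=1):
        bad_components = (not 1 <= len(components) <= 4
                          or components != sorted(set(components))  # increasing order
                          or not all(0 <= c < num_components for c in components))
        if bad_components:
            raise ValueError(f"entry {entry}: bad component indexes")
        if not progressive:
            if (ss, se, ah, al) != (0, 63, 0, 0) or any(seen[c] for c in components):
                raise ValueError(f"entry {entry}: sequential scans must send each component once, 0-63")
        elif not 0 <= ss <= se <= 63:
            raise ValueError(f"entry {entry}: Ss/Se out of range")
        elif ss == 0 and se != 0:
            raise ValueError(f"entry {entry}: DC and AC coefficients mixed")
        elif ss > 0 and len(components) != 1:
            raise ValueError(f"entry {entry}: AC scans allow a single component")
        elif ss > 0 and not seen[components[0]]:
            raise ValueError(f"entry {entry}: AC scan before the component's DC scan")
        if ss == 0:
            for c in components:
                seen[c] = True
    if not all(seen):
        raise ValueError("some component never receives its DC coefficient")
    return "progressive" if progressive else "sequential"


example = [([0, 1, 2], (0, 0, 0, 0)), ([1, 2], (20, 45, 0, 0))]
try:
    validate_scan_script(example)
except ValueError as err:
    print(err)  # entry 2: AC scans allow a single component
```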

Analyzing a script:

Let’s analyze this very simple scan script:

0;
1 2;

which is equivalent to:

0: 0 63 0 0;
1 2: 0 63 0 0;

The script above loads all the DCT coefficients of a component at once. Assuming an RGB color space, all the coefficients of the R component are loaded in the first scan, and those of the G and B components in the second scan.

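The result is a decoding similar to baseline JPEG, except that component 0 loads first, producing a full image that looks grayscale. This is expected with cjpeg’s default YCbCr conversion, where component 0 is the luminance (Y) channel and the color only arrives with the chroma components in the second scan. Here is the resulting image after each scan: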

First scan:

[Figure: decoded image after the first scan]

Second scan:

[Figure: decoded image after the second scan]

Now let’s take a look at a more elaborate one (presented in the MozJPEG docs):

0,1,2: 0-0,   0, 0 ;
0:     1-2,   0, 0 ;
0:     3-5,   0, 0 ;
1:     1-63,  0, 0 ;
2:     1-63,  0, 0 ;
0:     6-9,   0, 0 ;
0:     10-63, 0, 0 ;

The script specifies 7 scans that alternate between the color components. The first scan loads the DC coefficients of all three components (as we have seen earlier, this is an essential step). Scans 2 and 3 send the first AC coefficients of component 0 (1-2, then 3-5). Scans 4 and 5 load all the AC coefficients of components 1 and 2, respectively. The last two scans complete component 0 with coefficients 6-9 and 10-63.
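One way to see this progression is to count, after each scan, how many of the 64 coefficients of each component the decoder has received. The Python sketch below hard-codes the script above as (components, Ss, Se) triples; since Ah and Al are all 0 here, successive approximation is ignored. The helper name is our own.

```python
# The MozJPEG example script from above, as (components, Ss, Se) triples;
# Ah and Al are all 0, so they are omitted here.
SCANS = [
    ([0, 1, 2], 0, 0),
    ([0], 1, 2),
    ([0], 3, 5),
    ([1], 1, 63),
    ([2], 1, 63),
    ([0], 6, 9),
    ([0], 10, 63),
]


def coverage_after_each_scan(scans, num_components=3):
    """Yield, after each scan, how many of the 64 coefficients each component has."""
    received = [set() for _ in range(num_components)]
    for components, ss, se in scans:
        for c in components:
            received[c].update(range(ss, se + 1))  # zig-zag indexes Ss..Se
        yield [len(r) for r in received]


for number, counts in enumerate(coverage_after_each_scan(SCANS), start=1):
    print(f"after scan {number}: "
          + ", ".join(f"component {c}: {n}/64" for c, n in enumerate(counts)))
```

Component 0 reaches 6 of its 64 coefficients after scan 3 but only completes at scan 7, whereas components 1 and 2 jump from 1 to 64 coefficients in a single scan each.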

How to test:

A script file can be passed to cjpeg from MozJPEG or libjpeg(-turbo) with the -scans flag, for example:

cjpeg -outfile output.jpg -scans scan-script input.jpeg

There is no need to add the -progressive flag when using -scans: cjpeg processes -scans after -progressive, so the script overrides it, and whether the output is progressive depends only on the script. We have seen before that a script like 0 1 2; leads to a baseline-like decoding. In other words, once you use the -scans flag, you are on your own.
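When experimenting with many scripts, it can help to drive cjpeg from a small program. The Python sketch below assumes a cjpeg binary (from MozJPEG or libjpeg-turbo) on your PATH that can read your input file; the helper names are hypothetical, and only the -outfile and -scans flags from the command above are used.

```python
import subprocess
import tempfile
from pathlib import Path


def build_cjpeg_command(script_path, input_path, output_path, cjpeg="cjpeg"):
    """Build the cjpeg command line shown above (no -progressive flag needed)."""
    return [cjpeg, "-outfile", str(output_path), "-scans", str(script_path), str(input_path)]


def encode_with_script(script_text, input_path, output_path, cjpeg="cjpeg"):
    """Write script_text to a temporary file and run cjpeg with it.

    Assumes cjpeg is on PATH; raises subprocess.CalledProcessError if cjpeg
    exits with an error, e.g. when it rejects the scan script.
    """
    with tempfile.TemporaryDirectory() as tmp:
        script_path = Path(tmp) / "scan-script"
        script_path.write_text(script_text)
        subprocess.run(build_cjpeg_command(script_path, input_path, output_path, cjpeg),
                       check=True)


# Example (not run here): DC of all components, then AC per component.
# encode_with_script("0,1,2: 0-0, 0, 0;\n0: 1-63, 0, 0;\n1: 1-63, 0, 0;\n2: 1-63, 0, 0;\n",
#                    "input.jpeg", "output.jpg")
```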

As of the date this post was published, there is no tool we know of that helps with the visual analysis of scripts, for example by generating an image after each scan step (e.g. a timeline). Would an online tool for this kind of visual analysis be helpful? We’re happy to hear your thoughts in the comments section. If there is enough interest, we may create a free online tool for this purpose.

Wrap up:

We have explored the scan scripts used by libjpeg to customize progressive decoding by specifying the data that should be loaded in each scan. In this post, we focused on how the DC and AC coefficients produced by the DCT are the building blocks of an image’s overall visual identity, and we have seen how the Ss and Se parameters select the coefficients to load in a scan. In a future post, we will talk about successive approximation and the Ah and Al parameters.

References

  1. The International Telegraph and Telephone Consultative Committee. (1993). Information technology – digital compression and coding of continuous-tone still images – requirements and guidelines (ITU-T Recommendation T.81). https://www.w3.org/Graphics/JPEG/itu-t81.pdf

  2. Rangan, V. P. (2003). JPEG (DCT). University of California San Diego. Retrieved 2022, from https://cseweb.ucsd.edu/classes/sp03/cse228/Lecture_5.html