Let's begin with a simple example and build up from there.

## Example-1: Symmetric uint8 quantization

Let's say we want to map the floating point range [0.0 .. 1000.0] to the quantized range [0 .. 255]. The range [0 .. 255] is the set of values that fit in an unsigned 8-bit integer.

To perform this transformation, we want to rescale the floating point range so that the following holds:

Floating point 0.0 = Quantized 0

Floating point 1000.0 = Quantized 255

This is called symmetric quantization because the floating point value 0.0 maps to the quantized value 0.

Hence, we define a scale:

scale = (xmax - xmin) / (qmax - qmin)

where [xmin .. xmax] is the floating point range and [qmin .. qmax] is the quantized range.

In this case, scale = (1000.0 - 0.0) / (255 - 0) ≈ 3.9216

To convert from a floating point value to a quantized value, we can simply divide the floating point value by the scale. For example, the floating point value 500.0 corresponds to the quantized value

round(500.0 / 3.9216) = round(127.5) = 128

In this simple example, the 0.0 of the floating point range maps exactly to 0 in the quantized range. This is called symmetric quantization. Let's see what happens when this isn't the case.
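The mapping above can be sketched in a few lines of Python. This is a minimal illustration of Example-1, not a library API; the function name and signature are ours:

```python
# Symmetric uint8 quantization for the example range [0.0 .. 1000.0].

def quantize_symmetric(x: float, x_max: float = 1000.0, q_max: int = 255) -> int:
    """Map a float in [0.0 .. x_max] to an integer in [0 .. q_max]."""
    # x * q_max / x_max is the same as x / scale (scale = x_max / q_max),
    # but keeps the intermediate arithmetic exact for these example values.
    q = round(x * q_max / x_max)
    return max(0, min(q_max, q))  # clamp to the valid quantized range

print(quantize_symmetric(0.0))     # 0
print(quantize_symmetric(500.0))   # 128 (500.0 / 3.9216 = 127.5, ties round to even)
print(quantize_symmetric(1000.0))  # 255
```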

## Example-2: Affine uint8 quantization

Let's say we want to map the floating point range [-20.0 .. 1000.0] to the quantized range [0 .. 255].

In this case, we have a different scaling factor since our *xmin* is different:

scale = (1000.0 - (-20.0)) / (255 - 0) = 1020 / 255 = 4.0

Let's see what the floating point number 0.0 is represented by in the quantized range if we apply the scaling factor to it: 0.0 / 4.0 = 0.

Well, this doesn't quite seem right: according to the diagram above, we would have expected the floating point value -20.0 to map to the quantized value 0.

This is where the concept of a zero-point comes in. **The zero-point acts as a bias for shifting the scaled floating point value and corresponds to the value in the quantized range that represents the floating point value 0.0.** In our case, the zero-point is the negative of the scaled floating point representation of -20.0, which is -(-20.0 / 4.0) = -(-5) = 5. The zero-point is always the negative of the representation of the minimum floating point value, since the minimum will always be negative or zero. We'll see more about why this is the case in the section that explains Example-4.

Whenever we quantize a value, we always add the zero-point to the scaled value to get the actual quantized value in the valid quantization range. If we wish to quantize the value -20.0, we compute it as the scaled value of -20.0 plus the zero-point, which is -5 + 5 = 0. Hence, quantized(-20.0, scale=4, zp=5) = 0.
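Putting the scale and zero-point together, Example-2's affine scheme can be sketched as follows; again, the names and defaults are ours for illustration:

```python
# Affine uint8 quantization for the example range [-20.0 .. 1000.0].

def quantize_affine(x: float, x_min: float = -20.0, x_max: float = 1000.0,
                    q_min: int = 0, q_max: int = 255) -> int:
    scale = (x_max - x_min) / (q_max - q_min)  # (1000 - (-20)) / 255 = 4.0
    zero_point = q_min - round(x_min / scale)  # -(-5) = 5
    q = round(x / scale) + zero_point          # scale, then shift by the zero-point
    return max(q_min, min(q_max, q))           # clamp to the valid quantized range

print(quantize_affine(-20.0))   # 0
print(quantize_affine(0.0))     # 5 (the zero-point)
print(quantize_affine(1000.0))  # 255
```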

## Example-3: Affine int8 quantization

What happens if our quantized range is a signed 8-bit integer instead of an unsigned one? Well, the range is now [-128 .. 127].

In this case, -20.0 in the float range maps to -128 in the quantized range, and 1000.0 in the float range maps to 127 in the quantized range.

The way we calculate the zero-point is that we compute it as if the quantized range were [0 .. 255] and then offset it by -128, so the zero-point in the new range is 5 - 128 = -123.

Hence, the zero-point for the new range is -123.
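The int8 variant only changes the quantized bounds and the zero-point offset. A hypothetical sketch, reusing Example-2's parameters:

```python
# Affine int8 quantization for [-20.0 .. 1000.0]: compute the uint8
# zero-point as before, then offset everything by -128 into [-128 .. 127].

def quantize_affine_int8(x: float, x_min: float = -20.0, x_max: float = 1000.0) -> int:
    scale = (x_max - x_min) / 255             # still 4.0
    zero_point = -round(x_min / scale) - 128  # 5 - 128 = -123
    q = round(x / scale) + zero_point
    return max(-128, min(127, q))             # clamp to the signed 8-bit range

print(quantize_affine_int8(-20.0))   # -128
print(quantize_affine_int8(0.0))     # -123 (the zero-point)
print(quantize_affine_int8(1000.0))  # 127
```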

So far, we've looked at examples where the floating point range includes the value 0.0. In the next set of examples, we'll take a look at what happens when the floating point range doesn't include the value 0.0.

## The importance of 0.0

Why is it important for the floating point value 0.0 to be representable in the floating point range?

When using a padded convolution, we expect the border pixels to be padded with the value 0.0 in the most common case. Hence, it's important for 0.0 to be representable in the floating point range. Similarly, if a value X is going to be used for padding in your network, you need to make sure that X is representable in the floating point range and that quantization is aware of this.

## Example-4: The untold story of the skewed floating point range

Now, let's take a look at what happens if 0.0 isn't part of the floating point range.

In this example, we're trying to quantize the floating point range [40.0 .. 1000.0] into the quantized range [0 .. 255].

Since we can't represent the value 0.0 in the floating point range, we need to extend the lower limit of the range down to 0.0.

We can see that some part of the quantized range is wasted. To determine how much, let's compute the quantized value that the floating point value 40.0 maps to (with the extended range [0.0 .. 1000.0], the scale is again 3.9216):

round(40.0 / 3.9216) = round(10.2) = 10

Hence, we're wasting the range [0 .. 9] in the quantized range, which is about 3.92% of the range. This could significantly affect the model's accuracy post-quantization.
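The wasted portion can be checked with a few lines, assuming we extend [40.0 .. 1000.0] down to [0.0 .. 1000.0] as described above (variable names are ours):

```python
# How much of the quantized range goes unused when [40.0 .. 1000.0]
# is extended down to [0.0 .. 1000.0] so that 0.0 stays representable.

x_max, q_max = 1000.0, 255
q_40 = round(40.0 * q_max / x_max)  # quantized value 40.0 maps to: round(10.2) = 10
wasted_pct = q_40 / q_max * 100     # the sub-range [0 .. 9] is never used

print(q_40)                  # 10
print(round(wasted_pct, 2))  # 3.92
```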

This skewing is necessary if we want to make sure that the value 0.0 in the floating point range can be represented in the quantized range.

Another reason for including the value 0.0 in the floating point range is that being able to efficiently compare a quantized value against floating point 0.0 is very valuable. Think of operators such as ReLU, which clip all values below 0.0 in the floating point range to 0.0.

It's important for us to be able to **represent the zero-point using the same data type** (signed or unsigned int8) **as the quantized values**. This enables us to perform these comparisons quickly and efficiently.
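To make this concrete, here is a sketch of ReLU in the quantized domain under Example-2's parameters (scale=4, zero_point=5); since floating point 0.0 is represented exactly by the zero-point, the whole operator reduces to a single integer comparison, with no dequantization needed:

```python
# ReLU applied directly to quantized values: clipping at floating point 0.0
# is just clipping at the zero-point.

def relu_quantized(q: int, zero_point: int = 5) -> int:
    return max(q, zero_point)  # one integer max(), no float math

print(relu_quantized(0))    # 5   (float -20.0 clips up to float 0.0)
print(relu_quantized(130))  # 130 (float 500.0 passes through unchanged)
```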

Next, let's take a look at how activation normalization helps with model quantization. We'll specifically focus on how standardizing the activation values allows us to use the entire quantized range effectively.