Fixed-point math represents fractional values using integers. A constant scaling factor is chosen and implicitly applied to every value: the scaling factor defines a step size, and the integer defines a number of steps, usually counted from zero. Fixed-point math is commonly used in audio and video signal processing, where each sample is an integer, either signed or unsigned, with a fixed number of bits. In many cases the scaling factor is chosen so that the representable range is approximately -1.0 to 1.0 (signed) or 0 to 1.0 (unsigned), with the upper bound falling one step short of 1.0, but other scaling factors may be used depending on the application. The following table gives some examples:
| Type | Scaling Factor | Range |
|---|---|---|
| 8-bit unsigned | 1/256 | 0.0 – 0.99609375 |
| 16-bit unsigned | 1/256 | 0.0 – 255.99609375 |
| 16-bit unsigned | 1/65536 | 0.0 – 0.9999847412109375 |
| 16-bit signed | 1/32768 | -1.0 – 0.999969482421875 |
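The rows of the table above can be reproduced from the bit width, the signedness, and the number of fractional bits. The following is a minimal sketch, assuming a power-of-2 scaling factor; the type `fx_range` and the function `fx_range_of` are hypothetical names introduced here for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper type: scaling factor and representable range. */
typedef struct {
    double scale; /* step size, 1 / 2^frac_bits */
    double min;   /* smallest representable value */
    double max;   /* largest representable value */
} fx_range;

/* Compute the range of a fixed-point format that is total_bits wide
   and uses a scaling factor of 1 / 2^frac_bits (an assumption: only
   power-of-2 scaling factors are handled). */
static fx_range fx_range_of(unsigned total_bits, unsigned frac_bits, int is_signed)
{
    fx_range r;
    int64_t lo, hi;

    assert(total_bits >= 1 && total_bits <= 32);
    assert(frac_bits <= 32);

    if (is_signed) {
        hi = ((int64_t)1 << (total_bits - 1)) - 1; /* e.g. 32767 */
        lo = -hi - 1;                              /* e.g. -32768 */
    } else {
        hi = ((int64_t)1 << total_bits) - 1;       /* e.g. 65535 */
        lo = 0;
    }

    r.scale = 1.0 / (double)((int64_t)1 << frac_bits);
    r.min = (double)lo * r.scale;
    r.max = (double)hi * r.scale;
    return r;
}
```

For example, `fx_range_of(16, 15, 1)` yields a scale of 1/32768 and a range of -1.0 to 0.999969482421875, matching the last row of the table.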
In computer programming, powers of 2 are commonly used as scaling factors because they can be applied by bit shifting. Rescaling is needed when multiplying or dividing fixed-point numbers, because the raw integer result no longer carries the original scaling factor. The following relationships describe multiplication and division using fixed-point math:
The actual value represented is an integer multiplied by the scaling factor.
For multiplication, the correct fractional result is obtained by multiplying the integer result by the scaling factor.
For division, the correct fractional result is obtained by dividing the integer result by the scaling factor.
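Written out, the three statements above take the following form. This is a sketch in notation introduced here rather than taken from the original: $A$, $B$, and $C$ are the stored integers, $s$ is the scaling factor, and $a = A s$, $b = B s$, $c = C s$ are the values they represent.

```latex
\begin{align*}
a &= A \cdot s \\
c = a \cdot b &= (A s)(B s) = (A B\, s)\, s
  \quad\Rightarrow\quad C = (A \cdot B) \cdot s \\
c = \frac{a}{b} &= \frac{A s}{B s} = \frac{A}{B}
  \quad\Rightarrow\quad C = \frac{A / B}{s}
\end{align*}
```

With $s = 2^{-15}$, multiplying by $s$ corresponds to a right shift by 15 bits and dividing by $s$ corresponds to a left shift by 15 bits. In integer code the left shift is applied to the dividend before the division so that the fractional bits are not lost.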
In code, these operations require extra precision in the intermediate values. The following code example demonstrates fixed-point multiplication and division in C:
fixed_point.c
    #include <stdio.h>
    #include <stdint.h>

    /* Signed 16-bit integer, range -1.0 to just under 1.0 */
    #define SCALE_SHIFT 15
    #define SCALE_FACTOR (1.0 / (double)(1 << SCALE_SHIFT))

    int main(void)
    {
        int16_t a, b, c;
        int32_t temp;

        a = 1234; /* 0.03765869140625 */
        b = 8765; /* 0.267486572265625 */

        /* Multiply: widen to 32 bits, then shift right to restore the scale */
        temp = (int32_t)a * b;
        c = (int16_t)(temp >> SCALE_SHIFT);
        printf("%d * %d = %d\n", a, b, c);
        printf("%f * %f = %f\n", a * SCALE_FACTOR, b * SCALE_FACTOR, c * SCALE_FACTOR);

        /* Divide: shift the dividend left in 32 bits, then divide */
        temp = ((int32_t)a << SCALE_SHIFT) / b;
        c = (int16_t)temp;
        printf("%d / %d = %d\n", a, b, c);
        printf("%f / %f = %f\n", a * SCALE_FACTOR, b * SCALE_FACTOR, c * SCALE_FACTOR);

        return 0;
    }
Expected Output
    1234 * 8765 = 330
    0.037659 * 0.267487 = 0.010071
    1234 / 8765 = 4613
    0.037659 / 0.267487 = 0.140778
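The casts to `int16_t` in the example above are safe only because both results fit in 16 bits. When the true result falls outside the range -1.0 to just under 1.0, for instance when the dividend is larger in magnitude than the divisor, it cannot be represented. The following is a minimal sketch of reusable helpers, assuming that clamping (saturation) is acceptable on overflow; the names `q15_saturate`, `q15_mul`, and `q15_div` are hypothetical and not part of the original example.

```c
#include <assert.h>
#include <stdint.h>

#define Q15_SHIFT 15

/* Clamp a 32-bit intermediate to the int16_t range (hypothetical helper). */
static int16_t q15_saturate(int32_t x)
{
    if (x > INT16_MAX) return INT16_MAX;
    if (x < INT16_MIN) return INT16_MIN;
    return (int16_t)x;
}

/* Multiply: widen to 32 bits, then divide by 2^15 to restore the scale.
   Integer division is used instead of >> so that negative products
   truncate toward zero regardless of the compiler. */
static int16_t q15_mul(int16_t a, int16_t b)
{
    int32_t product = (int32_t)a * (int32_t)b;
    return q15_saturate(product / (1 << Q15_SHIFT));
}

/* Divide: pre-scale the dividend by 2^15 in 32 bits, then divide.
   Multiplication is used instead of << because left-shifting a
   negative signed value is undefined behaviour in C. */
static int16_t q15_div(int16_t a, int16_t b)
{
    assert(b != 0);
    int32_t scaled = (int32_t)a * (1 << Q15_SHIFT);
    return q15_saturate(scaled / b);
}
```

With the values from the example, `q15_mul(1234, 8765)` returns 330 and `q15_div(1234, 8765)` returns 4613, while `q15_div(8765, 1234)` clamps to 32767 instead of wrapping around. Whether to saturate, wrap, or report an error on overflow is a design choice that depends on the application.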