When a time-domain signal is transformed to the frequency domain, a shift exponent is determined for each frame. This exponent is related to the input signal energy of the windowed frame and is used to rescale the frequency-domain data. Because each tap of a multi-tap frequency-domain FIR filter operates on a different frame, each tap has its own exponent, and the code would appear as follows:
    /* apply filter to each frequency bin */
    for (j = 0; j < FFT_BINS; j++) {
        fcomplex_q30 result = {0, 0};
        fcomplex_q15 *coef = &input_coefs[j][0];
        fcomplex_q15 *data = &input_data[j][0];

        /* apply complex FIR */
        for (i = 0; i < FILTER_LENGTH; i++) {
            /* complex multiply-accumulate */
            result.real += coef->real * data->real;
            result.real -= coef->imag * data->imag;
            result.imag += coef->real * data->imag;
            result.imag += coef->imag * data->real;

            /* rescale using the exponent of this tap's frame */
            result.real = SHIFTOP(result.real, data_exponent[i]);
            result.imag = SHIFTOP(result.imag, data_exponent[i]);

            coef++;
            data++;
        }

        /* save result */
        output->real = result.real;
        output->imag = result.imag;
        output++;
    }
From a compiler's perspective, this loop is a poor candidate for optimization because shift operations are interleaved with the multiply-accumulate operations. These shifts hinder parallelization and vectorization of the loop. To remove the shift from the inner loop, the filter data can be scaled before the filtering operation. This is shown below:
    /* pre-scale the filter data (frame at index FILTER_LENGTH-1) */
    for (j = 0; j < FFT_BINS; j++) {
        fcomplex_q15 *data = &input_data[j][FILTER_LENGTH-1];
        data->real = SHIFTOP(data->real, data_exponent[FILTER_LENGTH-1]);
        data->imag = SHIFTOP(data->imag, data_exponent[FILTER_LENGTH-1]);
    }

    /* apply filter to each frequency bin */
    for (j = 0; j < FFT_BINS; j++) {
        fcomplex_q30 result = {0, 0};
        fcomplex_q15 *coef = &input_coefs[j][0];
        fcomplex_q15 *data = &input_data[j][0];

        /* apply complex FIR; the inner loop is now shift-free */
        for (i = 0; i < FILTER_LENGTH; i++) {
            result.real += coef->real * data->real;
            result.real -= coef->imag * data->imag;
            result.imag += coef->real * data->imag;
            result.imag += coef->imag * data->real;
            coef++;
            data++;
        }

        /* save result */
        output->real = result.real;
        output->imag = result.imag;
        output++;
    }
Of course, this loop optimization involves an engineering tradeoff. A shift is a multiplication by a power of two, so by the distributive property of multiplication the two filter designs would produce identical results. However, this holds only if the bit depth of the fixed-point data is large enough. Applying the shift before the filter can discard low-order bits of the filter data, which reduces the precision of the filter data and of the filter output. One needs to determine whether this loss of precision is statistically significant for the filter design.
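The precision tradeoff can be illustrated with a small, real-valued sketch. It is not the filter code above: the helper names `mac_shift_after` and `mac_prescaled` are hypothetical, the complex arithmetic is reduced to real arithmetic for brevity, and `SHIFTOP` is assumed here to be an arithmetic right shift, which the text does not specify. The first function shifts each full-width product, while the second shifts the 16-bit data before the multiply, so any low-order data bits are lost before they can contribute.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: SHIFTOP is a right shift; the original text does not define it. */
#define SHIFTOP(x, e) ((x) >> (e))

/* Reference: shift each 32-bit product after the multiply.
 * Inputs are assumed non-negative so the right shift is portable. */
static int32_t mac_shift_after(const int16_t *coef, const int16_t *data,
                               const int *exp, int n)
{
    int32_t acc = 0;
    for (int i = 0; i < n; i++) {
        assert(exp[i] >= 0 && exp[i] < 15);   /* keep the shift count valid */
        int32_t product = (int32_t)coef[i] * data[i];
        acc += SHIFTOP(product, exp[i]);
    }
    return acc;
}

/* Optimized form: pre-scale the 16-bit data, then run a shift-free MAC.
 * Bits shifted out of the data are discarded before the multiply. */
static int32_t mac_prescaled(const int16_t *coef, const int16_t *data,
                             const int *exp, int n)
{
    int32_t acc = 0;
    for (int i = 0; i < n; i++) {
        assert(exp[i] >= 0 && exp[i] < 15);
        int16_t scaled = (int16_t)SHIFTOP(data[i], exp[i]);
        acc += (int32_t)coef[i] * scaled;
    }
    return acc;
}
```

With coefficients {16384, 8192, 4096, 2048} and an exponent of 3 on every tap, data of {1000, 2000, 3000, 4000} (all multiples of 8) gives the same result from both functions. Data of {1001, 2003, 3005, 4007} gives 6665472 from `mac_shift_after` but 6656000 from `mac_prescaled`, because the three low-order bits of each sample are dropped before the multiply.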