Many microprocessors exist today with hardware capabilities that create favorable conditions for digital signal processing (DSP). The key, however, is choosing a hardware solution that implements DSP routines properly without unnecessary overhead, which keeps overall system cost down. Setting aside the obvious requirement that the necessary peripherals be available, the two limiting factors in choosing hardware come down to size and speed.
Clock speed needs to be high enough to handle any peripheral drivers while still leaving enough headroom for the often MIPS-hungry DSP routines. Memory size, on the other hand, needs to be accommodating, particularly fast-access SRAM, where many of the loop-heavy DSP routines such as Fourier transforms can be placed. It is also important to have internal flash memory rather than external: external flash offers far more capacity but is much slower to access. Where DSP is concerned, speed matters more than capacity, since codec code is often under 512 KB and usually fits internally. Cache memory is also worth factoring in here, as a cache can greatly improve the performance of many applications, including DSP. Unfortunately, moving to a cache-equipped part usually costs considerably more.
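The speed trade-off above can be made concrete with a rough cycle budget. The sketch below is illustrative only: the function names are hypothetical, and the 168 MHz clock, 48 kHz sample rate, and 20% driver overhead used in the usage note are assumed figures, not measurements from any particular part.

```c
#include <stdint.h>

/* Total CPU cycles available between two consecutive samples.
 * Returns 0 for a zero sample rate instead of dividing by zero. */
uint32_t cycles_per_sample(uint32_t cpu_hz, uint32_t sample_rate_hz)
{
    if (sample_rate_hz == 0u) {
        return 0u;
    }
    return cpu_hz / sample_rate_hz;
}

/* Cycles left for DSP per sample after reserving a percentage of the
 * budget for peripheral drivers and other housekeeping (assumed figure). */
uint32_t dsp_cycles_per_sample(uint32_t cpu_hz,
                               uint32_t sample_rate_hz,
                               uint32_t overhead_percent)
{
    uint32_t total = cycles_per_sample(cpu_hz, sample_rate_hz);
    if (overhead_percent >= 100u) {
        return 0u;
    }
    /* 64-bit intermediate so the multiplication cannot overflow. */
    uint64_t reserved = ((uint64_t)total * overhead_percent) / 100u;
    return total - (uint32_t)reserved;
}
```

With the assumed figures, `cycles_per_sample(168000000u, 48000u)` gives 3500 cycles per sample, and reserving 20% for drivers leaves 2800 for the DSP routines. If a codec needs more than that per sample, the part is too slow regardless of how much memory it has.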
Cortex-M4 Features
In light of the above, we can now look at the ARM Cortex-M4 and why it is a good hardware choice for many DSP applications. For one, the M4 includes a DSP instruction set extension, exposed to C code through compiler intrinsics, that greatly reduces the cost of operations common in signal processing. These include saturating arithmetic (think QADD and QSUB) and dual 16-bit multiply-accumulate (think SMLAD).
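To show what saturation buys, here is a minimal portable C model of a saturating 32-bit add, next to an ordinary wrapping add for comparison. The names `sat_add32` and `wrap_add32` are hypothetical and exist only for this illustration. On an M4 with CMSIS you would normally reach for the `__QADD` intrinsic, which maps to the single QADD instruction, rather than writing the comparison logic by hand.

```c
#include <stdint.h>

/* Portable model of a saturating signed 32-bit add: the result is
 * clamped to the int32_t range instead of wrapping around. */
int32_t sat_add32(int32_t a, int32_t b)
{
    int64_t sum = (int64_t)a + (int64_t)b;  /* cannot overflow in 64 bits */
    if (sum > INT32_MAX) {
        return INT32_MAX;
    }
    if (sum < INT32_MIN) {
        return INT32_MIN;
    }
    return (int32_t)sum;
}

/* Wrapping add for comparison. The addition is done in unsigned
 * arithmetic to avoid undefined behaviour; converting the result back
 * to int32_t is implementation-defined, and wraps on common compilers. */
int32_t wrap_add32(int32_t a, int32_t b)
{
    return (int32_t)((uint32_t)a + (uint32_t)b);
}
```

Adding 1 to `INT32_MAX` with `sat_add32` stays at `INT32_MAX`, whereas the wrapping version flips to the most negative value on typical toolchains. In an audio path that difference is the one between mild clipping and a full-scale click.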
Saturating arithmetic is a cornerstone of any DSP system, from the minute a signal is picked up, through the A/D conversion, to the coder and finally the decoder. By cutting the clock cycles these operations take, the M4 greatly reduces the cost of such heavily used functions. The intrinsics are also inlined, each typically compiling to a single instruction, so they avoid the branch and any push/pop that a function call would incur, further reducing CPU cost. In addition, the 16/32-bit MAC, dual 16-bit MAC, and 8/16-bit SIMD arithmetic all execute in a single cycle.
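The dual 16-bit MAC mentioned above is easiest to appreciate in a dot product, the inner loop of most filters. The following is a hedged sketch in portable C that models what SMLAD computes: two signed 16-bit multiplies whose products are both added to a 32-bit accumulator. The helpers `pack16`, `smlad_model`, and `dot_q15` are hypothetical names for this illustration. On real hardware the CMSIS `__SMLAD` intrinsic would replace `smlad_model`, and the packing would usually come for free by loading two adjacent samples as one 32-bit word.

```c
#include <stddef.h>
#include <stdint.h>

/* Pack two signed 16-bit samples into one 32-bit word (lo in bits 0-15). */
uint32_t pack16(int16_t lo, int16_t hi)
{
    return (uint32_t)(uint16_t)lo | ((uint32_t)(uint16_t)hi << 16);
}

/* Model of SMLAD: acc + x.lo * y.lo + x.hi * y.hi.
 * The accumulation wraps on overflow, so it is done in unsigned
 * arithmetic here. The narrowing casts are implementation-defined
 * and behave as two's complement on common compilers. */
int32_t smlad_model(uint32_t x, uint32_t y, int32_t acc)
{
    int16_t x_lo = (int16_t)(x & 0xFFFFu);
    int16_t x_hi = (int16_t)(x >> 16);
    int16_t y_lo = (int16_t)(y & 0xFFFFu);
    int16_t y_hi = (int16_t)(y >> 16);

    int32_t p_lo = (int32_t)x_lo * y_lo;  /* at most 2^30, fits in 32 bits */
    int32_t p_hi = (int32_t)x_hi * y_hi;

    return (int32_t)((uint32_t)acc + (uint32_t)p_lo + (uint32_t)p_hi);
}

/* Dot product of two 16-bit vectors, consuming two samples per step. */
int32_t dot_q15(const int16_t *a, const int16_t *b, size_t n)
{
    int32_t acc = 0;
    size_t i = 0;

    for (; i + 1 < n; i += 2) {
        acc = smlad_model(pack16(a[i], a[i + 1]),
                          pack16(b[i], b[i + 1]), acc);
    }
    if (i < n) {
        /* Odd length: handle the last sample with the upper lane zeroed. */
        acc = smlad_model(pack16(a[i], 0), pack16(b[i], 0), acc);
    }
    return acc;
}
```

Because each step retires two multiply-accumulates, the loop runs half as many iterations as a sample-by-sample version, which is where much of the M4's advantage in filter and transform code comes from.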
Cortex-M7 Comparison
At this point, it is prudent to point out that ARM has recently released the Cortex-M7, for which most chip manufacturers are still developing parts. The M7 seems to be a major game changer as far as performance is concerned. It has all of the same features as the M4, but with a few key differences. Both the data bus and the instruction bus are 64 bits wide on the M7, in place of the M4’s 32-bit buses, which greatly increases throughput. The M7 also doubles the M4’s 3-stage pipeline to six stages and can issue two instructions per cycle, which substantially increases performance.
Cortex Mx Family
Overall the M7 looks to be a processor with a lot of kick to it. The question then becomes whether it is too much for many DSP applications. That remains to be seen, but weighing performance against price, the M7 looks like overkill for most applications. It may be handy where data throughput is high, such as video or high-bitrate audio, but for many lower-bitrate applications its extra performance would not justify its higher price tag.
In addition to the M7, it should be noted that the M3 is also a viable choice for DSP, but it lacks the DSP extension (the SIMD, dual-MAC, and saturating add/subtract instructions such as QADD and SMLAD) found in the newer M4 and M7 processors. If your DSP application uses few or none of these operations, then the M3 may be a better choice.
In conclusion, there are applications for which each of the M3, M4, and M7 is best suited; however, the M4 is the better choice in far more of them. Thus the Cortex-M4 can be seen as the lightweight DSP solution for most scenarios.