Up to 8000 DMIPS performance anticipates compute-load growth in media designs
EDN Europe, 04 Oct 2007
4th October 2007 - ARM has extended its multicore processing offering with the launch of the Cortex-A9 core, which the company says adds “scalable performance within tight power and cost constraints”. The fully synthesisable Cortex-A9 will deliver up to 8000 “aggregate” DMIPS from a four-core configuration when designers optimise it for highest performance. This figure, equivalent to 2.0 DMIPS/MHz, is at the upper end of a performance range that extends down to a few hundred DMIPS when you select a single processor core and optimise it for low power.
One of the key applications that ARM anticipates will need the high-end processing capability is set-top boxes and media gateways, as those products evolve to handle multiple high-definition video streams and other resource-intensive features. As well as increasing high-end performance, the objectives for the A9 are to provide scalability, enabling a range of performance/power/cost tradeoffs across a product range, and to protect investment in software that will run on multiple products built on a common platform.
Cortex-A9 is a new processor design in two variants, single-core and multicore; ARM intends the single-core version for cost-sensitive applications such as mobile handsets. The core uses a dynamic-length, 8-stage, superscalar, multi-issue pipeline with speculative out-of-order execution, and is capable of executing up to four instructions per cycle in devices clocked at more than 1GHz. You can add modules such as an advanced level-2 cache controller, ARM’s NEON media processing engine, a single- and double-precision floating-point unit, and a program trace macrocell for detailed on-chip debug.
The architecture builds on ARM’s existing MPCore technology, using its advanced snoop control approach that, the company says, maintains cache coherency without intrusive interrupts to running processes, to yield near-linear scalability with increased numbers of cores.
A feature called the accelerator coherence port extends cache coherency to external special-function blocks (DMA or cryptography, for example), removing the need for repeated cache flushing and reducing power.
ARM’s multiprocessing programme manager John Goodacre observes that the company has already been supporting licensees through the transition to multicore processing for some time: he cites some 10 licences for the ARM11 architecture, of which some are uniprocessor, some dual-core and some quad-core. First steps into multicore, he notes, tend to integrate what is already being done at board level, progressing in stages to symmetric multiprocessing (SMP) on multiple cores. The software support is now in place, Goodacre says, to make the transition to multicore processing “actually quite simple – all operating systems that use time-slicing can move on to multicore hardware.” Most embedded applications, he says, have implicit concurrency that can benefit from a multicore platform.
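As a rough sanity check on the headline number, the 8000 “aggregate” DMIPS figure is consistent with four cores each delivering 2.0 DMIPS/MHz at a 1GHz clock. The sketch below works through that arithmetic. It assumes ideal linear scaling across cores, reads the 2.0 DMIPS/MHz figure as per-core, and takes exactly 1000MHz as the clock (the article says only “more than 1GHz”); the function names are illustrative and not taken from any ARM material.

```python
def aggregate_dmips(cores: int, clock_mhz: float, dmips_per_mhz: float) -> float:
    """Aggregate DMIPS, assuming every core contributes equally (ideal linear scaling)."""
    return cores * clock_mhz * dmips_per_mhz


def implied_dmips_per_mhz(total_dmips: float, cores: int, clock_mhz: float) -> float:
    """Per-core DMIPS/MHz implied by an aggregate figure, under the same assumption."""
    return total_dmips / (cores * clock_mhz)


if __name__ == "__main__":
    # Assumed operating point: 4 cores, 1000 MHz, 2.0 DMIPS/MHz per core.
    print(aggregate_dmips(cores=4, clock_mhz=1000, dmips_per_mhz=2.0))
    # Working backwards from the quoted aggregate figure.
    print(implied_dmips_per_mhz(total_dmips=8000, cores=4, clock_mhz=1000))
```

Under the same assumptions, a single core at 1GHz would come out at about 2000 DMIPS, so the “few hundred DMIPS” low end quoted in the article implies a considerably lower clock rate, a lower per-MHz figure from the power-optimised implementation, or both.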