Jaguar processor from AMD boasts four cores for mobile applications

August 29, 2012 // By Rick Merritt
Advanced Micro Devices will describe Jaguar, a low-power x86 core for notebooks, tablets and embedded systems, at Hot Chips. Jaguar packs four x86 cores into one unit with a large shared L2 cache to compete with both Intel’s Core and Atom chips.

In a separate keynote talk, AMD will announce a follow-on for its HyperTransport processor interconnect. Freedom Fabric aims to link thousands of cores at more than a terabit/second, likely based on technology acquired from SeaMicro.

AMD is expected to try to make Freedom Fabric an industry standard across x86, graphics and ARM cores, competing with the proprietary QuickPath Interconnect on Intel’s CPUs. Last week, the RapidIO Trade Association said it is trying to get ARM and its SoC partners to adopt its technology as a processor interconnect.

As for the Jaguar core, AMD predicts, based on simulations, that it will deliver more than ten percent higher frequencies and more than 15 percent more instructions per clock than Bobcat, its current low-power x86 core. Jaguar will appear in 2013 in AMD’s Kabini SoC for low-power notebooks and in Temash, AMD’s first sub-5W SoC, aimed at tablets.

The chip sports a redesigned load/store unit and an expanded 128-bit floating-point unit. It includes several new instructions to support AES encryption, accelerate media processing and swap big/little-endian structures for embedded systems. But the most novel aspect of the new core is its use of four x86 cores in a single unit sharing one L2 cache.

“From a core perspective we will call this a four-core unit that forms the building block of an SoC design,” said Jeff Rupley, an AMD Fellow and chief architect of Jaguar. “It’s possible to fuse off some cores for lower end or lower power designs,” he said.

AMD found that sharing one 1-2 Mbyte L2 cache among the cores saves silicon area compared with four private caches. It also provides a performance boost when only one or two cores are running single-threaded workloads, since each can then access a larger cache pool.

“Generally the larger cache outweighs the latency” of needing an L2 cache interface, Rupley said. “There could be an app where the latency increase defeats the