“To the best of our knowledge, it is the world’s first 1,000-processor chip and it is the highest clock-rate processor ever designed in a university,” said Bevan Baas, professor of electrical and computer engineering, who led the team that designed the chip architecture. While other multiple-processor chips have been created, none exceed about 300 processors, according to an analysis by Baas’ team. Most were created for research purposes and few are sold commercially. The KiloCore chip has been fabricated and run; it was built by IBM using its 32 nm PD-SOI CMOS technology.
The basic architecture is MIMD (multiple instruction/multiple data), and each of the 7-stage-pipelined cores is a general-purpose unit with a 72-instruction set that issues a single instruction per cycle. The team says that none of the instructions is ‘algorithm-specific’, which distinguishes it from a GPU-class device. The headline figure of 1.78 trillion instructions/sec comes at a clock speed of 1.78 GHz and 1.1V; running at 0.84V and 1 GHz, it consumes 13.1W, while peak energy efficiency of 5.8 pJ/Op is quoted at 0.56V and 115 MHz.
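The quoted operating points can be cross-checked with some simple arithmetic. The sketch below is only a back-of-the-envelope check, not anything published by the team: it assumes all 1,000 cores are busy, that each retires one instruction per cycle as stated above, and that one ‘Op’ equals one instruction. The function names are illustrative, and the power figure for the 115 MHz point is derived here rather than quoted in the article.

```python
# Back-of-the-envelope check of the quoted KiloCore operating points.
# Assumptions (not from the article): all 1,000 cores are active and
# one "Op" in pJ/Op corresponds to one executed instruction.

CORES = 1000


def peak_throughput(clock_hz, cores=CORES, ipc=1.0):
    """Aggregate instructions per second across all cores."""
    return cores * ipc * clock_hz


def energy_per_op_pj(power_w, ops_per_sec):
    """Energy per operation in picojoules, given total power and op rate."""
    return power_w / ops_per_sec * 1e12


def power_from_energy_w(pj_per_op, ops_per_sec):
    """Total power in watts implied by an energy-per-op figure and op rate."""
    return pj_per_op * 1e-12 * ops_per_sec


# 1.1V point: 1,000 cores at 1.78 GHz, one instruction per cycle.
max_rate = peak_throughput(1.78e9)

# 0.84V point: 1 GHz at 13.1W implies this many pJ per instruction.
mid_rate = peak_throughput(1.0e9)
mid_energy_pj = energy_per_op_pj(13.1, mid_rate)

# 0.56V point: 5.8 pJ/Op at 115 MHz implies this total power (derived).
low_rate = peak_throughput(115e6)
low_power_w = power_from_energy_w(5.8, low_rate)

print(f"1.1V:  {max_rate:.3g} instructions/sec")
print(f"0.84V: {mid_energy_pj:.1f} pJ per instruction")
print(f"0.56V: {low_power_w:.3f} W implied total power")
```

Under these assumptions the 1.1V point reproduces the 1.78 trillion instructions/sec figure, the 1 GHz point works out to about 13.1 pJ per instruction, and the 115 MHz point implies a total of well under one watt.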
Each core is independently powered and can shut down to leakage-only power if it has no task to perform. Rather than using a cache architecture, every processor can store instructions and data in a hierarchy of locations: local memory, one or more nearby processors, on-chip independent memory modules, or off-chip memory. Each processor communicates via a high-throughput circuit-switched network plus a packet-switched network (both on-chip). The team says there is little energy overhead in sourcing operands from companion processors some way across the chip, as ‘wormhole’ routing is employed. That is, messages from an adjacent or nearby core will be routed via the ‘circuit’ network; those from further away in the processor matrix will travel via the packet network. Each core has north-south-east-west comms buffers plus a fifth channel for host-processor traffic; maximum throughput is 45.5 Gbps per router and 9.1 Gbps per port at 1.1V.
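The router figures and the near/far split described above can be illustrated with a short sketch. The per-router number follows directly from five ports at 9.1 Gbps each. The network-selection part is purely hypothetical: the article does not say how ‘nearby’ is defined, so the `NEARBY_HOPS` threshold, the (row, column) core coordinates, and the function names are all assumptions made for illustration.

```python
# Sketch of the router figures and the circuit/packet split described above.
# The port list and 9.1 Gbps per-port rate come from the article; the hop
# threshold and grid coordinates are hypothetical illustration values.

PORTS = ("north", "south", "east", "west", "host")
PORT_GBPS = 9.1

# Hypothetical cut-off: how many hops still count as "nearby".
NEARBY_HOPS = 2


def router_throughput_gbps(port_gbps=PORT_GBPS, ports=PORTS):
    """Aggregate router throughput if every port runs at its maximum rate."""
    return port_gbps * len(ports)


def hops(src, dst):
    """Manhattan distance between two cores given as (row, column)."""
    return abs(src[0] - dst[0]) + abs(src[1] - dst[1])


def choose_network(src, dst, nearby_hops=NEARBY_HOPS):
    """Pick the circuit network for nearby cores, the packet network otherwise."""
    return "circuit" if hops(src, dst) <= nearby_hops else "packet"


print(f"{router_throughput_gbps():.1f} Gbps per router")
print(choose_network((0, 0), (0, 1)))    # adjacent core
print(choose_network((0, 0), (10, 20)))  # distant core
```

Five ports at 9.1 Gbps account for the quoted 45.5 Gbps per router; the choice of a two-hop threshold is arbitrary and only there to make the example concrete.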