Xilinx creates “UltraScale” FPGA architecture for move to 20-nm process

July 10, 2013 // By Graham Prophet
The FPGA maker has announced that it has taped out its first 20nm all-programmable device, and anticipates a 1.5-2X improvement in system-level performance when devices are available.

Xilinx has implemented a new architecture it terms the “industry’s first ASIC-class programmable architecture”, UltraScale. The UltraScale architecture was developed to scale from 20nm planar, through 16nm (and beyond) FinFET technologies, and from monolithic through 3D ICs. It not only addresses the limitations to scalability of total system throughput and latency, but directly attacks what the company calls the number-one bottleneck to chip performance at advanced nodes: the interconnect.

At 28 nm, Xilinx says, the ability to place 2 million logic cells on a device took it into new customer sectors, especially those concerned with very high data rates and throughputs, at low latency. UltraScale is designed to serve that constituency; it is structured, Xilinx says, for 3D chips. The company confirms that, for the immediate future, “3D” means what has otherwise been termed 2.5D – separate dice laid side-by-side on a passive silicon interposer. UltraScale increases routing on each die and also provides for very wide interconnect between chips via the interposer, and to off-chip memory.

It is also easier to route, the company says, adding, “An innovative architectural approach is required to manage multi-hundred gigabit-per-second levels of system performance with smart processing at full line rate, scaling to terabits and teraflops. The mandate is not simply to increase the performance of each transistor or system block, or scale the number of blocks in the system, but to fundamentally improve the communication, clocking, critical paths, and interconnect to address the massive data flow and real-time packet, DSP, and/or image processing. The UltraScale architecture addresses these challenges by applying leading-edge ASIC techniques in a fully programmable architecture. It supports massive data flow with optimised wide buses that support multi-terabit throughput; has multi-region ASIC-like clocking, power management, and next-generation security; and has highly optimised critical paths and built-in high-speed memory, with cascading to remove bottlenecks in DSP and packet processing. It provides massive I/O and memory bandwidth with latency reduction and 3D