The amount of DSP capability now available in programmable-logic chips is formidable, and design teams are increasingly using it to add hardware signal processing, especially in an acceleration or co-processing role. The FPGA vendors are proud of their early adoption of leading-edge process technology, with 65-nm chips already on the market. The exact process that a vendor uses to build a chip hardly matters to the user, except that it means you get a lot of logic for your money. FPGAs have evolved from providing a great deal of very basic logic resources (they still do that as well, of course) to providing programmable functional blocks at a higher level of complexity. The applicability of FPGAs to signal processing has led their designers to pre-configure multiple units of basic DSP functionality, namely the standard multiply/accumulate (MAC) function and other arithmetic functions, in the repetitive way that reflects the chips’ underlying structure. In the larger FPGAs of the current generation, you certainly get a lot of DSP resource as well as basic logic. This lends itself to impressive numbers for raw processing steps: take the number of MACs on the chip, multiply by the (datasheet maximum) clock rate, and you get large values of MMACs (millions of MACs per second), well into the GMAC (giga-MAC) region.
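The headline arithmetic is simple enough to sketch. The Python snippet below uses a hypothetical helper, `peak_gmacs`, that multiplies a count of MAC units by a clock rate. It assumes, as datasheet peak figures generally do, that every unit completes one MAC on every clock cycle. The example inputs are the Xilinx and Lattice figures quoted later in this article.

```python
def peak_gmacs(mac_units: int, clock_mhz: float) -> float:
    """Theoretical peak throughput in GMACs (10^9 MACs per second).

    Assumption: every MAC unit completes one multiply/accumulate on
    every clock cycle. This is a best case, not a sustained rate.
    """
    return mac_units * clock_mhz / 1000.0


# Figures as quoted in this article for the largest part in each family.
xilinx_sxt = peak_gmacs(640, 550)     # 640 DSP48E slices at 550 MHz
lattice_ecp2m = peak_gmacs(168, 375)  # 168 multipliers at 375 MHz

print(xilinx_sxt, lattice_ecp2m)  # 352.0 63.0
```

Both results reproduce the vendors' quoted maxima, which suggests that those figures are peak values of exactly this kind; a real design approaches them only if it keeps every unit busy on every cycle.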
Are such devices, therefore, now in competition with stand-alone DSP products such as those from Texas Instruments (for example, the TMS320-family chips and derivatives) or Analog Devices (Sharc, Blackfin)? The answer is yes and no. Yes, in that both execute fundamental DSP operations at hundreds of MHz; no, in that they rarely, if ever, compete head-to-head for the same application socket. As a hardware solution, even if a programmable one, the FPGA matches the task best when the magnitude of the processing task has outrun the software-programmable option. Applications that have a high degree of parallelism, and that require repetitive execution of similar functions simultaneously across many paths or channels, are those that FPGA solutions increasingly address.
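To make the channel-parallel workload concrete, here is a minimal Python sketch of the kind of repetitive computation involved: one FIR filter, built from nothing but multiply/accumulate steps, applied to several channels. The function names, coefficients and sample data are illustrative assumptions, not drawn from any vendor's design.

```python
def fir_mac(samples, coeffs):
    """Direct-form FIR filter: each output is a chain of MAC steps."""
    out = []
    for n in range(len(samples)):
        acc = 0
        for k, c in enumerate(coeffs):
            if n - k >= 0:                 # skip taps that precede the data
                acc += c * samples[n - k]  # one multiply/accumulate
        out.append(acc)
    return out


def filter_channels(channels, coeffs):
    """Apply the same filter to every channel, one after another."""
    return [fir_mac(ch, coeffs) for ch in channels]


channels = [[1, 0, 0, 0], [1, 2, 3, 4]]  # two hypothetical input channels
coeffs = [1, 2, 1]                       # illustrative 3-tap filter
result = filter_channels(channels, coeffs)
print(result)  # [[1, 2, 1, 0], [1, 4, 8, 12]]
```

Software has to work through the channels in sequence. An FPGA implementation could, in principle, give each channel its own MAC hardware and process them all in the same clock cycles, which is why this class of problem suits the devices.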
Perhaps a more meaningful comparison is between FPGA solutions and “traditional” ASICs: if there is a confrontation going on at all, it is between those two technologies. Vendors often cite the cellular-radio basestation as one of the key battlegrounds in which their programmable chips are gaining ground. Designers might use the DSP-rich variants at the front-end of such a system, on the RF transceiver card itself, after the receiver’s A/D converter. A similar problem occurs in surveillance-type radios that monitor a wide bandwidth. Immediately the signal passes through A/D conversion, the problem becomes one of data reduction, filtering multiple channel segments. Now that clock speeds are in the mid-hundreds-of-MHz, FPGAs have the speed to address this problem.
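As a rough illustration of the data-reduction step, the Python sketch below filters a sampled stream and keeps only one output in every few. The function name `decimate_fir`, the two-tap filter and the decimation factor are assumptions chosen for brevity; a real receiver would use far longer filters and real converter data.

```python
def decimate_fir(samples, coeffs, factor):
    """Apply an FIR filter and keep one output in every `factor`.

    Only the retained outputs are computed, so the MAC workload falls
    by the same factor as the output data rate.
    """
    if factor < 1:
        raise ValueError("factor must be at least 1")
    out = []
    for n in range(0, len(samples), factor):
        acc = 0
        for k, c in enumerate(coeffs):
            if n - k >= 0:                 # skip taps that precede the data
                acc += c * samples[n - k]  # one multiply/accumulate
        out.append(acc)
    return out


wideband = list(range(8))  # stand-in for A/D converter output
reduced = decimate_fir(wideband, [1, 1], 2)
print(reduced)  # [0, 3, 7, 11]
```

Eight input samples become four outputs here. Repeating the same structure for each channel segment is what turns a wideband front end into a large, regular MAC workload.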
In the light of rapidly evolving standards, the attraction of programmable hardware, which can replace or tune an algorithm in hardware without an ASIC revision, is obvious. Most of the FPGA vendors also quote the military/aerospace community as converts, especially in respect of systems such as software-defined radio. Once again, being able to put hardware (and, therefore, fast) algorithms directly in the signal path and being able to re-configure those algorithms at will has obvious appeal. Another application domain in which DSP-optimised programmables claim success is medical imaging.
Each of the FPGA architectures, from each vendor, will also support an assortment of CPU cores, either loaded as soft IP or embedded as hard cores. You could feasibly contemplate building (in effect) a customised stand-alone DSP with a microprocessor core and the on-chip DSP resources of the FPGA. However, feedback from the device vendors appears to indicate that this is not the most common mode of using their products for signal processing; where FPGAs are performing DSP it is, for the most part, the multiple-channel co-processing model that designers are employing.
Indications of a sustained upswing in the usage of programmables in DSP include a report from EDA-software supplier Synplicity, stating that 2006 saw a significant increase in business for its Synplify DSP package. The company calls this software an ESL-design tool, because when using it you work directly from an algorithmic level towards implementation on a range of manufacturers’ FPGAs. At the design-exploration and -entry level in Synplify DSP, you are working in a Matlab and Simulink environment, using a drag-and-drop graphical interface to configure blocks of IP and to enter algorithmic expressions of the functions you wish to implement. Once a system is working in simulation at the highest level, you impose on it a set of implementation parameters (number of data paths, bit widths, number types and the like) and the system provides a set of optimisation paths on the way to producing workable RTL. From that point, implementation follows a conventional FPGA-programming route.
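To show what an implementation parameter such as bit width means in practice, the Python sketch below converts a real-valued coefficient to a signed fixed-point code of a chosen word length. It is a generic illustration, not a description of how Synplify DSP works internally; the function names `to_fixed` and `from_fixed` and the 18-bit format are assumptions.

```python
def to_fixed(value, word_bits, frac_bits):
    """Quantise to a signed two's-complement integer code, saturating.

    Note: Python's round() rounds exact halves to the nearest even code.
    """
    code = round(value * (1 << frac_bits))
    lo = -(1 << (word_bits - 1))     # most negative representable code
    hi = (1 << (word_bits - 1)) - 1  # most positive representable code
    return max(lo, min(hi, code))


def from_fixed(code, frac_bits):
    """Convert an integer code back to the real value it represents."""
    return code / (1 << frac_bits)


# Hypothetical 18-bit word with 17 fractional bits (range -1.0 to just under 1.0).
half = to_fixed(0.5, 18, 17)
clipped = to_fixed(1.0, 18, 17)  # 1.0 is just out of range, so it saturates
print(half, clipped)             # 65536 131071
print(from_fixed(half, 17))      # 0.5
```

Narrower words save multiplier and routing resources but coarsen the steps between representable values, which is the trade-off a designer explores when setting these parameters.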
Users have, Synplicity says, praised the performance gains that flow from being able to optimise the design at the RTL stage and to perform re-timing, and the fact that the code it produces is comprehensible to—and changeable by—designers.
Synplicity says that the package has detailed knowledge of silicon that Actel, Altera and Xilinx produce, and provides a choice of hand-off points to the support software of each supplier, each of which has its own support for the DSP features of its devices. For example, Actel provides direct access to the Synplify DSP software within its Libero integrated design environment (IDE), and in the case of Xilinx parts designers can use both the Synplify DSP software and Xilinx System Generator to create optimised DSP designs.
The latest release of the Synplify DSP software, v3.2, added IP blocks for communications algorithms. These include blocks for building adaptive filtering applications, and a Viterbi decoder. They appear in the Simulink/Synplify DSP library, giving users the flexibility and extensibility of features like vector signals and fixed-point data-type propagation. The new IP blocks, Synplicity says, make design capture faster and much more concise for developers working with WiMAX, 802.11, and other OFDM-based wireless standards.
At Synplicity’s UK office, European business-development director Doug Amos confirms that in Europe, users of the package are mostly in either the military/aerospace or telecom disciplines, and that the highest proportion of usage is among system architects and implementation specialists. The application areas that the company sees include multi-channel wireless designs and image/video processing in professional and broadcast equipment. In the video domain, the rapid proliferation of transcoding systems providing real-time scaling between different media standards is also an area of interest.
Amos continues, “Most of our users [for the DSP package] are FPGA engineers already—processors tend to be operating at a lower bit rate. The tool becomes a medium of communication between architects and designers. The architects can create a Simulink diagram that does something, but yields a specification that’s too big for a real FPGA. The tool provides a means of reducing that specification to something that FPGA synthesis can accept while keeping the process understandable to both groups in the process.”
Early in February, Xilinx announced that it is now shipping the “SXT” version of its flagship Virtex 5 FPGAs (Figure 1).
Both Xilinx and Altera build variants of their leading-edge products that include different blends of functional blocks in order to optimise the parts for different application areas. Chips for DSP have a higher proportion of DSP hardware (and memory) to “plain” logic. Both companies have also evolved an asymmetrical “stripe” layout to lay down these blocks. The largest of the three parts in Xilinx’s SXT family will deliver a theoretical maximum of 352 GMACs at a clock speed of 550 MHz. Xilinx talks in terms of a DSP “slice” as the base module of DSP capacity. This version is called the DSP48E and it includes a 25x18-bit multiplier, a 48-bit second stage for accumulation and arithmetic operations, and a 48-bit output that configuration software can expand to 96 bits. The wider data path and output enable increased dynamic range and higher precision as well as support for single-precision floating-point operations. The SXT devices range in logic density from 35,000 to 95,000 logic cells and have 192 to 640 of the DSP48E “slices”, together with up to 10.3 Mbytes of memory that offers a maximum aggregate bandwidth of 58 Tbits/sec. The February 2007 announcement was in respect of the mid-range member of the trio, selling for $299 (1000): Xilinx says it will add the smaller and larger devices within four months. In parallel with the Virtex product line Xilinx offers the Spartan series, at a lower cost point and with a more restricted set of features.
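One way to see why a 48-bit accumulation stage behind a 25x18-bit multiplier adds dynamic range is to count the spare bits. The Python sketch below applies a common, conservative rule of thumb, assuming signed two's-complement operands: a full product needs as many bits as the two operands combined, and each spare accumulator bit doubles the number of worst-case products that can be summed without overflow. The helper names are hypothetical and the sketch does not model the DSP48E itself.

```python
def guard_bits(a_bits: int, b_bits: int, acc_bits: int) -> int:
    """Accumulator bits left over above a full-width signed product."""
    spare = acc_bits - (a_bits + b_bits)
    if spare < 0:
        raise ValueError("accumulator is narrower than the product")
    return spare


def safe_accumulations(a_bits: int, b_bits: int, acc_bits: int) -> int:
    """Conservative count of worst-case products that sum without overflow."""
    return 2 ** guard_bits(a_bits, b_bits, acc_bits)


spare = guard_bits(25, 18, 48)
count = safe_accumulations(25, 18, 48)
print(spare, count)  # 5 32

# Cross-check with exact integers: the largest-magnitude 25x18 signed product.
worst_product = (-2**24) * (-2**17)
assert count * worst_product <= 2**47 - 1  # still fits in a signed 48-bit word
```

Under these assumptions, a filter of up to 32 taps could accumulate full-scale products with no risk of overflow; longer filters would need scaled inputs or a wider result.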
Lattice launched its ECP2M series in September last year, comprising five devices ranging in density from 20,000 to 95,000 LUTs (look-up tables); they carry 18x18 multipliers, ranging from 24 to 168 across the device family. The series joins the ECP2 family the company had released earlier; according to Lattice’s marketing vice president Stan Kopec, this is the family that designers of DSP-intensive systems most often select.
The ECP series is Lattice’s “economy plus” line, which the company says integrates features previously only present on higher cost FPGAs. Kopec adds, “Not everyone needs hundreds of thousands of logic elements, but applications such as consumer video are taking DSP into cost-sensitive areas. For example, some degree of filtering or processing is located close to the camera or other sensor, and only pre-processed data is forwarded to a central processor.” Kopec also sees interest in this class of application from the automotive community, for filtering and processing of feeds from in-car cameras and sensors. On Lattice’s chips, a functional block called sysDSP (Figure 2) provides DSP-specific resources; this block supports four functional elements in three data-path widths (9, 18 and 36 bits). The configuration software (in this case the company’s ispLEVER design-tool suite) can configure each block to provide a variety of arithmetic functions: MULT, MAC, MULTADDSUB, and MULTADDSUBSUM. Multiple blocks comprise a highly parallel DSP provision that ranges up to 63 GMACs, based on a clock frequency of up to 375 MHz.
The “M” suffix indicates an uprated memory provision, compared to the ECP2 series, that spans 1.2 to 5.3 Mbits. At announcement, Lattice anticipated that the complete family would be in production in the first half of 2007, with pricing for a mid-range part being $23 in high volumes.
In November 2006, Altera announced its Stratix III series of FPGAs, beginning a new generation of its high-end products, and the corresponding reduced-cost Cyclone family should follow it within days (of the appearance of this article). EDN Europe recently reported on Stratix III (Reference 1): it follows the model of variants for different application sectors, one of which is DSP. The largest part (again, there are variants that have lower numbers at lower cost) hosts 896 18x18-bit multipliers, in fact full MAC blocks with associated arithmetic logic, that will run at up to 550 MHz. A headline figure of 600 GMACs is the “maximum throughput” figure for Stratix III. There is variable bit-width support, plus features to build efficient tapped delay lines, and a fast adder component in the mix.
Altera has said that Stratix III chips will appear through the course of 2007. Altera’s Ro Chawla, European corporate marketing manager, says that he detects a number of reasons for the growing acceptance in the market for FPGAs as a vehicle for processing-intensive DSP.
Designs gain speed through parallelism, and benefit from scalability (you can repeat the same algorithms over few or many signal paths); applications such as “triple-play” impose high processing burdens; but there are also factors such as protection from obsolescence. Once you have the software developed for a given FPGA platform, it can “live on” through successive product generations, and you can run it on newer devices without re-design, protecting investment in IP at the algorithm level.
“The hardware is in place,” Chawla says, “but we need to invest more in software.”
He notes that Altera has already put considerable effort into ESL-style design software in the form of its DSP Builder tool: as with the third-party offering from Synplicity, you can carry out algorithmic development in a model-based environment with The Mathworks’ Simulink, and transfer into the established HDL-based synthesis environment of the regular device-programming tool flow; C-to-HDL design flows are also possible. Figure 3 shows Altera’s view of the matrix of possible tool flows that can take you from concept to working device in an FPGA DSP project.
You can, therefore, begin a design from a “traditional” HDL standpoint; or from the customised block sets available within the Matlab/Simulink environment; or using a C-to-hardware approach such as those Celoxica or Mentor Graphics offer. The FPGA vendors have their own DSP-centric tool flows (Xilinx has, for example, promised new releases of its System Generator for DSP and AccelDSP tool suites at around press time), and Altera has also hinted that it may extend the C-to-hardware tools it already has in place for its embedded processor cores (C2H) further into the hardware domain.

REFERENCE
1 Prophet, Graham, “Altera moves Stratix to low-power generation III for 2007,” EDN Europe, December 2006, pg 16, http://www.edn-europe.com/article.asp?articleid=375.