Abracadabra: making system interconnect disappear with FPGAs
ADVANCED SIGNAL-CONDITIONING TECHNIQUES CAN SOLVE INTERCONNECT PROBLEMS.
BY ANDY TURUDIC • ALTERA CORP -- EDN Europe, 01 Oct 2006
Designing system interconnect isn’t the most glamorous engineering job in the world. Indeed, to most system or digital designers, being assigned a backplane, interconnect, and connector selection is tantamount to being told to go stand in a corner with a pointy hat. In reality, what they need is a wand to go with that hat, because a lot of magic has to happen to make problems disappear.
Designers and system architects are realizing that the number of interconnections between functional blocks in systems is becoming unwieldy. Indeed, Moore’s Law not only applies to the number of transistors that you can pack onto an IC, but also drives the amount of data that flows around a system, between chips, modules or at a higher level. Classically, the digital information coding this data simply scales by increasing the number of parallel signals. This scaling continues until N interconnects multiply the bandwidth, which the maximum clock speed in the system dictates. In this scenario, the tricky part for the designer was to maintain phase alignment across N signal lines to a single clock line. A few years ago, clock speeds increased to the point at which phase alignment between bits and their clock was operationally marginal, driving even wider system buses to accommodate information-transfer needs.
BYPASS THE BANDWIDTH BOTTLENECK The PC industry, which silicon suppliers primarily drive, has recognized the ugly trend of I/O-count proliferation and, having recently reached the limit of what was possible with parallel interfaces, has moved to serial ones: first by replacing the parallel interfaces to peripherals, such as DVD and hard-disk drives, with SATA (serial ATA) and then by defining a serial peripheral interconnect, PCI Express. SATA operates at native rates of 1.5 and 3 Gbps and will soon deliver 6 Gbps; PCI Express delivers 2.5 Gbps per lane (available with one, four, and eight serial lanes), soon scaling to 5 Gbps per lane. The architects of these serial interfaces cleverly disguised them so that above the transport layer, software cannot distinguish them from the legacy parallel implementations of those interfaces.
Chip manufacturers aren’t the only ones reaping the benefits of serial interfaces. These interfaces reduce the number of pc-board and cable interconnects by as much as two orders of magnitude, yet they can still deliver more information sharing between points than their parallel ancestors, because they do not use a separate clock line. Figure 1 depicts an example layout of a PCI Express interface delivering twice the bandwidth in half the number of interconnects. The use of other serial-interface standards and data rates is also increasing significantly (Table 1).
Though standardized protocols facilitate midspan interconnections, as well as modular and interoperable system building blocks, a number of systems architects encapsulate standard protocols or data with signaling, switching, data integrity, and other channel overhead. In these proprietary systems, designers typically use a nonstandard baud rate, which in the past meant they had to implement their design in an ASIC. Now, designers can use FPGAs that accommodate both standard and proprietary serial protocols from 614 Mbps through 6.375 Gbps per lane, including “bonded” lanes in which data spans multiple serial links, building the aggregate bandwidth by the number of lanes added.
Serial-protocol and baud-rate support, however, is only the tip of the iceberg. The motivation for economic and reliable interconnect does not stop at a silicon vendor’s package or business interests. For serial links to succeed and grow in use, the circuitry to transport data must be fairly simple, low-power, and inexpensive. In many instances, the circuitry must fit into legacy chassis, and, for new systems, it must be “evergreened” (backward-compatible and externally upgradable through simple plug-ins) to quickly increase capacity, performance, and new features for customers, making it more difficult for competitors to take market share.
These economic, strategic, and business interests lend the appearance that the interconnect problem and its solutions are rather boring. On the contrary, it takes a substantial amount of design magic to make excessive interconnect and related problems disappear.
The challenge in most interconnects is to deliver a recognizable signal from the transmitter to the receiver. Most digital designers make a connection with any-sized wire and assume the receiver coincidentally understands the ones and zeros that the transmitter sends. In some cases, having heard of the reputation of fast edges, designers take great care to either series-damp or parallel-terminate a signal to eliminate false triggering and thresholds. Designers rarely give consideration to impedance, matching, or trace length, because rise and fall times are longer than λ/10, and the loss budget from transmitting to receiving is on the order of 3 dB for most logic. Designers could accomplish greater interconnect bandwidths by simply making buses wider, possibly adding layers to pc boards in the systems, and possibly shipping a sledgehammer with every card to facilitate insertion of hundreds of connector pins at a few ounces of force apiece.
MAKING A SIGNAL DISAPPEAR Although high-speed serial links may seem to deliver the Holy Grail of insertion force by reducing the number of interconnects by one or two orders of magnitude, they do not come without a price. The amount of consolidated bandwidth requires a high baud rate, mandating speedy rise and fall times. The system must deliver these high edge rates to the receiver by wires that must be insulated from each other. Unlike digital signals of less than 100 MHz, for which the resistance of interconnect wiring is the primary source of signal loss, at high edge rates (high frequency), a “skin effect” occurs. The electric fields in the wire cause electrons to conduct only in the “skin” of the conductor, thus dramatically increasing resistive loss compared to that of the bulk resistance of the conductor's full cross-section.
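To put rough numbers on the skin effect, the sketch below evaluates the classical skin-depth expression, sqrt(ρ/(π·f·μ)), for copper. The formula and the room-temperature resistivity value are standard textbook figures rather than anything taken from this article, and the function name is only illustrative.

```python
import math

MU_0 = 4e-7 * math.pi   # permeability of free space, H/m
RHO_COPPER = 1.68e-8    # assumed resistivity of copper near room temperature, ohm*m

def skin_depth_m(freq_hz, resistivity=RHO_COPPER, mu_r=1.0):
    """Classical skin depth in meters: sqrt(rho / (pi * f * mu))."""
    return math.sqrt(resistivity / (math.pi * freq_hz * MU_0 * mu_r))

# Skin depth shrinks as 1/sqrt(f), so resistive loss grows roughly as sqrt(f).
for f in (100e6, 1e9, 3e9, 10e9):
    print(f"{f / 1e9:5.1f} GHz: {skin_depth_m(f) * 1e6:.2f} um")
```

At 1 GHz this gives a skin depth of roughly 2 µm, so only a thin shell of a typical pc-board trace carries most of the current.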
To further complicate matters, this crowding of electrons at the conductor surface produces intense electric fields in the interface between the conductor and the insulating dielectric that has energy-absorbing electrostatic dipoles. This dielectric-loss component is much greater than resistive losses for interconnect attenuation above 500 MHz for a typical combination of trace geometry, built on typical printed circuit board materials (Figure 2). In this example, the dielectric losses are twice the resistive loss at 2 GHz and diverge to almost five times the conductive loss at 10 GHz. Digital designers from the “Land of Ohm” must face nature’s sleight of hand, in which energy at gigabit speeds becomes lost in a once benign and invisible medium: a conductor’s insulation. In addition, familiarity, manufacturability, availability, legacy, reliability, and cost considerations anchor the constraint of using homogeneous FR4 as the board material of first choice for pc boards and backplanes.
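The quoted ratios are consistent with a common first-order scaling rule, which is an assumption here and not something the article states: conductor (skin-effect) loss grows roughly as the square root of frequency, while dielectric loss grows roughly linearly with frequency, so their ratio grows as the square root of frequency. The sketch below anchors that rule at the article's 2-GHz figure; the function name and defaults are illustrative.

```python
import math

def dielectric_to_conductor_ratio(freq_ghz, ref_ghz=2.0, ref_ratio=2.0):
    """Ratio of dielectric loss to conductor loss under the assumed scaling.

    Assumes conductor loss ~ sqrt(f) and dielectric loss ~ f, so the ratio
    scales as sqrt(f). Anchored at the article's point: 2x at 2 GHz.
    """
    return ref_ratio * math.sqrt(freq_ghz / ref_ghz)

for f in (0.5, 2.0, 10.0):
    print(f"{f:4.1f} GHz: dielectric/conductor ~ {dielectric_to_conductor_ratio(f):.2f}")
```

Under these assumptions the two loss mechanisms are equal near 500 MHz and the ratio reaches about 4.5 at 10 GHz, in line with the "almost five times" read from Figure 2.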
THE MAGICAL PROPERTIES OF VIAS The curves in Figure 2 model losses in only a microstrip and do not include the effects from connectors, vias, and impedance discontinuities. Simply adding a 0.15-in.-long via stub to a signal trace creates 65 dB of further losses at 9 GHz compared with a backplane trace with no vias (Figure 3). These signal “suckouts” result from the resonance of the nonconnected portion of the via (the “stub”), yielding an approximate resonant frequency, f, in gigahertz of this one-quarter-wave LC structure in FR4 of: f ≈ k × 1/(4Lv), where k is a correction factor for stray reactance, L is the length of the via stub in inches, and v is the propagation delay in nanoseconds per inch, which equals 174 psec/in. (0.174 nsec/in.) for FR4.
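The quarter-wave expression is easy to evaluate directly. The sketch below is a minimal calculator using the article's FR4 figure of 0.174 nsec/in.; the function name is illustrative, and the k value of 0.94 is a hypothetical correction factor chosen only to show how the 0.15-in. stub lands near the 9-GHz suckout quoted above.

```python
def stub_resonance_ghz(stub_len_in, k=1.0, delay_ns_per_in=0.174):
    """Approximate quarter-wave resonance of a via stub, in GHz.

    f = k / (4 * L * v), with L in inches and v in nanoseconds per inch.
    """
    return k / (4.0 * stub_len_in * delay_ns_per_in)

# Uncorrected (k = 1) resonance of the 0.15-in. stub from the text.
print(f"k=1.00: {stub_resonance_ghz(0.15):.2f} GHz")
# Hypothetical stray-reactance correction that pulls the resonance to ~9 GHz.
print(f"k=0.94: {stub_resonance_ghz(0.15, k=0.94):.2f} GHz")
# Shorter stubs resonate at higher frequencies.
for length_in in (0.10, 0.05, 0.02):
    print(f"L={length_in:.2f} in.: {stub_resonance_ghz(length_in):.1f} GHz")
```

With k = 1 the 0.15-in. stub resonates at about 9.6 GHz; because the frequency is inversely proportional to stub length, shortening the stub moves the resonance upward.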
The novice faces the nonintuitive concept that a barrel of metal lacking an end-to-end electrical connection can resonate with strong electromagnetic fields (Figure 3a), thereby annihilating a signal that happens to share a summing node with that via. However, by using the full length of the via as respective input and output ports or by using counterbored or backdrilled vias, designers can model the resulting attenuation curve for an interconnect (Figure 3b) and measure the attenuation curve for a 1.25m trace on a GbX-connector-based, backdrilled, FR408 backplane from Molex (Figure 4). Although the via may appear to magically vanish, the backdrilling process simply tunes the resonant stub frequency to a substantially higher frequency and out of the information band of the transmitted signal.
Once the designer has achieved the lowest possible attenuation, what does a signal look like after it traverses the backplane? Inspection of the backplane’s attenuation curve in Figure 4 reveals that low frequencies are moderately attenuated, but there is also a linearly increasing attenuation of the signal with frequency. Severe signal degradation in the higher frequency components of a waveform (namely, in narrow, single-bit pulses and in its rising and falling edges) results in eye closure and ISI (intersymbol interference). For example, when you apply a PRBS (pseudorandom bit sequence) of high-speed NRZ data to a lossy channel, its high-frequency components become attenuated, and the system exhibits reduced amplitudes as pulses shrink to a single-bit width (Figure 5a). Rise and fall times are lowpass-filtered to the point at which the signal cannot reach full amplitude before signaling an adjacent complementary bit. The attenuation curve intuitively suggests that a high-frequency PRBS pattern would result in eye closure. The waveform shown in Figure 5b, captured at the far end of the FR408 Molex backplane, confirms this suspicion.
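The pulse-shrinking mechanism can be reproduced with a toy model. The sketch below passes an NRZ pattern through a first-order low-pass filter, which is only a crude stand-in for a lossy channel and not a model of the measured backplane; the filter coefficient, oversampling ratio, and bit pattern are arbitrary assumptions chosen for illustration.

```python
SAMPLES_PER_BIT = 16  # assumed oversampling ratio

def nrz(bits, samples_per_bit=SAMPLES_PER_BIT):
    """Map bits to +1/-1 levels, repeating each level samples_per_bit times."""
    return [1.0 if b else -1.0 for b in bits for _ in range(samples_per_bit)]

def lowpass(signal, alpha):
    """First-order IIR low-pass: y[n] = y[n-1] + alpha * (x[n] - y[n-1])."""
    out, y = [], signal[0]
    for x in signal:
        y += alpha * (x - y)
        out.append(y)
    return out

# Eight 0s, one isolated 1, eight 0s, then a run of eight 1s.
bits = [0] * 8 + [1] + [0] * 8 + [1] * 8
rx = lowpass(nrz(bits), alpha=0.1)

isolated_peak = max(rx[8 * SAMPLES_PER_BIT:9 * SAMPLES_PER_BIT])
run_peak = max(rx[17 * SAMPLES_PER_BIT:25 * SAMPLES_PER_BIT])
print(f"isolated single-bit peak: {isolated_peak:+.2f}")
print(f"peak after a long run:    {run_peak:+.2f}")
```

In this toy channel the isolated bit reaches only about 0.63 of the full level before the next bit pulls it back, while the long run settles essentially at full amplitude. That pattern-dependent amplitude is what closes the eye.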
Having improved the passive aspects of the attenuation curve, yet having achieved a disappointing closed eye at the receiver, there are two cards left for system designers to play. One is to use a significantly more expensive, higher performance board material that reduces the dominant dielectric-loss component, as the loss tangent of the material specifies. It would reduce the slope of the attenuation curve such that the far end of the backplane might achieve an eye opening. The resulting distance that you can drive on FR4 versus a lower loss material, with tan(δ) of about 0.01, is contrasted by the yellow and green 20-dB-loss curves in Figure 6.
Apart from reducing the slope of the attenuation curve through the use of exotic board materials, designers have a second choice that retains the low-cost, manufacturable aspects of FR4. The trick is to actively distort the frequency response of the overall system such that the serial channel has a flatter effective frequency response. Designers can accomplish this objective by either de-emphasizing low-frequency content or emphasizing high-frequency content of the transmitted and received signals themselves. Multiplying these channel responses and the harmonic content of a predistorted or equalized signal results in a flatter equivalent-frequency response of the signal from transmitter to receiver.
Why de-emphasize low-frequency content? Back in 1966, RL Johnson of RCA Labs postulated the Johnson Limit: For a given transistor structure and process, the product of breakdown voltage, VBR, and the transistor speed, ft, is limited. The Johnson Limit for CMOS is about 75 V·GHz, meaning that a 3.3V I/O transistor (VBR ≈ 4.5V for the process) has about 16 GHz of bandwidth. After considering parasitics, this value is suitable for the edge rates you need in a multigigabit driver for a 90-nm FPGA operating as fast as 6.375 Gbps. The finite headroom on the output driver limits the total voltage compliance. Thus, physics limits the peak-to-peak output. For high-speed, low-jitter output drivers in a 90-nm CMOS process, the intentional predistortion, manifested as high-frequency overshoot, is subject to these limitations, making it appropriate to reduce the low-frequency portion of a waveform to yield the desired ratio of intentional overshoot to “steady-state” levels. This ratio has the overall effect on the launched waveform of decreasing low-frequency components to compensate for the lower attenuation at low frequencies in the FR4 material. In practice, designers use multitap FIR filters for compensation in the transmitter, in which, at 6.375 Gbps, one pre-tap and two post-taps are sufficient for FR4 (specifically, FR408) backplane applications up to 1.25m. The predistorted transmit eye, using NRZ and PRBS, appears at the transmitting driver (Figure 7); Figure 5c shows the resulting eye at the receiver, after 1.25m of FR4, with two Molex GbX connectors, a 2m coaxial test cable, and five SMA connectors.
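A transmit FIR with one pre-tap and two post-taps can be sketched in a few lines. The tap weights below are hypothetical, not values from any device; they are normalized so that the sum of their magnitudes is 1, which stands in for the fixed driver headroom described above. The function names are likewise illustrative.

```python
import cmath
import math

# Hypothetical taps: [pre, main, post1, post2]; sum of magnitudes = 1.0
TAPS = [-0.05, 0.70, -0.20, -0.05]

def fir_filter(symbols, taps, n_pre=1):
    """Apply a symbol-spaced FIR; the main (cursor) tap sits at index n_pre."""
    out = []
    for n in range(len(symbols)):
        acc = 0.0
        for k, c in enumerate(taps):
            idx = n + n_pre - k  # k < n_pre weights a later symbol (pre-cursor)
            if 0 <= idx < len(symbols):
                acc += c * symbols[idx]
        out.append(acc)
    return out

def magnitude(taps, f_norm):
    """|H(f)| with f_norm given as a fraction of the symbol rate."""
    return abs(sum(c * cmath.exp(-2j * math.pi * f_norm * k)
                   for k, c in enumerate(taps)))

symbols = [-1.0] * 4 + [1.0] * 6   # a run of 0s followed by a run of 1s
out = fir_filter(symbols, TAPS)
print(f"level on the first bit after the transition: {out[4]:+.2f}")
print(f"settled level inside the run:                {out[6]:+.2f}")
boost_db = 20 * math.log10(magnitude(TAPS, 0.5) / magnitude(TAPS, 0.0))
print(f"Nyquist-to-DC gain ratio: {boost_db:.1f} dB")
```

With these assumed taps the first bit after a transition launches at 0.90 of full swing while the level inside a long run settles at 0.40, a relative high-frequency boost of about 7 dB obtained entirely by lowering the low-frequency level, never by exceeding the peak swing.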
To further open the eye, designers can boost the high-frequency content at the receiver. Here, it is possible for designers to tune (equalize) the response of the receiver amplifier so that it exhibits a higher gain at high frequencies (Figure 8). With careful design, system or board designers can dynamically program-in an optimized amount of boost, because excessive boost decreases the SNR of the received signal, and insufficient boost may result in insufficient edge rates and eye opening. Figure 9 shows the result of having an optimized level of equalization. The figure depicts a noncompensated eye and equalized eye, respectively, after a non-pre-emphasized PRBS signal has traversed 107 cm of FR4 at 6.375 Gbps.
SIGNAL-INTEGRITY-OPTIMIZED FPGAs Designers can adjust pre-emphasis, equalization, and transmitting-drive levels to compensate for the frequency-dependent attenuation effects of connectors, discontinuities, and losses in a backplane, or even those of cable. Figure 10 shows 5.5m of standard RJ45-connectorized Category 5 wire at 3.125 Gbps using FPGA pre-emphasis capabilities. Altera’s Stratix II GX 90-nm FPGA with multigigabit transceivers, for example, offers more than 5000 possible combinations of these settings, making optimization of signal integrity a nightmare in systems with multiple and varying channel lengths or properties. During the design phase of the system, designers can measure s-parameters for each card-slot position. They can then run the PELE (pre-emphasis-and-equalization-link-estimator) program, which uses The MathWorks’ (www.mathworks.com) Matlab code and incorporates an FPGA-multigigabit-transceiver-specific model, to determine the optimal pre-emphasis, drive level, and equalization for each channel in the system. The dynamic configurability of an FPGA then comes into play. Designers can read the backplane-slot ID and then change settings to optimize the eye for each blade position without altering the configuration of the FPGA core. Designers can enable other system capabilities by using a small controller in the FPGA to automate the receiver equalizer, making it adaptive to system, environmental, and component variations.
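The slot-ID lookup described above amounts to a small table keyed by blade position. The sketch below shows one way such a table might be organized in control software; every name and setting value in it is hypothetical, and it does not represent any Altera API or the output of the PELE program.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class LinkSettings:
    """Hypothetical per-channel transceiver settings (arbitrary index units)."""
    pre_emphasis: int
    equalization: int
    drive_level: int

# Hypothetical table, as might be produced offline from per-slot s-parameters.
SLOT_SETTINGS = {
    1: LinkSettings(pre_emphasis=2, equalization=1, drive_level=3),  # short trace
    2: LinkSettings(pre_emphasis=5, equalization=3, drive_level=4),
    3: LinkSettings(pre_emphasis=9, equalization=6, drive_level=6),  # longest trace
}

# Conservative fallback for a slot that has no characterized entry.
DEFAULT_SETTINGS = LinkSettings(pre_emphasis=5, equalization=3, drive_level=4)

def settings_for_slot(slot_id):
    """Return the stored settings for a backplane slot, or the fallback."""
    return SLOT_SETTINGS.get(slot_id, DEFAULT_SETTINGS)

for slot in (1, 3, 7):
    print(slot, settings_for_slot(slot))
```

Keeping the table separate from the code that applies it means a new backplane characterization changes only data, which fits the article's point that the transceiver settings change without touching the FPGA core configuration.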
FPGAs provide ready access to the density and performance that advanced process nodes offer, enabling system designers to develop highly integrated SoCs (Systems-on-Chips) in the shortest amount of time. FPGA vendors also offer debugged protocol IP, thus reducing R&D. FPGA packaging and design techniques mitigate simultaneous-switching-noise effects on all I/O, including serial links, allowing designers to use a large number of parallel I/O for devices such as external memory and ASICs. To address the issues of interconnecting large amounts of information—either intrasystem or intersystem—FPGAs are available now with large counts of multigigabit serial I/O. These high-speed, multigigabit transceivers have kept pace with the system need of remaining on low-cost FR4 materials, and provide lower bit-error rate, higher interconnect bandwidth, and lower power dissipation than discrete serializer/deserializer devices.
As CMOS-process nodes continue to shrink below 100 nm, most, if not all, systems will use an FPGA with multigigabit serial interconnects or a structured ASIC for integrating systems onto a chip. The advanced signal-conditioning techniques that this article presents, which are available in high-end FPGAs, will appear rather mundane in the coming decades, particularly as device speeds push into the tens of gigahertz of operation and as the magical tricks of signal-integrity wizards become common knowledge.
ACKNOWLEDGMENTS The author would like to thank Steve McKinney at Mentor Graphics; Eric Bogatin at www.bethesignal.com; Gourgen Oganessyan at Molex; and Leonard Dieguez, Tina Tran, Mark Flanigan, Naresh Raman, Venkat Yadavalli, Samson Tan, Michael Woo, and Dave Greenfield at Altera.
AUTHOR’S BIOGRAPHY Andy Turudic is a senior manager for Altera’s high-end-FPGA product line. He is currently investigating advanced, high-speed applications and signal integrity of multigigabit FPGAs in backplanes and cables. He has been involved in research, development, applications engineering, and marketing of high-speed serial communications, PLLs, and mixed-signal devices for more than 26 years. He has a bachelor’s degree in electrical engineering from the University of Windsor (Windsor, Ontario, Canada), holds eight US patents, and is a senior member of the IEEE.