How to achieve earlier, faster subsystem performance analysis and debugging

November 04, 2014 // By Nick Heaton and Avi Behar, Cadence
To understand system performance, engineers have traditionally relied on testbenches that model corner-case scenarios capable of causing performance bottlenecks. Building them is a time-consuming, manual process, and it is difficult to mimic the real traffic conditions that make hardware debugging productive.

This article discusses technologies and techniques that make it possible to model realistic traffic that taxes the interconnect early in the design process, so that performance bottlenecks can be identified and resolved quickly.

Introduction

Corner cases (the exceptional, unexpected scenarios or sequences of events that wreak havoc on otherwise well-behaved designs) happen. While you may not be able to prevent them, you can model them and debug the hardware in your design to minimise their impact.

Understanding system performance calls for a considerable investment in testbenches that put your system through the corner-case scenarios that cause performance bottlenecks. Done manually, this can involve weeks or even months of testbench coding, and that is before accommodating changes in the design. Who can afford this investment in time and resources? And once you’ve detected the performance bottlenecks, how can you efficiently find and debug their causes?

Fortunately, technologies and techniques are available that help you automate testbench creation and accurately model the kind of traffic a given design is expected to experience. With them, you can carry out cycle-accurate analysis of bandwidth and latency in your design.

Cycle-accurate performance analysis

Traditionally, generating the kind of realistic traffic that will stress a system-on-chip (SoC) interconnect has involved a lot of waiting. After all, only at the end of the register-transfer level (RTL) simulation stage would you have all of the intellectual property (IP) and associated software drivers in place. And the closer you are to the end of your design cycle, the costlier it is to make changes.

Another solution is to model all of the IP in SystemC and run early versions of the software on top. This approach has many limitations, not least that the models are not cycle-accurate. Worse, many components of the SoC infrastructure may be extremely complex and are often supplied by third-party vendors (the ARM CoreLink CCI-400 Cache Coherent Interconnect is one example of such IP). This limits the availability of models and may force analysis to be deferred to the RTL stage.

Ideally, performance analysis simulations would run on the cycle-accurate RTL of the interconnect subsystem. In this approach, we would add critical IP such as the DDR controller, and remove the dependency on the availability of other IP by replacing those blocks with traffic synthesisers that drive realistic traffic patterns representative of the IP they replace.
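To make the idea of a traffic synthesiser concrete, here is a minimal Python sketch of one. It is not how Cadence Verification IP generates traffic: the `Transaction` fields, the exponential inter-arrival model, the 16-byte (128-bit) data beat and all parameter names are assumptions chosen for illustration. The generator issues bursts at random intervals with a configurable read/write mix and burst-aligned addresses inside one memory region, which is roughly the shape of traffic a replaced master such as a DMA engine might present.

```python
import random
from dataclasses import dataclass
from typing import Iterator


@dataclass(frozen=True)
class Transaction:
    cycle: int       # cycle at which the synthesiser issues the request
    is_write: bool
    address: int
    burst_len: int   # number of data beats in the burst


def synthesise_traffic(
    num_txns: int,
    mean_gap_cycles: float,
    write_ratio: float,
    base_address: int,
    address_range: int,
    burst_len: int,
    seed: int = 0,
) -> Iterator[Transaction]:
    """Yield a stream of transactions approximating one replaced IP block.

    Assumptions (illustrative only): exponentially distributed gaps between
    requests, a 16-byte data beat, and base_address aligned to the burst size.
    """
    rng = random.Random(seed)          # seeded so runs are reproducible
    beat_bytes = 16                    # assumed 128-bit data bus
    block = burst_len * beat_bytes     # bytes covered by one burst
    cycle = 0
    for _ in range(num_txns):
        # At least one cycle between requests keeps issue times strictly increasing.
        cycle += max(1, round(rng.expovariate(1.0 / mean_gap_cycles)))
        # Burst-aligned offset within the assumed address window.
        offset = rng.randrange(address_range // block) * block
        yield Transaction(
            cycle=cycle,
            is_write=rng.random() < write_ratio,
            address=base_address + offset,
            burst_len=burst_len,
        )
```

Changing the gap, write ratio or address window yields a different traffic profile without touching any RTL; in a real testbench the equivalent knobs would live in protocol-level sequences rather than Python objects.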

Coupling this approach with a tool capable of automating the creation of the necessary testbench would greatly reduce the effort and risk associated with manual testbench creation. This matters all the more because experience shows that interconnect configurations change frequently during the design cycle.

GUI-based tool automatically generates testbenches

Cadence’s Interconnect Workbench is a tool with two major capabilities. First, it automatically generates testbenches tailored for functional verification and performance analysis of complex interconnect subsystems. Second, it provides a powerful GUI for analysing the performance metrics collected while running simulations with the generated testbenches. These testbenches use Cadence Verification IP to replace selected IP blocks in your design, which speeds up simulation and gives you finer control over simulation traffic. Verification IP monitors can assess traffic at each of your interconnect ports. The GUI has built-in filters for choosing the masters, slaves, and paths that you want to evaluate, making cumbersome spreadsheets redundant. Rather than requiring multiple lengthy simulations, the tool can quickly identify the critical paths for debugging.
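The path filtering and critical-path ranking described above can be illustrated with a small Python sketch over per-transaction records of the kind a port monitor might log. The `Record` layout, the optional master/slave filters and the choice of worst-case latency as the ranking criterion are assumptions for illustration, not the tool's actual data model or algorithm.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Record:
    master: str
    slave: str
    start_cycle: int  # request observed at the master port
    end_cycle: int    # response completed back at the master port


def rank_paths(
    records: Iterable[Record],
    masters: Optional[set[str]] = None,
    slaves: Optional[set[str]] = None,
) -> list[tuple[tuple[str, str], float, int]]:
    """Return (path, mean latency, max latency) per master->slave path.

    Paths are sorted worst max latency first; masters/slaves, if given,
    restrict which records are considered (None means no filter).
    """
    latencies: dict[tuple[str, str], list[int]] = defaultdict(list)
    for r in records:
        if masters is not None and r.master not in masters:
            continue
        if slaves is not None and r.slave not in slaves:
            continue
        latencies[(r.master, r.slave)].append(r.end_cycle - r.start_cycle)

    summary = [(path, sum(lat) / len(lat), max(lat)) for path, lat in latencies.items()]
    summary.sort(key=lambda row: row[2], reverse=True)  # critical path first
    return summary
```

Ranking by worst case rather than mean is a design choice: a path with a modest average but occasional long stalls is often the one worth debugging first.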

By using Interconnect Workbench on its SoC, one leading communications technology company reduced its interconnect verification effort from eight man-months down to one man-month, gaining important insights into latency, bandwidth, and outstanding transaction depth.
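Latency, bandwidth and outstanding transaction depth can all be derived from a list of transactions with issue and completion cycles. The Python sketch below shows one way to compute them. It assumes each transaction records its request cycle, final response cycle and payload size, that bandwidth is averaged over the whole observation window, and that a transaction counts as outstanding from its request cycle up to (but not including) its completion cycle; these definitions are illustrative assumptions rather than the tool's exact metrics.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Txn:
    start_cycle: int  # cycle the request was accepted
    end_cycle: int    # cycle of the last response beat
    num_bytes: int    # payload size


def summarise(txns: list[Txn], clock_hz: float) -> dict[str, float]:
    """Compute latency, average bandwidth and peak outstanding depth."""
    if not txns:
        raise ValueError("need at least one transaction")

    latencies = [t.end_cycle - t.start_cycle for t in txns]
    window = max(t.end_cycle for t in txns) - min(t.start_cycle for t in txns)
    total_bytes = sum(t.num_bytes for t in txns)

    # Sweep line: +1 when a transaction is issued, -1 when it completes.
    # Tuples sort (cycle, -1) before (cycle, +1), so a transaction ending on the
    # same cycle another starts is not counted as overlapping it.
    events = [(t.start_cycle, 1) for t in txns] + [(t.end_cycle, -1) for t in txns]
    events.sort()
    depth = max_depth = 0
    for _, delta in events:
        depth += delta
        max_depth = max(max_depth, depth)

    return {
        "avg_latency_cycles": sum(latencies) / len(latencies),
        "max_latency_cycles": max(latencies),
        "bandwidth_bytes_per_s": total_bytes / window * clock_hz,
        "max_outstanding": max_depth,
    }
```

The event sweep finds the peak outstanding depth in O(n log n) without stepping through every simulated cycle.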

Here’s a summary of what Interconnect Workbench can do:

- Automatically generate Universal Verification Methodology (UVM)-compliant performance and verification testbench code from ARM CoreLink AMBA Designer output (interconnect fabric RTL and IP-XACT metadata)

- Deliver cycle-accurate performance analysis, plus a performance analysis cockpit that lets you visualise, discover, and debug system performance behaviours

- Collect all transactions and verify the correctness and completeness of data as it passes through the SoC interconnect fabric, via integration with Cadence Interconnect Validator Verification IP
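Checking correctness and completeness through a fabric is, at its core, a scoreboard comparison between what enters and what leaves. The Python sketch below is a deliberately simplified illustration, not the Interconnect Validator's algorithm. It assumes each transaction carries a unique end-to-end `txn_id` and that address and data are unchanged in transit, whereas real interconnects may remap IDs and addresses, reorder responses, and split or merge bursts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Observed:
    txn_id: int    # assumed unique end-to-end tag (real fabrics often remap IDs)
    address: int
    data: bytes


def check_end_to_end(sent: list[Observed], received: list[Observed]) -> dict[str, list[int]]:
    """Compare transactions seen entering the fabric with those seen leaving it.

    missing:    sent but never received (completeness failure)
    corrupted:  received with a different address or data (correctness failure)
    unexpected: received without a matching sent transaction
    """
    expected = {o.txn_id: o for o in sent}
    report: dict[str, list[int]] = {"missing": [], "corrupted": [], "unexpected": []}
    seen: set[int] = set()

    for o in received:
        exp = expected.get(o.txn_id)
        if exp is None:
            report["unexpected"].append(o.txn_id)
            continue
        seen.add(o.txn_id)
        if (o.address, o.data) != (exp.address, exp.data):
            report["corrupted"].append(o.txn_id)

    report["missing"] = sorted(set(expected) - seen)
    return report
```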

Figure 1: Data flow through Interconnect Workbench. RTL, Verification IP, and traffic pattern descriptions feed into the tool, which automatically generates a testbench for simulation. As other SoC variants are generated, the tool can generate additional testbenches. The performance GUI provides an overview of simulation results. Performance metrics can also be collected from manually created testbenches, as long as they include an instance of the Interconnect Validator.
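Regeneration is what makes configuration changes cheap: the testbench structure is derived from the interconnect description rather than written by hand. The Python sketch below illustrates only that idea. The `Port` fields, the component naming and the rule of monitoring every port while replacing non-kept IP with active models are assumptions; the real tool reads IP-XACT metadata and emits UVM code, not strings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Port:
    name: str
    role: str      # role of the attached IP block: "master" (e.g. CPU) or "slave" (e.g. DDR)
    protocol: str  # e.g. "ACE", "AXI4"


def plan_testbench(ports: list[Port], keep_rtl: set[str]) -> list[str]:
    """Derive a (hypothetical) testbench component list from a port list.

    Every port gets a passive monitor. IP named in keep_rtl stays as RTL;
    any other master is replaced by a traffic synthesiser and any other
    slave by a responder model.
    """
    components: list[str] = []
    for p in ports:
        components.append(f"monitor:{p.name}:{p.protocol}")
        if p.name in keep_rtl:
            components.append(f"rtl:{p.name}")
        elif p.role == "master":
            components.append(f"traffic_synth:{p.name}:{p.protocol}")
        else:
            components.append(f"slave_model:{p.name}:{p.protocol}")
    return components
```

When a port is added or an IP block is swapped in as RTL, re-running the function on the updated description yields the new topology directly.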
