Transform slow software into fast hardware with Vivado HLS

September 03, 2014 // By David C. Black, Doulos
Learn how HLS enables speed-ups in your software: an exploration of some issues involved when accelerating software with high-level synthesis (Vivado HLS). This article provides an overview of the tasks and knowledge needed to perform such a transformation using the Zynq All Programmable SoC as the foundation for the design.

Have you ever written some software that, despite your best coding efforts, didn’t run as fast as desired? I have.

Have you thought, “If only there were an easy way to put some of the code into multiple custom processors or custom hardware that wasn’t so expensive?” After all, your application is one of many, and custom hardware takes time and money to create. Or does it?

I began rethinking this proposition recently when I heard about the Xilinx high-level synthesis tool, Vivado HLS. In combination with the Zynq-7000 All Programmable SoC, which combines a dual-core ARM Cortex-A9 processor with an FPGA fabric, high-level synthesis opens up new possibilities in design. This class of tools creates highly tuned RTL from C, C++ or SystemC source code. Many purveyors of this technology exist, and the rate of adoption has been increasing in recent years.

So, how hard would it be to migrate some of that slow code into hardware, if indeed I could simply use Vivado HLS to do the more demanding computations? After all, I usually write my code in C++, and Vivado HLS uses C/C++ as an input. The presence of the ARM processor cores means I could run the bulk of my software in a conventional environment. In fact, Xilinx has even made available a software development kit (SDK) and PetaLinux for this purpose.

Architectural concerns

As I started to think about this transformation from a software perspective, I grew concerned about the software interface. After all, HLS creates hardware that communicates through hardware interfaces, not function calls. I needed something easy to access from software, such as a coprocessor or hardware accelerator, to make the code run faster. Also, I didn’t want to write a new compiler. To make it easy to exchange data with the rest of the software, the interface needed to look like simple memory locations where I could place the inputs and later read back the results.
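
To make the "simple memory locations" idea concrete, here is a minimal C++ sketch of how the software side of such an accelerator might look. The register layout, the bit definitions and the names `AccelRegs` and `run_accel` are all hypothetical, invented for illustration; they are not the layout that any particular tool generates. On real hardware the pointer would refer to the accelerator's address range; here it is a parameter so that the routine can also be exercised against an ordinary struct in memory.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical register block for an imagined accelerator (an assumption for
// this sketch, not a layout produced by any specific tool).
struct AccelRegs {
    std::uint32_t ctrl;    // software writes kStartBit here to begin
    std::uint32_t status;  // hardware is assumed to set kDoneBit when finished
    std::uint32_t arg_a;   // first input
    std::uint32_t arg_b;   // second input
    std::uint32_t result;  // output, valid once kDoneBit is set
};

constexpr std::uint32_t kStartBit = 1u;  // hypothetical bit assignment
constexpr std::uint32_t kDoneBit  = 1u;  // hypothetical bit assignment

// Place the inputs, start the accelerator, wait, and read back the result.
// 'volatile' keeps the compiler from caching or reordering the accesses.
std::uint32_t run_accel(volatile AccelRegs* regs,
                        std::uint32_t a, std::uint32_t b) {
    assert(regs != nullptr);
    regs->arg_a = a;
    regs->arg_b = b;
    regs->ctrl = kStartBit;
    while ((regs->status & kDoneBit) == 0) {
        // Busy-wait; a real driver would more likely sleep or use an interrupt.
    }
    return regs->result;
}
```

From the caller's point of view, `run_accel` is just a function that writes a few words and reads one back, which is the property I was looking for.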

Then I made a discovery. Vivado HLS can generate an AXI slave interface for a design with relatively little effort. This capability started me thinking that an accelerator might not be so difficult to create after all. Thus, I found myself coding up a simple example to explore the possibilities. I was pleasantly surprised with how it turned out.
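
As a rough illustration of what "relatively little effort" can mean, the sketch below marks the arguments and return value of a C++ function for an AXI4-Lite slave interface using `INTERFACE` pragmas. The function `accel_top`, its sum-of-squares computation and the bundle name `CTRL` are hypothetical, chosen only for this example. The `s_axilite` spelling is the one used by more recent Vivado HLS releases; the directive has changed between tool versions, so treat this as an assumption and check the user guide for your release. An ordinary C++ compiler ignores the pragmas, which is what allows the same function to be checked in plain software first.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical top-level function for synthesis. The pragmas ask the tool to
// expose both arguments and the return value through one AXI4-Lite slave.
std::uint32_t accel_top(std::uint32_t a, std::uint32_t b) {
#pragma HLS INTERFACE s_axilite port=a bundle=CTRL
#pragma HLS INTERFACE s_axilite port=b bundle=CTRL
#pragma HLS INTERFACE s_axilite port=return bundle=CTRL
    return a * a + b * b;  // placeholder computation
}

// Plain-software check of the function, in the spirit of a C testbench.
// In a real project this would normally live in a separate testbench file.
void check_accel_top() {
    assert(accel_top(3, 4) == 25);
    assert(accel_top(0, 0) == 0);
}
```

The point of the sketch is that the source stays ordinary C++: the interface choice is expressed as annotations rather than as hand-written bus logic.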

Let’s take a walk through the approach I took and consider the results.
