New Multiplier-Accumulator accelerates polynomials

November 29, 2016 // By Samuele Raffaelli & David Vincenzoni, Azcom Technology and STMicroelectronics
The classical Multiply and Accumulate (MAC) architecture represents the best solution for the implementation of many general purpose algorithms. This structure is found in DSPs, and also in some microcontrollers to increase computational performance. In this article, we will describe a slightly different MAC architecture, one that more efficiently computes polynomials. The data path is IEEE 754 single precision floating-point, but of course could also be fixed-point or double-precision floating-point.

Classical MAC architecture

The classical MAC unit computes the product of two numbers and adds the product to an accumulator (Z), thus implementing the following operation:

 

Z ← Z + (A × B)

 

implemented as:

 

 

Figure 1. Classical MAC architecture

 

This is a general-purpose architecture widely used to solve algorithms that can be converted into a sum of multiplications (e.g., filters, polynomial computation).

 

Polynomial computation and Horner’s method

Being that every real function can be approximated by a polynomial of sufficient degree (think of a Taylor series), such a computation should be considered whenever function evaluation has to be performed. To this aim, let’s consider Horner’s method, which represents the best known algorithm to compute polynomials, requiring a minimum number of arithmetic operations.

 

Given the real polynomial:

 

p(x) = an·xn + an-1·xn-1 + ... + a1·x + a0

 

Evaluating at x using Horner’s method, we proceed through the following steps:

 

zn = an

zn-1 = zn·x + an-1

z0 = z1·x + a0 = an·xn + an-1·xn-1 +...+ a1·x + a0 = p(x)

 

For polynomial degree n, calculation of p(x) requires only n additions and n multiplications (or n multiply-adds in Horner’s steps).