Get Complete Project Material File(s) Now! »

## Linear Constant-Coefficient Difference Equations

A convenient way of describing LTI systems is using constant-coefficient difference equations (CCDE). CCDEs define the relationship in time domain between an input signal u(k) and the output y(k) as N1 N2 X X ai y(k ¡ i) ˘ bi u(k ¡ i). (1.10).

Throughout the manuscript we restrict ourselves to the case of real coefficients ak and bk. We can normalize the coefficients in order to have a0 ˘ 1, so that the above equation is rearranged as N2 N1 X X ai y(k ¡ i). y(k) ˘ bi u(k ¡ i) ¡ (1.11).

CCDE (1.11) describes an IIR linear system. If 8i ¨ 0, ai ˘ 0, it describes a FIR system. The difference may be interpreted in the following way: in FIR filters the output is computed solely out of the input signal, while the output of the IIR filters also depends on the previous N2 outputs. Everywhere in the thesis we consider filters to have zero initial conditions, i.e. in response to the zero signal, a zero signal is generated.

### Quantization and computational errors

We have already seen that in general rounding errors occur during the conversion of a real (exact) number to some FxP format. Further we will refer to the rounding errors of coefficients of the filter as quantization errors. Bounds on the quantization errors can be deduced with respect to the quantization step and rounding mode.

Consider a real number t 2 R that is represented in FxP with format (m,‘). Denote by ¢t the quantization error between t and its FxP approximation tˆ: ¢t ˘ tˆ ¡ t. The bounds on ¢t for different rounding modes are summarized [6] in Table 2.1.

Apart from the quantization errors, in the general case computational errors are induced after every arithmetic operation on FxP numbers. Usually, in the computational units the result of intermediate operations is stored with larger precision and then rounded to the initial precision. The errors due to this rounding can propagate and be significantly amplified.

Example 2.3. Let cˆ be some number rounded to FxP format (mc,‘c) and let t be a variable (exact quantity) in FxP format (mt,‘t). The product of these two numbers can be exactly representable on wc ¯ wt bits.

Suppose the rounding error ¢c is bounded in its absolute value j¢c j ˙ 2‘. In other words, the last bit of cˆ is potentially erroneous. Then, by using the classic long multiplication method we may remark that this potentially false bit will propagate through the multiplication in such way that the last wt ¡1 bits of the product become potentially false, too. Therefore, we obtain that the error between the exact product c ¢ t and cˆ › t (here › means FxP product) is bounded in absolute value by 2wt¡1. When implementing FxP algorithms, the classic problem is to estimate an upper bound on the error of some operation, given the formats of the operands. However, during the implementation of digital filters we will look at this problem in another way: given the required upper bound on the output error, determine the least formats of the operands that guarantee this error.

The reader may have remarked that filter structures (e.g. state-space) are often based on the computation of Sums-of-Products by Constants (SOPCs). To ensure some properties of the implemented filters we need to define how these Sums of Products are computed in FxP arithmetic. Further in this thesis we will rely on the following result presented and proved in [6, 41, 42]. Proposition 2.1 (Faithfully rounded Sums-of-Products). Let c1, c2, . . . , cn be some quantized FxP numbers with respective formats (mci ,‘ci ) and t1, t2, . . . , tn be FxP variables with respective formats (mti ,‘ti ). Suppose we need to compute the SOPC S ˘ Pn ci ti in FxP arithmetic and i˘1 represent the output in the format (ms,‘s) (and we know that there will be no overflow).

• the results of FxP multiplications are stored on wci ¯ wti bits, and.

• the suml is performedm on the extended by g guard bits FxP format2 (ms,‘s ¡ g), where g ˘ log2(n) ¯1, and then rounded to the final format (ms,‘s) (see Figure 2.3).

we can guarantee that the FxP result is a faithful rounding of the exact sum of product, i.e. the absolute error is bounded by 2‘s .

#### Normalized IEEE 754 Binary Floating-Point

The 2008 version of the IEEE standard supports radix-2 and radix-10 systems. In this thesis we consider only radix-2 floating-point numbers. According to the standard, a radix-2 FP number is written as: x ˘ (¡1)s ¢ m ¢2e, (2.9).

where:

• s 2 {0,1} is the sign bit.

• the exponent e 2 [emin, emax], with emin and emax specified for each standard format.

• the mantissa m is represented with p bits (p defined by the standard for each format) is normalized to the form 1.m1m2…mp¡1. In other words, only p ¡1 trailing mantissa bits are stored. In contrast to Fixed-Point Arithmetic, the exponent is stored explicitly along with the mantissa and is not fixed (thus the name, “floating-point”). The actual exponent e is biased in order to be stored as a positive integer: an exponent stored on w bits is biased by b ˘ 2w¡1 ¡ 1. The smallest positive normal4 number is 2emin . The largest finite binary floating-point number is (2 ¡21¡p) ¢2emax . Numbers jx j ˙ 2emin are called subnormal numbers and not all systems support them. In this work we do not consider sub-normal numbers.

**Multiple Precision Arithmetic**

Sometimes bounding the error of our implementation will not suffice: the error bound may be relatively small but still too large in its absolute vale. This usually means that the precision of intermediate computations must be increased.

There exist a quite a few solutions for extending the precision of computations: the IEEE 754 standard proposes a few large and a few extended formats; exact arithmetic (rational numbers, continued fractions, etc.); the super-accumulator of Kulisch [56]; floating-point expansions [57]; multiple precision arithmetic [58, 59], etc.

In our algorithms we will need to change the precision dynamically and non-homogeneously (i.e. different variables may have different precision). To satisfy these requirements, we use Multiple Precision (MP) floating-point arithmetic. In particular, we are going to use GNU MPFR library [59] for our implementations.

We will often refer to some outputs of algorithms as being computed with a priori error bound. In fact, what we understand by this is that given a bound on the error, the algorithm dynamically adapts the precision of internal computations such that the computational errors are not larger than the given error bound and the output of the algorithm is returned as a multiple precision number. In such algorithms we will take care to not overestimate the required internal precision.

**Interval Arithmetic**

Another way to deal with the uncertainties and rounding errors in the computations is Interval Arithmetic. The term “interval arithmetic” dates from the 1950s due to works of Moore [60]. We refer the reader to [61–63] for a detailed overview of interval arithmetic. An interval, denoted [x] ˘ [x, x], is a closed and bounded nonempty interval: [x] ˘ £x, ⁄˘ ‘x 2 Rjx • x • “, (2.13).

where x and x are called lower and upper bounds respectively and x • x. In the case x ˘ x, we will call [x] a point-interval. Sometimes, we will use the mid-rad representation of an interval: [x] ˘ hxm, xri, (2.14).

**Finite Precision Effects for IIR filters**

Before actually implementing IIR filters, we need to ascertain the extend to which its performance will be degraded by finite-precision effects. And, if the degradation is not acceptable, find a solution. Usually, the output error of the filter can be reduced by increasing the precision of computations and of the filter’s coefficients but at the expense of an increased cost.

The filter degradation in both software and hardware depends on three main factors:

• quantization of the filter’s coefficients.

• specification of arithmetic and wordlengths and.

• structure, i.e. algorithm for the evaluation of the filter.

**Table of contents :**

**I Technical Pre-requisites **

**1 Digital filters **

1.1 Discrete-time signals

1.2 Discrete-time systems

1.3 Z-transform

1.4 Design of IIR filters

1.5 Filter Structures

1.6 Conclusion

**2 Computer Arithmetic **

2.1 Fixed-Point Arithmetic

2.2 Floating-Point Arithmetic

2.3 Finite Precision Effects for IIR filters

2.4 Conclusion

**3 Towards reliable implementation of Digital Filters **

3.1 Automatic Filter Code Generator

3.2 Specialized Implicit Form

3.3 Conclusion

**II Improvements to the Specialized Implicit Form **

**4 Specialized Implicit Form for Lattice Wave Digital Filters **

4.1 Lattice Wave Digital Filters

4.2 A LWDF-to-SIF conversion algorithm

4.3 Conversion example

4.4 Conclusion

5 General algorithms for conversion 65

5.1 Conversion of data-flow graphs to SIF

5.2 Conversion of arbitrary structures to transfer functions

5.3 Numerical examples

5.4 Conclusion

**III Reliable Fixed-Point Implementation of Digital Filters **

**6 Reliable evaluation of the dynamic range of an exact filter **

6.1 State of the Art

6.2 Algorithm for Worst-Case Peak Gain evaluation

6.3 Truncation order and truncation error

6.4 Summation

6.5 Basic bricks

6.6 Numerical examples

6.7 Extending the WCPG theorem to the range of the state variables

6.8 WCPG for interval systems

6.9 Conclusion

**7 Determining Reliable Fixed-Point Formats **

7.1 Determining the Fixed-Point Formats

7.2 Taking rounding errors into account

7.3 Error analysis of the MSB computation formula

7.4 Complete algorithm

7.5 Numerical results

7.6 Application to the Specialized Implicit Form

7.7 Conclusion

7.8 Ongoing work: Off-by-One Problem

7.9 Ongoing work: Taking into account the spectrum of the input signal

**8 Rigorous verification of implemented filter against its frequency specification **

8.1 Problem statement

8.2 Verifying bounds on a transfer function

8.3 Verifying bounds for any LTI realization

8.4 Numerical examples

8.5 Conclusion

**IV Hardware Code Generation **

**9 LTI filters computed just right on FPGA. Implementation of Direct Form I **

9.1 Introduction

9.2 Error analysis of direct-form LTI filter implementations.

9.3 Sum of products computing just right

9.4 Implementation results

9.5 Conclusion

**Conclusion and Perspectives **

**A Appendix **

1 Lattice Wave Digital Filter basic brick data-flows

2 Coefficients for the examples

3 Off-by-One problem

**Bibliography **