Digital Signal Processors (DSP)

Get Complete Project Material File(s) Now! »

Reverberation effect

It is a widely used effect. Reverb gives the listener a room perception and as a result simulates the actual reverb one might experience in a closed-space environment. It in fact adds the multiple reflections from the room walls as an actual wave sound travels back and forth in a closed-space environment. Reverberation characteristics in real world are hugely dependant on the size of the room, wall material as well as items which are located inside the room e.g. furniture. Thus, it can easily be concluded that reverb is an important part in the 3D Audio sound experience.
From the source there is normally a direct path to the listener but the sound waves can be reflected from walls and other surfaces in a room to the listener. These waves travel a longer path and arrive at a later time, with less energy. This is shown in Figure 1.2.1, as sound waves may be reflected several times before reaching the listener, the reflections attenuate over time. The reflections are divided in early reflections and late reflections. The early reflections are direct reflections with a higher energy. They also have a direction. The early reflections usually are not implemented due to their complexity and as a result the imitated room is spherical. The late reflections have a denser characteristic with lower energy.
The reverberation time (the time between a direct-path waveform and the last hearable reflection) depends on the size of the room and surface material of the room. Various materials also reflect certain frequencies differently which means they can have high-pass or low-pass characteristics.

Reverb in R13

Reverb in R13 is composed of individual filter parts. These filters shape the output signal when combined in an appropriate order. However, their combination as whole is of no interest in this thesis rather the performance of its individual filters are important. As stated previously, the thesis intention is to optimize kernel processing parts while compiler takes over the general set-up as well as flow control parts written in C. To address this thesis results more precisely, individual filter parts are described and analysed in detail. The following is the description of the main components in Reverb in R13.

Comb filters

Comb Filters have comb-like notches in their frequency response which results in a shape similar to a comb. The main parts of a comb filter are a relatively large delay element as well as an IIR filter in the feedback part. This IIR filter is a one-pole filter which could have low-pass or high-pass characteristics.
Due to the fact that Comb Filters are IIR filters, any implementation of them in practice is basically approximation of their impulse response which in other words means truncation of impulse response. As a result, the truncated impulse response has a non-ideal comb-like shape in its frequency response. One proposed way to compensate for the above artifact is to use a number of comb filters in parallel in which delay length is preferable a prime number. It has been proved (according to the internal investigations done at Ericsson Research) that this solution compensates for the overall inconsistent frequency response to some degree specially when combined with the rest of filters in the Reverb implementation. Moreover, it is also proved that by increasing the number of parallel comb filters, the output approaches that of an ideal filter without truncation.

All-Pass filters

There are a number of all-pass filters present in R13. All-Pass filters pass all frequencies equally and as a result they do not shape the spectra. Besides, they have a phase shift effect that is responsible for changing the sound density.
All-pass filters have the same non-ideal characteristics as comb-filters when truncated. Similarly, it is advantageous to use a number of them to compensate for the truncation. In that case, the individual delay lengths will also be prime numbers.

Reverb in R15

Reverb in R15 is enjoying the application of a modified Moorer algorithm. This new algorithm has proved to provide much higher quality and is a huge improvement over R13. In contrast to R13, Reverb in R15 is composed of two parts, Early Reverb and Late Reverb. Early Reverb is quite a new concept in R15 which simulates the reflections from objects inside the room (e.g. furniture). These Early Reflections arrive to the listener within 30 ms after the direct sound. Simulating the early reflections properly results in a more real-world like sound experience and one of the most important properties of the early reverberation is that it increases audio externalization when listening through headphones. Most previously developed early reflection algorithms have concentrated on simulating the reflections from the walls, the floor and the ceiling of a room, thus ignoring the reflections from other objects in a room such as tables and chairs. It has been shown though, that the reflections from objects between the sound source and the listener are most important to simulate and that the room’s size can instead be simulated by a late reverberation algorithm. Therefore, the algorithm described here concentrates on simulating reflections from objects between the sound source and the listener.

The Early Reverberation Algorithm

Due to the fact that early reverb didn’t exist in the R13 implementation, the requirement for analysis of its filter parts was waived in this report and there is no need to cover its structure in detail. The early reverberation algorithm, however, improves as much the quality as it adds to its complexity. Implementation of this part exploits a relatively huge amount of CPU power.

The late reverberation algorithm

Ignoring introduction of a new filter named Room filter, the late reverberation as it is in R15 very much takes after its counterpart in R13 when individual filter components are taken into consideration. The above fact provided us with the opportunity to compare the implementations in R13 and R15 considering execution time, etc. Although not covered by this thesis, future analysis could be carried out to obtain a thorough comparison of Reverb in R13 and R15 thanks to the inherent similarity between Reverb in R13 and Later Reverb in R15.
Although the primary objective was to compare the common components in both versions, a number of other useful filter structures could also be probed due to their wide applications as well as usefulness. Among these filters the one-pole IIR (Room Filter in R15), Biquad as well as FIR filters could be mentioned.

Primary features and applications

The Late Reverb algorithm comes with mono and stereo output capacity. There is also a possibility to mix the amount of dry (unprocessed) and wet (processed) signal in the output to achieve a pleasant reverb level. This can be easily adjusted through user set-up.
One of the main applications of Reverb (both Early and Late) is 3D audio simulation. Reverb comprises the focal point of such algorithms to achieve near-real experience in rendering 3D audio.

“A digital signal processor frequently referred to as DSP is a type of microprocessor – one that is incredibly fast and powerful. A DSP is unique because it processes data in real time. This real-time capability makes a DSP perfect for applications where we won’t tolerate any delays.” [3]
DSPs usually have a Very Large Instruction Word (VLIW) and are dedicated to perform fast and efficient signal processing operations. They are often very task-specific oriented. Their design supports powerful MACs (Multiply and Accumulate) which are integral parts of a DSP algorithm. Their fast and efficient processing capability doesn’t come for free. DSPs usually occupy a large portion of silicon area and generally need its own dedicated memory. The memory is usually a zero-wait state memory which is a key requirement for processing residing data directly in memory (the main principle behind how DSPs manipulate data). Poor and restricted memory management has often been considered a drawback for DSPs. This is usually in contradiction with general purpose processors which can share common memory architecture and usually have powerful and flexible memory management units. DSPs also have a complex multi-stage/parallel pipeline structure which is tailored to the needs of a fast real-time signal processing application. Thanks to their specific design, they can carry out multiple operations at one clock cycle. DSPs could have high power consumption and are applied in critically demanding DSP applications like Radar/Sonar, medical devices, etc. Although their performance nominates them as good candidates in other consumer products like iPods, cell phones, etc., their inherent constraints limit their wide application in such areas. Large corporations with large scale production prefer solutions which cost less and provide a moderate alternative to DSPs while their application is inevitable in numerous places. In many occasions, companies like EMP (Ericsson Mobile Platforms) have had no choice but to use DSP in critical parts of the platform due to their powerful real-time Digital Signal Processing features.
Many companies have designed and manufactured DSPs. For instance, DSP chipsets manufactured by Texas Instruments as well as Analogue Devices are quite popular and well-known in the DSP literature. Among the DSP series manufactured by TI (Texas Instruments) TMS320xxx and among those manufactured by Analogue Devices ADSP-21xx are quite well-known and widely used in many applications. Another company which has attracted the attention and interest of many big customers (known by their embedded system design), is Ceva. As the name implies their DSPs are named after the company, Ceva DSPs. One specific and actually really interesting point about Ceva as a company is that they provide DSP IP (Intellectual Property) block solutions. This is in contrast to many big DSP manufacturers which provide the already-made silicon solution. IP blocks provide a really flexible solution to embedded system design since customers can enjoy the liberty of applying them in their designs and freely choose the silicon fabrication solution themselves.

General Purpose Processors

The word General Purpose Processors is referred to all microprocessors other than DSPs. However, microprocessors are also classified into sub groups with each one having a specific set of features. In fact, microprocessors are grouped into embedded processors and general purpose processors which are of course different from the general term microcontrollers. A microcontroller is in fact a computer residing on a silicon chip. It encompasses all necessary parts (including the memory) all in one IC package. A microcontroller generally has the main CPU core, ROM/EPROM/EEPROM/FLASH, RAM and some other essential functions (like timers and I/O controllers) all integrated into one single chip. The main motive behind the microcontroller was to restrict the capabilities of the CPU itself, allowing a thorough computer (memory, I/O, interrupts, etc) to fit on the available silicon solution. Microcontrollers are typically used where processing power is of no crucial importance and are typically used for low-end applications.
On the other hand, a General Purpose Processor (CPU) has much higher computational power but also encompass some peripheral parts like internal memory, I/O controller, etc. They occupy larger silicon area when compared to Microcontrollers and are much more expensive. They are intended for much larger range of applications and they are used for example in a PC where many different applications can be just loaded and run. They have the flexibility to get connected to many different peripherals. To sum up, they are designed for general purpose applications.
There are also embedded processors on which embedded systems are built. Embedded systems are intended for specific applications. For instance, an MP3 player, iPod or cell phone is an embedded system while a PC is a general purpose system. In embedded systems, designer can optimize, improve and reduce size and cost. They are designed to be optimized and tailored to a specific application. On the other hand, general purpose systems are designed to carry out a wide spectrum of general purpose tasks and operations. Naturally, microprocessors are also designed to be used in a specific purpose application (e.g. ARM processors) or in general purpose applications ( e.g. Intel CPUs).
Although microprocessors are much faster than most microcontrollers, they are still slower than their DSP counter parts for signal processing applications. In fact, microprocessors are designed to provide more flexible alternatives from the point of memory, I/O, etc. but they usually don’t have the same computational power when compared to their DSP counter parts. However, they have many advantages that make them appropriate for a wide range of applications. If one could find a microprocessor to satisfy the needs of his application, he might not choose a DSP for the same purpose. In other words, microprocessors are designed for diversity while DSPs are designed for specialty. Yet, there are numerous choices of different silicon solutions as well as applications. At the end, it is the designer who chooses the solution that fits his design criteria.

READ Internal Milk-run Distribution System

RISC processors versus CISC processors

RISC(Reduced Instruction Set Computing) is a design philosophy aimed at delivering simple but powerful instructions that execute within a single cycle at high clock speeds [1]. RISC concept is evolved around compact instruction sets which are fixed in size and make the hardware structure fairly simpler when compared to CISC (Complex Instruction Set Computing). However, it imposes more demand on the compiler and software optimization. Its unique structure makes it flexible for optimization purposes and it is often the programmer’s expertise which plays the key role in comparison with the hardware cost.
The following features are associated with a RISC processor,
1- Reduced Instruction Set: RISC concept imposes the instructions to be of a fixed size while the instructions are of variable size in CISC. The fixed size concept makes the instructions run at roughly one clock cycle and makes pipelining a feasible approach. Contrary, CISC instructions run at more than one clock cycle.
2- Pipeline structure: Pipeline is a concept original to RISC processors. Instructions are fetched and decoded step by step in each pipeline step. The pipeline stage advances one step per clock cycle. Yet, in CISC microcode idea is behind decode and execute. Microcode is itself a mini-program.
3- Register set: RISC processors have a large number of general purpose registers in which both address and data could be stored. In contrast, CISC works on special purpose register for every specific operation. One might easily comprehend this difference if one has been involved in program development for 8086 family of processors.
4- Load/Store idea: one of the main features of RISC processors which by many users is considered a major disadvantage is the fact that data should be loaded into registers for manipulation. Otherwise, no processing is allowed on data. Quite contrary, CISC processors process data while they reside in memory locations and they generally lift this requirement that data should be first loaded into a register.
To summarize, RISC processors need a high clock rate for execution of a typical program when compared to CISC. Yet, RISC provides a much simpler alternative from hardware complexity point of view which results in smaller silicon area as well as lower power consumption.

The ARM architecture concept

ARM processors are a family of RISC processors designed by ARM which are targeted at specific applications encompassing special features. These features have driven ARM towards the following orientations:
1- ARM has a compact instruction set. This makes the code memory space as small as possible. In consumer products like cameras, cell phones, etc. the least amount of memory is desired. As a result, the amount of memory usage is reduced which meant a considerable cost saving for the company. Generally, memory is expensive and intentions are to reduce its size on embedded devices as much as possible.
2- ARM design has moved towards a small silicon area which means ARM processors occupy very small die area. That makes them perfect for small scale consumer products.
3- ARM processors use slow low-cost memory solutions. This feature finds crucial importance when the scale of production is taken into consideration. In contrast, DSPs use zero-wait state memory which turns out to be quite expensive.
4- ARM has made it feasible to debug the hardware within the processor. This makes the program flow and execution transparent to the programmer. In fact it makes it much easier for programmer to debug the program during execution.
Specific features to ARM which are different from embedded processor concept:
1- Not all ARM instructions run at one clock cycle. This is contrary to RISC concept.
This feature has better efficiency for certain Load/Store instructions.
2- ARM processors have an inline barrel shifter attached to ALU. This feature brings about implementation of more complex instructions with higher efficiency.
3- Thumb 16-bit instruction set: ARM is generally a 32-bit architecture. However, to make instructions more compact and reduce the code density, ARM has come up with a dense instruction set called thumb which is sometimes up to 30% denser than 32-bit instructions. However, they have lower flexibility and generally take more clock cycles to execute.
4- All ARM instructions are conditional instructions. That means there is an option to run them based on a specific conditional flag e.g. in the event of overflow.
5- Enhanced instructions set: ARM has gradually evolved towards a microprocessor having DSP capabilities. New DSP instruction sets have provided ARM with a huge computational power which in many cases lifts the requirement for application of bulky expensive DSPs. In one of its recent structures, ARM is using 16 and 32-bit multipliers and fast MACs units. Thanks to VFP coprocessor Floating Point operations are now much more efficient.

ARM processor Fundamentals

Registers

ARM is a 32-bit architecture. Its register bank has sixteen 32-bit general registers in one specific mode. ARM processor can work in seven different modes (reader is encouraged to refer to ARM architecture reference manual [2] for more information). In any mode, there are up to eighteen active registers among which 16 are general purpose registers visible to programmers as r0-r15 while there are two processor status registers. Three of the general purpose registers (r13-r15) have particular tasks and as a result are known by specific names,
a) r13 is named sp and is used as the stack pointer.
b) r14 is called lr (link register) in which the return address from a subroutine is placed.
c) r15 is the PC (Program Counter) and points to the next instruction to be executed.
One could use r13-r14 as general purpose registers but care should be taken because the program should not rely on them as they are particularly dedicated registers for their intended tasks.
The processor status register contains many fields for configuration and program flow control. For instance the processor mode flags or condition flags are placed in this register.

Pipeline

“A pipeline is the core mechanism a RISC processor uses to execute instructions. Using a pipeline speeds up execution by fetching the next instruction while other instructions are being decoded and executed” [1].
Pipeline stages evolve along processor improvement. For instance there were 5 pipeline stages in ARM9 (a member of ARMv5 family of processors) while the pipeline structure is composed of 8 stages in ARM11 (a member of ARMv6 family of processors). A more complicated pipeline structure can increase the throughput but it also brings about more latency cost. Yet, more pipeline stages make it possible to run more advanced instructions at a reasonable number of clock cycles and lift the requirement for execution of extra instruction to perform the same task. It also comes with other disadvantages for the programmer. Optimizing code for a CPU with longer pipeline is more challenging and requires much more effort.

Table of contents :

Introduction
– Motivation
– Objectives
– Outline
Chapter 1
– Introduction to audio processing algorithms
– Reverberation effect
– Reverberation algorithm in R13
– Reverberation algorithm in R15
Chapter 2
– Digital Signal Processors (DSP)
– General purpose processors versus DSPs
– ARM architecture concept
– ARMv6 family of processors
– ARM1176jzfs processor
Chapter 3
– DSP algorithm implementation and optimization
– Signal representation
– Reverberation implementation in R13
– Reverberation implementation in R15
– Comparison results and conclusions
Chapter 4
– Quantization noise in Fixed point arithmetic
– SNR in Reverberation algorithm implementation
Chapter 5
– Conclusions and future work
References