Chapter 2 Background
Since this thesis presents parallelisation concepts targeting object-oriented languages, different forms of parallelism in the context of desktop applications are explored. First, however, the challenges of parallel computing are discussed, focussing on the additional challenges particular to desktop applications.
2.1 Inhibitors for desktop parallelisation
What are the challenges that programmers face when parallelising desktop applications? The programmer is first faced with the general challenges of parallel computing (section 2.1.1). Unfortunately, parallel computing is further complicated for desktop applications (section 2.1.2).
2.1.1 General parallel computing challenges
The prominent challenge in parallel computing is the theoretical and practical difficulty of producing an effective program that exploits the available hardware. The developer must deal with many issues that do not exist in sequential programming.
Some of the challenges surrounding parallel computing are inherent in the problem itself, regardless of any practical solutions that may be implemented.
Missing central parallel model Much of the success of the uniprocessor computer industry is credited to the von Neumann model, which bridges the hardware and software of sequential programming. It allows software to be developed independently of hardware, and hardware independently of software. Even with this separation, software and hardware complement each other and the result consistently executes with satisfactory performance. Parallel computing lacks such a central unifying model, on which sequential computing has been so successful [67, 100, 106].
Numerous models for parallel computing have been developed and proposed, but the field still lacks a single dominant model. The characteristics of a successful model include simplicity to understand and use, architecture independence and accurate performance prediction. These characteristics conflict with one another, which is why such a large number of parallel computing models have been developed. Highly abstract models make it easy to develop portable parallel programs, but such programs have highly uncertain performance. Conversely, highly detailed models allow optimised programs but are considerably more difficult to use and less portable due to the explicit details.
The PRAM model, for example, is very simple to use, but at the expense of hiding vital performance-influencing factors such as interprocessor communication and synchronisation. Other models such as LogP and BSP have gained some attention, but parallel computing is still waiting for one dominant model to rise above the rest and standardise the field.
Task decomposition In order to increase the throughput of an application, the developer needs to examine the original sequential problem and decompose it into sub-tasks. Identifying such sub-tasks determines which areas of the problem may be executed simultaneously. Whether the developer is modifying an existing sequential program or writing parallel code from scratch, task decomposition is the first step.
A developer may also decide to restructure the program to increase the inherent parallelism of the problem, for example by seeking alternative algorithms. This is based on the idea that some conceptual models may be parallelised with greater ease than others. Seeing the problem in a different light may therefore ease the difficulty and make the inherent parallelism more noticeable.
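As a concrete illustration of task decomposition, the following sketch (hypothetical example, not taken from the thesis) splits an array summation into independent sub-ranges, sums each sub-range in its own task, and combines the partial results at the end:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: decomposing a summation into independent sub-tasks.
public class DecomposedSum {

    // Splits [0, data.length) into 'chunks' sub-ranges, sums each sub-range
    // in its own task, then combines the partial sums sequentially.
    public static long parallelSum(int[] data, int chunks) {
        ExecutorService pool = Executors.newFixedThreadPool(chunks);
        try {
            int step = (data.length + chunks - 1) / chunks;
            List<Future<Long>> partials = new ArrayList<>();
            for (int start = 0; start < data.length; start += step) {
                final int lo = start;
                final int hi = Math.min(start + step, data.length);
                partials.add(pool.submit(() -> {
                    long s = 0;
                    for (int i = lo; i < hi; i++) s += data[i];
                    return s;          // each sub-task is independent
                }));
            }
            long total = 0;
            for (Future<Long> f : partials) total += f.get();  // combine
            return total;
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```

The decomposition works here precisely because the sub-range sums do not depend on one another; the dependence analysis discussed next determines when such independence actually holds.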
Dependence analysis Even when the problem has been decomposed into independent tasks, this does not mean the tasks are free to execute in arbitrary order. There are ordering constraints that govern the way tasks ought to proceed. Such constraints prohibit tasks from simply operating independently of each other, as they may frequently need to halt due to the state of other tasks. Dependences must be respected to ensure that the meaning of the program remains correct.
Dependences take on many forms and primarily fall into the categories of data or control dependences. A data dependence arises when two or more active tasks require access to the same data, and the order in which the data is modified influences the correctness of the program. A control dependence arises when the execution of some part of the program depends on the outcome of another part of the program.
A flow dependence is a type of data dependence where the result of one task is required as an input to a future task. As a result, much of the code is forced to execute sequentially, reducing the inherent parallelism of the problem. Control dependences mean that code cannot be explicitly parallelised, since the execution flow is uncertain until runtime.
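A flow dependence can be sketched directly in code. In this hypothetical two-stage pipeline, stage B consumes the result of stage A, so the two stages are forced into sequential order no matter how many processors are available:

```java
import java.util.concurrent.CompletableFuture;

// Hypothetical sketch of a flow (read-after-write) dependence:
// stage B consumes stage A's output, so B cannot start until A completes.
public class FlowDependence {

    public static int pipeline(int input) {
        CompletableFuture<Integer> stageA =
            CompletableFuture.supplyAsync(() -> input * 2);  // produces a value
        CompletableFuture<Integer> stageB =
            stageA.thenApply(a -> a + 1);                    // flow-dependent on A
        return stageB.join();  // the dependence serialises the two stages
    }
}
```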
After attempting task decomposition and dependence analysis, the developer may conclude that the problem is not a suitable candidate for parallelisation. The classification of a problem's inherent parallelism is rarely black or white. A single problem may have a wide range of inherent parallelism, depending on the specific goal or instance of the problem class. For example, the problem of installing light bulbs is not inherently parallel if the goal is to install a single bulb, but it is inherently parallel if the goal is to install ten bulbs.
Some problems simply cannot be parallelised due to the inherent requirement of consecutive instructions that must be executed one after the other. Even if the developer attempted to force some level of parallelism into the implementation of such a problem, the performance improvement is likely to be minimal. It may actually be faster to execute the problem sequentially, because of the many dependences between the tasks.
Scheduling It is well known that determining the optimal schedule for a set of tasks is an NP-hard problem. Scheduling becomes more difficult when the tasks have varying priorities, or when the main objective of the scheduling changes. The user may have important tasks whose response is of more interest than the overall execution time of all the tasks in the system. In some cases, static scheduling is impossible since the tasks are unknown up front, and dynamic scheduling must be used instead.
Load balancing is generally important: tasks should take roughly the same amount of time to execute, otherwise some processors will sit idle while others still have a large amount of computation to complete. Interestingly, however, a well-balanced load does not always yield the best performance. If many dependences exist between a set of tasks, the high communication costs between processors could mean that having select processors do more work results in a lower execution time.
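Because optimal scheduling is NP-hard, practical schedulers rely on heuristics. The following hypothetical sketch uses the classic greedy "longest processing time first" rule: tasks are considered from largest to smallest, and each is assigned to the currently least-loaded processor:

```java
import java.util.Arrays;

// Hypothetical sketch: greedy longest-processing-time-first scheduling.
public class GreedySchedule {

    // Returns the makespan (finish time of the busiest processor).
    public static int makespan(int[] taskCosts, int processors) {
        int[] costs = taskCosts.clone();
        Arrays.sort(costs);                      // ascending order
        int[] load = new int[processors];
        for (int i = costs.length - 1; i >= 0; i--) {   // largest task first
            int min = 0;                         // find least-loaded processor
            for (int p = 1; p < processors; p++)
                if (load[p] < load[min]) min = p;
            load[min] += costs[i];
        }
        return Arrays.stream(load).max().getAsInt();
    }
}
```

For tasks of cost {7, 5, 4, 3, 2, 1} on two processors the heuristic produces loads of 11 and 11, which happens to be optimal here; in general the heuristic only approximates the optimum.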
In addition to the inherent theoretical challenges of parallelisation, other issues commonly arise from the way things have traditionally been done. Many of the practical challenges are a result of the underlying theoretical challenges.
Portability When parallel programs are written, developers will always want to ensure that they effectively exploit the available hardware to achieve maximum performance. In order to optimise a parallel program, it is necessary to understand the specific hardware details of the target system. Many parallel programming solutions are therefore architecture- and machine-dependent. Simply porting the program to another machine would not provide the same performance; the program would require modification, or in the worst case need to be rewritten from scratch.
Parallel programming using the message passing model usually results in higher performance than the shared memory model. The downside of this approach is that it involves explicit parallelism, where the developer needs to carefully examine the problem and implement the parallelism by hand. Due to the weak support available for parallel programming, developers have largely produced specific, custom tools for their particular purposes. Along with the issue of portability, this has created a wide mass of non-standardised tools and programs.
Synchronisation The need for synchronisation arises when data locations are reused in the program, so this issue is of particular interest to the shared memory paradigm. Synchronisation further complicates the challenge of scheduling, since there are now timing constraints between some tasks that must be adhered to throughout the entire system execution. It is important to respect the synchronisation requirements of a parallel program if it is to make satisfactory progress. If the multiple threads of a parallel program were scheduled independently, they might end up running sequentially on the same processor; such a schedule would prove counterproductive for the parallel application due to the overhead introduced by the parallelisation. A task that is falling behind could also impede the progress of other dependent tasks if undesirable scheduling takes place.
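A minimal sketch of such a timing constraint (hypothetical example) uses a latch: the consumer must block until the producer has published its result, regardless of how the scheduler interleaves the two threads:

```java
import java.util.concurrent.CountDownLatch;

// Hypothetical sketch: an ordering constraint between two tasks enforced
// with a latch, so the consumer never reads the data before it is ready.
public class OrderedTasks {

    public static int produceThenConsume() {
        final int[] shared = new int[1];
        final CountDownLatch ready = new CountDownLatch(1);

        Thread producer = new Thread(() -> {
            shared[0] = 42;        // write the shared data
            ready.countDown();     // signal that the data is ready
        });
        producer.start();

        try {
            ready.await();         // the consumer honours the constraint
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return shared[0];
    }
}
```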
Debugging A major problem in having tasks execute concurrently is that they may need to access the same data, and the order in which this happens may change the correctness of the program. This is known as a race condition, and its non-deterministic behaviour makes the error difficult to reproduce and understand. The debugging process itself may interfere with the monitored program, altering its behaviour so that the error does not manifest in the same way under the debugger. Debugging has become increasingly complicated and tools to assist developers are limited. Debugging a parallel program generally means that a large amount of data needs to be gathered, which may be difficult to manage.
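The non-determinism of a race condition can be demonstrated with a hypothetical shared counter: two threads each perform a fixed number of increments, and without synchronisation some increments may be lost, with the final count varying from run to run. With a lock, the count is always exact:

```java
// Hypothetical sketch of a race condition on a shared counter.
public class RaceDemo {

    public static int count(boolean synchronise, int perThread) {
        final int[] counter = new int[1];
        final Object lock = new Object();

        Runnable work = () -> {
            for (int i = 0; i < perThread; i++) {
                if (synchronise) {
                    synchronized (lock) { counter[0]++; }  // atomic update
                } else {
                    counter[0]++;   // unsynchronised read-modify-write: racy
                }
            }
        };

        Thread a = new Thread(work), b = new Thread(work);
        a.start(); b.start();
        try {
            a.join(); b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return counter[0];
    }
}
```

The synchronised variant always returns exactly twice `perThread`; the racy variant can only lose increments, never gain them, and its exact result is unpredictable — which is precisely what makes such bugs hard to reproduce.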
2.1.2 Additional challenges for desktop parallelisation
The challenges discussed in section 2.1.1 apply to parallel programs in general. The development of parallel desktop applications involves additional challenges that generally did not apply to traditional parallel programs. These challenges need to be discussed because parallelisation is now needed on the desktop: users have always been interested in faster machines, and it may be assumed that they still are. Given the current trend in processor technology and increasing user demands, there will be no improvement without parallelisation.
Different application types Most of the effort in parallel computing has focused on applications that possess a high degree of inherent parallelism. The decomposition of such problems is often trivial and the resulting tasks have minimal, if any, communication between each other. Such problems are commonly termed embarrassingly parallel and usually fall into the areas of scientific, engineering and database problems, which are typically highly regular in structure. These ideal candidates for parallelisation also tend to be computationally intensive and require little, if any, user intervention during their execution.
However, desktop users have different needs and are generally not interested in running scientific and engineering applications, such as weather forecasting programs, on their desktops. This contributes to why desktops were not traditionally equipped with numerous processors. Nevertheless, parallelisation on the desktop is of interest where day-to-day users may benefit. So the question is: which applications that desktop users run may benefit from parallelisation? The first concern is why parallelism has not been exploited for daily desktop users.
The first clear observation is that most applications running on the desktop cannot be classified as embarrassingly parallel problems. The types of problems likely to run on a desktop are irregular in nature, making task decomposition and dependence analysis more troublesome for developers. As well as being irregular, desktop applications are rather interactive, constantly accepting input from the user, and are not as computationally demanding. Irregular problems tend to be dealt with using threads, where sections of the program are identified as having the potential to execute simultaneously alongside other sections of the program.
Graphical user interfaces As will be discussed in section 2.3, GUI applications have an additional threading demand compared to their console-equivalent versions. Since most desktop applications involve a GUI, this further complicates the parallelisation of desktop applications, as programmers must serialise all GUI-related aspects [21, 62, 73, 79].
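The serialisation requirement can be sketched abstractly. In this hypothetical example, a single-threaded executor stands in for a GUI toolkit's event dispatch thread (as in Swing, all UI updates would be funnelled onto that one thread), while the heavy computation runs on a separate worker thread and only the result is handed back:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch of the GUI threading rule: one dedicated thread
// (modelling the event dispatch thread) performs all "UI updates", while
// long-running work happens on a worker thread.
public class GuiThreadModel {

    public static String runTask() {
        ExecutorService edt = Executors.newSingleThreadExecutor();    // "GUI thread"
        ExecutorService worker = Executors.newSingleThreadExecutor(); // background
        try {
            Future<String> label = worker.submit(() -> {
                String result = "done";                 // long-running computation
                // hand the result back to the GUI thread for "display"
                return edt.submit(() -> "label = " + result).get();
            });
            return label.get();
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            edt.shutdown();
            worker.shutdown();
        }
    }
}
```

In real Swing code the hand-off would use `SwingUtilities.invokeLater`; the executor here only models the principle that GUI-related work is serialised onto one thread.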
Actual performance versus perceived performance The traditional motivation behind parallelisation has been to reduce wall-clock time, i.e. the actual time to execute the program. Although reducing this physical time will in reality improve the actual performance, desktop users might not necessarily notice the improvement. Desktop users typically rely on mental estimation rather than objective measurement to judge duration (and consequently performance). It is therefore the user's perceived performance that ultimately dictates performance for interactive desktop applications. To truly benefit from multi-core desktop systems, programmers must now strive to improve both actual time and perceived time:
Actual time is important to compare the application against benchmarks and verify performance objectives.
Perceived time is important for a successfully responsive application and positive user experience.
So if reducing the execution time (for example by parallelisation) does not necessarily contribute to perceived performance, what else can the programmer do? Possible examples include ensuring that the user interface does not “freeze”, indicating progress, and completing tasks in an expected order. Parallel programmers must now be concerned with such human-computer interaction (HCI) issues, which were traditionally unnecessary in parallel computing, especially in the engineering and scientific fields.
Short runtime Desktop applications tend to be interactive in nature, where the user initiates tasks with a series of input events. During the execution of these tasks, the user usually remains seated in front of the desktop waiting for the tasks to complete. Most of these tasks tend to have a relatively small runtime in comparison to the computationally intensive programs that would normally run on high performance computers. The user generally expects most tasks to be accomplished “soon” while they wait.
Since these tasks tend to have short runtimes anyway, chances are that improving their response time through parallelisation would simply go unnoticed by the user. One of the leading reasons such an attempt would be futile is the sequential overhead introduced in the parallelisation process: overhead is incurred when decomposing the problem into tasks, in the communication between the tasks, and when combining the results at the end. If this overhead is significant in comparison to the parallelised computation, it will actually have an adverse effect on the overall execution time of the task.
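The trade-off can be made concrete with a simple cost model (a hypothetical back-of-the-envelope estimate, not a measurement): if the parallel runtime is approximated as the sequential runtime divided by the core count plus a fixed overhead, then for short tasks the overhead dominates and the "parallel" version is predicted to be slower:

```java
// Hypothetical cost model: estimated parallel runtime with a fixed
// parallelisation overhead (decomposition + communication + combination).
public class OverheadModel {

    public static double parallelTimeMs(double sequentialMs, int cores,
                                        double overheadMs) {
        return sequentialMs / cores + overheadMs;
    }

    // Parallelisation only pays off if the estimated parallel time
    // actually beats the sequential time.
    public static boolean worthParallelising(double sequentialMs, int cores,
                                             double overheadMs) {
        return parallelTimeMs(sequentialMs, cores, overheadMs) < sequentialMs;
    }
}
```

For example, with 20 ms of overhead on four cores, a 10 ms task is predicted to slow down (10/4 + 20 = 22.5 ms), whereas a 1000 ms task still benefits (1000/4 + 20 = 270 ms).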
In the past, the benefits of parallelising desktop applications were usually overshadowed by the cost of this extra effort, meaning that applications were simply not parallelised. This is due to the non-trivial nature of parallelising most problems likely to run on desktops. It was easier for desktop users to “just wait” until a faster processor became available (or more affordable) in order to run their day-to-day applications (also, multi-cores were not the norm). This explains why there was little pressure in the past to use parallelism to improve desktop performance. The aim now is to change this, making parallelisation more common in desktop environments.