\"\" \"\" \"\"
Go backward to Hello, World!
Go up to Tutorial
Go forward to Long Integer Multiplication
RISC-Linz logo

Quicksort

Below we illustrate the skeleton of a parallel version of the Quicksort algorithm for the in-place sorting of arrays. The full program can be found here.
// ----------------------------------------------------------------------------
// Quicksort
// ----------------------------------------------------------------------------
#include <rt++.h>

...

// ----------------------------------------------------------------------------
// sort <a/l/r> in parallel
// ----------------------------------------------------------------------------
int qsort(Array<float> a, int l, int r)
{
  if (r-l < MINSIZE) return(sort(a, l, r));

  int m0, m1;
  partition(a, l, r, a[(l+r)/2], &m0, &m1);

  Thread3<int, Array<float>, int, int> t0(qsort, a, l, m0);
  Thread3<int, Array<float>, int, int> t1(qsort, a, m1, r);

  Wait(t0); Wait(t1);
  return(0);
}

// ----------------------------------------------------------------------------
// RT++ main function
// ----------------------------------------------------------------------------
int qmain(int argc, char *argv[]) 
{
  while(...)
  {
    Array<float> a(S);
    ...
    qsort(a, 0, S);
  }
  return(0);
}

// ----------------------------------------------------------------------------
// C++ main function
// ----------------------------------------------------------------------------
int main(int argc, char *argv[]) 
{
  RT_Argument rtarg;
  ...
  rt_arg.proc = 3;
  rt_main(qmain, argc, argv, rtarg, 0);
  ...
}
 

This program operates on arrays of type Array<float>. Array is an RT++ type constructor whose objects can be used similar to C++ arrays but are automatically reclaimed when not referenced any more.  

The core of the program is the function qsort which takes an array a that is to be sorted within the index range [l ...r[. If this index range is smaller than some threshold, the array is sorted sequentially. Otherwise, the array is partitioned according to some pivot element e into three ranges [l,m0[ (all elements less than e) [m0,m1[ (all elements equal to e) and [m0,r[ (all elements greater than e). What remains to be done is to sort the array in the first and in the third subrange. This is the task of two concurrent threads (units of control) t0 and t1. When both threads have terminated, the whole array is sorted.

Let's take a closer look at such a thread declaration  

Thread3<int, Array<float>, int, int> t0(qsort, a, l, m0);
which creates a thread t0 that executes the function application qsort(a, l, m0). The type of t0 is Thread3<R, A1, A2, A3> where the type fields R, A1, A2, A3 correspond to the function prototype
R qsort(A1, A2, A3)
and the number 3 in Thread3 corresponds to the number of function arguments. Threads may be declared in principle with an arbitrary number of argument types (but there is an implementation-dependent upper bound). The call  
Wait(t0)
waits until t0 has terminated and returns the result of the thread (i.e. the return value of qsort). In above case, this return value is always 0 and thus ignored.

Above skeleton omits a few ugly technical details that are only necessary because the GNU g++ compiler does not implement the full ANSI C++ standard yet (see here for details). The program namely has to contain also the following top-level declarations  

Atom(int); 
Atom(float);
Ref(Array<float>);
ThreadArg3(A, int, Array<float>, int, int);
The Atom declarations tell the RT++ garbage collector that int and float are atomic types that are allocated on the stack and do not contain pointers to other objects. The Ref declaration says that Array<float> is a heap-allocated type. Declarations of this kind have to be provided for any type used by an RT++ program (actually only for those which are used in the context of an RT++ type constructor).

ThreadArg3(A,...) declares that threads of the corresponding prototype are used (where A is an arbitrary unused identifier that is needed internally). Declarations of this kind have to be given for any thread type used by an RT++ program.

The good message is that these declarations are only needed in one object file; if your program is composed of multiple source files that are individually compiled and finally linked together, all necessary declarations can be collected in a single source file of the application.

In case none of the object files contains the required declarations, the linker complains with error messages as the following:  

qsort.o(.text+0xef8): undefined reference to `Type<int>::init'
qsort.o(.text+0xf0b): undefined reference to `Type<int>::scan'
qsort.o(.text+0x987): undefined reference to `Type<float>::init'
qsort.o(.text+0x9fe): undefined reference to `Type<float>::scan'
qsort.o(.text+0xee5): undefined reference to `Type<Array<float> >::scan'
qsort.o(.text+0xf39): undefined reference to `Type<Array<float> >::init'
qsort.o(.text+0xb6d): undefined reference to 
  `Type<ArgumentThread3<int, Array<float>, int, int> >::scan'
When the program is compiled and executed, we get statistics output of the kind depicted below (for demonstration purposes, this is the output of a different RT++ program).
------------------------------------------------------------------------------
RT++ 1.0 (C) 1996 Wolfgang Schreiner <Wolfgang.Schreiner@risc.uni-linz.ac.at>
proc: 3, mem: 1000000, heap: 100032, blocks: 12, stack: 5000
------------------------------------------------------------------------------
RT++ GC 1 (1360 ms): 8 blocks with 42560 bytes in 30 (+ 0) ms.
RT++ GC 2 (1820 ms): 11 blocks with 64112 bytes in 20 (+ 0) ms.
RT++ GC 3 (2640 ms): 9 blocks with 53456 bytes in 20 (+ 0) ms.
RT++ GC 4 (3200 ms): 3 blocks with 15984 bytes in 30 (+ 0) ms.
RT++ GC: less than 25008 bytes heap (1 block per process) free.
RT++ GC: increase heap by 100032 bytes.
RT++ GC 5 (4670 ms): 7 blocks with 33520 bytes in 50 (+ 0) ms.
RT++ GC 6 (5120 ms): 16 blocks with 89952 bytes in 40 (+ 10) ms.
RT++ GC 7 (6350 ms): 14 blocks with 75008 bytes in 40 (+ 0) ms.
RT++ GC 8 (7140 ms): 14 blocks with 67344 bytes in 50 (+ 0) ms.
RT++ GC 9 (7890 ms): 18 blocks with 94864 bytes in 40 (+ 0) ms.
RT++ GC 10 (9170 ms): 4 blocks with 7664 bytes in 50 (+ 10) ms.
RT++ GC: less than 25008 bytes heap (1 block per process) free.
RT++ GC: increase heap by 100032 bytes.
RT++ GC 11 (10650 ms): 16 blocks with 74144 bytes in 60 (+ 10) ms.
RT++ GC 12 (11520 ms): 6 blocks with 17872 bytes in 70 (+ 0) ms.
RT++ GC: less than 25008 bytes heap (1 block per process) free.
RT++ GC: increase heap by 100032 bytes.
RT++ GC 13 (13170 ms): 41 blocks with 243984 bytes in 70 (+ 0) ms.
------------------------------------------------------------------------------
RT++ execution
        Return code:    0
        Real time(ms):  16650
        Proc time(ms):  16620
        All procs(ms):  46900
RT++ garbage collections
        Number:         13
        Increments:     3
        Waiting(ms):    30
        Collecting(ms): 570
        Blocks:         167
        Bytes:          880464
------------------------------------------------------------------------------
The RT++ GC messages are generated by the system whenever a garbage collection is performed (unless the verbose flag is not set). Such a message displays the garbage collection counter, the current time (from the start of the RT++ program), the number and total size of the reclaimed free heap blocks, the duration of garbage collection and the synchronization time needed before garbage collection could start (in parentheses). If the heap is increased due to insufficient free heap space after garbage collection, this is also shown.  

A hint concerning the number of processes used for the execution of an RT++ program: if the total process time of all processes is drastically smaller than the real time of program execution multiplied with the number of processes, the machine is probably overloaded i.e. not every process ready for execution finds a free processor. Consequently, only as many processes should be used for program execution as processors are idle (keeping in mind that in a multi-user system also other activities may be going on!). Likewise, if synchronization times for garbage collection phases get large, this is a sign that some process is not executing while other processes are waiting for garbage collection to begin.  


Author: Wolfgang Schreiner
Last Modification: April 12, 1997