Chapter 10. OpenMP C/C++ API Multiprocessing Directives

This chapter provides an overview of the multiprocessing directives that MIPSpro C and C++ compilers support. These directives are based on the OpenMP C/C++ Application Program Interface (API) standard, version 1.0. Version 2.0 is not supported for the MIPSpro 7.4 release, but will be available in the 7.4.1 release. Programs that use these directives are portable and can be compiled by other compilers that support the OpenMP standard.

The complete OpenMP standard is available at http://www.openmp.org/specs. See that documentation for complete examples, rules of usage, and restrictions. This chapter provides only an overview of the supported directives and does not give complete details or restrictions.

To enable recognition of the OpenMP directives, specify -mp on the cc or CC command line.
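
For example, a minimal OpenMP program (the file name hello.c is illustrative) can be compiled with cc -mp hello.c -o hello for C, or with CC -mp for C++:

#include <stdio.h>
#include <omp.h>

int main(void)
{
   /* each thread in the team executes the block */
   #pragma omp parallel
   {
      printf("Hello from thread %d\n", omp_get_thread_num());
   }
   return 0;
}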

In addition to directives, the OpenMP C/C++ API describes several library functions and environment variables. Information on the library functions can be found on the omp_lock(3), omp_nested(3), and omp_threads(3) man pages. Information on the environment variables can be found on the pe_environ(5) man page.


Note: The SGI multiprocessing directives, including the Origin series distributed shared memory directives, are outmoded. Their preferred alternatives are the OpenMP C/C++ API directives described in this chapter.


Using Directives

Each OpenMP directive starts with #pragma omp to reduce the potential for conflict with other #pragma directives that have the same names. OpenMP directives have the following form:

#pragma omp directive-name [clause[ clause] ...] new-line

Except for starting with #pragma omp, the directive follows the conventions of the C and C++ standards for compiler directives.

Directives are case-sensitive. The order in which clauses appear in directives is not significant. Only one directive name can be specified per directive.

An OpenMP directive applies to at most one succeeding statement, which must be a structured block.
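
As a sketch of this form, the following program uses the directive name parallel with two clauses; the directive applies to the compound statement (a structured block) that follows it. The variable name n is illustrative:

#include <stdio.h>
#include <omp.h>

int main(void)
{
   int n = 4;

   /* directive-name is parallel; if and shared are clauses;
      the directive applies to the one structured block below */
   #pragma omp parallel if(n > 1) shared(n)
   {
      printf("thread %d sees n = %d\n", omp_get_thread_num(), n);
   }
   return 0;
}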

Conditional Compilation

OpenMP-compliant implementations define the _OPENMP macro name as the decimal constant yyyymm, representing the year and month of the approved specification. This macro must not be the subject of a #define or a #undef preprocessing directive. For example, the following assignment is compiled only when OpenMP is enabled:

#ifdef _OPENMP
iam = omp_get_thread_num() + index;
#endif

If vendors define extensions to OpenMP, they may specify additional predefined macros.

If an implementation is not OpenMP-compliant, or if its OpenMP mode is disabled, it may ignore the OpenMP directives in a program. In effect, an OpenMP directive behaves as if it were enclosed within #ifdef _OPENMP and #endif. Thus, the following two examples are equivalent:

if(cond)
{
   #pragma omp flush (x)
}
x++;

if(cond)
   #ifdef _OPENMP
   #pragma omp flush (x)
   #endif
x++;

parallel Construct

The #pragma omp parallel directive defines a parallel region, which is a region of the program that is to be executed by multiple threads in parallel.

When a thread encounters a parallel construct and no if clause is present, or the if expression evaluates to a nonzero value, a team of threads is created, and the encountering thread becomes the master thread of the team, with a thread number of 0. The number of threads in the team is controlled by environment variables and library calls. If the value of the if expression is zero, the region is serialized.

The number of threads remains constant while that parallel region is being executed. It can be changed either explicitly by the user or automatically by the runtime system from one parallel region to another. The omp_set_dynamic(3) library function and the OMP_DYNAMIC environment variable can be used to enable and disable the automatic adjustment of the number of threads. For more information on environment variables, see the pe_environ(5) man page.
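
As a sketch of these rules, the following region is created only when n exceeds 1, and the master thread reports the team size; the variable names and the threshold are illustrative:

#include <stdio.h>
#include <omp.h>

int main(void)
{
   int n = 100;

   /* a team is created only if n exceeds 1;
      otherwise the region is serialized */
   #pragma omp parallel if(n > 1)
   {
      int iam = omp_get_thread_num();   /* 0 for the master thread */

      if (iam == 0)
         printf("team size: %d\n", omp_get_num_threads());
   }
   return 0;
}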

If a thread in a team executing a parallel region encounters another parallel construct, it creates a new team, and it becomes the master of that new team. Nested parallel regions are serialized by default; as a result, a nested parallel region is executed by a team composed of one thread. The default behavior can be changed by using either the omp_set_nested runtime library function or the OMP_NESTED environment variable.

Work-sharing Constructs

A work-sharing construct distributes the execution of the associated statement among the members of the team that encounter it. The work-sharing directives do not launch new threads, and there is no implied barrier on entry to a work-sharing construct.

The sequence of work-sharing constructs and barrier directives encountered must be the same for every thread in a team.

OpenMP defines the following work-sharing constructs; a combined sketch follows the list:

  • The #pragma omp for directive identifies an iterative work-sharing construct that specifies a region in which the iterations of the associated loop should be executed in parallel. The iterations of the for loop are distributed across threads that already exist.

  • The #pragma omp sections directive identifies a non-iterative work-sharing construct that specifies a set of constructs that are to be divided among threads in a team. Each section is executed once by a thread in the team. Each section is preceded by a section directive, although the section directive is optional for the first section.

  • The #pragma omp single directive identifies a construct that specifies that the associated structured block is executed by only one thread in the team (not necessarily the master thread).
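
The following sketch combines the three constructs in one parallel region; the array and its size are illustrative. The implied barriers at the end of the for and sections constructs ensure that a[7] is assigned before the single block reads it:

#include <stdio.h>
#include <omp.h>

int main(void)
{
   int i, a[8];

   #pragma omp parallel shared(a) private(i)
   {
      /* for: loop iterations are divided among the existing team */
      #pragma omp for
      for (i = 0; i < 8; i++)
         a[i] = i * i;

      /* sections: each section executes once, on some thread */
      #pragma omp sections
      {
         #pragma omp section
         printf("first section\n");

         #pragma omp section
         printf("second section\n");
      }

      /* single: executed by exactly one thread in the team */
      #pragma omp single
      printf("a[7] = %d\n", a[7]);
   }
   return 0;
}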

Combined Parallel Work-sharing Constructs

Combined parallel work-sharing constructs are shortcuts for specifying a parallel region that contains only one work-sharing construct. The semantics of these directives are identical to those of explicitly specifying a parallel directive followed by a single work-sharing construct; a sketch follows the list below.

  • The #pragma omp parallel for directive is a shortcut for a parallel region that contains one for directive.

  • The #pragma omp parallel sections directive provides a shortcut form for specifying a parallel region containing one sections directive.
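
As a sketch, the two forms below are equivalent; the array name and size are illustrative:

#include <stdio.h>

#define N 16

int main(void)
{
   int i, a[N];

   /* longhand: a parallel region containing one for directive */
   #pragma omp parallel private(i) shared(a)
   {
      #pragma omp for
      for (i = 0; i < N; i++)
         a[i] = 2 * i;
   }

   /* shortcut: the combined parallel for directive */
   #pragma omp parallel for private(i) shared(a)
   for (i = 0; i < N; i++)
      a[i] = 2 * i;

   printf("a[N-1] = %d\n", a[N - 1]);
   return 0;
}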

Master and Synchronization Constructs

The following list describes the master and synchronization constructs; a combined sketch follows the list:

  • The #pragma omp master directive identifies a construct that specifies a structured block that is executed by the master thread of the team.

  • The #pragma omp critical directive identifies a construct that restricts execution of the associated structured block to one thread at a time.

  • The #pragma omp barrier directive synchronizes all the threads in a team, each thread waiting until all other threads have reached this point. After all threads have been synchronized, they begin executing the statements after the barrier directive in parallel.

  • The #pragma omp atomic directive ensures that a specific memory location is updated atomically.

  • The #pragma omp flush directive, explicit or implied, identifies precise synchronization points at which the implementation is required to provide a consistent view of certain objects in memory. This means that previous evaluations of expressions that reference those objects are complete and subsequent evaluations have not yet begun.

  • A #pragma omp ordered directive must be within the dynamic extent of a for or parallel for construct that has an ordered clause. The structured block following an ordered directive is executed in the same order as iterations in a sequential loop.
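
The following sketch exercises several of these constructs (the variable names are illustrative); the barrier guarantees that every thread has incremented count before the master thread prints it:

#include <stdio.h>
#include <omp.h>

int main(void)
{
   int count = 0, sum = 0;

   #pragma omp parallel shared(count, sum)
   {
      /* atomic: count is updated atomically */
      #pragma omp atomic
      count++;

      /* barrier: wait for all threads to reach this point */
      #pragma omp barrier

      /* master: executed only by the master thread */
      #pragma omp master
      printf("threads counted: %d\n", count);

      /* critical: one thread at a time executes the block */
      #pragma omp critical
      sum += count;
   }

   printf("sum = %d\n", sum);
   return 0;
}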

Data Environment Constructs

The #pragma omp threadprivate directive makes the named file-scope or namespace-scope variables private to a thread but global within the thread.
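
As a sketch, assuming a file-scope variable named counter:

#include <stdio.h>
#include <omp.h>

int counter = 0;                     /* file-scope variable */
#pragma omp threadprivate(counter)   /* one copy per thread */

int main(void)
{
   counter = 5;   /* assigns the master thread's copy */

   /* the copyin clause (described below) initializes each
      thread's copy from the master thread's copy */
   #pragma omp parallel copyin(counter)
   {
      counter += omp_get_thread_num();
      printf("thread %d: counter = %d\n",
             omp_get_thread_num(), counter);
   }
   return 0;
}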

Several directives accept clauses that allow a user to control the scope attributes of variables for the duration of the construct. Not all of the clauses are allowed on all directives, but the clauses that are valid on a particular directive are included with the description of the directive. Usually, if no data scope clauses are specified for a directive, the default scope for variables affected by the directive is shared.

The following list describes the data scope attribute clauses; a combined sketch follows the list:

  • The private clause declares the variables in list to be private to each thread in a team.

  • The firstprivate clause provides a superset of the functionality provided by the private clause; in addition, each thread's private copy is initialized with the value the original object had immediately before the construct.

  • The lastprivate clause provides a superset of the functionality provided by the private clause; in addition, when the construct completes, the original object is updated with the value from the sequentially last loop iteration or the lexically last section.

  • The shared clause shares variables that appear in the list among all the threads in a team. All threads within a team access the same storage area for shared variables.

  • The default clause allows the user to specify a shared or none default scope attribute for all variables in the lexical extent of any parallel region. Variables declared threadprivate are not affected by this clause.

  • The reduction clause performs a reduction on the scalar variables specified, using the operator specified.

  • The copyin clause lets you assign the same value to threadprivate variables for each thread in the team executing the parallel region. For each variable specified, the value of the variable in the master thread of the team is copied to the threadprivate copies at the beginning of the parallel region.
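
The following sketch combines several of these clauses; the values are illustrative. Each thread receives a private, initialized copy of base, and the per-thread partial sums of total are combined with + when the loop completes:

#include <stdio.h>
#include <omp.h>

int main(void)
{
   int i, n = 8;
   int base = 100;   /* copied into each thread by firstprivate */
   int total = 0;    /* combined across threads by reduction */

   #pragma omp parallel for default(shared) private(i) firstprivate(base) reduction(+:total)
   for (i = 0; i < n; i++)
      total += base + i;

   printf("total = %d\n", total);   /* 8*100 + (0+1+...+7) = 828 */
   return 0;
}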

Directive Binding

Some directives are bound to other directives. A binding specifies the way in which one directive is related to another. For instance, a directive is bound to a second directive if it can appear in the dynamic extent of that second directive. The following rules apply with respect to the dynamic binding of directives:

  • The for, sections, single, master, and barrier directives bind to the dynamically enclosing parallel directive, if one exists. If no parallel region is currently being executed, the directives have no effect.

  • The ordered directive binds to the dynamically enclosing for directive; see the sketch after this list.

  • The atomic directive enforces exclusive access with respect to atomic directives in all threads, not just the current team.

  • The critical directive enforces exclusive access with respect to critical directives in all threads, not just the current team.

  • A directive cannot bind to a directive outside the closest enclosing parallel directive.
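
As a sketch of the ordered binding rule, the ordered clause on the for directive permits ordered directives inside the loop; the loop bound is illustrative:

#include <stdio.h>
#include <omp.h>

int main(void)
{
   int i;

   #pragma omp parallel for ordered private(i)
   for (i = 0; i < 8; i++) {
      /* binds to the dynamically enclosing for directive;
         the block executes in sequential iteration order */
      #pragma omp ordered
      printf("iteration %d\n", i);
   }
   return 0;
}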

Directive Nesting

Dynamic nesting of directives must adhere to the following rules:

  • A parallel directive dynamically inside another parallel directive logically establishes a new team, which is composed of only the current thread, unless nested parallelism is enabled (see the sketch after this list).

  • for, sections, and single directives that bind to the same parallel directive are not allowed to be nested inside each other.

  • critical directives with the same name are not allowed to be nested inside each other.

  • for, sections, and single directives are not permitted in the dynamic extent of critical, ordered, and master regions.

  • barrier directives are not permitted in the dynamic extent of for, ordered, sections, single, master, and critical regions.

  • master directives are not permitted in the dynamic extent of for, sections, and single directives.

  • ordered directives are not allowed in the dynamic extent of critical regions.

  • Any directive that is permitted when executed dynamically inside a parallel region is also permitted when executed outside a parallel region. When executed dynamically outside a user-specified parallel region, the directive is executed with respect to a team composed of only the master thread.
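
The following sketch illustrates the first rule in the list; with nested parallelism enabled, each thread of the outer team becomes the master of a new inner team. The requested team size is illustrative:

#include <stdio.h>
#include <omp.h>

int main(void)
{
   omp_set_nested(1);        /* enable nested parallelism */
   omp_set_num_threads(2);   /* request two threads per team */

   #pragma omp parallel
   {
      int outer = omp_get_thread_num();

      /* each outer thread becomes master of a new team */
      #pragma omp parallel
      {
         printf("outer %d, inner %d\n", outer, omp_get_thread_num());
      }
   }
   return 0;
}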