Chapter 8. Loop Nest Optimization #pragma Directives


Table 8-1 contains an alphabetical list of the #pragma directives discussed in this chapter, along with a brief description of each and the compiler versions in which the directive is supported.

Table 8-1. Loop Nest Optimization #pragma Directives

#pragma	Short Description	Compiler Versions
`#pragma aggressive inner loop fission`	Tells the compiler to fission inner loops into as many loops as possible.	7.0 and later
`#pragma blocking size`	Sets the blocksize of the specified loop, if it is involved in a blocking for the primary (or secondary) cache.	7.0 and later
`#pragma fission`	Tells the compiler to fission the enclosing specified levels of loops after this directive.	7.0 and later
`#pragma fissionable`	Disables validity testing for loop fission.	7.0 and later
`#pragma fusable`	Disables validity testing for loop fusion.	7.0 and later
`#pragma fuse`	Tells the compiler to fuse the following n loops, which must be immediately adjacent.	7.0 and later
`#pragma ivdep`	Liberalizes dependence analysis. This applies only to inner loops. Given two memory references, where at least one is loop variant, ignore any loop-carried dependences between the two references.	6.0 and later
`#pragma no blocking`	Prevents the compiler from involving this loop in cache blocking.	7.0 and later
`#pragma no fission`	Keeps the following loop from being fissioned. Its innermost loops, however, are allowed to be fissioned.	7.0 and later
`#pragma no fusion`	Keeps the following loop from being fused with other loops.	7.0 and later
`#pragma no interchange`	Prevents the compiler from involving the loop directly following this directive (or any loop nested within this loop) in an interchange.	7.0 and later
`#pragma prefetch`	Specifies prefetching for each level of the cache. Scope: entire function containing the directive.	7.1 and later
`#pragma prefetch_manual`	Specifies whether manual prefetches (through `#pragma` directives) should be respected or ignored. Scope: entire function containing the directive.	7.1 and later
`#pragma prefetch_ref`	Generates a prefetch and connects it to the specified reference (if possible).	7.0 and later
`#pragma prefetch_ref_disable`	Disables prefetching for the specified reference in the current loop nest.	7.1 and later
`#pragma unroll`	Suggests to the compiler that a specified number of copies of the loop body be added to the inner loop. If the loop following this directive is an inner loop, then it indicates standard unrolling (version 7.2 and later). If the loop following this directive is not innermost, then outer loop unrolling (unroll and jam) is performed (version 7.0 and later).	7.0 and later

`#pragma aggressive inner loop fission`

The #pragma aggressive inner loop fission directive instructs the compiler to fission inner loops into as many loops as possible.

The syntax of the #pragma aggressive inner loop fission directive is as follows:

#pragma aggressive inner loop fission

The #pragma aggressive inner loop fission directive must be followed by an inner loop and has no effect if that loop is no longer inner after loop interchange.
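
The following is a minimal sketch of the transformation this directive requests, not actual compiler output. The function names, array sizes, and loop bodies are hypothetical. The first function shows where the directive goes; the second is a hand-written version in which each independent statement of the inner loop gets its own loop. Compilers that do not recognize the directive ignore it.

```c
#define ROWS 4   /* hypothetical sizes, chosen only for illustration */
#define COLS 6

/* Directive form: the pragma immediately precedes the inner loop. */
void update_pragma(double a[ROWS][COLS], double b[ROWS][COLS],
                   double c[ROWS][COLS])
{
    int i, j;

    for (i = 0; i < ROWS; i++) {
#pragma aggressive inner loop fission
        for (j = 0; j < COLS; j++) {
            a[i][j] = a[i][j] * 2.0;
            b[i][j] = b[i][j] + 1.0;
            c[i][j] = c[i][j] - 1.0;
        }
    }
}

/* Hand-written equivalent: the inner loop is split into one loop per
   statement. This is legal here because no statement reads a value
   written by another. */
void update_fissioned(double a[ROWS][COLS], double b[ROWS][COLS],
                      double c[ROWS][COLS])
{
    int i, j;

    for (i = 0; i < ROWS; i++) {
        for (j = 0; j < COLS; j++)
            a[i][j] = a[i][j] * 2.0;
        for (j = 0; j < COLS; j++)
            b[i][j] = b[i][j] + 1.0;
        for (j = 0; j < COLS; j++)
            c[i][j] = c[i][j] - 1.0;
    }
}
```

Both functions leave the arrays in the same state; fission changes only how the work is grouped into loops.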

`#pragma blocking size`

The #pragma blocking size directive sets the blocksize of the specified loop.

The syntax of the #pragma blocking size directive is as follows:

#pragma blocking size [n1, n2]

If the specified loop is involved in blocking for the primary cache, it has a blocksize of n1; if it is involved in blocking for the secondary cache, it has a blocksize of n2. The compiler tries to include this loop within such a block. If a blocking size of 0 is specified, the loop is not stripped, but the entire loop is placed inside the block.

Example 8-1. #pragma blocking size

In the following code, the compiler makes 20 × 20 blocks when blocking:

/* x, y, and z are passed as arrays of row pointers. */
void amat (double **x, double **y, double **z, int n, int m, int mm)
{
  int i, j, k;

  for (k = 0; k < n; k++)
  {
    #pragma blocking size (20)
    for (j = 0; j < m; j++)
    {
      #pragma blocking size (20)
      for (i = 0; i < mm; i++)
      {
        z[i][k] = z[i][k] + x[i][j] * y[j][k];
      }
    }
  }
}
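
To make the idea of 20 × 20 blocks concrete, the sketch below blocks the loop nest of Example 8-1 by hand. It is only an illustration: the helper `min_int`, the names `amat_plain` and `amat_blocked`, the row-pointer parameter types, and the particular loop order are assumptions, and the compiler may arrange the blocked loops differently.

```c
#define BLOCK 20  /* matches the (20) given in the directives */

static int min_int(int a, int b) { return a < b ? a : b; }

/* Reference version: the loop nest without any blocking. */
void amat_plain(double **x, double **y, double **z, int n, int m, int mm)
{
  int i, j, k;

  for (k = 0; k < n; k++)
    for (j = 0; j < m; j++)
      for (i = 0; i < mm; i++)
        z[i][k] = z[i][k] + x[i][j] * y[j][k];
}

/* Hand-blocked version: the j and i loops are each cut into strips of
   BLOCK iterations, and the strip loops are moved outermost so that a
   20 x 20 tile of x is reused for every k before moving on. */
void amat_blocked(double **x, double **y, double **z, int n, int m, int mm)
{
  int i, j, k, ii, jj;

  for (jj = 0; jj < m; jj += BLOCK)
    for (ii = 0; ii < mm; ii += BLOCK)
      for (k = 0; k < n; k++)
        for (j = jj; j < min_int(jj + BLOCK, m); j++)
          for (i = ii; i < min_int(ii + BLOCK, mm); i++)
            z[i][k] = z[i][k] + x[i][j] * y[j][k];
}
```

For every element of z, both versions add the products in the same order of j, so they produce identical results; only the order in which memory is visited changes.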

`#pragma no blocking`

The #pragma no blocking directive prevents the compiler from involving this loop in cache blocking.

The syntax of the #pragma no blocking directive is as follows:

#pragma no blocking

`#pragma fission`

The #pragma fission directive instructs the compiler to fission the enclosing n levels of loops after this directive.

The syntax of the #pragma fission directive is as follows:

#pragma fission [n]

The default for n is 1. The compiler performs a validity test unless #pragma fissionable is also specified. The compiler does not reorder statements.
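
The sketch below shows one reading of this description. It assumes, without confirmation from the text above, that the directive is placed inside the loop body at the point where the enclosing loop should be split. The function names and loop body are hypothetical, and the second function is a hand-written version of the expected result with the statements kept in their original order.

```c
#define LEN 8   /* hypothetical array length */

/* Directive form (assumed placement: at the split point). */
void compute_pragma(double a[LEN], double b[LEN], const double c[LEN])
{
    int i;

    for (i = 0; i < LEN; i++) {
        a[i] = c[i] * 2.0;
#pragma fission
        b[i] = a[i] + c[i];
    }
}

/* Hand-written equivalent: one enclosing level (n = 1) is split into
   two loops. The statements are not reordered, so b[i] still reads the
   a[i] written earlier. */
void compute_fissioned(double a[LEN], double b[LEN], const double c[LEN])
{
    int i;

    for (i = 0; i < LEN; i++)
        a[i] = c[i] * 2.0;
    for (i = 0; i < LEN; i++)
        b[i] = a[i] + c[i];
}
```

The split is valid here because the second statement depends only on values that the first loop has already produced by the time the second loop runs. This is the kind of condition the validity test checks.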

`#pragma fissionable`

The #pragma fissionable directive disables validity testing for loop fissioning.

The syntax of the #pragma fissionable directive is as follows:

#pragma fissionable

`#pragma no fission`

The #pragma no fission directive instructs the compiler not to fission the loop directly following this directive. Any inner loops, however, are allowed to be fissioned.

The syntax of the #pragma no fission directive is as follows:

#pragma no fission

`#pragma fuse`

The #pragma fuse directive instructs the compiler to fuse the specified number of immediately adjacent loops.

The syntax of the #pragma fuse directive is as follows:

#pragma fuse [num, level]

The loops to be fused must immediately follow the #pragma fuse directive.

The default value for num is 2. Fusion is attempted on each pair of adjacent loops and the level, by default, is determined by the maximal perfectly nested loop levels of the fused loops, although partial fusion is allowed. Iterations may be peeled as needed during fusion; the limit of this peeling is 5 or the number specified by the -LNO:fusion_peeling_limit option. No fusion is done for non-adjacent outer loops.

When the #pragma fusable directive is present, no validity test is done and the fusion is done up to the maximal common levels.
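
As an illustration, the sketch below fuses two adjacent loops that have identical bounds, so no iterations need to be peeled. The function names are hypothetical, and writing the argument as `(2)` follows the parenthesized style of Example 8-1, which is an assumption about the accepted spelling. The second function is a hand-written version of the fused result.

```c
/* Directive form: the two loops to fuse immediately follow the pragma. */
void add_then_scale_pragma(int n, const double *a, double *b, double *c)
{
    int i;

#pragma fuse (2)
    for (i = 0; i < n; i++)
        b[i] = a[i] + 1.0;
    for (i = 0; i < n; i++)
        c[i] = b[i] * 2.0;
}

/* Hand-written equivalent: both statements share a single loop, so
   b[i] is consumed right after it is produced. */
void add_then_scale_fused(int n, const double *a, double *b, double *c)
{
    int i;

    for (i = 0; i < n; i++) {
        b[i] = a[i] + 1.0;
        c[i] = b[i] * 2.0;
    }
}
```

Fusion is legal in this case because iteration i of the second loop reads only b[i], which the same iteration of the first loop has already written.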

`#pragma fusable`

The #pragma fusable directive disables validity testing for loop fusing.

The syntax of the #pragma fusable directive is as follows:

#pragma fusable

`#pragma no fusion`

The #pragma no fusion directive instructs the compiler that the loop following this directive should not be fused with other loops.

The syntax of the #pragma no fusion directive is as follows:

#pragma no fusion

`#pragma no interchange`

The #pragma no interchange directive prevents the compiler from involving the next loop in an interchange. This directive also applies to any loop nested within the indicated loop.

The syntax of the #pragma no interchange directive is as follows:

#pragma no interchange

The directive must immediately precede the loop to which it applies.
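
A typical reason to use this directive is that the loop order was already chosen deliberately. The sketch below is a hypothetical example: the function name and array sizes are assumptions, and the loops are written so that the inner loop walks along a row of a row-major C array.

```c
#define NROWS 3   /* hypothetical sizes */
#define NCOLS 4

/* The directive precedes the outer loop, so neither the i loop nor the
   nested j loop takes part in an interchange. */
void scale_rows(double src[NROWS][NCOLS], double dst[NROWS][NCOLS])
{
    int i, j;

#pragma no interchange
    for (i = 0; i < NROWS; i++) {
        for (j = 0; j < NCOLS; j++) {
            dst[i][j] = src[i][j] * 2.0;   /* consecutive j: adjacent memory */
        }
    }
}
```

The directive does not change what the loop computes; it only keeps the loops in the order in which they were written.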

`#pragma ivdep`

The #pragma ivdep directive instructs the compiler to liberalize dependence analysis.

The syntax of the #pragma ivdep directive is as follows:

#pragma ivdep

Given two memory references, where at least one is loop variant, this directive instructs the compiler to ignore any loop-carried dependences between the two references. The #pragma ivdep directive applies only to inner loops. If #pragma ivdep is used on a loop that has an inner loop, the compiler ignores it.

Example 8-2. #pragma ivdep

The following are some examples of the use of #pragma ivdep:

ivdep does not break the dependence because b[k] is not loop variant:

#pragma ivdep
for (i = 0; i < n; i++)
  b[k] = b[k] + a[i];

ivdep breaks the dependence, but the compiler warns the user that it is breaking an obvious dependence:

#pragma ivdep
for (i = 0; i < n; i++)
  a[i] = a[i-1] + 3.0;

ivdep breaks the dependence:

#pragma ivdep   
for (i = 0; i < n; i++)   
a[b[i]] = a[b[i]] + 3.0;

ivdep does not break the dependence on a[i] because it is within an iteration:

#pragma ivdep   
for (i = 0; i < n; i++)   
{   
  a[i] = b[i]; 
  c[i] = a[i] + 3.0; 
}

If -OPT:cray_ivdep=TRUE is specified, ivdep instructs the compiler to use Cray semantics and break all backward dependences:

ivdep breaks the dependence, but the compiler warns the user that it is breaking an obvious dependence:

#pragma ivdep
for (i = 0; i < n; i++)
{
  a[i] = a[i - 1] + 3.0;
}

ivdep does not break the dependence, because it is from the load to the store, and the load comes lexically before the store:

#pragma ivdep
for (i = 0; i < n; i++)
{
  a[i] = a[i + 1] + 3.0;
}

To break all dependences, specify the following: -OPT:liberal_ivdep=TRUE.
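
Because the directive tells the compiler to ignore dependences rather than prove their absence, the loop is only safe if no such dependence actually exists. For the indirect-index loop in Example 8-2, that means the index array must not contain the same value twice. The sketch below pairs the loop with a simple check that could be run in a debug build; both function names are hypothetical.

```c
/* Returns 1 if idx[0..n-1] holds no repeated value, 0 otherwise.
   Quadratic, but adequate for a debug-time sanity check. */
int indices_are_distinct(int n, const int *idx)
{
    int i, j;

    for (i = 0; i < n; i++)
        for (j = i + 1; j < n; j++)
            if (idx[i] == idx[j])
                return 0;
    return 1;
}

/* Adds 3.0 to each element selected by idx. With ivdep, the compiler
   assumes that different iterations never touch the same a[idx[i]]. */
void add_three_indirect(int n, double *a, const int *idx)
{
    int i;

#pragma ivdep
    for (i = 0; i < n; i++)
        a[idx[i]] = a[idx[i]] + 3.0;
}
```

If `indices_are_distinct` returns 0 for the data in use, two iterations update the same element and the directive should not be applied to that loop.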

`#pragma prefetch`

The #pragma prefetch directive specifies prefetching for each level of the cache.

The syntax of the #pragma prefetch directive is as follows:

#pragma prefetch [n1, n2]

n1 controls the level 1 cache; n2 controls level 2. n1 and n2 can have the following values:

0: prefetching is off (default for all processors except R10000)
1: prefetching is on but conservative (default at -O3 when prefetch is on)
2: prefetching is on and aggressive

The scope of this directive is the entire function that contains it.

`#pragma prefetch_manual`

The #pragma prefetch_manual directive instructs the compiler as to whether manual prefetches (through #pragma directives) should be respected or ignored.

The syntax of the #pragma prefetch_manual directive is as follows:

#pragma prefetch_manual [n]

n can have a value of 0 (the compiler ignores manual prefetches; this is the default for all processors except R10000) or 1 (the compiler respects manual prefetches; this is the default at -O3 for R10000 and beyond).

The scope of this directive is the entire function that contains it.

`#pragma prefetch_ref`

The #pragma prefetch_ref directive generates a prefetch and connects it to the specified reference (if possible).

The syntax of the #pragma prefetch_ref directive is as follows:

#pragma prefetch_ref = ref [, stride = num1 [, num2]] 
[, level = [lev1][, lev2]] 
[, kind = {rd|wr}] 
[, size = sz]

ref is the object you want prefetched.

Table 8-2 describes each of the possible #pragma prefetch_ref clauses. These clauses are optional.

Table 8-2. Clauses for #pragma prefetch_ref

Clause	Effect	Default Value
`stride`	Prefetches every num iteration(s) of this loop.	1
`level`	Specifies the level in the memory hierarchy to prefetch. The possible values for `level` are 1 (prefetch from L2 to L1 cache) and 2 (prefetch from memory to L1 cache).	2
`kind`	Specifies read or write.	write
`size`	Specifies the size (in KB) of the object referenced in this loop. Must be a constant.	N/A

The #pragma prefetch_ref directive instructs the compiler to take the following actions:

- Generate a prefetch and connect it to the specified object (if possible).
- Search for references in the current loop nest that match the supplied object.
  - If such a reference is found, then the prefetch for that object is scheduled relative to the prefetch node, based on the miss latency for the specified level of the cache.
  - If no such reference is found, the prefetch is generated at the start of the loop body.
- Ignore all references by the automatic prefetcher (if enabled) to this variable in this loop nest.
- Have the automatic prefetcher (if enabled) use the supplied size (if specified) in its volume analysis for this object.

This directive has no scope; it just generates a prefetch.
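
The sketch below shows how the directive might be placed in a simple reduction loop. It is an illustration only: the function name is hypothetical, and the prefetch distance of 16 elements and the stride of 16 iterations are assumed values (they would correspond to one prefetch per 128-byte cache line of doubles), not recommendations from this chapter.

```c
/* Sums n doubles. The directive asks for a read prefetch of the element
   16 iterations ahead, issued once every 16 iterations. Compilers that
   do not recognize the directive ignore it and simply run the loop. */
double sum_with_prefetch(int n, const double *a)
{
    double s = 0.0;
    int i;

    for (i = 0; i < n; i++) {
#pragma prefetch_ref = a[i + 16], stride = 16, kind = rd
        s += a[i];
    }
    return s;
}
```

The directive affects only when data is brought into the cache; the value returned by the function is the same with or without it.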

`#pragma prefetch_ref_disable`

The #pragma prefetch_ref_disable directive explicitly disables prefetching for the specified reference (in the current loop nest).

The syntax of the #pragma prefetch_ref_disable directive is as follows:

#pragma prefetch_ref_disable = ref [, size = num]

ref is the object for which you want to disable prefetching.
num specifies the size (in KB) of the object referenced in this loop (optional). The size must be a constant.

This directive explicitly disables the prefetching of all references to object ref in the current loop nest. If enabled, the auto-prefetcher runs but ignores ref. The size is used for volume analysis.

The scope of this directive is the entire function containing it.

`#pragma unroll`

The #pragma unroll directive suggests to the compiler the type of unrolling that should be done.

The syntax of the #pragma unroll directive is as follows:

#pragma unroll [n]

This directive instructs the compiler to add n-1 copies of the loop body to the inner loop. If the loop that this directive immediately precedes is an inner loop, then it indicates standard unrolling (version 7.2 and later). If the loop that this directive immediately precedes is not innermost, then outer loop unrolling (unroll and jam) is performed (version 7.0 and later).

The value of n must be at least 1. If it is 1, then unrolling is not performed.
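
For an inner loop, standard unrolling with n = 4 can be pictured as in the sketch below. The function names are hypothetical, and the cleanup loop for trip counts that are not a multiple of 4 is an assumption about how the leftover iterations are handled; this chapter does not describe that detail.

```c
/* Directive form: the pragma immediately precedes the inner loop. */
double dot_pragma(int n, const double *x, const double *y)
{
    double s = 0.0;
    int i;

#pragma unroll (4)
    for (i = 0; i < n; i++)
        s += x[i] * y[i];
    return s;
}

/* Hand-written equivalent: the body appears 4 times per iteration
   (the original plus n-1 = 3 copies), followed by a cleanup loop for
   the remaining 0 to 3 elements. */
double dot_unrolled4(int n, const double *x, const double *y)
{
    double s = 0.0;
    int i;

    for (i = 0; i + 3 < n; i += 4) {
        s += x[i] * y[i];
        s += x[i + 1] * y[i + 1];
        s += x[i + 2] * y[i + 2];
        s += x[i + 3] * y[i + 3];
    }
    for (; i < n; i++)          /* leftover iterations */
        s += x[i] * y[i];
    return s;
}
```

Both functions add the products in the same order, so they return the same value; unrolling reduces loop overhead and gives the scheduler more independent work per iteration.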

Caution: The #pragma unroll directive works only on loops that are legal to unroll. Loops are often not unrollable in C because of potential aliasing. In these cases, you may want to use restrict pointers or the option -OPT:alias=disjoint (see the C Language Reference Manual for more information on restrict pointers). When -OPT:alias=disjoint is specified, distinct pointer expressions are assumed to point to distinct, non-overlapping objects.

-OPT:alias=disjoint is unsafe and may cause existing C programs to fail in obscure ways, so it should be used with extreme care.

Example 8-3. #pragma unroll

The following code samples show the effect of using #pragma unroll. The code in Sample 1 becomes Sample 2, not Sample 3:

Sample 1:

#pragma unroll (2)
for (i = 0; i < 10; i++)
{
  for (j = 0; j < 10; j++)
  {
    a[i][j] = a[i][j] + b[i][j];
  }
}

Sample 2:

for (i = 0; i < 10; i += 2)
{
  for (j = 0; j < 10; j++)
  {
    a[i][j] = a[i][j] + b[i][j];
    a[i+1][j] = a[i+1][j] + b[i+1][j];
  }
}

Sample 3:

for (i = 0; i < 10; i += 2)
{
  for (j = 0; j < 10; j++)
  {
    a[i][j] = a[i][j] + b[i][j];
  }
  for (j = 0; j < 10; j++)
  {
    a[i+1][j] = a[i+1][j] + b[i+1][j];
  }
}

The #pragma unroll directive is attached to the given loop, so that if an interchange is performed, the corresponding loop is still unrolled. That is, Sample 1 is equivalent to the following:

#pragma interchange
for (j = 0; j < 10; j++)
{
  #pragma unroll (2)
  for (i = 0; i < 10; i++)
  a[i][j] = a[i][j] + b[i][j];
}
