Go backward to 3 The Termination Detection AlgorithmGo up to TopGo forward to 5 Progress of the Algorithm

# 4 Correctness of the Algorithm

#### Basics

The derivation of the termination detection algorithm by repeated refinement of its invariant gives good reason to believe that the algorithm indeed solves the stated problem. However, the development was based on "high-level" reasoning about a concurrent system with rather large inference steps. We cannot be really sure that the algorithm is correct unless we formalize our reasoning, i.e., create a map for our thoughts, and decompose the large inference steps into smaller ones whose correctness is much more self-evident. A formal verification of a concurrent algorithm is tedious, because at every step some process may perform an action that violates the invariant. We have to convince ourselves that this is not the case. During this process we get a much deeper insight into the algorithm and its behavior because many subtleties are exposed that may be overlooked in a high-level presentation.

The core property to be proved is defined by a formula of the form [][P]. The basic strategy to prove such a property is based on induction: we prove that

1. the initial state condition implies P,
2. if P holds and the system performs an action, then P`'` holds (where P`'` is derived from P by replacing every program variable by its primed counterpart).

However, even if P is invariant, one may not be able to prove this fact, because P on its own may be to weak to show the second part of above rule, i.e., because P is not inductive. The solution is to find a stronger property I such that I is inductive and thus [][I] can be proven. Since I => P, we can then conclude [][P].

The invariant constructed in the previous section is an attempt to such an inductive property but it will turn out that I alone is also not sufficient. Strengthening I to some inductive I' will however make the proof quite clumsy. Instead we will repeatedly apply the same principle that we use in the proof of P: we construct a sequence of inductive properties such that in the proof of inductiveness of a property we can use the previously proven properties as additional "axioms". In this way, we can "modularize" the proof by decomposing it into a number of layers that are thin enough to be managed reasonably well.

#### Core Properties

First, we have to make sure that the termination detection algorithm does not unappropriately interfere with the working of the original system. Every behavior of the extended system must be also a possible behavior of the original system; vice versa, every behavior of the original system should be also allowed by the extended system. This idea is expressed by

Proposition 1 (Equivalence) SystemT implements System and vice versa, i.e., for all n, `act`, and `chan`:

(var `term` in B: SystemTn(`act`, `chan`, `term`)) <=> Systemn(`act`, `chan`).

Proof: SystemT is constructed from System

1. by adding actions that leave the variables of System unchanged and that can be thefore viewed as stuttering steps of System, and
2. by conjoining conditions to Init, Send, and Receive.
Any behavior of SystemT is therefore a behavior of System. In the conditions conjoined to the System actions, SystemT does not constrain the System variables, thus also the converse holds. []

In general, implementation proofs are more complicated because one has to show that every step of the implementation simulates some step of the specification. However, the construction of SystemT from System makes equivalence such obvious that we content ourselves with above sketch. Because of equivalence, we refer in the context of this section to SystemT simply as "the system". The system property that the algorithm is expected to detect is formalized by

Definition 1 (Termination) The system is terminated if all processes are inactive and all channels are empty:

terminated_n(in `act`, in `chan`) :<=>

/\
• forall i in Zn: ~`act`i
• forall i in Zn, j in Zn: `chan`i, j = < > .

An important property of termination is established by

Proposition 2 (Stability of Termination) Once the system is terminated, it stays so, i.e. for all n, `act`, `chan`:

[](terminated_n(`act`, `chan`) => []terminated_n(`act`, `chan`)).

Proof: It suffices to prove

[](terminated_n(`act`, `chan`) => terminated_n(`act``'`, `chan``'`)).
Init implies ~terminated_n(`act`, `chan`). Assume terminated_n(`act`, `chan`). We know ~`act`k for every k in Zn, thus Sendi, j(`act`i, `chan`) is not enabled. From `chan`i, j= < > we know that Receivei, j(`act`, `chan`, `cnt`, `mark`) is not enabled. By ~`act`i, Inactivatei(`act`) does not have any effect. Neither Start`i`(`run`, `sig`, `mark`, `tval`, `tmark`) nor Forwardi(`act`i, `cnt`i, `sig`, `mark`, `tval`, `tmark`, `term`) modify `act` or `chan`. []

The fundamental property of the algorithm is then expressed by

Proposition 3 (Correctness) If the system detects its termination, it has indeed terminated, i.e., for all n, `act`, `chan`, `term`:

SystemTn(`act`, `chan`, `term`) =>
[](`term` => terminated_n(`act`, `chan`)).

The rest of this section is devoted to setting up the stage for finally proving this property.

#### General Assumption

We will establish a number of system invariants that will be used in the proof the invariant derived in the previous section, which in turn will be used to prove the corrrectness property. In order to simplify the formulation of these properties, we will from now on consider arbitrary but fixed n, `act`, `chan`, `term` (the specification parameters) and take `cnt`, `sig`, `mark`, `tval`, `tmark` (the local variables of the algorithm) such that

/\
• Init(`act`, `chan`, `term`, `run`, `cnt`, `sig`, `mark`)
• [][exists i in Zn: Actionn, i(`act`, `chan`, `term`, `run`, `cnt`, `sig`, `mark`, `tval`, `tmark`)]
holds. The following propositions are expressed in the context of this assumption, i.e., they are true only for behaviors of the system.

#### System Invariants

We start the construction of the proof by establishing a simple invariant that assures that ensures that the algorithm is "well-typed".

Proposition 4 (Type Correctness) The system preserves the asserted variable types, i.e.,

[](`act` in Zn->B/\ `chan` in Zn x Zn->Seq(MSG)/\ `term` in B/\ `run` in B/\ `cnt` in Zn->N/\ `sig` in Zn->B/\ `mark` in Zn->B/\ `tval` in N/\ `tmark` in B).

Proof: We show []`act` in Zn->B. Init implies `act`i in B for every i in Zn. The only actions that modify `act` are Receive and Inactivate both of which construct `act``'` from `act` by replacing the mapping of some i in Zn by an element of B. Type correctness for the other variables is shown analogously. []

We will not refer to above property explicitly any more but use it implicitly in many of our reasoning steps.

The following proposition guarantees that the algorithm does not make any mistakes when counting messages.

Proposition 5 (Message Count) The system keeps in `cnt` track of the number of messages pending in communication channels, i.e.,

[](msgcntn(in `chan`, in `cnt`))
where
• msgcntn(in `chan`, in `cnt`) :<=> countn(`chan`) = sumi in Z_n`cnt`i,
• countn(`chan`) := sumi in Z_n sumj in Z_n len(`chan`i, j),
• len(`ch`) := such l in N: exists m0, ..., ml-1: `ch` = < m0, ..., ml-1 > .

Proof: Init implies `cnt`i = 0 and `chan`i,j = < > for every i in Zn and j in Zn. Assume msgcntn(`count`, `chan`). We show msgcntn(`chan``'`, `cnt``'`). The only actions that modify `chan` or `cnt` are Send and Receive:

Case Sendi, j(`act`i, `chan`, `cnt`)
We know countn(`chan``'`) = countn(`chan`[(i, j) |-> `chan`i, j o < m > ]) = 1+countn(`chan`) = 1+sumk in Z_n`cnt`k = sumk in Z_n`cnt`[i |-> `cnt`i+1]k = `cnt``'`.
Case Receivei, j(`act`, `chan`, `cnt`, `mark`)
Analogous to the previous case. []

The following property ensures that Process 0 indeed knows whether there is a token in the system or not.

Proposition 6 (Detection Run) The system records in `run` whether there is an ongoing termination detection run, i.e.,

[](`run` <=> exists k in Zn: `sig`k).

It turns out that above invariant is itself not inductive; we therefore combine its proof with the proof of

Proposition 7 (Single Token) There is at most one token in the system, i.e.,

[](forall k in Zn: `sig`k => forall j in Zn: (`sig`j => j = k)).

Proof of Property 6 and of Property 7:

Init implies ~`run` and ~`sig`k for every k in Zn. Assume `run` <=> exists k in Zn: `sig`k and forall k in Zn: `sig`k => forall j in Zn: (`sig`j => j = k).

1. We show `run``'` <=> exists k in Zn: `sig``'`k.

The only actions that modify `run` or `sig` are Start and Forward.

Case Start`i`(`run`, `sig`, `mark`, `tval`, `tmark`)
By `run``'` and `sig``'`[i-n1].
Case Forwardi(`act`i, `cnt`i, `sig`, `mark`, `tval`, `tmark`, `term`)
The action gives `sig`i and thus `run`. Assume i != 0. Then `run``'` = `run` and `sig``'` = `sig`[i |-> false, i-n1 |-> true]. Thus we have `run``'` and `sig`'[i-n1]. Now assume i = 0. Then we have ~`run``'` and `sig``'` = `sig`[i |-> false]. By `sig`i and the induction hypothesis, we thus have ~`sig``'`k for every k in Zn.

(Here we used in the proof of Proposition 6 the induction hypothesis of Proposition 7).

2. We show forall k in Zn: `sig``'`k => forall j in Zn: (`sig``'`j => j = k). The only actions that modify `sig` are Start and Forward.
Case Start`i`(`run`, `sig`, `mark`, `tval`, `tmark`)
We know ~`run` and therefore ~`sig`j for every j in Zn. From `sig``'` = `sig`[i-n1 |-> true] we then know `sig``'`j => j = i-n1 for every j in Zn.

(Here we used in the proof of Proposition 7 the induction hypothesis of Proposition 6).

Case Forwardi(`act`i, `cnt`i, `sig`, `mark`, `tval`, `tmark`, `term`)
The action gives `sig`i and thus `sig`j => j = i for every j in Zn. Assume i != 0. Then `sig``'` = `sig`[i |-> false, i-n1 |-> true] and therefore `sig``'`j => j = i-n1 for every j in Zn. Now assume i = 0. Then `sig``'` = `sig`[i |-> false] and thus ~`sig`j for every j in Zn. []

Now we have set up enough machinery to proof the crucial fact of the termination detection algorithm:

Proposition 8 (Invariant) The system maintains the following invariant:

[]Invn(in `act`, in `cnt`, in `sig`, in `mark`, in `tval`, in `tmark`)
where
• Invn(in `act`, in `cnt`, in `sig`, in `mark`, in `tval`, in `tmark`) :<=>
forall k in Zn: `sig`k =>

\/
• InvCn, k(in `act`, in `cnt`, in `tval`)
• 0 < `tval`+sumj in Z_n, j <= k`cnt`j
• exists j in Zn: j <= k/\ `mark`j
• `tmark`.
• InvCn, k(in `act`, in `cnt`, in `tval`) :<=>

/\
• forall j in Zn: j > k => ~`act`j
• `tval` = sumj in Z_n, j > k`cnt`j

Proof: Init implies ~`sig`k for every k in Zn. Assume Invn(`act`, `cnt`, `sig`, `mark`, `tval`, `tmark`). We show Invn(`act``'`, `cnt``'`, `sig``'`, `mark``'`, `tval``'`, `tmark``'`).

Case Sendi, j(`act`i, `chan`, `cnt`)
Take k in Zn with sigk. The only variable in the invariant changed by the action is `cnt`, thus only the first or the second disjunct of the invariant can be invalidated. However, the action implies `act`i and thus by the invariant also i < k; it can therefore not affect the first disjunct. Since `cnt``'` differs from `cnt` only in `cnt`'i = `cnt`i+1, the second disjunct cannot be invalidated either.

(The last remark explains why the second disjunct of the invariant has to establish that the denoted sum remains positive, not just non-zero.)

Case Receivei, j(`act`, `chan`, `cnt`, `mark`)
Take k in Zn with `sig``'`k. We have to show

\/
• ...
• 0 < `tval`+suml in Z_n, l <= k`cnt`[j |-> `cnt`j-1]l
• exists l in Zn: l <= k/\ `mark`[j |-> true]l
• `tmark`
If j <= k, we have `mark`[j |-> true]j and thus the third disjunct.

(This case is the cause of the third disjunct of the invariant: a process which currently holds or did not yet get the token receives a message and becomes marked.)

Assume j > k. We know `sig`k from `sig``'`k and the action. We proceed by case distinction on the consequence of the induction hypothesis.

Case (forall l in Zn: l > k => ~`act`l)/\ (`tval` = suml in Z_n, l > k`cnt`l)
By j > k we have suml in Z_n, l <= k`cnt`[j |-> `cnt`j-1]l = suml in Z_n, l <= k`cnt`l. The case gives us `tval`+suml in Z_n, l <= k`cnt`l = suml in Z_n`cnt`l. From the action and Proposition 5, we know suml in Z_n`cnt`l > 0 and thus the second disjunct.

(This case is the cause for the second disjunct of the invariant: a message is received by a process that previously held the token, which invalidates the token counter, but keeps the message count positive.)

Case (0 < `tval`+suml in Z_n, l <= k`cnt`l)
The second disjunct follows from j > k and thus suml in Z_n, l <= k`cnt`l = suml in Z_n, l <= k`cnt`[j |-> `cnt`j-1]l.
Case (l <= k/\ `mark`l, for some l in Zn)
By j > k and `mark`[j |-> true]l.
Case `tmark`
Obvious.
Case Inactivatei(`act`)
The only effect of this action is to falsify `act`i, which cannot invalidate the invariant.
Case Start`i`(`run`, `sig`, `mark`, `tval`, `tmark`)
Take k in Zn with `sig``'`k, therefore we have `sig`[i-n1 |-> true]k. By Proposition 7 and i = 0, we know k = i-n1 = n-n1. We have to show

\/

• /\
• forall j in Zn: j > n-n1 => ~`act`j
• 0 = sumj in Z_n, j > n-_n1`cnt`j
• ...
which is trivially true.
Case Forwardi(`act`i, `cnt`i, `sig`, `mark`, `tval`, `tmark`, `term`)
If i = 0, we know from `sig``'` = `sig`[i |-> false] that ~`sig``'`k for all k in Zn. Now assume i != 0 and take k in Zn with `sig``'`k. Since `sig``'` = `sig`[i |-> false, i-n1 |-> true], Proposition 7 gives us k = i-n1. We have to show

\/

• /\
• forall j in Zn: j > i-n1 => ~`act`j
• `tval`+`cnt`i = sumj in Z_n, j > i-_n1`cnt`j
• 0 < `tval`+`cnt`i+sumj in Z_n, j <= i-_n1`cnt`j
• exists j in Zn: j <= i-n1/\ `mark`[i |-> false]j
• `tmark`\/ `mark`i.
The action implies `sig`i; we proceed by case distinction on the consequence of the induction hypothesis.
Case (forall j in Zn: j > i => ~`act`j)/\ (`tval` = sumj in Z_n, j > i`cnt`j)
The action implies ~`act`i and thus forall j in Zn: j > i-n1 => ~`act`j. Furthermore we know `tval`+`cnt`i = (sumj in Z_n, j > i `cnt`j)+`cnt`i = sumj in Z_n, j > i-_n1 `cnt`j.
Case (0 < `tval`+sumj in Z_n, j <= i`cnt`j)
The second disjunct is a consequence of sumj in Z_n, j <= i`cnt`j = `cnt`i+sumj in Z_n, j <= i-_n1`cnt`j.
Case (j <= i/\ `mark`j, for some j in Zn)
If j <= i-n1, we have `mark`[i |-> false]j. If j = i, we have `mark`i.

(The second part of this case is the cause of the fourth disjunct of the invariant. If the token reaches a process that is marked because it has previously received a message, the token gets marked.)

Case `tmark`
Obvious. []

#### Conclusion

Finally we come to the heart of the matter: showing that our system invariant indeed implies the stated correctness property.

Proof of Proposition 3 (Correctness): Init implies ~`term`. Assume `term` => terminated_n(`act`, `chan`). We show `term``'` => terminated_n(`act``'`, `chan``'`). If `term`, then the induction hypothesis and Proposition 2 imply terminated_n(`act``'`, `chan``'`). Now assume ~`term` and `term``'`. We then have Forwardi(`act`i, `cnt`i, `sig`, `mark`, `tval`, `tmark`, `term`) with i = 0. Thus we know:
 (act) ~`act`0, (sig) `sig`0, (tval) `tval``'` = `tval`+`cnt`0, (tmark) `tmark``'` = `tmark`\/ `mark`0, (last) ~`tmark``'`/\ `tval``'` = 0.
By (tval) and (last), we have
 (tvaln) `tval`+`cnt`0 = 0.
By (tmark) and (last), we have
 (markn) ~`tmark`/\ ~`mark`0.
With (sig) and Proposition 8, (tvaln) and (markn) give us
 (inact) forall j in Zn: j > 0 => ~`act`j, (tvals) `tval` = sumj in Z_n, j > 0`cnt`j.
(Here we have finally used both parts of the first disjunct of the invariant.)

From (act) and (inact), we conclude
 (cres1) forall j in Zn: ~`act`j,
while (tvaln) and (tvals) give us
 (cres2) sumj in Z_n`cnt`j = 0.
Using Proposition 5, `act` = `act``'`, and `chan` = `chan``'`, (cres1) and (cres2) imply terminated_n(`act``'`, `chan``'`). []

Maintainer: Wolfgang Schreiner
Last Modification: August 20, 1998