previous up next
Go backward to 3 The Termination Detection Algorithm
Go up to Top
Go forward to 5 Progress of the Algorithm
RISC-Linz logo

4 Correctness of the Algorithm

Basics

The derivation of the termination detection algorithm by repeated refinement of its invariant gives good reason to believe that the algorithm indeed solves the stated problem. However, the development was based on "high-level" reasoning about a concurrent system with rather large inference steps. We cannot be really sure that the algorithm is correct unless we formalize our reasoning, i.e., create a map for our thoughts, and decompose the large inference steps into smaller ones whose correctness is much more self-evident. A formal verification of a concurrent algorithm is tedious, because at every step some process may perform an action that violates the invariant. We have to convince ourselves that this is not the case. During this process we get a much deeper insight into the algorithm and its behavior because many subtleties are exposed that may be overlooked in a high-level presentation.

The core property to be proved is defined by a formula of the form [][P]. The basic strategy to prove such a property is based on induction: we prove that

  1. the initial state condition implies P,
  2. if P holds and the system performs an action, then P' holds (where P' is derived from P by replacing every program variable by its primed counterpart).

However, even if P is invariant, one may not be able to prove this fact, because P on its own may be to weak to show the second part of above rule, i.e., because P is not inductive. The solution is to find a stronger property I such that I is inductive and thus [][I] can be proven. Since I => P, we can then conclude [][P].

The invariant constructed in the previous section is an attempt to such an inductive property but it will turn out that I alone is also not sufficient. Strengthening I to some inductive I' will however make the proof quite clumsy. Instead we will repeatedly apply the same principle that we use in the proof of P: we construct a sequence of inductive properties such that in the proof of inductiveness of a property we can use the previously proven properties as additional "axioms". In this way, we can "modularize" the proof by decomposing it into a number of layers that are thin enough to be managed reasonably well.

Core Properties

First, we have to make sure that the termination detection algorithm does not unappropriately interfere with the working of the original system. Every behavior of the extended system must be also a possible behavior of the original system; vice versa, every behavior of the original system should be also allowed by the extended system. This idea is expressed by

 Proposition 1 (Equivalence) SystemT implements System and vice versa, i.e., for all n, act, and chan:

(var term in B: SystemTn(act, chan, term)) <=> Systemn(act, chan).

Proof: SystemT is constructed from System

  1. by adding actions that leave the variables of System unchanged and that can be thefore viewed as stuttering steps of System, and
  2. by conjoining conditions to Init, Send, and Receive.
Any behavior of SystemT is therefore a behavior of System. In the conditions conjoined to the System actions, SystemT does not constrain the System variables, thus also the converse holds. []

In general, implementation proofs are more complicated because one has to show that every step of the implementation simulates some step of the specification. However, the construction of SystemT from System makes equivalence such obvious that we content ourselves with above sketch. Because of equivalence, we refer in the context of this section to SystemT simply as "the system". The system property that the algorithm is expected to detect is formalized by

 Definition 1 (Termination) The system is terminated if all processes are inactive and all channels are empty:

terminated_n(in act, in chan) :<=>

/\

An important property of termination is established by

 Proposition 2 (Stability of Termination) Once the system is terminated, it stays so, i.e. for all n, act, chan:

[](terminated_n(act, chan) => []terminated_n(act, chan)).

Proof: It suffices to prove

[](terminated_n(act, chan) => terminated_n(act', chan')).
Init implies ~terminated_n(act, chan). Assume terminated_n(act, chan). We know ~actk for every k in Zn, thus Sendi, j(acti, chan) is not enabled. From chani, j= < > we know that Receivei, j(act, chan, cnt, mark) is not enabled. By ~acti, Inactivatei(act) does not have any effect. Neither Starti(run, sig, mark, tval, tmark) nor Forwardi(acti, cnti, sig, mark, tval, tmark, term) modify act or chan. []

The fundamental property of the algorithm is then expressed by

 Proposition 3 (Correctness) If the system detects its termination, it has indeed terminated, i.e., for all n, act, chan, term:

SystemTn(act, chan, term) =>
[](term => terminated_n(act, chan)).

The rest of this section is devoted to setting up the stage for finally proving this property.

General Assumption

We will establish a number of system invariants that will be used in the proof the invariant derived in the previous section, which in turn will be used to prove the corrrectness property. In order to simplify the formulation of these properties, we will from now on consider arbitrary but fixed n, act, chan, term (the specification parameters) and take cnt, sig, mark, tval, tmark (the local variables of the algorithm) such that


/\
holds. The following propositions are expressed in the context of this assumption, i.e., they are true only for behaviors of the system.

System Invariants

We start the construction of the proof by establishing a simple invariant that assures that ensures that the algorithm is "well-typed".

 Proposition 4 (Type Correctness) The system preserves the asserted variable types, i.e.,

[](act in Zn->B/\ chan in Zn x Zn->Seq(MSG)/\ term in B/\ run in B/\ cnt in Zn->N/\ sig in Zn->B/\ mark in Zn->B/\ tval in N/\ tmark in B).

Proof: We show []act in Zn->B. Init implies acti in B for every i in Zn. The only actions that modify act are Receive and Inactivate both of which construct act' from act by replacing the mapping of some i in Zn by an element of B. Type correctness for the other variables is shown analogously. []

We will not refer to above property explicitly any more but use it implicitly in many of our reasoning steps.

The following proposition guarantees that the algorithm does not make any mistakes when counting messages.

 Proposition 5 (Message Count) The system keeps in cnt track of the number of messages pending in communication channels, i.e.,

[](msgcntn(in chan, in cnt))
where

Proof: Init implies cnti = 0 and chani,j = < > for every i in Zn and j in Zn. Assume msgcntn(count, chan). We show msgcntn(chan', cnt'). The only actions that modify chan or cnt are Send and Receive:

Case Sendi, j(acti, chan, cnt)
We know countn(chan') = countn(chan[(i, j) |-> chani, j o < m > ]) = 1+countn(chan) = 1+sumk in Z_ncntk = sumk in Z_ncnt[i |-> cnti+1]k = cnt'.
Case Receivei, j(act, chan, cnt, mark)
Analogous to the previous case. []

The following property ensures that Process 0 indeed knows whether there is a token in the system or not.

 Proposition 6 (Detection Run) The system records in run whether there is an ongoing termination detection run, i.e.,

[](run <=> exists k in Zn: sigk).

It turns out that above invariant is itself not inductive; we therefore combine its proof with the proof of

 Proposition 7 (Single Token) There is at most one token in the system, i.e.,

[](forall k in Zn: sigk => forall j in Zn: (sigj => j = k)).

Proof of Property 6 and of Property 7:

Init implies ~run and ~sigk for every k in Zn. Assume run <=> exists k in Zn: sigk and forall k in Zn: sigk => forall j in Zn: (sigj => j = k).

  1. We show run' <=> exists k in Zn: sig'k.

    The only actions that modify run or sig are Start and Forward.

    Case Starti(run, sig, mark, tval, tmark)
    By run' and sig'[i-n1].
    Case Forwardi(acti, cnti, sig, mark, tval, tmark, term)
    The action gives sigi and thus run. Assume i != 0. Then run' = run and sig' = sig[i |-> false, i-n1 |-> true]. Thus we have run' and sig'[i-n1]. Now assume i = 0. Then we have ~run' and sig' = sig[i |-> false]. By sigi and the induction hypothesis, we thus have ~sig'k for every k in Zn.

    (Here we used in the proof of Proposition 6 the induction hypothesis of Proposition 7).

  2. We show forall k in Zn: sig'k => forall j in Zn: (sig'j => j = k). The only actions that modify sig are Start and Forward.
    Case Starti(run, sig, mark, tval, tmark)
    We know ~run and therefore ~sigj for every j in Zn. From sig' = sig[i-n1 |-> true] we then know sig'j => j = i-n1 for every j in Zn.

    (Here we used in the proof of Proposition 7 the induction hypothesis of Proposition 6).

    Case Forwardi(acti, cnti, sig, mark, tval, tmark, term)
    The action gives sigi and thus sigj => j = i for every j in Zn. Assume i != 0. Then sig' = sig[i |-> false, i-n1 |-> true] and therefore sig'j => j = i-n1 for every j in Zn. Now assume i = 0. Then sig' = sig[i |-> false] and thus ~sigj for every j in Zn. []

Now we have set up enough machinery to proof the crucial fact of the termination detection algorithm:

 Proposition 8 (Invariant) The system maintains the following invariant:

[]Invn(in act, in cnt, in sig, in mark, in tval, in tmark)
where

Proof: Init implies ~sigk for every k in Zn. Assume Invn(act, cnt, sig, mark, tval, tmark). We show Invn(act', cnt', sig', mark', tval', tmark').

Case Sendi, j(acti, chan, cnt)
Take k in Zn with sigk. The only variable in the invariant changed by the action is cnt, thus only the first or the second disjunct of the invariant can be invalidated. However, the action implies acti and thus by the invariant also i < k; it can therefore not affect the first disjunct. Since cnt' differs from cnt only in cnt'i = cnti+1, the second disjunct cannot be invalidated either.

(The last remark explains why the second disjunct of the invariant has to establish that the denoted sum remains positive, not just non-zero.)

Case Receivei, j(act, chan, cnt, mark)
Take k in Zn with sig'k. We have to show

\/
  • ...
  • 0 < tval+suml in Z_n, l <= kcnt[j |-> cntj-1]l
  • exists l in Zn: l <= k/\ mark[j |-> true]l
  • tmark
If j <= k, we have mark[j |-> true]j and thus the third disjunct.

(This case is the cause of the third disjunct of the invariant: a process which currently holds or did not yet get the token receives a message and becomes marked.)

Assume j > k. We know sigk from sig'k and the action. We proceed by case distinction on the consequence of the induction hypothesis.

Case (forall l in Zn: l > k => ~actl)/\ (tval = suml in Z_n, l > kcntl)
By j > k we have suml in Z_n, l <= kcnt[j |-> cntj-1]l = suml in Z_n, l <= kcntl. The case gives us tval+suml in Z_n, l <= kcntl = suml in Z_ncntl. From the action and Proposition 5, we know suml in Z_ncntl > 0 and thus the second disjunct.

(This case is the cause for the second disjunct of the invariant: a message is received by a process that previously held the token, which invalidates the token counter, but keeps the message count positive.)

Case (0 < tval+suml in Z_n, l <= kcntl)
The second disjunct follows from j > k and thus suml in Z_n, l <= kcntl = suml in Z_n, l <= kcnt[j |-> cntj-1]l.
Case (l <= k/\ markl, for some l in Zn)
By j > k and mark[j |-> true]l.
Case tmark
Obvious.
Case Inactivatei(act)
The only effect of this action is to falsify acti, which cannot invalidate the invariant.
Case Starti(run, sig, mark, tval, tmark)
Take k in Zn with sig'k, therefore we have sig[i-n1 |-> true]k. By Proposition 7 and i = 0, we know k = i-n1 = n-n1. We have to show

\/

  • /\
    • forall j in Zn: j > n-n1 => ~actj
    • 0 = sumj in Z_n, j > n-_n1cntj
  • ...
which is trivially true.
Case Forwardi(acti, cnti, sig, mark, tval, tmark, term)
If i = 0, we know from sig' = sig[i |-> false] that ~sig'k for all k in Zn. Now assume i != 0 and take k in Zn with sig'k. Since sig' = sig[i |-> false, i-n1 |-> true], Proposition 7 gives us k = i-n1. We have to show

\/

  • /\
    • forall j in Zn: j > i-n1 => ~actj
    • tval+cnti = sumj in Z_n, j > i-_n1cntj
  • 0 < tval+cnti+sumj in Z_n, j <= i-_n1cntj
  • exists j in Zn: j <= i-n1/\ mark[i |-> false]j
  • tmark\/ marki.
The action implies sigi; we proceed by case distinction on the consequence of the induction hypothesis.
Case (forall j in Zn: j > i => ~actj)/\ (tval = sumj in Z_n, j > icntj)
The action implies ~acti and thus forall j in Zn: j > i-n1 => ~actj. Furthermore we know tval+cnti = (sumj in Z_n, j > i cntj)+cnti = sumj in Z_n, j > i-_n1 cntj.
Case (0 < tval+sumj in Z_n, j <= icntj)
The second disjunct is a consequence of sumj in Z_n, j <= icntj = cnti+sumj in Z_n, j <= i-_n1cntj.
Case (j <= i/\ markj, for some j in Zn)
If j <= i-n1, we have mark[i |-> false]j. If j = i, we have marki.

(The second part of this case is the cause of the fourth disjunct of the invariant. If the token reaches a process that is marked because it has previously received a message, the token gets marked.)

Case tmark
Obvious. []

Conclusion

Finally we come to the heart of the matter: showing that our system invariant indeed implies the stated correctness property.

Proof of Proposition 3 (Correctness): Init implies ~term. Assume term => terminated_n(act, chan). We show term' => terminated_n(act', chan'). If term, then the induction hypothesis and Proposition 2 imply terminated_n(act', chan'). Now assume ~term and term'. We then have Forwardi(acti, cnti, sig, mark, tval, tmark, term) with i = 0. Thus we know:
(act)~act0,
(sig)sig0,
(tval)tval' = tval+cnt0,
(tmark)tmark' = tmark\/ mark0,
(last)~tmark'/\ tval' = 0.
By (tval) and (last), we have
(tvaln)tval+cnt0 = 0.
By (tmark) and (last), we have
(markn)~tmark/\ ~mark0.
With (sig) and Proposition 8, (tvaln) and (markn) give us
(inact)forall j in Zn: j > 0 => ~actj,
(tvals)tval = sumj in Z_n, j > 0cntj.
(Here we have finally used both parts of the first disjunct of the invariant.)

From (act) and (inact), we conclude
(cres1)forall j in Zn: ~actj,
while (tvaln) and (tvals) give us
(cres2)sumj in Z_ncntj = 0.
Using Proposition 5, act = act', and chan = chan', (cres1) and (cres2) imply terminated_n(act', chan'). []


Maintainer: Wolfgang Schreiner
Last Modification: August 20, 1998

previous up next