previous up next
Go backward to 2 The System Model
Go up to Top
Go forward to 4 Correctness of the Algorithm
RISC-Linz logo

3 The Termination Detection Algorithm

We will extend the System interface to

SystemTn in N(act in Zn->B, chan in Zn x Zn->Seq(MSG), term in B)
such that term initialized by
Init(...) :<=>

/\
signals termination of the system.

Basic Idea The derivation of the termination detection algorithm is based on the following idea (see Figure 2): If Process 0 wants to detect termination, it sends a signal to process n-1. If an inactive process i > 0 receives the signal, it forwards the signal to process i-1. An active process keeps the signal until it becomes inactive. The signal therefore represents a "token" that circulates through the ring of processes.

We model this circulation by introducing a variable

sig in Zn->B

initialized by

Init(...) :<=>

/\

and by two actions

where a-nb denotes the difference modulo n, i.e., 0-n1 = n-1).

(Modeling token circulation by an array of boolean signals reflects the behavior of the distributed algorithm closer than modeling it by a single integer position that "automatically" guarantees token unicity.)

 

Figure 2: Token Circulation


However, above specification allows Process 0 to submit a new token before it has received the token it has previously submitted. We therefore introduce another variable

run in B

that records whether there is a token in the system and then disables Start:

However, when Process 0 finally receives the token, it can only deduce that each process has been inactive at some time in the past. Since a process mave have received a message after forwarding the token, it may have become active again. Only if the algorithm could maintain the invariant

forall k in Zn: sigk =>
forall j in Zn: j > k => ~actj,

Process 0 could deduce from the receipt of the token (i.e., sig0) and its own inactivity (i.e., ~act0) that all processes are currently inactive.

Message Counters

Even if Process 0 receives the token under above invariant, there may be still messages pending in the network that can cause processes to become active again. We therefore need some mean to keep track of the number of messages pending in the network. Since a process only knows about the messages this it itself sends or receives, this can be only achieved by introducing a distributed counter

cnt in Zn->Z

such that every process sending a message increases its counter and every process receiving a message decreases its counter (see Figure 3):

 

Figure 3: Message Counters


Then clearly the total sum of the distributed counter equals the number of the messages in the network, i.e., the system maintains the invariant

sumi in Z_ncnti = sumi in Z_nsumj in Z_nlen(chani, j)

where len(ch) := such l in N: exists m0, ..., ml-1: ch = < m0, ..., ml-1 > .

Token Value

How can Process 0 learn about this sum? Apparently this is only possible, if the circulating token collects this information and delivers it to this process. We therefore introduce

tval in Z

modeling the value carried by the token. Process 0 initializes the value to 0 when submitting the token; each forwarding process adds the value of its counter (see Figure 4):

 

Figure 4: Token Value


If the algorithm could maintain the invariant

forall k in Zn: sigk =>

/\

then Process 0 could on receipt of the token deduce from ~act0 that all processes are inactive and from cnt0+tval = 0 that no message is in the network, i.e., that the system has terminated.

Unfortunately, the algorithm cannot maintain this invariant. We therefore have to disjoin the conclusion part (the "core") of the invariant with some weakening conditions such that the algorithm is able to maintain the weaker invariant but Process 0 can still conclude termination from the information stored locally and received by the token.

Awaking a Process

Figure 5: Invalidating the Core Invariant


Let us investigate how above invariant can be falsified by an action (i.e., the invariant holds in the state before the action but does not any more hold in the state after the action). Since all processes that have already forwarded the token are inactive, the only possibility is that such a process receives a message and becomes active again (see Figure 5). This however means that there must be still pending messages in the network. Since the invariant still holds before invalidation, a weaker version of the variant is

forall k in Zn: sigk =>

\/

If Process 0 receives the token (i.e., sig0 holds), it can still conclude termination from ~act0 and cnt0+tval = 0.

Process Marks

Is it still possible for an action to falsify the weaker invariant? Clearly it can be invalidated, if a process that has not yet forwarded the token receives a message, which causes the summation term in above condition to drop by one (see Figure 6). A simple way to record whether such an incident has taken place is to associate to each process a marker that is set on message receipt. We therefore introduce

mark in Zn->B

which is initialized to false in every position and which is set to true when a message is received:

Then we can weaken the invariant further to

forall k in Zn: sigk =>

\/

If Process 0 receives the token, i.e., sig0 holds, it can still conclude termination from ~act0, cnt0+tval = 0, and ~mark0.

 

Figure 6: Process Markers


Token Marker

Clearly this weakened invariant is still falsified when the marked process with the least index forwards the token (see Figure 7). Therefore the token itself has to record the information when it passes a marked process. We introduce a token marker

tmark in B

which is initialized by Process 0 and forwarded by each process as

Then we can weaken the invariant further by

forall k in Zn: sigk =>

\/

This invariant cannot be falsified any more by the activity of any process.

Process 0 can then conclude termination from sig0, ~act0, cnt0+tval = 0, cnt0+tval = 0, and ~tmark. By definition of tmark' and tval', this can be stated as

Forwardi(...) :<=>

/\

 

Figure 7: Token Marker


Resetting Process Markers

If Process 0 cannot yet conclude termination after a termination detection round, it may later start another round. Without resetting the process markers, however, such an attempt is doomed to fail. Fortunately, after it has marked the token, the process marker has fulfilled its task and can be reset:

Forwardi(...) :<=>

/\

Since this action takes place simultaneously with token transmission (i.e., ~sig'i holds), it cannot falsify the invariant.

If the system terminates during a termination detection round, Process 0 may not yet conclude termination at the end of this round. However, after this run, no process is active, there are no messages in the network any more, and the sum of the message counters equals 0. Still some processes may be marked; the next termination detection round may thus fail, too. Anyway, after this run all processes are unmarked, and the next termination detection round will succeed.

Algorithm Specification

The complete algorithm is compiled in Specification 2. The existential quantifier

var v in T: ...

introduces a local program variable v in a formula like (exists v in T: ...) introduces a local mathematical variable v. The crucial difference between both kinds is that the "variable" v may have different values in different states of a program behavior while the "constant" v has always the same value in all states.

 

Specification 2: The Termination Detection Algorithm

SystemTn in N(act in Zn->B, chan in Zn x Zn->Seq(MSG), term in B) :<=>
(a system that may detect its termination)
let

var
(internal variables of the algorithm)


/\

Maintainer: Wolfgang Schreiner
Last Modification: August 20, 1998

previous up next