Go up to TopGo forward to 2 The System Model |

We present a small case study on the engineering of a distributed algorithm: its derivation from the problem specification, formal specifications of its behavior and of the desired correctness property, and rigorous proofs that the behavior always meets this property and eventually leads to a desired state.

This paper was inspired by a lecture of E.W. Dijkstra and corresponding handouts which derived in 45 minutes respectively on eight hand-written A5 pages a distributed algorithm from the problem specification [Dij98]. The derivation starts with a simple algorithmic idea and a corresponding invariant that captures the state of the system at any time and that implies the desired property at termination. However, the initial algorithm cannot maintain the stated invariant; both the algorithm and the invariant are therefore gradually refined (always implying correctness at termination) until algorithm and invariant match.

This paper puts the high-level reasoning shown in that lecture into a formal framework in which we are able to specify every detail of the algorithm and to rigorously reason about its correctness. The chosen language is essentially (i.e., in spirit, not in all syntax) the Temporal Logic of Actions [Lam94] which specifies a concurrent system by a logic formula that describes every possible behavior of the system. Reasoning about the system is ultimately reduced to reasoning in classical first-order predicate logic.

At first glance, it may sound strange to call such a process "engineering" since this term is in computer science usually associated to heuristics, practices, and supporting tools that experts consider useful for constructing large software systems. Since the underlying algorithms are often rather simple and intuitively well understood, the algorithmic side of the problem tends to be neglected and rigorous reasoning is rarely applied.

Likewise sometimes the impression exists that the central problem of
engineering distributed software systems is that of developing suitable
programming languages, monitoring and debugging tools, standardized
communication protocols, component interfaces, and the like. While these are
undoubtedly important aspects, this view captures only one side, that of
*technology*. Another at least equally important side is that
of *thinking* in the context of distributed systems, which is difficult
because one has to consider concurrent activities that may interfere with each
other. Reasoning by "hand-waving" is here dangerous and ultimately yields
erroneous and unreliable software.

In order to build industrial-strength reliable distributed systems [Bir96], a software engineer needs a mental skeleton that helps her to cope with the issues related to concurrency, a framework within she can express and refine her ideas, and rules that allow heer to reason about their correctness. Even if the resulting algorithm is actually never formally verified in detail, this skeleton helps one to focus on the important aspects of the design and allows one to increase confidence in its correctness by checking the most critical parts. An integrated part of the engineering of distributed systems is therefore the modeling of concurrent activities and the reasoning about them.

It is the aim of this paper to demonstrate rigorous reasoning in the context of a distributed system. For this purpose, we use a formal language that give a unique meaning to concepts that can be only ambiguously expressed in natural language (which we however extensively use for motivation and comments). Proofs are presented in a semi-formal style suitable for human reading; some go into large detail, while some contain just the crucial information that should enable anyone to fill in the missing parts. Chapter 2 introduces the formal language that we will use to specify the problem and the algorithm; Chapter 3 is essentially a transcription of Dijkstra's derivation in this language; Chapter 4 presents a proof of the partial correctness of the algorithm; Chapter 5 includes a corresponding termination proof.

Maintainer: Wolfgang Schreiner

Last Modification: August 20, 1998