Notes from Dijkstra’s A Discipline of Programming

9 min readDec 24, 2021

Edsger Dijkstra, famous for graph algorithms, also wrote the 1976 book A Discipline of Programming about the practice of computer science. Thanks to my friend Ben Klemens for the hard-bound copy of the book. It’s interesting to see how insights from almost 50 years ago remain relevant, and so while I read the book I recorded here Dijkstra’s key points, arranged by chapter:

Chapter 0: Executional Abstraction

Algorithms can be contrasted with rote memorization of answers. Whereas rote memorization can give the right answer for a single problem, an algorithm extends a single problem to a class of similar problems with different inputs. But that the algorithm is useful for a wide range of inputs is only a minor benefit.
The major benefit of algorithms over memorization is that a generalized procedure can be verified to produce the right output, whereas rote memorization of outputs for particular inputs must simply be trusted.
By creating algorithms based on other trusted algorithms (e.g. assume the computer knows how to perform addition and subtraction), one can get a larger domain for the inputs without increased cost to verify that it works.

Chapter 1: The Role of Programming Languages

Formal notations for algorithms are unambiguous descriptions of the procedure.
The purpose of formal notation is to create “confidence” that the algorithm works, and without confidence its “right of existence” is lost. Dijkstra later writes that the goal is for the program text and the proof of correctness to go “hand in hand.”
Programming languages differ fundamentally from human languages because human languages are designed for ambiguity rather than specificity. [I’m not sure I would agree with that.]

Chapter 2: States and their Characterization

A variable is a thing that can be in multiple positions, with each position representing a value. A “state space” is all of the combinations in which the variables in a system are filled with values. An algorithm “can be viewed as the system traveling through its state space.”
There are other variables called “free variables” which are not a part of the state space because when the algorithm is executed, these variables will be filled in with a fixed value for the duration of a specific execution.
Sets of states in the state space can be described logically, e.g. x = y, meaning the set of states in the state space where the two variables x and y have the same value.

Chapter 3: The Characterization of Semantics

An algorithm starts in an “initial state” in the state space (i.e. initial variable values) and ends up in a “final state” whose variables’ values at that state represent its output or answer.
The algorithm may not meet its goal for all initial states, and one is not usually interested to describe what the algorithm does for all initial states. The “semantics” of an algorithm are what it does for some set of initial states plus a description of those initial states. (“Does” means it terminates with an answer, not goes on indefinitely or get stuck.) Those initial states may be a subset of the initial states that the algorithm actually meets its goal for (called the “weakest pre-condition corresponding to that post-condition”) but for which a description is not available.
To fully know an algorithm is to be able to say, for any goal, which initial states are going to work to reach that goal. The function from goals to initial states he calls a predicate transformer.
Testing an algorithm can show that bugs exist but cannot show that no bugs exist.

Chapter 4: The Semantic Characterization of a Programming Language

Program text (i.e. source code of an algorithm) has two purposes. One is of course to instruct a machine to execute the algorithm.
The second purpose is to instruct the reader how to find the algorithm’s predicate transformer so that the algorithm can be fully understood. A “well-defined programming language” is one in which the predicate transformer can always be found for any program text.
Finally we get to the first examples of a predicate transformer: The programming language statement “skip” (i.e. do nothing, no-op) has the identity transformation as its predicate transformer. That is, whatever state space you want the program to be in after “skip” is the same as the state space it needs to be in before “skip.” [This is not the same as skip being an identity transformation on states, i.e. do nothing, which it also is. Dijkstra’s predicate transformer is about the goal of the algorithm as sets of possible inputs and outputs, not a particular set of inputs in a single execution.] The predicate transformer of “abort” is the function from whatever state space you want the program to be in after “abort” to an empty set — i.e., no state space before abort will get you to the goal.
The predicate transformer of a variable assignment (e.g. “x := 1”) is a function from desired state spaces (x = 0, x= 1, x = 2) to initial state spaces (x = 0, x =1, x = 2) such that if the desired state space matches the variable assignment then any initial state is permitted, and if the desired state space does not match the variable assignment then no initial state space is permitted.
Dijkstra then defines control structures with a syntax that has been lost to history. An if-type “guarded command set” is denoted as “if E1→S1 ▯ E2→S2 ▯ … fi.” Each boolean expression E1, E2, … is a “guard” and is followed by a command (S1, S2, …) that is only executed if the guard expression is true. It is like our modern day if-elseif but with two differences: 1) An implicit else abort is at the end, and 2) if multiple conditions match, which statement is executed is undefined. [An outlined “|” for elseif makes some sense as a meta “or”. “fi” remains in the bash family of languages.] A do-od guarded command set is denoted “do E1→S1 ▯ E2→S2 ▯ … od” and has no modern-day equivalent: This command is like the if-fi construct but it repeats indefinitely until no guard matches (i.e. instead of aborting, it exits the loop). Dijkstra defines the predicate transformers for these, but I’ve lost interest in that part of the book.

Chapters 5 and 6 deal with proofs of post-conditions after if-fi and do-od constructs, loop invariants, and termination of do-od loops.

Chapter 7: Euclid’s Algorithm Revisited

Chapter 0 used Euclid’s algorithm for finding greatest common divisors as an example to demonstrate an algorithm and state spaces. This chapter writes the algorithm in Dijkstra’s mini programming language:

if x > 0 and y > 0 →
do x>y → x≔x-y ▯ y>x → y≔y-x; od

where x and y are set initially to the two numbers whose GCD is sought, and x at the end holds the GCD. Dijkstra applies the logic of state spaces to prove that it works, I think?

The next chapter does some other small examples, and the one after that some formal proofs. I didn’t read them.

Chapter 10: An Essay on the Notation: “The Scope of Variables”

In circuit design there is a practical limit on the number of “flip-flops” that can hold binary state, and system state is compactly stored in the flip-flops. But there is no practical limit on the number of variables in a program. Variables are created freely and with the intention to separately capture orthogonal concepts in the program.
The problem with having many variables is the programmer may make a mistake and co-opt a variable name already in use. The first version of FORTRAN only had global variables and variables did not require declaration, so this mistake was possible. Algol 60 introduced nestable variable scope blocks: Within a begin/end block variables declared in that block cannot be accessed in program text before or after that block (but the opposite — accessing variables at a higher scope — is permitted, as is normal today).
Algol 60’s scope blocks partially solve the problem when scope blocks are nested. Dijkstra proposes that variables declared in higher-level scopes should not be accessible in narrower scopes unless explicitly listed. (Python has a global statement that does something like this, and C++ lambda functions have optional explicit closure lists. But modern programming languages generally did not take Dijkstra’s advice. To guard against the variable co-opting problem, we typically move the code so that it is no longer contained within a scope that has any variables in danger of being co-opted, and we use a function call to get there.)
Programming languages should be designed so that accessing an uninitialized variable is impossible: The language should permit flow analysis so that detecting this is possible. “Passive scope” is the part of a scope where a variable has not yet been initialized and the variable is not used. “Active scope” is the part of a scope after a variable has been initialized. Scopes should be dividable into passive and active regions for each variable.
When explicitly listing variables in higher scopes that the narrower scope should have access to, there are two types: variables that come initialized (the narrower scope is in its active scope) and variables that come uninitialized which the narrower scope has an obligation to initialize (the narrower scope is on the boundary between the variable’s passive and active scope). (The latter are similar to “out parameters” of functions, and also return values which are basically a special case of an “out parameter.”)

Chapter 11: Array Variables

Dijkstra adds to his toy language “arrays,” but these are not modern fixed-sized regions of memory. Dijkstra’s arrays are data structures that have an access and assignment operation that takes an integer index (so far this is like an array), the operations hiext and loext (‘extend on the high/low side’, Javascript’s push and unshift or C++’s push_back and push_front), hirem/lorem (‘remove on the high/low side’, Javascript’s pop and shift, C++’s pop_back and pop_front), and shift which has no modern equivalent. shift changes the mapping of indexes to values by shifting the indexes of the array elements up or down (i.e. changes the index corresponding to the lower bound of the array). Dijkstra adds aliases hipop and lopop with their obvious meanings.

Chapters 12–25

The next chapters present small algorithmic challenges in Dijkstra’s toy language, including a few graph algorithms, which is how he’s now most remembered.

Chapter 26: On Manuals and Interpretations

Is it that computational abstractions defined in books are modeling physical computing machines, or are physical computing machines merely an approximate “working model” of true computation? When a machine and its manual are not in agreement, is it a bug in the manual or a bug in the machine? Dijkstra describes the coming of age of “computing science” as a shift from the former points of view to the latter. Abstractions are proved correct, and machines implement those specifications.
He laments that software is too often created without regard to correctness — if only programmers used his methods, alas!, he is saying.
Implementing a computation in hardware was still thought about in terms of Turing machines and the primitive availability of computer memory in the 1970s. So he introduces the concept of a “hopefully sufficiently large machine” (HSLM), a machine that halts if the program it executes requires more state space (memory) than the machine has, which lets the programmer know that an even larger machine is necessary. All modern computers are HSLMs!

Chapter 27: In Retrospect

This final chapter could very nearly have been written today… except for the not so subtle attempts to convince the reader to use his programming language. But he describes key challenges of computing in a way that remains helpful:

Comparing a programmer with a surgeon, Dijkstra writes,

“Both should exercise the utmost care, but the surgeon has fulfilled his obligations in this respect when he has taken the known precautions and is then allowed to how that circumstances outside his control will not ruin his work. Nor is the surgeon blamed for the incompleteness of his control: the unfathomed complexity of the human body is an accepted fact of life. But the programmer can hardly exonerate himself by appealing to the unfathomed complexity of his program, for the latter is his own construction! With the possibility of complete control, he also gets the obligation: it is the consequence of the basic simplicity.” (p210)

In computing, he writes, the programmer has control over a process at 10 orders of magnitude: an instruction operating in less than a microsecond (1e-6 seconds) and the program operating over hours of computation time (1e+4 seconds, making a ratio of 1e10-to-1). “And in this respect the challenge of the programming task seems indeed without precedent.”

For managing this hierarchical complexity, “I usually refer to it as ‘a separation of concerns,’ ” a term now familiar in both programming and computer security, and Dijkstra seems to have making an effort here to promote the use of the term (perhaps implying he coined it). He explains it as forgetting, for a moment, the other orders of magnitude while focusing on one module — like forgetting how hardware will implement a computation while focusing on the correctness-oriented programming text.