Specifying Concurrency¶
The assignment statement, sequential composition, conditional statement, and repetitive statement are universal in the sense that any sequential program can be expressed with them. For concurrent programs, we additionally need means to express:
- parallel composition
- mutual exclusion
- condition synchronization
When reasoning about concurrency, it is natural to think in these terms. Programming languages commonly do not support parallel composition, mutual exclusion, and condition synchronization in full generality, but their constructs can be understood through these notions.
Parallel Composition¶
Statements execute in a sequence of atomic steps. Such sequences are called traces. These can be either finite or infinite. The behaviour of a statement is given by the set of all traces.
The parallel composition S₁ ‖ S₂ executes S₁ and S₂ "in parallel" by interleaving (merging) their traces; formally, σ is a trace of S₁ ‖ S₂ if σ projected onto the steps of S₁ is a trace of S₁ and σ projected onto the steps of S₂ is a trace of S₂.
In state diagrams, parallel composition is visualized by a dashed line; transitions are atomic steps:
The composition S₁ ‖ S₂ terminates when both S₁ and S₂ terminate. For example:
| Statement | Behaviour |
|---|---|
| A | {➀ ➁} |
| B | {➊ ➋} |
| A ‖ B | {➀ ➁ ➊ ➋, ➀ ➊ ➁ ➋, ➀ ➊ ➋ ➁, ➊ ➀ ➁ ➋, ➊ ➀ ➋ ➁, ➊ ➋ ➀ ➁} |
Question: If ➀ = x := 1, ➁ = x := 2, ➊ = y := 1, ➋ = y := 2, what is the final state?

Answer: x = 2 ∧ y = 2
The statement fork P, where P is a procedure name, creates a new process and executes it concurrently with the one creating it. The statement join P waits for a process to terminate. For example:

```algorithm
fork P
x := 1
join P

procedure P
    y := 2
```
Any ‖ statement can be equivalently expressed by fork and join: S₁ ‖ … ‖ Sₙ is equivalent to

```algorithm
fork P₁ ; … ; fork Pₙ
join P₁ ; … ; join Pₙ

procedure Pᵢ
    Sᵢ
```

Unlike parallel composition, fork statements allow an arbitrary number of processes to be started, but introduce a parent-child relation among the processes.
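In Go, for instance, fork is written go, and join is commonly expressed with a sync.WaitGroup. Here is a minimal sketch of the fork P / join P example above; the package scaffolding and the final print are added only for illustration:

```go
package main

import (
	"fmt"
	"sync"
)

func main() {
	var x, y int
	var wg sync.WaitGroup

	wg.Add(1)
	go func() { // fork P
		defer wg.Done()
		y = 2 // body of procedure P
	}()
	x = 1
	wg.Wait() // join P

	fmt.Println(x, y) // prints: 1 2
}
```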
The process declaration executes the declared process concurrently with any other declared processes:

```algorithm
process p
    S
```

Process declarations can be equivalently expressed with ‖ or with fork and join. For example,

```algorithm
process setX
    x := 1
process setY
    y := 2
```

is equivalent to:

```algorithm
setX ‖ setY

procedure setX
    x := 1
procedure setY
    y := 2
```
The par statement is a generalization of ‖ in the same way as for is a generalization of ;:

```algorithm
par x = E to F do S
```

Here, x is a constant local to the par statement. For example, vectors a, b : array 1 .. N of integer can be added in parallel:

```algorithm
par i = 1 to N do
    b(i) := b(i) + a(i)
```
This can be equivalently expressed with fork and join:

```algorithm
for i = 1 to N do fork P(i)
for i = 1 to N do join P(i)

procedure P(i: 1 .. N)
    b(i) := b(i) + a(i)
```
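As a sketch, the fork/join version of the vector addition can be written in Go with one goroutine per element; the array contents are illustrative:

```go
package main

import (
	"fmt"
	"sync"
)

func main() {
	const N = 8
	a := [N]int{1, 2, 3, 4, 5, 6, 7, 8}
	b := [N]int{8, 7, 6, 5, 4, 3, 2, 1}

	var wg sync.WaitGroup
	for i := 0; i < N; i++ {
		wg.Add(1)
		go func(i int) { // fork P(i)
			defer wg.Done()
			b[i] += a[i] // each goroutine writes a distinct element
		}(i)
	}
	wg.Wait() // join P(1) … P(N)

	fmt.Println(b) // [9 9 9 9 9 9 9 9]
}
```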
Dedicated languages and language extensions for concurrent and numeric computing have par statements (Matlab) or process declarations (called tasks in Ada); Python, Java, and C/Pthreads have only fork and join. Go has only fork (called go). None of these languages has ‖ as an operator.
For a, b, c: array 0 .. N - 1, 0 .. N - 1 of float, sequential matrix multiplication is expressed by:

```algorithm
for i = 0 to N - 1 do
    for j = 0 to N - 1 do
        c(i, j) := 0.0
        for k = 0 to N - 1 do
            c(i, j) := c(i, j) + a(i, k) × b(k, j)
```
Question: Which of the three for loops can be turned into a par?

Answer: Only the outer two can:

```algorithm
par i = 0 to N - 1 do
    par j = 0 to N - 1 do
        c(i, j) := 0.0
        for k = 0 to N - 1 do
            c(i, j) := c(i, j) + a(i, k) × b(k, j)
```
Aside: It would be more systematic to write seq instead of for for generalized sequential composition, as in occam, a predecessor of Go.
The previous program creates N² processes, which may be far more than there are processors; in some languages, the overhead of process creation may outweigh the benefit of parallelism. A solution is to use one worker process per strip of rows. Let P be the number of worker processes and N a multiple of P:
```algorithm
procedure worker(w: 1 .. P)
    const first = (w - 1) × N div P
    const last = first + N div P - 1
    for i = first to last do
        for j = 0 to N - 1 do
            c(i, j) := 0.0
            for k = 0 to N - 1 do
                c(i, j) := c(i, j) + a(i, k) × b(k, j)

par w = 1 to P do worker(w)
```
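A Go sketch of the worker-per-strip version, assuming N is a multiple of P; the identity matrices serve only to make the program self-contained:

```go
package main

import (
	"fmt"
	"sync"
)

const (
	N = 4 // matrix dimension, a multiple of P
	P = 2 // number of worker processes
)

func worker(w int, a, b, c *[N][N]float64, wg *sync.WaitGroup) {
	defer wg.Done()
	first := (w - 1) * N / P // first row of this worker's strip
	last := first + N/P - 1  // last row of this worker's strip
	for i := first; i <= last; i++ {
		for j := 0; j < N; j++ {
			c[i][j] = 0.0
			for k := 0; k < N; k++ {
				c[i][j] += a[i][k] * b[k][j]
			}
		}
	}
}

func main() {
	var a, b, c [N][N]float64
	for i := 0; i < N; i++ { // a = b = identity, for illustration
		a[i][i], b[i][i] = 1.0, 1.0
	}
	var wg sync.WaitGroup
	for w := 1; w <= P; w++ { // par w = 1 to P do worker(w)
		wg.Add(1)
		go worker(w, &a, &b, &c, &wg)
	}
	wg.Wait()
	fmt.Println(c) // the identity matrix again
}
```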
Matrix multiplication is an example of "embarrassing parallelism": no synchronization is needed, as all processes assign to different array elements.
Mutual Exclusion¶
In algorithms, atomic steps are indicated by atomicity brackets. These can be placed around expressions and statements, for example:

```algorithm
⟨x := x + 1⟩ ‖ ⟨x := x + 2⟩
⟨x⟩ := ⟨x + 1⟩ ‖ ⟨x⟩ := ⟨x + 2⟩
```

Atomicity brackets cannot be nested. Auxiliary variables ("registers") are needed to express atomicity in state diagrams. The above statements are represented by:

When atomicity brackets are left out, implicitly only access to individual basic variables and array elements of basic types is atomic:

```algorithm
x := x + 1   =   ⟨x⟩ := ⟨x⟩ + ⟨1⟩
```
If processes run by multiplexing on a single processor, atomicity can be implemented by disabling and enabling interrupts: any switch to another process is caused by a timer interrupt at the end of a process' time slice or by an interrupt from an external event. Care is needed to ensure that the atomic regions don't last too long and other processes are not unduly delayed or external events lost.
If multiple processors (cores) are available, ensuring atomicity by suspending processors would be too drastic. Instead, the hardware guarantees that certain operations are atomic. Typically, reading and writing a word from and to memory is atomic. Programming languages reflect this: for example, in Java, access to int, boolean, float, and pointers, which are all word-sized, is atomic, but not access to double and long. We continue to use atomicity brackets as a design notation and return to techniques for implementing atomic regions.
Question: Which of these algorithms for computing the maximum of a: array 1 .. N of integer is correct?

1. ```algorithm
   m := a(1)
   for i = 2 to N do if a(i) > m then m := a(i)
   ```
2. ```algorithm
   m := a(1)
   par i = 2 to N do if ⟨a(i)⟩ > ⟨m⟩ then ⟨m⟩ := ⟨a(i)⟩
   ```
3. ```algorithm
   m := a(1)
   par i = 2 to N do ⟨if a(i) > m then m := a(i)⟩
   ```
4. ```algorithm
   m := a(1)
   par i = 2 to N do if ⟨a(i) > m⟩ then ⟨m := a(i)⟩
   ```
5. ```algorithm
   m := a(1)
   par i = 2 to N do
       if a(i) > m then ⟨if a(i) > m then m := a(i)⟩
   ```

Answer: Correct, incorrect, correct (but less efficient than the sequential version), incorrect, correct (and potentially faster than the sequential version).
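The last (double-checked) variant can be sketched in Go, with a sync.Mutex providing the atomic region. Note that the unsynchronized pre-check is the deliberate race of the algorithm: Go's race detector would flag the unprotected read of m, so this mirrors the design rather than production Go style:

```go
package main

import (
	"fmt"
	"sync"
)

func main() {
	a := []int{3, 1, 4, 1, 5, 9, 2, 6}
	m := a[0]
	var mu sync.Mutex
	var wg sync.WaitGroup

	for _, x := range a[1:] {
		wg.Add(1)
		go func(x int) {
			defer wg.Done()
			if x > m { // unsynchronized pre-check (may read a stale m)
				mu.Lock() // atomic region: re-check and update
				if x > m {
					m = x
				}
				mu.Unlock()
			}
		}(x)
	}
	wg.Wait()
	fmt.Println(m) // 9
}
```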
What can we say about the postcondition of a parallel composition? In the example below, the statements x := x + 1 and y := y + 2 are correct with respect to their pre- and postconditions:

Their parallel composition establishes the conjunction of their postconditions.

Compare this to the parallel composition of x := 1 and x := 2:

Their parallel composition establishes the disjunction of their postconditions.
Now consider the parallel composition of ⟨x := x + 1⟩ and ⟨x := x + 2⟩:

Question: What are appropriate predicates P₁, P₂, Q₁, Q₂?

Answer: Here is a naive attempt:

If both processes start with x = 0 and ⟨x := x + 1⟩ is executed first, the precondition of ⟨x := x + 2⟩ is no longer x = 0. Likewise, if ⟨x := x + 2⟩ is executed first and establishes x = 2, then ⟨x := x + 1⟩ would invalidate that postcondition. The reason is interference. Additionally, from x = 1 and x = 2 the postcondition x = 3 cannot be inferred.
Question: How can the annotation be modified such that the parallel execution will not invalidate the predicates?
Answer: The annotation needs to be weakened.
Now each process does not interfere with the states of the other process.
As another example, consider the following two procedures operating on global variables a, b, t:

```algorithm
procedure P
    t, a := false, 0
    while ¬t do
        t, a := true, a + 1

procedure Q
    t, b := true, 0
    while t do
        t, b := false, b + 1
```

Question: Do P and Q terminate and what is their outcome? What are the loop invariants?

Answer:

- P terminates with t ∧ a = 1; the invariant is (¬t ∧ a = 0) ∨ (t ∧ a = 1);
- Q terminates with ¬t ∧ b = 1; the invariant is (t ∧ b = 0) ∨ (¬t ∧ b = 1).
Question: Does P ‖ Q terminate and what is the outcome?

Answer: P ‖ Q may terminate with a ≥ 0 ∧ b ≥ 0 or may not terminate. The state diagram shows how the annotation of P that is correct for sequential execution is invalidated by the concurrent execution of Q:
Question: How can the annotation be modified such that the parallel execution will not invalidate the predicates?

Answer: The loop invariant in P is a ≥ 0 and in Q is b ≥ 0.
Consider the following two parallel compositions:

```algorithm
x := y + z ‖ (y := 3 ; z := 5)
x := y + z ‖ y := 3
```

The assignment x := y + z is executed as ⟨x⟩ := ⟨y⟩ + ⟨z⟩. In the first composition, the value taken for y could be that before or after y := 3, and the value of z could be that before or after z := 5. However, in the second composition, the effect of x := y + z is the same as ⟨x := y + z⟩, allowing a simpler state diagram:

The reason is that in the second composition, only one of the variables of y + z is changed in the parallel process.
In a parallel composition S₁ ‖ … ‖ Sₙ, an expression E in Sᵢ satisfies the at-most-once property if it refers at most once to a basic variable that may be changed in another process. An assignment x, y := E, F in Sᵢ satisfies the at-most-once property if it refers at most once to a basic variable (x, y, or in E, F) that may be changed in another process.

Expressions and assignment statements with the at-most-once property are evaluated or executed as if they were atomic. Thus, in state diagrams they do not need to be broken up.
Question: Which of the following assignments satisfy the at-most-once property?

- the two assignments in x := 3 ‖ x := x + 1
- the three assignments in x := 3 ‖ y := 5 ‖ z := x + y
- the assignment t, a := false, 0 in procedure P above when considering P ‖ Q
- the assignment t, b := false, b + 1 in procedure Q above when considering P ‖ Q

Answer:

- only x := 3
- only x := 3 and y := 5
- yes, as only t is changed in Q
- yes, as only t is changed in P

This retrospectively justifies the state diagram for P ‖ Q.
An atomic statement A that starts in state P does not interfere with state Q if A preserves Q, formally:

```algorithm
{P ∧ Q} A {Q}
```

In state diagrams, non-interference is visualized by a green dotted line. If A is the guarded assignment statement ⟨await B then x, y := E, F⟩, the condition for non-interference is:

```algorithm
P ∧ B ∧ Q ⇒ Q[x, y := E, F]
```
Question: Consider again the state diagram for ⟨x := x + 1⟩ ‖ ⟨x := x + 2⟩. What are the conditions for non-interference? Give the proofs!

Answer: The conditions for x := x + 1 not interfering are:

```algorithm
(x = 0 ∨ x = 2) ∧ (x = 0 ∨ x = 1) ⇒ (x = 0 ∨ x = 1)[x := x + 1]
(x = 0 ∨ x = 2) ∧ (x = 2 ∨ x = 3) ⇒ (x = 2 ∨ x = 3)[x := x + 1]
```

The conditions for x := x + 2 not interfering are:

```algorithm
(x = 0 ∨ x = 1) ∧ (x = 0 ∨ x = 2) ⇒ (x = 0 ∨ x = 2)[x := x + 2]
(x = 0 ∨ x = 1) ∧ (x = 1 ∨ x = 3) ⇒ (x = 1 ∨ x = 3)[x := x + 2]
```

The proof of the first condition is:

```algorithm
    (x = 0 ∨ x = 2) ∧ (x = 0 ∨ x = 1) ⇒ (x = 0 ∨ x = 1)[x := x + 1]
≡       {by substitution, factoring out x = 0, simplification}
    x = 0 ⇒ (x + 1 = 0 ∨ x + 1 = 1)
≡       {arithmetic}
    x = 0 ⇒ (x = -1 ∨ x = 0)
≡       {weakening / strengthening}
    true
```
Rule for correctness of parallel composition: Assume that statements S₁, S₂ establish Q₁, Q₂ under P₁, P₂, respectively:

```algorithm
{P₁} S₁ {Q₁}
{P₂} S₂ {Q₂}
```

Their parallel composition will establish the conjunction of the postconditions under the conjunction of the preconditions,

```algorithm
{P₁ ∧ P₂} S₁ ‖ S₂ {Q₁ ∧ Q₂}
```

provided that all atomic statements (transitions) S₁,ᵢ in S₁ do not interfere with any state of S₂ and vice versa, visually:

The rule generalizes to more than two processes.
Condition Synchronization¶
Condition synchronization delays one process until another establishes a certain condition. The fundamental mechanism is the await statement:

```algorithm
⟨await B then S⟩
```

The await statement "waits" until Boolean expression B is true and then executes statement S atomically. Formally, this reduces the number of interleavings. The body S can also be empty:

```algorithm
⟨await B⟩   =   ⟨await B then skip⟩
```

For example:

```algorithm
⟨await s > 0 then s := s - 1⟩
⟨await s > 0⟩ ; s := s - 1
```

Question: What is the difference between these two?

Answer: The first guarantees that s will not become negative; the second does not.
The await statement is at the very core of every synchronization mechanism, even though few programming languages support it in full generality. One way to implement ⟨await B⟩ is by spinning: the process keeps looping as long as B is false.
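Instead of spinning, an implementation can block the waiting process. As a sketch, ⟨await s > 0 then s := s - 1⟩ can be expressed in Go with a condition variable; the mutex makes the test and the body one atomic region, and Wait blocks rather than spins (the type and method names are illustrative):

```go
package main

import (
	"fmt"
	"sync"
)

type Counter struct {
	mu sync.Mutex
	c  *sync.Cond
	s  int
}

func NewCounter() *Counter {
	ctr := &Counter{}
	ctr.c = sync.NewCond(&ctr.mu)
	return ctr
}

func (ctr *Counter) Dec() { // ⟨await s > 0 then s := s - 1⟩
	ctr.mu.Lock()
	for ctr.s <= 0 { // re-check B after every wakeup
		ctr.c.Wait()
	}
	ctr.s--
	ctr.mu.Unlock()
}

func (ctr *Counter) Inc() { // ⟨s := s + 1⟩, waking one waiter
	ctr.mu.Lock()
	ctr.s++
	ctr.c.Signal()
	ctr.mu.Unlock()
}

func main() {
	ctr := NewCounter()
	done := make(chan bool)
	go func() { ctr.Dec(); done <- true }()
	ctr.Inc()
	<-done
	fmt.Println("s =", ctr.s) // s = 0
}
```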
Formally, the statement ⟨await B then S⟩ is just ⟨if B → S⟩, so the same correctness rule holds.

Rule for correctness of await:

```algorithm
{P} ⟨await B then S⟩ {Q}   if   P ⇒ Δ B   and   {P ∧ B} S {Q}
```

If B is false, the interpretation in concurrent programs is waiting until it becomes true, rather than staying in the initial state forever.
Producer and Consumer¶
A producer places objects into a (one-place) buffer; a consumer removes objects from the buffer. The producer has to wait until the buffer is empty before placing an object; the consumer has to wait until the buffer is not empty before removing an object. Otherwise, producer and consumer proceed at their own pace. Assume buf: T:

```algorithm
p := 0 ; c := 0

process Producer
    var a: array 0 .. N - 1 of T = …
    while p < N do
        ⟨await p = c⟩
        buf := a(p)
        p := p + 1

process Consumer
    var b: array 0 .. N - 1 of T
    while c < N do
        ⟨await p > c⟩
        b(c) := buf
        c := c + 1
```
Question: What is the relation among a, b, p, c?
The state diagram shows the annotations required to conclude that upon termination of Consumer, it will have a copy of array a of Producer in its own local array b. Formally, the producer invariant PI is conjoined to every state of Producer, the consumer invariant CI is conjoined to every state of Consumer, and the global invariant PC is conjoined to every state:
Textually, with some simplifications, this is equivalently expressed as:

```algorithm
p := 0 ; c := 0
{PC: 0 ≤ p ≤ c + 1 ∧ (p = c + 1 ⇒ buf = a(p - 1))}
```

```algorithm
process producer
    var a: array 0 .. N - 1 of T = …
    {PI: p ≤ N}
    {PC ∧ PI}
    while p < N do
        {PC ∧ p < N}
        ⟨await p = c⟩
        {PC ∧ p < N ∧ p = c}
        buf := a(p)
        {PC ∧ p < N ∧ p = c ∧ buf = a(p)}
        p := p + 1
        {PC ∧ PI}
```

```algorithm
process consumer
    var b: array 0 .. N - 1 of T
    {CI: c ≤ N ∧ b[0..c-1] = a[0..c-1]}
    {PC ∧ CI}
    while c < N do
        {PC ∧ CI ∧ c < N}
        ⟨await p > c⟩
        {PC ∧ CI ∧ c < N ∧ p > c}
        b(c) := buf
        {PC ∧ CI ∧ c < N ∧ p > c ∧ b(c) = a(c)}
        c := c + 1
        {PC ∧ CI}
```
Each of the black solid transitions has to be correct in isolation. The producer invariant PI does not contain variables that are modified by Consumer, and likewise the consumer invariant CI does not contain variables that are modified by Producer, so these are always preserved by transitions of the other process. The conditions for Consumer are:

1. ```algorithm
   {PC ∧ CI ∧ c < N ∧ p > c} b(c) := buf {PC ∧ CI ∧ c < N ∧ p > c ∧ b(c) = a(c)}
   ```
2. ```algorithm
   {PC ∧ CI ∧ c < N ∧ p > c ∧ b(c) = a(c)} c := c + 1 {PC ∧ CI}
   ```

or equivalently:

1. ```algorithm
   PC ∧ CI ∧ c < N ∧ p > c ⇒ (PC ∧ CI ∧ c < N ∧ p > c ∧ b(c) = a(c))[b := (b; c: buf)]
   ```
2. ```algorithm
   PC ∧ CI ∧ c < N ∧ p > c ∧ b(c) = a(c) ⇒ (PC ∧ CI)[c := c + 1]
   ```
The green dotted arrows indicate which transitions of Consumer may interfere with the states of Producer. There are six non-interference conditions to check for the transition with b(c) := buf; of those, the three with dotted arrows going to "empty" states of Producer are identical, leaving four conditions:

1. ```algorithm
   {PC ∧ PI ∧ CI ∧ c < N ∧ p > c} b(c) := buf {true}
   ```
2. ```algorithm
   {PC ∧ PI ∧ CI ∧ c < N ∧ p > c ∧ p < N} b(c) := buf {p < N}
   ```
3. ```algorithm
   {PC ∧ PI ∧ CI ∧ c < N ∧ p > c ∧ p < N ∧ p = c} b(c) := buf {p < N ∧ p = c}
   ```
4. ```algorithm
   {PC ∧ PI ∧ CI ∧ c < N ∧ p > c ∧ p < N ∧ p = c ∧ buf = a(p)} b(c) := buf {p < N ∧ p = c ∧ buf = a(p)}
   ```

These hold as b(c) := buf does not modify any of the variables in the states that need to be preserved.
Similarly, there are four distinct non-interference conditions for the transition with c := c + 1:

1. ```algorithm
   {PC ∧ PI ∧ CI ∧ c < N ∧ p > c} c := c + 1 {true}
   ```
2. ```algorithm
   {PC ∧ PI ∧ CI ∧ c < N ∧ p > c ∧ p < N} c := c + 1 {p < N}
   ```
3. ```algorithm
   {PC ∧ PI ∧ CI ∧ c < N ∧ p > c ∧ p < N ∧ p = c} c := c + 1 {p < N ∧ p = c}
   ```
4. ```algorithm
   {PC ∧ PI ∧ CI ∧ c < N ∧ p > c ∧ p < N ∧ p = c ∧ buf = a(p)} c := c + 1 {p < N ∧ p = c ∧ buf = a(p)}
   ```

The first two hold as c := c + 1 does not modify any of the variables in the states that need to be preserved. The last two hold as the precondition (antecedent) is false: p = c and p > c cannot hold at the same time.
Formally, concurrent states P and Q exclude each other if ¬(P ∧ Q). For example, states p < N ∧ p = c and c < N ∧ p > c exclude each other.
The producer-consumer processes do not interfere with each other. In general, interference can be avoided by:

- having annotations only over local variables, i.e. not relying on any values of shared variables,
- when shared variables are involved, making annotations sufficiently weak, as in the example with ⟨x := x + 1⟩ ‖ ⟨x := x + 2⟩,
- using atomic regions where needed; in the example with variables a, b, t, that could be ⟨while ¬t do t := true ; a := a + 1⟩.
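As an aside, a Go channel of capacity one implements exactly the one-place buffer of the producer-consumer example: the two await conditions correspond to the blocking send and receive (the array contents are illustrative):

```go
package main

import "fmt"

func main() {
	const N = 5
	a := [N]int{10, 20, 30, 40, 50}
	var b [N]int
	buf := make(chan int, 1) // one-place buffer
	done := make(chan bool)

	go func() { // Producer: send blocks while the buffer is full
		for p := 0; p < N; p++ {
			buf <- a[p]
		}
	}()
	go func() { // Consumer: receive blocks while the buffer is empty
		for c := 0; c < N; c++ {
			b[c] = <-buf
		}
		done <- true
	}()
	<-done
	fmt.Println(b) // [10 20 30 40 50]
}
```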
Fairness¶
Question: Does the following program terminate? What is the postcondition if it terminates?

```algorithm
n, t := 0, false
(while ¬t do n := n + 1) ‖ t := true
```

The minimal progress assumption implies only that some process has to make progress, if progress is possible at all. Under minimal progress, the above program may not terminate.

An unfair scheduler may select one process and neglect others. We may require a scheduler to be (weakly) fair:

- an unconditional atomic region is eventually executed;
- a conditional atomic region whose guard is continuously true is eventually executed.

Under fairness, the above program does terminate; upon termination, n will be at least 0.
Question: Does the following program terminate under a (weakly) fair scheduler?

```algorithm
t, u := false, true
⟨await t then u := false⟩ ‖ while u do t := true ‖ while u do t := false
```

Under weak fairness, it does not necessarily terminate. For that, the scheduler would have to be strongly fair: if the guard of a conditional atomic region is repeatedly (but not necessarily continuously) true, the region is eventually executed.

Typically, schedulers of operating systems and programming languages are weakly fair. Strong fairness is more difficult to implement.
Safety and Liveness¶
Every functional property of a (sequential or concurrent) program can be formulated as either a safety or a liveness property.
Safety: nothing bad will ever happen; all states of all traces are "not bad". Safety is expressed by invariance properties.
Liveness: something good will eventually happen; all traces will eventually lead to a "good" state.
Termination is a liveness property: a final state will eventually be reached.
Formally, safety properties are properties of individual states of traces (or finite sequences of states), and liveness properties are properties of infinite traces.
Question: What are the safety and liveness properties of a traffic intersection?
Question: What is the outcome of P ‖ Q? Do P and Q interfere with each other with the given annotation?

```algorithm
procedure P
    {true}
    while n > 0 do n := n - 1
    a := 1
    {a = 1}

procedure Q
    {true}
    while n < 5 do n := n + 1
    b := 1
    {b = 1}
```

The parallel composition P ‖ Q does not always terminate, but if it does, it establishes both postconditions; P and Q do not interfere with each other.
Without interference, safety properties are preserved in a parallel composition, but liveness properties, including termination, are not!
Critical Section¶
Mutual exclusion is typically implemented by locks that protect critical sections.

```algorithm
process CSi
    while true do
        entry protocol
        critical section
        exit protocol
        noncritical section
```
Any solution has to satisfy three safety properties and one liveness property:

- Mutual exclusion: at most one process can be in its critical section.
- No deadlock (livelock): if two or more processes are trying to enter and no process is in its critical section, one will succeed.
- No unnecessary delay: if a process is trying to enter and no other process is in its critical section, it is not prevented from entering.
- Eventual entry: a process trying to enter will eventually succeed.
Question: What are these properties at a traffic intersection? You may consider 4-way stops and traffic lights.

Answer:

- Mutual exclusion: at most one car enters the intersection.
- No deadlock: if four cars arrive at a 4-way stop at the same time, they will not all wait, but one will enter.
- No livelock: if four cars arrive at a 4-way stop at the same time and all drivers gesture to the car on the right to proceed, they will not keep doing so forever, but one will enter.
- No unnecessary delay: if a car arrives at a traffic light and there are no other cars, the traffic light will eventually turn green.
- Eventual entry: all cars arriving at an intersection will eventually be able to enter it.
Here is an abstract solution, i.e. with "coarse-grained" atomic regions; CS is the critical section invariant:

```algorithm
in1, in2 := false, false
{CS: ¬(in1 ∧ in2)}
```

```algorithm
process CS1
    while true do
        ⟨await ¬in2 then in1 := true⟩
        critical section
        in1 := false
        noncritical section
```

```algorithm
process CS2
    while true do
        ⟨await ¬in1 then in2 := true⟩
        critical section
        in2 := false
        noncritical section
```
Question: Do the four properties, mutual exclusion, no deadlock (livelock), no unnecessary delay, eventual entry hold?
Answer: Yes, yes, yes, no.
Exercise: Give the annotation that is required for the CS invariant in a state diagram! Check both the correctness of transitions as well as non-interference.
The solution can be easily generalized to more processes by observing that only one of in1, in2, ... can be true, and hence these can be replaced by a single variable lock. The relationship is lock = in1 ∨ in2 ∨ ...:

```algorithm
lock := false
```

```algorithm
process CS1
    while true do
        ⟨await ¬lock then lock := true⟩
        critical section
        lock := false
        noncritical section
```

```algorithm
process CS2
    while true do
        ⟨await ¬lock then lock := true⟩
        critical section
        lock := false
        noncritical section
```
All processors have an atomic test-and-set instruction or an equivalent instruction:

```algorithm
procedure TAS(lock: pointer to boolean) → (init: boolean)
    ⟨init := *lock ; *lock := true⟩
```

Here, *lock refers to the variable pointed to by lock; procedure TAS returns a boolean result init (in a processor, lock and init would be registers).
By employing a spin lock, eventual entry (property 4) is "more likely":

```algorithm
process CS1
    while true do
        repeat init ← TAS(lock) until ¬init
        critical section
        lock := false
        noncritical section
```

The loop repeat S until B is formally equivalent to S ; while ¬B do S.
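A spin lock along these lines can be sketched in Go, where the Swap method of atomic.Bool plays the role of TAS; the counter n serves only to demonstrate mutual exclusion:

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

var (
	lock atomic.Bool // false: free, true: taken
	n    int         // protected by lock
)

func csEnter() {
	for lock.Swap(true) { // TAS: spin until the previous value was false
	}
}

func csExit() {
	lock.Store(false)
}

func main() {
	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := 0; j < 1000; j++ {
				csEnter()
				n++ // critical section
				csExit()
			}
		}()
	}
	wg.Wait()
	fmt.Println(n) // 4000
}
```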
The atomic region ⟨S⟩ can be implemented by

```algorithm
CSenter
S
CSexit
```

provided that access to all variables of S is protected with the same lock; atomic regions accessing disjoint variables can proceed in parallel.
The guarded atomic region ⟨await B then S⟩ can be implemented by

```algorithm
CSenter
while ¬B do CSexit ; CSenter
S
CSexit
```

provided that access to all variables of B and S is protected with the same lock.
This involves spinning and may lead to memory contention when accessing the variables of B. To reduce this, it is preferable to insert a delay:

```algorithm
CSenter
while ¬B do CSexit ; delay ; CSenter
S
CSexit
```

A delay can be randomly chosen from an interval that is doubled each time, a scheme known as the exponential back-off protocol.
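A sketch of ⟨await B then S⟩ with exponential back-off in Go, using a sync.Mutex to stand in for CSenter/CSexit; the initial delay and the cap are arbitrary choices:

```go
package main

import (
	"fmt"
	"math/rand"
	"sync"
	"time"
)

var mu sync.Mutex // stands in for CSenter/CSexit

// awaitThen implements ⟨await B then S⟩ by polling: it holds the lock
// while testing B and executing S, and backs off exponentially between polls.
func awaitThen(b func() bool, s func()) {
	delay := time.Microsecond
	mu.Lock() // CSenter
	for !b() {
		mu.Unlock()                                              // CSexit
		time.Sleep(time.Duration(rand.Int63n(int64(delay) + 1))) // random delay from the interval
		if delay < time.Millisecond {
			delay *= 2 // exponential back-off: double the interval
		}
		mu.Lock() // CSenter
	}
	s() // S executes with B known to hold
	mu.Unlock() // CSexit
}

func main() {
	s := 0
	go func() {
		time.Sleep(10 * time.Millisecond)
		mu.Lock()
		s = 1 // establish B
		mu.Unlock()
	}()
	awaitThen(func() bool { return s > 0 }, func() { s-- })
	fmt.Println("s =", s) // s = 0
}
```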
The Tie-Breaker Algorithm (Peterson's Algorithm) obtains a fair solution to the critical section problem by recording which process had the last turn; in1, in2 now mean that CS1, CS2 want to enter their critical section:

```algorithm
in1, in2, last := false, false, 1
```

```algorithm
process CS1
    while true do
        in1 := true ; last := 1
        ⟨await ¬in2 or last = 2⟩
        {in1 ∧ (¬in2 ∨ last = 2)}
        critical section
        in1 := false
        noncritical section
```

```algorithm
process CS2
    while true do
        in2 := true ; last := 2
        ⟨await ¬in1 or last = 1⟩
        {in2 ∧ (¬in1 ∨ last = 1)}
        critical section
        in2 := false
        noncritical section
```
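A sketch of the tie-breaker in Go: the algorithm depends on sequentially consistent reads and writes of the shared variables, so they are declared atomic (in practice, Go code would simply use a sync.Mutex):

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

var (
	in1, in2 atomic.Bool  // process i wants to enter
	last     atomic.Int32 // who had the last turn
	n        int          // protected variable
)

func cs1(rounds int, wg *sync.WaitGroup) {
	defer wg.Done()
	for r := 0; r < rounds; r++ {
		in1.Store(true)
		last.Store(1)
		for in2.Load() && last.Load() == 1 { // await ¬in2 or last = 2
		}
		n++ // critical section
		in1.Store(false)
	}
}

func cs2(rounds int, wg *sync.WaitGroup) {
	defer wg.Done()
	for r := 0; r < rounds; r++ {
		in2.Store(true)
		last.Store(2)
		for in1.Load() && last.Load() == 2 { // await ¬in1 or last = 1
		}
		n++ // critical section
		in2.Store(false)
	}
}

func main() {
	var wg sync.WaitGroup
	wg.Add(2)
	go cs1(10000, &wg)
	go cs2(10000, &wg)
	wg.Wait()
	fmt.Println(n) // 20000
}
```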
The drawback of the tie-breaker algorithm is that it is difficult to generalize to more than two processes. The idea of the ticket algorithm is that each process first gets a ticket with its number in the queue. Assume turn: array 0 .. N - 1 of integer:

```algorithm
number, next, turn := 1, 1, [0, ..., 0]
{next > 0 ∧
 (∀ i ∈ 0 .. N - 1 • (CS(i) in critical section ⇒ turn(i) = next) ∧
 (turn(i) > 0 ⇒ (∀ j ∈ 0 .. N - 1 • j ≠ i ⇒ turn(i) ≠ turn(j))))}

process CS(i: 0 .. N - 1)
    while true do
        ⟨turn(i) := number ; number := number + 1⟩
        ⟨await turn(i) = next⟩
        critical section
        ⟨next := next + 1⟩
        noncritical section
```

The parameter of CS is a notation for process replication: N copies of CS are started, each with a different value of i.
The ticket algorithm requires fetching and incrementing an integer as one atomic operation. Some processors have a fetch-and-add instruction specifically for this purpose.
- The x86 processor has an XADD instruction that can be used with the LOCK prefix to achieve atomicity,

  ```algorithm
  LOCK XADD dst, src   =   ⟨dst, src := dst + src, dst⟩
  ```

  where dst is a register or memory location and src is a register.

- The ARM v7 processor splits atomic updates into two instructions, load-exclusive and store-exclusive, with the hardware monitoring what happens in between,

  ```algorithm
  LDREX dst, [src]     =   ⟨dst := Memory[src]⟩
  STREX R, src, [dst]  =   ⟨Memory[dst], R := src, 0 ⫿ R := 1⟩
  ```

  where dst, src, R are registers and ⫿ is nondeterministic choice: if the memory location was modified in the meantime, R is set to 1, otherwise to 0. An increment of Memory[ptr] is performed in a loop:

  ```arm
  L: LDREX R1, [ptr]     ; load exclusive and monitor Memory[ptr]
     ADD R1, R1, 1       ; add 1
     STREX R2, R1, [ptr] ; attempt to store
     CMP R2, #1          ; check if store-exclusive failed
     BEQ L               ; failed, retry
  ```

- The gcc compiler has built-in functions for atomic operations that are compiled to whatever the hardware supports:

  ```c
  type __atomic_add_fetch (type *ptr, type val, int memorder)
  ```

  where type must be 1, 2, 4, or 8 bytes in length.
Question: Does the ARM code guarantee fairness among processes trying to increment Memory[ptr]?

Answer: No, it is not fair, but the chances of one process starving are negligible.
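In Go, fetch-and-add is available as the Add method of the sync/atomic types; since Add returns the new value, the ticket is Add(1) - 1. A ticket-lock sketch (type and field names are illustrative; the counters start at 0 rather than 1, which is equivalent):

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

type TicketLock struct {
	number atomic.Uint32 // next ticket to hand out
	next   atomic.Uint32 // ticket currently served
}

func (t *TicketLock) Lock() {
	turn := t.number.Add(1) - 1 // fetch-and-add: take a ticket
	for t.next.Load() != turn { // ⟨await turn(i) = next⟩
	}
}

func (t *TicketLock) Unlock() {
	t.next.Add(1) // ⟨next := next + 1⟩
}

func main() {
	var lock TicketLock
	var wg sync.WaitGroup
	n := 0
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := 0; j < 1000; j++ {
				lock.Lock()
				n++ // critical section
				lock.Unlock()
			}
		}()
	}
	wg.Wait()
	fmt.Println(n) // 4000
}
```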
If a special instruction is not available, then an implementation of

```algorithm
⟨turn(i) := number ; number := number + 1⟩
```

would be

```algorithm
CSenter
turn(i) := number ; number := number + 1
CSexit
```

where CSenter and CSexit use a lock for variable number. Although the contention for accessing number may not be resolved in a fair way, it turns out that this implementation works sufficiently well in practice.
The ticket algorithm leads to memory contention if many processors are trying to access number. The bakery algorithm does not rely on a single global counter and does not need a special fetch-and-add instruction. Assuming turn: array 0 .. N - 1 of integer, a coarse-grained solution is:

```algorithm
turn := [0, ..., 0]

process CS(i: 0 .. N - 1)
    while true do
        ⟨turn(i) := max(turn) + 1⟩
        for j = 0 to N - 1 do
            if i ≠ j then ⟨await turn(j) = 0 or turn(i) < turn(j)⟩
        critical section
        turn(i) := 0
        noncritical section
```

Operation max(turn) returns the maximum value in array turn.
The max operation would need to be implemented by a loop, making entry inefficient. If we were to implement turn(i) := max(turn) + 1 non-atomically, two processes, say i and j, may obtain the same value for turn(i) and turn(j), leading to a deadlock in the await statement.
This situation can be avoided by giving preference to one of the two or more processes with the same value of turn in some arbitrary way, for example by using the process number. The conditions of the await do not need to be evaluated atomically. Thus we can arrive at a solution that relies only on assignments being atomic:

```algorithm
turn := [0, ..., 0]

process CS(i: 0 .. N - 1)
    while true do
        turn(i) := max(turn) + 1
        for j = 0 to N - 1 do
            while turn(j) ≠ 0 and (turn(i) > turn(j) or (turn(i) = turn(j) and i > j)) do
                skip
        critical section
        turn(i) := 0
        noncritical section
```
Question: Does this guarantee fairness among processes trying to enter their critical section?
Answer: Yes, although processes with a lower process number may have an advantage.
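A fine-grained bakery sketch in Go. Following Andrews' formulation, turn(i) is first set to 1 to mark that a number is being chosen before the maximum is computed, so that a process choosing a number is never invisible to the others; atomic types stand in for the assumption that individual assignments are atomic:

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

const N = 4 // number of processes

var turn [N]atomic.Int64 // turn(i) = 0: process i is not trying

func maxTurn() int64 {
	var m int64
	for i := range turn {
		if t := turn[i].Load(); t > m {
			m = t
		}
	}
	return m
}

func cs(i, rounds int, n *int, wg *sync.WaitGroup) {
	defer wg.Done()
	for r := 0; r < rounds; r++ {
		turn[i].Store(1)             // mark that a number is being chosen
		turn[i].Store(maxTurn() + 1) // take a number, not necessarily unique
		for j := 0; j < N; j++ {
			if j == i {
				continue
			}
			// spin while j is trying and is ahead; ties broken by process number
			for {
				tj, ti := turn[j].Load(), turn[i].Load()
				if tj == 0 || ti < tj || (ti == tj && i < j) {
					break
				}
			}
		}
		*n++ // critical section
		turn[i].Store(0)
	}
}

func main() {
	n := 0
	var wg sync.WaitGroup
	for i := 0; i < N; i++ {
		wg.Add(1)
		go cs(i, 1000, &n, &wg)
	}
	wg.Wait()
	fmt.Println(n) // 4000
}
```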