Problème 1.

Consider we want to build McMaster Maps, the best online route planning tool in existence. The developers of McMaster maps have decided to represent road information in terms of a a massive graph in which nodes are crossing and the edges are the roads between crossings. The developers of McMaster Maps will be optimised toward computing the directions to McMaster University is the only destination that matters. Hence, McMaster Maps will be optimised toward computing the directions to McMaster University. To do so, McMaster Maps maintains single-sink shortest path index that maintains the shortest path from any node to McMaster University. This index is represented by the typical path and cost arrays as computed by either Dijkstra or Bellman-Ford. Once in a while, an update to the road network happens: The weight of a single edge changes (this can represent adding and removing edges: addition changes the weight of the edge from \infty to a numeric value and removing changes the weight of the edge from a numeric value to \infty)

P1.1

Given the road network as a graph G\mathcal{G}, the shortest path index, and the edge that was changes, write an algorithm that determines whether the shortest path index is still valid. You may assume the graph is already updated. Argue why your algorithm is correct. What is the complexity of your algorithm?

"\\begin{algorithm}\n\\caption{CheckShortestPathIndexValidity($\\mathcal{G}, path, cost, u, v, w$)}\n\\begin{algorithmic}\n\\Require $\\mathcal{G}$, $path$ and $cost$, edge $(u, v)$ with new weight $w$\n\\Ensure Returns true if the shortest path index is still valid, false otherwise\n\\IF{$cost[u] + w < cost[v]$}\n\\State \\Return false\n\\ENDIF\n\\If{$path[v] = u$ and $cost[u] + w > cost[v]$}\n\\State \\Return false\n\\EndIf\n\\State \\Return true\n\\end{algorithmic}\n\\end{algorithm}"

Algorithm 8 CheckShortestPathIndexValidity(G,path,cost,u,v,w\mathcal{G}, path, cost, u, v, w)

Require: G\mathcal{G}, pathpath and costcost, edge (u,v)(u, v) with new weight ww

Ensure: Returns true if the shortest path index is still valid, false otherwise

1:if cost[u]+w<cost[v]cost[u] + w < cost[v] then

2:

3:return false

4:end if

5:if path[v]=upath[v] = u and cost[u]+w>cost[v]cost[u] + w > cost[v] then

6:

7:return false

8:end if

9:

10:return true

The shortest path index is invalid in two cases:

  • If the new shortest path from uu via edge (u,v)(u,v) is shorter than the current shortest path from vv to the destination. This is checked by the condition cost[u]+w<cost[v]cost[u] + w < cost[v].
  • If vv‘s current shortest path to the destination goes through uu and the new weight ww makes this path longer than vv current shortest path cost. This is checked by cost[u]+w>cost[v]cost[u] + w > cost[v]. If both of these conditions met, shortest path index remains valid.

Therefore the algorithm is correct \square

The algorithm performs a constant number of comparison and array lookups. Thus it is O(1)O(1)

P1.2

Assume the shortest path index is no longer valid: provide a modification of Dijkstra algorithm that restores the index to a valid state without recomputing all shortest paths. You may assume the graph is already updated. Argue why your algorithm is correct.

"\\begin{algorithm}\n\\caption{UpdateSingleSinkShortestPath($G, w, e=(u,v), \\text{path}, \\text{cost}$)}\n\\begin{algorithmic}\n\\REQUIRE Graph $G=(V,E)$, weight function $w:E\\to\\mathbb{R}^+$, updated edge $e=(u,v) \\in E$ with new weight $w'(e)$, $\\text{path}[1..|V|]$ and $\\text{cost}[1..|V|]$\n\\ENSURE Updated shortest path tree arrays $\\text{path}[1..|V|]$ and $\\text{cost}[1..|V|]$\n\\STATE Initialize min-priority queue $Q$ with $(v, \\text{cost}[u] + w'(e))$\n\\WHILE{$Q$ is not empty}\n \\STATE Extract vertex $x$ with minimum $\\text{cost}$ from $Q$\n \\FOR{each neighbor $y$ of $x$}\n \\IF{$\\text{cost}[x] + w(x,y) < \\text{cost}[y]$}\n \\STATE $\\text{cost}[y] \\leftarrow \\text{cost}[x] + w(x,y)$\n \\STATE $\\text{path}[y] \\leftarrow x$\n \\IF{$y$ is in $Q$}\n \\STATE Decrease key of $y$ in $Q$ to $\\text{cost}[y]$\n \\ELSE\n \\STATE Insert $y$ into $Q$ with key $\\text{cost}[y]$\n \\ENDIF\n \\ENDIF\n \\ENDFOR\n\\ENDWHILE\n\\RETURN $\\text{path}, \\text{cost}$\n\\end{algorithmic}\n\\end{algorithm}"

Algorithm 9 UpdateSingleSinkShortestPath(G,w,e=(u,v),path,costG, w, e=(u,v), \text{path}, \text{cost})

Require: Graph G=(V,E)G=(V,E), weight function w:ER+w:E\to\mathbb{R}^+, updated edge e=(u,v)Ee=(u,v) \in E with new weight w(e)w'(e), path[1..V]\text{path}[1..|V|] and cost[1..V]\text{cost}[1..|V|]

Ensure: Updated shortest path tree arrays path[1..V]\text{path}[1..|V|] and cost[1..V]\text{cost}[1..|V|]

1:Initialize min-priority queue QQ with (v,cost[u]+w(e))(v, \text{cost}[u] + w'(e))

2:while QQ is not empty do

3:Extract vertex xx with minimum cost\text{cost} from QQ

4:for each neighbor yy of xx do

5:if cost[x]+w(x,y)<cost[y]\text{cost}[x] + w(x,y) < \text{cost}[y] then

6:cost[y]cost[x]+w(x,y)\text{cost}[y] \leftarrow \text{cost}[x] + w(x,y)

7:path[y]x\text{path}[y] \leftarrow x

8:if yy is in QQ then

9:Decrease key of yy in QQ to cost[y]\text{cost}[y]

10:else

11:Insert yy into QQ with key cost[y]\text{cost}[y]

12:end if

13:end if

14:end for

15:end while

16:return path,cost\text{path}, \text{cost}

Correctness: Let TT be the shortest path tree rooted at the sink vertex tt before the edge update, and let TT' be the tree true shortest path tree after the edge update. We define a following boundary function:

B(i)={vV:distT(v,t)i  distT(v,t)<distT(v,t)}B(i) = \{ v \in V: \text{dist}_{T'}(v,t) \leq i \space \land \space \text{dist}_{T'}(v,t) < \text{dist}_T(v,t) \}

or B(i)B(i) is the set of vertices whose true shortest path to tt is at most ii and strictly less than their distance in TT. The following invariant will hold of the while loop:

vB(i):cost[v]=distT(v,t)  path[v] is correct\forall v \in B(i): \text{cost}[v] = \text{dist}_{T'}(v,t) \space \land \space \text{path}[v] \text{ is correct}

For base case, i=0i=0, B(0)B(0) is empty, thus invariant holds

Suppose the invariant holds for kk, that is at kk-th iteration B(k)B(k) and invariant holds. Let xx be the vertex extracted from QQ in the k+1k+1-th iteration. We argue that xB(k+1)B(k)x \in B(k+1) \setminus B(k), or distT(x,t)=k  distT(x,t)<distT(x,t)\text{dist}_{T'}(x,t)=k \space \land \space \text{dist}_{T'}(x,t) < \text{dist}_{T}(x,t)

First, note that xx must have been inserted into QQ by some previous iteration, say when visiting a vertex yB(j)y \in B(j) for some j<ij < i. By the induction hypothesis, cost[y]=distT(y,t)\text{cost}[y] = \text{dist}_{T'}(y,t) at that time, so the tentative distance cost[y]+w(y,x)\text{cost}[y] + w(y,x) used to insert xx into QQ equals distT(x,t)\text{dist}_{T'}(x,t). Since xx is extracted in the k+1k+1-th iteration, we have distT(x,t)=k+1\text{dist}_{T'}(x,t) = k+1. Moreover, we must have distT(x,t)<distT(x,t)\text{dist}_{T'}(x,t) < \text{dist}_{T}(x,t), for otherwise xx would have been visited before via a shorter path in TT, contradicting the fact that distT(x,t)=i\text{dist}_{T'}(x,t) = i. Thus, xB(k+1)B(k)x \in B(k+1) \setminus B(k). The algorithm then correctly updates cost[x]\text{cost}[x] to distT(x,t)\text{dist}_{T'}(x,t) and path[x]\text{path}[x] to its parent yy in TT'. Furthermore, for each neighbor zz of xx, if cost[x]+w(x,z)<cost[z]\text{cost}[x] + w(x,z) < \text{cost}[z], then the path to zz through xx is shorter than its current path, so the algorithm correctly updates cost[z]\text{cost}[z] and path[z]\text{path}[z] and inserts zz into QQ with the updated distance.

Therefore, after the k+1k+1-th iteration, the invariant holds for all vertices in B(k+1)B(k+1), including the newly added vertex xx and possibly some of its neighbors. By induction, the invariant holds for all ii. \square

P1.3

Explain which graph representation you used for your algorithm and what the complexity of your modified-Dijkstra algorithm is using this graph representation.

note: Express the complexity in terms of the number of nodes affected by the change. For example, use a notation in which CC is the number of nodes affected by the edge change, incoming(C)\text{incoming}(C) the incoming edges of CC, and outgoing(C)\text{outgoing}(C) the outgoing edges of CC.

The algorithm use an adjacency list representation, as each vertex maintains a list of its outgoing edges.

The overall complexity is O((outgoing(C)C)logC)O((|\text{outgoing}(C)| |C|)\log |C|)

The main loop run until QQ is empty. Per iteration, it extracts the vertex xx with minimum distance from QQ, which takes O(logC)O(\log|C|) time, where C|C| is the number of affected vertices.

For each outgoing edge (x,y)(x,y) of xx, update the distance and parent of yy and either decrease its keep in QQ or insert into QQ. The operation takes O(logC)O(\log|C|) time worst-case.

The total number of iterations of the inner loop is bounded by outgoing(C)|\text{outgoing}(C)| as each outgoing edge of an affected vertex is processed at most once.

Space complexity is O(C)O(|C|), as min-priority queue QQ stores at most one entry per affected vertex.

P1.4

What is the worst-case complexity of your solution if you use the other graph representation?

If an adjacency matrix representation is used, the graph is represented by a V×V|V|\times|V| matrix, where V|V| is the number of vertices in the graph.

With this representation, the main difference in the algorithm is in the loop that iterates over the neighbors of the current vertex xx. In an adjacency matrix, finding the neighbors of xx requires scanning the entire row corresponding to xx in the matrix, which takes O(V)O(|V|) time. Therefore, the overall time complexity of the modified Dijkstra’s algorithm using an adjacency matrix becomes:

O(CVlogC)O(|C|\cdot |V| \log |C|)

If all vertices are affected (C=V|C| = |V|), the complexity becomes O(V2logV)O(|V|^2 \log |V|).

Space complexity is O(V2)O(|V|^2), as the matrix requires storing V2|V|^2 entries.


Problème 2.

Consider a company managing many servers placed all over the world. The company wants to add network connections between a minimal amount of servers to ensure that there is a path of communication between all pairs of servers. While researching this problem, the company was advised that some connections can be built more reliable than others: according to the consulted contractors, the probability that a connection (m,n)(m,n) between server mm and nn will work at any given time is p(m,n)p(m,n) (We have p(m,n)=p(n,m)p(m,n)=p(n,m)). The company wants to minimize the number of connects, which maximising the probability that all servers are connected to each other at any given time. We will help the company out in their challenge to figure out which connections they need to built.

P2.1

Model the above problem as a graph problem: what are the nodes and edges in your graph, do the edges have weights, and what problem are you trying to answer on your graph?

Nodes: Each server is represented by a node (vertex) in the graph. Denote the set of all servers as VV. Edges: potential network connections between servers are represented by edges in the graph. An edge (m,n)(m,n) exists between node mm and nn if a connection can be built. Denote set of all possible connections as EE. Weights: Each edge (m,n)(m,n) has a weight log(p(m,n))-\log(p(m,n)) representing the probability that the connection will work at any given time.

The problem is to find a minimum spanning tree (MST) of the graph that connects all servers, such that the sum of the weights of the edges in the MST is maximized. Or it can be stated as:

Find a subset of edges EEE' \subseteq E such that the graph (V,E)(V, E') is a minimum spanning tree and the sum of the weights of the edges in EE' is maximized.

P2.2

Provide an algorithm NetworkPlan to find the network connections to build. Explain why your algorithm is correct.

"\\begin{algorithm}\n\\caption{NetworkPlan}\n\\begin{algorithmic}\n\\REQUIRE $G$\n\\REQUIRE $p$\n\\STATE $E \\gets {}$ \\COMMENT{Set of edges in the graph}\n\\FOR{each edge $(m,n)$ in $G$}\n\\STATE $w(m,n) \\gets -\\log(p(m,n))$ \\COMMENT{Compute edge weight}\n\\STATE $E \\gets E \\cup {(m,n)}$\n\\ENDFOR\n\\STATE $E \\gets \\text{SortEdges}(E)$ \\COMMENT{Sort edges by weight in ascending order}\n\\STATE $MST \\gets \\emptyset$\n\\STATE $DS \\gets \\text{MakeSet}(G.V)$ \\COMMENT{Initialize disjoint sets}\n\\FOR{each edge $(m,n)$ in $E$}\n\\IF{$\\text{FindSet}(DS, m) \\neq \\text{FindSet}(DS, n)$}\n\\STATE $MST \\gets MST \\cup {(m,n)}$ \\COMMENT{Add edge to MST}\n\\STATE $\\text{UnionSets}(DS, m, n)$ \\COMMENT{Union sets containing $m$ and $n$}\n\\ENDIF\n\\ENDFOR\n\\RETURN $MST$\n\\end{algorithmic}\n\\end{algorithm}"

Algorithm 10 NetworkPlan

Require: GG

Require: pp

1:EE \gets {} ▷Set of edges in the graph

2:for each edge (m,n)(m,n) in GG do

3:w(m,n)log(p(m,n))w(m,n) \gets -\log(p(m,n)) ▷Compute edge weight

4:EE(m,n)E \gets E \cup {(m,n)}

5:end for

6:ESortEdges(E)E \gets \text{SortEdges}(E) ▷Sort edges by weight in ascending order

7:MSTMST \gets \emptyset

8:DSMakeSet(G.V)DS \gets \text{MakeSet}(G.V) ▷Initialize disjoint sets

9:for each edge (m,n)(m,n) in EE do

10:if FindSet(DS,m)FindSet(DS,n)\text{FindSet}(DS, m) \neq \text{FindSet}(DS, n) then

11:MSTMST(m,n)MST \gets MST \cup {(m,n)} ▷Add edge to MST

12:UnionSets(DS,m,n)\text{UnionSets}(DS, m, n) ▷Union sets containing mm and nn

13:end if

14:end for

15:return MSTMST

It follows Kruskal’s algorithm to find MST. Since we are using negative logarithm of the probabilities as edge weights, the MST found by Kruskal will maximise the probability.

P2.3

Explain which graph representation you used for your algorithm and what the complexity of your algorithm is using this graph representation.

It uses an adjacency list representation, where each node maintains a list of its neighboring nodes and the corresponding edge weights. Using an adjacency list, the time complexity of Kruskal’s algorithm is O(ElogE)O(E \log E), where EE is the number of edges in the graph. This is because sorting the edges takes O(ElogE)O(E \log E) time, and the main loop of Kruskal’s algorithm takes O(E)O(E) time using a disjoint set data structure to efficiently check for cycles.

P2.4

What is the worst-case complexity of your solution if you use the other graph representation? Explain your answer.

Constructing the adjacency matrix representation of the graph takes O(V2)O(V^2) time, where VV is the number of vertices in the graph. Sorting the edges takes O(V2logV2)=O(V2logV)O(V^2 \log V^2) = O(V^2 \log V) time. The main loop of Kruskal’s algorithm takes O(ElogV)O(E \log V) time, where EE is the number of edges. This is because we need to iterate over all edges (O(E)O(E)) and perform the UnionSets operation, which takes O(logV)O(\log V) time using an efficient disjoint set data structure like union-by-rank and path compression.

Therefore, the worst case scenario would be O(V2+V2logV+ElogV)=O(V2logV)O(V^2 + V^2 \log V + E \log V) = O(V^2 \log V)