Problème 1.

Consider we want to build McMaster Maps, the best online route planning tool in existence. The developers of McMaster maps have decided to represent road information in terms of a a massive graph in which nodes are crossing and the edges are the roads between crossings. The developers of McMaster Maps will be optimised toward computing the directions to McMaster University is the only destination that matters. Hence, McMaster Maps will be optimised toward computing the directions to McMaster University. To do so, McMaster Maps maintains single-sink shortest path index that maintains the shortest path from any node to McMaster University. This index is represented by the typical path and cost arrays as computed by either Dijkstra or Bellman-Ford. Once in a while, an update to the road network happens: The weight of a single edge changes (this can represent adding and removing edges: addition changes the weight of the edge from to a numeric value and removing changes the weight of the edge from a numeric value to )

P1.1

Given the road network as a graph , the shortest path index, and the edge that was changes, write an algorithm that determines whether the shortest path index is still valid. You may assume the graph is already updated. Argue why your algorithm is correct. What is the complexity of your algorithm?

Algorithm 8 CheckShortestPathIndexValidity(G,path,cost,u,v,w\mathcal{G}, path, cost, u, v, w)

Require: G\mathcal{G}, pathpath and costcost, edge (u,v)(u, v) with new weight ww

Ensure: Returns true if the shortest path index is still valid, false otherwise

1:if cost[u]+w<cost[v]cost[u] + w < cost[v] then

2:

3:return false

4:end if

5:if path[v]=upath[v] = u and cost[u]+w>cost[v]cost[u] + w > cost[v] then

6:

7:return false

8:end if

9:

10:return true

The shortest path index is invalid in two cases:

  • If the new shortest path from via edge is shorter than the current shortest path from to the destination. This is checked by the condition .
  • If ‘s current shortest path to the destination goes through and the new weight makes this path longer than current shortest path cost. This is checked by . If both of these conditions met, shortest path index remains valid.

Therefore the algorithm is correct

The algorithm performs a constant number of comparison and array lookups. Thus it is

P1.2

Assume the shortest path index is no longer valid: provide a modification of Dijkstra algorithm that restores the index to a valid state without recomputing all shortest paths. You may assume the graph is already updated. Argue why your algorithm is correct.

Algorithm 9 UpdateSingleSinkShortestPath(G,w,e=(u,v),path,costG, w, e=(u,v), \text{path}, \text{cost})

Require: Graph G=(V,E)G=(V,E), weight function w:ER+w:E\to\mathbb{R}^+, updated edge e=(u,v)Ee=(u,v) \in E with new weight w(e)w'(e), path[1..V]\text{path}[1..|V|] and cost[1..V]\text{cost}[1..|V|]

Ensure: Updated shortest path tree arrays path[1..V]\text{path}[1..|V|] and cost[1..V]\text{cost}[1..|V|]

1:Initialize min-priority queue QQ with (v,cost[u]+w(e))(v, \text{cost}[u] + w'(e))

2:while QQ is not empty do

3:Extract vertex xx with minimum cost\text{cost} from QQ

4:for each neighbor yy of xx do

5:if cost[x]+w(x,y)<cost[y]\text{cost}[x] + w(x,y) < \text{cost}[y] then

6:cost[y]cost[x]+w(x,y)\text{cost}[y] \leftarrow \text{cost}[x] + w(x,y)

7:path[y]x\text{path}[y] \leftarrow x

8:if yy is in QQ then

9:Decrease key of yy in QQ to cost[y]\text{cost}[y]

10:else

11:Insert yy into QQ with key cost[y]\text{cost}[y]

12:end if

13:end if

14:end for

15:end while

16:return path,cost\text{path}, \text{cost}

Correctness: Let be the shortest path tree rooted at the sink vertex before the edge update, and let be the tree true shortest path tree after the edge update. We define a following boundary function:

or is the set of vertices whose true shortest path to is at most and strictly less than their distance in . The following invariant will hold of the while loop:

For base case, , is empty, thus invariant holds

Suppose the invariant holds for , that is at -th iteration and invariant holds. Let be the vertex extracted from in the -th iteration. We argue that , or

First, note that must have been inserted into by some previous iteration, say when visiting a vertex for some . By the induction hypothesis, at that time, so the tentative distance used to insert into equals . Since is extracted in the -th iteration, we have . Moreover, we must have , for otherwise would have been visited before via a shorter path in , contradicting the fact that . Thus, . The algorithm then correctly updates to and to its parent in . Furthermore, for each neighbor of , if , then the path to through is shorter than its current path, so the algorithm correctly updates and and inserts into with the updated distance.

Therefore, after the -th iteration, the invariant holds for all vertices in , including the newly added vertex and possibly some of its neighbors. By induction, the invariant holds for all .

P1.3

Explain which graph representation you used for your algorithm and what the complexity of your modified-Dijkstra algorithm is using this graph representation.

note: Express the complexity in terms of the number of nodes affected by the change. For example, use a notation in which is the number of nodes affected by the edge change, the incoming edges of , and the outgoing edges of .

The algorithm use an adjacency list representation, as each vertex maintains a list of its outgoing edges.

The overall complexity is

The main loop run until is empty. Per iteration, it extracts the vertex with minimum distance from , which takes time, where is the number of affected vertices.

For each outgoing edge of , update the distance and parent of and either decrease its keep in or insert into . The operation takes time worst-case.

The total number of iterations of the inner loop is bounded by as each outgoing edge of an affected vertex is processed at most once.

Space complexity is , as min-priority queue stores at most one entry per affected vertex.

P1.4

What is the worst-case complexity of your solution if you use the other graph representation?

If an adjacency matrix representation is used, the graph is represented by a matrix, where is the number of vertices in the graph.

With this representation, the main difference in the algorithm is in the loop that iterates over the neighbors of the current vertex . In an adjacency matrix, finding the neighbors of requires scanning the entire row corresponding to in the matrix, which takes time. Therefore, the overall time complexity of the modified Dijkstra’s algorithm using an adjacency matrix becomes:

If all vertices are affected (), the complexity becomes .

Space complexity is , as the matrix requires storing entries.


Problème 2.

Consider a company managing many servers placed all over the world. The company wants to add network connections between a minimal amount of servers to ensure that there is a path of communication between all pairs of servers. While researching this problem, the company was advised that some connections can be built more reliable than others: according to the consulted contractors, the probability that a connection between server and will work at any given time is (We have ). The company wants to minimize the number of connects, which maximising the probability that all servers are connected to each other at any given time. We will help the company out in their challenge to figure out which connections they need to built.

P2.1

Model the above problem as a graph problem: what are the nodes and edges in your graph, do the edges have weights, and what problem are you trying to answer on your graph?

Nodes: Each server is represented by a node (vertex) in the graph. Denote the set of all servers as . Edges: potential network connections between servers are represented by edges in the graph. An edge exists between node and if a connection can be built. Denote set of all possible connections as . Weights: Each edge has a weight representing the probability that the connection will work at any given time.

The problem is to find a minimum spanning tree (MST) of the graph that connects all servers, such that the sum of the weights of the edges in the MST is maximized. Or it can be stated as:

Find a subset of edges such that the graph is a minimum spanning tree and the sum of the weights of the edges in is maximized.

P2.2

Provide an algorithm NetworkPlan to find the network connections to build. Explain why your algorithm is correct.

Algorithm 10 NetworkPlan

Require: GG

Require: pp

1:EE \gets {}//Set of edges in the graph

2:for each edge (m,n)(m,n) in GG do

3:w(m,n)log(p(m,n))w(m,n) \gets -\log(p(m,n))//Compute edge weight

4:EE(m,n)E \gets E \cup {(m,n)}

5:end for

6:ESortEdges(E)E \gets \text{SortEdges}(E)//Sort edges by weight in ascending order

7:MSTMST \gets \emptyset

8:DSMakeSet(G.V)DS \gets \text{MakeSet}(G.V)//Initialize disjoint sets

9:for each edge (m,n)(m,n) in EE do

10:if FindSet(DS,m)FindSet(DS,n)\text{FindSet}(DS, m) \neq \text{FindSet}(DS, n) then

11:MSTMST(m,n)MST \gets MST \cup {(m,n)}//Add edge to MST

12:UnionSets(DS,m,n)\text{UnionSets}(DS, m, n)//Union sets containing mm and nn

13:end if

14:end for

15:return MSTMST

It follows Kruskal’s algorithm to find MST. Since we are using negative logarithm of the probabilities as edge weights, the MST found by Kruskal will maximise the probability.

P2.3

Explain which graph representation you used for your algorithm and what the complexity of your algorithm is using this graph representation.

It uses an adjacency list representation, where each node maintains a list of its neighboring nodes and the corresponding edge weights. Using an adjacency list, the time complexity of Kruskal’s algorithm is , where is the number of edges in the graph. This is because sorting the edges takes time, and the main loop of Kruskal’s algorithm takes time using a disjoint set data structure to efficiently check for cycles.

P2.4

What is the worst-case complexity of your solution if you use the other graph representation? Explain your answer.

Constructing the adjacency matrix representation of the graph takes time, where is the number of vertices in the graph. Sorting the edges takes time. The main loop of Kruskal’s algorithm takes time, where is the number of edges. This is because we need to iterate over all edges () and perform the UnionSets operation, which takes time using an efficient disjoint set data structure like union-by-rank and path compression.

Therefore, the worst case scenario would be