Problème 1.

Consider a m×nm \times n game board in which each cell has a numeric value, e.g:

ABCDEFG122342H344441I141314J231412K332142\begin{array}{c|c|c|c|c|c|} & \textbf{A} & \textbf{B} & \textbf{C} & \textbf{D} & \textbf{E} & \textbf{F}\\ \hline \textbf{G} & 1 & 2 & 2 & 3 & 4 & 2\\ \hline \textbf{H} & 3 & 4 & 4 & 4 & 4 & 1\\ \hline \textbf{I} & 1 & 4 & 1 & 3 & 1 & 4\\ \hline \textbf{J} & 2 & 3 & 1 & 4 & 1 & 2\\ \hline \textbf{K} & 3 & 3 & 2 & 1 & 4 & 2\\ \hline \end{array}

A player starts the game with a token in the top-left cell (the cell GA in this example) and the player finishes the game by moving to the bottom-right cell (the cell KF in this example). In each round of the game, the player can move in four directions (up, down, left, and right). The distance of each move is determined by the value of the cell. When going over the border of the game board, one ends up on the other side. For example, if the player is in the cell JB, which has value 3, then the player can move 3 steps up (reaching GB), 3 steps right (reaching JE), 3 steps down (reaching HB), and 3 steps left (reaching JE). The score of a player is determined in the total number of rounds the player needs to reach the bottom-right cell.

P1.1

Model the above problem as a graph problem: What are the nodes and edges in your graph, do the edges have weights, and what problem are you trying to answer on your graph?

The game will be modelled as an unweighted directed graph, where the problem is to find the shortest path (minimum number of rounds) to get from top-left cell to bottom-right cell:

  • Nodes: Each cell in the game board is a node.
  • Edges: edges are possible moves from one cell to another. The edges will have unweighted, as each moves takes one round regardless of the distance. For example, cell JBJB will be connected to GB,JE,JE,BHGB, JE, JE, BH

P1.2

Provide an efficient algorithm that given a m×nm \times n game board, will find an optimal solution if such a solution exists. If the game board has no solution, then the algorithm should report that the game board is invalid. The runtime of the algorithm should be worst-case O(mn)\mathcal{O}(mn)

Implement a breadth-first search to find shortest path:

"\\begin{algorithm}\n\\caption{Shortest Path}\n\\begin{algorithmic}\n\\Procedure{ShortestPath}{$\\text{board, m, n}$}\n \\State ${start} \\gets (0, 0)$\n \\State ${end} \\gets (m-1, n-1)$\n \\State ${queue} \\gets \\text{empty queue}$\n \\State ${visited} \\gets \\text{boolean array of size } m \\times n \\text{ initialized to } false$\n \\State ${distance} \\gets \\text{integer array of size } m \\times n \\text{ initialized to } \\infty$\n\n \\State $visited[start] \\gets true$\n \\State $distance[start] \\gets 0$\n \\State $queue.\\text{enqueue}(start)$\n\n \\While{$queue \\text{ is not empty}$}\n \\State $(i, j) \\gets queue.\\text{dequeue}()$\n \\If{$(i, j) = end$}\n \\State \\Return $distance[end]$\n \\EndIf\n \\State $value \\gets board[i][j]$\n \\For{$(x,y) \\text{ in } \\{(i - value, j), (i + value, j), (i, j - value), (i, j + value)\\}$}\n \\State $(x, y) \\gets (x \\bmod m, y \\bmod n)$\n \\State $new \\gets (x, y)$\n \\If{$visited[new] = false$}\n \\State $visited[new] \\gets true$\n \\State $distance[new] \\gets distance[cell] + 1$\n \\State $queue.\\text{enqueue}(new)$\n \\EndIf\n \\EndFor\n \\EndWhile\n\n \\State \\Return $\\infty$ \\Comment{ No solution exists}\n\\EndProcedure\n\\end{algorithmic}\n\\end{algorithm}"

Algorithm 32 Shortest Path

1:procedure ShortestPath(board, m, n\text{board, m, n})

2:start(0,0){start} \gets (0, 0)

3:end(m1,n1){end} \gets (m-1, n-1)

4:queueempty queue{queue} \gets \text{empty queue}

5:visitedboolean array of size m×n initialized to false{visited} \gets \text{boolean array of size } m \times n \text{ initialized to } false

6:distanceinteger array of size m×n initialized to {distance} \gets \text{integer array of size } m \times n \text{ initialized to } \infty

7:visited[start]truevisited[start] \gets true

8:distance[start]0distance[start] \gets 0

9:queue.enqueue(start)queue.\text{enqueue}(start)

10:while queue is not emptyqueue \text{ is not empty} do

11:(i,j)queue.dequeue()(i, j) \gets queue.\text{dequeue}()

12:if (i,j)=end(i, j) = end then

13:

14:return distance[end]distance[end]

15:end if

16:valueboard[i][j]value \gets board[i][j]

17:for (x,y) in {(ivalue,j),(i+value,j),(i,jvalue),(i,j+value)}(x,y) \text{ in } \{(i - value, j), (i + value, j), (i, j - value), (i, j + value)\} do

18:(x,y)(xmodm,ymodn)(x, y) \gets (x \bmod m, y \bmod n)

19:new(x,y)new \gets (x, y)

20:if visited[new]=falsevisited[new] = false then

21:visited[new]truevisited[new] \gets true

22:distance[new]distance[cell]+1distance[new] \gets distance[cell] + 1

23:queue.enqueue(new)queue.\text{enqueue}(new)

24:end if

25:end for

26:end while

27:

28:return \infty ▷No solution exists

29:end procedure

P1.3

Explain why your algorithm is correct and has a complexity that is worst-case O(mn)\mathcal{O}(mn)

For BFS, for shortest path in unweighted path has runtime complexity of O(V+E)\mathcal{O}(V+E) worst-case. In this setup, V=mnV=mn (number of vertices, which is mnmn), and E4mnE \leq 4mn (number of possible moves, up, left, right, down, which would be 4mn\leq 4mn). Thus, worst-case runtime complexity is O(mn)\mathcal{O}(mn)

Let T(m,n)T(m,n) be maximum number of operations performed by the algorithm. The boundary function can be defined as:

T(m,n)cmnT(m,n) \leq c \cdot mn

where cc is a constant. Invariance requires BFS traversal holds true.

Base case: starts cell is (0, 0), therefore T(1,1)=1c11T(1,1) = 1 \leq c \cdot 1 \cdot 1

Assume that invariance holds for all cell up to kthk^{\text{th}} cell, that is, for any (i,j)(i, j) we have T(i,j)cijT(i,j) \leq c \cdots ij

Consider the (k+1)th(k+1)^{\text{th}} cell (i,j)(i,j) processed. The number of operations is as followed:

  • dequeue (i,j)(i,j): constant c2c_2
  • check if (i,j)(i,j) is the end cell: constant c3c_3
  • retrieve value of cell: constant c4c_4
  • iterate through the 4 possible moves: constant c5c_5
  • check if neighbor is visited and mark nodes: constant c6c_6

Total number of operations is c2+c3+c4+c5+c6=c7c_2 + c_3 + c_4 + c_5 + c_6 = c_7

Thus, number of operations performed for the (k+1)th(k+1)^{\text{th}} cell is: T(i,j)cij+c7cmn+c7T(i,j) \leq c \cdot ij + c_7 \leq c \cdot mn + c_7

T(i,j)cmn+c7cmn+c2cmnT(i,j) \leq c \cdot mn + c_7 \leq c \cdot mn + c \leq 2c \cdot mn

Therefore, it holds for (k+1)th(k+1)^{\text{th}} cell.

P1.4

Which of the two graph representation we saw in the course material did you use to store the game board? What would the complexity of your algorithm be if you used the other graph representation?

The graph representation is used as an adjacency list to store the game board.

If the graph representation was an adjacency matrix, the graph as 2D matrix of size (mn)×(mn)(mn) \times (mn), space complexity would increase to O((mn)2)\mathcal{O}((mn)^2), time complexity would remain O(mn+E)O(mn + E), where E4mnE \leq 4mn. Given that accessing neighbour cell would be faster since constant-time access of matrix entries. Overall time complexity would still be O(mn)\mathcal{O}(mn)


Problème 2.

Edge-labeled graphs are graphs in which edges have labels that represent the type of relationship that is expressed by that edge. For example, in a social network graph, the edges could be labeled parentOf, friendOf, and worksWith. One way to express graph queries (that express how new information can be derived from edge-labeled graphs) is via a query graph that expresses how new relationships between source node s and target node t can be derived from existing information. The first query relates nodes that represent grandparents and their grandchildren, the second query relates nodes that represent ancestors and their descendants, and the third query relates everyone with a direct family relationship. Let QQ be a graph query and GG be an edge-labeled graph representing a data set. The graph query evaluation problem is the problem of computing the derived relationship in GG expressed by QQ. Typically, queries are small and data graphs are enormous. Hence, here we will assume that the size of a query graph is constant.

P2.1

Model the graph query evaluation problem as a graph problem: What are the nodes and edges in your graph, do the edges have weights, and what problem are you trying to answer on your graph?

The graph query evaluation can be modelled as directed graph, where:

  • Nodes: Each node represents a data elements in the graph
  • Edges: Each edge represents a relationship between two nodes, for example, ‘childOf’, ‘parentOf’.

The problem is to find all the subgraphs or paths in the graph databases that matches the pattern specified by the query.

For example, consider the following query grandParentOf(s, t), the problems is to find all pairs of node (s,t)(s, t) in graph such that there exists a path of length 2 from s to t, with both edges labeled “parentOf”.

P2.2

Provide an efficient algorithm that, given a graph GG, a source node nn in graph GG, and query QQ, will find all nodes mm such that the pair (n,m)(n, m) is in the derived relationship in GG expressed by QQ. Assuming QQ has a constant time, the runtime of your algorithm should be worst-case O(G)\mathcal{O}(| G |) in which G|G| is the total number of nodes and edges in GG.

"\\begin{algorithm}\n\\caption{Graph Query Evaluation}\n\\begin{algorithmic}\n\\STATE \\textbf{Input:} Graph $G$, Source node $n$, Query $Q$\n\\STATE \\textbf{Output:} All nodes $m$ such that $(n, m)$ is in the derived relationship\n\n\\STATE $R \\leftarrow []$ \\COMMENT{List to store result nodes}\n\\STATE $Visited \\leftarrow \\{\\}$\n\\STATE $Queue \\leftarrow \\text{InitializeQueue}()$\n\\STATE $\\text{Enqueue}(Queue, (n, 0))$ \\COMMENT{Enqueue source node with depth 0}\n\n\\WHILE{$\\text{Queue is not empty}$}\n \\STATE $(u, depth) \\leftarrow \\text{Dequeue}(Queue)$\n \\IF{$u \\not\\in Visited$}\n \\STATE $Visited \\leftarrow Visited \\cup \\{u\\}$\n \\FORALL{$v \\text{ such that } Q(u, v) \\text{ is true}$}\n \\STATE $R.\\text{append}(v)$\n \\IF{$\\text{Q is transitive}$}\n \\STATE $\\text{Enqueue}(Queue, (v, depth + 1))$\n \\ENDIF\n \\ENDFOR\n \\FORALL{$v \\text{ in Neighbors}(G, u)$}\n \\IF{$v \\not\\in Visited$}\n \\STATE $\\text{Enqueue}(Queue, (v, depth + 1))$\n \\ENDIF\n \\ENDFOR\n \\ENDIF\n\\ENDWHILE\n\n\\RETURN $R$\n\\end{algorithmic}\n\\end{algorithm}"

Algorithm 33 Graph Query Evaluation

1:Input: Graph GG, Source node nn, Query QQ

2:Output: All nodes mm such that (n,m)(n, m) is in the derived relationship

3:R[]R \leftarrow [] ▷List to store result nodes

4:Visited{}Visited \leftarrow \{\}

5:QueueInitializeQueue()Queue \leftarrow \text{InitializeQueue}()

6:Enqueue(Queue,(n,0))\text{Enqueue}(Queue, (n, 0)) ▷Enqueue source node with depth 0

7:while Queue is not empty\text{Queue is not empty} do

8:(u,depth)Dequeue(Queue)(u, depth) \leftarrow \text{Dequeue}(Queue)

9:if u∉Visitedu \not\in Visited then

10:VisitedVisited{u}Visited \leftarrow Visited \cup \{u\}

11:for all v such that Q(u,v) is truev \text{ such that } Q(u, v) \text{ is true} do

12:R.append(v)R.\text{append}(v)

13:if Q is transitive\text{Q is transitive} then

14:Enqueue(Queue,(v,depth+1))\text{Enqueue}(Queue, (v, depth + 1))

15:end if

16:end for

17:for all v in Neighbors(G,u)v \text{ in Neighbors}(G, u) do

18:if v∉Visitedv \not\in Visited then

19:Enqueue(Queue,(v,depth+1))\text{Enqueue}(Queue, (v, depth + 1))

20:end if

21:end for

22:end if

23:end while

24:return RR

P2.3

Explain how you represented your graph GG, why your algorithm is correct, and why your algorithm has a complexity that is worst-case O(G)O(|G|).

The graph GG is represented as an adjacency list, where each node maintains a list of its neighbors and its edge labels. Since we are doing BFS traversal from source node nn.

Time complexity for performing a BFS traversal is O(G+R)O(|G| + |R|), where G|G| is the number of nodes and edges in the graph, and R|R| is the number of nodes in the derived relationship. Since QQ has a constant size, number of derived relationship R|R| is constant, therefore, overall time complexity is O(G)O(|G|).

Correctness of BFS traversal is guaranteed by the invariance of BFS traversal. The algorithm visits all nodes in the graph, and for each node, it visits all its neighbors. The algorithm also checks if the node has been visited before, and if not, it adds the node to the visited set and enqueues its neighbors. Therefore, the algorithm is correct.