Problème 1.
Consider a game board in which each cell has a numeric value, e.g:
A player starts the game with a token in the top-left cell (the cell GA in this example) and the player finishes the game by moving to the bottom-right cell (the cell KF in this example). In each round of the game, the player can move in four directions (up, down, left, and right). The distance of each move is determined by the value of the cell. When going over the border of the game board, one ends up on the other side. For example, if the player is in the cell JB, which has value 3, then the player can move 3 steps up (reaching GB), 3 steps right (reaching JE), 3 steps down (reaching HB), and 3 steps left (reaching JE). The score of a player is determined in the total number of rounds the player needs to reach the bottom-right cell.
P1.1
Model the above problem as a graph problem: What are the nodes and edges in your graph, do the edges have weights, and what problem are you trying to answer on your graph?
The game will be modelled as an unweighted directed graph, where the problem is to find the shortest path (minimum number of rounds) to get from top-left cell to bottom-right cell:
- Nodes: Each cell in the game board is a node.
- Edges: edges are possible moves from one cell to another. The edges will have unweighted, as each moves takes one round regardless of the distance. For example, cell will be connected to
P1.2
Provide an efficient algorithm that given a game board, will find an optimal solution if such a solution exists. If the game board has no solution, then the algorithm should report that the game board is invalid. The runtime of the algorithm should be worst-case
Implement a breadth-first search to find shortest path:
P1.3
Explain why your algorithm is correct and has a complexity that is worst-case
For BFS, for shortest path in unweighted path has runtime complexity of worst-case. In this setup, (number of vertices, which is ), and (number of possible moves, up, left, right, down, which would be ). Thus, worst-case runtime complexity is
Let be maximum number of operations performed by the algorithm. The boundary function can be defined as:
where is a constant. Invariance requires BFS traversal holds true.
Base case: starts cell is (0, 0), therefore
Assume that invariance holds for all cell up to cell, that is, for any we have
Consider the cell processed. The number of operations is as followed:
- dequeue : constant
- check if is the end cell: constant
- retrieve value of cell: constant
- iterate through the 4 possible moves: constant
- check if neighbor is visited and mark nodes: constant
Total number of operations is
Thus, number of operations performed for the cell is:
Therefore, it holds for cell.
P1.4
Which of the two graph representation we saw in the course material did you use to store the game board? What would the complexity of your algorithm be if you used the other graph representation?
The graph representation is used as an adjacency list to store the game board.
If the graph representation was an adjacency matrix, the graph as 2D matrix of size , space complexity would increase to , time complexity would remain , where . Given that accessing neighbour cell would be faster since constant-time access of matrix entries. Overall time complexity would still be
Problème 2.
Edge-labeled graphs are graphs in which edges have labels that represent the type of relationship that is expressed by that edge. For example, in a social network graph, the edges could be labeled parentOf
, friendOf
, and worksWith
.
One way to express graph queries (that express how new information can be derived from edge-labeled graphs) is via a query graph that expresses how new relationships between source node s and target node t can be derived from existing information.
The first query relates nodes that represent grandparents and their grandchildren, the second query relates nodes that represent ancestors and their descendants, and the third query relates everyone with a direct family relationship. Let be a graph query and be an edge-labeled graph representing a data set.
The graph query evaluation problem is the problem of computing the derived relationship in expressed by . Typically, queries are small and data graphs are enormous. Hence, here we will assume that the size of a query graph is constant.
P2.1
Model the graph query evaluation problem as a graph problem: What are the nodes and edges in your graph, do the edges have weights, and what problem are you trying to answer on your graph?
The graph query evaluation can be modelled as directed graph, where:
- Nodes: Each node represents a data elements in the graph
- Edges: Each edge represents a relationship between two nodes, for example, ‘childOf’, ‘parentOf’.
The problem is to find all the subgraphs or paths in the graph databases that matches the pattern specified by the query.
For example, consider the following query grandParentOf(s, t)
, the problems is to find all pairs of node in graph such that there exists a path of length 2 from s to t, with both edges labeled “parentOf”.
P2.2
Provide an efficient algorithm that, given a graph , a source node in graph , and query , will find all nodes such that the pair is in the derived relationship in expressed by . Assuming has a constant time, the runtime of your algorithm should be worst-case in which is the total number of nodes and edges in .
P2.3
Explain how you represented your graph , why your algorithm is correct, and why your algorithm has a complexity that is worst-case .
The graph is represented as an adjacency list, where each node maintains a list of its neighbors and its edge labels. Since we are doing BFS traversal from source node .
Time complexity for performing a BFS traversal is , where is the number of nodes and edges in the graph, and is the number of nodes in the derived relationship. Since has a constant size, number of derived relationship is constant, therefore, overall time complexity is .
Correctness of BFS traversal is guaranteed by the invariance of BFS traversal. The algorithm visits all nodes in the graph, and for each node, it visits all its neighbors. The algorithm also checks if the node has been visited before, and if not, it adds the node to the visited set and enqueues its neighbors. Therefore, the algorithm is correct.