Problem 1

P1.1

Consider an initially-empty stack and the following sequence of operations:

PUSH(S, 3), POP(S), PUSH(S, 17), PUSH(S, 5), PUSH(S, 15), POP(S), POP(S), POP(S)

Illustrate the result of each operation (clearly indicate the content of the stack after the operation and, in case of a POP, the returned value by the operation

Solution

Stack is initially empty, or

The sequence of operations is as follows:

  1. PUSH(S, 3):

  2. POP(S): , returned value is

  3. PUSH(S, 17):

  4. PUSH(S, 5):

  5. PUSH(S, 15):

  6. POP(S): , returned value is

  7. POP(S): , returned value is

  8. POP(S): , returned value is

P1.2

Consider an initially-empty queue and the following sequence of operations:

ENQUEUE(Q, 3), DEQUEUE(Q), ENQUEUE(Q, 17), ENQUEUE(Q, 5), ENQUEUE(Q, 15), DEQUEUE(Q), DEQUEUE(Q), DEQUEUE(Q)

Illustrate the result of each operation (clearly indicate the content of the queue after the operation and, in case of a DEQUEUE, the returned value by the operation

Solution

Queue is initially empty, or

The sequence of operations is as follows:

  1. ENQUEUE(Q, 3):

  2. DEQUEUE(Q): , returned value is

  3. ENQUEUE(Q, 17):

  4. ENQUEUE(Q, 5):

  5. ENQUEUE(Q, 15):

  6. DEQUEUE(Q): , returned value is

  7. DEQUEUE(Q): , returned value is

  8. DEQUEUE(Q): , returned value is

P1.3

Assume we have a stack implementation MyDynArrayStack using dynamic arrays: the implementation supports N push operations with an amortized runtime complexity of per operation, and the implementation supports POP, EMPTY, and SIZE operations with runtime time complexity of

Provide a queue implementation that uses MyDynArrayStack and supports any valid sequence of N ENQUEUE and DEQUEUE operations with an amortized runtime complexity of per operation. Explain why your implementation has the stated amortized runtime complexity for ENQUEUE and DEQUEUE operations.

Solution

Queue implementation MyDynArrayQueue using two MyDynArrayStack, sIn and sOut:

Algorithm 11 Queue

1:sIn:=MyDynArrayStack()sIn := \text{MyDynArrayStack()}

2:sOut:=MyDynArrayStack()sOut := \text{MyDynArrayStack()}

Algorithm 12 ENQUEUE(x)

1:sIn.PUSH(x)sIn.\text{PUSH}(x)

Algorithm 13 DEQUEUE()

1:if sOut.EMPTY()sOut.\text{EMPTY}() then

2:while ¬sIn.EMPTY()\neg sIn.\text{EMPTY}() do

3:sOut.PUSH(sIn.POP())sOut.\text{PUSH}(sIn.\text{POP}())

4:end while

5:end if

6:return sOut.POP()sOut.\text{POP}()

where sIn is used to store elements, and sOut is used to store elements in reverse order.

Amortized runtime complexity explanation:

  1. ENQUEUE
  • PUSH to sIn has an amortized runtime complexity of per operation, as stated in the problem.
  1. DEQUEUE
  • When sOut is empty, the transfer of element from sIn to sOut has a runtime complexity of with worst case.

  • However, the item was moved once per enqueue-dequeue cycle, so the total time spent is proportional to , thus making amortized runtime complexity of per operation.


Problem 2

Consider the relations courses(prog, code, name) (that models courses named name and identified by the program prog the course is part of, e.g., SFWRENG, and the course code code, e.g., 2C03) and enrolled(prog, code, sid) (that models students with identifier sid enrolling for a course identified by program prog and course code code).

We want to compute the list of all pairs (sid, name) in which sid is a student identifier and name is the name of a course the student with identifier is enrolled in. To compute this list, we developed the following two nested-loop algorithms:

Algorithm 14 CEJOIN(courses, enrolled)

1:output:=output := \emptyset.

2:for (pc,cc,nc)courses(p_c, c_c, n_c) \in \text{courses} do

3:for (pe,ce,se)enrolled(p_e, c_e, s_e) \in \text{enrolled} do

4:if pc=pe and cc=cep_c = p_e \textbf{ and } c_c = c_e then

5:add (se,nc)(s_e, n_c) to outputoutput.

6:end if

7:end for

8:end for

9:return outputoutput

Algorithm 15 ECJOIN(courses, enrolled)

1:output:=output := \emptyset.

2:for (pe,ce,se)enrolled(p_e, c_e, s_e) \in \text{enrolled} do

3:for (pc,cc,nc)courses(p_c, c_c, n_c) \in \text{courses} do

4:if pc=pe and cc=cep_c = p_e \textbf{ and } c_c = c_e then

5:add (se,nc)(s_e, n_c) to outputoutput.

6:end if

7:end for

8:end for

9:return outputoutput

Assume we have significantly more students enrolled for courses than courses (|enrolled| > |courses|).

P2.1

Assume we are running algorithm CEJOIN and ECJOIN on a computer where every instruction takes exactly the same amount of time to execute. Argue why CEJOIN must be faster than ECJOIN when running on computer ?

Solution

Given , the difference between the two lies in the number of iterations of the inner loop.

The outer loop table will only be scanned once, and the inner loop will be scanned for each iteration of the outer loop in the nested-loop algorithm:

  • CEJOIN will iterate over enrolled once, and courses over each iteration. Since |enrolled| > |courses|, CEJOIN’s inner loop will result in fewer iteration.
  • ECJOIN will iterate over courses once, and enrolled over each iteration. Since |enrolled| > |courses|, ECJOIN’s inner loop will means more iterations, comparing to CEJOIN.

Thus, we can conclude that CEJOIN must be faster than ECJOIN when running on computer \mathbb{C.

P2.2

real-world-system Implementation of CEJOIN and ECJOIN in impl_22.cpp shows that ECJOIN is actually faster than CEJOIN. Explain why this is the case in a real-world system.

Solution

Given the processor is using fast caches, the inner loop of ECJOIN will be faster than CEJOIN due to cache locality.

Since |enrolled| > |courses|, the inner loop of ECJOIN will have better cache locality, as it will be accessing the same memory locations more frequently. Additionally, ECJOIN’s outer loop will only run once, therefore we can expect ECJOIN to be faster comparing to CEJOIN, since CEJOIN will have to access enrolled each iteration, which would be slower comparing to accessing courses (since every item in enrolled might not be cached).

P2.3

The measurements in the above figure have a few sudden jumps, e.g., at 1 500 000, 2 500 000, 4 500 000, 8 500 000, and 17 000 000. Explain what causes these jumps.

Solution

These jumps are probably caused by the cache size and cache associativity.

As the data size increases, the number of elements might not fit into processor’s L1, L2, L3 cache, thus, causing more frequent cache misses. At these points, the dataset is so large such that the processor might have to access on slower main memory, which induces these bump we observed from the graph.

We can also expect context-switching overhead given the code might be running in multiple processes or threads. However, the implementation implies that this code is running in a single thread, so we can rule out context-switching overhead.

P2.4

Write an algorithm that efficiently computes the same result as CEJOIN and ECJOIN in all-case . In the design of your algorithm, you may require that either enrolled or courses is ordered. Argue why your algorithm is correct and why it has the runtime complexity we specified.

Solution

Assuming courses is sorted and ordered by (prog, code) pair, the following algorithm can be used:

Algorithm 16 EFFJOIN(courses, enrolled)

1:output:=output := \emptyset.

2:course:=sort(course,by=(pc,cc))course := \text{sort}(course, \text{by}=(p_c, c_c))

3:for (pe,ce,ne)enrolled(p_e, c_e, n_e) \in \text{enrolled} do

4:i:=binary-search(courses,(pe,ce))i := \text{binary-search}(courses, (p_e, c_e))

5:if inulli \neq \text{null} then

6:add (se,nc)(s_e, n_c) to outputoutput.

7:end if

8:end for

9:return outputoutput

With the following binary search algorithm:

Algorithm 17 binary-search(course, (p, c))

1:l:=0l := 0

2:r:=len(course)1r := \text{len}(course) - 1

3:while lrl \leq r do

4:m:=l+(rl)/2m := l + (r-l)/2

5:if course[m].pc=p and course[m].cc=ccourse[m].p_c = p \textbf{ and } course[m].c_c = c then

6:return course[m]course[m]

7:else if course[m].pc<p or (course[m].pc=p and course[m].cc<c)course[m].p_c < p \textbf{ or } (course[m].p_c = p \textbf{ and } course[m].c_c < c) then

8:l:=m+1l := m + 1

9:else

10:r:=m1r := m - 1

11:end if

12:end while

13:return null\text{null}

In all cases this would yield a runtime complexity of

Correctness

  1. Sorting course, has time complexity of , which will be executed once.
  2. The binary search has time complexity of , which will be executed for each iteration of enrolled.
  3. The for loop iterate each element in enrolled once, which has time complexity of .

Thus, the total time complexity is .