CS124 midterm review solutions

CS 124 Midterm Review 3/5/23
You will have 75 minutes in class to complete the exam. The exam will have true/false questions, and may have multiple choice problems, example/counterexample problems, run-this-algorithm problems, and problem set style present-and-prove problems.
The exam is difficult. In previous years, the median score has been around 50%.
2 Topics Covered
This is a pretty comprehensive list of topics we have gone over this first half of the semester, but anything said in lecture is fair game for the exam.
2.1 Math Fundamentals
• Coin Flipping and basic probability.
• Induction. If P(n) is a statement (e.g., “2n is even”), P(1) is true, and ∀n, P(n) → P(n + 1), then P(n) is true for all n ≥ 1.
• Big-O Notation. o, O, ω, Ω, Θ and identifying which of these relations hold for two functions.
• Recurrence Relations. Solve simple ones by finding a pattern and proving them with induction. More complicated recurrences can be solved using the Master Theorem.
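For reference, here is one standard statement of the Master Theorem (the version given in lecture may be phrased slightly differently): if T(n) = aT(n/b) + c·n^d for constants a ≥ 1, b > 1, and c, d ≥ 0, then
T(n) = Θ(n^d) if a < b^d,
T(n) = Θ(n^d log n) if a = b^d,
T(n) = Θ(n^{log_b a}) if a > b^d.
For example, mergesort’s recurrence T(n) = 2T(n/2) + cn has a = 2, b = 2, d = 1, so a = b^d and T(n) = Θ(n log n).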
2.2 Graph Search
• Representation. Adjacency list versus adjacency matrix
• Depth First Search (DFS). Uses a stack to process nodes, Pre and post order numbers, labels for the edges (tree, forward, back, cross).
Important Applications. Detecting cycles, topological sorting, finding strongly connected components
• Breadth First Search (BFS). Uses a queue to process nodes, can be used to find the shortest path when all edge weights are 1.
• Dijkstra’s Algorithm. Single source shortest path for non-negative edge weights. Uses a heap or priority queue to process nodes (a sketch follows this list). Does not work when there are negative edge weights.
• Heaps. Binary heap implementation; operations: Peek, ExtractMin, Insert (Min-Heapify); how they are used in Dijkstra’s algorithm.
• Bellman-Ford Algorithm. Single source shortest path for general edge weights, edge relaxation procedure (referred to as update in the lecture notes), detecting negative cycles.
• Shortest Path in DAG. Can be done in linear time via dynamic programming regardless of edge weights.
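As a concrete reference for the Dijkstra/heap bullets above, here is a minimal Python sketch. The adjacency-list format and the “lazy deletion” trick (skipping stale heap entries instead of doing a DecreaseKey) are implementation choices of this sketch, not requirements from lecture.

```python
import heapq

def dijkstra(adj, s):
    """Single-source shortest paths with non-negative weights.
    adj: dict mapping each vertex to a list of (neighbor, weight) pairs."""
    dist = {s: 0}
    heap = [(0, s)]                    # (distance estimate, vertex)
    done = set()
    while heap:
        d, v = heapq.heappop(heap)
        if v in done:
            continue                   # stale entry for an already-finished vertex
        done.add(v)
        for w, weight in adj.get(v, []):
            nd = d + weight
            if nd < dist.get(w, float("inf")):   # relaxation / "update" step
                dist[w] = nd
                heapq.heappush(heap, (nd, w))
    return dist
```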
2.3 Minimum Spanning Trees
• Basic Properties. Connected, acyclic, |E| = |V| − 1 (any two imply the third).
• Cut Property: The basis for why our MST algorithms are correct; allows us to greedily add edges to a sub-MST.
• Prim’s Algorithm: implemented in a similar fashion to Dijkstra’s; builds the MST outward from a single vertex.
• Kruskal’s Algorithm: implemented using a union-find data structure; builds the MST from the lightest edges of the graph.
• Disjoint Forest Data Structure: maintains a collection of disjoint sets that can be combined (“unioned”) efficiently.
Operations: makeset(x), find(x), union(x,y)
Heuristics: Union by Rank, Path Compression (a code sketch follows this list)
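A sketch of the disjoint forest structure with both heuristics; the class and method names are mine, but the operations match the makeset/find/union interface above.

```python
class DisjointSets:
    """Union-find with union by rank and path compression."""

    def __init__(self):
        self.parent = {}
        self.rank = {}

    def makeset(self, x):
        self.parent[x] = x
        self.rank[x] = 0

    def find(self, x):
        if self.parent[x] != x:
            # Path compression: point x directly at the root of its tree.
            self.parent[x] = self.find(self.parent[x])
        return self.parent[x]

    def union(self, x, y):
        rx, ry = self.find(x), self.find(y)
        if rx == ry:
            return
        if self.rank[rx] < self.rank[ry]:   # union by rank: attach the
            rx, ry = ry, rx                 # shallower tree under the deeper one
        self.parent[ry] = rx
        if self.rank[rx] == self.rank[ry]:
            self.rank[rx] += 1
```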
2.4 Greedy
• Main Idea: At each step, make a locally optimal choice in hope of reaching the globally optimal solution. (Does not always work. Usually very fast when it does.)
• Horn Formula: set all variables to false and greedily set variables to true only when forced to.
• Huffman Coding: find the best encoding by greedily combining the two least frequent characters (a sketch follows this list).
• Set Cover: greedy approximation algorithm with an O(log n) approximation ratio; on problem set 3 we showed by example that the Θ(log n) ratio is actually attained.
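A sketch of Huffman’s greedy combining step in Python. Representing a subtree by the tuple of symbols in it (and prepending bits as subtrees merge) is just one convenient encoding choice; the integer counter only breaks frequency ties.

```python
import heapq

def huffman_codes(freqs):
    """freqs: dict mapping symbol -> frequency. Returns dict symbol -> code string."""
    heap = [(f, i, (sym,)) for i, (sym, f) in enumerate(freqs.items())]
    heapq.heapify(heap)
    counter = len(heap)
    codes = {sym: "" for sym in freqs}
    while len(heap) > 1:
        f1, _, t1 = heapq.heappop(heap)     # the two least frequent subtrees
        f2, _, t2 = heapq.heappop(heap)
        for sym in t1:
            codes[sym] = "0" + codes[sym]   # left subtree gets a leading 0
        for sym in t2:
            codes[sym] = "1" + codes[sym]   # right subtree gets a leading 1
        heapq.heappush(heap, (f1 + f2, counter, t1 + t2))
        counter += 1
    return codes
```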
2.5 Divide and Combine
• Main Idea: Divide the problem into smaller pieces, recursively solve those, and then combine them in the right way.
• Mergesort and Windowsort.
• Integer Multiplication. Perform 3 multiplications on n/2-digit numbers, and then do some additions and subtractions to combine them (a sketch follows this list).
• Strassen’s Algorithm. multiplies two n × n matrices by doing 7 multiplications on n/2 × n/2 matrices.
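A sketch of the three-multiplication idea for integer multiplication (Karatsuba). Splitting by decimal digits is done here only for readability; lecture would typically split by bits.

```python
def multiply(x, y):
    """Multiply non-negative integers with 3 recursive half-size multiplications."""
    if x < 10 or y < 10:
        return x * y
    m = max(len(str(x)), len(str(y))) // 2
    hi_x, lo_x = divmod(x, 10 ** m)
    hi_y, lo_y = divmod(y, 10 ** m)
    a = multiply(hi_x, hi_y)
    b = multiply(lo_x, lo_y)
    c = multiply(hi_x + lo_x, hi_y + lo_y)
    # The cross terms come from subtractions rather than extra multiplications.
    return a * 10 ** (2 * m) + (c - a - b) * 10 ** m + b
```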
2.6 Dynamic Programming
• Main Idea: Maintain a lookup table of correct solutions to sub-problems and build up this table in a particular order.
• Run-time and Space Analysis. How does one go about analyzing the time and space complexity of a DP algorithm?
• String Reconstruction. Tries to find where the first dictionary word ends by checking all possible breaking points.
• Edit Distance. Tries all possibilities of insert, delete, change on letters that are different (a sketch follows this list).
• All Pairs Shortest Paths. Uses the idea that the shortest path to a node must have come via one of the neighboring nodes.

• Traveling Salesman. DP can provide better exponential-time solutions to NP-hard problems.
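A sketch of the edit-distance DP, assuming unit cost for each insert, delete, and change.

```python
def edit_distance(a, b):
    """Minimum number of insertions, deletions, and changes turning a into b."""
    n, m = len(a), len(b)
    # D[i][j] = edit distance between a[:i] and b[:j].
    D = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        D[i][0] = i                      # delete all of a[:i]
    for j in range(m + 1):
        D[0][j] = j                      # insert all of b[:j]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            change = 0 if a[i - 1] == b[j - 1] else 1
            D[i][j] = min(D[i - 1][j] + 1,          # delete a[i-1]
                          D[i][j - 1] + 1,          # insert b[j-1]
                          D[i - 1][j - 1] + change) # change (or keep) a[i-1]
    return D[n][m]
```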
3 Practice Problems
Exercise 1. Answer True or False for the following:
(a) log_5(n) = Ω(log_3(n))
(b) 2^(√n) = o(2^n)
(c) 2 log n = ω(n)
(d) Suppose T(n) = 2T(n/b) + n. Then, as b → ∞, we eventually get T(n) = Θ(1).
(e) If T(n) = T(n^(1/4)) + 1, then T(n) = o(log log n).
(f) n log(n!) = O(n log n)
(g) n! = o(n^n). (from CLRS)
(h) 2^(√(2 log n)) = o(n). (from CLRS)
(i) f(n)^2 = ω(f(n)) for every positive f.
(a) T. Using our change of base formulas, we have
log_5(n) = log_3(n) / log_3(5) = log_5(3) · log_3(n) ≥ 0.1 · log_3(n),
and therefore the Ω relationship is true.
(b) T. Let’s take the limit:
lim_{n→∞} 2^(√n) / 2^n = lim_{n→∞} 2^(√n − n) = 0.
The limit is 0 and therefore the little-o relationship holds.
(c) F. Take the limit:
lim_{n→∞} (2 log n) / n = lim_{n→∞} (2 log n) / 2^(log n) = 0,
since an exponential in log n grows faster than a linear function of log n. Therefore, the ω relationship is false and the relationship should be o.

(d) F. As b → ∞, it is true that T(n) becomes smaller, because you are breaking the problem into smaller and smaller sub-problems. However, no matter how big b is, you always need to pay the fixed cost of n, so T(n) = Ω(n) and asymptotically T(n) will never be constant. You can also see this through the Master Theorem.
(e) F. Perform a substitution similar to the one we did on the problem set. Let S(m) = T(2^m). We get the recurrence S(m) = S(m/4) + 1, which solves to S(m) = Θ(log m), and thus T(n) = Θ(log log n). Therefore, the correct relationship should be Θ and not o.
(f) F. By laws of logarithms, log(n!) = Σ_{i=1}^{n} log i ≤ Σ_{i=1}^{n} log n = n log n, and also log(n!) ≥ Σ_{i=n/2}^{n} log i ≥ (n/2) log(n/2), so log(n!) = Θ(n log n). Hence n log(n!) = Θ(n^2 log n) = ω(n log n), and the O(n log n) bound is false.
(g) T. Take the limit:
lim_{n→∞} n!/n^n = lim_{n→∞} ∏_{i=1}^{n} (i/n) ≤ lim_{n→∞} 1/n = 0.
(h) T. Since √(2x) = o(x), for large x we have √(2x) ≤ (1/2)x. Plugging in x = log n, we get 2^(√(2 log n)) ≤ 2^((1/2) log n) = (2^(log n))^(1/2) = √n for large n. Thus, 2^(√(2 log n)) = O(√n), and thus o(n).
(i) F. Counterexample: f(n) = 1. Clearly, f(n)^2 = 1 = Θ(f(n)), so it is not ω(f(n)).
Exercise 2. Run each of the following algorithms on the following graph, roughly describing what happens at each step.
(a) Kruskal’s for MST
(b) Prim’s for MST, beginning at A.
(c) Dijkstra’s for single-source shortest paths, beginning at A.
[Figure omitted: the weighted graph on vertices A, B, C, D, E used in this exercise (e.g. w(A, B) = 3, w(B, C) = 1); the remaining edge weights can be read off from the solutions below.]
(a) Sort edges in non-decreasing order. So we may choose (B, C), then (B, D), then (D, E), then discard (B, E) and (C, E) because they would form a cycle, then add (A, B), and we are done. The final MST contains edges {(A, B), (B, C), (B, D), (D, E)}. (It would also be correct to break edge weight ties in a different order.)
(b) Initialize the value of A to be 0 and the rest to be infinity. Add A to the cut, so {B,C,D,E} are not in the cut. Then update the value of B to 3 and the value of D to 4; B and D now point to A. Add B to the cut (via the edge (A,B)), so {C,D,E} are not in the cut. Then update the value of D to 1, the value of E to 3, and the value of C to 1; C, D, and E now point to B. Add D to the cut (via the edge (B,D)), so {C,E} are not in the cut. Update the value of E to 2, and it points to D. Add C to the cut (via the edge (B,C)), so E is not in the cut. Do not update the value of E, since the edge (C,E) does not offer a smaller value. Then add E to the cut (via the edge (D,E)). (It would also be correct to break value ties in a different order.)
Remark: In general, we might not get the exact same tree from Prim’s and Kruskal’s, but they both give some minimum spanning tree.
(c) The distance of A is 0, and everything else is infinity. Update the distance of B to 3, and the distance of D to 4. Then add B and D to the priority heap with distances as keys. Pop B from the heap. Update the distance of C to be dist(B) + 1 = 4, update the distance of E to be dist(B) + 3 = 6, and do not update the distances of A or D, since they are not shorter. Add C and E to the priority heap. Pop D from the priority heap. The distance of E via D would be dist(D) + 2 = 6, which is no improvement, so do not update E (or A or B). Pop C from the priority heap. Do not update the distances of B or E. Pop E from the priority heap. Do not update the distances of B, C, D. So the final distances are A : 0, B : 3, C : 4, D : 4, E : 6.
Exercise 3. (Problem Set 2) The input to 2SAT is a logical expression of a specific form: it is the conjunction (AND) of a set of clauses, where each clause is the disjunction (OR) of two literals. (A literal is either a Boolean variable or the negation of a Boolean variable.) For example, the following expression is an instance of 2SAT:
(x1 ∨ x̄2) ∧ (x̄1 ∨ x̄3) ∧ (x1 ∨ x2) ∧ (x4 ∨ x̄3) ∧ (x4 ∨ x̄1)
A solution to an instance of a 2SAT formula is an assignment of the variables to the values T (true) and F (false) so that all the clauses are satisfied; that is, there is at least one true literal in each clause. For example, the assignment x1 = T, x2 = F, x3 = F, x4 = T satisfies the 2SAT formula above.
Derive an algorithm that either finds a solution to a 2SAT formula, or returns that no solution exists.
Hint: Reduce to an appropriate problem. It may help to consider the following directed graph, given a formula I in 2SAT: the nodes of the graph are all the variables appearing in I and their negations. For each clause (α ∨ β) in I, we add a directed edge from ᾱ to β and a second directed edge from β̄ to α. How can this be interpreted?
Each edge from the hint above represents an implication, i.e. if (u, v) is a directed edge in the graph, then u = T =⇒ v = T. This applies here: if ᾱ is true (that is, α is false), then β must be true for the clause to be satisfied; if β̄ is true, then α must be true for the clause to be satisfied.
In this graph, strongly connected components correspond to sets of literals that must simultaneously be true or simultaneously be false. This is because for any vertices/literals u, v in the same SCC, if u = T, then v = T by following the implications along the path from u to v; and if u = F, assume for contradiction that v = T: then we must get u = T by following the implications along the path from v to u, a contradiction. An immediate consequence of this is that if a strongly connected component contains both α and ᾱ, then the formula is not satisfiable, since it contains a contradiction.
It now suffices to prove that if no pair of literals α and ᾱ are in the same SCC, then the formula is satisfiable. We will attempt to construct a solution as follows. We topologically sort the graph of SCCs to obtain S1, S2, . . . , Sk, where each Si is a set of literals forming an SCC, such that if there is an implication u → v with u ∈ Si and v ∈ Sj, then i ≤ j. Starting from t = k and going backwards, we set all literals in the SCC St to false if some u ∈ St has already been assigned false, and we set all literals in St to true otherwise. (Note that we always go with the latter case when t = k, since no literals have been set yet at that point.) We then assign ᾱ the negation of what we assign to α, for each α ∈ St. Note that by our construction, ᾱ is either already preset to the desired value before we process St, or it has not been set yet and lies in some SCC Sj with j < t. We show that there can be no contradiction in such an assignment iff no literal and its negation are in the same SCC. The “only if” direction is obvious. For the “if” direction, assume that the first SCC during our algorithm where we encounter a contradicting assignment is St for some t < k, where u, v ∈ St, u ≠ v̄, and u, v have been preset to T and F, respectively. By the skew-symmetric construction of the graph, since there is a path from u to v, there must be a path from v̄ to ū, and since there is a path from v to u, there must be a path from ū to v̄. (Note the relationship between the edges (ᾱ, β) and (β̄, α).) This implies that ū and v̄ must be in the same SCC, and since they are what caused u and v to be preset with contradictory values, they lie in some Sj with j > t. Since ū and v̄ must be F and T, respectively, this contradicts the maximality of t. Therefore, our algorithm is correct.
To finish, we check the runtime of this algorithm. Converting the graph into its SCC graph G′ and running DFS on G′ takes O(n + m) time. Constructing the solution involves considering each literal a constant number of times, so it takes O(n) time. Therefore, this algorithm takes O(n + m) time, which is linear. (Note: the original solution was published in 1978; see https://mathweb.ucsd.edu/~sbuss/CourseWeb/Math268_2007WS/2SAT.pdf.)
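A Python sketch of this algorithm. Two choices below are mine rather than part of the solution above: literals are encoded as ±i (so −i means x̄i), and the SCCs are found with Kosaraju’s two-pass algorithm instead of Tarjan’s; any linear-time SCC routine works identically. For the example formula, the clause list would be [(1, -2), (-1, -3), (1, 2), (4, -3), (4, -1)].

```python
from collections import defaultdict

def solve_2sat(num_vars, clauses):
    """Return a satisfying assignment (index 1..num_vars -> bool) or None.
    clauses: list of (a, b) pairs; literal +i means x_i, -i means its negation."""
    n = 2 * num_vars                       # one node per literal
    def node(lit):
        v = abs(lit) - 1
        return 2 * v + (0 if lit > 0 else 1)
    def neg(u):                            # node of the complementary literal
        return u ^ 1

    graph, rgraph = defaultdict(list), defaultdict(list)
    for a, b in clauses:                   # clause (a or b): not-a -> b, not-b -> a
        graph[neg(node(a))].append(node(b))
        graph[neg(node(b))].append(node(a))
        rgraph[node(b)].append(neg(node(a)))
        rgraph[node(a)].append(neg(node(b)))

    # Pass 1: DFS on the implication graph, recording finish order.
    order, visited = [], [False] * n
    for s in range(n):
        if visited[s]:
            continue
        visited[s] = True
        stack = [(s, iter(graph[s]))]
        while stack:
            u, it = stack[-1]
            nxt = next(it, None)
            if nxt is None:
                order.append(u)
                stack.pop()
            elif not visited[nxt]:
                visited[nxt] = True
                stack.append((nxt, iter(graph[nxt])))

    # Pass 2: DFS on the reverse graph; SCC labels come out in topological order.
    comp, label = [-1] * n, 0
    for s in reversed(order):
        if comp[s] != -1:
            continue
        comp[s] = label
        stack = [s]
        while stack:
            u = stack.pop()
            for w in rgraph[u]:
                if comp[w] == -1:
                    comp[w] = label
                    stack.append(w)
        label += 1

    assignment = [None] * (num_vars + 1)
    for v in range(num_vars):
        if comp[2 * v] == comp[2 * v + 1]:
            return None                    # x and its negation share an SCC
        # Set x true iff its SCC comes later in topological order than x-bar's.
        assignment[v + 1] = comp[2 * v] > comp[2 * v + 1]
    return assignment
```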
Exercise 4. Elections. Ballotville is having an election for mayor. Voters in Ballotville submit ranked-choice ballots, in which they specify a preference ordering of the set of candidates C = {c1, · · · , cn}, such as (c2, cn, c1, . . .). Let Mi,j be the margin of votes between candidates i and j, so
Mi,j = #(voters who prefer i to j) − #(voters who prefer j to i)
A (non-empty) set S ⊆ C of candidates is called the Smith set if it is the smallest non-empty set S′ such that for all ci ∈ S′ and all cj ∉ S′, Mi,j ≥ 0 (that is, every candidate in S′ would pairwise beat or tie every candidate not in S′). Give an O(n^2) algorithm that either determines the unique Smith set or outputs that no unique set exists. You may assume that such a set does exist. Hint: consider a graph where the candidates are vertices, and determine a suitable representation for the edges.
Let G = (V,E) be a graph with vertices v1,v2,…,vn corresponding to candidates c1,c2,…,cn. Let
ei,j be a directed edge from i to j if Mi,j ≥ 0. Note that both ei,j and ej,i are in E iff Mi,j = 0.
Thus we want to find a set of vertices S such that for all vi ∈ S,vj ∈ Sc, there exists an edge ei,j. This is equivalent to finding the source SCC in a directed graph. Thus, we apply Tarjan’s algorithm for finding the SCCs in a graph.
Correctness: We proved the correctness of Tarjan’s algorithm in class. It therefore suffices to show that the Smith set corresponds to the source SCC.
(Smith set implies source SCC) Suppose towards a contradiction that there is a candidate ci that is in the Smith set but vi is not in the source SCC of the graph. Since it is not in the source SCC, there must be a path from some vj (in the source SCC) to vi but not vice versa. Note that Mi,j = −Mj,i and there exists an edge ei,j iff Mi,j ≥ 0, so it must be that either ei,j ∈ E or ej,i ∈ E
(or both).
Thus, one path from vj to vi is ej,i. It cannot be that ei,j also exists, as then every vertex in the source SCC would be reachable from vi (through ei,j). Thus, vj beats vi and vi does not beat vj, so vi is not in the Smith set, a contradiction.
(source SCC implies Smith set) Suppose towards a contradiction that there is a vertex vi in the source SCC but ci is not in the Smith set. Since ci is not in the Smith set, there exists a candidate cj in the Smith set such that ej,i exists and ei,j does not.
Moreover, there is no other “path” by which vj is reachable from vi, since vj is only reachable via candidates in the Smith set, which vi is not in. Therefore, there is a node vj not reachable from vi, a contradiction to vi’s membership in the source SCC.
Runtime: The runtime for Tarjan’s algorithm is linear, as shown in class. Here, O(|V | + |E|) = O(n + n2) = O(n2).
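A short sketch of this reduction, with networkx doing the SCC/condensation work in place of a hand-written Tarjan’s algorithm; M is assumed to be the n × n margin matrix from the problem.

```python
import networkx as nx

def smith_set(M):
    """Return the set of candidate indices in the Smith set (the source SCC),
    or None if there is no unique source component."""
    n = len(M)
    G = nx.DiGraph()
    G.add_nodes_from(range(n))
    for i in range(n):
        for j in range(n):
            if i != j and M[i][j] >= 0:      # edge i -> j when i beats or ties j
                G.add_edge(i, j)
    C = nx.condensation(G)                   # DAG of strongly connected components
    sources = [c for c in C.nodes if C.in_degree(c) == 0]
    if len(sources) != 1:
        return None                          # no unique Smith set
    return set(C.nodes[sources[0]]["members"])
```

Building the graph takes O(n^2) time and the SCC computation is linear in the graph size, which is O(n^2) overall, matching the runtime analysis above.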
Exercise 5. A directed graph G = (V, E) is called semiconnected if for each pair of distinct vertices u, v ∈ V, there is a path from u to v, a path from v to u, or both. Find an algorithm to determine whether a directed graph is semiconnected. Hint: Look at the SCC graph.
Let G′ be the DAG on the strongly connected components of G. Number the nodes of G′ in some topological order. We claim that G is semiconnected iff there is always a path from the i-th node to the j-th node of G′ whenever i < j.

Exercise 6. Each edge (v, w) of a directed graph has a reliability w(v, w) ∈ (0, 1] (say, the probability that the edge does not fail), and the reliability of a path is the product of the reliabilities of its edges. Modify Dijkstra’s algorithm to find the most reliable path from a source s to every other vertex.

One solution is to replace each reliability w(v, w) by the non-negative weight −log w(v, w) and run Dijkstra’s algorithm unchanged, since maximizing a product of reliabilities is the same as minimizing the sum of these logarithms. Alternatively, we can modify Dijkstra’s relaxation step directly: where Dijkstra updates dist[w] whenever dist[w] > dist[v] + w(v, w) for some edge (v, w), we replace the + with a × and the > with a <. The resulting condition is dist[w] < dist[v] · w(v, w): we update the distance of some node w if a larger value can be achieved by dist[v] · w(v, w). In other words, going through v to get to w gets us a larger total distance (i.e. greater reliability). The correctness of either solution follows immediately from that of Dijkstra’s. Their run-time is therefore also the same as that of Dijkstra’s.

Exercise 7. Each of the following statements is false. Provide an explanation or counterexample for each.
(a) Suppose we have a disjoint forest data structure where we use the path compression heuristic. Let x be a node at depth d in a tree; then calling find(x) a total of n times takes O(nd) time.
(b) Using Huffman encoding, the character with the largest frequency is always compressed down to 1 bit.
(c) Consider a variation on the set cover problem where each set is of size at most 3. Then the greedy set cover algorithm discussed in class is optimal.

(a) If we use path compression, then the second time we call find(x), it will only take O(1) time. Therefore, the total run-time is actually O(d + n).
(b) Suppose the frequencies are A : 100, B : 90, C : 80, D : 70. We would combine C and D first, and then A and B, giving AB : 190 and CD : 150. The Huffman encoding tree would look like a complete binary tree with A = 00, B = 01, C = 10, D = 11 (as an example). Even though A is the most frequent character, it is not represented by a single bit.
(c) Suppose X = {1, 2, 3, 4, 5, 6} with sets S1 = {1, 2}, S2 = {3, 4}, S3 = {5, 6}, and S4 = {1, 3, 5}. The greedy algorithm will first choose S4 and then be forced to pick all three of S1, S2 and S3. But the optimal solution would have been to simply pick S1, S2, S3.

Exercise 8. Cutting Wood (from MIT 6.006, Spring 2011). Given a log of wood of length k, Woody the woodcutter will cut it once, in any place you choose, for the price of k dollars. Suppose you have a log of length L, marked to be cut in n different locations labeled 1, 2, . . . , n. For simplicity, let indices 0 and n + 1 denote the left and right endpoints of the original log of length L. Let the distance of mark i from the left end of the log be di, and assume that 0 = d0 < d1 < · · · < dn < dn+1 = L. Give an efficient algorithm to determine the minimum total cost of making all n cuts.

Exercise 9. You want to determine the highest floor of a building from which a prototype phone can be dropped without breaking (the cutoff floor). The building has n > 100 floors, and they provide you some k > 1 prototype phones to test with. Come up with an algorithm to determine the minimum number of drops needed to find the maximum floor you can drop the phone at so that it survives. (+2 bonus marks if your solution uses O(k) space)
The idea is to use DP, as hinted at by the O(k)-space bonus.
First, starting with an O(nk)-space solution, we have a 2D DP array where D[i][j] stores the minimum worst-case number of drops needed to determine the cutoff for a building with i floors using j prototype phones, so that D[n][k] will be the final result. Then the recurrence is as follows:

D[i][j] =
    0                                                     if i = 0
    ∞                                                     if i > 0 and j = 0
    i                                                     if i > 0 and j = 1
    1 + min_{k ∈ [i]} max(D[i − k][j], D[k − 1][j − 1])   otherwise

(Here k ranges over the candidate floors 1, . . . , i at which to drop; it is unrelated to the total number of phones k given in the problem.)
Breaking this down, if we have no floors, it takes 0 drops to find out the cutoff. If we have some floors but no prototype phones, we can’t solve the problem. If we have one prototype phone left with i floors to check, we have to start from the bottom floor and drop at each floor, taking i drops.
Now for the main part of the DP. If we have i floors to search and j phones left, then let’s consider what happens if we choose to drop at a floor k:
• The phone breaks: if the phone breaks when we drop it at floor k, we now have k − 1 floors to search, and j − 1 phones.
• The phone survives: if the phone survives, the cutoff is at least floor k, so we need to search the i − k floors above floor k, and we get to keep that phone, so we still have j phones.
We take a max of these two results since we are interested in reducing the worst number of drops, since we have no idea where the cutoff point is.
We take a minimum over all k ∈ [i] as we can choose any floor between 0 and floor i to drop the phone at, and we want to choose the floor that minimizes the maximum number of more drops we have to do.
We add 1 since whichever floor k we pick, it cost us 1 drop to drop a phone at that floor.
Time complexity
There are O(nk) positions to fill. For each of these, we do some O(1) checks to determine which case of the DP we are in. In the worst case (the bottom/’else’ case), we take a minimum over O(n) previously stored values (the max is always just over two values), hence each step takes at most O(n) time and our overall time complexity is O(n2k).
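A direct bottom-up implementation of this recurrence (O(nk) space, O(n²k) time). The inner loop variable is named f here to avoid clashing with the number of phones k.

```python
import math

def min_drops(n, k):
    """Minimum worst-case number of drops for n floors and k phones."""
    D = [[0] * (k + 1) for _ in range(n + 1)]     # D[i][j]; D[0][j] = 0
    for i in range(1, n + 1):
        D[i][0] = math.inf                        # floors left but no phones
        if k >= 1:
            D[i][1] = i                           # one phone: scan floor by floor
        for j in range(2, k + 1):
            best = math.inf
            for f in range(1, i + 1):             # drop from floor f
                # breaks: f-1 floors below, j-1 phones; survives: i-f floors above.
                best = min(best, 1 + max(D[f - 1][j - 1], D[i - f][j]))
            D[i][j] = best
    return D[n][k]

# Sanity check: min_drops(100, 2) == 14.
```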

First, a note about the bonus points. We note that in our recurrence, since 1 ≤ k ≤ i, we have 0 ≤ i − k < i, so we need all values D[i′][j] with i′ < i to calculate D[i − k][j]. For the second term, D[k − 1][j − 1], again k − 1 < i, but it is taken from the previous column j − 1.