CS6601 Fall Exam

CS6601 Final Exam Professor Thomas Ploetz Fall 2023
1. Before solving the exam, you should read the Ed exam posts for any updated information or clarifications on the exam.
2. You should work on this pdf in parallel with the canvas exam to ensure your work is always backed up.
3. Enter your answers as specified into each questions’ provided solution space
4. When you submit the current working copy of your exam download a new copy of the pdf will include any
updates or clarifications made to the exam. So submit regularly
5. Make sure your answers follow the instructions given in the problem area.
6. Unless expressly instructed, round all your final answers to 6 decimal places. Do not round intermediate
7. If you are asked to round to a different number of decimal places, do it or you will lose points.
8. We do this as many questions are auto grade based on regular expressions which have exponential
combinations of solutions. We cannot accommodate all possible combinations. So please help us by closely
reading the instructions.
9. Hint: E.g., For rounding to 6 decimals you can use the round(answer, 6) function in Python
10. This pdf is for your convenience and may be submitted through Fall 2023 Final Calculations. This pdf is not
graded and is not a substitute for the canvas exam. However, any regrades or requests for additional points
must be based on submitted calculations, and are at the discretion of the question author.
11. The exam is open-book, open-note, and open video lectures, with no time limit aside from the due date of the
exam. No internet use is allowed, except for e-text versions of the textbook, this semesterʼs course materials,
Ed, and any links provided in the exam itself.
12. No resources outside this semesterʼs class should be used. Do not discuss the exam with anyone except the
instructional staff. That means no posting on Ed, Slack, Discord, Groupme, or any social media platform. Do not post publicly about the exam. If there is a question for the teaching staff, please make it private on Ed and tag it as directed in Ed by question subject and number.
13. Please make separate posts for each question area (e.g., Search, CSP, Patterns over time, Logic and Planning), in general grouping question areas together will have no response from the staff as they focus on responding to their question area
Points breakdown is provided in the canvas exam

Q 1. Search Part A
In the heart of Silicon Valley, a team of AI researchers at Alphabet-a has just unveiled their latest creation: Gemini, a groundbreaking multimodal language model with the ability to interpret and execute complex tasks. Unlike its predecessors, Gemini is not just a master of words; itʼs proficient in understanding and generating actionable plans, and has been very popular since its global release.
Gemini has also been integrated into a new line of sophisticated household robots which are in their beta testing phase. In the hands of one eager beta tester, there is one of these robots about to face challenges that will put Geminiʼs capabilities to the test. Tasked with maintaining domestic tranquility, the robot is ready for its first command. The beta tester, with high expectations, gives the directive: “Make the bathroom sparkle like never before”.
In order to fulfill the command, Gemini evaluates several potential actions, some of which are shown in Figure 1. These outputs were sampled via top-k sampling with k = 2, meaning all but the 2 most probable words are ignored in the search tree. Normalized probabilities are shown. To determine how Gemini will make decisions, Alphabet-a engineers have decided to apply Uniform Cost Search, with a cost function given by:
C = -ln(p)
Where C represents the cost, p represents the probability of a particular word being chosen, and ln is the natural logarithm. For example, if p = 0.50, then C = −ln(0.50) = 0.693147181.
Figure1:GeminiʼssearchspaceforUniformCostSearch.
Computer Science Tutoring
1. What is the order in which the nodes are explored? (Exclude the root node) You must indicate the search has ended by entering “search ended”. Ties are broken by choosing the leftmost path (1 point)
2. What is Gemini’s final path? (Exclude the root node)
You must indicate the path has ended by entering “path ended”. Ties are broken by choosing the leftmost path (1 point)
3. What is the total cost of the final path? (1 point)
4.Whichofthefollowingdescribesareasonforusingnegativelog probabilities as costs in the search algorithm? (1 point)
o To prioritize words with higher probabilities by assigning them lower costs.
o Toensurethatlessprobablewordsarechosenmorefrequently. o Tosimplifythecalculationofprobabilitiesintowholenumbers. o Tomaximizetherewardforthemostefficientpath.
5. Is Uniform Cost Search complete and optimal with respect to the cost function for the tree in Figure 1? (1 point)
o Itisbothcompleteandoptimalbecauseitalwaysfindstheleast cost solution
o It is complete but not optimal because it doesn’t consider a heuristic o It is optimal but not complete because it can get stuck in loops
o It is neither complete nor optimal due to the absence of a heuristic

Figure 2: Geminiʼssearchspace afterapplying RLHF.
Q1. Part B
After some initial testing, the AI safety team at Alphabet-a identifies potential misinterpretations by Gemini. To address this, they introduce a heuristic h derived via Reinforcement Learning from Human Feedback (RLHF), representing the expected human satisfaction with the outcome. This heuristic is designed to guide the search towards solutions that align more closely with human values and expectations, and approaches zero as Gemini gets closer to the userʼs goal. The engineers decide to update their search strategy to include this heuristic, keeping the same costs from before. They opt for the A* search algorithm, using the negative log probabilities for the cost as before and their new RLHF heuristic to estimate the remaining cost to reach the goal (Figure 2).
6. What is the order in which the nodes are explored? (Exclude the root node). You must indicate the search has ended by the option “search ended”. Ties are broken by choosing the leftmost path (1 point)
7. What is Gemini’s final path? (Exclude the root node). You must indicate the search has ended by the option “search ended”. Ties are broken by choosing the leftmost path (1 point)

What is the total cost of the final path? (1 point)
The addition of the RLHF heuristic is intendedto: (1 point) Predictfuture states beyondthe currentsearchhorizon.
Increase theprobability of words beingchosenby lowering theircosts. Alignthesearchmorecloselywithhumansatisfaction.
Eliminate the need fornegativelog probability costs in the search.
Which of the following statementsis true? (1 point) oTheRLHFheuristicisconsistent. oTheRLHFheuristicisadmissible.
o The RLHF heuristic is consistent andadmissible.
o The RLHF heuristic isneitherconsistent noradmissible.

Q2. Game Playing Part A
Two players are playing a game which starts with an array populated with an even number of elements. Each player takes turns removing an element from the left or right end of the array and adds it to their score. The game start with both players having no points. The game ends when the array is empty. At the end of the game, the player with the maximum score wins the game. As an example, letʼs say Sam and Tom are playing a game that starts with the array [7, 2, 4, 9], and letʼs say that Sam takes the first move. The resulting tree would be:
Note: In this game tree with a branching factor of 2, a choice of left receives the leftmost value from the array, and a choice of right receives the rightmost value from the array. Note that the chosen array elements are restricted and not available for the next ply or layer.
Assume Sam has the first turn and the starting array is [3,4,5,2], to answer the following questions

11 Sam has decided to build a tree to find the optimal way to play and win the game. Can you help Sam complete his tree? Please enter the scores for Sam and Tom at each of the steps queried. Enter as integers (1.5 points)
A. Sam:______,Tom:______ B. Sam:______,Tom:______ C. Sam:______,Tom:______ D. Sam:______,Tom:______
Q2. Game Playing Part B
12 Which of the following is/are true? (1 point)
o Given that both players play optimally and an initial array, the outcome of such a game is always predictable
o If we apply the minimax algorithm with Sam as the maximum player then it will find the shortest path in the above game tree
o The minimax algorithm will provide Sam with a better solution to play the game than alpha-beta pruning for this particular game as it will explore all the nodes in the game tree
o Both alpha-beta pruning and minimax algorithm will provide Sam with the same solution to play the games
Let us say that we are rooting for Sam, and thus we hope she wins. By strictly looking at the scores and the minimax tree, we obtain the following tree:

13 Complete the missing values in the minimax tree. (1.5 points)
A. Sam:______,Tom:______ B. Sam:______,Tom:______ C. Sam:______,Tom:______
14 If we apply alpha-beta pruning to the above tree with Sam as the maximum player, how many leaf nodes are pruned?
15 If both players play optimally, who will win this game? (1 point)
o Sam o Tom

Q3 Optimization
A research grant has been funded by the US department of transportation to optimize a high-speed train infrastructure between 6 key cities in the Southeast. You are tasked with creating a Genetic Algorithm to determine the best acyclic route. Ignore any infrastructure currently in place between these cities. The cities are:
Atlanta (ATL) Savannah (SAV) Nashville (NAS) Charlotte (CLT) Raleigh (RAL) Birmingham (BHM)
To do this, we will randomly generate a population of paths, to be parents. Each parent will be denoted as a chromosome, composed of a set of genes, where each gene is named with a city abbreviation. The cities are stitched together to create a path that the train can take. The parent population will be:
Parent 1: NAS → ATL → CLT → RAL → SAV → BHM Parent 2: CLT → NAS → ATL → SAV → BHM → RAL Parent 3: CLT → BHM → ATL → SAV → RAL → NAS Parent 4: RAL → NAS → SAV → BHM → ATL → CLT
We then evaluate their fitness with a custom fitness function. For a project such as this, the fitness of a path would be affected by length, geography, building costs, zoning, noise ordinances, and legal costs. In this problem, we will only use length, meaning the shorter the length the more fit the candidate is. To calculate the fitness we will use this function, which is the inverse of the length:
f(fitness) = length-1
Now letʼs evaluate each candidate based on this function. Here are the distances:
ATL ↔ SAV: 250 miles ATL ↔ NAS: 250 miles ATL ↔ CLT: 245 miles ATL ↔- RAL: 400 miles ATL ↔ BHM: 150 miles SAV ↔ CLT: 250 miles NAS ↔ CLT: 400 miles NAS ↔ BHM: 185 miles CLT ↔ RAL: 170 miles RAL ↔ SAV: 330 miles BHM ↔ SAV: 350 miles RAL ↔ BHM: 500 miles CLT ↔ BHM: 350 miles RAL ↔ NAS: 450 miles
NOTE: If the distance is not listed here, then the path between the cities is not possible for legal or geographical reasons.

Q3.1 Optimization
Provide the fitness for each parent: (.75 points ea.)
16: Fitness of Parent 1: __________________________ 17: Fitness of Parent 2: __________________________ 18: Fitness of Parent 3: __________________________ 19: Fitness of Parent 4: __________________________
Now we will perform crossovers[1] from two pairs of existing parents to create a new generation of child paths. We will then check to see if each new child path is a viable child. Our criteria for viability is that the train can reach each city in the path exactly once. We will also check the fitness of each viable child. We have randomly selected a crossover point at index 3 (zero-based indexing) for all the children. Parents 1 & 2 will partner to create Child 1 (P1|P2) and Child 2 (P2|P1), and Parents 3 & 4 will partner to create Child 3 (P3|P4) and Child 4 (P4| P3). What is the fitness of each child? Please put -1 if the child is not viable, because a path does not exist or the sequence does not go to every city exactly once:
1. See “Evolutionary algorithms”, AIMA 4th Ed., Russell & Norvig, Chptr. 4.1.4, Figure 4.6 (4 Queens ex.)
(.75 points ea)
20: Fitness of Child 1: __________________________ 21: Fitness of Child 2: __________________________ 22: Fitness of Child 3: __________________________ 23: Fitness of Child 4: __________________________
Q3.3 Optimization
We will now introduce mutations to the children and recheck viability and fitness. The possible mutations are: Swap mutation: Two genes are swapped at random
Scramble mutation: A subset of genes in the chromosome are scrambled to a random order Inverse mutation: The order of a random subset of genes in the chromosome is inverted
We have chosen to do a swap mutation next, at indices 2 and 5 (again, zero-based indexing). Swap the genes at these indices and reevaluate the fitness of each child. Please put -1 if the child is not viable.

(.75 points ea)
24: New fitness of Child 1: __________________________ 25: New fitness of Child 2: __________________________ 26: New fitness of Child 3: __________________________ 27: New fitness of Child 4: __________________________
28. Normally we would do these steps for a few generations before selecting the most fit path. But for this question, we are now done. Please note which path was the most fit so far in our search, by ordering each city in terms of what stop it would be from 1-6: (1 point)
Atlanta Birmingham Charlotte Nashville Raleigh Savannah
___________
___________
___________
___________
___________
___________

Q4 ConstraintSatisfactionProblems(CSPs)
YourfriendshavebeentrainingandpreparingforSailGPallsummerandthetimehasfinallycomefor registration.Howevertheyaresquabblingoverwhatpositioneachpersonshouldbeinchargeofandso theyʼveturnedtoyoutohelpthemresolvetheissue!RecognizingthatthisisaCSPproblem,youʼvedecidedto take an algorithmic approach toassign everyoneroles.
Variables:Thereare6variables,andtheyareyourfriends,Alice,Bob,Carol,Dan,Elise,andFred.
Domains:Thereare4valuesthateachvariablecanbeassigned,andtheyarethepositionsthatonecan take on the boat, Driver (D), Flight Controller (FC),Wing Trimmer(WT), and Grinder(G).
It is important to note that there have to be 6 people onthe boat at all times, and that each boat may only haveexactly1Driver,exactly1FlightController,andexactly1WingTrimmer(therecanbemorethan1 Grinder).Weʼlltrytocapturetheserequirementsasconstraints,butyouʼllseethatwewonʼtbeableto capture thementirely.
Below in Figure1. is the starting setof domains for the CSP problem before any domainreductionoccurs. We willreferto it as domain state zero.
Figure 1: Domain State Zero
Constraints: You’ve talked to each of your friends individually and collected all the different wishes and grievances they have and reduced them to a set of unary and binary constraints that are easy to implement in the CSP algorithm. They are as follows:
Unary Constraints
1. BobcannotbetheFCoraG. 2. Carol cannot be the FC.

Code Help, Add WeChat: cstutorcs
Binary Constraints
1. If a person is assigned D, no other person may be assignedD.
2. If a person is assigned FC, no other personmay be assigned FC.
3. If a personis assignedWT, no other person may be assignedWT.
4. If Alice is the FC, neither Bob,Dan, nor Elise can be the WT.
5. If Alice is the FC, neither Bobnor Fred can be the D.
6. If Carol is a G, neither Bob norDan can be the G.
7. Elise can be the Dor WT if and only if Carol is the FC.
8. Dan or Fred can be theFC if and only if Elise is either theD or the WT.
Constraint Graph: Notice how unary constraints are not reflected in theconstraint graph shown in Figure 2. and that all binary constraints between twovariablesX and Y are expressed in theform of one edge between X and Y.Also note that all edges that only contain the binary constraints1, 2, or 3 from above are represented using a dashed line and thatall edges that containat least one of the binary constraints4, 5, 6, 7, 8 are representedusing a solid line.
Figure 2: The Constraint Graph
ConstraintTable:Wealsoprovideyouwithaniftytablewithalloftheallowedassignmentutples representedbyeachedgeintheconstraintgraph(whatthealgorithmwillcheckwhenevaluatingarcconsis- tency/binaryconstraints).
Notation: The symbol * means “any”. The symbol ∼means “not”.

Edge Total (Alice,Bob) 11
AllowedTuples
(FC, G); (D, ∼D); (WT, ∼WT); (G, *)
(FC, ∼FC); (D, ∼D); (WT, ∼WT); (G, *)
(FC, D); (FC, G); (D, ∼D); (WT, ∼WT); (G, *)
(FC, D); (FC, G); (D, ∼D); (WT, ∼WT); (G, *)
(FC, WT); (FC, G); (D, ∼D); (WT, ∼WT); (G, *)
(FC, ∼FC); (D, ∼D); (WT, ∼WT); (G, ∼G)
(FC, ∼FC); (D, ∼D); (WT, ∼WT); (G, *)
(FC, ∼FC); (D, ∼D); (WT, ∼WT); (G, *)
(FC, ∼FC); (D, ∼D); (WT, ∼WT); (G, *)
(FC, ∼FC); (D, ∼D); (WT, ∼WT); (G, ∼G)
(FC, ∼FC); (D, FC); (D, G); (WT, FC); (WT, G); (G, FC); (G, G)(FC, ∼FC); (D, ∼D); (WT, ∼WT); (G, *)
(FC, D); (FC, WT); (D, ∼D); (WT, ∼WT); (G, *)
(FC, ∼FC); (D, ∼D); (WT, ∼WT); (G, *)
(FC, ∼FC); (D, ∼D); (WT, ∼WT); (G, ∼FC)
(Alice,Carol) (Alice,Dan) (Alice,Elise) (Alice,Fred) (Bob, Carol) (Bob, Dan) (Bob, Elise) (Bob, Fred) (Carol, Dan) (Carol, Elise) (Carol, Fred) (Dan, Elise) (Dan, Fred) (Elise,Fred)
Table 1: The Constraint Table Note*:Thistableonlyappliestobinaryconstraintsanddoesnotaccountfortheunaryconstraints.
29. (2 points)
How many complete assignments are there in total for this CSP problem? For an assignment to be
complete, all variables must be assigned a single value and the assignment does not have to be consistent.
30. (2 points)
While using a backtracking search based algorithm, heuristics may be used at each step to choose what variable to assign next and what value to assign it. One such heuristic is the degree heuristic where we select the variable that is involved in the largest number of constraints on other unassigned variables (binary constraints) [1]. Beginning with domain state zero provided above, we will first perform the following steps:
Reduce all domains by applying node consistency. (3 values should get removed across all domains)
Assign the value of FC to the variable Alice. (All remaining values in Aliceʼs domain should get removed and Aliceʼs domain is now fixed)
Apply arc-consistency on Alice only. (8 values should get removed across all domains)
Afterapplyingtheabovesteps,wedecidethatwewanttocontinuewiththealgorithm.Ifweuse the degree heuristic based only on the number of solid edges in the constraint graph provided above to select the next variable for assignment, which variable(s) may we select for assignment next? (select all that applies) Hint*: The node Fred is considered to have a node degree of 2.
1. AIMA, 4th Ed., Russell & Norvig, Chptr. 6.3.1
o Alice o Bob o Carol o Dan o Elise o Fred

31. (2 points)
An alternative heuristic that we can use to pick a variable to assign a value to is the minimum remaining value heuristic (MRV). MRV chooses the variable with the fewest “legal” values (values available in its domain) [1]. Beginning with domain state zero provided above, we will first perform the following steps:
Reduce all domains by applying node consistency. (3 values should get removed across all domains)
Assign the value of G to the variable Carol. (All remaining values in Carol’s domain should get removed and Carol’s domain is now fixed)
Make the new state of domains arc-consistent. (5 values should get removed across all domains)
After applying the above steps, we decide that we want to continue with the algorithm. If we use the minimum remaining value heuristic, which variable(s) may we select for assignment next?
1. AIMA, 4th Ed., Russell & Norvig, Chptr. 6.3.1
o Alice o Bob o Carol o Dan o Elise o Fred
Figure 3: Domain State One
Hint:NotethatweareapplyingLCVtothesetofdomainswithoutapplyingnodeconsistencyfirst.Also note that G is missing from some of the domains.
32. (2 points)
The two heuristics mentioned above deal with picking variables to make assignments to at each step. But what about choosing the value to assign to the variable after the variable to be changed has been chosen? A heuristic that comes in handy for selecting a value is the least constraining value heuristic (LCV). LCV chooses the value that rules out the fewest choices for the neighboring variables in the constraint graph[1]. Beginning from the set of domains provided below (domain state one), we choose the variable Dan to assign a value to. If we use the least constraining value heuristic, which value(s) may we assign to variable Dan next?
1.AIMA,4thEd.,Russell&Norvig,Chptr.6.3.1 Dan:
oG o FC oD o WT

Q5 Probability
34. Picking Out of a Bag (1 point)
A bag contains 30 piecesof paper marked from 1 to 30. Find the probability of drawing one odd and one even ina single draw of two tickets. _____________
35. Picking Out of a Bag Part II (1 point)
Bag 1 has four dimes and two nickels. Bag 2 has three dimes and three nickels. A bag is selected at random and a dime is drawn. What is the probability the dime came from bag 2? _____________
36. Logical Proofs (1 point) Mark all that are True
o If P(A|B) = P(A) then P(B|A) = P(B)
o IfP(A,B|C)=P(C|A,B)thenP(A|C)=P(B|C)
o If P (B|C) = P (B) then P (A|B, C) = P (A|B) ∗ P (A|C)/P (A)
37. Entering a Raffle (1 point)
There is a raffle with prizes. To enter the raffle you read and agree to the following rules: 1. Buy a ticket for $2
2. Each ticket is numbered from 000 – 999, and is assigned at random, and is unique
3. After all of the tickets are sold a winning number from 0 to 999 is selected at random 4. If a ticket contains any of the numbers in the winning numbers in the same ordered
sequence (ones, tens, hundreds) as the winning number a monetary prize is awarded 5. The possible prizes are shown below
a. All 3 numbers the same: $1200 b. Two numbers the same: $100 c. One number the same: $10
For example, if the winning number was 692 and you got ticket with the number 632 you would get a prize of $100 for having two of the numbers in the same ordered sequence as the winning number
Note: You can only win the highest possible prize that applies to your ticket. If you get 2 numbers correct you only receive $100 and not any of the $10 prize for getting 1 number correct.
What is the expected profit(or loss) you would expectfrom a purchase of 3 tickets?If a loss please enter a negative number. (2 points)
(37.)____________________

Q6. BayesNet
Sometimes, the outputs from Bayes Networks can seem counter-intuitive. To understand this, consider the following example: Earthquakes and burglaries areindependentevents.Eitherofthemhappenswithaprobabilityof0.01,as shown in Table 1 and 2. Either can possibly cause an alarm to go off. The probability graph is shown in Figure 1. The joint probability distribution shouldbecalculatedas:
Pr(B = b, E = e, A = a) = p(b) · p(e) · p(a|b, e)
1 0.01 0 0.99
Table1:Theprobabilityofanearthquakehappened.Thevalueof1represents the occurrenceof an event, andthe value of 0 represents thenon-occurrence.
1 0.01 0 0.99
Table2:Theprobabilityofaburglaryhappened.Thevalueof1representsthe occurrenceof an event,and the value of0 representsthe non-occurrence.
TheconditionalprobabilityofP(a|b,e)isprovidedintheTable3.For example,P(a = 1 | b = 0,e = 1) = 1.0000 denotes ʻtheprobability that if there is an earthquake andno burglary, the alarm will go offʼ.
Figure 1: The probability graph of an alarm (A) goes off.

b e a P(a|b,e)
0 0 0 1.0000 0 0 1 0.0000 0 1 0 0.0000 0 1 1 1.0000 1 0 0 0.0000 1 0 1 1.0000 1 1 0 0.0000 1 1 1 1.0000
Table 3: Conditional probability table for P (a | b, e)
38. Given all the information above, please fill in the joint probability distribution table. Round your answers to 4 decimal places (2.5 points)
b e a P (A = a, B = b, E = e) 000
39. (2 points)
Giventhatthealarmhassounded,whatistheprobabilityofaburglaryhaving occurred, without knowing about an earthquake?
Probability:
40. (2 points)
Calculate the probability P (B = 1 | A = 1, E = 1). Round your answer to 2 decimal places.
Probability:

41. Does hearing that thereʼs an earthquake increase, decrease, or keep constant the probability of a burglary? (1.5 points)
o Increase
o Decrease
o Keepconstant

Q7. Machine Learning
Modelcomplexity
In the realm of machine learning, the choice of an appropriate level of model complexity is crucial for achieving thebestperformanceonatask.Whileacomplexmodelispronetooverfitting,anoverlysimplisticmodel may not have enoughrepresentationalpowerto capture the relationships betweenthe input variablesand desiredoutputs.Letusstudythisconceptinthecontextofbinaryclassificationtasksbasedonthelogical circuits shown below.
Eachcircuithasthreebinaryinputsx1,x2andx3,whichcanbe0or1.Letustrainoneofthesimplest models,alinearclassifier,topredicttheoutputsoflogicalcircuits(separately).Thelinearclassifierisgiven
F (x1, x2, x3) = w1x1 + w2x2 + w3x3 + b
Where, w1, w2 and w3 are weights and b is the bias. w1, w2, w3 and b are real numbers. The model predicts 1 if F (x1, x2, x3) ≥ 0 and 0 otherwise.
Programming Help
42. On which of the provided logic circuits, is it possible for the linear classifier F to attain 100% test accuracy? (Hint: you may find building truth tables and the notion of separability useful.) (1 point)
o Circuit A o Circuit B o Circuit C o Circuit D
7.2MachineLearning
DecisionTrees
YouareanMLengineeratamajormusicstreamingserviceT.oanalyzetrendsinmusicandimprovingthe quality of music recommendations, you collected a dataset analyzing musical characteristics of 10 songs, showninTable1.Thefeaturesyoufocusedonwere,Tempo,whichismeasuredinbeatsperminute; Instrumentalness(0to1),whichmeasureshowinstrumental(1)orvocal(0)thesongcontentis;Energy(0to 1),whichistheaverageenergylevelotfhesongbasedonitsfrequencyspectrumandDanceability(0to1), whichmeasureoffavorabilityofasongfordancing.
Song Tempo Instrumentality
1 120 0.05
2 140 0.10
3 100 0.80
Energy Danceability Genre
0.85 0.75 Rock 0.80 0.70 Pop 0.40 0.60 Jazz 0.30 0.50 Jazz 0.90 0.80 Rock 0.50 0.65 Jazz 0.75 0.70 Pop 0.35 0.55 Jazz 0.95 0.85 Rock 0.55 0.65 Pop
5 160 0.20
6 110 0.70
7 130 0.15
8 105 0.90
9 150 0.30
10 125 0.60
Table 1: Song dataset
What is the Gini gain corresponding to the following featuresin the dataset, assuming that the split is pe rforme d at the me an of the fe ature value s? (Re port 6 digits) (1.5 points ea)
43: Tempo:
44: Instrumentality:
7.3MachineLearning
KNNsandFeaturepre-processing
J3PO,anupcomingartist,justreleasedasongonyourstreamingplatform.Youwantedtoaddittoa genre relevantplaylist. However,it is not possible for you to manually listen to the song and determinethe genre,asyouhavethousandsofothersongstoclassify. Havingwrittencodetoanalyzemusicalfeatures,you determine that J3POʼs song has a tempo of 125 bpm, instrumentality of 0.85, energy of 0.4 and danceability of 0.5.
45: De te rmine the 5 most similar songs to J3POʼs song base d on the e uclide an distance me tric. The euclidean distance metric is givenas,

Where,p,q are two vectors in n-dimensional space. (2 points)
□ Song 1 □ Song 2 □ Song 3 □ Song 4 □ Song 5
□ Song 6 □ Song 7 □ Song 8 □ Song 9 □ Song 10
46: What is ge nre of J3POʼs song base d on k-ne are st ne ighbors, with k = 5. (Choose the majority g)e(2npreoints)
47: It was lunch break and you decide to relax to J3POʼs song. Something definitely felt wrong. You found out that your kNN model had wrongly predicted its genre! You rush to your computer terminal to investigate whatʼs wrong. Turns out, using raw non-normalized tempo values had biased your classifier significantly. Apply min-max normalization (between 0 and 1) to the tempo feature and determine the 5 most similar songs to J3POʼs song. (Use the euclidean distance metric) (2 points)
□ Song 1 □ Song 2 □ Song 3 □ Song 4 □ Song 5
□ Song 6 □ Song 7 □ Song 8 □ Song 9 □ Song 10
48: Now that tempo values are min-max normalized, what is genre of J3POʼs song based on k-nearest neighbors
with k = 5. (Choose the majority genre) (2 points)
7.4 Machine Learning
Neuralnets
49: Which of the following statementsare true? (2 points)
o Gradient descent is guaranteed to converge tothe global minimum on a convex losssurface. o Precision is a more suitable metric than recall when the cost of false positives is significantly
higher than the cost of false negatives.
o Choosingk=1foraKNNmodelisguaranteedtocauseunderfitting.
o Amulti-layerperceptronwithanactivationfunctionf(x)=1.5xcanlearnanydecisionboundary, givenanarbitrarynumberoflayers.Assumethatthesameactivationfunctionisusedinevery layer.

Q8 Patterns Recognition Through Time : Dynamic Time Warping (DTW)
Dynamic Time Warping (DTW) – Tutorial:
DynamicTimeWarping(DTW)standsasarobustalgorithmutilizedto assessthesimilaritybetweentwotime-seriesequences.Itsapplicationsspana widerangeofdomains,encompassingshaperecognition,handwritingrecogni- tion,andspeechrecognition.Thisalgorithmoffersahigherdegreeofflexibility when compared to the Euclidean distancemetric, making it particularly valu- ablefordatawithsignificantvariance.Toillustratethedistinction,considera visual representation ofthe waythese two algorithmscompare twosimilar time series by evaluating individual datapoints. The Dynamic Time Warping algorithm aligns data points in a manner that optimizes theircomparability, in contrasttothesomewhatmoresimplisticapproachoftheEuclideanmatching algorithm.
We can perform