CS4296 CS5296 Assignment 2

Deadline: 23:59 March 19, 2023 HKT
CS 4296 Spring 2023 Assignment 2 (2 questions, 7 marks)
CS 5296 Spring 2023 Assignment 2 (3 questions, 7 marks)
Question 1: [2 marks for CS5296 only]
(a) [1 mark] Given n jobs, each represented by a tuple (si, pi, xi), where si is the number of time slots allotted to the job within each period of pi time units, and xi indicates whether the job is work-conserving or not. Show that there is a feasible schedule for the n jobs when applying the SEDF (simple earliest deadline first) scheduling algorithm if and only if

Σ_{i=1}^{n} si / pi ≤ 1.
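As an illustrative sanity check of this condition (the numbers here are chosen for illustration only and are not part of the assignment): for two jobs with (s1, p1) = (2, 5) and (s2, p2) = (3, 10), we get 2/5 + 3/10 = 0.7 ≤ 1, so a feasible schedule exists; replacing the second job with (7, 10) gives 2/5 + 7/10 = 1.1 > 1, and no schedule can serve both jobs.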
(b) [1 mark] In borrowed virtual time (BVT) scheduling, assume that there are three virtual machines, VM1, VM2, and VM3, with weights 0.5, 0.125, and 0.25 respectively, sharing one physical CPU. Draw the CPU resource allocation curves of the three VMs over the first 20 time slots.
General note for the remaining two questions, which are based on the AWS EC2 platform: you can either set up an AWS EMR cluster with Hadoop 2.8.5 or your own single-node Hadoop 2.8.5 environment on your desktop; make sure your code runs correctly with Hadoop 2.8.5.
To help you test your program, a set of test files is available on Canvas under this assignment's directory, with the file name testFiles.zip.
Question 2: [3 marks for CS4296, and 2 marks for CS5296] Write a MapReduce program to generate the bag-of-word (BoW) vectors for the TEN text files provided to you on Canvas. In practice, the BoW model is a useful tool for feature generation. Having transformed a text file into a “bag of words”, you can use various metrics to characterize the file such as the squared Euclidean distance metric in Question 3. For example, suppose the dictionary is {“John”, “likes”, “to”, “watch”, “movies”, “also”, “football”, “games”, “Mary”, “too”}, the BoW
vector of a text file with content “John likes to watch movies. Mary likes movies too.” is [1, 2, 1, 1, 2, 0, 0, 0, 1, 1], and the BoW vector of another text file with content “John also likes to watch football games.” is [1, 1, 1, 1, 0, 1, 1, 1, 0, 0]. You are asked to write a MapReduce program similar to the above-mentioned WordCount example.
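As an aside, the BoW vector in the first example can be reproduced with a few lines of plain Java. The following is only an illustrative sketch (the class and method names are my own), not the required MapReduce program:

import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

public class BowSketch {
    // Count occurrences of each dictionary word in the text, in dictionary order.
    // Tokenization follows the rule in step 3 below: lower-case the text and treat
    // any character that is not a letter or a digit as whitespace.
    static int[] bowVector(String text, String[] dictionary) {
        Map<String, Integer> index = new HashMap<>();
        for (int i = 0; i < dictionary.length; i++) index.put(dictionary[i], i);
        int[] vector = new int[dictionary.length];
        for (String token : text.toLowerCase().replaceAll("[^a-z0-9]", " ").split("\\s+")) {
            Integer pos = index.get(token);
            if (pos != null) vector[pos]++;
        }
        return vector;
    }

    public static void main(String[] args) {
        String[] dict = {"john", "likes", "to", "watch", "movies",
                         "also", "football", "games", "mary", "too"};
        // Prints [1, 2, 1, 1, 2, 0, 0, 0, 1, 1], matching the example above.
        System.out.println(Arrays.toString(
                bowVector("John likes to watch movies. Mary likes movies too.", dict)));
    }
}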
1) The first step is to identify the key words that appear in the given word dictionary. The key words have been extracted from WIKI; you can find the predefined string array in the Appendix (do not change the sequence of the words). During this procedure, you should record the number of occurrences of each key word.
2) The second step is to generate a vector of 100 dimensions, where each element represents the number of occurrences of the corresponding word among the 100 most common words. In case the number of occurrences of a word exceeds the maximum integer value in Java, simply use the maximum integer value to represent it. The output format should be “[filename v1, v2, v3, …, v100]”, e.g.,
file_1_name 1, 2, 3, 4, …, 100
file_2_name 5, 2, 9, 6, …, 30
3) Note that word identification is case-insensitive, and anything that is not a letter or a digit should be ignored and replaced by whitespace; e.g., “Man.&” will be identified as “man”, and “Harry’s” will be identified as “harry” and “s”, respectively. (A minimal Hadoop sketch of these steps is given after this list.)
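For reference, here is a minimal Hadoop 2.x sketch of the steps above. The class and variable names are my own choices, the dictionary is shortened, and this mapper/reducer pair is only one possible starting point under those assumptions, not the required solution:

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class BowJob {

    // Replace with the full 100-word array from the Appendix (shortened here for brevity).
    private final static String[] top100Word = { "the", "be", "to" };

    // Mapper: for every token that matches a dictionary word, emit (filename, word index).
    public static class BowMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        @Override
        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String file = ((FileSplit) context.getInputSplit()).getPath().getName();
            String line = value.toString().toLowerCase().replaceAll("[^a-z0-9]", " ");
            for (String token : line.split("\\s+")) {
                for (int i = 0; i < top100Word.length; i++) { // a HashMap lookup would be faster
                    if (top100Word[i].equals(token)) {
                        context.write(new Text(file), new IntWritable(i));
                    }
                }
            }
        }
    }

    // Reducer: accumulate a vector of counts per file and write one line per file.
    public static class BowReducer extends Reducer<Text, IntWritable, Text, Text> {
        @Override
        public void reduce(Text file, Iterable<IntWritable> indices, Context context)
                throws IOException, InterruptedException {
            long[] counts = new long[top100Word.length];
            for (IntWritable i : indices) counts[i.get()]++;
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < counts.length; i++) {
                if (i > 0) sb.append(", ");
                // Saturate at Integer.MAX_VALUE, as required in step 2.
                sb.append((int) Math.min(counts[i], (long) Integer.MAX_VALUE));
            }
            context.write(file, new Text(sb.toString()));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "bow");
        job.setJarByClass(BowJob.class);
        job.setMapperClass(BowMapper.class);
        job.setReducerClass(BowReducer.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

In this sketch the mapper emits (filename, dictionary index) pairs and the reducer assembles the 100-dimensional vector for each file; note that the default output format separates the filename and the vector with a tab.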
Question 3: [4 marks for CS4296, and 3 marks for CS5296] In this question, you are required to perform some more complex operations with Hadoop, building on Question 2. You first need to find all words with the given prefix “ex” and count the occurrences of these words in each file. You then output the words in descending order of their appearance counts. If the appearance counts of two or more words are identical, list those words in alphabetical order. Each file is displayed on a single line in the following format:
Filename_1 word_1, count_1, word_2, count_2, …, word_N, count_N
Filename_2 word_1, count_1, word_2, count_2, …, word_M, count_M

You also need to calculate the total appearances of each word across all test files, and output all of them (not only the top 10) in descending order of their appearance counts, in the following format:
Total word_1, count_1, …, word_N, count_N
For example, suppose there are two test files, “text1” and “text2”, and each file contains three words with the prefix “ex”:

text1:
Word           Explore    experience    Extra
Appear Counts  40         50            30

text2:
Word           excellent    expensive    extra
Appear Counts  30           45           20
Your output should be:
text1 experience, 50, explore, 40, extra, 30
text2 expensive, 45, excellent, 30, extra, 20
Total experience, 50, extra, 50, expensive, 45, explore, 40, excellent, 30
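The ordering rule used above (descending appearance count, with ties broken alphabetically) can be captured by a single comparator. The following is an illustrative sketch with my own names, assuming the per-file counts are held in a Map<String, Integer>:

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class OrderingSketch {
    // Sort map entries by count (descending), breaking ties alphabetically by word.
    static List<Map.Entry<String, Integer>> ordered(Map<String, Integer> counts) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue()); // descending count
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey()); // alphabetical tie-break
        });
        return entries;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new HashMap<>();
        counts.put("explore", 40); counts.put("experience", 50); counts.put("extra", 30);
        // Prints [experience=50, explore=40, extra=30], matching the "text1" line above.
        System.out.println(ordered(counts));
    }
}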
Write a Hadoop program to fulfill all specified requirements. Specifically, your tasks for this question are as follows.
1) Find words with the prefix “ex” and count their appearances, using the same test files as in Question 2.
2) Output your results in the specified format. That is, the result for each test file should take a single line, in the format “filename word_1, count_1, word_2, count_2, …”.
3) Generate the total result by aggregating the results of all test files.
4) Note that word identification is case-insensitive, and anything that is not a letter or a digit should be ignored and replaced by whitespace; e.g., “Man.&” will be identified as “man”, and “Harry’s” will be identified as “harry” and “s”, respectively. (One possible design is sketched after this list.)
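One possible, but by no means mandatory, design is sketched below: the mapper emits (filename, word) pairs for tokens that start with “ex”, and a single reducer (job.setNumReduceTasks(1) is assumed so that the totals are complete) builds one line per file and accumulates the totals, writing the “Total” line in cleanup(). The class names are my own, and the driver is the same boilerplate as in the Question 2 sketch:

import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;

public class PrefixJob {

    // Mapper: emit (filename, word) for every token that starts with the prefix "ex".
    public static class PrefixMapper extends Mapper<LongWritable, Text, Text, Text> {
        @Override
        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String file = ((FileSplit) context.getInputSplit()).getPath().getName();
            String line = value.toString().toLowerCase().replaceAll("[^a-z0-9]", " ");
            for (String token : line.split("\\s+")) {
                if (token.startsWith("ex")) {
                    context.write(new Text(file), new Text(token));
                }
            }
        }
    }

    // Reducer (assumes a single reduce task): one output line per file, plus the
    // aggregated "Total" line written in cleanup().
    public static class PrefixReducer extends Reducer<Text, Text, Text, Text> {
        private final Map<String, Integer> totals = new HashMap<>();

        @Override
        public void reduce(Text file, Iterable<Text> words, Context context)
                throws IOException, InterruptedException {
            Map<String, Integer> counts = new HashMap<>();
            for (Text w : words) counts.merge(w.toString(), 1, Integer::sum);
            counts.forEach((w, c) -> totals.merge(w, c, Integer::sum));
            context.write(file, new Text(format(counts)));
        }

        @Override
        protected void cleanup(Context context) throws IOException, InterruptedException {
            context.write(new Text("Total"), new Text(format(totals)));
        }

        // Sort by count descending, ties alphabetically, and join as "word, count, ...".
        private static String format(Map<String, Integer> counts) {
            StringBuilder sb = new StringBuilder();
            counts.entrySet().stream()
                  .sorted((a, b) -> b.getValue().equals(a.getValue())
                          ? a.getKey().compareTo(b.getKey())
                          : Integer.compare(b.getValue(), a.getValue()))
                  .forEach(e -> sb.append(sb.length() == 0 ? "" : ", ")
                                  .append(e.getKey()).append(", ").append(e.getValue()));
            return sb.toString();
        }
    }

    // Driver omitted: same Job setup as in the Question 2 sketch, plus job.setNumReduceTasks(1).
}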
Submission Instructions
You are required to upload a zip file containing your Java source code, i.e., two Java files for the two programming questions, and a PDF file for Question 1 (CS5296 only).

You can name them question_1.pdf, bow.java, and dist.java. Your source code will be read and tested on Hadoop 2.8.5. Make sure your program compiles and executes correctly prior to submission. Your assignment must be submitted on Canvas with the name “[CS4296/CS5296]-[Your Name]-[Student ID]-Assignment2.zip” (e.g., CS5296-HarryPotter-12345678-Assignment2.zip) before the deadline: 23:59 March 19, 2023.
Appendix: Top 100 common words in English (you can copy this code into your submission):
private final static String[] top100Word = {
    "the", "be", "to", "of", "and", "a", "in", "that", "have", "i",
    "it", "for", "not", "on", "with", "he", "as", "you", "do", "at",
    "this", "but", "his", "by", "from", "they", "we", "say", "her", "she",
    "or", "an", "will", "my", "one", "all", "would", "there", "their", "what",
    "so", "up", "out", "if", "about", "who", "get", "which", "go", "me",
    "when", "make", "can", "like", "time", "no", "just", "him", "know", "take",
    "people", "into", "year", "your", "good", "some", "could", "them", "see", "other",
    "than", "then", "now", "look", "only", "come", "its", "over", "think", "also",
    "back", "after", "use", "two", "how", "our", "work", "first", "well", "way",
    "even", "new", "want", "because", "any", "these", "give", "day", "most", "us"
};