are ready for use and the rest of the machines will be ready by September 17. You are responsible for learning and practicing all the remaining steps (Step 1 to the end) within your own home directory. You are also given sudo permission to run docker commands. Please do not hesitate to send your questions to the TA (Bofan Li) and/or the instructor, preferably on or before Oct 13, 2024 for a timely response. Note that we are not responsible for any delays or failures of your project caused by the lack of responses to your questions after Oct 13. Please do send reminders if you do not get a response from us within 2 business days (48 hours).
4. Practice the creation and execution of MPI programs.
There are many MPI implementations you can use for this project. A few are listed below.
Open MPI: https://www.open-mpi.org/faq/
MPICH: https://www.mpich.org/documentation/guides/ and https://anl.app.box.com/v/2019-06-21-basic-mpi (from Slide 27 – What is MPICH).
A lecture in class will be provided on how to install, compile and run a few sample MPI programs.
5. Description of Provided Files
Two small test cases are provided for you, along with their output files. However, your program must work for test cases up to 10 GB in size, with 1 to 8 parallel processes.
o test1.txt: an input file of multiple sentences in one paragraph, about 1 KB in size.
o test2.txt: an input file of multiple paragraphs, about 10 KB in size.
o test1_output.txt: the output for test1.txt.
o test2_output.txt: the output for test2.txt.
6. Project Description
Write a parallel program that will take an input file and leverage 1 to 8 processes to count the number of times each word and character appeared in the file. The results from all processes should be combined and reported once by one of the processes.
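One way to divide the input among processes is to give each one a contiguous byte range and move each cut forward until it no longer falls inside a word. The sketch below is only an illustration under assumptions: the function name `chunk_bounds` is hypothetical, the whole input is held in memory as `bytes` (a real solution for 10 GB files would have each process read only its own range and inspect the bytes around its cut), and no MPI calls are shown.

```python
def chunk_bounds(data: bytes, nprocs: int) -> list[tuple[int, int]]:
    """Split data into nprocs contiguous [start, end) ranges without cutting a word."""

    def is_alnum(b: int) -> bool:
        # '0'..'9', 'A'..'Z', 'a'..'z'
        return 48 <= b <= 57 or 65 <= b <= 90 or 97 <= b <= 122

    n = len(data)
    cuts = [0]
    for rank in range(1, nprocs):
        # Start from an even split, but never move behind the previous cut.
        pos = max(cuts[-1], rank * n // nprocs)
        # Advance while the cut sits between two alphanumeric bytes.
        while 0 < pos < n and is_alnum(data[pos - 1]) and is_alnum(data[pos]):
            pos += 1
        cuts.append(pos)
    cuts.append(n)
    return list(zip(cuts[:-1], cuts[1:]))
```

Process r would then work on `data[start:end]` for the r-th pair. A range may be empty when the input is small or contains one very long word.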
o You shall count all printable characters except “DEL”, i.e., the ones with ASCII code 32-126. Other space or special characters are not to be counted.
o Among all the printable characters, a word is defined as a consecutive sequence of characters drawn from the 62 alphanumeric characters (‘a’..‘z’, ‘0’..‘9’, or ‘A’..‘Z’).
o Words are case insensitive (“AA”, “Aa”, “aA”, and “aa” are the same). However, characters are case sensitive (‘A’ and ‘a’ are different).
o Note that words may contain one or more alphanumeric characters.
o Words are separated by any non-alphanumeric characters.
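The counting rules above can be sketched as a single pass over one chunk of input. This is a minimal illustration, not a reference solution: `count_chunk` and `is_alnum` are hypothetical names, the input is assumed to be ASCII text held as `bytes`, and the MPI communication is left out.

```python
from collections import Counter


def is_alnum(b: int) -> bool:
    # '0'..'9', 'A'..'Z', 'a'..'z'
    return 48 <= b <= 57 or 65 <= b <= 90 or 97 <= b <= 122


def count_chunk(chunk: bytes):
    """Return (char_counts, word_counts) for one chunk of ASCII text."""
    char_counts = Counter()   # case sensitive, ASCII codes 32..126 only
    word_counts = Counter()   # case insensitive, keys stored in lowercase
    word = bytearray()        # alphanumeric run currently being read

    for b in chunk:
        if 32 <= b <= 126:
            char_counts[chr(b)] += 1
        if is_alnum(b):
            word.append(b)
        elif word:
            # A non-alphanumeric byte ends the current word.
            word_counts[word.decode("ascii").lower()] += 1
            word.clear()
    if word:
        # The chunk ended while a word was still open.
        word_counts[word.decode("ascii").lower()] += 1
    return char_counts, word_counts
```

For example, `count_chunk(b"Aa aA,aa")` counts the word `aa` three times, while `A` and `a` are counted as separate characters.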
Detailed output specifications:
o Your parallel program should report the number of times each character or word appears.
o The parallel program should then output the ten most used characters and words, along with the number of times they are used.
o Since words are case insensitive, the parallel program should only output the words in lowercase.
o The characters and words should be printed in descending order based on the number of times they are used.
o Breaking ties (for the “Top Tens”):
A. When two characters occur the same number of times, the character with the smaller ASCII value should be considered as being used more frequently.
B. When two words occur the same number of times, the word that occurs earlier in the input file should be considered as being used more frequently. To keep track of their order of occurrence, you may assign each word a sequential ID (starting from 1). Repeated occurrences of a word each receive a different ID. Note that comparing IDs globally becomes more complicated when each process assigns its own local sequential IDs.
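One possible way to handle both tie-breaking rules, and the global comparison of locally assigned IDs, is sketched below. It rests on an assumption that is not stated in the handout: process r handles the r-th contiguous chunk of the file, so the pair (rank, local ID) orders first occurrences the same way a global ID would. The names `merge_results`, `top_chars`, and `top_words` are hypothetical, and the per-process results are passed in as plain Python objects rather than gathered with MPI.

```python
from collections import Counter


def merge_results(per_rank):
    """Combine per-process results, listed in rank (file) order.

    per_rank[r] = (char_counts, word_counts, first_seen), where first_seen
    maps each word to the local sequential ID of its first occurrence.
    """
    chars, words, first = Counter(), Counter(), {}
    for rank, (c, w, seen) in enumerate(per_rank):
        chars.update(c)
        words.update(w)
        for word, local_id in seen.items():
            # Lower ranks are visited first, so setdefault keeps the
            # earliest (rank, local_id) pair for every word.
            first.setdefault(word, (rank, local_id))
    return chars, words, first


def top_chars(chars, k=10):
    # Higher count first; ties go to the smaller ASCII value.
    return sorted(chars.items(), key=lambda kv: (-kv[1], ord(kv[0])))[:k]


def top_words(words, first, k=10):
    # Higher count first; ties go to the earlier first occurrence.
    return sorted(words.items(), key=lambda kv: (-kv[1], first[kv[0]]))[:k]
```

Sorting every distinct word costs O(V log V) for V distinct words; since only ten entries are needed, `heapq.nsmallest` with the same key is an alternative worth considering in your analysis.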
o The output of your parallel program must match the format of the provided output files exactly (or as closely as possible). One way to check your program output is to run a “diff” command between the output from your executable and the provided output files.
7. Implementation requirements
o You are free to use any language and any standard libraries from the language of your choice.
o You need to provide a Makefile so that we know how to compile, execute and collect the results with your program on the provided cluster: ns01 – ns10.
o You are encouraged to verify your analysis of your program elements by testing larger input files and by measuring the actual run time of those test runs. If your program is written in C, you can do this easily by including the time.h header (ctime in C++) and capturing the return values of the clock() function before and after an algorithm, then subtracting the two values to get the difference. Convert the result to seconds by dividing by the constant CLOCKS_PER_SEC. Note that clock() reports processor time used by the program rather than elapsed wall-clock time. You may also use other timing tools, such as the time command.
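If your program is written in Python instead, the same before-and-after pattern might look like the sketch below. The helper name `timed` is hypothetical. `time.process_time()` plays the role of C's clock() (processor time of the current process), while `time.perf_counter()` measures elapsed wall-clock time, which also includes time spent waiting on I/O or on other processes. In an MPI program, MPI_Wtime is another common wall-clock source.

```python
import time


def timed(fn, *args):
    """Run fn(*args) once and return (result, wall_seconds, cpu_seconds)."""
    wall_start = time.perf_counter()   # elapsed wall-clock time
    cpu_start = time.process_time()    # CPU time of this process only
    result = fn(*args)
    wall = time.perf_counter() - wall_start
    cpu = time.process_time() - cpu_start
    return result, wall, cpu
```

A call such as `timed(sum, range(1000))` returns the function's result together with both measurements, so you can record whichever one your analysis needs.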
8. Performance Analysis requirements
A. You are encouraged to choose the most efficient algorithms to create efficient programs in terms of the complexity growth rate with respect to the growing input file size and the growing number of processes.
B. In a file called analysis.txt, write up your complexity analysis of the important algorithms and procedures in your programs.
C. Your analyses need to include complexity analysis of the following execution cases:
1) Case 1: Run your program with 1, 2, 4, 6, or 8 native processes and an input test file of 1MB, 10MB, 100MB, or 1000MB, for a total of 20 execution cases. Each case needs to run at least 3 times to obtain consistent performance numbers.
2) Case 2: Run your program with 1, 2, 4, 6, or 8 docker containers and an input test file of 1MB, 10MB, 100MB, or 1000MB, for a total of 20 execution cases. Each case needs to run at least 3 times to obtain consistent performance numbers. The system may not support 6 or more containers. If so, you need to document the behaviors you have observed.
3) Report the collected average execution times in one or more tables or graphs, as you see fit. Then describe your observations on the performance and scalability trends for Case 1, for Case 2, and for the comparison between Case 1 and Case 2. You may use Word, PDF or text files for the analysis report.
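One common way to describe scalability is to derive speedup and parallel efficiency from the averaged run times. The handout does not require these particular metrics, so treat the sketch below as an optional aid. The function name `summarize` and the layout of `runs` are assumptions, and the input must include the timings for 1 process as the baseline.

```python
from statistics import mean


def summarize(runs):
    """Average repeated timings and derive speedup and efficiency.

    runs maps a process (or container) count to the list of wall-clock
    times in seconds measured for one input size.
    """
    avg = {p: mean(times) for p, times in runs.items()}
    base = avg[1]  # average time with a single process
    return {
        p: {
            "avg": t,
            "speedup": base / t,           # how many times faster than 1 process
            "efficiency": base / t / p,    # speedup per process, 1.0 is ideal
        }
        for p, t in avg.items()
    }
```

Calling it once per input size, separately for Case 1 and Case 2, gives rows that can go directly into a table.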
9. Programs and files you need to submit.
You need to submit a tarball file (fsuid_proj1.tgz) with at least three files for the completion of this assignment:
1) Makefile
2) analysis.{txt,docx,pdf} depending on your choice of document.
3) parcount.{c,cpp,java,py} depending on the language you used.