Coursework: High-performance computing (resit)
Module: Introduction to HPC (PHYS 52015) Term: Summer 2024
Lecturer: Dr. Christopher Marcotte
Submission Please submit two PDF files (part1.pdf & part2.pdf) and two code files (part1.c & part2.c) in a zip archive.
Deadlines Consult the MISCADA learning and teaching handbook for submission deadlines.
Plagiarism and collusion Students suspected of plagiarism, either of published work or work from unpublished sources, including the work of other students, or of collusion will be dealt with according to Computer Science and University guidelines.
Description
Throughout the course we have considered simple programming problems largely distinct from the typical day-to-day practice of scientific computing. In this assignment you will experience a sliver of that practice by inheriting a codebase it is your responsibility to make faster while maintaining correctness.
Consider the logistic growth equation,
    du/dt = u(1 − u),    u(t = 0) = u0,
which models the self-limiting growth of a biological population. Under some assumptions, you may represent the state of this system at discrete times (e.g., annual values) to yield a discrete map or difference equation, u ← ru(1 − u), where our assumptions of continuity no longer hold. This is an archetypal model of chaotic dynamics for some values of r, and is known as the Logistic Map. In the code supplied to you, we step a related difference equation,
    u ← r ū(1 − ū),                                    (1)

where ū denotes the local mean of u in a patch of size l³ (for l odd) and 0 ≤ r ≤ 4 is a fixed, real-valued, scalar parameter. Around index (i, j, k) the local mean can be written as

    ū_{i,j,k} = l⁻³ Σ_{(i′,j′,k′) ∈ N_l(i,j,k)} u_{i′,j′,k′},

such that the neighborhood of index position (i, j, k) is written

    N_l(i, j, k) = {(i′, j′, k′) : ∥(i, j, k) − (i′, j′, k′)∥∞ ≤ (l − 1)/2},

where ∥ · ∥∞ is the Chebyshev distance, ∞-norm, or sup-norm: ∥x∥∞ = max(|x1|, |x2|, . . .). The formula for N_l(i, j, k) may be interpreted as “all indices whose maximum axis-aligned distance from (i, j, k) is at most (l − 1)/2”.¹ You may think of equation (1) as modelling, for example, the probability that someone is ill depending on their proximity to others who may be ill, set in a (bizarre) cube environment, and mapped over a discrete interval.

¹Note that this implies N∞(i, j, k) ≡ N∞(i′, j′, k′), i.e., if l → ∞ then the entire domain is included for all indices (i, j, k).
In the provided serial code you will see that ū is computed by averaging a 3 × 3 × 3 chunk of the u array. Of course, at the boundaries of the domain there are missing elements in the local patch (element u_{0,1,0} has fewer neighbors than u_{1,1,1}, and element u_{0,0,0} has fewer neighbors than u_{0,0,1}), which is accounted for by only computing the local patch averages over the neighbor elements that exist. Your implementation should respect this effect near the physical boundaries, and care should be taken that this is correct for Part 2.
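To make the boundary treatment concrete, a minimal serial sketch of the local-mean computation for l = 3 is given below. The names N, IDX, local_mean, u, and ubar are placeholders for this illustration only (the provided serial.c and params.h define their own names and layout); the point is simply that the patch is clamped to indices that exist, and the average is taken over however many elements were actually summed.

/* Sketch only: N (the cube edge length) and the row-major IDX macro are
 * assumptions for this illustration, not names taken from params.h. */
#define N 64
#define IDX(i, j, k) (((i) * N + (j)) * N + (k))

/* Local mean over the (at most) 3 x 3 x 3 patch centred on each index,
 * clamping the patch at the physical boundaries of the domain. */
void local_mean(const double *u, double *ubar)
{
    for (int i = 0; i < N; i++) {
        for (int j = 0; j < N; j++) {
            for (int k = 0; k < N; k++) {
                double sum = 0.0;
                int count = 0;  /* number of neighbours that actually exist */
                for (int di = -1; di <= 1; di++) {
                    for (int dj = -1; dj <= 1; dj++) {
                        for (int dk = -1; dk <= 1; dk++) {
                            int ii = i + di, jj = j + dj, kk = k + dk;
                            if (ii < 0 || ii >= N || jj < 0 || jj >= N ||
                                kk < 0 || kk >= N)
                                continue;  /* skip missing boundary neighbours */
                            sum += u[IDX(ii, jj, kk)];
                            count++;
                        }
                    }
                }
                /* average over existing elements only: count is 27 in the
                 * interior, dropping to 18, 12, or 8 at faces, edges, corners */
                ubar[IDX(i, j, k)] = sum / count;
            }
        }
    }
}

The provided serial code implements the same idea; any refactoring you make (for example, avoiding the per-element bounds test in the interior) should, per the notes below, also be applied to the serial code so the performance comparison stays fair.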
Provided materials
In addition to this document you are provided with five code files: serial.c, params.h, serial.sh, part1.sbatch, and part2.sbatch. The file serial.c is a serial implementation of the model which you should modify to complete the coursework, and use to verify the correctness of your parallel implementations. The second is a header file, params.h, which contains the parameters used in the program. The third file is a simple run script for the serial code. The fourth and fifth files are templates for the compilation and execution of your code on Hamilton; you must ensure that your code functions correctly with these files as provided. Further, example good and bad reports are provided – their content may not be specifically related to this task and any data presented therein is fictionalized. These are meant to serve as guidance and to set expectations for the writing.
Notes and Warnings:
• If you submit new header files with your assignment², it is your responsibility to ensure the submission compiles and runs correctly on Hamilton 8 with the build scripts provided – I will not debug; code which does not run will receive no credit.
• If, for whatever reason, u0 < 0 or u0 > 1 then your results will be incorrect; ensure your initial condition is within 0 ≤ u0 ≤ 1. Likewise, if 0 ≤ u0 ≤ 1 and u is ever outside this range then your implementation is incorrect.
• The serial code records the trace of several diagnostic variables, namely the extrema of u, the global mean and variance,
    μ = N∞(i, j, k)⁻¹ Σ_{(i,j,k)} u_{i,j,k},    σ² = N∞(i, j, k)⁻¹ Σ_{(i,j,k)} |u_{i,j,k} − μ|².
Your code must produce the same outputs as the serial implementation for the same initial conditions.
• You may recognize the local mean computation as a ‘box blur’ or convolution operation. The implementation in the serial code is pedagogical and not optimized. You may be inspired to improve the implementation by refactoring some computations. In your performance investigations, any refactoring done for your parallel codes should also be applied to the serial code and described to ensure you are making a fair performance comparison.
• If you find yourself unable to complete a task, you should address why that is in the report. This may go some way to getting you partial credit, provided you explain your reasoning and demonstrate that you have considered how to do it in some detail.
• Reports which do not demonstrate meeting the learning outcomes or adequately address the expected content will receive a low mark; please see the good and bad example reports distributed with this coursework.
²Note: you should not need to submit new header files for this assignment!
In this assessment, you will compile and run a serial three-dimensional logistic map code, compare its performance against a parallelized version that you will write, and explore this comparison in a report. The serial code is made of six functions: init, dudt, step, stat, write, and main.
Part 1: OpenMP
The expectations for your parallel implementation are to use OpenMP #pragmas to:
• Parallelise the function init.
• Parallelise the function dudt.
• Parallelise the function step.
• Parallelise the function stat.
Your code should be in a single C file named part1.c. Your code must compile and run with the provided submission script part1.sbatch on Hamilton, and produce the same outputs as the serial code for equivalent initial conditions in a file named part1.dat.
Explain and justify your parallelization strategy, using arguments based in the topics covered in the module. Demonstrate the equivalence of the OpenMP-parallelized program outputs visually – e.g., by comparing the final states, or the statistical outputs in serial.dat and part1.dat, or something more creative; explain any discrepancies. Investigate the strong scaling of your implementation, reporting scaling results using transferable metrics in your report, and compare to theory-informed expectation. Your report should be no more than one (1) page (plus images), in a file named part1.pdf. Additional questions you may wish to consider when writing your report are listed below.
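As a reminder of the usual transferable metrics for strong scaling (a sketch of the standard definitions only, with T(p) the wall-clock time on p threads for a fixed problem size and f the parallelisable fraction of the runtime):

    speedup S(p) = T(1)/T(p),    parallel efficiency E(p) = S(p)/p,

with Amdahl's law, S(p) ≤ 1/((1 − f) + f/p), giving a theory-informed expectation to compare against.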
Questions you may wish to consider while writing: What options for parallelisation are available? Why are some more suitable than others? What difficulties arise in the parallelisation? Why have you not been asked to parallelise the write function? Where are the necessary synchronisation points? The solution statistics require the generation of a few output numbers from a large array; what patterns are available for this computation? How did you avoid data races in your solution? Is your parallelisation approach the best option? What alternative approaches could be used?
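Purely to illustrate the kind of patterns the questions above point at (this is not the required solution; it reuses the placeholder N and IDX from the earlier sketch, and all function names and signatures here are hypothetical rather than those of serial.c): the update loop has independent iterations and can be shared with a worksharing construct, while the statistics reduce a large array to a few scalars.

#include <omp.h>

/* Sketch only: reuses the assumed N and IDX placeholders from above. */

/* Pointwise update as in equation (1): each output element depends only on
 * the already-computed local mean, so iterations are independent. */
void step_sketch(const double *ubar, double *u, double r)
{
    #pragma omp parallel for collapse(3) schedule(static)
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            for (int k = 0; k < N; k++)
                u[IDX(i, j, k)] = r * ubar[IDX(i, j, k)] * (1.0 - ubar[IDX(i, j, k)]);
}

/* Diagnostics: reduction clauses give each thread a private partial result
 * and combine them at the end, avoiding data races on shared accumulators. */
void stat_sketch(const double *u, double *mean, double *var,
                 double *umin, double *umax)
{
    const double n = (double)N * N * N;
    double sum = 0.0, lo = u[0], hi = u[0];
    #pragma omp parallel for collapse(3) reduction(+:sum) \
        reduction(min:lo) reduction(max:hi)
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            for (int k = 0; k < N; k++) {
                double v = u[IDX(i, j, k)];
                sum += v;
                if (v < lo) lo = v;
                if (v > hi) hi = v;
            }
    double mu = sum / n;

    double ss = 0.0;  /* second pass so sigma^2 matches its definition above */
    #pragma omp parallel for collapse(3) reduction(+:ss)
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            for (int k = 0; k < N; k++) {
                double d = u[IDX(i, j, k)] - mu;
                ss += d * d;
            }

    *mean = mu;
    *var  = ss / n;
    *umin = lo;
    *umax = hi;
}

Whether a static schedule, collapse, or some other decomposition is most appropriate depends on the loop bounds and memory layout; that trade-off, and alternatives such as atomics or critical sections for the statistics (usually slower for this pattern), are exactly the things the report should discuss.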
Part 2: MPI
Your task is to parallelise the serial code using MPI, by distributing the original problem domain into distinct regions on each process. Particular care should be taken to ensure that your code compiles and runs, and that all outputs are correct (for deterministic inputs). For this part, your code must run correctly with 4 MPI processes. You need not stick to the functions as presented in the serial code. Your implementation should:
• Correctly distribute the allocation of u across processes.
• Correctly exchange necessary information across processes.
• Correctly calculate the statistical outputs when u is distributed across processes.
• Correctly evaluate the mapping of u across processes, taking into account the physical boundaries.
Your code should be a single C file called part2.c. Your code should compile and run, producing correct outputs, with the provided submission script (using 4 MPI processes), and produce the same outputs as the serial code in a file named part2.dat.
Explain and justify your parallelization strategy, using arguments based in the topics covered in the module. Demonstrate the equivalence of the MPI-parallelized program outputs visually – e.g., by comparing the final states, or the statistical outputs in serial.dat and part2.dat, or something more creative; explain any discrepancies. Investigate the weak scaling of your implementation. Report scaling results using transferable metrics in your report. Additional questions you may wish to consider in your report are listed below. Your report should be no more than one (1) page (plus images), in a file named part2.pdf.
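For weak scaling, the work per process is held fixed while the number of processes grows, so the transferable quantity is typically the weak-scaling efficiency, with Gustafson's law as the theory-informed expectation (again a sketch of standard definitions only, with T(p) the wall-clock time on p processes for a problem scaled proportionally to p and α the serial fraction):

    E(p) = T(1)/T(p),    S(p) = α + (1 − α)p.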
Questions you may wish to consider while writing: What topologies for distribution are available with 4 MPI processes? Why might some be preferred over others? What difficulties arise in the parallelisation? The solution statistics require the generation of a few output numbers from a large distributed array; what patterns are available for this problem? What are some constraints on the possible domain sizes and number of MPI processes for your solution – what choices influence these constraints and how do they affect the weak scaling?
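Again purely as an illustration of the communication patterns involved (not the required design), the sketch below assumes a one-dimensional slab decomposition of the i index with one ghost plane on each side, row-major storage, and the placeholder N from the earlier listings; every name here is hypothetical rather than taken from serial.c. Non-periodic boundaries are expressed through MPI_PROC_NULL, so no halo is exchanged across the physical boundaries.

#include <mpi.h>

/* Sketch only: N and all names below are illustrative placeholders.
 * Each rank owns local_n interior i-planes of N*N doubles, stored with
 * one ghost plane at each end: u has (local_n + 2) * N * N elements. */

/* One-dimensional, non-periodic Cartesian communicator. */
MPI_Comm make_cart(void)
{
    int size, dims[1] = {0}, periods[1] = {0};  /* non-periodic: physical boundaries */
    MPI_Comm cart;
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Dims_create(size, 1, dims);
    MPI_Cart_create(MPI_COMM_WORLD, 1, dims, periods, 1, &cart);
    return cart;
}

/* Exchange ghost planes with neighbouring ranks.  At the physical boundaries
 * MPI_Cart_shift returns MPI_PROC_NULL and the corresponding transfer does
 * nothing, so the ghost plane there is never filled and must be excluded from
 * the local averaging, exactly as the serial code excludes missing neighbours. */
void halo_exchange(double *u, int local_n, MPI_Comm cart)
{
    int lo, hi;
    MPI_Cart_shift(cart, 0, 1, &lo, &hi);
    const int plane = N * N;
    /* first interior plane goes down; upper ghost plane is filled from above */
    MPI_Sendrecv(&u[1 * plane], plane, MPI_DOUBLE, lo, 0,
                 &u[(local_n + 1) * plane], plane, MPI_DOUBLE, hi, 0,
                 cart, MPI_STATUS_IGNORE);
    /* last interior plane goes up; lower ghost plane is filled from below */
    MPI_Sendrecv(&u[local_n * plane], plane, MPI_DOUBLE, hi, 1,
                 &u[0], plane, MPI_DOUBLE, lo, 1,
                 cart, MPI_STATUS_IGNORE);
}

/* Global statistics from the distributed field: local partial results are
 * combined with collective reductions.  The variance needs the global mean
 * first, so it takes a second local pass and a second MPI_Allreduce (omitted). */
void distributed_stats(const double *u, int local_n, MPI_Comm cart,
                       double *mean, double *umin, double *umax)
{
    const int plane = N * N;
    double lsum = 0.0, lmin = u[plane], lmax = u[plane];
    for (int p = plane; p < (local_n + 1) * plane; p++) {  /* interior planes only */
        lsum += u[p];
        if (u[p] < lmin) lmin = u[p];
        if (u[p] > lmax) lmax = u[p];
    }
    double gsum;
    MPI_Allreduce(&lsum, &gsum, 1, MPI_DOUBLE, MPI_SUM, cart);
    MPI_Allreduce(&lmin, umin, 1, MPI_DOUBLE, MPI_MIN, cart);
    MPI_Allreduce(&lmax, umax, 1, MPI_DOUBLE, MPI_MAX, cart);
    *mean = gsum / ((double)N * N * N);
}

With 4 processes, 4 × 1 × 1, 2 × 2 × 1, and similar topologies are all possible; a slab split keeps each halo a single contiguous plane, whereas a 2 × 2 split exchanges smaller but non-contiguous faces and needs derived datatypes or packing. Which is preferable, and how the choice constrains the admissible domain sizes, is for your report to argue.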
Each part of your submission will be considered holistically, e.g. your code and report for Part 1 will be considered in tandem so that discrepancies between them (e.g., describing a parallel strategy in your report that is substantially different from the one you employed in your code) will affect your marks. Your code will be run for correctness on Hamilton. If you develop your programs on your own machine, then you should test that they work on Hamilton with the provided submission scripts.
Submission   Description
part1.c      Correct parallelization of the serial code using OpenMP, producing correct outputs.
part1.pdf    Description and justification of parallelisation scheme, presentation of transferable strong scaling results, and sufficient analysis of the scaling results using concepts and theory from the course.
part2.c      Correct parallelization of the serial model using MPI, producing correct outputs.
part2.pdf    Description and justification of parallelisation scheme, presentation of transferable weak scaling results, and analysis of the scaling results using concepts and theory from the course.

Table 1: Marking rubric for the summative coursework. Please see the report marking criteria in the Appendix.
Submission format
Your submission should be a single zip file, uploaded to gradescope, containing part1.c, part2.c, part1.pdf, and part2.pdf.
Generic coursework remarks
Stick exactly to the submission format as specified. If you alter the format (submit an archive instead of plain files, use Word documents rather than PDFs, …), the marker may refuse to mark the whole submission. Markers will not ask for missing files. If you have to submit code, ensure that this code does compile and, unless specified otherwise, does not require any manual interaction. Notably, markers will not debug your code, change parameters, or assess lines that are commented out.
All of MISCADA’s deadlines are hard deadlines: In accordance with University procedures, submissions that are up to 5 working days late will be subject to a cap of the module pass mark. Later submissions will receive a mark of zero. If you require an extension, please submit an official extension request including medical evidence and/or acknowledgement by college. Do not contact the lecturers directly, as lecturers are not entitled to grant extensions. Details on extensions and valid reasons to grant extended deadlines can be found in the Learning and Teaching Handbook.
It is the responsibility of the student to ensure that there are sufficient backups of their work and that coursework is submitted with sufficient slack. Submit your coursework ahead of time. If in doubt, submit early versions. Technical difficulties (slow internet connection around submission deadline, lost computer hardware, accidentally deleted files, . . . ) will not be mitigated. Please see https://www.dur.ac.uk/learningandteaching.handbook/6/2/6/ for further information regarding illness and adverse circumstances affecting your academic performance.
If collusion or plagiarism are detected, both students who copy and students who help to copy can be penalised. Do not share any coursework with other students, do not assist other students, cite all used sources incl. figures, code snippets, equations, … Please see https://www.dur.ac.uk/learningandteaching.handbook/6/2/4 and https://www.dur.ac.uk/learningandteaching.handbook/6/2/4/1 for further information.
Coursework is to be treated as private and confidential. Do not publish the whole or parts of the coursework publicly. This includes both solutions and the plain coursework as handed out.
Generic report quality criteria
Where summative coursework is assessed through written work in the form of a report, the report will be assessed against some generic criteria.
The relevant grade bands (as percentages) are
50–59   Pass
60–69   Merit
70–79   Distinction
80–100  Outstanding
A fail-level report displays an unsatisfactory knowledge and understanding of the topic. The setup and evaluation of any experimental studies is incomplete. It contains many omissions or factual inaccuracies. Limited in scope and shows little or no evidence of critical thinking and application of the course material to the problem. No recognition of limitations of the approach or evaluation. Experimental data are generally presented incorrectly, or without clarity.
A pass-level report displays some knowledge and understanding of the topic. The setup and evaluation of any experimental studies is competent. May contain some omissions or factual inaccuracies. Evidence of critical thinking and application of the course material to the problem occurs in some places. Has some recognition of limitations of the approach or evaluation. Most experimental data are presented correctly and clearly.
A merit-level report displays good knowledge and understanding of the topic as presented in the course material. The setup and evaluation of any experimental studies is careful and detailed. Broadly complete in scope, with few or no errors. Evidence of critical thinking and application of the course material to the problem is mostly clear throughout. Recognises limitations of the approach or evaluation, and has some discussion on how to overcome them. Experimental data are presented correctly and clearly.
A distinction-level report displays effectively complete knowledge and understanding of the topic. The setup and evaluation of any experimental studies is well-motivated and near- flawless. Effectively no errors. Evidence of critical thinking and application of the course material to the problem is clear throughout, and some of the discussion goes beyond the taught material. Recognises limitations of the approach or evaluation, and presents plausible approaches to overcome them. Experimental data are presented carefully and with attention to detail throughout.
An outstanding-level report is similar to a distinction-level report but is effectively flawless throughout, and shows a significant independent intellectual contribution going beyond the taught material.