Coursework: High-performance computing (resit)
Module: Introduction to HPC (PHYS 52015)
Term: Summer 2024
Description
Throughout the course we have considered simple programming problems largely distinct from the typical day-to-day practice of scientific computing. In this assignment you will experience a sliver of that practice by inheriting a codebase that it is your responsibility to make faster while maintaining correctness.
Consider the logistic growth equation,
$$\frac{\mathrm{d}u}{\mathrm{d}t} = r\,u\,(1 - u),$$
which models the self-limiting growth of a biological population. Under some assumptions, you may represent the state of this system as a map through discrete time (e.g., an annual value) to yield a discrete map or difference equation, $u \leftarrow r\,u\,(1 - u)$, where our assumptions of continuity no longer hold. This is an archetypal model of chaotic dynamics for some values of $r$, and is known as the Logistic Map. In the code supplied to you, we step a related difference equation,
$$u_{ijk} \leftarrow r\,\bar{u}_{ijk}\,(1 - \bar{u}_{ijk}), \tag{1}$$
where $\bar{u}$ denotes the local mean of $u$ in a patch of size $\ell^3$ (for $\ell$ odd) and $0 \le r \le 4$ is a fixed, real-valued, scalar parameter. Around index $(i, j, k)$ the local mean can be written as
$$\bar{u}_{ijk} = \frac{1}{|N_\ell(i,j,k)|} \sum_{(a,b,c)\,\in\,N_\ell(i,j,k)} u_{abc},$$
such that the neighborhood of index position $(i, j, k)$ is written
$$N_\ell(i,j,k) = \left\{(a,b,c) : \lVert (a - i,\; b - j,\; c - k) \rVert_\infty \le \tfrac{\ell - 1}{2}\right\},$$
where $\lVert\cdot\rVert_\infty$ is the Chebyshev-distance or $\infty$-norm or sup-norm: $\lVert x \rVert_\infty = \max(|x_1|, |x_2|, \ldots)$. The formula for $N_\ell(i, j, k)$ may be interpreted as "all indices whose maximum axis-aligned distance from $(i, j, k)$ is at most $(\ell - 1)/2$". You may think of equation (1) as modelling, for example, the probability that someone is ill depending on their proximity to others who may be ill, set in a (bizarre) cube environment, and mapped over a discrete interval.
In the provided serial code you will see that the local mean is computed by averaging a 3 × 3 × 3 chunk of the u array. Of course, at the boundaries of the domain there are missing elements in the local patch: element u_{010} has fewer neighbors than u_{111}, and element u_{000} has fewer neighbors than u_{001}. This is accounted for by computing the local patch averages over existing neighbor elements only. Your implementation should respect this effect near the physical boundaries, and care should be taken that this is correct for Part 2.
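For orientation only, here is a minimal sketch of how such a boundary-aware local mean might look in C. It is not the provided serial.c: the array names (u, ubar), the cube extent N, and the IDX indexing macro are all assumptions made for illustration.

#include <stdio.h>
#include <stdlib.h>
#include <stddef.h>

/* Illustrative sketch only: the names (u, ubar), the cube extent N, and
 * this IDX macro are assumptions, not the identifiers used in serial.c. */
#define IDX(i, j, k, N) (((size_t)(i) * (N) + (j)) * (N) + (k))

/* Boundary-aware 3x3x3 local mean: out-of-range indices are skipped, so
 * cells on faces, edges, and corners average over fewer elements. */
static void local_mean(const double *u, double *ubar, int N)
{
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            for (int k = 0; k < N; k++) {
                double sum = 0.0;
                int count = 0;
                for (int a = i - 1; a <= i + 1; a++)
                    for (int b = j - 1; b <= j + 1; b++)
                        for (int c = k - 1; c <= k + 1; c++)
                            if (a >= 0 && a < N && b >= 0 && b < N &&
                                c >= 0 && c < N) {
                                sum += u[IDX(a, b, c, N)];
                                count++;
                            }
                ubar[IDX(i, j, k, N)] = sum / count;
            }
}

int main(void)
{
    int N = 8;
    double *u = malloc((size_t)N * N * N * sizeof *u);
    double *ubar = malloc((size_t)N * N * N * sizeof *ubar);
    for (int n = 0; n < N * N * N; n++)
        u[n] = 0.5;                       /* any initial condition in [0, 1] */
    local_mean(u, ubar, N);
    printf("corner mean = %g\n", ubar[IDX(0, 0, 0, N)]);
    free(u);
    free(ubar);
    return 0;
}

A constant field stays constant under this average, so the printed corner mean equals 0.5; the corner cell averages over only the 2 × 2 × 2 = 8 elements that exist, in line with the boundary treatment described above.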
Provided materials
In addition to this document you are provided with five code files: serial.c, params.h, serial.sh, part1.sbatch, and part2.sbatch. The file serial.c is a serial implementation of the model which you should modify to complete the coursework, and use to verify the correctness of your parallel implementations. The second is a header file, params.h, which contains the parameters used in the program. The third file is a simple run script for the serial code. The fourth and fifth files are templates for the compilation and execution of your code on Hamilton; you must ensure that your code functions correctly with these files as provided. Further, example good and bad reports are provided – their content may not be specifically related to this task and any data presented therein is fictionalized. These are meant to serve as guidance and to set expectations for the writing.
Notes and Warnings:
• If you submit new header files with your assignment, it is your responsibility to ensure the submission compiles and runs correctly on Hamilton 8 with the build scripts provided – I will not debug; code which does not run will receive no credit.
• If, for whatever reason, u0 < 0 or u0 > 1 then your results will be incorrect; ensure your initial condition is within 0 ≤ u0 ≤ 1. Likewise, if 0 ≤ u0 ≤ 1 and u is ever outside this range then your implementation is incorrect.
• The serial code records the trace of several diagnostic variables, namely the extrema of u and the global mean and variance. Your code must produce the same outputs as the serial implementation for the same initial conditions.
• You may recognize the local mean computation as a ‘box blur’ or convolution operation. The implementation in the serial code is pedagogical and not optimized. You may be inspired to improve the implementation by refactoring some computations; one possible example is sketched after this list. In your performance investigations, any refactoring done for your parallel codes should also be applied to the serial code and described, to ensure you are making a fair performance comparison.
• If you find yourself unable to complete a task, you should address why that is in the report. This may go some way to getting you partial credit, provided you explain your reasoning and demonstrate that you have considered how to do it in some detail.
• Reports which do not demonstrate meeting the learning outcomes or adequately address the required report content will receive a low mark; please see the good and bad example reports distributed with this coursework.
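To make the fair-comparison point above concrete, here is one example of the kind of refactoring meant, offered purely as a hedged sketch: the number of in-range neighbors of each cell never changes between timesteps, so the per-cell divisor could be computed once and reused in both the serial and parallel builds. The identifiers and the IDX macro are assumptions consistent with the earlier sketch, not names from serial.c, and whether this actually helps depends on the real code.

#include <stdio.h>
#include <stdlib.h>
#include <stddef.h>

/* Hypothetical sketch: precompute the per-cell neighbor count once,
 * rather than recounting it every timestep. Assumes N >= 2 and the same
 * illustrative IDX layout as the earlier sketch. */
#define IDX(i, j, k, N) (((size_t)(i) * (N) + (j)) * (N) + (k))

static void precompute_counts(int *count, int N)
{
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            for (int k = 0; k < N; k++) {
                /* 3 in-range offsets along an interior axis, 2 at a boundary */
                int ni = (i > 0 && i < N - 1) ? 3 : 2;
                int nj = (j > 0 && j < N - 1) ? 3 : 2;
                int nk = (k > 0 && k < N - 1) ? 3 : 2;
                count[IDX(i, j, k, N)] = ni * nj * nk;
            }
}

int main(void)
{
    int N = 5;
    int *count = malloc((size_t)N * N * N * sizeof *count);
    precompute_counts(count, N);
    printf("corner: %d, centre: %d\n",
           count[IDX(0, 0, 0, N)], count[IDX(2, 2, 2, N)]); /* 8 and 27 */
    free(count);
    return 0;
}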
In this assessment you will compile and run a serial three-dimensional logistic map code, compare its performance against a parallelized version that you will write, and explore this comparison in a report. The serial code is made up of six functions: init, dudt, step, stat, write, and main.
Part 1: OpenMP
Code
The expectations for your parallel implementation are to use OpenMP #pragmas to:
• Parallelise the function init.
• Parallelise the function dudt.
• Parallelise the function step.
• Parallelise the function stat.
Your code should be in a single C file named part1.c. Your code must compile and run with the provided submission script part1.sbatch on Hamilton, and produce the same outputs as the serial code for equivalent initial conditions in a file named part1.dat.
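For orientation only, here is a minimal, self-contained sketch of the kind of OpenMP constructs the list above points at: a worksharing loop (as might appear in step or init) and reductions (as might appear in stat) to avoid data races on shared accumulators. The identifiers (copy_state, reduce_stats, n, u, unew) are assumptions, not the names or structure of the provided code.

/* Compile with OpenMP enabled, e.g. gcc -fopenmp. */
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical step-like loop: independent iterations shared across threads. */
static void copy_state(double *u, const double *unew, int n)
{
    #pragma omp parallel for schedule(static)
    for (int i = 0; i < n; i++)
        u[i] = unew[i];
}

/* Hypothetical stat-like loop: reductions avoid data races on accumulators. */
static void reduce_stats(const double *u, int n, double *mean, double *umax)
{
    double sum = 0.0, mx = u[0];
    #pragma omp parallel for reduction(+ : sum) reduction(max : mx)
    for (int i = 0; i < n; i++) {
        sum += u[i];
        if (u[i] > mx)
            mx = u[i];
    }
    *mean = sum / n;
    *umax = mx;
}

int main(void)
{
    int n = 1000;
    double *u = malloc(n * sizeof *u), *unew = malloc(n * sizeof *unew);
    for (int i = 0; i < n; i++)
        unew[i] = (double)i / n;
    copy_state(u, unew, n);
    double mean, umax;
    reduce_stats(u, n, &mean, &umax);
    printf("mean = %g, max = %g\n", mean, umax);
    free(u);
    free(unew);
    return 0;
}

Note that reduction(max : ...) requires OpenMP 3.1 or later; alternative patterns (critical sections, atomics, or per-thread partial results combined afterwards) are also possible and are worth weighing in your report.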
Report
Explain and justify your parallelization strategy, using arguments based in the topics covered in the module. Demonstrate the equivalence of the OpenMP-parallelized program outputs visually – e.g., by comparing the final states, or the statistical outputs in serial.dat and part1.dat, or something more creative; explain any discrepancies. Investigate the strong scaling of your implementation, reporting scaling results using transferable metrics in your report, and compare to theory-informed expectation. Your report should be no more than one (1) page (plus images), in a file named part1.pdf. Additional questions you may wish to consider when writing your report are listed below.
Questions you may wish to consider while writing: What options for parallelisation are available? Why are some more suitable than others? What difficulties arise in the parallelisation? Why have you not been asked to parallelise the write function? Where are the necessary synchronisation points? The solution statistics require the generation of a few output numbers from a large array; what patterns are available for this computation? How did you avoid data races in your solution? Is your parallelisation approach the best option? What alternative approaches could be used?
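Purely as a reminder of the usual definitions (defer to the module materials for the exact conventions), strong-scaling results are commonly reported via speedup and parallel efficiency, and compared against Amdahl's law:
$$S(p) = \frac{T_1}{T_p}, \qquad E(p) = \frac{S(p)}{p}, \qquad S_{\mathrm{Amdahl}}(p) = \frac{1}{(1 - f) + f/p},$$
where $T_p$ is the wall-clock time on $p$ threads for a fixed total problem size and $f$ is the fraction of the runtime that parallelises.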
Part 2: MPI
Code
Your task is to parallelise the serial code using MPI, by distributing the original problem domain into distinct regions on each process. Particular care should be taken to ensure that your code compiles, runs, and produces correct outputs (for deterministic inputs). For this part, your code must run correctly with 4 MPI processes. You need not stick to the functions as presented in the serial code. Your implementation should:
• Correctly distribute the allocation of u across processes.
• Correctly exchange necessary information across processes.
• Correctly calculate the statistical outputs when u is distributed across processes.
• Correctly evaluate the mapping of u across processes, taking into account the physical boundaries.
Your code should be a single C file called part2.c. Your code should compile and run, producing correct outputs, with the provided submission script (using 4 MPI processes), and produce the same outputs as the serial code in a file named part2.dat.
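For orientation only, here is a minimal, self-contained sketch of two patterns the list above implies: a ghost-layer (halo) exchange and a global reduction for the statistics. The 1-D slab decomposition, the plane and slab sizes, and all identifiers are assumptions for illustration – other topologies are possible with 4 processes, and a real part2.c also needs min/max reductions, the time-stepping itself, and the physical-boundary averaging.

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int nplane = 16 * 16;   /* elements in one i-plane (assumed size) */
    int nlocal = 4;         /* interior planes owned by this process (assumed) */
    /* Local slab with one ghost plane on each side (planes 0 and nlocal+1). */
    double *u = calloc((size_t)(nlocal + 2) * nplane, sizeof *u);
    for (int i = 0; i < (nlocal + 2) * nplane; i++)
        u[i] = 0.5;

    /* Physical boundaries do not wrap: end ranks talk to MPI_PROC_NULL,
     * and their unused ghost planes are simply never read. */
    int up = rank + 1, down = rank - 1;
    if (rank == size - 1) up = MPI_PROC_NULL;
    if (rank == 0)        down = MPI_PROC_NULL;

    /* Halo exchange: send my top interior plane up, receive my lower ghost. */
    MPI_Sendrecv(&u[nlocal * nplane], nplane, MPI_DOUBLE, up, 0,
                 &u[0], nplane, MPI_DOUBLE, down, 0,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    /* ...and the mirror image: send my bottom interior plane down. */
    MPI_Sendrecv(&u[1 * nplane], nplane, MPI_DOUBLE, down, 1,
                 &u[(nlocal + 1) * nplane], nplane, MPI_DOUBLE, up, 1,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);

    /* Distributed statistics: local partial sum, then a global reduction. */
    double local_sum = 0.0, global_sum;
    for (int i = nplane; i < (nlocal + 1) * nplane; i++)
        local_sum += u[i];
    MPI_Allreduce(&local_sum, &global_sum, 1, MPI_DOUBLE, MPI_SUM,
                  MPI_COMM_WORLD);
    if (rank == 0)
        printf("global mean = %g\n",
               global_sum / ((double)size * nlocal * nplane));

    free(u);
    MPI_Finalize();
    return 0;
}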
Report
Explain and justify your parallelization strategy, using arguments based in the topics covered in the module. Demonstrate the equivalence of the MPI-parallelized program outputs visually – e.g., by comparing the final states, or the statistical outputs in serial.dat and part2.dat, or something more creative; explain any discrepancies. Investigate the weak scaling of your implementation. Report scaling results using transferable metrics in your report. Additional questions you may wish to consider in your report are listed below. Your report should be no more than one (1) page (plus images), in a file named part2.pdf.
Questions you may wish to consider while writing: What topologies for distribution are available with 4 MPI processes? Why might some be preferred over others? What difficulties arise in the parallelisation? The solution statistics require the generation of a few output numbers from a large distributed array – what patterns are available for this problem? What are some constraints on the possible domain sizes and number of MPI processes for your solution – what choices influence these constraints, and how do they affect the weak scaling?
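Again purely as a reminder of the usual definitions (defer to the module materials for the exact conventions), weak-scaling results are commonly reported as an efficiency at fixed work per process, often compared against Gustafson's law:
$$E_{\mathrm{weak}}(p) = \frac{T_1}{T_p} \ \ \text{(work per process held fixed)}, \qquad S_{\mathrm{Gustafson}}(p) = (1 - f) + f\,p,$$
where $T_p$ is the time on $p$ processes for a problem whose total size grows in proportion to $p$, and $f$ is again the parallel fraction.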
Marking
Each part of your submission will be considered holistically, e.g. your code and report for Part 1 will be considered in tandem, so that discrepancies between them (e.g., describing a parallel strategy in your report that is substantially different from the one you employed in your code) will affect your marks. Your code will be run for correctness on Hamilton. If you develop your programs on your own machine, you should test that they work on Hamilton with the provided submission scripts.
Submission   Points   Description
part1.c      25       Correct parallelization of the serial code using OpenMP, producing correct outputs.
part1.pdf    25       Description and justification of parallelisation scheme, presentation of transferable strong scaling results, and sufficient analysis of the scaling results using concepts and theory from the course.
part2.c      25       Correct parallelization of the serial model using MPI, producing correct outputs.
part2.pdf    25       Description and justification of parallelisation scheme, presentation of transferable weak scaling results, and analysis of the scaling results using concepts and theory from the course.
Table 1: Marking rubric for the summative coursework. Please see the report marking criteria in the Appendix.
Submission format
Your submission should be a single zip file, uploaded to Gradescope, containing part1.c, part2.c, part1.pdf, and part2.pdf.