Assignment 2 (70 Points)
Due Monday Sep 23 at 11:59 PM
In this assignment, you need to parallelize simple programs using C++11 threads. There are two problems in this assignment, and for each problem you are provided the serial C++ implementation, the expected parallelization strategy, and the expected output to be generated by your parallel solution.
Before starting this assignment, you should have completed the Slurm Tutorial (https://canvas.sfu.ca/courses/84236/pages/slurm-tutorial) which walks you through how to use our private cluster for your code development. [NOTE: The cluster is not set up to run Slurm yet, so please start the assignment on CSIL machines. We will send an announcement when the cluster is ready.]
General Instructions
1. You are provided with the serial version of all the programs in assignment2.tar.gz (https://canvas.sfu.ca/courses/84236/files/24448336?wrap=1). To run a program (e.g., curve_area.cpp), follow the steps below:
Run make curve_area. This creates a binary file called curve_area.
Create a slurm job to run the binary file using the following command: ./curve_area --nPoints 10000000 --coeffA 2.0 --coeffB 4.0 --rSeed 15
Use the command-line argument --nPoints to specify the number of points to be generated (curve_area is described in detail below).
2. All parallel programs should have the command-line argument --nThreads to specify the number of threads for the program. Example: --nThreads 4.
3. While testing your solutions, make sure that cpus-per-task in your slurm config file is set to match the number of threads you intend to run.
4. You will be asked to print the time spent by different threads on specific code regions. The time spent by any code region can be computed as follows:
timer t1;
t1.start();
/* ---- Code region whose time is to be measured ---- */
double time_taken = t1.stop();
5. Sample outputs for all the programs can be found in the sample_outputs directory. Programs will be evaluated and graded automatically. Please make sure that your program output strictly follows the sample output format.
6. We have provided test scripts for you to quickly test your solutions during your development process. You can test your code using the test scripts available in test_scripts/. Note that these test scripts only validate the output formats, and a different evaluation script will be used for grading the assignments. Important: You should use slurm when performing these and other tests. The test scripts under the test_scripts/ folder test for up to 8 threads; make sure --cpus-per-task=8 is set in your slurm job.
$ ls test_scripts/*tester.pyc
curve_area_tester.pyc heat_transfer_tester.pyc
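Item 4 above relies on a timer object with start() and stop(). The class shipped with the assignment is not reproduced in this handout. As a rough sketch only, a stand-in with the same call shape could be built on std::chrono::steady_clock. The name simple_timer and its internals are assumptions made for illustration, not the provided implementation.

```cpp
#include <chrono>

// Hypothetical stand-in for the provided timer (an assumption, not the course code).
class simple_timer {
public:
    // Record the starting instant.
    void start() { begin_ = std::chrono::steady_clock::now(); }

    // Return the seconds elapsed since the last call to start().
    double stop() const {
        std::chrono::duration<double> elapsed =
            std::chrono::steady_clock::now() - begin_;
        return elapsed.count();
    }

private:
    std::chrono::steady_clock::time_point begin_;
};
```

A monotonic clock is used here so that the measured interval is not affected by wall-clock adjustments. Each thread can own its own timer object, so per-thread timings need no synchronization.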
1. Monte Carlo Estimation of Area Inside a Curve [25 Points]
The area inside an arbitrary curve can be computed using the mechanism explained in class, similar to the Monte Carlo Pi Estimation and Monte Carlo Ellipse Area Estimation. In this problem, you will compute the area inside a curve with the following equation:
a*x^2 + b*y^4 = 1
where a and b are positive numbers.
[Figures: two example curves for specific values of a and b]
The method can be summarized in the following steps:
1. Consider a curve that follows the above equation with coefficients a and b. For the purposes of this problem, both coefficients are >= 1. With these coefficient values, the curve will be completely enclosed inside a square with corner coordinates (-1,-1), (-1,1), (1,1), (1,-1).
2. The ratio of the curve area to the square area is determined by the relative count of points inside the curve to total points inside the square.
3. We randomly generate n points inside the square, where both the x-coordinate and y-coordinate are between -1 and 1. Let cpoints out of the n points fall inside the curve. A point (x, y) is determined to be inside the curve if a*x^2 + b*y^4 <= 1.
4. The area inside the curve is then approximated as: Area / 4 = cpoints / n ==> Area = 4 * cpoints / n .
The program below implements the above algorithm.
uint curve_count = 0;
double x_coord, y_coord;
for (uint i = 0; i < n; i++) {
    // Map the random value to a coordinate in [-1, 1]
    x_coord = (2.0 * get_random_coordinate(&random_seed)) - 1.0;
    y_coord = (2.0 * get_random_coordinate(&random_seed)) - 1.0;
    // The point is inside the curve if a*x^2 + b*y^4 <= 1
    if ((a * x_coord * x_coord) + (b * y_coord * y_coord * y_coord * y_coord) <= 1.0)
        curve_count++;
}
double area = 4.0 * (double)curve_count / (double)n;
Your goal is to parallelize the above algorithm. Specifically, we are interested in parallelizing the for loop such that each thread generates (approximately) n/T points, where T is the number of threads. Below is the pseudo-code showing the logic of our parallel solution:
Create T threads
for each thread in parallel {
    Get the local_curve_count for (approximately) n/T points
}
total_curve_points = Accumulate the local_curve_counts from all threads
area = 4.0 * (double)total_curve_points / (double)n;
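The pseudo-code above can be sketched with std::thread roughly as follows. This is an illustration under stated assumptions, not the expected solution: the function names count_inside and estimate_area are hypothetical, std::mt19937 replaces the serial code's get_random_coordinate() helper, and giving thread t the seed seed + t is just one possible choice. The required output and timing code are omitted.

```cpp
#include <cstdint>
#include <random>
#include <thread>
#include <vector>

// Count how many of `points` random samples in the square [-1,1] x [-1,1]
// satisfy a*x^2 + b*y^4 <= 1. std::mt19937 is an assumption; the serial
// code uses its own random-coordinate helper.
void count_inside(std::uint64_t points, double a, double b, unsigned seed,
                  std::uint64_t* local_count) {
    std::mt19937 gen(seed);
    std::uniform_real_distribution<double> coord(-1.0, 1.0);
    std::uint64_t inside = 0;
    for (std::uint64_t i = 0; i < points; i++) {
        double x = coord(gen);
        double y = coord(gen);
        if (a * x * x + b * y * y * y * y <= 1.0) inside++;
    }
    *local_count = inside;  // one write per thread, after its loop has finished
}

// Estimate the area with T threads, each handling roughly n/T points.
double estimate_area(std::uint64_t n, unsigned T, double a, double b,
                     unsigned seed) {
    std::vector<std::uint64_t> counts(T, 0);
    std::vector<std::thread> threads;
    for (unsigned t = 0; t < T; t++) {
        // Threads 0..r-1 take one extra point, where r = n % T.
        std::uint64_t points = n / T + (t < n % T ? 1 : 0);
        threads.emplace_back(count_inside, points, a, b, seed + t, &counts[t]);
    }
    std::uint64_t total = 0;
    for (unsigned t = 0; t < T; t++) {
        threads[t].join();   // wait for the thread before reading its count
        total += counts[t];
    }
    return 4.0 * static_cast<double>(total) / static_cast<double>(n);
}
```

Each thread keeps its count in a local variable and writes it out once, so the loop itself needs no locks or atomics. The counts are summed only after the corresponding join().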
The serial implementation is available in curve_area.cpp . You have to parallelize the given serial implementation using C++11 threads.
Your parallel solution must satisfy the following:
1. The file should be named curve_area_parallel.cpp .
2. Your program should accept the following parameters:
nThreads: Number of threads.
nPoints: Total number of points used for estimating the area. This number should be divided equally among threads (with the remainder r = nPoints % nThreads going to threads 0,...,r-1).
coeffA: Value of coefficient a.
coeffB: Value of coefficient b.
rSeed: Seed of the random number generator that you use in the program.
3. Your parallel solution must output the following information:
Total number of threads used.
For each thread: the number of random points generated, the number of points within the curve, and the time taken to generate and process these points (your threads should be numbered between [0, T)).
The total number of points generated.
The total number of points within the curve.
The total time taken for the entire execution (the code region to be timed is highlighted using comments in the serial code).
4. The sample output can be found in sample_outputs/curve_area.output.
Please note that the output format should strictly match the expected format (including "spaces" and "commas"). You can test your code using the test script as follows:
$ python <absolute_path>/curve_area_tester.pyc --execPath=<absolute path of curve_area_parallel> --scriptPath=<absolute path>/curve_area_evaluator.pyc
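The division rule for nPoints in item 2 above (an equal share for every thread, with the remainder r going to threads 0,...,r-1) can be written as a small helper. The function name points_per_thread is hypothetical and only serves this illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Split n_points across n_threads: every thread gets n_points / n_threads,
// and threads 0..r-1 get one extra point, where r = n_points % n_threads.
std::vector<std::uint64_t> points_per_thread(std::uint64_t n_points,
                                             unsigned n_threads) {
    assert(n_threads > 0);
    std::vector<std::uint64_t> share(n_threads, n_points / n_threads);
    std::uint64_t r = n_points % n_threads;
    for (std::uint64_t t = 0; t < r; t++) share[t] += 1;
    return share;
}
```

For example, 10 points over 4 threads gives shares of 3, 3, 2 and 2, which add up to the requested total.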
2. Heat Transfer [45 Points]
The heat transfer problem was described in class; a description is also available in the LLNL Parallel Computing Tutorial (https://hpc.llnl.gov/training/tutorials/introduction-parallel-computing-tutorial#ExamplesHeat). The following code shows the basic serial implementation for this problem:
// Initialize Temperature Array Prev[][]. Points in the middle are set to mTemp while the rest are set to 0
for (uint stepcount = 1; stepcount <= tSteps; stepcount++) {
    for (uint x = 0; x < gSize; x++) {
        for (uint y = 0; y < gSize; y++) {
            // Compute new Temperature Array Curr[x][y] from Prev[][] values
        } // for y
    } // for x
    // swap Prev[][], Curr[][]
} // for stepcount
// Print temperature of certain points in Temperature Array
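The "Compute new Temperature Array" step is left as a comment above. As a hedged sketch, the update below follows the formula given in the LLNL tutorial, where each point moves toward its four neighbours, scaled by the horizontal and vertical coefficients. The Grid alias, the step function name, and the choice to leave boundary points untouched are assumptions; the provided heat_transfer.cpp may treat edges differently.

```cpp
#include <cstddef>
#include <vector>

using Grid = std::vector<std::vector<double>>;

// One time step over the interior points of a square grid.
// Boundary handling is an assumption here (edge values are left untouched);
// the provided serial code may differ.
void step(const Grid& prev, Grid& curr, double cx, double cy) {
    const std::size_t g = prev.size();
    for (std::size_t x = 1; x + 1 < g; x++) {
        for (std::size_t y = 1; y + 1 < g; y++) {
            curr[x][y] = prev[x][y]
                + cx * (prev[x - 1][y] + prev[x + 1][y] - 2.0 * prev[x][y])
                + cy * (prev[x][y - 1] + prev[x][y + 1] - 2.0 * prev[x][y]);
        }
    }
}
```

Because every new value reads only from prev and writes only to curr, different columns of curr can be computed independently within one time step.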
Your goal is to parallelize the above algorithm such that each thread works on a vertical slice of the array. Below is the pseudo-code showing the logic of the parallel solution:
Create T threads
for each thread in parallel {
    for (uint local_stepcount = 1; local_stepcount <= tSteps; local_stepcount++) {
        Compute the Temperature Array values Curr[][] in the slice allocated to this thread from Prev[][]
        Barrier(); // all threads need to finish current time step
        if (this is thread 0) { swap Curr[][], Prev[][]; Barrier(); }
        else Barrier(); // wait till thread 0 is done with the swap before moving to next time step
    } // for local_stepcount
} // for each thread
// Print temperatures of points of interest
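The pseudo-code above depends on a Barrier() primitive that can be reused at every time step. As a minimal sketch, assuming a mutex and condition-variable design, a reusable barrier and the two-barrier pattern might look like the following. SimpleBarrier and run_steps are hypothetical names, and the barrier provided in core/utils.h may be implemented differently.

```cpp
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical reusable barrier (an assumption; core/utils.h may differ).
class SimpleBarrier {
public:
    explicit SimpleBarrier(unsigned count)
        : threshold_(count), waiting_(0), generation_(0) {}

    // Block until `count` threads have called wait(), then release them all.
    void wait() {
        std::unique_lock<std::mutex> lock(mutex_);
        unsigned long gen = generation_;
        if (++waiting_ == threshold_) {
            generation_++;      // last arrival opens a new generation
            waiting_ = 0;
            cv_.notify_all();
        } else {
            // The generation check guards against spurious wakeups and reuse.
            cv_.wait(lock, [&] { return gen != generation_; });
        }
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    const unsigned threshold_;
    unsigned waiting_;
    unsigned long generation_;
};

// Demo of the two-barrier pattern: each thread fills its own slot per step,
// then thread 0 alone folds the slots into a total between the two barriers.
unsigned long run_steps(unsigned T, unsigned steps) {
    SimpleBarrier barrier(T);
    std::vector<unsigned long> slot(T, 0);
    unsigned long total = 0;
    std::vector<std::thread> threads;
    for (unsigned id = 0; id < T; id++) {
        threads.emplace_back([&, id] {
            for (unsigned s = 0; s < steps; s++) {
                slot[id] = 1;     // stands in for computing this thread's slice
                barrier.wait();   // every thread has finished the step
                if (id == 0) {    // stands in for the swap done by thread 0
                    for (unsigned long v : slot) total += v;
                }
                barrier.wait();   // thread 0 is done; next step may begin
            }
        });
    }
    for (auto& th : threads) th.join();
    return total;
}
```

The generation counter lets the same barrier object be reused across time steps: a thread that reaches the next wait() early cannot be confused with threads still waking up from the previous one. With 4 threads and 10 steps, run_steps returns 40.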
The serial implementation is available in heat_transfer.cpp. You have to parallelize the given serial implementation using C++11 threads. For your parallel code, you can use the custom barrier implementation available in core/utils.h or you can implement your own.
Your parallel solution must satisfy the following:
1. The file should be named heat_transfer_parallel.cpp. Your program should accept the following parameters:
nThreads: Total number of threads used.
gSize: Grid size. The size of the temperature array is gSize x gSize.
mTemp: Temperature values in the middle of the array, from [gSize/3, gSize/3] to [2*gSize/3, 2*gSize/3].
iCX: Coefficient of horizontal heat transfer.
iCY: Coefficient of vertical heat transfer.
tSteps: Time steps of the simulation.
2. Your parallel solution must output the following information:
Grid Size.
Total number of threads used.
Values of iCX, iCY, mTemp and tSteps.
For each thread: thread id, start column, end column, time taken.
Temperatures at end of simulation for points at [0,0], [gSize/6, gSize/6], [gSize/3, gSize/3], [gSize/2, gSize/2], [2*gSize/3, 2*gSize/3], [5*gSize/6, 5*gSize/6].
Temperatures at the right boundary of all threads: [endx[0], endx[0]], [endx[1], endx[1]], ..., [endx[nThreads-1], endx[nThreads-1]].
The total time taken for the entire execution (the code region to be timed is highlighted using comments in the serial code).
3. The sample console output can be found in sample_outputs/heat_transfer.output .
Please note that the output format should strictly match the expected format (including "spaces" and "commas"). You can test your code using the test script as follows:
$ python <path to your directory>/test_scripts/heat_transfer_tester.pyc --execPath=<absolute path of heat_transfer_parallel> --scriptPath=<absolute path of heat_transfer_evaluator.pyc>
Submission Guidelines
Make sure that your solutions folder has the following files and sub-folders. Let's say your solutions folder is called my_assignment2_solutions. It should contain:
core/
-- The folder containing all core files. It is already available in the assignment package. Do not modify it or remove any files.
Makefile
-- Makefile for the assignment. This file should not be changed.
curve_area_parallel.cpp
heat_transfer_parallel.cpp
To create the submission file, follow the steps below:
1. Enter your solutions folder, and remove all the object/temporary files.
$ cd my_assignment2_solutions/
$ make clean
2. Create the tar.gz file.
$ tar cvzf assignment2.tar.gz *
This creates a compressed tarball that contains the contents of the folder.
3. Validate the tar ball using the submission_validator.pyc script.
$ python <path to your directory>/test_scripts/submission_validator.pyc --tarPath=<absolute path>/assignment2.tar.gz
Submit via canvas by the deadline.