COMP 328 MPI

University of Liverpool Continuous Assessment COMP 328

In this assignment, you are asked to implement a linear algebra operation that is at the heart
of many computations we do. You do not need to have previously studied linear algebra
since this document explains the operation in detail. You are encouraged to use your spare
time in your labs to work on this assignment. The teaching team would be happy to offer
clarifications in the labs/lectures/otherwise.

1 Two-dimensional Box stencils

This is a generalised two-dimensional version of the stencil operation we discussed in class.

InputImagem,n =

Filterk,k =

1 0 −11 0 −1

Outputm,n =

4 6 −9 −8 4
3 −3 −2 −3 4
2 −3 0 −2 7

Figure 1: An example of this stencil being applied

This operation takes as input two matrices (two-dimensional arrays). We call one of them
the ”input image” and the other the ”filter”. The input image can be assumed to be at least
the same size as the filter if not larger, in both dimensions. The filter can be assumed to
always be square, while the input image need not be a square matrix (i.e. its two dimensions
need not be the same). The operation produces as output, an ”output image” that is the
same size as the input. The input and output images are of dimensions m×n while the filter
is of dimensions k × k, and we can assume m >= k, n >= k.
The operation is applied point-wise to the input image. To start, we identify the centre
point of the filter. This is element (km, km), where km = ceiling(k/2), assuming (1, 1) is
the top-left element. To calculate element (i, j) of the output matrix, the filter is centred on
the point (i, j) of the input matrix, and the corresponding elements of the input matrix are
multiplied by the elements of the filter matrix. These products are then all summed up to
get the value of the output matrix at point (i, j). For the example in Figure 1, the filter is
centred on point (2, 2), which has the value 5. To calculate the value of the output element
(2, 2), we now multiply all corresponding elements of the input matrix and the filter matrix,
i.e. 7 ∗ 1 + 2 ∗ 0 + 3 ∗ (−1) + 4 ∗ 1 + 5 ∗ 0 + 3 ∗ (−1) + 3 ∗ 1 + 3 ∗ 0 + 2 ∗ (−1) = 6. Similarly,

2022-2023 1

University of Liverpool Continuous Assessment COMP 328

for element (2, 3) of the output matrix, the value is calculated as 2 ∗ 1 + 3 ∗ 0 + 3 ∗ (−1) +
5 ∗ 1 + 3 ∗ 0 + 8 ∗ (−1) + 3 ∗ 1 + 2 ∗ 0 + 8 ∗ (−1) = −9.
Note that we would never want to read outside the bounds of the input array, nor pad the
input array with zeros. Therefore this operation will leave a ”boundary region” where the
values of the output matrix are not yet defined. For example, while the input has rows from
1 to m, the filter would start to be applied at row 1 + blower and end at row m− bupper. We
will follow blower = floor(k − 1)/2 and bupper = ceiling(k − 1)/2. For the elements in this
boundary region at the beginning and end of each dimension, we copy the corresponding
elements from the input image to the output.

You are asked to implement this operation in a C function. This function should have the
following signature:

void s t e n c i l ( f loat ∗ input vec , int m, int n , f loat ∗ f i l t e r v e c , int k ,
f loat ∗ output vec , int b) {

f loat (∗ input ) [m] [ n ] = ( f loat ( ∗ ) [m] [ n ] ) input vec ;
f loat (∗ f i l t e r ) [ k ] = ( f loat ( ∗ ) [ k ] ) f i l t e r v e c ;
f loat (∗ output ) [m] [ n ] = ( f loat ( ∗ ) [m] [ n ] ) output vec ;
// Your code s t a r t s here

Here, ∗input vec, ∗ filter vec , and ∗output vec are pre-allocated one-dimensional arrays of
the appropriate size, being passed as pointers to the first element. The provided lines of code
change the 1-D arrays into the appropriate dimensionality. The input and output arrays
are changed to 3-D, so that b matrices of size m × n are processed in a single function call,
while the filter matrix is changed to 2-D as described above. In other words, your code is
expected to perform this operation in batches of size b. After the given three lines of code,
this function can treat input and output as 3-D arrays of size [b ][m][n], and kernel as a 2-D
array of size [k ][ k]. e.g. the following line is valid:

output [ 0 ] [ 0 ] [ 0 ] = input [ 0 ] [ 0 ] [ 0 ] ∗ ke rne l [ 0 ] [ 0 ] ;

1.1 Distributed implementation

This time, you are asked to implement this operation in a complete C program (i.e. including
your own main function). Your program should be able to run with the following command:

$ mpirun −np . / program . exe < i n p u t d a t a f i l e>

Here, is the file containing the values for the input array, < kernel data file>
is the file containing the values for the kernel array, and is the name of
the file your program is expected to write the output array to. For this operation,
all data files will contain two-dimensional data. Details of the data file format are given in
Section 1.2.

2022-2023 2

University of Liverpool Continuous Assessment COMP 328

1.2 Data file format

The first line of each file has some integers, each followed by a space. These integers define
the dimensionality of the data in the file. The second line of the data file is a space-separated
list of all the values in the array flattened out in a row-major one-dimensional format, each
with 7 digits after the decimal.

You are provided with the code for reading from the input and kernel data files and to write
to the output data file. These are located in main-serial.c and main-mpi.c.

Instructions

• Implement a multi-threaded stencil using OpenMP. Save it in a file called stencil.c.

• Modify the main-serial.c file so that it reads from the input and kernel data files, calls
your OpenMP stencil function, and writes to the output data file. Use the output to
make sure your implementation is correct. Ensure your code is saved as main-serial.c .

• Modify the main-mpi.c file so that it performs the same functionality as main-serial.c
but distributes the input matrices over multiple MPI processes. Ensure it is saved as
main-mpi.c.

• Write a Makefile that includes instructions to compile your programs. We will discuss
this in the labs.

• Try running your program for 1, 2, 4, 8, 16 and 32 threads, measuring the time taken
in each instance. Use this to plot a speedup plot with speedup on the y-axis and the
number of threads on the x-axis. Ensure you do this with the large data file.

• For the fastest running instance, run this with a varying number of ranks over MPI.
Draw a strong-scaling plot. Specify clearly how many threads were used.

• When testing your program on barkla to get your results for the speedup and strong-
scaling plot, ensure you test with the input file ‘input 64 512 960.dat‘ and the filter file
‘kernel 5.dat‘.

• Using up to one page, describe the serial version, your parallelisation strategy, and how
you conducted the measurements and calculations to draw the above plots. Also include
screenshots of running/compiling your programs, making sure that the screenshot shows
your username. The page limit does not include images/tables.

• Your submission should include:

1. stencil.c – the parallel implementation using OpenMP.

2022-2023 3

University of Liverpool Continuous Assessment COMP 328

2. main-serial.c – a main function that calls the function defined in stencil.c to perform
the operations described above

3. main-mpi.c – the complete implementation using OpenMP and MPI.

4. Makefile – a makefile that can compile 4 different programs when given the follow-
ing commands:

(a) make gccserial – compiles ‘main-serial.c‘ and ‘stencil.c‘ into
‘stencil-omp-gcc.exe‘ with the GNU compiler (gcc)

(b) make gcccomplete – compiles ‘main-mpi.c‘ and ‘stencil.c‘ into
‘stencil-complete-gcc.exe‘ with the GNU mpi compiler (mpicc)

(c) make iccserial – compiles ‘main-serial.c‘ and ‘stencil.c‘ into
‘stencil-omp-icc.exe‘ with the Intel compiler (icc)

(d) make icccomplete – compiles ‘main-mpi.c‘ and ‘stencil.c‘ into
‘stencil-complete-icc.exe‘ with the Intel mpi compiler (mpiicc)

5. A word file containing the plots, descriptions, and screenshots. This should be
called ”Report.docx” or ”Report.doc”.

6. Your slurm script called ”slurm-stencil.sh”.

• This should be uploaded in Codegrade, following the instructions present there.

• Failure to follow any of the above instructions is likely to lead to reduction in scores.

If you get any segmentation faults when running your program, use a tool called gdb to help
debug. Read its manual to understand how to use it.

Make sure to test your code with small as well as big matrices.

Runtimes quickly become meaningless for programs that run for less than 0.5 seconds. If
you measure any program runtimes below 0.5 seconds, you should consider trying again with
bigger input sizes.

You will need to use malloc to allocate memory for the arrays. An example of this is given
to you in main-serial.c and main-mpi.c

2022-2023 4

University of Liverpool Continuous Assessment COMP 328

Marking scheme

1 Code that compiles without errors or warnings 15%
2 Same numerical results for test cases 20%
3 Speedup plot 10%
4 Strong scaling plot 10%
5 Scaling efficiency up to 32 threads 15%
6 Scaling efficiency up to 32 ranks 10%
11 Clean code and comments 10%
12 Report 10%

Table 1: Marking scheme

The purpose of this assessment is to develop your skills in analysing numerical programs
and developing parallel programs using OpenMP and MPI. This assessment contributes 20%
to the overall mark in the module. The scores from the above marking scheme will be
scaled down to reflect that. Failure in this assessment may be compensated for by higher
marks in other components of the module. Your work will be submitted to automatic pla-
giarism/collusion detection systems, and those exceeding a threshold will be reported to
the Academic Integrity Officer for investigation regarding adhesion to the university’s policy
https://www.liverpool.ac.uk/media/livacuk/tqsd/code-of-practice-on-assessmen

t/appendix_L_cop_assess.pdf.

The deadline is 17:00 GMT Wednesday 10 May 2023.

Usual university rules, penalties and exceptions for lateness/sickness & exceptional circum-
stances apply

https://www.liverpool.ac.uk/aqsd/academic-codes-of-practice/code-of-practic

e-on-assessment/

2022-2023 5

https://www.liverpool.ac.uk/media/livacuk/tqsd/code-of-practice-on-assessment/appendix_L_cop_assess.pdf
https://www.liverpool.ac.uk/media/livacuk/tqsd/code-of-practice-on-assessment/appendix_L_cop_assess.pdf
https://www.liverpool.ac.uk/aqsd/academic-codes-of-practice/code-of-practice-on-assessment/
https://www.liverpool.ac.uk/aqsd/academic-codes-of-practice/code-of-practice-on-assessment/

Two-dimensional Box stencils
Distributed implementation
Data file format