# solution

Below you will find four sequential codes. Augment the code snippets with OpenMP pragmas if applicable. You have to make sure that no dependencies are broken and thus the algorithm still computes correct result. If the algorithm is parallelizable, briefly discuss your parallelization strategy. Otherwise, explain why it cannot be parallelized. Justify your claims.

a) (Dense Matrix Multiplication)
1 // computes the product of an M x L matrix A
2 // with an L x N matrix B
3 for (i = 0; i
4 for (j = 0; j
5 value_t sum = 0;
6 for (k = 0; k
7 sum += A[i*L+k] * B[k*N+j];
8 C[i*N+j] = sum;
9 }

(b) (Pseudo-Polynomial Knapsack Using Dynamic Programming)
1
2 #define AT(i,j) ((i)*(C+1)+(j))
3 #define MAX(x,y) ((x)
4
5 // w and v are __constant__ and __non-negative__
6 // arrays of length N, m is an array of length
7 // (C+1)*(N+1) initialized with zeros
8 for (i = 1; i
9 for (j = 0; j
10 if (w[i-1]
11 m[AT(i,j)] = MAX(m[AT(i-1,j)],
12 m[AT(i-1,j-w[i-1])]+v[i-1]);
13 else
14 m[AT(i,j)] = m[AT(i-1, j)];
(c) (Left Fold of a Binary Operation)
1 value_t result = 1;
2
3 // v is a __constant__ array of length N
4 for (i = 0; i
5 result = result + v[i] + result * v[i];
6
7 // bonus: can you solve this without a custom
8 // declare reduction directive?
(d) (Repetitive Smoothing of a Vector)
1 // v is a pre-initialized array of length N
2 // s is the smoothed version of v, preinitialized with v
3 // M is the number of iterations
4
5 for (i = 0; i
6 for (j = 2; j
7 s[j] = 0;
8 for (k = -2; k
9 s[j] += 0.2*v[j+k];
10 }
11 for (j = 0; j
12 v[j] = s[j];
13