Removed file links from document.

F. Dijkstra 2017-06-27 14:36:45 +02:00
parent 9e3dbd99b7
commit b95b8958d6
2 changed files with 9 additions and 9 deletions


@@ -207,7 +207,7 @@
"source": [
"### Benchmarking\n",
"\n",
-"Before you start modifying code and adding OpenACC directives, you should benchmark the serial version of the program. To facilitate benchmarking after this and every other step in our parallel porting effort, we have built a timing routine around the main structure of our program -- a process we recommend you follow in your own efforts. Let's run the [`task1.c`](/4vwkFv7K/edit/C/task1/task1.c) file without making any changes -- using the *-fast* set of compiler options on the serial version of the Jacobi Iteration program -- and see how fast the serial program executes. This will establish a baseline for future comparisons. Execute the following two commands to compile and run the program."
+"Before you start modifying code and adding OpenACC directives, you should benchmark the serial version of the program. To facilitate benchmarking after this and every other step in our parallel porting effort, we have built a timing routine around the main structure of our program -- a process we recommend you follow in your own efforts. Let's run the `task1.c` file without making any changes -- using the *-fast* set of compiler options on the serial version of the Jacobi Iteration program -- and see how fast the serial program executes. This will establish a baseline for future comparisons. Execute the following two commands to compile and run the program."
]
},
{
@@ -298,7 +298,7 @@
"source": [
"### Profiling\n",
"\n",
-"Back to our lab. Your objective in the step after this one (Step 2) will be to modify [`task2.c`](/4vwkFv7K/edit/C/task2/task2.c) in a way that moves the most computationally intensive, independent loops to the accelerator. With a simple code, you can identify which loops are candidates for acceleration with a little bit of code inspection. On more complex codes, a great way to find these computationally intense areas is to use a profiler (such as PGI's pgprof, NVIDIA's nvprof or open-source *gprof*) to determine which functions are consuming the largest amounts of compute time. To profile a C program on your own workstation, you'd type the lines below on the command line, but in this workshop, you just need to execute the following command, and then click on the link below it to see the pgprof interface."
+"Back to our lab. Your objective in the step after this one (Step 2) will be to modify `task2.c` in a way that moves the most computationally intensive, independent loops to the accelerator. With a simple code, you can identify which loops are candidates for acceleration with a little bit of code inspection. On more complex codes, a great way to find these computationally intense areas is to use a profiler (such as PGI's pgprof, NVIDIA's nvprof or open-source *gprof*) to determine which functions are consuming the largest amounts of compute time. To profile a C program on your own workstation, you'd type the lines below on the command line, but in this workshop, you just need to execute the following command, and then click on the link below it to see the pgprof interface."
]
},
{
@@ -424,7 +424,7 @@
"source": [
"One, two or several loops may be inside the structured block, the kernels directive will try to parallelize it, telling you what it found and generating as many kernels as it thinks it safely can. At some point, you will encounter the OpenACC *parallel* directive, which provides another method for defining compute regions in OpenACC. For now, let's drop in a simple OpenACC `kernels` directive in front of and embracing *both* the two for-loop codeblocks that follow the while loop using curly braces. The kernels directive is designed to find the parallel acceleration opportunities implicit in the for-loops in the Jacobi Iteration code. \n",
"\n",
-"To get some hints about how and where to place your kernels directives, click on the links below. When you feel you are done, **make sure to save the [`task2.c`](/4vwkFv7K/edit/C/task2/task2.c) file you've modified with File -> Save, and continue on.** If you get completely stuck, you can look at [task2_solution.c](/4vwkFv7K/edit/C/task2/task2_solution.c) to see the answer."
+"To get some hints about how and where to place your kernels directives, click on the links below. When you feel you are done, **make sure to save the `task2.c` file you've modified with File -> Save, and continue on.** If you get completely stuck, you can look at `task2_solution.c` to see the answer."
]
},
{
@@ -439,7 +439,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
-"Let's now compile our [`task2.c`](/4vwkFv7K/edit/C/task2/task2.c) file by executing the command below with Ctrl-Enter (or press the play button in the toolbar above). Note that we've now added a new compiler option `-ta` to specify the type of accelerator to use. We've set it to `tesla` as we're using NVIDIA GPUs in this lab."
+"Let's now compile our `task2.c` file by executing the command below. Note that we've now added a new compiler option `-ta` to specify the type of accelerator to use. We've set it to `tesla` as we're using NVIDIA GPUs in this lab."
]
},
{
@@ -581,7 +581,7 @@
"\n",
"For detailed information on the `data` directive clauses, you can refer to the [OpenACC 2.5](http://www.openacc.org/sites/default/files/OpenACC_2pt5.pdf) specification.\n",
"\n",
-"In the [`task3.c`](/4vwkFv7K/edit/C/task3/task3.c) file, see if you can add in a `data` directive to minimize data transfers in the Jacobi Iteration. There's a place for the `create` clause in this exercise too. As usual, there are some hints provided, and you can look at [`task3_solution.c`](/4vwkFv7K/edit/C/task3/task3_solution.c) to see the answer if you get stuck or want to check your work. **Don't forget to save with File -> Save in the editor below before moving on.**"
+"In the `task3.c` file, see if you can add in a `data` directive to minimize data transfers in the Jacobi Iteration. There's a place for the `create` clause in this exercise too. As usual, there are some hints provided, and you can look at `task3_solution.c` to see the answer if you get stuck or want to check your work. **Don't forget to save with File -> Save in the editor below before moving on.**"
]
},
{
@@ -597,7 +597,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
-"Once you think you have [`task3.c`](/4vwkFv7K/edit/C/task3/task3.c) saved with a directive to manage data transfer, compile it with the below command and note the changes in the compiler output in the areas discussing data movement (lines starting with `Generating ...`). Then modify `Anew` using the `create` clause, if you haven't yet, and compile again."
+"Once you think you have `task3.c` saved with a directive to manage data transfer, compile it with the below command and note the changes in the compiler output in the areas discussing data movement (lines starting with `Generating ...`). Then modify `Anew` using the `create` clause, if you haven't yet, and compile again."
]
},
{
@@ -751,7 +751,7 @@
"| | | | | 16 | 32 | 0.410 |\n",
"| | | | | 4 | 64 | 0.379 |\n",
"\n",
-"Try to modify the [`task4.c`](/4vwkFv7K/edit/C/task4/task4.c) code for the main computational loop nests in the window below. You'll be using the OpenACC loop clauses `gang()` and `vector()`. Look at `task4_solution.c` if you get stuck:\n"
+"Try to modify the `task4.c` code for the main computational loop nests in the window below. You'll be using the OpenACC loop clauses `gang()` and `vector()`. Look at `task4_solution.c` if you get stuck:\n"
]
},
{
@@ -796,7 +796,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
-"Looking at [task4_solution.c](/4vwkFv7K/edit/C/task4/task4_solution.c), the `gang(8)` clause on the inner loop tells the compiler to launch 8 blocks in the X (column) direction. The `vector(32)` clause on the inner loop tells the compiler to use blocks that are 32 threads (one warp) wide. The absence of a clause on the outer loop lets the compiler decide how many rows of threads and how many blocks to use in the Y (row) direction. We can see what it says, again, with:"
+"Looking at `task4_solution.c`, the `gang(8)` clause on the inner loop tells the compiler to launch 8 blocks in the X (column) direction. The `vector(32)` clause on the inner loop tells the compiler to use blocks that are 32 threads (one warp) wide. The absence of a clause on the outer loop lets the compiler decide how many rows of threads and how many blocks to use in the Y (row) direction. We can see what it says, again, with:"
]
},
{
@@ -877,7 +877,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
-"At this point, some of you may be wondering what kind of speed-up we get against the OpenMP version of this code. If you look at [task1_omp.c](/4vwkFv7K/edit/C/task4/task1_omp.c) in the text editor above, you can see a simple OpenMP version of the Jacobi Iteration code. Running this using 8 OpenMP threads on an Intel Xeon E5-2670, our Kepler GK520 is about 2X faster. If we scale the matrix up to an even larger 4096x4096, our Kepler GK520 GPU becomes significantly faster than the 8-OpenMP-thread version. If you have some time remaining in this lab, feel free to compile & run the OpenMP and OpenACC versions below with the larger matrices.\n",
+"At this point, some of you may be wondering what kind of speed-up we get against the OpenMP version of this code. If you look at `task1_omp.c` in the text editor above, you can see a simple OpenMP version of the Jacobi Iteration code. Running this using 8 OpenMP threads on an Intel Xeon E5-2670, our Kepler GK520 is about 2X faster. If we scale the matrix up to an even larger 4096x4096, our Kepler GK520 GPU becomes significantly faster than the 8-OpenMP-thread version. If you have some time remaining in this lab, feel free to compile & run the OpenMP and OpenACC versions below with the larger matrices.\n",
"\n",
"First, compile the OpenMP version:"
]

Binary file not shown.