Difference between revisions of "Openacc"
Latest revision as of 21:59, 2 May 2016
Matrix Multiply
TAU v2.25.1 supports the OpenACC directives available in PGI 12.3 and later. TAU provides instrumentation at the PGI runtime library layer with detailed source information. This simple matrix multiply application, written with OpenACC annotations, was compiled with the PGI -ta=nvidia flag to generate the executable. To profile this application with TAU:
Configure TAU:
./configure -c++=pgCC -cc=pgcc -fortran=pgi
make install
export TAU_MAKEFILE=<path to TAU>/x86_64/lib/Makefile.tau-pgi
Compile:
make
Run:
tau_exec -T pgi -openacc ./mm
Use TAU's analysis tools to view the performance data:
pprof
paraprof
Here we see the time spent in the PGI runtime library routines. The download time for variable 'a' in the source code dominates the execution. The nature of each operation is shown in parentheses.
Next, this data is presented in ParaProf's thread statistics window.
The driver code.
Clicking on a runtime layer routine shows the function in the application where the kernel was invoked, along with the associated variable, the source line number, and the size of the array. Right-clicking and choosing 'Show Source Code' shows the source line where this transfer takes place. For the downloadxx_multiply_matrices routine with the variable 'a', the time is attributed on the host at the source location shown below. It represents the transfer time plus the time the host spends waiting for results to be returned from the GPU.
OpenACC example source code
The matrix multiply example using the OpenACC directives, together with the Makefile used to build and run it with TAU.
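For readers without the example at hand, the following is a minimal sketch of what an OpenACC matrix multiply routine of this kind might look like. It is not the source shipped with the example. The routine name multiply_matrices and the output variable a echo the names seen in the profile above; the input names b and c, the flat row-major layout, and the choice of data clauses are assumptions made for illustration.

```c
#include <assert.h>

/* Illustrative sketch only: computes a = b * c for n x n row-major matrices.
 * The names b and c and the data clauses are assumptions, not taken from
 * the actual example source. */
void multiply_matrices(int n, const float *restrict b,
                       const float *restrict c, float *restrict a)
{
    assert(n > 0);

    /* copyin: b and c are transferred to the device before the region.
     * copyout: a is transferred back to the host after the region; this
     * is the kind of transfer a profile would report as a download of a. */
    #pragma acc kernels copyin(b[0:n*n], c[0:n*n]) copyout(a[0:n*n])
    {
        #pragma acc loop independent
        for (int i = 0; i < n; ++i) {
            #pragma acc loop independent
            for (int j = 0; j < n; ++j) {
                float sum = 0.0f;
                /* Dot product of row i of b with column j of c. */
                for (int k = 0; k < n; ++k)
                    sum += b[i * n + k] * c[k * n + j];
                a[i * n + j] = sum;
            }
        }
    }
}
```

A compiler without OpenACC support ignores the pragmas and runs the loops on the host, so the routine can be checked for correctness before it is built with the PGI -ta=nvidia flag and profiled.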