Difference between revisions of "Cruft"
Revision as of 22:23, 8 March 2012
Background
Link | Code | Version | Machine | Date |
---|---|---|---|---|
LLNL website | git repo | Kyle Spafford fork | Keeneland | March 2012 |
Building Cruft
Modify CMakeLists.txt and add these lines:
set (CMAKE_CXX_COMPILER tau_cxx.sh)
set (CMAKE_C_COMPILER tau_cc.sh)
Then run:
cmake .
You can safely proceed when you encounter reversions.
Selective instrumentation of Loops:
BEGIN_INSTRUMENT_SECTION
loops file="eam.c" routine="eamForce#"
loops file="ljForce.c" routine="LJ#"
END_INSTRUMENT_SECTION
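One way to create this file from the shell is a here-document. This is a minimal sketch, assuming the file is called select.tau and is written in the directory where make will be run, which matches the -optTauSelectFile path used in TAU_OPTIONS on this page:

```shell
# Write the selective-instrumentation file into the current directory.
# The quoted 'EOF' stops the shell from expanding anything inside the block.
cat > select.tau <<'EOF'
BEGIN_INSTRUMENT_SECTION
loops file="eam.c" routine="eamForce#"
loops file="ljForce.c" routine="LJ#"
END_INSTRUMENT_SECTION
EOF
```

The trailing # in each routine name is kept exactly as given above.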
For the OpenCL binary, edit src-ocl/eam_kernels.c and move this section above the line typedef CL_REAL_T real_t; so that the double-precision extension is enabled before real_t is defined:
#if defined(cl_khr_fp64) // Khronos extension available?
#pragma OPENCL EXTENSION cl_khr_fp64 : enable
#elif defined(cl_amd_fp64) // AMD extension available?
#pragma OPENCL EXTENSION cl_amd_fp64 : enable
#endif
Then set:
export TAU_OPTIONS="-optVerbose -optTauSelectFile=`pwd`/select.tau"
export TAU_MAKEFILE=<path to TAU>/x86_64/lib/Makefile.tau-icpc-pdt
make
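Because the TAU compiler wrappers read TAU_MAKEFILE, a mistyped path tends to surface only as a build failure. The following is a small optional sketch of a pre-flight check; the function name check_tau_makefile is hypothetical and not part of TAU:

```shell
# Hypothetical helper: fail early if TAU_MAKEFILE is unset or unreadable.
check_tau_makefile() {
    if [ -z "${TAU_MAKEFILE:-}" ]; then
        echo "TAU_MAKEFILE is not set" >&2
        return 1
    fi
    if [ ! -r "$TAU_MAKEFILE" ]; then
        echo "TAU_MAKEFILE is not readable: $TAU_MAKEFILE" >&2
        return 1
    fi
    echo "Using $TAU_MAKEFILE"
}
```

It could be called as check_tau_makefile && make, so that make only starts when the variable points at a readable file.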
Running Cruft
./cruft -p ag -e -f data/8k.inp.gz
or
./cruft -f data/8k.inp.gz
And for the OpenCL-accelerated version:
tau_exec -T serial -opencl ./cruftOCL -p ag -e -f data/8k.inp.gz
tau_exec -T serial -opencl ./cruftOCL -f data/8k.inp.gz
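Both runs write their profile files to the same place by default, so it can help to keep them apart. The sketch below uses TAU's PROFILEDIR environment variable for that. The helper name run_profiled and the directory labels eam and lj are hypothetical, and the mapping of the two option sets to the EAM and LJ methods is an assumption based on the Performance Data section:

```shell
# Hypothetical helper: run one configuration under tau_exec with its own
# profile directory. $1 = label for the directory, rest = cruftOCL options.
run_profiled() {
    label=$1; shift
    PROFILEDIR="profiles-$label"
    mkdir -p "$PROFILEDIR"
    export PROFILEDIR   # TAU writes its profile.* files into this directory
    if command -v tau_exec >/dev/null 2>&1; then
        tau_exec -T serial -opencl ./cruftOCL "$@"
    else
        # Without TAU on the PATH, only report what would have been run.
        echo "tau_exec not found; would run: ./cruftOCL $*"
    fi
}

run_profiled eam -p ag -e -f data/8k.inp.gz   # assumed: EAM method
run_profiled lj  -f data/8k.inp.gz            # assumed: LJ method
```

Each directory can then be opened in ParaProf, or packed into a single file with paraprof --pack, which is presumably how .ppk files like those linked under Performance Data are produced.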
Performance Data
EAM method:
First, the serial version of Cruft shows that two loops in eam.c consume most of the time.
[Image: cruft-EAM-profile.png]
In comparison, in the OpenCL-accelerated version two kernels dominate the runtime.
[Image: cruftOCL-eam-profile.png]
With an OpenCL application you can also check the time spent in the command queue; here is the table for each kernel:
[Image: cruftOCL-eam-queue.png]
Profile Data:
File:Cruft-EAM.ppk, File:CruftOCL-EAM.ppk
LJ method:
First, the serial version of Cruft shows that a single loop dominates the runtime.
[Image: cruft-LJ-profile.png]
In comparison, in the OpenCL-accelerated version the LJ_Force kernel dominates the runtime.
[Image: cruftOCL-lj-profile.png]
Once again, here is the time spent in the queue for this kernel:
[Image: cruftOCL-lj-queue.png]
Profile Data:
File:Cruft-LJ.ppk, File:CruftOCL-LJ.ppk