Latest revision as of 17:59, 25 August 2012
Guide for using TAU on Keeneland
Slides about TAU
TAU overview slides: http://nic.uoregon.edu/~scottb/tau-overview.pdf
Setting up environment
Set up your environment as follows:
module load tau
export TAU_MAKEFILE=$TAUROOT/lib/Makefile.tau-cupti-pdt
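A quick sanity check after loading the module can catch a wrong or missing makefile path before a long build. The following is a minimal sketch; the helper name `check_tau_makefile` is hypothetical and not part of TAU.

```shell
# Hypothetical helper (not part of TAU): confirm that TAU_MAKEFILE is set
# and points at a readable file before starting a build.
check_tau_makefile() {
    if [ -z "${TAU_MAKEFILE:-}" ]; then
        echo "TAU_MAKEFILE is not set" >&2
        return 1
    fi
    if [ ! -r "$TAU_MAKEFILE" ]; then
        echo "TAU_MAKEFILE is not a readable file: $TAU_MAKEFILE" >&2
        return 1
    fi
    echo "Using TAU makefile: $TAU_MAKEFILE"
}
```

Calling `check_tau_makefile` after the `module load` and `export` lines should print the makefile path, or return a non-zero status with a message on stderr if the path is wrong.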
Compiling SHOC 1.0.1 with TAU
After configuring SHOC, edit config/common.mk to read:
# === Basics ===
CC = tau_cc.sh
CXX = tau_cxx.sh
LD = tau_cxx.sh
AR = /usr/bin/ar
RANLIB = ranlib

CPPFLAGS += -I$(SHOC_ROOT)/src/common -I${SHOC_ROOT}/config
CFLAGS += -m64 -g -O2
CXXFLAGS += -m64 -g -O2
ARFLAGS = rcv
LDFLAGS =
LIBS = -L$(SHOC_ROOT)/lib -lrt -L/sw/keeneland/cuda/3.2RC/lib64/ -lcudart

USE_MPI = no

OCL_CPPFLAGS += -I${SHOC_ROOT}/src/opencl/common
OCL_LIBS =

NVCC = /sw/keeneland/cuda/3.2/bin/nvcc
CUDA_CXX = tau_cxx.sh
CUDA_INC = -I/sw/keeneland/cuda/3.2/include
CUDA_CPPFLAGS += -gencode=arch=compute_10,code=sm_10 \
  -gencode=arch=compute_11,code=sm_11 -gencode=arch=compute_13,code=sm_13 \
  -gencode=arch=compute_20,code=sm_20 -gencode=arch=compute_20,code=compute_20 \
  -I${SHOC_ROOT}/src/cuda/include $(TAU_LIBS)
Then build and install as you normally would.
More information is available in TAU's user guide: http://www.cs.uoregon.edu/research/tau/docs/newguide/bk01ch01s02.html
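The compiler substitutions in config/common.mk can also be applied with a small filter instead of by hand. This is only a sketch: the function name `use_tau_wrappers` is hypothetical, and it assumes each of the `CC`, `CXX`, `LD`, and `CUDA_CXX` assignments sits on a single line starting at column one. It does not add `$(TAU_LIBS)` to `CUDA_CPPFLAGS`; that change still has to be made manually.

```shell
# Hypothetical helper: print a copy of a SHOC common.mk with the compiler
# and linker variables pointed at the TAU wrapper scripts.
# Only exact variable names are matched, so CFLAGS, CXXFLAGS and LDFLAGS
# are left untouched.
use_tau_wrappers() {
    sed -e 's/^CC[[:space:]]*=.*/CC = tau_cc.sh/' \
        -e 's/^CXX[[:space:]]*=.*/CXX = tau_cxx.sh/' \
        -e 's/^LD[[:space:]]*=.*/LD = tau_cxx.sh/' \
        -e 's/^CUDA_CXX[[:space:]]*=.*/CUDA_CXX = tau_cxx.sh/' \
        "$1"
}
```

For example, `use_tau_wrappers config/common.mk > common.mk.tau` writes the modified copy to a new file so the original can be compared before it is replaced.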
Building SHOC with VampirTrace
In this case, edit config/common.mk to read:
# === Basics ===
CC = vtcc --vt:cc mpicc
CXX = vtcxx --vt:cxx mpicxx
LD = vtcxx --vt:cxx mpicxx
AR = /usr/bin/ar
RANLIB = ranlib

CPPFLAGS += -I$(SHOC_ROOT)/src/common -I${SHOC_ROOT}/config
CFLAGS += -m64 -g -O2
CXXFLAGS += -m64 -g -O2
ARFLAGS = rcv
LDFLAGS =
LIBS = -L$(SHOC_ROOT)/lib -lrt -L/sw/keeneland/cuda/3.2RC/lib64/ -lcudart

USE_MPI = no

OCL_CPPFLAGS += -I${SHOC_ROOT}/src/opencl/common
OCL_LIBS =

NVCC = vtnvcc
CUDA_CXX = vtnvcc
CUDA_INC = -I/sw/keeneland/cuda/3.2/include
CUDA_CPPFLAGS += -gencode=arch=compute_10,code=sm_10 \
  -gencode=arch=compute_11,code=sm_11 -gencode=arch=compute_13,code=sm_13 \
  -gencode=arch=compute_20,code=sm_20 -gencode=arch=compute_20,code=compute_20 \
  -I${SHOC_ROOT}/src/cuda/include $(TAU_LIBS)
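Because this configuration relies on several wrapper commands being on the PATH, it may help to check for them before running make. A minimal sketch, with the hypothetical helper name `missing_tools`:

```shell
# Hypothetical helper: print the name of every given command that cannot
# be found on the current PATH (prints nothing if all are present).
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}
```

For the configuration above, `missing_tools vtcc vtcxx vtnvcc mpicc mpicxx` should print nothing when the environment is set up correctly.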
Running CUDA applications
Both CUDA and OpenCL are instrumented dynamically through library preloading. Use the tau_exec script to run the CUDA application:
%> tau_exec -T serial,cupti -cupti ./Stencil2D
The -T serial option specifies which TAU configuration to use. For MPI applications, change it to mpi and run:
%> mpirun -np 4 tau_exec -T mpi,cupti -cupti ./SGEMM
This works with executables built with or without TAU.
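The serial and MPI invocations differ only in the launcher and the -T configuration. The sketch below captures that choice in one place; the function name `tau_cmd` and its rank-count argument are assumptions made for illustration, and the function only prints the command line rather than running it.

```shell
# Hypothetical helper: print the tau_exec command line for a CUDA
# application, choosing the serial or MPI TAU configuration from the
# number of ranks requested.
tau_cmd() {
    ranks=$1
    app=$2
    if [ "$ranks" -gt 1 ]; then
        echo "mpirun -np $ranks tau_exec -T mpi,cupti -cupti $app"
    else
        echo "tau_exec -T serial,cupti -cupti $app"
    fi
}
```

For instance, `tau_cmd 4 ./SGEMM` prints the MPI command shown above, and `tau_cmd 1 ./Stencil2D` prints the serial one.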
Traces
Traces can be recorded by first setting TAU_TRACE, then running the application, merging the trace files, and converting them for Jumpshot:
%> export TAU_TRACE=1
%> tau_exec -T serial,cupti -cupti ./Stencil2D
%> tau_multimerge
%> tau2slog2 tau.trc tau.edf -o stencil2d.slog2
%> jumpshot stencil2d.slog2
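The trace steps can be wrapped in a small script that stops at the first failing step, so a failed run does not feed stale files into the converter. This is a sketch under assumptions: the names `run` and `trace_app` and the `DRY_RUN` switch are hypothetical conveniences, not TAU features.

```shell
# Hypothetical wrapper: execute a command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Record a trace for a serial CUDA application and convert it to SLOG2.
# $1 is the application, $2 the output .slog2 file.
# Each step runs only if the previous one succeeded.
trace_app() {
    app=$1
    out=$2
    export TAU_TRACE=1
    run tau_exec -T serial,cupti -cupti "$app" &&
    run tau_multimerge &&
    run tau2slog2 tau.trc tau.edf -o "$out"
}
```

Running `DRY_RUN=1` followed by `trace_app ./Stencil2D stencil2d.slog2` prints the three commands without executing them; without `DRY_RUN`, the resulting file can then be opened with `jumpshot stencil2d.slog2`.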
Running OpenCL applications
Use tau_exec as well:
%> tau_exec -T serial -opencl ./SGEMM
Performance Data
Some example performance data from S3D: S3D-cuda.ppk and S3D-cuda.slog2