Keeneland
Guide for using TAU on Keeneland
Slide about TAU
Setting up environment
Set up your environment as follows:
module load tau/2.21
Compiling SHOC 1.0.1 with TAU
After configuring SHOC, edit config/common.mk to read:
# === Basics ===
CC = tau_cc.sh
CXX = tau_cxx.sh
LD = tau_cxx.sh
AR = /usr/bin/ar
RANLIB = ranlib
CPPFLAGS += -I$(SHOC_ROOT)/src/common -I${SHOC_ROOT}/config
CFLAGS += -m64 -g -O2
CXXFLAGS += -m64 -g -O2
ARFLAGS = rcv
LDFLAGS =
LIBS = -L$(SHOC_ROOT)/lib -lrt -L/sw/keeneland/cuda/3.2RC/lib64/ -lcudart
USE_MPI = no
OCL_CPPFLAGS += -I${SHOC_ROOT}/src/opencl/common
OCL_LIBS =
NVCC = /sw/keeneland/cuda/3.2/bin/nvcc
CUDA_CXX = tau_cxx.sh
CUDA_INC = -I/sw/keeneland/cuda/3.2/include
CUDA_CPPFLAGS += -gencode=arch=compute_10,code=sm_10 \
    -gencode=arch=compute_11,code=sm_11 -gencode=arch=compute_13,code=sm_13 \
    -gencode=arch=compute_20,code=sm_20 -gencode=arch=compute_20,code=compute_20 \
    -I${SHOC_ROOT}/src/cuda/include $(TAU_LIBS)
Then run make and install as you normally would.
More information: TAU's user guide
Building SHOC with VampirTrace
In this case, edit config/common.mk to read:
# === Basics ===
CC = vtcc --vt:cc mpicc
CXX = vtcxx --vt:cxx mpicxx
LD = vtcxx --vt:cxx mpicxx
AR = /usr/bin/ar
RANLIB = ranlib
CPPFLAGS += -I$(SHOC_ROOT)/src/common -I${SHOC_ROOT}/config
CFLAGS += -m64 -g -O2
CXXFLAGS += -m64 -g -O2
ARFLAGS = rcv
LDFLAGS =
LIBS = -L$(SHOC_ROOT)/lib -lrt -L/sw/keeneland/cuda/3.2RC/lib64/ -lcudart
USE_MPI = no
OCL_CPPFLAGS += -I${SHOC_ROOT}/src/opencl/common
OCL_LIBS =
NVCC = vtnvcc
CUDA_CXX = vtnvcc
CUDA_INC = -I/sw/keeneland/cuda/3.2/include
CUDA_CPPFLAGS += -gencode=arch=compute_10,code=sm_10 \
    -gencode=arch=compute_11,code=sm_11 -gencode=arch=compute_13,code=sm_13 \
    -gencode=arch=compute_20,code=sm_20 -gencode=arch=compute_20,code=compute_20 \
    -I${SHOC_ROOT}/src/cuda/include $(TAU_LIBS)
Running CUDA applications
Both CUDA and OpenCL are instrumented dynamically through library preloading. Use the tau_exec script to run the CUDA application:
%> tau_exec -T serial -cuda ./Stencil2D
The -T serial option specifies which TAU configuration to use. For MPI applications, change it to -T mpi and run:
%> mpirun -np 4 tau_exec -T mpi -cuda ./SGEMM
This works with executables built with or without TAU.
Traces
To record traces, set TAU_TRACE=1 before running the application, then merge and convert the resulting trace files:
%> export TAU_TRACE=1
%> tau_exec -T serial -cuda ./Stencil2D
%> tau_multimerge
%> tau2slog2 tau.trc tau.edf -o stencil2d.slog2
%> jumpshot
Troubleshooting
- The CPU side looks fine, but no GPU profile/trace is generated.
This is likely because there is no cudaThreadExit() call at the end of the application. Placing one there signals to TAU that the application's CUDA-accelerated section is finished, so it can go ahead and write out the profile/trace.
Fix: Place cudaThreadExit() at the end of the application.
- The message "Error calculating kernel event [start|stop], error #: 33." appears during execution.
This means that CUDA could not retrieve the event object at synchronization. Try placing a synchronization point right after the kernel is launched. In some cases no arrangement of kernel launches and synchronization points will suffice; that one kernel cannot be tracked, but any other kernels in the application should still be tracked correctly.
Fix: Try placing a synchronization call right after the kernel launch.
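The two fixes above can be sketched in one small program. This is a minimal illustration, not code from SHOC or TAU: the kernel fill_indices and the buffer names are hypothetical. It uses cudaThreadSynchronize() and cudaThreadExit() to match the CUDA 3.2 toolkit referenced on this page; both have been deprecated since CUDA 4.0 in favor of cudaDeviceSynchronize() and cudaDeviceReset(), and how a given TAU version treats the newer calls is not covered here.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel, for illustration only: writes each element's index.
__global__ void fill_indices(int *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        out[i] = i;
    }
}

int main()
{
    const int n = 1024;
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    int host[n];
    int *dev = NULL;

    cudaError_t err = cudaMalloc((void **)&dev, n * sizeof(int));
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaMalloc: %s\n", cudaGetErrorString(err));
        return 1;
    }

    fill_indices<<<blocks, threads>>>(dev, n);

    // Second fix: synchronize immediately after the launch, so the kernel
    // has completed before any other CUDA work is issued.
    err = cudaThreadSynchronize();
    if (err != cudaSuccess) {
        fprintf(stderr, "kernel: %s\n", cudaGetErrorString(err));
    }

    cudaMemcpy(host, dev, n * sizeof(int), cudaMemcpyDeviceToHost);
    printf("host[%d] = %d\n", n - 1, host[n - 1]);

    cudaFree(dev);

    // First fix: make cudaThreadExit() the last CUDA call in the program,
    // so the end of the CUDA-accelerated section is explicit.
    cudaThreadExit();
    return err == cudaSuccess ? 0 : 1;
}
```

Compile the program with nvcc, then run it through tau_exec -T serial -cuda as described under Running CUDA applications.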
Running OpenCL applications
Use tau_exec as well:
%> tau_exec -T serial -opencl ./SGEMM
CUPTI and PAPI
Coming soon...