cuFFT - How to view CUDA library function calls in profiler?
I am using the cuFFT library. How do I modify my code to see the function calls from this library (or any other CUDA library) in the NVIDIA Visual Profiler (NVVP)? I am using Windows and Visual Studio 2013.
Below is my code. I convert my image and filter to the Fourier domain, perform point-wise complex matrix multiplication in a custom CUDA kernel I wrote, and then perform the inverse DFT on the filtered image's spectrum. The results are accurate, but I am not able to figure out how to view the cuFFT functions in the profiler.
// Execute FFT plans
cufftExecR2C(fftplanfwd, (cufftReal *)d_in, (cufftComplex *)d_img_spectrum);
cufftExecR2C(fftplanfwd, (cufftReal *)d_filter, (cufftComplex *)d_filter_spectrum);

// Perform complex pointwise multiplication on filter spectrum and image spectrum
pointwise_complex_matrix_mult_kernel<<<grid, block>>>(d_img_spectrum, d_filter_spectrum, d_filtered_spectrum, rows, cols);

// Execute FFT^-1 plan
cufftExecC2R(fftplaninv, (cufftComplex *)d_filtered_spectrum, (cufftReal *)d_out);
At the entry point to the library, a library call is like any other call into a C or C++ library: it executes on the host. Within that library call, there may be calls to CUDA kernels or other CUDA API functions, as is the case for a CUDA GPU-enabled library such as cuFFT.
The profilers (at least up through CUDA 7.0; see the note about CUDA 7.5 nvprof below) don't natively support profiling of host code. They are focused on kernel calls and CUDA API calls. A call into a library like cuFFT is not, by itself, considered a CUDA API call.
You haven't shown complete profiler output, but you should see the cuFFT library make CUDA kernel calls; these will show up in the profiler output. The first two cuFFT calls, prior to pointwise_complex_matrix_mult_kernel, should each have one or more kernel calls that show up to the left of that kernel on the timeline, and the last cuFFT call should have one or more kernel calls that show up to the right of that kernel.
One possible way to get specific sections of host code to show up in the profiler is to use the NVTX (NVIDIA Tools Extension) library to annotate your source code, which will cause those annotations to show up in the profiler output. You might want to put an NVTX range event around each library call that you wish to see identified in the profiler output.
Another approach would be to try out the new CPU profiling features in nvprof in CUDA 7.5. You can refer to section 3.4 of the Profiler guide that ships with CUDA 7.5RC.
Finally, ordinary host profilers should be able to profile your CUDA application, including the cuFFT library calls, but they won't have any visibility into what is happening on the GPU.
Edit: Based on the discussion in the comments below, your code appears to be similar to the simpleCUFFT sample code. When I compile and profile that code on Win7 x64, VS 2013 Community, and CUDA 7, I get the following output (zoomed in to depict the interesting part of the timeline):
You can see that there are cuFFT kernels being called both before and after the complex pointwise multiply and scale kernel that appears in that code. My suggestion would be to start by doing something similar with the simpleCUFFT sample code rather than your own code, and see if you can duplicate the output above. If so, the problem lies in your own code (perhaps your cuFFT calls are failing, perhaps you need to add proper error checking, etc.)