Applications must be compiled with the -g option and link in an
extra object file to enable collection of performance data. If vprof was
installed into /usr/local, then the extra object file would be one
of the following:
/usr/local/lib/vmonauto_pmpi.o: This can be linked
with any MPI application and will cause profiling to begin just
after MPI_Init is called and to end just before MPI_Finalize is
called. This will only work if the PMPI interface is available.
With some MPI distributions, it may be necessary for -lpmpi to be
specified on the link line. If you are simultaneously obtaining
other profiles that require the PMPI interface, then this method
will not work.
/usr/local/lib/vmonauto_gcc.o: If vprof was
compiled with gcc, then this file should be available. It can be
linked with any application and will cause profiling to begin
before main enters and to end after main returns or exit is
called. Some MPI implementations require that all file I/O be
completed before MPI_Finalize is called, in which case,
vmonauto_gcc.o will not work. Also, vmonauto_gcc.o requires
that special GCC extensions work correctly, which might not be
true on all platforms.
/usr/local/lib/vmonauto.o: If
vmonauto_gcc.o is not found and this file is present,
and the same C++ compiler is used to compile the application as
vprof, then this file can be used. It will behave the same as
vmonauto_gcc.o. Some MPI implementations require that
all file I/O be completed before MPI_Finalize is called, in which
case, vmonauto.o will not work.
If none of these object files is used, vmon_begin() must be
called to begin profiling and vmon_done() must be called to end profiling.
For all of these methods, the libvmon.a library must be linked into the application. Also, the application should be linked statically. If routines in shared libraries are sampled, they will be outside the range of the profile buffer and no information about the location of the event will be recorded.
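As a rough illustration of the link requirements above, the following sketch assembles a static link line for a C program. The names myapp.c and myapp are hypothetical, and both the use of gcc as the compiler driver and the location of libvmon.a under /usr/local/lib are assumptions. The command is only printed, not executed, and further libraries may be needed depending on how vprof was built.

```shell
# Sketch only: myapp.c and myapp are hypothetical names, and the compiler
# driver (gcc) and the libvmon.a location are assumptions.
VPROF_PREFIX=/usr/local

# Choose one of vmonauto_pmpi.o, vmonauto_gcc.o, or vmonauto.o.
AUTO_OBJ="$VPROF_PREFIX/lib/vmonauto_gcc.o"

# -g keeps debugging information; -static avoids sampling shared libraries.
LINK_CMD="gcc -g -static -o myapp myapp.c $AUTO_OBJ $VPROF_PREFIX/lib/libvmon.a"

# Print the command instead of running it.
echo "$LINK_CMD"
```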
When the suitably linked executable is run, a profile data
file will be created with a default name of vmon.out.
The file name can be changed by using the VProf API to explicitly
select a file name. The name can also be changed by setting the
VMON_FILE environmental variable to the desired name. Also, if a
parallel environment is detected, different names will be used on
each task to avoid file name conflicts. The profile data file
can be analyzed with vprof or cprof.
Some event-based profile methods permit the frequency with which the event is sampled to be altered. This can be done by setting the VMON_FREQ variable to an integer. Profiling based on perfctr currently supports this. The default value of VMON_FREQ is 100000. Setting this value too low will cause counter overflows and unnecessarily perturb the execution time of the program. Setting it too high will cause infrequent events to be missed.
The event that will be monitored can be selected by setting the
VMON environmental variable before running the executable. The
default event is equivalent to VMON=PROF, which uses the profil
system call to sample where the program spends the most time.
If the perfctr package is used, then, for example, VMON=P6_FLOPS
would sample retired floating point instructions. The Intel Architecture
Software Developer's Manual, Volume 3, gives the complete list of flags
that can be used with the perfctr package.
If the PAPI library is available, then a wide range of system events can be
sampled as well. For example, the code locations where floating point
operations take place can be sampled by setting VMON=PAPI_FLOPS.
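The environment variables described above might be combined for a single run along the following lines. This is a sketch that assumes vprof was built with perfctr support. The program name ./myapp and the file name flops.vmon are hypothetical, and the program is run only if it exists.

```shell
# Select the event, the sampling frequency, and the output file for one run.
# ./myapp and flops.vmon are hypothetical names.
export VMON=P6_FLOPS          # perfctr event from the text above
export VMON_FREQ=100000       # the documented default, set explicitly here
export VMON_FILE=flops.vmon   # replaces the default name vmon.out

# Run the instrumented program only if it is present.
if [ -x ./myapp ]; then
    ./myapp
fi

echo "VMON=$VMON VMON_FREQ=$VMON_FREQ VMON_FILE=$VMON_FILE"
```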
The vprof command is invoked as follows:
vprof [options] executable [vmon_file ...]
where the following options are recognized:
Print a help message.
Show the results as the number of samples seen rather than as a percent.
Print debugging information.
Search the directory dir for source files. This option is needed for compilers that do not give full file name information, or if the source code has been moved.
Search the directory dir and all of its subdirectories for source files. This option is needed for compilers that do not give full file name information, or if the source code has been moved.
Print the version number.
The cprof executable lets one analyze performance data in a terminal
window. If the Qt toolkit is not available on your system, you will have to
use cprof to analyze your data.
The cprof command is invoked as follows:
cprof [options] executable [vmon_file ...]
where the following options are recognized:
Print a help message.
Print the version number.
Show all information.
Show all files.
Show all functions.
Show all lines.
Annotate filename.
Show the results as the number of samples seen rather than as a percent.
Set the integer threshold to show collective data. If threshold or more results for a particular performance statistic are given, then the minimum, maximum, and sum of these results will be given instead of the individual results. The default is 4.
Search the directory dir for source files. This option is needed for compilers that do not give full file name information, or if the source code has been moved.
Search the directory dir and all of its subdirectories for source files. This option is needed for compilers that do not give full file name information, or if the source code has been moved.
Output the results as HTML in the directory dir. The directory is created if it doesn't already exist.
Show data that is not accurate.
Print debugging information.
By default, when running on a single processor, the profile data
will be written to a file named vmon.out. If a
parallel environment is detected, then the task number will be
appended to the file name, resulting in a separate file for each
task. However, some parallel environments might not be detected.
In that case the user must call vmon_done_task with the
integer task number as an argument to stop profiling and to write
the profile information for each node. Alternatively, the VMON_FILE
environmental variable could be set to a unique value in each MPI
task.
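One way to give each task its own VMON_FILE is a small wrapper around the application, sketched below. The rank variables OMPI_COMM_WORLD_RANK (Open MPI) and PMI_RANK (MPICH-style launchers) are assumptions about the launcher, not something vprof defines. If neither is set, the sketch falls back to task 0.

```shell
# Derive a per-task profile name from the rank variable exported by the launcher.
# OMPI_COMM_WORLD_RANK and PMI_RANK are assumptions about the MPI environment.
task_rank="${OMPI_COMM_WORLD_RANK:-${PMI_RANK:-0}}"

# Each task writes to its own file, for example vmon.out.3 on task 3.
export VMON_FILE="vmon.out.$task_rank"
echo "task $task_rank writes $VMON_FILE"

# A wrapper script would finish by running the real program: exec "$@"
```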
The vmon output files for all nodes can be examined
simultaneously by giving all of the file names on the vprof or
cprof command line. When more than two vmon files are given,
then the aggregate sum, minimum, and maximum data is shown
instead of the results for individual nodes.
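Gathering the per-task files onto one command line can be sketched as follows. The helper name build_analyze_cmd and the executable ./myapp are hypothetical, and the vmon.out.* pattern assumes the default file naming. The helper prints the cprof command rather than running it.

```shell
# Hypothetical helper: print a cprof command line covering every
# vmon.out.* file in the current directory.
build_analyze_cmd() {
    exe="$1"
    files=""
    for f in vmon.out.*; do
        # Skip the unexpanded pattern when no per-task files exist.
        if [ -e "$f" ]; then
            files="$files $f"
        fi
    done
    echo "cprof $exe$files"
}

build_analyze_cmd ./myapp
```

The same file list can be passed to vprof in place of cprof.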
If none of the vmonauto files can be linked into the application or
if more control is needed over the output filename or the regions
to be profiled, then the vprof routines must be directly
called. The file vmon.h provides signatures for the
user callable routines:
void vmon_begin() Starts profiling.
void vmon_done() Ends profiling and writes vmon.out.
void vmon_done_task(int task_number) Ends profiling and
writes to the file vmon.out.task_number.
void vmon_done_file(const char *file_name) Ends profiling and writes to the file file_name.
To use the vmon routines from FORTRAN, call the following routines:
VMONBG() Starts profiling.
VMONDN() Ends profiling and writes vmon.out.
VMONDT(INTEGER TASK) Ends profiling and
writes to the file vmon.out.task_number.