
3. Using vprof

3.1 Compiling the Application

Applications must be compiled with the -g option and linked with an extra object file to enable collection of performance data. If vprof was installed into /usr/local, then the extra object file would be one of the following:

For all three of these methods, the libvmon.a library must also be linked into the application. The application should be linked statically: if routines in shared libraries are sampled, they fall outside the range of the profile buffer and no information about the location of the event will be recorded.

3.2 Running the Application

When the suitably linked executable is run, a profile data file is created with the default name vmon.out. The name can be changed either by calling the vprof API to select a file name explicitly or by setting the VMON_FILE environment variable to the desired name. If a parallel environment is detected, a different name is used for each task to avoid file name conflicts. The profile data file can be analyzed with vprof or cprof.
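As a minimal sketch of the environment-variable route, assuming a POSIX shell: the executable name ./myapp and the file name myapp.vmon are hypothetical placeholders, not names defined by vprof.

```shell
# Hypothetical output name for the profile data file.
export VMON_FILE=myapp.vmon

# ./myapp stands in for an executable linked as described in section 3.1.
# The guard keeps this snippet harmless when no such file exists.
if [ -x ./myapp ]; then
    ./myapp
fi
```

After the run, the profile data would be expected in myapp.vmon rather than vmon.out.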

Some event-based profiling methods allow the frequency with which the event is sampled to be changed by setting the VMON_FREQ environment variable to an integer. Profiling based on perfctr currently supports this. The default value of VMON_FREQ is 100000. Setting this value too low will cause counter overflows and unnecessarily perturb the execution time of the program. Setting it too high will cause infrequent events to be missed.
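The following sketch sets VMON_FREQ from a POSIX shell and checks first that the value is an integer. The value 50000 is purely illustrative, and the setting should only matter for profiling methods that support it.

```shell
# Illustrative value only; the documented default is 100000.
freq=50000

# VMON_FREQ is described as an integer, so reject anything else up front.
case "$freq" in
    ''|*[!0-9]*)
        echo "VMON_FREQ must be an integer, got: $freq" >&2
        ;;
    *)
        export VMON_FREQ="$freq"
        ;;
esac
```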

3.3 Selecting Events

The event to be monitored is selected by setting the VMON environment variable before running the executable. The default event is equivalent to VMON=PROF, which uses the profil system call to sample where the program spends the most time.

If the perfctr package is used, then, for example, VMON=P6_FLOPS would sample retired floating point instructions. The Intel Architecture Software Developer's Manual, Volume 3, gives the complete list of flags that can be used with the perfctr package.

If the PAPI library is available, then a wide range of system events can be sampled as well. For example, the code locations where floating point operations take place can be sampled by setting VMON=PAPI_FLOPS.
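One way to compare events is to run the program once per event and keep each result in its own file. This sketch assumes a POSIX shell, a hypothetical executable ./myapp, and hypothetical file names. PAPI_FLOPS only applies when vprof was built with PAPI support.

```shell
# Run once per event, writing each profile to a separate file.
for event in PROF PAPI_FLOPS; do
    export VMON="$event"
    export VMON_FILE="vmon.$event.out"   # hypothetical naming scheme

    # ./myapp is a placeholder for the statically linked application.
    if [ -x ./myapp ]; then
        ./myapp
    fi
done
```

Each resulting file can then be opened separately with vprof or cprof.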

3.4 Running vprof

The vprof command is invoked as follows:

vprof [options] executable [vmon_file ...]

where the following options are recognized:

3.5 Running cprof

The cprof executable lets one analyze performance data in a terminal window. If the Qt toolkit is not available on your system, you will have to use cprof to analyze your data.

The cprof command is invoked as follows:

cprof [options] executable [vmon_file ...]

where the following options are recognized:

3.6 Parallel Applications

By default, on one processor, the profile data is written to a file named vmon.out. If a parallel environment is detected, the task number is appended to the file name, giving a separate file for each task. However, some parallel environments might not be detected. In that case, the user must call vmon_done_task with the integer task number as an argument to stop profiling and write the profile information for each node. Alternatively, the VMON_FILE environment variable can be set to a unique value in each MPI task.
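The VMON_FILE alternative could be sketched as a small wrapper script that each task runs. It assumes the launcher exports the task number in one of OMPI_COMM_WORLD_RANK (Open MPI), PMI_RANK (MPICH) or SLURM_PROCID (Slurm). The executable ./myapp and the vmon.out.N naming are hypothetical.

```shell
# Take the task number from whichever launcher variable is set; default to 0.
rank="${OMPI_COMM_WORLD_RANK:-${PMI_RANK:-${SLURM_PROCID:-0}}}"

# Give every task its own profile file (hypothetical naming scheme).
export VMON_FILE="vmon.out.$rank"

# ./myapp is a placeholder for the profiled MPI executable.
if [ -x ./myapp ]; then
    ./myapp "$@"
fi
```

Saved as, say, run_profiled.sh, the wrapper would be started by the MPI launcher in place of the application itself.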

The vmon output files for all nodes can be examined simultaneously by giving all of the file names on the vprof or cprof command line. When more than two vmon files are given, the aggregate sum, minimum, and maximum are shown instead of the results for individual nodes.
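A shell glob is a convenient way to pass every per-task file at once. In this sketch the pattern vmon.out* is an assumption about how the task number is appended, and ./myapp is a hypothetical executable name.

```shell
# Collect the per-task files into the positional parameters.
# If nothing matches, "$1" holds the unexpanded pattern and the -e test fails.
set -- vmon.out*

# Only invoke cprof when it is installed and at least one file was found.
if [ -e "$1" ] && command -v cprof >/dev/null 2>&1; then
    cprof ./myapp "$@"
fi
```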

3.7 C/C++ API

If one of the vmonauto files cannot be linked into the application, or if more control is needed over the output file name or the regions to be profiled, then the vprof routines must be called directly. The file vmon.h provides signatures for the user-callable routines:

3.8 FORTRAN API

To use the vmon routines from FORTRAN, call the following routines:

