Statprof Predefined Probe

From OC Systems Wiki!
Revision as of 20:57, 30 May 2018 by Swn (talk | contribs) (Statprof Workflow)
Jump to: navigation, search

Statistical Profiling: statprof.ual

The statprof.ual predefined probe uses the Unix statistical profiling functions to provide a very low overhead mechanism to sample CPU timing. This is the underlying mechanism used when linking an application with the -p flag (or equivalent), the output of which is viewed using the prof tool. Since this is based on Unix services it is not available for the Windows 2000 version of Aprobe.

The probe works by using the Unix profil function. Normally this would be called from the startup and exit code of an application linked with -p; with Aprobe we call it on program entry and exit. A buffer is allocated and passed to the kernel with a "slot" (or cell) for every couple of instructions. When a kernel clock "ticks", if one of the threads for the application is running, the value in that slot is incremented.

When the application exits (or we take a snapshot), the memory buffer is saved to the APD file. At format time we convert each cell index to an address and use it to compute the time spent in each routine. Obviously, since the kernel is sampling the application at intervals, this is a statistical profiling method: Theoretically an application could spend 99% of it's time in one routine and jump to another routine at exactly the time we are sampling and then immediately return back again. Although this is theoretically possible, in practice it will never happen (given enough samples).

Statprof.ual has a number of advantages over using -p when building your application:

  • You do not need to rebuild your application to use it!
  • With -p you can only get information about the application module. While -p does link in some special static libraries to replace the standard libs this is not a perfect solution. Statprof allows you to access any of your modules.
  • You can take snapshots with staprof, control the cell size, display lines and generally control things to a much finer level.

One thing that statprof doesn't replicate from prof is the call counts facility since this is adequately covered by profile.ual. (There is also an experimental probe, apcgp, which uses prof and provides call graph counts.

Note that unlike profile, which measures wallclock time, statprof only measures CPU time. If the application is paused waiting on input, this time will not be seen. Sometimes you will be more interested in this than wallclock time, other times it will be the reverse. The [../examples/predefined_probes/statprof/README $APROBE/examples/learn/statprof] directory has a worked example where both statprof and profile are used as appropriate.

Usage

Using statprof is straightforward - there is no configuration file or GUI to worry about. All options can be set by command line options (either at runtime or format-time).

Statprof UAL parameters

statprof.ual is specified on the aprobe command line or in an APO file as described in Command Line. The specific options are:


aprobe  -u statprof.ual [-p "[-a] [-c] [-h] [-l] [-s size] [-z] [-i interval]
    [module name]]" your_program

where:

-a
This performs a profile on the whole address space. This uses the smallest cell size possible, though it will generally be rather coarse. Be careful to set the APD file size large enough to hold all the data.
-c
This performs a coarse profile to see overall usage in shared libraries. This uses a very large "cell" size which allows you to determine which shared library was active when the kernel tick occurred but not which routine.
-i
This sets the profile timer interval. The default value is 10 milliseconds. Valid values range from 10 milliseconds to 999 milliseconds. (Since 4.4.9).
-l
This displays a breakdown of time within a function by line number as well as by function. The bigger the cell size the less reliable it will be but this can give useful information.
-o filename
This outputs comma-delimited data to the given filename. This is suitable for input into a spreadsheet or similar program. Note that the line breaks are PC-based rather than Unix-based as this seems to be more generally accepted by programs we've tested with.
-s
This sets the number of bytes used for each sampling cell. It's unlikely that you would modify this but if you have a huge application and limited memory you may wish to do so. Basically the larger the cell size, the smaller the amount of storage required to hold the sampling table but it's more likely that the output data will be incorrect. By default two instructions are mapped to a cell which has nearly a 100% chance of both instructions belonging to the same routine due to the padding usually used by compilers. But if there are four instructions for the same cell, there's a chance that some of the instructions are for one function and some for another. However, when formatted they are treated as one address (obviously) so wrong information can result.
-z
This displays symbols or modules which have no associated CPU: Normally just the names of the routines (or modules) that have matches in the sampling table are displayed. This switch will also display the list of all other routines (or modules).

If you do not provide a module name, the application will be profiled.

At format time, the -l, -o and -z options may be provided if they were not specified at runtime.

Statprof API

Users can control the behavior of the statprof probe by calls from within their own probes. The API is defined by $APROBE/include/statprof.h. The functions exported by statprof.ual are:

ap_Statprof_Enable

Enables collection of statprof data.
ap_Statprof_Disable

Disables collection of statprof data.
ap_Statprof_Snapshot

Takes a snapshot of current statprof data.

Statprof report

The reports generated by statprof differ depending on the options you choose. The following is a sample output using the default options:

 Snapshot #1 @ 11:26:39.952, Final Snapshot
 ------------------------------------------
 Statistical Profiling (cell size 8) for module thisone, sorted by time:
  %Time   Seconds   Name
  ------  -------   -----------------------------------------------
   75.0    0.48     extern:"DoWork()"
   23.4    0.15     extern:"DoLessWork()"
    1.6    0.01     Other functions (not in profiled module)
 
 

Example D-9. Statprof Report

Statprof Workflow

When profiling an application using statprof.ual, the following approach may be useful:

  • run the application under load using statprof "-c" option to get an overall view;
  • run the application under load using statprof to focus on the "heaviest" shared library;
  • make corrections to the shared library; and
  • repeat the process from the first step.

For example, a simple test application with shared libraries is profiled using statprof "-c" and produces the following report:

 
 $ aprobe -u statprof -p "-c" ./main.exe > aprobe.out
 $ apformat main.apd 
 
 Snapshot #1 @ 16:43:39.429, Final Snapshot
 ------------------------------------------
 Coarse Statistical Profiling for shared libraries, sorted by time:
   %Time    Seconds     Name
  ------   -------   -----------------------------------------------
   95.0   38.50    libmylib2.so
    3.9    1.56    main.exe
    1.1    0.45    libmylib1.so
 

The report shows that libmylib2.so is the "heaviest hitter" so the test is rerun focusing on that shared library:

 $ aprobe -u statprof -p "libmylib2.so" ./main.exe > aprobe.out
 $ apformat main.apd 

 Snapshot #1 @ 16:45:44.315, Final Snapshot
 ------------------------------------------
 Statistical Profiling (cell size 8) for module libmylib2.so, sorted by time:
  %Time     Seconds   Name
  --------  ---------   -----------------------------------------------
   95.505    38.6700     "mylib2.c":"p1()" in "libmylib2.so"
    4.495     1.8200     Other functions (not in profiled module)
 

The report shows that the function "p1" is the heaviest hitter.

At this point statprof.ual is about done. The sattprof.ual probe does not indicate how many times a particular funciton is called, for eample.

There is more to profiling an application than just where time is spent. It may be that the particular code is inefficient or it may be that it is overused. To get more in-depth information you will have to switch to a the profile probe, profile.ual.

Here is a sample report form profile.ual:

 
  Overall program profile 
 -----------------------------------------------------------------------------
    Calls To   |              Individual Time                |
   Self  Child | Pct    Total   Average   Minimum   Maximum  |  Function Name
 ------ ------   --- --------- --------- --------- ---------   ---------------
    136      0   100 119.65829   0.87984   0.76885   1.15061   [4] p1()
    136      0     0   0.02576   0.00019   0.00003   0.02203   [5] p2()
    544    272     0   0.00078   0.00000   0.00000   0.00008   [13] mylib2_work() 
 -----------------------------------------------------------------------------
    Calls To   |               Cumulative Time               |
   Self  Child | Pct    Total    Average   Minimum   Maximum |  Function Name
 ------ ------   --- --------- --------- --------- ---------   ---------------
    544    272   100 119.68484   0.22001   0.00000   1.15069   [13] mylib2_work()
    136      0   100 119.65829   0.87984   0.76885   1.15061   [4] p1()
    136      0     0   0.02576   0.00019   0.00003   0.02203   [5] p2()
 -----------------------------------------------------------------------------