Statprof Predefined Probe

From OC Systems Wiki!
Jump to: navigation, search

Statistical Profiling: statprof.ual

The statprof.ual predefined probe uses the Unix statistical profiling functions to provide a very low overhead mechanism to sample CPU timing. This is the underlying mechanism used when linking an application with the -p flag (or equivalent), the output of which is viewed using the prof tool. Since this is based on Unix services it is not available for the Windows 2000 version of Aprobe.

The probe works by using the Unix profil function. Normally this would be called from the startup and exit code of an application linked with -p; with Aprobe we call it on program entry and exit. A buffer is allocated and passed to the kernel with a "slot" (or cell) for every couple of instructions. When a kernel clock "ticks", if one of the threads for the application is running, the value in that slot is incremented.

When the application exits (or we take a snapshot), the memory buffer is saved to the APD file. At format time we convert each cell index to an address and use it to compute the time spent in each routine. Obviously, since the kernel is sampling the application at intervals, this is a statistical profiling method: Theoretically an application could spend 99% of it's time in one routine and jump to another routine at exactly the time we are sampling and then immediately return back again. Although this is theoretically possible, in practice it will never happen (given enough samples).

Statprof.ual has a number of advantages over using -p when building your application:

  • You do not need to rebuild your application to use it!
  • With -p you can only get information about the application module. While -p does link in some special static libraries to replace the standard libs this is not a perfect solution. Statprof allows you to access any of your modules.
  • You can take snapshots with staprof, control the cell size, display lines and generally control things to a much finer level.

One thing that statprof doesn't replicate from prof is the call counts facility since this is adequately covered by profile.ual. See apcgp.ual for a probe that provides call graph counts.

Note that unlike profile.ual, which measures wallclock time, statprof.ual only measures CPU time. If the application is paused waiting on input, this time will not be seen. Sometimes you will be more interested in this than wallclock time, other times it will be the reverse. The [../examples/predefined_probes/statprof/README $APROBE/examples/learn/statprof] directory has a worked example where both statprof and profile are used as appropriate.

See profile.ual and apcgp.ual for other approaches to profiling.

Usage

Using statprof is straightforward - there is no configuration file or GUI to worry about. All options can be set by command line options (either at runtime or format-time).

Statprof UAL parameters

statprof.ual is specified on the aprobe command line or in an APO file as described in Command Line. The specific options are:


aprobe  -u statprof.ual [-p "[-a start end] [-A] [-c] [-h] [-i interval] [-l] [-o filename] [-r] [-s size] [-S signum] [-v] [-z] 
    [module name]]" your_program

where:

-a start end
This performs a profile on the specified address range. This uses the smallest cell size possible, though it will generally be rather coarse. Be careful to set the APD file size large enough to hold all the data. (Since 4.4.9).
-A
This cause address range cell information to be displayed by address rather than by symbols. This is useful when there is no symbol information for the range. (Since 4.4.9).
-c
This performs a coarse profile to see overall usage in shared libraries. This uses a very large "cell" size which allows you to determine which shared library was active when the kernel tick occurred but not which routine.
-i
This sets the profile timer interval. The default value is 10 milliseconds. Valid values range from 10 milliseconds to 999 milliseconds. (Since 4.4.9).
-l
This displays a breakdown of time within a function by line number as well as by function. The bigger the cell size the less reliable it will be but this can give useful information. This is a format-time option.
-o filename
This outputs comma-delimited data to the given filename. This is suitable for input into a spreadsheet or similar program. Note that the line breaks are PC-based rather than Unix-based as this seems to be more generally accepted by programs we've tested with.
-r
This will reset profile counts for each new snapshot. Without this option profile information will be cumulative for the start of the test.
-s
This sets the number of bytes used for each sampling cell. It's unlikely that you would modify this but if you have a huge application and limited memory you may wish to do so. Basically the larger the cell size, the smaller the amount of storage required to hold the sampling table but it's more likely that the output data will be incorrect. By default two instructions are mapped to a cell which has nearly a 100% chance of both instructions belonging to the same routine due to the padding usually used by compilers. But if there are four instructions for the same cell, there's a chance that some of the instructions are for one function and some for another. However, when formatted they are treated as one address (obviously) so wrong information can result.
-S
This specifies a signal which should force a snapshot. The probe will register a signal handler for this signal so it should be one that is compatible with he application being tested. (Since 4.4.9).
-v
This displays information messages. (Since 4.4.9).
-z
This displays symbols or modules which have no associated CPU: Normally just the names of the routines (or modules) that have matches in the sampling table are displayed. This switch will also display the list of all other routines (or modules). This is a format-time option.

If you do not provide a module name, the application will be profiled.

At format time, the -l, -o, -r and -z options may be provided if they were not specified at runtime.

Statprof Memory And APD Usage

Using statprof requires some consideration for the memory and APD file sizes. Memory usage and APD file size is dependent on the number of sampling "cells" needed to record the profile counts. Memory space must be allocated by statprof to hold a counter (2 bytes per counter) for each sampling cell the cell counters must fit into a single APD file. Additionally, the APD files must not wrap around or the snapshot data will be lost.

Profiling more of the application requires more cells. Increasing the sampling cell size (with -s) will reduce the number of sampling cells needed at the expense of granularity. The -c option requires the fewest sampling cells. The -a option may require the most. On Linux the counters are allocated using mmap. On AIX the counters are allocated using malloc.

As and example, suppose you are profiling an application module whose text size is 10 MB. Roughly speaking, you will need about 1.25 MB (= 10 MB / 8) for the counters at runtime. As for APD files, they must be at least 1.25 MB in size. Use the Aprobe -s option to control individual APD file sizes and the Aprobe -n option to control the number of APD files in the ring.

Here is a sample command that might be used to achieve this:

 aprobe -u statprof -s 2000000 -n 10 myapp,exe
 

This example sets the APD file size to 2 MB and number to 10 which would allow for 10 snapshots during the test run.

Here is a sample command that might be used to profile a larger module but limit memory usage:

 aprobe -u statprof  -p "-s 32" -s 2000000 -n 10 myapp,exe
 

Note that there are two -s options in use here. In this case the cell size was set to 32 bytes. This allows profiling a module whose size if 4 times that of the previous example using the same memory footprint. This cells size is probably good enough to distinguish functions but not lines in the functions.

Statprof API

Users can control the behavior of the statprof probe by calls from within their own probes. The API is defined by $APROBE/include/statprof.h. The functions exported by statprof.ual are:

ap_Statprof_Enable

Enables collection of statprof data.
ap_Statprof_Disable

Disables collection of statprof data.
ap_Statprof_Snapshot

Takes a snapshot of current statprof data.

Statprof Demand Actions

(Since 4.4.9).

You can control statprof.ual using demand.ual and apdemand.

Include demand.ual on the Aprobe command line:

 aprobe -u statprof -u demand myapp.exe
 

then use apdemand to send actions:

 apdemand statprof snapshot
 

statprof.ual responds to the following actions:

  • statprof snapshot
  • statprof enable
  • statprof disable

Note that all actions containing the action string will be triggered.

Statprof report

The reports generated by statprof differ depending on the options you choose. The following is a sample output using the default options:

 Snapshot #1 @ 11:26:39.952, Final Snapshot
 ------------------------------------------
 Statistical Profiling (cell size 8) for module thisone, sorted by time:
  %Time   Seconds   Name
  ------  -------   -----------------------------------------------
   75.0    0.48     extern:"DoWork()"
   23.4    0.15     extern:"DoLessWork()"
    1.6    0.01     Other functions (not in profiled module)
 
 

Example D-9. Statprof Report

Statprof Workflow

When profiling an application using statprof.ual, the following approach may be useful:

  • run the application under load using statprof "-c" option to get an overall view;
  • run the application under load using statprof to focus on the "heaviest" (application or shared) module;
  • make corrections to the modue; and
  • repeat the process from the first step.

For example, a simple test application with shared libraries is profiled using statprof "-c" and produces the following report:

 
 $ aprobe -u statprof -p "-c" ./main.exe > aprobe.out
 $ apformat main.apd 
 
 Snapshot #1 @ 16:43:39.429, Final Snapshot
 ------------------------------------------
 Coarse Statistical Profiling for shared libraries, sorted by time:
   %Time    Seconds     Name
  ------   -------   -----------------------------------------------
   95.0   38.50    libmylib2.so
    3.9    1.56    main.exe
    1.1    0.45    libmylib1.so
 

The report shows that libmylib2.so is the "heaviest hitter" so the test is rerun focusing on that shared library:

 $ aprobe -u statprof -p "libmylib2.so" ./main.exe > aprobe.out
 $ apformat main.apd 

 Snapshot #1 @ 16:45:44.315, Final Snapshot
 ------------------------------------------
 Statistical Profiling (cell size 8) for module libmylib2.so, sorted by time:
  %Time     Seconds   Name
  --------  ---------   -----------------------------------------------
   95.505    38.6700     "mylib2.c":"p1()" in "libmylib2.so"
    4.495     1.8200     Other functions (not in profiled module)
 

The report shows that the function "p1" is the heaviest hitter.

On some system, AIX in particular, there may be "kernel" code that is executed by the program but which Aprobe does not assign to a module. In that case the profiling samples will be assigned to either the application module or unknown. You can use the address range option to profile kernel code. Here is an example of profiling the kernel code for optimized memory move/search operations on AIX:

 $ aprobe -u statprof -p "-a 0x3000 0x10000" ./main.exe > aprobe.out
 $ apformat main.apd 

 Snapshot #1 @ 13:55:10.819, Final Snapshot 
 ------------------------------------------
 Statistical Profiling (cell size 8) for address range 3000 .. 10000, sorted by time:
   %Time     Seconds     Name/Address
  --------  ---------   -----------------------------------------------
  100.000    18.4100     Unknown functions (no symbol information)
 

Since there is no symbol information about the kernel code it is more useful to use the -A option to display the profile data using addresses:

 $ aprobe -u statprof -p "-a 0x3000 0x10000 -A" ./main.exe > aprobe.out
 $ apformat -u statprof main.apd 

 Snapshot #1 @ 13:55:10.819, Final Snapshot
 ------------------------------------------
 Statistical Profiling (cell size 8) for address range 3000 .. 10000, sorted by time:
  %Time     Seconds     Name/Address
 --------  ---------   -----------------------------------------------
  22.922     4.2200     d028 - d02f	
  15.263     2.8100     d030 - d037	
  13.308     2.4500     d8c0 - d8c7	
   6.464     1.1900     d8b0 - d8b7	
   6.247     1.1500     d8c8 - d8cf	
   5.323     0.9800     f058 - f05f	
   3.857     0.7100     d898 - d89f	
   3.802     0.7000     d8a8 - d8af	
   3.639     0.6700     d8d8 - d8df	
   3.042     0.5600     d890 - d897	
   2.444     0.4500     f048 - f04f	
   2.118     0.3900     f870 - f877	
   1.738     0.3200     f880 - f887	
   1.575     0.2900     f068 - f06f	
   1.467     0.2700     e0a0 - e0a7	
   1.249     0.2300     f018 - f01f	
   0.760     0.1400     f010 - f017	
   0.760     0.1400     f060 - f067	
   0.706     0.1300     e090 - e097	
   0.543     0.1000     f860 - f867	
   0.489     0.0900     e0a8 - e0af	
   0.435     0.0800     e098 - e09f	
   0.217     0.0400     e088 - e08f	
   0.217     0.0400     f030 - f037	
   0.163     0.0300     d020 - d027	
   0.163     0.0300     e0d8 - e0df	
   0.109     0.0200     d018 - d01f	
   0.109     0.0200     d060 - d067	
   0.109     0.0200     d818 - d81f	
   0.109     0.0200     f000 - f007	
   0.109     0.0200     f078 - f07f	
   0.054     0.0100     d008 - d00f	
   0.054     0.0100     d058 - d05f	
   0.054     0.0100     d400 - d407	
   0.054     0.0100     d420 - d427	
   0.054     0.0100     d450 - d457	
   0.054     0.0100     e020 - e027	
   0.054     0.0100     e050 - e057	
   0.054     0.0100     e060 - e067	
   0.054     0.0100     e080 - e087	
   0.054     0.0100     f020 - f027	

 Statistical Profiling (cell size 8) for address range 3000 .. 10000, sorted by time:
  %Time     Seconds     Name/Address
 --------  ---------   -----------------------------------------------
   0.054     0.0100     d008 - d00f	
   0.109     0.0200     d018 - d01f	
   0.163     0.0300     d020 - d027	
  22.922     4.2200     d028 - d02f	
  15.263     2.8100     d030 - d037	
   0.054     0.0100     d058 - d05f	
   0.109     0.0200     d060 - d067	
   0.054     0.0100     d400 - d407	
   0.054     0.0100     d420 - d427	
   0.054     0.0100     d450 - d457	
   0.109     0.0200     d818 - d81f	
   3.042     0.5600     d890 - d897	
   3.857     0.7100     d898 - d89f	
   3.802     0.7000     d8a8 - d8af	
   6.464     1.1900     d8b0 - d8b7	
  13.308     2.4500     d8c0 - d8c7	
   6.247     1.1500     d8c8 - d8cf	
   3.639     0.6700     d8d8 - d8df	
   0.054     0.0100     e020 - e027	
   0.054     0.0100     e050 - e057	
   0.054     0.0100     e060 - e067	
   0.054     0.0100     e080 - e087	
   0.217     0.0400     e088 - e08f	
   0.706     0.1300     e090 - e097	
   0.435     0.0800     e098 - e09f	
   1.467     0.2700     e0a0 - e0a7	
   0.489     0.0900     e0a8 - e0af	
   0.163     0.0300     e0d8 - e0df	
   0.109     0.0200     f000 - f007	
   0.760     0.1400     f010 - f017	
   1.249     0.2300     f018 - f01f	
   0.054     0.0100     f020 - f027	
   0.217     0.0400     f030 - f037	
   2.444     0.4500     f048 - f04f	
   5.323     0.9800     f058 - f05f	
   0.760     0.1400     f060 - f067	
   1.575     0.2900     f068 - f06f	
   0.109     0.0200     f078 - f07f	
   0.543     0.1000     f860 - f867	
   2.118     0.3900     f870 - f877	
   1.738     0.3200     f880 - f887	
 

You can use the -A option at format time if you forget:

 $ apformat -u statprof -p "-A" main.apd 
 

To act on this information you have to determine what kernel code is where. Here is an extract from the AIX kernel extension import file /usr/lib/kernex.imp:

 *******************************************************************************
 * Milicode routines.  The arithmetic routines can only be called by
 * 32-bit code.
 *******************************************************************************
 #!
 # 32
 __mulh			0x3100
 __mull			0x3180
 __divss			0x3200
 __divus			0x3280
 __quoss			0x3300
 __quous			0x3380
 # no32

 # 32
 ___memcmp		0xD000
 ___strstr		0xD400
 ___memccpy		0xD800
 ___bzero		0xE000
 ___memset		0xE008
 ___memmove		0xF000
 ___memcpy		0xF000
 ___fill			0xF800
 # no32

 # 64
 ___memcmp64		0xD200
 ___strstr64		0xD600
 ___memccpy64		0xDA00
 ___bzero64		0xE800
 ___memset64		0xE808
 ___memmove64		0xF400
 ___memcpy64		0xF400
 ___fill64		0xFA00
 # no64
 #!/unix
 

Also check the the AIX system call import file /usr/lib/syscalls.imp for more addresses. The system call entry point is often located at 0x3700. Please verify this for your system.

At this point statprof.ual is about done. The sattprof.ual probe does not indicate how many times a particular funciton is called, for example.

There is more to profiling an application than just where time is spent. It may be that the particular code is inefficient or it may be that it is overused. To get more in-depth information you will have to switch to the profile probe, profile.ual.

Here is a sample report form profile.ual:

 
  Overall program profile 
 -----------------------------------------------------------------------------
    Calls To   |              Individual Time                |
   Self  Child | Pct    Total   Average   Minimum   Maximum  |  Function Name
 ------ ------   --- --------- --------- --------- ---------   ---------------
    136      0   100 119.65829   0.87984   0.76885   1.15061   [4] p1()
    136      0     0   0.02576   0.00019   0.00003   0.02203   [5] p2()
    544    272     0   0.00078   0.00000   0.00000   0.00008   [13] mylib2_work() 
 -----------------------------------------------------------------------------
    Calls To   |               Cumulative Time               |
   Self  Child | Pct    Total    Average   Minimum   Maximum |  Function Name
 ------ ------   --- --------- --------- --------- ---------   ---------------
    544    272   100 119.68484   0.22001   0.00000   1.15069   [13] mylib2_work()
    136      0   100 119.65829   0.87984   0.76885   1.15061   [4] p1()
    136      0     0   0.02576   0.00019   0.00003   0.02203   [5] p2()
 -----------------------------------------------------------------------------