Statprof Predefined Probe
Contents
Statistical Profiling: statprof.ual
The statprof.ual predefined probe uses the Unix statistical profiling functions to provide a very low overhead mechanism to sample CPU timing. This is the underlying mechanism used when linking an application with the -p flag (or equivalent), the output of which is viewed using the prof tool. Since this is based on Unix services it is not available for the Windows 2000 version of Aprobe.
The probe works by using the Unix profil function. Normally this would be called from the startup and exit code of an application linked with -p; with Aprobe we call it on program entry and exit. A buffer is allocated and passed to the kernel with a "slot" (or cell) for every couple of instructions. When a kernel clock "ticks", if one of the threads for the application is running, the value in that slot is incremented.
When the application exits (or we take a snapshot), the memory buffer is saved to the APD file. At format time we convert each cell index to an address and use it to compute the time spent in each routine. Obviously, since the kernel is sampling the application at intervals, this is a statistical profiling method: Theoretically an application could spend 99% of it's time in one routine and jump to another routine at exactly the time we are sampling and then immediately return back again. Although this is theoretically possible, in practice it will never happen (given enough samples).
Statprof.ual has a number of advantages over using -p when building your application:
- You do not need to rebuild your application to use it!
- With -p you can only get information about the application module. While -p does link in some special static libraries to replace the standard libs this is not a perfect solution. Statprof allows you to access any of your modules.
- You can take snapshots with staprof, control the cell size, display lines and generally control things to a much finer level.
One thing that statprof doesn't replicate from prof is the call counts facility since this is adequately covered by profile.ual. See apcgp.ual for a probe that provides call graph counts.
Note that unlike profile.ual
, which measures wallclock time, statprof.ual
only measures CPU time. If the application is paused waiting on input, this time will not be seen. Sometimes you will be more interested in this than wallclock time, other times it will be the reverse. The [../examples/predefined_probes/statprof/README $APROBE/examples/learn/statprof] directory has a worked example where both statprof and profile are used as appropriate.
See profile.ual and apcgp.ual for other approaches to profiling.
Usage
Using statprof is straightforward - there is no configuration file or GUI to worry about. All options can be set by command line options (either at runtime or format-time).
Statprof UAL parameters
statprof.ual is specified on the aprobe command line or in an APO file as described in Command Line. The specific options are:
aprobe -u statprof.ual [-p "[-a start end] [-A] [-c] [-h] [-i interval] [-l] [-o filename] [-r] [-s size] [-S signum] [-v] [-z] [module name]]" your_program
where:
- -a start end
- This performs a profile on the specified address range. This uses the smallest cell size possible, though it will generally be rather coarse. Be careful to set the APD file size large enough to hold all the data. (Since 4.4.9).
- -A
- This cause address range cell information to be displayed by address rather than by symbols. This is useful when there is no symbol information for the range. (Since 4.4.9).
- -c
- This performs a coarse profile to see overall usage in shared libraries. This uses a very large "cell" size which allows you to determine which shared library was active when the kernel tick occurred but not which routine.
- -i
- This sets the profile timer interval. The default value is 10 milliseconds. Valid values range from 10 milliseconds to 999 milliseconds. (Since 4.4.9).
- -l
- This displays a breakdown of time within a function by line number as well as by function. The bigger the cell size the less reliable it will be but this can give useful information. This is a format-time option.
- -o filename
- This outputs comma-delimited data to the given filename. This is suitable for input into a spreadsheet or similar program. Note that the line breaks are PC-based rather than Unix-based as this seems to be more generally accepted by programs we've tested with.
- -r
- This will reset profile counts for each new snapshot. Without this option profile information will be cumulative for the start of the test.
- -s
- This sets the number of bytes used for each sampling cell. It's unlikely that you would modify this but if you have a huge application and limited memory you may wish to do so. Basically the larger the cell size, the smaller the amount of storage required to hold the sampling table but it's more likely that the output data will be incorrect. By default two instructions are mapped to a cell which has nearly a 100% chance of both instructions belonging to the same routine due to the padding usually used by compilers. But if there are four instructions for the same cell, there's a chance that some of the instructions are for one function and some for another. However, when formatted they are treated as one address (obviously) so wrong information can result.
- -S
- This specifies a signal which should force a snapshot. The probe will register a signal handler for this signal so it should be one that is compatible with he application being tested. (Since 4.4.9).
- -v
- This displays information messages. (Since 4.4.9).
- -z
- This displays symbols or modules which have no associated CPU: Normally just the names of the routines (or modules) that have matches in the sampling table are displayed. This switch will also display the list of all other routines (or modules). This is a format-time option.
If you do not provide a module name, the application will be profiled.
At format time, the -l, -o, -r and -z options may be provided if they were not specified at runtime.
Statprof Memory And APD Usage
Using statprof requires some consideration for the memory and APD file sizes. Memory usage and APD file size is dependent on the number of sampling "cells" needed to record the profile counts. Memory space must be allocated by statprof to hold a counter (2 bytes per counter) for each sampling cell the cell counters must fit into a single APD file. Additionally, the APD files must not wrap around or the snapshot data will be lost.
Profiling more of the application requires more cells. Increasing the sampling cell size (with -s
) will reduce the number of sampling cells needed at the expense of granularity. The -c
option requires the fewest sampling cells. The -a
option may require the most. On Linux the counters are allocated using mmap. On AIX the counters are allocated using malloc.
As and example, suppose you are profiling an application module whose text size is 10 MB. Roughly speaking, you will need about 1.25 MB (= 10 MB / 8) for the counters at runtime. As for APD files, they must be at least 1.25 MB in size. Use the Aprobe -s
option to control individual APD file sizes and the Aprobe -n
option to control the number of APD files in the ring.
Here is a sample command that might be used to achieve this:
aprobe -u statprof -s 2000000 -n 10 myapp,exe
This example sets the APD file size to 2 MB and number to 10 which would allow for 10 snapshots during the test run.
Here is a sample command that might be used to profile a larger module but limit memory usage:
aprobe -u statprof -p "-s 32" -s 2000000 -n 10 myapp,exe
Note that there are two -s
options in use here. In this case the cell size was set to 32 bytes. This allows profiling a module whose size if 4 times that of the previous example using the same memory footprint. This cells size is probably good enough to distinguish functions but not lines in the functions.
Statprof API
Users can control the behavior of the statprof probe by calls from within their own probes. The API is defined by $APROBE/include/statprof.h
. The functions exported by statprof.ual are:
- ap_Statprof_Enable
Enables collection of statprof data.- ap_Statprof_Disable
Disables collection of statprof data.- ap_Statprof_Snapshot
Takes a snapshot of current statprof data.
Statprof Demand Actions
(Since 4.4.9).
You can control statprof.ual
using demand.ual and apdemand.
Include demand.ual
on the Aprobe command line:
aprobe -u statprof -u demand myapp.exe
then use apdemand
to send actions:
apdemand statprof snapshot
statprof.ual
responds to the following actions:
- statprof snapshot
- statprof enable
- statprof disable
Note that all actions containing the action string will be triggered.
Statprof report
The reports generated by statprof differ depending on the options you choose. The following is a sample output using the default options:
Snapshot #1 @ 11:26:39.952, Final Snapshot ------------------------------------------ Statistical Profiling (cell size 8) for module thisone, sorted by time: %Time Seconds Name ------ ------- ----------------------------------------------- 75.0 0.48 extern:"DoWork()" 23.4 0.15 extern:"DoLessWork()" 1.6 0.01 Other functions (not in profiled module)
Example D-9. Statprof Report
Statprof Workflow
When profiling an application using statprof.ual, the following approach may be useful:
- run the application under load using statprof "-c" option to get an overall view;
- run the application under load using statprof to focus on the "heaviest" (application or shared) module;
- make corrections to the modue; and
- repeat the process from the first step.
For example, a simple test application with shared libraries is profiled using statprof "-c" and produces the following report:
$ aprobe -u statprof -p "-c" ./main.exe > aprobe.out $ apformat main.apd Snapshot #1 @ 16:43:39.429, Final Snapshot ------------------------------------------ Coarse Statistical Profiling for shared libraries, sorted by time: %Time Seconds Name ------ ------- ----------------------------------------------- 95.0 38.50 libmylib2.so 3.9 1.56 main.exe 1.1 0.45 libmylib1.so
The report shows that libmylib2.so is the "heaviest hitter" so the test is rerun focusing on that shared library:
$ aprobe -u statprof -p "libmylib2.so" ./main.exe > aprobe.out $ apformat main.apd Snapshot #1 @ 16:45:44.315, Final Snapshot ------------------------------------------ Statistical Profiling (cell size 8) for module libmylib2.so, sorted by time: %Time Seconds Name -------- --------- ----------------------------------------------- 95.505 38.6700 "mylib2.c":"p1()" in "libmylib2.so" 4.495 1.8200 Other functions (not in profiled module)
The report shows that the function "p1" is the heaviest hitter.
On some system, AIX in particular, there may be "kernel" code that is executed by the program but which Aprobe does not assign to a module. In that case the profiling samples will be assigned to either the application module or unknown. You can use the address range option to profile kernel code. Here is an example of profiling the kernel code for optimized memory move/search operations on AIX:
$ aprobe -u statprof -p "-a 0x3000 0x10000" ./main.exe > aprobe.out $ apformat main.apd Snapshot #1 @ 13:55:10.819, Final Snapshot ------------------------------------------ Statistical Profiling (cell size 8) for address range 3000 .. 10000, sorted by time: %Time Seconds Name/Address -------- --------- ----------------------------------------------- 100.000 18.4100 Unknown functions (no symbol information)
Since there is no symbol information about the kernel code it is more useful to use the -A option to display the profile data using addresses:
$ aprobe -u statprof -p "-a 0x3000 0x10000 -A" ./main.exe > aprobe.out $ apformat -u statprof main.apd Snapshot #1 @ 13:55:10.819, Final Snapshot ------------------------------------------ Statistical Profiling (cell size 8) for address range 3000 .. 10000, sorted by time: %Time Seconds Name/Address -------- --------- ----------------------------------------------- 22.922 4.2200 d028 - d02f 15.263 2.8100 d030 - d037 13.308 2.4500 d8c0 - d8c7 6.464 1.1900 d8b0 - d8b7 6.247 1.1500 d8c8 - d8cf 5.323 0.9800 f058 - f05f 3.857 0.7100 d898 - d89f 3.802 0.7000 d8a8 - d8af 3.639 0.6700 d8d8 - d8df 3.042 0.5600 d890 - d897 2.444 0.4500 f048 - f04f 2.118 0.3900 f870 - f877 1.738 0.3200 f880 - f887 1.575 0.2900 f068 - f06f 1.467 0.2700 e0a0 - e0a7 1.249 0.2300 f018 - f01f 0.760 0.1400 f010 - f017 0.760 0.1400 f060 - f067 0.706 0.1300 e090 - e097 0.543 0.1000 f860 - f867 0.489 0.0900 e0a8 - e0af 0.435 0.0800 e098 - e09f 0.217 0.0400 e088 - e08f 0.217 0.0400 f030 - f037 0.163 0.0300 d020 - d027 0.163 0.0300 e0d8 - e0df 0.109 0.0200 d018 - d01f 0.109 0.0200 d060 - d067 0.109 0.0200 d818 - d81f 0.109 0.0200 f000 - f007 0.109 0.0200 f078 - f07f 0.054 0.0100 d008 - d00f 0.054 0.0100 d058 - d05f 0.054 0.0100 d400 - d407 0.054 0.0100 d420 - d427 0.054 0.0100 d450 - d457 0.054 0.0100 e020 - e027 0.054 0.0100 e050 - e057 0.054 0.0100 e060 - e067 0.054 0.0100 e080 - e087 0.054 0.0100 f020 - f027 Statistical Profiling (cell size 8) for address range 3000 .. 10000, sorted by time: %Time Seconds Name/Address -------- --------- ----------------------------------------------- 0.054 0.0100 d008 - d00f 0.109 0.0200 d018 - d01f 0.163 0.0300 d020 - d027 22.922 4.2200 d028 - d02f 15.263 2.8100 d030 - d037 0.054 0.0100 d058 - d05f 0.109 0.0200 d060 - d067 0.054 0.0100 d400 - d407 0.054 0.0100 d420 - d427 0.054 0.0100 d450 - d457 0.109 0.0200 d818 - d81f 3.042 0.5600 d890 - d897 3.857 0.7100 d898 - d89f 3.802 0.7000 d8a8 - d8af 6.464 1.1900 d8b0 - d8b7 13.308 2.4500 d8c0 - d8c7 6.247 1.1500 d8c8 - d8cf 3.639 0.6700 d8d8 - d8df 0.054 0.0100 e020 - e027 0.054 0.0100 e050 - e057 0.054 0.0100 e060 - e067 0.054 0.0100 e080 - e087 0.217 0.0400 e088 - e08f 0.706 0.1300 e090 - e097 0.435 0.0800 e098 - e09f 1.467 0.2700 e0a0 - e0a7 0.489 0.0900 e0a8 - e0af 0.163 0.0300 e0d8 - e0df 0.109 0.0200 f000 - f007 0.760 0.1400 f010 - f017 1.249 0.2300 f018 - f01f 0.054 0.0100 f020 - f027 0.217 0.0400 f030 - f037 2.444 0.4500 f048 - f04f 5.323 0.9800 f058 - f05f 0.760 0.1400 f060 - f067 1.575 0.2900 f068 - f06f 0.109 0.0200 f078 - f07f 0.543 0.1000 f860 - f867 2.118 0.3900 f870 - f877 1.738 0.3200 f880 - f887
You can use the -A
option at format time if you forget:
$ apformat -u statprof -p "-A" main.apd
To act on this information you have to determine what kernel code is where. Here is an extract from the AIX kernel extension import file /usr/lib/kernex.imp
:
******************************************************************************* * Milicode routines. The arithmetic routines can only be called by * 32-bit code. ******************************************************************************* #! # 32 __mulh 0x3100 __mull 0x3180 __divss 0x3200 __divus 0x3280 __quoss 0x3300 __quous 0x3380 # no32 # 32 ___memcmp 0xD000 ___strstr 0xD400 ___memccpy 0xD800 ___bzero 0xE000 ___memset 0xE008 ___memmove 0xF000 ___memcpy 0xF000 ___fill 0xF800 # no32 # 64 ___memcmp64 0xD200 ___strstr64 0xD600 ___memccpy64 0xDA00 ___bzero64 0xE800 ___memset64 0xE808 ___memmove64 0xF400 ___memcpy64 0xF400 ___fill64 0xFA00 # no64 #!/unix
Also check the the AIX system call import file /usr/lib/syscalls.imp
for more addresses.
The system call entry point is often located at 0x3700
.
Please verify this for your system.
At this point statprof.ual is about done. The sattprof.ual probe does not indicate how many times a particular funciton is called, for example.
There is more to profiling an application than just where time is spent. It may be that the particular code is inefficient or it may be that it is overused. To get more in-depth information you will have to switch to the profile probe, profile.ual.
Here is a sample report form profile.ual:
Overall program profile ----------------------------------------------------------------------------- Calls To | Individual Time | Self Child | Pct Total Average Minimum Maximum | Function Name ------ ------ --- --------- --------- --------- --------- --------------- 136 0 100 119.65829 0.87984 0.76885 1.15061 [4] p1() 136 0 0 0.02576 0.00019 0.00003 0.02203 [5] p2() 544 272 0 0.00078 0.00000 0.00000 0.00008 [13] mylib2_work() ----------------------------------------------------------------------------- Calls To | Cumulative Time | Self Child | Pct Total Average Minimum Maximum | Function Name ------ ------ --- --------- --------- --------- --------- --------------- 544 272 100 119.68484 0.22001 0.00000 1.15069 [13] mylib2_work() 136 0 100 119.65829 0.87984 0.76885 1.15061 [4] p1() 136 0 0 0.02576 0.00019 0.00003 0.02203 [5] p2() -----------------------------------------------------------------------------