Profile Predefined Probe

From OC Systems Wiki!
Jump to: navigation, search

Profile Probe: profile.ual

The profile.ual predefined probe profiles the execution of functions within an application program. The probe records the number of calls made to functions, the number of calls made by functions, the amount of time spent in each function, and the amount of time spent in each function plus all the functions it calls. The profiling information is reported for each thread and for the program as a whole.

Profiling allows you to determine how much time is spent executing the various functions that comprise your application. The profile information can be used to locate "hot spots" within your application on which to focus your attention when trying to improve your program's performance.

While this is a very powerful tool, it is not going to tell you in the first run what needs to be re-coded, unless it's a small application with a very obvious problem. Performance analysis of an application is an iterative process that requires an understanding of what the application does, and a feeling for how long operations should take.

This section does not address a methodology for performance analysis using aprobe. It simply describes the use of the profile.ual predefined probe. A larger example which also uses statprof.ual can be found in the ../examples/predefined_probes/statprof/README $APROBE/examples/learn/statprof directory and you are encouraged to contact OC Systems to discuss using Aprobe on your specific application.

See statprof.ual and apcgp.ual for other approaches to profiling.

Usage

This probe is applied at run time using aprobe as described under Profile UAL Parameters below. The only functions for which profiling data will be collected are those selected with the Profile keyword in the configuration file (see Configuration File and Configuration GUI). The Remove keyword allows you to remove a function from the set to be profiled, and the Trigger keyword limits profiling to specific paths rooted at a specified function.

The profile information is logged to a file for later processing by apformat. A table of profile information is generated for each thread in the application program, and one summary table is generated to display profile information for the program as a whole. See Profile Probe Output for an example.

Profile UAL Parameters

profile.ual is specified on the aprobe command line or in an APO file as described in Command Line. The specific options are:

aprobe  -u profile.ual     [-p "[-h] [-v] [-g] [-c config_filename]"] your_program

where:

-c config_filename

specifies that the name of the probe configuration options file will follow immediately after -c. The default file name is your_program.profile.cfg. For example, if your executable program is called wilbur.exe, then the default file name would be wilbur.exe.profile.cfg.
-g
starts they Java configuration GUI at probe startup, before running your program.
-h
produces brief help text.
-v
verbose mode, which produces additional progress messages.

Profile Configuration File

The Profile configuration file is used to specify what subprograms are to be analyzed, when snapshots are to be taken, and other options, as described in Configuration File. The example below shows one possible Profile configuration file.

 PROBE CONFIGURATION FILE FOR PROFILE VERSION 2.0.0
 
 Verbose                   FALSE
 ProfilingEnabledInitially TRUE
 DefaultLevels             5
 StartWithGUI              FALSE
 ReportCallers             FALSE
 IndexSymbols              TRUE
 SortByCumulativeTime      FALSE
 SeparateTables            TRUE
 CommaOutputFile           "app_output.csv"
 
 // see how much of that time is spent in system functions
 PROFILE "open()" in "libc.so"
 PROFILE "read()" in "libc.so"
 
 // profile ''almost'' everything in the Motif Library
 PROFILE "*" in "libXm.so"
 REMOVE "_XmRecordEvent" in "libXm.so"
 
 // ...but only when in RefreshCallback
 PROFILE "RefreshCallback()"
 TRIGGER "RefreshCallback()" Levels 4
 

Example D-7. profile.cfg File

Configuration Variables

The following are the only valid keywords that identify lines to set configuration variables. Each such line must begin with one of these keywords, and the keyword must be followed by its value. Nothing else is allowed on the same line.

Verbose

This must be followed by the value TRUE or FALSE. The default is FALSE. The value TRUE indicates the progress messages should be produced by the profile probe.

ProfilingEnabledInitially

This must be followed by the value TRUE or FALSE. The default is TRUE. The value TRUE indicates that data logging will begin as soon as the application program starts running. The value FALSE indicates that data logging will begin only after a call is made to the probe's function ap_Profile_Enable() rather than as soon as the application program starts running or a TRIGGER function is hit.

DefaultLevels

This must be followed by a valid number from 0 upwards. In the absence of any specific triggers (or levels on a TRIGGER) this controls how deep down the call stack profile will go before disabling profiling.

StartWithGUI

This must be followed by the value TRUE or FALSE. The default is FALSE. The value TRUE indicates that the configuration GUI should be started before the target program runs, even if -g wasn't specified on the command-line. A FALSE value is overridden by the -g command-line option.

ReportCallers.

must be followed by the value TRUE or FALSE. The default is FALSE. The value TRUE means that information about the caller of each profiled routine is saved at runtime and displayed at format time. This gives output akin to the Unix gprof command. This switch can be modified at format time to turn this output off (note that turning it on at format time when it was off at runtime will not give useful information).

IndexSymbols.

must be followed by the value TRUE or FALSE. The default is TRUE. When set, each routine is displayed in an abbreviated form when it occurs in the formatted tables. An index showing the mapping between abbreviated and "real" names is displayed at the end. When this is set to FALSE, full symbol names are displayed in the formatted output.

SortByCumulativeTime.

This must be followed by the value TRUE or FALSE. The default is FALSE. This option has no effect unless SeparateTables is set to TRUE. In that case one combined table showing both individual and cumulative times for each routine is displayed and this option controls the ordering of that table. The default is to sort the table by individual time (the routine which has the most individual time allocated to it is shown first); if this option is TRUE, the table is sorted by cumulative time.

SeparateTables.

This must be followed by the value TRUE or FALSE. The default is TRUE. When set to TRUE, two tables are produced for each thread (and / or for the entire program): One shows the breakdown sorted by individual time, the other by cumulative time. If this is set to FALSE one table is produced and the sort order depends on the SortByCumulativeTime option described above.

CommaOutputFile.

This must be followed by the name of an output file (or nothing for a null string). If this is not a null string, a file of the given name is created and data in a form suitable for input to a variety of spreadsheet programs (namely comma-separated-values) is produced. It should be noted that this file has PC line separators as it is found that more spreadsheet programs will accept data correctly in this format than if Unix line feeds are used.

Configuration of Profiled Functions

Each function to be profiled must be specified explicitly using the PROFILE keyword followed by the name of the function, as described in Configuration of Selected Functions.

The REMOVE keyword allows you to specify functions that should not be instrumented for profiling. This is useful when used in conjunction with a wildcard ("*"), to gather data about everything except certain routines.

A line beginning with the keyword TRIGGER specifies the name of a trigger function in the usual manner. Entry to the trigger function will enable profile information collection, and the corresponding exit will disable profiling. This is similar to the behavior of nested probes (see Nesting of Probes). Only while a trigger function is active (executing in the call chain), will profile data will be collected for functions selected with PROFILE statements.

The TRIGGER statement may optionally be followed by a "Levels n" option (where n is a number). This will cause the trigger to only be active for the given number of calls. For instance, Levels 2 will only turn the trigger on for the trigger function and the function it calls. If triggers are nested (e.g. we have a trigger on Routine1() which calls Routine2() which we also have a trigger on), the levels setting for the innermost trigger overrides those at the outer levels.

Note that TRIGGER may imply PROFILE. Specifically, if you have a TRIGGER statement for a particular module (including the application module) and have not specified any PROFILE lines for that module, all routines in that module will automatically be profiled.

Note that if you add a TRIGGER statement to your config file (either by editing the file directly or by using the GUI) you should check the state of the ProfilingEnabledInitially flag - by default that is TRUE and you usually want to make it FALSE when you add a trigger.

Profile Configuration GUI

The configuration GUI (Graphical User Interface) for profile.ual is the "Profile Probe Configuration" dialog. This dialog allows you to set the options that control the coverage probe and is generally as described in Configuration GUI. The Help button provides the following description:

'Trigger Functions' lists the functions whose invocation controls profiling. Profiling will occur when profiled functions are called from a trigger function.

This list is changed by clicking on 'Pick' below the list, and then choosing functions to be triggers.

Clicking the 'Advanced' button will bring up another dialog containing the list of 'Profiled Functions', which lists the functions whose invocation you want to profile.

'Produce verbose output' indicates that progress messages should be output by this probe

'StartWithGUI' indicates that the configuration GUI should be started before the target program runs.

'Profiling Enabled Initially' indicates that data logging will begin as soon as the application program starts running.

'Default depth of trigger profiling' controls how far down the call-chain below a trigger function profiling will occur (unless a given trigger specifies some other value).

'Index Symbols' indicates whether a list of symbol names should appear at the end of the output.

'Sort output by cumulative time' sorts subprograms by cumulative time rather than individual time.

'Separate Tables' indicates whether individual & cumulative time should be combined in one table, or in two separate tables.

'Report Callers', when selected breaks down each call to a subprogram by the subprogram which called it.

'Comma-delimited output file' indicates the name of a file (if any) to which spreadsheet or database readable tables should be written.

'Save As' button saves the current options to a configuration file. 'Run (no save)' button runs the target program using the selected options. 'Save & Run' button save the options to the configuration file, then runs the target program using the selected options.

'Abort' button will abort the target program instead.

Profile API

Users can control the behavior of the profile probe by calls from within their own probes. The API is defined by [../include/profile.h $APROBE/include/profile.h]. Some of the functions exported by profile.ual are:

ap_Profile_GetOptions

Get current set of options controlling the probe.
ap_Profile_SetOptions

Set current set of options to control the probe.
ap_Profile_SaveOptions

Save a set of options to a configuration file.
ap_Profile_Enable

Enables collection of profile data.
ap_Profile_Disable

Disables collection of profile data.
ap_Profile_DoSnapshot

Takes a snapshot of profile data for a specific thread.
ap_Profile_DoSnapshotForAll

Takes a snapshot of profile data for all threads.

See Snapshots for more information and an example.

Profile Demand Actions

(Since 4.4.8).

You can control profile.ual using demand.ual and apdemand.

Include demand.ual on the Aprobe command line:

 aprobe -u profile -u demand myapp.exe
 

then use apdemand to send actions:

 apdemand profile snapshot
 

profile.ual responds to the following actions:

  • profile snapshot all threads

Note that all actions containing the action string will be triggered.

Profile Performance Issues

See Performance_Issues for a general discussion of factors that affect performance.

Reducing Overhead with TRIGGER and LEVELS

The amount of data collected by the profile probe (and the trace probe, see trace.ual) may be greatly reduced by using the TRIGGER keyword to name a trigger function. Probing is turned on when entering the trigger function, and turned off when exiting it, so the probe is only active if the trigger function is a direct or indirect caller. For example, if you only want to profile the symbol-table-reading part of your application, you might put TRIGGER "ReadSymbolTable()" in your configuration file and then you'd only record data for calls made directly or indirectly from ReadSymbolTable().

Further control of data collection is provided with the LEVELS attribute of a trigger. This is essentially an "inverse trigger", which turns profile or trace off at a given call nesting level relative to the trigger function.

Used efficiently, the trigger and levels mechanisms this should enable you to instrument everything and only take a performance hit when your triggers are active.

By default there is a TRIGGER is at the start of each thread and continues for a depth of DefaultLevels (initially 5). This thread-level trigger is controlled by the configuration variable ProfilingEnabledInitially (or TracingEnabledInitially). So for a TRIGGER on some other function to be effective, you must set this to FALSE in the configuration file or GUI as well as selecting one or more trigger functions.

The distinction between functions that are probed and those that are designated as triggers, is crucial to understanding how to control the performance of the profile (and trace) probes.

The PROBE or PROFILE keyword identifies those functions that should be profiled under any circumstances; that is, those functions that should be patched at all. Generally you don't have to worry about this because the predefined probe logic makes a "guess" at what should be probed by what triggers are specified:

The current default is to probe each routine in a module when there is a trigger for a function in that module, or in the absence of any triggers, to probe everything in the application module.

This means don't need to specify any triggers in order for the probe to gather data, and in general you will only pick a few trigger points to limit what data is collected to the portion of the program you're focusing on. Within that portion of the program, data is collected for those functions which are probed. Outside of the triggered functions, the probes are still invoked, but are disabled.

In the event that your application spends most of the time with profile disabled (based on the triggers) this should have relatively low overhead. If, however, your application is such that your functions are always turned on, this may be a problem.

The profile probe is applied to the functions specified in the configuration file, so the run-time overhead is directly proportional to the number of calls to the functions selected. Note that a probe is executed for every function included by a PROFILE line (and not subsequently REMOVEd), even if it is not "active" due to a TRIGGER (subject to the extended instrumentation rules detailed above).

It's often hard to select the initial set of functions to be probed. One effective approach to this problem is to try to identify small functions that are called very often, and REMOVE those from profiling early, since the overhead of each profile probe is constant. You can often identify such "nuisance functions" by letting the application start, run for a few minutes, then send a Control-C to terminate it. This will invoke the thread and program exit processing, and profiling data up to that point will be valid. You can then update the profile.cfg file for the application, and re-run it.

Of course, if the application is complicated to run, you may want to subset it first in a test environment. This is an advantage of including Aprobe in your process early in development, rather than bringing it in during integration and delivery testing -- you can profile smaller parts of your application separately.

Implementation Note: Quick-Check Patching

An extended form of patching is used by the trace and profile predefined probes which provides a very low overhead to a function if there are no active triggers. When there are no active triggers the profiled (or traced) routines can run very close to their full (non-patched) speed.

Essentially a global flag is checked before performing any of the processing associated with the probe. This saves the time normally needed to determine whether there are any active probes for a specific function. Without this mechanism, if you profile (or trace) "the world" using regular patches there may be a significant overhead just determining that there are no probes active for the patched function.

The only user-visible effect of this "quick-check" patch mechanism, other than lower overhead, is that other predefined probes may be unintentionally disabled if a quick-check patch associated with trace or profile is applied to a function before a patch for another predefined probe such as coverage. This problem is avoided simply by naming trace and/or profile after other predefined probes when they are used in the same aprobe invocation.

Profile Report

The reports generated by running apformat vary considerably depending upon the options chosen as specified above. The default looks like the example below:

  Profile for thread: _start() in threads[0]
   Comment: End-of-thread profile information.
 -----------------------------------------------------------------------------
    Calls To   |              Individual Time                |
   Self  Child | Pct    Total   Average   Minimum   Maximum  |  Function Name
 ------ ------   --- --------- --------- --------- ---------   ---------------
      1      1    99   2.02011   2.02011   2.02011   2.02011   [4] main()
      1      0     0   0.00035   0.00035   0.00035   0.00035   [0] MyRoutine()
      1      0     0   0.00001   0.00001   0.00001   0.00001   [3] _init()
      1      0     0   0.00001   0.00001   0.00001   0.00001   [5] _fini()
 -----------------------------------------------------------------------------
    Calls To   |               Cumulative Time               |
   Self  Child | Pct    Total    Average   Minimum   Maximum |  Function Name
 ------ ------   --- --------- --------- --------- ---------   ---------------
      1      1    99   2.02046   2.02046   2.02046   2.02046   [4] main()
      1      0     0   0.00035   0.00035   0.00035   0.00035   [0] MyRoutine()
      1      0     0   0.00001   0.00001   0.00001   0.00001   [3] _init()
      1      0     0   0.00001   0.00001   0.00001   0.00001   [5] _fini()
 -----------------------------------------------------------------------------
 Total time spent in the timed functions: 2.020473
 Total probe overhead for the timed functions: 0.000000
 Total thread time: 2.029944
 
  Overall program profile
 -----------------------------------------------------------------------------
    Calls To   |              Individual Time                |
   Self  Child | Pct    Total   Average   Minimum   Maximum  |  Function Name
 ------ ------   --- --------- --------- --------- ---------   ---------------
      1      1    50   2.02011   2.02011   2.02011   2.02011   [4] main()
      1      2    49   2.01504   2.01504   2.01504   2.01504   [2] MyThre..outine
      3      0     0   0.00054   0.00018   0.00009   0.00035   [0] MyRoutine()
      1      0     0   0.00001   0.00001   0.00001   0.00001   [3] _init()
      1      0     0   0.00001   0.00001   0.00001   0.00001   [5] _fini()
 -----------------------------------------------------------------------------
    Calls To   |               Cumulative Time               |
   Self  Child | Pct    Total    Average   Minimum   Maximum |  Function Name
 ------ ------   --- --------- --------- --------- ---------   ---------------
      1      1    99   2.02046   2.02046   2.02046   2.02046   [4] main()
      1      2    99   2.01524   2.01524   2.01524   2.01524   [2] MyThre..outine
      3      0     0   0.00054   0.00018   0.00009   0.00035   [0] MyRoutine()
      1      0     0   0.00001   0.00001   0.00001   0.00001   [3] _init()
      1      0     0   0.00001   0.00001   0.00001   0.00001   [5] _fini()
 -----------------------------------------------------------------------------
 Total time spent in the timed functions: 4.035710
 Total probe overhead for the timed functions: 0.000000
 Total program time: 2.031339
 Adjustment factors: call overhead = 0.000000; measurement overhead = 0.000000
 
 Symbol Index Table: (maps the abbreviated names to full names)
 --------------------------------------------------------------
    [0] MyRoutine()            ->  "threads.c":"MyRoutine()"
    [2] MyThre..outine         ->  extern:"MyThreadRoutine()"
    [5] _fini()                ->  extern:"_fini()"
    [3] _init()                ->  extern:"_init()"
    [4] main()                 ->  extern:"main()"
 --------------------------------------------------------------
 
 

Example D-8. Profile Probe Output