Frequently Asked Questions for RootCause and Aprobe (All Platforms)
Updated February 1, 2007
This document describes aspects of the products "RootCause" and "Aprobe" from OC Systems, Inc. (www.ocsystems.com).
It consists of questions asked by evaluators and customers, as well as "artificial" questions intended to provide an introduction to the use of the products.
More complete and detailed descriptions of RootCause and Aprobe are provided by the User's Guides for those products, but this FAQ may provide answers not easily found there, and also includes specific code examples not applicable to a general User's Guide.
RootCause is built on Aprobe, a fully general mechanism for applying patches to programs without changing source or object code. See "What is Aprobe?" for more information.
Users are encouraged to send questions (and answers!) to OC Systems.
This FAQ is Copyright (c) 2007 by OC Systems, Inc. ALL RIGHTS RESERVED.
This FAQ applies to all platforms, but some answers apply only to a specific platform, so read carefully. To avoid excessive repetition, the Unix form of a command or path is used where it may apply to multiple targets. For example, paths to files are given in Unix format using forward slashes, environment variables use Unix format, and Windows users should read .dll where filenames end in .ual (see Q12.23).
RootCause is a tool for developing and deploying traces that act as a software "flight recorder", simplifying and speeding root cause analysis, as well as proactively monitoring the health and performance of the application. It can also be used to repair applications in the operational environment without rebuilding or reinstalling the software. RootCause is based on Aprobe (see "What is Aprobe?") but steps beyond Aprobe in a number of important ways:
This FAQ addresses questions that apply to these aspects of RootCause. The full power of Aprobe is delivered with RootCause, and is addressed by the Aprobe FAQ.
See also "What is Aprobe?".
It's a long list. Here are just some of the uses of RootCause:
For a more in-depth discussion of some of these, see the RootCause white papers.
RootCause has several facets which apply to different classes of users. Technical support personnel will use it to gather information about a product in the field. Developers will use RootCause to develop traces that the support personnel can use, or which the developers themselves may use to track down problems. Testers might use it to gather data to provide back to developers to supplement test results.
There is RootCause for Java and RootCause for C/C++. Support for both languages may be enabled to support mixed applications.
RootCause for Java supports tracing J2EE applications such as Sun iPlanet and AS7, BEA WebLogic, JBOSS, and Tomcat applications. See "RootCause J2EE Support" for more information.
RootCause is currently available on Windows 2000; Windows XP; Sun Solaris (SPARC only); AIX version 5.1 or newer; and Red Hat Linux 7.1 or newer (x86 only). RootCause does not yet support 64-bit applications on any platform, though it _does_ support 32-bit applications running on 64-bit operating systems.
The detailed requirements are documented in Chapter 2 of the RootCause User's Guide for Unix or Windows.
The best way is to send e-mail to OC Systems, or phone 703-359-8160, extension 3. You can expect a quick response between 9am and 5pm Eastern US Time.
Yes, in general, but the details differ between Unix and Windows:
Unix: Only if you want to apply probes to native code. You can trace Java and native code, and dump Java parameters, without a C compiler. However, the only thing you can do with native code is trace it; you can't dump parameters or variables or generate probes (e.g., SNAPSHOT or COMMENT) because those are implemented by generating APC source code and then compiling it with Aprobe's apc compiler, which requires a C compiler backend.
Windows: Everything for Unix above is true for Windows, plus:
(a) the compiler must be Microsoft Visual C++; and
(b) if the program was compiled with Visual C++ 6 (or Visual Basic 6) it can't even be traced, because RootCause relies on a DLL that's part of those products which we're not allowed to distribute.
Starting with version 2.1.1 of RootCause you can trace Visual C++ (VC7) programs.
For VC6 (VB6) programs, RootCause needs MSVC++ to be installed to provide the (non-redistributable) mechanism to read symbol information from PDBs. Without MSVC++ installed, only symbol information stored in the executable or in DBG files can be read, plus the exported symbols.
In version 2.1.1 of RootCause, an environment variable can be set to enable the use of the new mechanism to access symbol information in PDB files for VC6 (VB6) programs. Set the environment variable APROBE_USE_DIA=1 to enable this (experimental) feature.
RootCause is delivered with a User's Guide in hardcopy, HTML, and PDF formats. The latter two softcopy forms are available for pre-sales evaluation.
RootCause for C/C++, RootCause for Java, and the RootCause Agent (run-time) are licensed separately. Licensing is enforced on a per-user basis or per-CPU basis with FlexLM. Contact our sales department for more information.
If you already have a license but it's not working for you, see "Licensing" or "How do I get technical support?"
Explicit support is provided for C, C++ and Ada. Functions written in Assembler will work to the extent that they adhere to standard calling conventions.
Functions written in other high-level languages (e.g., Basic, Fortran, Pascal, JOVIAL) may also be probed if the probe doesn't reference source-level identifiers ("target expressions"). Contact OC Systems if you have a favorite.
Almost any program with symbols can be probed. The "full support" described below is based on the debug information needed for source lines and target expressions. Support for additional architectures, operating systems and compilers is always in progress, so please contact OC Systems if you don't see what you need here.
Windows: Aprobe supports the Microsoft Visual C++ development system versions 6 and 7 but does not support .NET (Dynamic Runtime Model) applications.
AIX: Aprobe supports any IBM C or C++ compiler that runs on AIX 4.2 or newer. There is partial support for gcc and g++ versions 2.95.x, and for gcc versions 3.x compiled with -gstabs+. If your program is Ada, Aprobe supports OC Systems' PowerAda, and (starting with version 4.4.2) GNATPro 5.04.
Solaris: The C and C++ compilers supported are Sun WorkShop C++ compiler versions 4.2 and higher (Forte) and gcc/g++ compilers before version 3. If your program is Ada, Aprobe requires GNAT version 3.15 or 3.16.
Linux: The C and C++ compilers supported on Linux are gcc and g++ versions 2.95.x and 3.x. See also Q1.14. If your program is Ada, Aprobe supports only PowerAda on Linux and AIX. (GNAT is supported only on AIX and Solaris.)
No, but for non-Java programs it helps. The suggested compromise is to build the application with debug information, develop your traces, then strip the debug information when shipping the product. This is fully discussed in Chapter 6 of the RootCause for C++ User's Guide, "Building a Traceable Application".
RootCause has many unique features which require a unique terminology to describe. See the glossary in Chapter 3 of the user's guide for their definitions. Some basics are:
The part of the RootCause product which actually applies and enables the probes, also known as the Aprobe runtime.
The Graphical User Interface (GUI) used for developing probes, and viewing the data logged by them.
verb: to efficiently record data into a memory-mapped file for later viewing.
noun: the RootCause log, a list of all programs run with "rootcause on".
Programmatic actions to be inserted and executed at specific points in the probed application.
gcc/g++ 3.x is fully supported on Linux.
Support for GNAT 5.x, and for gcc/g++ 3.x on other OSes is not currently scheduled.
No. See "Is there any way to attach with Aprobe to a running application?".
We have provided support for older versions of Java for specific customers: please contact us if you have a specific need.
Some of our probes, most notably java_memstat, make use of the JVMPI debugging interface, which has turned out to be unreliable in earlier versions, and which has been eliminated entirely in Java 1.6. See the Memstat documentation for a detailed description.
An "agent installation" is the installation of the "RootCause Agent", a small subset of the product that allows one to run probes developed using the RootCause Console.
Note that this prompt is gone starting with RootCause 2.1.1: the agent is now just a self-installing file %APROBE%\deploy\RootCauseAgent.exe.
RootCause is a superset of Aprobe, and in fact shares the same installation script. You can choose a different name if you like.
Because probes on C/C++ (and Ada and other compiled languages) need to be compiled with a user-supplied C compiler, and the installation script has to know whether to check/prompt for that.
No. RootCause for C/C++, like Aprobe, requires a C compiler to build the probes. This is not provided with RootCause because it's assumed customers have one. If you don't, gcc is fine, and OC Systems can help you download and install it.
Yes. Just enter an empty string, ignore the warnings you may get, and then put the license key into the file license.dat in the licenses directory under the RootCause installation directory before you start using RootCause. See also "Licensing".
On RedHat, the Korn shell is provided by the pdksh package. This is on the install media, but not usually installed unless you install everything or specifically request it. The pdksh RPM can be downloaded from the RedHat ftp site. Choose the appropriate link for your version of the RedHat Distribution:
Note that Linux RootCause version 2.2.2 (Aprobe 4.4.2) no longer requires ksh to install: the install script is finally bash-compatible!
rootcause open?
Because the RootCause Console interface is in Java, and the default selection of fonts does not match what's in your X-windows font path. This problem usually only happens when using older (pre-8) versions of Solaris. See the section entitled Platform-Specific GUI Issues in Chapter 8 of the RootCause User's Guide.
You must be using an older (pre-8) version of Solaris, which requires an older (pre-1.4) version of Java to be used, which doesn't directly support this. The same applies to default buttons on dialogs. Additionally, on Unix you will find that the 'Copy' operations from various RootCause windows such as Trace Events don't show up in your X-Windows clipboard.
See the section entitled Platform-Specific GUI Issues in Chapter 8 of the RootCause User's Guide for details, but the quickest fix is to start the X-windows application "xclipboard". When you copy something to the clipboard from Java, it will appear in the xclipboard window. You can then select it there and middle-click to paste elsewhere.
Just right-click farther up on the screen so there's room for the whole menu. The Java popup menu behavior is separate from the selection of the item on which it works. So once you've selected an item with a left-click (MB1), you can right-click anywhere in the window to see the context menu for that selected item.
Yes, you can point your browser (Netscape, Mozilla, Internet Explorer, etc.) to
$APROBE/html/rcguihelp.html
(where $APROBE is the value of the APROBE environment variable, the root of your RootCause installation.) However, the Help operations won't update that automatically -- you'll have to use your browser's Find operation.
However, note that Chapter 8 of the RootCause User's Guide is pretty much identical to the On-line help, and is cross-referenced with the rest of the user's guide (see Q1.8).
No. The RootCause Console must be run on the same kind of platform (AIX, Linux, Solaris, Windows) as that on which the data is collected, both for defining the trace and for viewing the data. The format of the deployed workspace and of the collected data is platform-specific.
The problem is that these emulators just don't support Java well. There are some hints in the user guide but it's still not very usable. Our advice: use VNC. It's so much better in every way, and it's free. You may download both the client and server from RealVNC. These sites explain it better than we could here, but if you need assistance feel free to contact us.
Yes. The demo program that we beg everyone to do first shows exactly how to set this up and create a default workspace. There's one for Unix and one for Windows.
However, since you asked so nicely, here's what you do:
rootcause on in a window where you'll start your app.
Yes, you should be able to trace anything. If you find one that you cannot trace, please report it as a bug. However, most executables that are part of the system have no symbolic information, so you cannot see functions in the executable itself. You can get functions in shared libraries/DLLs that are loaded, and use the predefined UALs without symbols and debug information.
Some programs like Java 1.4 and Netscape "fork and exec themselves" so these are distinct processes. You generally want the second one, since the first probably set up some things missing from the environment and then tried again.
There could be a number of reasons:
rootcause status command.
rootcause log -s. Then choose a bigger number, say 20000, and run rootcause log -s 20000 (see Q4.8). You can clear out the current log contents with: rootcause log -Z (see Q4.6).
rootcause register -l from the command-line) and look at the verbose setting near the top of the output, and see if it's missing or off. To enable it on Windows, run the DOS command rootcause on verbose. On Unix: rootcause register -s verbose.
libapaudit.so) from being loaded from its default, non-secure location. See "SetUID Applications" in Chapter 10 of the RootCause User's Guide.
In all but the first case, you'll have to run the program again with "rootcause on" for it to show up in the RootCause log.
When you start a program, that may start a shell script. Korn shell, C shell and others can have associated "rc" files (e.g., ~/.kshrc, ~/.cshrc), which run some commands. Then the script itself may run some commands to evaluate the environment. Then the program itself may start some processes (e.g., by using CreateProcess() or system()) to do some tasks. You can learn amazing stuff when you use RootCause even without tracing!
Yes, by turning verbose logging off. This is done on Windows with the DOS command
rootcause on quiet
and on Unix with:
rootcause register -s verbose -e off
Also, on Unix, you can set the environment variable APROBE_LD_AUDIT_VERBOSE=FALSE in a shell and it will disable logging of all commands started in that shell and its subshells. This trick is used by the rootcause_status script.
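The environment-variable approach works because a child process inherits the environment of the process that starts it. The following is a minimal Python sketch of the same idea for commands launched from a script instead of a shell. The helper name run_without_logging is hypothetical and is not part of RootCause; only the variable name and value come from the answer above.

```python
import os
import subprocess


def run_without_logging(command):
    """Run command with APROBE_LD_AUDIT_VERBOSE=FALSE set for it and its children.

    Hypothetical helper: it copies the current environment, adds the variable
    named in the answer above, and passes the copy to the child process.
    """
    env = dict(os.environ, APROBE_LD_AUDIT_VERBOSE="FALSE")
    return subprocess.run(command, env=env, check=False)
```

The calling process's own environment is left untouched, so commands started elsewhere from the same script are still logged.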
There's currently no way to do this from the Console. From the command line: rootcause log -Z. Then do File->Refresh to see everything disappear.
Yes, it wraps so that it doesn't get huge. The default size is 100000 bytes. You can use the rootcause log -s command to query and change the size in bytes (there's no access to this from the Console). For example:
# show the log size:
rootcause log -s
100000
# set the log size to 20000 bytes:
rootcause log -s 20000
Yes, using the APROBE_HOME environment variable (supported starting with version 2.0.5). The value of this environment variable, if set, is used instead of the defaults (%USERPROFILE%\.rootcause on Windows, $HOME/.rootcause, .rootcause_aix, or .rootcause_linux on Unix). On Unix, this directory is where the RootCause Log and RootCause registry reside, so if you want these files accessible system-wide you should set APROBE_HOME to some central, writable location.
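The rule above (use APROBE_HOME if it is set, otherwise fall back to a per-platform default) can be sketched in Python as follows. This is an illustration only, not RootCause's own code. The function name and the platform keys are hypothetical, and the pairing of each Unix directory name with a particular platform is an assumption made for the sketch.

```python
# Default directory names from the answer above. Which Unix name belongs to
# which platform is an ASSUMPTION made for this sketch.
UNIX_DEFAULTS = {
    "solaris": ".rootcause",
    "aix": ".rootcause_aix",
    "linux": ".rootcause_linux",
}


def resolve_aprobe_home(env, platform):
    """Return the directory RootCause would use for per-user files.

    env is a mapping of environment variables; platform is one of
    "windows", "solaris", "aix", "linux" (hypothetical keys).
    """
    override = env.get("APROBE_HOME")
    if override:  # an explicit setting always wins
        return override
    if platform == "windows":
        return env["USERPROFILE"] + "\\.rootcause"
    return env["HOME"] + "/" + UNIX_DEFAULTS[platform]
```

Passing the environment in as a mapping keeps the function easy to try with different settings without changing the real environment.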
Yes. Edit the "preferences" file in your APROBE_HOME directory (see Q4.8) and change
<start_with_log value="true"/>
to
<start_with_log value="false"/>
You'll nearly always want to click Yes, which means "use this workspace to trace this application next time you run the application with RootCause on". You might click No if you don't want to trace that application with RootCause yet, or if you want to keep tracing it with a different workspace with which it's already registered. When in doubt click No: you can always use Workspace->Register Program to do it later.
It means that something has been changed or added that must be recorded when the workspace is saved. You can ignore it.
See Chapter 8 of the User's Guide, which fully describes the Console GUI. Also, look for a file in $APROBE/probes (%APROBE%\probes on Windows) with the same name and suffix ".apc" and you'll see the details of its implementation. This doesn't apply to X.trace.ual, which is custom for each workspace.
This means "Unknown Source File", probably because no debug information was found. Look in the Messages pane of the Workspace browser window for messages about debug information. You can still trace entry and exit to these functions, and can write custom probes that get data without using debug information.
The dots are there to act as a "path" to help you find the traces and probes you've defined.
A black dot indicates an entry/exit trace of the marked function, method, file, class, or directory. Functions and methods marked with black dots are represented by equivalent entries in the Wildcards dialog, and are implemented by entries in the trace.cfg file in the workspace.
A blue dot indicates a probe or data trace in the marked function, method, file, directory, or class. These actions are not mapped to wildcards, and are implemented by compiled APC for C functions.
You must add the library to the workspace, and then it will show up in the Trace Setup window. To do this, select Add Dynamic Module... from the Workspace menu. If the module changes, you must do Reset Dynamic Module.
"Don't Trace..." will remove the black dots from the subtree it applies to, meaning those methods and functions won't have their entry and exit traced. "Remove Probes..." will remove the blue dots, meaning specific Probe and Data logging actions will be removed.
The easiest way is to copy it into the workspace. You can also use Add UAL, and you'll need to do that if it takes parameters or has other complications, but that's a bit more advanced: see Chapter 8 of the User's Guide or contact OC Systems.
This should happen only on Unix. There, for improved usability (at a customer's request), functions whose names match certain patterns are filtered from the list. This list can be changed, replaced or nullified, though this is not documented.
The filtering is defined by the patterns in the file $APROBE/arca/trace_filters. See the commentary at the top of that file for complete information.
Could it be you set APROBE_SEARCH_PATH after you started the GUI? If so, quit RC and restart it so it can pick up the env var.
The easiest way is to create a ".h" file that contains prototypes for the functions that you want. RootCause will automatically compile and use the "debug information" in that file so, for example, you can see the parameters in the setup window of the Console or reference them by name in the custom apc that you write.
To do this:
$APROBE/shadow/t.exe.h and the user-local one is $APROBE_HOME/shadow/t.exe.h. On Windows, this is as you would expect: %APROBE%\shadow\t.exe.h and the user-local one is %APROBE_HOME%\shadow\t.exe.h. See Question 4.8 about APROBE_HOME.
Placing the .h file in $APROBE/shadow would make it available for all invocations of RootCause, whereas the other two locations would be more user specific. Note that RootCause will search the directories in the opposite order of their listing above, so a.out.h in the .rootcause directory will be used instead of a.out.h in the $APROBE directory. (Analogous for Windows.)
You can see an example of this by doing a directory of the $APROBE/shadow/*.h (or %APROBE%\shadow\*.h). RootCause uses this feature to provide parameter information for some of the system shared libraries.
Make sure that you have a supported C compiler available, as this is needed to compile the .h files. (You may not have a supported C compiler if you installed RootCause as Java only and now want to do C probing; contact OC Systems to add the C capability.)
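The search order described above, where a user-local .h file is found before the one in the installation directory, amounts to a "first match wins" lookup over an ordered list of directories. Here is a minimal Python sketch of that lookup. The function name is hypothetical, and the caller supplies the directory list, since the exact set of directories RootCause searches is not reproduced here.

```python
import os


def find_shadow_header(module_name, search_dirs):
    """Return the first existing <dir>/<module_name>.h, or None.

    search_dirs is ordered from most specific (user-local) to least
    specific (installation-wide); the first directory that contains
    the file wins.
    """
    for directory in search_dirs:
        candidate = os.path.join(directory, module_name + ".h")
        if os.path.isfile(candidate):
            return candidate
    return None
```

With a list such as the user's shadow directory followed by $APROBE/shadow, a user's a.out.h shadows the installation-wide a.out.h, matching the behavior described above.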
This is called a "Trigger" and has been a feature of the Aprobe-level trace all along. It was added as a Probes action in the Trace Setup dialog in version 2.1.3a (April 2004). It works like this:
You should see the function or method to which you applied the Trigger action at the top of each traced call tree in your trace, and nothing outside of that (even if you selected it for tracing).
You can check whether trace is enabled with the ap_RootCauseTraceIsEnabled macro. For example:
    if (ap_RootCauseTraceIsEnabled)
    {
        printf ("Enabled\n");
    }
    else
    {
        printf ("Disabled\n");
    }
Disabling your probe independently from Trace is covered in the "Disable Probe" example (Windows: %APROBE%\Examples\Advanced\Disable_Probe; Unix: $APROBE/examples/learn/disable_probe).
You can't. It is deliberately designed to remain active even after trace is disabled. We do deliver source for the probes so that users can customize their behavior. In this case it would be a simple matter of putting the "if (ap_RootCauseTraceIsEnabled)" check (see Q6.10) around the code in the "ExceptionHandler" routine within $APROBE/probes/exception.apc, recompiling it, and either using a local copy or overwriting $APROBE/ual_lib/exception.ual. (On Windows, %APROBE%\probes\exception.apc and %APROBE%\ual_lib\exception.dll, respectively.)
After you run through your test, format the APD files with Examine. The tree will reflect the trace path from point A to B. At the end is a summary call tree with call times in it. Or you can look at the performance table node (right click and choose show associated table) to see a table.
There are two possibilities, but the most likely (on Solaris) is that the traced function is a compiler-generated one that is explicitly filtered from the Trace Setup list, but which is covered by the "wildcard" trace used when you do "Trace All Child Nodes" from the Trace Setup module node. See Q6.6.
The other possibility is that the event was introduced by some other custom probe, such as a J2EE trace. See Q7.2.
Probably because the events didn't originate in the Trace Setup, but were introduced by a supplementary J2EE trace. Still, you should be prompted to add the containing class, and so be able to define traces on it.
Yes, RootCause has a concept of a source file path. There are a number of ways to set this:
If you click on a method, the first time it will ask if you want to find the source. If you browse and select the source file, the enclosing path is automatically added to a list. If the end of the Java path matches the package path of the class, the "root" of the package path is added also.
You can edit the path directly off the RootCause Setup menu.
We'll pick up an environment variable APROBE_SEARCH_PATH when the RootCause Console starts.
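The package-root rule in the first item above can be illustrated with a short Python sketch: if the directory of the chosen source file ends with the class's package path, the part before it is the "root". This is a sketch of the rule as described, not the Console's code. The function name and the example class com.acme.Widget are hypothetical.

```python
import posixpath


def package_root(source_file, class_name):
    """Return the root above the package path, or None if there is no match.

    source_file is a Unix-style path to a .java file; class_name is the
    fully qualified class name, e.g. "com.acme.Widget" (hypothetical).
    """
    package_dirs = class_name.split(".")[:-1]   # drop the class itself
    directory = posixpath.dirname(source_file)  # the enclosing path
    if not package_dirs:
        return directory                        # unnamed package: enclosing path is the root
    suffix = "/" + "/".join(package_dirs)
    if directory.endswith(suffix):
        return directory[: -len(suffix)] or "/"
    return None


print(package_root("/work/src/com/acme/Widget.java", "com.acme.Widget"))  # prints /work/src
```

Adding the root as well as the enclosing directory lets other classes in sibling packages be located from the same path entry.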
Of course it depends on the vendor, but the best thing to do is to send them what you would want your customers to send you: text with as much pertinent information as possible. If the trace contains enough information for you to determine where the problem is, then the other piece of information they would want is the system configuration, as collected with logenv.ual.
To create the bug report, you could do File->Save As Text from the Trace Display window; then edit the resulting text file to include the program and system configuration and the tracebacks and execution information that identify the problem; then e-mail the result, indicating it was collected with RootCause. (They might have RootCause also, and ask you to re-run to collect additional information).
Yes. The JVM routinely uses signal 11 (perhaps for extending the stack) and signal 4 (illegal instruction -- not sure what that's for). These can show up in the trace and are fine. Later versions of the JVM provide options for reducing its use of signals; you can search java.sun.com for details.
The JVM implements the synchronization on the calling side rather than on the callee side. Once you are inside the method's code, the lock has already been grabbed. This means that the time you see is after the synchronization.
For instance, I have a test that calls a synchronized method from a thread's run method:
    try
    {
        Thread.sleep (1000);
        parent.synchronizedMethod (); // Line 15
    }
    catch (InterruptedException e)
    {
        e.printStackTrace ();
    }
If I trace lines and have things set up so another thread is within synchronizedMethod(), I see something like this:
Line 15 10.45.00 ; Waiting ...
synchronizedMethod entry 10.46.00 ; Got it ...
malloc() listed as being LOAD_SHED in the Trace Display
when it really wasn't?
Because load shedding was attempted on it, which recorded it as such, but the actual disabling of the probe was prevented by another UAL's explicit request, using #pragma nopatchcount.
The confusion comes from the fact that load shedding may mean two things:
Since we don't want (1) to happen for allocation/deallocation
routines when running memstat, these patches could not be disabled.
This was indicated by using #pragma nopatchcount in
combined_memstat.apc.
However, when traced these routines will get load shed just like everything else, and the LOAD_SHED event and appearance in the table indicate that (2) has happened. So this is pretty much "as designed".
If you explicitly mark the function as "Do Not Shed", it will no longer show up in the table.
You are hitting the limit on the maximum number of items displayed in the trace display. You can either reduce the size of the APD files, reduce the number of APD files selected or increase the limit at the expense of longer processing times and higher memory overhead. I would try the last one first and if this works for you, great. The option is "Maximum number of events in Trace Display" and is described here. Briefly:
The values are recorded per-user, so must be set for each user in the user preferences file: $APROBE_HOME/preferences on Unix, %USERPROFILE%\preferences on Windows.
Yes, but only in RootCause (see below). It is not quite legal XML and so will be rejected by general XML viewers. (If you think this is an important feature, let us know.)
To import saved XML back into RootCause again, you have to set the environment variable RC_ENABLE_LOAD_XML to a nonempty value before starting the RootCause GUI. If you've done this, you will then see the menu item Examine XML File... in the Analyze menu in the RootCause Main menu. Clicking this menu item will open a file selection dialog from which you can select an XML file. This must be a file previously saved from RootCause Trace Display using File->Save As XML. When you click the Examine XML Output button in this dialog, you will then see a Trace Data Dialog in which one of the checkboxes is the name of your XML file. Check it, and click Open, to view the Trace Display.
Under the View menu, click Statistics Filter.... This dialog is used to create a "filtered" copy of the statistics summary tree. The copied tree will be added to the end of the event tree and will identify what filter was used. You specify a statistic to use (Wall time or CPU time, if collected) and a threshold percentage to create the "filtered" copy. A child node in the summary tree will only be copied to the new tree if the child's statistic value is at least the given percentage of the parent's statistic value. Choose "None" to create an exact copy. The threshold must be a numeric percentage between 0 and 100.
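The threshold rule described above can be sketched in a few lines of Python. This is an illustrative model, not RootCause's implementation: the dictionary layout is hypothetical, and the sketch assumes that when a child falls below the threshold its whole subtree is left out of the copy.

```python
def filter_tree(node, threshold_pct):
    """Return a filtered copy of a summary tree.

    node is {"name": str, "value": float, "children": [...]} (a
    hypothetical layout). A child is copied only if its value is at
    least threshold_pct percent of its parent's value.
    """
    kept = []
    for child in node["children"]:
        if child["value"] * 100 >= threshold_pct * node["value"]:
            kept.append(filter_tree(child, threshold_pct))
    return {"name": node["name"], "value": node["value"], "children": kept}
```

A threshold of 0 keeps every node, which corresponds to an exact copy; a threshold of 100 keeps only children that account for all of their parent's time.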
No, these are actual times. You can specify overhead values by clicking View->Statistics Overhead. This opens the Set Statistics Overhead dialog. You'll see an options menu from which you can select the statistic to adjust, and type-in fields for the normal (native) call overhead and the Java overhead (which is generally bigger).
Note you must set each statistic separately, for example:
None and change it to Wall Time
None and change it to CPU Time
When you've completed setting overhead values, you must regenerate the data:
As described in Q7.11, you can specify tracing overhead to be applied to times shown in the Trace Events details. But what number should you put in there? The answer depends on a number of factors, including your hardware and OS speed, whether you're dumping parameters, and whether it's Java or native code. A good guess is the minimum time you see in the entire tree for that kind of call, or if that seems too big, you can instrument some do-nothing function and see what its time is. This value would be the overhead for every call, and you can use that.
The nodes look like:
ENTER Factor::addWidgets()
time = 2004-05-03 16:32:10.079965024
process = 15193, thread = 0 _start()
symbol = "Factor::addWidgets()" IN "$java$", Factor.java
CPU Time 0.428844 ( 0.428844)
Wall Time 0.552496 ( 0.552496)
EXIT Factor::addWidgets()
time = 2004-05-03 16:32:10.632461354
elapsed time = 00:00:00.552496330
process = 15193, thread = 0 _start()
symbol = "Factor::addWidgets()" IN "$java$", Factor.java
The Details pane for each node gives the (wall) time at which the function or method was entered. In addition, any statistics that were being gathered are attached to the ENTER Node. Shown here are the elapsed CPU Time (gathered because the perf_cpu probe was enabled) and elapsed Wall Time. Both were computed on EXIT from this specific invocation. The EXIT node also shows the elapsed (wall) time, which is the same as the Wall Time statistic.
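The relationship between the ENTER time, the EXIT time and the elapsed time in the nodes above can be checked by hand. The following Python sketch does the subtraction in whole nanoseconds to avoid floating-point rounding; the helper names are hypothetical, and the timestamps are copied from the example nodes.

```python
from datetime import datetime


def to_nanoseconds(stamp):
    """Convert 'YYYY-MM-DD HH:MM:SS.nnnnnnnnn' to integer nanoseconds."""
    whole, frac = stamp.split(".")
    moment = datetime.strptime(whole, "%Y-%m-%d %H:%M:%S")
    seconds = int((moment - datetime(1970, 1, 1)).total_seconds())
    return seconds * 1_000_000_000 + int(frac.ljust(9, "0"))


def format_elapsed(ns):
    """Format nanoseconds as HH:MM:SS.nnnnnnnnn, as in the EXIT node."""
    seconds, frac = divmod(ns, 1_000_000_000)
    hours, rem = divmod(seconds, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{frac:09d}"


enter = to_nanoseconds("2004-05-03 16:32:10.079965024")
exit_ = to_nanoseconds("2004-05-03 16:32:10.632461354")
print(format_elapsed(exit_ - enter))  # prints 00:00:00.552496330
```

The result matches the "elapsed time" line of the EXIT node, and its first six decimal places match the Wall Time statistic (0.552496) attached to the ENTER node.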
Consider the following node:
Java_Factor_smallestFactor()
process = 15193, thread = 10 _start()
symbol = extern:"Java_Factor_smallestFactor()" in "libFactorJNI.so", /work/JNI/factor.c
Times called: 29
Child calls (native/Java): 4190 / 0
CPU Time (29): 1.248102 ( 1.298730) [99.753%]
Max : 1.231153 ( 1.274449)
Min : 0.000048 ( 0.000072)
Avg : 0.043038 ( 0.044783)
Wall Time (29): 375.135004 (375.185632) [99.998%]
Max : 375.105686 (375.148982)
Min : 0.000043 ( 0.000067)
Avg : 12.935689 (12.937435)
Recall that each node in the Event Summary tree represents a unique
call stack in the execution. The one shown above is for the
native JNI function Java_Factor_smallestFactor() (see
$APROBE/demo/RootCause/JNI).
The function was called 29 times. Those 29 calls together used 1.248102 seconds of CPU Time after overhead adjustment (see Q7.11). The slightly larger time shown in parentheses after it (1.298730) is the "raw" time before the overhead adjustment. The percentage in brackets indicates that the total CPU time used for this function comprised 99.753% of the total time used by its caller, the parent node in the summary tree (see Q7.10 about filtering based on this percentage). Of those 29 calls, after adjustment the longest (Max) took 1.231153 seconds of CPU (1.274449 raw), the shortest (Min) took only 0.000048 seconds (0.000072 raw), and the average took 1.248102 / 29 = 0.043038 seconds of CPU.
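The arithmetic in this node can be reproduced with a few lines of Python, using only the numbers shown in the example. The last step is an observation about these particular figures, not documented behavior: the difference between raw and adjusted time happens to equal 0.000012 seconds multiplied by the number of traced calls (the 29 calls plus their 4190 child calls), which is what a fixed per-call overhead setting would produce.

```python
calls = 29
child_calls = 4190
cpu_adjusted, cpu_raw = 1.248102, 1.298730
wall_adjusted, wall_raw = 375.135004, 375.185632

# Average CPU time per call, as shown on the "Avg" line.
average_cpu = cpu_adjusted / calls

# Overhead removed by the adjustment; it is the same for both statistics.
cpu_overhead = cpu_raw - cpu_adjusted
wall_overhead = wall_raw - wall_adjusted

# Overhead spread over every traced call in this subtree (an inference
# from the numbers, not a documented formula).
per_call = cpu_overhead / (calls + child_calls)

print(f"{average_cpu:.6f} {cpu_overhead:.6f} {wall_overhead:.6f} {per_call:.6f}")
```

If the inference holds, the same check on any other node should give the same per-call figure, which is a quick way to confirm what overhead value was in effect when the data was formatted.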
Yes. Click on a node to select it, then right-click to pop up the context menu, then click 'Save Node As Text' to save the selected node in a text file. This will save the node and its details exactly as it would appear in the 'File->Save As Text..' output. Note that it works only for one node, so if multiple nodes are selected it applies only to the first of those. See also the next question.
Yes. In either the Events tree on the left, or the details in the lower left: Click on a node (or multiple nodes using shift or control keys in the usual way). Then right-click to pop up the context menu, then click 'Copy'. This will put the selected nodes in the Java clipboard. See Q3.2 for how to paste from the Java clipboard on Unix.
Probably because it was Load Shed. This means that it was called so often its tracing overhead became excessive and tracing was disabled for it during the run. It will appear in the Load Shed table, where you can choose to stop it from being Load Shed during the next run.
You can't. The concept of levels is no longer supported. Instead you can apply a Trace Trigger, or disable and enable the trace using the probes tab for a given function.
These are not currently integrated with RootCause. If you can run them from the command-line using Aprobe you should do that. If you wish to use the "RootCause On" mechanism to run them using the workspace, you must add them to the workspace options using the "Setup->Add UAL" menu item. This adds a new UAL "permanently" to the Workspace UAL tree. For example, to add the "memwatch" probe, you would:
This adds "memwatch" to the UAL tree in the Workspace window. You could then check this to enable memwatch on applications run under RootCause. The output of these probes isn't integrated with RootCause, so the output appears as a "Text" node in the Trace Display event tree. You can use "Save As Text" from that display to view it outside of RootCause.
Prior to RootCause version 1.3.3, you would reference these probes using the Aprobe options and Apformat options dialogs (see Chapter 8 of the user's guide), just as you would on the Aprobe command-line. For example, to enable memwatch, you would add
-u memwatch -p -g
as "Additional Aprobe Options" (under Aprobe options in the Execute menu in the Workspace window) and
-u memwatch
in the Apformat options (under the Analyze menu). For probes like profile that require configuration files, you would have to put the full pathname of the configuration file into the options as well, like
-u profile -p -c /testdisk/probes/prog1.profile.cfg
.
Yes, but this is not explicitly supported. In particular, most operations from the RootCause Console overwrite the scripts in the workspace which apply Aprobe to the application. So after you use the Console to create a workspace, you quit, and edit the aprobe.ksh and apformat.ksh scripts (do_aprobe.cmd and do_apformat.cmd on Windows) directly to apply your probes.
Aprobe supports Java with the apjava command. Writing custom probes in Java is described in Chapter 11 of the RootCause for Java User's Guide and the nearly-identical Chapter 5 of the Aprobe User's Guide for Unix and Windows, and if you really wanted to you could do everything from the command line.
There are three ways of adding a UAL to a trace:
Personally I like option b, choosing not to copy the UAL to the workspace. This makes it easy to enable / disable from the GUI.
The events probe is not integrated with RootCause Trace Display, but you can still use it. Here's a quick way to get started, by simply applying events to all Java methods and all native functions in the main module (if any), and letting load shedding reduce overhead.
cp $APROBE/probes/events.cfg MyWorkspace.aws
echo ';event function "*"' >> MyWorkspace.aws/events.cfg
echo 'event function "*::*" in $java$' >> MyWorkspace.aws/events.cfg
-c $RC_WORKSPACE_LOC/events.cfg
rootcause format -r MyWorkspace.aws > format.txt
Your results are in format.txt. You can then edit the events.cfg file to do more, as shown in Q15.12 , and you can specify an alternate output file so you get the events output while still formatting within RootCause.
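When appending lines like these from a shell, the whole line, including $java$, has to sit inside the single quotes; otherwise the shell tries to expand $java as a variable and the text written to the file is not what was intended. A minimal sketch, using a scratch file rather than a real workspace (the scratch file is only for illustration):

```shell
#!/bin/sh
# Scratch file standing in for MyWorkspace.aws/events.cfg (illustrative only).
cfg=$(mktemp)

# Single quotes keep both the double quotes and $java$ literal.
echo ';event function "*"' >> "$cfg"
echo 'event function "*::*" in $java$' >> "$cfg"

# Show what was actually written.
cat "$cfg"
```

Inspecting the file with cat before running the application is a cheap way to confirm that the configuration lines arrived intact.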
Yes, you can leave RootCause on all the time. It takes effect on reboot about the time when per-user preferences get loaded, or when you get prompted for your login id. Check the System event log (run "eventvwr") to get more exact information.
This depends almost entirely on what you do with it. If you trace almost nothing, it will introduce almost no overhead. If you trace every method call on your machine, it will slow things down too much. The keys to good performance are:
You need to copy or soft-link the RootCause "libapaudit.so" library to a "secure pathname" as described in Chapter 10 of the RootCause User's Guide, "RootCause, SETUID, and Security Concerns".
If you're seeing messages like:
ld.so.1: mail: warning: /opt/aprobe/lib/libapaudit.so: open failed: illegal insecure pathname
ld.so.1: mail: fatal: /opt/aprobe/lib/libapaudit.so: audit initialization failure: disabled.
Then the application you're running (like "mail" above, or "ps") has its setuid bit set and is owned by root. Solaris prevents dynamically loading debug libraries on such applications for security reasons. Here's what to do:
rootcause_libpath -c
This copies the libapaudit.so library to secure paths under /usr/lib.
rootcause_off
rootcause_on
ps
to verify it works.
If you still get warnings, you're probably on an early patch level of Solaris 8. Do:
export LD_AUDIT_64 ; LD_AUDIT_64=/usr/lib/secure/64/libapaudit.so
If that still doesn't work, contact OC Systems. Details about probing secure applications on Solaris are documented in Chapter 10 of the latest Unix RootCause User's Guide.
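To find out ahead of time which programs are affected, you can test for the setuid bit from the shell. The following is a sketch only: is_setuid is a hypothetical helper name, not a RootCause command, and the two program paths are examples. It checks the setuid bit but not the owner; use ls -l to confirm that the file is owned by root.

```shell
#!/bin/sh
# Report whether a program has its setuid bit set.
# is_setuid is a hypothetical helper name, not part of RootCause.
is_setuid() {
    [ -u "$1" ]
}

# Example paths only; substitute the programs you intend to trace.
for prog in /usr/bin/ps /usr/bin/mail; do
    if is_setuid "$prog"; then
        echo "$prog: setuid, needs libapaudit.so under a secure path"
    else
        echo "$prog: not setuid (or not found)"
    fi
done
```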
There's no built-in mechanism. It's harder than you think. Here's some custom APC (for Solaris only) that you could compile into a UAL, add to your workspace, and see the modules:
#include <alloca.h>
#include <link.h>
typedef struct
{
ap_NameT ModuleName;
ap_Uint32 StartAddress;
ap_Uint32 Length;
} DynamicModuleDataT, *DynamicModuleDataPtrT;
static void *ModuleKeyGet (void *S)
{
return (void *) ((DynamicModuleDataPtrT) S)->ModuleName;
}
static ap_BooleanT ModuleKeyCompare (void *LeftKey, void *RightKey)
{
return (strcmp ((ap_NameT) LeftKey, (ap_NameT) RightKey) == 0);
}
static DECLARE_HASH (DynamicModuleTable,
ap_StringHashFunction,
ModuleKeyGet,
ModuleKeyCompare);
#if defined(__SunOS_5_5_1)
extern int dlinfo (void *handle, int request, void *p);
#endif
typedef ap_Uint32 (*FindElfSymbolT) (ap_NameT SymbolName, ap_NameT ModuleName);
static int NextModuleId;
static void DynamicModuleFormat (ap_NameT Filename,
ap_Uint32 *StartAddress,
ap_Uint32 *Length)
{
ap_RootCausePrintEventStart ("program_comment");
printf ("Module loaded: %s\n Address span 0x%08x-0x%08x\n",
Filename,
*StartAddress,
*StartAddress + *Length);
ap_RootCausePrintEventEnd ("program_comment");
}
static void RecordDynamicModule (ap_NameT Filename, void *Handle)
{
ap_ModuleIdT ModuleId;
static FindElfSymbolT FindElfSymbolRoutine = NULL;
ModuleId = ap_ModuleNameToId (Filename);
if (ap_IsNoModuleId (ModuleId))
{
DynamicModuleDataPtrT DynamicModulePtr;
Link_map *Linkmap;
// Get the info for this.
if (dlinfo (Handle, RTLD_DI_LINKMAP, &Linkmap) == -1 ||
Linkmap == NULL)
{
ap_Error (ap_WarningSev,
"Cannot get loader info for %s",
Filename);
return;
}
// Is it in the dynamic table already?
DynamicModulePtr = (DynamicModuleDataPtrT)
ap_HashTableLookup (&DynamicModuleTable, (void *) Linkmap->l_name);
if (DynamicModulePtr == NULL)
{
ap_Uint32 ModuleSize;
ap_ModuleIdT NewModuleId;
ap_NameT ModuleName;
ap_NameT ModuleBaseName;
char *DotSoLocation;
int Dummy = 0;
// Find our internal FindElfSymbol routine.
if (FindElfSymbolRoutine == NULL)
{
FindElfSymbolRoutine = (FindElfSymbolT)
ap_SymbolAddress
(ap_SymbolNameToId (ap_ModuleNameToId ("libaprobe.so"),
"FindElfSymbol()",
ap_ExternSymbol,
ap_FunctionSymbol));
if (FindElfSymbolRoutine == NULL)
{
ap_Error (ap_FatalSev,
"Cannot find FindElfSymbol");
}
}
// Add it to the table.
DynamicModulePtr =
(DynamicModuleDataPtrT) ap_Malloc (sizeof (DynamicModuleDataT));
DynamicModulePtr->ModuleName = ap_StrDup (Linkmap->l_name);
DynamicModulePtr->StartAddress = (ap_Uint32) Linkmap->l_addr;
DynamicModulePtr->Length = FindElfSymbolRoutine ("_end",
Linkmap->l_name);
ap_HashTableInsert (&DynamicModuleTable,
(void *) DynamicModulePtr);
// Record it
log (ap_StringValue (Linkmap->l_name),
DynamicModulePtr->StartAddress,
DynamicModulePtr->Length)
with DynamicModuleFormat to ap_PersistentLogMethod;
// Now log it for the format logic to find
NewModuleId.Value = ap_FetchAndAdd (&NextModuleId, 1);
ModuleBaseName = ap_Basename (Linkmap->l_name);
ModuleName = strcpy (alloca (strlen (ModuleBaseName) + 1),
ModuleBaseName);
DotSoLocation = strstr (ModuleName, ".so");
if (DotSoLocation)
{
*(DotSoLocation + 3) = '\0';
}
ap_LogData (ap_IntegerToLogId (LOG_ID_FOR_FORMAT_RECORD_MODULE),
8,
&NewModuleId,
sizeof (NewModuleId),
ModuleName,
strlen (ModuleName) + 1,
&(DynamicModulePtr->StartAddress),
sizeof (DynamicModulePtr->StartAddress),
&(DynamicModulePtr->Length),
sizeof (DynamicModulePtr->Length),
&Dummy,
sizeof (Dummy),
&Dummy,
sizeof (Dummy),
Linkmap->l_name,
strlen (Linkmap->l_name) + 1,
ap_NoName,
strlen (ap_NoName) + 1);
}
}
}
probe thread
{
probe extern:"dlopen()" in "ld.so"
{
ap_NameT Filename = (ap_NameT) $1;
on_exit
{
if (!ap_IsNoName (Filename) && $return != 0)
{
RecordDynamicModule (Filename, (void *) $return);
}
}
}
probe extern:"dlmopen()" in "ld.so"
{
ap_NameT Filename = (ap_NameT) $2;
on_exit
{
if (!ap_IsNoName (Filename) && $return != 0)
{
RecordDynamicModule (Filename, (void *) $return);
}
}
}
}
probe program
{
on_entry
{
// Record the number of static modules
NextModuleId = ap_NumberOfModules ();
}
}
The following steps should allow you to use RootCause to trace activity in several of the daemons on your Linux system:
RootCause keeps a log file and a registry as defined by the APROBE_LOG and APROBE_REGISTRY environment variables. These are generally set on a per-user basis by the Aprobe setup script, based on the user's $HOME environment variable or on the environment variable APROBE_HOME if that's defined. The default location for these files is a hidden directory under a user's home directory called ".rootcause". When RootCause intercepts a program that is starting up it looks in the user's registry to see if this program should be instrumented. If so, there will be an associated workspace file named in the registry. By changing the APROBE_HOME environment variable before running setup, you can change the locations of the log and registry. Note that these files have to be writable by all processes that access them.
Daemons like sshd are started on Linux using a shell (bash) script located in /etc/init.d. For sshd the file is /etc/init.d/sshd. If you edit this file you will see a subroutine named "start". Not surprisingly, it is to this subroutine that we want to add a few statements to set up RootCause to intercept sshd.
sshd
: We recommend that you create your workspace on a disk local to the machine that will run the intercepted program. Create it in the same way we did today, that is, using the "new" pulldown menu on the main RootCause screen.
These files are probably in $HOME/.linux_rootcause. They are named "registry" and "rclog". You can specify a different location using the APROBE_HOME environment variable (see Q4.8), but be sure to run "setup" after setting APROBE_HOME and make sure the protections of the resulting files are correct.
/etc/init.d/sshd
script.
You should probably make a copy of the sshd file before you modify it so you can restore it when you are finished tracing sshd.
/etc/init.d/sshd
script to set up Aprobe: Find the start subroutine in the /etc/init.d/sshd file and insert the following lines after the "do_dsa_keygen" line:
export APROBE_HOME=directory identified in step 2
. aprobe_root/aprobe/setup
. $APROBE/bin/rootcause_enable
sshd
daemon.
As root, with /etc/init.d as your current directory, execute:
./sshd stop
./sshd start
You should see a stopped message from the stop and some output indicating that RootCause has started from the start message. You may get a "FAILED" message from the start. On our system even when we get the failure message the daemon seems to start with no problems. So I think you can ignore this message.
Tracing the libcrypt.so library was interesting; you can really see the ssh protocol flow as it generates keys and such.
The technique outlined above should work for many of the daemons on Linux.
Once you've used Aprobe to investigate the behavior of processes on a running machine, there is nothing particularly complicated about doing the same for system processes while the machine boots, but there are a number of special factors to take into account. These are listed below, and an example given of how we applied these to one of our machines.
The techniques described here were tested on a Solaris 6 box, but should be equally applicable to more current installations.
. /opt/aprobe/setup
rootcause_libpath -c
/sbin/rcN". The execution of these scripts is described in /etc/rcN.d/README, for N = 2 or 3. Since RootCause depends on an environment being defined, we need to 'source' some scripts into this command so the environment is defined when servers and daemons are started. I did this by creating files in /etc/rc3.d. If you look at the README and the /sbin/rc3 script, you should see how this works.
/etc/rc3.d directory.
Defines the APROBE_HOME environment variable where the logs and registry are stored:
APROBE_HOME=/opt/aprobe_home
export APROBE_HOME
Is a soft link to the setup script in the RootCause installation directory:
ln -s /opt/aprobe/setup /etc/rc3.d/K01RootCause.sh
contains the command to enable intercept:
. rootcause_enable
Normally, scripts whose names start with 'K' are used to shut down processes before others are started, but we will take advantage of the fact that these are executed first to ensure that the RootCause setup is performed before anything else.
Yes. However, there are a couple unique things about tracing System Services that you need to keep in mind:
Unlike all other processes, you will _not_ see an APP_START event in the RootCause log when a System Service starts. So, if you want to trace a System Service, you must manually Register it (either with the "rootcause new" command or the RootCause GUI's Workspace->New dialog), and thereafter you will see APP_TRACED events for it in the RootCause log.
Like all Services automatically started at boot time, the RootCause dynamic process intercept Service is started in a pre-defined order by the System Control Manager (SCM).
In order for RootCause to intercept a Service at boot time, the RootCause process intercept Service must start _before_ the Service to be intercepted.
Generally, RootCause starts early enough in the Boot sequence to intercept all Services. However, if it's not early enough for a particular Service, it's easy to modify the Boot sequence so that RootCause starts earlier. This is done by modifying the ServiceGroupOrder Key in Registry.
For C-language applications, RootCause executes a script called "do_aprobe.cmd", located subordinate to the WorkSpace directory, in order to apply the trace (for Java applications, the script name is do_apjava.cmd). The error is reporting that the script could not be executed.
There are a couple things to check: First, this is most probably an access permission problem. Remember that System Services can be defined to run as _any_ user, and that user must have write permission to the RootCause Workspace directory. A common problem is that the Service runs as user LSA (Local Security Authority, or System Account), and LSA doesn't have permission to write to the Workspace directory.
Second, does the Workspace directory exist? Use the "rootcause register -l" command to get a listing of Registered applications and their corresponding Workspace directories and verify that the directory is present and intact.
rootcause register command. The pattern argument
consists of a simple expression that can specify argument positions,
wildcards and simple comparison and logical operations. You can
associate the same executable (or Java class) and different patterns
with different workspaces. At run-time, actual command-line arguments
are substituted for special identifiers in the expression (like %2, $*) and
then the expression is evaluated. If it evaluates to TRUE, the associated
workspace is used to probe the application. If no expression evaluates to
true, then the application is not probed. There's no GUI support; you
have to register your application from the command-line to use this feature.
All the details are described
here. If it's still not clear how to do what
you want, don't hesitate to contact us.
As described in the user's guide, RootCause on AIX does not support the automatic "intercept" of applications at load time: the application must either be run directly from the command line with "rootcause run", or else the binary must be renamed/replaced with a soft-link to a script that simulates the intercept effect.
Starting with version 2.1.3b (May 2004) you can implement this second alternative with the rootcause link command, which renames/replaces the java binary with a script that uses access-lists and environment variables to manage who's applying rootcause to each Java instance.
The command rootcause link is used to apply RootCause to
applications (typically services and application servers) which cannot
easily be started from a user's shell environment. rootcause
link uses symbolic links to "intercept" these applications. A
set of subcommands are available to manage these links safely and
conveniently.
Note that step 4 will probably require root authority, depending on where the application to be traced is installed.
echo /usr/java131/bin/java > server.lst
The application named here cannot be a symbolic link.
rootcause link -I server.lst
You may specify more than one application, each on a separate line, in this
file. The rootcause link -I command instructs RootCause to
save this file as the list of applications whose links are to be managed.
rootcause link -I will require write access to the RootCause
installation directory. If you need to change the application list later
you will need to apply step 7 below (remove symbolic links).
rootcause link -l
This will report a line like the following:
- /usr/java131/bin/java
The '-' indicates that the application is eligible to have its link managed,
but that link does not exist and as a result the application will not be run
under RootCause. rootcause link -L will show an explanation of
the characters used to describe the link state. These are:
- Executable is not RootCause linked
* Executable will be run under RootCause
? File is not an executable or is invalid
! A serious error was detected; contact support immediately
rootcause link -K
This will create symbolic links into the RootCause installation directory for
each application designated with the rootcause link -I command.
rootcause link -K requires write access to the directory where the
application to be traced is installed. Typically this will require root
authority.
rootcause link -a
Now whenever the application is started, an entry will appear in the rootcause log. Follow the usual procedure to create a workspace and set up trace definitions.
rootcause link -a can be run by any user.
At this point you are ready to begin analyzing and debugging your application with RootCause. The remaining steps describe how to return the application to its original state and should be performed if RootCause is uninstalled.
The symbolic links will remain in place, but the application will not be run under RootCause.
rootcause link -Z can be run by any user.
rootcause link -D
rootcause link -D requires write access to the directory
where the application to be traced is installed (same as -K).
This will restore your applications to their original state, where
they will run completely independently of any component of the
RootCause toolset.
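Two small checks can make the steps above less error-prone: confirming that an application is not a symbolic link before listing it, and decoding the state character at the start of each line reported by rootcause link -l. The sketch below assumes that each reported line begins with the state character, as in the example line shown earlier; both function names are hypothetical helpers and are not part of RootCause.

```shell
#!/bin/sh
# Hypothetical helpers around the "rootcause link" workflow; the function
# names are illustrative and not part of RootCause.

# The application named in the list file cannot be a symbolic link.
check_not_symlink() {
    if [ -L "$1" ]; then
        echo "$1 is a symbolic link; list the real executable instead" >&2
        return 1
    fi
    return 0
}

# Translate the state character that starts a "rootcause link -l" line.
describe_link_state() {
    case "$1" in
        "-"*) echo "not RootCause linked" ;;
        "*"*) echo "will be run under RootCause" ;;
        "?"*) echo "not an executable or invalid" ;;
        "!"*) echo "serious error; contact support" ;;
        *)    echo "unrecognized state" ;;
    esac
}

describe_link_state "- /usr/java131/bin/java"
```

For the example line above, the helper reports that the executable is not RootCause linked, which is the expected state before the links are created.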
RootCause will work with any Enterprise Java Application Server that uses a supported JVM.
RootCause can trace an Application Server that is run as a standalone Java JVM (using java executable) or it can trace a JVM that is embedded within a native executable.
RootCause has been tested with:
If the Application Server runs as a standalone Java JVM, you can create a workspace just like any other Java application. Make sure RootCause is enabled in the shell or environment you are running the Application Server JVM. Run the Application Server, and find the Java APP_START event in the "Trace Events" window.
In the New Workspace Dialog, there is an option for "J2EE Server Directory". Enter the directory where deployable Enterprise Java Bean (EJB) and Servlet classes and jars reside. RootCause will automatically add EJB and Servlet classes and jars that are specified in any J2EE compliant XML deployment descriptors.
Once a Java workspace has been created and opened, the J2EE Modules directory can be changed to another location, or the current directory can be searched again for updated or new J2EE applications. This can be done using the Workspace -> Update J2EE Modules menu item.
If the Application Server runs embedded within a native executable, you can create a workspace for the native executable, and then add the libjvm library as a dynamic module. First create a workspace for the executable that runs the Application Server as you would for any other. Then open the Trace Setup window.
An Application Server might run an embedded JVM, but already have libjvm library loaded as a dynamic module. If this is the case, the libjvm library will show up in the list of loaded libraries in the "Trace Setup" window.
If libjvm does not appear as a statically-loaded module in Trace Setup, you must find the server version of the libjvm library (libjvm.so on Unix, libjvm.dll on Windows). Once this module has been found, it can be added using the Workspace -> Add Dynamic Library menu item.
Once the libjvm module is shown in the Trace Setup window, you can complete the J2EE configuration from the main workspace window using the Workspace -> Update J2EE Modules menu item.
These instructions assume $IAS_HOME is the install directory of the iPlanet App Server. $IAS_HOME does not have to be set for the application to be run or for RootCause to trace it. It is convenient to have $IAS_HOME set, in addition to $IAS_HOME/bin in your $PATH.
$IAS_HOME/bin/iascontrol is the command line script that controls starting and stopping of the iAS 6.5 server.
Make sure RootCause is enabled in the shell from which you start iascontrol. Stop the app server by running 'iascontrol kill'. Restart the iAS server by running 'iascontrol start'.
Once iAS is started, examine the RootCause "Trace Events" window. Find the ".kjs" process in the list. There might be multiple ".kjs" processes showing; selecting any of them will be fine. The ".kjs" process is the native executable that contains the embedded JVM of the iAS application server. iAS 6.5 defaults to at least two ".kjs" processes, one for the EJB engine, and one for RMI/IIOP connections.
Create a new workspace for the ".kjs" process. Once the workspace is open, you can add J2EE Modules by running the "Workspace -> Update J2EE Modules" menu item. Deployed applications within the iAS server are typically stored in $IAS_HOME/APPS directory. If you want to just add J2EE modules for a particular application, you can select a specific directory within $IAS_HOME/APPS.
There is no need to add libjvm.so as a dynamic module before tracing embedded JVM, as it is already dynamically loaded by the ".kjs" process.
You might want to trace classes in the app server engine itself. If so, add $IAS_HOME/classes/java/kfcjdk11.jar as a dynamic module. Expanding this jar in the "Trace Setup" window will allow tracing of engine classes.
These instructions assume $WL_HOME is the install directory, and $WL_DOMAIN is the WebLogic domain, found in $WL_HOME/config. $WL_HOME is set by the setEnv command in the $WL_HOME/config/$WL_DOMAIN directory. $WL_DOMAIN is not directly set by the startup and config scripts, but provides an easy shorthand for the WebLogic domain used.
WebLogic runs as a standalone JVM process, and is straightforward to trace using RootCause.
Make sure RootCause is enabled in the shell that you start WebLogic. Make sure the app server is stopped. Typically the app server is started by first calling `$WL_HOME/config/$WL_DOMAIN/setEnv`. This configures the environment for running WebLogic. Then start the server by calling `$WL_HOME/config/$WL_DOMAIN/startWebLogic`.
Create a new workspace for this JVM, the main class name is "weblogic.Server". You can add the "J2EE Server Directory" in the "New Workspace" dialog. The typical location of deployed applications for WebLogic is in $WL_HOME/config/$WL_DOMAIN/applications.
These instructions assume $JBOSS_HOME is the install directory, and $JBOSS_DOMAIN is the domain used, typically found in $JBOSS_HOME/server.
JBoss runs as a standalone JVM process, and is straightforward to trace using RootCause.
Make sure RootCause is enabled in the shell from which you start JBoss. Make sure the app server is stopped. Typically JBoss is started by calling $JBOSS_HOME/bin/run.
Create a new workspace for this JVM, the main class name is "org.jboss.Main". You can add the "J2EE Server Directory" in the "New Workspace" dialog. The typical location of deployed applications for JBoss is in $JBOSS_HOME/server/$JBOSS_DOMAIN/deploy.
First, get the right JRE and Tomcat installation, and configure it (these instructions are for Windows):
Then create your workspace and enable Java Memstat:
Below is a script to create a workspace for jrun. The usage is pretty simple. Setup for RootCause, cd to the directory you want the workspace created in (strong, strong, strong recommendation for a local file system). Set the path to point to the jrun executable and run the script, for instance:
$ . /opt/rootcause211/setup
$ PATH=/work1/tools/jrun4/bin:$PATH create_jrun_ws.ksh
Checking RootCause installation ...
Finding Application Server location ...
Using Application Server found in /work1/tools/jrun4/bin
Creating workspace /percy_work/jrun.aws from:
JRun - /work1/tools/jrun4/bin/jrun
JVM - /opt/j2sdk1.4.0_01
Adding Program: "/work1/tools/jrun4/bin/jrun" ->
"/percy_work/jrun.aws"
Registry updated.
If this doesn't work for whatever reason, the workspace can be created manually. Add $JAVA_HOME/jre/lib/i386/client/libjvm.so as a dynamic module, where JAVA_HOME is the location of the Java that JRun will use. Probably the only initial classpath entry is for jrun4/lib/jrun.jar but I added the whole classpath I saw in the script.
(This assumes that you have run the demo program and have some basic knowledge of RootCause.)
The first thing we need to do is check that you are running one of the JVMs that RootCause supports. Websphere ships with its own version of the JVM and so long as this is a Sun JVM, RC will work for the tracing components. If it's an IBM JVM, that's only supported on AIX.
To see this, open an X terminal or some other console on the machine that you have Websphere running on. Set up for RootCause by running the setup script in the RootCause installation and turning rootcause on (refer to the demo if necessary for these steps). From the same terminal start Websphere.
Open the RootCause GUI ("rootcause open") and it will open the log of started processes. Near the bottom you should see the Java process starting for Websphere. It should show the path to the JVM so you can check its version.
All being well you can just select the process start node, right-mouse button click and choose "Open Associated Workspace". Accept the defaults and you will be able to setup your trace against Websphere.
If the JVM supplied is a 1.2.2 *production* version of the JVM then it will be necessary to swap to a *reference* implementation if you want to use the Java memory analysis probes. For more information refer to: the Memstat documentation.
The java memstat probe is built on top of another probe called libapjvmpi. It is an interface to the Java JVMPI library and takes care of a bunch of the low-level work. One of the things it provides is a mechanism to take a heap dump. Working with the interface requires getting a dynamic pointer to the libapjvmpi interface and then using that. For instance:
#include "libapjvmpi.h"
static apjvmpi_InterfacePtrT JvmpiInterface = NULL;
static apjvmpi_InterfaceHandlePtrT JvmpiHandle = NULL;
void InitializeUal_early_heapdump (void)
{
// Load the jvmpi interface UAL
if (ap_IsNoUalId (ap_LoadAndInitializeUal (LIBAPJVMPI_LIBRARY_NAME)))
{
ap_Error (ap_WarningSev,
"Unable to load "LIBAPJVMPI_LIBRARY_NAME"\n");
}
}
probe program
{
on_entry
{
JvmpiInterface = apjvmpi_Initialize;
if (JvmpiInterface == NULL)
{
ap_Error (ap_WarningSev,
"Unable to initialize JVM support for\n"
"Java object tracking.");
return;
}
// Get an interface handle
JvmpiHandle = JvmpiInterface->Initialize (3);
if (JvmpiHandle == NULL)
{
ap_Error (ap_WarningSev,
"Unable to get a necessary interface for "
"Java object\n"
" tracking. It requires interface version 3 but the "
"apjvmpi library\n"
" is at version %d\n",
JvmpiInterface->GetVersion ());
JvmpiInterface = NULL;
return;
}
}
}
To request a heap dump, you need a probe that determines when to do so and then calls the heap dump routine:
// Request a heap dump. Keep the last n heap dumps specified - note that
// if there is already a larger count set, that value is retained.
// void (*RequestHeapDump) (apjvmpi_InterfaceHandlePtrT Handle,
// int RetainHeapDumpCount);
{
// Keep 3 dumps
JvmpiInterface->RequestHeapDump (JvmpiHandle, 3);
}
You'll need java_memstat around to format the object dump(s).
Here are some possibilities:
The following possibilities apply only to native (C/C++) functions:
apinfo -sa -x your_application.exe | grep "your_missing_function"
or the Windows equivalent. If you see your missing function in the output, it cannot be instrumented. Contact OC Systems to find out why.
When you add a module as a dynamic dll, this forces it to be preloaded (loaded before program start rather than at the point of the dlopen() / LoadLibrary()). This means that the _init() function is called before _start of your main application, which is before probes have been applied.
On Solaris, there is an open() in libc.so and one in libthread.so. They both call something called libc_open() in libc.so, so that's the one you should trace. In RootCause 205 this was made more accessible using the shadow header file, so it will show up as residing in source file libc.so.h. There's also a libc_close().
You may be loading a different instance of the library at runtime than you specified to Add Dynamic Module. This may be the case if LD_LIBRARY_PATH (or equivalent) is set. Make sure that the full path to mylib.so you've added to your workspace is the same as the one that will be loaded at runtime. On Windows, change ".so" to ".dll" and LD_LIBRARY_PATH to PATH and this still applies.
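One way to spot such a mismatch on Unix is to see which directory on LD_LIBRARY_PATH would supply the library first. The following is a rough sketch: first_on_path is a hypothetical helper, and the real runtime linker also consults other locations, so treat the result as a hint rather than a definitive answer.

```shell
#!/bin/sh
# Print the first file on a colon-separated search path that matches the
# named library. first_on_path is a hypothetical helper; the runtime linker
# also searches other locations, so this is only a hint.
first_on_path() {
    lib=$1
    path=$2
    old_ifs=$IFS
    IFS=:
    for dir in $path; do
        if [ -n "$dir" ] && [ -f "$dir/$lib" ]; then
            IFS=$old_ifs
            echo "$dir/$lib"
            return 0
        fi
    done
    IFS=$old_ifs
    return 1
}

first_on_path mylib.so "${LD_LIBRARY_PATH:-}" ||
    echo "mylib.so not found on LD_LIBRARY_PATH"
```

Compare the path printed here with the full path you gave to Add Dynamic Module; if they differ, the workspace is probably describing a different copy of the library.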
Make sure the "Add to Custom APC Files" checkbox is checked. If you've already got an APC file, make sure the Append checkbox is checked as well. Also, see Q11.1 .
Because workspaces are marked as System Folders so that we can associate a special icon with them. Of course this doesn't help when you're in DOS mode. Use dir /as to see system folders.
Generally you should be able to -- they should show up with a special icon. If they don't, then you're running Windows NT or the icon wasn't registered correctly. In any case, workspaces are system folders, so you have to set your Folder Options to uncheck the Hide System Folders option.
Older versions of RC would get "stuck" in a weird state where you can neither Uninstall nor Install. The following is the (ugly) procedure you need to follow to get your current version of RC completely uninstalled, and the newer version installed.
I. Procedure for Manually UnInstalling RootCause
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services. Scroll down and you will see a subordinate Key named "rci". Delete this "rci" Key.
HKEY_LOCAL_MACHINE\SOFTWARE\. Scroll down until you see a subordinate Key named "OC Systems". Delete the OC Systems Key.
%APROBE%\licenses\license.dat.
C:\Program Files\OC Systems.
%SYSTEMROOT%\system32\drivers\rci.sys and %SYSTEMROOT%\system32\rcjit.exe.
II. Procedure for Installing the New RootCause
You need to delete it from the registry. The easiest way to do this is with the GUI:
rootcause register -d -c class_name
to unregister a Java main class.
To unregister a native program, first do rootcause register -l to see the exact path of the program that is registered, then do rootcause register -d -x exe_path.
This message will be followed by specific information about the ADI file and module. The module is the executable or DLL on the remote machine, and the ADI file contains the debug information from the host machine where the workspace was developed. The error messages indicate that the version of the module (application) on the remote machine does not match the version against which you developed your original traces. You must create the workspace and traces against the same version you send to the remote site because we compare checksums. The only difference is that on Windows, the PDB need not be present on the remote machine because the ADI file contains the information that is needed. Unfortunately on Windows the Visual C++ release build defaults to stripping symbols (on Unix the default is to leave them in). Therefore you need to build to get symbols - you don't need debug, just symbols. If you do want debug in order to support the full range of probes, then you should add /Zi and /link /debug to the Release build options when you build an application that is to be shipped. This is described here.
If after installing the RootCause Console, you notice that when you click the "rootcause on" button in the main GUI window the checkbox won't stay checked, and when you execute the "rootcause on" command from a CMD window you get the following apparent error message:
(E) The program intercept mechanism is not running.
This is an installation problem: contact support@ocsystems.com
RootCause registry found and RootCause is DISABLED.
then the most likely cause of this problem is that you forgot to reboot
your machine after installing RootCause. Please reboot and try again.
If a system reboot does not correct the problem, please follow this procedure to obtain debug information to assist OC Systems support in resolving your problem:
The most likely cause of this is that you're using the "-jar" option on your 'java' command, which is not supported by RootCause prior to version 2.1.2 (October 2003).
So, if your application is run with
java -jar $APROBE/lib/probeit.jar
You could run it instead with:
java -classpath $APROBE/lib/probeit.jar com.ocsystems.probeit.Main
If you don't know what the main class is, it is defined in the manifest of the .jar file. For instance:
mkdir tmp
cd tmp
jar -xf $APROBE/lib/probeit.jar META-INF/MANIFEST.MF
grep "Main-Class" META-INF/MANIFEST.MF
This will give a line "Main-Class: com.ocsystems.probeit.Main".
cd ..
rm -rf tmp
You would do the same thing using your own java command line and jar file in place of the above.
After you have changed the command line, you should then re-run the application and go through the "New Workspace" steps. This time it should work fine.
If this is too much of a hassle, contact support@ocsystems.com about getting a version with -jar support. If you weren't using -jar, or if the problem persists after going through the above process, also contact OC Systems support and we can help you debug it.
You will find that when you use "Open Associated Workspace" it imports only the jars in the class path, so other classes that might be explicitly loaded do not appear in the Trace Setup. This can be easily remedied.
So long as the class loader follows the standard model for class loader inheritance (e.g. classes loaded by that loader have visibility to classes loaded by the application class loader) this is trivial:
If there is no physical representation of the class available, you can use wildcards:
"MyClass::*"
"MyClass::aMethod"
Your home directory (which will be the default disk for the rootcause log) is probably on an NFS disk. When two processes try to lock a file at the same time, one will be halted until the other one is done. However, with NFS it can take a while for the state of the unlock to propagate back, leaving the caller waiting on the lock routine even though the other process has unlocked it. The solution is to set APROBE_HOME to a local disk.
The Service Control Manager (SCM, the process that handles the Services applet) is real picky about the timings for Service start and stop. With RootCause enabled, we may be delaying the start of the Service just enough to cause SCM to complain.
Can you check to see if the Service aborted? Or, better said, is the Service still running after you see the error message box? Use Task Manager to determine this; once you get the timeout, the Services applet doesn't report Service status properly. If the Service aborted, please let me know; we may have to exclude it from Intercept.
There's a Registry _VALUE_ that controls the Service timeout:
HKLM\SYSTEM\CurrentControlSet\Control\ServicesPipeTimeout.
This is a REG_DWORD value that probably has value 120000 (which is two minutes).
Try increasing this value (e.g. double it) to see if it addresses the problem.
Add Dynamic Module causes a library to be "preloaded" (using the aprobe -dll option) because it's only on program startup that automatic trace configuration can be done. However, some user libraries cannot be preloaded because they rely on some global state being defined which isn't done until the program starts running.
On Unix platforms, this (currently) means you can't trace or do anything else on this module. You're beat unless you can change the library to allow it to be pre-loaded.
However, on Windows there is partial support for probing modules that are loaded after program startup. In particular, you can use custom probes, but you can't use the predefined probes which use the "probe all" feature. We wrote a subset of the trace probe for a customer to use on his dynamically loaded Windows library: dyntrace.apc. Give it a try and/or contact us for help.
If you just use apcgen module.dll > module.apc you'll get a huge file. This file is translated into ANSI C, then compiled with the native compiler. The capacity of the Visual C++ compiler isn't huge, so that can fail with an error like:
module.apc(102598) : fatal error C1076: compiler limit : internal heap limit reached; use /Zm to specify a higher limit
(E) apc could not compile the file module.apc_c.c.
You can invoke apc with the "/Zm" option as suggested by adding:
-compiler /Zm300
on the apc command line, which increases the VC++ compiler heap to
300% of the default maximum.
Alternatively, you can attempt to break up the apc file by hand, or you can generate just a subset of the traces by using the -p and -f options on the apcgen command. Use apcgen -h for brief usage or see Appendix A of the Aprobe User's Guide.
CoInitializeSecurity() when running under RootCause?
We don't know, exactly. However, we have identified a simple workaround probe:
#include
probe thread
{
    probe "CoInitializeSecurity()" in "ole32.dll"
    {
        on_exit
        {
            if ($return == RPC_E_TOO_LATE)
            {
                $return = S_OK;
            }
        }
    }
}
To use this:
coinit_workaround.apc;
apc coinit_workaround.apc
coinit_workaround.dll file into the workspace.
rootcause register xxx.dply?
A .dply file is just a zip file. You can just use
zip (provided with RootCause) to add files to this archive, like:
zip xxx.dply this.txt that.class other.ual
Because it was built on an old version of Linux. You can rebuild it from source using the Makefile in that directory, or else load the compatibility package for Fedora: compat-libstdc++-*.i386.rpm.
If your Workspace is being accessed over NFS, this means you're writing the data to APD files over NFS, and Linux has known bugs with this. You really need to have your workspace/APD files on a locally-mounted disk. (Even if it weren't for this bug, logging over NFS is orders of magnitude slower.)
If you're seeing something like:
Starting RootCause...
eddea02:/home/essc2/josephw/devenv/cstnd/src
==>Jan 25, 2005 8:26:39 PM java.util.prefs.FileSystemPreferences$2 run
INFO: Created user preferences directory.
Jan 25, 2005 8:26:41 PM java.util.prefs.FileSystemPreferences$3 run
WARNING: Could not create system preferences directory. System preferences are unusable.
First, bear in mind that the warning can be safely ignored.
Sun's workaround is to run as 'root' with any Java application once.
(There's also a way to eliminate this entirely, but it requires Java 1.4 and for compatibility reasons, the RootCause GUI is built with Java 1.2.2.)
Aprobe is a suite of tools and libraries which support dynamic modification and extension of a program by dynamically patching the program executable and/or shared libraries.
A dictionary defines "Probe" as "Device for exploring an otherwise inaccessible place or object." "Aprobe" stands for "Algorithmic Probe". It is hence a tool for exploring your program with the help of user-written algorithmic probes. These probes are installed into your program with the help of OC Systems' patented "dynamic action linking" technology.
A user runs a program with the "aprobe" tool, indicating that certain "probes" are to be patched into the program and executed as the program itself runs.
A "probe" consists of "actions" composed in C, with some special syntax added to indicate where in the program the actions are to be invoked.
There are a number of predefined probes included in Aprobe; there is a tool to generate simple probes directly from a linked or unlinked object file; or the user may easily compose his own probes in a simple extension of the C language.
See also "What is RootCause?"
The ProbePak was an experiment in introducing users to the power of Aprobe and RootCause by making a subset available for free download. It didn't work out, and ProbePak is no longer supported. See the main page www.ocsystems.com for information on our current products.
Read more about uses of Aprobe in the Product section of the web site or read the white papers in the Resources section. See also "What are some potential uses of RootCause?"
The best way to get started writing probes is to look at examples, and make some small changes.
If you have RootCause and have been using the GUI, you can use the Custom... button in the Trace Setup window to generate a probe, and look at that. If that looks too daunting, or you want a more tutorial approach, try the graduated examples in the examples (or ada_examples) and demo/Aprobe subdirectories of the Aprobe installation. Check out %APROBE%\Examples\Simple\Readme.txt (Windows) or $APROBE/examples/evaluate/README (Unix).
The current version of Aprobe on AIX is 4.4.1; on all other platforms it is 4.3.4b, released in June 2005.
The original version of Aprobe is version 2. for AIX, included as part of OC Systems' LegacyAda/OATS product, and in earlier versions of OC Systems' "PowerAda" product. While it shares the "probe" concept with the newer version, the user interface and details of Aprobe Version 2 differ substantially from Versions 3 and 4.
Aprobe is currently available on AIX, Linux (x86), Solaris, and Windows 2000/XP.
The detailed requirements are documented in Chapter 2 of the RootCause User's Guide for Unix or Windows.
Aprobe is delivered with a User's Guide in hardcopy, HTML, and PDF formats. The latter two softcopy forms are included in the evaluation version which can be downloaded. The HTML version is available on-line at www.ocsystems.com/sup_ug_index.html.
There is a series of graduated examples that come with their own text documentation in the examples and demo subdirectories of the Aprobe installation. You should read %APROBE%\Examples\Simple\Readme.txt (Windows) or $APROBE/examples/evaluate/README (Unix), and try at least some of the examples under that directory, before trying Aprobe on your own application or looking through this FAQ for answers.
apcgen - generates APC for some or all functions in the specified object file(s)
apc - compiles and links the specified APC file(s) into a UAL (DLL).
aprobe - runs the specified program after loading and applying patches in the specified UALs.
apformat - formats any data logged in the specified aprobe data (APD) file(s).
These tools are described further in other questions below. A number of additional tools and scripts for specific situations are also provided. See Appendix A of the Aprobe User's Guide.
Same as RootCause. See Q1.9 .
Yes. It's called RootCause. See Q1.1 .
In addition, on Windows, we provide a GUI for Aprobe, which is deprecated but still works. Try the Aprobe menu under the Start menu on your workstation to start it.
Also, some predefined probes (see Q15. below) include a Java GUI to specify configuration parameters for that probe.
Yes. You can run aprobe (without any probes) on any application at all unless:
If you find that using aprobe causes your application to crash, you should try running aprobe without any probes. If it still crashes, it should be reported as a bug to .
A slightly different question is, "Can I use Aprobe to put probes on any program?" To actually apply probes to a native module, there are three basic requirements:
For Aprobe to do what it does it must be able to figure out where the subroutines you are trying to probe have been linked and loaded. We call this location information "symbols". All symbolic debuggers have the same requirement. See Q12.17 .
The symbols may be as originally added to the application (i.e., not stripped, see Q12.16 ), or they may have been saved separately by Aprobe using apmkadi (see Q13.11 ).
Most programs delivered with the operating system, and off-the-shelf software, are stripped, so you can't use Aprobe directly on the application code, but you can generally probe shared libraries (DLLs) that support them.
If the program uses a mechanism that transfers control other than by the normal call and return mechanism, such as setjmp/longjmp or an unsupported exception mechanism, and there is an active probe at the time of that non-standard transfer of control, the program will likely crash.
Ada and C++ (and Java, but that's a separate issue) support exceptions which are non-standard transfers of control. Each compiler does this in a different way, and must be explicitly supported by the Aprobe runtime. See Q12.15 .
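To make the "normal call and return" requirement concrete, here is a minimal sketch in plain C of a non-standard transfer of control. The names inner, outer and run_once are hypothetical and no Aprobe API is involved. The point is only that inner() and outer() are entered but never return normally, which is the situation described above when a probe is active on one of them.

```c
#include <setjmp.h>

static jmp_buf recovery_point;

/* Hypothetical leaf routine: bails out with longjmp instead of returning. */
static void inner(void)
{
    longjmp(recovery_point, 1);   /* leaves inner() and outer() in one step */
}

/* Hypothetical caller: its normal return path is never taken. */
static void outer(void)
{
    inner();
    /* not reached */
}

/* Returns 1 if control came back via longjmp, 0 if outer() returned normally. */
int run_once(void)
{
    if (setjmp(recovery_point) != 0) {
        return 1;                 /* arrived here from longjmp() */
    }
    outer();
    return 0;
}
```

A call to run_once() returns 1: the exit paths of inner() and outer() are skipped entirely, so anything expecting to run on their normal return never gets the chance.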
Same as for RootCause. See Q1.10.
Same as for RootCause. See Q1.11.
Use the "file" command, e.g.:
Solaris:
$ file a.out
a.out: ELF 32-bit MSB executable SPARC Version 1, dynamically linked, not stripped
$ file /bin/ls
/bin/ls: ELF 32-bit MSB executable SPARC Version 1, dynamically linked, stripped
AIX:
$ file a.out
a.out: executable (RISC System/6000) or object module not stripped
$ file /bin/ls
/bin/ls: executable (RISC System/6000) or object module
Linux:
$ file a.out
a.out: ELF 32-bit LSB executable, Intel 80386, version 1, dynamically linked (uses shared libs), not stripped
$ file /bin/ls
/bin/ls: ELF 32-bit LSB executable, Intel 80386, version 1, dynamically linked (uses shared libs), stripped
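If you want to automate this check, note that the word "stripped" also occurs inside "not stripped", and that in the AIX sample above a stripped binary carries no marker at all. The following is a small sketch in plain C under those assumptions; the function name has_symbols is hypothetical, and it only inspects a line of text that you have already captured from the file command.

```c
#include <string.h>

/* Returns 1 if a line of `file` output reports the binary as not stripped.
 * Searching for the bare word "stripped" would be wrong, because it is a
 * substring of "not stripped". Absence of "not stripped" is treated as
 * stripped, which matches the Solaris, AIX and Linux samples shown above. */
int has_symbols(const char *file_output)
{
    return strstr(file_output, "not stripped") != NULL;
}
```

The design choice is to test for the positive phrase only, so the same check works whether the platform prints "stripped" or prints nothing for a stripped binary.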
apcgen -L will list the Aprobe function symbols in any compiled object module, for example:
apcgen -L C:\WinNT\system32\kernel32.dll
apcgen -L /usr/lib/libc.so
apcgen -L /work/programs/prog.exe
There are other apcgen options such as -m to show "mangled" names and -v to show file names--use apcgen -h for usage.
The RootCause Trace Setup window shows a tree of all the functions organized by module, directory and file, using the same mechanism used by apcgen.
If you want information about data symbols, or want to confirm that a function may actually be probed, you can use the apinfo command, which runs the "info" predefined probe. This only works on executable programs. For example:
apinfo -d /work/programs/prog.exe
will show all the global and file-static data symbols found when prog.exe is loaded by aprobe. There are lots of other options: use apinfo -h to see them. See Q13.7 if you're on Windows and apinfo prints nothing at all.
On Unix, every program has its symbols unless they're explicitly stripped (see Q12.16 ). So, to get symbols in the program, don't run the "strip" command or link with an option that causes the resulting program to be stripped. Shared libraries always have at least global (external) symbols.
On Windows, if you have control of how a program is built you can greatly expand the routines available for probing by adding the "/debug" switch and the "/pdb:none" switch when you link. These switches cause symbol information to be generated and the information to be put in the executable.
On Windows, the Microsoft Visual C++ compiler can produce symbols in two major forms or it can produce no symbol information at all. It can also place the various types of symbols either in the executable itself or in two kinds of separate files. Complicating this issue further is that shared libraries (aka DLLs) can be produced with the entire range of debug information forms available. One good item is that DLLs almost always expose their public interface by name.
Generally software you get from a vendor (COTS) will have no symbol information included with it. However Windows programs are generally broken into a number of DLLs and the interface between DLLs is visible to Aprobe. Thus while you can't probe routines local to Notepad.exe you can probe how Notepad.exe makes calls to CreateFileW in KERNEL32.DLL.
This is documented in Chapter 10 of the RootCause User's Guide, "Building a Traceable Application", and in Chapter 3 of the Aprobe User's Guide, but it's summarized here:
In addition to compiling with the right option to generate the debug information, you also must retain that information and have it available where it's supposed to be:
export APROBE_POWERADA_LIBRARY=/builds/old/prog1/adalib
The apcgen command will list those functions that have debug information associated with them:
apcgen -Ld a.out
This should be all you need, but on Unix there are some system utilities that look in the object files themselves that may also be used:
readelf -d a.out | grep "N_FUN.*:F" | awk '{ print $NF }'
will list all functions for which there is debug information. If you don't see what you're looking for in an executable, and you are using the C or C++ compiler, the debug information may be in the individual object file. Use the above command on the appropriate object file, or, if you're not sure where it is, do:
readelf -d a.out | grep N_OBJ | awk '{ print $NF }'
to see all the object files referenced by the executable.
objdump -G a.out | grep "N_FUN.*:F" | awk '{ print $NF }'
(Note that while readelf is available on Linux also, it only shows DWARF debug information which isn't yet supported by Aprobe).
dump -t a.out | grep ":F"
will show the functions that have debug information.
A "probe" is a "user action" associated with a specific location in a program. The user action is executed whenever control passes through the location with which it is associated. A "probe" is described in an extension of C called "APC", for example:
probe thread
{
    probe "foo"
    {
        on_entry
        {
            printf("Entering foo.\n");
        }
    }
}
The block following the "on_entry" is the "user action". The syntax surrounding it describes exactly where and when the action should be executed: immediately upon entering function "foo()" in each thread.
A UAL is a "User Action Library". It is the output of the "apc" command, and is a shared library consisting of the object code generated from your apc files. Not just any shared library (DLL) may be used as a UAL, and a UAL may not be renamed after creation, because it has specially-named entry points based on its filename which are called by the Aprobe runtime to perform initialization.
It has an extension of DLL because it is a regular Windows Dynamic Link Library (DLL). In most cases it could also be named .ual , but there are some cases where the .dll is required by Windows.
With respect to Aprobe, "logging" means "writing data to a file for later analysis". Aprobe provides a built-in logging facility that allows saving raw data in a time- and space-efficient way, and using "apformat" to display the logged data later. See "Logging Data" for related questions.
An ".apd" file is one that contains the data generated by a program run under aprobe. These are binary files which are read with the "apformat" tool.
There is always a ".apd" file generated giving aprobe invocation information, even if no "log" statements are executed. If log statements were executed there will be a "-1.apd" file, and maybe "-2.apd" files as well.
You can't reference source-level information in your probes. It's just like using a source level debugger in this respect, and for the same reason. A good rule is, if the debugger can print the value of a variable x at line 15, then you can do "on_line(15) log($x)" in your probe.
More specifically, you need to specify "-x exe_or_library " on the apc command, and the exe_or_library must contain debugging information, if you use a construct in your probe that cannot be resolved without specific debug information from the program. Such constructs are:
(a) target expressions: names from the probed program preceded by $, or $* ($1, $2 are ok, as are hardware-register references starting with '$$').
and
(b) references to specific source lines;
Note that there are lots of probes you can write; for example, all but one of the predefined probes provided with Aprobe will work fine in the absence of debug information, and the one that does require it (coverage) does so in order to get source line number information.
on_line() requires application to have debug information?
Yes, but things aren't that simple. To build a probe that requires debug information (including line information), the debug information must be available when the probe is compiled. However, the debug information can then be stripped and the probe run against the stripped executable.
For the symbol table, the necessary symbols must be present at runtime, either in the application (or application libraries) or in a .adi file which is generated with the Aprobe tool apmkadi . That tool allows you to capture the symbol table in an internal form and then strip the executable.
Also, PowerAda programs always contain source line information -- this is not considered debug information.
Finally, for low-level hacking, you can instrument specific offsets using on_offset.
For probes you are just limited by paging space. For UALs there is a more practical limit - we limit the total number of modules to 255 and that includes UALs.
Yes, if it's in the debug we can see it. We don't look at whether the debug says it's private, protected or public - we just use it.
No. This question is very frequently asked. It sounds great in theory but in practice Aprobe is a tool for tracking problems that have yet to happen, not those that have just happened. There is also quite a bit of work done by Aprobe when an application starts up; often doing this to a running application is as big an issue as re-starting the application. Finally, for Java you wouldn't be able to change the classpath to see our classes or intercept classes that have already loaded.
Yes, if you know its address and size, you can define a symbol for it
using ap_RecordDynamicFunctionSymbol() in the Aprobe Runtime
Library and then apply probes using the defined symbol.
defsym.c
#include
defsym.apc
//---------------------------------------------------------------------------
// Define Dynamic Function Symbol Example
//
// This is an example of using ap_RecordDynamicFunctionSymbol()
// to define symbols when no debug information is available.
//
// NOTE: If the offset for symbols is wrong the program will
// likely crash because you will have directed Aprobe to instrument
// the wrong piece of code.
//---------------------------------------------------------------------------
#include "aprobe.h"
// To define your symbols early enough to be instrumented and
// probed, you have to define them from a UAL initialize function.
// The initial part of the name must be InitializeUal_, and the first
// character following that must be lower in the ASCII collating order
// than the first character of the UAL name. '0' is the lowest legal
// character.
void InitializeUal_0_defsym_apc()
{
    // In this example I just define an alias for the symbol "main"
    // and probe that instead. You have to know the correct offset
    // and size of the function (though size is not so critical).
    // The offset is the offset in the module, not just the text
    // section.
    ap_SymbolIdT NewSym =
        ap_RecordDynamicFunctionSymbol (
            ap_ApplicationModuleId(),
            "MyAliasForMain",
            ap_ExternSymbol,
            ap_IntegerToOffset(0x10),
            0x1d,
            0);
    if (ap_IsNoSymbolId(NewSym))
    {
        printf("Couldn't define symbol...\n");
    }
}
probe thread
{
    // You'll get a warning about the symbol not being defined
    // when you compile this with apc, but it's OK.
    probe "MyAliasForMain"
    {
        on_entry printf("Hello again...\n");
    }
}
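The naming rule stated in the comments of defsym.apc can be checked mechanically before you build the UAL. The helper below is a plain C sketch and is not part of the Aprobe runtime. The function name init_name_sorts_first is hypothetical, and it encodes only the rule as worded in this FAQ: the name starts with InitializeUal_, and the next character sorts lower in ASCII than the first character of the UAL name.

```c
#include <string.h>

/* Returns 1 if init_fn follows the rule described above for ual_name,
 * 0 otherwise. Both arguments are plain C strings. */
int init_name_sorts_first(const char *init_fn, const char *ual_name)
{
    static const char prefix[] = "InitializeUal_";
    size_t n = sizeof prefix - 1;          /* prefix length without the NUL */

    if (strncmp(init_fn, prefix, n) != 0) {
        return 0;                          /* wrong prefix */
    }
    if (init_fn[n] == '\0' || ual_name[0] == '\0') {
        return 0;                          /* nothing to compare */
    }
    /* Compare as unsigned char so the ordering is the ASCII collating order. */
    return (unsigned char)init_fn[n] < (unsigned char)ual_name[0];
}
```

For the example above, init_name_sorts_first("InitializeUal_0_defsym_apc", "defsym") holds because '0' sorts before 'd'.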
Aprobe locates the specified UALs (if any), loads them as well as the Aprobe runtime, patches the executable to invoke the probes described in the UAL files, and starts execution of the specified program.
The executable program name is the last argument on the aprobe command line. All options after that are passed as arguments to the executable. For example, if your regular command-line would be:
mygrep "a_string" *.txt
Then with aprobe it would be, on Unix:
aprobe -u mygrep.ual mygrep "a_string" *.txt
and on Windows:
aprobe -u mygrep.dll mygrep.exe \"a_string\" *.txt
Note the backslashes needed to preserve quotes passed to the program.
The most reliable way to do this, used by RootCause, is with the aprobe "-execvp" option. In this case you specify a filename in place of the parameters, and the file includes all arguments, including the "argv[0]" that is to be passed as the executable name. For example, in the above case:
aprobe -execvp -u mygrep.ual mygrep mygrep.args
where mygrep.args might contain the lines:
mygrep.exe
"a_string"
file1.txt
file2.txt
Options and parameters can be passed to each UAL as well. This is done by following the UAL name with the -p option followed by the options in quotes. This is most commonly seen when invoking a predefined probe that is part of Aprobe, for example:
aprobe -u info -p "-sa" mygrep.exe
The options to the info probe are "-sa".
The "-if" ("immediate format") option on the aprobe command does this, e.g.,
aprobe -if -u fooTest foo
Not at this time. Even if you do "aprobe -if -n 0 ... " you get the basic .apd file.
Use RootCause. That's one of its key features. If for some reason you can't do that, you can
See Chapter 4 of the Aprobe User's Guide, "Loading Probes without aprobe".
First make sure that Aprobe was correctly installed. You can do this by running one of the examples in the Aprobe\examples\... directories.
The most common reason you don't see any output is that standard output from your program is going to the null device.
A Windows program can be linked for one of several Windows subsystems. If the subsystem is the "Windows GUI subsystem" standard output seems to go elsewhere. You can determine what subsystem your program has been linked for by using QuickView (look for the "subsystem" entry under the "Image Optional Header"). You can still get your output by redirecting it to a file using the -o switch on the aprobe command. For example:
aprobe -u myual.dll -o!stdout.txt!stderr.txt myexe.exe
In this case you are indicating that:
standard input: <none>
standard output: stdout.txt
standard error: stderr.txt
This is a result of the buffering that Windows does for each executable/DLL.
For each module (the executable and all the DLLs) NT sets up an individual output buffer for standard output and standard error. If the output is going to a device like a command window no buffering is done and the output from different modules is interleaved on the output device. However if the output is going to a file then the buffer is dumped only when it fills (or the program ends). You can control this buffering by using the C runtime call setvbuf() .
Aprobe turns off buffering in each UAL you produce. Thus if you are using multiple UALs and the Aprobe runtime all output will be interleaved correctly. Unfortunately output from your application may still be buffered. Currently your options are to direct stdout and stderr to a terminal device like a command window or you can recompile your application and make the following call in it:
setvbuf(stdout, NULL, _IONBF, 0);
In the near future we will provide a way in a probe to turn off buffering in your target program.
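If you can recompile the application, a small helper keeps the setvbuf() call in one place. This is only a sketch in standard C. The function name disable_output_buffering is hypothetical, and extending the call to stderr is an addition for symmetry rather than something the text above requires.

```c
#include <stdio.h>

/* Turns off buffering for stdout and stderr.
 * Call it first thing in main(), before any output has been written.
 * Returns 0 if both setvbuf() calls succeeded, -1 otherwise. */
int disable_output_buffering(void)
{
    int rc_out = setvbuf(stdout, NULL, _IONBF, 0);
    int rc_err = setvbuf(stderr, NULL, _IONBF, 0);

    return (rc_out == 0 && rc_err == 0) ? 0 : -1;
}
```

With buffering off, each write from the application reaches the file as it happens, so it interleaves with probe output in the order the writes occurred.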
If your program explicitly loads a file by calling dlopen("dynamic.so"), Aprobe does not support this directly since it does all its patching when the executable and any shared libraries linked in are first loaded into memory. So the only shared libraries you can probe are those listed by the command
ldd exe_name
However, the LD_PRELOAD environment variable can sometimes be used to achieve the same goal. Suppose that we have executable a.out which will load at some point libmyfuncs.so using dlopen. The following would cause the shared library to be loaded with a.out and thus accessible to Aprobe:
LD_PRELOAD="/full/path/libmyfuncs.so" aprobe -u myprobes.ual a.out
This assumes that libmyfuncs.so is not dependent on any other shared libraries and that it doesn't hurt to be initialized earlier than would have been the case with dlopen.
In general, if you're using Java, you should be using RootCause and not Aprobe directly. However, you can do this using the apjava -dll option. If your Java class named "JTest" contains LoadLibrary("native"), then this should work:
apjava -dll /full/path/libnative.so -u native_probes.ual -java JTest
or similarly, for Windows
apjava -dll \myprograms\native.dll -u native_probes.dll -java JTest
If you have a program that can be probed, you can run the tool apmkadi on it to create an Aprobe Debug Information (ADI) file. You can then remove the symbols from the executable (using the strip command on Unix, or removing the PDB file on Windows) and ship it to the target site. When you want to run Aprobe on that, you would then specify not only the UAL file(s) containing the probes, but the ADI file(s) as well, which contain only the symbolic information needed by Aprobe. See Appendix A, "apmkadi" for more information.
Yes. The "-p" flag, which prevents generation of any APD files, was introduced in Aprobe version 4.2.5. This is useful if your probes don't log any data using the default log method.
The possibilities are:
on_exit code within a block that checks the ap_ProbeActionReason implicit parameter, e.g.:
on_exit {
    if (ap_ProbeActionReason == ap_ExitAction)
    {
        log("foo returns ", $return);
    }
    else
    {
        log("foo exits abnormally for: ", ap_ProbeActionReason);
    }
}
We hoped that by getting rid of shmat() from our code
that we would no longer cause conflicts.
Unfortunately we didn't realize that the OS would choose
memory map addresses that would conflict, so the problem immediately
reappeared. We added a different flag to allow you to specify the memory
area that should be used: -q mmap=address
where address is the address that should be passed to mmap() when
Aprobe requests its shared memory. For example:
aprobe -q mmap=0xd0000000 -u myprobes myapp.exe
If you don't have this flag, you'll need an updated version of Aprobe but you might be able to get around it:
Many users find that they can avoid shared memory conflicts simply
by reducing the size of the APD files. The default maximum size is 256M
for the persistent APD file and 256M for the user APD file. By using a
ring (aprobe -n flag) you can vastly reduce the user APD size, and you
can use the -sp flag to specify a reduced persistent file. For instance,
the following:
aprobe -sp 16000000 -n 5
will create a persistent file of approx 16M and up to 5 APD files of 2M each.
The size of the persistent APD file is controlled independently of
the size of the APD ring files. You can use the -sp
option to lower this significantly.
The default is 256Mbytes because we need to set it to the maximum at the
beginning. However we've found that 16M is generally
sufficient in practice.
If you look and see how big your persistent files grow, you can use that as a baseline. The main things that get logged to the persistent file after program start are:
The application might have a poor implementation of malloc built in. On Solaris and Linux an application can provide its own implementation of malloc, free, etc. and this will be used. Most local versions of malloc are well behaved. Some, however, require initializing by the application before first use. Since Aprobe gets in earlier than main(), this can cause a malloc request to be made ahead of it being initialized.
If you have control over the code you should fix this by making the malloc self-initializing. If you don't then, unfortunately, you will not be able to run the application under Aprobe.
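As an illustration of what "self-initializing" means here, the following plain C sketch shows an allocator that sets itself up on the first request instead of relying on the application to call an init routine from main(). Everything in it is hypothetical (the names app_malloc, pool_init and POOL_SIZE, and the simple bump-pointer pool). It stands in for whatever custom malloc your application provides and is not Aprobe code.

```c
#include <stddef.h>

#define POOL_SIZE 4096

/* The union forces a generous alignment for the start of the pool. */
static union {
    unsigned char bytes[POOL_SIZE];
    long double   align_ld;
    long long     align_ll;
    void         *align_ptr;
} pool;

static size_t pool_used;
static int    pool_ready;     /* stays 0 until the first request arrives */

static void pool_init(void)
{
    pool_used  = 0;
    pool_ready = 1;
}

/* Hypothetical stand-in for an application-supplied malloc. */
void *app_malloc(size_t size)
{
    size_t rounded;

    if (!pool_ready) {
        pool_init();          /* lazy init: works even if called before main() */
    }
    rounded = (size + 15u) & ~(size_t)15u;    /* round up to a multiple of 16 */
    if (rounded < size || rounded > POOL_SIZE - pool_used) {
        return NULL;          /* arithmetic overflow or pool exhausted */
    }
    pool_used += rounded;
    return pool.bytes + (pool_used - rounded);
}
```

Because the readiness check lives inside app_malloc() itself, a request made by code that runs ahead of main() is served the same way as any later request.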
If (for example) in the Trace Setup GUI window for the pi_demo example, you don't see anything under the pi_demo module, or you don't see all the information described in the demo, then it's likely that RootCause is not finding the DLL it needs to read the debug info.
RootCause uses a DLL that is distributed with Microsoft Visual Studio to read PDB file information. The name of this DLL is slightly different for different versions of Visual Studio. For example, for VC++ 6.0 it's mspdb60.dll, and for .NET (VC++ 7.0) it's mspdb71.dll.
The pi_demo example was built on VC++ 6.0 and by default RootCause will not use the VC++ 7.0 DLL to read VC++ 6.0 PDB files. There's an Environment Variable that can be set to alter this behavior.
Here's what you need to do:
FYI, the following are the locations of the vcvars32.bat and mspdb71.dll files for a typical .NET Visual Studio installation:
c:\Program Files\Microsoft Visual Studio .NET 2003\VC7\bin\vcvars32.bat
c:\Program Files\Microsoft Visual Studio .NET 2003\Common7\IDE\mspdb71.dll
apformat reads one or more related APD (.apd) files and formats the data they contain. For example, if the command
aprobe -u a.ual a.exe
produced the files
a.apd a-1.apd
then the command
apformat a.apd
formats the data from both files.
If you specify the "base" one, without any number at the end (e.g., a.apd), all of the files that were written to during the most recent invocation will be formatted. If you specify an individual data file, such as "a-2.apd", only the data in that specific APD file will be formatted.
Yes. Use the "-z" option to indicate that no UALs are to be loaded implicitly, then use "-u" to explicitly state which one you want to use:
apformat -z -u first myprog.apd
Yes. If you provided your own format routines, you can do it by editing those routines and re-generating the UAL with the same name as the original.
Let's say you have "dumpall.apc", from which you generated "dumpall.ual". Copy "dumpall.apc" to "dumpall.apc.save". Then edit "dumpall.apc" and comment out the bodies of all the format routines except for the one(s) you want to keep. Use `apc' to compile "dumpall.apc" into "dumpall.ual", e.g., "apc dumpall.apc -x myprog", then do:
apformat -z -u dumpall myprog.apd
The UAL name must be preserved because the basename of each UAL is part of the "key" used to map formats to data in the APD.
Yes, and this is actually preferable:
No. The formats are generated automatically and there's no way to put your own conditions within them. (Of course you can put conditions around the log statement at run time, so that no data is recorded to begin with, but this is a different issue.)
When you want to use UALs different from, or in addition to, the ones that were specified when you ran aprobe. You might want to do this in order to process only part of the data, or to use different format routines. Use apformat -z if you want to use only those UALs explicitly specified on the apformat command line.
No. There must be a valid APD file generated by aprobe.
This is almost certainly because there's a bug in one of your format routines. See "Debugging Your Probes" near the end of Chapter 3 of the Aprobe User's Guide for Unix or Windows.
However, if you didn't write any of your own format routines, either because you're using a predefined probe, or because you just used "log(something);", then this is probably OC Systems' fault and you should contact .
No. ap_UalArgv at apformat time is for reading arguments passed to the UAL on the apformat command line, as in:
apformat -u my_probe -p "param1 param2" t.apd
You would have to log the data you need from run-time yourself, and format
it later. This can be done by including the following APC file into your
APC file prior to the "probe format" or other format routine in which you
want to use the arguments. You can then use the variables ap_RuntimeUalArgc
and ap_RuntimeUalArgv just as you would use ap_UalArgc/v at run time.
/* logualargs.apc
 * Include this once per UAL to record runtime arguments for format time use.
 */
#ifndef _LOGUALARGS_APC_
#define _LOGUALARGS_APC_
#include <string.h> /* memset */
static int ap_RuntimeUalArgc = 0;
static ap_NameT *ap_RuntimeUalArgv = NULL;
/* Format routine: allocates the argv table when the logged argc is read back. */
static void ap_RuntimeUalArgStart(ap_Uint32 *argc)
{
    ap_SizeT size = ((*argc)+1) * sizeof(ap_NameT);
    ap_RuntimeUalArgc = *argc;
    ap_RuntimeUalArgv = (ap_NameT*)(ap_Malloc(size));
    memset(ap_RuntimeUalArgv, 0, size);
}
/* Format routine: stores one logged argument at its logged position. */
static void ap_RuntimeUalArgAdd(int *pos, ap_NameT Arg)
{
    ap_RuntimeUalArgv[*pos] = ap_StrDup(Arg);
}
probe program
{
    on_entry
    {
        int i;
        log (ap_UalArgc)
            with ap_RuntimeUalArgStart to ap_PersistentLogMethod;
        for (i = 0; i < ap_UalArgc; i++)
        {
            log(i, ap_StringValue(ap_UalArgv[i]))
                with ap_RuntimeUalArgAdd to ap_PersistentLogMethod;
        }
    }
}
#endif
For example:
#include "logualargs.apc"
probe thread
{
}
probe format
{
    on_entry
    {
        int i;
        // Run-time arguments to this UAL
        printf("ap_RuntimeUalArgc = %d\n", ap_RuntimeUalArgc);
        for (i = 0; i < ap_RuntimeUalArgc; i++)
        {
            printf("ap_RuntimeUalArgv[%d] = \"%s\"\n", i, ap_RuntimeUalArgv[i]);
        }
        // Format-time arguments to this UAL
        for (i = 0; i < ap_UalArgc; i++)
        {
            printf("ap_UalArgv[%d] = \"%s\"\n", i, ap_UalArgv[i]);
        }
    }
}
This is just a UAL containing probes written by OC Systems for a specific purpose. They are generally more complex than ones you would write yourself, and are designed to work on any program that can be probed. Most of these probes include a Java GUI to simplify parameterization of the probe for your specific program, such as specifying the functions to be probed.
All predefined probes are in $APROBE/ual_lib/*.ual; the source code is $APROBE/probes/*.apc. The documentation for these probes is in Appendix D of the User's Guide.
No! The UALs for all of the predefined probes are already built and located in $APROBE/ual_lib. This is in the UAL search path, so the simple name of the UAL is sufficient. For example:
aprobe -u info myprog.exe
The directory $APROBE/ual_lib is always searched for UALs after the working directory. The environment variable APROBE_LIBPATH may also be defined to add additional directories.
Yes. In fact, that's the default. There is no GUI for `info'. The coverage, profile and trace probes provide a GUI to assist in building or modifying a configuration file that defines what should be done, but this file is just a text file that can be edited by hand.
The `memwatch' predefined probe provides a "runtime" GUI to monitor memory usage as the program is running, and to take interactive snapshots of the allocation data.
See the documentation for each probe in Appendix D of the Aprobe User's Guide for Unix or Windows.
You should see the Aprobe User's Guide documentation about this probe. However, you can try these things in this order:
The ability to take a snapshot when an unexpected signal occurs is provided by combining the predefined probe of your choice with the "sigsegv" probe:
// my_coverage.apc
#include "sigsegv.h"
#include "coverage.h"
static void MyHandler(int sig, void *Data)
{
    ap_Coverage_DoSnapshot("Snapshot on signal.");
}
probe program
{
    on_entry
    {
        ap_Sigsegv_AddCallback(MyHandler, NULL);
    }
}
Then you link this with the existing predefined probes:
$ apc my_coverage.apc coverage.ual sigsegv.ual # creates my_coverage.ual
An API for each predefined probe is defined by the ".h" file corresponding to it in $APROBE/probes. For example, "profile.h" defines "ap_Profile_DoSnapshotForAll()". To call this, you would #include "profile.h" in your APC file (it's in $APROBE/include as well, which is always searched for include files). Then when you compile your apc file, specify the UAL as if it were just another object file to link with:
apc myprofile.apc profile.ual
(Windows: apc myprofile.apc profile.lib)
This will produce myprofile.ual (or myprofile.dll on Windows).
There are two interfaces to the Java GUI objects used by the predefined probes. The one to start with is defined in $APROBE/include/quick_gui.h and implemented in quick_gui.ual (quick_gui.dll on Windows). This supports simple graphs, and interactive message, Yes/No, and confirmation dialogs. An example of using this is given in $APROBE/examples/learn/visualize_data/.
The full GUI interface used by the predefined probes like profile.ual is apGUI.h, but this is only for fearless experts.
On Windows, simply edit the files in %APROBE%\probes, then:
cd %APROBE%\ual_lib
nmake
On Unix you'll probably need to copy them locally, which is a bit ugly:
mkdir my_aprobe ; cd my_aprobe
ln -s $APROBE/include $APROBE/lib $APROBE/bin .
mkdir ual_lib
mkdir probes
cd probes
cp $APROBE/probes/memwatch.apc . # if you wanted to edit memwatch
ln -s $APROBE/probes/* . # to get everything else
chmod +w memwatch.apc
# edit memwatch.apc (or whatever) as desired
cd ../ual_lib
make -f $APROBE/ual_lib/Makefile memwatch.ual # or whatever
If you have problems or questions, contact .
The `atcmerge' tool merges formatted results from different runs on the same or different executables. You can use the aprobe "-d" option to create different APD filesets and corresponding ".tc" files for each run, and use the "atcmerge" tool to merge these. See Aprobe\Examples\Advanced\Test_Coverage for an example.
heap.ual (heap.dll on Windows) has been superseded by memwatch.ual. This is a simpler, more robust probe that provides information about allocation patterns, but does not save all the additional data necessary to do error checking. Contact OC Systems if you need a probe with this allocation-checking functionality.
With RootCause 2.0.5 (Aprobe 4.2.5) there's an example under examples/predefined_probes/events (Windows: Examples\Predefined\Events), and documentation in Appendix D of the Aprobe User's Guide. Here's a quick summary we sent to a user:
You must have an app_name.events.cfg file; otherwise events does nothing. Let's take a simple case with the routines one() and two(), which both call routine three(), which in turn calls routine four():
main()
    one()
        three()
            four()
    two()
        three()
            four()
The simplest configuration file is:
EVENT FUNCTION one()
EVENT FUNCTION two()
EVENT FUNCTION three()
EVENT FUNCTION four()
To just look at the calls nested under one() you would add:
FOCUS one()
If you wanted to restrict this at runtime:
FOCUS RUNTIME one()
Let's say that the processing for one() becomes more complex and you want the event to end in another routine. This would do the trick:
EVENT START MyEvent one() ON ENTRY
EVENT START MyEvent another() ON_ENTRY
FOCUS MyEvent
FOCUS RUNTIME MyEvent
Assume we have a program foo with two functions, outer() and inner(). outer loops and calls inner, which does some work. We set up the foo.profile.cfg file to profile both of them.
If we look at the output for routine outer we would expect to see Calls to Self being one: it's just called once. Calls to Child should be something like 10, or however many times inner is called.
Similarly, the two tables show individual and cumulative time. The individual time for outer would be much lower than the cumulative time, since the individual time has all of the recorded times for inner subtracted from it.
Finally, note that this only applies to routines that are profiled. If outer also calls routine another(), which is not profiled, another's call counts do not show and its time is recorded as part of outer's individual time.
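The relationship between the two times can be written as a small C sketch. The struct and function names below are hypothetical and do not reflect the profile probe's actual data layout; the numbers in the note that follows are made up for illustration.

```c
/* Hypothetical record for one profiled routine. */
struct prof_entry {
    double cumulative;         /* time from entry to exit, callees included   */
    double profiled_children;  /* summed recorded time of profiled callees    */
};

/* Individual time: whatever remains after subtracting profiled callees.
 * Time spent in callees that are NOT profiled stays in this figure. */
double individual_time(const struct prof_entry *e)
{
    return e->cumulative - e->profiled_children;
}
```

For example, if outer runs for 100 ms in total, its ten calls to inner account for 80 ms, and an unprofiled another() takes 15 ms, then outer's individual time is reported as 20 ms, and that 20 ms includes the 15 ms spent in another().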
The most likely reason is that your application doesn't use the default system allocation routines. These might be actual replacements for malloc(), etc. in your own application or in another library such as libsafe or libefence.
Sometimes if you explicitly replace malloc() it can break RootCause/Aprobe completely: see Q13.13.
If Aprobe mostly works except for memory probes, then you can override the default routines used by memwatch by registering your own allocation routines, or by changing the probe itself. This will require writing or editing some APC code, depending on your exact situation. Contact OC Systems for further assistance.
A specific allocation point (see below) might be reached just once (usually at initialization) and will have an Alloc Count of 1. It may or may not ever free that so the Free Count will be 0 or 1. But many (most!) applications have allocation points that give rise to more than one allocation. For instance:
for (i = 0; i < 10; i++)
{
    linkedList.add (new MyObject (i));
}
Obviously each instance of MyObject was created from the same allocation point. Most growth happens this way - in fact we don't count any allocations we only see once as growth.
What is an allocation point? For native code it's the unique traceback up to the current maximum depth, something like:
Line 10 of a()
called from Line 15 of b()
called from Line 32 of c()
For Java each allocation point is a combination of a traceback and the object type allocated there.
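One way to picture "unique traceback up to the maximum depth" for native code is the comparison below. This is a sketch under stated assumptions: the struct layout, the fixed MAX_DEPTH of 3, and the function name are all hypothetical, and the probes' real representation is not described here.

```c
#include <string.h>

#define MAX_DEPTH 3   /* illustrative; the real depth is a probe setting */

struct frame {
    const char *function;  /* e.g. "a" */
    int line;              /* e.g. 10  */
};

struct traceback {
    int depth;                       /* frames captured, at most MAX_DEPTH */
    struct frame frames[MAX_DEPTH];  /* innermost frame first              */
};

/* Two allocations share an allocation point when every captured frame
 * (function and line) matches. Returns 1 if they match, 0 otherwise. */
int same_allocation_point(const struct traceback *a, const struct traceback *b)
{
    int i;

    if (a->depth != b->depth)
        return 0;
    for (i = 0; i < a->depth; i++) {
        if (a->frames[i].line != b->frames[i].line ||
            strcmp(a->frames[i].function, b->frames[i].function) != 0)
            return 0;
    }
    return 1;
}
```

Under this view, the ten MyObject allocations in the loop above all compare equal, so they are counted against a single allocation point.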
The default setting of the memstat probes is to pinpoint leaks in a
longer-running program. However, you can change the options. From the
main RC window select the memstat probe in the UAL list, right click
and choose Edit UAL. From the Runtime tab change the
Sampling Ratio to 1 so you see every allocation.
From the Format tab check the Display Freed
Allocations box. You might also find the Display Zero
Growth Allocations useful. Next run, you'll start seeing those
freed allocations.
Click the OK button and then the Build button. Re-format (either through the Index or Examine button) and the reports should have the information you need.
This mechanism wasn't available in memstat until version 2.1.4b (June 2005); before that it existed only in memwatch, which is more focused on individual allocations. For earlier versions, you could edit and build your own custom version of "combined_memstat.apc" that has filtering: see filtermemory.apc.
Version 2.1.4b also introduced EXCLUDE filters in memstat and memwatch, which eliminate the named stack traces and show all others. See $APROBE/probes/[java]_memstat.cfg or $APROBE/probes/memwatch.cfg for usage information.
Yes. The "memcheck" probe, introduced in version 2.1.3 (February 2004) uses a "fence" mechanism to detect corruption of allocated (but not stack/local) memory. It also reports double deallocations.
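For readers unfamiliar with the term, here is a generic sketch of a "fence" (guard byte) scheme in plain C. It illustrates the general technique only; the layout, fence size, fill byte and function names are assumptions and are not taken from the memcheck probe's source.

```c
#include <stdlib.h>
#include <string.h>

#define FENCE_SIZE 8
#define FENCE_BYTE 0xAB

/* Block layout: [size_t size][front fence][user data][back fence] */
void *fenced_alloc(size_t size)
{
    unsigned char *raw = malloc(sizeof(size_t) + 2 * FENCE_SIZE + size);

    if (raw == NULL)
        return NULL;
    memcpy(raw, &size, sizeof size);                      /* remember length */
    memset(raw + sizeof(size_t), FENCE_BYTE, FENCE_SIZE); /* front fence     */
    memset(raw + sizeof(size_t) + FENCE_SIZE + size,
           FENCE_BYTE, FENCE_SIZE);                       /* back fence      */
    return raw + sizeof(size_t) + FENCE_SIZE;
}

/* Returns 1 if both fences are intact, 0 if either was overwritten. */
int fence_intact(const void *user)
{
    const unsigned char *front = (const unsigned char *)user - FENCE_SIZE;
    const unsigned char *back;
    size_t size, i;

    memcpy(&size, front - sizeof(size_t), sizeof size);
    back = (const unsigned char *)user + size;
    for (i = 0; i < FENCE_SIZE; i++) {
        if (front[i] != FENCE_BYTE || back[i] != FENCE_BYTE)
            return 0;
    }
    return 1;
}

void fenced_free(void *user)
{
    if (user != NULL)
        free((unsigned char *)user - FENCE_SIZE - sizeof(size_t));
}
```

A write one byte past the end of a 4-byte block lands in the back fence, so a later fence_intact() check reports the corruption. A scheme like this only sees heap blocks it allocated itself, which is consistent with stack and local memory not being covered.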
We have done some work in this area for customers, but we have not productized it, because the platform- and problem-specifics are not easily generalized. If you want some unsupported probes to start from please contact us.
Many changes have occurred as the Trace predefined probe has been adapted to support RootCause users. A number of the options have been deprecated, and others apply only when used directly outside of RootCause.
The following options have been deprecated.
The coverage, memcheck, memwatch, profile, and statprof probes record data in memory and dump it only at normal program termination, or when explicitly requested with a programmatic snapshot. A snapshot can be forced without terminating the program by calling the entry point provided by the probe:
The second argument to ap_Profile_DoSnapshotForAll() is 1 (TRUE) if
it will be the final snapshot, and 0 (FALSE) if it will be called
again via a snapshot or normal program completion.
There are two ways these can be called. A very convenient way is
to attach with dbx (or gdb) and use the "call" operation. For example
if ps says that the PID of application appdriver
is 12345, then you can do:
$ dbx -a 12345
(dbx) call ap_Statprof_Snapshot( "dbx" );
(dbx) detach
$ apformat appdriver.apd
Even when using detach, it's possible that the program
will terminate at this point, so you shouldn't use this if it's important
that the program continue.
An alternative is to link a special version of these apps with a probe which takes a snapshot at a certain point in the program, for example:
// my_profile.apc
#include "profile.h"
probe thread {
    probe "abnormal_end_signal_was_handled" {
        on_entry ap_Profile_DoSnapshotForAll( "probe snap", FALSE );
    }
}
Then you link this with the existing predefined probe:
$ apc my_profile.apc profile.ual # creates my_profile.ual
Note that the name abnormal_end_signal_was_handled is
only a suggestion, not a name in the Aprobe runtime. An application
programmer may offer another name which is called when the application
averts an abnormal end. If not, an application programmer may need to
help by creating and calling this dummy function at the right time for
the snapshot probe, which is when the application averts an abnormal
end. Part of the challenge is finding programmers who know that much
about the application.
A special case of this is to take a snapshot when an unexpected signal occurs: see Q15.6.
Some possibilities are:
The statistical part works like this. Say you have a setting (Sampling rate) of one in thirty. Every 30th allocation is recorded in a table. Every free is looked up in that table: if the block is in there, the free is recorded; if it isn't, the free is ignored. So the sampling is only on the allocations, not the frees.
In the table, the totals (including leaked memory) and counts are multiplied by the sampling rate. If you have enough samples, this will be entirely valid.
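The scaling step can be sketched in C as follows. This is an illustration of the arithmetic only, under the assumption that sampling is a simple "every Nth" counter; the struct and function names are hypothetical, and the lookup table used for frees is omitted.

```c
#include <stddef.h>

struct sampler {
    unsigned long rate;           /* record one allocation in every `rate` */
    unsigned long seen;           /* allocations observed so far           */
    unsigned long sampled;        /* allocations actually recorded         */
    unsigned long sampled_bytes;  /* bytes in the recorded allocations     */
};

/* Called for every allocation; returns 1 if this one was recorded. */
int sampler_on_alloc(struct sampler *s, size_t size)
{
    s->seen++;
    if (s->seen % s->rate != 0)
        return 0;                 /* skipped: not the Nth allocation */
    s->sampled++;
    s->sampled_bytes += (unsigned long)size;
    return 1;
}

/* Reported figures are the sampled figures multiplied by the rate. */
unsigned long estimated_count(const struct sampler *s)
{
    return s->sampled * s->rate;
}

unsigned long estimated_bytes(const struct sampler *s)
{
    return s->sampled_bytes * s->rate;
}
```

With a rate of 30, 300 allocations of 10 bytes each produce 10 samples and estimates of 300 allocations and 3000 bytes. With only 299 allocations there are 9 samples and the estimate drops to 270, which is why the estimates are only trustworthy once there are enough samples.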
We record what you pass to the O/S, not necessarily what the O/S actually allocates. This could underestimate the amount of memory in certain cases (e.g., if the memory manager always allocates in quad-word steps, it would allocate 16 bytes when you requested 4).
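The arithmetic behind that example, as a C sketch. The 16-byte granule is the assumption used in the text above; real memory managers also add their own bookkeeping overhead, which this ignores. The function name is hypothetical.

```c
#include <stddef.h>

/* Round a request up to the next multiple of `granule`.
 * `granule` must be nonzero. */
size_t rounded_size(size_t requested, size_t granule)
{
    return (requested + granule - 1) / granule * granule;
}
```

Here rounded_size(4, 16) is 16, so a report based on requested sizes would show 4 bytes where 16 are actually consumed.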
The statistics that identify certain allocation points as "Growth" are based on least squares linear regression analysis.
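For reference, the slope of a least squares line can be computed as below. This is the textbook formula only; which quantities the probe feeds into the fit, and what threshold it applies, are not described here. The sketch assumes x is the snapshot number and y is the outstanding bytes at one allocation point.

```c
#include <stddef.h>

/* Least squares slope of y against x over n points.
 * Returns 0 when the slope is undefined (fewer than two points,
 * or all x values identical). */
double ls_slope(const double *x, const double *y, size_t n)
{
    double sx = 0.0, sy = 0.0, sxx = 0.0, sxy = 0.0, denom;
    size_t i;

    for (i = 0; i < n; i++) {
        sx  += x[i];
        sy  += y[i];
        sxx += x[i] * x[i];
        sxy += x[i] * y[i];
    }
    denom = (double)n * sxx - sx * sx;
    if (n < 2 || denom == 0.0)
        return 0.0;
    return ((double)n * sxy - sx * sy) / denom;
}
```

Outstanding bytes of 100, 200, 300, 400 at snapshots 0 through 3 give a slope of 100 bytes per snapshot, while a flat series gives a slope of 0. A persistently positive slope is the kind of signal that would mark an allocation point as growing.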
That is, is there a way to do something like the following?
FILTER extern:"malloc_y_heap()" in "libc.a(shr.o)"
==> **** any number of levels matching anything ****
==> "ap_demangle.c":"Demangle_Xlc_Symbol_Name()" at line 2103 (ap_demangle.c)
No, the best you can do is enumerate all the possible matches from your test cases. Wildcards of one or more levels may be implemented in the future.
Not officially, but we have written stackcheck.apc for a customer. This version is just for Windows, and checks that the return address is not corrupted on_entry and on_exit to all instrumented functions. Instrumentation is hard-coded in the probe for now. A configuration file or separate configuration probe could be added to handle specifying the instrumentation points.
The apc command translates one or more APC files into C, and then uses a native C compiler to compile these into object code, and link them with other files specified on the command-line to form a shared library called a UAL. A UAL has a suffix of .ual on Unix, but .dll on Windows due to limitations in how dynamically-loaded libraries are selected on Windows.
On Windows, only one compiler is supported, so the C compiler is simply your installed version of Microsoft Visual C++. On Unix, the compiler is defined in the file $APROBE/lib/compiler_profiles and by the APROBE_CC_COMMAND environment variable. This is described in the Files Reference (Appendix B) of the Aprobe User's Guide.
Options to the compiler can also be specified on the apc command line by including them in quotes after the "-compiler" option, for example,
apc foo.apc -compiler "-v"
You need to specify "-x object module" if you use a construct in your APC that cannot be resolved without specific symbol table or debug information from the program. Such constructs are:
In general, probes that you compose to gather information about specific parts of your program will contain one of the above, and you'll want to include the executable or an object file.
For probes on shared libraries which don't contain any debug information, or for probes that should apply to any program (like the predefined probes included with Aprobe), you generally will not provide an object module.
Just include them on the apc command-line. Linker options are specified in quotes after the "-linker" flag, for example,
apc foo.apc -linker "-lX11"
or, on Windows:
apc foo.apc -linker "/WARN"
There are a number of possibilities. If you specified "-x ... " on the apc command line, then it means it couldn't find the named function in that file's symbol table. Since apc works pretty hard to match incomplete function names, the name is probably wrong in case or spelling, or, if you provided a parameter profile, it's probably not exactly what the C++ compiler encoded as the name for the function.
You could try using apcgen to generate a probe template for all the functions in the source file (or object file, if it's a template instance) containing the function you want, or the tool apinfo or apsymbols to dump out all the function names in the whole program.
For Solaris platforms, we recommend downloading gcc from http://www.sunfreeware.com.
At this site's home page you will see on the right a list of processor/operating system combinations. Click the one which is appropriate for your system. (Note that this list includes both SPARC and Intel; be sure to select a SPARC download.)
Below the processor/OS list will be a list of software packages. Select gcc-2.95.3 (RootCause does not yet fully support gcc version 3).
The link to the binary gcc installation will appear in the center pane of your browser. Download the gcc 2.95.3 image from here. Use gunzip to uncompress the file, then use pkgadd to install the package (it will go under /usr/local). You will need root authority to do this:
pkgadd -d gcc-2.95.3-sol7-sparc-local
As with C, use the -g flag; this passes the appropriate debug options to the C compiler (even on Windows: /Zi /Yd /FAcs) and saves the generated C source file.
Yes! If "ls -l ${CC_PATH}/bin/gcc" on the command-line shows that the
compiler exists, then a stanza like:
CC_COMMAND ${CC_PATH}/bin/gcc
will work.
Also note that the environment variable APROBE_COMPILER_PROFILES can be used to override the default of $APROBE/lib/compiler_profiles and point to your own variant of this file. See compiler_profiles file in the user's guide.
If you build your application with the compilation option -m32,
then to build your probe you'll need to pass -m32 to apc's backend
compiler, plus define the i386 macro to the preprocessor.
For example:
apc -Di386 -compiler -m32 -linker -melf_i386 foo.apc
The link stage just invokes ld directly which should automatically build a
32-bit shared library from a 32-bit object file.
If you're going to be doing this regularly you should edit $APROBE/lib/compiler_profiles to update the CFLAGS and PREPROCESS lines so these options are applied automatically.
This section contains questions and answers about writing probes in APC for native (C, C++, Ada) programs.
You need an object file or executable that contains debug information, i.e., was compiled with debug (see Q12.19), or a C header file. For example:
apcgen foo.exe > foo.apc
apc foo.apc -x foo.exe
generates foo.apc, an APC file probing all the user-defined functions in foo.exe that have debug information, then compiles that into a UAL.
apcgen -qparams -p sin -o math_sin.apc /usr/include/math.h
apc math_sin.apc -x /usr/include/math.h
generates and compiles math_sin.apc, containing a probe on the sin() function which logs the parameter and return value. Use apcgen -h to see what options are available to control the output.
Note that RootCause provides this functionality in a point-and-click GUI.
One way is to start with a file generated by "apcgen" (see previous Q.). Or you can compose one in your favorite text editor. It's pretty much like writing C, but there's some syntax needed to indicate where and when your probe should be executed. Here's a very simple one:
probe thread
{
    probe "main"
    {
        on_entry
        {
            printf("Entering main.\n");
        }
    }
}
If you put this in the file "foo.apc", then you would compile it:
apc foo.apc
which produces "foo.ual" (foo.dll on Windows), which you can then probe your program with:
aprobe -u foo foo.exe
There are several differences:
1) There is special syntax to indicate where and when the probe should be executed, such as "probe", "on_entry", "on_exit", "on_line", etc.
2) There is a special keyword called "log" for recording data at run time and defining the format with which it should be displayed afterward.
3) There are special data references, called "target expressions" which start with `$' and refer to values in the probed program.
All of these are expanded or converted to ANSI C by the apc compiler.
In addition, there is an implicit #include "aprobe.h", which makes available the extensive Aprobe API defined in $APROBE/include/aprobe.h and documented in Appendix C of the Aprobe User's Guide.
This is an artifact of the clever Aprobe scoping rules. When one probe is nested within another (that is, defined in the declarative part of an enclosing probe), it not only gives visibility to the enclosing probe's data as you would expect, it also means that the inner probe is "active" (its actions may be executed) only if the outer probe is active.
Since every function is executed within some thread of execution, if a function probe weren't inside a thread probe it would never be active.
Anyway, just put in the probe thread{ .. }. It's what works.
The on_entry and on_exit actions of a "probe program" occur once each, before calling main() (or WinMain(), etc.) and after returning from main(), respectively. The corresponding actions of a "probe thread" occur at the creation and destruction of each separate thread.
Data defined in the declarative part of a "probe thread" is global to all probes, but is unique for each thread. There is always at least one, the "main" thread, which is conceptually nested immediately within the probe program.
On AIX, Linux and Windows, the on_entry actions are executed before the first instruction of the function itself. In particular, the function's local stack frame hasn't been created yet.
On Solaris (SPARC), the on_entry actions are executed immediately after the save instruction has shifted the register window, but before any compiler-generated saves of parameters or other values.
The on_exit actions are executed after the stack frame has been discarded, so local data is not available. The next (target program) instruction executed will be the one following the call to the probed function.
Parameters passed by value are essentially local data. They are stored on the stack and the stack frame has been discarded by the time the on_exit part is executed.
If you want to be able to access the input parameters you can save them in the on_entry part, for example:
probe thread
{
    probe "foo"
    {
        int parm1;
        on_entry
        {
            parm1 = $1;
        }
        on_exit
        {
            if (parm1 == 1)
            {
                ...
            }
        }
    }
}
C++ reference parameters, and composite parameters passed by reference to Ada, are available by-name on_exit because `apc' implicitly generates code in an on_entry section to save the address passed in. GNAT Ada OUT and IN OUT parameters can be displayed because these are implemented as fields of a 'struct' returned by the function.
The on_entry and on_exit parts are conceptually outside the scope of the function, so the local data is not visible. Local data is visible only within an "on_line" action.
Yes. Simply write on_line(first) or on_line(last) . You can use this to do function-relative line numbers as well, such as on_line(first+5) .
In C++, you must specify the exact parameter profile encoded in the symbol table by the C++ compiler. The best way to get this is either to look at the output of "apcgen -vL" applied to the object file generated by the compiler, or to use `apinfo -sa myprog' to list the probe names of the function symbols in your application.
A hardware register is referenced within a user action (e.g., on_entry) by preceding the name commonly used for the register by "$$". The exact register names are documented in Appendix B, "Files Reference", under "APC File".
Note that the value you get for the register is the value it had at the point the target program called the probed routine.
On Windows, if you want the current value of a given register you can use the normal MSVC++ assembly prefixes to get it, for example:
{
    int CallerEAX;
    int CurrentEAX;
    CallerEAX = $$EAX; // move the caller's EAX to CallerEAX
    __asm mov CurrentEAX,EAX // move EAX to the variable CurrentEAX
}
If the function is compiled with debug (see Q12.19) you can reference a parameter by name ($param) and reference all parameters with "$*".
Whether or not a function is compiled with debug, or there's an object module available, you can reference the first parameter with "$1", the second with "$2", etc., up to $8.
Note, however, that if there is no debug information provided, you must cast the "$1" to its proper type.
Yes, but you must (a) include the definition of each logged item's type in the APC file (if it's not a predefined type), and (b) cast each item to that type. This is how one can log parameters to system routines, for example:
#include <stdio.h> // includes the struct FILE
probe thread
{
    probe "fopen"
    {
        /* fopen returns FILE *, defined in stdio.h */
        on_exit
            log("fopen() returns ", (FILE *)$return, " = ", *(FILE*)$return);
    }
    probe "fclose"
    {
        /* first parameter to fclose is FILE * */
        on_entry
            log("fclose() called with ", (FILE *)$1, " = ",
                *(FILE*)$1 );
    }
}
ap_StringValue is a macro which logs everything from the address provided up to the first null character:
on_entry { log("NameParam = ", ap_StringValue($NameParam)); }
Note: this only applies to null-terminated (C, C++) strings. It does not apply to the Ada predefined string type; see Q17.27.
You must specify the bounds of the array in the log statement:
on_entry { log("Items = ", $Items[0 .. 9]); }
If the array bounds are dynamic (as most are), you can compute them first:
on_entry
{
    int last;
    /* find the zero terminator */
    for (last = 0; $Items[last] != 0; last++);
    log ("Items = ", $Items[0 .. last-1]);
}
The Sun WorkShop (Forte) compiler's preprocessor is run over the APC file before the APC-specific syntax is processed and converted to C. If you're using Sun WorkShop as apc's C compiler (as defined in $APROBE/lib/compiler_profiles), that preprocessor complains about the "0.." syntax that APC uses. If you want to avoid the message, put a blank before ".." whenever you use it.
Use the "ap_StubRoutine" macro in the on_entry part of a function, and be sure to return something sensible if necessary in the on_exit part, e.g.,
probe "foo" {
    on_entry ap_StubRoutine;
    on_exit $return = 0;
}
Note that you can't assign the return value in the on_entry part, since the return register is reset as part of the stub implementation.
All data in a class is defined as a field of the local variable "this", so to get at the class data item "NCalls" you would do:
log("NCalls = ", $this->NCalls);
To specify that you want a data item other than the one visible by default, add an expression context string to the target expression:
log("static NItems = ", $(NItems, "-file items.c"));
To get the global one, if any:
log("global NItems = ", $(NItems, "-module foo.exe"));
Yes. See the previous Q. You can reference a static item by name in any file:
log("static NItems = ", $(NItems, "-file items.c"));
even if the probed function this appears in is not in file "items.c".
If your program is compiled with debugging enabled, you can precede its name with a `$'. This is often useful for using a probe to call debugging-support routines, e.g.,
probe thread
{
    probe "ReadSymbolTable"
    {
        on_exit
            $DumpSymbolTable($0);
    }
}
In the absence of debug information, you can get the symbol address from Aprobe and cast that to the correct type. For example, on Windows:
typedef void MyBeep(int Hz, int Msec);
probe thread
{
    probe "main"
    {
        on_entry
        {
            /* look up Beep in KERNEL32 and sound a 4kHz tone for a second */
            (*(MyBeep *)ap_SymbolToAddress
                (ap_SymbolNameToId
                    (ap_ModuleNameToId("KERNEL32"),
                     "Beep()",
                     NULL)))(4000, 1000);
        }
    }
}
Additionally, on Windows, if the function is in a DLL you can use the NT routines GetProcAddress and GetModuleHandle to find the routine. For example:
/* call Beep to sound a 2kHz tone for a second */
GetProcAddress(GetModuleHandle("KERNEL32"),"Beep")(2000, 1000);
In the above example you could use the name of your DLL instead of KERNEL32.
Calling C++ methods is more complex (they require a "this" pointer, and the naming can be tricky): For Windows, see Q17.23; for Unix, see Q17.71.
To call Windows Visual C++ methods in the target program you will have to create a C wrapper function and link that with the application. The wrapper will take the C++ object as an explicit parameter and make the method call. For example, examine the following class definition:
class MyClass
{
public:
    MyClass () {a = 10;}
    void Show (int b);
    void Show (char *b);
private:
    int a;
};
If you want to call the first Show method (the one with an int parameter) you need to write the following wrapper function:
extern void WrapperForMyClassShowInt(MyClass *o, int b)
{
    o->Show(b);
}
You should pick a unique name for the wrapper, particularly if you are writing a number of wrappers for overloaded methods in the same class. This wrapper name includes the class, MyClass, the method name, Show, and an indication of argument types, Int.
The wrapper function includes an explicit parameter for the C++ object pointer of class MyClass, and all the other parameters for the method call, in this case just an int. The body of the wrapper function just makes a C++ method call using the C++ object and the method parameters. If the method returns a value, the wrapper function should just return that result.
With this wrapper function compiled and linked with your application, you can call it from APC code. Here is an example of calling the wrapper defined above:
probe thread
{
probe "main"
{
on_line (31)
{
$WrapperForMyClassShowInt(&$m, 20);
}
}
}
The target expression
$WrapperForMyClassShowInt
indicates the wrapper function we wrote above, and the target expression
&$m
refers to a C++ object variable in the target program. Here is the function, main, that this probe targets:
int main (int argc, char **argv)
{
MyClass m;
return 0; /* line 31 */
}
Yes, but if they do they must all be compiled in the same "apc" command into a single UAL.
Yes. A UAL is just a shared object library (a DLL), so you must do the following:
1) Export the symbol for the function to be called, using the apc "-e" option, when you build the UAL to be referenced, e.g.,
apc funcdef.apc -e func
On Windows, you can use the MSVC++ __declspec(dllexport) prefix to identify the routine as something to be exported from a DLL.
2) Specify the referenced UAL (on Windows, the corresponding "lib" file) as an input file on the command line when you compile the probe that contains the external reference:
Windows: apc main.apc funcdef.lib
Unix: apc main.apc funcdef.ual
From $APROBE/examples/learn/probe_exit/exit.apc:
probe thread {
probe "exit" in "libc.so" // "libc.a(shr.o)" on AIX
{
on_entry {
/* return 0 even if an error occurred: */
$1 = 0;
}
}
}
Note: This probe won't work on Solaris 5.5.1 because exit() works differently.
An unconstrained string is represented as a record with two components. The first is a pointer to the string (which is not null-terminated) and the second is a pointer to another record which contains the bounds of the string.
The "apc" tool recognizes this special type and displays it appropriately, if debug information is available. Since its length is known, ap_StringValue is not used. For example:
probe thread {
probe "hello.qualify_name" {
on_entry
{
// log the input parameter then stub the routine itself
log("qualify_name called with: ", $1);
ap_StubRoutine;
}
}
}
In the absence of debug information (e.g., for Ada.Text_IO.Put_Line), or when you want to assign to an unconstrained string, you can use macros defined in gnatstrings.h. For example:
#include "gnatstrings.h"
probe thread {
probe "hello.qualify_name" {
on_exit
{
// return what we want to:
ap_SetGnatUCString(
$return,
ap_CatenateStrings(
"/home/ocs/",
ap_ExtractGnatUCString($1),
NULL));
}
}
}
This is an example of an APC file to log a buffer's worth of data and format it as hex.
// Example APC file to demonstrate logging a block of data and
// formatting it as hex.
// Use this macro to provide a buffer and length of data you wish to log
// and be formatted as hex. e.g. LogAsHex (MyBuffer, 100);
#define LogAsHex(B,L) \
log (((ap_Byte *) ((ap_Byte *) B)) [0 .. ((L)-1)], \
(ap_Uint32) (L), \
(ap_Uint32) (B)) with HexFormat
// Buffer is the actual data, Length the length and StartAddress the
// address of the data at runtime.
static void HexFormat (ap_Byte *Buffer,
ap_Uint32 *Length,
ap_Uint32 *StartAddress)
{
ap_Uint32 PrintAddress;
ap_Uint32 EndAddress;
// We start printing at the first 16 byte boundary below StartAddress
// which might be below where we actually need to show characters. So
// we check if we are in range before printing a character
PrintAddress = *StartAddress & 0xfffffff0;
EndAddress = *StartAddress + *Length;
while (PrintAddress < EndAddress)
{
int i;
// Print out the hex bytes
printf ("%08x: ", PrintAddress);
for (i = 0; i < 16; i++)
{
// Check we're in range
if ((PrintAddress + i) < *StartAddress ||
(PrintAddress + i) >= EndAddress)
{
printf ("  ");
}
else
{
printf ("%02x", Buffer [PrintAddress - *StartAddress + i]);
}
if (i % 4 == 3)
{
printf (" ");
}
}
// Print out the ascii
printf (" ");
for (i = 0; i < 16; i++)
{
// Check it's in range
if ((PrintAddress + i) < *StartAddress ||
(PrintAddress + i) >= EndAddress)
{
printf (" ");
}
else
{
ap_Byte c = Buffer [PrintAddress - *StartAddress + i];
// Is this a printable character?
if (c >= 32 && c < 127)
{
printf ("%c", c);
}
else
{
printf (".");
}
}
}
printf ("\n");
PrintAddress += 16;
}
}
// This is an example of using the above log mechanism - the first
// parameter must be an address (e.g. an array, a pointer, etc.). The 2nd
// parameter is the number of bytes.
probe thread
{
probe "fred()"
{
on_entry LogAsHex ($1, $2);
}
}
A C file follows to test it with:
void fred (const char *Buffer, int Length)
{
;
}
int main (int argc, char **argv)
{
char Buffer [100];
int i;
for (i = 0; i < 100; i++)
{
Buffer [i] = (char) i;
}
fred ((const char *) Buffer, 100);
return 0;
}
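The row layout produced by HexFormat can also be tried outside Aprobe. The following is a minimal sketch in plain C and is not part of the Aprobe distribution: FormatHexRow is a hypothetical helper that formats up to 16 bytes into a caller-supplied buffer, with a space after every fourth byte and a '.' in place of each non-printable character.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper (not an Aprobe routine): format up to 16 bytes as
   grouped hex followed by an ASCII column. Returns the number of
   characters written, not counting the terminating NUL. */
int FormatHexRow(const unsigned char *data, size_t len,
                 char *out, size_t outSize)
{
    size_t i;
    int n = 0;

    /* A full row needs 53 characters plus the NUL. */
    assert(data != NULL && out != NULL && len <= 16 && outSize >= 64);

    for (i = 0; i < 16; i++)
    {
        if (i < len)
            n += snprintf(out + n, outSize - (size_t)n, "%02x", data[i]);
        else
            n += snprintf(out + n, outSize - (size_t)n, "  ");
        if (i % 4 == 3)   /* separate groups of four bytes */
            n += snprintf(out + n, outSize - (size_t)n, " ");
    }
    n += snprintf(out + n, outSize - (size_t)n, " ");
    for (i = 0; i < len; i++)
    {
        unsigned char c = data[i];
        /* 32..126 are the printable ASCII characters */
        n += snprintf(out + n, outSize - (size_t)n, "%c",
                      (c >= 32 && c < 127) ? c : '.');
    }
    return n;
}
```

Calling FormatHexRow on each 16-byte slice of a buffer and printing the result gives a dump in the same general shape as the format routine above, without the address column.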
You log the Thread ID using a format routine that prints information about it, since the information, especially the thread entry point, may not be available on_entry to the thread:
void PrintThreadInfo(ap_ThreadIdT *ThreadIdPtr)
{
printf("Thread %d: ", *ThreadIdPtr);
ap_PrintSymbol(
ap_AddressToSymbol(
ap_ThreadEntryPoint(*ThreadIdPtr)));
}
probe thread
{
on_entry
{
log(ap_ThreadId()) with PrintThreadInfo;
}
}
Note that the thread entry point symbol will probably be a system function.
Yes. Here's a probe which stubs (disables) the call the GNAT runtime makes to sigaction() to register a signal handler. This allows the default action to occur when the signal occurs.
#include <signal.h>
probe thread
{
probe "sigaction()" in "libthread.so"
{
ap_BooleanT Stubbed = FALSE;
on_entry
{
if ($1 == SIGSEGV)
{
printf ("Stubbing sigaction(SIGSEGV)\n");
Stubbed = TRUE;
ap_StubRoutine;
}
}
on_exit if (Stubbed) $0 = 0;
}
}
First, you should be running with sigsegv.ual: it will provide
a traceback and exit actions in these cases. If you want to
add additional exit actions, such as a predefined probe snapshot,
see Q15.6, or you can copy and extend
$APROBE/probes/sigsegv.apc to build your own probe.
On Solaris there is:
probe thread {
probe "sigaction.c":"sigacthandler()" in "libc.so"
{
on_entry
{
log("Signal ", (int)$1, " seen at:");
ap_LogTraceback(99);
}
}
}
Of course, this might not work if the condition causing the signal was due to corrupted memory or registers which Aprobe relies upon.
The most obvious way is to use #pragma nofloat in probes that don't use floating point; this eliminates the need to save/restore floating point registers. See also Aprobe Performance Considerations in Chapter 4 of the Aprobe User's Guide.
probe thread
{
probe "your_routine"
{
#pragma nofloat
// Your probes
}
}
Yes, but there will be no "debug" information found, so you won't be able to use named target expressions (e.g., "$x", "$*") or do on_line probes. Furthermore, no type information is available for parameters, etc., like "$1".
Declare or #include a C type that maps to the structure you want, then cast your target expression to a pointer to this C type and dereference it. For example:
typedef struct
{
int Field1;
float Field2;
} MyStruct;
probe thread
{
probe "foo"
{
on_entry
{
if (((MyStruct *) $1)->Field1 > 0)
{
log(*((MyStruct *) $1));
}
}
}
}
or perhaps a bit cleaner is:
probe thread
{
probe "foo"
{
on_entry
{
MyStruct *Param1 = (MyStruct *)$1;
if (Param1->Field1 > 0)
{
log(*Param1);
}
}
}
}
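The cast itself is ordinary C and can be checked on its own. The following is a minimal sketch, assuming the untyped parameter is a pointer-sized integer that holds the address of the structure. Field1IsPositive and its demo routine are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct
{
    int Field1;
    float Field2;
} MyStruct;

/* rawParam models an untyped, pointer-sized parameter value (like $1
   when no debug information is available). Returns 1 when it points at
   a MyStruct whose Field1 is positive, 0 otherwise (including NULL). */
int Field1IsPositive(intptr_t rawParam)
{
    const MyStruct *param = (const MyStruct *)rawParam;

    if (param == NULL)
        return 0;
    return param->Field1 > 0;
}

/* Small self-check showing the intended use. */
void Field1IsPositiveDemo(void)
{
    MyStruct s = { 3, 1.5f };

    assert(Field1IsPositive((intptr_t)&s));
    assert(!Field1IsPositive((intptr_t)0));
}
```

The cast is only as good as the type you supply: if the real layout differs from MyStruct, the field reads will silently return the wrong data.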
"I have part of my program without debug info, but I know the type of a parameter passed in that "no debug" part, and furthermore, I know that the type name is defined in a part that does have debug info. How can I cast an "unknown-type" parameter to the known type name?"
This is similar to the previous question, except instead of defining the type in your APC, refer to the type in your program by its name and file, wrapped in "typeof", within your probe declarative part, as follows:
probe thread
{
probe "foo"
{
typedef typeof ($(MyStruct, "-file debug_part.c")) MyStruct;
on_entry
{
MyStruct *Param1 = (MyStruct *)$1;
if (Param1->Field1 > 0)
{
log(*Param1);
}
}
}
}
No, but it's pretty close to C. The C mode for Emacs, Lemmy, or another editor works pretty well. Contact OC Systems if you think we should put work into this.
In Aprobe version 2, you could do something like:
probe .outer_routine
on entry
if $r3 = 3 then
probe .inner_routine
null; -- inner_routine stuff
end probe;
end if;
end probe;
Probes in Aprobe 2 were executable, but in Aprobe 3 they are declarative. You declare a named probe and make explicit calls to enable or disable it. For example:
probe thread
{
probe "outer_routine"
{
// Note that this probe has a name "InnerProbe"
probe "inner_routine"
{
; // Inner routine stuff
} InnerProbe;
// Entry to outer_routine
on_entry
{
if ($param1 == 3)
{
// We can enable or disable the probe
ap_EnableProbe (InnerProbe);
}
else
{
// Disable the inner probe
ap_DisableProbe (InnerProbe);
}
}
}
}
The basic approach is simple. In the little C example "t.c" below, main() calls Test() every 5 seconds, passing to it a float and an integer. Subprogram "Test" prints these values out. In t.apc we put a probe on Test, and replace the parameters with values we retrieve from the environment. The trick is how to retrieve values from the environment.
One obvious way is to prompt to stdout and read from stdin. This may work for some applications, but not many. A more general approach is to check whether a user has created a file "Test.cfg" in the directory where the program is run and, if so, read the new parameter values from it with a call to fscanf(). This approach works pretty well as long as the overhead of an `fopen' call on entry to "Test" is acceptable. When it is not, one could move this call somewhere else and store the new values in global APC variables.
Note that this "read-a-file" approach can be used for a wide range of program interaction. One could simply use the presence of a file as a "switch" to enable or disable certain probes.
t.c
#include <stdio.h>
#include <unistd.h> /* defines sleep() */
void Test(float parm1, int parm2)
{
printf("Test(%f,%d)\n", parm1, parm2);
}
int main(void)
{
while(1)
{
Test(0.0, 0);
sleep(5);
}
}
t.apc
#include <stdio.h>
#define CONFIG_FILE "Test.cfg"
probe thread
{
probe "Test"
{
on_entry
{
FILE *fd = fopen(CONFIG_FILE, "r");
if (fd != NULL)
{
// We have a file with new values
float Parm1;
int Parm2;
// Update the target parameters only if both values were read
if (fscanf(fd, "Test(%f,%d)", &Parm1, &Parm2) == 2)
{
$parm1 = Parm1;
$parm2 = Parm2;
}
fclose(fd);
remove(CONFIG_FILE);
}
}
}
}
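The file-reading half of this technique is plain C and can be tested without Aprobe. The following is a minimal sketch under that assumption. ReadOverrides is a hypothetical helper name, and the file format is the same "Test(%f,%d)" used in t.apc.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper (not an Aprobe routine). If the file at `path`
   exists and contains "Test(<float>,<int>)", store the two values and
   return 1. The file is removed after it is read, so it acts once.
   Returns 0, leaving the outputs untouched, when the file is missing
   or malformed. */
int ReadOverrides(const char *path, float *parm1, int *parm2)
{
    FILE *fp;
    float f;
    int i;
    int matched;

    assert(path != NULL && parm1 != NULL && parm2 != NULL);

    fp = fopen(path, "r");
    if (fp == NULL)
        return 0;               /* no file: nothing to override */

    matched = fscanf(fp, "Test(%f,%d)", &f, &i);
    fclose(fp);
    remove(path);               /* consume the file */

    if (matched != 2)
        return 0;               /* malformed file: ignore it */

    *parm1 = f;
    *parm2 = i;
    return 1;
}
```

Reading into temporaries first means a half-written or malformed file never leaves the caller with one parameter changed and the other not.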
The Ada code looks like:
function Plock (N : in Types.Integer_T) return Types.Integer_T;
pragma Import (C, Plock, "plock");
Plock is a system call that locks or unlocks process, text, or data into memory. I get a warning message from apc stating:
Function "....plock[1] not found in the modules(s) provided to apc
and also an error message from apc stating:
Could not resolve function name: "......plock[1]"
plock() is a system function - it is not defined within your application. The following will work:
probe thread
{
probe "plock()" in "libc.so"
{
on_entry ap_StubRoutine;
on_exit $0 = 0; // Or whatever return you want
}
}
It turns out that GNAT blocks this signal with a call to thr_sigsetmask. The following probe can be added to your existing probes to unmask this signal. This works by intercepting the thr_sigsetmask function and, if the caller is requesting to add or set the signals, removing the SIGUSR1 from the mask they provide.
#include <signal.h>
#include <thread.h>
probe thread
{
probe "thr_sigsetmask()" in "libthread.so"
{
on_entry
{
sigset_t *NewSigset;
// Do we have new signals or is this just a request for info?
NewSigset = (sigset_t *) $2;
if (NewSigset)
{
// What sort?
if ($1 == SIG_BLOCK || $1 == SIG_SETMASK)
{
// Remove SIGUSR1
sigdelset (NewSigset, SIGUSR1);
}
}
}
}
}
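The mask manipulation in that probe uses only standard POSIX calls, so its logic can be exercised separately. The following is a minimal sketch, assuming a POSIX system. UnblockInRequest is a hypothetical helper that mirrors the decisions made in the on_entry action above.

```c
#define _POSIX_C_SOURCE 200809L  /* expose sigset_t routines under strict ISO C modes */
#include <assert.h>
#include <signal.h>
#include <stddef.h>

/* Hypothetical helper (not an Aprobe routine). Removes `signo` from a
   mask that a caller is about to block or install. Returns 1 if the
   mask was changed, 0 otherwise. */
int UnblockInRequest(int how, sigset_t *newSet, int signo)
{
    assert(signo > 0);

    if (newSet == NULL)
        return 0;   /* caller is only querying the current mask */
    if (how != SIG_BLOCK && how != SIG_SETMASK)
        return 0;   /* SIG_UNBLOCK requests cannot block anything */
    if (sigismember(newSet, signo) != 1)
        return 0;   /* signal was not in the requested mask */

    sigdelset(newSet, signo);
    return 1;
}
```

As in the probe, the caller's own sigset_t is modified in place, so the function that requested the mask never sees the signal blocked.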
Here's one way, which also illustrates some other useful idioms.
#define MyCallerFunctionId \
ap_SymbolToFunction( \
ap_AddressToSymbol( \
ap_LocationAddress( \
ap_CallerLocation( \
ap_CurrentLocation))))
#define NamedFunctionId(SYMBOL,MODULE) \
ap_SymbolToFunction ( \
ap_SymbolNameToId( \
ap_ModuleNameToId (MODULE), \
SYMBOL, \
ap_NoName, \
ap_FunctionSymbol))
probe program
{
int MallocCalls = 0;
int ReallocCalls = 0;
ap_FunctionIdT ReallocFunctionId = NamedFunctionId("realloc()", "libc.so");
probe thread
{
int NestingLevel = 0;
probe "malloc()" in "libc.so"
{
#pragma nofloat
on_entry
{
ap_FunctionIdT CallerFunctionId = MyCallerFunctionId;
if (! ap_FunctionIdsEqual(CallerFunctionId, ReallocFunctionId))
{
MallocCalls++;
}
}
}
probe "realloc()" in "libc.so"
{
#pragma nofloat
on_entry
ReallocCalls++;
}
}
on_exit // from program:
{
log("Heap statistics on program exit");
log("-------------------------------");
log("Number of calls to \"malloc()\" => ", MallocCalls);
log("Number of calls to \"realloc()\" => ", ReallocCalls);
}
}
procedure Read_Foo (File : in File_Type;
Item : out String;
Size : out Integer);
For routines like this, although the Item is an out parameter, GNAT implements it as if it were an in parameter (but modifiable) since the bounds of the string must already be set. The following probe shows an example of changing this:
static const char *NewString = "Aprobe string";
probe thread
{
probe "read_package.read_foo"
{
on_entry
{
sprintf ((char *) $item.P_ARRAY, "%s", NewString);
ap_StubRoutine;
}
on_exit
{
$return.size = strlen (NewString);
}
}
}
The following gives output similar to:
MyPackage.MyRoutine line: 120
MyPackage.MyRoutine line: 122
when formatted:
probe thread
{
// Replace your name here
probe "MyPackage.MyRoutine"
{
on_line (all)
{
log ("MyPackage.MyRoutine line: ",
ap_StringValue (ap_LineIdToNumber (ap_CurrentLineId)));
}
}
}
Here is an example:
a.cpp
#include <iostream.h>
#define VALUE satu
enum TYPE { sund, mond, tues, wedn, thur, frid, satu };
int main (void)
{
TYPE bar = satu;
cout << "Hello World\n";
}
a.apc
probe thread
{
probe "main"
{
on_line (11)
{
if ($bar == $satu)
{
log ("Match");
}
else
{
log ("No Match");
}
}
}
}
If the enumeration literals are defined in a class, you can qualify them. So for:
class a
{
enum TYPE { sund, mond, tues, wedn, thur, frid, satu };
private:
TYPE bar;
public:
void seta(){ bar = VALUE; }
};
a test;
You could use
if ($test.bar == ($("a::satu")))
The problem here is that "log" is an Aprobe directive and it is also defined as a function in the mathematical library. So, you need a small workaround to use any function other than 'log' from the mathematical library. Here is an example:
#undef log /* 1. undefine definition in aprobe.h */
#include <math.h> /* 2. process math.h */
#undef log /* 3. remove math.h's log define (AIX) */
#define log aPl /* 4. restore aprobe's definition */
probe thread
{
probe "main"
{
on_exit
{
log("pow(2,3) = ", pow(2,3));
}
}
}
The workaround is to add the preprocessor lines numbered 1 through 4 above.
If you need to use the math.h log function in an APC file, omit steps 3 and 4 above and use 'aPl' in place of Aprobe's log operation everywhere thereafter. That is:
#undef log /* 1. undefine definition in aprobe.h */
#include <math.h> /* 2. process math.h */
probe thread
{
probe "main"
{
on_exit
{
aPl("log(2.0) = ", log(2.0));
}
}
}
In either case, when compiling your APC file on Unix, you must pass the linker flag "-lm" as follows:
apc xxx.apc -linker -lm
because linking with any routine from the libm.a library requires the -lm flag.
You can see the macros for the keywords that Aprobe uses (e.g., #define log aPl) at the top of aprobe.h, preceded by #ifdef APROBE_KEYWORDS, which is only defined when the file is being processed by the APC compiler.
Same as you would a pre-loaded DLL:
Write your probe like `probe "func" in "dynamic" ... `
Compile it against the DLL like `apc myprobe.apc -x dynamic.dll'
When you run your program with myprobe.dll, and dynamic.dll isn't found at program startup, the probes are "deferred" until dynamic.dll is loaded, at which time the probes are applied.
Call getenv(), as in the following example:
#include <stdlib.h> /* defines getenv() */
#include <string.h> /* defines strcmp() */
ap_NameT LOG_LEVEL = NULL;
static ap_BooleanT IsSevereLogLevel()
{
return LOG_LEVEL && (strcmp(LOG_LEVEL, "severe") == 0);
}
probe program {
on_entry
LOG_LEVEL=getenv("LOG_LEVEL"); /* can set LOG_LEVEL to NULL */
probe thread
{
probe "main()"
{
on_entry
if (IsSevereLogLevel()) printf("Severe\n");
}
}
}
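The check itself does not depend on Aprobe. The following is a minimal sketch in plain C of the same null-safe comparison. EnvEquals is a hypothetical helper name, and unlike the probe above it reads the environment on every call rather than caching the value at program entry.

```c
#define _POSIX_C_SOURCE 200809L  /* lets callers use POSIX setenv()/unsetenv() */
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not an Aprobe routine). Returns 1 when the
   environment variable `name` is set and equal to `expected`, and 0
   when it is unset or has a different value. */
int EnvEquals(const char *name, const char *expected)
{
    const char *value;

    assert(name != NULL && expected != NULL);

    value = getenv(name);        /* NULL when the variable is unset */
    return value != NULL && strcmp(value, expected) == 0;
}
```

Caching the getenv() result once, as the probe does in its program on_entry action, is the better choice when the check sits on a hot path.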
Here's one way, if your "utility" is pure C and doesn't use aprobe stuff.
loglevel.h
extern void InitializeLogLevel(void);
extern ap_BooleanT IsSevereLogLevel(void);
loglevel.c
#include <stdlib.h> /* defines getenv() */
#include <string.h> /* defines strcmp() */
#include <aprobe.h> /* defines ap_NameT */
static ap_NameT LOG_LEVEL = NULL;
void InitializeLogLevel(void)
{
LOG_LEVEL = getenv("LOG_LEVEL"); /* can set LOG_LEVEL to NULL */
}
ap_BooleanT IsSevereLogLevel(void)
{
return LOG_LEVEL && (strcmp(LOG_LEVEL, "severe") == 0);
}
To compile a C file that includes aprobe.h, just put $APROBE/include in your include path:
cc -c -I$APROBE/include loglevel.c
t.apc
#include "loglevel.h"
probe program
{
on_entry InitializeLogLevel();
probe thread
{
probe "main()"
{
on_entry
if (IsSevereLogLevel()) printf("Severe\n");
}
}
}
apc -g t.apc loglevel.o
Yes. See also Q17.24 and Q17.25 . This is how our predefined probes are structured. The difference is that you must provide both UALs on the aprobe command-line. One could restructure the above example like so:
loglevel.h
extern ap_BooleanT IsSevereLogLevel(void);
loglevel.apc
#include <stdlib.h> /* defines getenv() */
static ap_NameT LOG_LEVEL = NULL;
// the externally callable function:
ap_BooleanT IsSevereLogLevel(void)
{
return LOG_LEVEL && (strcmp(LOG_LEVEL, "severe") == 0);
}
// initialization of data accessed by the above function:
probe program
{
on_entry
LOG_LEVEL = getenv("LOG_LEVEL"); /* can set LOG_LEVEL to NULL */
}
Compile this, exporting IsSevereLogLevel:
apc -g loglevel.apc -e IsSevereLogLevel
t.apc
#include "loglevel.h"
probe thread
{
probe "main()"
{
on_entry
if (IsSevereLogLevel()) printf("Severe\n");
}
}
Unix: apc -g t.apc loglevel.ual
Windows: apc -g t.apc loglevel.lib
aprobe -u t -u loglevel my_program
Presumably because the probe on that function is not triggered. That's because we disable probes whilst in an entry action. This is pretty easy to understand given an example. Suppose you have the following probe:
probe thread
{
probe "printf()" in "libc.so"
{
on_entry printf ("We're in printf\n");
}
}
Obviously if Aprobe didn't do anything specific, you would end up in an infinite loop: Your code would call printf() which would call the entry action for printf which would call printf which would call the entry action ... So what we do is disable the probes while you're in an action. That way the call to printf() from your probe wouldn't trigger the probe on printf itself.
In your example you are calling a routine while probes are disabled, so the probe on that routine doesn't get triggered. Of course you can manually turn probes on yourself (although it is then your responsibility to make sure you don't create an infinite loop). The description of this in aprobe.h was improved in version 3.1.7, to the following:
These two routines
extern void ap_IncrementDisableProbesCount (ap_ThreadContextPtrT);
extern void ap_DecrementDisableProbesCount (ap_ThreadContextPtrT);
can be used to turn off / on probes for the thread. Normally when a probe is hit, Aprobe disables further probes in the thread for the duration of the action. This is to prevent recursive loops (for instance imagine if a probe on "printf()" called "printf()" and we did nothing about it). Sometimes you may want to temporarily enable probes. For instance, suppose on_entry to routine A you make a call to another routine in your application (say B) which calls routine C. You have a probe on C which you want to happen. You could bracket the call as follows:
on_entry
{
// Turn on probes before the call
ap_DecrementDisableProbesCount (ap_ThreadContextPtr);
// Make the call
$B (1, 2, 3);
// Turn probes back off
ap_IncrementDisableProbesCount (ap_ThreadContextPtr);
}
So, your probe becomes:
probe thread
{
probe "test.adb":"test.x[1]"
{
on_entry
...
on_exit
...
ap_DecrementDisableProbesCount (ap_ThreadContextPtr);
$("test.y[1]");
ap_IncrementDisableProbesCount (ap_ThreadContextPtr);
}
}
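The effect of the disable count can be modelled in a few lines of plain C. The following is only a sketch of the idea, not Aprobe's implementation: DisableCount, ActionRuns, EntryAction and Target are hypothetical names, and a single global counter stands in for what Aprobe keeps per thread.

```c
#include <assert.h>

int DisableCount = 0;  /* models the "probes disabled" counter */
int ActionRuns = 0;    /* how many times the entry action has executed */

void Target(void);

/* Entry action that calls the very function it is attached to, like a
   probe on printf() that itself calls printf(). */
void EntryAction(void)
{
    ActionRuns++;
    Target();          /* would recurse forever without the guard below */
}

/* A function with a modelled probe: the action runs only while the
   counter is zero, and the counter is raised for the action's duration. */
void Target(void)
{
    assert(DisableCount >= 0);

    if (DisableCount == 0)
    {
        DisableCount++;
        EntryAction();
        DisableCount--;
    }
    /* the original body of the function would run here */
}
```

Each outer call to Target() runs the action exactly once. Lowering the counter inside the action, as the Decrement/Increment pair does in the probe above, re-opens the door to nested actions, which is why the bracketing must always be balanced.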
In the absence of debug information, all parameters are assumed to be of type 'int' and only positional ($1, $2, etc.) references are allowed.
If you know the type of such a parameter, you can cast it to the right type. The strdup() function doesn't have debug information, but you could still compile and use the following apc file:
probe thread
{
probe extern:"strdup()" in "libc.so"
{
on_entry
log("strdup(", ap_StringValue($1), ")");
}
}
Note that ap_StringValue is a macro which, among other things, casts the argument to a string.
For a complete list of subprograms that you can probe in shared libraries do:
aprobe -u info -p -sa <your_executable_here>
It is best not to mix apc code that relies on debug information with the apc code that should compile without it. This way when you compile the apc code that doesn't require debug info you may omit the -x option altogether and you would not have any warnings from the apc compiler.
Yes: replace the parameter to system() with a path to your script. In this example, the new path fits in the space occupied by the old. Imagine the possibilities...
my_ls.apc
// change these 2 lines to work on a different command:
static char cmd_to_change[] = "/bin/ls";
static char my_script[] = "/tmp/my_ls ";
probe thread
{
probe "system()" in "libc.so" // or libc.a(shr.o) for AIX
{
ap_NameT new_command = NULL;
on_entry
{
char *command = (char *)$1;
// for debugging, give some info about where we are:
log("system() called with ", ap_StringValue($1));
ap_LogTraceback(99);
// make sure we only replace the right command
{
char *cmdpos = strstr(command, cmd_to_change);
if (cmdpos == command)
{ // replace it
char *argstring = command + strlen(cmd_to_change);
new_command = ap_CatenateStrings(my_script, argstring, NULL);
$1 = (int)new_command;
log("*** changed to: ", ap_StringValue($1));
}
}
}
on_exit
{
// indicate the return code for the command:
log("system() returns ", $0);
// free our string:
ap_StrFree(new_command);
}
}
}
my_ls script
echo "MY_LS: --->"
ls -ltF
echo "<---- MY_LS"
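The string handling in my_ls.apc can be tried on its own. The following is a minimal sketch in plain C, with malloc() standing in for ap_CatenateStrings. ReplaceCommandPrefix is a hypothetical helper name, and the caller is assumed to free the result.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not an Aprobe routine). If `command` starts with
   `oldPrefix`, return a newly allocated copy with that prefix replaced
   by `newPrefix` and the remaining arguments kept. Returns NULL when
   the command does not start with `oldPrefix` or allocation fails. */
char *ReplaceCommandPrefix(const char *command,
                           const char *oldPrefix,
                           const char *newPrefix)
{
    size_t oldLen, newLen, restLen;
    char *result;

    assert(command != NULL && oldPrefix != NULL && newPrefix != NULL);

    oldLen = strlen(oldPrefix);
    if (strncmp(command, oldPrefix, oldLen) != 0)
        return NULL;            /* only rewrite a leading match */

    newLen = strlen(newPrefix);
    restLen = strlen(command + oldLen);
    result = malloc(newLen + restLen + 1);
    if (result == NULL)
        return NULL;

    memcpy(result, newPrefix, newLen);
    memcpy(result + newLen, command + oldLen, restLen + 1); /* copies the NUL */
    return result;
}
```

Matching only at the start of the command, as the probe does with its cmdpos == command test, avoids rewriting a command that merely mentions /bin/ls as an argument.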
Aprobe supports suppressing C++ exceptions on AIX, but not on Solaris or Windows. On AIX the syntax is:
probe "fred"
{
on_exit
if (ap_ProbeActionReason ==
ap_CppExceptionPropagated)
ap_SuppressException;
}
You can catch exceptions in the on_exit section of your probes. To catch exceptions all you have to do is to distinguish between a normal exit from your subprogram and an exception exit from it as both would trigger your probe's on_exit actions. For example, if subprogram "fred()" may leave via exception you could test for this as follows:
probe thread
{
probe "fred()"
{
on_exit
switch(ap_ProbeActionReason)
{
case ap_AdaExceptionPropagated:
case ap_CppExceptionPropagated:
log("Exception exit from fred()\n");
}
}
}
If you need to, you can find other action reasons defined in aprobe.h.
The example above works well when you know where the exception may be raised. When you don't know, you can log all exceptions raised in your program. To do so, use the following probe:
probe thread
{
ap_LogExceptionsInThread;
}
There are also other macros for this: ap_PrintExceptionsInThread, ap_PrintAndLogExceptionsInThread. These are all defined in aprobe.h.
Here's what we've found:
aprobe -u info.dll -p -sa -dll ws2_32.dll foo.exe > out.txt
Where foo.exe is some executable in your local directory. The -dll <filename> switch causes the dll to be loaded at the start of a program and thus its symbols are available for info.dll to print.
A probe to track stack usage is available here for Windows and AIX. The AIX version should be easy to extend to Linux and Solaris.
Yes. Function-relative line numbers are supported using an expression consisting of a constant offset from the special values 'first' and 'last'. For example:
probe "Outer"
{
// Assume that 30 is the relative line number for the next line
// after the call to Inner
on_line (first + 30)
{
$i = 99;
}
}
To be sure you're using the right value, you'll have to know the probe-able lines in your function (see Q17.66). The offset is then the difference between the first line and the probe-able line you want (e.g., if the first line is 12, and you want line 22, then probe on_line (first + 10)).
Now if the file changes your probe will still work unless you modify Outer (which is obviously less of a concern since that's the one you're working with anyway).
What you might want to do is hold the address of the variable and then change that.
probe thread
{
int *i;
probe "Outer"
{
on_line (first)
{
// Store the address of i
i = &$i;
}
}
probe "Inner"
{
on_entry
{
// Change the value of i
*i = 100;
}
}
}
Obviously this is harder for types that aren't straight integers, etc. The typeof expression can be useful here:
probe thread
{
typeof ($("myrecordt", "-file types.ads")) *RecordPtr;
}
Here are some general limitations and workarounds for accessing class data and methods:
$("Screen::nNumScreens")
If you're unsure of the full name of a static data item you can use:
apinfo -d myprog.exe
apcgen -L <dll-or-exe> | grep "Class::"
or
apinfo -sa myprog.exe | grep "Class::"
Here's a simple example:
////////////////////////////////////////////////////////////
// TestStatic.apc
////////////////////////////////////////////////////////////
probe program
{
on_entry
printf (" p. Static1.exe execution has started\n");
on_exit
printf ( " p. Static1.exe execution has completed\n");
}
probe thread
{
probe "Screen::Screen"
{
on_entry
printf (" p. New screen has been constructed!\n");
}
probe "Screen::~Screen"
{
on_entry
printf (" p. A SCREEN HAS BEEN DESTRUCTED!\n");
}
probe "Screen::Update(void)"
{
on_entry
{
printf (" p. A screen update has started!\n");
printf (" p. Within Update, Current nNumScreens =%d\n",$Screen::nNumScreens);
}
}
probe "Screen::GetNumScreens(void)"
{
on_entry
{
printf (" p. GetNumScreens has started!\n");
printf (" p. Current nNumScreens = %d\n",$Screen::nNumScreens);
}
}
probe "main()"
{
on_entry
printf (" p. Main() has been started!\n");
}
}
You can do this by periodically checking for the existence of a file. If you find the file enable the probe. You can automatically delete it from your probe if you want a single-action check, or delete it yourself when you want to disable the action again. For example:
#include <stdio.h> /* defines FILE, fopen(), remove() */
static ap_BooleanT MemsetProbeEnabled = FALSE;
probe thread
{
probe extern:"memset()"
{
// We are not using floating point registers.
// Use nofloat pragma to avoid saving them and
// speed things up a little.
#pragma nofloat
on_entry
if (MemsetProbeEnabled)
{
// Log parameters, traceback, etc.
}
}
}
#define CONFIG_FILE "/tmp/memset.cfg"
static void PeriodicAction(void *EP)
{
FILE *fd = fopen(CONFIG_FILE, "r");
if (fd != NULL)
{
// Toggle the value of MemsetProbeEnabled
MemsetProbeEnabled = !MemsetProbeEnabled;
fclose(fd);
remove(CONFIG_FILE);
}
}
probe program
{
on_entry
ap_DoPeriodically(
PeriodicAction,
15, // interval in seconds
NULL);
}
Aprobe version 2, which was delivered with OC Systems' PowerAda and OATS products as well as being sold separately for C and C++, was fundamentally different in its processing and expression of APC.
The best way isn't to "convert" at all, but to understand what the probes in your old APC file are trying to do, read the current documentation about Aprobe, and then write a probe to do the same thing in Aprobe version 4. This answer will just enumerate a few of the key differences, and rely on you to look in the user's guide for details:
Version 2 Aprobe was available only for the AIX platform, and used low-level AIX register and symbol names. Aprobe versions 3 and newer support multiple platforms.
In Version 2, "aprobe" actually compiled each APC file at run-time. In Version 4, you use the new `apc' program to compile the APC file(s) into a linkable UAL file, and name the UAL files on the aprobe command line.
Version 4 APC is C with a few extra keywords. Version 2 was an invented language based on Ada syntax. So, for example, instead of
case $r3 is
... you'd write
switch($$r3) {
...
In Version 4 there's an underscore to make "on entry" one word: "on_entry", "on_exit", "on_line".
In Version 2 you could write
probe .sym1, .sym2, on entry ....
In Version 4 each probe can name only one symbol, but there is the new concept of a "probe type" or "typedef probe" which may be defined and then applied to many symbols. So you'd do
typedef probe { on_entry ... } CoolProbeT;
CoolProbeT Sym1Probe("sym1");
CoolProbeT Sym2Probe("sym2");
In Version 2 APC there were only registers ($r3). In Version 4 you can reference parameters by position ($1, $2, etc.); In Version 4 you can reference the return value on_exit as $0, and that's not to mention accessing program variables by their source names...
Because version 2 APC was so low-level, there was another tool "apgen" which read an "apg" file that supported a few operations on source-level variables and generated APC to access them. In Version 4, you can reference a source-level name anywhere, provided that name is available in the debug information of the executable provided to the apc compiler.
In Version 2 a `format' was required for each log statement, and was a special syntax that could be named or unnamed in-line. In version 4 a format routine is just a C routine which can be automatically generated based on the types of the `log' arguments.
Here is a reply given to a customer who asked this question:
> It is my understanding that the new aprobe is more "C like" than "ADA like".
>Beyond that, I could use a little help.
That's true - it is basically ANSI C with some extra keywords. I take it you have gone through the examples in $APROBE/examples/evaluate to get yourself acquainted with the syntax? If not do that first and then come back to your larger problem.
> I wasn't sure if the [Aprobe v2] words format and bytes were aprobe terms.
Yes they are. In v2 you had `format start' and `format finish': These have been made consistent with all other probes on v4 so you would use:
probe format
{
on_entry
{
// Put the equivalent actions to the format start here
}
on_exit
{
// Put the equivalent actions to the format finish here
}
}
The bytes operator was a v2 thing. In v4 you would express the code in terms of C so you would probably use char []:
on_entry
{
char CmdText [200];
}
> I wasn't sure about the $function.
This is where v4 is much better than v2. Since you are writing your probes in C, you can just include the header files and call the functions directly. For instance, you wish to call the `creat' function. All you need to do is:
// Include the header files
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
probe format
{
on_entry
{
int fd;
fd = creat ("Filename", 0644);
}
}
and the same for access, system, printf, sprintf, write, etc. You'll find your probes look much better!
> The probe part I did I am pretty sure is wrong.
`probe' is quite different and you have to account for the different names used by GNAT and the registers used on SPARC [compared to PowerAda and AIX]. Here's an example which will be close:
probe thread {
// I'm guessing on the name here: If you have trouble finding the
// routine, run `aprobe -u info.ual -p -s <exe name> > syms' and all of
// the routines will be placed in the syms file.
probe "Queuing_Services.Read_From_Q[2]"
{
// Store the parameters on entry since the registers aren't
// available on_exit
int SrNum = $2; // Second parameter was $r4 on AIX
int Length = $3; // Third parameter was $r5 on AIX
char *Data = (char *) $4; // Fourth parameter was $r6 on AIX
on_exit
{
// Log the data
log (SrNum, Length, Data [0 .. Length - 1]) with DitFormat;
}
}
}
Your format routine should be defined above this; in v4 they are regular C routines but the important thing is that they take pointers to the data, so:
void DitFormat (int *SrNum, int *Length, char *Data)
{
// Do your processing here
}
A couple of comments on the new file: It is recommended that you use C++ style comments (//) unless you wish to keep code common with some existing C code since they are less error prone.
Make sure your format routine only has pointers for its parameters.
Hope this helps - like I said, make sure you understand how to write simple probes, logs and logs with formats and then you should be fine to tackle this exercise.
Here is the source for a probe that should do the trick. It will record calls to all of the exec routines, including the file, calling user/group IDs, file user, group and mode information and the environment. It was written for Solaris but should work on other Unixes.
To compile, just save to your local disk and do
apc faq_exec.apc.
To use this probe you will need to have a new or existing workspace for the process you want to watch. Then either,
Although the first option is simpler, using Add Ual will make it easier to turn on or off later.
Now rerun, format, and look for the exec calls. If necessary, the probe can be expanded to record parameters to help identify the call.
Yeah, the "obvious" direct cast doesn't work. The trick is to get that byte into something you can safely cast. The reliable way to do it is as shown below.
probe thread {
probe "qts.write_to_q" {
on_exit
{
/* this doesn't work: log("rc=", $rc, "=", (int)($rc)) */
char rc_val = *(char *)&$rc;
log("rc=", $rc, "=", (int)rc_val);
}
}
}
We did a rough one which just lists the calls made by each function in a Solaris/Sparc executable. You can grab this here.
It works by disassembling the object module and recognizing the call instructions, so it would take some work to port to other platforms. If you're really interested in having it extended or ported, please contact us.
A crash can happen because memory allocated using
malloc() or its variants is being corrupted by code that
writes past the end (or before the beginning) of the memory that's
returned, corrupting malloc's internal pointers or adjacent data.
The predefined "memcheck" probe detects this by putting a "fence" at the end of allocated memory, and checking the fence is intact when the memory is freed: see Q15.18.
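The fence idea can be sketched in plain C. This is only an illustration of the general technique, not the memcheck implementation: the names fenced_malloc, fence_intact, fenced_free and FenceHeaderT, the fence size, and the fill byte are all assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical fence parameters, chosen only for this sketch. */
#define FENCE_SIZE 8
#define FENCE_BYTE 0xA5

/* Header stored in front of the user block; the union keeps the
   user pointer suitably aligned for any type. */
typedef union {
    size_t      size;
    max_align_t align;
} FenceHeaderT;

/* Layout: [header][user bytes ...][FENCE_SIZE fence bytes] */
void *fenced_malloc(size_t size)
{
    unsigned char *raw = malloc(sizeof(FenceHeaderT) + size + FENCE_SIZE);
    if (raw == NULL)
        return NULL;
    ((FenceHeaderT *)raw)->size = size;           /* remember the user size */
    memset(raw + sizeof(FenceHeaderT) + size, FENCE_BYTE, FENCE_SIZE);
    return raw + sizeof(FenceHeaderT);
}

/* Returns 1 if the fence is intact, 0 if something wrote past the end. */
int fence_intact(const void *user)
{
    const unsigned char *raw =
        (const unsigned char *)user - sizeof(FenceHeaderT);
    size_t size = ((const FenceHeaderT *)raw)->size;
    const unsigned char *fence = (const unsigned char *)user + size;
    for (size_t i = 0; i < FENCE_SIZE; i++)
        if (fence[i] != FENCE_BYTE)
            return 0;
    return 1;
}

/* Frees the block and reports whether the fence was intact at free time. */
int fenced_free(void *user)
{
    assert(user != NULL);
    int ok = fence_intact(user);
    free((unsigned char *)user - sizeof(FenceHeaderT));
    return ok;
}
```

A write of even one byte past the requested size lands in the fence, so the overrun is reported when the block is freed rather than showing up later as an unrelated crash.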
You can use the ap_AddNewProcessCallback to add a callback when Aprobe
detects your new process. Pass it a handler that will be called in the
child process. For instance:
static void MyNewProcessHandler (ap_ThreadContextPtrT ThreadContext)
{
log ("Here is my new process");
}
The most reliable way is to use:
apcgen -qlines -p function_name -x module_name
This generates an on_line section for each line in the given function. You can redirect the output to a file and edit the file with your on_line actions.
For an executable module you can use:
apinfo -l exe_name
which lists all the symbols and their lines, if any. This output
is simply the "raw" line information, sorted by code offset, so is not
as useful for writing probes, though the output may be a good reference for
use with test coverage or a debugger.
In general, explaining why a Page Fault occurred is difficult. Note that there's a lot more Paging information available than just a fault count (e.g. current and peak Virtual memory, current and peak real memory, current and peak page file usage, etc.). The following is offered in an attempt to answer this question.
For process-specific page fault, or general memory usage information,
you can use the GetProcessMemoryInfo() function,
which is documented in MSDN. Here's the data structure returned:
typedef struct _PROCESS_MEMORY_COUNTERS {
DWORD cb;
DWORD PageFaultCount;
DWORD PeakWorkingSetSize;
DWORD WorkingSetSize;
DWORD QuotaPeakPagedPoolUsage;
DWORD QuotaPagedPoolUsage;
DWORD QuotaPeakNonPagedPoolUsage;
DWORD QuotaNonPagedPoolUsage;
DWORD PagefileUsage;
DWORD PeakPagefileUsage;
} PROCESS_MEMORY_COUNTERS;
typedef PROCESS_MEMORY_COUNTERS *PPROCESS_MEMORY_COUNTERS;
You can also use the native API ZwQueryInformationProcess()
which has the capability of returning more information than
GetProcessMemoryInfo() (which uses this native API).
For System-Wide page fault, or general memory usage information, you can
use the native API ZwQuerySystemInformation() to get all
sorts of performance data.
For both of the above cases, you can either create a thread that calls the
function periodically to collect sampling data or, better, use the
Aprobe ap_DoPeriodically() function, which does this for you.
You could integrate this (or other system-wide statistics) with tracing in a manner similar to the RootCause "perf_cpu" probe. Contact us if you'd like some help with this.
Also note that you can use the Windows "PerfMon" feature to generate real-time graphs of these statistics correlated with program points. See Perfmon in the Aprobe User's guide.
(Note that the "Zw" native APIs mentioned above are not "officially" documented by Microsoft, but they are widely used in both user and Device Driver development and are safe to use. They are documented in "Windows NT/2000 Native API Reference" by Gary Nebbett and you can find them in DDK and web documentation as well.)
You can generally pass a mangled name as the name to ap_NameToSymbolId()
and you'll get the correct Symbol ID. However, there is also the following
(defined in aprobe.h, of course):
extern void ap_Demangle(
ap_DemangledNameT *Result,
ap_NameT MangledName,
ap_BooleanT IsSubprogram,
ap_CompilerKindT CompilerKind);
Here is an example of how to use it:
{
ap_DemangledNameT DemangledName;
ap_Demangle(
&DemangledName,
".sec_fdk_Nam_Svc_Def__ELAB",
TRUE,
ap_AIXpa4_CompilerKind);
// Now we can use DemangledName.FullName
SymbolId =
ap_SymbolNameToId(
ap_ApplicationModuleId(),
DemangledName.FullName,
ap_ExternSymbol,
ap_FunctionSymbol);
}
Yes, this was introduced in RootCause 2.1.3/Aprobe 4.3.3 (February 2004).
The way to do it is specify #pragma optional
in column 1 immediately inside the probe (or typedef probe), for example:
probe thread {
probe extern:"PrintDebug()" {
#pragma optional
...
}
}
Conversely there is also a #pragma required which forces
a warning in the case where the module is undefined. By default,
a warning is not generated on probes on missing modules. For example:
probe thread {
probe extern:"open()" in "libpthread.so" {
#pragma required
}
}
would force a warning if libpthread.so was not among the libraries loaded
by the application.
This was possible prior to version 2.1.3 but was harder since it required use of a typedef probe and programmatic checking and instrumentation using the Aprobe API. (See for example AllocationFunctions[] array in memwatch.apc.)
Yes, if:
this pointer for that method's class available (or
else the method is static).
In these cases, you call it just like a C function (see Q17.22), except that you pass this as the first parameter. For
example, suppose you have a class that looks something like this:
class Example {
public:
void doIt(const string& s);
void debugIt(const string& s);
};
And you want to call debugIt() on entry to doIt().
The following works:
(Note the & when passing the string parameter: APC automatically
dereferences reference parameters, so you need to "restore" the reference.)
probe thread {
probe "Example::doIt" {
on_entry {
$("Example::debugIt")($this, &($s));
}
}
}
But obviously this is a very simple example. In many real cases you have
template instances with long and subtly different names. In such cases,
you can use apcgen -vL to list the methods in an individual
object file and "grep" for the methods you're looking for and try
to match up the line numbers.
When you have dynamically dispatched calls, you are limited to methods in common base classes, or else you need to use some conditional test to determine which specific method to call.
Often the best choice is to use a separate extern "C" C++
module as an interface between your probe and the call, as described in
Q17.23.
As always, if you have problems or questions, contact .
This is a good example of how to combine some simple C++ with a probe to avoid having to reverse-engineer the C++. This is basically the same for all platforms, except for linking. Here's an example:
// example.cpp - setting and getting std::string values
// compile with '$(CCC) -g example.cpp -o example.exe'
// where $(CCC) is xlC for AIX, CC for SunWorkshop, g++ for GCC
// run with zero or more arguments, e.g.,
// $ example.exe one
// will print
// Example="example.exe"
// Example="one"
#include <iostream>
#include <string>
using namespace std;
static void print_string(string &s)
{
cout << s;
}
class Example
{
public:
void put_string(char *val);
string get_string(void);
void print_string(void);
private:
string value;
};
string Example::get_string(void)
{
return value;
}
void Example::print_string(void)
{
cout << "Example=\"";
::print_string(value);
cout << "\"" << endl;
}
void Example::put_string(char *val)
{
value = val;
}
int main(int argc, char **argv)
{
string Str;
Example X;
for (int i = 0; i < argc; i++)
{
X.put_string(argv[i]);
Str = X.get_string();
// ::print_string(Str);
X.print_string();
}
return 0;
}
Here's some helper C++ to provide operations on the std::string:
// cppstring_help.cpp - C++ functions supporting cppstring.apc
#include
Here's a header file for cppstring_help, that will be used by the probe:
// cppstring_help.h - C++ functions supporting cppstring.apc
extern const char *get_string(void *std_string_ptr);
extern void set_string(void *std_string_ptr, const char *from);
#define GET_CPP_STR(S) get_string((void *)&S)
#define SET_CPP_STR(S, NEW_CS) set_string((void *)&S, NEW_CS)
And here's the apc:
// cppstring.apc - setting and getting std::string values and parameters
// from a probe.
// OTHER REQUIRED FILES: cppstring_help.h, cppstring_help.cpp
// BUILDING THE UAL:
// All:
// $(CCC) -c cppstring_help.cpp # create cppstring_help.o for your platform
// Solaris:
// apc cppstring.apc -x example.exe -u cppstring_help.o
// AIX:
// apc cppstring.apc -x example.exe cppstring_help.o -linker "-lC"
// Linux:
// apc cppstring.apc -x example.exe cppstring_help.o -linker "/usr/lib/libstdc++.so.6"
// Windows:
// apc cppstring.apc -x example.exe cppstring_help.obj
//
// RUNNING
// Run with 'aprobe -u cppstring example.exe' to get output like:
// put_string: Changing example.exe to probe_string1
// get_string: Changing probe_string1 to probe_string2
// print_string: Changing probe_string1 to probe_string3
// this defines the macros GET_CPP_STR and SET_CPP_STR
#include "cppstring_help.h"
// The replacement strings:
static char probe_string1[] = "probe_string1";
static char probe_string2[] = "probe_string2";
static char probe_string3[] = "probe_string3";
probe thread
{
probe extern:"Example::put_string(char*)"
{
on_entry
{ // change entry parameter to be a new string
printf("put_string: Changing %s to %s\n", $1, probe_string1);
$1 = probe_string1;
}
}
probe extern:"Example::print_string(void)"
{
on_entry
{ // change entry parameter to be a new string
printf("print_string: Changing %s to %s\n",
GET_CPP_STR($this->value),
probe_string3);
SET_CPP_STR($this->value, probe_string3);
}
}
probe extern:"Example::get_string(void)"
{
on_exit
{ // change return value parameter to be a new string
printf("get_string: Changing %s to %s\n",
GET_CPP_STR($return),
probe_string2);
SET_CPP_STR($return, probe_string2);
}
}
}
In the comments above, note the different ways linking is done to include the C++ library. The '-u' flag on Solaris means the probe will reference the definitions that are in the application, as described in Q20.13.
See Chapter 5 of the Aprobe User's Guide for Unix and Windows.
Yes. Here's a simple application, a probe, and the xmj file:
// The application Simple.java
public class Simple
{
int doIt ()
{
return 10;
}
public static void main (String[] args)
{
System.out.println ("doIt returns " + new Simple ().doIt ());
}
}
// The probe SimpleProbe.java
public class SimpleProbe extends com.ocsystems.aprobe.ProbeMethod
{
public Object onExit (Object returnValue)
{
return new Integer (11);
}
}
<!-- The xmj file simple.xmj -->
<probe_deployment>
<probe class="SimpleProbe" parameters="readonly">
<target value="Simple::doIt"/>
</probe>
</probe_deployment>
$ javac Simple.java
$ javac -classpath $APROBE/lib/aprobe.jar SimpleProbe.java
$ apjava -u simple.xmj -java Simple
doIt returns 11
Unfortunately not. Java requires that all exceptions, other than RuntimeException and its descendants, be declared by the method or caught. We cannot specify that the base Aprobe Patch class throws a specific exception, because then every method that called it would have to either catch the exception or declare that it throws it. However, you can throw any RuntimeException.
Yes, there are a few ways:
com.ocsystems.aprobe.TraceBean.logComment method
to log a comment. You'll get an exception if you have de-selected
trace for the run because you are calling a native method
directly.
Yes, starting with RootCause version 2.1.3a (April 2004). To stub a method, simply call the stub() method at the end of the onEntry probe method, for example:
import com.ocsystems.aprobe.*;
public class TestProbe1 extends ProbeMethod
{
public boolean onEntry(Object[] parameters)
{
stub();
return true;
}
}
Sorry but you cannot probe any classes in the bootpath, which includes rt.jar. This is a limitation basically imposed by the JVM because you cannot call methods which are not in the bootpath from within bootpath classes. That is, you could never apply a probe because that class would be in the child's class loader so the parent wouldn't have visibility. In informal discussions with engineers in Sun's JVM group they said it was a bad limitation of the JVM because it made bytecode patching, which was a "preferred" technology, very difficult.
We have kicked around the ideas of having a bridge to native code in the bootpath classes and then the native code calling the probes but the technical issues are difficult.
For some problems, instead of probing these classes it's possible to probe the native methods underneath. For example, probe the file access routines in the libc library (or equivalent on Windows) rather than the java.io methods.
The 'this' object is the first parameter (params[0]). So if you're
probing a method in class SquareID, and you want to call otherMethod()
there, then it'd be something like:
...
SquareID id = (SquareID) params[0];
id.otherMethod();
return true;
...
Note that the code has to import the SquareID class, too:
import SquareID;
See Custom Java Probes in the RootCause Java user guide for more basic information.
No. Most or all of it must be done from the command line. In a GUI
you can click on "Custom" button in the setup options, but this would
only bring up a help dialog with the instructions on how to set the
XMJ and the corresponding Java code. You can cut and paste from this
dialog to create your .xmj file in the workspace. After that, you
would probably only use the workspace and intercept mechanism to
deliver your probes to the application in an automated fashion. You
could apply these probes directly to your application using the
apjava command. RootCause just hides this from the user
of the application.
First, you would have to create a workspace for IEXPLORER. (This is most easily done starting from the APP_START event for IEXPLORER in the RootCause Log). Then you would need to setup for a Java trace in this workspace. Since IEXPLORER is not a Java application (it has JVM library linked into it), you will find that RC Trace Setup tree would not have a $Java$ module node created and available for trace selection.
To make the $Java$ node appear in the Trace Setup you need to add at least one class path entry to the workspace. If all your Java applet classes will be loaded from the web, you would technically not need any classpath entries, but you could still add a dummy one (just type in any directory name on your hard drive in the dialog opened by the Setup->Class Path menu item).
Once you have added at least one class path entry, click on Setup button. You should now see $Java$ module. Click on it and use MB3->Trace All Java Classes to setup a trace for all the Java. Of course, you can be more selective in what you would like to trace.
Yes, starting with RootCause version 2.1.3a (April 2004). There are two parts:
<?xml version="1.0" encoding="UTF-8"?>
<probe_deployment>
<probe class="TestParamsProbe">
<target value="ParamsTester::callIt(java.lang.String,boolean)"
parameters="readwrite" />
</probe>
</probe_deployment>
import com.ocsystems.aprobe.*;
public class TestParamsProbe extends ProbeMethod
{
public boolean onEntry (Object[] params)
{
// params [0], the 'this' parameter, can't and won't be changed.
params [1] = new String ("This is a new string");
params [2] = new Boolean (true);
return true;
}
public Object onExit (Object returnValue)
{
int value = ((Integer) returnValue).intValue ();
return new Integer (value + 1);
}
}
The Variables pane in the RootCause Trace Setup dialog only supports logging Java parameters (all or none). In a custom probe, you can access individual parameters by position, and the return value. From a custom Java probe, you can access public class data just as you would from another class in your Java application. There is no access to method local data or class private data.
Yes. In APC you'd write something like:
probe "a()" {
probe "b()" {
on_entry
do_something();
}
}
For Java it's not quite as clean as with APC because of the split between the probes in Java and the definition in XML. The file Example14.java has two Probe Methods; the MyUmbrellaProbe is the equivalent of the "a()" in the above example. It creates a new MyNestedMethodProbe probe (i.e., "b()") in its onEntry method. The file Example14.xml is the probe deployment descriptor. We just define both targets in it. Note that you don't specify the hierarchy in the XML: it's defined by the Java probe.
Printing you understand. You call "printf()" or "puts()" and it displays what you passed to it directly to standard output (or some other file if you used fprintf()) as soon as the call is executed.
Logging, as implemented by the "log" directive in APC, is more complicated. It writes the data you specified within the parentheses to a memory-mapped APD file, and associates a "format routine" with that data. The format routine is not called, and the data is not displayed, until later when the "apformat" command is run over the APD file.
Another important difference between printing and logging is that the Aprobe log mechanism is lock-free, whereas printing requires a lock to get exclusive access for the printing thread. This gives a significant advantage to the log operation in multi-threaded applications where performance and deadlock are considerations.
All parameters to a format routine must be *addresses*. So if you do
log((int) x) with myformat;
then you must have
static void myformat(int *i) { ... };
If you had declared "myformat(int i)" then you would get a warning from the C compiler invoked from `apc'.
The short answer is, "Because that's how it works." There are two real reasons. The first has to do with the whole logging/formatting concept. Data is copied to a memory-mapped file when logged. When formatting, we memory-map the APD file. To pass the data to the format routine directly, we'd have to allocate temporary space of the right size and copy it again.
It's much more elegant to pass everything -- scalars, structs, and arrays -- by pointer. That way, when you log an `int' value, you write it to the APD file, and when you format it, you just pass its address in the memory-mapped apd file directly to the format routine. This allows ints, arrays of ints, and structs to all work the same way.
The second reason is related to the first, and has to do with the fact that C doesn't have an array "type", but rather treats any adjacent locations in memory as an array. Here's what our chief designer has to say on this subject:
When designing the APC extensions such as 'log' statements we had to make sure that they would work with any data types, including scalars, structs and arrays. It was array types that gave us the most problems, mostly due to the fact that C has very little support for arrays.
Even though one can declare an array with a given number of elements, such declarations are limited as to where they can appear (e.g. you can not use a pointer to an array declaration inside of a formal parameter list) and operations for array types are essentially the same as operations for pointer types.
Now consider these 2 log statements below:
int foo[10];
log(foo[0]) with MyFormat;
log(foo[0..9]) with MyFormat;
The format for the first log statement could have used 'int' like you suggest, but what about the second log statement? Of course, we could have treated the first log statement differently from the second one, since the first one clearly logs one element, while the other logs a range of elements. If we did so we would use 'int' in the format declaration for the first 'log' statement and 'int *' for the second. But even so, you would still have cases like this:
log(foo[0..0]) with ... // Do you use 'int *' here or 'int'?
log(foo[Var1..Var2]) with ... // We don't even know the number of elements here.
The requirement that all formats use pointers to the data as argument allowed us not to make any distinction between the way we log scalars and arrays. If this seems to be confusing to you, you can always use a simpler interface, where you don't have to provide any formatting routine at all.
log("foo[0] => ", foo[0]);
If this doesn't make sense to you, you are not alone. Some of us didn't like the way this had to be done either; unfortunately, no one came up with a better solution than the one we have right now. If you have such suggestions, feel free to share them with us.
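The log-then-format flow described in this answer can be sketched in plain C. This is a hedged illustration only, not Aprobe's implementation: the int array stands in for the memory-mapped APD file, and log_ints, my_format, format_at and the extra count and output-buffer parameters are assumptions made so the example is self-contained.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define APD_CAPACITY 64

/* Stand-in for the memory-mapped APD file (hypothetical). */
static int    apd_data[APD_CAPACITY];
static size_t apd_used = 0;

/* "Log time": copy count ints into the buffer; return where they landed. */
size_t log_ints(const int *data, size_t count)
{
    size_t index = apd_used;
    assert(count <= APD_CAPACITY - apd_used);
    memcpy(&apd_data[index], data, count * sizeof(int));
    apd_used += count;
    return index;
}

/* Format routine: receives the address of the logged data, not a copy. */
void my_format(const int *values, size_t count, char *out, size_t out_size)
{
    size_t pos = 0;
    assert(out_size > 0);
    out[0] = '\0';
    for (size_t i = 0; i < count && pos < out_size; i++) {
        int n = snprintf(out + pos, out_size - pos, "%s%d",
                         i ? " " : "", values[i]);
        if (n < 0)
            break;
        pos += (size_t)n;
    }
}

/* "Format time": hand the routine a pointer into the buffer itself. */
void format_at(size_t index, size_t count, char *out, size_t out_size)
{
    my_format(&apd_data[index], count, out, out_size);
}
```

Because my_format takes a pointer, the same routine serves a single logged int and a logged range of ints, and nothing is copied a second time at format time.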
This is specified as a parameter to the aprobe command. By default there is a single 256M file (1M on Windows). You can specify the number of files (see the next Q.) and/or the maximum size of each file. You set the maximum size of each file (in bytes) with "-s n_bytes". You set the number of files with "-n num_files", where num_files must be in the range 0-9. If you specify 0, all logged output is discarded. If you specify 2 or more, but don't explicitly set the size with "-s", the maximum size is set to 2 megabytes.
Note that on Unix Aprobe data files grow up to the maximum size. Unfortunately Windows does not allow memory-mapped files to grow. They are opened to their maximum size.
The "APD ring" is how the aprobe logging mechanism deals with large quantities of data. By default there's a single APD file produced by aprobe, with a maximum size of 256 M on Unix platforms and just 1 M on Windows (because memory-mapped files are not dynamically extensible). If you try to log more than that, the last (newest) data is lost.
If you specify more than one file, the files conceptually form a "ring" so that the most recent data is always kept, and the oldest data is lost. The ring is really more like a fixed-length stack where data falls off the bottom when additional data is pushed onto a full stack.
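The ring behavior can be modeled in a few lines of C. This is a simplified sketch under stated assumptions, not the Aprobe file format: records are plain ints, NUM_FILES and FILE_CAPACITY are small made-up stand-ins for the "-n" and "-s" values, and ApdRingT, ring_init, ring_log and ring_read are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_FILES     3   /* stands in for "-n 3" */
#define FILE_CAPACITY 4   /* records per file, stands in for "-s" */

typedef struct {
    int    records[NUM_FILES][FILE_CAPACITY];
    size_t used[NUM_FILES];
    size_t current;       /* index of the file being written */
} ApdRingT;

void ring_init(ApdRingT *ring)
{
    assert(ring != NULL);
    for (size_t i = 0; i < NUM_FILES; i++)
        ring->used[i] = 0;
    ring->current = 0;
}

void ring_log(ApdRingT *ring, int record)
{
    if (ring->used[ring->current] == FILE_CAPACITY) {
        /* Current file is full: move on and overwrite the oldest file. */
        ring->current = (ring->current + 1) % NUM_FILES;
        ring->used[ring->current] = 0;
    }
    ring->records[ring->current][ring->used[ring->current]++] = record;
}

/* Copies the surviving records, oldest first; returns how many. */
size_t ring_read(const ApdRingT *ring, int *out, size_t out_size)
{
    size_t n = 0;
    for (size_t step = 1; step <= NUM_FILES; step++) {
        size_t file = (ring->current + step) % NUM_FILES;
        for (size_t i = 0; i < ring->used[file] && n < out_size; i++)
            out[n++] = ring->records[file][i];
    }
    return n;
}
```

Logging records 1 through 14 into this model fills all three files and then wraps, so records 1 to 4 are discarded and records 5 to 14 remain readable.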
Details are described under "APD File" in Appendix B (Files Reference) of the Aprobe User's Guide.
You can't log data to whatever file you want, but you can register a callback routine that is called whenever the logging mechanism changes to a new file in the ring. This is illustrated by the example in $APROBE/examples/learn/apd_ring included with Aprobe.
See the section "Log Statement Overhead", under "Aprobe Performance Considerations", in Chapter 4 of the Aprobe User's Guide.
The appropriate place for such data is the persistent apd file. You
can log to this like this:
log (...) with blahformat to ap_PersistentLogMethod;
Since the persistent file is always formatted first this would mean that you would get your data earlier than you would if you logged to the apd files, in the format on_entry part.
On Windows, Aprobe calls GetSystemTimeAsRealTime defined in winbase.h.
On Solaris, Aprobe reads the realtime clock directly using:
clock_gettime( CLOCK_REALTIME, ap_TimeT_ptr);
defined in /usr/include/time.h.
On AIX, Aprobe reads the realtime clock directly using read_real_time, then converts to ap_TimeT using time_base_to_time, both defined in sys/time.h.
On Linux, Aprobe just calls gettimeofday() defined in sys/time.h.
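For readers who want to compare these clock sources themselves, here is a small C sketch of the two POSIX calls named above. It is not Aprobe code: SketchTimeT is a hypothetical stand-in for ap_TimeT (whose real layout is not shown in this FAQ), and the two helper function names are made up for the example.

```c
#define _POSIX_C_SOURCE 200809L  /* for clock_gettime() under strict modes */
#include <assert.h>
#include <stddef.h>
#include <sys/time.h>
#include <time.h>

/* Hypothetical stand-in for ap_TimeT: seconds plus nanoseconds. */
typedef struct {
    long long seconds;
    long      nanoseconds;
} SketchTimeT;

/* Solaris-style source: the realtime clock via clock_gettime(). */
int read_clock_gettime(SketchTimeT *out)
{
    struct timespec ts;
    assert(out != NULL);
    if (clock_gettime(CLOCK_REALTIME, &ts) != 0)
        return -1;
    out->seconds = (long long)ts.tv_sec;
    out->nanoseconds = ts.tv_nsec;
    return 0;
}

/* Linux-style source: gettimeofday(), which has microsecond resolution. */
int read_gettimeofday(SketchTimeT *out)
{
    struct timeval tv;
    assert(out != NULL);
    if (gettimeofday(&tv, NULL) != 0)
        return -1;
    out->seconds = (long long)tv.tv_sec;
    out->nanoseconds = (long)tv.tv_usec * 1000L;
    return 0;
}
```

Both calls report wall-clock time, so timestamps taken with either one can shift if the system clock is adjusted while the program runs.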
Almost certainly it's timing. Each time a thread is created, aprobe collects some information. This can delay thread creation somewhat and change the order in which threads are executed. Also, your probes take some time, and delay a thread that executes a probe relative to another that does not.
There's a function ap_CurrentAprobeState() that returns either ap_AprobeRunTime or ap_AprobeFormatTime. So you can do:
if (ap_CurrentAprobeState() == ap_AprobeFormatTime) { ... }
in your probe format. This is the preferable way.
probe program {
on_entry {
DumpInfo();
// Don't run the program. Exit after printing all the info.
// (MAGIC exit code tells runtime this is *not* an error)
exit(APROBE_MAGIC_EXIT_CODE);
}
}
probe format {
on_entry {
if (ap_CurrentAprobeState () == ap_AprobeFormatTime)
{
DumpInfo();
/* Don't do any formatting. Exit after printing all the info. */
exit(0);
}
}
}
On Solaris, a structure returned by value is written to space on the stack allocated by the caller. However, if the caller is discarding the returned value by calling the function as a procedure, no space is allocated. In this case, a probe which may normally attempt to change the return value should not do so, as it will likely corrupt memory. In order to allow users to handle this problem, the following macro is provided:
#define ap_StructValueReturnExpected private
This would be used as a boolean expression in an on_exit part as follows:
probe "UpdateCoordinates()"
{
on_exit
if (ap_StructValueReturnExpected)
$return.x = $return.y = $return.z = 0;
}
The way to do it is to save the address of the parameter on entry and log the dereferenced value of the saved pointer on exit from the subprogram. For example:
Given the C++ file "t.C":
typedef struct
{
int a;
int b;
} MyStructT;
void foo(MyStructT &MyS) {}
int main()
{
MyStructT S = {10, 1999};
foo(S);
return 0;
}
Then the following apc file "t.apc" would do it:
probe thread
{
probe "foo"
{
typeof($1) *Param1 = &($1);
on_exit
{
log("Param1 => ", *Param1);
}
}
}
Note that the declaration where Param1 is initialized is executed in an implicit on_entry part. See the next Q about using "typeof", and other ways to declare variables.
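The same save-on-entry, read-on-exit pattern can be mimicked in plain C, which may help when reasoning about what the probe logs. This is only an analogy under stated assumptions: probed_foo and ProbeRecordT are hypothetical, the reference parameter is modeled as a pointer, and foo is given a body that modifies its argument so that the entry and exit values differ.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    int a;
    int b;
} MyStructT;

/* Stand-in for the probed function; unlike the empty foo() in t.C,
   this version changes its argument so the effect is visible. */
static void foo(MyStructT *my_s)
{
    my_s->a += 1;
    my_s->b = 2000;
}

typedef struct {
    MyStructT *param1;    /* address captured on entry */
    MyStructT  at_entry;  /* value seen on entry */
    MyStructT  at_exit;   /* value read through the saved address on exit */
} ProbeRecordT;

void probed_foo(MyStructT *my_s, ProbeRecordT *rec)
{
    assert(my_s != NULL && rec != NULL);
    rec->param1 = my_s;            /* "on_entry": save the address */
    rec->at_entry = *my_s;
    foo(my_s);
    rec->at_exit = *rec->param1;   /* "on_exit": dereference the saved pointer */
}
```

Saving the address rather than the value is what lets the exit action see changes the function made to the caller's object.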
There are (at least) 3 possibilities, illustrated in the APC file below:
probe thread{
// Method #1: Use the APC "typeof" operator on the type name directly as a
// target expression:
probe "foo" {
typeof($MyStructT) *Param1 = &($MyS);
on_exit {
log("Param1 => ", *Param1);
}
}
// Method #2: Use the APC "typeof" operator on the target
// expression for the parameter name:
probe "foo"{
typeof($MyS) *Param1 = &($MyS);
on_exit{
log("Param1 => ", *Param1);
}
}
// Method #3: Use the "typeof" operator on the target expression
// for the positional parameter.
probe "foo"
{
typeof($1) *Param1 = &($1);
on_exit
{
log("Param1 => ", *Param1);
}
}
}
This applies whether you're capturing a parameter or global value, or even assigning an APC value to a target expression. The type declaration is the important point here.
Of course target expressions apply only if you have debug information available for the definition of the various names. Otherwise, you must reproduce or include the C type declaration directly in the APC, and reference it there.
This can be tricky. What you need to do is get a list of all the functions as Aprobe will reference them. The info.ual predefined probe is provided precisely for this purpose, and "apcgen -L" also works. In this case, if your executable were named "myprog.exe", and the method you wanted to probe were called Method, try:
aprobe -u info -p -s myprog.exe | grep Method
or
apcgen -Lv myprog.exe | grep Method
This gives each function name which can be probed, and the file and line on which it's declared. This can still be pretty tricky for template instances, but it's the best we have at the moment.
For example:
[Enter : extern:"dyndll2d1_a()?0" in "dyndll2d1.dll" at 10:08:02.799519095
[Enter : extern:"dyndll2d1_a()" in "dyndll2d1.dll" at 10:08:02.799526359
[Enter : extern:"dyndll2d2_c()" in "dyndll2d1.dll" at 10:08:02.799534181
]Leave : extern:"dyndll2d2_c()" in "dyndll2d1.dll" at 10:08:02.800248518
]Leave : extern:"dyndll2d1_a()" in "dyndll2d1.dll" at 10:08:02.800256899
]Leave : extern:"dyndll2d1_a()?0" in "dyndll2d1.dll" at 10:08:02.800262486
In the above example my DLL has a routine named dyndll2d1_a in it. In the trace above, a routine named "dyndll2d1_a()?0" is called before my routine "dyndll2d1_a()" is called. There certainly isn't anything like dyndll2d1_a()?0 anywhere in my source code.
Aprobe is attempting to show you what is really happening in your program. Often when one creates a DLL in Windows the C++ compiler adds a small routine that just calls the real routine. This "thunk" routine is what is pointed at by the exports directory in your DLL. It is this piece of code that Aprobe has named "dyndll2d1_a()?0" and that you are seeing in the trace.
Aprobe attempts to gather as much information about the routines in your program as possible. The sources of information include:
While this variety of sources presents a wealth of information about your program, it can also cause a problem when the information is not consistent. In the above example the exports directory lists a symbol "dyndll2d1_a" and has it pointing one place, while the PDB symbol table has the same name pointing another place. Aprobe requires that each global symbol point in only one place (you can have similarly named local symbols pointing in different places since they must be distinguished by their associated file name). Thus when Aprobe detects two symbols that point in different places it will change the name of one of the symbols to resolve the conflict. You can use the apinfo command to discover what symbols Aprobe has found and what name it uses for each one.
Another symbol-related issue, not illustrated above, is when two different symbols both point to the same address. In this case both symbols will exist in Aprobe's master symbol table but only one ap_FunctionId will be created. Both symbols will point to this single ap_FunctionId but it will only point back to one of the symbols. This can produce a situation where you probe one routine but the name given in a traceback for the routine will be a different name.
Generally the only way to get these kinds of duplicate symbols is by taking some explicit action in the source code or in a .DEF file to the linker.
Yes, you can. This is described under "Building a UAL with Unresolved References" in Chapter 4 of the Aprobe User's Guide.
In dbx, run pathmap with no arguments to list the pathmap. Then build a value for the environment variable APROBE_SEARCH_PATH that consists of each of the "to" directories listed in the pathmap output in the reverse order they're listed.
However, if your pathmap values are partial paths (that is, there are several subdirectories under the "to" directory that contain object files), you will need one entry in APROBE_SEARCH_PATH for each such subdirectory.
Note that aprobe and apc will always use the object file at its original location if it exists. This could be a problem if you replace that with new object files of the same name while still referencing the older executable.
A user wrote: "What I want to do is:
probe thread
{
   probe "myfunc()"
   {
      ap_BooleanT IsEnabled = TRUE;
      on_entry
         if (some_expression)
         {
            IsEnabled = FALSE;
            return;
         }
      on_exit
         if (! IsEnabled) return;
      on_entry
         do_something();
      on_exit
         do_something();
   }
}
The first on_entry/on_exit pair would be the wrapper part and would prevent any second on_entry/on_exit pair from executing. Can I count on the first pair executing in order?"
Here's the answer: "Yes. The on_entry and on_exit actions should execute in lexical order. If you have multiple probes on the same routine, their on_entry actions should execute in lexical order as well; however, their on_exit actions will execute in the reverse order to ensure proper nesting. Probe program on_entry actions are executed before those of probe thread, which in turn are executed before those of any subprogram probes."
UALs on the aprobe command-line (or in the RootCause workspace's aprobe script) are initialized in reverse order, i.e., right-to-left. Similarly, two probes on the same function in different UALs are executed right to left, for example:
$ aprobe -u t1 -u t2 t.exe
enter t2:main()
enter t1:main()
exit t1:main()
exit t2:main()
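The nesting shown above can be modeled in a few lines of plain C. This is only a sketch of the ordering rule (enter right-to-left, exit in the reverse of entry), not Aprobe's implementation; the function name trace_order is hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Model of the ordering rule: UALs listed left-to-right on the command
 * line are entered right-to-left and exited in the reverse of that.
 * Writes one "enter X" or "exit X" line per event into out.
 * Returns 0 on success, -1 if out is too small. */
int trace_order(const char *const uals[], size_t count,
                char *out, size_t out_size)
{
    size_t used = 0;
    size_t i;
    int n;

    if (out_size == 0) return -1;
    out[0] = '\0';

    for (i = count; i > 0; i--) {          /* entries: right to left */
        n = snprintf(out + used, out_size - used, "enter %s\n", uals[i - 1]);
        if (n < 0 || (size_t)n >= out_size - used) return -1;
        used += (size_t)n;
    }
    for (i = 0; i < count; i++) {          /* exits: reverse of the entries */
        n = snprintf(out + used, out_size - used, "exit %s\n", uals[i]);
        if (n < 0 || (size_t)n >= out_size - used) return -1;
        used += (size_t)n;
    }
    return 0;
}
```

Called with the names t1 and t2 in command-line order, the model produces the same enter/exit sequence as the aprobe output above.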
Not in general, but on Solaris it is possible to compile C++ object files against your C++ target application and link them into your UAL under certain conditions. There are restrictions on what data you can access, and you must supply extra flags on the apc command. This process is described below:
Yes, on Solaris. A UAL is a shared library created by the apc compiler when you compile your probes. Generally all functions and data called from within the probe must be defined in the probe, or in a library linked with it. Accesses of data and calls to functions whose names are not known when the UAL is linked are unresolved references, and usually render the library unusable since it's undefined what should happen when the reference is encountered at run-time. However, there are certain circumstances when unresolved references are useful, thanks to the run-time linking mechanism provided by the Solaris operating system.
The circumstances in which an unresolved reference in a UAL is useful are these:
For example, suppose you want to write a probe that creates a new object of a given class, gives it the right values, and passes it to a method in the class. This is trivial to do in a new C++ function, so you write one up, declare it extern "C" so it can be called from your probe (which is in C), and compile your C++ function into a separate object file called CallMyClass.o. CallMyClass.o contains unresolved external references to the C++ runtime, and to the constructors of the class which are used to create the object.
Now you write your probe, MyClassProbe.apc, and compile it as follows:
apc MyClassProbe.apc CallMyClass.o -u -x MyProgram
The option -u on the apc command line specifies that unresolved symbols are allowed to remain in the UAL.
Now run aprobe:
aprobe -u MyClassProbe.ual MyProgram
This is what happens to the unresolved symbols when you run aprobe and the UAL is loaded:
When the UAL is loaded, all references to data symbols are resolved. If the undefined symbols are not present in the application or one of its shared libraries, the UAL will not be loaded. This is particularly important to understand at format time since the application symbols are not present at format time. Therefore, no UAL with unresolved data symbols may be used at format time, even if the references will never be executed.
When the UAL is loaded, function references are not resolved until the function is called. This means that symbols may remain unresolved so long as no attempt is made to call those functions. At format time, therefore, UALs with references to function symbols which are only found in the application may be loaded so long as no attempt is made to call those functions. This is generally the case, since only format routines and not probes are executed at format time.
This can be tricky if the C++ code you want to link in is complicated, or unknown to you. However, if it's something straightforward that is under your control to change, it can generally be adapted to these restrictions. As always, contact OC Systems for guidance in these advanced features.
(Unix) Can I force a snapshot of my predefined probe data by sending a signal to my application?
Yes. The following apc code registers for SIGPROF and does a snapshot:
#include <signal.h>
#include "memwatch.h"
static void Handler (int sig, siginfo_t *siginfo, void *ucp)
{
   printf ("Taking snapshot on signal %d\n", sig);
   ap_Memwatch_DoSnapshot ("snapshot signal");
}
probe program
{
   on_entry
   {
      ap_RegisterSignalHandler (SIGPROF,
                                ap_CallBeforeUserAction,
                                Handler);
   }
}
If this file were named memwatch_sig.apc you would compile it with:
apc memwatch_sig.apc memwatch.ual
and then use memwatch_sig.ual instead of memwatch.ual when running. Then send the signal (kill -PROF pid) to generate a snapshot.
Aprobe only supports getting one slice at a time -- for the right-most index. For individual elements, therefore, it's trivial:
log ($available_overlays [1] [1]);
or you could use a single slice:
log ($available_overlays [1] [1 .. 10]);
Multi-dimensional arrays should scale up fine. Since the arrays are stored contiguously you could cheat and cast it to a one-dimensional array if you're clever about your labeling.
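The "cast it to a one-dimensional array" trick relies on C's row-major layout. The sketch below is plain C rather than apc, and the names flat_index and read_flat are made up for illustration; it only shows how a two-dimensional index maps to the equivalent one-dimensional offset.

```c
#include <stddef.h>

/* Row-major offset of element [row][col] in an array declared as
 * T a[rows][cols].  Because C stores such arrays contiguously, this is
 * the index of the same element when the storage is viewed as a
 * one-dimensional array of rows*cols elements. */
size_t flat_index(size_t row, size_t col, size_t cols)
{
    return row * cols + col;
}

/* Read element [row][col] through a one-dimensional view of the array. */
int read_flat(const int *flat, size_t row, size_t col, size_t cols)
{
    return flat[flat_index(row, col, cols)];
}
```

When logging a flattened range this way, label the output with the row and column each offset corresponds to, since the one-dimensional indices are not meaningful on their own.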
The apc command does a `chmod 640' on the UAL it generates after a successful link. This is necessary because the permissions affect how the shared module is loaded at run-time. Here's an excerpt from AIX 'info' output for 'dlopen()', which is the runtime routine used to load UALs when running aprobe:
If the module being loaded has read-other permission, the module is loaded into the global shared library segment. Modules loaded into the global shared library segment are not unloaded even if they are no longer being used. Use the slibclean command to remove unused modules from the global shared library segment.
It seems obvious that we don't want individual users' UALs in the global shared library segment. Multiple edit/apc/aprobe cycles could result in bizarre behavior. The slibclean command can only be run by an account with root privileges.
We strongly advise against linking probes with a thread library since it can cause major problems when run against a single threaded application. The recommended approach on AIX, although a little painful, is to look up the symbol dynamically and call it by pointer. Here is an example for pthread_attr_getstacksize:
// Define a type to map to the routine
typedef int (*pthread_attr_getstacksize_subprogram_T)
   (pthread_attr_t *, size_t *);
// Declare a variable to hold the address
static pthread_attr_getstacksize_subprogram_T
   pthread_attr_getstacksize_subprogram_ptr = NULL;
probe program
{
   on_entry
   {
      pthread_attr_getstacksize_subprogram_ptr =
         (pthread_attr_getstacksize_subprogram_T)
            ap_FunctionPointer (ap_ModuleNameToId (PthreadModuleId (),
                                                   "pthread_attr_getstacksize()",
                                                   ap_NoName));
      // Call it - note don't do this on program entry until you have the
      // fix for that!
      // (Attributes and Size are assumed to be declared elsewhere.)
      if (pthread_attr_getstacksize_subprogram_ptr)
      {
         pthread_attr_getstacksize_subprogram_ptr (&Attributes, &Size);
      }
   }
}
The PthreadModuleId () routine would look something like:
static ap_ModuleIdT PthreadModuleId(void)
{
   ap_ModuleIdT Result;
   /* First, the 4.3 case */
   Result = ap_ModuleNameToId("libpthreads.a(shr_xpg5.o)");
   if (ap_IsNoModuleId(Result))
   {
      /* Didn't find it in shr_xpg5.o, so if we don't find it in shr.o ... */
      Result = ap_ModuleNameToId("libpthreads.a(shr.o)");
      /* ...we'll give back that null result. */
   }
   return Result;
}
For Solaris things are a little different because, in general, Solaris provides stubs to all of these routines in libc.so. Therefore you can just call the routines directly.
For Linux a similar approach can be used as for AIX. In that case the module is "libpthread.so" always.
Windows always has thread support so nothing special is needed.
Yes. Defining and referencing thread-specific data is built into Aprobe. Here is an example:
int *GetThreadSpecificInt();
probe thread
{
   int ThreadSpecificItem = 0;
   int *GetThreadSpecificInt()
   {
      return &ThreadSpecificItem;
   }
}
Now you can call the GetThreadSpecificInt() function from
anywhere to get hold of the thread-specific data item. This should
work equally well on all platforms and is usually much faster than
using pthread functions.
You can report or take actions when each thread starts and stops as well:
probe thread
{
   on_entry
      printf("Entering thread\n");
   on_exit
      printf("Exiting thread\n");
}
The predefined probes in $APROBE/probes include many sophisticated examples
of this. A simple example is available on Unix platforms in
$APROBE/examples/evaluate/5.threads.
These are interesting differences:
These differences can make Aprobe a little harder to use on a C++ application, or a little less satisfying when a probe logs data to be formatted for easy reading. For example, constructors and destructors may get profiled/traced, but most of the time, they just clutter the report; objects may be shown with member addresses instead of member data; mangled names sometimes show in reports or apc input; exception object content may be needed but lacking; output may show the internal form of an expanded template rather than the source form written by the programmer; a probe's references to inherited data may need compilation by the C++ compiler to be right.
OC Systems is developing a strategy whereby C++ can be linked with the probes to circumvent many of these problems--contact us to learn more.
If your application is bigger than, say, 100M, the chances are
that it's running out of memory. You can verify this by running
the "apsymbols" command, for example, apsymbols
c2.eab. If it crashes, then that's the problem. If
apsymbols doesn't crash, the problem might be elsewhere.
See FAQ 13.13.
There are two known reasons why aprobe may cause the application to run out of memory:
Run aprobe -h | head to see what version you have.
See the next question for possible workarounds.
This is a side-effect of having huge C++ applications as
described above. On AIX there's a way to give your application
more memory. AIX supports a concept called the Large Address-Space Model.
This may be applied via an environment variable when running, for
example:
LDR_CNTRL=MAXDATA=0x20000000 c2.eab
or
LDR_CNTRL=MAXDATA=0x20000000 apformat c2.apd
This means it allocates all of 2 memory segments (3 and 4) for your application's memory. If you need even more memory you could try 0x30000000 but this may not work at runtime because some applications hard-code use of segment 5.
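The arithmetic behind the MAXDATA value can be checked with a short plain C sketch. It assumes the 32-bit AIX model in which each segment covers 256 MB (0x10000000 bytes); the macro and function names are hypothetical.

```c
#include <stdint.h>

/* Assumption: 32-bit AIX address space, 256 MB per segment. */
#define AIX_SEGMENT_BYTES 0x10000000u

/* Number of whole 256 MB segments requested by a MAXDATA value. */
unsigned maxdata_segments(uint32_t maxdata)
{
    return (unsigned)(maxdata / AIX_SEGMENT_BYTES);
}

/* Amount of data space, in megabytes, requested by a MAXDATA value. */
unsigned maxdata_megabytes(uint32_t maxdata)
{
    return (unsigned)(maxdata >> 20);
}
```

Under that assumption 0x20000000 corresponds to two segments (512 MB) and 0x30000000 to three, which is why the larger value can collide with an application that hard-codes use of segment 5.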
This is, again, because of the huge symbol tables in ERAM C++ programs. The workaround is to use Aprobe's ADI (Aprobe Debug Information) mechanism to pre-construct the symbol table for an executable. Here's how it works:
Create the ADI file with the apmkadi command, for example:
cd /u/m2
apmkadi -o m2.adi m2.exe
aprobe -u m2.adi -u trace.ual m2.exe
apformat -u m2.adi m2.apd
The AIX C++ compiler, unlike other compilers, generates a copy of the C++ runtime exception-catching function in every shared library, rather than just the C++ runtime library. Aprobe automatically instruments this function, "__Throw" in the predefined libC.a library, but not in user-provided libraries. For that, you must use a special probe, cppexcmodules.apc, edited to name your library or libraries.
This is likely because the code you are probing was compiled with optimization. Check your Makefile to see if CFLAGS or CXXFLAGS contains -O.
Unless you're an Aprobe or RootCause power user, the way to do this is with the statprof predefined probe (Unix platforms only). If possible, use it in an environment where the application terminates normally or with Ctrl-C (but not "kill -9"). Simply put "-u statprof" on the command-line or in the .apo file, and when you format, a table will be generated showing what functions used what percentage of CPU. Details are in the User's Guide.
If your application doesn't terminate normally you'll need to force a snapshot, as described below. If the output of statprof says something like:
56.7 0.59 Other functions (not in profiled module)
then you can see the usage throughout all modules by re-running with
-u statprof -p -c, where -c means "coarse" and will show the usage
of all modules. If the usage was mostly in, say, "libXm.a(shr4.o)"
then you can rerun again to analyze just that one with
-u statprof -p "libXm.a(shr4.o)".
The memcheck ual watches for things like spilling over the limit of a memory area. memcheck requires no configuration files and simply checks standard allocation and deallocation routines. It checks the validity of allocated data on normal program termination, memory signal, or explicit request via call to ap_Memcheck_DoCheckpoint.
The memwatch ual can detect things like unfreed memory accumulating. It doesn't have a configuration file, but requires that the program terminate normally to dump its data. If the program doesn't terminate normally you can use dbx to force a snapshot.
The memstat probe is used primarily with the RootCause GUI because it requires some configuration, but is much more usable with respect to overhead and analysis. For more details on this probe, see RootCause Memory Tracking Probes on the web site.
Debugging a real-time application with dbx (or gdb) is usually tricky, because the debugger must attach to the process in real-time. Aside from the problem of hitting a moving target, hitting the target stops the process. With Aprobe, both problems are easily solved using a custom probe. The model for the probe is below, but an introduction to the concept is needed:
The Aprobe solution is to write a probe which monitors for a reason to debug, and forks a copy of the real-time process when the monitor sees a need. The parent process then continues, while the copy stalls itself in the probe so dbx can attach to the copy. Here is the model, and a talk-through follows the model:
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>
probe thread {
   probe "somewhere_where_there_can_be_a_problem" {
      on_line (where_there_can_be_a_problem) { // or on_entry or on_exit
         if (the elusive problem the user is watching for has occurred) {
            // here is the guts of the probe
            int normal = $some_reference; // save a normal state, explained below
            pid_t child;
            child = fork();
            // parent: report the child's pid and carry on
            if (child) fprintf(stderr,
               "Oops -- such-and-such happened -- gdb xxx %d\n", child);
            // child: child is 0 here, so it doubles as a seconds counter
            else while (++child) {
               if (child > 600) exit(1); // kill if unused in ten minutes
               if ($some_reference==normal) sleep(1); // stay in the probe
               else {$some_reference = normal; break;} // leave the probe
            }
         }
      }
   }
}
The 10-minute stall loop stops counting as soon as dbx attaches. If the
user finishes digging and detaches dbx, loop counting would resume and
the probe would kill the application copy if the user forgot to kill it.
But if the user wants to set breakpoints and resume the application copy
out of the stall to debug it, the method is to use dbx-set to change a
chosen piece of static data and dbx-continue. The probe sees a state
change, restores the saved state, and returns from the probe. This is
the only way the throwaway child process would execute beyond the probe.
Debugging the forked process over a breakpointed path goes beyond interactive data digging at the point of a problem, and may not be needed for every problem. If not, there is no need to choose a static integer visible to dbx and the probe.
This "living dump" concept is useful for distributed applications, because the parent application process is unaffected by this probe. The whole distributed operation should be unaffected. Yet the user would have an attachable copy of a troubled process that might have stalled itself while the cause of a problem was still visible. Digging for the problem can be leisurely, since it makes no difference if the parent process continues or ends.
Different compilers have different low-level implementations for these and it's best to just call the C++ size method if possible. This worked on our RH8 gcc 2.95.2 system:
probe thread
{
   probe extern:"::myroutine(void)"
   {
      on_entry
      {
         // The list is in a variable called my_list.
         // We need to call list.size ():
         log
            ($("list<basic_string<char,string_char_traits<char>,__default_alloc_template<false,0> >,allocator<basic_string<char,string_char_traits<char>,__default_alloc_template<false,0> > > >::size(void)const") (&$my_list));
      }
   }
}
I found the routine's fully qualified name using apsymbols (or apcgen) and grepping for "size".
For possible reasons for such crashes, see questions 13.13, 13.16 and 20.19. If you have a core file, keep reading.
The first thing to check is whether any probes you have written are
responsible for illegal memory references. These will cause core dumps just
like any C or C++ program. If you have a machine-level debugger installed
you can usually use it to get a stack trace. On AIX and Solaris:
dbx /full/path/of/your-application /find/the/core-file
On Linux:
gdb /full/path/of/your-application -c /find/the/core-file
(That is, the first argument is the name of your executable, and the
second is the path to the core file it dropped, which should be in the
program's PWD.) Then enter the command where,
which will give the stack trace at the point of the core dump.
(On Solaris, dbx is part of the Sun Workshop toolset and may not be installed on your target system if your applications run on a different host than they are compiled on. Similarly, on AIX you need to have the bos.adt.debug fileset installed.)
If the stack trace includes a function name which looks like:
OnExit_0094_L0013(...
then the core dump probably occurred in one of your own probes. Look at the
integer in the third part of the name: this is the line number of the
'probe' directive in the .apc file (in this case, 13). You may also see
names beginning 'OnEntry' or 'OnOffset'.
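If you have many stack traces to go through, the line number can be pulled out of such a name mechanically. The following plain C sketch assumes only the shape shown above (a prefix, an integer, then L followed by the line number); the function name probe_line_from_symbol is hypothetical.

```c
#include <stdio.h>

/* Extract the line number from a generated name such as
 * "OnExit_0094_L0013".  Text after the digits (for example the
 * argument list printed by the debugger) is ignored.
 * Returns the line number, or -1 if the name does not have the
 * <prefix>_<digits>_L<digits> shape. */
int probe_line_from_symbol(const char *name)
{
    char prefix[32];
    int middle;
    int line;

    if (sscanf(name, "%31[A-Za-z]_%d_L%d", prefix, &middle, &line) != 3)
        return -1;
    return line;
}
```

For the name in the example above this yields 13, the line of the 'probe' directive in the .apc file.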
If dbx complains that the core file doesn't match your application, you should run:
On Solaris:
dbx $APROBE/bin/aprobe /find/the/core-file
On AIX:
dbx $APROBE/bin/aprobe.exe /find/the/core-file
On Linux:
gdb $APROBE/bin/aprobe -c /find/the/core-file
Send the output of the where command to
support@ocsystems.com and it should
give us a clue. Remember to state what version of RootCause/Aprobe you are
running (this is reported by 'apconfig' or 'aprobe -h | head').
AIX only: slibclean to correct shared module problems:
Lastly, run 'slibclean' and see if that fixes the problem. 'slibclean' is an
AIX utility which removes unused shared modules from the system's memory. It
requires root access, but some sites elect to make this application 'setuid'
so it can be run by ordinary users.
Allowing full core files
In the event dbx complains about a truncated core file, you should verify
that your environment allows full core dumps. This entails two steps:
Run ulimit -c to check the core file size limit and, if it is not unlimited, run ulimit -c unlimited.
Run lsattr -E -l sys0 | grep fullcore and check that it reports fullcore true. If not, the sysadmin needs to enable full core files through smitty: System Environments->Change / Show Characteristics of Operating System->Enable full CORE dump.
OC Systems spends a surprising amount of time helping users get licensing set up on their machines. Here are a few of the most common questions and answers.
This is a decimal format key for use in the prompt that appears during installation. It is a single text string with no blanks or line breaks.
APROBE/licenses/license.dat.
This is a human-readable format key, and can't be used at the prompt that appears during installation.
APROBE/licenses/license.dat.
When there is already a license server running on the machine and you want to start another one just for Aprobe, here's how to do it. This should apply to all Unix hosts.
The procedure for running a second license server on the same host is very simple.
When you are issued a concurrent-use license for Aprobe, it will include a line like the following:
SERVER my.server.name 000347b371fe
You should amend this line by adding a third parameter to the SERVER directive, which will be the port the license server will listen on for client requests. The server and clients all read this same file. The default port for FlexLM is 27000 but any available port number can be specified. One convention is to use the next available port higher than 27000, for example:
SERVER my.server.name 000347b371fe 27001
This parameter is the only one needed to support multiple flex servers on the same host.
These instructions apply to any services which need to be started at boot time on AIX, not just lmgrd.
The details of these instructions may or may not be applicable to your own situation, depending on the exact configuration of your systems. You should consult your local policies and support organizations and convince yourself that the suggestions made here are appropriate before putting them into practice.
That said, this is fairly basic stuff.
We will use the 'mkitab' command to add an entry to the /etc/inittab file. This command is used in place of simply editing inittab, because it helps to ensure that the integrity of inittab is maintained. If you were to make even a small error while editing inittab, the system might become unbootable. The mkitab command helps to reduce this risk.
With root authority, execute the following command:
mkitab -i rcnfs rclocal:2:once:/etc/rc.local
This adds an entry to inittab immediately following the 'rcnfs' entry, which instructs the init program to run /etc/rc.local and not wait for it to complete before proceeding with the rest of system initialization. Thus, in /etc/rc.local you will probably be able to take advantage of services which may only be available on NFS filesystems. Again, this depends on the exact configuration of the system you are installing on; its inittab may not even mention rcnfs, in which case you would need to determine the correct point to add your local startup script.
You should then create the /etc/rc.local file, set its execute permission, and add to it the appropriate commands to start lmgrd and log its output, as well as whatever other site-specific initializations you may need to perform, not limited to OCS products.
Reboot the system and verify correct operation.