Frequently Asked Questions for RootCause and Aprobe (All Platforms)
Updated February 1, 2007
This document describes aspects of the products "RootCause" and "Aprobe" from OC Systems, Inc. (www.ocsystems.com).
It consists of questions asked by evaluators and customers, as well as "artificial" questions intended to provide an introduction to the use of the products.
More complete and detailed descriptions of RootCause and Aprobe are provided by the User's Guides for those products, but this FAQ may provide answers not easily found there, and also includes specific code examples not applicable to a general User's Guide.
RootCause is built on Aprobe, a fully general mechanism for applying patches to programs without changing source or object code. See "What is Aprobe?" for more information.
Users are encouraged to send questions (and answers!) to .
This FAQ is Copyright (c) 2007 by OC Systems, Inc. ALL RIGHTS RESERVED.
This FAQ applies to all platforms, but some answers apply only to a specific platform, so read carefully. To avoid excessive repetition, the Unix form of a command or path is used where it may apply to multiple targets: paths to files are given in Unix format using forward slashes, environment variables use Unix format, and Windows users should read .dll where filenames end in .ual (see Q12.23).
RootCause is a tool for developing and deploying traces that act as a software "flight recorder", simplifying and speeding root cause analysis, as well as proactively monitoring the health and performance of the application. It can also be used to repair applications in the operational environment without rebuilding or reinstalling the software. RootCause is based on Aprobe (see "What is Aprobe?" ) but steps beyond Aprobe in a number of important ways:
This FAQ addresses questions that apply to these aspects of RootCause. The full power of Aprobe is delivered with RootCause, and is addressed by the Aprobe FAQ.
See also "What is Aprobe?" .
It's a long list. Here are just some of the uses of RootCause:
For a more in-depth discussion of some of these, see the RootCause white papers.
RootCause has several facets which apply to different classes of users. Technical support personnel will use it to gather information about a product in the field. Developers will use RootCause to develop traces that the support personnel can use, or which the developers themselves may use to track down problems. Testers might use it to gather data to provide back to developers to supplement test results.
There is RootCause for Java and RootCause for C/C++. Support for both languages may be enabled to support mixed applications.
RootCause for Java supports tracing J2EE applications such as Sun iPlanet and AS7, BEA WebLogic, JBoss, and Tomcat applications. See "RootCause J2EE Support" for more information.
RootCause is currently available on Windows 2000; Windows XP; Sun Solaris (SPARC only); AIX version 5.1 or newer; and Red Hat Linux 7.1 or newer (x86 only). RootCause does not yet support 64-bit applications on any platform, though it _does_ support 32-bit applications running on 64-bit operating systems.
The detailed requirements are documented in Chapter 2 of the RootCause User's Guide for Unix or Windows.
The best way is to send e-mail to , or phone 703-359-8160, extension 3. You can expect a quick response between 9am and 5pm Eastern US Time.
Yes, in general, but the details differ between Unix and Windows:
Unix: Only if you want to apply probes to native code. You can trace Java and native code, and dump Java parameters, without a C compiler. However, the only thing you can do with native code is trace it; you can't dump parameters or variables or generate probes (e.g., SNAPSHOT or COMMENT) because those are implemented by generating APC source code and then compiling it with Aprobe's apc compiler, which requires a C compiler backend.
Windows:
Everything for Unix above is true for Windows, plus:
(a) the compiler must be Microsoft Visual C++; and
(b) if the program was compiled with Visual C++ 6 (or Visual Basic 6), it can't even be traced without Visual C++ installed, because RootCause relies on a DLL that's part of those products which we're not allowed to distribute.
Starting with version 2.1.1 of RootCause you can trace Visual C++ 7 (VC7) programs.
For VC6 (VB6) programs RootCause needs MSVC++ to be installed to provide the (non-redistributable) mechanism to read symbol information from PDBs. Without MSVC++ installed, only symbol information stored in the executable or in DBG files can be read, plus the exported symbols.
In version 2.1.1 of RootCause an environment variable can be set to enable a new mechanism for accessing symbol information in PDB files for VC6 (VB6) programs. Set the environment variable APROBE_USE_DIA=1 to enable this (experimental) feature.
RootCause is delivered with a User's Guide in hardcopy, HTML, and PDF formats. The latter two softcopy forms are available for pre-sales evaluation.
RootCause for C/C++, RootCause for Java, and the RootCause Agent (run-time) are licensed separately. Licensing is enforced with FlexLM on a per-user or per-CPU basis. Contact our sales department for more information at .
If you already have a license but it's not working for you, see "Licensing" or "How do I get technical support?"
Explicit support is provided for C, C++ and Ada. Functions written in Assembler will work to the extent that they adhere to standard calling conventions.
Functions written in other high-level languages (e.g., Basic, Fortran, Pascal, JOVIAL) may also be probed if the probe doesn't reference source-level identifiers ("target expressions"). Contact if you have a favorite.
Almost any program with symbols can be probed. The "full support" described below is based on the debug information needed for source lines and target expressions. Support for additional architectures, operating systems and compilers is always in progress, so please contact if you don't see what you need here.
Windows: Aprobe supports the Microsoft Visual C++ development system versions 6 and 7, but does not support .NET (Dynamic Runtime Model) applications.
AIX: Aprobe supports any IBM C or C++ compiler that runs on AIX 4.2 or newer. There is partial support for gcc and g++ versions 2.95.x, and for gcc versions 3.x compiled with -gstabs+. If your program is Ada, Aprobe supports OC Systems' PowerAda and (starting with version 4.4.2) GNATPro 5.04.
Solaris: The C and C++ compilers supported are the Sun WorkShop (Forte) C++ compiler, versions 4.2 and higher, and gcc/g++ compilers before version 3. If your program is Ada, Aprobe requires GNAT version 3.15 or 3.16.
Linux: The C and C++ compilers supported on Linux are gcc and g++ versions 2.95.x and 3.x. See also Q1.14. If your program is Ada, Aprobe supports only PowerAda on Linux and AIX. (GNAT is supported only on AIX and Solaris.)
No, but for non-Java programs it helps. The suggested compromise is to build with debug information, develop your traces, then strip the debug information when shipping the product. This is fully discussed in Chapter 6 of the RootCause for C++ User's Guide, "Building a Traceable Application".
RootCause has many unique features which require a unique terminology to describe. See the glossary in Chapter 3 of the user's guide for their definitions. Some basics are:
The part of the RootCause product which actually applies and enables the probes, also known as the Aprobe runtime.
The Graphical User Interface (GUI) used for developing probes and viewing the data logged by them.
verb: to efficiently record data into a memory-mapped file for later viewing.
noun: the RootCause log, a list of all programs run with "rootcause on".
Programmatic actions to be inserted and executed at specific points in the probed application.
gcc/g++ 3.x is fully supported on Linux.
Support for GNAT 5.x, and for gcc/g++ 3.x on other OSes is not currently scheduled.
No. See "Is there any way to attach with Aprobe to a running application?" .
We have provided support for older versions of Java for specific customers: please contact us if you have a specific need.
Some of our probes, most notably java_memstat, make use of the JVMPI debugging interface, which turned out to be unreliable in earlier Java versions and has been eliminated entirely in Java 1.6. See the Memstat documentation for a detailed description.
An "agent installation" is the installation of the "RootCause Agent", a small subset of the product that allows one to run probes developed using the RootCause Console.
Note that this prompt is gone starting with RootCause 2.1.1: the agent is now just a self-installing file %APROBE%\deploy\RootCauseAgent.exe.
RootCause is a superset of Aprobe, and in fact shares the same installation script. You can choose a different name if you like.
Because probes on C/C++ (and Ada and other compiled languages) need to be compiled with a user-supplied C compiler, and the installation script has to know whether to check/prompt for that.
No. RootCause for C/C++, like Aprobe, requires a C compiler to build the probes. This is not provided with RootCause because it's assumed customers have one. If you don't, gcc is fine, and OC Systems can help you download and install it.
Yes. Just enter an empty string, ignore the warnings you may get, and then put the license key into the file license.dat in the licenses directory under the RootCause installation directory before you start using RootCause. See also "Licensing".
On RedHat, the Korn shell is provided by the pdksh package. This is on the install media, but not usually installed unless you install everything or specifically request it. The pdksh RPM can be downloaded from the RedHat ftp site. Choose the appropriate link for your version of the RedHat Distribution:
Note that Linux RootCause version 2.2.2 (Aprobe 4.4.2) no longer requires ksh to install: the install script is finally bash-compatible!
Because the RootCause Console interface is written in Java, and the default selection of fonts doesn't match what's in your X Window font path. This problem usually happens only when using older (pre-8) versions of Solaris. See the section entitled Platform-Specific GUI Issues in Chapter 8 of the RootCause User's Guide.
You must be using an older (pre-8) version of Solaris, which requires an older (pre-1.4) version of Java, which doesn't directly support this. The same goes for default buttons on dialogs. Additionally, on Unix you will find that the 'Copy' operations from various RootCause windows such as Trace Events don't show up in your X Window clipboard.
See the section entitled Platform-Specific GUI Issues in Chapter 8 of the RootCause User's Guide for details, but the quickest fix is to start the X-windows application "xclipboard". When you copy something to the clipboard from Java, it will appear in the xclipboard window. You can then select it there and middle-click to paste elsewhere.
Just right-click farther up on the screen so there's room for the whole menu. The Java popup menu behavior is separate from the selection of the item on which it works. So once you've selected an item with a left-click (MB1), you can right-click anywhere in the window to see the context menu for that selected item.
Yes, you can point your browser (Netscape, Mozilla, Internet Explorer, etc.) to $APROBE/html/rcguihelp.html (where $APROBE is the value of the APROBE environment variable, the root of your RootCause installation). However, the Help operations won't update that automatically; you'll have to use your browser's Find operation.
However, note that Chapter 8 of the RootCause User's Guide is pretty much identical to the on-line help, and is cross-referenced with the rest of the User's Guide (see Q1.8).
No. The RootCause Console must be run on the same kind of platform (AIX, Linux, Solaris, Windows) as that on which the data is collected, both for defining the trace and for viewing the data. The format of the deployed workspace and of the collected data is platform-specific.
The problem is that these emulators just don't support Java well. There are some hints in the user guide but it's still not very usable. Our advice: use VNC. It's so much better in every way, and it's free. You may download both the client and server from RealVNC. These sites explain it better than we could here, but if you need assistance feel free to .
Yes. The demo program that we beg everyone to do first shows exactly how to set this up and create a default workspace. There's one for Unix and one for Windows.
However, since you asked so nicely, here's what you do:
Run rootcause on in a window where you'll start your app.
Yes, you should be able to trace anything. If you find one that you cannot trace, please report it as a bug. However, most executables that are part of the system have no symbolic information, so you cannot see functions in the executable itself. You can still see functions in the shared libraries/DLLs that are loaded, and use the predefined UALs that don't need symbols or debug information.
Some programs like Java 1.4 and Netscape "fork and exec themselves" so these are distinct processes. You generally want the second one, since the first probably set up some things missing from the environment and then tried again.
There could be a number of reasons:
Check the output of the rootcause status command.
The RootCause log may have wrapped. Check its current size with rootcause log -s, then choose a bigger number, say 20000, and run rootcause log -s 20000 (see Q4.7). You can clear out the current log contents with rootcause log -Z (see Q4.6).
Verbose logging may be off. Run rootcause register -l from the command line and look at the verbose setting near the top of the output to see if it's missing or off. To enable it on Windows, run the DOS command rootcause on verbose. On Unix, run rootcause register -s verbose.
If the program is setuid, the system prevents the audit library (libapaudit.so) from being loaded from its default, non-secure location. See "SetUID Applications" in Chapter 10 of the RootCause User's Guide.
In all but the first case, you'll have to run the program again with "rootcause on" for it to show up in the RootCause log.
When you start a program, that may start a shell script. Korn shell, C shell and others can have associated "rc" files (e.g., ~/.kshrc, ~/.cshrc), which run some commands. Then the script itself may run some commands to evaluate the environment. Then the program itself may start some processes (e.g., by using CreateProcess() or system()) to do some tasks. You can learn amazing stuff when you use RootCause even without tracing!
Yes, by turning verbose logging off. This is done on Windows with the DOS command
rootcause on quiet
and on Unix with:
rootcause register -s verbose -e off
Also, on Unix, you can set the environment variable APROBE_LD_AUDIT_VERBOSE=FALSE in a shell and it will disable logging of all commands started in that shell and its subshells. This trick is used by the rootcause_status script.
There's currently no way to do this from the Console. From the command line, run rootcause log -Z. Then do File->Refresh to see everything disappear.
Yes, it wraps so that it doesn't get huge. The default size is 100000 bytes. You can use the rootcause log -s command to query and change the size in bytes (there's no access to this from the Console). For example:
# show the log size:
rootcause log -s
100000
# set the log size to 20000 bytes:
rootcause log -s 20000
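The wrap-around behavior can be pictured as a fixed-size ring buffer: once the log is full, each new byte overwrites the oldest one, so the log never grows past its configured size. The following is a minimal C sketch of that idea only; the RootCause log's actual file format is not described here, and the names ring_log, ring_log_append and ring_log_read are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical fixed-capacity log that wraps: once full, new bytes
   overwrite the oldest ones, so memory use never exceeds the capacity. */
#define RING_LOG_CAPACITY 16  /* tiny for illustration; the FAQ gives RootCause's default as 100000 bytes */

typedef struct {
    char   data[RING_LOG_CAPACITY];
    size_t start;   /* index of the oldest byte */
    size_t length;  /* number of valid bytes (<= capacity) */
} ring_log;

void ring_log_init(ring_log *log) {
    log->start = 0;
    log->length = 0;
}

/* Append n bytes, discarding the oldest bytes when the buffer is full. */
void ring_log_append(ring_log *log, const char *bytes, size_t n) {
    for (size_t i = 0; i < n; i++) {
        size_t end = (log->start + log->length) % RING_LOG_CAPACITY;
        log->data[end] = bytes[i];
        if (log->length < RING_LOG_CAPACITY) {
            log->length++;
        } else {
            /* Buffer full: the write above replaced the oldest byte. */
            log->start = (log->start + 1) % RING_LOG_CAPACITY;
        }
    }
}

/* Copy the surviving contents, oldest first, into out (size >= length + 1). */
void ring_log_read(const ring_log *log, char *out) {
    for (size_t i = 0; i < log->length; i++)
        out[i] = log->data[(log->start + i) % RING_LOG_CAPACITY];
    out[log->length] = '\0';
}
```

After appending 20 bytes to this 16-byte log, only the most recent 16 remain, which is why programs run long ago can drop out of a small RootCause log.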
Yes, using the APROBE_HOME environment variable (supported starting with version 2.0.5). The value of this environment variable, if set, is used instead of the defaults (%USERPROFILE%\.rootcause on Windows; $HOME/.rootcause, .rootcause_aix, or .rootcause_linux on Unix). On Unix, this directory is where the RootCause Log and RootCause registry reside, so if you want these files accessible system-wide you should set APROBE_HOME to some central, writable location.
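The precedence rule (APROBE_HOME if set, otherwise a per-user default under $HOME) can be sketched as follows. This is a minimal illustration for Unix, not RootCause's code: choose_aprobe_home and resolve_aprobe_home are hypothetical names, and the caller is assumed to pass whichever default name (".rootcause", ".rootcause_aix" or ".rootcause_linux") applies to its platform.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Pure helper: apply the rule to explicit values so it is easy to test.
   Returns 0 on success, -1 if no path can be built or it doesn't fit. */
int choose_aprobe_home(const char *aprobe_home, const char *home,
                       const char *default_name, char *out, size_t out_size) {
    int n;
    if (aprobe_home != NULL)                 /* APROBE_HOME set: use it as-is */
        n = snprintf(out, out_size, "%s", aprobe_home);
    else if (home != NULL)                   /* otherwise $HOME/<default name> */
        n = snprintf(out, out_size, "%s/%s", home, default_name);
    else
        return -1;                           /* nothing to build a path from */
    return (n >= 0 && (size_t)n < out_size) ? 0 : -1;  /* -1 on truncation */
}

/* Convenience wrapper that reads the real environment. */
int resolve_aprobe_home(const char *default_name, char *out, size_t out_size) {
    return choose_aprobe_home(getenv("APROBE_HOME"), getenv("HOME"),
                              default_name, out, out_size);
}
```

For example, on Linux with APROBE_HOME unset and HOME=/home/me, resolve_aprobe_home(".rootcause_linux", buf, sizeof buf) would yield /home/me/.rootcause_linux; setting APROBE_HOME to a shared directory makes every user resolve to that same location.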
Yes. Edit the "preferences" file in your APROBE_HOME directory (see Q4.8) and change <start_with_log value="true"/> to <start_with_log value="false"/>.
You'll nearly always want to click Yes, which means "use this workspace to trace this application next time you run the application with RootCause on". You might click No if you don't want to trace that application with RootCause yet, or if you want to keep tracing it with a different workspace with which it's already registered. When in doubt click No: you can always use Workspace->Register Program to do it later.
It means that something has been changed or added that must be recorded when the workspace is saved. You can ignore it.
See Chapter 8 of the User's Guide, which fully describes the Console GUI. Also, look for a file in $APROBE/probes (%APROBE%\probes on Windows) with the same name and suffix ".apc" to see the details of its implementation. This doesn't apply to X.trace.ual, which is custom for each workspace.
This means "Unknown Source File", probably because no debug information was found. Look in the Messages pane of the Workspace browser window for messages about debug information. You can still trace entry and exit to these functions, and can write custom probes that get data without using debug information.
The dots are there to act as a "path" to help you find the traces and probes you've defined.
A black dot indicates an entry/exit trace of the marked function, method, file, class, or directory. Functions and methods marked with black dots are represented by equivalent entries in the Wildcards dialog, and are implemented by entries in the trace.cfg file in the workspace.
A blue dot indicates a probe or data trace in the marked function, method, file, directory, or class. These actions are not mapped to wildcards, and are implemented by compiled APC for C functions.
You must add the library to the workspace, and then it will show up in the Trace Setup window. To do this, select Add Dynamic Module... from the Workspace menu. If the module changes, you must do Reset Dynamic Module.
"Don't Trace..." will remove the black dots from the subtree it applies to, meaning those methods and functions won't have their entry and exit traced. "Remove Probes..." will remove the blue dots, meaning specific Probe and Data logging actions will be removed.
The easiest way is to copy it into the workspace. You can also use Add UAL, and you'll need to do that if it takes parameters or has other complications, but that's a bit more advanced: see Chapter 8 of the User's Guide or contact .
This should happen only on Unix. There, for improved usability (at a customer's request), functions whose names match certain patterns are filtered from the list. This list can be changed, replaced or nullified, though this is not documented.
The filtering is defined by the patterns in the file $APROBE/arca/trace_filters. See the commentary at the top of that file for complete information.
Could it be that you set APROBE_SEARCH_PATH after you started the GUI? If so, quit RootCause and restart it so it can pick up the environment variable.
The easiest way is to create a ".h" file that contains prototypes for the functions that you want. RootCause will automatically compile and use the "debug information" in that file so, for example, you can see the parameters in the setup window of the Console or reference them by name in the custom apc that you write.
To do this:
$APROBE/shadow/t.exe.h and the user-local one is $APROBE_HOME/shadow/t.exe.h. On Windows, this is as you would expect: %APROBE%\shadow\t.exe.h, and the user-local one is %APROBE_HOME%\shadow\t.exe.h.
See Question 4.8 about APROBE_HOME.
Placing the .h file in $APROBE/shadow would make it available for all invocations of RootCause, whereas the other two locations would be more user-specific. Note that RootCause searches the directories in the opposite order of their listing above, so a.out.h in the .rootcause directory will be used instead of a.out.h in the $APROBE directory. (The same applies on Windows.)
You can see examples of this by listing $APROBE/shadow/*.h (or %APROBE%\shadow\*.h). RootCause uses this feature to provide parameter information for some of the system shared libraries.
Make sure that you have a supported C compiler available, as this is needed to compile the .h files. (You may not have a supported C compiler if you installed RootCause as Java only and now want to do C probing; contact to add the C capability.)
This is called a "Trigger" and has been a feature of the Aprobe-level trace all along. It was added as a Probes action in the Trace Setup dialog in version 2.1.3a (April 2004). It works like this:
You should see the function or method to which you applied the Trigger action at the top of each traced call tree in your trace, and nothing outside of that (even if you selected it for tracing).
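Conceptually, a Trigger keeps only the events that occur while the trigger function is active on the call stack. The following is a minimal C sketch of that rule, assuming a single thread and a simple nesting counter. trigger_state, trace_enter and trace_exit are hypothetical names, and the sketch does not reflect how Aprobe actually instruments code (the real trace is presumably tracked per thread).

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical trigger model: entry/exit events are kept only while the
   trigger function is active on the (single-threaded) call stack. */
typedef struct {
    const char *trigger;   /* name of the trigger function */
    int active_depth;      /* nesting count of active trigger calls */
    int recorded;          /* number of events kept */
} trigger_state;

/* Called on every function entry; returns 1 if the event is recorded. */
int trace_enter(trigger_state *s, const char *fn) {
    if (strcmp(fn, s->trigger) == 0)
        s->active_depth++;               /* entering (possibly recursively) the trigger */
    if (s->active_depth > 0) {
        s->recorded++;
        return 1;
    }
    return 0;                            /* outside any trigger call: dropped */
}

/* Called on every function exit; returns 1 if the event is recorded. */
int trace_exit(trigger_state *s, const char *fn) {
    int keep = s->active_depth > 0;
    if (keep)
        s->recorded++;
    if (strcmp(fn, s->trigger) == 0 && s->active_depth > 0)
        s->active_depth--;               /* leaving the trigger: later events dropped */
    return keep;
}
```

Calls made before or after the trigger function (for example from main()) are dropped, which is why the trigger function appears at the top of each traced call tree.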
You can check whether trace is enabled with the ap_RootCauseTraceIsEnabled macro. For example:
if (ap_RootCauseTraceIsEnabled)
{
   /* RootCause trace is currently on */
   printf ("Enabled\n");
}
else
{
   /* trace has been disabled */
   printf ("Disabled\n");
}
Disabling your probe independently from Trace is covered in the "Disable Probe" example (Windows: %APROBE%\Examples\Advanced\Disable_Probe; Unix: $APROBE/examples/learn/disable_probe).
You can't. This is deliberately designed to remain active even after trace is disabled. We do deliver source for the probes so that users can customize their behavior. In this case it would be a simple matter of putting the "if (ap_RootCauseTraceIsEnabled)" check (see Q6.10) around the code in the "ExceptionHandler" routine within $APROBE/probes/exception.apc, recompiling it, and either using a local copy or overwriting $APROBE/ual_lib/exception.ual. (On Windows, %APROBE%\probes\exception.apc and %APROBE%\ual_lib\exception.dll, respectively.)
After you run through your test, format the APD files with Examine. The tree will reflect the trace path from point A to B. At the end is a summary call tree with call times in it. Or you can look at the performance table node (right click and choose show associated table) to see a table.
There are two possibilities, but the most likely (on Solaris) is that the traced function is a compiler-generated one that is explicitly filtered from the Trace Setup list, but which is covered by the "wildcard" trace used when you do "Trace All Child Nodes" from the Trace Setup module node. See Q6.6 .
The other possibility is that the event was introduced by some other custom probe, such as a J2EE trace. See Q7.2 .
Probably because the events didn't originate in the Trace Setup, but were introduced by a supplementary J2EE trace. Still, you should be prompted to add the containing class, and so be able to define traces on it.
Yes, RootCause has a concept of a source file path. There are a number of ways to set this:
If you click on a method, the first time it will ask if you want to find the source. If you browse and select the source file, the enclosing path is automatically added to a list. If the end of the Java path matches the package path of the class, the "root" of the package path is added also.
You can edit the path directly off the RootCause Setup menu.
We'll pick up an environment variable APROBE_SEARCH_PATH when the RootCause Console starts.
Of course it depends on the vendor, but the best thing to do is to send them what you would want your customers to send you: text with as much pertinent information as possible. If the trace contains enough information for you to determine where the problem is, then the other piece of information they would want is the system configuration, as collected with logenv.ual.
To create the bug report, you could do File->Save As Text from the Trace Display window; then edit the resulting text file to include the program and system configuration and the tracebacks and execution information that identify the problem; then e-mail the result, indicating it was collected with RootCause. (They might have RootCause also, and ask you to re-run to collect additional information).
Yes. The JVM routinely uses signal 11 (perhaps for extending the stack) and signal 4 (illegal instruction -- not sure what that's for). These can show up in the trace and are fine. Later versions of the JVM provide options for reducing its use of signals; you can search java.sun.com for details.
The JVM implements the synchronization on the calling side rather than on the callee side. Once you are inside the method's code, the lock has already been grabbed. This means that the time you see is after the synchronization.
For instance, I have a test that calls a synchronized method from a thread's run method:
try
{
    Thread.sleep (1000);            // give the other thread time to enter synchronizedMethod()
    parent.synchronizedMethod ();   // Line 15: blocks here until the lock is free
}
catch (InterruptedException e)
{
    e.printStackTrace ();
}
If I trace lines and have things set up so another thread is within synchronizedMethod(), I see something like this:
Line 15 10.45.00 ; Waiting ...
synchronizedMethod entry 10.46.00 ; Got it ...
Why is malloc() listed as being LOAD_SHED in the Trace Display when it really wasn't?
Because RootCause attempted to load-shed it, which recorded it as such, but the actual disabling of the probe was prevented by another UAL's explicit request, using #pragma nopatchcount.
The confusion comes from the fact that load shedding may mean two things:
Since we don't want (1) to happen for allocation/deallocation routines when running memstat, these patches could not be disabled. This was indicated by using #pragma nopatchcount in combined_memstat.apc.
However, when traced these routines will get load shed just like everything else, and the LOAD_SHED event and appearance in the table indicate that (2) has happened. So this is pretty much "as designed".
If you explicitly mark the function as "Do Not Shed", it will no longer show up in the table.
You are hitting the limit on the maximum number of items displayed in the trace display. You can either reduce the size of the APD files, reduce the number of APD files selected or increase the limit at the expense of longer processing times and higher memory overhead. I would try the last one first and if this works for you, great. The option is "Maximum number of events in Trace Display" and is described here. Briefly:
The values are recorded per-user, so must be set for each user in the user preferences file: $APROBE_HOME/preferences on Unix, %USERPROFILE%\preferences on Windows.
Yes, but only in RootCause (see below). It is not quite legal XML and so will be rejected by general XML viewers. (If you think this is an important feature, let us know.)
To import saved XML back into RootCause again, you have to set the environment variable RC_ENABLE_LOAD_XML to a nonempty value before starting the RootCause GUI. If you've done this, you will then see the menu item Examine XML File... in the Analyze menu in the RootCause Main menu. Clicking this menu item will open a file selection dialog from which you can select an XML file. This must be a file previously saved from RootCause Trace Display using File->Save As XML. When you click the Examine XML Output button in this dialog, you will then see a Trace Data Dialog in which one of the checkboxes is the name of your XML file. Check it, and click Open, to view the Trace Display.
Under the View menu, click Statistics Filter.... This dialog is used to create a "filtered" copy of the statistics summary tree. The copied tree will be added to the end of the event tree and will identify what filter was used. You specify a statistic to use (Wall time or CPU time, if collected) and a threshold percentage to create the "filtered" copy. A child node in the summary tree will only be copied to the new tree if the child's statistic value is at least the given percentage of the parent's statistic value. Choose "None" to create an exact copy. The threshold must be a numeric percentage between 0 and 100.
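The threshold rule can be sketched as a recursive copy of the summary tree. This is a minimal illustration, not the Console's code: the node layout and the names filter_copy and free_tree are hypothetical, each node is assumed to carry the single statistic already chosen (Wall or CPU time), and a child that fails the test is assumed to be dropped together with its whole subtree.

```c
#include <stdlib.h>

/* Hypothetical summary-tree node holding one chosen statistic. */
typedef struct node {
    double stat;
    struct node **children;
    size_t nchildren;
} node;

/* Copy src, keeping a child only if its statistic is at least
   threshold_pct percent of its parent's statistic. A threshold of 0
   keeps every child, since times are never negative. */
node *filter_copy(const node *src, double threshold_pct) {
    node *copy = malloc(sizeof *copy);
    if (copy == NULL)
        return NULL;
    copy->stat = src->stat;
    copy->nchildren = 0;
    copy->children = NULL;
    if (src->nchildren > 0) {
        copy->children = malloc(src->nchildren * sizeof *copy->children);
        if (copy->children == NULL) {
            free(copy);
            return NULL;
        }
    }
    double cutoff = src->stat * threshold_pct / 100.0;
    for (size_t i = 0; i < src->nchildren; i++) {
        const node *child = src->children[i];
        if (child->stat >= cutoff) {          /* passes the percentage test */
            node *c = filter_copy(child, threshold_pct);
            if (c != NULL)
                copy->children[copy->nchildren++] = c;
        }
    }
    return copy;
}

/* Release a tree produced by filter_copy. */
void free_tree(node *n) {
    if (n == NULL)
        return;
    for (size_t i = 0; i < n->nchildren; i++)
        free_tree(n->children[i]);
    free(n->children);
    free(n);
}
```

With a 10% threshold, a child using 5 of its parent's 100 seconds is omitted, while a child using 60 seconds (and its own children above 10% of 60) is kept.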
No, these are actual times. You can specify overhead values by clicking View->Statistics Overhead. This opens the Set Statistics Overhead dialog. You'll see an options menu from which you can select the statistic to adjust, and type-in fields for the normal (native) call overhead and the Java overhead (which is generally bigger).
Note that you must set each statistic separately, for example:
Change the statistic menu from None to Wall Time.
Change the statistic menu from None to CPU Time.
When you've completed setting overhead values, you must regenerate the data:
As described in Q7.11, you can specify tracing overhead to be applied to times shown in the Trace Events details. But what number should you put in there? The answer depends on a number of factors, including your hardware and OS speed, whether you're dumping parameters, and whether it's Java or native code. A good guess is the minimum time you see in the entire tree for that kind of call; if that seems too big, you can instrument some do-nothing function and see what its time is. That value is the overhead for every call, and you can use it.
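The exact adjustment formula isn't stated here. A minimal sketch, assuming the per-call overhead is subtracted once for every call counted at a node (its own calls plus its child calls): under that assumption, the Java_Factor_smallestFactor() example later in this section (29 calls, 4190 child calls, raw CPU time 1.298730) would correspond to an overhead of 0.000012 seconds per call, giving the 1.248102 shown. adjusted_time and estimate_overhead are hypothetical names.

```c
#include <stddef.h>

/* Assumed model: subtract a fixed overhead once per call at this node,
   counting both the node's own calls and the child calls beneath it. */
double adjusted_time(double raw_seconds, long own_calls, long child_calls,
                     double overhead_per_call) {
    double adjusted = raw_seconds - (double)(own_calls + child_calls) * overhead_per_call;
    return adjusted > 0.0 ? adjusted : 0.0;   /* design choice: never report negative time */
}

/* Following the advice above: use the smallest observed time for a call
   of the relevant kind as the overhead estimate (n must be >= 1). */
double estimate_overhead(const double *times, size_t n) {
    double min = times[0];
    for (size_t i = 1; i < n; i++)
        if (times[i] < min)
            min = times[i];
    return min;
}
```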
The nodes look like:
ENTER Factor::addWidgets()
time = 2004-05-03 16:32:10.079965024
process = 15193, thread = 0 _start()
symbol = "Factor::addWidgets()" IN "$java$", Factor.java
CPU Time 0.428844 ( 0.428844)
Wall Time 0.552496 ( 0.552496)
EXIT Factor::addWidgets()
time = 2004-05-03 16:32:10.632461354
elapsed time = 00:00:00.552496330
process = 15193, thread = 0 _start()
symbol = "Factor::addWidgets()" IN "$java$", Factor.java
The Details pane for each node gives the (wall) time at which the function or method was entered. In addition, any statistics that were being gathered are attached to the ENTER Node. Shown here are the elapsed CPU Time (gathered because the perf_cpu probe was enabled) and elapsed Wall Time. Both were computed on EXIT from this specific invocation. The EXIT node also shows the elapsed (wall) time, which is the same as the Wall Time statistic.
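The elapsed time on the EXIT node is the difference between the EXIT and ENTER timestamps. A small sketch of that subtraction with a nanosecond borrow reproduces the 00:00:00.552496330 shown above; the stamp type and elapsed function are hypothetical, not a RootCause API, and the example timestamps are expressed as seconds since midnight (16:32:10 = 59530 s).

```c
/* Hypothetical timestamp: whole seconds plus nanoseconds (0..999999999). */
typedef struct {
    long long sec;
    long nsec;
} stamp;

/* elapsed = exit_at - enter_at, borrowing a second when nanoseconds underflow. */
stamp elapsed(stamp enter_at, stamp exit_at) {
    stamp d;
    d.sec = exit_at.sec - enter_at.sec;
    d.nsec = exit_at.nsec - enter_at.nsec;
    if (d.nsec < 0) {
        d.nsec += 1000000000L;
        d.sec -= 1;
    }
    return d;
}
```

Rounded to microseconds, 0.552496330 is the 0.552496 reported as the Wall Time statistic on the ENTER node.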
Consider the following node:
Java_Factor_smallestFactor()
process = 15193, thread = 10 _start()
symbol = extern:"Java_Factor_smallestFactor()" in "libFactorJNI.so", /work/JNI/factor.c
Times called: 29
Child calls (native/Java): 4190 / 0
CPU Time (29): 1.248102 ( 1.298730) [99.753%]
Max : 1.231153 ( 1.274449)
Min : 0.000048 ( 0.000072)
Avg : 0.043038 ( 0.044783)
Wall Time (29): 375.135004 (375.185632) [99.998%]
Max : 375.105686 (375.148982)
Min : 0.000043 ( 0.000067)
Avg : 12.935689 (12.937435)
Recall that each node in the Event Summary tree represents a unique call stack in the execution. The one shown above is for the native JNI function Java_Factor_smallestFactor() (see $APROBE/demo/RootCause/JNI).
The function was called 29 times. Those 29 calls together used 1.248102 seconds of CPU Time after overhead adjustment (see Q7.11). The slightly larger time shown in parentheses after it (1.298730) is the "raw" time before the overhead adjustment. The percentage in brackets indicates that the total CPU time used for this function comprised 99.753% of the total time used by its caller, the parent node in the summary tree (see Q7.10 about filtering based on this percentage). Of those 29 calls, the longest (Max) took 1.231153 seconds of CPU (1.274449 raw), the shortest (Min) took only 0.000048 seconds (0.000072 raw), and the average was 1.248102 / 29 = 0.043038 seconds of CPU.
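The per-node figures (count, total, Max, Min, Avg, and the bracketed percentage of the parent) can be reproduced by folding individual call times into an accumulator. A minimal sketch: summary, summary_add, summary_avg and summary_pct_of_parent are hypothetical names, and it tracks a single statistic without the raw/adjusted pair.

```c
#include <stddef.h>

/* Hypothetical accumulator for one Event Summary node. Zero-initialize before use. */
typedef struct {
    long   count;
    double total, max, min;
} summary;

/* Fold one call's time into the node's statistics. */
void summary_add(summary *s, double t) {
    if (s->count == 0 || t > s->max) s->max = t;
    if (s->count == 0 || t < s->min) s->min = t;
    s->total += t;
    s->count++;
}

/* Avg = total / count, e.g. 1.248102 / 29 = 0.043038. */
double summary_avg(const summary *s) {
    return s->count ? s->total / (double)s->count : 0.0;
}

/* Share of the parent's total, as shown in brackets (e.g. [99.753%]). */
double summary_pct_of_parent(const summary *s, double parent_total) {
    return parent_total > 0.0 ? 100.0 * s->total / parent_total : 0.0;
}
```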
Yes. Click on a node to select it, then right-click to pop up the context menu, then click 'Save Node As Text' to save the selected node in a text file. This will save the node and its details exactly as it would appear in the 'File->Save As Text..' output. Note that it works only for one node, so if multiple nodes are selected it applies only to the first of those. See also the next question.
Yes. In either the Events tree on the left, or the details in the lower left: Click on a node (or multiple nodes using shift or control keys in the usual way). Then right-click to pop up the context menu, then click 'Copy'. This will put the selected nodes in the Java clipboard. See Q3.2 for how to paste from the Java clipboard on Unix.
Probably because it was Load Shed. This means that it was called so often its tracing overhead became excessive and tracing was disabled for it during the run. It will appear in the Load Shed table, where you can choose to stop it from being Load Shed during the next run.
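The actual load-shedding criteria aren't described in this FAQ, so the following is only a conceptual sketch under a stated assumption: a function is shed once its call count exceeds a fixed budget, unless it has been marked "Do Not Shed". traced_fn, should_trace and SHED_CALL_BUDGET are hypothetical, and the budget value is illustrative only.

```c
#include <stdbool.h>

/* Hypothetical per-function state for an assumed call-count budget policy. */
typedef struct {
    const char *name;
    long calls;
    bool shed;          /* tracing disabled for the rest of the run */
    bool do_not_shed;   /* user exemption chosen from the Load Shed table */
} traced_fn;

#define SHED_CALL_BUDGET 1000L   /* illustrative threshold, not RootCause's */

/* Returns true if this call should still be traced. */
bool should_trace(traced_fn *f) {
    if (f->shed)
        return false;
    f->calls++;
    if (!f->do_not_shed && f->calls > SHED_CALL_BUDGET) {
        f->shed = true;  /* in RootCause this would appear as a LOAD_SHED event */
        return false;
    }
    return true;
}
```

Under this model, a hot function stops producing trace events partway through the run, which is why its later calls are missing; marking it "Do Not Shed" keeps every call traced at the cost of more overhead.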
You can't. The concept of levels is no longer supported. Instead you can apply a Trace Trigger, or disable and enable the trace using the probes tab for a given function.
These are not currently integrated with RootCause. If you can run them from the command line using Aprobe, you should do that. If you wish to use the "RootCause On" mechanism to run them using the workspace, you must add them to the workspace options using the "Setup->Add UAL" menu item. This adds a new UAL "permanently" to the Workspace UAL tree. For example, to add the "memwatch" probe, you would:
This adds "memwatch" to the UAL tree in the Workspace window. You could then check this to enable memwatch on applications run under RootCause. The output of these probes isn't integrated with RootCause, so the output appears as a "Text" node in the Trace Display event tree. You can use "Save As Text" from that display to view it outside of RootCause.
Prior to RootCause version 1.3.3, you would reference these probes using the Aprobe options and Apformat options dialogs (see Chapter 8 of the user's guide), just as you would on the Aprobe command-line. For example, to enable memwatch, you would add
-u memwatch -p -g
as "Additional Aprobe Options" (under Aprobe options in the Execute menu in the Workspace window) and
-u memwatch
in the Apformat options (under the Analyze menu). For probes like profile that require configuration files, you would have to put the full pathname of the configuration file into the options as well, like
-u profile -p -c /testdisk/probes/prog1.profile.cfg
Yes, but this is not explicitly supported. In particular, most operations from the RootCause Console overwrite the scripts in the workspace that apply Aprobe to the application. So use the Console to create the workspace, then quit and edit the aprobe.ksh and apformat.ksh scripts (do_aprobe.cmd and do_apformat.cmd on Windows) directly to apply your probes.
Aprobe supports Java with the apjava command. Writing custom probes in Java is described in Chapter 11 of the RootCause for Java User's Guide and in the nearly identical Chapter 5 of the Aprobe User's Guide for Unix and Windows, and if you really wanted to, you could do everything from the command line.
There are three ways of adding a UAL to a trace:
Personally, I like option b, choosing not to copy the UAL to the workspace. This makes it easy to enable or disable from the GUI.
The events probe is not integrated with RootCause Trace Display, but you can still use it. Here's a quick way to get started, by simply applying events to all Java methods and all native functions in the main module (if any), and letting load shedding reduce overhead.
cp $APROBE/probes/events.cfg MyWorkspace.aws
echo ';event function "*"' >> MyWorkspace.aws/events.cfg
echo 'event function "*::*" in $java$' >> MyWorkspace.aws/events.cfg
-c $RC_WORKSPACE_LOC/events.cfg
rootcause format -r MyWorkspace.aws > format.txt
Your results are in format.txt. You can then edit the events.cfg file to do more, as shown in Q15.12, and you can specify an alternate output file so that you get the events output while still formatting within RootCause.
Yes, you can leave RootCause on all the time. After a reboot, it takes effect at about the time per-user preferences are loaded, or when you are prompted for your login ID. Check the System event log (run "eventvwr") for more exact information.
This depends almost entirely on what you do with it. If you trace almost nothing, it will introduce almost no overhead. If you trace every method call on your machine, it will slow things down too much. The keys to good performance are:
You need to copy or soft-link the RootCause "libapaudit.so" library to a "secure pathname" as described in Chapter 10 of the RootCause User's Guide, "RootCause, SETUID, and Security Concerns".
If you're seeing messages like:
ld.so.1: mail: warning: /opt/aprobe/lib/libapaudit.so: open failed: illegal insecure pathname
ld.so.1: mail: fatal: /opt/aprobe/lib/libapaudit.so: audit initialization failure: disabled.
Then the application you're running (like "mail" above, or "ps") has its setuid bit set and is owned by root. For security reasons, Solaris prevents such applications from loading audit libraries from insecure pathnames. Here's what to do:
rootcause_libpath -c
This copies the libapaudit.so library to secure paths under /usr/lib.
rootcause_off
rootcause_on
ps
to verify it works.
If you still get warnings, you're probably on an early patch level of Solaris 8. Do:
export LD_AUDIT_64 ; LD_AUDIT_64=/usr/lib/secure/64/libapaudit.so
If that still doesn't work, contact OC Systems. Details about probing secure applications on Solaris are documented in Chapter 10 of the latest Unix RootCause User's Guide.
There's no built-in mechanism, and it's harder than you might think. Here's some custom APC (for Solaris only) that you could compile into a UAL and add to your workspace to see the modules:
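#include <alloca.h>
#include <dlfcn.h>   /* dlinfo(), RTLD_DI_LINKMAP */
#include <link.h>    /* Link_map */
#include <stdio.h>   /* printf() */
#include <string.h>  /* strcmp(), strcpy(), strlen(), strstr() */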
typedef struct
{
ap_NameT ModuleName;
ap_Uint32 StartAddress;
ap_Uint32 Length;
} DynamicModuleDataT, *DynamicModuleDataPtrT;
static void *ModuleKeyGet (void *S)
{
return (void *) ((DynamicModuleDataPtrT) S)->ModuleName;
}
static ap_BooleanT ModuleKeyCompare (void *LeftKey, void *RightKey)
{
return (strcmp ((ap_NameT) LeftKey, (ap_NameT) RightKey) == 0);
}
static DECLARE_HASH (DynamicModuleTable,
ap_StringHashFunction,
ModuleKeyGet,
ModuleKeyCompare);
#if defined(__SunOS_5_5_1)
extern int dlinfo (void *handle, int request, void *p);
#endif
typedef ap_Uint32 (*FindElfSymbolT) (ap_NameT SymbolName, ap_NameT ModuleName);
static int NextModuleId;
static void DynamicModuleFormat (ap_NameT Filename,
ap_Uint32 *StartAddress,
ap_Uint32 *Length)
{
ap_RootCausePrintEventStart ("program_comment");
printf ("Module loaded: %s\n Address span 0x%08x-0x%08x\n",
Filename,
*StartAddress,
*StartAddress + *Length);
ap_RootCausePrintEventEnd ("program_comment");
}
static void RecordDynamicModule (ap_NameT Filename, void *Handle)
{
ap_ModuleIdT ModuleId;
static FindElfSymbolT FindElfSymbolRoutine = NULL;
ModuleId = ap_ModuleNameToId (Filename);
if (ap_IsNoModuleId (ModuleId))
{
DynamicModuleDataPtrT DynamicModulePtr;
Link_map *Linkmap;
// Get the run-time linker's link-map entry for this handle.
if (dlinfo (Handle, RTLD_DI_LINKMAP, &Linkmap) == -1 ||
Linkmap == NULL)
{
ap_Error (ap_WarningSev,
"Cannot get loader info for %s",
Filename);
return;
}
// Is it in the dynamic table already?
DynamicModulePtr = (DynamicModuleDataPtrT)
ap_HashTableLookup (&DynamicModuleTable, (void *) Linkmap->l_name);
if (DynamicModulePtr == NULL)
{
ap_Uint32 ModuleSize;
ap_ModuleIdT NewModuleId;
ap_NameT ModuleName;
ap_NameT ModuleBaseName;
char *DotSoLocation;
int Dummy = 0;
// Find our internal FindElfSymbol routine.
if (FindElfSymbolRoutine == NULL)
{
FindElfSymbolRoutine = (FindElfSymbolT)
ap_SymbolAddress
(ap_SymbolNameToId (ap_ModuleNameToId ("libaprobe.so"),
"FindElfSymbol()",
ap_ExternSymbol,
ap_FunctionSymbol));
if (FindElfSymbolRoutine == NULL)
{
ap_Error (ap_FatalSev,
"Cannot find FindElfSymbol");
}
}
// Add it to the table.
DynamicModulePtr =
(DynamicModuleDataPtrT) ap_Malloc (sizeof (DynamicModuleDataT));
DynamicModulePtr->ModuleName = ap_StrDup (Linkmap->l_name);
DynamicModulePtr->StartAddress = (ap_Uint32) Linkmap->l_addr;
DynamicModulePtr->Length = FindElfSymbolRoutine ("_end",
Linkmap->l_name);
ap_HashTableInsert (&DynamicModuleTable,
(void *) DynamicModulePtr);
// Record it
log (ap_StringValue (Linkmap->l_name),
DynamicModulePtr->StartAddress,
DynamicModulePtr->Length)
with DynamicModuleFormat to ap_PersistentLogMethod;
// Now log it for the format logic to find
NewModuleId.Value = ap_FetchAndAdd (&NextModuleId, 1);
ModuleBaseName = ap_Basename (Linkmap->l_name);
ModuleName = strcpy (alloca (strlen (ModuleBaseName) + 1),
ModuleBaseName);
DotSoLocation = strstr (ModuleName, ".so");
if (DotSoLocation)
{
*(DotSoLocation + 3) = '\0';
}
ap_LogData (ap_IntegerToLogId (LOG_ID_FOR_FORMAT_RECORD_MODULE),
8,
&NewModuleId,
sizeof (NewModuleId),
ModuleName,
strlen (ModuleName) + 1,
&(DynamicModulePtr->StartAddress),
sizeof (DynamicModulePtr->StartAddress),
&(DynamicModulePtr->Length),
sizeof (DynamicModulePtr->Length),
&Dummy,
sizeof (Dummy),
&Dummy,
sizeof (Dummy),
Linkmap->l_name,
strlen (Linkmap->l_name) + 1,
ap_NoName,
strlen (ap_NoName) + 1);
}
}
}
probe thread
{
probe extern:"dlopen()" in "ld.so"
{
ap_NameT Filename = (ap_NameT) $1;
on_exit
{
if (!ap_IsNoName (Filename) && $return != 0)
{
RecordDynamicModule (Filename, (void *) $return);
}
}
}
probe extern:"dlmopen()" in "ld.so"
{
ap_NameT Filename = (ap_NameT) $2;
on_exit
{
if (!ap_IsNoName (Filename) && $return != 0)
{
RecordDynamicModule (Filename, (void *) $return);
}
}
}
}
probe program
{
on_entry
{
// Record the number of static modules
NextModuleId = ap_NumberOfModules ();
}
}
The following steps should allow you to use RootCause to trace activity in several of the daemons on your Linux system:
RootCause keeps a log file and a registry, as defined by the APROBE_LOG and APROBE_REGISTRY environment variables. These are generally set on a per-user basis by the Aprobe setup script, based on the user's $HOME environment variable, or on the APROBE_HOME environment variable if that's defined. The default location for these files is a hidden directory called ".rootcause" under the user's home directory. When RootCause intercepts a program that is starting up, it looks in the user's registry to see if this program should be instrumented; if so, the registry names an associated workspace file. By changing the APROBE_HOME environment variable before running setup, you can change the locations of the log and registry. Note that these files must be writable by all processes that access them.
Daemons like sshd are started on Linux by a shell (bash) script located in /etc/init.d. For sshd the file is /etc/init.d/sshd. If you edit this file you will see a subroutine named "start". Not surprisingly, it is this subroutine to which we want to add a few statements to set up RootCause to intercept sshd.
For sshd, we recommend that you create your workspace on a disk local to the machine that will run the intercepted program. Create it in the usual way, using the "new" pulldown menu on the main RootCause screen.
These files are probably in $HOME/.linux_rootcause. They are named "registry" and "rclog". You can specify a different location using the APROBE_HOME environment variable (see Q4.8), but be sure to run "setup" after setting APROBE_HOME, and make sure the permissions of the resulting files are correct.
/etc/init.d/sshd script.
You should probably make a copy of the sshd file before you modify it, so you can restore it when you are finished tracing sshd.
/etc/init.d/sshd script to set up Aprobe: Find the start subroutine in the /etc/init.d/sshd file and insert the following lines after the "do_dsa_keygen" line:
export APROBE_HOME=directory identified in step 2
. aprobe_root/aprobe/setup
. $APROBE/bin/rootcause_enable
sshd daemon.
As root, and with /etc/init.d as your current directory, execute:
./sshd stop
./sshd start
You should see a "stopped" message from the stop command, and some output from the start command indicating that RootCause has started. You may get a "FAILED" message from the start command. On our system the daemon seems to start with no problems even when we get this failure message, so you can probably ignore it.
Tracing the libcrypt.so library was interesting: you can really see the ssh protocol flow as it generates keys and such.
The technique outlined above should work for many of the daemons on Linux.
Once you've used Aprobe to investigate the behavior of processes on a running machine, there is nothing particularly complicated about doing the same for system processes while the machine boots, but there are a number of special factors to take into account. These are listed below, and an example given of how we applied these to one of our machines.
The techniques described here were tested on a Solaris 6 box, but should be equally applicable to more current installations.
. /opt/aprobe/setup
rootcause_libpath -c
/sbin/rcN. The execution of these scripts is described in /etc/rcN.d/README, for N = 2 or 3. Since RootCause depends on an environment being defined, we need to 'source' some scripts into this command so the environment is defined when servers and daemons are started. I did this by creating files in /etc/rc3.d. If you look at the README and the /sbin/rc3 script, you should see how this works.
/etc/rc3.d directory.
Defines the APROBE_HOME environment variable where the logs and registry are stored:
APROBE_HOME=/opt/aprobe_home
export APROBE_HOME
Is a soft link to the setup script in the RootCause installation directory:
ln -s /opt/aprobe/setup /etc/rc3.d/K01RootCause.sh
Contains the command to enable intercept:
. rootcause_enable
Normally, scripts whose names start with 'K' are used to shut down processes before others are started, but we will take advantage of the fact that these are executed first to ensure that the RootCause setup is performed before anything else.
Yes. However, there are a couple of unique things about tracing System Services that you need to keep in mind: