RootCause FAQ

Revision as of 21:47, 9 May 2017 by Swn

Frequently Asked Questions for RootCause (All Platforms)
Updated May 2017

This document describes aspects of the "RootCause" product from OC Systems, Inc. (www.ocsystems.com).

It consists of questions asked by evaluators and customers, as well as "artificial" questions intended to provide an introduction to the use of the product.

More complete and detailed descriptions of RootCause are provided by the User's Guide, but this FAQ may provide answers not easily found there, and also includes specific code examples not applicable to a general User's Guide.

RootCause is built on Aprobe, a fully general mechanism for applying patches to programs without changing source or object code. See "What is Aprobe?" for more information.

Users are encouraged to send questions (and answers!) to support@ocsystems.com.


Note to Windows and Solaris Users:

The last updates to RootCause/Aprobe for the Windows and Solaris platforms were version 2.1.4b/4.3.4b in mid-2006. Support for these platforms was officially dropped in 2011. A recent update of this FAQ has removed all questions and answers that are specific to those platforms. If by some unlucky chance you're still using them, here is the old version of the FAQ (rc_aprobe_faq-2007.html).


Note to 64-bit RootCause/Aprobe Users:

Wherever you read APROBE in the questions and answers below, replace it with APROBE64. Different file names and environment variables must be used to allow both 32- and 64-bit versions to co-exist.


This FAQ is Copyright (c) 2017 by OC Systems, Inc. ALL RIGHTS RESERVED.

1.1 What is RootCause?

RootCause is a tool for developing and deploying traces that act as a software "flight recorder", simplifying and speeding root cause analysis, as well as proactively monitoring the health and performance of the application. It can also be used to repair applications in the operational environment without rebuilding or reinstalling the software. RootCause is based on Aprobe (see "What is Aprobe?", Q12.1) but steps beyond Aprobe in a number of important ways:

  • RootCause provides a GUI "Console" which supports the development of traces and other actions for data collection and modification and viewing of the resulting data.
  • RootCause provides a mechanism for identifying all processes started, and applying Aprobe to designated processes as they are run in their "natural" environment.
  • RootCause provides a mechanism for packaging and "deploying" a set of actions to a remote machine, and collecting the resulting data for offline viewing.
  • RootCause does all of the above for Java as well as C/C++, and supports mixed applications seamlessly.

This FAQ addresses questions that apply to these aspects of RootCause. The full power of Aprobe is delivered with RootCause, and is addressed by the Aprobe FAQ.

See also "What is Aprobe?" (Q12.1).

1.2 What are some potential uses of RootCause?

It's a long list. Here are just some of the uses of RootCause:

  • Performing root cause analysis after an application failure.
  • Identifying the cause of an application's incorrect operation.
  • Resolving performance bottlenecks.
  • Monitoring the ongoing health of an application and alerting engineers to problems before significant deterioration in performance occurs.
  • Repairing an application in the operational or test environment quickly, without having to rebuild, recompile, or reinstall the application.
  • Obtaining information about how beta users are testing an application; finding out what features are used and how they are accessed.
  • Integrating software applications.
  • Identifying the specific application or component which is causing a problem.
  • Tracking down memory usage problems.
  • Replacing or enhancing problem reports with execution details and dumps.
  • Monitoring compliance with an SLA.
  • Obtaining information about an application's execution when it isn't possible to replicate the user's environment.

For a more in-depth discussion of some of these, see the RootCause white papers.

1.3 How do I get started quickly with RootCause?

Work through the demos in Chapter 5 of the User's Guide.

1.4 Who can use RootCause?

RootCause has several facets which apply to different classes of users. Technical support personnel will use it to gather information about a product in the field. Developers will use RootCause to develop traces that the support personnel can use, or which the developers themselves may use to track down problems. Testers might use it to gather data to provide back to developers to supplement test results.

1.5 For which platforms is RootCause available?

RootCause is available for 32-bit and (separately) for 64-bit executables on AIX and Linux (x86) platforms. (There is no longer a distinction between the Java and C/C++ versions.)

The detailed requirements are documented on the System requirements page.

1.6 How do I get technical support?

The best way is to send e-mail to support@ocsystems.com, or phone 703-359-8160, extension 3. You can expect a quick response between 9am and 5pm Eastern US Time.

1.7 Do I really need a C compiler to use RootCause?

Only if you want to apply probes to native code. You can trace Java and native code, and dump Java parameters, without a C compiler. However, the only thing you can do with native code is trace it; you can't dump parameters or variables or generate probes (e.g., SNAPSHOT or COMMENT) because those are implemented by generating APC source code and then compiling it with Aprobe's apc compiler, which requires a C compiler backend.

1.8 What documentation is available for RootCause?

The on-line user's guide is available here.

RootCause is delivered with a User's Guide in HTML and PDF formats.

1.9 How is RootCause licensed?

The RootCause Console is licensed per-concurrent-developer. RootCause Agent (run-time) licenses may also be purchased to allow deploying probes outside the development environment. Licensing is enforced on a per-user basis or per-CPU basis with FlexLM. Contact our sales department for more information at .

If you already have a license but it's not working for you, see "Licensing" or "How do I get technical support?" (Q1.6).

1.10 In what language(s) can my program be written?

Explicit support is provided for Java, C, C++ and Ada. Functions written in Assembler will work to the extent that they adhere to standard calling conventions.

Functions written in other high-level compiled languages (e.g., Fortran, JOVIAL) may also be probed if the probe doesn't reference source-level identifiers ("target expressions").

1.11 What compiler(s) must have been used to compile my native application program?

Almost any program with symbols can be probed. For "full support" (for referencing source lines and variable names and handling exceptions) you must use one of the compilers listed for each platform on the system requirements page. Here's a summary:

AIX

  • RootCause/Aprobe supports any IBM C or C++ compiler that runs on AIX 5.2 or newer.
  • gcc/g++ is no longer fully supported, but there is partial support for gcc and g++ versions 2.95.x, and for gcc versions 3.x compiled with -gstabs.
  • If your program is written in Ada, OC Systems' PowerAda and (starting with version 4.4.2) GNATPro 5.04 are supported.

Linux

  • RootCause/Aprobe supports Linux x86 gcc and g++ at whatever version is shipped with generally available Red Hat Enterprise Linux and Gnat Ada releases.

1.12 Do I need to build the program with debug to trace it?

No, but for non-Java programs it helps. The suggested compromise is to build it with debug, develop your traces, then strip the debug information when shipping the product. This is fully discussed in Chapter 6 of RootCause for C User's Guide, "Building a Traceable Application".

1.13 What do these terms mean: probes, console, agent, logging, etc.?

RootCause has many unique features which require a unique terminology to describe. See the glossary in Chapter 3 of the user's guide for their definitions. Some basics are:

agent

The part of the RootCause product which actually applies and enables the probes, also known as the Aprobe runtime.

console

The Graphical User Interface (GUI) used for developing probes and viewing the data logged by them.

log

verb: to efficiently record data into a memory-mapped file for later viewing.
noun: the RootCause log, a list of all programs run with "rootcause on".

probes

Programmatic actions to be inserted and executed at specific points in the probed application.

1.14 Is there any way to attach with RootCause to a running application?

No. See Q12.29.

1.15 Why should I update to the current version of RootCause?

Full details are in the README file delivered with each version, available from the download page.

1.16 What Java (JVM/JRE) versions are supported for use with RootCause?

  • On AIX, IBM JVM versions 1.5 and 1.6 have been verified and are supported.
  • On Linux, RootCause has been tested with Oracle (Sun) Java 5 and OpenJDK. RootCause does not work with gcj.

We have provided support for older versions of Java for specific customers: please contact us if you have a specific need.

2. Installation

2.1 Why does install_rootcause offer to install in a directory called "aprobe"?

RootCause is a superset of Aprobe, and in fact shares the same installation script. You can choose a different name if you like.

2.2 When the Linux installation prompts for a compiler, does it want the one that builds my application?

No. RootCause for C/C++, like Aprobe, requires a C compiler to build the probes. This is not provided with Linux RootCause because it's assumed customers have gcc installed. If you don't, OC Systems can help you download and install it.

2.3 The installation process prompts me for a license key, but I don't have one right now; can I continue?

Yes. Just enter an empty string, ignore the warnings you may get, and then put the license key into the file license.dat in the licenses directory under the RootCause installation directory before you start using RootCause. See also "Licensing".
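
As a minimal sketch of that last step, assuming the APROBE environment variable points at the root of your RootCause installation (see Q3.3) and that your key is a single line: the helper name install_license_key is purely illustrative and is not part of RootCause, and the sketch refuses to overwrite a license.dat that already has content.

```shell
# Hypothetical helper (not shipped with RootCause): write a single-line
# license key to $APROBE/licenses/license.dat.
install_license_key() {
    key=$1
    # APROBE is assumed to be the root of the RootCause installation.
    if [ -z "${APROBE:-}" ]; then
        echo "APROBE is not set" >&2
        return 1
    fi
    if [ -z "$key" ]; then
        echo "usage: install_license_key KEY" >&2
        return 1
    fi
    # Do not clobber a license file that already has content.
    if [ -s "$APROBE/licenses/license.dat" ]; then
        echo "license.dat already has content; not overwriting" >&2
        return 1
    fi
    mkdir -p "$APROBE/licenses" || return 1
    printf '%s\n' "$key" > "$APROBE/licenses/license.dat"
}
```

You would call it as install_license_key followed by the key text in quotes. Multi-line keys are a separate case and are not handled here.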

2.4 The installation prompts me for a single-line license key, but the one I have consists of several lines; do I just paste it in there?

No. Leave it blank as in Q2.3, and see Q21.2.

3. The RootCause Console (GUI)

3.1 Why does the command rootcause open fail with Java errors?

There could be a number of reasons. On AIX, RootCause does not include its own Java Runtime Environment (JRE), so if it's not found in your PATH or expected default locations, or if the Java found there has problems, you'll get errors. While Linux RootCause does include a Java 1.4 JRE, it may again be that it doesn't run right on your system for some reason.

In either case, the workaround is:
   export APROBE_JRE=`which java`
That is, set (and export) the environment variable APROBE_JRE to the full path of the java command you want to use. This must be a Java 1.4 or newer JRE, for example, /usr/bin/java or /opt/jdk1.5.0_06/jre/bin/java.
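
A slightly more defensive version of the workaround might look like the sketch below. It only sets APROBE_JRE when a java command is actually found on PATH and is executable; the function name use_java_from_path is illustrative, and checking that the JRE is version 1.4 or newer is left to you.

```shell
# Sketch: point APROBE_JRE at the java found on PATH, but only if one
# is found and is executable. The function name is illustrative.
use_java_from_path() {
    java_cmd=$(command -v java) || {
        echo "no java command found in PATH" >&2
        return 1
    }
    if [ ! -x "$java_cmd" ]; then
        echo "$java_cmd is not executable" >&2
        return 1
    fi
    APROBE_JRE=$java_cmd
    export APROBE_JRE
}
```

After calling use_java_from_path in a shell, start the Console from that same shell so it inherits the variable.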

3.2 How can I see the whole context menu when I click the right mouse button (MB3) on something at the bottom of the screen?

Just right-click farther up on the screen so there's room for the whole menu. The Java popup menu behavior is separate from the selection of the item on which it works. So once you've selected an item with a left-click (MB1), you can right-click anywhere in the window to see the context menu for that selected item.

3.3 Can I just use my Web Browser instead of the built-in Help Viewer?

Yes, you can point your browser (Firefox, Mozilla, Internet Explorer, etc.) to $APROBE/html/rcguihelp.html (where $APROBE is the value of the APROBE environment variable, the root of your RootCause installation.) However, the Help operations won't update that automatically -- you'll have to use your browser's Find operation.

Note also that Chapter 8 of the RootCause User's Guide is nearly identical to the on-line help, and is cross-referenced with the rest of the User's Guide (see Q1.8).

3.4 Can I run the RootCause GUI on Windows to view data collected on my Unix system?

No. The RootCause Console must be run on the same kind of platform (AIX or Linux) as that on which the data is collected, both for defining the trace and for viewing the data. The format of the deployed workspace and of the collected data is platform-specific.

3.5 Is it possible to monitor a Java program without entering the classpath, working directory, etc. that the New Workspace dialog prompts for?

Yes. The demo program that we beg everyone to do first shows exactly how to set this up and create a default workspace.

However, since you asked so nicely, here's what you do:

  1. Start the RC GUI.
  2. Turn RC on by entering rootcause on in a window where you'll start your app.
  3. Run your Java program as you normally do.
  4. Examine the RC log (File->Open RootCause Log).
  5. Search near the bottom and find your Java program's APP_START node. If you see two identical ones, choose the second.
  6. Click on it.
  7. Right-click to get context menu.
  8. Choose Open Associated Workspace.
  9. New Workspace Dialog should appear with information filled in so you just click OK.

4. The RootCause Log

4.1 Can I trace any and all of the executables that I see in the log? Are there some restrictions?

Yes, you should be able to trace anything. If you find one that you cannot trace, please report it as a bug. However, most executables that are part of the system have no symbolic information, so you cannot see functions in the executable itself. You can get functions in shared libraries/DLLs that are loaded, and use the predefined UALs without symbols and debug information.

4.2 Why do I see two identical copies of a program in the RootCause Log?

Some programs like Java 1.4 and Netscape "fork and exec themselves" so these are distinct processes. You generally want the second one, since the first probably set up some things missing from the environment and then tried again.

4.3 Why don't I see the program I want to trace listed in the RootCause log?

There could be a number of reasons:

  • The program you're looking for is really a driver or script that runs another executable of a different name. Investigate this and look for that "real" program in the log.
  • RootCause was not "on" in the environment when the application was run. Use the rootcause status command to check.
  • RootCause was not on at application startup because the application starts at boot-time. See the explanation for Q9.4, for example.
  • Many other processes have started since the one you're looking for, and the log file "wrapped around". The RootCause Log is a fixed size. When the maximum size is reached, newer entries overwrite older ones. Each entry is variable length, and if you have long command-lines or CLASSPATH values the log may hold fewer entries. The default size of the log file is 100,000 bytes. You may want to make the log file bigger. To see its current size, run the command rootcause log -s. Then choose a bigger number, say 200000, and run rootcause log -s 200000 (see Q4.7). You can clear out the current log contents with rootcause log -Z (see Q4.6).
  • RootCause was "on", but the verbose setting was "off". To find out, use Workspace->List RootCause Registry (or rootcause register -l from the command-line) and look at the verbose setting near the top of the output, and see if it's missing or off. To enable it, enter the command: rootcause register -s verbose.
  • RootCause is being turned "off" again before the application starts up. This can happen when there's a wrapper or startup script that resets the environment by changing the PATH and deleting unknown environment variables. In this case you could see these scripts in the RootCause Log around the time when you think the application would be starting -- you can then edit them to turn RootCause back on again.
  • The program you're trying to trace is run using setuid root, which prevents the program intercept library ( libapaudit.so ) from being loaded from its default, non-secure location. See "SetUID Applications" in Chapter 10 of the RootCause User's Guide.

In all but the first case, you'll have to run the program again with "rootcause on" for it to show up in the RootCause log.
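
The command-line checks from the list above can be gathered into one small helper, sketched below. It uses only commands quoted in this FAQ and makes no assumption about their output format; it simply prints what they report so you can read it. The function name rc_log_checklist is illustrative.

```shell
# Sketch: run the command-line checks from the list above in order.
# The output of each rootcause command is printed as-is.
rc_log_checklist() {
    echo "== Is RootCause on in this environment? =="
    rootcause status
    echo "== Registry (look for the verbose setting near the top) =="
    rootcause register -l
    echo "== Current RootCause Log size in bytes =="
    rootcause log -s
}
```

Run it in the same shell from which you start the application, since the "on" state is part of that shell's environment.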

4.4 I ran only one application with rootcause on, and I see about a dozen processes in the RootCause log; where did they come from?

When you start a program, that may start a shell script. Korn shell, C shell and others can have associated "rc" files (e.g., ~/.kshrc, ~/.cshrc), which run some commands. Then the script itself may run some commands to evaluate the environment. Then the program itself may start some processes (e.g., by using CreateProcess() or system()) to do some tasks. You can learn amazing stuff when you use RootCause even without tracing!

4.5 Can I cause only APP_TRACED events to show up in the RootCause Log?

Yes, by turning verbose logging off. This is done with the command:

rootcause register -s verbose -e off

Also, you can set the environment variable APROBE_LD_AUDIT_VERBOSE=FALSE in a shell and it will disable logging of all commands started in that shell and its subshells. This trick is used by the rootcause_status script.
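
To limit that effect to a single command rather than the whole shell session, the variable can be set inside a subshell. The sketch below shows one way to do that; the wrapper name run_unlogged is illustrative.

```shell
# Sketch: run one command with RootCause logging suppressed, without
# changing the calling shell. The parentheses start a subshell, so the
# variable disappears again when the command finishes.
run_unlogged() {
    (
        APROBE_LD_AUDIT_VERBOSE=FALSE
        export APROBE_LD_AUDIT_VERBOSE
        "$@"
    )
}
```

For example, run_unlogged make all would run make and its children with the variable set, while later commands in the same shell are logged as usual.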

4.6 How do I clear the RootCause log?

There's currently no way to do this from the Console. From the command line: rootcause log -Z. Then do File->Refresh to see everything disappear.

4.7 Does the RootCause log wrap around? If so, how do I set the wraparound size?

Yes, it wraps so that it doesn't get huge. The default size is 100000 bytes. You can use the rootcause log -s command to query and change the size in bytes (there's no access to this from the Console). For example:

# show the log size:
 rootcause log -s
 100000
# set the log size to 20000 bytes:
 rootcause log -s 20000

4.8 Can I locate my .rootcause directory somewhere other than $HOME?

Yes, using the APROBE_HOME (or APROBE64_HOME, for 64-bit RootCause) environment variable. The value of this environment variable, if set, is used instead of the defaults: ~/.rootcause_aix, ~/.rootcause_aix64, ~/.rootcause_linux or ~/.rootcause_linux64. This directory is where the RootCause Log and RootCause registry reside, so if you want these files accessible system-wide you should set APROBE_HOME/APROBE64_HOME to some central, writable location.
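
The lookup rule described above can be sketched as a small function that prints which directory would be used. Detecting the platform with uname -s is an assumption of this sketch, and the function name rc_home_dir is illustrative; pass 64 as the argument for 64-bit RootCause.

```shell
# Sketch: print the directory that holds the RootCause Log and registry,
# following the rule above. Pass "64" for 64-bit RootCause.
rc_home_dir() {
    bits=${1:-32}
    if [ "$bits" = 64 ]; then
        override=${APROBE64_HOME:-}
        suffix=64
    else
        override=${APROBE_HOME:-}
        suffix=
    fi
    # An explicitly set variable wins over the per-platform default.
    if [ -n "$override" ]; then
        printf '%s\n' "$override"
        return 0
    fi
    case $(uname -s) in
        AIX)   printf '%s\n' "$HOME/.rootcause_aix$suffix" ;;
        Linux) printf '%s\n' "$HOME/.rootcause_linux$suffix" ;;
        *)     echo "unsupported platform" >&2; return 1 ;;
    esac
}
```

This is handy in scripts that need to find the preferences file or the shadow directory without hard-coding a path.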

4.9 Is there a way to keep the RootCause Log window from appearing when I start rootcause?

Yes. Edit the "preferences" file in your APROBE_HOME directory (see Q4.8) and change

<start_with_log value="true"/>


to

<start_with_log value="false"/>
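
If you prefer to script the change, a sketch along the following lines would work, assuming the element appears in the file exactly as shown above. The function name disable_start_log is illustrative; the path to the preferences file is passed in, and a backup copy with a .bak suffix is kept next to it.

```shell
# Sketch: flip start_with_log from "true" to "false" in a preferences
# file, keeping a backup copy alongside it.
disable_start_log() {
    prefs=$1
    [ -f "$prefs" ] || { echo "no such file: $prefs" >&2; return 1; }
    cp "$prefs" "$prefs.bak" || return 1
    # Rewrite from the backup so the original is never read and written
    # at the same time.
    sed 's|<start_with_log value="true"/>|<start_with_log value="false"/>|' \
        "$prefs.bak" > "$prefs"
}
```

A typical call would be disable_start_log "$APROBE_HOME/preferences", with APROBE_HOME set as described in Q4.8.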

5. The Workspace Window

5.1 Should I say Yes or No to the "Application is not registered with workspace" dialog?

You'll nearly always want to click Yes, which means "use this workspace to trace this application next time you run the application with RootCause on". You might click No if you don't want to trace that application with RootCause yet, or if you want to keep tracing it with a different workspace with which it's already registered. When in doubt click No: you can always use Workspace->Register Program to do it later.

5.2 What does the blue dot mean in the Predefined UALs part of the Workspace Tree?

It means that something has been changed or added that must be recorded when the workspace is saved. You can ignore it.

5.3 Where do I find out about the Predefined UALs listed here?

See Chapter 8 of the User's Guide, which fully describes the Console GUI. Also, look for a file in $APROBE/probes with the same name and suffix ".apc" and you'll see the details of its implementation. This doesn't apply to X.trace.ual, which is custom for each workspace.

6. The Trace Setup Dialog

6.1 What does <Unknown File> mean in the Trace Setup tree?

This means "Unknown Source File", probably because no debug information was found. Look in the Messages pane of the Workspace browser window for messages about debug information. You can still trace entry and exit to these functions, and can write custom probes that get data without using debug information.

6.2 What do the black and blue dots mean in the Trace Setup tree?

The dots are there to act as a "path" to help you find the traces and probes you've defined.

A black dot indicates an entry/exit trace of the marked function, method, file, class, or directory. Functions and methods marked with black dots are represented by equivalent entries in the Wildcards dialog, and are implemented by entries in the trace.cfg file in the workspace.

A blue dot indicates a probe or data trace in the marked function, method, file, directory, or class. These actions are not mapped to wildcards, and are implemented by compiled APC for C functions.

6.3 How do I trace a dynamically loaded shared library (DLL)?

You must add the library to the workspace, and then it will show up in the Trace Setup window. To do this, select Add Dynamic Module... from the Workspace menu. If the module changes, you must do Reset Dynamic Module.

6.4 What's the difference between "Don't Trace..." and "Remove Probes..."?

"Don't Trace..." will remove the black dots from the subtree it applies to, meaning those methods and functions won't have their entry and exit traced. "Remove Probes..." will remove the blue dots, meaning specific Probe and Data logging actions will be removed.

6.5 I've got a UAL that I compiled with the apc command -- how do I get that into RootCause?

The easiest way is to copy it into the workspace. You can also use Add UAL, and you'll need to do that if it takes parameters and other complications, but that's a bit more advanced: see Chapter 8 of the User's Guide or contact .

6.6 Why don't I see all the symbols shown by "apinfo" or "apcgen -L" in the Trace Setup window?

For improved usability (at a customer's request), functions whose names match certain patterns are filtered from the list. This list can be changed, replaced or nullified, though this is not documented.

The filtering is defined by the patterns in the file $APROBE/arca/trace_filters . See the commentary at the top of that file for complete information.

6.7 I define APROBE_SEARCH_PATH to include my source location, but the RC GUI still isn't finding my source. Why?

Could it be you set APROBE_SEARCH_PATH after you started the GUI? If so, quit RC and restart it so it can pick up the env var.
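
One way to avoid the ordering problem is to start the Console through a small wrapper that sets the variable first, as sketched below. The wrapper name rc_open_with_sources is illustrative. It passes a single directory, because this FAQ does not say how multiple directories are separated in APROBE_SEARCH_PATH.

```shell
# Sketch: start the Console with APROBE_SEARCH_PATH already set, so the
# GUI sees it at startup. The subshell keeps the setting local to the
# Console process.
rc_open_with_sources() {
    src_dir=$1
    [ -d "$src_dir" ] || { echo "not a directory: $src_dir" >&2; return 1; }
    (
        APROBE_SEARCH_PATH=$src_dir
        export APROBE_SEARCH_PATH
        rootcause open
    )
}
```

Call it with the directory that contains your sources, for example rc_open_with_sources "$HOME/src/myapp" (a hypothetical path).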

6.8 How can I see and dump parameters for C functions for which there are symbols but no debug information?

This is addressed in Chapter 10 of the RootCause User's Guide, under Libraries With No Debug Information. Here's a paraphrasing of that given by our support staff:

The easiest way is to create a ".h" file that contains prototypes for the functions that you want. RootCause will automatically compile and use the "debug information" in that file so, for example, you can see the parameters in the setup window of the Console or reference them by name in the custom apc that you write.

To do this:

  1. Put the prototypes (C, not C++) into a ".h" file and give the file the same name as the shared library (or executable) where the functions reside (for example, if your executable were named a.out, then the .h file would be named a.out.h).
  2. Place the .h file in the local or global "shadow" directory, with the name of your executable or library plus ".h" on the end. For example, if your program were called t.exe then on Unix the global location is $APROBE/shadow/t.exe.h and the user-local one is $APROBE_HOME/shadow/t.exe.h. See Q4.8 about APROBE_HOME (and APROBE64_HOME).

Placing the .h file in $APROBE/shadow would make it available for all invocations of RootCause, whereas the other two locations would be more user specific. Note that RootCause will search the directories in the opposite order of their listing above, so a.out.h in the .rootcause directory will be used instead of a.out.h in the $APROBE directory.

You can see an example of this by listing the files matching $APROBE/shadow/*.h. RootCause uses this feature to provide parameter information for some of the system shared libraries.

Make sure that you have a supported C compiler available, as this is needed to compile the .h files. (You may not have a supported C compiler if you installed RootCause as Java only and now want to do C probing; contact to add the C capability.)
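
The two steps above can be sketched as a helper that copies a prototypes file into the user-local shadow directory under the required name. The function name install_shadow_header is illustrative, and the sketch expects APROBE_HOME to be set explicitly (see Q4.8) rather than relying on the default location.

```shell
# Sketch: install a file of C prototypes as the "shadow" header for a
# module. Arguments: the module (executable or shared library) and the
# file holding the prototypes.
install_shadow_header() {
    module=$1
    protos=$2
    # User-local shadow directory; this sketch needs APROBE_HOME set.
    [ -n "${APROBE_HOME:-}" ] || { echo "APROBE_HOME is not set" >&2; return 1; }
    [ -f "$protos" ] || { echo "no such file: $protos" >&2; return 1; }
    mkdir -p "$APROBE_HOME/shadow" || return 1
    # The header must be named after the module, plus ".h".
    cp "$protos" "$APROBE_HOME/shadow/$(basename "$module").h"
}
```

For instance, if protos.h held a hypothetical prototype such as int compute_total(const int *values, int count); then install_shadow_header a.out protos.h would create $APROBE_HOME/shadow/a.out.h.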

6.9 How can I turn on trace just when I'm in a chosen method or function?

This is called a "Trigger" and has been a feature of the Aprobe-level trace all along. It was added as a Probes action in the Trace Setup dialog in version 2.1.3a (April 2004). It works like this:

  1. Apply Trace to all the functions and methods you want to trace, as usual.
  2. Select the function or method that is to be the "trigger".
  3. Click the Probes tab in the lower right pane.
  4. Check the On checkbox, then use the Probe Action dropdown menu to select Trigger Trace.
  5. Click OK to apply and build your trace.

You should see the function or method to which you applied the Trigger action at the top of each traced call tree in your trace, and nothing outside of that (even if you selected it for tracing).

6.10 How can I enable my custom probe only when Trace is also enabled?

You can check whether trace is enabled with the ap_RootCauseTraceIsEnabled macro. For example:


         if (ap_RootCauseTraceIsEnabled)
         {
            printf ("Enabled\n");
         }
         else
         {
            printf ("Disabled\n");
         }

Disabling your probe independently from Trace is covered in the "Disable Probe" example in $APROBE/examples/learn/disable_probe.

6.11 I notice "Disable Tracing" does not affect the "exception" predefined probe. How can I disable that as well?

You can't. The exception probe is deliberately designed to remain active even after trace is disabled. We do deliver source for the probes so that users can customize their behavior. In this case it would be a simple matter of putting the "if (ap_RootCauseTraceIsEnabled)" check (see Q6.10) around the code in the "ExceptionHandler" routine within $APROBE/probes/exception.apc, recompiling it, and either using a local copy or overwriting $APROBE/ual_lib/exception.ual.

6.12 How can I trace and time everything between point A and point B?

  1. Create a workspace for the application (which you have already done).
  2. In the main window:
    • Enable the xxx.trace.ual (the first one).
    • Enable perf_cpu.
  3. Go to the Trace Setup dialog.
  4. Click on the program node (the very first one).
  5. In the Probes tab, create a probe on program entry to disable tracing.
  6. In the left pane, click on the application module node (first 'M' icon).
  7. Right-click and choose trace all.
  8. Find and select the point A function in the tree.
  9. In the Probes tab, create a probe to enable tracing on entry.
  10. Find and select the point B function in the tree.
  11. In the Probes tab, create a probe to disable tracing on exit.
  12. Click the Options... button to open the Trace Options dialog.
  13. Disable load shedding so you get everything.
  14. Click OK to build the workspace.
  15. Restart your application.

After you run through your test, format the APD files with Examine. The tree will reflect the trace path from point A to B. At the end is a summary call tree with call times in it. Or you can look at the performance table node (right click and choose show associated table) to see a table.

6.13 How can I allow all Java parameters to be traced?

To enable the Log All Parameters menu item, set and/or export the undocumented environment variable RC_ENABLED_LOG_ALL before starting the RootCause GUI.

7. The Trace Display (Event) Dialog

7.1 Why are some functions found in the traced Events not found in the Trace Setup?

There are two possibilities, but the most likely is that the traced function is a compiler-generated one that is explicitly filtered from the Trace Setup list, but which is covered by the "wildcard" trace used when you do "Trace All Child Nodes" from the Trace Setup module node. See Q6.6.

The other possibility is that the event was introduced by some other custom probe, such as a J2EE trace. See Q7.2.

7.2 Why are some Java methods found in the traced Events not found in the Trace Setup?

Probably because the events didn't originate in the Trace Setup, but were introduced by a supplementary J2EE trace. Still, you should be prompted to add the containing class, and so be able to define traces on it.

7.3 RootCause keeps asking to find a source file. Is there a way to just point to this once without specifying the path to every file we wish to view?

Yes, RootCause has a concept of a source file path. There are a number of ways to set this:

  • If you click on a method, the first time it will ask if you want to find the source. If you browse to and select the source file, the enclosing path is automatically added to a list. If the end of the Java path matches the package path of the class, the "root" of the package path is added also.
  • You can edit the path directly from the RootCause Setup menu.
  • RootCause picks up the environment variable APROBE_SEARCH_PATH when the RootCause Console starts.

7.4 The trace shows a problem in third-party software; what's the best way to pass this along to them?

Of course it depends on the vendor, but the best thing to do is to send them what you would want your customers to send you: text with as much pertinent information as possible. If the trace contains enough information for you to determine where the problem is, then the other piece of information they would want is the system configuration, as collected with logenv.ual.

To create the bug report, you could do File->Save As Text from the Trace Display window; then edit the resulting text file to include the program and system configuration and the tracebacks and execution information that identify the problem; then e-mail the result, indicating it was collected with RootCause. (They might have RootCause also, and ask you to re-run to collect additional information).

7.5 RootCause shows signal 11 during my Java application run, but there was no crash. Is this a valid signal 11?

Yes. The JVM routinely uses signal 11 (perhaps for extending the stack) and signal 4 (illegal instruction -- not sure what that's for). These can show up in the trace and are fine. Later versions of the JVM provide options for reducing its use of signals; you can search java.sun.com for details.

7.6 When I trace a Java synchronized method, does the method time include lock delay time?

The JVM implements the synchronization on the calling side rather than on the callee side. Once you are inside the method's code, the lock has already been grabbed. This means that the time you see is after the synchronization.

For instance, I have a test that calls a synchronized method from a thread's run method:


try
{
   Thread.sleep (1000);
   parent.synchronizedMethod ();            // Line 15
}
catch (InterruptedException e)
{
   e.printStackTrace ();
}

If I trace lines and have things set up so another thread is within synchronizedMethod(), I see something like this:


Line 15                    10.45.00            ; Waiting ...
synchronizedMethod entry   10.46.00            ; Got it ...

7.7 Why was malloc() listed as being LOAD_SHED in the Trace Display when it really wasn't?

Because an attempt was made to load-shed it, and that attempt was recorded, but the actual disabling of the probe was prevented by another UAL's explicit request, using #pragma nopatchcount.

The confusion comes from the fact that load shedding may mean two things:

  1. The patch for the subprogram is disabled (no more probes for this routine will get triggered);
  2. This routine is no longer traced.

Since we don't want (1) to happen for allocation/deallocation routines when running memstat, these patches could not be disabled. This was indicated by using #pragma nopatchcount in combined_memstat.apc.

However, when traced these routines will get load shed just like everything else, and the LOAD_SHED event and appearance in the table indicate that (2) has happened. So this is pretty much "as designed".

If you explicitly mark the function as "Do Not Shed", it will no longer show up in the table.

7.8 When formatting my data, an error pops up saying, "The maximum event tree size ... has been reached." What do I do?

You are hitting the limit on the maximum number of items displayed in the trace display. You can either reduce the size of the APD files, reduce the number of APD files selected or increase the limit at the expense of longer processing times and higher memory overhead. I would try the last one first and if this works for you, great. The option is "Maximum number of events in Trace Display" and is described here. Briefly:

  1. Go to the RootCause Main window
  2. Open the Setup menu (not the button, but the pulldown menu)
  3. Select Options...
  4. Change the value of the option Maximum number of events in Trace Display (third from the bottom) to a higher value. A value of 2000000 (two million) is appropriate for processors with more than 128M of memory.

The values are recorded per-user, so must be set for each user in the user preferences file: $APROBE_HOME/preferences.

7.9 I see that I can do "Save As XML": can I view this XML later?

Yes, but only in RootCause (see below). It is not quite legal XML and so will be rejected by general XML viewers. (If you think this is an important feature, let us know.)

To import saved XML back into RootCause again, you have to set the environment variable RC_ENABLE_LOAD_XML to a nonempty value before starting the RootCause GUI. If you've done this, you will then see the menu item Examine XML File... in the Analyze menu in the RootCause Main menu. Clicking this menu item will open a file selection dialog from which you can select an XML file. This must be a file previously saved from RootCause Trace Display using File->Save As XML. When you click the Examine XML Output button in this dialog, you will then see a Trace Data Dialog in which one of the checkboxes is the name of your XML file. Check it, and click Open, to view the Trace Display.

7.10 How can I see just the major time-consuming children of nodes in the Trace Events Summary tree?

Under the View menu, click Statistics Filter.... This dialog is used to create a "filtered" copy of the statistics summary tree. The copied tree will be added to the end of the event tree and will identify what filter was used. You specify a statistic to use (Wall time or CPU time, if collected) and a threshold percentage to create the "filtered" copy. A child node in the summary tree will only be copied to the new tree if the child's statistic value is at least the given percentage of the parent's statistic value. Choose "None" to create an exact copy. The threshold must be a numeric percentage between 0 and 100.
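As a rough illustration of the rule above (a child is kept only if its value is at least the given percentage of its parent's value), here is a minimal shell sketch. The function name filter_children, the method names, and the times are all made up for the example; this is not how RootCause itself implements the filter.

```shell
#!/bin/sh
# filter_children PARENT_VALUE THRESHOLD_PCT
# Hypothetical helper: reads "name value" lines on stdin and prints the
# names of the children whose value is at least THRESHOLD_PCT percent
# of PARENT_VALUE.
filter_children() {
    awk -v parent="$1" -v pct="$2" '
        $2 + 0 >= parent * pct / 100 { print $1 }
    '
}

# Made-up example: the parent took 10.0 seconds, the threshold is 5%.
printf '%s\n' 'parseInput 0.30' 'compute 9.50' 'logResult 0.20' |
    filter_children 10.0 5
```

With a parent time of 10.0 seconds and a 5% threshold, only children that took at least 0.5 seconds are kept, so the pipeline above prints compute.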

7.11 Do the times shown in Trace Events reflect the aprobe overhead?

No, these are actual times. You can specify overhead values by clicking View->Statistics Overhead. This opens the Set Statistics Overhead dialog. You'll see an options menu from which you can select the statistic to adjust, and type-in fields for the normal (native) call overhead and the Java overhead (which is generally bigger).

Note you must set each statistic separately, for example:

  • Click View->Statistics Overhead
  • Click None and change it to Wall Time
  • Type in Overhead and Java Overhead values
  • Click Ok
  • Click View->Statistics Overhead
  • Click None and change it to CPU Time
  • Type in Overhead and Java Overhead values
  • Click Ok

When you've completed setting overhead values, you must regenerate the data:

  • Click File->Refresh to reformat the data with the new values.

7.12 How do I know what overhead to specify in the Set Statistics Overhead dialog?

As described in [#q7.11 Q7.11], you can specify tracing overhead to be applied to times shown in the Trace Events details. But what number should you put in there? The answer depends on a number of factors, including your hardware and OS speed, whether you're dumping parameters, and whether it's Java or native code. A good guess is the minimum time you see in the entire tree for that kind of call, or if that seems too big, you can instrument some do-nothing function and see what its time is. This value would be the overhead for every call, and you can use that.

7.13 What are the various times I'm seeing in the details pane for Enter and Exit nodes?

The nodes look like:

ENTER Factor::addWidgets()
  time = 2004-05-03 16:32:10.079965024
  process = 15193, thread = 0 _start()
  symbol = "Factor::addWidgets()" IN "$java$", Factor.java
  CPU Time 0.428844 ( 0.428844)
  Wall Time 0.552496 ( 0.552496)

 EXIT Factor::addWidgets()
  time = 2004-05-03 16:32:10.632461354
  elapsed time = 00:00:00.552496330
  process = 15193, thread = 0 _start()
  symbol = "Factor::addWidgets()" IN "$java$", Factor.java

The Details pane for each node gives the (wall) time at which the function or method was entered. In addition, any statistics that were being gathered are attached to the ENTER Node. Shown here are the elapsed CPU Time (gathered because the perf_cpu probe was enabled) and elapsed Wall Time. Both were computed on EXIT from this specific invocation. The EXIT node also shows the elapsed (wall) time, which is the same as the Wall Time statistic.
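The relationship between the two timestamps and the elapsed time can be checked by hand. The sketch below is only an illustration: elapsed_time is a hypothetical helper name, and it assumes both timestamps fall on the same day.

```shell
#!/bin/sh
# elapsed_time ENTER_HH:MM:SS.fraction EXIT_HH:MM:SS.fraction
# Hypothetical helper: prints the difference in seconds, assuming both
# times are on the same day.
elapsed_time() {
    awk -v enter="$1" -v leave="$2" 'BEGIN {
        split(enter, a, ":")
        split(leave, b, ":")
        s = (b[1] - a[1]) * 3600 + (b[2] - a[2]) * 60 + (b[3] - a[3])
        printf "%.9f\n", s
    }'
}

# Times copied from the ENTER and EXIT nodes shown above.
elapsed_time 16:32:10.079965024 16:32:10.632461354
```

This prints 0.552496330, which matches the elapsed time on the EXIT node and, to six decimal places, the Wall Time statistic on the ENTER node.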

7.14 What are the various times and percentages I'm seeing in the Details panes on nodes in the Event Summary tree?

Consider the following node:

Java_Factor_smallestFactor()
  process = 15193, thread = 10 _start()
  symbol = extern:"Java_Factor_smallestFactor()" in "libFactorJNI.so", /work/JNI/factor.c
  Times called: 29
  Child calls (native/Java): 4190 / 0
  CPU Time (29):  1.248102 ( 1.298730) [99.753%]
    Max  :  1.231153 ( 1.274449)
    Min  :  0.000048 ( 0.000072)
    Avg  :  0.043038 ( 0.044783)
  Wall Time (29): 375.135004 (375.185632) [99.998%]
    Max  : 375.105686 (375.148982)
    Min  :  0.000043 ( 0.000067)
    Avg  : 12.935689 (12.937435)

Recall that each node in the Event Summary tree represents a unique call stack in the execution. The one shown above is for the native JNI function Java_Factor_smallestFactor() (see $APROBE/demo/RootCause/JNI).

The function was called 29 times. Those 29 calls together used 1.248102 seconds of CPU Time after overhead adjustment (see [#q7.11 Q7.11]). The slightly larger time shown in parentheses after it (1.298730) is the "raw" time before the overhead adjustment. The percentage in brackets indicates that the total CPU time used for this function comprised 99.753% of the total time used by its caller, the parent node in the summary tree (see [#q7.10 Q7.10] about filtering based on this percentage). Of those 29 calls, the longest (Max) took 1.231153 seconds of CPU (1.274449 raw), the shortest (Min) took only 0.000048 seconds (0.000072 raw), and the average took 1.248102 / 29 = 0.043038 seconds of CPU.
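The arithmetic in this answer can be checked with a few lines of shell. This is only a sketch: avg_time and overhead_removed are hypothetical helper names, and the numbers are copied from the CPU Time line of the node shown above.

```shell
#!/bin/sh
# avg_time TOTAL_SECONDS CALLS
# Hypothetical helper: average time per call.
avg_time() {
    awk -v total="$1" -v calls="$2" 'BEGIN { printf "%.6f\n", total / calls }'
}

# overhead_removed RAW_SECONDS ADJUSTED_SECONDS
# Hypothetical helper: how much time the overhead adjustment subtracted.
overhead_removed() {
    awk -v raw="$1" -v adj="$2" 'BEGIN { printf "%.6f\n", raw - adj }'
}

avg_time 1.248102 29               # adjusted CPU total / times called
overhead_removed 1.298730 1.248102 # raw CPU total - adjusted CPU total
```

The first call prints 0.043038, the Avg value in the node; the second prints 0.050628, the total CPU time removed by the overhead adjustment across these calls.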

7.15 Is there a way to save the text for a specific node in the Trace Events tree?

Yes. Click on a node to select it, then right-click to pop up the context menu, then click 'Save Node As Text' to save the selected node in a text file. This will save the node and its details exactly as it would appear in the 'File->Save As Text..' output. Note that it works only for one node, so if multiple nodes are selected it applies only to the first of those. See also the [#q7.16 next question].

7.16 Can I copy a Trace Events node to the clipboard to be pasted elsewhere?

Yes. In either the Events tree on the left, or the details in the lower left: Click on a node (or multiple nodes using shift or control keys in the usual way). Then right-click to pop up the context menu, then click 'Copy'. This will put the selected nodes in the Java clipboard.

7.17 I know my method was executed many times, so why isn't it in the Performance Summary table?

Probably because it was Load Shed. This means that it was called so often its tracing overhead became excessive and tracing was disabled for it during the run. It will appear in the Load Shed table, where you can choose to stop it from being Load Shed during the next run.

8. RootCause and Aprobe

8.1 How do I adjust the Trace "DefaultLevels" option so only a fixed depth is traced when an application is run with RootCause?

You can't. The concept of levels is no longer supported. Instead you can apply a Trace Trigger, or disable and enable the trace using the probes tab for a given function.

8.2 How can I use Aprobe's predefined probes (profile, coverage, events, memwatch, statprof) with RootCause?

These are not currently integrated with RootCause. If you can run them from the command-line using Aprobe you should do that. If you wish to use the "RootCause On" mechanism to run them using the workspace, you must add them to the workspace options using the "Setup->Add UAL" menu item. This adds a new UAL "permanently" to the Workspace UAL tree. For example, to add the "memwatch" probe, you would:

  • provide "memwatch" as the path to the UAL and its name;
  • check "Has parameters";
  • provide "-g" as the Aprobe parameter if you want to see the memory usage display;
  • give no apformat parameters.

This adds "memwatch" to the UAL tree in the Workspace window. You could then check this to enable memwatch on applications run under RootCause. The output of these probes isn't integrated with RootCause, so the output appears as a "Text" node in the Trace Display event tree. You can use "Save As Text" from that display to view it outside of RootCause.

Prior to RootCause version 1.3.3, you would reference these probes using the Aprobe options and Apformat options dialogs (see Chapter 8 of the user's guide), just as you would on the Aprobe command-line. For example, to enable memwatch, you would add -u memwatch -p -g as "Additional Aprobe Options" (under Aprobe options in the Execute menu in the Workspace window) and -u memwatch in the Apformat options (under the Analyze menu). For probes like profile that require configuration files, you would have to put the full pathname of the configuration file into the options as well, like -u profile -p -c /testdisk/probes/prog1.profile.cfg .

8.3 Is it possible to develop in Aprobe, but still use the RootCause "intercept" mechanism?

Yes, but this is not explicitly supported. In particular, most operations from the RootCause Console overwrite the scripts in the workspace which apply Aprobe to the application. So after you use the Console to create a workspace, you quit, and edit the aprobe.ksh and apformat.ksh scripts directly to apply your probes.

8.4 If RootCause is built on Aprobe, and RootCause supports Java, is there an Aprobe for Java?

Aprobe supports Java with the apjava command. Writing custom probes in Java is described in Chapter 11 of the RootCause for Java User's Guide and the nearly-identical Chapter 5 of the Aprobe User's Guide, and if you really wanted to you could do everything from the command line.

8.5 How do I add my own UAL to the RootCause trace?

There are three ways of adding a UAL to a trace:

  1. Update the predefined_uals file in ual_lib to add it for all workspaces. It will show up in the list in the workspace when you do that.
  2. Use the Add Ual option on the setup menu - this will also cause it to show up in the list.
  3. Copy it into the workspace. It will not show up in the list because it's not until runtime that we look in the directory to see what other UALs are present.

Personally I like option 2, choosing not to copy the UAL to the workspace. This makes it easy to enable / disable from the GUI.

8.6 How can I use the Events probe with RootCause?

The events probe is not integrated with RootCause Trace Display, but you can still use it. Here's a quick way to get started, by simply applying events to all Java methods and all native functions in the main module (if any), and letting load shedding reduce overhead.

  1. cp $APROBE/probes/events.cfg MyWorkspace.aws
  2. echo ';event function "*"' >> MyWorkspace.aws/events.cfg
  3. echo 'event function "*::*" in $java$' >> MyWorkspace.aws/events.cfg
  4. Workspace->AddUal: add events.ual and specify the following aprobe parameter:
   -c $RC_WORKSPACE_LOC/events.cfg
  5. Keep the trace.ual enabled with load shedding on, but don't specify any traces (this would load shed low level events)
  6. Run the application
  7. From the command line, use
  rootcause format -r MyWorkspace.aws > format.txt

Your results are in format.txt. You can then edit the events.cfg file to do more, as shown in [#q15.12 Q15.12] , and you can specify an alternate output file so you get the events output while still formatting within RootCause.

9. RootCause at Run Time

9.1 Can I just leave RootCause "on" all the time? For example, while I power down and power up my computer? I was thinking that it would be interesting to see all the processes as my computer boots.

Not exactly, but you can turn it on early in the boot process in the same way you would start other services, by putting a script under /etc. Check with your system administrator or contact OC Systems support.

9.2 How much will RootCause slow my application?

This depends almost entirely on what you do with it. If you trace almost nothing, it will introduce almost no overhead. If you trace every method call on your machine, it will slow things down too much. The keys to good performance are:

  • only ask questions you want the answers to; that is, don't blindly trace everything if you're worried about performance; and
  • avoid logging data over the network: put your workspace on a local disk.

Experience tells us that collecting too much data is a bigger problem than slowing down the application too much.

9.3 How can I trace Linux daemons with RootCause?

The following steps should allow you to use RootCause to trace activity in several of the daemons on your Linux system:

Background

RootCause keeps a log file and a registry as defined by the APROBE_LOG and APROBE_REGISTRY environment variables. These are generally set on a per-user basis by the Aprobe setup script, based on the user's $HOME environment variable or on the environment variable APROBE_HOME if that's defined. The default location for these files is a hidden directory under a user's home directory called ".rootcause". When RootCause intercepts a program that is starting up it looks in the user's registry to see if this program should be instrumented. If so, there will be an associated workspace file named in the registry. By changing the APROBE_HOME environment variable before running setup, you can change the locations of the log and registry. Note that these files have to be writable by all processes that access them.

Daemons like sshd are started on Linux using a shell (bash) script located in /etc/init.d . For sshd the file is /etc/init.d/sshd . If you edit this file you will see a subroutine named "start". Not surprisingly, it is to this subroutine that we want to add a few statements to set up RootCause to intercept sshd.

Details

  1. Create a RootCause workspace to trace sshd :

We recommend that you create your workspace on a disk local to the machine that will be running the intercepted program. Create it using the "new" pulldown menu on the main RootCause screen.

  2. Verify the location of your log and registry files:

These files are probably in $HOME/.linux_rootcause . They are named: "registry" and "rclog". You can specify a different location using the APROBE_HOME environment variable (see [#q4.8 Q4.8] ) but be sure to run "setup" after setting APROBE_HOME and make sure the protections of the resulting files are correct.

  3. Back up your /etc/init.d/sshd script.

You should probably make a copy of the sshd file before you modify it so you can restore it when you are finished tracing sshd.

  4. Modify the /etc/init.d/sshd script to set up aprobe:

Find the start subroutine in the /etc/init.d/sshd file and insert the following lines after the "do_dsa_keygen" line:

  export APROBE_HOME=directory identified in step 2
  . aprobe_root/aprobe/setup
  . $APROBE/bin/rootcause_enable

  5. Stop and restart the sshd daemon.

As root and with your current directory as /etc/init.d execute

  ./sshd stop
  ./sshd start

You should see a stopped message from the stop and some output indicating that rootcause has started from the start message. You may get a "FAILED" message from the start. On our system even when we get the failure message the daemon seems to start with no problems. So I think you can ignore this message. Tracing the libcrypt.so library was interesting; you can really see the ssh protocol flow as it generates keys and such. The technique outlined above should work for many of the daemons on Linux.

9.4 How do I apply RootCause to applications run at boot time?

Once you've used Aprobe to investigate the behavior of processes on a running machine, there is nothing particularly complicated about doing the same for system processes while the machine boots, but there are a number of special factors to take into account. These are listed below, and an example given of how we applied these to one of our machines.

The techniques described here were tested on Solaris (no longer supported) but should apply approximately to Linux. AIX is a bit different, and in any case this should be done in coordination with a knowledgeable system administrator.

  1. Any time you make your own modifications to a system's startup procedures, there is a risk that you may make the system unbootable. We'll try to point out the pitfalls, but as with any procedures like this you should be prepared to recover the system from maintenance mode or even to reinstall the OS.
  2. At startup, system resources you may want to rely on may not be available. Make sure your RootCause installation is not on remote disks, and even for local installations, check that the filesystems used for the installation and for logging are available at the expected point during the boot process. If you want to get in at the start of Runlevel 2, the only filesystems typically available at that point are "/" and "/var", which may not have enough free space to support installation and logging.
  3. Startup scripts are run with /sbin/sh, which does not provide all the features you may be accustomed to with ksh, although it is very close for most purposes. Where possible, test scripts by running them under /sbin/sh before adding them to the boot process.
  4. For the test I just performed, I chose to monitor processes started as the system enters Runlevel 3, which starts NFS server processes, among others. At this point, all local filesystems are mounted, so I had no problem finding space for an installation, but many potentially 'interesting' services had already been launched.
  5. The libapaudit.so shared library needs to be installed in a secure location. With root authority, run:
  . /opt/aprobe/setup
  rootcause_libpath -c
  6. The startup procedure for a given Runlevel is determined by a script, " /sbin/rcN ". The execution of these scripts is described in /etc/rcN.d/README , for N = 2 or 3. Since RootCause depends on an environment being defined, we need to 'source' some scripts into this command so the environment is defined when servers and daemons are started. I did this by creating files in /etc/rc3.d. If you look at the README and /sbin/rc3 script, you should see how this works.
  7. You will need to perform three steps to enable RootCause intercept in the rc driver. We will accomplish this by creating three files in the /etc/rc3.d directory.
    • /etc/rc3.d/K00RootCauseLocal.sh

Defines the APROBE_HOME environment variable where the logs and registry are stored:

APROBE_HOME=/opt/aprobe_home
export APROBE_HOME

    • /etc/rc3.d/K01RootCause.sh

Is a soft link to the setup script in the RootCause installation directory:

ln -s  /opt/aprobe/setup /etc/rc3.d/K01RootCause.sh

    • /etc/rc3.d/K02RootCause.sh

contains the command to enable intercept:

. rootcause_enable

Normally, scripts whose names start with 'K' are used to shut down processes before others are started, but we will take advantage of the fact that these are executed first to ensure that the RootCause setup is performed before anything else.

  8. All that is required now is to reboot the machine, then login as root, define APROBE_HOME, source the installation setup script, and start the RootCause GUI. The event viewer should show you what processes were launched.

9.5 Can I apply different workspaces (or none at all) for the same program invoked with different command-line parameters?

Yes, by the addition of a "-p pattern" option to the rootcause register command. The pattern argument consists of a simple expression that can specify argument positions, wildcards and simple comparison and logical operations. You can associate the same executable (or Java class) and different patterns with different workspaces. At run-time, actual command-line arguments are substituted for special identifiers in the expression (like %2, $*) and then the expression is evaluated. If it evaluates to TRUE, the associated workspace is used to probe the application. If no expression evaluates to true, then the application is not probed. There's no GUI support; you have to register your application from the command-line to use this feature. All the details are described [regpattern.txt here]. If it's still not clear how to do what you want, don't hesitate to [#q1.6 contact us].

9.6 How can I "intercept" a Java server on AIX?

As described in the user's guide, RootCause on AIX does not support the automatic "intercept" of applications at load time: the application must either be run directly from the command line with "rootcause run", or else the binary must be renamed/replaced with a soft-link to a script that simulates the intercept effect.

Starting with version 2.1.3b (May 2004) you can implement this second alternative with the rootcause link command, which renames/replaces the java binary with a script that uses access-lists and environment variables to manage who's applying rootcause to each Java instance.

The command rootcause link is used to apply RootCause to applications (typically services and application servers) which cannot easily be started from a user's shell environment. rootcause link uses symbolic links to "intercept" these applications. A set of subcommands is available to manage these links safely and conveniently.

Note that step 4 will probably require root authority, depending on where the application to be traced is installed.

  1. Identify the full path to the executable you wish to trace with RootCause. In the case of an application server, this will be a program named "java". You should use the 'ps' command to verify the pathname if possible. Write this path name to a file, for example:
       echo /usr/java131/bin/java > server.lst

The application named here cannot be a symbolic link.

  2. Install the above list as the application list with
       rootcause link -I server.lst

You may specify more than one application, each on a separate line, in this file. The rootcause link -I command instructs RootCause to save this file as the list of applications whose links are to be managed. rootcause link -I will require write access to the RootCause installation directory. If you need to change the application list later you will need to apply step 7 below (remove symbolic links).

  3. Verify the application list is installed as expected with
       rootcause link -l

This will report a line like the following:


     - /usr/java131/bin/java

The '-' indicates that the application is eligible to have its link managed, but that link does not exist and as a result the application will not be run under RootCause. rootcause link -L will show an explanation of the characters used to describe the link state. These are:


   - Executable is not RootCause linked
   * Executable will be run under RootCause
   ? File is not an executable or is invalid
   ! A serious error was detected;  contact support immediately

  4. Create the application link with
       rootcause link -K

This will create symbolic links into the RootCause installation directory for each application designated with the rootcause link -I command. rootcause link -K requires write access to the directory where the application to be traced is installed. Typically this will require root authority.

  5. Turn on rootcause interception with
       rootcause link -a

Now whenever the application is started, an entry will appear in the rootcause log. Follow the usual procedure to create a workspace and set up trace definitions. rootcause link -a can be run by any user. At this point you are ready to begin analyzing and debugging your application with RootCause. The remaining steps describe how to return the application to its original state and should be performed if RootCause is uninstalled.

  6. Turn off rootcause tracing with
       rootcause link -Z

The symbolic links will remain in place, but the application will not be run under RootCause. rootcause link -Z can be run by any user.

  7. Remove symbolic links with
       rootcause link -D

rootcause link -D requires write access to the directory where the application to be traced is installed (same as -K). This will restore your applications to their original state, where they will run completely independently of any component of the RootCause toolset.

9.7 How can I dump Java objects with a probe on a known program point, rather than at a certain elapsed time as done by java_memstat?

The java_memstat probe is built on top of another probe called libapjvmpi, which is an interface to the Java JVMPI library and takes care of a bunch of the low-level work. One of the things it provides is a mechanism to take a heap dump. Working with the interface requires getting a dynamic pointer to the libapjvmpi interface and then using that. For instance:


 #include "libapjvmpi.h"
 
 static apjvmpi_InterfacePtrT JvmpiInterface = NULL;
 static apjvmpi_InterfaceHandlePtrT JvmpiHandle = NULL;
 
 void InitializeUal_early_heapdump (void)
 {
    // Load the jvmpi interface UAL
    if (ap_IsNoUalId (ap_LoadAndInitializeUal (LIBAPJVMPI_LIBRARY_NAME)))
    {
       ap_Error (ap_WarningSev,
                 "Unable to load "LIBAPJVMPI_LIBRARY_NAME"\n");
    }
 }
 
 probe program
 {
    on_entry
    {
       JvmpiInterface = apjvmpi_Initialize;
 
       if (JvmpiInterface == NULL)
       {
          ap_Error (ap_WarningSev,
                    "Unable to initialize JVM support for\n"
                    "Java object tracking.");
          return;
       }
 
       // Get an interface handle
       JvmpiHandle = JvmpiInterface->Initialize (3);
       if (JvmpiHandle == NULL)
       {
          ap_Error (ap_WarningSev,
                    "Unable to get a necessary interface for "
                    "Java object\n"
                     "    tracking. It requires interface version 3 but the "
                    "apjvmpi library\n"
                    "    is at version %d\n",
                    JvmpiInterface->GetVersion ());
          JvmpiInterface = NULL;
          return;
       }
    }
 }
 

To take the heap dump, you need a probe at the program point of interest that calls the heap dump routine:


// Request a heap dump. Keep the last n heap dumps specified - note that
// if there is already a larger count set, that value is retained.
// void (*RequestHeapDump) (apjvmpi_InterfaceHandlePtrT Handle,
//                          int                         RetainHeapDumpCount);

   {
      // Keep 3 dumps
      JvmpiInterface->RequestHeapDump (JvmpiHandle, 3);
   }

You'll need java_memstat around to format the object dump(s).

10. RootCause J2EE Support

RootCause J2EE support has been discontinued with the introduction of OC Systems "RTI Enterprise" product. See http://rtiperformance.com.

11. RootCause TroubleShooting

11.1 I applied a Trace on a function (method) in the RootCause GUI, but I don't see it being called in the output. Why?

Here are some possibilities:

  • The function was called so often that it was load shed, and calls stopped being recorded. Click on the LOAD_SHED node at the end of your Trace Display, choose Show Associated Table, and look for your function there. Using the option-menu in the first column, you can designate the function as Do Not Shed for subsequent runs.
  • The function was called, but it's not shown in the data file you're viewing. Use Add Data Files to Display in the File menu to add earlier files. If you still don't find it, then the data containing the last call may have been overwritten (i.e., the "trace buffer wrapped around"). You can save all data files containing the trace of a function by adding a SNAPSHOT probe ON_ENTRY to the function in the Trace Setup dialog.
  • There are multiple instances of the method in different classes, and you chose the wrong one. Use Find in Trace Setup and set traces on others that occur.

The following possibilities apply only to native (C/C++) functions:

  • The function that's really being traced is in a different module. For example open() in libc.so instead of your application module. Use Find in trace setup and set traces in all modules where your function appears.
  • You did "Trace All In" which generates a wildcard, but the function was one of those that's not traced as part of a wildcard because it requires an expensive "trap" patch. Return to Trace Setup and force an explicit trace on this function by adding a "probe" on entry.
  • The function was optimized and so "inlined" at the point of call. If there really is no call, the function can't be traced.
  • The function cannot be traced. There are a few functions that because of the way they're coded simply cannot be probed. To test for this, go to the command-line and type:
  apinfo -sa -x your_application.exe | grep "your_missing_function"
    If you see your missing function in the output, it cannot be instrumented. Contact OC Systems to find out why.

11.2 I add a library as a dynamic module and trace the init function, but the trace doesn't show up. Why?

When you add a module as a dynamic dll, this forces it to be preloaded (loaded before program start rather than at the point of the dlopen() / LoadLibrary() ). This means that the _init() function is called before _start of your main application, which is before probes have been applied.

11.3 I Add Dynamic Module of mylib.so, then specify some traces in mylib.so. But when I run the program, those traces don't appear. Why?

You may be loading a different instance of the library at runtime than you specified to Add Dynamic Module. This may be the case if LD_LIBRARY_PATH (or LIBPATH on AIX) is set. Make sure that the full path to mylib.so you've added to your workspace is the same as the one that will be loaded at runtime.
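One quick way to see which copy of a library comes first on the search path is to walk the path yourself. The sketch below is an approximation only: first_on_path is a hypothetical helper, and it looks only at the colon-separated path you give it (by default $LD_LIBRARY_PATH), whereas the real dynamic loader may also consult other locations.

```shell
#!/bin/sh
# first_on_path LIBNAME [SEARCH_PATH]
# Hypothetical helper: prints the first file named LIBNAME found in the
# colon-separated SEARCH_PATH (default: $LD_LIBRARY_PATH).
# Runs in a subshell so the IFS and globbing changes do not leak out.
first_on_path() (
    lib=$1
    set -f          # do not glob-expand directory names
    IFS=:
    for dir in ${2-$LD_LIBRARY_PATH}; do
        if [ -n "$dir" ] && [ -f "$dir/$lib" ]; then
            printf '%s\n' "$dir/$lib"
            exit 0
        fi
    done
    exit 1
)

# Compare the result with the path you gave to Add Dynamic Module.
first_on_path mylib.so || echo "mylib.so not found on LD_LIBRARY_PATH"
```

On AIX you would pass "$LIBPATH" as the second argument. If the path printed differs from the one in your workspace, the traces were set on a different copy of the library.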

11.4 I did Custom..., and saved my probes to an APC file, but those probes don't show up in my trace. Why?

Make sure the "Add to Custom APC Files" checkbox is checked. If you've already got an APC file, make sure the Append checkbox is checked as well. Also, see [#q11.1 Q11.1] .

11.5 How do I stop tracing something I've got a workspace for?

You need to delete it from the registry. The easiest way to do this is with the GUI:

  • Open the workspace in the RootCause Console GUI
  • In the RootCause main window, click Unregister Program in the Workspace menu.

From the command-line, do rootcause register -d -c class_name to unregister a Java main class.

To unregister a native program, first do rootcause register -l to see the exact path of the program that is registered, then do rootcause register -d -x exe_path.

11.6 What do I do about the message "(E) ADI checksum (0x84b1c4d) does not match module checksum (0xa1c5e35)." when I register on a .dply file at a remote site?

This message will be followed by specific information about the ADI file and module. The module is the executable or shared library on the remote machine, and the ADI file contains the debug information from the host machine where the workspace was developed.

The error message indicates that the version of the module (application) on the remote machine does not match the version against which you developed your original traces.

You must create the workspace and traces against the same version you send to the remote site because we compare checksums.
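
To confirm independently that the two machines hold different builds, you can checksum the module on each machine and compare the results. This is a minimal sketch using the POSIX cksum utility. It is not the checksum that RootCause stores in the ADI file, so the values will not match those in the error message. The function name module_sum and the module path are hypothetical.

```shell
#!/bin/sh
# module_sum FILE
# Print "<crc>-<size in bytes>" for FILE using POSIX cksum.
# Reading from stdin keeps the file name out of the output.
module_sum() {
    cksum < "$1" | awk '{ print $1 "-" $2 }'
}

# Hypothetical module path. Run this on both the development host and the
# remote machine, then compare the two printed values.
module=/opt/myapp/bin/myapp
if [ -f "$module" ]; then
    module_sum "$module"
else
    echo "no such module: $module" >&2
fi
```

Different values on the two machines mean the binaries differ, which is consistent with the ADI checksum error.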

11.7 Why does my Java app fail with "Class Not Found" under RootCause, but work fine without RootCause?

The most likely cause of this is that you're using the "-jar" option on your 'java' command, which is not supported by RootCause prior to version 2.1.2 (October 2003).

So, if your application is run with:

java -jar $APROBE/lib/probeit.jar

you could run it instead with:

java -classpath $APROBE/lib/probeit.jar com.ocsystems.probeit.Main

If you don't know what the main class is, it is defined in the manifest of the .jar file. For instance:


mkdir tmp
cd tmp
jar -xf $APROBE/lib/probeit.jar META-INF/MANIFEST.MF
grep "Main-Class" META-INF/MANIFEST.MF
# The grep prints the line "Main-Class: com.ocsystems.probeit.Main".
cd ..
rm -rf tmp

You would do the same thing using your own java command line and jar file in place of the above.

After you have changed the command line, you should then re-run the application and go through the "New Workspace" steps. This time it should work fine.

If this is too much of a hassle, contact support@ocsystems.com about getting a version with -jar support. If you weren't using -jar, or if the problem persists after going through the above process, also contact OC Systems support and we can help you debug it.

11.8 How can I probe Java classes loaded with a custom class loader and so not in the CLASSPATH?

You will find that when you use "Open Associated Workspace" it imports only the jars in the class path, so other classes that might be explicitly loaded do not appear in the Trace Setup. This can be easily remedied.

So long as the class loader follows the standard model for class loader inheritance (e.g. classes loaded by that loader have visibility to classes loaded by the application class loader) this is trivial:

  1. From the Main Workspace menu choose the Setup->Class Path menu item to bring up the Class Path dialog.
  2. In the Class Path dialog, add the path(s) to the class directories or jar files you will be loading from. Note that this does not have to be where they will be loaded from at runtime. This just gets them into the Trace Setup.

If there is no physical representation of the class available, you can use wildcards:

  1. Select the Root Java Module in the Trace Setup;
  2. Right click to bring up the context menu;
  3. Choose Edit Wildcards to open the Edit Wildcards dialog.
  4. On the left "Trace" side of the dialog, enter strings like:
    "MyClass::*"
    "MyClass::aMethod"

11.9 When I have "rootcause on" I sometimes notice that commands piped together (for instance "env|grep MyVariable") can hang for a while before completing. Why is this?

Your home directory (which will be the default disk for the rootcause log) is probably on an NFS disk. When two processes try to lock a file at the same time, one will be halted until the other one is done. However, with NFS it can take a while for the state of the unlock to propagate back, leaving the caller waiting on the lock routine even though the other process has unlocked it. The solution is to set APROBE_HOME to a local disk.
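
To check whether your home directory is NFS-mounted before changing APROBE_HOME, you can look at the device that df reports for it. The following is a rough sketch and not part of RootCause: the function names are hypothetical, and the "host:/path" test is only a heuristic (other network filesystems can use the same device form).

```shell
#!/bin/sh
# device_of DIR
# Print the device (first column) that "df -P" reports for DIR.
device_of() {
    df -P "$1" | awk 'NR == 2 { print $1 }'
}

# is_nfs_device DEVICE
# Succeed if DEVICE has the "host:/path" form typically used for NFS mounts.
is_nfs_device() {
    case $1 in
        *:/*) return 0 ;;
        *)    return 1 ;;
    esac
}

if is_nfs_device "$(device_of "${HOME:-.}")"; then
    echo "HOME looks NFS-mounted; set APROBE_HOME to a directory on a local disk"
else
    echo "HOME does not look NFS-mounted"
fi
```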

11.10 When I add my library to the workspace with Add Dynamic Module and run with RootCause, my application never starts. What's wrong and how can I fix it?

Add Dynamic Module causes a library to be "preloaded" (using the aprobe -dll option) because it's only on program startup that automatic trace configuration can be done. However, some user libraries cannot be preloaded because they rely on some global state being defined which isn't done until the program starts running.

This means you can't trace or do anything else with this module. You're beat unless you can change the library to allow it to be preloaded.

11.11 Is there a way to add my own files to a deploy file so they will unpack into the directory created by rootcause register xxx.dply?

A .dply file is just a zip file. You can use zip (provided with RootCause) to add files to this archive, for example:
   zip xxx.dply this.txt that.class other.ual

11.12 Why doesn't the pi_demo program run on my new Linux version?

Because it was built on an old version of Linux. You can rebuild it from source using the Makefile in that directory, or else load the compatibility package for Fedora: compat-libstdc++-*.i386.rpm.

11.13 Why didn't my trace on Linux log any data?

If your Workspace is being accessed over NFS, this means you're writing the data to APD files over NFS, and Linux has known bugs with this. You really need to have your workspace/APD files on a locally-mounted disk. (Even if it weren't for this bug, logging over NFS is orders of magnitude slower.)