RootCause/Aprobe FAQ

Frequently Asked Questions for RootCause and Aprobe (All Platforms)
Updated February 1, 2007

This document describes aspects of the products "RootCause" and "Aprobe" from OC Systems, Inc. (www.ocsystems.com).

It consists of questions asked by evaluators and customers, as well as "artificial" questions intended to provide an introduction to the use of the products.

More complete and detailed descriptions of RootCause and Aprobe are provided by the User's Guides for those products, but this FAQ may provide answers not easily found there, and also includes specific code examples not applicable to a general User's Guide.

RootCause is built on Aprobe, a fully general mechanism for applying patches to programs without changing source or object code. See "What is Aprobe?" for more information.

Users are encouraged to send questions (and answers!) to .

This FAQ is Copyright (c) 2007 by OC Systems, Inc. ALL RIGHTS RESERVED.

Note to Windows Users:

This FAQ applies to all platforms, and some answers apply only to specific platforms, so read carefully. To avoid excessive repetition, the Unix form of a command or path is used where it may apply to multiple targets. For example, paths to files are given in Unix format using forward slashes, environment variables use Unix format, and Windows users should read .dll where filenames end in .ual (see Q12.23).
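
As a rough illustration of the convention described above, the following sketch rewrites one Unix-style reference the way a Windows reader would read it. The variable WORK and the file myprobe.ual are hypothetical names chosen only for this example, and treating "environment variables use Unix format" as $NAME versus %NAME% is an assumption.

```shell
# Hypothetical Unix-style path as it might appear in an answer.
unix_path='$WORK/probes/myprobe.ual'

# Rewrite it for Windows: $NAME -> %NAME%, "/" -> "\", and .ual -> .dll.
windows_path=$(printf '%s\n' "$unix_path" |
  sed -e 's/\$\([A-Za-z_][A-Za-z0-9_]*\)/%\1%/g' \
      -e 's,/,\\,g' \
      -e 's/\.ual$/.dll/')

printf '%s\n' "$windows_path"
```

The translation is purely textual; it is meant as a reading aid, not as a tool shipped with either product.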

1. RootCause FAQ


1.1 What is RootCause?
1.2 What are some potential uses of RootCause?
1.3 How do I get started quickly with RootCause?
1.4 Who can use RootCause?
1.5 For which platforms is RootCause available?
1.6 How do I get technical support?
1.7 Do I really need a C compiler to use RootCause?
1.8 What documentation is available for RootCause?
1.9 How is RootCause licensed?
1.10 In what language(s) can my program be written?
1.11 What compiler(s) must have been used to compile my program?
1.12 Do I need to build the program with debug to trace it?
1.13 What do these terms mean: probes, console, agent, logging, etc.?
1.14 What about gcc/g++ 3.x (and GNAT 5.x) support?
1.15 Is there any way to attach with RootCause to a running application?
1.16 Why should I update to the current version of RootCause?
1.17 What Java (JVM/JRE) versions are supported for use with RootCause?

2. Installation


2.1 On Windows, what does the prompt "Is this an Agent Installation?" mean?
2.2 On Unix, why does install_rootcause offer to install in a directory called "aprobe"?
2.3 On Unix, I get prompted to specify whether I'm using Java or C++: why do you care?
2.4 When the installation prompts for a compiler, does it want the one that builds my application?
2.5 The installation process prompts me for a license key, but I don't have one right now; can I continue?
2.6 The installation prompts me for a single-line license key, but the one I have consists of several lines; do I just paste it in there?
2.7 Where do I find ksh (Korn shell) for RedHat Linux?

3. The RootCause Console (GUI)


3.1 On Unix, why do I get a bunch of warnings about fonts when I do rootcause open?
3.2 Why doesn't copy/paste of text fields work in the RootCause Console?
3.3 How can I see the whole context menu when I click the right mouse button (MB3) on something at the bottom of the screen?
3.4 Can I just use my Web Browser instead of the built-in Help Viewer?
3.5 Can I run the RootCause GUI on Windows to view data collected on my Unix system?
3.6 Unix: The RootCause GUI is just about unusable with my eXceed and Reflection X Windows emulator. What can I do?
3.7 Is it possible to monitor a Java program without entering the classpath, working directory, etc. that the New Workspace dialog prompts for?

4. The RootCause Log


4.1 Can I trace any and all of the executables that I see in the log? Are there some restrictions?
4.2 Why do I see two identical copies of a program in the RootCause Log?
4.3 Why don't I see the program I want to trace listed in the RootCause log?
4.4 I ran only one application with rootcause on, and I see about a dozen processes in the RootCause log; where did they come from?
4.5 Can I cause only APP_TRACED events to show up in the RootCause Log?
4.6 How do I clear the RootCause log?
4.7 Does the RootCause log wrap around? If so, how do I set the wraparound size?
4.8 Can I locate my .rootcause directory somewhere other than HOME (or USERPROFILE)?
4.9 Is there a way to keep the RootCause Log window from appearing when I start rootcause?

5. The Workspace Window


5.1 Should I say Yes or No to the "Application is not registered with workspace" dialog?
5.2 What does the blue dot mean in the Predefined UALs part of the Workspace Tree?
5.3 Where do I find out about the Predefined UALs listed here?

6. The Trace Setup Dialog


6.1 What does <Unknown File> mean in the Trace Setup tree?
6.2 What do the black and blue dots mean in the Trace Setup tree?
6.3 How do I trace a dynamically loaded shared library (DLL)?
6.4 What's the difference between "Don't Trace..." and "Remove Probes..."?
6.5 I've got a UAL that I compiled with the apc command -- how do I get that into RootCause?
6.6 Why don't I see all the symbols shown by "apinfo" or "apcgen -L" in the Trace Setup window?
6.7 I define APROBE_SEARCH_PATH to include my source location, but the RC GUI still isn't finding my source. Why?
6.8 How can I see and dump parameters for C functions for which there are symbols but no debug information?
6.9 How can I turn on trace just when I'm in a chosen method or function?
6.10 How can I enable my custom probe only when Trace is also enabled?
6.11 I notice "Disable Tracing" does not affect the "exception" predefined probe. How can I disable that as well?
6.12 How can I trace and time everything between point A and point B?
6.13 How can I allow all Java parameters to be traced?

7. The Trace Display (Event) Dialog


7.1 Why are some functions found in the traced Events not found in the Trace Setup?
7.2 Why are some Java methods found in the traced Events not found in the Trace Setup?
7.3 RootCause keeps asking to find a source file. Is there a way to just point to this once without specifying the path to every file we wish to view?
7.4 The trace shows a problem in third-party software; what's the best way to pass this along to them?
7.5 Unix: RootCause shows signal 11 during my Java application run, but there was no crash. Is this a valid signal 11?
7.6 When I trace a Java synchronized method, does the method time include lock delay time?
7.7 Why was malloc() listed as being LOAD_SHED in the Trace Display when it really wasn't?
7.8 When formatting my data, an error pops up saying, "The maximum event tree size ... has been reached." What do I do?
7.9 I see that I can do "Save As XML": can I view this XML later?
7.10 How can I see just the major time-consuming children of nodes in the Trace Events Summary tree?
7.11 Do the times shown in Trace Events reflect the aprobe overhead?
7.12 How do I know what overhead to specify in the Set Statistics Overhead dialog?
7.13 What are the various times I'm seeing in the details pane for Enter and Exit nodes?
7.14 What are the various times and percentages I'm seeing in the Details panes on nodes in the Event Summary tree?
7.15 Is there a way to save the text for a specific node in the Trace Events tree?
7.16 Can I copy a Trace Events node to the clipboard to be pasted elsewhere?
7.17 I know my method was executed many times, so why isn't it in the Performance Summary table?

8. RootCause and Aprobe


8.1 How do I adjust the Trace "DefaultLevels" option so only a fixed depth is traced when an application is run with RootCause?
8.2 How can I use Aprobe's predefined probes (profile, coverage, events, memwatch, statprof) with RootCause?
8.3 Is it possible to develop in Aprobe, but still use the RootCause "intercept" mechanism?
8.4 If RootCause is built on Aprobe, and RootCause supports Java, is there an Aprobe for Java?
8.5 How do I add my own UAL to the RootCause trace?
8.6 How can I use the Events probe with RootCause?

9. RootCause at Run Time


9.1 Can I just leave RootCause "on" all the time? For example, while I power down and power up my computer (Windows XP)? I was thinking that it would be interesting to see all the processes as my computer boots.
9.2 How much will RootCause slow my application?
9.3 On Solaris, why do I get "illegal insecure pathname" with RootCause on?
9.4 Solaris: Why do I get warnings from running `ps' when rootcause is enabled, and how can I fix it?
9.5 Solaris: How do I find out the full path of dynamic modules loaded?
9.6 How can I trace Linux daemons with RootCause?
9.7 Solaris: How do I apply RootCause to applications run at boot time?
9.8 Windows: Can RootCause trace System Services which are automatically started when the machine is booted?
9.9 Windows: Why are there no APP_START events in the RootCause Log for System Services?
9.10 Windows: I created a WorkSpace and defined a trace for a System Service that is automatically started each time my machine is booted. However, I don't see an APP_TRACED event in the RootCause Log when I reboot. Why?
9.11 Windows: I've defined a trace for a System Service, but each time the Service starts it is not traced. Instead, I see a TEXT event in the RootCause log reporting the following apparent error: `(W) ... Cannot execute ...\do_aprobe.cmd'
9.12 Can I apply different workspaces (or none at all) for the same program invoked with different command-line parameters?
9.13 How can I "intercept" a Java server on AIX?

10. RootCause J2EE Support


10.1 What J2EE application servers are supported?
10.2 How does one create a workspace for a standalone Java app-server like WebLogic?
10.3 How does one create a workspace for a native executable app-server like iPlanet?
10.4 How do I configure RootCause with Sun iPlanet Application Server 6.5 on Solaris?
10.5 How do I configure RootCause with BEA WebLogic Application Server 6.1 on Solaris or Windows?
10.6 How do I configure RootCause with JBoss 3 on Solaris or Windows?
10.7 How do I set up for tracing Tomcat?
10.8 How do I configure an app-server not listed here?
10.9 How do I set up a workspace for "jrun" on Linux?
10.10 How do I set up RootCause for use with IBM WebSphere AppServer 4.0.5 Application running on Solaris?
10.11 How can I dump Java objects with a probe on a known program point, rather than at a certain elapsed time as done by java_memstat?

11. RootCause Troubleshooting


11.1 I applied a Trace on a function (method) in the RootCause GUI, but I don't see it being called in the output. Why?
11.2 I add a library as a dynamic module and trace the init function, but the trace doesn't show up. Why?
11.3 Solaris: I'm trying to trace file open calls, so I trace "open()" in "libc.so", but I get nothing. Why?
11.4 Solaris: I Add Dynamic Module of mylib.so, then specify some traces in mylib.so. But when I run the program, those traces don't appear. Why?
11.5 I did Custom..., and saved my probes to an APC file, but those probes don't show up in my trace. Why?
11.6 Windows: Why doesn't the DOS dir command show my workspace, pi_demo.aws?
11.7 Windows: Why can't I see RootCause Workspaces in the Windows Explorer?
11.8 Windows: Something happened during Uninstalling the previous version, now I can't install the new one. What do I do?
11.9 How do I stop tracing something I've got a workspace for?
11.10 What do I do about the message "(E) ADI checksum (0x84b1c4d) does not match module checksum (0xa1c5e35)." when I register on a .dply file at a remote site?
11.11 (Windows) Why won't the "RootCause On" button stay checked in the GUI?
11.12 Why does my Java app fail with "Class Not Found" under RootCause, but work fine without RootCause?
11.13 How can I probe Java classes loaded with a custom class loader and so not in the CLASSPATH?
11.14 When I have "rootcause on" I sometimes notice that commands piped together (for instance "env|grep MyVariable") can hang for a while before completing. Why is this?
11.15 (Windows) Trying to apply RootCause to a service, I get a MessageBox (after a reboot) saying there was a timeout and the Service failed to respond. Why?
11.16 When I add my library to the workspace with Add Dynamic Module and run with RootCause, my application never starts. What's wrong and how can I fix it?
11.17 (Windows) The APC compiler fails on the giant APC file generated with apcgen. Now what?
11.18 (Windows) Why does our application crash due to a bad return code from CoInitializeSecurity() when running under RootCause?
11.19 Is there a way to add my own files to a deploy file so they will unpack into the directory created by rootcause register xxx.dply?
11.20 Why doesn't the pi_demo program run on Linux Fedora Core 3?
11.21 Why didn't my trace on Linux log any data?
11.22 How can I eliminate "WARNING: Could not create system preferences directory" when I start the RootCause GUI?

12. Aprobe FAQ


12.1 What is Aprobe?
12.2 What is ProbePak?
12.3 What are some potential uses of Aprobe?
12.4 How do I get started quickly with Aprobe?
12.5 Who can use Aprobe?
12.6 What different versions of Aprobe are there?
12.7 For which platforms is Aprobe available?
12.8 How do I get Aprobe?
12.9 What documentation is available for Aprobe?
12.10 What tools make up Aprobe?
12.11 How is Aprobe licensed?
12.12 Is there a point-and-click (GUI) interface to Aprobe?
12.13 Can I run Aprobe on any executable program file?
12.14 In what language(s) can my program be written?
12.15 What compiler(s) must have been used to compile my program?
12.16 (Unix) How do I tell if a program file is "stripped"?
12.17 How do I tell what symbols a program has available?
12.18 What do I do to get symbols in my program?
12.19 What do I do to get "debug information" in my program?
12.20 How do I tell if a program file has "debug information"?
12.21 What is a "probe"?
12.22 What is a "UAL" (.ual file)?
12.23 Why does a Windows UAL file have a file extension of ".dll"?
12.24 What is "logging"?
12.25 What is an ".apd" file?
12.26 What can't I do if my executable or library doesn't have debug information?
12.27 Does use of on_line() require the application to have debug information?
12.28 What is the maximum number of probes allowed?
12.29 Is there access to C++ private/protected variables?
12.30 Is there any way to attach with Aprobe to a running application?
12.31 Is there a way to probe a function for which no symbol is available?

13. Using the "aprobe" Command


13.1 What does "aprobe" do?
13.2 How do I specify options to my program when using aprobe?
13.3 How do I specify options to my probes?
13.4 How do I print my output at run time instead of sending to the APD file?
13.5 Can I suppress generating an ".apd" file?
13.6 How can I run my probes without invoking aprobe?
13.7 Windows: When I run my program with aprobe I don't get any output even though I know I'm executing routines with probes on them and those probes use printf. What's going on?
13.8 Windows: When I use the -o switch to redirect the output of my program to a file(s), the output seems to be out of order. Why?
13.9 Unix: How do I probe a function in a dynamically-loaded shared library?
13.10 Can I probe a function in native C or C++ code loaded by a Java application?
13.11 Is there a way I can use Aprobe in a target environment where my application has no symbol or debug information with it (is stripped)?
13.12 Can I run aprobe but produce no APD files?
13.13 Why does my program crash when using aprobe, and not without?
13.14 AIX: Aprobe version 3.2 had the -s1 option to prevent conflicts with my application's shared memory. Is there a similar feature in version 4.2?
13.15 Why does Aprobe ask for such a large memory-mapped file on startup, when I've specified only a 4M APD file with "-s"?
13.16 On Solaris and/or Linux, when I run my application under Aprobe it crashes during initialization with a problem in malloc. This doesn't happen without Aprobe. Why?
13.17 Why can't RootCause see program symbols on a system using Visual Studio .NET 2003?

14. Using the "apformat" Command


14.1 What does apformat do?
14.2 Which of the ".apd" files do I specify on the command-line?
14.3 Can I restrict the apformat output to just that generated by one of the several UALs provided at aprobe time?
14.4 Can I restrict the apformat output to just that generated by one or two of my format routines?
14.5 Can I programmatically filter which formats are used?
14.6 Can I do the previous 2 if I'm using automatically generated formats?
14.7 When do I need to specify the UAL file to apformat?
14.8 Can I use "apformat" without an APD file?
14.9 Aprobe works fine, but I get a crash from apformat; why?
14.10 Can I use ap_UalArgv in "probe format ... on_entry" to get arguments passed at run-time (aprobe time)?

15. Using Predefined Probes


15.1 What is a predefined probe?
15.2 Do I have to use "apc" to build these probes myself?
15.3 The examples show invocation of predefined probes using aprobe -u info myprog.exe. How does aprobe find these UALs when they're not in the current directory?
15.4 Can I use Coverage without using the Java configuration GUI?
15.5 The trace probe really slows down the program--how can I speed it up?
15.6 Unix: How can I get a snapshot of my predefined probe data before my program dumps core?
15.7 Is there a way to invoke predefined probe operations from within my probes?
15.8 How can my probes use the Java GUI facilities that the predefined probes use?
15.9 I'd like to customize a predefined probe -- how do I rebuild it?
15.10 How do I use the coverage probe with multiple test cases?
15.11 Where did the "heap" probe go?
15.12 How do I use this "events" probe everyone's talking about?
15.13 In the `profile' probe, what do "Calls to Self/Child" columns mean?
15.14 Why don't memstat, memwatch, heap probes work on my application?
15.15 Can you please explain the fields "Alloc Count" and "Free Count" in the memstat "Outstanding Allocation" report?
15.16 Can I use memstat to track all allocations and frees?
15.17 Is there a way to only report allocations in a certain module based on the stack traceback entries?
15.18 Is there a predefined probe for detecting memory corruption?
15.19 Is there a predefined probe for tracking down lock contentions?
15.20 What options in the trace.cfg file are obsolete, and why?
15.21 Why does the memstat summary file say it can't do the analysis because I only have one sample?
15.22 Could you explain the memstat summary's "Leaked Memory" and "Total Leakage" values?
15.23 How can I define a memstat (or memwatch) filter matching any number of call levels?
15.24 Is there a probe to check for stack corruption?

16. Using the "apc" Command


16.1 What does apc do?
16.2 How do I indicate what C compiler and options apc should use?
16.3 Do I need to specify an object file or executable to apc?
16.4 How do I specify other object files to link into my UAL?
16.5 apc says my function name's not known--why not?
16.6 Solaris: Where can I download a good gcc installation to use with RootCause?
16.7 How do I generate debug information for my APC files so line and function information show up in tracebacks?

17. Writing Probes in APC


17.1 How do I use "apcgen" to generate a probe automatically?
17.2 How do I write a "probe"?
17.3 What is the difference between APC and straight C?
17.4 Why do I need a "probe thread"?
17.5 What's the difference between "probe thread" and "probe program"?
17.6 When exactly are the "on_entry" and "on_exit" parts of a function probe executed?
17.7 Why can't I dump some parameters in the on_exit part?
17.8 Why is my local variable "unknown" in on_entry and on_exit parts?
17.9 Is there a way to probe "the first line" or "the last line" in my function?
17.10 How do I specify which of several overloaded functions I want to probe?
17.11 How do I reference a hardware register?
17.12 How do I query the parameters to a function?
17.13 Can I use automatic formatting if I don't have an executable with debug information?
17.14 How do I change the return value from a function?
17.15 How do I log the value of a string parameter?
17.16 How do I log the contents of an array?
17.17 Solaris: I get a compile error when I write "a[0..4]", but it seems to work; why?
17.18 How do I "stub out" the probed function so it does nothing?
17.19 How do I query the data in a class when probing a member function?
17.20 How do I query a global (or static) variable when there's a local one of the same name?
17.21 Can I reference a static variable that wouldn't normally be visible to my probed function?
17.22 Can I call a function in my program from within a probe?
17.23 Windows: Can I call a Visual C++ method from a probe?
17.24 Can my APC files reference names in one another like a C program?
17.25 Can I call a function in another UAL?
17.26 How do I change the return code from my Unix program?
17.27 How do I print or change a GNAT Ada string value in my probe?
17.28 How can I just log some data and format it as hex?
17.29 How do I log information about each thread as it starts?
17.30 GNAT turns SIGSEGV into CONSTRAINT_ERROR; can I use Aprobe to get a core dump?
17.31 How can I get Aprobe actions to happen when my program dumps core?
17.32 Is there a way to find out where a signal occurs when it doesn't cause a core dump?
17.33 How can I reduce the overhead of my probes?
17.34 Can I use Solaris Aprobe on JOVIAL programs?
17.35 How can I log a composite object without using debug information?
17.36 How can I cast a value to a type name from the program?
17.37 Is there a special editor or editor mode for APC?
17.38 How do I execute a probe only if a certain data condition is met?
17.39 How can I interactively modify the parameters to a routine in my application?
17.40 I'm trying to stub a function called by my program, but APC can't seem to find it.
17.41 Using Solaris GNAT, I want to send a signal to the program to control my probes. But the signal seems to get lost. Why?
17.42 I only want to probe malloc() if it's called by realloc(). How would I do that?
17.43 I have a GNAT Ada procedure that I'm stubbing out, but want to return a string value. The procedure has a declaration similar to the one below. What's the APC?
17.44 Is there a simple probe that just traces the lines in one routine?
17.45 How do I reference enumeration literals in APC?
17.46 Why does including <math.h> in my APC keep it from compiling? (I want to call the "pow()" function in my probe.)
17.47 Windows: How do I probe a function in a dynamically-loaded DLL?
17.48 How do I query an environment variable from within a probe?
17.49 The above looks like a useful utility. How can I structure my probes so it can be shared?
17.50 Can I define functions in one APC file and call them from another APC file?
17.51 I am trying to write a probe that will call an Ada routine in a package body, but the routine never seems to get called. Why?
17.52 How can I log a string passed to a library function like strdup() where there's no debug information?
17.53 Can I use Aprobe to change the command run by a call to system() from my application to run my own little script instead?
17.54 Is there a way to catch and suppress exceptions?
17.55 I'd like to probe routines in the Windows sockets DLLs. Any issues I should be aware of?
17.56 Can I track stack usage with Aprobe?
17.57 Is there a way to access local variables that doesn't depend on a hard-coded line number?
17.58 Can I use Aprobe to query a caller's local data that wouldn't be visible by normal visibility rules?
17.59 In APC I can reference some class members as fields of class objects, but others I cannot. Why?
17.60 How can I enable and disable probes externally while my program runs?
17.61 AIX: How do I convert my pre-version-3 APC file to current one?
17.62 (Unix) Is there a probe to see when my application "exec's" another program?
17.63 How can I cast an enumeration value to print its numeric value?
17.64 Is there a probe that will print a static call tree of my executable?
17.65 How can I detect memory overwrites on dynamically allocated (malloc'd) memory?
17.66 (Unix) How do I know when my application has forked?
17.67 How do I know what lines I can probe in a function?
17.68 (Windows) How can I track page faults using Aprobe/RootCause?
17.69 Is there a routine available to find symbol ids by mangled name, or one that will demangle for us?
17.70 Is there a way to suppress (or force) the warning when probing a symbol that is undefined?
17.71 Unix: Can I call a C++ method from a probe?
17.72 How do I print/change a C++ std::string object?

18. Writing Java Probes


18.1 How do I use Aprobe on a Java application?
18.2 Can I change the return value of a Java function?
18.3 Can I throw an arbitrary Java exception from my probe?
18.4 When using a Java custom probe, can I get output to appear in the Trace Display tree?
18.5 Is it possible to "stub" a Java method so it does not execute the code in the original method?
18.6 Is there any way to probe classes from rt.jar, e.g., java.io.*?
18.7 How do I call another method in the same class instance from within my Java method probe?
18.8 Can I add custom Java probes within the RootCause GUI?
18.9 (Windows) How would I trace a Java applet running with Internet Explorer (IEXPLORE process)?
18.10 Can I change the value of parameters passed to a Java method?
18.11 Can I log any Java variables other than method parameters?
18.12 Is there a way to define nested probes in Java similar to that supported in APC?

19. Logging Data


19.1 What's the difference between "logging" and "printing"?
19.2 Why do I get data mismatch warnings logging to my very simple format routine?
19.3 Why do my format routine parameters (usually) have to be pointers to the type logged?
19.4 How can I control the size of the APD file produced?
19.5 What is an "APD ring"?
19.6 How can I control what goes into each APD file?
19.7 How can I reduce the time that is spent logging data in my probes?
19.8 How can I log data so it's guaranteed to be available when I format, even if the APD ring wraps around?

20. Other Aprobe Questions


20.1 Where does aprobe get its "time" from (e.g., for the profile probe)?
20.2 Why do my threads execute in different order under aprobe?
20.3 It looks like if I run "aprobe -if", both the probe program and probe format get executed, which messes up initialization. How can I avoid this?
20.4 Solaris: I have a probe on_exit to a function to change the struct that is returned. It causes a core dump when the probed function is called as a procedure. What's the problem?
20.5 Windows: There is a parameter in a method call which is passed by reference. It is modified by the method and I want to see what it is on exit. Aprobe doesn't allow this, saying that parameters are visible only on entry. Is there a way to see how this value gets modified?
20.6 I want to capture the address of a target expression on entry in a pointer to the right target type. How do I declare this?
20.7 I want to probe a method in a template class. How do I refer to the method in the function probe on that method?
20.8 When I trace all the functions in my (Windows) DLL some functions appear to be entered twice, once with a name that has the string "?0" appended to it and once with the name I think it should have. What is going on?
20.9 Solaris: I'd like my probe to call a little C++ function which creates an object and invokes a method with that object. Can I do this?
20.10 Solaris: I use pathmap to tell dbx where to find my object & source files. How do I tell Aprobe where to find them?
20.11 In what order do separate probes on the same function execute?
20.12 Is it possible to reference C++ files from my application from within my UAL?
20.13 Solaris: Can I build a UAL with unresolved references?
20.14 How do I log multi-dimensional Ada arrays?
20.15 AIX: Why isn't my UAL world-readable?
20.16 AIX: When I use pthreads calls in my probes, the UAL won't link. Do I need to explicitly specify the library or change my compiler_profiles file?
20.17 Is there a way I can manage thread-specific data without using native thread-management routines?
20.18 How does using Aprobe for C++ differ from using Aprobe for C or Ada?
20.19 Why does my C++ application crash when run with Aprobe?
20.20 (AIX) My application + aprobe or its tools runs out of memory. What can I do?
20.21 My application + aprobe or its tools is very slow starting up. What can I do?
20.22 (AIX) Why is the C++ exception raised in my libxml++-1.0.a library not reported by exceptions.ual?
20.23 Why don't my on_line probes work?
20.24 How do I probe a C++ application's CPU usage?
20.25 How do I probe a C++ application's memory usage?
20.26 How can I interactively debug an application in real time?
20.27 How do I get the size of my "std::list<std::string>" object generated by g++?
20.28 What do I do if my program dumps core when run with Aprobe?

21. Licensing


21.1 What do we do with a license key that looks like "ocs-Aprobe-48833..."?
21.2 What do we do with a license key that looks like "FEATURE ..."?
21.3 Unix: How do I start a second license server just for Aprobe?
21.4 AIX: How do I start lmgrd when the machine boots?


1. RootCause FAQ

1.1 What is RootCause?

RootCause is a tool for developing and deploying traces that act as a software "flight recorder", simplifying and speeding root cause analysis, as well as proactively monitoring the health and performance of the application. It can also be used to repair applications in the operational environment without rebuilding or reinstalling the software. RootCause is based on Aprobe (see "What is Aprobe?") but steps beyond Aprobe in a number of important ways:

This FAQ addresses questions that apply to these aspects of RootCause. The full power of Aprobe is delivered with RootCause, and is addressed by the Aprobe FAQ.

See also "What is Aprobe?".

1.2 What are some potential uses of RootCause?

It's a long list. Here are just some of the uses of RootCause:

For a more in-depth discussion of some of these, see the RootCause white papers.

1.3 How do I get started quickly with RootCause?

Work through the demos in Chapter 5 of the User's Guide.

1.4 Who can use RootCause?

RootCause has several facets which apply to different classes of users. Technical support personnel will use it to gather information about a product in the field. Developers will use RootCause to develop traces that the support personnel can use, or which the developers themselves may use to track down problems. Testers might use it to gather data to provide back to developers to supplement test results.

1.5 For which platforms is RootCause available?

There is RootCause for Java and RootCause for C/C++. Support for both languages may be enabled to support mixed applications.

RootCause for Java supports tracing J2EE applications such as Sun iPlanet and AS7, BEA WebLogic, JBoss, and Tomcat applications. See "RootCause J2EE Support" for more information.

RootCause is currently available on Windows 2000; Windows XP; Sun Solaris (SPARC only); AIX version 5.1 or newer; and Red Hat Linux 7.1 or newer (x86 only). RootCause does not yet support 64-bit applications on any platform, though it _does_ support 32-bit applications running on 64-bit operating systems.

The detailed requirements are documented in Chapter 2 of the RootCause User's Guide for Unix or Windows.

1.6 How do I get technical support?

The best way is to send e-mail to , or phone 703-359-8160, extension 3. You can expect a quick response between 9am and 5pm Eastern US Time.

1.7 Do I really need a C compiler to use RootCause?

Yes, in general, but the details differ between Unix and Windows:

Unix:   Only if you want to apply probes to native code. You can trace Java and native code, and dump Java parameters, without a C compiler. However, the only thing you can do with native code is trace it; you can't dump parameters or variables or generate probes (e.g., SNAPSHOT or COMMENT) because those are implemented by generating APC source code and then compiling it with Aprobe's apc compiler, which requires a C compiler backend.

Windows:   Everything for Unix above is true for Windows, plus: (a) the compiler must be Microsoft Visual C++; and
(b) if the program was compiled with Visual C++ 6 (or Visual Basic 6) it can't even be traced, because RootCause relies on a DLL that's part of those products which we're not allowed to distribute.

Starting with version 2.1.1 of RootCause you can trace Visual C++ (VC7) programs.

For VC6 (VB6) programs RootCause needs MSVC++ to be installed to provide the (non-redistributable) mechanism to read symbol information from PDBs. Without MSVC++ installed, only symbol information stored in the executable or in DBG files can be read, plus the exported symbols.

In version 2.1.1 of RootCause an environment variable can be set to enable the use of the new mechanism to access symbol information in PDB files for VC6 (VB6) programs. Set the environment variable APROBE_USE_DIA=1 to enable this (experimental) feature.

1.8 What documentation is available for RootCause?

RootCause is delivered with a User's Guide in hardcopy, HTML, and PDF formats. The latter two softcopy forms are available for pre-sales evaluation.

1.9 How is RootCause licensed?

RootCause for C/C++, RootCause for Java, and the RootCause Agent (run-time) are licensed separately. Licensing is enforced on a per-user basis or per-CPU basis with FlexLM. Contact our sales department for more information.

If you already have a license but it's not working for you, see "Licensing" or "How do I get technical support?"

1.10 In what language(s) can my program be written?

Explicit support is provided for C, C++ and Ada. Functions written in Assembler will work to the extent that they adhere to standard calling conventions.

Functions written in other high-level languages (e.g., Basic, Fortran, Pascal, JOVIAL) may also be probed if the probe doesn't reference source-level identifiers ("target expressions"). Contact us if you have a favorite.

1.11 What compiler(s) must have been used to compile my program?

Almost any program with symbols can be probed. The "full support" described below is based on the debug information needed for source lines and target expressions. Support for additional architectures, operating systems and compilers is always in progress, so please contact us if you don't see what you need here.

Windows

Aprobe supports the Microsoft Visual C++ development system versions 6 and 7, but does not support .NET (Dynamic Runtime Model) applications.

AIX

Aprobe supports any IBM C or C++ compiler that runs on AIX 4.2 or newer. There is partial support for gcc and g++ versions 2.95.x, and for gcc versions 3.x compiled with -gstabs+. If your program is Ada, Aprobe supports OC Systems' PowerAda, and (starting with version 4.4.2) GNATPro 5.04.

Solaris

The C and C++ compilers supported are Sun WorkShop C++ compiler versions 4.2 and higher (Forte) and gcc/g++ compilers before version 3. If your program is Ada, Aprobe requires GNAT version 3.15 or 3.16.

Linux

The C and C++ compilers supported on Linux are gcc and g++ versions 2.95.x and 3.x. See also Q1.14. If your program is Ada, Aprobe supports only PowerAda on Linux and AIX. (GNAT is supported only on AIX and Solaris.)

1.12 Do I need to build the program with debug to trace it?

No, but for non-Java programs it helps. The suggested compromise is to build it with debug, develop your traces, then strip the debug information when shipping the product. This is fully discussed in Chapter 6 of RootCause for C++ User's Guide, "Building a Traceable Application".

1.13 What do these terms mean: probes, console, agent, logging, etc.?

RootCause has many unique features which require a unique terminology to describe. See the glossary in Chapter 3 of the user's guide for their definitions. Some basics are:

agent

The part of the RootCause product which actually applies and enables the probes, also known as the Aprobe runtime.

console

The Graphical User Interface (GUI) used for developing probes, and viewing the data logged by them.

log

verb : to efficiently record data into a memory-mapped file for later viewing.
noun : the RootCause log, a list of all programs run with "rootcause on".

probes

Programmatic actions to be inserted and executed at specific points in the probed application.

1.14 What about gcc/g++ 3.x (and GNAT 5.x) support?

gcc/g++ 3.x is fully supported on Linux.

Support for GNAT 5.x, and for gcc/g++ 3.x on other OSes is not currently scheduled.

1.15 Is there any way to attach with RootCause to a running application?

No. See "Is there any way to attach with Aprobe to a running application?" .

1.16 Why should I update to the current version of RootCause?

Full details are in the README file delivered with each version. A merged list of the features and fixes for each platform is also available. If you find this has not been updated to correspond to the versions in www.aprobe.com/download/, feel free to let us know.

1.17 What Java (JVM/JRE) versions are supported for use with RootCause?

We have provided support for older versions of Java for specific customers: please contact us if you have a specific need.

Some of our probes, most notably java_memstat, make use of the JVMPI debugging interface, which has turned out to be unreliable in earlier versions, and which has been eliminated entirely in Java 1.6. See the Memstat documentation for a detailed description.

2. Installation

2.1 On Windows, what does the prompt "Is this an Agent Installation?" mean?

An "agent installation" is the installation of the "RootCause Agent", a small subset of the product that allows one to run probes developed using the RootCause Console.

Note that this prompt is gone starting with RootCause 2.1.1: the agent is now just a self-installing file %APROBE%\deploy\RootCauseAgent.exe.

2.2 On Unix, why does install_rootcause offer to install in a directory called "aprobe"?

RootCause is a superset of Aprobe, and in fact shares the same installation script. You can choose a different name if you like.

2.3 On Unix, I get prompted to specify whether I'm using Java or C++: why do you care?

Because probes on C/C++ (and Ada and other compiled languages) need to be compiled with a user-supplied C compiler, and the installation script has to know whether to check/prompt for that.

2.4 When the installation prompts for a compiler, does it want the one that builds my application?

No. RootCause for C/C++, like Aprobe, requires a C compiler to build the probes. This is not provided with RootCause because it's assumed customers have one. If you don't, gcc is fine, and OC Systems can help you download and install it.

2.5 The installation process prompts me for a license key, but I don't have one right now; can I continue?

Yes. Just enter an empty string, ignore the warnings you may get, and then put the license key into the file license.dat in the licenses directory under the RootCause installation directory before you start using RootCause. See also "Licensing".

2.6 The installation prompts me for a single-line license key, but the one I have consists of several lines; do I just paste it in there?

No. Leave it blank as in Q2.5, and see Q21.2.

2.7 Where do I find ksh (Korn shell) for RedHat Linux?

On RedHat, the Korn shell is provided by the pdksh package. This is on the install media, but not usually installed unless you install everything or specifically request it. The pdksh RPM can be downloaded from the RedHat ftp site. Choose the appropriate link for your version of the RedHat Distribution:

RedHat 7.2:
ftp://ftp.redhat.com/pub/redhat/linux/7.2/en/os/i386/RedHat/RPMS/pdksh-5.2.14-13.i386.rpm
RedHat 7.3:
ftp://ftp.redhat.com/pub/redhat/linux/7.3/en/os/i386/RedHat/RPMS/pdksh-5.2.14-16.i386.rpm
RedHat 8.0:
ftp://ftp.redhat.com/pub/redhat/linux/8.0/en/os/i386/RedHat/RPMS/pdksh-5.2.14-19.i386.rpm
RedHat 9:
ftp://ftp.redhat.com/pub/redhat/linux/9/en/os/i386/RedHat/RPMS/pdksh-5.2.14-21.i386.rpm

Note that Linux RootCause version 2.2.2 (Aprobe 4.4.2) no longer requires ksh to install: the install script is finally bash-compatible!

3. The RootCause Console (GUI)

3.1 On Unix, why do I get a bunch of warnings about fonts when I do rootcause open?

Because the RootCause Console interface is in Java, and the default selection of fonts does not match what's in your X-windows font path. This problem usually only happens when using older (pre-8) versions of Solaris. See the section entitled Platform-Specific GUI Issues in Chapter 8 of the RootCause User's Guide.

3.2 Why doesn't copy/paste of text fields work in the RootCause Console?

You must be using an older (pre-8) version of Solaris, which requires an older (pre 1.4) version of Java to be used, which doesn't directly support this. Same for default buttons on dialogs. Additionally, on Unix you will find that the 'Copy' operations from various RootCause windows such as Trace Events don't show up in your X-Windows clipboard.

See the section entitled Platform-Specific GUI Issues in Chapter 8 of the RootCause User's Guide for details, but the quickest fix is to start the X-windows application "xclipboard". When you copy something to the clipboard from Java, it will appear in the xclipboard window. You can then select it there and middle-click to paste elsewhere.

3.3 How can I see the whole context menu when I click the right mouse button (MB3) on something at the bottom of the screen?

Just right-click farther up on the screen so there's room for the whole menu. The Java popup menu behavior is separate from the selection of the item on which it works. So once you've selected an item with a left-click (MB1), you can right-click anywhere in the window to see the context menu for that selected item.

3.4 Can I just use my Web Browser instead of the built-in Help Viewer?

Yes, you can point your browser (Netscape, Mozilla, Internet Explorer, etc.) to $APROBE/html/rcguihelp.html (where $APROBE is the value of the APROBE environment variable, the root of your RootCause installation.) However, the Help operations won't update that automatically -- you'll have to use your browser's Find operation.

However, note that Chapter 8 of the RootCause User's Guide is pretty much identical to the On-line help, and is cross-referenced with the rest of the user's guide (see Q1.8 ).

3.5 Can I run the RootCause GUI on Windows to view data collected on my Unix system?

No. The RootCause Console must be run on the same kind of platform (AIX, Linux, Solaris, Windows) as that on which the data is collected, both for defining the trace and for viewing the data. The format of the deployed workspace and of the collected data is platform-specific.

3.6 Unix: The RootCause GUI is just about unusable with my eXceed and Reflection X Windows emulator. What can I do?

The problem is that these emulators just don't support Java well. There are some hints in the user guide but it's still not very usable. Our advice: use VNC. It's so much better in every way, and it's free. You may download both the client and server from RealVNC. Their site explains it better than we could here, but if you need assistance feel free to contact us.

3.7 Is it possible to monitor a Java program without entering the classpath, working directory, etc. that the New Workspace dialog prompts for?

Yes. The demo program that we beg everyone to do first shows exactly how to set this up and create a default workspace. There's one for Unix and one for Windows.

However, since you asked so nicely, here's what you do:

  1. Start the RC GUI.
  2. Turn RC on by clicking the checkbox at top (Windows), or enter rootcause on in a window where you'll start your app.
  3. Run your Java program as you normally do.
  4. Examine the RC log (File->Open RootCause Log).
  5. Search near the bottom and find your Java program's APP_START node. If you see two identical ones, choose the second.
  6. Click on it.
  7. Right-click to get context menu.
  8. Choose Open Associated Workspace.
  9. New Workspace Dialog should appear with information filled in so you just click OK.

4. The RootCause Log

4.1 Can I trace any and all of the executables that I see in the log? Are there some restrictions?

Yes, you should be able to trace anything. If you find one that you cannot trace, please report it as a bug. However, most executables that are part of the system have no symbolic information, so you cannot see functions in the executable itself. You can get functions in shared libraries/DLLs that are loaded, and use the predefined UALs without symbols and debug information.

4.2 Why do I see two identical copies of a program in the RootCause Log?

Some programs like Java 1.4 and Netscape "fork and exec themselves" so these are distinct processes. You generally want the second one, since the first probably set up some things missing from the environment and then tried again.

4.3 Why don't I see the program I want to trace listed in the RootCause log?

There could be a number of reasons:

In all but the first case, you'll have to run the program again with "rootcause on" for it to show up in the RootCause log.

4.4 I ran only one application with rootcause on, and I see about a dozen processes in the RootCause log; where did they come from?

When you start a program, that may start a shell script. Korn shell, C shell and others can have associated "rc" files (e.g., ~/.kshrc, ~/.cshrc), which run some commands. Then the script itself may run some commands to evaluate the environment. Then the program itself may start some processes (e.g., by using CreateProcess() or system()) to do some tasks. You can learn amazing stuff when you use RootCause even without tracing!

4.5 Can I cause only APP_TRACED events to show up in the RootCause Log?

Yes, by turning verbose logging off. This is done on Windows with the DOS command

rootcause on quiet

and on Unix with:

rootcause register -s verbose -e off

Also, on Unix, you can set the environment variable APROBE_LD_AUDIT_VERBOSE=FALSE in a shell and it will disable logging of all commands started in that shell and its subshells. This trick is used by the rootcause_status script.

4.6 How do I clear the RootCause log?

There's currently no way to do this from the Console. From the command line: rootcause log -Z. Then do File->Refresh to see everything disappear.

4.7 Does the RootCause log wraparound? If so, how do I set the wraparound size?

Yes, it wraps so that it doesn't get huge. The default size is 100000 bytes. You can use the rootcause log -s command to query and change the size in bytes (there's no access to this from the Console). For example:

# show the log size:
rootcause log -s
100000
# set the log size to 20000 bytes:
rootcause log -s 20000
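The wraparound itself can be pictured as a fixed-size circular buffer. The following is a minimal C sketch of that general technique only; it is not the RootCause log format, and the names wrap_log, wrap_log_init, wrap_log_write and wrap_log_size are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical wrapping log: illustrates the general idea only,
 * not the actual RootCause log file. */
#define WRAP_LOG_CAPACITY 16   /* tiny on purpose; the default above is 100000 bytes */

typedef struct {
    char   data[WRAP_LOG_CAPACITY];
    size_t next;    /* index where the next byte will be written */
    size_t total;   /* total number of bytes ever written */
} wrap_log;

void wrap_log_init(wrap_log *w)
{
    memset(w, 0, sizeof *w);
}

/* Append len bytes; once the buffer is full the oldest bytes are overwritten. */
void wrap_log_write(wrap_log *w, const char *bytes, size_t len)
{
    size_t i;
    for (i = 0; i < len; i++) {
        w->data[w->next] = bytes[i];
        w->next = (w->next + 1) % WRAP_LOG_CAPACITY;
        w->total++;
    }
}

/* Number of bytes currently retained: never more than the capacity. */
size_t wrap_log_size(const wrap_log *w)
{
    return w->total < WRAP_LOG_CAPACITY ? w->total : WRAP_LOG_CAPACITY;
}
```

Writing 20 bytes into this 16-byte buffer leaves only the newest 16, which is why a larger log size keeps more history.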

4.8 Can I locate my .rootcause directory somewhere other than HOME (or USERPROFILE)?

Yes, using the APROBE_HOME environment variable (supported starting with version 2.0.5). The value of this environment variable, if set, is used instead of the defaults (%USERPROFILE%\.rootcause on Windows, $HOME/.rootcause, .rootcause_aix, or .rootcause_linux on Unix). On Unix, this directory is where the RootCause Log and RootCause registry reside, so if you want these files accessible system-wide you should set APROBE_HOME to some central, writable location.
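The lookup rule for the Unix case can be sketched in C as follows. This is only an illustration of the rule as stated above, not RootCause source code: the function name rootcause_dir is hypothetical, and treating an empty APROBE_HOME the same as an unset one is an assumption.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper: pick the per-user RootCause directory on Unix.
 * aprobe_home and home are the values of APROBE_HOME and HOME (or NULL);
 * default_name is ".rootcause", ".rootcause_aix" or ".rootcause_linux".
 * Returns 0 on success, -1 if no directory can be formed or out is too small.
 * Assumption: an empty APROBE_HOME is treated like an unset one. */
int rootcause_dir(const char *aprobe_home, const char *home,
                  const char *default_name, char *out, size_t out_size)
{
    int n;

    if (aprobe_home != NULL && aprobe_home[0] != '\0')
        n = snprintf(out, out_size, "%s", aprobe_home);           /* override */
    else if (home != NULL && home[0] != '\0')
        n = snprintf(out, out_size, "%s/%s", home, default_name); /* default */
    else
        return -1;

    return (n >= 0 && (size_t)n < out_size) ? 0 : -1;
}
```

A caller would typically pass getenv("APROBE_HOME") and getenv("HOME") as the first two arguments.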

4.9 Is there a way to keep the RootCause Log window from appearing when I start rootcause?

Yes. Edit the "preferences" file in your APROBE_HOME directory (see Q4.8) and change

<start_with_log value="true"/>

to

<start_with_log value="false"/>

5. The Workspace Window

5.1 Should I say Yes or No to the "Application is not registered with workspace" dialog?

You'll nearly always want to click Yes, which means "use this workspace to trace this application next time you run the application with RootCause on". You might click No if you don't want to trace that application with RootCause yet, or if you want to keep tracing it with a different workspace with which it's already registered. When in doubt click No: you can always use Workspace->Register Program to do it later.

5.2 What does the blue dot mean in the Predefined UALs part of the Workspace Tree?

It means that something has been changed or added that must be recorded when the workspace is saved. You can ignore it.

5.3 Where do I find out about the Predefined UALs listed here?

See Chapter 8 of the User's Guide, which fully describes the Console GUI. Also, look for a file in $APROBE/probes ( %APROBE%\probes on Windows) with the same name and suffix ".apc" and you'll see the details of its implementation. This doesn't apply to X.trace.ual, which is custom for each workspace.

6. The Trace Setup Dialog

6.1 What does <Unknown File> mean in the Trace Setup tree?

This means "Unknown Source File", probably because no debug information was found. Look in the Messages pane of the Workspace browser window for messages about debug information. You can still trace entry and exit to these functions, and can write custom probes that get data without using debug information.

6.2 What do the black and blue dots mean in the Trace Setup tree?

The dots are there to act as a "path" to help you find the traces and probes you've defined.

A black dot indicates an entry/exit trace of the marked function, method, file, class, or directory. Functions and methods marked with black dots are represented by equivalent entries in the Wildcards dialog, and are implemented by entries in the trace.cfg file in the workspace.

A blue dot indicates a probe or data trace in the marked function, method, file, directory, or class. These actions are not mapped to wildcards, and are implemented by compiled APC for C functions.

6.3 How do I trace a dynamically loaded shared library (DLL)?

You must add the library to the workspace, and then it will show up in the Trace Setup window. To do this, select Add Dynamic Module... from the Workspace menu. If the module changes, you must do Reset Dynamic Module.

6.4 What's the difference between "Don't Trace..." and "Remove Probes..."?

"Don't Trace..." will remove the black dots from the subtree it applies to, meaning those methods and functions won't have their entry and exit traced. "Remove Probes..." will remove the blue dots, meaning specific Probe and Data logging actions will be removed.

6.5 I've got a UAL that I compiled with the apc command -- how do I get that into RootCause?

The easiest way is to copy it into the workspace. You can also use Add UAL, and you'll need to do that if it takes parameters and other complications, but that's a bit more advanced: see Chapter 8 of the User's Guide or contact us.

6.6 Why don't I see all the symbols shown by "apinfo" or "apcgen -L" in the Trace Setup window?

This should happen only on Unix. There, for improved usability (at a customer's request), functions whose names match certain patterns are filtered from the list. This list can be changed, replaced or nullified, though this is not documented.

The filtering is defined by the patterns in the file $APROBE/arca/trace_filters. See the commentary at the top of that file for complete information.

6.7 I define APROBE_SEARCH_PATH to include my source location, but the RC GUI still isn't finding my source. Why?

Could it be you set APROBE_SEARCH_PATH after you started the GUI? If so, quit RC and restart it so it can pick up the env var.

6.8 How can I see and dump parameters for C functions for which there are symbols but no debug information?

This is addressed in Chapter 10 of the RootCause User's Guide, under Libraries With No Debug Information. Here's a paraphrasing of that given by our support staff:

The easiest way is to create a ".h" file that contains prototypes for the functions that you want. RootCause will automatically compile and use the "debug information" in that file so, for example, you can see the parameters in the setup window of the Console or reference them by name in the custom apc that you write.

To do this:

  1. Put the prototypes (C, not C++) into a ".h" file and give the file the same name as the shared library (or executable) where the functions reside (for example if your executable was named a.out, then the .h file would be named a.out.h)
  2. Place the .h file in the local or global "shadow" directory, with the name of your executable or library plus ".h" on the end. For example, if your program were called t.exe then on Unix the global location is $APROBE/shadow/t.exe.h and the user-local one is $APROBE_HOME/shadow/t.exe.h. On Windows, this is as you would expect: %APROBE%\shadow\t.exe.h and the user-local one is %APROBE_HOME%\shadow\t.exe.h. See Question 4.8 about APROBE_HOME.

Placing the .h file in $APROBE/shadow would make it available for all invocations of RootCause, whereas the other two locations would be more user specific. Note that RootCause will search the directories in the opposite order of their listing above, so a.out.h in the .rootcause directory will be used instead of a.out.h in the $APROBE directory. (Analogous for Windows.)

You can see an example of this by doing a directory of the $APROBE/shadow/*.h (or %APROBE%\shadow\*.h). RootCause uses this feature to provide parameter information for some of the system shared libraries.

Make sure that you have a supported C compiler available, as this is needed to compile the .h files. (You may not have a supported C compiler if you installed RootCause as Java only and now want to do C probing; contact us to add the C capability.)
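As an illustration of step 1, a shadow header for an executable named a.out might look like the sketch below. The function names and parameter lists are invented for the example; use the real signatures of the functions in your own executable or library.

```c
/* a.out.h -- hypothetical prototypes for functions in a.out that have
 * symbols but no debug information. Every name below is invented for
 * illustration only. */

/* Plain C prototypes only: no C++ classes, references or overloads. */
int    open_account(const char *owner, int initial_balance);
int    deposit(int account_id, int amount);
double interest_rate(int account_id);
void   close_account(int account_id);
```

Naming the parameters in the prototypes is worthwhile, since those are the names you would see in the setup window and reference from custom APC.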

6.9 How can I turn on trace just when I'm in a chosen method or function?

This is called a "Trigger" and has been a feature of the Aprobe-level trace all along. It was added as a Probes action in the Trace Setup dialog in version 2.1.3a (April 2004). It works like this:

  1. Apply Trace to all the functions and methods you want to trace, as usual.
  2. Select the function or method that is to be the "trigger".
  3. Click the Probes tab in the lower right pane.
  4. Check the On checkbox, then use the Probe Action dropdown menu to select Trigger Trace.
  5. Click Ok to apply and build your trace.

You should see the function or method to which you applied the Trigger action at the top of each traced call tree in your trace, and nothing outside of that (even if you selected it for tracing).
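The effect of a trigger can be modeled in plain C as a counter that is nonzero only while the trigger function is on the call stack. This is a conceptual sketch, not how Aprobe implements triggers; trigger_depth, trace_enter, helper, process_request and run_demo are all hypothetical names.

```c
#include <stdio.h>

/* Hypothetical model of a trigger: events are recorded only while the
 * trigger function is somewhere on the call stack. */
int trigger_depth = 0;   /* greater than 0 while inside the trigger function */
int events_logged = 0;   /* number of ENTER events recorded */

void trace_enter(const char *name)
{
    if (trigger_depth > 0) {
        printf("ENTER %s\n", name);
        events_logged++;
    }
}

void helper(void)
{
    trace_enter("helper");
}

/* The function chosen as the trigger. */
void process_request(void)
{
    trigger_depth++;                 /* trigger switches on at entry */
    trace_enter("process_request");
    helper();                        /* recorded: called under the trigger */
    trigger_depth--;                 /* trigger switches off at exit */
}

void run_demo(void)
{
    helper();                        /* not recorded: outside the trigger */
    process_request();
    helper();                        /* not recorded again */
}
```

In this model helper() is selected for tracing everywhere, yet only the call made under process_request() produces an event.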

6.10 How can I enable my custom probe only when Trace is also enabled?

You can check whether trace is enabled with the ap_RootCauseTraceIsEnabled macro. For example:

         if (ap_RootCauseTraceIsEnabled)
         {
            printf ("Enabled\n");
         }
         else
         {
            printf ("Disabled\n");
         }

Disabling your probe independently from Trace is covered in the "Disable Probe" example (Windows: %APROBE%\Examples\Advanced\Disable_Probe; Unix: $APROBE/examples/learn/disable_probe).

6.11 I notice "Disable Tracing" does not affect the "exception" predefined probe. How can I disable that as well?

You can't. This is deliberately designed to remain active even after trace is disabled. We do deliver source for the probes so that users can customize their behavior. In this case it would be a simple matter of putting the "if (ap_RootCauseTraceIsEnabled)" check (see Q6.10) around the code in the "ExceptionHandler" routine within $APROBE/probes/exception.apc, recompiling it, and either using a local copy or overwriting $APROBE/ual_lib/exception.ual. (On Windows, %APROBE%\probes\exception.apc and %APROBE%\ual_lib\exception.dll, respectively.)

6.12 How can I trace and time everything between point A and point B?

  1. Create a workspace for the application (which you have already done).
  2. In the main window:
    • Enable the xxx.trace.ual (the first one).
    • Enable perf_cpu.
  3. Go to the trace setup dialog:
  4. Click on the program node (the very first one).
  5. In the probes tab, create a probe on program entry to disable tracing.
  6. In the left pane, click on the application module node (first 'M' icon).
  7. Right click and choose trace all.
  8. Find and select the point A function in the tree.
  9. In the probes tab, create a probe to enable tracing on entry.
  10. Find and select the point B function in the tree.
  11. In the probes, create a probe to disable tracing on exit.
  12. Click the Options... button to open the Trace Options dialog.
  13. Disable load shedding so you get everything.
  14. Click OK to build the workspace.
  15. Restart your application.

After you run through your test, format the APD files with Examine. The tree will reflect the trace path from point A to B. At the end is a summary call tree with call times in it. Or you can look at the performance table node (right click and choose show associated table) to see a table.
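The enable/disable logic set up by the steps above can be modeled with a single flag. The C sketch below is a conceptual stand-in, not generated probe code: all function names are hypothetical, and it assumes the enable-on-entry probe takes effect before point A's own entry is recorded.

```c
#define MAX_EVENTS 16

/* Hypothetical model of tracing between point A and point B. */
int         trace_enabled = 0;        /* tracing disabled at program entry */
const char *events[MAX_EVENTS];       /* names of the recorded entries */
int         event_count = 0;

void trace_enter(const char *name)
{
    if (trace_enabled && event_count < MAX_EVENTS)
        events[event_count++] = name;
}

void startup(void) { trace_enter("startup"); }
void work(void)    { trace_enter("work"); }
void cleanup(void) { trace_enter("cleanup"); }

void point_a(void)
{
    trace_enabled = 1;                /* probe: enable tracing on entry to A */
    trace_enter("point_a");
}

void point_b(void)
{
    trace_enter("point_b");
    trace_enabled = 0;                /* probe: disable tracing on exit from B */
}

void run_scenario(void)
{
    startup();                        /* before A: not recorded */
    point_a();
    work();
    point_b();
    cleanup();                        /* after B: not recorded */
}
```

Every function is marked for tracing ("trace all"), but only the calls between the two probes end up in the event list.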

6.13 How can I allow all Java parameters to be traced?

To enable the Log All Parameters menu item, set and/or export the undocumented environment variable RC_ENABLED_LOG_ALL before starting the RootCause GUI.

7. The Trace Display (Event) Dialog

7.1 Why are some functions found in the traced Events not found in the Trace Setup?

There are two possibilities, but the most likely (on Solaris) is that the traced function is a compiler-generated one that is explicitly filtered from the Trace Setup list, but which is covered by the "wildcard" trace used when you do "Trace All Child Nodes" from the Trace Setup module node. See Q6.6.

The other possibility is that the event was introduced by some other custom probe, such as a J2EE trace. See Q7.2.

7.2 Why are some Java methods found in the traced Events not found in the Trace Setup?

Probably because the events didn't originate in the Trace Setup, but were introduced by a supplementary J2EE trace. Still, you should be prompted to add the containing class, and so be able to define traces on it.

7.3 RootCause keeps asking to find a source file. Is there a way to just point to this once without specifying the path to every file we wish to view?

Yes, RootCause has a concept of a source file path. There are a number of ways to set this:

If you click on a method, the first time it will ask if you want to find the source. If you browse and select the source file, the enclosing path is automatically added to a list. If the end of the Java path matches the package path of the class, the "root" of the package path is added also.

You can edit the path directly off the RootCause Setup menu.

We'll pick up an environment variable APROBE_SEARCH_PATH when the RootCause Console starts.
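The package "root" rule mentioned above can be illustrated with a small C function. This is a sketch of the idea, not the Console's code: package_root is a hypothetical name, and the function assumes Unix-style '/' separators and at least one directory above the package path.

```c
#include <stddef.h>
#include <string.h>

/* If directory dir ends with the package path (dots replaced by '/'),
 * copy the part of dir in front of it into root and return 1.
 * Otherwise return 0. Hypothetical helper for illustration only. */
int package_root(const char *dir, const char *package,
                 char *root, size_t root_size)
{
    size_t dlen = strlen(dir);
    size_t plen = strlen(package);
    size_t i, start, rlen;

    if (plen == 0 || plen >= dlen)
        return 0;
    start = dlen - plen;              /* where the package path would begin */
    if (dir[start - 1] != '/')        /* must start at a directory boundary */
        return 0;
    for (i = 0; i < plen; i++) {
        char expected = (package[i] == '.') ? '/' : package[i];
        if (dir[start + i] != expected)
            return 0;
    }
    rlen = start - 1;                 /* root excludes the separating '/' */
    if (rlen == 0 || rlen >= root_size)
        return 0;
    memcpy(root, dir, rlen);
    root[rlen] = '\0';
    return 1;
}
```

For example, selecting a source file in /work/src/com/acme/widgets for a class in package com.acme.widgets would yield the root /work/src in this model.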

7.4 The trace shows a problem in third-party software; what's the best way to pass this along to them?

Of course it depends on the vendor, but the best thing to do is to send them what you would want your customers to send you: text with as much pertinent information as possible. If the trace contains enough information for you to determine where the problem is, then the other piece of information they would want is the system configuration, as collected with logenv.ual.

To create the bug report, you could do File->Save As Text from the Trace Display window; then edit the resulting text file to include the program and system configuration and the tracebacks and execution information that identify the problem; then e-mail the result, indicating it was collected with RootCause. (They might have RootCause also, and ask you to re-run to collect additional information).

7.5 Unix: RootCause shows signal 11 during my Java application run, but there was no crash. Is this a valid signal 11?

Yes. The JVM routinely uses signal 11 (perhaps for extending the stack) and signal 4 (illegal instruction -- not sure what that's for). These can show up in the trace and are fine. Later versions of the JVM provide options for reducing its use of signals; you can search java.sun.com for details.

7.6 When I trace a Java synchronized method, does the method time include lock delay time?

The JVM implements the synchronization on the calling side rather than on the callee side. Once you are inside the method's code, the lock has already been grabbed. This means that the time you see is after the synchronization.

For instance, I have a test that calls a synchronized method from a thread's run method:


try
{
   Thread.sleep (1000);
   parent.synchronizedMethod ();            // Line 15
}
catch (InterruptedException e)
{
   e.printStackTrace ();
}

If I trace lines and have things set up so another thread is within synchronizedMethod(), I see something like this:


Line 15                    10.45.00            ; Waiting ...
synchronizedMethod entry   10.46.00            ; Got it ...

7.7 Why was malloc() listed as being LOAD_SHED in the Trace Display when it really wasn't?

Because load shedding was attempted on it, and the attempt was recorded as such, but the actual disabling of the probe was blocked at another UAL's explicit request, using #pragma nopatchcount.

The confusion comes from the fact that load shedding may mean two things:

  1. The patch for the subprogram is disabled (no more probes for this routine will get triggered); and
  2. This routine is no longer traced.

Since we don't want (1) to happen for allocation/deallocation routines when running memstat, these patches could not be disabled. This was indicated by using #pragma nopatchcount in combined_memstat.apc.

However, when traced these routines will get load shed just like everything else, and the LOAD_SHED event and appearance in the table indicate that (2) has happened. So this is pretty much "as designed".

If you explicitly mark the function as, "Do Not Shed", it will no longer show up in the table.

7.8 When formatting my data, an error pops up saying, "The maximum event tree size ... has been reached." What do I do?

You are hitting the limit on the maximum number of items displayed in the trace display. You can reduce the size of the APD files, reduce the number of APD files selected, or increase the limit at the expense of longer processing times and higher memory overhead. I would try the last one first and if this works for you, great. The option is "Maximum number of events in Trace Display". Briefly:

  1. Go to the RootCause Main window
  2. Open the Setup menu (not the button, but the pulldown menu)
  3. Select Options...
  4. Change the value of the option Maximum number of events in Trace Display (third from the bottom) to a higher value. A value of 2000000 (two million) is appropriate for processors with more than 128M of memory.

The values are recorded per-user, so must be set for each user in the user preferences file: $APROBE_HOME/preferences on Unix, %USERPROFILE%\preferences on Windows.

7.9 I see that I can do "Save As XML": can I view this XML later?

Yes, but only in RootCause (see below). It is not quite legal XML and so will be rejected by general XML viewers. (If you think this is an important feature, let us know.)

To import saved XML back into RootCause again, you have to set the environment variable RC_ENABLE_LOAD_XML to a nonempty value before starting the RootCause GUI. If you've done this, you will then see the menu item Examine XML File... in the Analyze menu in the RootCause Main menu. Clicking this menu item will open a file selection dialog from which you can select an XML file. This must be a file previously saved from RootCause Trace Display using File->Save As XML. When you click the Examine XML Output button in this dialog, you will then see a Trace Data Dialog in which one of the checkboxes is the name of your XML file. Check it, and click Open, to view the Trace Display.

7.10 How can I see just the major time-consuming children of nodes in the Trace Events Summary tree?

Under the View menu, click Statistics Filter.... This dialog is used to create a "filtered" copy of the statistics summary tree. The copied tree will be added to the end of the event tree and will identify what filter was used. You specify a statistic to use (Wall time or CPU time, if collected) and a threshold percentage to create the "filtered" copy. A child node in the summary tree will only be copied to the new tree if the child's statistic value is at least the given percentage of the parent's statistic value. Choose "None" to create an exact copy. The threshold must be a numeric percentage between 0 and 100.
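The threshold rule can be written down in a few lines of C. The sketch below only illustrates the rule described above; stat_node, keeps_child and filtered_count are hypothetical names, and it assumes that a child that fails the test is dropped together with its whole subtree.

```c
#include <stddef.h>

#define MAX_CHILDREN 4

/* Hypothetical summary-tree node; not RootCause's data structure. */
typedef struct stat_node {
    const char       *name;
    double            value;                    /* e.g. wall or CPU time */
    struct stat_node *children[MAX_CHILDREN];
    size_t            child_count;
} stat_node;

/* A child is kept when its value is at least percent% of its parent's. */
int keeps_child(const stat_node *parent, const stat_node *child, double percent)
{
    return child->value * 100.0 >= parent->value * percent;
}

/* Count the nodes that would appear in the filtered copy rooted at node.
 * Assumption: a dropped child takes its whole subtree with it. */
size_t filtered_count(const stat_node *node, double percent)
{
    size_t i, count = 1;                        /* the node itself */
    for (i = 0; i < node->child_count; i++) {
        if (keeps_child(node, node->children[i], percent))
            count += filtered_count(node->children[i], percent);
    }
    return count;
}
```

With a parent at 1.00 seconds and children at 0.40 and 0.02 seconds, a 10 percent threshold keeps the first child and drops the second.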

7.11 Do the times shown in Trace Events reflect the aprobe overhead?

No, these are actual times. You can specify overhead values by clicking View->Statistics Overhead. This opens the Set Statistics Overhead dialog. You'll see an options menu from which you can select the statistic to adjust, and type-in fields for the normal (native) call overhead and the Java overhead (which is generally bigger).

Note that you must set each statistic separately, for example once for CPU Time and once for Wall Time.

When you've finished setting overhead values, you must regenerate the data.

7.12 How do I know what overhead to specify in the Set Statistics Overhead dialog?

As described in Q7.11, you can specify tracing overhead to be applied to times shown in the Trace Events details. But what number should you put in there? The answer depends on a number of factors, including the speed of your hardware and OS, whether you're dumping parameters, and whether the code is Java or native. A good guess is the minimum time you see in the entire tree for that kind of call. If that seems too big, you can instrument some do-nothing function and see what its time is: that time is the overhead incurred by every call, so you can use it as the overhead value.

7.13 What are the various times I'm seeing in the details pane for Enter and Exit nodes?

The nodes look like:

ENTER Factor::addWidgets()
  time = 2004-05-03 16:32:10.079965024
  process = 15193, thread = 0 _start()
  symbol = "Factor::addWidgets()" IN "$java$", Factor.java
  CPU Time 0.428844 ( 0.428844)
  Wall Time 0.552496 ( 0.552496)

 EXIT Factor::addWidgets()
  time = 2004-05-03 16:32:10.632461354
  elapsed time = 00:00:00.552496330
  process = 15193, thread = 0 _start()
  symbol = "Factor::addWidgets()" IN "$java$", Factor.java

The Details pane for each node gives the (wall) time at which the function or method was entered. In addition, any statistics that were being gathered are attached to the ENTER Node. Shown here are the elapsed CPU Time (gathered because the perf_cpu probe was enabled) and elapsed Wall Time. Both were computed on EXIT from this specific invocation. The EXIT node also shows the elapsed (wall) time, which is the same as the Wall Time statistic.
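As a check on the numbers above, the elapsed time is the EXIT timestamp minus the ENTER timestamp. Here is a minimal sketch in C, assuming each timestamp is split into whole seconds and nanoseconds. The function name elapsed_ns is ours and is not part of Aprobe.

```c
/* Difference between two timestamps, in nanoseconds.
   Each timestamp is given as whole seconds plus a nanosecond part. */
long long elapsed_ns(long enter_sec, long enter_ns,
                     long exit_sec, long exit_ns)
{
    /* The nanosecond difference may be negative; the seconds term absorbs it. */
    return (long long) (exit_sec - enter_sec) * 1000000000LL
           + (exit_ns - enter_ns);
}
```

For the node shown, both timestamps fall in second 16:32:10 (59530 seconds into the day), so elapsed_ns (59530, 79965024, 59530, 632461354) yields 552496330 nanoseconds, matching the elapsed time of 00:00:00.552496330 on the EXIT node.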

7.14 What are the various times and percentages I'm seeing in the Details panes on nodes in the Event Summary tree?

Consider the following node:

Java_Factor_smallestFactor()
  process = 15193, thread = 10 _start()
  symbol = extern:"Java_Factor_smallestFactor()" in "libFactorJNI.so", /work/JNI/factor.c
  Times called: 29
  Child calls (native/Java): 4190 / 0
  CPU Time (29):  1.248102 ( 1.298730) [99.753%]
    Max  :  1.231153 ( 1.274449)
    Min  :  0.000048 ( 0.000072)
    Avg  :  0.043038 ( 0.044783)
  Wall Time (29): 375.135004 (375.185632) [99.998%]
    Max  : 375.105686 (375.148982)
    Min  :  0.000043 ( 0.000067)
    Avg  : 12.935689 (12.937435)

Recall that each node in the Event Summary tree represents a unique call stack in the execution. The one shown above is for the native JNI function Java_Factor_smallestFactor() (see $APROBE/demo/RootCause/JNI).

The function was called 29 times. Those 29 calls together used 1.248102 seconds of CPU Time after overhead adjustment (see Q7.11). The slightly larger time shown in parentheses after it (1.298730) is the "raw" time before the overhead adjustment. The percentage in brackets indicates that the total CPU time used for this function comprised 99.753% of the total time used by its caller, the parent node in the summary tree (see Q7.10 about filtering based on this percentage). Of those 29 calls, the longest (Max) took 1.231153 seconds of CPU after adjustment (1.274449 raw), the shortest (Min) took only 0.000048 seconds (0.000072 raw), and the average took 1.248102 / 29 = 0.043038 seconds of CPU.
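The arithmetic behind these figures can be reproduced with a couple of one-line helpers. This is only a sketch for checking the displayed numbers: the function names are ours, and it makes no claim about how RootCause computes the adjustment internally.

```c
/* Average time per call: total time divided by the number of calls. */
double average_time(double total_seconds, unsigned calls)
{
    return calls == 0 ? 0.0 : total_seconds / calls;
}

/* Total overhead removed by the adjustment: raw total minus adjusted total. */
double overhead_removed(double raw_seconds, double adjusted_seconds)
{
    return raw_seconds - adjusted_seconds;
}
```

For the node above, average_time (1.248102, 29) gives approximately 0.043038 seconds, and overhead_removed (1.298730, 1.248102) gives approximately 0.050628 seconds of overhead subtracted across all calls.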

7.15 Is there a way to save the text for a specific node in the Trace Events tree?

Yes. Click on a node to select it, then right-click to pop up the context menu, then click 'Save Node As Text' to save the selected node in a text file. This will save the node and its details exactly as it would appear in the 'File->Save As Text..' output. Note that it works only for one node, so if multiple nodes are selected it applies only to the first of those. See also the next question.

7.16 Can I copy a Trace Events node to the clipboard to be pasted elsewhere?

Yes. In either the Events tree on the left, or the details in the lower left: Click on a node (or multiple nodes using shift or control keys in the usual way). Then right-click to pop up the context menu, then click 'Copy'. This will put the selected nodes in the Java clipboard. See Q3.2 for how to paste from the Java clipboard on Unix.

7.17 I know my method was executed many times, so why isn't it in the Performance Summary table?

Probably because it was Load Shed. This means that it was called so often its tracing overhead became excessive and tracing was disabled for it during the run. It will appear in the Load Shed table, where you can choose to stop it from being Load Shed during the next run.

8. RootCause and Aprobe

8.1 How do I adjust the Trace "DefaultLevels" option so only a fixed depth is traced when an application is run with RootCause?

You can't. The concept of levels is no longer supported. Instead you can apply a Trace Trigger, or disable and enable the trace using the probes tab for a given function.

8.2 How can I use Aprobe's predefined probes (profile, coverage, events, memwatch, statprof) with RootCause?

These are not currently integrated with RootCause. If you can run them from the command-line using Aprobe you should do that. If you wish to use the "RootCause On" mechanism to run them using the workspace, you must add them to the workspace options using the "Setup->Add UAL" menu item. This adds a new UAL "permanently" to the Workspace UAL tree. For example, to add the "memwatch" probe, you would:

This adds "memwatch" to the UAL tree in the Workspace window. You could then check this to enable memwatch on applications run under RootCause. The output of these probes isn't integrated with RootCause, so the output appears as a "Text" node in the Trace Display event tree. You can use "Save As Text" from that display to view it outside of RootCause.

Prior to RootCause version 1.3.3, you would reference these probes using the Aprobe options and Apformat options dialogs (see Chapter 8 of the user's guide), just as you would on the Aprobe command-line. For example, to enable memwatch, you would add -u memwatch -p -g as "Additional Aprobe Options" (under Aprobe options in the Execute menu in the Workspace window) and -u memwatch in the Apformat options (under the Analyze menu). For probes like profile that require configuration files, you would have to put the full pathname of the configuration file into the options as well, like -u profile -p -c /testdisk/probes/prog1.profile.cfg .

8.3 Is it possible to develop in Aprobe, but still use the RootCause "intercept" mechanism?

Yes, but this is not explicitly supported. In particular, most operations from the RootCause Console overwrite the scripts in the workspace which apply Aprobe to the application. So after you use the Console to create a workspace, quit the Console and edit the aprobe.ksh and apformat.ksh scripts (do_aprobe.cmd and do_apformat.cmd on Windows) directly to apply your probes.

8.4 If RootCause is built on Aprobe, and RootCause supports Java, is there an Aprobe for Java?

Aprobe supports Java with the apjava command. Writing custom probes in Java is described in Chapter 11 of the RootCause for Java User's Guide and in the nearly identical Chapter 5 of the Aprobe User's Guide for Unix and Windows. If you really wanted to, you could do everything from the command line.

8.5 How do I add my own UAL to the RootCause trace?

There are three ways of adding a UAL to a trace:

  1. Update the predefined_uals file in ual_lib to add it for all workspaces. It will show up in the list in the workspace when you do that.
  2. Use the Add Ual option on the setup menu - this will also cause it to show up in the list.
  3. Copy it into the workspace. It will not show up in the list because it's not until runtime that we look in the directory to see what other UALs are present.

Personally I like option 2, choosing not to copy the UAL to the workspace. This makes it easy to enable or disable the UAL from the GUI.

8.6 How can I use the Events probe with RootCause?

The events probe is not integrated with RootCause Trace Display, but you can still use it. Here's a quick way to get started, by simply applying events to all Java methods and all native functions in the main module (if any), and letting load shedding reduce overhead.

  1. cp $APROBE/probes/events.cfg MyWorkspace.aws
  2. echo ';event function "*"' >> MyWorkspace.aws/events.cfg
  3. echo 'event function "*::*" in $java$' >> MyWorkspace.aws/events.cfg
  4. Workspace->AddUal: add events.ual and specify the following aprobe parameter:
   -c $RC_WORKSPACE_LOC/events.cfg
  5. Keep the trace.ual enabled with load shedding on, but don't specify any traces (this would load shed low level events)
  6. Run the application
  7. From the command line, use
  rootcause format -r MyWorkspace.aws > format.txt

Your results are in format.txt. You can then edit the events.cfg file to do more, as shown in Q15.12 , and you can specify an alternate output file so you get the events output while still formatting within RootCause.

 

9. RootCause at Run Time

9.1 Can I just leave RootCause "on" all the time? For example, while I power down and power up my computer (Windows XP)? I was thinking that it would be interesting to see all the processes as my computer boots.

Yes, you can leave RootCause on all the time. It takes effect on reboot about the time when per-user preferences get loaded, or when you get prompted for your login id. Check the System event log (run "eventvwr") to get more exact information.

9.2 How much will RootCause slow my application?

This depends almost entirely on what you do with it. If you trace almost nothing, it will introduce almost no overhead. If you trace every method call on your machine, it will slow things down too much. The keys to good performance are:

9.3 On Solaris, why do I get "illegal insecure pathname" with RootCause on?

You need to copy or soft-link the RootCause "libapaudit.so" library to a "secure pathname" as described in Chapter 10 of the RootCause User's Guide, "RootCause, SETUID, and Security Concerns".

9.4 Solaris: Why do I get warnings from running `ps' when rootcause is enabled, and how can I fix it?

If you're seeing messages like:

ld.so.1: mail: warning: /opt/aprobe/lib/libapaudit.so: open failed: illegal insecure pathname
ld.so.1: mail: fatal: /opt/aprobe/lib/libapaudit.so: audit initialization failure: disabled.

Then the application you're running (like "mail" above, or "ps") has its setuid bit set and is owned by root. Solaris prevents dynamically loading debug libraries on such applications for security reasons. Here's what to do:

  1. As root, run rootcause setup and then run:
  rootcause_libpath -c
  rootcause_off
  rootcause_on
  ps

If you still get warnings, you're probably on an early patch level of Solaris 8. Do:

export LD_AUDIT_64 ; LD_AUDIT_64=/usr/lib/secure/64/libapaudit.so

If that still doesn't work, contact OC Systems. Details about probing secure applications on Solaris are documented in Chapter 10 of the latest Unix RootCause User's Guide.

9.5 Solaris: How do I find out the full path of dynamic modules loaded?

There's no built-in mechanism. It's harder than you think. Here's some custom APC (for Solaris only) that you could compile into a UAL and add to your workspace to see the modules:

#include <alloca.h>
#include <link.h>

typedef struct
{
   ap_NameT  ModuleName;
   ap_Uint32 StartAddress;
   ap_Uint32 Length;
} DynamicModuleDataT, *DynamicModuleDataPtrT;

static void *ModuleKeyGet (void *S)
{
   return (void *) ((DynamicModuleDataPtrT) S)->ModuleName;
}

static ap_BooleanT ModuleKeyCompare (void *LeftKey, void *RightKey)
{
   return (strcmp ((ap_NameT) LeftKey, (ap_NameT) RightKey) == 0);
}

static DECLARE_HASH (DynamicModuleTable,
                     ap_StringHashFunction,
                     ModuleKeyGet,
                     ModuleKeyCompare);
                     

#if defined(__SunOS_5_5_1)
extern int dlinfo (void *handle, int request, void *p);
#endif

typedef ap_Uint32 (*FindElfSymbolT) (ap_NameT SymbolName, ap_NameT ModuleName);

static int NextModuleId;

static void DynamicModuleFormat (ap_NameT   Filename,
                                 ap_Uint32 *StartAddress,
                                 ap_Uint32 *Length)
{
   ap_RootCausePrintEventStart ("program_comment");
   printf ("Module loaded: %s\n   Address span 0x%08x-0x%08x\n",
           Filename,
           *StartAddress,
           *StartAddress + *Length);
   ap_RootCausePrintEventEnd ("program_comment");
}

static void RecordDynamicModule (ap_NameT Filename, void *Handle)
{
   ap_ModuleIdT          ModuleId;
   static FindElfSymbolT FindElfSymbolRoutine = NULL;

   ModuleId = ap_ModuleNameToId (Filename);
   if (ap_IsNoModuleId (ModuleId))
   {
      DynamicModuleDataPtrT  DynamicModulePtr;
      Link_map              *Linkmap;
      
      // Get the info for this.
      if (dlinfo (Handle, RTLD_DI_LINKMAP, &Linkmap) == -1 ||
          Linkmap == NULL)
      {
         ap_Error (ap_WarningSev,
                   "Cannot get loader info for %s",
                   Filename);
         return;
      }
         
      // Is it in the dynamic table already?
      DynamicModulePtr = (DynamicModuleDataPtrT)
         ap_HashTableLookup (&DynamicModuleTable, (void *) Linkmap->l_name);

      if (DynamicModulePtr == NULL)
      {
         ap_Uint32     ModuleSize;
         ap_ModuleIdT  NewModuleId;
         ap_NameT      ModuleName;
         ap_NameT      ModuleBaseName;
         char         *DotSoLocation;
         int           Dummy = 0;
         
         // Find our internal FindElfSymbol routine.
         if (FindElfSymbolRoutine == NULL)
         {
            FindElfSymbolRoutine = (FindElfSymbolT)
               ap_SymbolAddress
               (ap_SymbolNameToId (ap_ModuleNameToId ("libaprobe.so"),
                                   "FindElfSymbol()",
                                   ap_ExternSymbol,
                                   ap_FunctionSymbol));
            if (FindElfSymbolRoutine == NULL)
            {
               ap_Error (ap_FatalSev,
                         "Cannot find FindElfSymbol");
            }
         }

         // Add it to the table.
         DynamicModulePtr =
            (DynamicModuleDataPtrT) ap_Malloc (sizeof (DynamicModuleDataT));
         DynamicModulePtr->ModuleName = ap_StrDup (Linkmap->l_name);
         DynamicModulePtr->StartAddress = (ap_Uint32) Linkmap->l_addr;
         DynamicModulePtr->Length = FindElfSymbolRoutine ("_end",
                                                          Linkmap->l_name);
         ap_HashTableInsert (&DynamicModuleTable,
                             (void *) DynamicModulePtr);

         // Record it
         log (ap_StringValue (Linkmap->l_name),
              DynamicModulePtr->StartAddress,
              DynamicModulePtr->Length)
           with DynamicModuleFormat to ap_PersistentLogMethod;

         // Now log it for the format logic to find
         NewModuleId.Value = ap_FetchAndAdd (&NextModuleId, 1);
         ModuleBaseName = ap_Basename (Linkmap->l_name);
         ModuleName = strcpy (alloca (strlen (ModuleBaseName) + 1),
                              ModuleBaseName);
         DotSoLocation = strstr (ModuleName, ".so");
         if (DotSoLocation)
         {
            *(DotSoLocation + 3) = '\0';
         }
         ap_LogData (ap_IntegerToLogId (LOG_ID_FOR_FORMAT_RECORD_MODULE),
                     8,
                     &NewModuleId,
                     sizeof (NewModuleId),
                     ModuleName,
                     strlen (ModuleName) + 1,
                     &(DynamicModulePtr->StartAddress),
                     sizeof (DynamicModulePtr->StartAddress),
                     &(DynamicModulePtr->Length),
                     sizeof (DynamicModulePtr->Length),
                     &Dummy,
                     sizeof (Dummy),
                     &Dummy,
                     sizeof (Dummy),
                     Linkmap->l_name,
                     strlen (Linkmap->l_name) + 1,
                     ap_NoName,
                     strlen (ap_NoName) + 1);
      }
   }
}

probe thread
{
   probe extern:"dlopen()" in "ld.so"
   {
      ap_NameT Filename = (ap_NameT) $1;
      
      on_exit
      {
         if (!ap_IsNoName (Filename) && $return != 0)
         {
            RecordDynamicModule (Filename, (void *) $return);
         }
      }
   }
   probe extern:"dlmopen()" in "ld.so"
   {
      ap_NameT Filename = (ap_NameT) $2;
      
      on_exit
      {
         if (!ap_IsNoName (Filename) && $return != 0)
         {
            RecordDynamicModule (Filename, (void *) $return);
         }
      }
   }
}

probe program
{
   on_entry
   {
      // Record the number of static modules
      NextModuleId = ap_NumberOfModules ();
   }
}

9.6 How can I trace Linux daemons with RootCause?

The following steps should allow you to use RootCause to trace activity in several of the daemons on your Linux system:

Background

RootCause keeps a log file and a registry as defined by the APROBE_LOG and APROBE_REGISTRY environment variables. These are generally set on a per-user basis by the Aprobe setup script, based on the user's $HOME environment variable or on the environment variable APROBE_HOME if that's defined. The default location for these files is a hidden directory under a user's home directory called ".rootcause". When RootCause intercepts a program that is starting up, it looks in the user's registry to see if this program should be instrumented. If so, there will be an associated workspace file named in the registry. By changing the APROBE_HOME environment variable before running setup, you can change the locations of the log and registry. Note that these files have to be writable by all processes that access them.

Daemons like sshd are started on Linux using a shell (bash) script located in /etc/init.d . For sshd the file is /etc/init.d/sshd . If you edit this file you will see a subroutine named "start". Not surprisingly, it is this subroutine to which we want to add a few statements that set up RootCause to intercept sshd .

Details

  1. Create a RootCause workspace to trace sshd :
     We recommend that you create your workspace on a disk local to the machine that will be running the intercepted program. Create it in the usual way, using the "new" pulldown menu on the main RootCause screen.

  2. Verify the location of your log and registry files:
     These files are probably in $HOME/.linux_rootcause . They are named "registry" and "rclog". You can specify a different location using the APROBE_HOME environment variable (see Q4.8 ), but be sure to run "setup" after setting APROBE_HOME and make sure the protections of the resulting files are correct.

  3. Back up your /etc/init.d/sshd script:
     You should probably make a copy of the sshd file before you modify it so you can restore it when you are finished tracing sshd.

  4. Modify the /etc/init.d/sshd script to set up aprobe:
     Find the start subroutine in the /etc/init.d/sshd file and insert the following lines after the "do_dsa_keygen" line:

       export APROBE_HOME=directory identified in step 2
       . aprobe_root/aprobe/setup
       . $APROBE/bin/rootcause_enable

  5. Stop and restart the sshd daemon:
     As root and with your current directory as /etc/init.d execute

       ./sshd stop
       ./sshd start

     You should see a "stopped" message from the stop command, and output from the start command indicating that RootCause has started. You may get a "FAILED" message from the start. On our system the daemon seems to start with no problems even when we get the failure message, so you can probably ignore it.

Tracing the libcrypt.so library was interesting: you can really see the ssh protocol flow as it generates keys and such.

The technique outlined above should work for many of the daemons on Linux.

9.7 Solaris: How do I apply RootCause to applications run at boot time?

Once you've used Aprobe to investigate the behavior of processes on a running machine, there is nothing particularly complicated about doing the same for system processes while the machine boots, but there are a number of special factors to take into account. These are listed below, and an example given of how we applied these to one of our machines.

The techniques described here were tested on a Solaris 6 box, but should be equally applicable to more current installations.

  1. Any time you make your own modifications to a system's startup procedures, there is a risk that you may make the system unbootable. We'll try to point out the pitfalls, but as with any procedures like this you should be prepared to recover the system from maintenance mode or even to reinstall the OS.
  2. At startup, system resources you may want to rely on may not be available. Make sure your RootCause installation is not on remote disks, and even for local installations, check that the filesystems used for the installation and for logging are available at the expected point during the boot process. If you want to get in at the start of Runlevel 2, the only filesystems typically available at that point are "/" and "/var", which may not have enough free space to support installation and logging.
  3. Startup scripts are run with /sbin/sh, which does not provide all the features you may be accustomed to with ksh, although it is very close for most purposes. Where possible, test scripts by running them under /sbin/sh before adding them to the boot process.
  4. For the test I just performed, I chose to monitor processes started as the system enters Runlevel 3, which starts NFS server processes, among others. At this point, all local filesystems are mounted, so I had no problem finding space for an installation, but many potentially 'interesting' services had already been launched.
  5. The libapaudit.so shared library needs to be installed in a secure location. With root authority, run:

       . /opt/aprobe/setup
       rootcause_libpath -c

  6. The startup procedure for a given Runlevel is determined by a script, "/sbin/rcN". The execution of these scripts is described in /etc/rcN.d/README , for N = 2 or 3. Since RootCause depends on an environment being defined, we need to 'source' some scripts into this command so the environment is defined when servers and daemons are started. I did this by creating files in /etc/rc3.d. If you look at the README and /sbin/rc3 script, you should see how this works.
  7. You will need to perform three steps to enable RootCause intercept in the rc driver. We will accomplish this by creating three files in the /etc/rc3.d directory.
     • /etc/rc3.d/K00RootCauseLocal.sh defines the APROBE_HOME environment variable where the logs and registry are stored:

         APROBE_HOME=/opt/aprobe_home
         export APROBE_HOME

     • /etc/rc3.d/K01RootCause.sh is a soft link to the setup script in the RootCause installation directory:

         ln -s  /opt/aprobe/setup /etc/rc3.d/K01RootCause.sh

     • /etc/rc3.d/K02RootCause.sh contains the command to enable intercept:

         . rootcause_enable

     Normally, scripts whose names start with 'K' are used to shut down processes before others are started, but we will take advantage of the fact that these are executed first to ensure that the RootCause setup is performed before anything else.

  8. All that is required now is to reboot the machine, then login as root, define APROBE_HOME, source the installation setup script, and start the RootCause GUI. The event viewer should show you what processes were launched.

9.8 Windows: Can RootCause trace System Services which are automatically started when the machine is booted?

Yes. However, there are a couple of unique things about tracing System Services that you need to keep in mind: