RootCause/Aprobe FAQ

Frequently Asked Questions for RootCause and Aprobe (All Platforms)
Updated July 15, 2007

This document describes aspects of the products "RootCause" and "Aprobe" from OC Systems, Inc. (www.ocsystems.com).

It consists of questions asked by evaluators and customers, as well as "artificial" questions intended to provide an introduction to the use of the products.

More complete and detailed descriptions of RootCause and Aprobe are provided by the User's Guides for those products, but this FAQ may provide answers not easily found there, and also includes specific code examples not applicable to a general User's Guide.

RootCause is built on Aprobe, a fully general mechanism for applying patches to programs without changing source or object code. See "What is Aprobe?" for more information.

Users are encouraged to send questions (and answers!) to OC Systems (see "How do I get technical support?").

This FAQ is Copyright (c) 2007 by OC Systems, Inc. ALL RIGHTS RESERVED.

Note to Windows Users:

This FAQ applies to all platforms, and some answers apply only to a specific platform, so read carefully. To avoid excessive repetition, the Unix form of a command or path is used where it may apply to multiple targets. For example, paths to files are given in Unix format using forward slashes, environment variables use Unix format, and Windows users should read .dll where filenames end in .ual (see Q12.23).

1. RootCause FAQ


1.1 What is RootCause?
1.2 What are some potential uses of RootCause?
1.3 How do I get started quickly with RootCause?
1.4 Who can use RootCause?
1.5 For which platforms is RootCause available?
1.6 How do I get technical support?
1.7 Do I really need a C compiler to use RootCause?
1.8 What documentation is available for RootCause?
1.9 How is RootCause licensed?
1.10 In what language(s) can my program be written?
1.11 What compiler(s) must have been used to compile my program?
1.12 Do I need to build the program with debug to trace it?
1.13 What do these terms mean: probes, console, agent, logging, etc.?
1.14 What about gcc/g++ 3.x (and GNAT 5.x) support?
1.15 Is there any way to attach with RootCause to a running application?
1.16 Why should I update to the current version of RootCause?
1.17 What Java (JVM/JRE) versions are supported for use with RootCause?

2. Installation


2.1 On Windows, what does the prompt "Is this an Agent Installation?" mean?
2.2 On Unix, why does install_rootcause offer to install in a directory called "aprobe"?
2.3 On Unix, I get prompted to specify whether I'm using Java or C++: why do you care?
2.4 When the installation prompts for a compiler, does it want the one that builds my application?
2.5 The installation process prompts me for a license key, but I don't have one right now; can I continue?
2.6 The installation prompts me for a single-line license key, but the one I have consists of several lines; do I just paste it in there?
2.7 Where do I find ksh (Korn shell) for RedHat Linux?

3. The RootCause Console (GUI)


3.1 On Unix, why do I get a bunch of warnings about fonts when I do rootcause open?
3.2 Why doesn't copy/paste of text fields work in the RootCause Console?
3.3 How can I see the whole context menu when I click the right mouse button (MB3) on something at the bottom of the screen?
3.4 Can I just use my Web Browser instead of the built-in Help Viewer?
3.5 Can I run the RootCause GUI on Windows to view data collected on my Unix system?
3.6 Unix: The RootCause GUI is just about unusable with my eXceed and Reflection X Windows emulator. What can I do?
3.7 Is it possible to monitor a Java program without entering the classpath, working directory, etc. that the New Workspace dialog prompts for?

4. The RootCause Log


4.1 Can I trace any and all of the executables that I see in the log? Are there some restrictions?
4.2 Why do I see two identical copies of a program in the RootCause Log?
4.3 Why don't I see the program I want to trace listed in the RootCause log?
4.4 I ran only one application with rootcause on, and I see about a dozen processes in the RootCause log; where did they come from?
4.5 Can I cause only APP_TRACED events to show up in the RootCause Log?
4.6 How do I clear the RootCause log?
4.7 Does the RootCause log wrap around? If so, how do I set the wraparound size?
4.8 Can I locate my .rootcause directory somewhere other than HOME (or USERPROFILE)?
4.9 Is there a way to keep the RootCause Log window from appearing when I start rootcause?

5. The Workspace Window


5.1 Should I say Yes or No to the "Application is not registered with workspace" dialog?
5.2 What does the blue dot mean in the Predefined UALs part of the Workspace Tree?
5.3 Where do I find out about the Predefined UALs listed here?

6. The Trace Setup Dialog


6.1 What does <Unknown File> mean in the Trace Setup tree?
6.2 What do the black and blue dots mean in the Trace Setup tree?
6.3 How do I trace a dynamically loaded shared library (DLL)?
6.4 What's the difference between "Don't Trace..." and "Remove Probes..."?
6.5 I've got a UAL that I compiled with the apc command -- how do I get that into RootCause?
6.6 Why don't I see all the symbols shown by "apinfo" or "apcgen -L" in the Trace Setup window?
6.7 I define APROBE_SEARCH_PATH to include my source location, but the RC GUI still isn't finding my source. Why?
6.8 How can I see and dump parameters for C functions for which there are symbols but no debug information?
6.9 How can I turn on trace just when I'm in a chosen method or function?
6.10 How can I enable my custom probe only when Trace is also enabled?
6.11 I notice "Disable Tracing" does not affect the "exception" predefined probe. How can I disable that as well?
6.12 How can I trace and time everything between point A and point B?
6.13 How can I allow all Java parameters to be traced?

7. The Trace Display (Event) Dialog


7.1 Why are some functions found in the traced Events not found in the Trace Setup?
7.2 Why are some Java methods found in the traced Events not found in the Trace Setup?
7.3 RootCause keeps asking to find a source file. Is there a way to just point to this once without specifying the path to every file we wish to view?
7.4 The trace shows a problem in third-party software; what's the best way to pass this along to them?
7.5 Unix: RootCause shows signal 11 during my Java application run, but there was no crash. Is this a valid signal 11?
7.6 When I trace a Java synchronized method, does the method time include lock delay time?
7.7 Why was malloc() listed as being LOAD_SHED in the Trace Display when it really wasn't?
7.8 When formatting my data, an error pops up saying, "The maximum event tree size ... has been reached." What do I do?
7.9 I see that I can do "Save As XML": can I view this XML later?
7.10 How can I see just the major time-consuming children of nodes in the Trace Events Summary tree?
7.11 Do the times shown in Trace Events reflect the aprobe overhead?
7.12 How do I know what overhead to specify in the Set Statistics Overhead dialog?
7.13 What are the various times I'm seeing in the details pane for Enter and Exit nodes?
7.14 What are the various times and percentages I'm seeing in the Details panes on nodes in the Event Summary tree?
7.15 Is there a way to save the text for a specific node in the Trace Events tree?
7.16 Can I copy a Trace Events node to the clipboard to be pasted elsewhere?
7.17 I know my method was executed many times, so why isn't it in the Performance Summary table?

8. RootCause and Aprobe


8.1 How do I adjust the Trace "DefaultLevels" option so only a fixed depth is traced when an application is run with RootCause?
8.2 How can I use Aprobe's predefined probes (profile, coverage, events, memwatch, statprof) with RootCause?
8.3 Is it possible to develop in Aprobe, but still use the RootCause "intercept" mechanism?
8.4 If RootCause is built on Aprobe, and RootCause supports Java, is there an Aprobe for Java?
8.5 How do I add my own UAL to the RootCause trace?
8.6 How can I use the Events probe with RootCause?

9. RootCause at Run Time


9.1 Can I just leave RootCause "on" all the time? For example, while I power down and power up my computer (Windows XP)? I was thinking that it would be interesting to see all the processes as my computer boots.
9.2 How much will RootCause slow my application?
9.3 On Solaris, why do I get "illegal insecure pathname" with RootCause on?
9.4 Solaris: Why do I get warnings from running `ps' when rootcause is enabled, and how can I fix it?
9.5 Solaris: How do I find out the full path of dynamic modules loaded?
9.6 How can I trace Linux daemons with RootCause?
9.7 Solaris: How do I apply RootCause to applications run at boot time?
9.8 Windows: Can RootCause trace System Services which are automatically started when the machine is booted?
9.9 Windows: Why are there no APP_START events in the RootCause Log for System Services?
9.10 Windows: I created a WorkSpace and defined a trace for a System Service that is automatically started each time my machine is booted. However, I don't see an APP_TRACED event in the RootCause Log when I reboot. Why?
9.11 Windows: I've defined a trace for a System Service, but each time the Service starts it is not traced. Instead, I see a TEXT event in the RootCause log reporting the following apparent error: `(W) ... Cannot execute ...\do_aprobe.cmd'
9.12 Can I apply different workspaces (or none at all) for the same program invoked with different command-line parameters?
9.13 How can I "intercept" a Java server on AIX?

10. RootCause J2EE Support


10.1 What J2EE application servers are supported?
10.2 How does one create a workspace for a standalone Java app-server like WebLogic?
10.3 How does one create a workspace for a native executable app-server like iPlanet?
10.4 How do I configure RootCause with Sun iPlanet Application Server 6.5 on Solaris?
10.5 How do I configure RootCause with BEA WebLogic Application Server 6.1 on Solaris or Windows?
10.6 How do I configure RootCause with JBoss 3 on Solaris or Windows?
10.7 How do I set up for tracing TomCat?
10.8 How do I configure an app-server not listed here?
10.9 How do I set up a workspace for "jrun" on Linux?
10.10 How do I set up RootCause for use with IBM Websphere AppServer 4.0.5 Application running on Solaris?
10.11 How can I dump Java objects with a probe on a known program point, rather than at a certain elapsed time as done by java_memstat?

11. RootCause TroubleShooting


11.1 I applied a Trace on a function (method) in the RootCause GUI, but I don't see it being called in the output. Why?
11.2 I add a library as a dynamic module and trace the init function, but the trace doesn't show up. Why?
11.3 Solaris: I'm trying to trace file open calls, so I trace "open()" in "libc.so", but I get nothing. Why?
11.4 Solaris: I Add Dynamic Module of mylib.so, then specify some traces in mylib.so. But when I run the program, those traces don't appear. Why?
11.5 I did Custom..., and saved my probes to an APC file, but those probes don't show up in my trace. Why?
11.6 Windows: Why doesn't the DOS dir command show my workspace, pi_demo.aws?
11.7 Windows: Why can't I see RootCause Workspaces in the Windows Explorer?
11.8 Windows: Something happened during Uninstalling the previous version, now I can't install the new one. What do I do?
11.9 How do I stop tracing something I've got a workspace for?
11.10 What do I do about the message "(E) ADI checksum (0x84b1c4d) does not match module checksum (0xa1c5e35)." when I register on a .dply file at a remote site?
11.11 (Windows) Why won't the "RootCause On" button stay checked in the GUI?
11.12 Why does my Java app fail with "Class Not Found" under RootCause, but work fine without RootCause?
11.13 How can I probe Java classes loaded with a custom class loader and so not in the CLASSPATH?
11.14 When I have "rootcause on" I sometimes notice that commands piped together (for instance "env|grep MyVariable") can hang for a while before completing. Why is this?
11.15 (Windows) Trying to apply RootCause to a service, I get MessageBox (after a reboot) saying there was a timeout and the Service failed to respond. Why?
11.16 When I add my library to the workspace with Add Dynamic Module and run with RootCause, my application never starts. What's wrong and how can I fix it?
11.17 (Windows) The APC compiler fails on the giant APC file generated with apcgen. Now what?
11.18 (Windows) Why does our application crash due to a bad return code from CoInitializeSecurity() when running under RootCause?
11.19 Is there a way to add my own files to a deploy file so they will unpack into the directory created by rootcause register xxx.dply?
11.20 Why doesn't the pi_demo program run on Linux Fedora Core 3?
11.21 Why didn't my trace on Linux log any data?
11.22 How can I eliminate "WARNING: Could not create system preferences directory" when I start the RootCause GUI?

12. Aprobe FAQ


12.1 What is Aprobe?
12.2 What is ProbePak?
12.3 What are some potential uses of Aprobe?
12.4 How do I get started quickly with Aprobe?
12.5 Who can use Aprobe?
12.6 What different versions of Aprobe are there?
12.7 For which platforms is Aprobe available?
12.8 How do I get Aprobe?
12.9 What documentation is available for Aprobe?
12.10 What tools make up Aprobe?
12.11 How is Aprobe licensed?
12.12 Is there a point-and-click (GUI) interface to Aprobe?
12.13 Can I run Aprobe on any executable program file?
12.14 In what language(s) can my program be written?
12.15 What compiler(s) must have been used to compile my program?
12.16 (Unix) How do I tell if a program file is "stripped"?
12.17 How do I tell what symbols a program has available?
12.18 What do I do to get symbols in my program?
12.19 What do I do to get "debug information" in my program?
12.20 How do I tell if a program file has "debug information"?
12.21 What is a "probe"?
12.22 What is a "UAL" (.ual file)?
12.23 Why does a Windows UAL file have a file extension of ".dll"?
12.24 What is "logging"?
12.25 What is an ".apd" file?
12.26 What can't I do if my executable or library doesn't have debug information?
12.27 Does use of on_line() require the application to have debug information?
12.28 What is the maximum number of probes allowed?
12.29 Is there access to C++ private/protected variables?
12.30 Is there any way to attach with Aprobe to a running application?
12.31 Is there a way to probe a function for which no symbol is available?

13. Using the "aprobe" Command


13.1 What does "aprobe" do?
13.2 How do I specify options to my program when using aprobe?
13.3 How do I specify options to my probes?
13.4 How do I print my output at run time instead of sending to the APD file?
13.5 Can I suppress generating an ".apd" file?
13.6 How can I run my probes without invoking aprobe?
13.7 Windows: When I run my program with aprobe I don't get any output even though I know I'm executing routines with probes on them and those probes use printf. What's going on?
13.8 Windows: When I use the -o switch to redirect the output of my program to a file(s), the output seems to be out of order?
13.9 Unix: How do I probe a function in a dynamically-loaded shared library?
13.10 Can I probe a function in native C or C++ code loaded by a Java application?
13.11 Is there a way I can use Aprobe in a target environment where my application has no symbol or debug information with it (is stripped)?
13.12 Can I run aprobe but produce no APD files?
13.13 Why does my program crash when using aprobe, and not without?
13.14 AIX: Aprobe version 3.2 had the -s1 option to prevent conflicts with my application's shared memory. Is there a similar feature in version 4.3?
13.15 Why does Aprobe ask for such a large memory-mapped file on startup, when I've specified only a 4M APD file with "-s"?
13.16 On Solaris and/or Linux, when I run my application under Aprobe it crashes during initialization with a problem in malloc. This doesn't happen without Aprobe. Why?
13.17 Why can't RootCause see program symbols on a system using Visual Studio .NET 2003?

14. Using the "apformat" Command


14.1 What does apformat do?
14.2 Which of the ".apd" files do I specify on the command-line?
14.3 Can I restrict the apformat output to just that generated by one of the several UALs provided at aprobe time?
14.4 Can I restrict the apformat output to just that generated by one or two of my format routines?
14.5 Can I programmatically filter which formats are used?
14.6 Can I do the previous 2 if I'm using automatically generated formats?
14.7 When do I need to specify the UAL file to apformat?
14.8 Can I use "apformat" without an APD file?
14.9 Aprobe works fine, but I get a crash from apformat; why?
14.10 Can I use ap_UalArgv in "probe format ... on_entry" to get arguments passed at run-time (aprobe time)?

15. Using Predefined Probes


15.1 What is a predefined probe?
15.2 Do I have to use "apc" to build these probes myself?
15.3 The examples show invocation of predefined probes using aprobe -u info myprog.exe. How does aprobe find these UALs when they're not in the current directory?
15.4 Can I use Coverage without using the Java configuration GUI?
15.5 The trace probe really slows down the program--how can I speed it up?
15.6 Unix: How can I get a snapshot of my predefined probe data before my program dumps core?
15.7 Is there a way to invoke predefined probe operations from within my probes?
15.8 How can my probes use the Java GUI facilities that the predefined probes use?
15.9 I'd like to customize a predefined probe -- how do I rebuild it?
15.10 How do I use the coverage probe with multiple test cases?
15.11 Where did the "heap" probe go?
15.12 How do I use this "events" probe everyone's talking about?
15.13 In the `profile' probe, what do "Calls to Self/Child" columns mean?
15.14 Why don't memstat, memwatch, heap probes work on my application?
15.15 Can you please explain the fields "Alloc Count" and "Free Count" in the memstat "Outstanding Allocation" report?
15.16 Can I use memstat to track all allocations and frees?
15.17 Is there a way to only report allocations in a certain module based on the stack traceback entries?
15.18 Is there a predefined probe for detecting memory corruption?
15.19 Is there a predefined probe for tracking down lock contentions?
15.20 What options in the trace.cfg file are obsolete, and why?
15.21 Why does the memstat summary file say it can't do the analysis because I only have one sample?
15.22 Could you explain the memstat summary's "Leaked Memory" and "Total Leakage" values?
15.23 How can I define a memstat (or memwatch) filter matching any number of call levels?
15.24 Is there a probe to check for stack corruption?

16. Using the "apc" Command


16.1 What does apc do?
16.2 How do I indicate what C compiler and options apc should use?
16.3 Do I need to specify an object file or executable to apc?
16.4 How do I specify other object files to link into my UAL?
16.5 apc says my function name's not known--why not?
16.6 Solaris: Where can I download a good gcc installation to use with RootCause?
16.7 How do I generate debug information for my APC files so line and function information show up in tracebacks?

17. Writing Probes in APC


17.1 How do I use "apcgen" to generate a probe automatically?
17.2 How do I write a "probe"?
17.3 What is the difference between APC and straight C?
17.4 Why do I need a "probe thread"?
17.5 What's the difference between "probe thread" and "probe program"?
17.6 When exactly are the "on_entry" and "on_exit" parts of a function probe executed?
17.7 Why can't I dump some parameters in the on_exit part?
17.8 Why is my local variable "unknown" in on_entry and on_exit parts?
17.9 Is there a way to probe "the first line" or "the last line" in my function?
17.10 How do I specify which of several overloaded functions I want to probe?
17.11 How do I reference a hardware register?
17.12 How do I query the parameters to a function?
17.13 Can I use automatic formatting if I don't have an executable with debug information?
17.14 How do I change the return value from a function?
17.15 How do I log the value of a string parameter?
17.16 How do I log the contents of an array?
17.17 Solaris: I get a compile error when I write "a[0..4]", but it seems to work; why?
17.18 How do I "stub out" the probed function so it does nothing?
17.19 How do I query the data in a class when probing a member function?
17.20 How do I query a global (or static) variable when there's a local one of the same name?
17.21 Can I reference a static variable that wouldn't normally be visible to my probed function?
17.22 Can I call a function in my program from within a probe?
17.23 Windows: Can I call a Visual C++ method from a probe?
17.24 Can my APC files reference names in one another like a C program?
17.25 Can I call a function in another UAL?
17.26 How do I change the return code from my Unix program?
17.27 How do I print or change a GNAT Ada string value in my probe?
17.28 How can I just log some data and format it as hex?
17.29 How do I log information about each thread as it starts?
17.30 GNAT turns SIGSEGV into CONSTRAINT_ERROR; can I use Aprobe to get a core dump?
17.31 How can I get Aprobe actions to happen when my program dumps core?
17.32 Is there a way to find out where a signal occurs when it doesn't cause a core dump?
17.33 How can I reduce the overhead of my probes?
17.34 Can I use Solaris Aprobe on JOVIAL programs?
17.35 How can I log a composite object without using debug information?
17.36 How can I cast a value to a type name from the program?
17.37 Is there a special editor or editor mode for APC?
17.38 How do I execute a probe only if a certain data condition is met?
17.39 How can I interactively modify the parameters to a routine in my application?
17.40 I'm trying to stub a function called by my program, but APC can't seem to find it.
17.41 Using Solaris GNAT, I want to send a signal to the program to control my probes. But the signal seems to get lost. Why?
17.42 I only want to probe malloc() if it's called by realloc(). How would I do that?
17.43 I have a GNAT Ada procedure that I'm stubbing out, but want to return a string value. The procedure has a declaration similar to the one below. What's the APC?
17.44 Is there a simple probe that just traces the lines in one routine?
17.45 How do I reference enumeration literals in APC?
17.46 Why does including <math.h> in my APC keep it from compiling? (I want to call the "pow()" function in my probe.)
17.47 Windows: How do I probe a function in a dynamically-loaded DLL?
17.48 How do I query an environment variable from within a probe?
17.49 The above looks like a useful utility. How can I structure my probes so it can be shared?
17.50 Can I define functions in one APC file and call them from another APC file?
17.51 I am trying to write a probe that will call an Ada routine in a package body, but the routine never seems to get called. Why?
17.52 How can I log a string passed to a library function like strdup() where there's no debug information?
17.53 Can I use Aprobe to change the command run by a call to system() from my application to run my own little script instead?
17.54 Is there a way to catch and suppress exceptions?
17.55 I'd like to probe routines in the Windows sockets DLLS. Any issues I should be aware of?
17.56 Can I track stack usage with Aprobe?
17.57 Is there a way to access local variables that doesn't depend on a hard-coded line number?
17.58 Can I use Aprobe to query a caller's local data that wouldn't be visible by normal visibility rules?
17.59 In APC I can reference some class members as fields of class objects, but others I cannot. Why?
17.60 How can I enable and disable probes externally while my program runs?
17.61 AIX: How do I convert my pre-version-3 APC file to current one?
17.62 (Unix) Is there a probe to see when my application "exec's" another program?
17.63 How can I cast an enumeration value to print its numeric value?
17.64 Is there a probe that will print a static call tree of my executable?
17.65 How can I detect memory overwrites on dynamically allocated (malloc'd) memory?
17.66 (Unix) How do I know when my application has forked?
17.67 How do I know what lines I can probe in a function?
17.68 (Windows) How can I track page faults using Aprobe/RootCause?
17.69 Is there a routine available to find symbol ids by mangled name, or one that will demangle for us?
17.70 Is there a way to suppress (or force) the warning when probing a symbol that is undefined?
17.71 Unix: Can I call a C++ method from a probe?
17.72 How do I print/change a C++ std::string object?

18. Writing Java Probes


18.1 How do I use Aprobe on a Java application?
18.2 Can I change the return value of a Java function?
18.3 Can I throw an arbitrary Java exception from my probe?
18.4 When using a Java custom probe, can I get output to appear in the Trace Display tree?
18.5 Is it possible to "stub" a Java method so it does not execute the code in the original method?
18.6 Is there any way to probe classes from rt.jar, e.g., java.io.*?
18.7 How do I call another method in the same class instance from within my Java method probe?
18.8 Can I add custom Java probes within the RootCause GUI?
18.9 (Windows) How would I trace a Java applet running with Internet Explorer (IEXPLORER process)?
18.10 Can I change the value of parameters passed to a Java method?
18.11 Can I log any Java variables other than method parameters?
18.12 Is there a way to define nested probes in Java similar to that supported in APC?

19. Logging Data


19.1 What's the difference between "logging" and "printing"?
19.2 Why do I get data mismatch warnings logging to my very simple format routine?
19.3 Why do my format routine parameters (usually) have to be pointers to the type logged?
19.4 How can I control the size of the APD file produced?
19.5 What is an "APD ring"?
19.6 How can I control what goes into each APD file?
19.7 How can I reduce the time that is spent logging data in my probes?
19.8 How can I log data so it's guaranteed to be available when I format, even if the APD ring wraps around?

20. Other Aprobe Questions


20.1 Where does aprobe get its "time" from (e.g., for the profile probe)?
20.2 Why do my threads execute in different order under aprobe?
20.3 It looks like if I run "aprobe -if", both the probe program and probe format get executed, which messes up initialization. How can I avoid this?
20.4 Solaris: I have a probe on_exit to a function to change the struct that is returned. It causes a core-dump when the probed function is called as a procedure. What's the problem?
20.5 Windows: There is a parameter in a method call which is passed by reference. It is modified by the method and I want to see what it is on exit. Aprobe doesn't allow this, saying that parameters are visible only on entry. Is there a way to see how this value gets modified?
20.6 I want to capture the address of a target expression on entry in a pointer to the right target type. How do I declare this?
20.7 I want to probe a method in a template class. How do I refer to the method in the function probe on that method?
20.8 When I trace all the functions in my (Windows) DLL some functions appear to be entered twice, once with a name that has the string "?0" appended to it and once with the name I think it should have. What is going on?
20.9 Solaris: I'd like my probe to call a little C++ function which creates an object and invokes a method with that object. Can I do this?
20.10 Solaris: I use pathmap to tell dbx where to find my object & source files. How do I tell Aprobe where to find them?
20.11 In what order do separate probes on the same function execute?
20.12 Is it possible to reference C++ files from my application from within my UAL?
20.13 Solaris: Can I build a UAL with unresolved references?
20.14 How do I log multi-dimensional Ada arrays?
20.15 AIX: Why isn't my UAL world-readable?
20.16 AIX: When I use pthreads calls in my probes, the UAL won't link. Do I need to explicitly specify the library or change my compiler_profiles file?
20.17 Is there a way I can manage thread-specific data without using native thread-management routines?
20.18 How does using Aprobe for C++ differ from using Aprobe for C or Ada?
20.19 Why does my C++ application crash when run with Aprobe?
20.20 (AIX) My application + aprobe or its tools runs out of memory. What can I do?
20.21 My application + aprobe or its tools is very slow starting up. What can I do?
20.22 (AIX) Why is the C++ exception raised in my libxml++-1.0.a library not reported by exceptions.ual?
20.23 Why don't my on_line probes work?
20.24 How do I probe a C++ application's CPU usage?
20.25 How do I probe a C++ application's memory usage?
20.26 How can I interactively debug an application in real time?
20.27 How do I get the size of my "std::list<std::string>" object generated by g++?
20.28 What do I do if my program dumps core when run with Aprobe?

21. Licensing


21.1 What do we do with a license key that looks like "ocs-Aprobe-48833..."?
21.2 What do we do with a license key that looks like "FEATURE ..."?
21.3 Unix: How do I start a second license server just for Aprobe?
21.4 AIX: How do I start lmgrd when the machine boots?


1. RootCause FAQ

1.1 What is RootCause?

RootCause is a tool for developing and deploying traces that act as a software "flight recorder", simplifying and speeding root cause analysis, as well as proactively monitoring the health and performance of the application. It can also be used to repair applications in the operational environment without rebuilding or reinstalling the software. RootCause is based on Aprobe (see "What is Aprobe?") but steps beyond Aprobe in a number of important ways:

This FAQ addresses questions that apply to these aspects of RootCause. The full power of Aprobe is delivered with RootCause, and is addressed by the Aprobe FAQ.

See also "What is Aprobe?".

1.2 What are some potential uses of RootCause?

It's a long list. Here are just some of the uses of RootCause:

For a more in-depth discussion of some of these, see the RootCause white papers.

1.3 How do I get started quickly with RootCause?

Work through the demos in Chapter 5 of the User's Guide.

1.4 Who can use RootCause?

RootCause has several facets which apply to different classes of users. Technical support personnel will use it to gather information about a product in the field. Developers will use RootCause to develop traces that the support personnel can use, or which the developers themselves may use to track down problems. Testers might use it to gather data to provide back to developers to supplement test results.

1.5 For which platforms is RootCause available?

There is RootCause for Java and RootCause for C/C++. Support for both languages may be enabled to support mixed applications.

RootCause for Java supports tracing J2EE applications on AIX, Linux, Windows and Solaris, but what you really want for Linux and Windows Enterprise Java applications is RootCause Transaction Instrumentation (RTI).

RootCause is currently available for 32- and 64-bit versions of AIX and Linux/Intel. See RootCause System Requirements for details.

RootCause for 32-bit Windows (2000 and XP) and Sun Solaris (Sparc only) are still available but are no longer in active development.

1.6 How do I get technical support?

The best way is to send e-mail to , or phone 703-359-8160, extension 3. You can expect a quick response between 9am and 5pm Eastern US Time.

1.7 Do I really need a C compiler to use RootCause?

Yes, in general, but the details differ between Unix and Windows:

Unix:   Only if you want to apply probes to native code. You can trace Java and native code, and dump Java parameters, without a C compiler. However, the only thing you can do with native code is trace it; you can't dump parameters or variables or generate probes (e.g., SNAPSHOT or COMMENT) because those are implemented by generating APC source code and then compiling it with Aprobe's apc compiler, which requires a C compiler backend.

Windows:   Everything for Unix above is true for Windows, plus: (a) the compiler must be Microsoft Visual C++; and
(b) if the program was compiled with Visual C++ 6 (or Visual Basic 6) it can't even be traced, because RootCause relies on a DLL that's part of those products which we're not allowed to distribute.

Starting with version 2.1.1 of RootCause you can trace Visual C++ (VC7) programs.

For VC6 (VB6) programs, RootCause needs MSVC++ to be installed to provide the (non-redistributable) mechanism to read symbol information from PDBs. Without MSVC++ installed, only symbol information stored in the executable or in DBG files can be read, plus the exported symbols.

In version 2.1.1 of RootCause an environment variable can be set to enable the use of the new mechanism to access symbol information in PDB files for VC6 (VB6) programs. Set the environment variable APROBE_USE_DIA=1 to enable this (experimental) feature.

1.8 What documentation is available for RootCause?

RootCause is delivered with a User's Guide in hardcopy, HTML, and PDF formats. The latter two softcopy forms are available for pre-sales evaluation.

1.9 How is RootCause licensed?

RootCause for C/C++, RootCause for Java, and the RootCause Agent (run-time) are licensed separately. Licensing is enforced on a per-user basis or per-CPU basis with FlexLM. Contact our sales department for more information at .

If you already have a license but it's not working for you, see "Licensing" or "How do I get technical support?"

1.10 In what language(s) can my program be written?

Explicit support is provided for C, C++ and Ada. Functions written in Assembler will work to the extent that they adhere to standard calling conventions.

Functions written in other high-level languages (e.g., Basic, Fortran, Pascal, JOVIAL) may also be probed if the probe doesn't reference source-level identifiers ("target expressions"). Contact if you have a favorite.

1.11 What compiler(s) must have been used to compile my program?

Almost any program with symbols can be probed. The "full support" described below is based on the debug information needed for source lines and target expressions. Support for additional architectures, operating systems and compilers is always in progress, so please contact if you don't see what you need here.

AIX

Aprobe supports any IBM C or C++ compiler that runs on AIX 5.2 or newer. If your program is Ada, Aprobe supports OC Systems' PowerAda 5.04.

Linux

The C and C++ compilers supported on Linux are gcc and g++ versions 2.95.x through 4.4.3. See also Q1.14. If your program is Ada, Aprobe supports only PowerAda on Linux and AIX. (GNAT 3.x is supported on Solaris.)

Windows

Aprobe supports the Microsoft Visual C++ development system versions 6 and 7, but does not support .NET (Dynamic Runtime Model) applications.

NOTE: Aprobe/RootCause for Windows is no longer in development. The last version released was 2.1.4b/4.3.4b in 2005.

Solaris

The C and C++ compilers supported are Sun WorkShop C++ compiler versions 4.2 and higher (Forte) and gcc/g++ compilers before version 3. If your program is Ada, Aprobe requires GNAT version 3.15 or 3.16.

NOTE: Aprobe/RootCause for Solaris is no longer in development. The last version released was 2.1.4b/4.3.4b in 2005.

1.12 Do I need to build the program with debug to trace it?

No, but for non-Java programs it helps. The suggested compromise is to build it with debug, develop your traces, then strip the debug information when shipping the product. This is fully discussed in Chapter 6 of the RootCause for C++ User's Guide, "Building a Traceable Application".

1.13 What do these terms mean: probes, console, agent, logging, etc.?

RootCause has many unique features which require a unique terminology to describe. See the glossary in Chapter 3 of the user's guide for their definitions. Some basics are:

agent

The part of the RootCause product which actually applies and enables the probes, also known as the Aprobe runtime.

console

The Graphical User Interface (GUI) used for developing probes and viewing the data logged by them.

log

verb : to efficiently record data into a memory-mapped file for later viewing.
noun : the RootCause log, a list of all programs run with "rootcause on".

probes

Programmatic actions to be inserted and executed at specific points in the probed application.

1.14 What about gcc/g++ 3.x (and GNAT 5.x) support?

RootCause version 2.2.3 (Aprobe 4.4.3) for Linux supports gcc/g++ 3.x through 4.4.3. gcc 2.95 and 3.x are supported on AIX, but g++ is not. GNAT 3.x is supported by the last Solaris version; otherwise GNAT is no longer supported.

1.15 Is there any way to attach with RootCause to a running application?

No. See "Is there any way to attach with Aprobe to a running application?".

1.16 Why should I update to the current version of RootCause?

Full details are in the README file delivered with each version. Click here for a merged list of the features and fixes for each platform. If you find this has not been updated to correspond to the versions in www.aprobe.com/download/, feel free to .

1.17 What Java (JVM/JRE) versions are supported for use with RootCause?

We have provided support for older versions of Java for specific customers: please contact us if you have a specific need.

Some of our probes, most notably java_memstat, make use of the JVMPI debugging interface, which has turned out to be unreliable in earlier versions and which has been eliminated entirely in Java 1.6. See the Memstat documentation for a detailed description.

2. Installation

2.1 On Windows, what does the prompt "Is this an Agent Installation?" mean?

An "agent installation" is the installation of the "RootCause Agent", a small subset of the product that allows one to run probes developed using the RootCause Console.

Note that this prompt is gone starting with RootCause 2.1.1: the agent is now just a self-installing file %APROBE%\deploy\RootCauseAgent.exe.

2.2 On Unix, why does install_rootcause offer to install in a directory called "aprobe"?

RootCause is a superset of Aprobe, and in fact shares the same installation script. You can choose a different name if you like.

2.3 On Unix, I get prompted to specify whether I'm using Java or C++: why do you care?

Because probes on C/C++ (and Ada and other compiled languages) need to be compiled with a user-supplied C compiler, and the installation script has to know whether to check/prompt for that.

2.4 When the installation prompts for a compiler, does it want the one that builds my application?

No. RootCause for C/C++, like Aprobe, requires a C compiler to build the probes. This is not provided with RootCause because it's assumed customers have one. If you don't, gcc is fine, and OC Systems can help you download and install it.

2.5 The installation process prompts me for a license key, but I don't have one right now; can I continue?

Yes. Just enter an empty string, ignore the warnings you may get, and then put the license key into the file license.dat in the licenses directory under the RootCause installation directory before you start using RootCause. See also "Licensing".

2.6 The installation prompts me for a single-line license key, but the one I have consists of several lines; do I just paste it in there?

No. Leave it blank as in Q2.5, and see Q21.2.

2.7 Where do I find ksh (Korn shell) for RedHat Linux?

On RedHat, the Korn shell is provided by the pdksh package. This is on the install media, but not usually installed unless you install everything or specifically request it. The pdksh RPM can be downloaded from the RedHat ftp site. Choose the appropriate link for your version of the RedHat Distribution:

RedHat 7.2:
ftp://ftp.redhat.com/pub/redhat/linux/7.2/en/os/i386/RedHat/RPMS/pdksh-5.2.14-13.i386.rpm
RedHat 7.3:
ftp://ftp.redhat.com/pub/redhat/linux/7.3/en/os/i386/RedHat/RPMS/pdksh-5.2.14-16.i386.rpm
RedHat 8.0:
ftp://ftp.redhat.com/pub/redhat/linux/8.0/en/os/i386/RedHat/RPMS/pdksh-5.2.14-19.i386.rpm
RedHat 9:
ftp://ftp.redhat.com/pub/redhat/linux/9/en/os/i386/RedHat/RPMS/pdksh-5.2.14-21.i386.rpm

Note that Linux RootCause version 2.2.2 (Aprobe 4.4.2) no longer requires ksh to install: the install script is finally bash-compatible!

3. The RootCause Console (GUI)

3.1 On Unix, why do I get a bunch of warnings about fonts when I do rootcause open?

Because the RootCause Console interface is in Java, and the default selection of fonts does not match what's in your X-windows font path. This problem usually only happens when using older (pre-8) versions of Solaris. See the section entitled Platform-Specific GUI Issues in Chapter 8 of the RootCause User's Guide.

3.2 Why doesn't copy/paste of text fields work in the RootCause Console?

You must be using an older (pre-8) version of Solaris, which requires an older (pre-1.4) version of Java to be used, which doesn't directly support this. The same goes for default buttons on dialogs. Additionally, on Unix you will find that the 'Copy' operations from various RootCause windows such as Trace Events don't show up in your X-Windows clipboard.

See the section entitled Platform-Specific GUI Issues in Chapter 8 of the RootCause User's Guide for details, but the quickest fix is to start the X-windows application "xclipboard". When you copy something to the clipboard from Java, it will appear in the xclipboard window. You can then select it there and middle-click to paste elsewhere.

3.3 How can I see the whole context menu when I click the right mouse button (MB3) on something at the bottom of the screen?

Just right-click farther up on the screen so there's room for the whole menu. The Java popup menu behavior is separate from the selection of the item on which it works. So once you've selected an item with a left-click (MB1), you can right-click anywhere in the window to see the context menu for that selected item.

3.4 Can I just use my Web Browser instead of the built-in Help Viewer?

Yes, you can point your browser (Netscape, Mozilla, Internet Explorer, etc.) to $APROBE/html/rcguihelp.html (where $APROBE is the value of the APROBE environment variable, the root of your RootCause installation.) However, the Help operations won't update that automatically -- you'll have to use your browser's Find operation.

Note also that Chapter 8 of the RootCause User's Guide is pretty much identical to the on-line help, and is cross-referenced with the rest of the User's Guide (see Q1.8).

3.5 Can I run the RootCause GUI on Windows to view data collected on my Unix system?

No. The RootCause Console must be run on the same kind of platform (AIX, Linux, Solaris, Windows) as that on which the data is collected, both for defining the trace and for viewing the data. The format of the deployed workspace and of the collected data is platform-specific.

3.6 Unix: The RootCause GUI is just about unusable with my eXceed and Reflection X Windows emulator. What can I do?

The problem is that these emulators just don't support Java well. There are some hints in the user guide but it's still not very usable. Our advice: use VNC. It's so much better in every way, and it's free. You may download both the client and server from RealVNC. These sites explain it better than we could here, but if you need assistance feel free to .

3.7 Is it possible to monitor a Java program without entering the classpath, working directory, etc. that the New Workspace dialog prompts for?

Yes. The demo program that we beg everyone to do first shows exactly how to set this up and create a default workspace. There's one for Unix and one for Windows.

However, since you asked so nicely, here's what you do:

  1. Start the RC GUI.
  2. Turn RC on by clicking the checkbox at the top (Windows), or by entering rootcause on in the window where you'll start your app.
  3. Run your Java program as you normally do.
  4. Examine the RC log (File->Open RootCause Log).
  5. Search near the bottom and find your Java program's APP_START node. If you see two identical ones, choose the second.
  6. Click on it.
  7. Right-click to get context menu.
  8. Choose Open Associated Workspace.
  9. New Workspace Dialog should appear with information filled in so you just click OK.

4. The RootCause Log

4.1 Can I trace any and all of the executables that I see in the log? Are there some restrictions?

Yes, you should be able to trace anything. If you find one that you cannot trace, please report it as a bug. However, most executables that are part of the system have no symbolic information, so you cannot see functions in the executable itself. You can get functions in shared libraries/DLLs that are loaded, and use the predefined UALs without symbols and debug information.

4.2 Why do I see two identical copies of a program in the RootCause Log?

Some programs like Java 1.4 and Netscape "fork and exec themselves" so these are distinct processes. You generally want the second one, since the first probably set up some things missing from the environment and then tried again.

4.3 Why don't I see the program I want to trace listed in the RootCause log?

There could be a number of reasons:

In all but the first case, you'll have to run the program again with "rootcause on" for it to show up in the RootCause log.

4.4 I ran only one application with rootcause on, and I see about a dozen processes in the RootCause log; where did they come from?

When you start a program, that may start a shell script. Korn shell, C shell and others can have associated "rc" files (e.g., ~/.kshrc, ~/.cshrc), which run some commands. Then the script itself may run some commands to evaluate the environment. Then the program itself may start some processes (e.g., by using CreateProcess() or system()) to do some tasks. You can learn amazing stuff when you use RootCause even without tracing!

4.5 Can I cause only APP_TRACED events to show up in the RootCause Log?

Yes, by turning verbose logging off. This is done on Windows with the DOS command

rootcause on quiet

and on Unix with:

rootcause register -s verbose -e off

Also, on Unix, you can set the environment variable APROBE_LD_AUDIT_VERBOSE=FALSE in a shell and it will disable logging of all commands started in that shell and its subshells. This trick is used by the rootcause_status script.
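
As a quick sketch of why the environment-variable trick covers subshells too: an exported variable is inherited by every process started from that shell. The sh -c child below is only a stand-in for a program you would run; nothing RootCause-specific is invoked.

```shell
# Disable RootCause logging for this shell and everything started from it.
# APROBE_LD_AUDIT_VERBOSE is the variable named above.
APROBE_LD_AUDIT_VERBOSE=FALSE
export APROBE_LD_AUDIT_VERBOSE

# Stand-in for "a command started in that shell": the child process
# inherits the exported value, so it would not be logged either.
child_value=$(sh -c 'printf "%s" "$APROBE_LD_AUDIT_VERBOSE"')
echo "child sees APROBE_LD_AUDIT_VERBOSE=$child_value"
```

Unset the variable (or start a fresh shell) to get normal logging back.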

4.6 How do I clear the RootCause log?

There's currently no way to do this from the Console. From the command line, run rootcause log -Z, then do File->Refresh to see everything disappear.

4.7 Does the RootCause log wraparound? If so, how do I set the wraparound size?

Yes, it wraps so that it doesn't get huge. The default size is 100000 bytes. You can use the rootcause log -s command to query and change the size in bytes (there's no access to this from the Console). For example:

# show the log size:
rootcause log -s
100000
# set the log size to 20000 bytes:
rootcause log -s 20000

4.8 Can I locate my .rootcause directory somewhere other than HOME (or USERPROFILE)?

Yes, using the APROBE_HOME environment variable (supported starting with version 2.0.5). The value of this environment variable, if set, is used instead of the defaults (%USERPROFILE%\.rootcause on Windows, $HOME/.rootcause, .rootcause_aix, or .rootcause_linux on Unix). On Unix, this directory is where the RootCause Log and RootCause registry reside, so if you want these files accessible system-wide you should set APROBE_HOME to some central, writable location.

4.9 Is there a way to keep the RootCause Log window from appearing when I start rootcause?

Yes. Edit the "preferences" file in your APROBE_HOME directory (see Q4.8) and change

<start_with_log value="true"/>

to
<start_with_log value="false"/>
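
If you'd rather script the change on Unix than edit by hand, something like the following sketch should do it. It works on a scratch file containing only the one setting, so it is safe to run anywhere; in practice PREFS would point at the "preferences" file in your APROBE_HOME directory, and we assume the setting appears there exactly as shown above.

```shell
# Scratch stand-in for the real preferences file (an assumption for the
# sake of a runnable example; the real file holds other settings too).
PREFS=$(mktemp)
printf '%s\n' '<start_with_log value="true"/>' > "$PREFS"

# Flip the one setting. The edit goes through a temporary file because
# in-place editing (sed -i) is not available on every Unix.
sed 's|<start_with_log value="true"/>|<start_with_log value="false"/>|' \
    "$PREFS" > "$PREFS.new" && mv "$PREFS.new" "$PREFS"

cat "$PREFS"
```

Make the change while the Console is not running, so it isn't overwritten when the Console saves its preferences (we assume it may do so on exit).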

5. The Workspace Window

5.1 Should I say Yes or No to the "Application is not registered with workspace" dialog?

You'll nearly always want to click Yes, which means "use this workspace to trace this application next time you run the application with RootCause on". You might click No if you don't want to trace that application with RootCause yet, or if you want to keep tracing it with a different workspace with which it's already registered. When in doubt click No: you can always use Workspace->Register Program to do it later.

5.2 What does the blue dot mean in the Predefined UALs part of the Workspace Tree?

It means that something has been changed or added that must be recorded when the workspace is saved. You can ignore it.

5.3 Where do I find out about the Predefined UALs listed here?

See Chapter 8 of the User's Guide, which fully describes the Console GUI. Also, look for a file in $APROBE/probes (%APROBE%\probes on Windows) with the same name and suffix ".apc" and you'll see the details of its implementation. This doesn't apply to X.trace.ual, which is custom for each workspace.

6. The Trace Setup Dialog

6.1 What does <Unknown File> mean in the Trace Setup tree?

This means "Unknown Source File", probably because no debug information was found. Look in the Messages pane of the Workspace browser window for messages about debug information. You can still trace entry and exit to these functions, and can write custom probes that get data without using debug information.

6.2 What do the black and blue dots mean in the Trace Setup tree?

The dots are there to act as a "path" to help you find the traces and probes you've defined.

A black dot indicates an entry/exit trace of the marked function, method, file, class, or directory. Functions and methods marked with black dots are represented by equivalent entries in the Wildcards dialog, and are implemented by entries in the trace.cfg file in the workspace.

A blue dot indicates a probe or data trace in the marked function, method, file, directory, or class. These actions are not mapped to wildcards, and are implemented by compiled APC for C functions.

6.3 How do I trace a dynamically loaded shared library (DLL)?

You must add the library to the workspace, and then it will show up in the Trace Setup window. To do this, select Add Dynamic Module... from the Workspace menu. If the module changes, you must do Reset Dynamic Module.

6.4 What's the difference between "Don't Trace..." and "Remove Probes..."?

"Don't Trace..." will remove the black dots from the subtree it applies to, meaning those methods and functions won't have their entry and exit traced. "Remove Probes..." will remove the blue dots, meaning specific Probe and Data logging actions will be removed.

6.5 I've got a UAL that I compiled with the apc command -- how do I get that into RootCause?

The easiest way is to copy it into the workspace. You can also use Add UAL, and you'll need to do that if it takes parameters and other complications, but that's a bit more advanced: see Chapter 8 of the User's Guide or contact .

6.6 Why don't I see all the symbols shown by "apinfo" or "apcgen -L" in the Trace Setup window?

This should happen only on Unix. There, for improved usability (at a customer's request), functions whose names match certain patterns are filtered from the list. This list can be changed, replaced or nullified, though this is not documented.

The filtering is defined by the patterns in the file $APROBE/arca/trace_filters. See the commentary at the top of that file for complete information.

6.7 I define APROBE_SEARCH_PATH to include my source location, but the RC GUI still isn't finding my source. Why?

Could it be you set APROBE_SEARCH_PATH after you started the GUI? If so, quit RC and restart it so it can pick up the env var.
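
A minimal shell sketch of the ordering that matters here: export the variable first, then start the Console from that same shell. The source directory is hypothetical, and only a single directory is used because the list-separator syntax isn't covered in this answer.

```shell
# Hypothetical source location, for illustration only.
APROBE_SEARCH_PATH=/home/me/project/src
export APROBE_SEARCH_PATH

# Start the Console only after the variable is exported. The guard lets
# this sketch run harmlessly on a machine where rootcause is not on PATH.
if command -v rootcause >/dev/null 2>&1; then
    rootcause open
else
    echo "rootcause not found; APROBE_SEARCH_PATH=$APROBE_SEARCH_PATH"
fi
```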

6.8 How can I see and dump parameters for C functions for which there are symbols but no debug information?

This is addressed in Chapter 10 of the RootCause User's Guide, under Libraries With No Debug Information. Here's a paraphrasing of that given by our support staff:

The easiest way is to create a ".h" file that contains prototypes for the functions that you want. RootCause will automatically compile and use the "debug information" in that file so, for example, you can see the parameters in the setup window of the Console or reference them by name in the custom apc that you write.

To do this:

  1. Put the prototypes (C, not C++) into a ".h" file and give the file the same name as the shared library (or executable) where the functions reside (for example if your executable was named a.out, then the .h file would be named a.out.h)
  2. Place the .h file in the local or global "shadow" directory, with the name of your executable or library plus ".h" on the end. For example, if your program were called t.exe then on Unix the global location is $APROBE/shadow/t.exe.h and the user-local one is $APROBE_HOME/shadow/t.exe.h. On Windows, this is as you would expect: %APROBE%\shadow\t.exe.h and the user-local one is %APROBE_HOME%\shadow\t.exe.h. See Question 4.8 about APROBE_HOME.

Placing the .h file in $APROBE/shadow would make it available for all invocations of RootCause, whereas the other two locations would be more user specific. Note that RootCause will search the directories in the opposite order of their listing above, so a.out.h in the .rootcause directory will be used instead of a.out.h in the $APROBE directory. (Analogous for Windows.)

You can see an example of this by doing a directory of the $APROBE/shadow/*.h (or %APROBE%\shadow\*.h). RootCause uses this feature to provide parameter information for some of the system shared libraries.

Make sure that you have a supported C compiler available, as this is needed to compile the .h files. (You may not have a supported C compiler if you installed RootCause as Java only and now want to do C probing; contact to add the C capability.)
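
Here is a sketch of the two steps on Unix. The library name libwidget.so and its two prototypes are made up for illustration, and a scratch directory stands in for $APROBE_HOME/shadow (or $APROBE/shadow) so the example can be run without touching a real installation.

```shell
# Hypothetical library whose functions have symbols but no debug info.
LIB=libwidget.so

# Scratch stand-in for the user-local shadow directory.
SHADOW_DIR=$(mktemp -d)/shadow
mkdir -p "$SHADOW_DIR"

# The header is named after the library plus ".h" and holds plain C
# prototypes (not C++) for the functions whose parameters you want.
cat > "$SHADOW_DIR/$LIB.h" <<'EOF'
/* Hypothetical prototypes for functions in libwidget.so */
int widget_open(const char *name, int flags);
void widget_close(int handle);
EOF

ls "$SHADOW_DIR"
```

Once a real header is in place under the real shadow directory, the parameter names declared in it are what you'd expect to see in the Trace Setup window.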

6.9 How can I turn on trace just when I'm in a chosen method or function?

This is called a "Trigger" and has been a feature of the Aprobe-level trace all along. It was added as a Probes action in the Trace Setup dialog in version 2.1.3a (April 2004). It works like this:

  1. Apply Trace to all the functions and methods you want to trace, as usual.
  2. Select the function or method that is to be the "trigger".
  3. Click the Probes tab in the lower right pane.
  4. Check the On checkbox, then use the Probe Action dropdown menu to select Trigger Trace.
  5. Click Ok to apply and build your trace.

You should see the function or method to which you applied the Trigger action at the top of each traced call tree in your trace, and nothing outside of that (even if you selected it for tracing).

6.10 How can I enable my custom probe only when Trace is also enabled?

You can check whether trace is enabled with the ap_RootCauseTraceIsEnabled macro. For example:

         if (ap_RootCauseTraceIsEnabled)
         {
            printf ("Enabled\n");
         }
         else
         {
            printf ("Disabled\n");
         }

Disabling your probe independently from Trace is covered in the "Disable Probe" example (Windows: %APROBE%\Examples\Advanced\Disable_Probe; Unix: $APROBE/examples/learn/disable_probe).

6.11 I notice "Disable Tracing" does not affect the "exception" predefined probe. How can I disable that as well?

You can't. This probe is deliberately designed to remain active even after trace is disabled. We do deliver source for the probes so that users can customize their behavior. In this case it would be a simple matter of putting the "if (ap_RootCauseTraceIsEnabled)" check (see Q6.10) around the code in the "ExceptionHandler" routine within $APROBE/probes/exception.apc, recompiling it, and either using a local copy or overwriting $APROBE/ual_lib/exception.ual. (On Windows, %APROBE%\probes\exception.apc and %APROBE%\ual_lib\exception.dll, respectively.)

6.12 How can I trace and time everything between point A and point B?

  1. Create a workspace for the application (which you have already done).
  2. In the main window:
    • Enable the xxx.trace.ual (the first one).
    • Enable perf_cpu.
  3. Go to the trace setup dialog:
  4. Click on the program node (the very first one).
  5. In the probes tab, create a probe on program entry to disable tracing.
  6. In the left pane, click on the application module node (first 'M' icon).
  7. Right click and choose trace all.
  8. Find and select the point A function in the tree.
  9. In the probes tab, create a probe to enable tracing on entry.
  10. Find and select the point B function in the tree.
  11. In the probes tab, create a probe to disable tracing on exit.
  12. Click the Options... button to open the Trace Options dialog.
  13. Disable load shedding so you get everything.
  14. Click OK to build the workspace.
  15. Restart your application.

After you run through your test, format the APD files with Examine. The tree will reflect the trace path from point A to B. At the end is a summary call tree with call times in it. Or you can look at the performance table node (right click and choose show associated table) to see a table.

6.13 How can I allow all Java parameters to be traced?

To enable the Log All Parameters menu item, set and/or export the undocumented environment variable RC_ENABLED_LOG_ALL before starting the RootCause GUI.

7. The Trace Display (Event) Dialog

7.1 Why are some functions found in the traced Events not found in the Trace Setup?

There are two possibilities, but the most likely (on Solaris) is that the traced function is a compiler-generated one that is explicitly filtered from the Trace Setup list, but which is covered by the "wildcard" trace used when you do "Trace All Child Nodes" from the Trace Setup module node. See Q6.6.

The other possibility is that the event was introduced by some other custom probe, such as a J2EE trace. See Q7.2.

7.2 Why are some Java methods found in the traced Events not found in the Trace Setup?

Probably because the events didn't originate in the Trace Setup, but were introduced by a supplementary J2EE trace. Still, you should be prompted to add the containing class, and so be able to define traces on it.

7.3 RootCause keeps asking to find a source file. Is there a way to just point to this once without specifying the path to every file we wish to view?

Yes, RootCause has a concept of a source file path. There are a number of ways to set this:

If you click on a method, the first time it will ask if you want to find the source. If you browse and select the source file, the enclosing path is automatically added to a list. If the end of the Java path matches the package path of the class, the "root" of the package path is added also.

You can edit the path directly off the RootCause Setup menu.

We'll pick up an environment variable APROBE_SEARCH_PATH when the RootCause Console starts.

7.4 The trace shows a problem in third-party software; what's the best way to pass this along to them?

Of course it depends on the vendor, but the best thing to do is to send them what you would want your customers to send you: text with as much pertinent information as possible. If the trace contains enough information for you to determine where the problem is, then the other piece of information they would want is the system configuration, as collected with logenv.ual.

To create the bug report, you could do File->Save As Text from the Trace Display window; then edit the resulting text file to include the program and system configuration and the tracebacks and execution information that identify the problem; then e-mail the result, indicating it was collected with RootCause. (They might have RootCause also, and ask you to re-run to collect additional information).

7.5 Unix: RootCause shows signal 11 during my Java application run, but there was no crash. Is this a valid signal 11?

Yes. The JVM routinely uses signal 11 (perhaps for extending the stack) and signal 4 (illegal instruction -- not sure what that's for). These can show up in the trace and are fine. Later versions of the JVM provide options for reducing its use of signals; you can search java.sun.com for details.

7.6 When I trace a Java synchronized method, does the method time include lock delay time?

The JVM implements the synchronization on the calling side rather than on the callee side. Once you are inside the method's code, the lock has already been grabbed. This means that the time you see is after the synchronization.

For instance, I have a test that calls a synchronized method from a thread's run method:


try
{
   Thread.sleep (1000);
   parent.synchronizedMethod ();            // Line 15
}
catch (InterruptedException e)
{
   e.printStackTrace ();
}

If I trace lines and have things set up so another thread is within synchronizedMethod(), I see something like this:


Line 15                    10.45.00            ; Waiting ...
synchronizedMethod entry   10.46.00            ; Got it ...

7.7 Why was malloc() listed as being LOAD_SHED in the Trace Display when it really wasn't?

Because load shedding was attempted on it and recorded as such, but the actual disabling of the probe was prevented by another UAL's explicit request, using #pragma nopatchcount.

The confusion comes from the fact that load shedding may mean two things:

  1. The patch for the subprogram is disabled (no more probes for this routine will get triggered); and
  2. This routine is no longer traced.

Since we don't want (1) to happen for allocation/deallocation routines when running memstat, these patches could not be disabled. This was indicated by using #pragma nopatchcount in combined_memstat.apc.

However, when traced these routines will get load shed just like everything else, and the LOAD_SHED event and appearance in the table indicate that (2) has happened. So this is pretty much "as designed".

If you explicitly mark the function as "Do Not Shed", it will no longer show up in the table.

7.8 When formatting my data, an error pops up saying, "The maximum event tree size ... has been reached." What do I do?

You are hitting the limit on the maximum number of items displayed in the trace display. You can either reduce the size of the APD files, reduce the number of APD files selected or increase the limit at the expense of longer processing times and higher memory overhead. I would try the last one first and if this works for you, great. The option is "Maximum number of events in Trace Display" and is described here. Briefly:

  1. Go to the RootCause Main window
  2. Open the Setup menu (not the button, but the pulldown menu)
  3. Select Options...
  4. Change the value of the option Maximum number of events in Trace Display (third from the bottom) to a higher value. A value of 2000000 (two million) is appropriate for processors with more than 128M of memory.

The values are recorded per-user, so must be set for each user in the user preferences file: $APROBE_HOME/preferences on Unix, %USERPROFILE%\preferences on Windows.

7.9 I see that I can do "Save As XML": can I view this XML later?

Yes, but only in RootCause (see below). It is not quite legal XML and so will be rejected by general XML viewers. (If you think this is an important feature, let us know.)

To import saved XML back into RootCause again, you have to set the environment variable RC_ENABLE_LOAD_XML to a nonempty value before starting the RootCause GUI. If you've done this, you will then see the menu item Examine XML File... in the Analyze menu in the RootCause Main menu. Clicking this menu item will open a file selection dialog from which you can select an XML file. This must be a file previously saved from RootCause Trace Display using File->Save As XML. When you click the Examine XML Output button in this dialog, you will then see a Trace Data Dialog in which one of the checkboxes is the name of your XML file. Check it, and click Open, to view the Trace Display.

7.10 How can I see just the major time-consuming children of nodes in the Trace Events Summary tree?

Under the View menu, click Statistics Filter.... This dialog is used to create a "filtered" copy of the statistics summary tree. The copied tree will be added to the end of the event tree and will identify what filter was used. You specify a statistic to use (Wall time or CPU time, if collected) and a threshold percentage to create the "filtered" copy. A child node in the summary tree will only be copied to the new tree if the child's statistic value is at least the given percentage of the parent's statistic value. Choose "None" to create an exact copy. The threshold must be a numeric percentage between 0 and 100.
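
The copy rule can be sketched in a few lines of shell. The function name keep_child and the sample values are ours, purely for illustration; this is not how the Console implements the filter, just the comparison described above, written with integer arithmetic.

```shell
# keep_child PARENT CHILD THRESHOLD_PERCENT
# Succeeds when the child's statistic is at least THRESHOLD_PERCENT of
# the parent's. All values are integers (for example, microseconds).
# child/parent >= pct/100 is rewritten as child*100 >= parent*pct so
# no division is needed.
keep_child() {
    [ $(( $2 * 100 )) -ge $(( $1 * $3 )) ]
}

# A parent that took 1000 units, three children, and a 20% threshold.
for child in 600 250 150; do
    if keep_child 1000 "$child" 20; then
        echo "keep $child"
    else
        echo "drop $child"
    fi
done
```

With these numbers the first two children are kept and the third (15% of the parent) is dropped; a threshold of 0 keeps every child.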

7.11 Do the times shown in Trace Events reflect the aprobe overhead?

No, these are actual times. You can specify overhead values by clicking View->Statistics Overhead. This opens the Set Statistics Overhead dialog. You'll see an options menu from which you can select the statistic to adjust, and type-in fields for the normal (native) call overhead and the Java overhead (which is generally bigger).

Note that you must set the overhead for each statistic separately (for example, once for CPU Time and once for Wall Time).

When you've finished setting overhead values, you must regenerate the data.

7.12 How do I know what overhead to specify in the Set Statistics Overhead dialog?

As described in Q7.11, you can specify tracing overhead to be applied to times shown in the Trace Events details. But what number should you put in there? The answer depends on a number of factors, including your hardware and OS speed, whether you're dumping parameters, and whether it's Java or native code. A good guess is the minimum time you see in the entire tree for that kind of call, or if that seems too big, you can instrument some do-nothing function and see what its time is. This value would be the overhead for every call, and you can use that.
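
As a rough illustration of the "minimum time" guess, the sketch below picks the smallest value from a list of per-call times. The function name min_time and the one-time-per-line input are assumptions for this example, not a RootCause export format. The sample values are the raw Avg, Max and Min CPU times from the node shown in Q7.14.

```shell
# min_time: print the smallest of the times read from standard input,
# one time (in seconds) per line. Hypothetical helper, not part of RootCause.
min_time() {
    awk 'NR == 1 || $1 + 0 < min { min = $1 + 0 }
         END { if (NR > 0) printf "%.6f\n", min }'
}

# Raw CPU times (Avg, Max, Min) taken from the Q7.14 example node.
printf '%s\n' 0.044783 1.274449 0.000072 | min_time
```

Here the result is 0.000072, which would be the starting guess for the overhead of that kind of call.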

7.13 What are the various times I'm seeing in the details pane for Enter and Exit nodes?

The nodes look like:

ENTER Factor::addWidgets()
  time = 2004-05-03 16:32:10.079965024
  process = 15193, thread = 0 _start()
  symbol = "Factor::addWidgets()" IN "$java$", Factor.java
  CPU Time 0.428844 ( 0.428844)
  Wall Time 0.552496 ( 0.552496)

 EXIT Factor::addWidgets()
  time = 2004-05-03 16:32:10.632461354
  elapsed time = 00:00:00.552496330
  process = 15193, thread = 0 _start()
  symbol = "Factor::addWidgets()" IN "$java$", Factor.java

The Details pane for each node gives the (wall) time at which the function or method was entered. In addition, any statistics that were being gathered are attached to the ENTER Node. Shown here are the elapsed CPU Time (gathered because the perf_cpu probe was enabled) and elapsed Wall Time. Both were computed on EXIT from this specific invocation. The EXIT node also shows the elapsed (wall) time, which is the same as the Wall Time statistic.

7.14 What are the various times and percentages I'm seeing in the Details panes on nodes in the Event Summary tree?

Consider the following node:

Java_Factor_smallestFactor()
  process = 15193, thread = 10 _start()
  symbol = extern:"Java_Factor_smallestFactor()" in "libFactorJNI.so", /work/JNI/factor.c
  Times called: 29
  Child calls (native/Java): 4190 / 0
  CPU Time (29):  1.248102 ( 1.298730) [99.753%]
    Max  :  1.231153 ( 1.274449)
    Min  :  0.000048 ( 0.000072)
    Avg  :  0.043038 ( 0.044783)
  Wall Time (29): 375.135004 (375.185632) [99.998%]
    Max  : 375.105686 (375.148982)
    Min  :  0.000043 ( 0.000067)
    Avg  : 12.935689 (12.937435)

Recall that each node in the Event Summary tree represents a unique call stack in the execution. The one shown above is for the native JNI function Java_Factor_smallestFactor() (see $APROBE/demo/RootCause/JNI).

The function was called 29 times. Those 29 calls together used 1.248102 seconds of CPU Time after overhead adjustment (see Q7.11). The slightly larger time shown in parentheses after it (1.298730) is the "raw" time before the overhead adjustment. The percentage in brackets indicates that the total CPU time used for this function comprised 99.753% of the total time used by its caller, the parent node in the summary tree (see Q7.10 about filtering based on this percentage). Of those 29 calls, the longest (Max) took 1.231153 seconds of CPU (1.274449 raw), the shortest (Min) took only 0.000048 seconds (0.000072 raw), and the average took 1.248102 / 29 = 0.043038 seconds of CPU.
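
The arithmetic can be checked with a few lines of shell. This is only a sketch: the helper name div is made up for this example, and the numbers are copied from the CPU Time lines of the node above. The last figure, which spreads the removed overhead across every traced call, is an inference from the numbers and not something the display states.

```shell
# Values copied from the node shown above.
calls=29           # Times called
children=4190      # Child calls (native)
adjusted=1.248102  # CPU Time total after overhead adjustment
raw=1.298730       # CPU Time total before overhead adjustment

# div A B: print A / B to six decimal places (hypothetical helper).
div() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.6f\n", a / b }'
}

# Average adjusted CPU time per call.
div "$adjusted" "$calls"

# Total overhead removed, then that total divided by the number of traced
# calls (the 29 calls plus their 4190 child calls). Assumption: the same
# overhead was subtracted for each call.
awk -v a="$adjusted" -v r="$raw" -v n="$calls" -v c="$children" 'BEGIN {
    printf "%.6f\n", r - a
    printf "%.6f\n", (r - a) / (n + c)
}'
```

The first line reproduces the Avg value of 0.043038. The other two lines give 0.050628 and 0.000012, which would be consistent with an overhead setting of 12 microseconds per call.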

7.15 Is there a way to save the text for a specific node in the Trace Events tree?

Yes. Click on a node to select it, then right-click to pop up the context menu, then click 'Save Node As Text' to save the selected node in a text file. This will save the node and its details exactly as it would appear in the 'File->Save As Text..' output. Note that it works only for one node, so if multiple nodes are selected it applies only to the first of those. See also the next question.

7.16 Can I copy a Trace Events node to the clipboard to be pasted elsewhere?

Yes. In either the Events tree on the left, or the details in the lower left: Click on a node (or multiple nodes using shift or control keys in the usual way). Then right-click to pop up the context menu, then click 'Copy'. This will put the selected nodes in the Java clipboard. See Q3.2 for how to paste from the Java clipboard on Unix.

7.17 I know my method was executed many times, so why isn't it in the Performance Summary table?

Probably because it was Load Shed. This means that it was called so often its tracing overhead became excessive and tracing was disabled for it during the run. It will appear in the Load Shed table, where you can choose to stop it from being Load Shed during the next run.

8. RootCause and Aprobe

8.1 How do I adjust the Trace "DefaultLevels" option so only a fixed depth is traced when an application is run with RootCause?

You can't. The concept of levels is no longer supported. Instead you can apply a Trace Trigger, or disable and enable the trace using the probes tab for a given function.

8.2 How can I use Aprobe's predefined probes (profile, coverage, events, memwatch, statprof) with RootCause?

These are not currently integrated with RootCause. If you can run them from the command-line using Aprobe you should do that. If you wish to use the "RootCause On" mechanism to run them using the workspace, you must add them to the workspace options using the "Setup->Add UAL" menu item. This adds a new UAL "permanently" to the Workspace UAL tree. For example, to add the "memwatch" probe, you would:

This adds "memwatch" to the UAL tree in the Workspace window. You could then check this to enable memwatch on applications run under RootCause. The output of these probes isn't integrated with RootCause, so the output appears as a "Text" node in the Trace Display event tree. You can use "Save As Text" from that display to view it outside of RootCause.

Prior to RootCause version 1.3.3, you would reference these probes using the Aprobe options and Apformat options dialogs (see Chapter 8 of the user's guide), just as you would on the Aprobe command-line. For example, to enable memwatch, you would add -u memwatch -p -g as "Additional Aprobe Options" (under Aprobe options in the Execute menu in the Workspace window) and -u memwatch in the Apformat options (under the Analyze menu). For probes like profile that require configuration files, you would have to put the full pathname of the configuration file into the options as well, like -u profile -p -c /testdisk/probes/prog1.profile.cfg .

8.3 Is it possible to develop in Aprobe, but still use the RootCause "intercept" mechanism?

Yes, but this is not explicitly supported. In particular, most operations from the RootCause Console overwrite the scripts in the workspace which apply Aprobe to the application. So after you use the Console to create a workspace, quit the Console and edit the aprobe.ksh and apformat.ksh scripts (do_aprobe.cmd and do_apformat.cmd on Windows) directly to apply your probes.

8.4 If RootCause is built on Aprobe, and RootCause supports Java, is there an Aprobe for Java?

Aprobe supports Java with the apjava command. Writing custom probes in Java is described in Chapter 11 of the RootCause for Java User's Guide and in the nearly identical Chapter 5 of the Aprobe User's Guide for Unix and Windows. If you really wanted to, you could do everything from the command line.

8.5 How do I add my own UAL to the RootCause trace?

There are three ways of adding a UAL to a trace:

  1. Update the predefined_uals file in ual_lib to add it for all workspaces. It will show up in the list in the workspace when you do that.
  2. Use the "Add UAL" option on the Setup menu. This will also cause it to show up in the list.
  3. Copy it into the workspace. It will not show up in the list because it's not until runtime that we look in the directory to see what other UALs are present.

Personally I like option 2, choosing not to copy the UAL to the workspace. This makes it easy to enable / disable from the GUI.

8.6 How can I use the Events probe with RootCause?

The events probe is not integrated with RootCause Trace Display, but you can still use it. Here's a quick way to get started, by simply applying events to all Java methods and all native functions in the main module (if any), and letting load shedding reduce overhead.

  1. cp $APROBE/probes/events.cfg MyWorkspace.aws
  2. echo ';event function "*"' >> MyWorkspace.aws/events.cfg
  3. echo 'event function "*::*" in $java$' >> MyWorkspace.aws/events.cfg
  4. Workspace->AddUal: add events.ual and specify the following aprobe parameter:
   -c $RC_WORKSPACE_LOC/events.cfg
  5. Keep the trace.ual enabled with load shedding on, but don't specify any traces (this would load shed low level events)
  6. Run the application
  7. From the command line, use
  rootcause format -r MyWorkspace.aws > format.txt

Your results are in format.txt. You can then edit the events.cfg file to do more, as shown in Q15.12, and you can specify an alternate output file so you get the events output while still formatting within RootCause.

 

9. RootCause at Run Time

9.1 Can I just leave RootCause "on" all the time? For example, while I power down and power up my computer (Windows XP)? I was thinking that it would be interesting to see all the processes as my computer boots.

Yes, you can leave RootCause on all the time. It takes effect on reboot about the time when per-user preferences get loaded, or when you get prompted for your login id. Check the System event log (run "eventvwr") to get more exact information.

9.2 How much will RootCause slow my application?

This depends almost entirely on what you do with it. If you trace almost nothing, it will introduce almost no overhead. If you trace every method call on your machine, it will slow things down too much. The keys to good performance are:

9.3 On Solaris, why do I get "illegal insecure pathname" with RootCause on?

You need to copy or soft-link the RootCause "libapaudit.so" library to a "secure pathname" as described in Chapter 10 of the RootCause User's Guide, "RootCause, SETUID, and Security Concerns".

9.4 Solaris: Why do I get warnings from running `ps' when rootcause is enabled, and how can I fix it?

If you're seeing messages like:

ld.so.1: mail: warning: /opt/aprobe/lib/libapaudit.so: open failed: illegal insecure pathname
ld.so.1: mail: fatal: /opt/aprobe/lib/libapaudit.so: audit initialization failure: disabled.

Then the application you're running (like "mail" above, or "ps") has its setuid bit set and is owned by root. Solaris prevents dynamically loading debug libraries on such applications for security reasons. Here's what to do:

  1. As root, run rootcause setup and then run:
  rootcause_libpath -c
  rootcause_off
  rootcause_on
  ps

If you still get warnings, you're probably on an early patch level of Solaris 8. Do:

export LD_AUDIT_64 ; LD_AUDIT_64=/usr/lib/secure/64/libapaudit.so

If that still doesn't work, contact OC Systems. Details about probing secure applications on Solaris are documented in Chapter 10 of the latest Unix RootCause User's Guide.

9.5 Solaris: How do I find out the full path of dynamic modules loaded?

There's no built-in mechanism. It's harder than you think. Here's some custom APC (for Solaris only) that you could compile into a UAL, add to your workspace, and see the modules:

#include <alloca.h>
#include <link.h>

typedef struct
{
   ap_NameT  ModuleName;
   ap_Uint32 StartAddress;
   ap_Uint32 Length;
} DynamicModuleDataT, *DynamicModuleDataPtrT;

static void *ModuleKeyGet (void *S)
{
   return (void *) ((DynamicModuleDataPtrT) S)->ModuleName;
}

static ap_BooleanT ModuleKeyCompare (void *LeftKey, void *RightKey)
{
   return (strcmp ((ap_NameT) LeftKey, (ap_NameT) RightKey) == 0);
}

static DECLARE_HASH (DynamicModuleTable,
                     ap_StringHashFunction,
                     ModuleKeyGet,
                     ModuleKeyCompare);
                     

#if defined(__SunOS_5_5_1)
extern int dlinfo (void *handle, int request, void *p);
#endif

typedef ap_Uint32 (*FindElfSymbolT) (ap_NameT SymbolName, ap_NameT ModuleName);

static int NextModuleId;

static void DynamicModuleFormat (ap_NameT   Filename,
                                 ap_Uint32 *StartAddress,
                                 ap_Uint32 *Length)
{
   ap_RootCausePrintEventStart ("program_comment");
   printf ("Module loaded: %s\n   Address span 0x%08x-0x%08x\n",
           Filename,
           *StartAddress,
           *StartAddress + *Length);
   ap_RootCausePrintEventEnd ("program_comment");
}

static void RecordDynamicModule (ap_NameT Filename, void *Handle)
{
   ap_ModuleIdT          ModuleId;
   static FindElfSymbolT FindElfSymbolRoutine = NULL;

   ModuleId = ap_ModuleNameToId (Filename);
   if (ap_IsNoModuleId (ModuleId))
   {
      DynamicModuleDataPtrT  DynamicModulePtr;
      Link_map              *Linkmap;
      
      // Get the info for this.
      if (dlinfo (Handle, RTLD_DI_LINKMAP, &Linkmap) == -1 ||
          Linkmap == NULL)
      {
         ap_Error (ap_WarningSev,
                   "Cannot get loader info for %s",
                   Filename);
         return;
      }
         
      // Is it in the dynamic table already?
      DynamicModulePtr = (DynamicModuleDataPtrT)
         ap_HashTableLookup (&DynamicModuleTable, (void *) Linkmap->l_name);

      if (DynamicModulePtr == NULL)
      {
         ap_Uint32     ModuleSize;
         ap_ModuleIdT  NewModuleId;
         ap_NameT      ModuleName;
         ap_NameT      ModuleBaseName;
         char         *DotSoLocation;
         int           Dummy = 0;
         
         // Find our internal FindElfSymbol routine.
         if (FindElfSymbolRoutine == NULL)
         {
            FindElfSymbolRoutine = (FindElfSymbolT)
               ap_SymbolAddress
               (ap_SymbolNameToId (ap_ModuleNameToId ("libaprobe.so"),
                                   "FindElfSymbol()",
                                   ap_ExternSymbol,
                                   ap_FunctionSymbol));
            if (FindElfSymbolRoutine == NULL)
            {
               ap_Error (ap_FatalSev,
                         "Cannot find FindElfSymbol");
            }
         }

         // Add it to the table.
         DynamicModulePtr =
            (DynamicModuleDataPtrT) ap_Malloc (sizeof (DynamicModuleDataT));
         DynamicModulePtr->ModuleName = ap_StrDup (Linkmap->l_name);
         DynamicModulePtr->StartAddress = (ap_Uint32) Linkmap->l_addr;
         DynamicModulePtr->Length = FindElfSymbolRoutine ("_end",
                                                          Linkmap->l_name);
         ap_HashTableInsert (&DynamicModuleTable,
                             (void *) DynamicModulePtr);

         // Record it
         log (ap_StringValue (Linkmap->l_name),
              DynamicModulePtr->StartAddress,
              DynamicModulePtr->Length)
           with DynamicModuleFormat to ap_PersistentLogMethod;

         // Now log it for the format logic to find
         NewModuleId.Value = ap_FetchAndAdd (&NextModuleId, 1);
         ModuleBaseName = ap_Basename (Linkmap->l_name);
         ModuleName = strcpy (alloca (strlen (ModuleBaseName) + 1),
                              ModuleBaseName);
         DotSoLocation = strstr (ModuleName, ".so");
         if (DotSoLocation)
         {
            *(DotSoLocation + 3) = '\0';
         }
         ap_LogData (ap_IntegerToLogId (LOG_ID_FOR_FORMAT_RECORD_MODULE),
                     8,
                     &NewModuleId,
                     sizeof (NewModuleId),
                     ModuleName,
                     strlen (ModuleName) + 1,
                     &(DynamicModulePtr->StartAddress),
                     sizeof (DynamicModulePtr->StartAddress),
                     &(DynamicModulePtr->Length),
                     sizeof (DynamicModulePtr->Length),
                     &Dummy,
                     sizeof (Dummy),
                     &Dummy,
                     sizeof (Dummy),
                     Linkmap->l_name,
                     strlen (Linkmap->l_name) + 1,
                     ap_NoName,
                     strlen (ap_NoName) + 1);
      }
   }
}

probe thread
{
   probe extern:"dlopen()" in "ld.so"
   {
      ap_NameT Filename = (ap_NameT) $1;
      
      on_exit
      {
         if (!ap_IsNoName (Filename) && $return != 0)
         {
            RecordDynamicModule (Filename, (void *) $return);
         }
      }
   }
   probe extern:"dlmopen()" in "ld.so"
   {
      ap_NameT Filename = (ap_NameT) $2;
      
      on_exit
      {
         if (!ap_IsNoName (Filename) && $return != 0)
         {
            RecordDynamicModule (Filename, (void *) $return);
         }
      }
   }
}

probe program
{
   on_entry
   {
      // Record the number of static modules
      NextModuleId = ap_NumberOfModules ();
   }
}

9.6 How can I trace Linux daemons with RootCause?

The following steps should allow you to use RootCause to trace activity in several of the daemons on your Linux system:

Background

RootCause keeps a log file and a registry as defined by the APROBE_LOG and APROBE_REGISTRY environment variables. These are generally set on a per-user basis by the Aprobe setup script, based on the user's $HOME environment variable or on the environment variable APROBE_HOME if that's defined. The default location for these files is a hidden directory under a user's home directory called ".rootcause". When RootCause intercepts a program that is starting up, it looks in the user's registry to see if this program should be instrumented. If so, there will be an associated workspace file named in the registry. By changing the APROBE_HOME environment variable before running setup, you can change the locations of the log and registry. Note that these files have to be writable by all processes that access them.

Daemons like sshd are started on Linux using a shell (bash) script located in /etc/init.d. For sshd the file is /etc/init.d/sshd. If you edit this file you will see a subroutine named "start". Not surprisingly, it is to this subroutine that we want to add a few statements to set up RootCause to intercept sshd.

Details

  1. Create a RootCause workspace to trace sshd:

     We recommend that you create your workspace on a disk local to the machine that will be running the intercepted program. Create it in the usual way, using the "New" pulldown menu on the main RootCause screen.

  2. Verify the location of your log and registry files:

     These files are probably in $HOME/.linux_rootcause. They are named "registry" and "rclog". You can specify a different location using the APROBE_HOME environment variable (see Q4.8), but be sure to run "setup" after setting APROBE_HOME and make sure the protections of the resulting files are correct.

  3. Back up your /etc/init.d/sshd script.

     You should probably make a copy of the sshd file before you modify it so you can restore it when you are finished tracing sshd.

  4. Modify the /etc/init.d/sshd script to set up aprobe:

     Find the start subroutine in the /etc/init.d/sshd file and insert the following lines after the "do_dsa_keygen" line, replacing the text in angle brackets:

       export APROBE_HOME=<directory identified in step 2>
       . <aprobe_root>/aprobe/setup
       . $APROBE/bin/rootcause_enable

  5. Stop and restart the sshd daemon.

     As root and with your current directory as /etc/init.d execute

       ./sshd stop
       ./sshd start

    You should see a stopped message from the stop and some output indicating that rootcause has started from the start message. You may get a "FAILED" message from the start. On our system even when we get the failure message the daemon seems to start with no problems. So I think you can ignore this message.

    Tracing the libcrypt.so library was interesting; you can really see the ssh protocol flow as it generates keys and such.

    The technique outlined above should work for many of the daemons on Linux.

    9.7 Solaris: How do I apply RootCause to applications run at boot time?

    Once you've used Aprobe to investigate the behavior of processes on a running machine, there is nothing particularly complicated about doing the same for system processes while the machine boots, but there are a number of special factors to take into account. These are listed below, and an example given of how we applied these to one of our machines.

    The techniques described here were tested on a Solaris 6 box, but should be equally applicable to more current installations.

    1. Any time you make your own modifications to a system's startup procedures, there is a risk that you may make the system unbootable. We'll try to point out the pitfalls, but as with any procedures like this you should be prepared to recover the system from maintenance mode or even to reinstall the OS.
    2. At startup, system resources you may want to rely on may not be available. Make sure your RootCause installation is not on remote disks, and even for local installations, check that the filesystems used for the installation and for logging are available at the expected point during the boot process. If you want to get in at the start of Runlevel 2, the only filesystems typically available at that point are "/" and "/var", which may not have enough free space to support installation and logging.
    3. Startup scripts are run with /sbin/sh, which does not provide all the features you may be accustomed to with ksh, although it is very close for most purposes. Where possible, test scripts by running them under /sbin/sh before adding them to the boot process.
    4. For the test I just performed, I chose to monitor processes started as the system enters Runlevel 3, which starts NFS server processes, among others. At this point, all local filesystems are mounted, so I had no problem finding space for an installation, but many potentially 'interesting' services had already been launched.
    5. The libapaudit.so shared library needs to be installed in a secure location. With root authority, run:

        . /opt/aprobe/setup
        rootcause_libpath -c

    6. The startup procedure for a given Runlevel is determined by a script, "/sbin/rcN". The execution of these scripts is described in /etc/rcN.d/README, for N = 2 or 3. Since RootCause depends on an environment being defined, we need to 'source' some scripts into this command so the environment is defined when servers and daemons are started. I did this by creating files in /etc/rc3.d. If you look at the README and /sbin/rc3 script, you should see how this works.
    7. You will need to perform three steps to enable RootCause intercept in the rc driver. We will accomplish this by creating three files in the /etc/rc3.d directory.
      • /etc/rc3.d/K00RootCauseLocal.sh defines the APROBE_HOME environment variable where the logs and registry are stored:

        APROBE_HOME=/opt/aprobe_home
        export APROBE_HOME

      • /etc/rc3.d/K01RootCause.sh is a soft link to the setup script in the RootCause installation directory:

        ln -s  /opt/aprobe/setup /etc/rc3.d/K01RootCause.sh

      • /etc/rc3.d/K02RootCause.sh contains the command to enable intercept:

        . rootcause_enable

      Normally, scripts whose names start with 'K' are used to shut down processes before others are started, but we will take advantage of the fact that these are executed first to ensure that the RootCause setup is performed before anything else.

    8. All that is required now is to reboot the machine, then log in as root, define APROBE_HOME, source the installation setup script, and start the RootCause GUI. The event viewer should show you what processes were launched.
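
    The three startup files described above could be created with a small function like the one below. This is a sketch only: install_rc_hooks is a made-up helper name, and the default paths are simply the example values used in this answer, so adjust them for your installation. Taking the target directory as a parameter lets you rehearse the result in a scratch directory before touching /etc/rc3.d.

```shell
# install_rc_hooks [RC_DIR] [APROBE_ROOT] [APROBE_HOME]
# Hypothetical helper: writes the K00/K01/K02 files described above.
# Defaults are the example paths from the text, not required locations.
install_rc_hooks() {
    rc_dir=${1:-/etc/rc3.d}
    aprobe_root=${2:-/opt/aprobe}
    aprobe_home=${3:-/opt/aprobe_home}

    # K00: define where the RootCause log and registry are stored.
    printf 'APROBE_HOME=%s\nexport APROBE_HOME\n' "$aprobe_home" \
        > "$rc_dir/K00RootCauseLocal.sh" || return 1

    # K01: soft link to the setup script in the installation directory.
    # ln -s fails if the link already exists, which stops the function.
    ln -s "$aprobe_root/setup" "$rc_dir/K01RootCause.sh" || return 1

    # K02: the command that enables intercept.
    printf '. rootcause_enable\n' > "$rc_dir/K02RootCause.sh" || return 1
}
```

    For example, run "install_rc_hooks /tmp/rc3.test" after creating that directory, inspect the three files, and only then run it as root against /etc/rc3.d.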

    9.8 Windows: Can RootCause trace System Services which are automatically started when the machine is booted?

    Yes. However, there are a couple of unique things about tracing System Services that you need to keep in mind:

    9.9 Windows: Why are there no APP_START events in the RootCause Log for System Services?

    Unlike all other processes, you will _not_ see an APP_START event in the RootCause log when a System Service starts. So, if you want to trace a System Service, you must manually Register it (either with the "rootcause new" command or the RootCause GUI's Workspace->New dialog), and thereafter you will see APP_TRACED events for it in the RootCause log.

    9.10 Windows: I created a WorkSpace and defined a trace for a System Service that is automatically started each time my machine is booted. However, I don't see an APP_TRACED event in the RootCause Log when I reboot. Why?

    Like all Services automatically started at boot time, the RootCause dynamic process intercept Service is started in a pre-defined order by the Service Control Manager (SCM).

    In order for RootCause to intercept a Service at boot time, the RootCause process intercept Service must start _before_ the Service to be intercepted.

    Generally, RootCause starts early enough in the Boot sequence to intercept all Services. However, if it's not early enough for a particular Service, it's easy to modify the Boot sequence so that RootCause starts earlier. This is done by modifying the ServiceGroupOrder Key in the Registry.

    9.11 Windows: I've defined a trace for a System Service, but each time the Service starts it is not traced. Instead, I see a TEXT event in the RootCause log reporting the following apparent error: `(W) ... Cannot execute ...\do_aprobe.cmd'

    For C-language applications, RootCause executes a script called "do_aprobe.cmd", located subordinate to the WorkSpace directory, in order to apply the trace (for Java applications, the script name is do_apjava.cmd). The error is reporting that the script could not be executed.

    There are a couple things to check: First, this is most probably an access permission problem. Remember that System Services can be defined to run as _any_ user, and that user must have write permission to the RootCause Workspace directory. A common problem is that the Service runs as user LSA (Local Security Authority, or System Account), and LSA doesn't have permission to write to the Workspace directory.

    Second, does the Workspace directory exist? Use the "rootcause register -l" command to get a listing of Registered applications and their corresponding Workspace directories and verify that the directory is present and intact.

    9.12 Can I apply different workspaces (or none at all) for the same program invoked with different command-line parameters?

    Yes! This feature was quietly introduced in RootCause 2.1.3 (February 2004) by the addition of a "-p pattern" option to the rootcause register command. The pattern argument consists of a simple expression that can specify argument positions, wildcards and simple comparison and logical operations. You can associate the same executable (or Java class) and different patterns with different workspaces. At run-time, actual command-line arguments are substituted for special identifiers in the expression (like %2, $*) and then the expression is evaluated. If it evaluates to TRUE, the associated workspace is used to probe the application. If no expression evaluates to true, then the application is not probed. There's no GUI support; you have to register your application from the command-line to use this feature. All the details are described here. If it's still not clear how to do what you want, don't hesitate to contact us.

    9.13 How can I "intercept" a Java server on AIX?

    As described in the user's guide, RootCause on AIX does not support the automatic "intercept" of applications at load time: the application must either be run directly from the command line with "rootcause run", or else the binary must be renamed/replaced with a soft-link to a script that simulates the intercept effect.

    Starting with version 2.1.3b (May 2004) you can implement this second alternative with the rootcause link command, which renames/replaces the java binary with a script that uses access-lists and environment variables to manage who's applying rootcause to each Java instance.

    The command rootcause link is used to apply RootCause to applications (typically services and application servers) which cannot easily be started from a user's shell environment. rootcause link uses symbolic links to "intercept" these applications. A set of subcommands are available to manage these links safely and conveniently.

    Note that step 4 will probably require root authority, depending on where the application to be traced is installed.

    1. Identify the full path to the executable you wish to trace with RootCause. In the case of an application server, this will be a program named "java". You should use the 'ps' command to verify the pathname if possible. Write this path name to a file, for example:
         echo /usr/java131/bin/java > server.lst

      The application named here cannot be a symbolic link.

    2. Install the above list as the application list with
         rootcause link -I server.lst

      You may specify more than one application, each on a separate line, in this file. The rootcause link -I command instructs RootCause to save this file as the list of applications whose links are to be managed.

      rootcause link -I will require write access to the RootCause installation directory. If you need to change the application list later you will need to apply step 7 below (remove symbolic links).

    3. Verify the application list is installed as expected with
         rootcause link -l

      This will report a line like the following:

      
           - /usr/java131/bin/java
      

      The '-' indicates that the application is eligible to have its link managed, but that link does not exist and as a result the application will not be run under RootCause. rootcause link -L will show an explanation of the characters used to describe the link state. These are:

      
         - Executable is not RootCause linked
         * Executable will be run under RootCause
         ? File is not an executable or is invalid
         ! A serious error was detected;  contact support immediately
      
    4. Create the application link with
         rootcause link -K

      This will create symbolic links into the RootCause installation directory for each application designated with the rootcause link -I command.

      rootcause link -K requires write access to the directory where the application to be traced is installed. Typically this will require root authority.

    5. Turn on rootcause interception with
         rootcause link -a

      Now whenever the application is started, an entry will appear in the rootcause log. Follow the usual procedure to create a workspace and set up trace definitions.

      rootcause link -a can be run by any user.

      At this point you are ready to begin analyzing and debugging your application with RootCause. The remaining steps describe how to return the application to its original state and should be performed if RootCause is uninstalled.

    6. Turn off rootcause tracing with
         rootcause link -Z

      The symbolic links will remain in place, but the application will not be run under RootCause.

      rootcause link -Z can be run by any user.

    7. Remove symbolic links with
         rootcause link -D

      rootcause link -D requires write access to the directory where the application to be traced is installed (same as -K). This will restore your applications to their original state, where they will run completely independently of any component of the RootCause toolset.
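
    The seven steps above can be collected into two shell functions, sketched below. The function names link_enable and link_disable are made up for this example, and the ROOTCAUSE variable is a hypothetical override (not a RootCause feature) that lets you rehearse the sequence with ROOTCAUSE=echo before running the real commands. Only the rootcause link options already described in the steps are used.

```shell
# link_enable APP_PATH [LIST_FILE]: steps 1-5 above.
# Hypothetical wrapper; ROOTCAUSE defaults to the real "rootcause" command.
link_enable() {
    app_path=$1
    list_file=${2:-server.lst}
    rc=${ROOTCAUSE:-rootcause}

    echo "$app_path" > "$list_file" || return 1  # step 1: write the list
    $rc link -I "$list_file"        || return 1  # step 2: install the list
    $rc link -l                     || return 1  # step 3: verify the list
    $rc link -K                     || return 1  # step 4: create links (root)
    $rc link -a                                  # step 5: turn on interception
}

# link_disable: steps 6-7 above, restoring the original state.
link_disable() {
    rc=${ROOTCAUSE:-rootcause}
    $rc link -Z || return 1                      # step 6: turn off tracing
    $rc link -D                                  # step 7: remove the links
}
```

    For example, "ROOTCAUSE=echo link_enable /usr/java131/bin/java" prints the four rootcause link commands in order without running them, and leaves server.lst in the current directory for inspection.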

    10. RootCause J2EE Support

    10.1 What J2EE application servers are supported?

    RootCause will work with any Enterprise Java Application Server that uses a supported JVM.

    RootCause can trace an Application Server that is run as a standalone Java JVM (using the java executable), or it can trace a JVM that is embedded within a native executable.

    RootCause has been tested with:

    • Sun iPlanet 6.5 and AppServer 7 on Solaris;
    • BEA WebLogic 5.1, 6.0, 6.1 and 7.0 on Solaris and Windows;
    • JBoss 3 and 4 on Solaris and Windows;
    • Tomcat on Linux, Solaris and Windows; and
    • IBM Websphere 4, 5, and 6, and CE on AIX.

    10.2 How does one create a workspace for a standalone Java app-server like WebLogic?

    If the Application Server runs as a standalone Java JVM, you can create a workspace just as you would for any other Java application. Make sure RootCause is enabled in the shell or environment in which you are running the Application Server JVM. Run the Application Server, and find the Java APP_START event in the "Trace Events" window.

    In the New Workspace Dialog, there is an option for "J2EE Server Directory". Enter the directory where deployable Enterprise Java Bean (EJB) and Servlet classes and jars reside. RootCause will automatically add EJB and Servlet classes and jars that are specified in any J2EE compliant XML deployment descriptors.

    Once a Java workspace has been created and opened, the J2EE Modules directory can be changed to another location, or the current directory can be searched again for updated or new J2EE applications. This can be done using the Workspace -> Update J2EE Modules menu item.

    10.3 How does one create a workspace for a native executable app-server like iPlanet?

    If the Application Server runs embedded within a native executable, you can create a workspace for the native executable, and then add the libjvm library as a dynamic module. First create a workspace for the executable that runs the Application Server as you would for any other. Then open the Trace Setup window.

    An Application Server might run an embedded JVM, but already have libjvm library loaded as a dynamic module. If this is the case, the libjvm library will show up in the list of loaded libraries in the "Trace Setup" window.

    If libjvm does not appear as a statically-loaded module in Trace Setup, you must find the server version of the libjvm library ( libjvm.so on Unix, libjvm.dll on Windows). Once this module has been found, it can be added using the Workspace -> Add Dynamic Library menu item.

    Once the libjvm module is shown in the Trace Setup window, you can complete the J2EE configuration from the main workspace window using the Workspace -> Update J2EE Modules menu item.

    10.4 How do I configure RootCause with Sun iPlanet Application Server 6.5 on Solaris?

    These instructions assume $IAS_HOME is the install directory of the iPlanet App Server. $IAS_HOME does not have to be set for the application to be run or for RootCause to trace it. It is convenient to have $IAS_HOME set, in addition to $IAS_HOME/bin in your $PATH.

    $IAS_HOME/bin/iascontrol is the command line script that controls starting and stopping of the iAS 6.5 server.

    Make sure RootCause is enabled in the shell from which you start iascontrol. Stop the app server by running `iascontrol kill'. Restart the iAS server by running `iascontrol start'.

    Once iAS is started, examine the RootCause "Trace Events" window. Find the ".kjs" process in the list. There might be multiple ".kjs" processes showing; selecting any of them will be fine. The ".kjs" process is the native executable that contains the embedded JVM of the iAS application server. iAS 6.5 defaults to at least two ".kjs" processes, one for the EJB engine, and one for RMI/IIOP connections.

    Create a new workspace for the ".kjs" process. Once the workspace is open, you can add J2EE Modules by running the "Workspace -> Update J2EE Modules" menu item. Deployed applications within the iAS server are typically stored in $IAS_HOME/APPS directory. If you want to just add J2EE modules for a particular application, you can select a specific directory within $IAS_HOME/APPS.

    There is no need to add libjvm.so as a dynamic module before tracing the embedded JVM, as it is already dynamically loaded by the ".kjs" process.

    You might want to trace classes in the app server engine itself. If so, add $IAS_HOME/classes/java/kfcjdk11.jar as a dynamic module. Expanding this jar in the "Trace Setup" window will allow tracing of engine classes.

    10.5 How do I configure RootCause with BEA WebLogic Application Server 6.1 on Solaris or Windows?

    These instructions assume $WL_HOME is the install directory, and $WL_DOMAIN is the WebLogic domain, found in $WL_HOME/config. $WL_HOME is set by the setEnv command in the $WL_HOME/config/$WL_DOMAIN directory. $WL_DOMAIN is not directly set by the startup and config scripts, but provides an easy shorthand for the WebLogic domain used.

    WebLogic runs as a standalone JVM process, and is straightforward to trace using RootCause.

    Make sure RootCause is enabled in the shell from which you start WebLogic. Make sure the app server is stopped. Typically the app server is started by first calling `$WL_HOME/config/$WL_DOMAIN/setEnv`. This configures the environment for running WebLogic. Then start the server by calling `$WL_HOME/config/$WL_DOMAIN/startWebLogic`.

    Create a new workspace for this JVM; the main class name is "weblogic.Server". You can add the "J2EE Server Directory" in the "New Workspace" dialog. The typical location of deployed applications for WebLogic is $WL_HOME/config/$WL_DOMAIN/applications.
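
    To keep the three WebLogic locations straight, here is a small sketch that prints them for a given install directory and domain. The paths are exactly the ones quoted above; wl_locations is a hypothetical helper name, and the script names on your platform may carry an extension that this sketch does not guess at.

```shell
#!/bin/sh
# Sketch only: prints the WebLogic 6.1 locations named in this answer.
# wl_locations is a hypothetical helper; $1 is the install directory
# (WL_HOME) and $2 is the domain name (WL_DOMAIN).
wl_locations() {
    wl_home=$1
    wl_domain=$2
    domain_dir="$wl_home/config/$wl_domain"
    echo "setEnv=$domain_dir/setEnv"
    echo "startWebLogic=$domain_dir/startWebLogic"
    echo "applications=$domain_dir/applications"
}
```

    For example, wl_locations "$WL_HOME" mydomain prints the directory to enter as the "J2EE Server Directory" on the applications= line (mydomain is a placeholder).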

    10.6 How do I configure RootCause with JBoss 3 on Solaris or Windows?

    These instructions assume $JBOSS_HOME is the install directory, and $JBOSS_DOMAIN is the domain used, typically found in $JBOSS_HOME/server.

    JBoss runs as a standalone JVM process, and is straightforward to trace using RootCause.

    Make sure RootCause is enabled in the shell from which you start JBoss. Make sure the app server is stopped. Typically JBoss is started by calling $JBOSS_HOME/bin/run.

    Create a new workspace for this JVM; the main class name is "org.jboss.Main". You can add the "J2EE Server Directory" in the "New Workspace" dialog. The typical location of deployed applications for JBoss is $JBOSS_HOME/server/$JBOSS_DOMAIN/deploy.

    10.7 How do I set up for tracing TomCat?

    First, get the right JRE and TomCat installation, and configure it (these instructions are for Windows):

    1. Get a current JDK and install it: If you don't already have a JDK 1.3.1 or higher, get one from Sun and install it. (Don't get the co-bundle unless you want the NetBeans IDE, but do get the SDK rather than the JRE).
    2. Download TomCat from (page down to 5.0.16.tar.gz or whichever format), and install it somewhere. We'll call the installation location CATALINA_HOME.
    3. Edit %CATALINA_HOME%\bin\catalina.bat to add something like:
      set JAVA_HOME=c:\program files\j2sdk_nb\j2sdk1.4.2
      set CATALINA_HOME=C:\J2ee\jakarta-tomcat-5.0.16
    4. Open a command prompt and execute "%CATALINA_HOME%\bin\catalina run" and it will start. Visit http://localhost:8080 to see what it did. Ctrl-C to stop it.

    Then create your workspace and enable Java Memstat:

    1. Run TomCat once with RC to record it in the RootCause log.
    2. Look in the RootCause log for the run of the org.apache.catalina.startup.Bootstrap class. (To see the RC log, either open RC with no parameters or use the Workspace/Open RootCause Log (Ctrl-L) menu option.)
    3. Select the org.apache... node and use mouse-button 3 to bring up the context menu. Select the Open Associated Workspace option.
    4. Accept all the defaults in the New Workspace dialog.
    5. Select the java_memstat predefined probe, click build and re-start TomCat.

    10.8 How do I configure an app-server not listed here?

    Contact OC Systems support (see Q1.6, "How do I get technical support?").

    10.9 How do I set up a workspace for "jrun" on Linux?

    Below is a script to create a workspace for jrun. The usage is pretty simple. Set up for RootCause, then cd to the directory you want the workspace created in (a local file system is very strongly recommended). Set the path to point to the jrun executable and run the script, for instance:

    
    $ . /opt/rootcause211/setup
    $  PATH=/work1/tools/jrun4/bin:$PATH create_jrun_ws.ksh
    Checking RootCause installation ...
    Finding Application Server location ...
    Using Application Server found in /work1/tools/jrun4/bin
    Creating workspace /percy_work/jrun.aws from:
       JRun - /work1/tools/jrun4/bin/jrun
       JVM  - /opt/j2sdk1.4.0_01
    Adding Program: "/work1/tools/jrun4/bin/jrun" -> 
    "/percy_work/jrun.aws"
    Registry updated.

    If this doesn't work for whatever reason, the workspace can be created manually. Add $JAVA_HOME/jre/lib/i386/client/libjvm.so as a dynamic module, where JAVA_HOME is the location of the Java installation that JRun will use. Probably the only initial classpath entry needed is jrun4/lib/jrun.jar, but I added the whole classpath I saw in the script.
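
    Before adding the dynamic module by hand, it can save a trip through the GUI to confirm the library is where you expect. A minimal sketch, assuming the jre/lib/i386/client layout quoted above (other JVMs and platforms lay this out differently); find_libjvm is a hypothetical helper name:

```shell
#!/bin/sh
# Sketch only: prints the path of libjvm.so under a Java home, using the
# jre/lib/i386/client layout from this answer. find_libjvm is a
# hypothetical helper; it returns non-zero if the file is not there.
find_libjvm() {
    java_home=$1
    candidate="$java_home/jre/lib/i386/client/libjvm.so"
    if [ -f "$candidate" ]; then
        echo "$candidate"
    else
        echo "libjvm.so not found at $candidate" >&2
        return 1
    fi
}
```

    Run find_libjvm "$JAVA_HOME" and paste the printed path into the Add Dynamic Library dialog.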

    10.10 How do I set up RootCause for use with IBM Websphere AppServer 4.0.5 Application running on Solaris?

    (This assumes that you have run the demo program and have some basic knowledge of RootCause.)

    The first thing we need to do is check that you are running one of the JVMs that RootCause supports. Websphere ships with its own version of the JVM and so long as this is a Sun JVM, RC will work for the tracing components. If it's an IBM JVM, that's only supported on AIX.

    To see this, open an X terminal or some other console on the machine that you have Websphere running on. Set up for RootCause by running the setup script in the RootCause installation and turning rootcause on (refer to the demo if necessary for these steps). From the same terminal start Websphere.

    Open the RootCause GUI ("rootcause open") and it will open the log of started processes. Near the bottom you should see the Java process starting for Websphere. It should show the path to the JVM so you can check its version.

    All being well you can just select the process start node, right-mouse button click and choose "Open Associated Workspace". Accept the defaults and you will be able to setup your trace against Websphere.

    If the JVM supplied is a 1.2.2 *production* version of the JVM then it will be necessary to swap to a *reference* implementation if you want to use the Java memory analysis probes. For more information refer to the Memstat documentation.

    10.11 How can I dump Java objects with a probe on a known program point, rather than at a certain elapsed time as done by java_memstat?

    The java_memstat probe is built on top of another probe called libapjvmpi, which is an interface to the Java JVMPI library and takes care of a bunch of the low-level work. One of the things it provides is a mechanism to take a heap dump. Working with the interface requires getting a dynamic pointer to the libapjvmpi interface and then using that. For instance:

    
    #include "libapjvmpi.h"
    
    static apjvmpi_InterfacePtrT JvmpiInterface = NULL;
    static apjvmpi_InterfaceHandlePtrT JvmpiHandle = NULL;
    
    void InitializeUal_early_heapdump (void)
    {
       // Load the jvmpi interface UAL
       if (ap_IsNoUalId (ap_LoadAndInitializeUal (LIBAPJVMPI_LIBRARY_NAME)))
       {
          ap_Error (ap_WarningSev,
                    "Unable to load "LIBAPJVMPI_LIBRARY_NAME"\n");
       }
    }
    
    probe program
    {
       on_entry
       {
          JvmpiInterface = apjvmpi_Initialize;
    
          if (JvmpiInterface == NULL)
          {
             ap_Error (ap_WarningSev,
                       "Unable to initialize JVM support for\n"
                       "Java object tracking.");
             return;
          }
    
          // Get an interface handle
          JvmpiHandle = JvmpiInterface->Initialize (3);
          if (JvmpiHandle == NULL)
          {
             ap_Error (ap_WarningSev,
                       "Unable to get a necessary interface for "
                       "Java object\n"
                       "    tracking. It requires interface version 3 but the "
                       "apjvmpi library\n"
                       "    is at version %d\n",
                       JvmpiInterface->GetVersion ());
             JvmpiInterface = NULL;
             return;
          }
       }
    }
    

    To take the heap dump you need a probe that determines when to dump and then calls the heap dump routine:

    
    // Request a heap dump. Keep the last n heap dumps specified - note that
    // if there is already a larger count set, that value is retained.
    // void (*RequestHeapDump) (apjvmpi_InterfaceHandlePtrT Handle,
    //                          int                         RetainHeapDumpCount);
    
       {
          // Keep 3 dumps
          JvmpiInterface->RequestHeapDump (JvmpiHandle, 3);
       }
    

    You'll need java_memstat around to format the object dump(s).

    11. RootCause TroubleShooting

    11.1 I applied a Trace on function (method) in the RootCause GUI, but I don't see it being called in the output. Why?

    Here are some possibilities:

    • The function was called so often that it was load shed, and calls stopped being recorded. Click on the LOAD_SHED node at the end of your Trace Display, choose Show Associated Table, and look for your function there. Use the option menu in the first column to designate the function as Do Not Shed for subsequent runs.
    • The function was called, but it's not shown in the data file you're viewing. Use Add Data Files to Display in the File menu to add earlier files. If you still don't find it, then the data containing the last call may have been overwritten (i.e., the "trace buffer wrapped around"). You can save all data files containing the trace of a function by adding a SNAPSHOT probe ON_ENTRY to the function in the Trace Setup dialog.
    • There are multiple instances of the method in different classes, and you chose the wrong one. Use Find in Trace Setup and set traces on others that occur.

    The following possibilities apply only to native (C/C++) functions:

    • The function that's really being traced is in a different module. For example open() in libthread.so instead of open() in libc.so (see Q11.3 ). Use Find in trace setup and set traces in all modules where your function appears.
    • You did "Trace All In" which generates a wildcard, but the function was one of those that's not traced as part of a wildcard because it requires an expensive "trap" patch. Return to Trace Setup and force an explicit trace on this function by adding a "probe" on entry.
    • The function was optimized and so "inlined" at the point of call. If there really is no call, the function can't be traced.
    • The function cannot be traced. There are a few functions that, because of the way they're coded, simply cannot be probed. To test for this, go to the command line and type the following (or the Windows equivalent):
      apinfo -sa -x your_application.exe | grep "your_missing_function"
      If you see your missing function in the output, it cannot be instrumented. Contact OC Systems to find out why.
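
    If you have several missing functions to check, the grep in the last item can be looped. This is only a sketch: report_listed is a hypothetical helper that reads a listing on standard input and prints each of the given names found in it. It assumes nothing about the apinfo output format beyond the name appearing as plain text, the same assumption the grep above makes.

```shell
#!/bin/sh
# Sketch only: reads a symbol listing on standard input and prints each
# of the given function names that appears in it as a fixed string.
# report_listed is a hypothetical helper, not an Aprobe tool.
report_listed() {
    listing=$(cat)
    for name in "$@"; do
        if printf '%s\n' "$listing" | grep -F -q -- "$name"; then
            echo "$name"
        fi
    done
}
```

    Usage would look like: apinfo -sa -x your_application.exe | report_listed func_a func_b (func_a and func_b are placeholders). Any name printed is one to ask OC Systems about.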

    11.2 I add a library as a dynamic module and trace the init function, but the trace doesn't show up. Why?

    When you add a module as a dynamic dll, this forces it to be preloaded (loaded before program start rather than at the point of the dlopen() / LoadLibrary() ). This means that the _init() function is called before _start of your main application, which is before probes have been applied.

    11.3 Solaris: I'm trying to trace file open calls, so I trace "open()" in "libc.so" , but I get nothing. Why?

    On Solaris, there is an open() in libc.so and one in libthread.so . They both call something called libc_open() in libc.so , so that's the one you should trace. In RootCause 205 this was made more accessible using the shadow header file, so it will show up as residing in source file libc.so.h . There's also a libc_close() .

    11.4 Solaris: I Add Dynamic Module of mylib.so, then specify some traces in mylib.so. But when I run the program, those traces don't appear. Why?

    You may be loading a different instance of the library at runtime than you specified to Add Dynamic Module. This may be the case if LD_LIBRARY_PATH (or equivalent) is set. Make sure that the full path to mylib.so you've added to your workspace is the same as the one that will be loaded at runtime. On Windows, change ".so" to ".dll" and LD_LIBRARY_PATH to PATH and this still applies.
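
    One quick way to spot such a mismatch is to see which copy of the library comes first along LD_LIBRARY_PATH and compare it with the path you gave to Add Dynamic Module. The sketch below does only that; the real dynamic linker also consults other locations, so treat the answer as a hint. first_on_ld_path is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch only: prints the first file named $1 found in the directories
# listed in LD_LIBRARY_PATH, and returns non-zero if none is found.
# This does not reproduce the dynamic linker's full search order.
first_on_ld_path() {
    lib=$1
    old_ifs=$IFS
    IFS=:
    for dir in $LD_LIBRARY_PATH; do
        IFS=$old_ifs
        if [ -n "$dir" ] && [ -f "$dir/$lib" ]; then
            echo "$dir/$lib"
            return 0
        fi
    done
    IFS=$old_ifs
    return 1
}
```

    If first_on_ld_path mylib.so prints a path other than the one in your workspace, that is probably the instance being loaded at runtime.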

    11.5 I did Custom..., and saved my probes to an APC file, but those probes don't show up in my trace. Why?

    Make sure the "Add to Custom APC Files" checkbox is checked. If you've already got an APC file, make sure the Append checkbox is checked as well. Also, see Q11.1 .

    11.6 Windows: Why doesn't the DOS dir command show my workspace, pi_demo.aws?

    Because workspaces are marked as System Folders so that we can associate a special icon with them. Of course this doesn't help when you're in DOS mode. Use dir /as to see system folders.

    11.7 Windows: Why can't I see RootCause Workspaces in the Windows Explorer?

    Generally you should be able to: they should show up with a special icon. If they don't, then you're running Windows NT or the icon wasn't registered correctly. In any case, workspaces are system folders, so you have to set your Folder Options to uncheck the Hide System Folders option.

     

    11.8 Windows: Something happened during Uninstalling the previous version, now I can't install the new one. What do I do?

    Older versions of RC would get "stuck" in a weird state where you can neither Uninstall nor Install. The following is the (ugly) procedure you need to follow to get your current version of RC completely uninstalled, and the newer version installed.

    I. Procedure for Manually UnInstalling RootCause

    1. Turn Intercept off by entering "rootcause off" in a CMD window.
    2. Launch the regedit utility by entering "regedit" in a CMD window.
    3. Navigate to: HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services. Scroll down and you will see a subordinate Key named "rci". Delete this "rci" Key.
    4. Again in the regedit utility, navigate to: HKEY_LOCAL_MACHINE\SOFTWARE\. Scroll down until you see a subordinate Key named "OC Systems". Delete the OC Systems Key.
    5. Reboot your machine.
    6. Reboot your machine again. Yes, our device driver guy says we need 2 reboots here.
    7. You might want to save a copy of your license key file, it's at %APROBE%\licenses\license.dat.
    8. Delete the RootCause installation directory (%APROBE%): e.g. C:\Program Files\OC Systems.
    9. Delete the following files: %SYSTEMROOT%\system32\drivers\rci.sys and %SYSTEMROOT%\system32\rcjit.exe.

    II. Procedure for Installing the New RootCause

    1. Install it. You can skip supplying a license key during installation (just ignore the warning about not supplying a license key).
    2. When Installation completes, do _not_ reboot yet.
    3. Copy your license key (the one you saved above) into the new installation.
    4. Now, re-boot your machine.

    11.9 How do I stop tracing something I've got a workspace for?

    You need to delete it from the registry. The easiest way to do this is with the GUI:

    • Open the workspace in the RootCause Console GUI
    • In the RootCause main window, click Unregister Program in the Workspace menu.
    From the command-line, do rootcause register -d -c class_name to unregister a Java main class.

    To unregister a native program, first do rootcause register -l to see the exact path of the program that is registered, then do rootcause register -d -x exe_path.

    11.10 What do I do about the message "(E) ADI checksum (0x84b1c4d) does not match module checksum (0xa1c5e35)." when I register on a .dply file at a remote site?

    This message will be followed by specific information about the ADI file and module. The module is the executable or DLL on the remote machine, and the ADI file contains the debug information from the host machine where the workspace was developed. The error message indicates that the version of the module (application) on the remote machine does not match the version against which you developed your original traces. You must create the workspace and traces against the same version you send to the remote site because we compare checksums. The only difference is that on Windows, the PDB need not be present on the remote machine because the ADI file contains the information that is needed. Unfortunately on Windows the Visual C++ release build defaults to stripping symbols (on Unix the default is to leave them in). Therefore you need to change the build to keep symbols; you don't need debug, just symbols. If you do want debug in order to support the full range of probes, then you should add /Zi and /link /debug to the Release build options when you build an application that is to be shipped. This is described here.

    11.11 (Windows) Why won't the "RootCause On" button stay checked in the GUI?

    If after installing the RootCause Console, you notice that when you click the "rootcause on" button in the main GUI window the checkbox won't stay checked, and when you execute the "rootcause on" command from a CMD window you get the following apparent error message:

    (E) The program intercept mechanism is not running.
          This is an installation problem: contact support@ocsystems.com
          RootCause registry found and RootCause is DISABLED.
    
    then the most likely cause of this problem is that you forgot to reboot your machine after installing RootCause. Please reboot and try again.

    If a system reboot does not correct the problem, please follow this procedure to obtain debug information to assist OC Systems support in resolving your problem:

    1. Open a CMD window and enter "winver". This utility reports what Windows version you are running.
    2. Open a CMD window and enter "eventvwr" - this will launch the Event Viewer application (you can also use Control Panel -> Administrative Tools -> Event Viewer).
    3. Within Event Viewer, do a "save as" operation on the System Event Log, being sure to save the file as an "Event Log (.evt)" type file.
    4. Please forward the output from the "winver" command and the .evt file to support@ocsystems.com

    11.12 Why does my Java app fail with "Class Not Found" under RootCause, but work fine without RootCause?

    The most likely cause of this is that you're using the "-jar" option on your 'java' command, which is not supported by RootCause prior to version 2.1.2 (October 2003).

    So, if your application is run with

    
    java -jar $APROBE/lib/probeit.jar
    
    You could run it instead with:
    
    java -classpath $APROBE/lib/probeit.jar  com.ocsystems.probeit.Main
    

    If you don't know what the main class is, it is defined in the manifest of the .jar file. For instance:

    
    mkdir tmp
    cd tmp
    jar -xf $APROBE/lib/probeit.jar META-INF/MANIFEST.MF 
    grep "Main-Class" META-INF/MANIFEST.MF
    # This prints a line such as "Main-Class: com.ocsystems.probeit.Main".
    cd ..
    rm -rf tmp
    

    You would do the same thing using your own java command line and jar file in place of the above.
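
    If you want just the class name rather than the whole line, a sketch like the following can be run on the extracted manifest. It strips carriage returns because manifests are often written with CRLF line endings. It does not handle a Main-Class value that has been wrapped onto a continuation line, which the jar format allows for long names. manifest_main_class is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch only: prints the value of the Main-Class attribute from an
# already extracted MANIFEST.MF file given as $1. Trailing carriage
# returns are removed; wrapped (continued) values are not rejoined.
manifest_main_class() {
    tr -d '\r' < "$1" | sed -n 's/^Main-Class:[ ]*//p' | head -n 1
}
```

    With the file extracted as shown above, manifest_main_class META-INF/MANIFEST.MF prints the class to put at the end of your java -classpath command line.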

    After you have changed the command line, you should then re-run the application and go through the "New Workspace" steps. This time it should work fine.

    If this is too much of a hassle, contact support@ocsystems.com about getting a version with -jar support. If you weren't using -jar, or if the problem persists after going through above process, also contact OC Systems support and we can help you debug it.

    11.13 How can I probe Java classes loaded with a custom class loader and so not in the CLASSPATH?

    You will find that when you use "Open Associated Workspace" it imports only the jars in the class path, and so other classes that might be explicitly loaded do not appear in the Trace Setup. This can be easily remedied.

    So long as the class loader follows the standard model for class loader inheritance (e.g. classes loaded by that loader have visibility to classes loaded by the application class loader) this is trivial:

    1. From the Main Workspace menu choose the Setup->Class Path menu item to bring up the Class Path dialog.
    2. In the Class Path dialog, add the path(s) to the class directories or jar files you will be loading from. Note that this does not have to be where they will be loaded from at runtime. This just gets them into the Trace Setup.

    If there is no physical representation of the class available, you can use wildcards:

    1. Select the Root Java Module in the Trace Setup;
    2. Right click to bring up the context menu;
    3. Choose Edit Wildcards to open the Edit Wildcards dialog.
    4. On the left "Trace" side of the dialog, enter strings like:
      "MyClass::*"
      "MyClass::aMethod"

    11.14 When I have "rootcause on" I sometimes notice that commands piped together (for instance "env|grep MyVariable") can hang for a while before completing. Why is this?

    Your home directory (which will be the default disk for the rootcause log) is probably on an NFS disk. When two processes try to lock a file at the same time, one will be halted until the other one is done. However, with NFS it can take a while for the state of the unlock to propagate back, leaving the caller waiting on the lock routine even though the other process has unlocked it. The solution is to set APROBE_HOME to a local disk.
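
    A minimal sketch of that fix follows. It assumes APROBE_HOME is picked up from the environment of the shell where you run rootcause, so it has to be set (by sourcing this, not running it as a separate script) before "rootcause on". The helper name is hypothetical and the directory is whatever local path you choose.

```shell
#!/bin/sh
# Sketch only: creates a directory on a local disk if needed and points
# APROBE_HOME at it. use_local_aprobe_home is a hypothetical helper;
# the directory passed as $1 is your choice.
use_local_aprobe_home() {
    dir=$1
    mkdir -p "$dir" || return 1
    APROBE_HOME=$dir
    export APROBE_HOME
}
```

    For example, use_local_aprobe_home /var/tmp/$LOGNAME/aprobe_home, where /var/tmp is assumed to be local on your machine.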

    11.15 (Windows) Trying to apply RootCause to a service, I get MessageBox (after a reboot) saying there was a timeout and the Service failed to respond. Why?

    The Service Control Manager (SCM, the process that handles the Services applet) is very picky about the timings for Service start and stop. With RootCause enabled, we may be delaying the start of the Service just enough to cause SCM to complain.

    Can you check to see if the Service aborted? Or, better said, is the Service still running after you see the error MessageBox? Use Task Manager to determine this: once you get the timeout, the Services applet doesn't report Service status properly. If the Service aborted, please let us know; we may have to exclude it from Intercept.

    There's a Registry _VALUE_ that controls the Service timeout:
    HKLM\SYSTEM\CurrentControlSet\Control\ServicesPipeTimeout.
    This is a REG_DWORD value that probably has value 120000 (which is two minutes).

    Try increasing this value (e.g. double it) to see if it addresses the problem.

    11.16 When I add my library to the workspace with Add Dynamic Module and run with RootCause, my application never starts. What's wrong and how can I fix it?

    Add Dynamic Module causes a library to be "preloaded" (using the aprobe -dll option) because it's only on program startup that automatic trace configuration can be done. However, some user libraries cannot be preloaded because they rely on some global state being defined which isn't done until the program starts running.

    On Unix platforms, this (currently) means you can't trace or do anything else on this module. You're beat unless you can change the library to allow it to be pre-loaded.

    However, on Windows there is partial support for probing modules that are loaded after program startup. In particular, you can use custom probes, but you can't use the predefined probes which use the "probe all" feature. We wrote a subset of the trace probe for a customer to use on his dynamically loaded Windows library: dyntrace.apc. Give it a try and/or contact us for help.

    11.17 (Windows) The APC compiler fails on the giant APC file generated with apcgen. Now what?

    If you just use apcgen module.dll > module.apc you'll get a huge file. This file is translated into ANSI C, then compiled with the native compiler. The capacity of the Visual C++ compiler isn't huge, so that can fail with an error like:

    
    module.apc(102598) : fatal error C1076: compiler limit : internal
     heap limit
     reached; use /Zm to specify a higher limit
     (E) apc could not compile the file module.apc_c.c.
    

    You can invoke apc with the "/Zm" option as suggested by adding:
       -compiler /Zm300
    on the apc command line, which increases the VC++ compiler heap to 300% of the default maximum.

    Alternatively, you can attempt to break up the apc file by hand, or you can generate just a subset of the traces by using the -p and -f options on the apcgen command. Use apcgen -h for brief usage or see Appendix A of the Aprobe user's guide.

    11.18 (Windows) Why does our application crash due to a bad return code from CoInitializeSecurity() when running under RootCause?

    We don't know, exactly. However, we have identified a simple workaround probe:

    
    #include 
    probe thread
    {
       probe "CoInitializeSecurity()" in "ole32.dll"
       {
          on_exit
          {
             if ($return == RPC_E_TOO_LATE)
             {
                 $return = S_OK;
             }
          }
       }
    }
    

    To use this:

    1. Put the above in an apc file, say coinit_workaround.apc;
    2. compile it with    apc coinit_workaround.apc
    3. copy the resulting coinit_workaround.dll file into the workspace.
    4. rerun the application under rootcause

    11.19 Is there a way to add my own files to a deploy file so they will unpack into the directory created by rootcause register xxx.dply?

    A .dply file is just a zip file. You can just use zip (provided with RootCause) to add files to this archive, like:
       zip xxx.dply this.txt that.class other.ual
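
    To confirm the files actually went in, you can list the archive afterwards. The sketch below relies on the usual "zip archive file..." and "unzip -l archive" usage; it assumes an unzip command is on your PATH, which this FAQ does not promise. add_to_dply is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch only: adds the given files to an existing .dply archive and
# then lists the archive contents. add_to_dply is a hypothetical helper.
# $1 is the .dply file; the remaining arguments are the files to add.
add_to_dply() {
    dply=$1
    shift
    zip -q "$dply" "$@" && unzip -l "$dply"
}
```

    For example: add_to_dply xxx.dply this.txt that.class other.ual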

    11.20 Why doesn't the pi_demo program run on Linux Fedora Core 3?

    Because it was built on an old version of Linux. You can rebuild it from source using the Makefile in that directory, or else load the compatibility package for Fedora: compat-libstdc++-*.i386.rpm.

    11.21 Why didn't my trace on Linux log any data?

    If your Workspace is being accessed over NFS, this means you're writing the data to APD files over NFS, and Linux has known bugs with this. You really need to have your workspace/APD files on a locally-mounted disk. (Even if it weren't for this bug, logging over NFS is orders of magnitude slower.)
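
    If you aren't sure whether a workspace directory is on NFS, a check along these lines may help. It is a sketch that assumes the GNU coreutils stat found on most Linux systems, whose "-f -c %T" prints the file system type name. Both helper names are hypothetical.

```shell
#!/bin/sh
# Sketch only: reports whether a directory is on an NFS mount.
# Assumes GNU coreutils stat; "stat -f -c %T DIR" prints the file
# system type name (for example "nfs").

# Returns 0 if the given type name looks like NFS, 1 otherwise.
fs_type_is_nfs() {
    case $1 in
        nfs*) return 0 ;;
        *)    return 1 ;;
    esac
}

# Returns 0 if $1 is on NFS, 1 if it is not, 2 if the type is unknown.
is_on_nfs() {
    fs_type=$(stat -f -c %T "$1" 2>/dev/null) || return 2
    fs_type_is_nfs "$fs_type"
}
```

    Call is_on_nfs with the directory holding your .aws workspace; a zero return status means the APD files are being written over NFS and the workspace should be moved to a local disk.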

    11.22 How can I eliminate "WARNING: Could not create system preferences directory" when I start the RootCause GUI?

    If you're seeing something like:

    Starting RootCause...
    eddea02:/home/essc2/josephw/devenv/cstnd/src
    ==>Jan 25, 2005 8:26:39 PM java.util.prefs.FileSystemPreferences$2 run
    INFO: Created user preferences directory.
    Jan 25, 2005 8:26:41 PM java.util.prefs.FileSystemPreferences$3 run
    WARNING: Could not create system preferences directory. System
    preferences are unusable.
    
    First, bear in mind that the warning can be safely ignored.

    Sun's workaround is to run any Java application once as 'root'.

    (There's also a way to eliminate this entirely, but it requires Java 1.4 and for compatibility reasons, the RootCause GUI is built with Java 1.2.2.)

    12. Aprobe FAQ

    12.1 What is Aprobe?

    Aprobe is a suite of tools and libraries which support dynamic modification and extension of a program by dynamically patching the program executable and/or shared libraries.

    A dictionary defines "Probe" as "Device for exploring an otherwise inaccessible place or object." "Aprobe" stands for "Algorithmic Probe". It is hence a tool for exploring your program with the help of user-written algorithmic probes. These probes are installed into your program with the help of OC Systems' patented "dynamic action linking" technology.

    A user runs a program with the "aprobe" tool, indicating that certain "probes" are to be patched into the program and executed as the program itself runs.

    A "probe" consists of "actions" composed in C, with some special syntax added to indicate where in the program the actions are to be invoked.

    There are a number of predefined probes included in Aprobe; there is a tool to generate simple probes directly from a linked or unlinked object file; or the user may easily compose his own probes in a simple extension of the C language.

    See also "What is RootCause?"

    12.2 What is ProbePak?

    The ProbePak was an experiment at introducing users to the power of Aprobe and RootCause by making a subset available for free download. It didn't work out, and ProbePak is no longer supported. See the main page www.ocsystems.com for information on our current products.

    12.3 What are some potential uses of Aprobe?

    Read more about uses of Aprobe in the Product section of the web site or read the white papers in the Resources section. See also "What are some potential uses of RootCause?"

    12.4 How do I get started quickly with Aprobe?

    The best way to get started writing probes is to look at examples, and make some small changes.

    If you have RootCause and have been using the GUI, you can use the Custom... button in the Trace Setup window to generate a probe, and look at that. If that looks too daunting, or you want a more tutorial approach, try the graduated examples in the examples (or ada_examples ) and demo/Aprobe subdirectories of the Aprobe installation. Check out %APROBE%\Examples\Simple\Readme.txt (Windows) or $APROBE/examples/evaluate/README (Unix).

    12.5 Who can use Aprobe?

    Technical people who are developing, testing, and maintaining software.

    12.6 What different versions of Aprobe are there?

    The current version of Aprobe on AIX is 4.4.1; on all other platforms it is 4.3.4b, released in June 2005.

    The original version of Aprobe is version 2 for AIX, included as part of OC Systems' LegacyAda/OATS product, and in earlier versions of OC Systems' "PowerAda" product. While it shares the "probe" concept with the newer version, the user interface and details of Aprobe Version 2 differ substantially from Versions 3 and 4.

    12.7 For which platforms is Aprobe available?

    Aprobe is currently available on AIX, Linux (x86), Solaris, and Windows 2000/XP.

    The detailed requirements are documented in Chapter 2 of the RootCause User's Guide for Unix or Windows .

    12.8 How do I get Aprobe?

    E-mail , and we will arrange for you to receive the software.

    12.9 What documentation is available for Aprobe?

    Aprobe is delivered with a User's Guide in hardcopy, HTML, and PDF formats. The latter two softcopy forms are included in the evaluation version which can be downloaded. The HTML version is available on-line at www.ocsystems.com/sup_ug_index.html .

    There are a series of graduated examples that come with their own text documentation in the examples and demo subdirectories of the Aprobe installation. You should read %APROBE%\Examples\Simple\Readme.txt (Windows) or $APROBE/examples/evaluate/README (Unix), and try at least some of the examples under that directory, before trying Aprobe on your own application or looking through this FAQ for answers.

    12.10 What tools make up Aprobe?

    apcgen - generates APC for some or all functions in the specified object file(s)

    apc - compiles and links the specified APC file(s) into a UAL (DLL).

    aprobe - runs the specified program after loading and applying patches in the specified UALs.

    apformat - formats any data logged in the specified aprobe data (APD) file(s).

    These tools are described further in other questions below. A number of additional tools and scripts for specific situations are also provided. See Appendix A of the Aprobe User's Guide.

    12.11 How is Aprobe licensed?

    Same as RootCause. See Q1.9 .

    12.12 Is there a point-and-click (GUI) interface to Aprobe?

    Yes. It's called RootCause. See Q1.1 .

    In addition, on Windows, we provide a GUI for Aprobe, which is deprecated but still works. Try the Aprobe menu under the Start menu on your workstation to start it.

    Also, some predefined probes (see Q15. below) include a Java GUI to specify configuration parameters for that probe.

    12.13 Can I run Aprobe on any executable program file?

    Yes. You can run aprobe (without any probes) on any application at all unless:

    • It is a secure application which a debugger doesn't have authority to attach to. In this case you should get a clear explanatory message.
    • The application does something very strange like replacing some low-level system routines with its own versions that do something different.
    • There's a bug in Aprobe.

    If you find that using aprobe causes your application to crash, you should try running aprobe without any probes. If it still crashes, it should be reported as a bug to .

    A slightly different question is, "Can I use Aprobe to put probes on any program?" To actually apply probes to a native module, there are three basic requirements:

    • Symbols

    For Aprobe to do what it does it must be able to figure out where the subroutines you are trying to probe have been linked and loaded. We call this location information "symbols". All symbolic debuggers have the same requirement. See Q12.17 .

    The symbols may be as originally added to the application (i.e., not stripped, see Q12.16 ), or they may have been saved separately by Aprobe using apmkadi (see Q13.11 ).

    Most programs delivered with the operating system, and off-the-shelf software, are stripped, so you can't use Aprobe directly on the application code, but you can generally probe shared libraries (DLLs) that support them.

    • Standard Call/Return behavior

    If the program uses a mechanism that transfers control other than by the normal call and return mechanism, such as setjmp / longjmp or an unsupported exception mechanism, and there is an active probe at the time of that non-standard transfer of control, the program will likely crash.

    • Supported exception mechanism

    Ada and C++ (and Java, but that's a separate issue) support exceptions which are non-standard transfers of control. Each compiler does this in a different way, and must be explicitly supported by the Aprobe runtime. See Q12.15 .

    12.14 In what language(s) can my program be written?

    Same as for RootCause. See Q1.10

    12.15 What compiler(s) must have been used to compile my program?

    Same as for RootCause. See Q1.11

    12.16 (Unix) How do I tell if a program file is "stripped"?

    Use the "file" command, e.g.:

    Solaris:

    $ file a.out 
    a.out: ELF 32-bit MSB executable SPARC Version 1, dynamically linked, not stripped
    $ file /bin/ls 
    /bin/ls: ELF 32-bit MSB executable SPARC Version 1, dynamically linked, stripped 
    

    AIX:

    $ file a.out
    a.out:      executable (RISC System/6000) or object module not stripped
    $ file /bin/ls 
    /bin/ls:    executable (RISC System/6000) or object module
    

    Linux:

    $ file a.out
    a.out: ELF 32-bit LSB executable, Intel 80386, version 1, dynamically linked (uses shared libs), not stripped
    $ file /bin/ls
    /bin/ls: ELF 32-bit LSB executable, Intel 80386, version 1, dynamically linked (uses shared libs), stripped

    12.17 How do I tell what symbols a program has available?

    apcgen -L will list the Aprobe function symbols in any compiled object module, for example:

    apcgen -L C:\WinNT\system32\kernel32.dll
    apcgen -L /usr/lib/libc.so
    apcgen -L /work/programs/prog.exe
    

    There are other apcgen options such as -m to show "mangled" names and -v to show file names--use apcgen -h for usage.

    The RootCause Trace Setup window shows a tree of all the functions organized by module, directory and file, using the same mechanism used by apcgen.

    If you want information about data symbols, or want to confirm that a function may actually be probed, you can use the apinfo command, which runs the "info" predefined probe. This only works on executable programs. For example:

    apinfo -d /work/programs/prog.exe

    will show all the global and file-static data symbols found when prog.exe is loaded by aprobe. There are lots of other options: use apinfo -h to see them. See Q13.7 if you're on Windows and apinfo prints nothing at all.

    12.18 What do I do to get symbols in my program?

    On Unix, every program has its symbols unless they're explicitly stripped (see Q12.16 ). So, to get symbols in the program, don't run the "strip" command or link with an option that causes the resulting program to be stripped. Shared libraries always have at least global (external) symbols.

    On Windows, if you have control of how a program is built you can greatly expand the routines available for probing by adding the "/debug" switch and the "/pdb:none" switch when you link. These switches cause symbol information to be generated and the information to be put in the executable.

    On Windows, the Microsoft Visual C++ compiler can produce symbols in two major forms or it can produce no symbol information at all. It can also place the various types of symbols either in the executable itself or in two kinds of separate files. Complicating this issue further, shared libraries (aka DLLs) can be produced with the entire range of debug information forms available. One saving grace is that DLLs almost always expose their public interface by name.

    Generally software you get from a vendor (COTS) will have no symbol information included with it. However Windows programs are generally broken into a number of DLLs and the interface between DLLs is visible to Aprobe. Thus while you can't probe routines local to Notepad.exe you can probe how Notepad.exe makes calls to CreateFileW in KERNEL32.DLL.

    12.19 What do I do to get "debug information" in my program?

    This is documented in Chapter 10 of the RootCause User's Guide, "Building a Traceable Application", and in Chapter 3 of the Aprobe User's Guide, but it's summarized here:

    • For Unix compilers (except PowerAda) compile with -g.
    • With PowerAda you get debug information by default, but you need the PowerAda program library available just as you would for adbg.
    • For Windows, compile with /Zi and link with /debug.

    In addition to compiling with the right option to generate the debug information, you also must retain that information and have it available where it's supposed to be:

    • On Windows, this information is put in a separate PDB file, which should be in the same directory as the module itself.
    • For gcc-based compilers, including GNAT, and IBM's C and C++ compilers, debug information is collected at link-time into the executable, and is retained unless you explicitly use the "strip" command.
    • The Sun WorkShop (Forte) compilers do not copy the debug to the executable, but leave it in the object files and simply build an index in the executable which gives the path to all the object files. You can force the debug information into the executable by compiling (not just linking) with the -xs flag.
    • PowerAda line information is recorded in the executable and can't be stripped, so you don't need any debug information at run-time. However, for `apc' and the RootCause GUI, you need the Ada program library, which must be consistent with the executable and available at the same location recorded in the executable. If the library is moved, you can specify its location with the environment variable APROBE_POWERADA_LIBRARY, for example
      export APROBE_POWERADA_LIBRARY=/builds/old/prog1/adalib

    12.20 How do I tell if a program file has "debug information"?

    The apcgen command will list those functions that have debug information associated with them:

    apcgen -Ld a.out

    This should be all you need, but on Unix there are some system utilities that look in the object files themselves that may also be used:

    • On Solaris, we provide the readelf utility delivered with Aprobe to determine if there is debug information for a given file or function. The command

      readelf -d a.out | grep "N_FUN.*:F" | awk '{ print $NF }'

      will list all functions for which there is debug information. If you don't see what you're looking for in an executable, and you are using the C or C++ compiler, the debug information may be in the individual object file. Use the above command on the appropriate object file, or, if you're not sure where it is, do:

      readelf -d a.out | grep N_OBJ | awk '{ print $NF }'

      to see all the object files referenced by the executable.

    • Linux uses ELF format also, but the similar utility is objdump :

      objdump -G a.out | grep "N_FUN.*:F" | awk '{ print $NF }'

      (Note that while readelf is available on Linux also, it only shows DWARF debug information, which isn't yet supported by Aprobe.)

    • On AIX, you can use the dump utility with the -t option to dump symbol information, including debug "stab" strings. For example:

      dump -t a.out | grep ":F"

      will show the functions that have debug information.

    12.21 What is a "probe"?

    A "probe" is a "user action" associated with a specific location in a program. The user action is executed whenever control passes through the location with which it is associated. A "probe" is described in an extension of C called "APC", for example:

    probe thread 
    {    
      probe "foo"     
      {
        on_entry
        {    
          printf("Entering foo.\n");
        }     
      } 
    }

    The block following the "on_entry" is the "user action". The syntax surrounding it describes exactly where and when the action should be executed: immediately upon entering function "foo()" in each thread.

    12.22 What is a "UAL" (.ual file)?

    A UAL is a "User Action Library". It is the output of the "apc" command, and is a shared library consisting of the object code generated from your apc files. Not just any shared library (DLL) may be used as a UAL, and a UAL may not be renamed after creation, because it has specially-named entry points based on its filename which are called by the Aprobe runtime to perform initialization.

    12.23 Why does a Windows UAL file have a file extension of ".dll"?

    It has an extension of DLL because it is a regular Windows Dynamic Link Library (DLL). In most cases it could also be named .ual , but there are some cases where the .dll is required by Windows.

    12.24 What is "logging"?

    With respect to Aprobe, "logging" means "writing data to a file for later analysis". Aprobe provides a built-in logging facility that allows saving raw data in a time- and space-efficient way, and using "apformat" to display the logged data later. See "Logging Data" for related questions.

    12.25 What is an ".apd" file?

    An ".apd" file is one that contains the data generated by a program run under aprobe. These are binary files which are read with the "apformat" tool.

    There is always a ".apd" file generated giving aprobe invocation information, even if no "log" statements are executed. If log statements were executed there will be a "-1.apd" file, and maybe "-2.apd" files as well.

    12.26 What can't I do if my executable or library doesn't have debug information?

    You can't reference source-level information in your probes. It's just like using a source level debugger in this respect, and for the same reason. A good rule is, if the debugger can print the value of a variable x at line 15, then you can do "on_line(15) log($x)" in your probe.

    More specifically, you need to specify "-x exe_or_library " on the apc command, and the exe_or_library must contain debugging information, if you use a construct in your probe that cannot be resolved without specific debug information from the program. Such constructs are:

    (a) target expressions: names from the probed program preceded by $, or $* ($1, $2 are ok, as are hardware-register references starting with '$$').

    and

    (b) references to specific source lines.

    Note that there are lots of probes you can write without debug information; for example, all but one of the predefined probes provided with Aprobe work fine in its absence, and the one that does require it (coverage) does so in order to get source line number information.

    12.27 Does use of on_line() require the application to have debug information?

    Yes, but things aren't that simple. To build a probe that requires debug information (including line information), the debug information must be available when the probe is compiled. However, the debug information can then be stripped and the probe run against the stripped executable.

    For the symbol table, the necessary symbols must be present at runtime, either in the application (or application libraries) or in a .adi file which is generated with the Aprobe tool apmkadi . That tool allows you to capture the symbol table in an internal form and then strip the executable.

    Also, PowerAda programs always contain source line information -- this is not considered debug information.

    Finally, for low-level hacking, you can instrument specific offsets using on_offset.

    12.28 What is the maximum number of probes allowed?

    For probes you are just limited by paging space. For UALs there is a more practical limit - we limit the total number of modules to 255 and that includes UALs.

    12.29 Is there access to C++ private/protected variables?

    Yes, if it's in the debug we can see it. We don't look at whether the debug says it's private, protected or public - we just use it.

    12.30 Is there any way to attach with Aprobe to a running application?

    No. This question is very frequently asked. It sounds great in theory, but in practice Aprobe is a tool for tracking problems that have yet to happen, not those that have just happened. There is also quite a bit of work done by Aprobe when an application starts up; often doing this to a running application is as big an issue as restarting the application. Finally, for Java you wouldn't be able to change the classpath to see our classes or intercept classes that have already loaded.

    12.31 Is there a way to probe a function for which no symbol is available?

    Yes, if you know its address and size, you can define a symbol for it using ap_RecordDynamicFunctionSymbol() in the Aprobe Runtime Library and then apply probes using the defined symbol.

    Here are example C and APC files illustrating how to use it.

    defsym.c

    #include <stdio.h>
    #include <string.h>
    
    static char *image(char *s) 
    {
       char *s1 = strdup(s);
       return s1;
    }
    
    int main (void)
    {
      printf (image("Hello\n"));
      return 0;
    }
    

    defsym.apc

    //---------------------------------------------------------------------------
    // Define Dynamic Function Symbol Example
    //
    // This is an example of using ap_RecordDynamicFunctionSymbol()
    // to define symbols when no debug information is available.
    //
    // NOTE:  If the offset for symbols is wrong the program will
    // likely crash because you will have directed Aprobe to instrument
    // the wrong piece of code.
    //---------------------------------------------------------------------------
    
    #include "aprobe.h"
    
    // To define your symbols early enough to be instrumented and
    // probed, you have to define them from a UAL initialize function.
    // The initial part of the name must be InitializeUal_, and the first
    // character following that must be lower in the ASCII collating order
    // than the first character of the UAL name. '0' is the lowest legal 
    // character.
    
    void InitializeUal_0_defsym_apc()
    {
       // In this example I just define an alias for the symbol "main"
       // and probe that instead.  You have to know the correct offset
       // and size of the function (though size is not so critical).
       // The offset is the offset in the module, not just the text
       // section.
       ap_SymbolIdT NewSym = 
          ap_RecordDynamicFunctionSymbol (
             ap_ApplicationModuleId(),
             "MyAliasForMain",
             ap_ExternSymbol,
             ap_IntegerToOffset(0x10),
             0x1d,
             0);
       if (ap_IsNoSymbolId(NewSym))
       {
          printf("Couldn't define symbol...\n");
       }
    }
    
    probe thread
    {
       // You'll get a warning about the symbol not being defined
       // when you compile this with apc, but it's OK.
       probe "MyAliasForMain"
       {
          on_entry  printf("Hello again...\n");
       }
    }
    

    13. Using the "aprobe" Command

    13.1 What does "aprobe" do?

    Aprobe locates the specified UALs (if any), loads them as well as the Aprobe runtime, patches the executable to invoke the probes described in the UAL files, and starts execution of the specified program.

    13.2 How do I specify options to my program when using aprobe?

    The executable program name is the last argument on the aprobe command line. All options after that are passed as arguments to the executable. For example, if your regular command-line would be:

        mygrep "a_string" *.txt

    Then with aprobe it would be, on Unix:

      aprobe -u mygrep.ual mygrep "a_string" *.txt
    and on Windows:
      aprobe -u mygrep.dll mygrep.exe \"a_string\" *.txt

    Note the backslashes needed to preserve quotes passed to the program.

    The most reliable way to do this, used by RootCause, is with the aprobe "-execvp" option. In this case you specify a filename in place of the parameters, and the file lists all arguments, including the "argv[0]" that is to be passed as the executable name. For example, in the above case:

    aprobe -execvp -u mygrep.ual mygrep mygrep.args

    where mygrep.args might contain the lines:

    mygrep.exe
    "a_string"
    file1.txt
    file2.txt

    13.3 How do I specify options to my probes?

    Options and parameters can be passed to each UAL as well. This is done by following the UAL name with the -p option followed by the options in quotes. This is most commonly seen when invoking a predefined probe that is part of Aprobe, for example:

       aprobe -u info -p "-sa" mygrep.exe

    The options to the info probe are "-sa".

    13.4 How do I print my output at run time instead of sending to the APD file?

    The "-if" ("immediate format") option on the aprobe command does this, e.g.,

       aprobe -if -u fooTest foo

    13.5 Can I suppress generating an ".apd" file?

    Not at this time. Even if you do "aprobe -if -n 0 ... " you get the basic .apd file.

    13.6 How can I run my probes without invoking aprobe?

    Use RootCause. That's one of its key features. If for some reason you can't do that, you can:

    • Substitute the Aprobe command line for the command that starts your program
    • Rename your application and replace it with a hard-coded script that calls aprobe on the renamed executable. On Unix there is a script delivered that facilitates this, called "run_with_aprobe_edit", which documents its use.
    • On Unix, use the "run_with_aprobe_apo" script. The script text includes documentation on its use.
    • On Solaris there's an additional option of linking aprobe into your application. This mechanism is coming soon for AIX and Linux, but will never be available for Windows.

    See Chapter 4 of the Aprobe User's Guide, "Loading Probes without aprobe".

    13.7 Windows: When I run my program with aprobe I don't get any output even though I know I'm executing routines with probes on them and those probes use printf. What's going on?

    First make sure that Aprobe was correctly installed. You can do this by running one of the examples in the Aprobe\examples\... directories.

    The most common reason you don't see any output is that standard output from your program is going to the null device.

    A Windows program can be linked for one of several Windows subsystems. If the subsystem is the "Windows GUI subsystem", standard output seems to go elsewhere. You can determine what subsystem your program has been linked for by using QuickView (look for the "subsystem" entry under the "Image Optional Header"). You can still get your output by redirecting it to a file using the -o switch on the Aprobe command. For example:

       aprobe -u myual.dll -o!stdout.txt!stderr.txt myexe.exe

    In this case you are indicating that:

    standard input: <none>
    standard output: stdout.txt
    standard error: stderr.txt

    13.8 Windows: When I use the -o switch to redirect the output of my program to one or more files, the output seems to be out of order. Why?

    This is a result of the buffering that Windows does for each executable/DLL.

    For each module (the executable and all the DLLs) NT sets up an individual output buffer for standard output and standard error. If the output is going to a device like a command window no buffering is done and the output from different modules is interleaved on the output device. However if the output is going to a file then the buffer is dumped only when it fills (or the program ends). You can control this buffering by using the C runtime call setvbuf() .

    Aprobe turns off buffering in each UAL you produce. Thus if you are using multiple UALs and the Aprobe runtime all output will be interleaved correctly. Unfortunately output from your application may still be buffered. Currently your options are to direct stdout and stderr to a terminal device like a command window or you can recompile your application and make the following call in it:

      setvbuf(stdout, NULL, _IONBF, 0);

    In the near future we will provide a way in a probe to turn off buffering in your target program.

    13.9 Unix: How do I probe a function in a dynamically-loaded shared library?

    If your program explicitly loads a file by calling dlopen("dynamic.so"), Aprobe does not support this directly since it does all its patching when the executable and any shared libraries linked in are first loaded into memory. So the only shared libraries you can probe are those listed by the command

    ldd exe_name

    However, the LD_PRELOAD environment variable can sometimes be used to achieve the same goal. Suppose that we have executable a.out which will load at some point libmyfuncs.so using dlopen. The following would cause the shared library to be loaded with a.out and thus accessible to Aprobe:

    LD_PRELOAD="/full/path/libmyfuncs.so" aprobe -u myprobes.ual a.out

    This assumes that libmyfuncs.so is not dependent on any other shared libraries and that it doesn't hurt to be initialized earlier than would have been the case with dlopen.

    13.10 Can I probe a function in native C or C++ code loaded by a Java application?

    In general, if you're using Java, you should be using RootCause and not Aprobe directly. However, you can do this using the apjava -dll option. If your Java class named "JTest" contains LoadLibrary("native"), then this should work:

    apjava -dll /full/path/libnative.so -u native_probes.ual -java JTest

    or similarly, for Windows

    apjava -dll \myprograms\native.dll -u native_probes.dll -java JTest

    13.11 Is there a way I can use Aprobe in a target environment where my application has no symbol or debug information with it (is stripped)?

    If you have a program that can be probed, you can run the tool apmkadi on it to create an Aprobe Debug Information (ADI) file. You can then remove the symbols from the executable (using the strip command on Unix, or removing the PDB file on Windows) and ship it to the target site. When you want to run Aprobe on that, you would then specify not only the UAL file(s) containing the probes, but the ADI file(s) as well, which contain only the symbolic information needed by Aprobe. See Appendix A, "apmkadi" for more information.

    13.12 Can I run aprobe but produce no APD files?

    Yes. The "-p" flag, which prevents generation of any APD files, was introduced in Aprobe version 4.2.5. This is useful if your probes don't log any data using the default log method.

    13.13 Why does my program crash when using aprobe, and not without?

    The possibilities are:

    1. There's a bug in your probe, for example, one of your action routines is dereferencing a null pointer. See "Debugging Your Probes" near the end of Chapter 3 of the Aprobe User's Guide.
    2. (Unix only) Your application provides its own "malloc()" function which requires initialization before its first use. Since aprobe gets control before your application does, and uses the application's malloc(), this could cause a crash on startup. See also Q15.14.
    3. (Windows only) The probe is opening a file to access symbol information when the application is in a state where this is not allowed. Since such states are not documented, aprobe has a "-R" flag that forces all symbol information to be loaded before the program starts. Using "aprobe -R" makes for slower startup on big applications but avoids this apparent Windows bug.
    4. Your probe is accessing or logging data on_exit to a function or thread, but the on_exit action is being called in an exception or thread exit condition and so may not have valid data available. In order to check for this, put on_exit code within a block that checks the ap_ProbeActionReason implicit parameter, e.g.:
        on_exit {
          if (ap_ProbeActionReason == ap_ExitAction)
          {
            log("foo returns ", $return);
          }
          else
          {
            log("foo exits abnormally for: ", ap_ProbeActionReason);
          }
        }
    5. Your program is very time-critical, or is such that timing may change the order in which order-dependent operations are executed. Aprobe introduces some overhead, and your probes likely introduce a lot more overhead, which can change your program's behavior. You can use aprobe itself to find out what's happening, and to force synchronization between threads -- contact .
    6. There's an Aprobe capacity problem. This may happen with the predefined probes if you select all functions, or (equivalently) specify "*" IN "*" in the configuration file. (See Q. 4.9). You can either reduce the number of functions you're probing, or increase the default probe stack size with "aprobe -q stacksize=20000000" (or some other big number).
    7. You're probing a function that doesn't follow standard generated code conventions. This can happen when you try to probe everything in a shared system library such as ntdll.dll on Windows, libc.so on Solaris or Linux, or libc.a(shr.o) on AIX. See if it reproduces when you only probe known entry points in the system library, or limit your probes to your application module only.
    8. There's a bug in Aprobe. Contact our sales department for more information ( ).

    13.14 AIX: Aprobe version 3.2 had the -s1 option to prevent conflicts with my application's shared memory. Is there a similar feature in version 4.2?

    We hoped that by getting rid of shmat() from our code that we would no longer cause conflicts. Unfortunately we didn't realize that the OS would choose memory map addresses that would conflict, so the problem immediately reappeared. We added a different flag to allow you to specify the memory area that should be used: -q mmap=address where address is the address that should be passed to mmap() when Aprobe requests its shared memory. For example:
    aprobe -q mmap=0xd0000000 -u myprobes myapp.exe

    If you don't have this flag, you'll need an updated version of Aprobe but you might be able to get around it:

    Many users find that they can avoid shared memory conflicts simply by reducing the size of the APD files. The default maximum size is 256M for the persistent APD file and 256M for the user APD file. By using a ring (aprobe -n flag) you can vastly reduce the user APD size, and you can use the -sp flag to specify a reduced persistent file. For instance, the following:
    aprobe -sp 16000000 -n 5

    will create a persistent file of approx 16M and up to 5 APD files of 2M each.

    13.15 Why does Aprobe ask for such a large memory-mapped file on startup, when I've specified only a 4M APD file with "-s"?

    The size of the persistent APD file is controlled independently of the size of the APD ring files. You can use the -sp option to lower this significantly. The default is 256Mbytes because we need to set it to the maximum at the beginning. However we've found that 16M is generally sufficient in practice.

    If you look and see how big your persistent files grow you can use that as a baseline. The main things that get logged to the persistent file after program start are:

    • New Java classes / methods
    • New threads
    • Tracebacks recorded with the traceback to ID mechanism
    • LOAD_SHED functions

    13.16 On Solaris and/or Linux, when I run my application under Aprobe it crashes during initialization with a problem in malloc. This doesn't happen without Aprobe. Why?

    The application might have a poor implementation of malloc built in. On Solaris and Linux an application can provide its own implementation of malloc, free, etc. and this will be used. Most local versions of malloc are well behaved. Some, however, require initialization by the application before first use. Since Aprobe gets in earlier than main(), this can cause a malloc request to be made before the allocator has been initialized.

    If you have control over the code you should fix this by making the malloc self-initializing. If you don't then, unfortunately, you will not be able to run the application under Aprobe.
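
    What "self-initializing" means can be sketched in C as follows. This is only an illustration under stated assumptions, not code from Aprobe or from any real allocator: the names my_malloc and my_heap_init and the fixed-size arena are hypothetical. The point is that the allocator checks a "ready" flag on every request and sets itself up on first use, so a request that arrives before main() still works.

```c
#include <stddef.h>

/* Hypothetical toy allocator: a fixed arena handed out in 16-byte steps. */
#define ARENA_SIZE 4096

static unsigned char arena[ARENA_SIZE];
static size_t arena_used;
static int heap_ready;            /* zero until the first request arrives */

static void my_heap_init(void)
{
    arena_used = 0;
    heap_ready = 1;
}

/* Self-initializing: no caller has to run my_heap_init() first. */
void *my_malloc(size_t size)
{
    size_t rounded;

    if (!heap_ready)
        my_heap_init();

    if (size > ARENA_SIZE)
        return NULL;                          /* also avoids overflow below */
    rounded = (size + 15u) & ~(size_t)15u;    /* round up to a 16-byte step */
    if (rounded > ARENA_SIZE - arena_used)
        return NULL;                          /* arena exhausted */

    arena_used += rounded;
    return &arena[arena_used - rounded];
}
```

    The sketch is not thread-safe and never frees; a real allocator would need both, but the lazy check at the top of the entry point is the part that matters here.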

    13.17 Why can't RootCause see program symbols on a system using Visual Studio .NET 2003?

    If (for example) in the Trace Setup GUI window for the pi_demo example, you don't see anything under the pi_demo module, or you don't see all the information described in the demo, then it's likely that RootCause is not finding the DLL it needs to read the debug info.

    RootCause uses a DLL that is distributed with Microsoft Visual Studio to read PDB file information. The name of this DLL is slightly different for different versions of Visual Studio. For example, for VC++ 6.0 it's mspdb60.dll; for .NET (VC++ 7.0) it's mspdb71.dll.

    The pi_demo example was built on VC++ 6.0 and by default RootCause will not use the VC++ 7.0 DLL to read VC++ 6.0 PDB files. There's an Environment Variable that can be set to alter this behavior.

    Here's what you need to do:

    1. Set Environment Variable "APROBE_USE_DIA" to the value "1" either user/system-wide using the System Properties -> Environment Variables applet, or in the CMD window where you launch the RootCause GUI (via the "rootcause open" command) and the pi_demo application (e.g. enter "set APROBE_USE_DIA=1").
    2. Make sure that the path to the location of file mspdb71.dll is in your %PATH% Environment Variable. The easiest way to do this is to execute the vcvars32.bat setup file (under the Visual Studio installation directory) in the CMD window where you launch the RootCause GUI (via "rootcause open") and the pi_demo application.
    FYI, the following are the locations of the vcvars32.bat and mspdb71.dll files for a typical .NET Visual Studio installation:
    c:\Program Files\Microsoft Visual Studio .NET 2003\VC7\bin\vcvars32.bat
    c:\Program Files\Microsoft Visual Studio .NET 2003\Common7\IDE\mspdb71.dll
    

    14. Using the "apformat" Command

    14.1 What does apformat do?

    apformat reads one or more related APD (.apd) files and formats the data they contain. For example, if the command

        aprobe -u a.ual a.exe

    produced the files

        a.apd a-1.apd

    then the command

        apformat a.apd

    1. Reads a.apd to find out the executable (a.exe) and UAL(s) (a.ual or a.dll) that were used by aprobe to generate the file, and what other APD files were generated (a-1.apd).
    2. Reads the data records contained in a-1.apd, and for each one, invokes the associated format routine contained in the UAL file, passing the data in the record as parameters to the format routine.

    14.2 Which of the ".apd" files do I specify on the command-line?

    If you specify the "base" one, without any number at the end (e.g., a.apd), all of the files that were written to during the most recent invocation will be formatted. If you specify an individual data file, such as "a-2.apd", only the data in that specific APD file will be formatted.

    14.3 Can I restrict the apformat output to just that generated by one of the several UALs provided at aprobe time?

    Yes. Use the "-z" option to indicate that no UALs are to be loaded implicitly, then use "-u" to explicitly state which one you want to use:

      apformat -z -u first myprog.apd

    14.4 Can I restrict the apformat output to just that generated by one or two of my format routines?

    Yes. If you provided your own format routines, you can do it by editing those routines and re-generating the UAL of the same name as the original.

    Let's say you have "dumpall.apc", from which you generated "dumpall.ual". Copy "dumpall.apc" to "dumpall.apc.save". Then edit "dumpall.apc" and comment out the bodies of all the format routines except for the one(s) you want to keep. Use `apc' to compile "dumpall.apc" into "dumpall.ual", e.g., apc dumpall.apc -x myprog, then do:

      apformat -z -u dumpall myprog.apd

    The UAL name must be preserved because the basename of each UAL is part of the "key" used to map formats to data in the APD.

    14.5 Can I programmatically filter which formats are used?

    Yes, and this is actually preferable:

    1. Define global flags corresponding to the different kinds of filtering you want, initializing all to (say) "true".
    2. Code your format routines such that each has "if (FormatFlag1 || FormatFlag2) { ... }" guarding the execution of the print actions in the format routine.
    3. In the "on_entry" part of a "probe format", read the command-line arguments to the UAL (ap_UalArgc, ap_UalArgv), or an environment variable, or file, or whatever, to determine the desired settings of the flags.
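
    The three steps above can be sketched in plain C. This is an illustration, not APC from the product: in a real UAL the flag settings would be read in the on_entry part of a "probe format" from ap_UalArgc/ap_UalArgv, whereas here an ordinary argument vector stands in for them. The "-no1" and "-no2" switches and the routine names set_format_flags and format_call_record are made up for the example.

```c
#include <stdio.h>
#include <string.h>

/* Step 1: global flags, all initially "true". */
static int FormatFlag1 = 1;
static int FormatFlag2 = 1;

/* Step 3: decide the flag settings from an argument vector.
 * The "-no1"/"-no2" switches are hypothetical. */
void set_format_flags(int argc, const char *argv[])
{
    int i;

    for (i = 0; i < argc; i++) {
        if (strcmp(argv[i], "-no1") == 0)
            FormatFlag1 = 0;
        else if (strcmp(argv[i], "-no2") == 0)
            FormatFlag2 = 0;
    }
}

/* Step 2: a format routine whose print action is guarded by the flags.
 * Returns 1 if it printed, 0 if the record was filtered out. */
int format_call_record(const char *name, int value)
{
    if (FormatFlag1 || FormatFlag2) {
        printf("%s = %d\n", name, value);
        return 1;
    }
    return 0;
}
```

    With this guard a record is suppressed only when both flags have been switched off; each format routine can test whatever combination of flags suits the kind of data it prints.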

    14.6 Can I do the previous 2 if I'm using automatically generated formats?

    No. The formats are generated automatically and there's no way to put your own conditions within them. (Of course you can put conditions around the log statement at run time, so that no data is recorded to begin with, but this is a different issue.)

    14.7 When do I need to specify the UAL file to apformat?

    When you want to use UALs different from, or in addition to, the ones that were specified when you ran aprobe. You might want to do this in order to only process part of the data, or use different format routines. Use apformat -z if you want to use only those UALs explicitly specified on the apformat command line.

    14.8 Can I use "apformat" without an APD file?

    No. There must be a valid APD file generated by aprobe.

    14.9 Aprobe works fine, but I get a crash from apformat; why?

    This is almost certainly because there's a bug in one of your format routines. See "Debugging Your Probes" near the end of Chapter 3 of the Aprobe User's Guide for Unix or Windows.

    However, if you didn't write any of your own format routines, either because you're using a predefined probe, or because you just used "log(something);", then this is probably OC Systems' fault and you should contact OC Systems.

    14.10 Can I use ap_UalArgv in "probe format ... on_entry" to get arguments passed at run-time (aprobe time)?

    No. ap_UalArgv at apformat time is for reading arguments passed to the UAL on the apformat command line, as in:

       apformat -u my_probe -p "param1 param2" t.apd

    You would have to log the data you need from run-time yourself, and format it later. This can be done by including the following APC file into your APC file prior to the "probe format" or other format routine in which you want to use the arguments. You can then use the variables ap_RuntimeUalArgc and ap_RuntimeUalArgv just as you would use ap_UalArgc/v at run time.

    /* logualargs.apc
     * Include this once per UAL to record runtime arguments for format time use. 
     */
    
    #ifndef _LOGUALARGS_APC_
    #define _LOGUALARGS_APC_
    
    static int ap_RuntimeUalArgc = 0;
    static ap_NameT *ap_RuntimeUalArgv = NULL;
    static void ap_RuntimeUalArgStart(ap_Uint32 *argc)
    {
       ap_SizeT size = ((*argc)+1) * sizeof(ap_NameT);
       ap_RuntimeUalArgc = *argc; 
       ap_RuntimeUalArgv = (ap_NameT*)(ap_Malloc(size));
       memset(ap_RuntimeUalArgv, 0, size);
    }
    
    static void ap_RuntimeUalArgAdd(int *pos, ap_NameT Arg)
    {
       ap_RuntimeUalArgv[*pos] = ap_StrDup(Arg);
    }
    
    probe program
    {
       on_entry
       {
           int i;
           log (ap_UalArgc) 
              with ap_RuntimeUalArgStart to ap_PersistentLogMethod;
           for (i = 0; i < ap_UalArgc; i++)
           {
              log(i, ap_StringValue(ap_UalArgv[i])) 
               with ap_RuntimeUalArgAdd to ap_PersistentLogMethod;
           }
       }
    }
    #endif
    

    For example:

    #include "logualargs.apc"
    
    probe thread 
    {
    }
    
    probe format
    {
       on_entry
       {
           int i;
           // Run-time arguments to this UAL
           printf("ap_RuntimeUalArgc = %d\n", ap_RuntimeUalArgc);
           for (i = 0; i < ap_RuntimeUalArgc; i++)
           {
              printf("ap_RuntimeUalArgv[%d] = \"%s\"\n", i, ap_RuntimeUalArgv[i]);
           }
           // Format-time arguments to this UAL 
           for (i = 0; i < ap_UalArgc; i++)
           {
              printf("ap_UalArgv[%d] = \"%s\"\n", i, ap_UalArgv[i]);
           }
       }
    }
    

    15. Using Predefined Probes

    15.1 What is a predefined probe?

    This is just a UAL containing probes written by OC Systems for a specific purpose. They are generally more complex than ones you would write yourself, and are designed to work on any program that can be probed. Most of these probes include a Java GUI to simplify parameterization of the probe for your specific program, such as specifying the functions to be probed.

    All predefined probes are in $APROBE/ual_lib/*.ual; the source code is $APROBE/probes/*.apc. The documentation for these probes is in Appendix D of the User's Guide.

    15.2 Do I have to use "apc" to build these probes myself?

    No! The UALs for all of the predefined probes are already built and located in $APROBE/ual_lib. This is in the UAL search path, so the simple name of the UAL is sufficient. For example:

        aprobe -u info myprog.exe

    15.3 The examples show invocation of predefined probes using aprobe -u info myprog.exe. How does aprobe find these UALs when they're not in the current directory?

    The directory $APROBE/ual_lib is always searched for UALs after the working directory. The environment variable APROBE_LIBPATH may also be defined to add additional directories.

    15.4 Can I use Coverage without using the Java configuration GUI?

    Yes. In fact, that's the default. There is no GUI for `info'. The coverage, profile and trace probes provide a GUI to assist in building or modifying a configuration file which defines what should be done, but this file is just a text file that can be edited by hand.

    The `memwatch' predefined probe provides a "runtime" GUI to monitor memory usage as the program is running, and to take interactive snapshots of the allocation data.

    See the documentation for each probe in Appendix D of the Aprobe User's Guide for Unix or Windows.

    15.5 The trace probe really slows down the program--how can I speed it up?

    You should see the Aprobe User's Guide documentation about this probe. However, you can try these things in this order:

    1. Use Load Shedding by specifying "LoadShedThreshold 10" in your configuration file.
    2. Don't use wildcards like "Trace *", but rather use apcgen -L to list specific functions you want to trace and just name those.
    3. Use the TRIGGER configuration parameter to specify a specific call-tree you want to trace.
    4. Use the circular-buffer mechanism, by specifying SaveTraceDataTo CIRCULAR_BUFFER in the configuration file, rather than logging data in real time. Note that your program must complete in a well-behaved way in order to get a snapshot of the data logged to the circular buffer.

    15.6 Unix: How can I get a snapshot of my predefined probe data before my program dumps core?

    The ability to take a snapshot when an unexpected signal occurs is provided by combining the predefined probe of your choice with the "sigsegv" probe:

    
       // my_coverage.apc
       #include "sigsegv.h"
       #include "coverage.h"
       static void MyHandler(int sig, void *Data)
       {
          ap_Coverage_DoSnapshot("Snapshot on signal.");
       }
       probe program
       {
          on_entry
          {
             ap_Sigsegv_AddCallback(MyHandler, NULL);
          }
       }   
    
    Then you link this with the existing predefined probes:
    
      $ apc my_coverage.apc coverage.ual sigsegv.ual # creates my_coverage.ual
    

    15.7 Is there a way to invoke predefined probe operations from within my probes?

    An API for each predefined probe is defined by the ".h" file corresponding to it in $APROBE/probes. For example, "profile.h" defines "ap_Profile_DoSnapshotForAll()". To call this, you would #include "profile.h" in your APC file (it's in $APROBE/include as well, which is always searched for include files). Then when you compile your apc file, specify the UAL as if it were just another object file to link with:

        apc myprofile.apc profile.ual
    (Windows: apc myprofile.apc profile.lib)

    This will produce myprofile.ual (or myprofile.dll on Windows).

    15.8 How can my probes use the Java GUI facilities that the predefined probes use?

    There are two interfaces to the Java GUI objects used by the predefined probes. The one to start with is defined in $APROBE/include/quick_gui.h and implemented in quick_gui.ual (quick_gui.dll on Windows). This supports simple graphs, and interactive message, Yes/No, and confirmation dialogs. An example of using this is given in $APROBE/examples/learn/visualize_data/.

    The full GUI interface used by the predefined probes like profile.ual is apGUI.h, but this is only for fearless experts.

    15.9 I'd like to customize a predefined probe -- how do I rebuild it?

    On Windows, simply edit the files in %APROBE%\probes, then

    cd %APROBE%\ual_lib
    nmake

    On Unix you'll probably need to copy them locally, which is a bit ugly:

    mkdir my_aprobe ; cd my_aprobe
    ln -s $APROBE/include $APROBE/lib $APROBE/bin .
    mkdir ual_lib
    mkdir probes
    cd probes
    cp $APROBE/probes/memwatch.apc . # if you wanted to edit memwatch
    ln -s $APROBE/probes/* .         # to get everything else
    chmod +w memwatch.apc
    # edit memwatch.apc (or whatever) as desired
    cd ../ual_lib
    make -f $APROBE/ual_lib/Makefile memwatch.ual # or whatever
    

    If you have problems or questions, contact OC Systems.

    15.10 How do I use the coverage probe with multiple test cases?

    The `atcmerge' tool merges formatted results from different runs on the same or different executables. You can use the aprobe "-d" option to create different APD filesets and corresponding ".tc" files for each run, and use the "atcmerge" tool to merge these. See Aprobe\Examples\Advanced\Test_Coverage for an example.

    15.11 Where did the "heap" probe go?

    heap.ual (heap.dll on Windows) has been superseded by memwatch.ual. This is a simpler, more robust probe that provides information about allocation patterns, but does not save all the additional data necessary to do error checking. Contact OC Systems if you need a probe with this allocation-checking functionality.

    15.12 How do I use this "events" probe everyone's talking about?

    With RootCause 2.0.5 (Aprobe 4.2.5) there's an example under examples/predefined_probes/events (Windows: Examples\Predefined\Events), and documentation in Appendix D of the Aprobe User's Guide. Here's a quick summary we sent to a user:

    You must have an app_name.events.cfg file, otherwise events does nothing. Let's take a simple case with the routines one() and two() which both call routine three() which, in turn, calls routine four():

       main()
          one()
             three()
                four()
          two()
             three()
                four()
    

    The simplest configuration file is:

    EVENT FUNCTION one()
    EVENT FUNCTION two()
    EVENT FUNCTION three()
    EVENT FUNCTION four()
    

    To just look at the calls nested under one() you would add:

    FOCUS one()
    

    If you wanted to restrict this at runtime:

    FOCUS RUNTIME one()

    Let's say that the processing for one() becomes more complex and you want the event to end in another routine. This would do the trick:

    EVENT START MyEvent one() ON ENTRY
    EVENT START MyEvent another() ON_ENTRY
    
    FOCUS MyEvent
    FOCUS RUNTIME MyEvent
    

    15.13 In the `profile' probe, what do "Calls to Self/Child" columns mean?

    Assume we have a program foo with two functions outer() and inner(). outer loops and calls inner, which does some work. We set up the foo.profile.cfg file to profile both of them.

    If we look at the output for routine outer we would expect to see Calls to Self being one - it's just called once. Calls to Child should be something like 10 or however many times inner is called.

    Similarly the two tables show individual and cumulative time. The individual time for outer would be much lower than the cumulative time since the individual time has all of the recorded times for inner subtracted from it.

    Finally, note that this only applies to routines profiled. If outer also calls routine another() which is not profiled, another's call counts do not show and its time is recorded as part of outer's individual time.
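
    The relationship between the two times can be written down in a few lines of C. The record layout and the names prof_entry and individual_time are hypothetical, chosen for this sketch; they are not the profile probe's own data structures.

```c
/* One profiled routine, as a hypothetical record (not Aprobe's layout). */
struct prof_entry {
    const char *name;
    unsigned long calls_to_self;    /* times this routine was entered       */
    unsigned long calls_to_child;   /* calls it made to profiled routines   */
    double cumulative_time;         /* entry to exit, children included     */
    double child_time;              /* time recorded in profiled children   */
};

/* Individual time is cumulative time minus the time recorded for profiled
 * children. Time spent in unprofiled callees is never subtracted, so it
 * remains part of the caller's individual time. */
double individual_time(const struct prof_entry *e)
{
    return e->cumulative_time - e->child_time;
}
```

    For the outer/inner example, an entry for outer with one call to self, ten calls to child, a cumulative time of 5.0 and 4.0 recorded in inner would have an individual time of 1.0.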

    15.14 Why don't memstat, memwatch, heap probes work on my application?

    The most likely reason is that your application doesn't use the default system allocation routines. These might be actual replacements for malloc(), etc. in your own application or in another library such as libsafe or libefence.

    Sometimes if you explicitly replace malloc() it can break RootCause/Aprobe completely: see Q13.13.

    If Aprobe mostly works except for memory probes, then you can override the default routines used by memwatch by registering your own allocation routines, or by changing the probe itself. This will require writing or editing some APC code, depending on your exact situation. Contact OC Systems for further assistance.

    15.15 Can you please explain the fields "Alloc Count" and "Free Count" in the memstat "Outstanding Allocation" report?

    A specific allocation point (see below) might be reached just once (usually at initialization) and will have an Alloc Count of 1. That allocation may or may not ever be freed, so the Free Count will be 0 or 1. But many (most!) applications have allocation points that give rise to more than one allocation. For instance:

    
    for (i = 0; i < 10; i++)
    {
       linkedList.add (new MyObject (i));
    }
    

    Obviously each instance of MyObject was created from the same allocation point. Most growth happens this way - in fact we don't count any allocations we only see once as growth.

    What is an allocation point? For native code it's the unique traceback up to the current maximum depth, something like:

    
      Line 10 of a()
      called from Line 15 of b()
      called from Line 32 of c()
    

    For Java each allocation point is a combination of a traceback and the object type allocated there.
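
    To make the two counters concrete, here is a small C sketch of a table keyed by allocation point. It is only an illustration of the bookkeeping described above: the names alloc_point, record_alloc and record_free, the fixed table size, and the use of a traceback string as the key are assumptions for the example, not memstat's implementation.

```c
#include <stdio.h>
#include <string.h>

#define MAX_POINTS 64

/* One allocation point: a unique traceback plus its two counters. */
struct alloc_point {
    char traceback[128];          /* e.g. "a():10 <- b():15 <- c():32" */
    unsigned long alloc_count;
    unsigned long free_count;
};

static struct alloc_point points[MAX_POINTS];
static int point_count;

/* Find the entry for a traceback, creating it the first time it is seen. */
struct alloc_point *lookup_point(const char *traceback)
{
    int i;

    for (i = 0; i < point_count; i++)
        if (strcmp(points[i].traceback, traceback) == 0)
            return &points[i];
    if (point_count == MAX_POINTS)
        return NULL;              /* table full */
    snprintf(points[point_count].traceback,
             sizeof points[point_count].traceback, "%s", traceback);
    return &points[point_count++];
}

/* An allocation was made at this allocation point. */
void record_alloc(const char *alloc_traceback)
{
    struct alloc_point *p = lookup_point(alloc_traceback);
    if (p != NULL)
        p->alloc_count++;
}

/* A block that was allocated at this allocation point has been freed. */
void record_free(const char *alloc_traceback)
{
    struct alloc_point *p = lookup_point(alloc_traceback);
    if (p != NULL)
        p->free_count++;
}
```

    Ten trips through the loop above would leave a single entry with an Alloc Count of 10; each later free of one of those objects raises that same entry's Free Count.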

    15.16 Can I use memstat to track all allocations and frees?

    The default setting of the memstat probes is to pinpoint leaks in a longer-running program. However, you can change the options. From the main RC window select the memstat probe in the UAL list, right click and choose Edit UAL. From the Runtime tab change the Sampling Ratio to 1 so you see every allocation.

    From the Format tab check the Display Freed Allocations box. You might also find the Display Zero Growth Allocations useful. Next run, you'll start seeing those freed allocations.

    Click the OK button and then the Build button. Re-format (either through the Index or Examine button) and the reports should have the information you need.

    15.17 Is there a way to only report allocations in a certain module based on the stack traceback entries?

    This mechanism wasn't available in memstat until version 2.1.4b (June 2005); before that it existed only in memwatch, which is more focused on individual allocations. For earlier versions, you could edit and build your own custom version of "combined_memstat.apc" that has filtering: see filtermemory.apc.

    Version 2.1.4b also introduced EXCLUDE filters in memstat and memwatch, which eliminate the named stack traces and show all others. See $APROBE/probes/[java]_memstat.cfg or $APROBE/probes/memwatch.cfg for usage information.

    15.18 Is there a predefined probe for detecting memory corruption?

    Yes. The "memcheck" probe, introduced in version 2.1.3 (February 2004) uses a "fence" mechanism to detect corruption of allocated (but not stack/local) memory. It also reports double deallocations.
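
    The general idea behind a "fence" check can be sketched in C as below. This is a generic illustration of the technique, not the memcheck probe's code: the names fenced_alloc, fence_intact and fenced_free, the fence size, and the fill byte are all assumptions made for the example, and the sketch does not cover double-deallocation detection.

```c
#include <stdlib.h>
#include <string.h>

#define HEADER_SIZE 16      /* room for the stored size, padded for alignment */
#define FENCE_SIZE  16
#define FENCE_BYTE  0xFD

/* Layout: [header: size][front fence][user data][rear fence] */
void *fenced_alloc(size_t size)
{
    unsigned char *raw;

    if (size > (size_t)-1 - (HEADER_SIZE + 2 * FENCE_SIZE))
        return NULL;                                  /* would overflow */
    raw = malloc(HEADER_SIZE + 2 * FENCE_SIZE + size);
    if (raw == NULL)
        return NULL;
    memcpy(raw, &size, sizeof size);                  /* remember the size */
    memset(raw + HEADER_SIZE, FENCE_BYTE, FENCE_SIZE);
    memset(raw + HEADER_SIZE + FENCE_SIZE + size, FENCE_BYTE, FENCE_SIZE);
    return raw + HEADER_SIZE + FENCE_SIZE;
}

/* Returns 1 if both fences are intact, 0 if either was overwritten. */
int fence_intact(const void *user)
{
    const unsigned char *front = (const unsigned char *)user - FENCE_SIZE;
    const unsigned char *rear;
    size_t size, i;

    memcpy(&size, front - HEADER_SIZE, sizeof size);
    rear = (const unsigned char *)user + size;
    for (i = 0; i < FENCE_SIZE; i++)
        if (front[i] != FENCE_BYTE || rear[i] != FENCE_BYTE)
            return 0;
    return 1;
}

/* Release a block obtained from fenced_alloc(). */
void fenced_free(void *user)
{
    if (user != NULL)
        free((unsigned char *)user - FENCE_SIZE - HEADER_SIZE);
}
```

    A write one byte past the end of a block lands in the rear fence, so a later call to fence_intact() on that block returns 0. Corruption of stack or local variables is outside what such a scheme can see, which matches the limitation noted above.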

    15.19 Is there a predefined probe for tracking down lock contentions?

    We have done some work in this area for customers, but we have not productized it, because the platform- and problem-specifics are not easily generalized. If you want some unsupported probes to start from please contact us.

    15.20 What options in the trace.cfg file are obsolete, and why?

    Many changes have occurred as the Trace predefined probe has been adapted to support RootCause users. A number of the options have been deprecated, and others apply only when used directly outside of RootCause.

    The following options have been deprecated.

    MaxDepthOfTracedCalls, DefaultLevels
    These were synonyms. It is no longer possible to specify a maximum depth at which tracing is disabled.
    LogTimes
    Times are always logged.
    LogLines
    Lines are logged if and only if specified on each TRACE line with LINES.
    TracingEnabledInitially
    Tracing is enabled initially if and only if no TRIGGER lines appear. If there are one or more TRIGGER lines, then tracing is only enabled when executing the functions specified by the TRIGGER(s).
    CallCountOptions, ExactCallCounts
    Call counts are now done at format time by the RC trace display rather than by the trace probe, so these options don't apply.
    IndexSymbols
    Symbols cannot be indexed.
    MaxIndentLevelsBeforeWrap, IndentColumns, AlwaysShowNumericNestingLevel
    These used to control formatting, but custom formatting is now done by providing alternative formatting routines. The ones provided for the RootCause Trace Display are in $APROBE/probes/rc_formats.ual.

    15.21 How do I force a snapshot from a predefined probe?

    The coverage, memcheck, memwatch, profile, and statprof probes record data in memory and dump it only at normal program termination, or when explicitly requested with a programmatic snapshot. A snapshot can be forced without terminating the program by calling the entry point provided by the probe:

    • coverage - ap_Coverage_DoSnapshot( "comment" );
    • memcheck - ap_Memcheck_DoCheckpoint( "comment" );
    • memwatch - ap_Memwatch_DoSnapshot( "comment" );
    • profile - ap_Profile_DoSnapshotForAll( "comment", 1 );
    • statprof - ap_Statprof_Snapshot( "comment" );
    The second parameter to ap_Profile_DoSnapshotForAll() is 1 (TRUE) if it will be the final snapshot, and 0 (FALSE) if it will be called again via a snapshot or normal program completion.

    There are two ways these can be called. A very convenient way is to attach with dbx (or gdb) and use the "call" operation. For example if ps says that the PID of application appdriver is 12345, then you can do:

    
       $  dbx -a 12345
       (dbx) call ap_Statprof_Snapshot( "dbx" );
       (dbx) detach
       $  apformat appdriver.apd
    
    Even when using detach it's possible that the program will terminate at this point, so you shouldn't use this if it's important for the program to continue.

    An alternative is to link a special version of these apps with a probe which takes a snapshot at a certain point in the program, for example:

    
      // my_profile.apc
      #include "profile.h"
      probe thread {
        probe "abnormal_end_signal_was_handled" {
          on_entry ap_Profile_DoSnapshotForAll( "probe snap",  FALSE );
        }
      }
    
    Then you link this with the existing predefined probe:
    
      $ apc my_profile.apc profile.ual # creates my_profile.ual
    
    Note that the name abnormal_end_signal_was_handled is only a suggestion, not a name in the Aprobe runtime. An application programmer may offer another name which is called when the application averts an abnormal end. If not, an application programmer may need to help by creating and calling this dummy function at the right time for the snapshot probe, which is when the application averts an abnormal end. Part of the challenge is finding programmers who know that much about the application.

    A special case of this is to take a snapshot when an unexpected signal occurs: see Q15.6.

    15.21 Why does the memstat summary file say it can't do the analysis because I only have one sample?

    Some possibilities are:

    • You didn't run for long enough. A couple of minutes isn't long enough if you want to run the statistical sampling.
    • You didn't format all of the available data. You can do this by selecting all of the apd files for the ring instead of the default (which is just the last one).
    • You don't have a real problem. This is more common than you think: people often see instability that they think is a memory leak but isn't.
    For more information see the Memory Probes page.

    15.22 Could you explain the memstat summary's "Leaked Memory" and "Total Leakage" values?

    The statistical part works like this. Say you have a setting (Sampling rate) of one in thirty. We record every 30th allocation in a table. Every free gets looked up in that table. If the freed block is in the table, the free is recorded; if it isn't, the free is ignored. So the sampling is only on the allocations, not the frees.

    In the table, the totals (including leaked memory) and counts are multiplied by the sampling rate. If you have enough samples, this will be entirely valid.
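
    The scheme just described can be sketched in C. This is a simplified model for illustration only, not memstat's code: the names on_alloc, on_free and estimated_outstanding_bytes, the fixed-size table, and the linear search are assumptions made to keep the example short.

```c
#include <stddef.h>

#define SAMPLE_RATE 30        /* record one allocation in thirty */
#define TABLE_SIZE  1024

struct sample {
    const void *addr;
    size_t size;
    int live;                 /* 1 until the block is freed */
};

static struct sample table[TABLE_SIZE];
static size_t table_used;
static unsigned long alloc_seen;

/* Called for every allocation; only every SAMPLE_RATE-th one is recorded. */
void on_alloc(const void *addr, size_t size)
{
    alloc_seen++;
    if (alloc_seen % SAMPLE_RATE != 0 || table_used == TABLE_SIZE)
        return;
    table[table_used].addr = addr;
    table[table_used].size = size;
    table[table_used].live = 1;
    table_used++;
}

/* Called for every free; frees of unsampled allocations are ignored. */
void on_free(const void *addr)
{
    size_t i;

    for (i = 0; i < table_used; i++) {
        if (table[i].live && table[i].addr == addr) {
            table[i].live = 0;
            return;
        }
    }
}

/* Estimated outstanding bytes: sampled live bytes times the sampling rate. */
size_t estimated_outstanding_bytes(void)
{
    size_t i, bytes = 0;

    for (i = 0; i < table_used; i++)
        if (table[i].live)
            bytes += table[i].size;
    return bytes * SAMPLE_RATE;
}
```

    After 60 allocations of 8 bytes each, two are in the table and the estimate is 2 x 8 x 30 = 480 bytes, which here happens to equal the true total. Freeing an unsampled block changes nothing; freeing one of the two sampled blocks drops the estimate to 240.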

    We record what you pass to the O/S, not necessarily what the O/S actually allocates. This could under-estimate the amount of memory in certain cases. (e.g. if the memory manager always allocates in quad-word steps it would allocate 16 bytes when you requested 4).

    The statistics that identify certain allocation points as "Growth" are based on least squares linear regression analysis.
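
    For reference, the slope of a least-squares line through a series of samples can be computed as in the C sketch below. The function name ls_slope and the choice of inputs (sample time as x, outstanding bytes as y) are assumptions for illustration; how memstat chooses its samples and what thresholds it applies to call something "Growth" are not described here.

```c
#include <stddef.h>

/* Least-squares slope of y against x for n points:
 *   slope = (n*Sxy - Sx*Sy) / (n*Sxx - Sx*Sx)
 * Returns 0.0 when the slope is undefined (fewer than two points, or all
 * x values equal). */
double ls_slope(const double *x, const double *y, size_t n)
{
    double sx = 0.0, sy = 0.0, sxx = 0.0, sxy = 0.0, denom;
    size_t i;

    if (n < 2)
        return 0.0;
    for (i = 0; i < n; i++) {
        sx  += x[i];
        sy  += y[i];
        sxx += x[i] * x[i];
        sxy += x[i] * y[i];
    }
    denom = (double)n * sxx - sx * sx;
    if (denom == 0.0)
        return 0.0;
    return ((double)n * sxy - sx * sy) / denom;
}
```

    An allocation point whose outstanding bytes read 100, 200, 300, 400 at times 0, 1, 2, 3 has a slope of 100 bytes per time unit; one that stays flat has a slope of 0. A single sample gives no slope at all, which is why at least two samples are needed before any trend can be computed.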

    15.23 How can I define a memstat (or memwatch) filter matching any number of call levels?

    That is, is there a way to do something like the following?

    
       FILTER      extern:"malloc_y_heap()" in "libc.a(shr.o)"
            ==> **** any number of levels matching anything **** 
            ==> "ap_demangle.c":"Demangle_Xlc_Symbol_Name()" at line 2103 (ap_demangle.c)
    

    No, the best you can do is enumerate all the possible matches from your test cases. Wildcards of one or more levels may be implemented in the future.

    15.24 Is there a predefined probe to check for stack corruption?

    Not officially, but we have written stackcheck.apc for a customer. This version is just for Windows, and checks that the return address is not corrupted on_entry and on_exit to all instrumented functions. Instrumentation is hard-coded in the probe for now. A configuration file or separate configuration probe could be added to handle specifying the instrumentation points.

    16. Using the "apc" Command

    16.1 What does apc do?

    The apc command translates one or more APC files into C, and then uses a native C compiler to compile these into object code, and link them with other files specified on the command-line to form a shared library called a UAL. A UAL has a suffix of .ual on Unix, but .dll on Windows due to limitations in how dynamically-loaded libraries are selected on Windows.

    16.2 How do I indicate what C compiler and options apc should use?

    On Windows, only one compiler is supported, so the C compiler is simply your installed version of Microsoft Visual C++. On Unix, the compiler is defined in the file $APROBE/lib/compiler_profiles and by the APROBE_CC_COMMAND environment variable. This is described in the Files Reference (Appendix B) of the Aprobe User's Guide.

    Options to the compiler can also be specified on the apc command line by including them in quotes after the "-compiler" option, for example,

        apc foo.apc -compiler "-v"

    16.3 Do I need to specify an object file or executable to apc?

    You need to specify "-x object module" if you use a construct in your APC that cannot be resolved without specific symbol table or debug information from the program. Such constructs are:

    1. target expressions: names from the probed program preceded by `$', or "$*" ($1, $2 are ok, as are hardware-register references starting with '$$'), and
    2. references to specific source lines.

    In general, probes that you compose to gather information about specific parts of your program will contain one of the above, and you'll want to include the executable or an object file.

    For probes on shared libraries which don't contain any debug information, or for probes that should apply to any program (like the predefined probes included with Aprobe), you generally will not provide an object module.

    16.4 How do I specify other object files to link into my UAL?

    Just include them on the apc command-line. Linker options are specified in quotes after the "-linker" flag, for example,

        apc foo.apc -linker "-lX11"
    or, on Windows:
        apc foo.apc -linker "/WARN"

    16.5 apc says my function name's not known--why not?

    There are a number of possibilities. If you specified "-x ... " on the apc command line, then it means it couldn't find the named function in that file's symbol table. Since apc works pretty hard to match incomplete function names, the name is probably wrong in case or spelling, or, if you provided a parameter profile, it's probably not exactly what the C++ compiler encoded as the name for the function.

    You could try using apcgen to generate a probe template for all the functions in the source file (or object file, if it's a template instance) containing the function you want, or the tool apinfo or apsymbols to dump out all the function names in the whole program.

    16.6 Solaris: Where can I download a good gcc installation to use with RootCause?

    For Solaris platforms, we recommend downloading from http://www.sunfreeware.com.

    At this site's home page you will see on the right a list of processor/operating system combinations. Click the one which is appropriate for your system. (Note that this list includes both SPARC and Intel -- be sure to select a SPARC download.)

    Below the processor/OS list will be a list of software packages. Select gcc-2.95.3 (RootCause does not yet fully support gcc version 3).

    The link to the binary gcc installation will appear in the center pane of your browser. Download the gcc 2.95.3 image from here. Use gunzip to uncompress the file, then use pkgadd to install the package (it will go under /usr/local). You will need root authority to do this:

    pkgadd -d gcc-2.95.3-sol7-sparc-local

    16.7 How do I generate debug information for my APC files so line and function information show up in tracebacks?

    As with C, use the -g flag; this passes the appropriate debug options to the C compiler (even on Windows: /Zi /Yd /FAcs) and saves the generated C source file.

    16.8 (Unix) Can I specify an environment variable for the compiler path in the compiler_profiles file?

    Yes! If "ls -l ${CC_PATH}/bin/gcc" on the command-line shows that the compiler exists, then a stanza like:
    CC_COMMAND ${CC_PATH}/bin/gcc
    will work.

    Also note that the environment variable APROBE_COMPILER_PROFILES can be used to override the default of $APROBE/lib/compiler_profiles and point to your own variant of this file. See compiler_profiles file in the user's guide.

    16.9 How do I compile a probe for a 32-bit app when running 64-bit Linux?

    If you build your application with the compilation option -m32, then to build your probe you'll need to pass -m32 to apc's backend compiler, plus define the i386 macro to the preprocessor. For example:
       apc -Di386 -compiler -m32 -linker -melf_i386 foo.apc
    The link stage just invokes ld directly which should automatically build a 32-bit shared library from a 32-bit object file.

    If you're going to be doing this regularly you should edit $APROBE/lib/compiler_profiles to update the CFLAGS and PREPROCESS lines so these options are applied automatically.

    17. Writing Probes in APC

    This section contains questions and answers about writing probes in APC for native (C, C++, Ada) programs.

    17.1 How do I use "apcgen" to generate a probe automatically?

    You need an object file or executable that contains debug information, i.e., was compiled with debug (see Q12.19 ) or a C header file. For example:

       apcgen foo.exe > foo.apc
      apc foo.apc -x foo.exe

    generates foo.apc, an APC file probing all the user-defined functions in foo.exe that have debug information, then compiles that into a UAL.

      apcgen -qparams -p sin -o math_sin.apc /usr/include/math.h
      apc math_sin.apc -x /usr/include/math.h

    generates and compiles math_sin.apc containing a probe on the sin() function which logs the parameter and return value. Use apcgen -h to see what options are available to control the output.

    Note that RootCause provides this functionality in a point-and-click GUI.

    17.2 How do I write a "probe"?

    One way is to start with a file generated by "apcgen" (see the previous question). Or you can compose one in your favorite text editor. It's pretty much like writing C, but there's some syntax needed to indicate where and when your probe should be executed. Here's a very simple one:

      probe thread     
      {
        probe "main"
        {           
          on_entry            
          {               
            printf("Entering main.\n");
          }
        }     
      }

    If you put this in the file "foo.apc", then you would compile it:

       apc foo.apc 

    which produces "foo.ual" (foo.dll on Windows), which you can then probe your program with:

       aprobe -u foo foo.exe
     

    17.3 What is the difference between APC and straight C?

    There are several differences:

    1) There is special syntax to indicate where and when the probe should be executed, such as "probe", "on_entry", "on_exit", "on_line", etc.

    2) There is a special keyword called "log" for recording data at run time and defining the format with which it should be displayed afterward.

    3) There are special data references, called "target expressions" which start with `$' and refer to values in the probed program.

    All of these are expanded or converted to ANSI C by the apc compiler.

    In addition, there is an implicit #include "aprobe.h", which makes available the extensive Aprobe API defined in $APROBE/include/aprobe.h and documented in Appendix C of the Aprobe User's Guide.

    17.4 Why do I need a "probe thread"?

    This is an artifact of the clever Aprobe scoping rules. When one probe is nested within another (that is, defined in the declarative part of an enclosing probe), it not only gives visibility to the enclosing probe's data as you would expect, it also means that the inner probe is "active" (its actions may be executed) only if the outer probe is active.

    Since every function is executed within some thread of execution, if a function probe weren't inside a thread probe it would never be active.

    Anyway, just put in the probe thread{ .. }. It's what works.

    17.5 What's the difference between "probe thread" and "probe program"?

    The on_entry and on_exit actions of a "probe program" occur once each, before calling main() (or WinMain(), etc.) and after returning from main(), respectively. The corresponding actions of a "probe thread" occur at the creation and destruction of each separate thread.

    Data defined in the declarative part of a "probe thread" is global to all probes, but is unique for each thread. There is always at least one, the "main" thread, which is conceptually nested immediately within the probe program.

    17.6 When exactly are the "on_entry" and "on_exit" parts of a function probe executed?

    On AIX, Linux and Windows, the on_entry actions are executed before the first instruction of the function itself. In particular, the function's local stack frame hasn't been created yet.

    On Solaris (SPARC), the on_entry actions are executed immediately after the SPARC save instruction has shifted the register window, but before any compiler-generated saves of parameters or other values.

    The on_exit actions are executed after the stack frame has been discarded, so local data is not available. The next (target program) instruction executed will be the one following the call to the probed function.

    17.7 Why can't I dump some parameters in the on_exit part?

    Parameters passed by value are essentially local data. They are stored on the stack and the stack frame has been discarded by the time the on_exit part is executed.

    If you want to be able to access the input parameters you can save them in the on_entry part, for example:

    probe thread 
    {   
      probe "foo"   
      {      
        int parm1;
         on_entry      
        {
          parm1 = $1;      
        }      
    
        on_exit      
        {
          if (parm1 == 1)
          {
             ...
          }      
        }   
      } 
    }

    C++ reference parameters, and composite parameters passed by reference in Ada, are available by name in the on_exit part because `apc' implicitly generates code in an on_entry section to save the address passed in. GNAT Ada OUT and IN OUT parameters can be displayed because these are implemented as fields of a 'struct' returned by the function.

    17.8 Why is my local variable "unknown" in on_entry and on_exit parts?

    The on_entry and on_exit parts are conceptually outside the scope of the function, so the local data is not visible. Local data is visible only within an "on_line" action.

    17.9 Is there a way to probe "the first line" or "the last line" in my function?

    Yes. Simply write on_line(first) or on_line(last) . You can use this to do function-relative line numbers as well, such as on_line(first+5) .

    17.10 How do I specify which of several overloaded functions I want to probe?

    In C++, you must specify the exact parameter profile encoded in the symbol table by the C++ compiler. The best way to get this is either to look at the output of "apcgen -vL" applied to the object file generated by the compiler, or to use `apinfo -sa myprog' to list the probe names of the function symbols in your application.

    17.11 How do I reference a hardware register?

    A hardware register is referenced within a user action (e.g., on_entry) by preceding the name commonly used for the register by "$$". The exact register names are documented in Appendix B, "Files Reference", under "APC File".

    Note that the value you get for the register is the value it had at the point the target program called the probed routine.

    On Windows, if you want the current value of a given register you can use the normal MSVC++ assembly prefixes to get it, for example:

      {     
        int CallerEAX;     
        int CurrentEAX;
        CallerEAX = $$EAX;  // move the caller's EAX to CallerEAX
        __asm mov CurrentEAX,EAX   // move EAX to the variable CurrentEAX
      }

    17.12 How do I query the parameters to a function?

    If the function is compiled with debug (see Q12.19 ) you can reference a parameter by name ($param) and reference all parameters with "$*".

    Whether or not a function is compiled with debug, or there's an object module available, you can reference the first parameter with "$1", the second with "$2", etc., up to $8.

    Note, however, that if there is no debug information provided, you must cast the "$1" to its proper type.

    17.13 Can I use automatic formatting if I don't have an executable with debug information?

    Yes, but you must (a) include the definition of each logged item's type in the APC file (if it's not a predefined type), and (b) cast each item to that type. This is how one can log parameters to system routines, for example:

    #include <stdio.h> // includes the struct FILE 
    probe thread 
    {    
      probe "fopen"    
      {       
        /* fopen returns a pointer to FILE, defined in stdio.h */
        on_exit
          log("fopen() returns ", (FILE *)$return, " = ", *(FILE*)$return);    
      }    
      probe "fclose"    
      {       
        /* first parameter to fclose is a pointer to FILE */
        on_entry
          log("fclose() called with ", (FILE *)$1, " = ",
            *(FILE*)$1 );    
      } 
    }

    17.14 How do I change the return value from a function?

         on_exit { $return = desired_value; }

    17.15 How do I log the value of a string parameter?

    ap_StringValue is a macro which logs everything from the address provided up to the first null character:

         on_entry { log("NameParam = ", ap_StringValue($NameParam)); } 

    Note: this only applies to null-terminated (C, C++) strings. It does not apply to the Ada predefined string type -- see Q17.27 .

    17.16 How do I log the contents of an array?

    You must specify the bounds of the array in the log statement:

         on_entry { log("Items = ", $Items[0 .. 9]); }

    If the array bounds are dynamic (as most are), you can compute them first:

         on_entry
         {
           int last;
           for (last = 0; $Items[last] != 0; last++);
           log("Items = ", $Items[0 .. last-1]);
         }
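
    The bound-finding loop above can be tried out in plain C before it goes into a probe. The following is only a sketch: the function name last_index_before_zero is made up for this example, and it assumes the array really is terminated by a zero element.

```c
#include <assert.h>
#include <stddef.h>

/* Return the index of the last element before the first zero entry,
 * or -1 if the very first element is zero (an empty sequence).
 * ASSUMPTION: the array is zero-terminated; if it is not, the scan
 * runs past the end of the array. */
int last_index_before_zero(const int *items)
{
    int last;

    assert(items != NULL);
    for (last = 0; items[last] != 0; last++)
        ;                       /* empty body: the loop only counts */
    return last - 1;
}
```

    A result of -1 means there is nothing to log, so a probe using this idea would probably want to test for that case before writing a range such as [0 .. last-1].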

    17.17 Solaris: I get a compile error when I write "a[0..4]", but it seems to work; why?

    The Sun WorkShop (Forte) compiler's preprocessor is run over the APC file before the APC-specific syntax is processed and converted to C. If you're using Sun WorkShop as apc's C compiler (as defined in $APROBE/lib/compiler_profiles), that preprocessor complains about the "0.." syntax that APC uses. If you want to avoid the message, put a blank before ".." whenever you use it.

    17.18 How do I "stub out" the probed function so it does nothing?

    Use the "ap_StubRoutine" macro in the on_entry part of a function, and be sure to return something sensible if necessary in the on_exit part, e.g.,

       probe "foo" {       
        on_entry ap_StubRoutine;       
        on_exit  $return = 0;    
      }

    Note that you can't assign the return value in the on_entry part, since the return register is reset as part of the stub implementation.

    17.19 How do I query the data in a class when probing a member function?

    All data in a class is defined as a field of the local variable "this", so to get at the class data item "NCalls" you would do:

      log("NCalls = ", $this->NCalls);

    17.20 How do I query a global (or static) variable when there's a local one of the same name?

    To specify that you want a data item other than the one visible by default, add an expression context string to the target expression:

      log("static NItems = ", $(NItems, "-file items.c"));

    To get the global one, if any:

      log("global NItems = ", $(NItems, "-module foo.exe"));

    17.21 Can I reference a static variable that wouldn't normally be visible to my probed function?

    Yes. See the previous Q. You can reference a static item by name in any file:

            log("static NItems = ", $(NItems, "-file items.c"));

    even if the probed function this appears in is not in file "items.c".

    17.22 Can I call a function in my program from within a probe?

    If your program is compiled with debugging enabled, you can call one of its functions by preceding the function name with a `$'. This is often useful for using a probe to call debugging-support routines, e.g.,

    probe thread    
    {       
      probe "ReadSymbolTable"       
      {          
        on_exit          
          $DumpSymbolTable($0);          
      }    
    }

    In the absence of debug information, you can get the symbol address from Aprobe and cast that to the correct type. For example, on Windows:

    typedef void MyBeep(int Hz, int Msec);  /* Beep takes frequency first, then duration */
    probe thread 
    {    
      probe "main"    
      {       
        on_entry       
        {          
          (*(MyBeep *)ap_SymbolToAddress               
            (ap_SymbolNameToId                  
              (ap_ModuleNameToId("KERNEL32"),                   
              "Beep()",                   
              NULL)))(4000, 1000);       
        }    
      } 
    }
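
    The cast-and-call idiom used above does not depend on Aprobe, so it can be checked in plain C. In the sketch below, lookup_symbol is a hypothetical stand-in for whatever returns the address (it is not an Aprobe routine), and add_ints stands in for a function in the target program.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a function in the target program. */
static int add_ints(int a, int b)
{
    return a + b;
}

/* Generic function-pointer type that carries an address whose real
 * signature is not known to the code that looked it up. */
typedef void (*GenericFn)(void);

/* The signature we know the function really has. */
typedef int AddFn(int, int);

/* Hypothetical lookup: always returns add_ints as an untyped address. */
static GenericFn lookup_symbol(const char *name)
{
    assert(name != NULL);
    return (GenericFn)add_ints;
}

/* Cast the untyped address back to the real type, then call it. */
int call_by_name(const char *name, int a, int b)
{
    GenericFn addr = lookup_symbol(name);

    return (*(AddFn *)addr)(a, b);
}
```

    The call only behaves correctly if the typedef matches the real signature of the function; the compiler cannot check this for you, which is why the typedef deserves as much care as the lookup itself.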

    Additionally, on Windows, if the function is in a DLL you can use the NT routines GetProcAddress and GetModuleHandle to find the routine. For example:

      /* call Beep to sound a 2kHz tone for a second */    
      GetProcAddress(GetModuleHandle("KERNEL32"),"Beep")(2000, 1000); 

    In the above example you could use the name of your DLL instead of KERNEL32.

    Calling C++ methods is more complex (they require a "this" pointer, and the naming can be tricky): For Windows, see Q17.23; for Unix, see Q17.71.

     

    17.23 Windows: Can I call a Visual C++ method from a probe?

    To call Windows Visual C++ methods in the target program you will have to create a C wrapper function and link that with the application. The wrapper will take the C++ object as an explicit parameter and make the method call. For example, examine the following class definition:

    class MyClass
    {
    public:
       MyClass () {a = 10;}
    
       void Show (int b);
       void Show (char *b);
    
    private:
       int a;
    };
    

    If you want to call the first Show method (the one with an int parameter) you need to write the following wrapper function:

    extern void WrapperForMyClassShowInt(MyClass *o, int b)
    {
       o->Show(b);
    }
    

    You should pick a unique name for the wrapper, particularly if you are writing a number of wrappers for overloaded methods in the same class. This wrapper name includes the class, MyClass, the method name, Show, and an indication of argument types, Int.

    The wrapper function includes an explicit parameter for the C++ object pointer of class MyClass, and all the other parameters for the method call, in this case just an int. The body of the wrapper function just makes a C++ method call using the C++ object and the method parameters. If the method returns a value, the wrapper function should return that result.

    With this wrapper function compiled and linked with your application, you can call it from APC code. Here is an example of calling the wrapper defined above:

       probe thread
       {
          probe "main"
          {
             on_line (31)
             {
                $WrapperForMyClassShowInt(&$m, 20);
             }
          }
       }
    

    The target expression $WrapperForMyClassShowInt indicates the wrapper function we wrote above, and the target expression &$m refers to a C++ object variable in the target program. Here is the function, main, that this probe targets:

    int main (int argc, char **argv)
    {
       MyClass m;

       return 0; /* line 31 */
    }

    17.24 Can my APC files reference names in one another like a C program?

    Yes, but if they do they must all be compiled in the same "apc" command into a single UAL.

    17.25 Can I call a function in another UAL?

    Yes. A UAL is just a shared object library (a DLL), so you must do the following:

    1) Export the symbol for the function to be called, using the apc "-e" option, when you build the UAL to be referenced, e.g.,

       apc funcdef.apc -e func

    On Windows, you can use the MSVC++ __declspec(dllexport) prefix to identify the routine as something to be exported from a DLL.

    2) Specify the referenced UAL (on Windows, the corresponding "lib" file) as an input file on the command line when you compile the probe that contains the external reference:

       Windows: apc main.apc funcdef.lib
       Unix: apc main.apc funcdef.ual

    17.26 How do I change the return code from my Unix program?

    From $APROBE/examples/learn/probe_exit/exit.apc:

    probe thread {    
      probe "exit" in "libc.so" // "libc.a(shr.o)" on AIX
      {
        on_entry {          
          /* return 0 even if an error occurred: */          
          $1 = 0;       
        }    
      } 
    }

    Note: This probe won't work on Solaris 5.5.1 because exit() works differently.

    17.27 How do I print or change a GNAT Ada string value in my probe?

    An unconstrained string is represented as a record with two components. The first is a pointer to the string (which is not null-terminated) and the second is a pointer to another record which contains the bounds of the string.

    The "apc" tool recognizes this special type and displays it appropriately, if debug information is available. Since its length is known, ap_StringValue is not used. For example:

    probe thread {
      probe "hello.qualify_name" {
         on_entry
         {
            // log the input parameter then stub the routine itself 
            log("qualify_name called with: ", $1);
            ap_StubRoutine;
         }
       }
    }

    In the absence of debug information (e.g., for Ada.Text_IO.Put_Line ), or when you want to assign to an unconstrained string, you can use macros defined in gnatstrings.h. For example:

    #include "gnatstrings.h"
    
    probe thread {
      probe "hello.qualify_name" {
         on_exit
         {
            // return what we want to:
            ap_SetGnatUCString(
                $return,
                ap_CatenateStrings(
                   "/home/ocs/",
                   ap_ExtractGnatUCString($1),
                   NULL));
         }
      }
    }
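
    To make the two-component representation described at the start of this answer concrete, here is a plain C model of it. This is a sketch only: the type and function names are invented for the example, the bounds record is ASSUMED to hold two ints (first and last), and the real layout and the real gnatstrings.h macros may well differ.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* ASSUMED layout, for illustration only: bounds as two ints. */
typedef struct {
    int first;
    int last;
} StrBounds;

/* Two-component record: pointer to the characters (not
 * null-terminated) and pointer to the bounds record. */
typedef struct {
    const char      *data;
    const StrBounds *bounds;
} UnconstrainedStr;

/* Number of characters; an empty string has last < first. */
size_t ucs_length(const UnconstrainedStr *s)
{
    assert(s != NULL && s->bounds != NULL);
    if (s->bounds->last < s->bounds->first)
        return 0;
    return (size_t)(s->bounds->last - s->bounds->first) + 1;
}

/* Copy into a newly allocated, null-terminated C string.
 * The caller frees the result; returns NULL if allocation fails. */
char *ucs_to_cstring(const UnconstrainedStr *s)
{
    size_t len = ucs_length(s);
    char  *out = malloc(len + 1);

    if (out != NULL) {
        if (len > 0)
            memcpy(out, s->data, len);
        out[len] = '\0';
    }
    return out;
}
```

    The point of the model is that the length comes from the bounds, never from a terminating null character, which is why ap_StringValue is the wrong tool for these strings.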
    

    17.28 How can I just log some data and format it as hex?

    This is an example of an APC file to log a buffer's worth of data and format it as hex.

    // Example APC file to demonstrate logging a block of data and
    // formatting it as hex.
    
    // Use this macro to provide a buffer and length of data you wish to log
    // and be formatted as hex. e.g. LogAsHex (MyBuffer, 100);
    #define LogAsHex(B,L)                              \
    log (((ap_Byte *) ((ap_Byte *) B)) [0 .. ((L)-1)], \
         (ap_Uint32) (L),                              \
         (ap_Uint32) (B)) with HexFormat
    
    // Buffer is the actual data, Length the length and StartAddress the
    // address of the data at runtime.
    static void HexFormat (ap_Byte    *Buffer,
                           ap_Uint32  *Length,
                           ap_Uint32  *StartAddress)
    {
       ap_Uint32 PrintAddress;
       ap_Uint32 EndAddress;
    
       // We start printing at the first 16 byte boundary below StartAddress
       // which might be below where we actually need to show characters. So
       // we check if we are in range before printing a character
       PrintAddress = *StartAddress & 0xfffffff0;
       EndAddress = *StartAddress + *Length;
       
       while (PrintAddress < EndAddress)
       {
          int i;
    
          // Print out the hex bytes
          printf ("%08x: ", PrintAddress);
          for (i = 0; i < 16; i++)
          {
             // Check we're in range
             if ((PrintAddress + i) < *StartAddress ||
                 (PrintAddress + i) >= EndAddress)
             {
                printf ("  ");
             }
             else
             {
                printf ("%02x", Buffer [PrintAddress - *StartAddress + i]);
             }
    
             // Separate the row into groups of four bytes
             if (i % 4 == 3)
             {
                printf ("  ");
             }
          }
          
          // Print out the ascii
          printf ("   ");
          for (i = 0; i < 16; i++)
          {
             // Check it's in range
             if ((PrintAddress + i) < *StartAddress ||
                 (PrintAddress + i) >= EndAddress)
             {
                printf (" ");
             }
             else
             {
                ap_Byte c = Buffer [PrintAddress - *StartAddress + i];
    
                // Is this a printable character?
                if (c >= 32 && c < 127)
                {
                   printf ("%c", c);
                }
                else
                {
                   printf (".");
                }
             }
          }
    
          printf ("\n");
          PrintAddress += 16;
       }
    }
    
    // This is an example of using the above log mechanism - the first
    // parameter must be an address (e.g. an array, a pointer, etc.). The 2nd
    // parameter is the number of bytes.
    probe thread
    {
       probe "fred()"
       {
          on_entry LogAsHex ($1, $2);
       }
    }
    

    A C file follows to test it with:

    void fred (const char *Buffer, int Length)
    {
       ;
    }
    
    int main (int argc, char **argv)
    {
       char Buffer [100];
       int  i;
    
       for (i = 0; i < 100; i++)
       {
          Buffer [i] = (char) i;
       }
       fred ((const char *) Buffer, 100);
       return 0;
    }
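
    The row layout used by HexFormat (start at a 16-byte boundary and leave blanks for cells outside the logged range) can also be exercised without Aprobe. The sketch below formats only the hex part of a single row into a caller-supplied buffer; format_hex_row is a name made up for this example and is not part of the Aprobe API.

```c
#include <assert.h>
#include <stdio.h>

/* Format the hex part of one 16-byte row into out, which must hold
 * at least 33 bytes (32 characters plus the terminating null).
 * row_addr is the 16-byte-aligned address of the row; cells outside
 * [start, start + length) are written as two spaces. */
void format_hex_row(char *out, const unsigned char *buffer,
                    unsigned long start, unsigned long length,
                    unsigned long row_addr)
{
    unsigned long end = start + length;
    int i;

    assert(out != NULL && buffer != NULL);
    for (i = 0; i < 16; i++) {
        unsigned long addr = row_addr + (unsigned long)i;

        if (addr < start || addr >= end)
            sprintf(out + 2 * i, "  ");
        else
            sprintf(out + 2 * i, "%02x", (unsigned)buffer[addr - start]);
    }
}
```

    A caller would compute row_addr by masking off the low four bits of the start address, as HexFormat does, and then advance it by 16 for each further row.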
    

    17.29 How do I log information about each thread as it starts?

    You log the Thread ID using a format routine that prints information about it, since the information, especially the thread entry point, may not be available on_entry to the thread:

    void PrintThreadInfo(ap_ThreadIdT *ThreadIdPtr)
    {
      printf("Thread %d: ", *ThreadIdPtr);
      ap_PrintSymbol(
         ap_AddressToSymbol(
            ap_ThreadEntryPoint(*ThreadIdPtr)));
    }                       
    
    probe thread 
    {
      on_entry
      {
         log(ap_ThreadId()) with PrintThreadInfo;
      }
    }
    

    Note that the thread entry point symbol will probably be a system function.

    17.30 GNAT turns SIGSEGV into CONSTRAINT_ERROR; can I use Aprobe to get a core dump?

    Yes. Here's a probe which stubs (disables) the call the GNAT runtime makes to sigaction() to register a signal handler. This allows the default action to occur when the signal occurs.

    #include <signal.h>
    
    probe thread
    {
       probe "sigaction()" in "libthread.so"
       {
          ap_BooleanT Stubbed = FALSE;
          
          on_entry
          {
             if ($1 == SIGSEGV)
             {
                printf ("Stubbing sigaction(SIGSEGV)\n");
                Stubbed = TRUE;
                ap_StubRoutine;
             }
          }
          on_exit if (Stubbed) $0 = 0;
       }
    }
    

    17.31 How can I get Aprobe actions to happen when my program dumps core?

    First, you should be running with sigsegv.ual: it will provide a traceback and exit actions in these cases. If you want to add additional exit actions, such as a predefined probe snapshot, see Q15.6, or you can copy and extend $APROBE/probes/sigsegv.apc to build your own probe.

    17.32 Is there a way to find out where a signal occurs when it doesn't cause a core dump?

    On Solaris there is:

    probe thread {
      probe "sigaction.c":"sigacthandler()" in "libc.so"
      {
        on_entry
        {
          log("Signal ", (int)$1, " seen at:");
          ap_LogTraceback(99);
        }
      }
    }
    

    Of course, this might not work if the condition causing the signal was due to corrupted memory or registers which Aprobe relies upon.

    17.33 How can I reduce the overhead of my probes?

    The most obvious way is to use #pragma nofloat in probes that don't use floating point; this eliminates the need to save/restore floating point registers. See also Aprobe Performance Considerations in Chapter 4 of the Aprobe User's Guide.

    probe thread
    {
      probe "your_routine"
      {
        #pragma nofloat
        // Your probes
      }
    }
    

    17.34 Can I use Solaris Aprobe on JOVIAL programs?

    Yes, but there will be no "debug" information found, so you won't be able to use named target expressions (e.g., "$x", "$*") or do on_line probes. Furthermore, no type information is available for parameters, etc., like "$1".

    17.35 How can I log a composite object without using debug information?

    Declare or #include a C type that maps to the structure you want, then cast your target expression to a dereference of a pointer to this C type. For example:

    typedef struct 
    {
       int Field1;
       float Field2;
    } MyStruct;
    
    probe thread
    {
      probe "foo"
      {
        on_entry
        {
          if (((MyStruct *) $1)->Field1 > 0)
          {
            log(*((MyStruct *) $1));
          }
        }
      }
    }

    or perhaps a bit cleaner is:

    probe thread
    {
      probe "foo"
      {
        on_entry
        {
          MyStruct *Param1 = (MyStruct *)$1;
          if (Param1->Field1 > 0)
          {
            log(*Param1);
          }
        }
      }
    }
    

    17.36 How can I cast a value to a type name from the program?

    "I have part of my program without debug info, but I know the type of a parameter passed in that "no debug" part, and furthermore, I know that the type name is defined in a part that does have debug info. How can I cast an "unknown-type" parameter to the known type name?"

    This is similar to the previous question, except instead of defining the type in your APC, refer to the type in your program by its name and file, wrapped in "typeof", within your probe declarative part, as follows:

    probe thread
    {
      probe "foo"
      {
        typedef typeof($(MyStruct, "-file debug_part.c")) MyStruct;
        on_entry
        {
          MyStruct *Param1 = (MyStruct *)$1;
          if (Param1->Field1 > 0)
          {
            log(*Param1);
          }
        }
      }
    }
    

    17.37 Is there a special editor or editor mode for APC?

    No, but it's pretty close to C. The C mode for Emacs, Lemmy, or another editor works pretty well. Contact OC Systems if you think we should put work into this.

    17.38 How do I execute a probe only if a certain data condition is met?

    In Aprobe version 2, you could do something like:

        probe .outer_routine
        on entry
           if $r3 = 3 then
              probe .inner_routine
                null; -- inner_routine stuff
              end probe;
           end if;
        end probe;
    

    Probes in Aprobe2 were executable, but in Aprobe3 they are declarative. You declare a named probe, and make explicit calls to enable or disable it. For example:

    probe thread
    {
      probe "outer_routine"
      {
        // Note that this probe has a name "InnerProbe"
        probe "inner_routine"
        {
          ; // Inner routine stuff
        } InnerProbe;
    
        // Entry to outer_routine
        on_entry
        {
          if ($param1 == 3)
          {
            // We can enable or disable the probe
            ap_EnableProbe (InnerProbe);
          }
          else
          {
            // Disable the inner probe
           ap_DisableProbe (InnerProbe);
          }
        }
      }
    }
    

    17.39 How can I interactively modify the parameters to a routine in my application?

    The basic approach is simple. In the little C example "t.c" below, main() calls Test() every 5 seconds, passing to it a float and an integer. Subprogram "Test" prints these values out. In t.apc we put a probe on Test, and replace the parameters with values we retrieve from the environment. The trick is how to retrieve values from the environment.

    One obvious way is to prompt to stdout and read from stdin. This may work for some applications, but not many. A more general approach is to check if a user created a file "Test.cfg" in the directory where the program is run and if so we read the new values of parameters with the help of a call to fscanf(). This approach works pretty well as long as the overhead of `fopen' call on entry to "Test" is acceptable. In cases when it is not one could move this call some place else and store the new values in global APC variables.

    Note that this "read-a-file" approach can be used for a wide range of program interaction. One could simply use the presence of a file as a "switch" to enable or disable certain probes.

    t.c

    #include <stdio.h>
    #include <unistd.h>   /* sleep() */

    void Test(float parm1, int parm2)
    {
      printf("Test(%f,%d)\n", parm1, parm2);
    }

    int main(void)
    {
      while (1)
      {
        Test(0.0, 0);
        sleep(5);
      }
    }
    
    t.apc
    
    #include <stdio.h>
    
    #define CONFIG_FILE "Test.cfg"
    
    probe thread
    {
      probe "Test"
      {
        on_entry
        {
          FILE *fd = fopen(CONFIG_FILE, "r");
    
          if (fd != NULL)
          {
             // We have a file with new values
            float Parm1;
            int   Parm2;
            fscanf(fd, "Test(%f,%d)", &Parm1, &Parm2);
    
            // Now update the target parameters with new values
            $parm1 = Parm1;
            $parm2 = Parm2;
            fclose(fd);
            remove(CONFIG_FILE);
          }
        }
      }
    }
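
    The probe above assumes that Test.cfg is well formed. A slightly more defensive reader, shown here as a plain C sketch, only hands back new values when both fields were actually parsed. The function name read_test_config is invented for this example; the file format is the same "Test(%f,%d)" used above.

```c
#include <assert.h>
#include <stdio.h>

/* Read "Test(<float>,<int>)" from the file named by path.
 * Returns 1 and fills *parm1 and *parm2 only when both values parse;
 * returns 0 and leaves them untouched if the file is missing or
 * malformed. */
int read_test_config(const char *path, float *parm1, int *parm2)
{
    FILE *fp;
    float f;
    int   n;
    int   matched;

    assert(path != NULL && parm1 != NULL && parm2 != NULL);
    fp = fopen(path, "r");
    if (fp == NULL)
        return 0;               /* no file: leave parameters alone */

    matched = fscanf(fp, "Test(%f,%d)", &f, &n);
    fclose(fp);
    if (matched != 2)
        return 0;               /* malformed: leave parameters alone */

    *parm1 = f;
    *parm2 = n;
    return 1;
}
```

    Inside a probe, the same check would amount to assigning $parm1 and $parm2 only when fscanf() reports two converted items.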
    

    17.40 I'm trying to stub a function called by my program, but APC can't seem to find it.

    The Ada code looks like:

        function Plock (N : in Types.Integer_T) return Types.Integer_T;
        pragma Import (C, Plock, "plock");

    Plock is a system call that locks or unlocks the process, text, or data in memory. I get a warning message from apc stating: Function "....plock[1]" not found in the modules(s) provided to apc. And also an error message from apc stating: Could not resolve function name: "......plock[1]"

    plock() is a system function - it is not defined within your application. The following will work:

    probe thread
    {
      probe "plock()" in "libc.so"
      {
        on_entry ap_StubRoutine;
        on_exit $0 = 0;   // Or whatever return you want
      }
    }
    

    17.41 Using Solaris GNAT, I want to send a signal to the program to control my probes. But the signal seems to get lost. Why?

    It turns out that GNAT blocks this signal (SIGUSR1 in the example below) with a call to thr_sigsetmask. The following probe can be added to your existing probes to unmask the signal. It works by intercepting the thr_sigsetmask function and, if the caller is requesting to add or set signals, removing SIGUSR1 from the mask they provide.

    #include <signal.h>
    #include <thread.h>
    probe thread
    {
      probe "thr_sigsetmask()" in "libthread.so"
      {
        on_entry
        {
          sigset_t *NewSigset;
    
          // Do we have new signals or is this just a request for info?
          NewSigset = (sigset_t *) $2;
          if (NewSigset)
          {
            // What sort?
            if ($1 == SIG_BLOCK || $1 == SIG_SETMASK)
            {
              // Remove SIGUSR1
              sigdelset (NewSigset, SIGUSR1);
            }
          }
        }
      }
    }
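
    The mask-editing logic in this probe uses only standard POSIX signal-set calls, so it can be checked on its own. The sketch below mirrors the decision made in the on_entry part; keep_sigusr1_unblocked is a name invented for this example, and it operates on an ordinary sigset_t rather than on a target expression.

```c
#define _POSIX_C_SOURCE 200809L  /* for sigset_t, SIG_BLOCK, etc. */
#include <assert.h>
#include <signal.h>
#include <stddef.h>

/* If the caller is blocking signals or replacing the mask, remove
 * SIGUSR1 from the set so that it stays deliverable.
 * Returns 1 if the set was modified, 0 otherwise. */
int keep_sigusr1_unblocked(int how, sigset_t *new_set)
{
    assert(how == SIG_BLOCK || how == SIG_UNBLOCK || how == SIG_SETMASK);

    if (new_set == NULL)
        return 0;               /* query only: nothing to change */
    if (how != SIG_BLOCK && how != SIG_SETMASK)
        return 0;               /* SIG_UNBLOCK: leave as requested */
    if (sigismember(new_set, SIGUSR1) != 1)
        return 0;               /* SIGUSR1 was not in the set */

    sigdelset(new_set, SIGUSR1);
    return 1;
}
```

    Note that, like the probe, this edits the caller's signal set in place, so the calling code will see the modified set after the call returns.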
    

    17.42 I only want to probe malloc() if it's called by realloc(). How would I do that?

    Here's one way, which also illustrates some other useful idioms.

    #define MyCallerFunctionId              \
       ap_SymbolToFunction(                 \
          ap_AddressToSymbol(               \
             ap_LocationAddress(            \
                ap_CallerLocation(          \
                   ap_CurrentLocation))))  
    
    #define NamedFunctionId(SYMBOL,MODULE)  \
        ap_SymbolToFunction (               \
          ap_SymbolNameToId(                \
            ap_ModuleNameToId (MODULE),     \
            SYMBOL,                         \
            ap_NoName,                      \
            ap_FunctionSymbol))
    
    probe program
    {
      int MallocCalls = 0;
      int ReallocCalls = 0;
      
      ap_FunctionIdT ReallocFunctionId = NamedFunctionId("realloc()", "libc.so");
      
      probe thread
      {
        int NestingLevel = 0;
      
        probe "malloc()" in "libc.so"
        {
          #pragma nofloat
          on_entry
          {
            ap_FunctionIdT CallerFunctionId = MyCallerFunctionId;
      
            if (! ap_FunctionIdsEqual(CallerFunctionId, ReallocFunctionId))
            {     
               MallocCalls++;
            }
          }
        }
    
        probe "realloc()" in "libc.so"
        {
          #pragma nofloat
          on_entry
            ReallocCalls++;
        }
      }
    
      on_exit // from program:
      {
        log("Heap statistics on program exit");
        log("-------------------------------");
        log("Number of calls to \"malloc()\"  => ", MallocCalls);
        log("Number of calls to \"realloc()\" => ", ReallocCalls);
      }
    }
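
    A different way to make the same distinction is a nesting counter rather than a caller lookup. The plain C sketch below is not APC and is not taken from Aprobe; all names are hypothetical, and the three functions stand in for on_entry and on_exit actions. In a real probe the counter would be per-thread data.

```c
/* Hypothetical stand-ins for probe actions; in a real probe the
 * depth counter would be declared per thread. */
static int realloc_depth = 0;           /* > 0 while inside realloc() */
static int mallocs_inside_realloc = 0;
static int mallocs_elsewhere = 0;

/* Would run on_entry to realloc(). */
static void on_realloc_entry(void) { realloc_depth++; }

/* Would run on_exit from realloc(). */
static void on_realloc_exit(void)  { realloc_depth--; }

/* Would run on_entry to malloc(). */
static void on_malloc_entry(void)
{
    if (realloc_depth > 0)
        mallocs_inside_realloc++;
    else
        mallocs_elsewhere++;
}
```

    Unlike the caller comparison, this counts every malloc() call made anywhere beneath realloc(), not only direct calls.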
    

    17.43 I have a GNAT Ada procedure that I'm stubbing out, but want to return a string value. The procedure has a declaration similar to the one below. What's the APC?

       procedure Read_Foo (File : in  File_Type;
                           Item : out String;
                           Size : out Integer);

    For routines like this, although the Item is an out parameter, GNAT implements it as if it were an in parameter (but modifiable) since the bounds of the string must already be set. The following probe shows an example of changing this:

    static const char *NewString = "Aprobe string";
    
    probe thread
    {
      probe "read_package.read_foo"
      {
        on_entry
        {
          sprintf ((char *) $item.P_ARRAY, NewString);
          ap_StubRoutine;
        }
        on_exit
        {
          $return.size = strlen (NewString);
        }
      }
    }
    

    17.44 Is there a simple probe that just traces the lines in one routine?

    The following gives output similar to:

       MyPackage.MyRoutine line: 120
       MyPackage.MyRoutine line: 122

    when formatted:

    probe thread
    {
      // Substitute the name of your routine here
      probe "MyPackage.MyRoutine"
      {
        on_line (all)
        {
          log ("MyPackage.MyRoutine line: ",
          ap_StringValue (ap_LineIdToNumber (ap_CurrentLineId)));
        }
      }
    }
    

    17.45 How do I reference enumeration literals in APC?

    Here is an example:

    a.cpp
    
    #include <iostream.h>
    #define VALUE satu
    
    enum TYPE { sund, mond, tues, wedn, thur, frid, satu };
    
    int main (void)
    {
      TYPE bar = satu;
      cout << "Hello World\n";
    }
    
    a.apc
    
    probe thread
    {
      probe "main"
      {
        on_line (11)
        {
          if ($bar == $satu)
          {
            log ("Match");
          }
          else
          {
            log ("No Match");
          }
        }
      }
    }
    

    If the enumeration literals are defined in a class, you can qualify them. So for:

    class a
    {
      enum TYPE { sund, mond, tues, wedn, thur, frid, satu };
    
      private:
        TYPE bar;
    
      public:
        void seta(){ bar = VALUE; }
    };
    
    a test;
    

    You could use

       if ($test.bar == ($("a::satu")))
    

    17.46 Why does including <math.h> in my APC keep it from compiling? (I want to call the "pow()" function in my probe.)

    The problem here is that "log" is an Aprobe directive and it is also defined as a function in the mathematical library. So, you need a small workaround to use any function other than 'log' from the mathematical library. Here is an example:

    #undef log               /* 1. undefine definition in aprobe.h */
    #include <math.h>        /* 2. process math.h */
    #undef log               /* 3. remove math.h's log define (AIX) */
    #define log aPl          /* 4. restore aprobe's definition */
    
    probe thread
    {
      probe "main"
      {
        on_exit
        {
          log("pow(2,3) = ", pow(2,3));
        }
      }
    }
    

    The workaround is to add the preprocessor lines numbered 1 through 4 above.

    If you need to use the math.h log function in an APC file, omit steps 3 and 4 above, and use 'aPl' instead of Aprobe's log operation everywhere thereafter. That is:

    #undef log               /* 1. undefine definition in aprobe.h */
    #include <math.h>        /* 2. process math.h */
    
    probe thread
    {
      probe "main"
      {
        on_exit
        {
          aPl("log(2.0) = ", log(2.0));
        }
      }
    }
    

    In either case, when compiling your APC file on Unix, you must pass the linker flag "-lm" as follows:

    apc xxx.apc -linker -lm

    because linking against any routine from the libm.a library requires the -lm flag.

    You can see the macros for the keywords that Aprobe uses (e.g., #define log aPl) at the top of aprobe.h, preceded by #ifdef APROBE_KEYWORDS, which is only defined when the file is being processed by the APC compiler.

    17.47 Windows: How do I probe a function in a dynamically-loaded DLL?

    Same as you would a pre-loaded DLL:

    Write your probe like `probe "func" in "dynamic" ...`

    Compile it against the DLL like `apc myprobe.apc -x dynamic.dll`

    When you run your program with myprobe.dll, and dynamic.dll isn't found at program startup, the probes are "deferred" until dynamic.dll is loaded, at which time the probes are applied.

    17.48 How do I query an environment variable from within a probe?

    Call getenv() , as in the following example:

    #include <stdlib.h> /* defines getenv() */
    #include <string.h> /* defines strcmp() */
    ap_NameT LOG_LEVEL = NULL;
    
    static ap_BooleanT IsSevereLogLevel() 
    {
      return LOG_LEVEL && (strcmp(LOG_LEVEL, "severe") == 0);
    }
    
    probe program {
      on_entry
        LOG_LEVEL=getenv("LOG_LEVEL");  /* can set LOG_LEVEL to NULL */
    
      probe thread 
      {
        probe "main()" 
        {
          on_entry
            if (IsSevereLogLevel()) printf("Severe\n");
        }
      }
    }
    
    

    17.49 The above looks like a useful utility. How can I structure my probes so it can be shared?

    Here's one way, if your "utility" is pure C and doesn't use aprobe stuff.

    1. Write "loglevel.h" and "loglevel.c" in the obvious way, e.g.
    loglevel.h
    
    extern void InitializeLogLevel(void);
    extern ap_BooleanT IsSevereLogLevel(void);
     
    loglevel.c
    
    #include <stdlib.h>                /* defines getenv() */
    #include <string.h>                /* defines strcmp() */
    #include <aprobe.h>                /* defines ap_NameT */
    
    static ap_NameT LOG_LEVEL = NULL;
    
    void InitializeLogLevel(void)
    {
        LOG_LEVEL = getenv("LOG_LEVEL");  /* can set LOG_LEVEL to NULL */
    }
    
    ap_BooleanT IsSevereLogLevel(void)
    {
      return LOG_LEVEL && (strcmp(LOG_LEVEL, "severe") == 0);
    }
    
    2. Compile loglevel.c into loglevel.o. If you #include aprobe.h, just put $APROBE/include in your include path:
    cc -c -I$APROBE/include loglevel.c
    
    3. Write the probe:
    t.apc
    
    #include "loglevel.h"
    probe program
    {
      on_entry InitializeLogLevel();
    
      probe thread 
      {
        probe "main()" 
        {
          on_entry
            if (IsSevereLogLevel()) printf("Severe\n");
        }
      }
    }
    
    4. Compile the probe, referencing loglevel.o:
    apc -g t.apc loglevel.o
    

    17.50 Can I define functions in one APC file and call them from another APC file?

    Yes. See also Q17.24 and Q17.25 . This is how our predefined probes are structured. The difference is that you must provide both UALs on the aprobe command-line. One could restructure the above example like so:

    1. Define the header file:
     
    loglevel.h
    
    extern ap_BooleanT IsSevereLogLevel(void);
    
    2. Write the probe:
     
    loglevel.apc
    
    #include <stdlib.h>                   /* defines getenv() */
    #include <string.h>                   /* defines strcmp() */
    
    static ap_NameT LOG_LEVEL = NULL; 
    
    // the externally callable function:
    ap_BooleanT IsSevereLogLevel(void) 
    {
      return LOG_LEVEL && (strcmp(LOG_LEVEL, "severe") == 0);
    }
    
    // initialization of data accessed by the above function:
    probe program
    {
      on_entry
        LOG_LEVEL = getenv("LOG_LEVEL");  /* can set LOG_LEVEL to NULL */
    }
    
    3. Compile the probe into loglevel.ual (loglevel.dll on Windows), exporting IsSevereLogLevel:
    apc -g loglevel.apc -e IsSevereLogLevel
    
    4. Write the "client" probe:
    t.apc
    
    #include "loglevel.h"
    probe thread 
    {
      probe "main()" 
      {
        on_entry
          if (IsSevereLogLevel()) printf("Severe\n");
      }
    }
    
    5. Compile the client probe, referencing loglevel.ual (loglevel.lib on Windows):
    apc -g t.apc loglevel.ual
    (Windows: apc -g t.apc loglevel.lib)
    
    6. When you run an application, you need both t and loglevel:
    aprobe -u t -u loglevel my_program
    

    17.51 I am trying to write a probe that will call an Ada routine in a package body, but the routine never seems to get called. Why?

    Presumably because the probe on that function is not triggered. That's because we disable probes whilst in an entry action. This is pretty easy to understand given an example. Suppose you have the following probe:

    probe thread
    {
       probe "printf()" in "libc.so"
       {
          on_entry printf ("We're in printf\n");
       }
    }
    

    Obviously if Aprobe didn't do anything specific, you would end up in an infinite loop: Your code would call printf() which would call the entry action for printf which would call printf which would call the entry action ... So what we do is disable the probes while you're in an action. That way the call to printf() from your probe wouldn't trigger the probe on printf itself.

    In your example you are calling a routine while probes are disabled so the probe on that routine doesn't get triggered. Of course you can manually turn probes on yourself (although it is then your responsibility that you won't allow an infinite loop). The description of this in aprobe.h was improved in version 3.1.7, to the following:

    These two routines

         extern void ap_IncrementDisableProbesCount (ap_ThreadContextPtrT);
         extern void ap_DecrementDisableProbesCount (ap_ThreadContextPtrT);
    

    can be used to turn off / on probes for the thread. Normally when a probe is hit, Aprobe disables further probes in the thread for the duration of the action. This is to prevent recursive loops (for instance imagine if a probe on "printf()" called "printf()" and we did nothing about it). Sometimes you may want to temporarily enable probes. For instance, suppose on_entry to routine A you make a call to another routine in your application (say B) which calls routine C. You have a probe on C which you want to happen. You could bracket the call as follows:

    
    on_entry
          {                                                       
             // Turn on probes before the call                    
             ap_DecrementDisableProbesCount (ap_ThreadContextPtr);
             // Make the call                                     
             $B (1, 2, 3);                                        
             // Turn probes back off                              
             ap_IncrementDisableProbesCount (ap_ThreadContextPtr);
          }
    

    So, your probe becomes:

    probe thread
    {
        probe "test.adb":"test.x[1]"
        {
          on_entry
            ...
          on_exit
            ...
            ap_DecrementDisableProbesCount (ap_ThreadContextPtr);
            $("test.y[1]");
            ap_IncrementDisableProbesCount (ap_ThreadContextPtr);  
        } 
    }
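
    The disable-count mechanism described above can be modelled in plain C. This is only a single-threaded sketch of the idea, not Aprobe's implementation: every name is hypothetical, routine_a and routine_c stand in for probed application routines, and the `if (disable_count == 0)` test stands in for Aprobe deciding whether to run a probe's action.

```c
static int disable_count = 0;   /* > 0 means probe actions are suppressed */
static int hits_a = 0;          /* times the action on routine_a ran */
static int hits_c = 0;          /* times the action on routine_c ran */
static int reenable_in_a = 0;   /* when set, action_a brackets its call */

static void action_c(void) { hits_c++; }

/* Probed routine C: its action runs only while probes are enabled. */
static void routine_c(void)
{
    if (disable_count == 0) {
        disable_count++;        /* probes are off for the duration */
        action_c();
        disable_count--;
    }
}

/* Unprobed routine B simply calls C. */
static void routine_b(void) { routine_c(); }

/* Action on A: calls B, optionally re-enabling probes around the call. */
static void action_a(void)
{
    hits_a++;
    if (reenable_in_a) disable_count--;   /* turn probes on */
    routine_b();
    if (reenable_in_a) disable_count++;   /* turn probes back off */
}

/* Probed routine A. */
static void routine_a(void)
{
    if (disable_count == 0) {
        disable_count++;
        action_a();
        disable_count--;
    }
}
```

    With reenable_in_a left at 0, calling routine_a() runs the action on A but never the one on C. Setting it to 1 lets the action on C run, which is the effect of bracketing the call with the decrement and increment routines.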
    

    17.52 How can I log a string passed to a library function like strdup() where there's no debug information?

    In the absence of debug information all parameters would be assumed to be of type 'int' and only positional ($1, $2, etc.) references will be allowed.

    If you know the type of such a parameter you could cast it to the right type. The strdup() function doesn't have debug information, but you could still compile and use the following apc file:

    probe thread
    {
       probe extern:"strdup()" in "libc.so"
       {
       on_entry
          log("strdup(", ap_StringValue($1), ")");
       }
    }
    

    Note that ap_StringValue is a macro which among other things casts the argument to a string.

    For a complete list of subprograms that you can probe in shared libraries do:

    aprobe -u info -p -sa <your_executable_here>

    It is best not to mix apc code that relies on debug information with the apc code that should compile without it. This way when you compile the apc code that doesn't require debug info you may omit the -x option altogether and you would not have any warnings from the apc compiler.

    17.53 Can I use Aprobe to change the command run by a call to system() from my application to run my own little script instead?

    Yes: replace the parameter to system() with a path to your script. In this example, the new path fits in the space occupied by the old. Imagine the possibilities...

    my_ls.apc
    
    // change these 2 lines to work on a different command:
    static char cmd_to_change[] = "/bin/ls";
    static char my_script[]     = "/tmp/my_ls ";
    
    probe thread
    {
      probe "system()" in "libc.so" // or libc.a(shr.o) for AIX
      {
        ap_NameT new_command = NULL;
    
        on_entry
        {
          char *command = (char *)$1;
    
          // for debugging, give some info about where we are:
          log("system() called with ", ap_StringValue($1));
          ap_LogTraceback(99);
          
          // make sure we only replace the right command
          {
            char *cmdpos = strstr(command, cmd_to_change);
            if (cmdpos == command)
            { // replace it
              char *argstring = command + strlen(cmd_to_change);
              new_command = ap_CatenateStrings(my_script, argstring, NULL);
              $1 = (int)new_command;
              log("*** changed to: ", ap_StringValue($1));
            }
          }
        }
        
        on_exit
        {
          // indicate the return code for the command:
          log("system() returns ", $0);
          // free our string:
          ap_StrFree(new_command);
        }
      }
    }
    my_ls script
    
    echo "MY_LS: --->"
    ls -ltF
    echo "<---- MY_LS"
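
    The string handling inside the probe can be tried out on its own in plain C. The sketch below is not APC: replace_command_prefix is a hypothetical helper that uses malloc() where the probe uses ap_CatenateStrings, and it makes the same "command must start with the old path" check.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: if `command` starts with `old_cmd`, return a newly
 * allocated string in which that prefix is replaced by `new_cmd`.
 * Returns NULL if there is no match or if allocation fails.
 * The caller must free() the result. */
static char *replace_command_prefix(const char *command,
                                    const char *old_cmd,
                                    const char *new_cmd)
{
    size_t old_len = strlen(old_cmd);
    size_t new_len = strlen(new_cmd);
    const char *args;
    char *result;

    /* Only rewrite when the old command is at the very start. */
    if (strncmp(command, old_cmd, old_len) != 0)
        return NULL;

    args = command + old_len;               /* everything after the command */
    result = malloc(new_len + strlen(args) + 1);
    if (result == NULL)
        return NULL;

    memcpy(result, new_cmd, new_len);
    strcpy(result + new_len, args);         /* copies the terminating NUL too */
    return result;
}
```

    A prefix test like this also matches longer command names that begin with the same characters, so a stricter version might require the prefix to be followed by whitespace or the end of the string.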
    

    17.54 Is there a way to catch and suppress exceptions?

    Aprobe supports suppressing C++ exceptions on AIX, but not on Solaris or Windows. On AIX the syntax is:

    probe "fred"
    {
      on_exit
          if (ap_ProbeActionReason ==
            ap_CppExceptionPropagated)
              ap_SuppressException;
    }
    

    You can catch exceptions in the on_exit section of your probes. To catch exceptions all you have to do is to distinguish between a normal exit from your subprogram and an exception exit from it as both would trigger your probe's on_exit actions. For example, if subprogram "fred()" may leave via exception you could test for this as follows:

      probe thread
      {
        probe "fred()"
        {
          on_exit
            switch(ap_ProbeActionReason)
            {
              case ap_AdaExceptionPropagated:
              case ap_CppExceptionPropagated:
                log("Exception exit from fred()\n");
            }
        }
      }
    

    If you need to, you can find other action reasons defined in aprobe.h.

    The example above works well when you know where the exception may be raised. When you don't know, you can log all exceptions raised in your program. To do so use the following probe:

    probe thread
    {
       ap_LogExceptionsInThread;
    }
    

    There are also other macros for this: ap_PrintExceptionsInThread and ap_PrintAndLogExceptionsInThread. These are all defined in aprobe.h.

    17.55 I'd like to probe routines in the Windows sockets DLLs. Any issues I should be aware of?

    Here's what we've found:

    1. Probe WS2_32.dll as that's where all the other DLLs forward socket requests. There are a couple of MS specific routines in mswsock.dll too.
    2. Generally WS2_32.dll is loaded dynamically so it will be hard to see what sorts of functions it has available. You can use the following command line to get the symbol names from ws2_32.dll:
       aprobe -u info.dll -p -sa -dll ws2_32.dll foo.exe > out.txt
       where foo.exe is some executable in your local directory. The -dll <filename> switch causes the DLL to be loaded at the start of a program and thus its symbols are available for info.dll to print.
    3. Bad things will happen if you probe the routine "WEP" in ws2_32.dll.

    17.56 Can I track stack usage with Aprobe?

    A probe to track stack usage is available here for Windows and AIX. The AIX version should be easy to extend to Linux and Solaris.

    17.57 Is there a way to access local variables that doesn't depend on a hard-coded line number?

    Yes. Function-relative line numbers are supported using an expression consisting of a constant offset from the special values 'first' and 'last'. For example:

       probe "Outer"
       {
          // Assume that 30 is the relative line number for the next line
          // after the call to Inner
          on_line (first + 30)
          {
             $i = 99;
          }
       }
    

    To be sure you're using the right value, you'll have to know the probe-able lines in your function (see Q17.67). The offset is then the difference between the first probe-able line and the probe-able line you want (e.g., if the first line is 12, and you want line 22, then probe on_line (first + 10)).

    Now if the file changes your probe will still work unless you modify Outer (which is obviously less of a concern since that's the one you're working with anyway).

    17.58 Can I use Aprobe to query a caller's local data that wouldn't be visible by normal visibility rules?

    What you might want to do is hold the address of the variable and then change that.

       probe thread
       {
          int *i;
    
          probe "Outer"
          {
             on_line (first)
             {
                // Store the address of i
                i = &$i;
             }
          }
    
          probe "Inner"
          {
             on_entry
             {
                // Change the value of i
                *i = 100;
             }
          }
       }
    

    Obviously this is harder for types that aren't straight integers, etc. The typeof expression can be useful here:

       probe thread
       {
          typeof ($("myrecordt", "-file types.ads")) *RecordPtr;
       }
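
    The pointer-holding idea is ordinary C underneath. The sketch below is plain C rather than APC, with hypothetical names: saved_i plays the role of the `int *i` declared in the thread probe, outer() does what the on_line (first) action does, and inner() does what the on_entry action does.

```c
#include <stddef.h>

/* Plays the role of `int *i` in the thread-level probe data. */
static int *saved_i = NULL;

/* Corresponds to the action on entry to "Inner": write through the pointer. */
static void inner(void)
{
    if (saved_i != NULL)
        *saved_i = 100;
}

/* Corresponds to "Outer": publish the address of a local, then call inner(). */
static int outer(void)
{
    int i = 1;

    saved_i = &i;      /* store the address of the local variable */
    inner();           /* inner() changes i even though it cannot name it */
    saved_i = NULL;    /* the address is only valid while outer() is active */
    return i;
}
```

    The stored address is meaningful only while the frame that owns the variable is still live, which is why the sketch clears it before returning.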

    17.59 In APC I can reference some class members as fields of class objects, but others I cannot. Why?

    Here are some general limitations and workarounds for accessing class data and methods:

    1. Class static data is not part of the object; it is a global and is referenced using a qualified name, like
              $("Screen::nNumScreens")
       If you're unsure of the full name of a static data item you can use:
              apinfo -d myprog.exe
    2. A class object is always called $this within a method. However, static class methods do not have a $this argument.
    3. To see what's really in the class object, use "log(*$this);" on_entry to a method.
    4. If you're unsure of the full method name in class "Class", you can use
              apcgen -L <dll-or-exe> | grep "Class::"
       or
              apinfo -sa myprog.exe | grep "Class::"

    Here's a simple example:

    ////////////////////////////////////////////////////////////
    // TestStatic.apc
    ////////////////////////////////////////////////////////////
    probe program
    {
      on_entry
        printf ("  p. Static1.exe execution has started\n");
      on_exit
        printf ( "  p. Static1.exe execution has completed\n");
    
    }
    probe thread
    {
      probe "Screen::Screen"
      {
        on_entry
          printf ("  p. New screen has been constructed!\n");
      }
      probe "Screen::~Screen"
      {
        on_entry
          printf ("  p. A SCREEN HAS BEEN DESTRUCTED!\n");
      }
      probe "Screen::Update(void)"
      {
        on_entry
        {
          printf ("  p. A screen update has started!\n");
          printf ("  p. Within Update, Current nNumScreens =%d\n",$("Screen::nNumScreens"));
        }
      }
      probe "Screen::GetNumScreens(void)"
      {
        on_entry
        {
          printf ("  p. GetNumScreens has started!\n");
          printf ("  p. Current nNumScreens = %d\n",$("Screen::nNumScreens"));
        }
      }
      probe "main()"
      {
        on_entry
          printf ("  p. Main() has been started!\n");
      }
    }
    

    17.60 How can I enable and disable probes externally while my program runs?

    You can do this by periodically checking for the existence of a file. If you find the file, enable the probe. You can automatically delete it from your probe if you want a single-action check, or delete it yourself when you want to disable the action again. For example (this version toggles the probe each time the file is found, then deletes the file):

    static ap_BooleanT MemsetProbeEnabled = FALSE;
    
    probe thread
    {
       probe extern:"memset()"
       {
       // We are not using floating point registers.
       // Use nofloat pragma to avoid saving them and
       // speed things up a little.
       #pragma nofloat
       on_entry
          if (MemsetProbeEnabled)
          {
             // Log parameters, traceback, etc.
          }
       }
    }
    
    #define CONFIG_FILE "/tmp/memset.cfg"
    
    static void PeriodicAction(void *EP)
    {
       FILE *fd = fopen(CONFIG_FILE, "r");
    
       if (fd != NULL)
       {
          // Toggle the value of MemsetProbeEnabled
          MemsetProbeEnabled = !MemsetProbeEnabled;
    
          fclose(fd);
          remove(CONFIG_FILE);
       }
    }
    
    probe program
    {
    on_entry
       ap_DoPeriodically(
          PeriodicAction,
          15, // interval in seconds
          NULL);
    }
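
    The file check performed by PeriodicAction can be exercised separately from Aprobe. The following plain C sketch uses a hypothetical helper, poll_toggle_file, that takes the path and the flag as parameters so the same logic can serve more than one probe.

```c
#include <stdio.h>

/* Hypothetical helper: if the file at `path` exists, delete it and flip
 * *enabled. Returns 1 if a toggle request was found, 0 otherwise. */
static int poll_toggle_file(const char *path, int *enabled)
{
    FILE *fp = fopen(path, "r");

    if (fp == NULL)
        return 0;               /* no request pending */

    fclose(fp);
    remove(path);               /* consume the request so it acts only once */
    *enabled = !*enabled;
    return 1;
}
```

    Creating the file (for example with `touch`) then acts as a one-shot request: the next poll flips the flag and removes the file, and later polls do nothing until the file is created again.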
    

    17.61 AIX: How do I convert my pre-version-3 APC file to the current one?

    Aprobe version 2, which was delivered with OC Systems' PowerAda and OATS products as well as being sold separately for C and C++, was fundamentally different in its processing and expression of APC.

    The best way isn't to "convert" at all, but to understand what the probes in your old APC file are trying to do, read the current documentation about Aprobe, and then write a probe to do the same thing in Aprobe version 4. This answer will just enumerate a few of the key differences, and rely on you to look in the user's guide for details:

    Version 2 Aprobe was available only for the AIX platform, and used low-level AIX register and symbol names. Aprobe versions 3 and newer support multiple platforms.

    In Version 2, "aprobe" actually compiled each APC file at run-time. In Version 4, you use the new `apc' program to compile the APC file(s) into a linkable UAL file, and name the UAL files on the aprobe command line.

    Version 4 APC is C with a few extra keywords. Version 2 was an invented language based on Ada syntax. So, for example, instead of case $r3 is ... you'd write switch($$r3) { ...

    In Version 4 there's an underscore to make "on entry" one word: "on_entry", "on_exit", "on_line".

    In Version 2 you could write

    probe .sym1, .sym2, on entry ....

    In Version 4 each probe can name only one symbol, but there is the new concept of a "probe type" or "typedef probe" which may be defined and then applied to many symbols. So you'd do

    typedef probe { on_entry ... } CoolProbeT; 
    CoolProbeT  Sym1Probe("sym1");
    CoolProbeT Sym2Probe("sym2");

    In Version 2 APC there were only registers ($r3). In Version 4 you can reference parameters by position ($1, $2, etc.) and the return value on_exit as $0, not to mention accessing program variables by their source names.

    Because version 2 APC was so low-level, there was another tool "apgen" which read an "apg" file that supported a few operations on source-level variables and generated APC to access them. In Version 4, you can reference a source-level name anywhere, provided that name is available in the debug information of the executable provided to the apc compiler.

    In Version 2 a `format' was required for each log statement, and was a special syntax that could be named or unnamed in-line. In version 4 a format routine is just a C routine which can be automatically generated based on the types of the `log' arguments.

    Here is a reply given to a customer who asked this question:

    > It is my understanding that the new aprobe is more "C like" than "ADA like".
    >Beyond that, I could use a little help.

    That's true - it is basically ANSI C with some extra keywords. I take it you have gone through the examples in $APROBE/examples/evaluate to get yourself acquainted with the syntax? If not do that first and then come back to your larger problem.

    > I wasn't sure if the [Aprobe v2] words format and bytes were aprobe terms.

    Yes they are. In v2 you had `format start' and `format finish'. These have been made consistent with all other probes in v4, so you would use:

    probe format
    {
      on_entry
      {
        // Put the equivalent actions to the format start here
      }
    
      on_exit
      {
        // Put the equivalent actions to the format finish here
      }
    }
    

    The bytes operator was a v2 thing. In v4 you would express the code in terms of C so you would probably use char [] :

    
      on_entry
       {
        char CmdText [200];
      }
    

    > I wasn't sure about the $function.

    This is where v4 is much better than v2. Since you are writing your probes in C, you can just include the header files and call the functions directly. For instance, you wish to call the `creat' function. All you need to do is:

    // Include the header files
    #include <sys/types.h>
    #include <sys/stat.h>
    #include <fcntl.h>
    
    probe format
    {
      on_entry
       {
        int fd;
        fd = creat ("Filename", 0644);
      }
    }
    

    and the same for access, system, printf, sprintf, write, etc. You'll find your probes look much better!

    > The probe part I did I am pretty sure is wrong.

    `probe' is quite different and you have to account for the different names used by GNAT and the registers used on SPARC [compared to PowerAda and AIX]. Here's an example which will be close:

    probe thread {
       // I'm guessing on the name here: If you have trouble finding the
       // routine, run `aprobe -u info.ual -p -s <exe name> > syms' and all of
       // the routines will be placed in the syms file.
       probe "Queuing_Services.Read_From_Q[2]"
       {
          // Store the parameters on entry since the registers aren't
          // available on_exit
          int   SrNum = $2;            // Second parameter was $r4 on AIX
          int   Length = $3;           // Third parameter was $r5 on AIX
          char *Data = (char *) $4;    // Fourth parameter was $r6 on AIX
    
          on_exit
          {
             // Log the data
             log (SrNum, Length, Data [0 .. Length - 1]) with DitFormat;
          }
       }
    }
    

    Your format routine should be defined above this; in v4 they are regular C routines but the important thing is that they take pointers to the data, so:

    void DitFormat (int *SrNum, int *Length, char *Data)
    {
       // Do your processing here
    }
          

    A couple of comments on the new file: C++ style comments (//) are recommended because they are less error prone, unless you wish to keep code common with some existing C code.

    Make sure your format routine only has pointers for its parameters.

    Hope this helps - like I said, make sure you understand how to write simple probes, logs and logs with formats and then you should be fine to tackle this exercise.

    17.62 (Unix) Is there a probe to see when my application "exec's" another program?

    Here is the source for a probe that should do the trick. It will record calls to all of the exec routines, including the file, calling user/group IDs, file user, group and mode information and the environment. It was written for Solaris but should work on other Unixes.

    To compile, just save to your local disk and do apc faq_exec.apc.

    To use this probe you will need to have a new or existing workspace for the process you want to watch. Then either,

    • Copy the exec.ual file into the workspace directory, or
    • Use the Add Ual option from the Setup menu in the main RC window. In the "Ual file" field type or browse to the exec.ual file. Uncheck the copy UAL checkbox and (optionally) give it a description like "Record exec calls" and click OK. The UAL will be listed in the list on the left hand side of the window; check the checkbox for it and click Build.

    Although the first option is simpler, using Add Ual will make it easier to turn on or off later.

    Now rerun, format and look for the exec calls. If necessary, the probe can be expanded to record parameters to help identify each call.

    17.63 How can I cast an enumeration value to print its numeric value?

    Yeah, the "obvious" direct cast doesn't work. The trick is to get that byte into something you can safely cast. The reliable way to do it is as shown below.

    
    probe thread {
      probe "qts.write_to_q" {
        on_exit
        {
           /* this doesn't work: log("rc=", $rc, "=", (int)($rc)) */
           char rc_val = *(char *)&$rc;
           log("rc=", $rc, "=",  (int)rc_val);
        }
      }
    }
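
    The same byte-level read can be written in plain C. This sketch assumes, as in the probe above, that the enumeration value occupies a single byte; first_byte_value is a hypothetical helper, and it uses unsigned char (rather than the char used in the probe) so that values above 127 do not come out negative.

```c
#include <string.h>

/* Hypothetical helper: read the first byte of an object and widen it to int.
 * Intended for values that are stored in exactly one byte. */
static int first_byte_value(const void *object)
{
    unsigned char byte;

    memcpy(&byte, object, sizeof byte);   /* copy one byte, no type punning */
    return (int)byte;
}
```

    For a value wider than one byte, the result would depend on the byte order of the machine, so this is only appropriate for single-byte representations.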
    

    17.64 Is there a probe that will print a static call tree of my executable?

    We did a rough one which just lists the calls made by each function in a Solaris/Sparc executable. You can grab this here.

    It works by disassembling the object module and recognizing the call instructions, so it would take some work to port to other platforms. If you're really interested in having it extended or ported, please contact us.

    17.65 How can I detect memory overwrites on dynamically allocated (malloc'd) memory?

    A crash can happen because memory allocated using malloc() or its variants is being corrupted by code that writes past the end (or before the beginning) of the memory that's returned, corrupting malloc's internal pointers or adjacent data.

    The predefined "memcheck" probe detects this by putting a "fence" at the end of allocated memory, and checking the fence is intact when the memory is freed: see Q15.18.
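
    The general "fence" technique can be illustrated in plain C. This is a minimal sketch of the idea only and is not the implementation of the memcheck probe: guarded_malloc, guarded_free, FENCE_SIZE and FENCE_BYTE are all hypothetical names and values chosen for the example.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define FENCE_SIZE 8        /* bytes of fence placed after the user data */
#define FENCE_BYTE 0xA5     /* pattern written into the fence */

/* Header stored in front of the user data; the union keeps the user
 * pointer aligned as strictly as plain malloc() would. */
typedef union {
    size_t      size;
    max_align_t align;
} guard_header;

/* Allocate `size` bytes followed by a fence. Returns NULL on failure. */
static void *guarded_malloc(size_t size)
{
    unsigned char *raw = malloc(sizeof(guard_header) + size + FENCE_SIZE);

    if (raw == NULL)
        return NULL;
    ((guard_header *)raw)->size = size;
    memset(raw + sizeof(guard_header) + size, FENCE_BYTE, FENCE_SIZE);
    return raw + sizeof(guard_header);
}

/* Free memory from guarded_malloc(). Returns 1 if the fence was intact,
 * 0 if something wrote past the end of the block. */
static int guarded_free(void *ptr)
{
    unsigned char *user = ptr;
    unsigned char *raw = user - sizeof(guard_header);
    size_t size = ((guard_header *)raw)->size;
    int intact = 1;
    size_t i;

    for (i = 0; i < FENCE_SIZE; i++)
        if (user[size + i] != FENCE_BYTE)
            intact = 0;

    free(raw);
    return intact;
}
```

    Because the check happens only when the block is freed, this reports that an overwrite occurred but not which code caused it; logging a traceback at allocation time is one way to narrow that down.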

    17.66 (Unix) How do I know when my application has forked?

    You can use ap_AddNewProcessCallback to register a callback that runs when Aprobe detects your new process. Pass it a handler that will be called in the child process. For instance:

    static void MyNewProcessHandler (ap_ThreadContextPtrT ThreadContext)
    {
       log ("Here is my new process");
    }
    

    17.67 How do I know what lines I can probe in a function?

    The most reliable way is to use:
       apcgen -qlines -p function_name -x module_name

    This generates an on_line section for each line in the given function. You can redirect the output to a file and edit the file with your on_line actions.

    For an executable module you can use:
       apinfo -l exe_name
    which lists all the symbols and their lines, if any. This output is simply the "raw" line information, sorted by code offset, so is not as useful for writing probes, though the output may be a good reference for use with test coverage or a debugger.

    17.68 (Windows) How can I track page faults using Aprobe/RootCause?

    In general, explaining why a page fault occurred is difficult. Note that there's a lot more paging information available than just a fault count (e.g. current and peak virtual memory, current and peak real memory, current and peak page file usage, etc.). The following is offered in an attempt to answer this question.

    For process-specific page fault, or general memory usage information, you can use the GetProcessMemoryInfo() function, which is documented in MSDN. Here's the data structure returned:

    typedef struct _PROCESS_MEMORY_COUNTERS {
        DWORD cb;
        DWORD PageFaultCount;
        DWORD PeakWorkingSetSize;
        DWORD WorkingSetSize;
        DWORD QuotaPeakPagedPoolUsage;
        DWORD QuotaPagedPoolUsage;
        DWORD QuotaPeakNonPagedPoolUsage;
        DWORD QuotaNonPagedPoolUsage;
        DWORD PagefileUsage;
        DWORD PeakPagefileUsage;
    } PROCESS_MEMORY_COUNTERS;
    typedef PROCESS_MEMORY_COUNTERS *PPROCESS_MEMORY_COUNTERS;
    

    You can also use the native API ZwQueryInformationProcess() which has the capability of returning more information than GetProcessMemoryInfo() (which uses this native API).

    For System-Wide page fault, or general memory usage information, you can use the native API ZwQuerySystemInformation() to get all sorts of performance data.

    In both of the above cases, you can either create a thread that calls the function periodically to collect sampling data or, better, use the Aprobe ap_DoPeriodically() function, which does this for you.

    You could integrate this (or other system-wide statistics) with tracing in a manner similar to the RootCause "perf_cpu" probe. Contact us if you'd like some help with this.

    Also note that you can use the Windows "PerfMon" feature to generate real-time graphs of these statistics correlated with program points. See Perfmon in the Aprobe User's guide.

    (Note that the "Zw" native APIs mentioned above are not "officially" documented by Microsoft, but they are widely used in both user and device driver development and are safe to use. They are documented in "Windows NT/2000 Native API Reference" by Gary Nebbett, and you can find them in DDK and web documentation as well.)

    17.69 Is there a routine available to find symbol ids by mangled name, or one that will demangle for us?

    You can generally pass a mangled name as the name to ap_NameToSymbolId() and you'll get the correct Symbol ID. However, there is also the following (defined in aprobe.h, of course):

    extern void ap_Demangle(
       ap_DemangledNameT *Result,
       ap_NameT          MangledName,
       ap_BooleanT       IsSubprogram,
       ap_CompilerKindT  CompilerKind);
    

    Here is an example of how to use it:

    {
       ap_DemangledNameT DemangledName;
    
       ap_Demangle(
          &DemangledName,
          ".sec_fdk_Nam_Svc_Def__ELAB",
          TRUE,
          ap_AIXpa4_CompilerKind);
    
    
       // Now we can use DemangledName.FullName
       SymbolId =
          ap_SymbolNameToId(
             ap_ApplicationModuleId(),
             DemangledName.FullName,
             ap_ExternSymbol,
             ap_FunctionSymbol);
    }
    

    17.70 Is there a way to suppress (or force) the warning when probing a symbol that is undefined?

    Yes, this was introduced in RootCause 2.1.3/Aprobe 4.3.3 (February 2004). The way to do it is to specify #pragma optional in column 1 immediately inside the probe (or typedef probe), for example:

    
    probe thread {
       probe extern:"PrintDebug()" {
    #pragma optional
        ...
       }
    }
    

    Conversely there is also a #pragma required which forces a warning in the case where the module is undefined. By default, a warning is not generated on probes on missing modules. For example:

    
    probe thread {
       probe extern:"open()" in "libpthread.so" {
    #pragma required
       }
    }
    
    would force a warning if libpthread.so was not among the libraries loaded by the application.

    This was possible prior to version 2.1.3 but was harder since it required use of a typedef probe and programmatic checking and instrumentation using the Aprobe API. (See for example AllocationFunctions[] array in memwatch.apc.)

    17.71 Unix: Can I call a C++ method from a probe?

    Yes, if:

    • You know the full method name, and
    • You have a this pointer for that method's class available (or else the method is static).

    In these cases, you call it just like a C function (see Q17.22), except that you pass this as the first parameter. For example, suppose you have a class that looks something like this:

    class Example {
    public:
       void doIt(const string& s);
       void debugIt(const string& s);
    };
    

    And you want to call debugIt() on entry to doIt(). The following works:

    probe thread {
      probe "Example::doIt" {
        on_entry {
          $("Example::debugIt")($this, &($s));
        }
      }
    }

    (Note the & when passing the string parameter: APC automatically dereferences reference parameters, so you need to "restore" the reference.)

    But obviously this is a very simple example. In many real cases you have template instances with long and subtly different names. In such cases, you can use apcgen -vL to list the methods in an individual object file and "grep" for the methods you're looking for and try to match up the line numbers.

    When you have dynamically dispatched calls, you are limited to methods in common base classes, or else you need to use some conditional test to determine which specific method to call.

    Often the best choice is to use a separate extern "C" C++ module as an interface between your probe and the call, as described in Q17.23.

    As always, if you have problems or questions, contact OC Systems support.

    17.72 How do I print/change a C++ std::string object?

    Aprobe 4.4.3 (RootCause 2.2.3) for AIX and Linux introduces a predefined probe cppstring.ual that implements the probe illustrated below. It also provides automatic support in the 'apc' command for linking with C++ on those platforms. The probe's usage is documented in $APROBE/examples/predefined_probes/cppstring/README which also encourages users to implement similar wrapper probes for other C++ library methods as needed. Contact support@ocsystems.com for older versions.

    This example explains how to combine some simple C++ with a probe to avoid having to reverse-engineer the C++. This is basically the same for all platforms, except for linking.


    First: here's some helper C++ to provide operations on the std::string:

    // cppstring_help.cpp - C++ functions supporting cppstring.apc
    
    #include <string>
    extern "C"  {
    
    const char *get_string(void *std_string_ptr)
    {
       std::string *s_ptr = (std::string *)std_string_ptr;
       return s_ptr->c_str();
    }
    
    void set_string(void *std_string_ptr, const char *from)
    {
       std::string *s_ptr = (std::string *)std_string_ptr;
       *s_ptr = from;
    }
    
    }
    

    Here's a header file for cppstring_help, that will be used by the probe:

    // cppstring_help.h - C++ functions supporting cppstring.apc
    
    extern const char *get_string(void *std_string_ptr);
    extern void set_string(void *std_string_ptr, const char *from);
    
    #define GET_CPP_STR(S)  get_string((void *)&S)
    #define SET_CPP_STR(S, NEW_CS)  set_string((void *)&S, NEW_CS)
    

    This is an example program, followed by a probe that uses the above helper.

    // example.cpp - setting and getting std::string values
    // compile with '$(CCC) -g example.cpp -o example.exe'
    //   where $(CCC) is xlC for AIX, CC for SunWorkshop, g++ for GCC
    // run with zero or more arguments, e.g., 
    // $ example.exe one
    // will print
    // Example="example.exe"
    // Example="one"
    
    #include <iostream>
    #include <string>
    
    using namespace std;
    
    static void print_string(string &s)
    {
       cout << s;
    }
    
    class Example
    {
    public:
      void put_string(char *val);
      string get_string(void);
      void print_string(void);
    private:
      string value;
    };
    
    string Example::get_string(void)
    {
       return value; 
    }
    
    void Example::print_string(void)
    {
       cout << "Example=\"";
       ::print_string(value);
       cout << "\"" << endl; 
    }
    
    void Example::put_string(char *val) 
    { 
       value = val;
    }
    
    int main(int argc, char **argv)
    {
       string Str;
       Example X;
       for (int i = 0; i < argc; i++)
       {
          X.put_string(argv[i]);
          Str = X.get_string();
          // ::print_string(Str);
          X.print_string();
       }
       return 0;
    }
    

    And here's the apc:

    // cppstring.apc - setting and getting std::string values and parameters 
    //                  from a probe.  
    // OTHER REQUIRED FILES: cppstring_help.h, cppstring_help.cpp
    // BUILDING THE UAL:
    // All:
    //   $(CCC) -c cppstring_help.cpp # create cppstring_help.o for your platform
    // Solaris:
    //   apc cppstring.apc -x example.exe -u cppstring_help.o 
    // AIX:
    //   apc cppstring.apc -x example.exe cppstring_help.o -linker "-lC"
    // Linux:
    //   apc cppstring.apc -x example.exe cppstring_help.o -linker "/usr/lib/libstdc++.so.6"
    // Windows:
    //   apc cppstring.apc -x example.exe cppstring_help.obj
    // 
    // RUNNING
    // Run with 'aprobe -u cppstring example.exe' to get output like:
    // put_string: Changing example.exe to probe_string1
    // get_string: Changing probe_string1 to probe_string2
    // print_string: Changing probe_string1 to probe_string3
    
    // this defines the macros GET_CPP_STR and SET_CPP_STR
    #include "cppstring_help.h"
    
    // The replacement strings:
    static char probe_string1[] = "probe_string1";
    static char probe_string2[] = "probe_string2";
    static char probe_string3[] = "probe_string3";
    
    probe thread
    {
    
       probe extern:"Example::put_string(char*)"
       {
          on_entry
          {  // change entry parameter to be a new string 
    	 printf("put_string: Changing %s to %s\n", $1, probe_string1);
    	 $1 = probe_string1;
          }
       }
    
       probe extern:"Example::print_string(void)"
       {
          on_entry
          {  // change entry parameter to be a new string 
    	 printf("print_string: Changing %s to %s\n", 
    	     GET_CPP_STR($this->value),
    	     probe_string3);
             SET_CPP_STR($this->value, probe_string3);
          }
       }
    
       probe extern:"Example::get_string(void)"
       {
          on_exit
          {  // change return value parameter to be a new string 
    	 printf("get_string: Changing %s to %s\n",
    	     GET_CPP_STR($return),
    	     probe_string2);
    	 SET_CPP_STR($return, probe_string2);
          }
       }
    }
    

    In the comments above, note the different ways linking is done to include the C++ library. The '-u' flag on Solaris means the probe will reference the definitions that are in the application, as described in Q20.13.

    18. Writing Java Probes

    18.1 How do I use Aprobe on a Java application?

    See Chapter 5 of the Aprobe User's Guide for Unix and Windows.

    18.2 Can I change the return value of a Java function?

    Yes. Here's a simple application, a probe, and the xmj file:

    // The application Simple.java
    public class Simple
    {
       int doIt ()
       {
          return 10;
       }
    
       public static void main (String[] args)
       {
          System.out.println ("doIt returns " + new Simple ().doIt ());
       }
    }
    
    // The probe SimpleProbe.java
    public class SimpleProbe extends com.ocsystems.aprobe.ProbeMethod
    {
       public Object onExit (Object returnValue)
       {
          return new Integer (11);
       }
    }
    
    <!-- The xmj file simple.xmj -->
    <probe_deployment>
       <probe class="SimpleProbe" parameters="readonly">
          <target value="Simple::doIt"/>
       </probe>
    </probe_deployment>
    
    
    $ javac Simple.java
    $ javac -classpath $APROBE/lib/aprobe.jar SimpleProbe.java
    $ apjava -u simple.xmj -java Simple
    doIt returns 11
    

    18.3 Can I throw an arbitrary Java exception from my probe?

    Unfortunately not. Java requires that all exceptions, other than RuntimeException and its descendants, be declared by the method or caught. We cannot declare that the base Aprobe Patch class throws a specific exception, because then every method that called it would have to either catch the exception or declare that it throws it. However, you can throw any RuntimeException.

    18.4 When using a Java custom probe, can I get output to appear in the Trace Display tree?

    Yes, there are a few ways:

    1. Use the methods in com.ocsystems.aprobe.Logger to log objects (including strings).
    2. Use the com.ocsystems.aprobe.TraceBean.logComment method to log a comment. You'll get an exception if you have de-selected trace for the run because you are calling a native method directly.
    3. Write some custom apc to go along with the custom java; have the custom apc define specific format routines for the logged data and export some native methods; have the probe bean call the exported native methods. Needless to say, this option is about as advanced as you can get and we don't really document it. No user has got to the stage of doing it yet. If you are there, contact us.

    18.5 Is it possible to "stub" a Java method so it does not execute the code in the original method?

    Yes, starting with RootCause version 2.1.3a (April 2004). To stub a method, simply call the stub() method at the end of the onEntry probe method, for example:

    import com.ocsystems.aprobe.*;
    public class TestProbe1 extends ProbeMethod
    {
       public boolean onEntry(Object[] parameters)
       {
          stub();
          return true;
       }
    }
    

    18.6 Is there any way to probe classes from rt.jar, e.g., java.io.*?

    Sorry but you cannot probe any classes in the bootpath, which includes rt.jar. This is a limitation basically imposed by the JVM because you cannot call methods which are not in the bootpath from within bootpath classes. That is, you could never apply a probe because that class would be in the child's class loader so the parent wouldn't have visibility. In informal discussions with engineers in Sun's JVM group they said it was a bad limitation of the JVM because it made bytecode patching, which was a "preferred" technology, very difficult.

    We have kicked around the ideas of having a bridge to native code in the bootpath classes and then the native code calling the probes but the technical issues are difficult.

    For some problems, instead of probing these classes it's possible to probe the native methods underneath. For example, probe the file access routines in the libc library (or equivalent on Windows) rather than the java.io methods.

    18.7 How do I call another method in the same class instance from within my Java method probe?

    The 'this' object is the first parameter (params[0]). So if you're probing a method in class SquareID, and you want to call otherMethod() there, then it'd be something like:

    ...
      SquareID id = (SquareID) params[0];
    
      id.otherMethod();
      return true;
    ...
    
    Note that the code has to import the SquareID class, too:
    import SquareID;

    See Custom Java Probes in the RootCause Java user guide for more basic information.

    18.8 Can I add custom Java probes within the RootCause GUI?

    No. Most or all of it must be done from the command line. In the GUI you can click on the "Custom" button in the setup options, but this only brings up a help dialog with instructions on how to set up the XMJ file and the corresponding Java code. You can cut and paste from this dialog to create your .xmj file in the workspace. After that, you would probably only use the workspace and intercept mechanism to deliver your probes to the application in an automated fashion. You could also apply these probes directly to your application using the apjava command; RootCause just hides this from the user of the application.

    18.9 (Windows) How would I trace a Java applet running with Internet Explorer (IEXPLORER process)?

    NOTE: this applies only if your Java plugin is the Sun JRE. It cannot be the Microsoft JRE. You can download the current Sun JRE from here.

    First, you have to create a workspace for IEXPLORER. (This is most easily done starting from the APP_START event for IEXPLORER in the RootCause Log.) Then you need to set up a Java trace in this workspace. Since IEXPLORER is not a Java application (it has a JVM library linked into it), you will find that the RootCause Trace Setup tree does not have a $Java$ module node available for trace selection.

    To make the $Java$ node appear in the Trace Setup, you need to add at least one class path entry to the workspace. If all your Java applet classes will be loaded from the web, you technically don't need any class path entries, but you can still add a dummy one (just type in any directory name on your hard drive in the dialog opened by the Setup->Class Path menu item).

    Once you have added at least one class path entry, click on the Setup button. You should now see the $Java$ module. Click on it and use MB3->Trace All Java Classes to set up a trace for all the Java classes. Of course, you can be more selective in what you trace.

    18.10 Can I change the value of parameters passed to a Java method?

    Yes, starting with RootCause version 2.1.3a (April 2004). There are two parts:

    1. In the deployment descriptor XML file, indicate that the parameters are read/write (not the default of read-only):
      <?xml version="1.0" encoding="UTF-8"?>
      <probe_deployment>
         <probe class="TestParamsProbe">
              <target value="ParamsTester::callIt(java.lang.String,boolean)" 
                      parameters="readwrite" />
         </probe>
      </probe_deployment>
      
    2. In the probe itself, simply assign new Objects to the params vector:
      import com.ocsystems.aprobe.*;
      public class TestParamsProbe extends ProbeMethod
      {
         public boolean onEntry (Object[] params)
         {
            // params [0], the 'this' parameter, can't and won't be changed.
            params [1] = new String ("This is a new string");
            params [2] = new Boolean (true);
      
            return true;
         }
      
         public Object onExit (Object returnValue)
         {
            int value = ((Integer) returnValue).intValue ();
      
            return new Integer (value + 1);
         }
      }
      

    18.11 Can I log any Java variables other than method parameters?

    The Variables pane in the RootCause Trace Setup dialog only supports logging Java parameters (all or none). In a custom probe, you can access individual parameters by position, and the return value. From a custom Java probe, you can access public class data just as you would from another class in your Java application. There is no access to method local data or class private data.

    18.12 Is there a way to define nested probes in Java similar to that supported in APC?

    Yes. In APC you'd write something like:

    
       probe "a()" {
         probe "b()" {
           on_entry
              do_something();
         }
       }
    

    For Java it's not quite as clean as with APC because of the split between the probes in Java and the definition in XML. The file Example14.java has two Probe Methods; the MyUmbrellaProbe is the equivalent of the "a()" in the above example. It creates a new MyNestedMethodProbe probe (i.e., "b()") in its onEntry method. The file Example14.xml is the probe deployment descriptor. We just define both targets in it. Note that you don't specify the hierarchy in the XML: it's defined by the Java probe.

    19. Logging Data

    19.1 What's the difference between "logging" and "printing"?

    Printing you understand. You call "printf()" or "puts()" and it displays what you passed to it directly to standard output (or some other file if you used fprintf()) as soon as the call is executed.

    Logging, as implemented by the "log" directive in APC, is more complicated. It writes the data you specified within the parentheses to a memory-mapped APD file, and associates a "format routine" with that data. The format routine is not called, and the data is not displayed, until later when the "apformat" command is run over the APD file.

    Another important difference between printing and logging is that the Aprobe log mechanism is lock-free, whereas printing requires a lock to get exclusive access for the printing thread. This gives a significant advantage to the log operation in multi-threaded applications where performance and deadlock are considerations.
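
    To give a feel for what "lock-free" can mean here, the following plain C sketch reserves space in a shared log area with a single atomic add, so no thread ever waits on a lock. This is an illustration only: we are not showing how Aprobe actually implements its log, and the names log_area, next_free and lockfree_log are hypothetical. It assumes a C11 compiler with <stdatomic.h>.

```c
#include <assert.h>
#include <stdatomic.h>
#include <string.h>

#define LOG_SIZE 1024

/* Hypothetical stand-in for the memory-mapped log file. */
static unsigned char log_area[LOG_SIZE];

/* Offset of the next unreserved byte; shared by all threads. */
static atomic_size_t next_free;

/* Copy n_bytes of data into the log area.  The only shared-state update
   is one atomic add, so concurrent callers get disjoint regions without
   taking a lock.  Returns the offset used, or (size_t)-1 if the area is
   full (the data is then dropped). */
size_t lockfree_log(const void *data, size_t n_bytes)
{
   size_t offset;
   assert(data != NULL);
   offset = atomic_fetch_add(&next_free, n_bytes);
   if (offset > LOG_SIZE || n_bytes > LOG_SIZE - offset)
      return (size_t)-1;
   memcpy(log_area + offset, data, n_bytes);
   return offset;
}
```

    By contrast, a printf()-based approach has to serialize threads on the output stream, which is where the performance and deadlock concerns come from.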

    19.2 Why do I get data mismatch warnings logging to my very simple format routine?

    All parameters to a format routine must be *addresses*. So if you do

            log((int) x) with myformat;

    then you must have

            static void myformat(int *i) { ... };

    If you had declared "myformat(int i)" then you would get a warning from the C compiler invoked from `apc'.

    19.3 Why do my format routine parameters (usually) have to be pointers to the type logged?

    The short answer is, "Because that's how it works." There are two real reasons. The first has to do with the whole logging/formatting concept. Data is copied to a memory-mapped file when logged. When formatting, we memory-map the APD file. To pass the data to the format routine directly, we'd have to allocate temporary space of the right size and copy it again.

    It's much more elegant to pass everything -- scalars, structs, and arrays -- by pointer. That way, when you log an `int' value, you write it to the APD file, and when you format it, you just pass its address in the memory-mapped apd file directly to the format routine. This allows ints, arrays of ints, and structs to all work the same way.

    The second reason is related to the first, and has to do with the fact that C doesn't have an array "type", but rather treats any adjacent locations in memory as an array. Here's what our chief designer has to say on this subject:

    When designing the APC extensions such as 'log' statements we had to make sure that they would work with any data types, including scalars, structs and arrays. It was array types that gave us the most problems, mostly due to the fact that C has very little support for arrays.

    Even though one can declare an array with a given number of elements, such declarations are limited as to where they can appear (e.g. you can not use a pointer to an array declaration inside of a formal parameter list) and operations for array types are essentially the same as operations for pointer types.

    Now consider these 2 log statements below:

    int foo[10];
    
    log(foo[0]) with MyFormat;
    log(foo[0..9]) with MyFormat;
    

    The format for the first log statement could have used 'int' like you suggest, but what about the second log statement? Of course, we could have treated the first log statement differently from the second one, since the first one clearly logs one element, while the other logs a range of elements. If we did so we would use 'int' in the format declaration for the first 'log' statement and 'int *' for the second. But even so, you would still have cases like this:

    log(foo[0..0]) with ...        // Do you use 'int *' here or 'int'?
    log(foo[Var1..Var2]) with ...  // We don't even know the number of elements here.

    The requirement that all formats use pointers to the data as argument allowed us not to make any distinction between the way we log scalars and arrays. If this seems to be confusing to you, you can always use a simpler interface, where you don't have to provide any formatting routine at all.

    log("foo[0] => ", foo[0]);

    If this doesn't make sense to you, you are not alone. Some of us didn't like the way this had to be done either; unfortunately, no one came up with a better solution than the one we have right now. If you have any suggestions, feel free to share them with us.
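
    The pass-by-pointer idea can be sketched in plain C (not APC). Here an int array stands in for the memory-mapped APD file, and the format routine is handed an address inside that buffer, so a single value and a range of values are handled the same way. All names (log_buffer, log_ints, format_logged, my_format) are hypothetical and the routine signatures are assumptions, not Aprobe's.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define LOG_CAPACITY 64

/* Hypothetical stand-in for the memory-mapped APD file (ints only). */
static int    log_buffer[LOG_CAPACITY];
static size_t log_used = 0;

/* A format routine gets a pointer to the logged data, never a copy. */
typedef void (*format_fn)(const int *data, size_t count, FILE *out);

/* "Log time": copy count ints into the buffer and return their position. */
size_t log_ints(const int *values, size_t count)
{
   size_t offset = log_used;
   assert(log_used + count <= LOG_CAPACITY);
   memcpy(&log_buffer[offset], values, count * sizeof(int));
   log_used += count;
   return offset;
}

/* "Format time": pass the address inside the buffer straight to the
   format routine.  Returns that address so callers can see no copy
   was made. */
const int *format_logged(size_t offset, size_t count,
                         format_fn routine, FILE *out)
{
   const int *data = &log_buffer[offset];
   routine(data, count, out);
   return data;
}

/* One routine serves both a single int and a range of ints. */
void my_format(const int *data, size_t count, FILE *out)
{
   size_t i;
   for (i = 0; i < count; i++)
      fprintf(out, "%d%s", data[i], (i + 1 < count) ? " " : "\n");
}
```

    With this shape, logging foo[0] is just log_ints(&foo[0], 1) and logging foo[0..9] is log_ints(foo, 10); my_format does not need to know which one it was.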

    19.4 How can I control the size of the APD file produced?

    This is specified as a parameter to the aprobe command. By default there is a single 256M file (1M on Windows). You can specify the number of files (see the next Q.) and/or the maximum size of each file. You set the maximum size of each file (in bytes) with "-s n_bytes". You set the number of files with "-n num_files", where num_files must be in the range 0-9. If you specify 0, all logged output is discarded. If you specify 2 or more, but don't explicitly set the size with "-s", the maximum size is set to 2 megabytes.

    Note that on Unix Aprobe data files grow up to the maximum size. Unfortunately Windows does not allow memory-mapped files to grow. They are opened to their maximum size.

    19.5 What is an "APD ring"?

    The "APD ring" is how the aprobe logging mechanism deals with large quantities of data. By default there's a single APD file produced by aprobe, with a maximum size of 256 M on Unix platforms and just 1 M on Windows (because memory-mapped files are not dynamically extensible). If you try to log more than that, the last (newest) data is lost.

    If you specify more than one file, the files conceptually form a "ring" so that the most recent data is always kept, and the oldest data is lost. The ring is really more like a fixed-length stack where data falls off the bottom when additional data is pushed onto a full stack.
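
    The two behaviors described above (a single file that drops the newest data, versus a ring that drops the oldest) can be modeled with a small C sketch. This is only a toy model with hypothetical names (apd_ring, ring_init, ring_log, ring_oldest) and a tiny per-file capacity; it says nothing about the real APD file layout.

```c
#include <assert.h>
#include <string.h>

#define MAX_FILES     9
#define FILE_CAPACITY 4   /* records per "file"; tiny so wrapping is easy to see */

typedef struct {
   int    records[MAX_FILES][FILE_CAPACITY];
   size_t counts[MAX_FILES];   /* records currently held in each file */
   size_t num_files;           /* 1..MAX_FILES */
   size_t current;             /* index of the file being written */
   size_t dropped;             /* records lost so far */
} apd_ring;

void ring_init(apd_ring *r, size_t num_files)
{
   assert(num_files >= 1 && num_files <= MAX_FILES);
   memset(r, 0, sizeof *r);
   r->num_files = num_files;
}

/* Log one record.  Returns 1 if it was stored, 0 if it was discarded. */
int ring_log(apd_ring *r, int value)
{
   if (r->counts[r->current] == FILE_CAPACITY) {
      if (r->num_files == 1) {
         r->dropped++;          /* single full file: the newest data is lost */
         return 0;
      }
      /* Move to the next file in the ring; its contents are the oldest
         data, and they are discarded to make room. */
      r->current = (r->current + 1) % r->num_files;
      r->dropped += r->counts[r->current];
      r->counts[r->current] = 0;
   }
   r->records[r->current][r->counts[r->current]++] = value;
   return 1;
}

/* Oldest record still held, or -1 if nothing has been logged. */
int ring_oldest(const apd_ring *r)
{
   size_t i;
   for (i = 1; i <= r->num_files; i++) {
      size_t f = (r->current + i) % r->num_files;
      if (r->counts[f] > 0)
         return r->records[f][0];
   }
   return -1;
}
```

    Logging the values 1 through 14 into a three-file ring leaves 5 as the oldest surviving record, while logging 1 through 6 into a single file keeps 1 through 4 and drops 5 and 6.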

    Details are described under "APD File" in Appendix B (Files Reference) of the Aprobe User's Guide.

    19.6 How can I control what goes into each APD file?

    You can't log data to whatever file you want, but you can register a callback routine that is called whenever the logging mechanism changes to a new file in the ring. This is illustrated by the example in $APROBE/examples/learn/apd_ring included with Aprobe.

    19.7 How can I reduce the time that is spent logging data in my probes?

    See the section "Log Statement Overhead", under "Aprobe Performance Considerations", in Chapter 4 of the Aprobe User's Guide.

    19.8 How can I log data so it's guaranteed to be available when I format, even if the APD ring wraps around?

    The appropriate place for such data is the persistent apd file. You can log to this like this:
    log (...) with blahformat to ap_PersistentLogMethod;

    Since the persistent file is always formatted first this would mean that you would get your data earlier than you would if you logged to the apd files, in the format on_entry part.

    20. Other Aprobe Questions

    20.1 Where does aprobe get its "time" from (e.g., for the profile probe)?

    On Windows, Aprobe calls GetSystemTimeAsFileTime, declared in winbase.h.

    On Solaris, Aprobe reads the realtime clock directly using:

          clock_gettime( CLOCK_REALTIME, ap_TimeT_ptr);

    defined in /usr/include/time.h .

    On AIX, Aprobe reads the realtime clock directly using read_real_time , then converts to ap_TimeT using time_base_to_time , both defined in sys/time.h .

    On Linux, Aprobe just calls gettimeofday() defined in sys/time.h.
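
    For reference, here is a small standalone C sketch of the two POSIX calls named above, each converted to nanoseconds since the epoch. The function names realtime_clock_ns and time_of_day_ns are ours, and this says nothing about how Aprobe converts the result to its own ap_TimeT.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700   /* expose clock_gettime() and gettimeofday() */
#endif

#include <assert.h>
#include <sys/time.h>
#include <time.h>

/* Read the realtime clock with clock_gettime(), as described for Solaris. */
long long realtime_clock_ns(void)
{
   struct timespec ts;
   int rc = clock_gettime(CLOCK_REALTIME, &ts);
   assert(rc == 0);
   (void)rc;
   return (long long)ts.tv_sec * 1000000000LL + (long long)ts.tv_nsec;
}

/* Read the time of day with gettimeofday(), as described for Linux.
   The resolution is microseconds, so the low three digits are zero. */
long long time_of_day_ns(void)
{
   struct timeval tv;
   int rc = gettimeofday(&tv, NULL);
   assert(rc == 0);
   (void)rc;
   return (long long)tv.tv_sec * 1000000000LL + (long long)tv.tv_usec * 1000LL;
}
```

    Both functions read the same wall-clock time, so values taken back to back should differ only by the cost of the calls and the difference in resolution.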

    20.2 Why do my threads execute in different order under aprobe?

    Almost certainly it's timing. Each time a thread is created, aprobe collects some information. This can delay thread creation somewhat and change the order in which threads are executed. Also, your probes take some time, and delay a thread that executes a probe relative to another that does not.

    20.3 It looks like if I run "aprobe -if", both the probe program and probe format get executed, which messes up initialization. How can I avoid this?

    There's a function ap_CurrentAprobeState() that returns either ap_AprobeRunTime or ap_AprobeFormatTime. So you can do:

       if (ap_CurrentAprobeState() == ap_AprobeFormatTime) { ... } 

    in your probe format. This is the preferable way.

      probe program {       
        on_entry {          
          DumpInfo();          
          // Don't run the program.  Exit after printing all the info.
          // (MAGIC exit code tells runtime this is *not* an error)
          exit(APROBE_MAGIC_EXIT_CODE); 
        }
      }
    
      probe format {      
        on_entry {          
          if (ap_CurrentAprobeState () == ap_AprobeFormatTime)
          {             
            DumpInfo();             
            /* Don't do any formatting.  Exit after printing all the info. */
            exit(0);          
          }       
        }    
      }
    

    20.4 Solaris: I have a probe on_exit to a function to change the struct that is returned. It causes a core dump when the probed function is called as a procedure. What's the problem?

    On Solaris, a structure returned by value is written to space on the stack allocated by the caller. However, if the caller is discarding the returned value by calling the function as a procedure, no space is allocated. In this case, a probe which may normally attempt to change the return value should not do so, as it will likely corrupt memory. In order to allow users to handle this problem, the following macro is provided:

    #define ap_StructValueReturnExpected private

    This would be used as a boolean expression in an on_exit part as follows:

    probe "UpdateCoordinates()"
    {
      on_exit
        if (ap_StructValueReturnExpected)
          $return.x = $return.y = $return.z = 0;
    }

    20.5 Windows: There is a parameter in a method call which is passed by reference. It is modified by the method and I want to see what it is on exit. Aprobe doesn't allow this, saying that parameters are visible only on entry. Is there a way to see how this value gets modified?

    The way to do it is to save the address of the parameter on entry and log the dereferenced value of the saved pointer on exit from the subprogram. For example:

    Given the C++ file "t.C":

    typedef struct
    {
      int a;
      int b;
    } MyStructT;

    void foo(MyStructT &MyS) {}

    int main()
    {
      MyStructT S = {10, 1999};
      foo(S);
      return 0;
    }

    Then the following apc file "t.apc" would do it:

    probe thread
    {
      probe "foo"
      {
        typeof($1) *Param1 = &($1);
        on_exit
        {
          log("Param1 => ", *Param1);
        }
      }
    }

    Note that the declaration where Param1 is initialized is executed in an implicit on_entry part. See the next Q about using "typeof", and other ways to declare variables.

    20.6 I want to capture the address of a target expression on entry in a pointer to the right target type. How do I declare this?

    There are (at least) 3 possibilities, illustrated in the APC file below:

    probe thread
    {
      // Method 1: Use the APC "typeof" operator on the type name directly as a
      //           target expression:
      probe "foo"
      {
        typeof($MyStructT) *Param1 = &($MyS);
        on_exit
        {
          log("Param1 => ", *Param1);
        }
      }
      // Method 2: Use the APC "typeof" operator on the target
      //           expression for the parameter name:
      probe "foo"
      {
        typeof($MyS) *Param1 = &($MyS);
        on_exit
        {
          log("Param1 => ", *Param1);
        }
      }
      // Method 3: Use the "typeof" operator on the target expression
      //           for the positional parameter:
      probe "foo"
      {
        typeof($1) *Param1 = &($1);
        on_exit
        {
          log("Param1 => ", *Param1);
        }
      }
    }

    This applies whether you're capturing a parameter or global value, or even assigning an APC value to a target expression. The type declaration is the important point here.

    Of course target expressions apply only if you have debug information available for the definition of the various names. Otherwise, you must reproduce or include the C type declaration directly in the APC, and reference it there.

    20.7 I want to probe a method in a template class. How do I refer to the method in the function probe on that method?

    This can be tricky. What you need to do is get a list of all the functions as Aprobe will reference them. The info.ual predefined probe is provided precisely for this purpose, and "apcgen -L" also works. In this case, if your executable were named "myprog.exe", and the method you wanted to probe were called Method, try:

      aprobe -u info -p -s myprog.exe | grep Method
    
    or
    
      apcgen -Lv myprog.exe | grep Method

    This gives each function name which can be probed, and the file and line on which it's declared. This can still be pretty tricky for template instances, but it's the best we have at the moment.

    20.8 When I trace all the functions in my (Windows) DLL some functions appear to be entered twice, once with a name that has the string "?0" appended to it and once with the name I think it should have. What is going on?

    For example:

    [Enter : extern:"dyndll2d1_a()?0" in "dyndll2d1.dll" at 10:08:02.799519095
     [Enter : extern:"dyndll2d1_a()" in "dyndll2d1.dll" at 10:08:02.799526359
      [Enter : extern:"dyndll2d2_c()" in "dyndll2d1.dll" at 10:08:02.799534181
      ]Leave : extern:"dyndll2d2_c()" in "dyndll2d1.dll" at 10:08:02.800248518
     ]Leave : extern:"dyndll2d1_a()" in "dyndll2d1.dll" at 10:08:02.800256899
    ]Leave : extern:"dyndll2d1_a()?0" in "dyndll2d1.dll" at 10:08:02.800262486 
    

    In the above example my DLL has a routine named dyndll2d1_a in it. In the trace, some routine named "dyndll2d1_a()?0" is called before my routine "dyndll2d1_a()" is called. There certainly isn't anything like dyndll2d1_a()?0 anywhere in my source code.

    Aprobe is attempting to show you what is really happening in your program. Often when one creates a DLL in Windows the C++ compiler adds a small routine that just calls the real routine. This "thunk" routine is what is pointed at by the exports directory in your DLL. It is this piece of code that Aprobe has named "dyndll2d1_a()?0" and that you are seeing in the trace.

    Aprobe attempts to gather as much information about the routines in your program as possible. The sources of information include:

    1. An exports directory (the list of exported routines in a DLL)
    2. A COFF symbol table in the executable file
    3. A CODEVIEW symbol table in the executable file
    4. A COFF symbol table in an associated DBG file
    5. A CODEVIEW symbol table in an associated DBG file
    6. A CODEVIEW symbol table in an associated PDB file

    While this variety of sources presents a wealth of information about your program, it can also cause a problem when the information is not consistent. In the above example, the exports directory lists a symbol "dyndll2d1_a" pointing to one place, while the PDB symbol table has the same name pointing to another. Aprobe requires that each global symbol point to only one place (you can have similarly named local symbols pointing to different places, since they must be distinguished by their associated file name). Thus, when Aprobe detects two symbols with the same name that point to different places, it changes the name of one of the symbols to resolve the conflict. You can use the apinfo command to discover what symbols Aprobe has found and what name it uses for each one.

    Another symbol-related issue, not illustrated above, is when two different symbols both point to the same address. In this case both symbols will exist in Aprobe's master symbol table but only one ap_FunctionId will be created. Both symbols will point to this single ap_FunctionId but it will only point back to one of the symbols. This can produce a situation where you probe one routine but the name given in a traceback for the routine will be a different name.

    Generally the only way to get these kinds of duplicate symbols is by taking some explicit action in the source code or in a .DEF file passed to the linker.

    20.9 Solaris: I'd like my probe to call a little C++ function which creates an object and invokes a method with that object. Can I do this?

    Yes, you can. This is described under "Building a UAL with Unresolved References" in Chapter 4 of the Aprobe User's Guide.

    20.10 Solaris: I use pathmap to tell dbx where to find my object & source files. How do I tell Aprobe where to find them?

    In dbx, run pathmap with no arguments to list the pathmap. Then build a value for the environment variable APROBE_SEARCH_PATH that consists of each of the "to" directories listed in the pathmap output in the reverse order they're listed.

    However, if your pathmap values are partial paths (that is, there are several subdirectories under the "to" directory that contain object files), you will need one entry in APROBE_SEARCH_PATH for each such subdirectory.

    Note that aprobe and apc will always use the object file at its original location if it exists. This could be a problem if you replace that with new object files of the same name while still referencing the older executable.
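
    The reverse-order rule above can be sketched in plain C (this is not part of Aprobe). The function name build_search_path is hypothetical, and the ':' separator between entries is an assumption modeled on PATH; check the User's Guide for the separator APROBE_SEARCH_PATH actually expects.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Join the pathmap "to" directories in the reverse of the order listed.
 * ASSUMPTION: entries are separated by ':' as in PATH.
 * Empty entries are skipped.
 * Returns 0 on success, -1 if the result does not fit in out. */
int build_search_path(const char *to_dirs[], int count,
                      char *out, size_t out_size)
{
    size_t used = 0;

    assert(out != NULL && out_size > 0);
    out[0] = '\0';
    for (int i = count - 1; i >= 0; i--) {   /* last listed goes first */
        int n;

        if (strlen(to_dirs[i]) == 0)
            continue;
        n = snprintf(out + used, out_size - used, "%s%s",
                     used ? ":" : "", to_dirs[i]);
        if (n < 0 || (size_t)n >= out_size - used)
            return -1;                        /* would not fit */
        used += (size_t)n;
    }
    return 0;
}
```

    Given the "to" directories in the order dbx lists them, the result is the string to assign to APROBE_SEARCH_PATH. Partial paths still need one entry per subdirectory, as noted above.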

    20.11 In what order do separate probes on the same function execute?

    A user wrote: "What I want to do is:

    probe thread
    {
       probe "myfunc()"
       {
          ap_BooleanT IsEnabled = TRUE;
          on_entry
             if (some_expression)
             {
                IsEnabled = FALSE;
                return;
             }
          on_exit
             if (! IsEnabled) return;
          on_entry
             do_something();
          on_exit
             do_something();
       }
    }
    

    The first on_entry/on_exit pair would be the wrapper part and would prevent any second on_entry/on_exit pair from executing. Can I count on the first pair executing in order?"

    Here's the answer: "Yes. The on_entry and on_exit parts within a probe should execute in lexical order. If you have multiple probes on the same routine, their on_entry parts should execute in lexical order as well; however, their on_exit parts will execute in the reverse order to ensure proper nesting. Probe program on_entry actions are executed before those of probe thread, which in turn are executed before those of any subprogram probes."

    UALs on the aprobe command-line (or in the RootCause workspace's aprobe script) are initialized in reverse order, i.e., right-to-left. Similarly, two probes on the same function in different UALs are executed right to left, for example:

    $ aprobe -u t1 -u t2 t.exe
    enter t2:main()
    enter t1:main()
    exit t1:main()
    exit t2:main()
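
    The right-to-left rule can be modeled in a few lines of plain C. This is only a sketch of the ordering described above, not Aprobe code; probe_order and append_event are hypothetical names.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Append one "verb ual:func" line to out, silently truncating if full. */
static void append_event(char *out, size_t out_size,
                         const char *verb, const char *ual, const char *func)
{
    size_t used = strlen(out);

    if (used + 1 < out_size)
        snprintf(out + used, out_size - used, "%s %s:%s\n", verb, ual, func);
}

/* Given UAL names in command-line (left-to-right) order, write the order
 * in which their probes on func run: entries right-to-left, then exits in
 * the opposite order so that the probes nest properly. */
void probe_order(const char *uals[], int count, const char *func,
                 char *out, size_t out_size)
{
    assert(out != NULL && out_size > 0);
    out[0] = '\0';
    for (int i = count - 1; i >= 0; i--)
        append_event(out, out_size, "enter", uals[i], func);
    for (int i = 0; i < count; i++)
        append_event(out, out_size, "exit", uals[i], func);
}
```

    Called with the names t1 and t2 and the function main(), this writes the same four lines as the transcript above.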
    

    20.12 Is it possible to reference C++ files from my application from within my UAL?

    Not in general. On Solaris, however, it is possible under certain conditions to compile C++ object files against your C++ target application and link them into your UAL. There are restrictions on what data you can access, and you must supply extra flags on the apc command. This process is described in the next question (20.13).

    20.13 Solaris: Can I build a UAL with unresolved references?

    Yes, on Solaris. A UAL is a shared library created by the apc compiler when you compile your probes. Generally all functions and data called from within the probe must be defined in the probe, or in a library linked with it. Accesses of data and calls to functions whose names are not known when the UAL is linked are unresolved references, and usually render the library unusable since it's undefined what should happen when the reference is encountered at run-time. However, there are certain circumstances when unresolved references are useful, thanks to the run-time linking mechanism provided by the Solaris operating system.

    The circumstances in which an unresolved reference in a UAL is useful are these:

    • you have written some functions or methods in your program's native language (e.g., C++), and you wish to call these functions from your probe.
    • these functions are not already part of your target application, so you can't just call them using a target expression (see "Target Expressions" in Chapter 3 of the Aprobe User's Guide).
    • these functions call other functions in your target application, but they do not reference any data from the target application.

    For example, suppose you want to write a probe that creates a new object of a given class, gives it the right values, and passes it to a method in the class. This is trivial to do in a new C++ function, so you write one, declare it extern "C" so it can be called from your probe (which is in C), and compile your C++ function into a separate object file called CallMyClass.o. CallMyClass.o contains unresolved external references to the C++ runtime, and to constructors of the class which are used to create the object.

    Now you write your probe, MyClassProbe.apc, and compile it as follows:

    apc MyClassProbe.apc CallMyClass.o -u -x MyProgram

    The option -u on the apc command line specifies that unresolved symbols are allowed to remain in the UAL.

    Now run aprobe:

    aprobe -u MyClassProbe.ual MyProgram

    This is what happens to the unresolved symbols when you run aprobe and the UAL is loaded:

    Data symbols

    When the UAL is loaded, all references to data symbols are resolved. If the undefined symbols are not present in the application or one of its shared libraries, the UAL will not be loaded. This is particularly important to understand at format time since the application symbols are not present at format time. Therefore, no UAL with unresolved data symbols may be used at format time, even if the references will never be executed.

    Function symbols

    When the UAL is loaded, function references are not resolved until the function is called. This means that symbols may remain unresolved so long as no attempt is made to call those functions. In particular, at format time, UALs with references to function symbols which are only found in the application may be loaded so long as no attempt is made to call those functions. This is generally the case, since only format routines and not probes are executed at format time.

    This can be tricky if the C++ code you want to link in is complicated, or unknown to you. However, if it's something straightforward that is under your control to change, it can generally be adapted to these restrictions. As always, contact OC Systems for guidance in these advanced features.

    (Unix) Can I force a snapshot of my predefined probe data by sending a signal to my application?

    Yes. The following apc code registers for SIGPROF and does a snapshot:

    #include <signal.h>
    #include "memwatch.h"
    
    static void Handler (int sig, siginfo_t *siginfo, void *ucp)
    {
       printf ("Taking snapshot on signal %d\n", sig);
       ap_Memwatch_DoSnapshot ("snapshot signal");
    }
    
    probe program
    {
       on_entry
       {
          ap_RegisterSignalHandler (SIGPROF,
                                    ap_CallBeforeUserAction,
                                    Handler);
       }
    }
    

    If this file were named memwatch_sig.apc you would compile it with:

    apc memwatch_sig.apc memwatch.ual

    and then use memwatch_sig.ual instead of memwatch.ual when running. Then send the signal (kill -PROF pid ) to generate a snapshot.
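
    The handler above does its work directly inside the signal handler. Where the work to be done on a signal is not known to be safe in that context, a common alternative in plain C is for the handler to set only a flag, and for ordinary code to act on the flag later. The sketch below shows that deferred variant using only standard C and POSIX signal names; it does not use any Aprobe API, and install_snapshot_flag and take_snapshot_request are hypothetical names.

```c
#include <assert.h>
#include <signal.h>

/* Set by the handler, cleared by take_snapshot_request(). */
static volatile sig_atomic_t snapshot_requested = 0;

/* Handler: record the request and nothing else. */
static void note_snapshot_request(int sig)
{
    (void)sig;
    snapshot_requested = 1;
}

/* Arrange for signo (SIGPROF in the example above) to set the flag.
 * Returns 0 on success, -1 on failure. */
int install_snapshot_flag(int signo)
{
    assert(signo > 0);
    return signal(signo, note_snapshot_request) == SIG_ERR ? -1 : 0;
}

/* Returns 1 if a request arrived since the last call, and clears it. */
int take_snapshot_request(void)
{
    if (!snapshot_requested)
        return 0;
    snapshot_requested = 0;
    return 1;
}
```

    Code that runs regularly would call take_snapshot_request() and take the snapshot itself when it returns 1.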

    20.14 How do I log multi-dimensional Ada arrays?

    Aprobe only supports getting one slice at a time -- for the right-most index. For individual elements, therefore, it's trivial:

      log ($available_overlays [1] [1]);

    or you could use a single slice:

      log ($available_overlays [1] [1 .. 10]);

    Multi-dimensional arrays should scale up fine. Since the arrays are stored contiguously you could cheat and cast it to a one-dimensional array if you're clever about your labeling.
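
    The "cheat" of treating a contiguous two-dimensional array as one-dimensional comes down to index arithmetic. Here is a sketch in plain C, assuming row-major storage (the right-most index varies fastest), which should be confirmed for your compiler and array type. The names dim_range and flat_offset are hypothetical.

```c
#include <assert.h>

/* Inclusive bounds of one array dimension, as in an Ada range First .. Last. */
typedef struct {
    int first;
    int last;
} dim_range;

/* Zero-based offset of element (i, j) within the flattened array.
 * ASSUMPTION: elements are stored contiguously in row-major order. */
int flat_offset(dim_range rows, dim_range cols, int i, int j)
{
    int row_length = cols.last - cols.first + 1;

    assert(i >= rows.first && i <= rows.last);
    assert(j >= cols.first && j <= cols.last);
    return (i - rows.first) * row_length + (j - cols.first);
}
```

    For an array indexed 1 .. 3 by 1 .. 10, element (2, 1) is at offset 10, so labeling a flattened log means mapping each offset back to its index pair in this way.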

    20.15 AIX: Why isn't my ual world readable?

    The apc command does a `chmod 640' on the ual it generates after a successful link. This is necessary because it affects how the shared module is loaded at run-time. Here's an excerpt from AIX 'info' output for 'dlopen()', which is the runtime routine used to load UALs when running aprobe:

    • If the module being loaded has read-other permission, the module is loaded into the global shared library segment. Modules loaded into the global shared library segment are not unloaded even if they are no longer being used. Use the slibclean command to remove unused modules from the global shared library segment.

    It seems obvious that we don't want individuals' UALs in the global shared library segment: multiple edit/apc/aprobe cycles could result in bizarre behavior. The slibclean command can only be run by an account with root privileges.

    20.16 AIX: When I use pthreads calls in my probes, the UAL won't link. Do I need to explicitly specify the library or change my compiler_profiles file?

    We strongly advise against linking probes with a thread library since it can cause major problems when run against a single threaded application. The recommended approach on AIX, although a little painful, is to look up the symbol dynamically and call it by pointer. Here is an example for pthread_attr_getstacksize:

    
    // Define a type to map to the routine
    typedef int (*pthread_attr_getstacksize_subprogram_T)
       (pthread_attr_t *, size_t *);
    
    // Declare a variable to hold the address
    static pthread_attr_getstacksize_subprogram_T
       pthread_attr_getstacksize_subprogram_ptr = NULL;
    
    probe program
    {
       on_entry
       {
          pthread_attr_getstacksize_subprogram_ptr =
             (pthread_attr_getstacksize_subprogram_T)
                ap_FunctionPointer (ap_ModuleNameToId (PthreadModuleId (),
                                   "pthread_attr_getstacksize()",
                                   ap_NoName));
    
          // Call it - note don't do this on program entry until you have the
          // fix for that!
          // (Attributes and Size must be declared elsewhere in the probe.)
          if (pthread_attr_getstacksize_subprogram_ptr)
          {
             pthread_attr_getstacksize_subprogram_ptr (&Attributes, &Size);
          }
       }
    }
    
    The PthreadModuleId () routine would look something like:
    
    static ap_ModuleIdT PthreadModuleId(void)
    {
       ap_ModuleIdT Result;
       /* First, the 4.3 case */
       Result = ap_ModuleNameToId("libpthreads.a(shr_xpg5.o)");
    
       if (ap_IsNoModuleId(Result))
       {
          /* Didn't find it in shr_xpg5.o, so if we don't find it in shr.o ... */
          Result = ap_ModuleNameToId("libpthreads.a(shr.o)");
          /* ...we'll give back that null result. */
       }
       return Result;
    }
    

    For Solaris things are a little different because, in general, Solaris provides stubs to all of these routines in libc.so. Therefore you can just call the routines directly.

    For Linux a similar approach can be used as for AIX. In that case the module is "libpthread.so" always.

    Windows always has thread support so nothing special is needed.

    20.17 Is there a way I can manage thread-specific data without using native thread-management routines?

    Yes. Defining and referencing thread-specific data is built into Aprobe. Here is an example:

    
    int *GetThreadSpecificInt();
    
    probe thread
    {
       int ThreadSpecificItem = 0;
    
       int *GetThreadSpecificInt()
       {
          return &ThreadSpecificItem;
       }
    }
    

    Now you can call the GetThreadSpecificInt() function from anywhere to get hold of the thread-specific data item. This should work equally well on all platforms and will usually be much faster than using pthread functions.

    You can report or take actions when each thread starts and stops as well:

    
    probe thread
    {
       on_entry
         printf("Entering thread\n");
       on_exit
         printf("Exiting thread\n");
    }
    
    The predefined probes in $APROBE/probes have many sophisticated examples of this. A simple example is available on Unix platforms in $APROBE/examples/evaluate/5.threads.

    20.18 How does using Aprobe for C++ differ from using Aprobe for C or Ada?

    These are interesting differences:

    • memory
    • objects
    • mangling
    • exceptions
    • generics/templates

    Aprobe tries to make the probe interface common, but language differences may get in the way:
    • C++ applications tend to be much bigger, which can make the RootCause GUI very slow.
    • C++ calls extra procedures, like class constructor/destructor.
    • C++ has objects whose members Aprobe tries to access, with varying degrees of success.
    • C++ has name mangling that Aprobe tries to hide, with varying degrees of success.
    • Aprobe does not support throwing C++ exceptions as it does for Ada exceptions.
    • C++ throws objects with exceptions, while Aprobe only logs the object's address.
    • C++ uses standard templates whose expanded form is all that Aprobe sees.
    • C++ has multiple inheritance whose rules are resolved by the compiler.

    These differences can make Aprobe a little harder to use on a C++ application, or a little less satisfying when a probe logs data to be formatted for easy reading. For example, constructors and destructors may get profiled/traced, but most of the time, they just clutter the report; objects may be shown with member addresses instead of member data; mangled names sometimes show in reports or apc input; exception object content may be needed but lacking; output may show the internal form of an expanded template rather than the source form written by the programmer; a probe's references to inherited data may need compilation by the C++ compiler to be right.

    OC Systems is developing a strategy whereby C++ can be linked with the probes to circumvent many of these problems--contact us to learn more.

    20.19 Why does my C++ application crash when run with Aprobe?

    If your application is bigger than, say, 100 MB, chances are that it's running out of memory. You can verify this by running the "apsymbols" command on it, for example, apsymbols c2.eab. If apsymbols crashes, then that's the problem. If it doesn't, the problem might be elsewhere. See FAQ 13.13.

    There are two known reasons why aprobe may cause the application to run out of memory:

    • Demangler memory leaks - On AIX, the IBM C++ runtime is used to "demangle" the C++ symbols. There was a memory leak in all versions before 7.0.0.3. You can use the command "lslpp -l xlC.aix50.rte" on AIX to see what version is installed on your AIX box.
    • Old Aprobe Version - Aprobe creates a symbol table in memory. Prior to version 4.3.4a (RootCause 2.1.4a) it used the same memory as the application itself, and so there would be insufficient memory for both aprobe and the application. Run aprobe -h | head to see what version you have.

    See the next question for possible workarounds.

    20.20 (AIX) My application + aprobe or its tools runs out of memory. What can I do?

    This is a side-effect of having huge C++ applications as described above. On AIX there's a way to give your application more memory. AIX supports a concept called the Large Address-Space Model. This may be applied via an environment variable when running, for example:

       LDR_CNTRL=MAXDATA=0x20000000 c2.eab
    or
       LDR_CNTRL=MAXDATA=0x20000000 apformat c2.apd
    

    This allocates two full memory segments (segments 3 and 4, 256 MB each) for your application's data. If you need even more memory you could try 0x30000000, but this may not work at runtime because some applications hard-code use of segment 5.
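
    The arithmetic behind these values can be written out in plain C. This is a sketch for whole-segment values such as the ones above, relying on the 256 MB (0x10000000) segment size of the 32-bit AIX address space and on data starting in segment 3 as described here. The function names are hypothetical.

```c
#include <assert.h>

#define AIX_SEGMENT_BYTES  0x10000000UL  /* 256 MB per segment */
#define FIRST_DATA_SEGMENT 3             /* first segment given to MAXDATA */

/* Number of segments requested by a whole-segment MAXDATA value. */
unsigned maxdata_segment_count(unsigned long maxdata)
{
    assert(maxdata % AIX_SEGMENT_BYTES == 0);
    return (unsigned)(maxdata / AIX_SEGMENT_BYTES);
}

/* Highest segment number used, or 0 if no segment is requested. */
unsigned maxdata_last_segment(unsigned long maxdata)
{
    unsigned count = maxdata_segment_count(maxdata);

    return count ? FIRST_DATA_SEGMENT + count - 1 : 0;
}
```

    For 0x20000000 the last segment is 4; for 0x30000000 it is 5, which is where the conflict with applications that hard-code segment 5 comes from.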

    20.21 My application + aprobe or its tools is very slow starting up. What can I do?

    This is, again, because of the huge symbol tables in ERAM C++ programs. The workaround is to use Aprobe's ADI (Aprobe Debug Information) mechanism to pre-construct the symbol table for an executable. Here's how it works:

    1. Assume m2.exe is an executable that still has its symbol table and line information.
    2. Create an ADI file for that executable using the apmkadi command, for example:
      cd /u/m2
      apmkadi -o m2.adi m2.exe
      
    3. Reference the ADI file just like a UAL, for example:
      aprobe -u m2.adi -u trace.ual m2.exe
      
    4. Run as you do now.
    5. Explicitly reference the adi file when you format, for example:
      apformat -u m2.adi m2.apd
      
    6. If an ADI file of the default name is found in the same directory as the executable, and its checksum matches, it is used automatically.

    20.22 (AIX) Why is the C++ exception raised in my libxml++-1.0.a library not reported by exceptions.ual?

    The AIX C++ compiler, unlike other compilers, generates a copy of the C++ runtime exception-catching function in every shared library, rather than just the C++ runtime library. Aprobe automatically instruments this function, "__Throw" in the predefined libC.a library, but not in user-provided libraries. For that, you must use a special probe, cppexcmodules.apc, edited to name your library or libraries.

    20.23 Why don't my on_line probes work?

    This is likely because the code you are probing was compiled with optimization. Check your Makefile to see whether CFLAGS or CXXFLAGS contains -O.

    20.24 How do I probe a C++ application's CPU usage?

    Unless you're an Aprobe or RootCause power user, the way to do this is with the statprof predefined probe (Unix platforms only). If possible, use it in an environment where the application terminates normally or with Ctrl-C (but not "kill -9"). Simply put "-u statprof" on the command line or in the .apo file, and when you format, a table will be generated showing which functions used what percentage of CPU. Details are in the User's Guide.

    If your application doesn't terminate normally you'll need to force a snapshot, as described below. If the output of statprof says something like:

    
      56.7    0.59     Other functions (not in profiled module)
    
    then you can see the usage throughout all modules by re-running with -u statprof -p -c, where -c means "coarse" and shows the usage of all modules. If the usage was mostly in, say, "libXm.a(shr4.o)", then you can rerun to analyze just that module with -u statprof -p "libXm.a(shr4.o)".

    20.25 How do I probe a C++ application's memory usage?

    Aprobe has predefined uals for watching memory use:
    • memcheck
    • memwatch
    • memstat
    Some read configuration files, though a ual will generate a default configuration file if it doesn't read a user file.

    The memcheck ual watches for things like spilling over the limit of a memory area. memcheck requires no configuration files and simply checks standard allocation and deallocation routines. It checks the validity of allocated data on normal program termination, memory signal, or explicit request via call to ap_Memcheck_DoCheckpoint.

    The memwatch ual can detect things like unfreed memory accumulating. It doesn't have a configuration file, but requires that the program terminate normally to dump its data. If the program doesn't terminate normally you can use dbx to force a snapshot.

    The memstat probe is used primarily with the RootCause GUI because it requires some configuration, but is much more usable with respect to overhead and analysis. For more details on this probe, see RootCause Memory Tracking Probes on the web site.

    20.26 How can I interactively debug an application in real time?

    Debugging a real-time application with dbx (or gdb) is usually tricky, because the debugger must attach to the process in real-time. Aside from the problem of hitting a moving target, hitting the target stops the process. With Aprobe, both problems are easily solved using a custom probe. The model for the probe is below, but an introduction to the concept is needed:

    The Aprobe solution is to write a probe which monitors for a reason to debug, and forks a copy of the real-time process when the monitor sees a need. The parent process then continues, while the copy stalls itself in the probe so dbx can attach to the copy. Here is the model, and a talk-through follows the model:

    
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/types.h>
    #include <unistd.h>
    probe thread {
     probe "somewhere_where_there_can_be_a_problem" {
       on_line (where_there_can_be_a_problem) { // or on_entry or on_exit
         // problem_detected is a placeholder for whatever test detects
         // the elusive problem the user is watching for
         if (problem_detected) {
           // here is the guts of the probe
           int normal = $some_reference; // save a normal state, explained below
           pid_t child;
           child = fork();
           if (child > 0) fprintf(stderr,  // parent: report the copy's pid
              "Oops -- such-and-such happened -- gdb xxx %d\n", child);
           else if (child == 0) while (++child) { // copy: count seconds
             if (child > 600) exit(1); // kill if unused in ten minutes
             if ($some_reference==normal) sleep(1); // stay in the probe
             else {$some_reference = normal; break;} // leave the probe
           }
         }
       }
     }
    }
    
    The 10-minute stall loop stops counting as soon as dbx attaches. If the user finishes digging and detaches dbx, loop counting would resume and the probe would kill the application copy if the user forgot to kill it. But if the user wants to set breakpoints and resume the application copy out of the stall to debug it, the method is to use dbx-set to change a chosen piece of static data and dbx-continue. The probe sees a state change, restores the saved state, and returns from the probe. This is the only way the throwaway child process would execute beyond the probe.

    Debugging the forked process over a breakpointed path goes beyond interactive data digging at the point of a problem, and may not be needed for every problem. If not, there is no need to choose a static integer visible to dbx and the probe.

    This "living dump" concept is useful for distributed applications, because the parent application process is unaffected by this probe. The whole distributed operation should be unaffected. Yet the user would have an attachable copy of a troubled process that might have stalled itself while the cause of a problem was still visible. Digging for the problem can be leisurely, since it makes no difference if the parent process continues or ends.

    20.27 How do I get the size of my "std::list<std::string>" object generated by g++?

    Different compilers have different low-level implementations for these and it's best to just call the C++ size method if possible. This worked on our RH8 gcc 2.95.2 system:

    
    probe thread
    {
       probe extern:"::myroutine(void)"
       {
          on_entry
          {
             // The list is in a variable called my_list.
             // We need to call list.size ():
             log 
    ($("list<basic_string<char,string_char_traits<char>,__default_alloc_template<false,0> >,allocator<basic_string<char,string_char_traits<char>,__default_alloc_template<false,0> > > >::size(void)const") (&$my_list));
          }
       }
    }
    

    I found the routine's fully qualified name using apsymbols (or apcgen) and grepping for "size".

    20.28 What do I do if my program dumps core when run with Aprobe?

    For possible reasons for such crashes, see questions 13.13, 13.16 and 20.19. If you have a core file, keep reading.

    The first thing to check is whether any probes you have written are responsible for illegal memory references. These will cause core dumps just like any C or C++ program. If you have a machine-level debugger installed you can usually use it to get a stack trace. On AIX and Solaris:
       dbx /full/path/of/your-application /find/the/core-file
    On Linux:
       gdb /full/path/of/your-application -c /find/the/core-file

    (That is, the first argument is the name of your executable, and the second is the path to the core file it dropped, which should be in the program's working directory.) Then enter the where command, which gives the stack trace at the point of the core dump.

    (On Solaris, dbx is part of the Sun Workshop toolset and may not be installed on your target system if your applications run on a different host than they are compiled on. Similarly, on AIX you need to have the bos.adt.debug fileset installed.)

    If the stack trace includes a function name which looks like:
       OnExit_0094_L0013(...
    then the core dump probably occurred in one of your own probes. Look at the integer in the third part of the name: this is the line number of the 'probe' directive in the .apc file (in this case, 13). You may also see names beginning 'OnEntry' or 'OnOffset'.
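
    If you have many stack traces to sift through, the line number can be pulled out of such a name mechanically. This is a sketch in plain C; it assumes that OnEntry and OnOffset names follow the same layout as the OnExit name shown above, and probe_line_from_frame is a hypothetical name.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Return the .apc line number encoded in a frame name such as
 * "OnExit_0094_L0013(...", or -1 if the name does not have that shape.
 * ASSUMPTION: OnEntry and OnOffset names use the same layout as OnExit. */
int probe_line_from_frame(const char *frame)
{
    static const char *const prefixes[] = { "OnEntry_", "OnExit_", "OnOffset_" };
    int id, line;

    assert(frame != NULL);
    for (size_t i = 0; i < sizeof prefixes / sizeof prefixes[0]; i++) {
        size_t len = strlen(prefixes[i]);

        /* %d reads decimal digits, so the leading zeros are harmless. */
        if (strncmp(frame, prefixes[i], len) == 0 &&
            sscanf(frame + len, "%d_L%d", &id, &line) == 2)
            return line;
    }
    return -1;
}
```

    For the name shown above this returns 13, the line of the 'probe' directive to look at in the .apc file.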

    If dbx complains that the core file doesn't match your application, you should run:

    On Solaris:    dbx $APROBE/bin/aprobe /find/the/core-file
    On AIX:    dbx $APROBE/bin/aprobe.exe /find/the/core-file
    On Linux:    gdb $APROBE/bin/aprobe -c /find/the/core-file

    Send the output of the where command to support@ocsystems.com and it should give us a clue. Remember to state what version of RootCause/Aprobe you are running (this is reported by 'apconfig' or 'aprobe -h | head').

    AIX only: slibclean to correct shared module problems:
    Lastly, run 'slibclean' and see if that fixes the problem. 'slibclean' is an AIX utility which removes unused shared modules from the system's memory. It requires root access, but some sites elect to make this application 'setuid' so it can be run by ordinary users.

    Allowing full core files
    In the event dbx complains about a truncated core file, you should verify that your environment allows full core dumps. This entails the following steps:

    1. Login as the same user that runs the application and run: ulimit -c
    2. If this does not report 'unlimited', then the account ulimit for core files needs to be set with: ulimit -c unlimited
    3. If this command returns an error, contact your sysadmin to adjust the account's 'hard core file limit'. If you are running your application from a login shell, you will need to logout and login again for the change to take effect.
    4. AIX only: check that the operating system allows full core dumps with:
         lsattr -E -l sys0 | grep fullcore
      This should report fullcore true. If not, the sysadmin needs to enable full core files through smitty System Environments->Change / Show Characteristics of Operating System->Enable full CORE dump.

    21. Licensing

    OC Systems spends a surprising amount of time helping users get licensing set up on their machines. Here are a few of the most common questions and answers.

    21.1 What do we do with a license key that looks like "ocs-Aprobe-48833..."?

    This is a decimal format key for use in the prompt that appears during installation. It is a single text string with no blanks or line breaks.

    • If you haven't yet installed RootCause/Aprobe, go ahead and install it and give this key at the prompt. When installation has completed, RootCause will be ready to use.
    • If you've already got an installation, append the exact single text line to the file $APROBE/licenses/license.dat.
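
    For an existing installation, appending a single-line key might look like the sketch below. append_license is a hypothetical helper written for this FAQ, and KEY stands for the complete key text exactly as you received it. The helper skips the append if the same line is already in the file.

```shell
# Sketch: append a single-line license key to a license file.
# append_license is a hypothetical helper, not an Aprobe command.
append_license() {
    license_file=$1
    key_text=$2
    # Do nothing if this exact line is already present.
    if [ -f "$license_file" ] && grep -qxF -- "$key_text" "$license_file"; then
        return 0
    fi
    printf '%s\n' "$key_text" >> "$license_file"
}

# Usage, with KEY set to the full key string from your mail message:
#   append_license "$APROBE/licenses/license.dat" "$KEY"
```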

    21.2 What do we do with a license key that looks like "FEATURE ..."?

    This is a human-readable format key, and can't be used at the prompt that appears during installation.

    • If you haven't yet installed RootCause/Aprobe, go ahead and install it, but give no key. Just confirm that it's ok to proceed without a key. Then proceed to the next step.
    • If you've already got an installation, append the exact text lines given in the mail message to the file $APROBE/licenses/license.dat.

    21.3 Unix: How do I start a second license server just for Aprobe?

    When there is already a license server running on the machine and you want to start another one just for Aprobe, here's how to do it. This should apply to all Unix hosts.

    The procedure for running a second license server on the same host is very simple.

    When you are issued a concurrent-use license for Aprobe, it will include a line like the following:

    SERVER my.server.name 000347b371fe

    You should amend this line by adding a third parameter to the SERVER directive, which will be the port the license server will listen on for client requests. The server and clients all read this same file. The default port for FlexLM is 27000 but any available port number can be specified. One convention is to use the next available port higher than 27000, for example:

    SERVER my.server.name 000347b371fe 27001

    This parameter is the only one needed to support multiple flex servers on the same host.
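
    If you would rather script the change than edit the file by hand, something like the following sketch would do it. add_server_port is a hypothetical helper; it assumes the SERVER line has exactly the three fields shown above (keyword, host name, host ID) and leaves every other line, including a SERVER line that already has a port, untouched.

```shell
# Sketch: append a port number to a SERVER line that does not have one.
# add_server_port is a hypothetical helper; it reads the license file on
# standard input and writes the amended file to standard output.
add_server_port() {
    awk -v port="$1" '
        # SERVER line with only host name and host ID: append the port
        $1 == "SERVER" && NF == 3 { print $0, port; next }
        # everything else is passed through unchanged
        { print }
    '
}

# Usage: write to a new file and review it before replacing the original:
#   add_server_port 27001 < license.dat > license.dat.new
```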

    21.4 AIX: How do I start lmgrd when the machine boots?

    These instructions apply to any services which need to be started at boot time on AIX, not just lmgrd.

    The details of these instructions may or may not be applicable to your own situation, depending on the exact configuration of your systems. You should consult your local policies and support organizations and convince yourself that the suggestions made here are appropriate before putting them into practice.

    That said, this is fairly basic stuff.

    We will use the 'mkitab' command to add an entry to the /etc/inittab file. This command is used in place of simply editing inittab, because it helps to ensure that the integrity of inittab is maintained. If you were to make even a small error while editing inittab, the system could become unbootable. The mkitab command helps to alleviate this risk.

    With root authority, execute the following command:

    mkitab -i rcnfs rclocal:2:once:/etc/rc.local

    This adds an entry to inittab immediately following the 'rcnfs' entry, which instructs the init program to run /etc/rc.local and not wait for it to complete before proceeding with the rest of system initialization. Thus, in /etc/rc.local you will probably be able to take advantage of services which may only be available on NFS filesystems (again, depending on the exact configuration of the system you are installing on; its inittab may not even mention rcnfs, in which case you would need to determine the correct point at which to add your local startup script).

    You should then create the /etc/rc.local file, set its execute permission, and add to it the appropriate commands to start lmgrd and log its output, as well as whatever other site-specific initializations you may need to perform, not limited to OCS products.
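
    A minimal /etc/rc.local might look like the sketch below. Every path in it is an assumption to be replaced with your site's actual locations, and start_lmgrd is a hypothetical function name. The -c option names the license file and -l names the file that receives lmgrd's log output. Many sites prefer to run lmgrd under a non-root account; that is left out here to keep the sketch short.

```shell
#!/bin/sh
# Sketch of /etc/rc.local. All three paths below are assumptions;
# substitute the locations used at your site.
LMGRD=${LMGRD:-/usr/local/flexlm/lmgrd}
LICENSE_FILE=${LICENSE_FILE:-/usr/local/aprobe/licenses/license.dat}
LMGRD_LOG=${LMGRD_LOG:-/var/tmp/lmgrd.log}

# start_lmgrd is a hypothetical helper name.
start_lmgrd() {
    # Refuse to continue if the lmgrd binary is missing.
    [ -x "$LMGRD" ] || return 1
    # -c: license file to serve; -l: file for lmgrd's log output
    "$LMGRD" -c "$LICENSE_FILE" -l "$LMGRD_LOG"
}

start_lmgrd || echo "rc.local: could not start $LMGRD" >&2

# Other site-specific initializations would follow here.
```

    After creating the file, set its execute permission with 'chmod +x /etc/rc.local'.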

    Reboot the system and verify correct operation.