 
<div>
 
  
=== [#rootcause  1. RootCause FAQ] ===
 
 
<br />[#q1.1  1.1 What is RootCause?]<br />[#q1.2  1.2 What are some potential uses of RootCause?]<br />[#q1.3  1.3 How do I get started quickly with RootCause?]<br />[#q1.4  1.4 Who can use RootCause?]<br />[#q1.5  1.5 For which platforms is RootCause available?]<br />[#q1.6  1.6 How do I get technical support?]<br />[#q1.7  1.7 Do I really need a C compiler to use RootCause?]<br />[#q1.8  1.8 What documentation is available for RootCause?]<br />[#q1.9  1.9 How is RootCause licensed?]<br />[#q1.10  1.10 In what language(s) can my program be written?]<br />[#q1.11  1.11 What compiler(s) must have been used to compile my program?]<br />[#q1.12  1.12 Do I need to build the program with debug to trace it?]<br />[#q1.13  1.13 What do these terms mean: probes, console, agent, logging, etc.?]<br />[#q1.14  1.14 Is there any way to attach with RootCause to a running application?]<br />[#q1.15  1.15 Why should I update to the current version of RootCause?]<br />[#q1.16  1.16 What Java (JVM/JRE) versions are supported for use with RootCause?]</div><div>
 
 
=== [#installation  2. Installation] ===
 
 
<br />[#q2.1  2.1 Why does install_rootcause offer to install in a directory called "aprobe"?]<br />[#q2.2  2.2 When the installation prompts for a compiler, does it want the one that builds my application?]<br />[#q2.3  2.3 The installation process prompts me for a license key, but I don't have one right now; can I continue?]<br />[#q2.4  2.4 The installation prompts me for a single-line license key, but the one I have consists of several lines; do I just paste it in there?]</div><div>
 
 
=== [#console  3. The RootCause Console (GUI)] ===
 
 
<br />[#q3.1  3.1 Why does the command <code>rootcause open</code> fail with Java errors?]<br />[#q3.2  3.2 How can I see the whole context menu when I click the right mouse button (MB3) on something at the bottom of the screen?]<br />[#q3.3  3.3 Can I just use my Web Browser instead of the built-in Help Viewer?]<br />[#q3.4  3.4 Can I run the RootCause GUI on Windows to view data collected on my Unix system?]<br />[#q3.5  3.5 Is it possible to monitor a Java program without entering the classpath, working directory, etc. that the New Workspace dialog prompts for?]</div><div>
 
 
=== [#log  4. The RootCause Log] ===
 
 
<br />[#q4.1  4.1 Can I trace any and all of the executables that I see in the log? Are there some restrictions?]<br />[#q4.2  4.2 Why do I see two identical copies of a program in the RootCause Log?]<br />[#q4.3  4.3 Why don't I see the program I want to trace listed in the RootCause log?]<br />[#q4.4  4.4 I ran only one application with rootcause on, and I see about a dozen processes in the RootCause log; where did they come from?]<br />[#q4.5  4.5 Can I cause only APP_TRACED events to show up in the RootCause Log?]<br />[#q4.6  4.6 How do I clear the RootCause log?]<br />[#q4.7  4.7 Does the RootCause log wrap around? If so, how do I set the wraparound size?]<br />[#q4.8  4.8 Can I locate my .rootcause directory somewhere other than $HOME?]<br />[#q4.9  4.9 Is there a way to keep the RootCause Log window from appearing when I start rootcause?]</div><div>
 
 
=== [#workspace  5. The Workspace Window] ===
 
 
<br />[#q5.1  5.1 Should I say Yes or No to "Application is not registered with workspace" dialog?]<br />[#q5.2  5.2 What does the blue dot mean in the Predefined UALs part of the Workspace Tree?]<br />[#q5.3  5.3 Where do I find out about the Predefined UALs listed here?]</div><div>
 
 
=== [#tracesetup  6. The Trace Setup Dialog] ===
 
 
<br />[#q6.1  6.1 What does &lt;Unknown File&gt; mean in the Trace Setup tree?]<br />[#q6.2  6.2 What do the black and blue dots mean in the Trace Setup tree?]<br />[#q6.3  6.3 How do I trace a dynamically loaded shared library (DLL)?]<br />[#q6.4  6.4 What's the difference between "Don't Trace..." and "Remove Probes..."?]<br />[#q6.5  6.5 I've got a UAL that I compiled with the apc command -- how do I get that into RootCause?]<br />[#q6.6  6.6 Why don't I see all the symbols shown by "apinfo" or "apcgen -L" in the Trace Setup window?]<br />[#q6.7  6.7 I define APROBE_SEARCH_PATH to include my source location, but the RC GUI still isn't finding my source. Why?]<br />[#q6.8  6.8 How can I see and dump parameters for C functions for which there are symbols but no debug information?]<br />[#q6.9  6.9 How can I turn on trace just when I'm in a chosen method or function?]<br />[#q6.10  6.10 How can I enable my custom probe only when Trace is also enabled?]<br />[#q6.11  6.11 I notice "Disable Tracing" does not affect the "exception" predefined probe. How can I disable that as well?]<br />[#q6.12  6.12 How can I trace and time everything between point A and point B?]<br />[#q6.13  6.13 How can I allow all Java parameters to be traced?]</div><div>
 
 
=== [#tracedisplay  7. The Trace Display (Event) Dialog] ===
 
 
<br />[#q7.1  7.1 Why are some functions found in the traced Events not found in the Trace Setup?]<br />[#q7.2  7.2 Why are some Java methods found in the traced Events not found in the Trace Setup?]<br />[#q7.3  7.3 RootCause keeps asking to find a source file. Is there a way to just point to this once without specifying the path to every file we wish to view?]<br />[#q7.4  7.4 The trace shows a problem in third-party software; what's the best way to pass this along to them?]<br />[#q7.5  7.5 RootCause shows signal 11 during my Java application run, but there was no crash. Is this a valid signal 11?]<br />[#q7.6  7.6 When I trace a Java synchronized method, does the method time include lock delay time?]<br />[#q7.7  7.7 Why was <code>malloc()</code> listed as being LOAD_SHED in the Trace Display when it really wasn't?]<br />[#q7.8  7.8 When formatting my data, an error pops up saying, "The maximum event tree size ... has been reached." What do I do?]<br />[#q7.9  7.9 I see that I can do "Save As XML": can I view this XML later?]<br />[#q7.10  7.10 How can I see just the major time-consuming children of nodes in the Trace Events Summary tree?]<br />[#q7.11  7.11 Do the times shown in Trace Events reflect the aprobe overhead?]<br />[#q7.12  7.12 How do I know what overhead to specify in the ''Set Statistics Overhead'' dialog?]<br />[#q7.13  7.13 What are the various times I'm seeing in the details pane for Enter and Exit nodes?]<br />[#q7.14  7.14 What are the various times and percentages I'm seeing in the Details panes on nodes in the Event Summary tree?]<br />[#q7.15  7.15 Is there a way to save the text for a specific node in the Trace Events tree?]<br />[#q7.16  7.16 Can I copy a Trace Events node to the clipboard to be pasted elsewhere?]<br />[#q7.17  7.17 I know my method was executed many times, so why isn't it in the Performance Summary table?]</div><div>
 
 
=== [#rootcauseandaprobe  8. RootCause and Aprobe] ===
 
 
<br />[#q8.1  8.1 How do I adjust the Trace "DefaultLevels" option so only a fixed depth is traced when an application is run with RootCause?]<br />[#q8.2  8.2 How can I use Aprobe's predefined probes (profile, coverage, events, memwatch, statprof) with RootCause?]<br />[#q8.3  8.3 Is it possible to develop in Aprobe, but still use the RootCause "intercept" mechanism?]<br />[#q8.4  8.4 If RootCause is built on Aprobe, and RootCause supports Java, is there an Aprobe for Java?]<br />[#q8.5  8.5 How do I add my own UAL to the RootCause trace?]<br />[#q8.6  8.6 How can I use the Events probe with RootCause?]</div><div>
 
 
=== [#runtime  9. RootCause at Run Time] ===
 
 
<br />[#q9.1  9.1 Can I just leave RootCause "on" all the time? For example, while I power down and power up my computer? I was thinking that it would be interesting to see all the processes as my computer boots.]<br />[#q9.2  9.2 How much will RootCause slow my application?]<br />[#q9.3  9.3 How can I trace Linux daemons with RootCause?]<br />[#q9.4  9.4 How do I apply RootCause to applications run at boot time?]<br />[#q9.5  9.5 Can I apply different workspaces (or none at all) for the same program invoked with different command-line parameters?]<br />[#q9.6  9.6 How can I "intercept" a Java server on AIX?]<br />[#q9.7  9.7 How can I dump Java objects with a probe on a known program point, rather than at a certain elapsed time as done by java_memstat?]</div><div>
 
 
=== [#j2ee  10. RootCause J2EE Support] ===
 
 
RootCause J2EE support has been discontinued with the introduction of the OC Systems "RTI Enterprise" product. See [http://rtiperformance.com rtiperformance.com]. <br />
 
 
</div><div>
 
 
=== [#troubleshooting  11. RootCause TroubleShooting] ===
 
 
<br />[#q11.1  11.1 I applied a Trace on a function (method) in the RootCause GUI, but I don't see it being called in the output. Why?]<br />[#q11.2  11.2 I add a library as a dynamic module and trace the init function, but the trace doesn't show up. Why?]<br />[#q11.3  11.3 I Add Dynamic Module of mylib.so, then specify some traces in mylib.so. But when I run the program, those traces don't appear. Why?]<br />[#q11.4  11.4 I did Custom..., and saved my probes to an APC file, but those probes don't show up in my trace. Why?]<br />[#q11.5  11.5 How do I ''stop'' tracing something I've got a workspace for?]<br />[#q11.6  11.6 What do I do about the message "(E) ADI checksum (0x84b1c4d) does not match module checksum (0xa1c5e35)." when I register on a .dply file at a remote site?]<br />[#q11.7  11.7 Why does my Java app fail with "Class Not Found" under RootCause, but work fine without RootCause?]<br />[#q11.8  11.8 How can I probe Java classes loaded with a custom class loader and so not in the CLASSPATH?]<br />[#q11.9  11.9 When I have "rootcause on" I sometimes notice that commands piped together (for instance "env|grep MyVariable") can hang for a while before completing. Why is this?]<br />[#q11.10  11.10 When I add my library to the workspace with Add Dynamic Module and run with RootCause, my application never starts. What's wrong and how can I fix it?]<br />[#q11.11  11.11 Is there a way to add my own files to a deploy file so they will unpack into the directory created by <code>rootcause register xxx.dply</code>?]<br />[#q11.12  11.12 Why doesn't the pi_demo program run on my new Linux version?]<br />[#q11.13  11.13 Why didn't my trace on Linux log any data?]</div>
 
 
<div>
 
 
=== [#aprobe  12. Aprobe FAQ] ===
 
 
<br />[#q12.1  12.1 What is Aprobe?]<br />[#q12.2  12.2 What is ProbePak?]<br />[#q12.3  12.3 What are some potential uses of Aprobe?]<br />[#q12.4  12.4 How do I get started quickly with Aprobe?]<br />[#q12.5  12.5 Who can use Aprobe?]<br />[#q12.6  12.6 What different versions of Aprobe are there?]<br />[#q12.7  12.7 For which platforms is Aprobe available?]<br />[#q12.8  12.8 How do I get Aprobe?]<br />[#q12.9  12.9 What documentation is available for Aprobe?]<br />[#q12.10  12.10 What tools make up Aprobe?]<br />[#q12.11  12.11 How is Aprobe licensed?]<br />[#q12.12  12.12 Is there a point-and-click (GUI) interface to Aprobe?]<br />[#q12.13  12.13 Can I run Aprobe on any executable program file?]<br />[#q12.14  12.14 In what language(s) can my program be written?]<br />[#q12.15  12.15 What compiler(s) must have been used to compile my program?]<br />[#q12.16  12.16 (Unix) How do I tell if a program file is "stripped"?]<br />[#q12.17  12.17 How do I tell what symbols a program has available?]<br />[#q12.18  12.18 What do I do to get symbols in my program?]<br />[#q12.19  12.19 What do I do to get "debug information" in my program?]<br />[#q12.20  12.20 How do I tell if a program file has "debug information"?]<br />[#q12.21  12.21 What is a "probe"?]<br />[#q12.22  12.22 What is a "UAL" (.ual file)?]<br />[#q12.23  12.23 What is "logging"?]<br />[#q12.24  12.24 What is an ".apd" file?]<br />[#q12.25  12.25 What can't I do if my executable or library doesn't have debug information?]<br />[#q12.26  12.26 Does use of on_line() require the application to have debug information?]<br />[#q12.27  12.27 What is the maximum number of probes allowed?]<br />[#q12.28  12.28 Is there access to C++ private/protected variables?]<br />[#q12.29  12.29 Is there any way to attach with Aprobe to a running application?]<br />[#q12.30  12.30 Is there a way to probe a function for which no symbol is available?]</div><div>
 
 
=== [#aprobecommand  13. Using the "aprobe" Command] ===
 
 
<br />[#q13.1  13.1 What does "aprobe" do?]<br />[#q13.2  13.2 How do I specify options to my program when using aprobe?]<br />[#q13.3  13.3 How do I specify options to my probes?]<br />[#q13.4  13.4 How do I print my output at run time instead of sending to the APD file?]<br />[#q13.5  13.5 Can I suppress generating an ".apd" file?]<br />[#q13.6  13.6 How can I run my probes without invoking aprobe?]<br />[#q13.7  13.7 How do I probe a function in a dynamically-loaded shared library?]<br />[#q13.8  13.8 Can I probe a function in native C or C++ code loaded by a Java application?]<br />[#q13.9  13.9 Is there a way I can use Aprobe in a target environment where my application has no symbol or debug information with it (is stripped)?]<br />[#q13.10  13.10 Can I run aprobe but produce no APD files?]<br />[#q13.11  13.11 Why does my program crash when using aprobe, and not without?]<br />[#q13.12  13.12 AIX: Aprobe version 3.2 had the -s1 option to prevent conflicts with my application's shared memory. Is there a similar feature in version 4.2?]<br />[#q13.13  13.13 Why does Aprobe ask for such a large memory-mapped file on startup, when I've specified only a 4M APD file with "-s"?]<br />[#q13.14  13.14 When I run my application under Aprobe it crashes during initialization with a problem in malloc. This doesn't happen without Aprobe. Why?]</div><div>
 
 
=== [#apformat  14. Using the "apformat" Command] ===
 
 
<br />[#q14.1  14.1 What does apformat do?]<br />[#q14.2  14.2 Which of the ".apd" files do I specify on the command-line?]<br />[#q14.3  14.3 Can I restrict the apformat output to just that generated by one of the several UALs provided at aprobe time?]<br />[#q14.4  14.4 Can I restrict the apformat output to just that generated by one or two of my format routines?]<br />[#q14.5  14.5 Can I programmatically filter which formats are used?]<br />[#q14.6  14.6 Can I do the previous two if I'm using automatically generated formats?]<br />[#q14.7  14.7 When do I need to specify the UAL file to apformat?]<br />[#q14.8  14.8 Can I use "apformat" without an APD file?]<br />[#q14.9  14.9 Aprobe works fine, but I get a crash from apformat; why?]<br />[#q14.10  14.10 Can I use ap_UalArgv in "probe format ... on_entry" to get arguments passed at run-time (aprobe time)?]</div><div>
 
 
=== [#predefined  15. Using Predefined Probes] ===
 
 
<br />[#q15.1  15.1 What is a predefined probe?]<br />[#q15.2  15.2 Do I have to use "apc" to build these probes myself?]<br />[#q15.3  15.3 The examples show invocation of predefined probes using aprobe -u info myprog.exe. How does aprobe find these UALs when they're not in the current directory?]<br />[#q15.4  15.4 Can I use Coverage without using the Java configuration GUI?]<br />[#q15.5  15.5 The trace probe really slows down the program--how can I speed it up?]<br />[#q15.6  15.6 How can I get a snapshot of my predefined probe data before my program dumps core?]<br />[#q15.7  15.7 Is there a way to invoke predefined probe operations from within my probes?]<br />[#q15.8  15.8 How can my probes use the Java GUI facilities that the predefined probes use?]<br />[#q15.9  15.9 I'd like to customize a predefined probe -- how do I rebuild it?]<br />[#q15.10  15.10 How do I use the coverage probe with multiple test cases?]<br />[#q15.11  15.11 Where did the "heap" probe go?]<br />[#q15.12  15.12 How do I use this "events" probe everyone's talking about?]<br />[#q15.13  15.13 In the `profile' probe, what do "Calls to Self/Child" columns mean?]<br />[#q15.14  15.14 Why don't memstat, memwatch, heap probes work on my application?]<br />[#q15.15  15.15 Can you please explain the fields "Alloc Count" and "Free Count" in the memstat "Outstanding Allocation" report?]<br />[#q15.16  15.16 Can I use memstat to track ''all'' allocations and frees?]<br />[#q15.17  15.17 Is there a way to only report allocations in a certain module based on the stack traceback entries?]<br />[#q15.18  15.18 Is there a predefined probe for detecting memory corruption?]<br />[#q15.19  15.19 Is there a predefined probe for tracking down lock contentions?]<br />[#q15.20  15.20 What options in the trace.cfg file are obsolete, and why?]<br />[#q15.21  15.21 Why does the memstat summary file say it can't do the analysis because I only have one sample?]<br />[#q15.22  15.22 How do I force a snapshot from a predefined probe?]<br />[#q15.23  15.23 Could you explain the memstat summary's "Leaked Memory" and "Total Leakage" values?]<br />[#q15.24  15.24 How can I define a memstat (or memwatch) filter matching any number of call levels?]<br />[#q15.25  15.25 Is there a probe to check for stack corruption?]</div><div>
 
 
=== [#apc  16. Using the "apc" Command] ===
 
 
<br />[#q16.1  16.1 What does apc do?]<br />[#q16.2  16.2 How do I indicate what C compiler and options apc should use?]<br />[#q16.3  16.3 Do I need to specify an object file or executable to apc?]<br />[#q16.4  16.4 How do I specify other object files to link into my UAL?]<br />[#q16.5  16.5 apc says my function name's not known--why not?]<br />[#q16.6  16.6 How do I generate debug information for my APC files so line and function information show up in tracebacks?]<br />[#q16.7  16.7 Can I specify an environment variable for the compiler path in the compiler_profiles file?]<br />[#q16.8  16.8 How do I compile a probe for a 32-bit app when running 64-bit Linux?]</div><div>
 
 
=== [#writing  17. Writing Probes in APC] ===
 
 
<br />[#q17.1  17.1 How do I use "apcgen" to generate a probe automatically?]<br />[#q17.2  17.2 How do I write a "probe"?]<br />[#q17.3  17.3 What is the difference between APC and straight C?]<br />[#q17.4  17.4 Why do I need a "probe thread"?]<br />[#q17.5  17.5 What's the difference between "probe thread" and "probe program"?]<br />[#q17.6  17.6 When exactly are the "on_entry" and "on_exit" parts of a function probe executed?]<br />[#q17.7  17.7 Why can't I dump some parameters in the on_exit part?]<br />[#q17.8  17.8 Why is my local variable "unknown" in on_entry and on_exit parts?]<br />[#q17.9  17.9 Is there a way to probe "the first line" or "the last line" in my function?]<br />[#q17.10  17.10 How do I specify which of several overloaded functions I want to probe?]<br />[#q17.11  17.11 How do I reference a hardware register?]<br />[#q17.12  17.12 How do I query the parameters to a function?]<br />[#q17.13  17.13 Can I use automatic formatting if I don't have an executable with debug information?]<br />[#q17.14  17.14 How do I change the return value from a function?]<br />[#q17.15  17.15 How do I log the value of a string parameter?]<br />[#q17.16  17.16 How do I log the contents of an array?]<br />[#q17.17  17.17 How do I "stub out" the probed function so it does nothing?]<br />[#q17.18  17.18 How do I query the data in a class when probing a member function?]<br />[#q17.19  17.19 How do I query a global (or static) variable when there's a local one of the same name?]<br />[#q17.20  17.20 Can I reference a static variable that wouldn't normally be visible to my probed function?]<br />[#q17.21  17.21 Can I call a function in my program from within a probe?]<br />[#q17.22  17.22 Can my APC files reference names in one another like a C program?]<br />[#q17.23  17.23 Can I call a function in another UAL?]<br />[#q17.23  17.23 How do I change the return code from my Unix program?]<br />[#q17.24  17.24 How do I print or change a GNAT Ada string value in my probe?]<br />[#q17.25  17.25 How can I just log some data and format it as hex?]<br />[#q17.26  17.26 How do I log information about each thread as it starts?]<br />[#q17.27  17.27 GNAT turns SIGSEGV into CONSTRAINT_ERROR; can I use Aprobe to get a core dump?]<br />[#q17.28  17.28 How can I get Aprobe actions to happen when my program dumps core?]<br />[#q17.29  17.29 Is there a way to find out where a signal occurs when it doesn't cause a core dump?]<br />[#q17.30  17.30 How can I reduce the overhead of my probes?]<br />[#q17.31  17.31 Can I use Aprobe on JOVIAL or Fortran programs?]<br />[#q17.32  17.32 How can I log a composite object without using debug information?]<br />[#q17.33  17.33 How can I cast a value to a type name from the program?]<br />[#q17.34  17.34 Is there a special editor or editor mode for APC?]<br />[#q17.35  17.35 How do I execute a probe only if a certain data condition is met?]<br />[#q17.36  17.36 How can I interactively modify the parameters to a routine in my application?]<br />[#q17.37  17.37 I'm trying to stub a function called by my program, but APC can't seem to find it.]<br />[#q17.39  17.39 I only want to probe malloc() if it's called by realloc(). How would I do that?]<br />[#q17.40  17.40 I have a GNAT Ada procedure that I'm stubbing out, but want to return a string value. The procedure has a declaration similar to the one below. What's the APC?]<br />[#q17.41  17.41 Is there a simple probe that just traces the lines in one routine?]<br />[#q17.42  17.42 How do I reference enumeration literals in APC?]<br />[#q17.43  17.43 Why does including &lt;math.h&gt; in my APC keep it from compiling? (I want to call the "pow()" function in my probe.)]<br />[#q17.44  17.44 How do I query an environment variable from within a probe?]<br />[#q17.45  17.45 The above looks like a useful utility. How can I structure my probes so it can be shared?]<br />[#q17.46  17.46 Can I define functions in one APC file and call them from another APC file?]<br />[#q17.47  17.47 I am trying to write an aprobe that will call an Ada routine in a package body, but the routine never seems to get called. Why?]<br />[#q17.48  17.48 How can I log a string passed to a library function like strdup() where there's no debug information?]<br />[#q17.49  17.49 Can I use Aprobe to change the command run by a call to system() from my application to run my own little script instead?]<br />[#q17.50  17.50 Is there a way to catch and suppress exceptions?]<br />[#q17.51  17.51 Can I track stack usage with Aprobe?]<br />[#q17.52  17.52 Is there a way to access local variables that doesn't depend on a hard-coded line number?]<br />[#q17.53  17.53 Can I use Aprobe to query a caller's local data that wouldn't be visible by normal visibility rules?]<br />[#q17.54  17.54 In APC I can reference some class members as fields of class objects, but others I cannot. Why?]<br />[#q17.55  17.55 How can I enable and disable probes externally while my program runs?]<br />[#q17.56  17.56 AIX: How do I convert my pre-version-3 APC file to current one?]<br />[#q17.57  17.57 (Unix) Is there a probe to see when my application "exec's" another program?]<br />[#q17.58  17.58 How can I cast an enumeration value to print its numeric value?]<br />[#q17.59  17.59 How can I detect memory overwrites on dynamically allocated (malloc'd) memory?]<br />[#q17.60  17.60 How do I know when my application has forked?]<br />[#q17.61  17.61 How do I know what lines I can probe in a function?]<br />[#q17.62  17.62 Is there a routine available to find symbol ids by mangled name, or one that will demangle for us?]<br />[#q17.63  17.63 Is there a way to suppress (or force) the warning when probing a symbol that is undefined?]<br />[#q17.64  17.64 Can I call a C++ method from a probe?]<br />[#q17.65  17.65 How do I print/change a C++ std::string object?]
 
 
</div><div>
 
 
=== [#java  18. Writing Java Probes] ===
 
 
<br />[#q18.1  18.1 How do I use Aprobe on a Java application?]<br />[#q18.2  18.2 Can I change the return value of a Java function?]<br />[#q18.3  18.3 Can I throw an arbitrary Java exception from my probe?]<br />[#q18.4  18.4 When using a Java custom probe, can I get output to appear in the Trace Display tree?]<br />[#q18.5  18.5 Is it possible to "stub" a Java method so it does not execute the code in the original method?]<br />[#q18.6  18.6 Is there any way to probe classes from rt.jar, e.g., java.io.*?]<br />[#q18.7  18.7 How do I call another method in the same class instance from within my Java method probe?]<br />[#q18.8  18.8 Can I add custom Java probes within the RootCause GUI?]<br />[#q18.9  18.9 Can I change the value of parameters passed to a Java method?]<br />[#q18.10  18.10 Can I log any Java variables other than method parameters?]<br />[#q18.11  18.11 Is there a way to define nested probes in Java similar to that supported in APC?]</div><div>
 
 
=== [#logging  19. Logging Data] ===
 
 
<br />[#q19.1  19.1 What's the difference between "logging" and "printing"?]<br />[#q19.2  19.2 Why do I get data mismatch warnings logging to my very simple format routine?]<br />[#q19.3  19.3 Why do my format routine parameters (usually) have to be pointers to the type logged?]<br />[#q19.4  19.4 How can I control the size of the APD file produced?]<br />[#q19.5  19.5 What is an "APD ring"?]<br />[#q19.6  19.6 How can I control what goes into each APD file?]<br />[#q19.7  19.7 How can I reduce the time that is spent logging data in my probes?]<br />[#q19.8  19.8 How can I log data so it's guaranteed to be available when I format, even if the APD ring wraps around?]</div><div>
 
 
=== [#other_aprobe_questions  20. Other Aprobe Questions] ===
 
 
<br />[#q20.1  20.1 Where does aprobe get its "time" from (e.g., for the profile probe)?]<br />[#q20.2  20.2 Why do my threads execute in a different order under aprobe?]<br />[#q20.3  20.3 It looks like if I run "aprobe -if", both the probe program and probe format get executed, which messes up initialization. How can I avoid this?]<br />[#q20.4  20.4 I have a probe on_exit to a function to change the struct that is returned. It causes a core-dump when the probed function is called as a procedure. What's the problem?]<br />[#q20.5  20.5 I want to capture the address of a target expression on entry in a pointer to the right target type. How do I declare this?]<br />[#q20.6  20.6 I want to probe a method in a template class. How do I refer to the method in the function probe on that method?]<br />[#q20.7  20.7 In what order do separate probes on the same function execute?]<br />[#q20.8  20.8 Is it possible to reference C files from my application from within my UAL?]<br />[#q20.9  20.9 Can I force a snapshot of my predefined probe data by sending a signal to my application?]<br />[#q20.10  20.10 How do I log multi-dimensional Ada arrays?]<br />[#q20.11  20.11 AIX: Why isn't my ual world readable?]<br />[#q20.12  20.12 AIX: When I use pthreads calls in my probes, the UAL won't link. Do I need to explicitly specify the library or change my compiler_profiles file?]<br />[#q20.13  20.13 Is there a way I can manage thread-specific data without using native thread-management routines?]<br />[#q20.14  20.14 How does using Aprobe for C++ differ from using Aprobe for C or Ada?]<br />[#q20.15  20.15 Why does my C application crash when run with Aprobe?]<br />[#q20.16  20.16 (AIX) My application, aprobe, or its tools run out of memory. What can I do?]<br />[#q20.17  20.17 My application, aprobe, or its tools are ''very'' slow starting up. What can I do?]<br />[#q20.18  20.18 (AIX) Why is the C++ exception raised in my libxml++-1.0.a library not reported by exceptions.ual?]<br />[#q20.19  20.19 Why don't my on_line probes work?]<br />[#q20.20  20.20 How do I probe a C application's CPU usage?]<br />[#q20.21  20.21 How do I probe a C application's memory usage?]<br />[#q20.22  20.22 How can I interactively debug an application in real time?]<br />[#q20.23  20.23 How do I get the size of my "std::list&lt;std::string&gt;" object generated by g++?]<br />[#q20.24  20.24 What do I do if my program dumps core when run with Aprobe?]</div><div>
 
 
=== [#licensing  21. Licensing] ===
 
 
<br />[#q21.1  21.1 What do we do with a license key that looks like "ocs-Aprobe-48833..."?]<br />[#q21.2  21.2 What do we do with a license key that looks like "FEATURE ..."?]<br />[#q21.3  21.3 How do I start a second license server just for Aprobe?]<br />[#q21.4  21.4 AIX: How do I start lmgrd when the machine boots?]</div>
 
----
 
<div>
 
  
 
== 1.  RootCause FAQ ==
 

Revision as of 17:41, 5 May 2017


RootCause/Aprobe FAQ

Frequently Asked Questions for RootCause and Aprobe (All Platforms)
Updated March 18, 2013

This document describes aspects of the products "RootCause" and "Aprobe" from OC Systems, Inc. (www.ocsystems.com):

  • RootCause FAQ
  • Aprobe FAQ

It consists of questions asked by evaluators and customers, as well as "artificial" questions intended to provide an introduction to the use of the products.

More complete and detailed descriptions of RootCause and Aprobe are provided by the User's Guides for those products, but this FAQ may provide answers not easily found there, and also includes specific code examples not applicable to a general User's Guide.

RootCause is built on Aprobe, a fully general mechanism for applying patches to programs without changing source or object code. See question 12.1, "What is Aprobe?", for more information.

Users are encouraged to send questions (and answers!) to .

This FAQ is Copyright (c) 2013 by OC Systems, Inc. ALL RIGHTS RESERVED.

Note to Windows and Solaris Users:

The last updates to RootCause/Aprobe for the Windows and Solaris platforms were versions 2.1.4b/4.3.4b in mid-2006. Support for these platforms was officially dropped in 2011. A recent update of this FAQ removed all questions and answers specific to those platforms. If by some unlucky chance you're still using them, see the old version of the FAQ (rc_aprobe_faq-2007.html).

Note to 64-bit RootCause/Aprobe Users:

Wherever you read APROBE in the questions and answers below, substitute APROBE64. The 64-bit version uses different file names and environment variables so that the 32- and 64-bit versions can coexist.
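The substitution rule is mechanical, so it can be expressed as a small string function. The C sketch below is illustrative only. `aprobe64_name` is a hypothetical helper and is not part of RootCause or Aprobe. It assumes the rule applies only to the uppercase token `APROBE` as written in this note, so lowercase command names such as `aprobe` are left alone. Check the product documentation for the actual 64-bit file and variable names.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not part of RootCause/Aprobe): applies this FAQ's
 * 64-bit naming rule by replacing each uppercase "APROBE" in `in` with
 * "APROBE64". Occurrences already followed by "64" are copied unchanged,
 * and the match is case-sensitive, so lowercase "aprobe" is untouched.
 * Writes a NUL-terminated result into `out` (capacity `outlen` bytes).
 * Returns 0 on success, or -1 if `out` is too small.
 */
int aprobe64_name(const char *in, char *out, size_t outlen)
{
    const char *token = "APROBE";
    const size_t toklen = strlen(token);
    size_t used = 0;

    if (outlen == 0)
        return -1;

    while (*in != '\0') {
        if (strncmp(in, token, toklen) == 0) {
            /* Copy "APROBE", keeping one byte free for the terminator. */
            if (used + toklen >= outlen)
                return -1;
            memcpy(out + used, token, toklen);
            used += toklen;
            in += toklen;

            /* Append "64" unless the name is already the 64-bit form. */
            if (strncmp(in, "64", 2) != 0) {
                if (used + 2 >= outlen)
                    return -1;
                out[used++] = '6';
                out[used++] = '4';
            }
        } else {
            /* Any other character is copied through unchanged. */
            if (used + 1 >= outlen)
                return -1;
            out[used++] = *in++;
        }
    }
    out[used] = '\0';
    return 0;
}
```

Under this rule, `APROBE_SEARCH_PATH` (mentioned in question 6.7) becomes `APROBE64_SEARCH_PATH`, `$APROBE/bin` becomes `$APROBE64/bin`, and a name that is already in 64-bit form is returned unchanged.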


1. RootCause FAQ

1.1 What is RootCause?

RootCause is a tool for developing and deploying traces that act as a software "flight recorder", simplifying and speeding root cause analysis and proactively monitoring the health and performance of the application. It can also be used to repair applications in the operational environment without rebuilding or reinstalling the software. RootCause is based on Aprobe (see question 12.1, "What is Aprobe?"), but goes beyond Aprobe in a number of important ways:

  • RootCause provides a GUI "Console" which supports the development of traces and other actions for data collection and modification and viewing of the resulting data.
  • RootCause provides a mechanism for identifying all processes started, and applying Aprobe to designated processes as they are run in their "natural" environment.
  • RootCause provides a mechanism for packaging and "deploying" a set of actions to a remote machine, and collecting the resulting data for offline viewing.
  • RootCause does all of the above for Java as well as C/C++, and supports mixed applications seamlessly.

This FAQ addresses questions that apply to these aspects of RootCause. The full power of Aprobe is delivered with RootCause, and is addressed by the Aprobe FAQ.

See also [#q12.1 "What is Aprobe?"] .

1.2 What are some potential uses of RootCause?

It's a long list. Here are just some of the uses of RootCause:

  • Performing root cause analysis after an application failure.
  • Identifying the cause of an application's incorrect operation.
  • Resolving performance bottlenecks.
  • Monitoring the ongoing health of an application and alerting engineers to problems before significant deterioration in performance occurs.
  • Repairing an application in the operational or test environment quickly, without having to rebuild, recompile, or reinstall the application.
  • Obtaining information about how beta users are testing an application; finding out what features are used and how they are accessed.
  • Integrating software applications.
  • Identifying the specific application or component which is causing a problem.
  • Tracking down memory usage problems.
  • Replacing or enhancing problem reports with execution details and dumps.
  • Monitoring compliance with an SLA.
  • Obtaining information about an application's execution when it isn't possible to replicate the user's environment.

For a more in-depth discussion of some of these, see the RootCause white papers.

1.3 How do I get started quickly with RootCause?

Do the Demos in chapter 5 of the User's Guide.

1.4 Who can use RootCause?

RootCause has several facets which apply to different classes of users. Technical support personnel will use it to gather information about a product in the field. Developers will use RootCause to develop traces that the support personnel can use, or which the developers themselves may use to track down problems. Testers might use it to gather data to provide back to developers to supplement test results.

1.5 For which platforms is RootCause available?

RootCause is available for 32-bit and (separately) for 64-bit executables on AIX and Linux (x86) platforms. (There is no longer a distinction between the Java and C/C++ versions.)

The detailed requirements are documented on the System requirements page.

1.6 How do I get technical support?

The best way is to send e-mail to , or phone 703-359-8160, extension 3. You can expect a quick response between 9am and 5pm Eastern US Time.

1.7 Do I really need a C compiler to use RootCause?

Only if you want to apply probes to native code. You can trace Java and native code, and dump Java parameters, without a C compiler. However, the only thing you can do with native code is trace it; you can't dump parameters or variables or generate probes (e.g., SNAPSHOT or COMMENT) because those are implemented by generating APC source code and then compiling it with Aprobe's apc compiler, which requires a C compiler backend.

1.8 What documentation is available for RootCause?

The on-line user's guide is available here.

RootCause is delivered with a User's Guide in HTML and PDF formats.

1.9 How is RootCause licensed?

The RootCause Console is licensed per-concurrent-developer. RootCause Agent (run-time) licenses may also be purchased to allow deploying probes outside the development environment. Licensing is enforced on a per-user basis or per-CPU basis with FlexLM. Contact our sales department for more information at .

If you already have a license but it's not working for you, see [#licensing "Licensing"] or [#q1.6 "How do I get technical support?"]

1.10 In what language(s) can my program be written?

Explicit support is provided for Java, C, C++ and Ada. Functions written in Assembler will work to the extent that they adhere to standard calling conventions.

Functions written in other high-level compiled languages (e.g., Fortran or JOVIAL) may also be probed if the probe doesn't reference source-level identifiers ("target expressions").

1.11 What compiler(s) must have been used to compile my native application program?

Almost any program with symbols can be probed. For "full support" (for referencing source lines and variable names and handling exceptions) you must use one of the compilers listed for each platform on the system requirements page. Here's a summary:

AIX

  • RootCause/Aprobe supports any IBM C or C++ compiler that runs on AIX 5.2 or newer.
  • gcc/g++ are no longer fully supported, but there is partial support for gcc and g++ versions 2.95.x, and for gcc versions 3.x when compiling with -gstabs.
  • If your program is Ada, OC Systems' PowerAda and (starting with version 4.4.2) GNATPro 5.04 are supported.

Linux

  • RootCause/Aprobe supports Linux x86 gcc and g++ at whatever version is shipped with generally available Red Hat Enterprise Linux and GNAT Ada releases.

1.12 Do I need to build the program with debug to trace it?

No, but for non-Java programs it helps. The suggested compromise is to build it with debug, develop your traces, then strip the debug information when shipping the product. This is fully discussed in Chapter 6 of the RootCause for C User's Guide, "Building a Traceable Application".

1.13 What do these terms mean: probes, console, agent, logging, etc.?

RootCause has many unique features which require a unique terminology to describe. See the glossary in Chapter 3 of the user's guide for their definitions. Some basics are:

agent

The part of the RootCause product which actually applies and enables the probes, also known as the Aprobe runtime.

console

The Graphical User Interface (GUI) used for developing probes and viewing the data logged by them.

log

verb: to efficiently record data into a memory-mapped file for later viewing.
noun: the RootCause log, a list of all programs run with "rootcause on".

probes

Programmatic actions to be inserted and executed at specific points in the probed application.

1.14 Is there any way to attach with RootCause to a running application?

No. See [#q12.29 Q12.29].

1.15 Why should I update to the current version of RootCause?

Full details are in the README file delivered with each version, available from the download page.

1.16 What Java (JVM/JRE) versions are supported for use with RootCause?

  • On AIX, IBM JVM versions 1.5 and 1.6 have been verified and are supported.
  • On Linux, RootCause has been tested with Oracle (Sun) Java 5 and OpenJDK. RootCause does not work with gcj.

We have provided support for older versions of Java for specific customers: please contact us if you have a specific need.

2. Installation

2.1 Why does install_rootcause offer to install in a directory called "aprobe"?

RootCause is a superset of Aprobe, and in fact shares the same installation script. You can choose a different name if you like.

2.2 When the Linux installation prompts for a compiler, does it want the one that builds my application?

No. RootCause for C/C++, like Aprobe, requires a C compiler to build the probes. This is not provided with Linux RootCause because it's assumed customers have gcc installed. If you don't, OC Systems can help you download and install it.

2.3 The installation process prompts me for a license key, but I don't have one right now; can I continue?

Yes. Just enter an empty string, ignore the warnings you may get, and then put the license key into the file license.dat in the licenses directory under the RootCause installation directory before you start using RootCause. See also [#licensing "Licensing"].

2.4 The installation prompts me for a single-line license key, but the one I have consists of several lines; do I just paste it in there?

No. Leave it blank as in [#q2.3 Q2.3], and see [#q21.2 Q21.2].

3. The RootCause Console (GUI)

3.1 Why does the command rootcause open fail with Java errors?

There could be a number of reasons. On AIX, RootCause does not include its own Java Runtime Environment (JRE), so if it's not found in your PATH or expected default locations, or if the Java found there has problems, you'll get errors. While Linux RootCause does include a Java 1.4 JRE, it may again be that it doesn't run right on your system for some reason.

In either case, the workaround is:
   export APROBE_JRE=`which java`
That is, set the global environment variable APROBE_JRE to the full path of the java command you want to use. This must be a Java 1.4 or newer JRE, for example, /usr/bin/java or /opt/jdk1.5.0_06/jre/bin/java.

3.2 How can I see the whole context menu when I click the right mouse button (MB3) on something at the bottom of the screen?

Just right-click farther up on the screen so there's room for the whole menu. The Java popup menu behavior is separate from the selection of the item on which it works. So once you've selected an item with a left-click (MB1), you can right-click anywhere in the window to see the context menu for that selected item.

3.3 Can I just use my Web Browser instead of the built-in Help Viewer?

Yes, you can point your browser (Firefox, Mozilla, Internet Explorer, etc.) to $APROBE/html/rcguihelp.html (where $APROBE is the value of the APROBE environment variable, the root of your RootCause installation). However, the Console's Help operations won't drive your browser, so to search the help you'll have to use your browser's Find operation.

Note also that Chapter 8 of the RootCause User's Guide is nearly identical to the on-line help, and is cross-referenced with the rest of the user's guide (see [#q1.8 Q1.8]).

3.4 Can I run the RootCause GUI on Windows to view data collected on my Unix system?

No. The RootCause Console must be run on the same kind of platform (AIX or Linux) as that on which the data is collected, both for defining the trace and for viewing the data. The format of the deployed workspace and of the collected data is platform-specific.

3.5 Is it possible to monitor a Java program without entering the classpath, working directory, etc. that the New Workspace dialog prompts for?

Yes. The demo program that we beg everyone to do first shows exactly how to set this up and create a default workspace.

However, since you asked so nicely, here's what you do:

  1. Start the RC GUI.
  2. Turn RC on by entering rootcause on in a window where you'll start your app.
  3. Run your Java program as you normally do.
  4. Examine the RC log (File->Open RootCause Log).
  5. Search near the bottom and find your Java program's APP_START node. If you see two identical ones, choose the second.
  6. Click on it.
  7. Right-click to get context menu.
  8. Choose Open Associated Workspace.
  9. New Workspace Dialog should appear with information filled in so you just click OK.

4. The RootCause Log

4.1 Can I trace any and all of the executables that I see in the log? Are there some restrictions?

Yes, you should be able to trace anything. If you find one that you cannot trace, please report it as a bug. However, most executables that are part of the system have no symbolic information, so you cannot see functions in the executable itself. You can get functions in shared libraries/DLLs that are loaded, and use the predefined UALs without symbols and debug information.

4.2 Why do I see two identical copies of a program in the RootCause Log?

Some programs like Java 1.4 and Netscape "fork and exec themselves" so these are distinct processes. You generally want the second one, since the first probably set up some things missing from the environment and then tried again.

4.3 Why don't I see the program I want to trace listed in the RootCause log?

There could be a number of reasons:

  • The program you're looking for is really a driver or script that runs another executable of a different name. Investigate this and look for that "real" program in the log.
  • RootCause was not "on" in the environment when the application was run. Use the rootcause status command to check.
  • RootCause was not on at application startup because the application starts at boot-time. See the explanation for [#q9.4 Q9.4], for example.
  • Many other processes have started since the one you're looking for, and the log file "wrapped around". The RootCause Log is a fixed size. When the maximum size is reached, newer entries overwrite older ones. Each entry is variable length, and if you have long command-lines or CLASSPATH values the log may hold fewer entries. The default size of the log file is 100,000 bytes, so you may want to make it bigger. To see its current size, run the command rootcause log -s. Then choose a bigger number, say 200000, and run rootcause log -s 200000 (see [#q4.7 Q4.7]). You can clear out the current log contents with rootcause log -Z (see [#q4.6 Q4.6]).
  • RootCause was "on", but the verbose setting was "off". To find out, use Workspace->List RootCause Registry (or rootcause register -l from the command-line) and look at the verbose setting near the top of the output, and see if it's missing or off. To enable it, enter the command: rootcause register -s verbose.
  • RootCause is being turned "off" again before the application starts up. This can happen when there's a wrapper or startup script that resets the environment by changing the PATH and deleting unknown environment variables. In this case you could see these scripts in the RootCause Log around the time when you think the application would be starting -- you can then edit them to turn RootCause back on again.
  • The program you're trying to trace is run using setuid root, which prevents the program intercept library ( libapaudit.so ) from being loaded from its default, non-secure location. See "SetUID Applications" in Chapter 10 of the RootCause User's Guide.

In all but the first case, you'll have to run the program again with "rootcause on" for it to show up in the RootCause log.

4.4 I ran only one application with rootcause on, and I see about a dozen processes in the RootCause log; where did they come from?

When you start a program, that may start a shell script. Korn shell, C shell and others can have associated "rc" files (e.g., ~/.kshrc, ~/.cshrc), which run some commands. Then the script itself may run some commands to evaluate the environment. Then the program itself may start some processes (e.g., by using fork() and exec(), or system()) to do some tasks. You can learn amazing stuff when you use RootCause even without tracing!

4.5 Can I cause only APP_TRACED events to show up in the RootCause Log?

Yes, by turning verbose logging off. This is done with the command:

rootcause register -s verbose -e off

Also, you can set the environment variable APROBE_LD_AUDIT_VERBOSE=FALSE in a shell and it will disable logging of all commands started in that shell and its subshells. This trick is used by the rootcause_status script.

4.6 How do I clear the RootCause log?

There's currently no way to do this from the Console. From the command line: rootcause log -Z. Then do File->Refresh to see everything disappear.

4.7 Does the RootCause log wraparound? If so, how do I set the wraparound size?

Yes, it wraps so that it doesn't get huge. The default size is 100000 bytes. You can use the rootcause log -s command to query and change the size in bytes (there's no access to this from the Console). For example:

# show the log size:
 rootcause log -s
 100000
# set the log size to 20000 bytes:
 rootcause log -s 20000
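To picture why long command lines reduce the number of entries the log can hold, here is a toy C model of a fixed-size, wrap-around log with variable-length entries. It is only a sketch, not RootCause's actual log format or code: the names wrap_log, log_append and log_oldest are hypothetical, each entry is assumed to be stored as a 1-byte length followed by its text, and the capacity is shrunk to 64 bytes so the eviction is easy to follow.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy capacity in bytes (hypothetical); the real RootCause default is 100000. */
#define LOG_CAPACITY 64

/* A fixed-size log of variable-length records. Each record is stored as a
 * 1-byte length followed by the entry text, with no terminating NUL. */
typedef struct {
    unsigned char buf[LOG_CAPACITY];
    size_t head;   /* offset of the oldest record */
    size_t used;   /* bytes currently occupied */
    size_t count;  /* number of records currently held */
} wrap_log;

void log_init(wrap_log *wl) { memset(wl, 0, sizeof *wl); }

/* Byte access relative to the oldest record, wrapping at the end of buf. */
static unsigned char get_byte(const wrap_log *wl, size_t off) {
    return wl->buf[(wl->head + off) % LOG_CAPACITY];
}
static void put_byte(wrap_log *wl, size_t off, unsigned char b) {
    wl->buf[(wl->head + off) % LOG_CAPACITY] = b;
}

/* Discard the oldest record to free its bytes. */
static void drop_oldest(wrap_log *wl) {
    size_t rec = 1 + (size_t)get_byte(wl, 0);
    wl->head = (wl->head + rec) % LOG_CAPACITY;
    wl->used -= rec;
    wl->count--;
}

/* Append an entry, overwriting as many of the oldest entries as needed. */
void log_append(wrap_log *wl, const char *entry) {
    size_t len = strlen(entry);
    size_t rec = 1 + len;
    assert(len <= 255 && rec <= LOG_CAPACITY);
    while (LOG_CAPACITY - wl->used < rec)
        drop_oldest(wl);
    put_byte(wl, wl->used, (unsigned char)len);
    for (size_t i = 0; i < len; i++)
        put_byte(wl, wl->used + 1 + i, (unsigned char)entry[i]);
    wl->used += rec;
    wl->count++;
}

/* Copy the oldest surviving entry into out (needs room for 256 bytes). */
void log_oldest(const wrap_log *wl, char *out) {
    size_t len = wl->count ? (size_t)get_byte(wl, 0) : 0;
    for (size_t i = 0; i < len; i++)
        out[i] = (char)get_byte(wl, 1 + i);
    out[len] = '\0';
}
```

With 20-character entries this 64-byte log holds three; appending one 40-character entry evicts two old entries at once, which is the same effect long CLASSPATH values have on the real log.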

4.8 Can I locate my .rootcause directory somewhere other than $HOME?

Yes, using the APROBE_HOME (or APROBE64_HOME, for 64-bit RootCause) environment variable. The value of this environment variable, if set, is used instead of the defaults: ~/.rootcause_aix, ~/.rootcause_aix64, ~/.rootcause_linux or ~/.rootcause_linux64. This directory is where the RootCause Log and RootCause registry reside, so if you want these files accessible system-wide you should set APROBE_HOME/APROBE64_HOME to some central, writable location.
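The lookup rule above can be sketched in C as follows. This illustrates the documented behavior and is not RootCause's own code: the function names are hypothetical, and treating an empty APROBE_HOME value the same as an unset one is an assumption. The decision is split into a pure helper so it can be checked without touching the real environment.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Decide the RootCause home directory from the two candidate values:
 * the APROBE_HOME/APROBE64_HOME override (may be NULL) and $HOME.
 * platform is "aix" or "linux". Returns 0 on success, -1 if no directory
 * can be determined or the result does not fit in out. */
int rootcause_home_from(const char *override, const char *home,
                        const char *platform, int is64,
                        char *out, size_t outlen) {
    assert(strcmp(platform, "aix") == 0 || strcmp(platform, "linux") == 0);
    int n;
    if (override != NULL && override[0] != '\0') {
        /* Assumption: an empty value is treated like an unset variable. */
        n = snprintf(out, outlen, "%s", override);
    } else {
        if (home == NULL)
            return -1;
        n = snprintf(out, outlen, "%s/.rootcause_%s%s",
                     home, platform, is64 ? "64" : "");
    }
    return (n >= 0 && (size_t)n < outlen) ? 0 : -1;
}

/* Convenience wrapper that reads the real environment. */
int rootcause_home(const char *platform, int is64, char *out, size_t outlen) {
    const char *var = is64 ? "APROBE64_HOME" : "APROBE_HOME";
    return rootcause_home_from(getenv(var), getenv("HOME"),
                               platform, is64, out, outlen);
}
```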

4.9 Is there a way to keep the RootCause Log window from appearing when I start rootcause?

Yes. Edit the "preferences" file in your APROBE_HOME directory (see [#q4.8 Q4.8]) and change

<start_with_log value="true"/>


to

<start_with_log value="false"/>

5. The Workspace Window

5.1 Should I say Yes or No to the "Application is not registered with workspace" dialog?

You'll nearly always want to click Yes, which means "use this workspace to trace this application next time you run the application with RootCause on". You might click No if you don't want to trace that application with RootCause yet, or if you want to keep tracing it with a different workspace with which it's already registered. When in doubt click No: you can always use Workspace->Register Program to do it later.

5.2 What does the blue dot mean in the Predefined UALs part of the Workspace Tree?

It means that something has been changed or added that must be recorded when the workspace is saved. You can ignore it.

5.3 Where do I find out about the Predefined UALs listed here?

See Chapter 8 of the User's Guide, which fully describes the Console GUI. Also, look for a file in $APROBE/probes with the same name and suffix ".apc" and you'll see the details of its implementation. This doesn't apply to X.trace.ual, which is custom for each workspace.

6. The Trace Setup Dialog

6.1 What does <Unknown File> mean in the Trace Setup tree?

This means "Unknown Source File", probably because no debug information was found. Look in the Messages pane of the Workspace browser window for messages about debug information. You can still trace entry and exit to these functions, and can write custom probes that get data without using debug information.

6.2 What do the black and blue dots mean in the Trace Setup tree?

The dots are there to act as a "path" to help you find the traces and probes you've defined.

A black dot indicates an entry/exit trace of the marked function, method, file, class, or directory. Functions and methods marked with black dots are represented by equivalent entries in the Wildcards dialog, and are implemented by entries in the trace.cfg file in the workspace.

A blue dot indicates a probe or data trace in the marked function, method, file, directory, or class. These actions are not mapped to wildcards, and are implemented by compiled APC for C functions.

6.3 How do I trace a dynamically loaded shared library (DLL)?

You must add the library to the workspace, and then it will show up in the Trace Setup window. To do this, select Add Dynamic Module... from the Workspace menu. If the module changes, you must do Reset Dynamic Module.

6.4 What's the difference between "Don't Trace..." and "Remove Probes..."?

"Don't Trace..." will remove the black dots from the subtree it applies to, meaning those methods and functions won't have their entry and exit traced. "Remove Probes..." will remove the blue dots, meaning specific Probe and Data logging actions will be removed.

6.5 I've got a UAL that I compiled with the apc command -- how do I get that into RootCause?

The easiest way is to copy it into the workspace. You can also use Add UAL, and you'll need to do that if it takes parameters and other complications, but that's a bit more advanced: see Chapter 8 of the User's Guide or contact .

6.6 Why don't I see all the symbols shown by "apinfo" or "apcgen -L" in the Trace Setup window?

For improved usability (at a customer's request), functions whose names match certain patterns are filtered from the list. This list can be changed, replaced or nullified, though this is not documented.

The filtering is defined by the patterns in the file $APROBE/arca/trace_filters . See the commentary at the top of that file for complete information.
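As a rough picture of what the filtering does, the sketch below hides any function name that matches one of a list of patterns. It assumes shell-style glob patterns matched with POSIX fnmatch(); the real syntax of trace_filters is described in that file's own commentary and may differ, and the patterns and function names used here are purely hypothetical.

```c
#include <assert.h>
#include <fnmatch.h>
#include <stddef.h>

/* Return 1 if name matches any of the patterns (so it would be hidden
 * from the Trace Setup list), 0 otherwise. Patterns are treated as
 * shell-style globs, which is an assumption about the file format. */
int is_filtered(const char *name, const char *const *patterns, size_t npatterns) {
    assert(name != NULL && (patterns != NULL || npatterns == 0));
    for (size_t i = 0; i < npatterns; i++)
        if (fnmatch(patterns[i], name, 0) == 0)
            return 1;
    return 0;
}
```

With hypothetical patterns such as "__*" and "_GLOBAL__*", compiler-generated names like _GLOBAL__I_main would be hidden while an ordinary function like compute_total stays visible.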

6.7 I define APROBE_SEARCH_PATH to include my source location, but the RC GUI still isn't finding my source. Why?

Could it be you set APROBE_SEARCH_PATH after you started the GUI? If so, quit RC and restart it so it can pick up the env var.

6.8 How can I see and dump parameters for C functions for which there are symbols but no debug information?

This is addressed in Chapter 10 of the RootCause User's Guide, under Libraries With No Debug Information. Here's a paraphrasing of that given by our support staff:

The easiest way is to create a ".h" file that contains prototypes for the functions that you want. RootCause will automatically compile and use the "debug information" in that file so, for example, you can see the parameters in the setup window of the Console or reference them by name in the custom apc that you write.

To do this:

  1. Put the prototypes (C, not C++) into a ".h" file and give the file the same name as the shared library (or executable) where the functions reside (for example if your executable was named a.out, then the .h file would be named a.out.h)
  2. Place the .h file in the local or global "shadow" directory, with the name of your executable or library plus ".h" on the end. For example, if your program were called t.exe then on Unix the global location is $APROBE/shadow/t.exe.h and the user-local one is $APROBE_HOME/shadow/t.exe.h. See [#q4.8 Question 4.8] about APROBE_HOME (and APROBE64_HOME).

Placing the .h file in $APROBE/shadow makes it available to all users of RootCause, whereas $APROBE_HOME/shadow is specific to one user. RootCause searches these directories in the opposite order of their listing above, so a.out.h in $APROBE_HOME/shadow is used instead of a.out.h in $APROBE/shadow.

You can see examples of this by listing $APROBE/shadow/*.h. RootCause uses this feature to provide parameter information for some of the system shared libraries.
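For illustration, a shadow header for a hypothetical library libwidget.so might look like the following, saved as $APROBE_HOME/shadow/libwidget.so.h. The type and function names are invented; yours must match the functions your library actually exports. Any types the prototypes use are assumed to need declaring in the same file, since it is compiled as ordinary C.

```c
/* libwidget.so.h: hypothetical shadow header for a library named
 * libwidget.so that was built without debug information. */

/* Types used by the prototypes, declared as plain C. */
typedef struct widget {
    int    id;
    double weight;
} widget_t;

/* Prototypes for the exported functions whose parameters you want to see.
 * Names and parameter types must match what the library really exports. */
widget_t *widget_open(const char *name, int flags);
int       widget_read(widget_t *w, char *buf, unsigned long len);
void      widget_close(widget_t *w);
```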

Make sure that you have a supported C compiler available, as this is needed to compile the .h files. (You may not have a supported C compiler if you installed RootCause as Java only and now want to do C probing; contact to add the C capability.)

6.9 How can I turn on trace just when I'm in a chosen method or function?

This is called a "Trigger" and has been a feature of the Aprobe-level trace all along. It was added as a Probes action in the Trace Setup dialog in version 2.1.3a (April 2004). It works like this:

  1. Apply Trace to all the functions and methods you want to trace, as usual.
  2. Select the function or method that is to be the "trigger".
  3. Click the Probes tab in the lower right pane.
  4. Check the On checkbox, then use the Probe Action dropdown menu to select Trigger Trace.
  5. Click Ok to apply and build your trace.

You should see the function or method to which you applied the Trigger action at the top of each traced call tree in your trace, and nothing outside of that (even if you selected it for tracing).
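The Trigger behavior described above can be modeled with a simple depth counter: an event is kept only while the trigger function is active on the call stack. The sketch below is a single-threaded toy with hypothetical names (trigger_trace, trace_entry, trace_exit), not the Aprobe implementation; a real trace would presumably keep this state per thread.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_EVENTS 16

/* Toy Trigger Trace state: entry/exit events are recorded only while
 * the trigger function is active somewhere on the call stack. */
typedef struct {
    const char *trigger;          /* name of the trigger function */
    int depth;                    /* live activations of the trigger */
    char events[MAX_EVENTS][48];  /* recorded events, e.g. "enter parse" */
    int nevents;
} trigger_trace;

static void record(trigger_trace *t, const char *kind, const char *fn) {
    if (t->nevents < MAX_EVENTS)
        snprintf(t->events[t->nevents++], sizeof t->events[0], "%s %s", kind, fn);
}

/* Called on every traced function entry. */
void trace_entry(trigger_trace *t, const char *fn) {
    if (strcmp(fn, t->trigger) == 0)
        t->depth++;               /* trigger entered: start recording */
    if (t->depth > 0)
        record(t, "enter", fn);
}

/* Called on every traced function exit. */
void trace_exit(trigger_trace *t, const char *fn) {
    if (t->depth > 0)
        record(t, "exit", fn);
    if (strcmp(fn, t->trigger) == 0) {
        assert(t->depth > 0);
        t->depth--;               /* recording stops once depth returns to 0 */
    }
}
```

With handle_request as the trigger, calls made before it is entered or after it returns are dropped, so every recorded call tree starts at handle_request.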

6.10 How can I enable my custom probe only when Trace is also enabled?

You can check whether trace is enabled with the ap_RootCauseTraceIsEnabled macro. For example:


         if (ap_RootCauseTraceIsEnabled)
         {
            printf ("Enabled\n");
         }
         else
         {
            printf ("Disabled\n");
         }

Disabling your probe independently from Trace is covered in the "Disable Probe" example in $APROBE/examples/learn/disable_probe.

6.11 I notice "Disable Tracing" does not affect the "exception" predefined probe. How can I disable that as well?

You can't. This is deliberately designed to remain active even after trace is disabled. We do deliver source for the probes so that users can customize their behavior. In this case it would be a simple matter of putting the "if (ap_RootCauseTraceIsEnabled)" check (see [#q6.10 Q6.10]) around the code in the "ExceptionHandler" routine within $APROBE/probes/exception.apc, recompiling it, and either using a local copy or overwriting $APROBE/ual_lib/exception.ual.

6.12 How can I trace and time everything between point A and point B?

  1. Create a workspace for the application, if you haven't already.
  2. In the main window:
    • Enable the xxx.trace.ual (the first one).
    • Enable perf_cpu.
  3. Open the Trace Setup dialog.
  4. Click on the program node (the very first one).
  5. In the Probes tab, create a probe on program entry to disable tracing.
  6. In the left pane, click on the application module node (first 'M' icon).
  7. Right-click and choose trace all.
  8. Find and select the point A function in the tree.
  9. In the Probes tab, create a probe to enable tracing on entry.
  10. Find and select the point B function in the tree.
  11. In the Probes tab, create a probe to disable tracing on exit.
  12. Click the Options... button to open the Trace Options dialog.
  13. Disable load shedding so you get everything.
  14. Click OK to build the workspace.
  15. Restart your application.

After you run through your test, format the APD files with Examine. The tree will reflect the trace path from point A to B. At the end is a summary call tree with call times in it. Or you can look at the performance table node (right click and choose show associated table) to see a table.

6.13 How can I allow all Java parameters to be traced?

To enable the Log All Parameters menu item, set and/or export the undocumented environment variable RC_ENABLED_LOG_ALL before starting the RootCause GUI.

7. The Trace Display (Event) Dialog

7.1 Why are some functions found in the traced Events not found in the Trace Setup?

There are two possibilities, but the most likely is that the traced function is a compiler-generated one that is explicitly filtered from the Trace Setup list, but which is covered by the "wildcard" trace used when you do "Trace All Child Nodes" from the Trace Setup module node. See [#q6.6 Q6.6] .

The other possibility is that the event was introduced by some other custom probe, such as a J2EE trace. See [#q7.2 Q7.2] .

7.2 Why are some Java methods found in the traced Events not found in the Trace Setup?

Probably because the events didn't originate in the Trace Setup, but were introduced by a supplementary J2EE trace. Still, you should be prompted to add the containing class, and so be able to define traces on it.

7.3 RootCause keeps asking to find a source file. Is there a way to just point to this once without specifying the path to every file we wish to view?

Yes, RootCause has a concept of a source file path. There are a number of ways to set this:

If you click on a method, the first time it will ask if you want to find the source. If you browse and select the source file, the enclosing directory is automatically added to a list. If the end of that directory path matches the package path of the class, the "root" of the package path is added also.

You can edit the path directly off the RootCause Setup menu.

RootCause also picks up the environment variable APROBE_SEARCH_PATH when the RootCause Console starts (see [#q6.7 Q6.7]).
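The package-root rule can be sketched as a string operation: if the directory holding Parser.java ends with the class's package path, strip that suffix to get the root. This C sketch uses hypothetical names and assumes POSIX-style paths with no trailing slash; it is not RootCause's code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Given the directory containing a selected .java file and the fully
 * qualified class name, compute the package "root" directory.
 * Returns 1 and fills root if dir ends with the package path, else 0. */
int package_root(const char *dir, const char *class_name,
                 char *root, size_t rootlen) {
    char pkg[256];
    assert(dir != NULL && class_name != NULL);
    const char *last_dot = strrchr(class_name, '.');
    if (last_dot == NULL)
        return 0;                        /* default package: nothing to strip */
    size_t pkglen = (size_t)(last_dot - class_name);
    if (pkglen >= sizeof pkg)
        return 0;
    for (size_t i = 0; i < pkglen; i++)  /* "com.acme.util" -> "com/acme/util" */
        pkg[i] = (class_name[i] == '.') ? '/' : class_name[i];
    pkg[pkglen] = '\0';

    size_t dirlen = strlen(dir);         /* dir assumed to have no trailing '/' */
    if (dirlen <= pkglen)
        return 0;
    const char *tail = dir + dirlen - pkglen;
    if (tail[-1] != '/' || strcmp(tail, pkg) != 0)
        return 0;                        /* must match whole trailing components */
    size_t n = (size_t)(tail - dir) - 1; /* root length, without the separator */
    if (n == 0)
        n = 1;                           /* package sits directly under "/" */
    if (n >= rootlen)
        return 0;
    memcpy(root, dir, n);
    root[n] = '\0';
    return 1;
}
```

For example, selecting /work/src/com/acme/util/Parser.java for class com.acme.util.Parser would add both /work/src/com/acme/util and /work/src to the source path.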

7.4 The trace shows a problem in third-party software; what's the best way to pass this along to them?

Of course it depends on the vendor, but the best thing to do is to send them what you would want your customers to send you: text with as much pertinent information as possible. If the trace contains enough information for you to determine where the problem is, then the other piece of information they would want is the system configuration, as collected with logenv.ual.

To create the bug report, you could do File->Save As Text from the Trace Display window; then edit the resulting text file to include the program and system configuration and the tracebacks and execution information that identify the problem; then e-mail the result, indicating it was collected with RootCause. (They might have RootCause also, and ask you to re-run to collect additional information).

7.5 RootCause shows signal 11 during my Java application run, but there was no crash. Is this a valid signal 11?

Yes. The JVM routinely uses signal 11 (perhaps for extending the stack) and signal 4 (illegal instruction -- not sure what that's for). These can show up in the trace and are fine. Later versions of the JVM provide options for reducing its use of signals; you can search java.sun.com for details.

7.6 When I trace a Java synchronized method, does the method time include lock delay time?

The JVM implements the synchronization on the calling side rather than on the callee side. Once you are inside the method's code, the lock has already been grabbed. This means that the time you see is after the synchronization.

For instance, I have a test that calls a synchronized method from a thread's run method:


try
{
   Thread.sleep (1000);
   parent.synchronizedMethod ();            // Line 15
}
catch (InterruptedException e)
{
   e.printStackTrace ();
}

If I trace lines and have things set up so another thread is within synchronizedMethod(), I see something like this:


Line 15                    10.45.00            ; Waiting ...
synchronizedMethod entry   10.46.00            ; Got it ...

7.7 Why was malloc() listed as being LOAD_SHED in the Trace Display when it really wasn't?

Because RootCause attempted to load-shed it and recorded it as such, but the actual disabling of the probe was prevented by another UAL's explicit request, using #pragma nopatchcount.

The confusion comes from the fact that load shedding may mean two things:

  1. The patch for the subprogram is disabled (no more probes for this routine will get triggered);
  2. This routine is no longer traced.

Since we don't want (1) to happen for allocation/deallocation routines when running memstat, these patches could not be disabled. This was indicated by using #pragma nopatchcount in combined_memstat.apc.

However, when traced these routines will get load shed just like everything else, and the LOAD_SHED event and appearance in the table indicate that (2) has happened. So this is pretty much "as designed".

If you explicitly mark the function as, "Do Not Shed", it will no longer show up in the table.

7.8 When formatting my data, an error pops up saying, "The maximum event tree size ... has been reached." What do I do?

You are hitting the limit on the maximum number of items displayed in the trace display. You can either reduce the size of the APD files, reduce the number of APD files selected or increase the limit at the expense of longer processing times and higher memory overhead. I would try the last one first and if this works for you, great. The option is "Maximum number of events in Trace Display" and is described here. Briefly:

  1. Go to the RootCause Main window
  2. Open the Setup menu (not the button, but the pulldown menu)
  3. Select Options...
  4. Change the value of the option Maximum number of events in Trace Display (third from the bottom) to a higher value. A value of 2000000 (two million) is appropriate for processors with more than 128M of memory.

The values are recorded per-user, so must be set for each user in the user preferences file: $APROBE_HOME/preferences.

7.9 I see that I can do "Save As XML": can I view this XML later?

Yes, but only in RootCause (see below). It is not quite legal XML and so will be rejected by general XML viewers. (If you think this is an important feature, let us know.)

To import saved XML back into RootCause again, you have to set the environment variable RC_ENABLE_LOAD_XML to a nonempty value before starting the RootCause GUI. If you've done this, you will then see the menu item Examine XML File... in the Analyze menu in the RootCause Main menu. Clicking this menu item will open a file selection dialog from which you can select an XML file. This must be a file previously saved from RootCause Trace Display using File->Save As XML. When you click the Examine XML Output button in this dialog, you will then see a Trace Data Dialog in which one of the checkboxes is the name of your XML file. Check it, and click Open, to view the Trace Display.

7.10 How can I see just the major time-consuming children of nodes in the Trace Events Summary tree?

Under the View menu, click Statistics Filter... to open a dialog that creates a "filtered" copy of the statistics summary tree. The copied tree is added to the end of the event tree and identifies the filter that was used. You specify a statistic (Wall Time, or CPU Time if it was collected) and a threshold percentage. A child node in the summary tree is copied to the new tree only if the child's statistic value is at least the given percentage of its parent's value. Choose "None" to create an exact copy. The threshold must be a number between 0 and 100.

7.11 Do the times shown in Trace Events reflect the aprobe overhead?

No. By default these are the measured times, not adjusted for Aprobe overhead. You can specify overhead values by clicking View->Statistics Overhead. This opens the Set Statistics Overhead dialog, which has an options menu from which you select the statistic to adjust, and type-in fields for the normal (native) call overhead and the Java overhead (which is generally bigger).

Note that you must set each statistic separately, for example:

  • Click View->Statistics Overhead
  • Click None and change it to Wall Time
  • Type in Overhead and Java Overhead values
  • Click Ok
  • Click View->Statistics Overhead
  • Click None and change it to CPU Time
  • Type in Overhead and Java Overhead values
  • Click Ok

When you've completed setting overhead values, you must regenerate the data:

  • Click File->Refresh to reformat the data with the new values.

7.12 How do I know what overhead to specify in the Set Statistics Overhead dialog?

As described in Q7.11, you can specify tracing overhead to be applied to the times shown in the Trace Events details. But what number should you put in there? The answer depends on a number of factors, including your hardware and OS speed, whether you're dumping parameters, and whether it's Java or native code. A good guess is the minimum time you see anywhere in the tree for that kind of call. If that seems too big, instrument a function that does nothing and see how long each call takes: that time is the tracing overhead per call, and you can use it.

7.13 What are the various times I'm seeing in the details pane for Enter and Exit nodes?

The nodes look like:

ENTER Factor::addWidgets()
  time = 2004-05-03 16:32:10.079965024
  process = 15193, thread = 0 _start()
  symbol = "Factor::addWidgets()" IN "$java$", Factor.java
  CPU Time 0.428844 ( 0.428844)
  Wall Time 0.552496 ( 0.552496)

EXIT Factor::addWidgets()
  time = 2004-05-03 16:32:10.632461354
  elapsed time = 00:00:00.552496330
  process = 15193, thread = 0 _start()
  symbol = "Factor::addWidgets()" IN "$java$", Factor.java

The Details pane for each node gives the (wall-clock) time of the event: for an ENTER node, when the function or method was entered; for an EXIT node, when it returned. In addition, any statistics that were being gathered are attached to the ENTER node. Shown here are the elapsed CPU Time (gathered because the perf_cpu probe was enabled) and the elapsed Wall Time, both computed on exit from this specific invocation. The EXIT node also shows the elapsed (wall) time, which is the same as the Wall Time statistic.
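As a worked check using the two nodes above, subtracting the ENTER timestamp from the EXIT timestamp gives exactly the elapsed time shown on the EXIT node, which rounds to the Wall Time statistic (0.552496) on the ENTER node:

```latex
\begin{aligned}
t_{\mathrm{EXIT}} - t_{\mathrm{ENTER}}
  &= \text{16:32:10.632461354} - \text{16:32:10.079965024} \\
  &= 10.632461354\ \mathrm{s} - 10.079965024\ \mathrm{s} \\
  &= 0.552496330\ \mathrm{s}
\end{aligned}
```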

7.14 What are the various times and percentages I'm seeing in the Details panes on nodes in the Event Summary tree?

Consider the following node:

Java_Factor_smallestFactor()
  process = 15193, thread = 10 _start()
  symbol = extern:"Java_Factor_smallestFactor()" in "libFactorJNI.so", /work/JNI/factor.c
  Times called: 29
  Child calls (native/Java): 4190 / 0
  CPU Time (29):  1.248102 ( 1.298730) [99.753%]
    Max  :  1.231153 ( 1.274449)
    Min  :  0.000048 ( 0.000072)
    Avg  :  0.043038 ( 0.044783)
  Wall Time (29): 375.135004 (375.185632) [99.998%]
    Max  : 375.105686 (375.148982)
    Min  :  0.000043 ( 0.000067)
    Avg  : 12.935689 (12.937435)

Recall that each node in the Event Summary tree represents a unique call stack in the execution. The one shown above is for the native JNI function Java_Factor_smallestFactor() (see $APROBE/demo/RootCause/JNI).

The function was called 29 times. Those 29 calls together used 1.248102 seconds of CPU Time after overhead adjustment (see Q7.11). The slightly larger time in parentheses (1.298730) is the "raw" time before the overhead adjustment. The percentage in brackets indicates that the total CPU time used by this function was 99.753% of the total time used by its caller, the parent node in the summary tree (see Q7.10 about filtering based on this percentage). Of those 29 calls, the longest (Max) took 1.231153 seconds of CPU (1.274449 raw), the shortest (Min) took only 0.000048 seconds (0.000072 raw), and the average (Avg) was 1.248102 / 29 = 0.043038 seconds of CPU.
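The raw and adjusted totals in this example differ by the same amount for both statistics, and that difference is exactly what a per-call overhead of 0.000012 seconds (12 microseconds) would produce if it were charged once for each of the 29 calls and each of the 4190 child calls. This is an inference from the numbers shown, not a documented formula, but it illustrates how the overhead values from Q7.11 appear to feed into the adjusted times:

```latex
\begin{aligned}
\text{CPU Time:}\quad 1.298730 - 1.248102 &= 0.050628\ \mathrm{s} \\
\text{Wall Time:}\quad 375.185632 - 375.135004 &= 0.050628\ \mathrm{s} \\
(29 + 4190) \times 0.000012\ \mathrm{s} = 4219 \times 0.000012\ \mathrm{s} &= 0.050628\ \mathrm{s}
\end{aligned}
```

The Max and Min rows fit the same pattern: their raw and adjusted values differ by 3608 and 2 multiples of 0.000012 seconds, respectively.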

7.15 Is there a way to save the text for a specific node in the Trace Events tree?

Yes. Click a node to select it, right-click to pop up the context menu, then click 'Save Node As Text' to save the selected node in a text file. This saves the node and its details exactly as they would appear in the 'File->Save As Text...' output. Note that it works on only one node: if multiple nodes are selected, it applies only to the first of them. See also Q7.16.

7.16 Can I copy a Trace Events node to the clipboard to be pasted elsewhere?

Yes. In either the Events tree on the left or the details in the lower left, click a node (or select multiple nodes using the Shift or Control keys in the usual way), then right-click to pop up the context menu and click 'Copy'. This puts the selected nodes in the Java clipboard.

7.17 I know my method was executed many times, so why isn't it in the Performance Summary table?

Probably because it was Load Shed. This means that it was called so often its tracing overhead became excessive and tracing was disabled for it during the run. It will appear in the Load Shed table, where you can choose to stop it from being Load Shed during the next run.

8. RootCause and Aprobe

8.1 How do I adjust the Trace "DefaultLevels" option so only a fixed depth is traced when an application is run with RootCause?

You can't. The concept of levels is no longer supported. Instead you can apply a Trace Trigger, or disable and enable the trace using the probes tab for a given function.

8.2 How can I use Aprobe's predefined probes (profile, coverage, events, memwatch, statprof) with RootCause?

These are not currently integrated with RootCause. If you can run them from the command-line using Aprobe you should do that. If you wish to use the "RootCause On" mechanism to run them using the workspace, you must add them to the workspace options using the "Setup->Add UAL" menu item. This adds a new UAL "permanently" to the Workspace UAL tree. For example, to add the "memwatch" probe, you would:

  • provide "memwatch" as the path to the UAL and its name;
  • check "Has parameters";
  • provide "-g" as the Aprobe parameter if you want to see the memory usage display;
  • give no apformat parameters.

This adds "memwatch" to the UAL tree in the Workspace window. You could then check this to enable memwatch on applications run under RootCause. The output of these probes isn't integrated with RootCause, so the output appears as a "Text" node in the Trace Display event tree. You can use "Save As Text" from that display to view it outside of RootCause.

Prior to RootCause version 1.3.3, you would reference these probes using the Aprobe options and Apformat options dialogs (see Chapter 8 of the user's guide), just as you would on the Aprobe command line. For example, to enable memwatch, you would add -u memwatch -p -g as "Additional Aprobe Options" (under Aprobe options in the Execute menu in the Workspace window) and -u memwatch in the Apformat options (under the Analyze menu). For probes like profile that require configuration files, you would also have to put the full pathname of the configuration file into the options, like -u profile -p -c /testdisk/probes/prog1.profile.cfg.

8.3 Is it possible to develop in Aprobe, but still use the RootCause "intercept" mechanism?

Yes, but this is not explicitly supported. In particular, most operations from the RootCause Console overwrite the scripts in the workspace that apply Aprobe to the application. So use the Console to create the workspace, then quit the Console and edit the aprobe.ksh and apformat.ksh scripts directly to apply your probes.

8.4 If RootCause is built on Aprobe, and RootCause supports Java, is there an Aprobe for Java?

Aprobe supports Java with the apjava command. Writing custom probes in Java is described in Chapter 11 of the RootCause for Java User's Guide and in the nearly identical Chapter 5 of the Aprobe User's Guide; if you really wanted to, you could do everything from the command line.

8.5 How do I add my own UAL to the RootCause trace?

There are three ways of adding a UAL to a trace:

  1. Update the predefined_uals file in ual_lib to add it for all workspaces. It will then show up in the list in the workspace.
  2. Use the Add UAL item on the Setup menu. This will also cause it to show up in the list.
  3. Copy it into the workspace. It will not show up in the list, because we don't look in the workspace directory for other UALs until run time.

Personally I like option 2, which avoids copying the UAL into the workspace. This makes it easy to enable or disable from the GUI.

8.6 How can I use the Events probe with RootCause?

The events probe is not integrated with RootCause Trace Display, but you can still use it. Here's a quick way to get started, by simply applying events to all Java methods and all native functions in the main module (if any), and letting load shedding reduce overhead.

  1. cp $APROBE/probes/events.cfg MyWorkspace.aws
  2. echo ';event function "*"' >> MyWorkspace.aws/events.cfg
  3. echo 'event function "*::*" in $java$' >> MyWorkspace.aws/events.cfg
  4. Workspace->AddUal: add events.ual and specify the following aprobe parameter:
       -c $RC_WORKSPACE_LOC/events.cfg
  5. Keep the trace.ual enabled with load shedding on, but don't specify any traces (this will load shed low-level events).
  6. Run the application.
  7. From the command line, use
       rootcause format -r MyWorkspace.aws > format.txt

Your results are in format.txt. You can then edit the events.cfg file to do more, as shown in Q15.12, and you can specify an alternate output file so that you get the events output while still formatting within RootCause.
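The first three steps above can also be scripted. The following is a minimal sketch, not part of RootCause: make_events_cfg is a hypothetical function name, it assumes $APROBE has been set by the Aprobe setup script, and it assumes the intended third line is event function "*::*" in $java$, with in $java$ inside the single quotes so the shell does not expand $java$.

```shell
# Hypothetical helper (not part of RootCause): steps 1-3 of the events setup.
# Assumes $APROBE is set by the Aprobe setup script.
make_events_cfg() {
    ws=${1:?usage: make_events_cfg WORKSPACE.aws}
    cfg=$ws/events.cfg
    # Step 1: start from the predefined configuration shipped with Aprobe
    cp "$APROBE/probes/events.cfg" "$cfg" || return 1
    # Step 2: the line from step 2, copied verbatim (including its leading ';')
    printf '%s\n' ';event function "*"' >> "$cfg"
    # Step 3: all Java methods; single quotes keep $java$ literal
    printf '%s\n' 'event function "*::*" in $java$' >> "$cfg"
}
```

For example, run make_events_cfg MyWorkspace.aws and then continue with step 4.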

9. RootCause at Run Time

9.1 Can I just leave RootCause "on" all the time? For example, while I power down and power up my computer? I was thinking that it would be interesting to see all the processes as my computer boots.

Not exactly, but you can turn it on early in the boot process in the same way you would start other services, by putting a script under /etc. Check with your system administrator or contact OC Systems support.

9.2 How much will RootCause slow my application?

This depends almost entirely on what you do with it. If you trace almost nothing, it will introduce almost no overhead. If you trace every method call on your machine, it will slow things down too much. The keys to good performance are:

  • only ask questions you want the answers to; that is, don't blindly trace everything if you're worried about performance; and
  • avoid logging data over the network: put your workspace on a local disk.

Experience tells us that collecting too much data is a bigger problem than slowing down the application too much.

9.3 How can I trace Linux daemons with RootCause?

The following steps should allow you to use RootCause to trace activity in several of the daemons on your Linux system:

Background

RootCause keeps a log file and a registry as defined by the APROBE_LOG and APROBE_REGISTRY environment variables. These are generally set on a per-user basis by the Aprobe setup script, based on the user's $HOME environment variable, or on the environment variable APROBE_HOME if that's defined. The default location for these files is a hidden directory called ".rootcause" under the user's home directory. When RootCause intercepts a program that is starting up, it looks in the user's registry to see if this program should be instrumented. If so, the registry names the associated workspace. By changing the APROBE_HOME environment variable before running setup, you can change the locations of the log and registry. Note that these files have to be writable by all processes that access them.

Daemons like sshd are started on Linux by a shell (bash) script in /etc/init.d; for sshd the file is /etc/init.d/sshd. If you edit this file, you will see a subroutine named "start". Not surprisingly, this is the subroutine to which we add a few statements to set up RootCause to intercept sshd.

Details

  1. Create a RootCause workspace to trace sshd:

We recommend that you create your workspace on a disk local to the machine that will run the intercepted program. Create it in the usual way, using the "New" pulldown menu on the main RootCause screen.

  2. Verify the location of your log and registry files:

These files are probably in $HOME/.linux_rootcause and are named "registry" and "rclog". You can specify a different location using the APROBE_HOME environment variable (see Q4.8), but be sure to run "setup" after setting APROBE_HOME, and make sure the permissions of the resulting files are correct.

  3. Back up your /etc/init.d/sshd script.

You should probably make a copy of the sshd file before you modify it so you can restore it when you are finished tracing sshd.

  4. Modify the /etc/init.d/sshd script to set up Aprobe:

Find the start subroutine in the /etc/init.d/sshd file and insert the following lines after the "do_dsa_keygen" line, replacing the placeholders with your own paths:

  export APROBE_HOME=directory identified in step 2
  . aprobe_root/aprobe/setup
  . $APROBE/bin/rootcause_enable

  5. Stop and restart the sshd daemon.

As root, with /etc/init.d as your current directory, run:

  ./sshd stop
  ./sshd start

You should see a "stopped" message from the stop command, and output from the start command indicating that RootCause has started. You may get a "FAILED" message from the start; on our system the daemon seems to start without problems even when we get this message, so I think you can ignore it. Tracing the libcrypt.so library was interesting: you can really see the ssh protocol flow as it generates keys and such. The technique outlined above should work for many of the daemons on Linux.

9.4 How do I apply RootCause to applications run at boot time?

Once you've used Aprobe to investigate the behavior of processes on a running machine, there is nothing particularly complicated about doing the same for system processes while the machine boots, but there are a number of special factors to take into account. These are listed below, and an example given of how we applied these to one of our machines.

The techniques described here were tested on Solaris (no longer supported) but should apply approximately to Linux. AIX is a bit different, and in any case this should be done in coordination with a knowledgeable system administrator.

  1. Any time you make your own modifications to a system's startup procedures, there is a risk that you may make the system unbootable. We'll try to point out the pitfalls, but as with any procedures like this you should be prepared to recover the system from maintenance mode or even to reinstall the OS.
  2. At startup, system resources you may want to rely on may not be available. Make sure your RootCause installation is not on remote disks, and even for local installations, check that the filesystems used for the installation and for logging are available at the expected point during the boot process. If you want to get in at the start of Runlevel 2, the only filesystems typically available at that point are "/" and "/var", which may not have enough free space to support installation and logging.
  3. Startup scripts are run with /sbin/sh, which does not provide all the features you may be accustomed to with ksh, although it is very close for most purposes. Where possible, test scripts by running them under /sbin/sh before adding them to the boot process.
  4. For the test described here, I chose to monitor processes started as the system enters Runlevel 3, which starts NFS server processes, among others. At this point all local filesystems are mounted, so I had no problem finding space for an installation, but many potentially 'interesting' services had already been launched.
  5. The libapaudit.so shared library needs to be installed in a secure location. With root authority, run:
  . /opt/aprobe/setup
  rootcause_libpath -c
  6. The startup procedure for a given Runlevel is determined by a script, /sbin/rcN. The execution of these scripts is described in /etc/rcN.d/README, for N = 2 or 3. Since RootCause depends on an environment being defined, we need to 'source' some scripts into this command so the environment is defined when servers and daemons are started. I did this by creating files in /etc/rc3.d. If you look at the README and the /sbin/rc3 script, you should see how this works.
  7. Three steps are needed to enable RootCause intercept in the rc driver. We accomplish them by creating three files in the /etc/rc3.d directory:
    • /etc/rc3.d/K00RootCauseLocal.sh

Defines the APROBE_HOME environment variable where the logs and registry are stored:

APROBE_HOME=/opt/aprobe_home
export APROBE_HOME

    • /etc/rc3.d/K01RootCause.sh

Is a soft link to the setup script in the RootCause installation directory:

ln -s  /opt/aprobe/setup /etc/rc3.d/K01RootCause.sh

    • /etc/rc3.d/K02RootCause.sh

Contains the command to enable intercept:

. rootcause_enable

Normally, scripts whose names start with 'K' are used to shut down processes before others are started, but we will take advantage of the fact that these are executed first to ensure that the RootCause setup is performed before anything else.

  8. All that is required now is to reboot the machine, then log in as root, define APROBE_HOME, source the installation setup script, and start the RootCause GUI. The event viewer should show you what processes were launched.
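The three /etc/rc3.d files described above can be created, and removed again if you need to back out, with a small shell sketch like the following. The function names and the RC3_DIR override are hypothetical, not part of RootCause; the paths default to the example values used above (/opt/aprobe_home and /opt/aprobe/setup). Run it as root and, as noted above, test it under /sbin/sh first.

```shell
# Hypothetical helpers (not part of RootCause) for the rc3.d approach above.
# RC3_DIR defaults to /etc/rc3.d; override it to try the script elsewhere.
install_rc3_rootcause() {
    rc3=${RC3_DIR:-/etc/rc3.d}
    home=${1:-/opt/aprobe_home}     # where logs and registry are stored
    setup=${2:-/opt/aprobe/setup}   # RootCause installation setup script

    # K00: define APROBE_HOME before anything else runs
    printf 'APROBE_HOME=%s\nexport APROBE_HOME\n' "$home" \
        > "$rc3/K00RootCauseLocal.sh" || return 1
    # K01: soft link to the setup script (-f replaces an existing link)
    ln -sf "$setup" "$rc3/K01RootCause.sh" || return 1
    # K02: enable intercept
    printf '%s\n' '. rootcause_enable' > "$rc3/K02RootCause.sh"
}

# Remove the three files again.
uninstall_rc3_rootcause() {
    rc3=${RC3_DIR:-/etc/rc3.d}
    rm -f "$rc3/K00RootCauseLocal.sh" "$rc3/K01RootCause.sh" \
          "$rc3/K02RootCause.sh"
}
```

Keeping the removal in a separate function makes it easy to undo the change before the next reboot if something goes wrong.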

9.5 Can I apply different workspaces (or none at all) for the same program invoked with different command-line parameters?

Yes, by adding a "-p pattern" option to the rootcause register command. The pattern argument is a simple expression that can specify argument positions, wildcards, and simple comparison and logical operations. You can associate the same executable (or Java class) with different workspaces by using different patterns. At run time, the actual command-line arguments are substituted for special identifiers in the expression (like %2, $*) and then the expression is evaluated. If it evaluates to TRUE, the associated workspace is used to probe the application. If no expression evaluates to TRUE, the application is not probed. There's no GUI support; you have to register your application from the command line to use this feature. All the details are described in regpattern.txt. If it's still not clear how to do what you want, don't hesitate to contact us (see Q1.6).

9.6 How can I "intercept" a Java server on AIX?

As described in the user's guide, RootCause on AIX does not support the automatic "intercept" of applications at load time: the application must either be run directly from the command line with "rootcause run", or else the binary must be renamed/replaced with a soft-link to a script that simulates the intercept effect.

Starting with version 2.1.3b (May 2004), you can implement this second alternative with the rootcause link command, which renames/replaces the java binary with a script that uses access lists and environment variables to manage who applies RootCause to each Java instance.

The command rootcause link is used to apply RootCause to applications (typically services and application servers) that cannot easily be started from a user's shell environment. rootcause link uses symbolic links to "intercept" these applications. A set of subcommands is available to manage these links safely and conveniently.

Note that step 4 will probably require root authority, depending on where the application to be traced is installed.

  1. Identify the full path to the executable you wish to trace with RootCause. In the case of an application server, this will be a program named "java". You should use the 'ps' command to verify the pathname if possible. Write this path name to a file, for example:
       echo /usr/java131/bin/java > server.lst

The application named here cannot be a symbolic link.

  2. Install the above list as the application list with
       rootcause link -I server.lst

You may specify more than one application, each on a separate line, in this file. The rootcause link -I command instructs RootCause to save this file as the list of applications whose links are to be managed. rootcause link -I will require write access to the RootCause installation directory. If you need to change the application list later you will need to apply step 7 below (remove symbolic links).

  3. Verify the application list is installed as expected with
       rootcause link -l

This will report a line like the following:


     - /usr/java131/bin/java

The '-' indicates that the application is eligible to have its link managed, but that link does not exist and as a result the application will not be run under RootCause. rootcause link -L will show an explanation of the characters used to describe the link state. These are:


   - Executable is not RootCause linked
   * Executable will be run under RootCause
   ? File is not an executable or is invalid
   ! A serious error was detected;  contact support immediately

  4. Create the application link with
       rootcause link -K

This will create symbolic links into the RootCause installation directory for each application designated with the rootcause link -I command. rootcause link -K requires write access to the directory where the application to be traced is installed. Typically this will require root authority.

  5. Turn on RootCause interception with
       rootcause link -a

Now whenever the application is started, an entry will appear in the rootcause log. Follow the usual procedure to create a workspace and set up trace definitions. rootcause link -a can be run by any user. At this point you are ready to begin analyzing and debugging your application with RootCause. The remaining steps describe how to return the application to its original state and should be performed if RootCause is uninstalled.

  6. Turn off RootCause tracing with
       rootcause link -Z

The symbolic links will remain in place, but the application will not be run under RootCause. rootcause link -Z can be run by any user.

  7. Remove symbolic links with
       rootcause link -D

rootcause link -D requires write access to the directory where the application to be traced is installed (same as -K). This will restore your applications to their original state, where they will run completely independently of any component of the RootCause toolset.
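Because the application named in the list cannot be a symbolic link, it can help to check the list file before installing it with rootcause link -I. The following is a hypothetical shell helper, not part of RootCause, that reports any entry that is missing, not executable, or a symbolic link:

```shell
# Hypothetical pre-check (not part of RootCause) for a "rootcause link -I"
# application list: each non-blank line must name an existing executable
# file that is not itself a symbolic link. Problems are reported on stderr
# and the return status is 1 if any are found.
check_link_list() {
    list=${1:?usage: check_link_list FILE}
    [ -r "$list" ] || { echo "$list: cannot read" >&2; return 1; }
    status=0
    while IFS= read -r app || [ -n "$app" ]; do
        [ -n "$app" ] || continue
        if [ -L "$app" ]; then
            echo "$app: is a symbolic link; list the real executable" >&2
            status=1
        elif [ ! -f "$app" ] || [ ! -x "$app" ]; then
            echo "$app: not an executable file" >&2
            status=1
        fi
    done < "$list"
    return $status
}
```

For example: check_link_list server.lst && rootcause link -I server.lst. On Linux, readlink -f prints the real path behind a link if you need to replace one.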

9.7 How can I dump Java objects with a probe on a known program point, rather than at a certain elapsed time as done by java_memstat?

The java_memstat probe is built on top of another probe called libapjvmpi, an interface to the Java JVMPI library that takes care of much of the low-level work. Among other things, it provides a mechanism to take a heap dump. Working with the interface requires getting a pointer to the libapjvmpi interface at run time and then calling through it. For instance:


 #include "libapjvmpi.h"
 
 static apjvmpi_InterfacePtrT JvmpiInterface = NULL;
 static apjvmpi_InterfaceHandlePtrT JvmpiHandle = NULL;
 
 void InitializeUal_early_heapdump (void)
 {
    // Load the jvmpi interface UAL
    if (ap_IsNoUalId (ap_LoadAndInitializeUal (LIBAPJVMPI_LIBRARY_NAME)))
    {
       ap_Error (ap_WarningSev,
                 "Unable to load "LIBAPJVMPI_LIBRARY_NAME"\n");
    }
 }
 
 probe program
 {
    on_entry
    {
       JvmpiInterface = apjvmpi_Initialize;
 
       if (JvmpiInterface == NULL)
       {
          ap_Error (ap_WarningSev,
                    "Unable to initialize JVM support for\n"
                    "Java object tracking.");
          return;
       }
 
       // Get an interface handle
       JvmpiHandle = JvmpiInterface->Initialize (3);
       if (JvmpiHandle == NULL)
       {
          ap_Error (ap_WarningSev,
                    "Unable to get a necessary interface for "
                    "Java object\n"
                    "    tracking. It requires interface version 3 but the "
                    "apjvmpi library\n"
                    "    is at version %d\n",
                    JvmpiInterface->GetVersion ());
          JvmpiInterface = NULL;
          return;
       }
    }
 }
 

To take a heap dump, you need a probe on the program point where you want the dump, which calls the RequestHeapDump routine:


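// Request a heap dump. Keep the last n heap dumps specified - note that
// if there is already a larger count set, that value is retained.
// void (*RequestHeapDump) (apjvmpi_InterfaceHandlePtrT Handle,
//                          int                         RetainHeapDumpCount);

   // Place this in the on_entry (or on_exit) action of a probe on the
   // program point where you want the dump.
   {
      // Skip the request if initialization in "probe program" failed
      if (JvmpiInterface != NULL && JvmpiHandle != NULL)
      {
         // Keep 3 dumps
         JvmpiInterface->RequestHeapDump (JvmpiHandle, 3);
      }
   }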

You will still need java_memstat to format the object dump(s).

10. RootCause J2EE Support

RootCause J2EE support has been discontinued with the introduction of OC Systems "RTI Enterprise" product. See http://rtiperformance.com.

11. RootCause TroubleShooting

11.1 I applied a trace to a function (method) in the RootCause GUI, but I don't see it being called in the output. Why?

Here are some possibilities:

  • The function was called so often that it was load shed, and calls stopped being recorded. Click on the LOAD_SHED node at the end of your Trace Display, choose Show Associated Table, and look for your function there. Use the option menu in the first column to designate the function as Do Not Shed for subsequent runs.
  • The function was called, but it's not shown in the data file you're viewing. Use Add Data Files to Display in the File menu to add earlier files. If you still don't find it, then the data containing the last call may have been overwritten (i.e., the "trace buffer wrapped around"). You can save all data files containing the trace of a function by adding a SNAPSHOT probe ON_ENTRY to the function in the Trace Setup dialog.
  • There are multiple instances of the method in different classes, and you chose the wrong one. Use Find in Trace Setup and set traces on the other instances as well.

The following possibilities apply only to native (C/C++) functions:

  • The function that's really being traced is in a different module, for example open() in libc.so instead of your application module. Use Find in Trace Setup and set traces in all modules where your function appears.
  • You did "Trace All In" which generates a wildcard, but the function was one of those that's not traced as part of a wildcard because it requires an expensive "trap" patch. Return to Trace Setup and force an explicit trace on this function by adding a "probe" on entry.
  • The function was optimized and so "inlined" at the point of call. If there really is no call, the function can't be traced.
  • The function cannot be traced. There are a few functions that, because of the way they're coded, simply cannot be probed. To test for this, go to the command line and type:
      apinfo -sa -x your_application.exe | grep "your_missing_function"
    If you see your missing function in the output, it cannot be instrumented. Contact OC Systems to find out why.

11.2 I add a library as a dynamic module and trace the init function, but the trace doesn't show up. Why?

When you add a module as a dynamic dll, this forces it to be preloaded (loaded before program start rather than at the point of the dlopen() / LoadLibrary() ). This means that the _init() function is called before _start of your main application, which is before probes have been applied.

11.3 I Add Dynamic Module of mylib.so, then specify some traces in mylib.so. But when I run the program, those traces don't appear. Why?

You may be loading a different instance of the library at runtime than you specified to Add Dynamic Module. This may be the case if LD_LIBRARY_PATH (or LIBPATH on AIX) is set. Make sure that the full path to mylib.so you've added to your workspace is the same as the one that will be loaded at runtime.

11.4 I did Custom..., and saved my probes to an APC file, but those probes don't show up in my trace. Why?

Make sure the "Add to Custom APC Files" checkbox is checked. If you've already got an APC file, make sure the Append checkbox is checked as well. Also, see Q11.1.

11.5 How do I stop tracing something I've got a workspace for?

You need to delete it from the registry. The easiest way to do this is with the GUI:

  • Open the workspace in the RootCause Console GUI
  • In the RootCause main window, click Unregister Program in the Workspace menu.

From the command line, do rootcause register -d -c class_name to unregister a Java main class.

To unregister a native program, first do rootcause register -l to see the exact path of the program that is registered, then do rootcause register -d -x exe_path.

11.6 What do I do about the message "(E) ADI checksum (0x84b1c4d) does not match module checksum (0xa1c5e35)." when I register on a .dply file at a remote site?

This message will be followed by specific information about the ADI file and module. The module is the executable or shared library on the remote machine, and the ADI file contains the debug information from the host machine where the workspace was developed.

The error messages indicate that the version of the module (application) on the remote machine does not match the version against which you developed your original traces.

You must create the workspace and traces against the same version you send to the remote site because we compare checksums.

11.7 Why does my Java app fail with "Class Not Found" under RootCause, but work fine without RootCause?

The most likely cause of this is that you're using the "-jar" option on your 'java' command, which is not supported by RootCause prior to version 2.1.2 (October 2003).

So, if your application is run with


java -jar $APROBE/lib/probeit.jar

You could run it instead with:

java -classpath $APROBE/lib/probeit.jar  com.ocsystems.probeit.Main

If you don't know what the main class is, it is defined in the manifest of the .jar file. For instance:


mkdir tmp
cd tmp
jar -xf $APROBE/lib/probeit.jar META-INF/MANIFEST.MF
grep "Main-Class" META-INF/MANIFEST.MF
# This will give the line "Main-Class: com.ocsystems.probeit.Main".
cd ..
rm -rf tmp

You would do the same thing using your own java command line and jar file in place of the above.
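If you prefer not to extract files, the Main-Class can also be read with unzip -p. The following is a sketch of two hypothetical shell functions (not part of RootCause) that print the equivalent -classpath command line. It assumes the unzip command is installed, and note that it ignores any Class-Path attribute in the manifest, which java -jar would honor; if your jar's manifest has one, add those entries to -classpath yourself.

```shell
# Hypothetical helpers (not part of RootCause); assumes unzip is installed.

# Print the Main-Class attribute of a manifest read on standard input.
# Manifests normally use CRLF line endings, so CRs are stripped first.
# A value continued onto a second (space-prefixed) line is not handled.
manifest_main_class() {
    tr -d '\r' | sed -n 's/^Main-Class:[[:space:]]*//p'
}

# Print a "java -classpath" command equivalent to "java -jar JAR".
# Class-Path entries from the manifest are NOT added.
classpath_command_for() {
    jar=${1:?usage: classpath_command_for JAR}
    main=$(unzip -p "$jar" META-INF/MANIFEST.MF | manifest_main_class)
    if [ -z "$main" ]; then
        echo "$jar: no Main-Class in manifest" >&2
        return 1
    fi
    printf 'java -classpath %s %s\n' "$jar" "$main"
}
```

For example, classpath_command_for $APROBE/lib/probeit.jar should print a command of the same form as the one shown above, using the Main-Class from that jar's manifest.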

After you have changed the command line, re-run the application and go through the "New Workspace" steps. This time it should work fine.

If this is too much of a hassle, contact support@ocsystems.com about getting a version with -jar support. If you weren't using -jar, or if the problem persists after going through above process, also contact OC Systems support and we can help you debug it.

11.8 How can I probe Java classes loaded with a custom class loader and so not in the CLASSPATH?

You will find that when you use "Open Associated Workspace", it imports only the jars in the class path, so other classes that might be explicitly loaded do not appear in the Trace Setup. This is easily remedied.

So long as the class loader follows the standard model for class loader inheritance (i.e., classes loaded by that loader have visibility to classes loaded by the application class loader), this is trivial:

  1. From the Main Workspace menu, choose the Setup->Class Path menu item to bring up the Class Path dialog.
  2. In the Class Path dialog, add the path(s) to the class directories or jar files you will be loading from. Note that this does not have to be where they will be loaded from at runtime. This just gets them into the Trace Setup.

If there is no physical representation of the class available, you can use wildcards:

  1. Select the Root Java Module in the Trace Setup;
  2. Right click to bring up the context menu;
  3. Choose Edit Wildcards to open the Edit Wildcards dialog.
  4. On the left "Trace" side of the dialog, enter strings like:
    "MyClass::*"
    "MyClass::aMethod"

11.9 When I have "rootcause on" I sometimes notice that commands piped together (for instance "env|grep MyVariable") can hang for a while before completing. Why is this?

Your home directory (which will be the default disk for the rootcause log) is probably on an NFS disk. When two processes try to lock a file at the same time, one will be halted until the other one is done. However, with NFS it can take a while for the state of the unlock to propagate back, leaving the caller waiting on the lock routine even though the other process has unlocked it. The solution is to set APROBE_HOME to a local disk.

11.10 When I add my library to the workspace with Add Dynamic Module and run with RootCause, my application never starts. What's wrong and how can I fix it?

Add Dynamic Module causes a library to be "preloaded" (using the aprobe -dll option) because it's only on program startup that automatic trace configuration can be done. However, some user libraries cannot be preloaded because they rely on some global state being defined which isn't done until the program starts running.

This means you can't trace or do anything else on this module. You're beat unless you can change the library to allow it to be pre-loaded.

11.11 Is there a way to add my own files to a deploy file so they will unpack into the directory created by rootcause register xxx.dply?

A .dply file is just a zip file. You can just use zip (provided with RootCause) to add files to this archive, like:
   zip xxx.dply this.txt that.class other.ual

11.12 Why doesn't the pi_demo program run on my new Linux version?

Because it was built on an old version of Linux. You can rebuild it from source using the Makefile in that directory, or else install the compatibility package for Fedora: compat-libstdc++-*.i386.rpm.

11.13 Why didn't my trace on Linux log any data?

If your Workspace is being accessed over NFS, this means you're writing the data to APD files over NFS, and Linux has known bugs with this. You really need to have your workspace/APD files on a locally-mounted disk. (Even if it weren't for this bug, logging over NFS is orders of magnitude slower.)

12. Aprobe FAQ

12.1 What is Aprobe?

Aprobe is a suite of tools and libraries which support dynamic modification and extension of a program by dynamically patching the program executable and/or shared libraries.

A dictionary defines "Probe" as "Device for exploring an otherwise inaccessible place or object." "Aprobe" stands for "Algorithmic Probe". It is hence a tool for exploring your program with the help of user-written algorithmic probes. These probes are installed into your program with the help of OC Systems' patented "dynamic action linking" technology.

A user runs a program with the "aprobe" tool, indicating that certain "probes" are to be patched into the program and executed as the program itself runs.

A "probe" consists of "actions" composed in C, with some special syntax added to indicate where in the program the actions are to be invoked.

There are a number of predefined probes included in Aprobe; there is a tool to generate simple probes directly from a linked or unlinked object file; or the user may easily compose his own probes in a simple extension of the C language.

See also Q1.1, "What is RootCause?"

12.2 What is ProbePak?

The ProbePak was an experiment at introducing users to the power of Aprobe and RootCause by making a subset available for free download. It didn't work out, and ProbePak is no longer supported. See the main page www.ocsystems.com for information on our current products.

12.3 What are some potential uses of Aprobe?

Read more about uses of Aprobe in the Product section of the web site, or read the white papers in the Resources section. See also Q1.2, "What are some potential uses of RootCause?"

12.4 How do I get started quickly with Aprobe?

The best way to get started writing probes is to look at examples, and make some small changes.

If you have RootCause and have been using the GUI, you can use the Custom... button in the Trace Setup window to generate a probe, and look at that. If that looks too daunting, or you want a more tutorial approach, try the graduated examples in the examples (or ada_examples) and demo/Aprobe subdirectories of the Aprobe installation. Start with $APROBE/examples/evaluate/README.

12.5 Who can use Aprobe?

Technical people who are developing, testing, and maintaining software.

12.6 What different versions of Aprobe are there?

The current version of Aprobe on AIX is 4.4.1; on all other platforms it is 4.3.4b, released in June 2005.

The original version of Aprobe is Version 2, for AIX, included as part of the OC Systems LegacyAda/OATS product and in earlier versions of the OC Systems "PowerAda" product. While it shares the "probe" concept with the newer versions, the user interface and details of Aprobe Version 2 differ substantially from Versions 3 and 4.

12.7 For which platforms is Aprobe available?

Same as those for RootCause: see Q1.5.

12.8 How do I get Aprobe?

E-mail , and we will arrange for you to receive the software.

12.9 What documentation is available for Aprobe?

Aprobe is delivered with a User's Guide in hardcopy, HTML, and PDF formats. The latter two softcopy forms are included in the evaluation version which can be downloaded. The HTML version is available on-line at www.ocsystems.com/sup_ug_index.html .

There are a series of graduated examples that come with their own text documentation in the examples and demo subdirectories of the Aprobe installation. You should read $APROBE/examples/evaluate/README and try at least some of the examples under that directory, before trying Aprobe on your own application or looking through this FAQ for answers.

12.10 What tools make up Aprobe?

apcgen - generates APC for some or all functions in the specified object file(s)

apc - compiles and links the specified APC file(s) into a UAL (DLL).

aprobe - runs the specified program after loading and applying patches in the specified UALs.

apformat - formats any data logged in the specified aprobe data (APD) file(s).

These tools are described further in other questions below. A number of additional tools and scripts for specific situations are also provided; see Appendix A of the Aprobe User's Guide.

12.11 How is Aprobe licensed?

Same as RootCause. See Q1.9.

12.12 Is there a point-and-click (GUI) interface to Aprobe?

Yes. It's called RootCause. See Q1.1.

Also, some predefined probes (see Q15 below) include a Java GUI to specify configuration parameters for that probe.

12.13 Can I run Aprobe on any executable program file?

Yes. You can run aprobe (without any probes) on any application at all unless:

  • It is a secure application which a debugger doesn't have authority to attach to. In this case you should get a clear explanatory message.
  • The application does something very strange like replacing some low-level system routines with its own versions that do something different.
  • There's a bug in Aprobe.

If you find that using aprobe causes your application to crash, you should try running aprobe without any probes. If it still crashes, it should be reported as a bug to .

A slightly different question is, "Can I use Aprobe to put probes on any program?" To actually apply probes to a native module, there are three basic requirements:

  • Symbols

For Aprobe to do what it does, it must be able to figure out where the subroutines you are trying to probe have been linked and loaded. We call this location information "symbols". All symbolic debuggers have the same requirement. See Q12.17.

The symbols may be as originally added to the application (i.e., not stripped; see Q12.16), or they may have been saved separately by Aprobe using apmkadi (see Q13.9).

Most programs delivered with the operating system, and off-the-shelf software, are stripped, so you can't use Aprobe directly on the application code, but you can generally probe shared libraries (DLLs) that support them.

  • Standard Call/Return behavior

If the program uses a mechanism that transfers control other than by the normal call and return mechanism, such as setjmp / longjmp or an unsupported exception mechanism, and there is an active probe at the time of that non-standard transfer of control, the program will likely crash.

  • Supported exception mechanism.

Ada and C++ (and Java, but that's a separate issue) support exceptions, which are non-standard transfers of control. Each compiler implements them differently, and each implementation must be explicitly supported by the Aprobe runtime. See Q12.15.

12.14 In what language(s) can my program be written?

Same as for RootCause. See Q1.10.

12.15 What compiler(s) must have been used to compile my program?

Same as for RootCause. See Q1.11.

12.16 How do I tell if a program file is "stripped"?

Use the "file" command, e.g.:

AIX:

$ file a.out
a.out:      executable (RISC System/6000) or object module not stripped
$ file /bin/ls
/bin/ls:    executable (RISC System/6000) or object module

Linux:

$ file a.out
a.out: ELF 32-bit LSB executable, Intel 80386, version 1, dynamically linked (uses shared libs), not stripped
$ file /bin/ls
/bin/ls: ELF 32-bit LSB executable, Intel 80386, version 1, dynamically linked (uses shared libs), stripped

12.17 How do I tell what symbols a program has available?

apcgen -L will list the Aprobe function symbols in any compiled object module, for example:

apcgen -L C:\WinNT\system32\kernel32.dll
apcgen -L /usr/lib/libc.so
apcgen -L /work/programs/prog.exe

There are other apcgen options, such as -m to show "mangled" names and -v to show file names; use apcgen -h for usage.

The RootCause Trace Setup window shows a tree of all the functions organized by module, directory and file, using the same mechanism used by apcgen.

If you want information about data symbols, or want to confirm that a function may actually be probed, you can use the apinfo command, which runs the "info" predefined probe. This only works on executable programs. For example:

apinfo -d /work/programs/prog.exe

will show all the global and file-static data symbols found when prog.exe is loaded by aprobe. There are lots of other options: use apinfo -h to see them.

12.18 What do I do to get symbols in my program?

On Unix, every program has its symbols unless they're explicitly stripped (see Q12.16).

12.19 What do I do to get "debug information" in my program?

This is documented in Chapter 10 of the RootCause User's Guide, "Building a Traceable Application", and in Chapter 3 of the Aprobe User's Guide, but it's summarized here:

  • For C/C++ compilers, compile with -g.
  • With PowerAda you get debug information by default, but you need the PowerAda program library available just as you would for adbg.

In addition to compiling with the right option to generate the debug information, you also must retain that information and have it available where it's supposed to be:

  • For gcc-based compilers, including GNAT, and IBM's C and C++ compilers, debug information is collected at link time into the executable, and is retained unless you explicitly use the "strip" command.
  • PowerAda line information is recorded in the executable and can't be stripped, so you don't need any debug information at run-time. However, for `apc' and the RootCause GUI, you need the Ada program library, which must be consistent with the executable and available at the same location recorded in the executable. If the library is moved, you can specify its location with the environment variable APROBE_POWERADA_LIBRARY, for example
  export APROBE_POWERADA_LIBRARY=/builds/old/prog1/adalib

12.20 How do I tell if a program file has "debug information"?

The apcgen command will list those functions that have debug information associated with them:

apcgen -Ld a.out

This should be all you need, but there are some system utilities that look in the object files themselves that may also be used:

  • On Linux you should find objdump :
  objdump -W a.out | grep "DW_TAG_subprogram" | awk '{ print $NF }'
  • On AIX, you can use the dump utility with the -t option to dump symbol information, including debug "stab" strings. For example, the following shows the functions that have debug information:
  dump -t a.out | grep ":F"

12.21 What is a "probe"?

A "probe" is a "user action" associated with a specific location in a program. The user action is executed whenever control passes through the location with which it is associated. A "probe" is described in an extension of C called "APC", for example:

probe thread
{
  probe "foo"
  {
    on_entry
    {
      printf("Entering foo.\n");
    }
  }
}

The block following the "on_entry" is the "user action". The syntax surrounding it describes exactly where and when the action should be executed: immediately upon entering function "foo()" in each thread.

12.22 What is a "UAL" (.ual file)?

A UAL is a "User Action Library". It is the output of the "apc" command: a shared library consisting of the object code generated from your APC files. Not just any shared library (DLL) may be used as a UAL, and a UAL may not be renamed after creation, because it has specially-named entry points based on its filename which are called by the Aprobe runtime to perform initialization.

12.23 What is "logging"?

With respect to Aprobe, "logging" means "writing data to a file for later analysis". Aprobe provides a built-in logging facility that saves raw data in a time- and space-efficient way; you then use "apformat" to display the logged data. See "Logging Data" for related questions.

12.24 What is an ".apd" file?

An ".apd" file is one that contains the data generated by a program run under aprobe. These are binary files which are read with the "apformat" tool.

There is always a ".apd" file generated giving aprobe invocation information, even if no "log" statements are executed. If log statements were executed there will be a "-1.apd" file, and maybe "-2.apd" files as well.

12.25 What can't I do if my executable or library doesn't have debug information?

You can't reference source-level information in your probes. It's just like using a source level debugger in this respect, and for the same reason. A good rule is, if the debugger can print the value of a variable x at line 15, then you can do "on_line(15) log($x)" in your probe.

More specifically, if your probe uses a construct that cannot be resolved without specific debug information from the program, you need to specify "-x exe_or_library" on the apc command, and exe_or_library must contain debugging information. Such constructs are:

(a) target expressions: names from the probed program preceded by $, or $* ($1, $2 are ok, as are hardware-register references starting with '$$').

and

(b) references to specific source lines.

Note that there are lots of probes you can write without debug information; for example, all but one of the predefined probes provided with Aprobe work fine in its absence, and the one that does require it (coverage) does so in order to get source line number information.

12.26 Does using on_line() require the application to have debug information?

Yes, but things aren't that simple. To build a probe that requires debug information (including line information), the debug information must be available when the probe is compiled. However, the debug information can then be stripped and the probe run against the stripped executable.

For the symbol table, the necessary symbols must be present at run time, either in the application (or application libraries) or in a .adi file generated with the Aprobe tool apmkadi. That tool allows you to capture the symbol table in an internal form and then strip the executable.

Also, PowerAda programs always contain source line information -- this is not considered debug information.

Finally, for low-level hacking, you can instrument specific offsets using on_offset.

12.27 What is the maximum number of probes allowed?

For probes, you are limited only by paging space. For UALs there is a more practical limit: we limit the total number of modules to 255, and that includes UALs.

12.28 Is there access to C++ private/protected variables?

Yes, if it's in the debug we can see it. We don't look at whether the debug says it's private, protected or public - we just use it.

12.29 Is there any way to attach with Aprobe to a running application?

No. This question is very frequently asked. It sounds great in theory, but in practice Aprobe is a tool for tracking problems that have yet to happen, not those that have just happened. There is also quite a bit of work done by Aprobe when an application starts up; doing this to a running application is often as big an issue as restarting the application. Finally, for Java you wouldn't be able to change the classpath to see our classes or intercept classes that have already been loaded.

12.30 Is there a way to probe a function for which no symbol is available?

Yes. If you know its address and size, you can define a symbol for it using ap_RecordDynamicFunctionSymbol() in the Aprobe Runtime Library, and then apply probes using the defined symbol.

Here are example C and APC files illustrating how to use it.

defsym.c

 #include <stdio.h>
 #include <string.h>
 
 static char *image(char *s)
 {
    char *s1 = strdup(s);
    return s1;
 }
 
 int main (void)
 {
   printf (image("Hello\n"));
   return 0;
 }
 

defsym.apc

 //---------------------------------------------------------------------------
 // Define Dynamic Function Symbol Example
 //
 // This is an example of using ap_RecordDynamicFunctionSymbol()
 // to define symbols when no debug information is available.
 //
 // NOTE:  If the offset for symbols is wrong the program will
 // likely crash because you will have directed Aprobe to instrument
 // the wrong piece of code.
 //---------------------------------------------------------------------------
 
 #include "aprobe.h"
 
 // To define your symbols early enough to be instrumented and
 // probed, you have to define them from a UAL initialize function.
 // The initial part of the name must be InitializeUal_, and the first
 // character following that must be lower in the ASCII collating order
 // than the first character of the UAL name. '0' is the lowest legal
 // character.
 
 void InitializeUal_0_defsym_apc()
 {
    // In this example I just define an alias for the symbol "main"
    // and probe that instead.  You have to know the correct offset
    // and size of the function (though size is not so critical).
    // The offset is the offset in the module, not just the text
    // section.
    ap_SymbolIdT NewSym =
       ap_RecordDynamicFunctionSymbol (
          ap_ApplicationModuleId(),
          "MyAliasForMain",
          ap_ExternSymbol,
          ap_IntegerToOffset(0x10),
          0x1d,
          0);
    if (ap_IsNoSymbolId(NewSym))
    {
       printf("Couldn't define symbol...\n");
    }
 }
 
 probe thread
 {
    // You'll get a warning about the symbol not being defined
    // when you compile this with apc, but it's OK.
    probe "MyAliasForMain"
    {
       on_entry  printf("Hello again...\n");
    }
 }
 

13. Using the "aprobe" Command

13.1 What does "aprobe" do?

Aprobe locates the specified UALs (if any), loads them as well as the Aprobe runtime, patches the executable to invoke the probes described in the UAL files, and starts execution of the specified program.

13.2 How do I specify options to my program when using aprobe?

The executable program name is the last argument on the aprobe command line. All options after that are passed as arguments to the executable. For example, if your regular command-line would be:

    mygrep "a_string" *.txt

Then with aprobe it would be:

  aprobe -u mygrep.ual mygrep "a_string" *.txt

The most reliable way to pass program arguments, used by RootCause, is the aprobe "-execvp" option. In this case you specify a filename in place of the parameters, and that file lists all the arguments, including the "argv[0]" that is to be passed as the executable name. For example, in the above case:

aprobe -execvp -u mygrep.ual mygrep mygrep.args

where mygrep.args might contain the lines:

mygrep.exe
"a_string"
file1.txt
file2.txt

13.3 How do I specify options to my probes?

Options and parameters can be passed to each UAL as well. This is done by following the UAL name with the -p option followed by the options in quotes. This is most commonly seen when invoking a predefined probe that is part of Aprobe, for example:

   aprobe -u info -p "-sa" mygrep.exe

The options to the info probe are "-sa".

13.4 How do I print my output at run time instead of sending to the APD file?

The "-if" ("immediate format") option on the aprobe command does this, e.g.,

   aprobe -if -u fooTest foo

13.5 Can I suppress generating an ".apd" file?

Not at this time. Even if you do "aprobe -if -n 0 ... " you get the basic .apd file.

13.6 How can I run my probes without invoking aprobe?

Use RootCause. That's one of its key features. If for some reason you can't do that, you can use one of the approaches described in Chapter 4 of the Aprobe User's Guide, "Loading Probes without aprobe":

  • Substitute the Aprobe command line for the command that starts your program
  • Rename your application and replace it with a hard-coded script that calls aprobe on the renamed executable. On Unix there is a script delivered that facilitates this, called "run_with_aprobe_edit", which documents its use.
  • Use the "run_with_aprobe_apo" script. The script text includes documentation on its use.
  • Link aprobe into your application by linking it with the shared library "libdal.so".

13.7 How do I probe a function in a dynamically-loaded shared library?

If your program explicitly loads a file by calling dlopen("dynamic.so"), Aprobe does not support this directly since it does all its patching when the executable and any shared libraries linked in are first loaded into memory. So the only shared libraries you can probe are those listed by the command

ldd exe_name

You can force a shared library to be pre-loaded at startup by specifying its full path as the argument to the "-dll" option on the aprobe command line, for example:

aprobe -dll /my/application/dynamic.so /my/application/app.exe

This assumes that dynamic.so is not dependent on any other dynamically loaded shared libraries and that it doesn't hurt to be initialized earlier than would have been the case with dlopen.

13.8 Can I probe a function in native C or C++ code loaded by a Java application?

In general, if you're using Java, you should be using RootCause and not Aprobe directly. However, you can do this using the apjava -dll option. If your Java class named "JTest" calls System.loadLibrary("native"), then this should work:

apjava -dll /full/path/libnative.so -u native_probes.ual -java JTest

13.9 Is there a way I can use Aprobe in a target environment where my application has no symbol or debug information with it (is stripped)?

If you have a program that can be probed, you can run the tool apmkadi on it to create an Aprobe Debug Information (ADI) file. You can then remove the symbols from the executable (using the strip command) and ship it to the target site. When you want to run Aprobe on that, you would then specify not only the UAL file(s) containing the probes, but the ADI file(s) as well, which contain only the symbolic information needed by Aprobe. See Appendix A, "apmkadi" for more information.

13.10 Can I run aprobe but produce no APD files?

Yes. The "-p" flag, which prevents generation of any APD files, was introduced in Aprobe version 4.2.5. This is useful if your probes don't log any data using the default log method.

13.11 Why does my program crash when using aprobe, and not without?

The possibilities are:

  1. There's a bug in your probe, for example, one of your action routines is dereferencing a null pointer. See "Debugging Your Probes" near the end of Chapter 3 of the Aprobe User's Guide.
  2. Your application provides its own "malloc()" function which requires initialization before its first use. Since aprobe gets control before your application does, and uses the application's malloc(), this could cause a crash on startup. See also Q13.14.
  3. Your probe is accessing or logging data on_exit to a function or thread, but the on_exit action is being called in an exception or thread exit condition and so may not have valid data available. In order to check for this, put on_exit code within a block that checks the ap_ProbeActionReason implicit parameter, e.g.:
  on_exit {
    if (ap_ProbeActionReason == ap_ExitAction)
    {
      log("foo returns ", $return);
    }
    else
    {
      log("foo exits abnormally for: ", ap_ProbeActionReason);
    }
  }
  4. Your program is very time-critical, or is such that timing may change the order in which order-dependent operations are executed. Aprobe introduces some overhead, and your probes likely introduce a lot more, which can change your program's behavior. You can use aprobe itself to find out what's happening, and to force synchronization between threads; contact .
  5. There's an Aprobe capacity problem. This may happen with the predefined probes if you select all functions, or (equivalently) specify "*" IN "*" in the configuration file (see Q4.9). You can either reduce the number of functions you're probing, or increase the default probe stack size with "aprobe -q stacksize=20000000" (or some other big number).
  6. You're probing a function that doesn't follow standard generated code conventions. This can happen when you try to probe everything in a shared system library such as libc.so on Linux or libc.a(shr.o) on AIX. See if it reproduces when you only probe known entry points in the system library, or limit your probes to your application module only.
  7. There's a bug in Aprobe. Contact our sales department for more information ( ).

13.12 AIX: Aprobe version 3.2 had the -s1 option to prevent conflicts with my application's shared memory. Is there a similar feature in version 4.2?

We hoped that by getting rid of shmat() from our code that we would no longer cause conflicts. Unfortunately we didn't realize that the OS would choose memory map addresses that would conflict, so the problem immediately reappeared. We added a different flag to allow you to specify the memory area that should be used: -q mmap=address where address is the address that should be passed to mmap() when Aprobe requests its shared memory. For example:
aprobe -q mmap=0xd0000000 -u myprobes myapp.exe

If you don't have this flag, you'll need an updated version of Aprobe but you might be able to get around it:

Many users find that they can avoid shared memory conflicts simply by reducing the size of the APD files. The default maximum size is 256M for the persistent APD file and 256M for the user APD file. By using a ring (aprobe -n flag) you can vastly reduce the user APD size, and you can use the -sp flag to specify a smaller persistent file. For instance, the following: aprobe -sp 16000000 -n 5

will create a persistent file of approx 16M and up to 5 APD files of 2M each.

13.13 Why does Aprobe ask for such a large memory-mapped file on startup, when I've specified only a 4M APD file with "-s"?

The size of the persistent APD file is controlled independently of the size of the APD ring files. You can use the -sp option to lower this significantly. The default is 256Mbytes because we need to set it to the maximum at the beginning. However we've found that 16M is generally sufficient in practice.

If you check how big your persistent files grow, you can use that as a baseline. The main things that get logged to the persistent file after program start are:

  • New java classes / methods
  • New threads
  • Tracebacks recorded with the traceback to ID mechanism
  • LOAD_SHED functions

13.14 On Linux when I run my application under Aprobe it crashes during initialization with a problem in malloc. This doesn't happen without Aprobe. Why?

The application might have a poor implementation of malloc built in. On Linux an application can provide its own implementation of malloc, free, etc., and this will be used. Most local versions of malloc are well behaved. Some, however, require initialization by the application before first use. Since Aprobe gets in earlier than main(), this can cause a malloc request to be made before the allocator has been initialized.

If you have control over the code you should fix this by making the malloc self-initializing. If you don't then, unfortunately, you will not be able to run the application under Aprobe.

14. Using the "apformat" Command

14.1 What does apformat do?

apformat reads one or more related APD (.apd) files and formats the data they contain. For example, if the command

    aprobe -u a.ual a.exe

produced the files

    a.apd a-1.apd

Then the command

    apformat a.apd
  1. Reads a.apd to find out the executable (a.exe) and UAL(s) (a.ual or a.dll) that were used by aprobe to generate the file, and what other APD files were generated (a-1.apd).
  2. Reads the data records contained in a-1.apd, and for each one, invokes the associated format routine contained in the UAL file, passing the data in the record as parameters to the format routine.

14.2 Which of the ".apd" files do I specify on the command-line?

If you specify the "base" one, without any number at the end (e.g., a.apd), all of the files that were written to during the most recent invocation will be formatted. If you specify an individual data file, such as "a-2.apd", only the data in that specific APD file will be formatted.

14.3 Can I restrict the apformat output to just that generated by one of the several UALs provided at aprobe time?

Yes. Use the "-z" option to indicate that no UALs are to be loaded implicitly, then use "-u" to explicitly state which one you want to use:

  apformat -z -u first myprog.apd

14.4 Can I restrict the apformat output to just that generated by one or two of my format routines?

Yes. If you provided your own format routines, you can do it by editing those routines and regenerating the UAL with the same name as the original.

Let's say you have "dumpall.apc", from which you generated "dumpall.ual". Copy "dumpall.apc" to "dumpall.apc.save". Then edit "dumpall.apc" and comment out the bodies of all the format routines except for the one(s) you want to keep. Use `apc' to compile "dumpall.apc" into "dumpall.ual", e.g., apc dumpall.apc -x myprog, then do:

  apformat -z -u dumpall myprog.apd

The UAL name must be preserved because the basename of each UAL is part of the "key" used to map formats to data in the APD.

14.5 Can I programmatically filter which formats are used?

Yes, and this is actually preferable:

  1. Define global flags corresponding to the different kinds of filtering you want, initializing all to (say) "true".
  2. Code your format routines such that each has "if (FormatFlag1 || FormatFlag2) { ... }" guarding the execution of the print actions in the format routine.
  3. In the "on_entry" part of a "probe format", read the command-line arguments to the UAL (ap_UalArgc, ap_UalArgv), or an environment variable, or file, or whatever, to determine the desired settings of the flags.
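The three steps above amount to ordinary C logic. The following sketch shows the pattern as a standalone C file rather than as APC. All of its names are hypothetical: the flag names, the argument words "calls" and "returns", and the functions SetFormatFlags(), FormatCall() and FormatReturn(). In a real UAL, the flags would be set in the on_entry part of "probe format" from ap_UalArgc and ap_UalArgv. The guarded printf() calls would live in your format routines.

```c
/* Plain-C sketch of the flag-guarded format pattern (not APC).
 * All names are hypothetical; in a UAL the flags would be set in
 * "probe format ... on_entry" from ap_UalArgc/ap_UalArgv. */
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Step 1: global flags, all initially true. */
static int ShowCalls = 1;
static int ShowReturns = 1;

/* Step 3: derive the flag settings from format-time arguments.
 * With no arguments everything stays enabled; otherwise only the
 * named kinds of output are enabled. */
static void SetFormatFlags(int argc, const char *argv[])
{
    int i;
    if (argc == 0)
        return;
    ShowCalls = 0;
    ShowReturns = 0;
    for (i = 0; i < argc; i++) {
        if (strcmp(argv[i], "calls") == 0)
            ShowCalls = 1;
        else if (strcmp(argv[i], "returns") == 0)
            ShowReturns = 1;
    }
}

/* Step 2: each format routine guards its print actions with a flag.
 * Returns 1 if something was printed, 0 otherwise. */
static int FormatCall(const char *FunctionName)
{
    assert(FunctionName != NULL);
    if (ShowCalls) {
        printf("call %s\n", FunctionName);
        return 1;
    }
    return 0;
}

static int FormatReturn(const char *FunctionName, int Value)
{
    assert(FunctionName != NULL);
    if (ShowReturns) {
        printf("return %s = %d\n", FunctionName, Value);
        return 1;
    }
    return 0;
}
```

Under this scheme, running apformat with no UAL arguments formats everything. Passing only "calls" to the UAL (for example with the -p option shown in 14.10) would suppress the return records. This is only one possible mapping from arguments to flags.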

14.6 Can I do the previous two if I'm using automatically generated formats?

No. The formats are generated automatically and there's no way to put your own conditions within them. (Of course you can put conditions around the log statement at run time, so that no data is recorded to begin with, but this is a different issue.)

14.7 When do I need to specify the UAL file to apformat?

When you want to use UALs different from, or in addition to, the ones that were specified when you ran aprobe. You might want to do this in order to only process part of the data, or use different format routines. Use apformat -z if you want to use only those UALs explicitly specified on the apformat command line.

14.8 Can I use "apformat" without an APD file?

No. There must be a valid APD file generated by aprobe.

14.9 Aprobe works fine, but I get a crash from apformat; why?

This is almost certainly because there's a bug in one of your format routines. See Debugging Your Probes near the end of Chapter 3 of the Aprobe User's Guide.

However, if you didn't write any of your own format routines, either because you're using a predefined probe, or because you just used "log(something);", then this is probably OC Systems' fault and you should contact Aprobe support.

14.10 Can I use ap_UalArgv in "probe format ... on_entry" to get arguments passed at run time (aprobe time)?

No. ap_UalArgv at apformat time is for reading arguments passed to the UAL on the apformat command line, as in:

   apformat -u my_probe -p "param1 param2" t.apd

You would have to log the data you need from run-time yourself, and format it later. This can be done by including the following APC file into your APC file prior to the "probe format" or other format routine in which you want to use the arguments. You can then use the variables ap_RuntimeUalArgc and ap_RuntimeUalArgv just as you would use ap_UalArgc/v at run time.

 /* logualargs.apc
  * Include this once per UAL to record runtime arguments for format time use.
  */
 
 #ifndef _LOGUALARGS_APC_
 #define _LOGUALARGS_APC_
 
 static int ap_RuntimeUalArgc = 0;
 static ap_NameT *ap_RuntimeUalArgv = NULL;
 static void ap_RuntimeUalArgStart(ap_Uint32 *argc)
 {
    ap_SizeT size = ((*argc) + 1) * sizeof(ap_NameT); /* +1 leaves a trailing NULL entry */
    ap_RuntimeUalArgc = *argc;
    ap_RuntimeUalArgv = (ap_NameT*)(ap_Malloc(size));
    memset(ap_RuntimeUalArgv, 0, size);
 }
 
 static void ap_RuntimeUalArgAdd(int *pos, ap_NameT Arg)
 {
    ap_RuntimeUalArgv[*pos] = ap_StrDup(Arg);
 }
 
 probe program
 {
    on_entry
    {
        int i;
        log (ap_UalArgc)
           with ap_RuntimeUalArgStart to ap_PersistentLogMethod;
        for (i = 0; i < ap_UalArgc; i++)
        {
           log(i, ap_StringValue(ap_UalArgv[i]))
            with ap_RuntimeUalArgAdd to ap_PersistentLogMethod;
        }
    }
 }
 #endif
 

For example:

 #include "logualargs.apc"
 
 probe thread
 {
 }
 
 probe format
 {
    on_entry
    {
        int i;
        // Run-time arguments to this UAL
        printf("ap_RuntimeUalArgc = %d\n", ap_RuntimeUalArgc);
        for (i = 0; i < ap_RuntimeUalArgc; i++)
        {
           printf("ap_RuntimeUalArgv[%d] = \"%s\"\n", i, ap_RuntimeUalArgv[i]);
        }
        // Format-time arguments to this UAL
        for (i = 0; i < ap_UalArgc; i++)
        {
           printf("ap_UalArgv[%d] = \"%s\"\n", i, ap_UalArgv[i]);
        }
    }
 }
 

15. Using Predefined Probes

15.1 What is a predefined probe?

This is just a UAL containing probes written by OC Systems for a specific purpose. They are generally more complex than ones you would write yourself, and are designed to work on any program that can be probed. Most of these probes include a Java GUI to simplify parameterization of the probe for your specific program, such as specifying the functions to be probed.

All predefined probes are in $APROBE/ual_lib/*.ual; the source code is in $APROBE/probes/*.apc. The documentation for these probes is in Appendix D of the User's Guide.

15.2 Do I have to use "apc" to build these probes myself?

No! The UALs for all of the predefined probes are already built and located in $APROBE/ual_lib. This is in the UAL search path, so the simple name of the UAL is sufficient. For example:

    aprobe -u info myprog.exe

15.3 The examples show invocation of predefined probes using aprobe -u info myprog.exe. How does aprobe find these UALs when they're not in the current directory?

The directory $APROBE/ual_lib is always searched for UALs after the working directory. The environment variable APROBE_LIBPATH may also be defined to add additional directories.

15.4 Can I use Coverage without using the Java configuration GUI?

Yes. In fact, that's the default. There is no GUI for `info'. The coverage, profile and trace probes provide a GUI to assist in building or modifying a configuration file that defines what should be done, but this file is just a text file that can be edited by hand.

The `memwatch' predefined probe provides a "runtime" GUI to monitor memory usage as the program is running, and to take interactive snapshots of the allocation data.

See the documentation for each probe in Appendix D of the Aprobe User's Guide.

15.5 The trace probe really slows down the program--how can I speed it up?

See the Aprobe User's Guide documentation for this probe for details. In the meantime, try these things, in this order:

  1. Use Load Shedding by specifying "LoadShedThreshold 10" in your configuration file.
  2. Don't use wildcards like "Trace *", but rather use apcgen -L to list specific functions you want to trace and just name those.
  3. Use the TRIGGER configuration parameter to specify a specific call-tree you want to trace.
  4. Use the circular-buffer mechanism, by specifying SaveTraceDataTo CIRCULAR_BUFFER in the configuration file, rather than logging data in real time. Note that your program must complete in a well-behaved way in order to get a snapshot of the data logged to the circular buffer.
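The circular-buffer option trades completeness for speed. Only the most recent records are kept, and nothing is written out until a snapshot is taken. The following generic C ring buffer illustrates that idea. It is not the trace probe's implementation. The capacity, the int record type, and the names RingLog() and RingSnapshot() are hypothetical.

```c
/* Generic ring-buffer sketch; not the trace probe's actual code.
 * RING_CAPACITY, Ring, RingLog and RingSnapshot are hypothetical. */
#include <assert.h>
#include <stddef.h>

#define RING_CAPACITY 4

typedef struct {
    int records[RING_CAPACITY];
    size_t next;    /* slot that receives the next record */
    size_t count;   /* number of valid records, at most RING_CAPACITY */
} Ring;

/* Logging never grows memory or does I/O: the oldest record is overwritten. */
static void RingLog(Ring *r, int record)
{
    r->records[r->next] = record;
    r->next = (r->next + 1) % RING_CAPACITY;
    if (r->count < RING_CAPACITY)
        r->count++;
}

/* Snapshot: copy the retained records, oldest first, into out.
 * Returns the number of records copied. */
static size_t RingSnapshot(const Ring *r, int *out)
{
    size_t start = (r->next + RING_CAPACITY - r->count) % RING_CAPACITY;
    size_t i;
    assert(out != NULL);
    for (i = 0; i < r->count; i++)
        out[i] = r->records[(start + i) % RING_CAPACITY];
    return r->count;
}
```

After six records are logged into a four-slot buffer, a snapshot returns records 3 through 6. The data exists only in memory until a snapshot copies it out, which is why item 4 requires the program to complete in a well-behaved way.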

15.6 How can I get a snapshot of my predefined probe data before my program dumps core?

The ability to take a snapshot when an unexpected signal occurs is provided by combining the predefined probe of your choice with the "sigsegv" probe:


   // my_coverage.apc
   #include "sigsegv.h"
   #include "coverage.h"
   static void MyHandler(int sig, void *Data)
   {
      ap_Coverage_DoSnapshot("Snapshot on signal.");
   }
   probe program
   {
      on_entry
      {
         ap_Sigsegv_AddCallback(MyHandler, NULL);
      }
   }

Then you link this with the existing predefined probes:

  $ apc my_coverage.apc coverage.ual sigsegv.ual # creates my_coverage.ual

15.7 Is there a way to invoke predefined probe operations from within my probes?

An API for each predefined probe is defined by the ".h" file corresponding to it in $APROBE/probes. For example, "profile.h" defines "ap_Profile_DoSnapshotForAll()". To call this, you would #include "profile.h" in your APC file (it's in $APROBE/include as well, which is always searched for include files). Then when you compile your apc file, specify the UAL as if it were just another object file to link with:

    apc myprofile.apc profile.ual

This will produce myprofile.ual.

15.8 How can my probes use the Java GUI facilities that the predefined probes use?

There are two interfaces to the Java GUI objects used by the predefined probes. The one to start with is defined in $APROBE/include/quick_gui.h and implemented in quick_gui.ual. It supports simple graphs and interactive message, Yes/No, and confirmation dialogs. An example of its use is in $APROBE/examples/learn/visualize_data/.

The full GUI interface used by the predefined probes like profile.ual is apGUI.h, but this is only for fearless experts.

15.9 I'd like to customize a predefined probe -- how do I rebuild it?

This is a bit ugly because the Makefile for building probes relies on adjacent directories, so you have to rebuild in place (after saving the original) or copy and/or soft-link them locally:

 mkdir my_aprobe ; cd my_aprobe
 ln -s $APROBE/include $APROBE/lib $APROBE/bin .
 mkdir ual_lib
 mkdir probes
 cd probes
 cp $APROBE/probes/memwatch.apc . # if you wanted to edit memwatch
 ln -s $APROBE/probes/* .         # to get everything else
 chmod +w memwatch.apc
 # edit memwatch.apc (or whatever) as desired
 cd ../ual_lib
 make -f $APROBE/ual_lib/Makefile memwatch.ual # or whatever
 

If you have problems or questions, contact OCS Support.

15.10 How do I use the coverage probe with multiple test cases?

The `atcmerge' tool merges formatted results from different runs on the same or different executables. You can use the aprobe "-d" option to create different APD filesets and corresponding ".tc" files for each run, and use the "atcmerge" tool to merge these. See Aprobe\Examples\Advanced\Test_Coverage for an example.

15.11 Where did the "heap" probe go?

heap.ual has been superseded by memwatch.ual. This is a simpler, more robust probe that provides information about allocation patterns, but does not save all the additional data necessary to do error checking. Contact OC Systems if you need a probe with this allocation-checking functionality.

Other memory analysis probes provided are:

  • memstat.ual - statistical memory tracking for long-running programs
  • java_memstat.ual - memstat for Java applications
  • memleak.ual - a light-weight heuristic allocation tracker
  • memcheck.ual - a memory-corruption checker

15.12 How do I use this "events" probe everyone's talking about?

With RootCause 2.0.5 (Aprobe 4.2.5) there's an example under examples/predefined_probes/events, and documentation in Appendix D of the Aprobe User's Guide. Here's a quick summary we sent to a user:

You must have an app_name.events.cfg file; otherwise events does nothing. Let's take a simple case with the routines one() and two(), which both call routine three(), which in turn calls routine four():

   main()
      one()
         three()
            four()
      two()
         three()
            four()

The simplest configuration file is:

EVENT FUNCTION one()
EVENT FUNCTION two()
EVENT FUNCTION three()
EVENT FUNCTION four()

To just look at the calls nested under one() you would add:

FOCUS one()

If you wanted to restrict this at runtime:

FOCUS RUNTIME one()

Let's say that the processing for one() becomes more complex and you want the event to end in another routine. This would do the trick:

EVENT START MyEvent one() ON ENTRY
EVENT START MyEvent another() ON_ENTRY

FOCUS MyEvent
FOCUS RUNTIME MyEvent

15.13 In the `profile' probe, what do "Calls to Self/Child" columns mean?

Assume we have a program foo with two functions, outer() and inner(). outer() loops and calls inner(), which does some work. We set up the foo.profile.cfg file to profile both of them.

If we look at the output for routine outer we would expect to see Calls to Self being one - it's just called once. Calls to Child should be something like 10 or however many times inner is called.

Similarly the two tables show individual and cumulative time. The individual time for outer would be much lower than the cumulative time since the individual time has all of the recorded times for inner subtracted from it.

Finally, note that this applies only to profiled routines. If outer also calls routine another(), which is not profiled, another's call counts do not show and its time is recorded as part of outer's individual time.
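The bookkeeping behind these columns can be sketched in C. This is a simplified model, not the profile probe's code. ProfileEntry, RecordCall() and IndividualTime() are hypothetical names, and each call's elapsed time is assumed to be measured from entry to exit. A call to a profiled function adds to that function's "Calls to Self" and cumulative time, and to its caller's "Calls to Child". Individual time is whatever remains after subtracting the time of the profiled children.

```c
/* Simplified model of self/child call counts and individual time.
 * All names are hypothetical; this is not profile.ual's implementation. */
#include <assert.h>
#include <stddef.h>

typedef struct {
    const char *name;
    unsigned long calls_to_self;   /* times this function was called */
    unsigned long calls_to_child;  /* calls it made to profiled functions */
    double cumulative_time;        /* entry-to-exit time, summed over calls */
    double child_time;             /* cumulative time of profiled callees */
} ProfileEntry;

/* Record one completed call of callee made from caller.
 * Pass caller == NULL for a top-level call. */
static void RecordCall(ProfileEntry *caller, ProfileEntry *callee, double elapsed)
{
    assert(callee != NULL && elapsed >= 0.0);
    callee->calls_to_self++;
    callee->cumulative_time += elapsed;
    if (caller != NULL) {
        caller->calls_to_child++;
        caller->child_time += elapsed;
    }
}

/* Individual time: cumulative time minus time spent in profiled children.
 * Time in unprofiled callees is never subtracted, so it stays here. */
static double IndividualTime(const ProfileEntry *e)
{
    return e->cumulative_time - e->child_time;
}
```

Take ten calls to inner() lasting 1.0 each, made inside a single 12.0 call to outer(). That gives outer() one call to self, ten calls to child, and an individual time of 2.0.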

15.14 Why don't memstat, memwatch, heap probes work on my application?

The most likely reason is that your application doesn't use the default system allocation routines. These might be actual replacements for malloc(), etc. in your own application or in another library such as libsafe or libefence.

Sometimes if you explicitly replace malloc() it can break RootCause/Aprobe completely: see [#q13.14 Q13.14].

If Aprobe mostly works except for the memory probes, you can override the default routines used by memwatch by registering your own allocation routines, or by changing the probe itself. This will require writing or editing some APC code, depending on your exact situation. Contact OC Systems support for further assistance.

15.15 Can you please explain the fields "Alloc Count" and "Free Count" in the memstat "Outstanding Allocation" report?

A specific allocation point (see below) might be reached just once (usually at initialization) and will have an Alloc Count of 1. It may or may not ever free that so the Free Count will be 0 or 1. But many (most!) applications have allocation points that give rise to more than one allocation. For instance:


for (i = 0; i < 10; i++)
{
   linkedList.add (new MyObject (i));
}

Obviously each instance of MyObject was created from the same allocation point. Most growth happens this way - in fact we don't count any allocations we only see once as growth.

What is an allocation point? For native code it's the unique traceback up to the current maximum depth, something like:


  Line 10 of a()
  called from Line 15 of b()
  called from Line 32 of c()

For Java each allocation point is a combination of a traceback and the object type allocated there.
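Here is a minimal C sketch of how allocations might be grouped by allocation point and counted. It assumes native tracebacks are arrays of call-site addresses truncated to a maximum depth. AllocPoint, FindPoint(), SeenMoreThanOnce(), MAX_DEPTH and MAX_POINTS are hypothetical names; memstat's actual data structures are not shown here. A Java version would also include the allocated type in the key.

```c
/* Hypothetical grouping of allocations by traceback ("allocation point").
 * Not memstat's implementation; names and limits are illustrative only. */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_DEPTH  3    /* stand-in for the current maximum traceback depth */
#define MAX_POINTS 64   /* fixed table capacity for the sketch */

typedef struct {
    uintptr_t frames[MAX_DEPTH];  /* call-site addresses, innermost first */
    unsigned long alloc_count;
    unsigned long free_count;
} AllocPoint;

static AllocPoint points[MAX_POINTS];
static size_t npoints = 0;

/* Find the point for this traceback (truncated to MAX_DEPTH), creating it
 * if it is new. Returns NULL if the table is full. */
static AllocPoint *FindPoint(const uintptr_t *trace, size_t depth)
{
    uintptr_t key[MAX_DEPTH] = {0};
    size_t i;
    assert(trace != NULL);
    if (depth > MAX_DEPTH)
        depth = MAX_DEPTH;
    memcpy(key, trace, depth * sizeof key[0]);
    for (i = 0; i < npoints; i++)
        if (memcmp(points[i].frames, key, sizeof key) == 0)
            return &points[i];
    if (npoints == MAX_POINTS)
        return NULL;
    memcpy(points[npoints].frames, key, sizeof key);
    points[npoints].alloc_count = 0;
    points[npoints].free_count = 0;
    return &points[npoints++];
}

/* Points reached only once are not considered as growth. */
static int SeenMoreThanOnce(const AllocPoint *p)
{
    return p->alloc_count > 1;
}
```

Every pass through the loop above shares one traceback, so its point accumulates an Alloc Count of 10. A one-time initialization allocation stays at 1 and, as noted above, would not count as growth. Free Count would be incremented the same way whenever a free is matched back to its allocation point.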

15.16 Can I use memstat to track all allocations and frees?

The default setting of the memstat probes is to pinpoint leaks in a longer-running program. However, you can change the options. From the main RC window select the memstat probe in the UAL list, right click and choose Edit UAL. From the Runtime tab change the Sampling Ratio to 1 so you see every allocation.

From the Format tab check the Display Freed Allocations box. You might also find the Display Zero Growth Allocations useful. Next run, you'll start seeing those freed allocations.

Click the OK button and then the Build button. Re-format (either through the Index or Examine button) and the reports should have the information you need.

15.17 Is there a way to only report allocations in a certain module based on the stack traceback entries?

Before version 2.1.4b (June 2005), this mechanism was available only in memwatch, which is more focused on individual allocations, and not in memstat. For earlier versions, you could edit and build your own custom version of "combined_memstat.apc" that has filtering: see filtermemory.apc.

Version 2.1.4b also introduced EXCLUDE filters in memstat and memwatch, which eliminate the named stack traces and show all others. See $APROBE/probes/[java]_memstat.cfg or $APROBE/probes/memwatch.cfg for usage information.

15.18 Is there a predefined probe for detecting memory corruption?

Yes. The "memcheck" probe, introduced in version 2.1.3 (February 2004) uses a "fence" mechanism to detect corruption of allocated (but not stack/local) memory. It also reports double deallocations.

15.19 Is there a predefined probe for tracking down lock contentions?

We have done some work in this area for customers, but we have not productized it, because the platform- and problem-specifics are not easily generalized. If you want some unsupported probes to start from please contact us.

15.20 What options in the trace.cfg file are obsolete, and why?

Many changes have occurred as the Trace predefined probe has been adapted to support RootCause users. A number of the options have been deprecated, and others apply only when used directly outside of RootCause.

The following options have been deprecated.

MaxDepthOfTracedCalls, DefaultLevels
These were synonyms. It is no longer possible to specify a maximum depth at which tracing is disabled.
LogTimes
Times are always logged.
LogLines
Lines are logged if and only if specified on each TRACE line with LINES.
TracingEnabledInitially
Tracing is enabled initially if and only if no TRIGGER lines appear. If there are one or more TRIGGER lines, tracing is enabled only when executing the functions specified by the TRIGGER(s).
CallCountOptions, ExactCallCounts
Call counts are now done at format time by the RC trace display rather than by the trace probe, so these options don't apply.
IndexSymbols
Symbols cannot be indexed.
MaxIndentLevelsBeforeWrap, IndentColumns, AlwaysShowNumericNestingLevel
These used to control formatting, but custom formatting is now done by providing alternative formatting routines. The ones provided for the RootCause Trace Display are in $APROBE/probes/rc_formats.ual.

15.21 Why does the memstat summary file say it can't do the analysis because I only have one sample?

Some possibilities are:

  • You didn't run for long enough. A couple of minutes isn't long enough if you want to run the statistical sampling.
  • You didn't format all of the available data. You can do this by selecting all of the apd files for the ring instead of the default (which is just the last one).
  • You don't have a real problem. This is more common than you think: people often see instability that they attribute to memory leakage when no leak exists.
For more information see the Memory Probes page.

15.22 How do I force a snapshot from a predefined probe?

The coverage, memcheck, memwatch, profile, and statprof probes record data in memory and dump it only at normal program termination, or when explicitly requested with a programmatic snapshot. A snapshot can be forced without terminating the program by calling the entry point provided by the probe:

  • coverage - ap_Coverage_DoSnapshot( "comment" );
  • memcheck - ap_Memcheck_DoCheckpoint( "comment" );
  • memwatch - ap_Memwatch_DoSnapshot( "comment" );
  • profile - ap_Profile_DoSnapshotForAll( "comment", 1 );
  • statprof - ap_Statprof_Snapshot( "comment" );
The second parameter to ap_Profile_DoSnapshotForAll() is 1 (TRUE) if it will be the final snapshot, and 0 (FALSE) if it will be called again via a snapshot or normal program completion.

There are three ways these can be called:

Use demand.ual

Aprobe version 4.4.4a (March 2013) introduced demand.ual. Together with its header file demand.h and the supporting command-line tool apdemand, it provides a framework for "demanding" action from another probe from the command line at run time, independent of what the probed program might be doing. This is an advanced feature, but it can be powerful in the right circumstances. To learn more, copy the example directory $APROBE/examples/predefined_probes/demand to a working area, and start with the README file there. It shows how you can use demand.ual in conjunction with profile.ual to take a performance snapshot "on demand". The "RemoteControl" file in that same directory provides more detail on how to use this to control your own probes or your application. Contact support@ocsystems.com if you have questions.

Use 'call' from dbx or gdb

A very convenient way is to attach with dbx or gdb and use the "call" operation. For example if ps says that the PID of application appdriver is 12345, then you can do:


   $  dbx -a 12345
   (dbx) call ap_Statprof_Snapshot( "dbx" );
   (dbx) detach
   $  apformat appdriver.apd

Even when using detach, it's possible that the program will terminate at this point, so you shouldn't use this method if it's important that the program continue.

Call the Snapshot function from a custom probe

An alternative is to link your own custom version of the predefined probe with a probe which takes a snapshot at a certain point in the program, for example:


  // my_profile.apc
  #include "profile.h"
  probe thread {
    probe "abnormal_end_signal_was_handled" {
      on_entry ap_Profile_DoSnapshotForAll( "probe snap",  FALSE );
    }
  }

Then you link this with the existing predefined probe:

  $ apc my_profile.apc profile.ual # creates my_profile.ual

Note that the name abnormal_end_signal_was_handled is only a suggestion, not a name in the Aprobe runtime. An application programmer may offer another name which is called when the application averts an abnormal end. If not, an application programmer may need to help by creating and calling this dummy function at the right time for the snapshot probe, which is when the application averts an abnormal end. Part of the challenge is finding programmers who know that much about the application.

A special case of this is to take a snapshot when an unexpected signal occurs: see [#q15.6 Q15.6].

15.23 Could you explain the memstat summary's "Leaked Memory" and "Total Leakage" values?

The statistical part works like this. Say you have a sampling rate setting of one in thirty. Every 30th allocation is recorded in a table. Every free is looked up in that table: if the freed address is there, the free is recorded; if it isn't, the free is ignored. So the sampling applies only to the allocations, not the frees.

In the table, the totals (including leaked memory) and counts are multiplied by the sampling rate. If you have enough samples, this will be entirely valid.

We record what you pass to the O/S, not necessarily what the O/S actually allocates. This could under-estimate the amount of memory in certain cases. (e.g. if the memory manager always allocates in quad-word steps it would allocate 16 bytes when you requested 4).
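The sampling scheme just described can be sketched in C. This is an illustration under stated assumptions, not memstat's source. Sampler, OnAlloc(), OnFree(), EstimatedOutstanding() and the fixed TABLE_SIZE are hypothetical, and a linear table search stands in for whatever lookup the probe really uses. Only every rate-th allocation enters the table, every free is looked up, and reported totals are scaled by the rate. The size recorded is the size the caller requested, so any rounding done by the memory manager is not reflected.

```c
/* Sketch of 1-in-N allocation sampling with scaled totals.
 * Hypothetical names; not memstat's actual implementation. */
#include <assert.h>
#include <stddef.h>

#define TABLE_SIZE 1024  /* fixed capacity for the sketch */

typedef struct {
    unsigned rate;                /* sample 1 in `rate` allocations */
    unsigned long alloc_seen;     /* every allocation is counted here */
    size_t n;                     /* sampled entries not yet freed */
    const void *addr[TABLE_SIZE];
    size_t size[TABLE_SIZE];      /* requested size, not O/S-rounded size */
} Sampler;

static void SamplerInit(Sampler *s, unsigned rate)
{
    assert(rate > 0);
    s->rate = rate;
    s->alloc_seen = 0;
    s->n = 0;
}

/* Every rate-th allocation is recorded in the table. */
static void OnAlloc(Sampler *s, const void *p, size_t requested)
{
    s->alloc_seen++;
    if (s->alloc_seen % s->rate == 0 && s->n < TABLE_SIZE) {
        s->addr[s->n] = p;
        s->size[s->n] = requested;
        s->n++;
    }
}

/* Every free is looked up; addresses that were not sampled are ignored. */
static void OnFree(Sampler *s, const void *p)
{
    size_t i;
    for (i = 0; i < s->n; i++) {
        if (s->addr[i] == p) {
            s->addr[i] = s->addr[s->n - 1];  /* remove by moving last entry */
            s->size[i] = s->size[s->n - 1];
            s->n--;
            return;
        }
    }
}

/* Estimated outstanding bytes: the sampled total scaled by the rate. */
static unsigned long EstimatedOutstanding(const Sampler *s)
{
    unsigned long total = 0;
    size_t i;
    for (i = 0; i < s->n; i++)
        total += s->size[i];
    return total * s->rate;
}
```

With a rate of 30, ninety 100-byte allocations put three entries in the table and give an estimate of 3 * 100 * 30 = 9000 bytes outstanding. Freeing all ninety removes the three sampled entries and ignores the rest.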

The statistics that identify certain allocation points as "Growth" are based on least squares linear regression analysis.
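To illustrate that kind of analysis, the following C code computes an ordinary least-squares slope over (sample time, outstanding bytes) pairs for one allocation point. The function names and the min_slope threshold are hypothetical; the exact criteria memstat applies when classifying a point as "Growth" are not described here. A single sample yields no slope, which fits the "only one sample" message discussed in 15.21.

```c
/* Ordinary least-squares slope, as one way to detect steady growth.
 * Hypothetical helper names; not memstat's actual classification code. */
#include <assert.h>
#include <stddef.h>

/* slope = (n*Sxy - Sx*Sy) / (n*Sxx - Sx*Sx).
 * Returns 0.0 when the slope is undefined (fewer than 2 distinct x). */
static double LeastSquaresSlope(const double *x, const double *y, size_t n)
{
    double sx = 0.0, sy = 0.0, sxx = 0.0, sxy = 0.0, denom;
    size_t i;
    assert(x != NULL && y != NULL);
    for (i = 0; i < n; i++) {
        sx += x[i];
        sy += y[i];
        sxx += x[i] * x[i];
        sxy += x[i] * y[i];
    }
    denom = (double)n * sxx - sx * sx;
    if (n < 2 || denom == 0.0)
        return 0.0;
    return ((double)n * sxy - sx * sy) / denom;
}

/* Hypothetical growth test: the fitted slope exceeds a threshold. */
static int LooksLikeGrowth(const double *t, const double *bytes, size_t n,
                           double min_slope)
{
    return LeastSquaresSlope(t, bytes, n) > min_slope;
}
```

For the samples (0, 100), (1, 200), (2, 300) and (3, 400), the slope is 100 bytes per sample interval. A flat series gives 0.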

15.24 How can I define a memstat (or memwatch) filter matching any number of call levels?

That is, is there a way to do something like the following?


   FILTER      extern:"malloc_y_heap()" in "libc.a(shr.o)"
        ==> **** any number of levels matching anything ****
        ==> "ap_demangle.c":"Demangle_Xlc_Symbol_Name()" at line 2103 (ap_demangle.c)

No, the best you can do is enumerate all the possible matches from your test cases. Wildcards of one or more levels may be implemented in the future.

15.25 Is there a predefined probe to check for stack corruption?

Not really. Long ago we wrote stackcheck.apc for a customer. That version is just for Windows, which is no longer supported, but it might give you an idea of how you could write one yourself. It checks that the return address is not corrupted on_entry and on_exit for all instrumented functions. The instrumentation is currently hard-coded in the probe. A configuration file or a separate configuration probe could be added to handle specifying the instrumentation points.

16. Using the "apc" Command

16.1 What does apc do?

The apc command translates one or more APC files into C. It then uses a native C compiler to compile these into object code and links them with other files specified on the command line to form a shared library called a UAL. A UAL has the suffix .ual.

16.2 How do I indicate what C compiler and options apc should use?

The compiler is defined in the file $APROBE/lib/compiler_profiles and by the APROBE_CC_COMMAND environment variable. This is described in the Files Reference (Appendix B) of the Aprobe User's Guide.

Options to the compiler can also be specified on the apc command line by including them in quotes after the "-compiler" option, for example,

    apc foo.apc -compiler "-v"

16.3 Do I need to specify an object file or executable to apc?

You need to specify "-x object module" if you use a construct in your APC that cannot be resolved without specific symbol table or debug information from the program. Such constructs are:

  1. target expressions: names from the probed program preceded by `$', or "$*" ($1, $2 are ok, as are hardware-register references starting with '$$'), and
  2. references to specific source lines.

In general, probes that you compose to gather information about specific parts of your program will contain one of the above, and you'll want to include the executable or an object file.

For probes on shared libraries that don't contain any debug information, or for probes that should apply to any program (like the predefined probes included with Aprobe), you generally will not provide an object module.

16.4 How do I specify other object files to link into my UAL?

Just include them on the apc command-line. Linker options are specified in quotes after the "-linker" flag, for example,

    apc foo.apc -linker "-lX11"

16.5 apc says my function name's not known--why not?

There are a number of possibilities. If you specified "-x ... " on the apc command line, then it means it couldn't find the named function in that file's symbol table. Since apc works pretty hard to match incomplete function names, the name is probably wrong in case or spelling, or, if you provided a parameter profile, it's probably not exactly what the C compiler encoded as the name for the function.

You could try using apcgen to generate a probe template for all the functions in the source file (or object file, if it's a template instance) containing the function you want, or the tool apinfo or apsymbols to dump out all the function names in the whole program.

16.6 How do I generate debug information for my APC files so line and function information show up in tracebacks?

As with C, use the -g flag; this passes the appropriate debug options to the C compiler, and saves the generated C source file.

16.7 Can I specify an environment variable for the compiler path in the compiler_profiles file?

Yes! If "ls -l ${CC_PATH}/bin/gcc" on the command line shows that the compiler exists, then a stanza like:

    CC_COMMAND ${CC_PATH}/bin/gcc

will work.

Also note that the environment variable APROBE_COMPILER_PROFILES can be used to override the default of $APROBE/lib/compiler_profiles and point to your own variant of this file. See compiler_profiles file in the user's guide.

16.8 How do I compile a probe for a 32-bit app when running 64-bit Linux?

If you build your application with the compilation option -m32, then to build your probe you'll need to pass -m32 to apc's backend compiler, plus define the i386 macro to the preprocessor. For example:
   apc -Di386 -compiler -m32 -linker -melf_i386 foo.apc
The link stage just invokes ld directly which should automatically build a 32-bit shared library from a 32-bit object file.

If you're going to be doing this regularly you should edit $APROBE/lib/compiler_profiles to update the CFLAGS and PREPROCESS lines so these options are applied automatically.

17. Writing Probes in APC

This section contains questions and answers about writing probes in APC for native (C, C++, Ada) programs.

17.1 How do I use "apcgen" to generate a probe automatically?

You need an object file or executable that contains debug information, i.e., was compiled with debug (see [#q12.19 Q12.19] ) or a C header file. For example:

  apcgen foo.exe > foo.apc
  apc foo.apc -x foo.exe

generates foo.apc, an APC file probing all the user-defined functions in foo.exe that have debug information, then compiles that into a UAL.

  apcgen -qparams -p sin -o math_sin.apc /usr/include/math.h
  apc math_sin.apc -x /usr/include/math.h

generates and compiles math_sin.apc containing a probe on the sin() function which logs the parameter and return value. Use apcgen -h to see what options are available to control the output.

Note that RootCause provides this functionality in a point-and-click GUI.

17.2 How do I write a "probe"?

One way is to start with a file generated by "apcgen" (see the previous question). Or you can compose one in your favorite text editor. It's pretty much like writing C, but some syntax is needed to indicate where and when your probe should be executed. Here's a very simple one:

  probe thread
  {
    probe "main"
    {
      on_entry
      {
        printf("Entering main.\n");
      }
    }
  }

If you put this in the file "foo.apc", then you would compile it:

   apc foo.apc 

which produces "foo.ual", which you can then probe your program with:

   aprobe -u foo foo.exe
 

17.3 What is the difference between APC and straight C?

There are several differences:

1) There is special syntax to indicate where and when the probe should be executed, such as "probe", "on_entry", "on_exit", "on_line", etc.

2) There is a special keyword called "log" for recording data at run time and defining the format with which it should be displayed afterward.

3) There are special data references, called "target expressions" which start with `$' and refer to values in the probed program.

All of these are expanded or converted to ANSI C by the apc compiler.

In addition, there is an implicit #include "aprobe.h", which makes available the extensive Aprobe API defined in $APROBE/include/aprobe.h and documented in Appendix C of the Aprobe User's Guide.

17.4 Why do I need a "probe thread"?

This is an artifact of the clever Aprobe scoping rules. When one probe is nested within another (that is, defined in the declarative part of an enclosing probe), it not only gives visibility to the enclosing probe's data as you would expect, it also means that the inner probe is "active" (its actions may be executed) only if the outer probe is active.

Since every function is executed within some thread of execution, if a function probe weren't inside a thread probe it would never be active.

Anyway, just put in the probe thread{ .. }. It's what works.

17.5 What's the difference between "probe thread" and "probe program"?

The on_entry and on_exit actions of a "probe program" occur once each, before calling main() (or WinMain(), etc.) and after returning from main(), respectively. The corresponding actions of a "probe thread" occur at the creation and destruction of each separate thread.

Data defined in the declarative part of a "probe thread" is global to all probes, but is unique for each thread. There is always at least one, the "main" thread, which is conceptually nested immediately within the probe program.

17.6 When exactly are the "on_entry" and "on_exit" parts of a function probe executed?

The on_entry actions are executed before the first instruction of the function itself. In particular, the function's local stack frame hasn't been created yet.

The on_entry actions are executed at the very first instruction pointed to by the function's linker symbol, before any compiler-generated saves of parameters or other values.

The on_exit actions are executed after the stack frame has been discarded, so local data is not available. The next (target program) instruction executed will be the one following the call to the probed function.

17.7 Why can't I dump some parameters in the on_exit part?

Parameters passed by value are essentially local data. They are stored on the stack and the stack frame has been discarded by the time the on_exit part is executed.

If you want to be able to access the input parameters you can save them in the on_entry part, for example:

probe thread
{
  probe "foo"
  {
    int parm1;
    on_entry
    {
      parm1 = $1;
    }

    on_exit
    {
      if (parm1 == 1)
      {
         ...
      }
    }
  }
}

C++ reference parameters, and composite parameters passed by reference in Ada, are available by name on_exit because `apc' implicitly generates code in an on_entry section to save the address passed in. GNAT Ada OUT and IN OUT parameters can be displayed because these are implemented as fields of a 'struct' returned by the function.

17.8 Why is my local variable "unknown" in on_entry and on_exit parts?

The on_entry and on_exit parts are conceptually outside the scope of the function, so the local data is not visible. Local data is visible only within an "on_line" action.

17.9 Is there a way to probe "the first line" or "the last line" in my function?

Yes. Simply write on_line(first) or on_line(last). You can use this to specify function-relative line numbers as well, such as on_line(first+5).

17.10 How do I specify which of several overloaded functions I want to probe?

In C++, you must specify the exact parameter profile encoded in the symbol table by the C++ compiler. The best way to get this is either to look at the output of "apcgen -vL" applied to the object file generated by the compiler, or to use `apinfo -sa myprog' to list the probe names of the function symbols in your application.

17.11 How do I reference a hardware register?

A hardware register is referenced within a user action (e.g., on_entry) by prefixing the name commonly used for the register with "$$". The exact register names are documented in Appendix B, "Files Reference", under "APC File".

Note that the value you get for the register is the value it had at the point the target program called the probed routine.

17.12 How do I query the parameters to a function?

If the function is compiled with debug information (see Q12.19), you can reference a parameter by name ($param) and reference all parameters with "$*".

Whether or not a function is compiled with debug information, or there's an object module available, you can reference the first parameter with "$1", the second with "$2", etc., up to "$8".

Note, however, that if there is no debug information provided, you must cast the "$1" to its proper type.

17.13 Can I use automatic formatting if I don't have an executable with debug information?

Yes, but you must (a) include the definition of each logged item's type in the APC file (if it's not a predefined type), and (b) cast each item to that type. This is how one can log parameters to system routines, for example:

#include <stdio.h> // defines FILE
probe thread
{
  probe "fopen"
  {
    /* fopen returns FILE *, defined in stdio.h; it may be NULL */
    on_exit
      if ((FILE *)$return != NULL)
        log("fopen() returns ", (FILE *)$return, " = ", *(FILE *)$return);
      else
        log("fopen() returns NULL");
  }
  probe "fclose"
  {
    /* first parameter to fclose is FILE * */
    on_entry
      log("fclose() called with ", (FILE *)$1, " = ", *(FILE *)$1);
  }
}

17.14 How do I change the return value from a function?

Assign the new value to $return in the on_exit part:

     on_exit { $return = desired_value; }

17.15 How do I log the value of a string parameter?

ap_StringValue is a macro which logs everything from the address provided up to the first null character:

     on_entry { log("NameParam = ", ap_StringValue($NameParam)); } 

Note: this applies only to null-terminated (C, C++) strings. It does not apply to the Ada predefined String type; see Q17.25.

17.16 How do I log the contents of an array?

You must specify the bounds of the array in the log statement:

     on_entry { log("Items = ", $Items[0 .. 9]); }

If the array bounds are dynamic (as most are), you can compute them first, for example by scanning for a terminating zero element:

     on_entry
     {
       int last;
       for (last = 0; $Items[last] != 0; last++);
       if (last > 0)
         log("Items = ", $Items[0 .. last-1]);
     }
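The scan above stops at the first zero element, so it assumes the array really is zero-terminated; if it is not, the loop walks off the end. A minimal plain-C sketch of a bounded version, assuming you know some maximum capacity (the `max` parameter and the name `count_until_zero` are hypothetical, not part of Aprobe):

```c
#include <assert.h>
#include <stddef.h>

/* Count the elements before the first 0 sentinel, never reading
   more than `max` elements. Elements 0 .. n-1 are the ones to log;
   a result of 0 means there is nothing to log. */
static size_t count_until_zero(const int *items, size_t max)
{
    assert(items != NULL);
    size_t n = 0;
    while (n < max && items[n] != 0)
        n++;
    return n;
}
```

In a probe, the same bound would simply be added to the for loop's condition.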

17.17 How do I "stub out" the probed function so it does nothing?

Use the "ap_StubRoutine" macro in the on_entry part of a function, and be sure to return something sensible if necessary in the on_exit part, e.g.,

   probe "foo" {
     on_entry ap_StubRoutine;
     on_exit  $return = 0;
   }

Note that you can't assign the return value in the on_entry part, since the return register is reset as part of the stub implementation.

17.18 How do I query the data in a class object when probing a member function?

All (non-static) data in a class is accessed as fields of the local variable "this", so to get at the class data item "NCalls" you would do:

  log("NCalls = ", $this->NCalls);

17.19 How do I query a global (or static) variable when there's a local one of the same name?

To specify that you want a data item other than the one visible by default, add an expression context string to the target expression:

  log("static NItems = ", $(NItems, "-file items.c"));

To get the global one, if any:

  log("global NItems = ", $(NItems, "-module foo.exe"));

17.20 Can I reference a static variable that wouldn't normally be visible to my probed function?

Yes. See the previous Q. You can reference a static item by name in any file:

        log("static NItems = ", $(NItems, "-file items.c"));

even if the probed function this appears in is not in file "items.c".

17.21 Can I call a function in my program from within a probe?

If your program is compiled with debugging enabled, you can precede its name with a `$'. This is often useful for using a probe to call debugging-support routines, e.g.,

probe thread
{
  probe "ReadSymbolTable"
  {
    on_exit
      $DumpSymbolTable($0);
  }
}

In the absence of debug information, you can get the symbol address from Aprobe and cast that to the correct type.

Calling C++ methods is more complex (they require a "this" pointer, and the naming can be tricky); see Q17.64.
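Without debug information, "get the address and cast it" is ordinary C function-pointer casting. The plain-C sketch below shows the idea; `DumpSymbolTable` is a hypothetical stand-in for a routine in your program, and in a real probe the address would come from Aprobe rather than from `&DumpSymbolTable`. The cast is only safe if the function type you cast to matches the routine's real parameter profile.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a routine in the target program. */
static int dump_calls = 0;
static void DumpSymbolTable(int table)
{
    (void)table;
    dump_calls++;
}

/* The type we believe the routine has; getting this wrong is undefined behavior. */
typedef void (*DumpFnT)(int);

/* Call a routine known only by its raw address. Converting between
   integers and function pointers is implementation-defined, but works
   on the common Unix ABIs. */
static void call_by_address(uintptr_t address, int arg)
{
    assert(address != 0);
    DumpFnT fn = (DumpFnT)address;
    fn(arg);
}
```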

 

17.22 Can my APC files reference names in one another like a C program?

Yes, but if they do they must all be compiled in the same "apc" command into a single UAL.

17.23 Can I call a function in another UAL?

Yes. A UAL is just a shared object library (a DLL), so you must do the following:

1) Export the symbol for the function to be called, using the apc "-e" option, when you build the UAL to be referenced, e.g.,

   apc funcdef.apc -e func

2) Specify the referenced UAL as an input file on the command line when you compile the probe that contains the external reference, so that it is linked in as a shared module, e.g.,

  apc main.apc funcdef.ual

17.24 How do I change the return code from my Unix program?

From $APROBE/examples/learn/probe_exit/exit.apc:

probe thread {
  probe "exit" in "libc.so" // "libc.a(shr.o)" on AIX
  {
    on_entry {
      /* return 0 even if an error occurred: */
      $1 = 0;
    }
  }
}

17.25 How do I print or change a GNAT Ada string value in my probe?

An unconstrained string is represented as a record with two components. The first is a pointer to the string (which is not null-terminated) and the second is a pointer to another record which contains the bounds of the string.

The "apc" tool recognizes this special type and displays it appropriately, if debug information is available. Since its length is known, ap_StringValue is not used. For example:

probe thread {
  probe "hello.qualify_name" {
    on_entry
    {
      // log the input parameter then stub the routine itself
      log("qualify_name called with: ", $1);
      ap_StubRoutine;
    }
  }
}

In the absence of debug information (e.g., for Ada.Text_IO.Put_Line ), or when you want to assign to an unconstrained string, you can use macros defined in gnatstrings.h. For example:

#include "gnatstrings.h"
probe thread {
  probe "hello.qualify_name" {
    on_exit
    {
      // return what we want to:
      ap_SetGnatUCString(
        $return,
        ap_CatenateStrings(
          "/home/ocs/",
          ap_ExtractGnatUCString($1),
          NULL));
    }
  }
}
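To make the two-component representation described at the start of this answer concrete, here is a plain-C sketch of an unconstrained String and its conversion to a null-terminated C string. The struct and field names (`GnatString`, `data`, `bounds`, `first`, `last`) are hypothetical; the real layout is fixed by GNAT, and in a probe you would use the gnatstrings.h macros instead.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical mirror of the bounds record: 'First and 'Last. */
typedef struct {
    int first;
    int last;
} StringBounds;

/* Hypothetical mirror of the two-component record: a pointer to the
   characters (NOT null-terminated) and a pointer to the bounds. */
typedef struct {
    const char *data;
    const StringBounds *bounds;
} GnatString;

/* Length is Last - First + 1, or 0 for a null range such as 1 .. 0. */
static size_t gnat_string_length(GnatString s)
{
    assert(s.bounds != NULL);
    long len = (long)s.bounds->last - (long)s.bounds->first + 1;
    return len > 0 ? (size_t)len : 0;
}

/* Copy into a newly allocated, null-terminated C string; caller frees. */
static char *gnat_string_to_c(GnatString s)
{
    size_t len = gnat_string_length(s);
    char *out = malloc(len + 1);
    if (out == NULL)
        return NULL;
    if (len > 0)
        memcpy(out, s.data, len);
    out[len] = '\0';
    return out;
}
```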

17.26 How can I just log some data and format it as hex?

This is an example of an APC file to log a buffer's worth of data and format it as hex.

// Example APC file to demonstrate logging a block of data and
// formatting it as hex.

// Use this macro to provide a buffer and length of data you wish to log
// and be formatted as hex. e.g. LogAsHex (MyBuffer, 100);
#define LogAsHex(B,L)                              \
log (((ap_Byte *) ((ap_Byte *) B)) [0 .. ((L)-1)], \
     (ap_Uint32) (L),                              \
     (ap_Uint32) (B)) with HexFormat

// Buffer is the actual data, Length the length and StartAddress the
// address of the data at runtime.
static void HexFormat (ap_Byte    *Buffer,
                       ap_Uint32  *Length,
                       ap_Uint32  *StartAddress)
{
   ap_Uint32 PrintAddress;
   ap_Uint32 EndAddress;

   // We start printing at the first 16 byte boundary below StartAddress
   // which might be below where we actually need to show characters. So
   // we check if we are in range before printing a character
   PrintAddress = *StartAddress & 0xfffffff0;
   EndAddress = *StartAddress + *Length;

   while (PrintAddress < EndAddress)
   {
      int i;

      // Print out the address, then the hex bytes in groups of 4
      printf ("%08x: ", PrintAddress);
      for (i = 0; i < 16; i++)
      {
         // Separate each group of 4 bytes
         if (i && i % 4 == 0)
         {
            printf ("  ");
         }

         // Check we're in range
         if ((PrintAddress + i) < *StartAddress ||
             (PrintAddress + i) >= EndAddress)
         {
            printf ("  ");
         }
         else
         {
            printf ("%02x", Buffer [PrintAddress - *StartAddress + i]);
         }
      }

      // Print out the ascii
      printf ("   ");
      for (i = 0; i < 16; i++)
      {
         // Check it's in range
         if ((PrintAddress + i) < *StartAddress ||
             (PrintAddress + i) >= EndAddress)
         {
            printf (" ");
         }
         else
         {
            ap_Byte c = Buffer [PrintAddress - *StartAddress + i];

            // Is this a printable character? (127 is DEL, so exclude it)
            if (c >= 32 && c < 127)
            {
               printf ("%c", c);
            }
            else
            {
               printf (".");
            }
         }
      }

      printf ("\n");
      PrintAddress += 16;
   }
}

// This is an example of using the above log mechanism: the first
// parameter must be an address (e.g. an array, a pointer, etc.). The 2nd
// parameter is the number of bytes.
probe thread
{
   probe "fred()"
   {
      on_entry LogAsHex ($1, $2);
   }
}

A C file follows to test it with:

void fred (const char *Buffer, int Length)
{
   ;
}

int main (int argc, char *argv[])
{
   char Buffer [100];
   int  i;

   for (i = 0; i < 100; i++)
   {
      Buffer [i] = (char) i;
   }
   fred ((const char *) Buffer, 100);
   return 0;
}

17.27 How do I log information about each thread as it starts?

You log the Thread ID using a format routine that prints information about it, since the information, especially the thread entry point, may not be available on_entry to the thread:

void PrintThreadInfo(ap_ThreadIdT *ThreadIdPtr)
{
  printf("Thread %d: ", *ThreadIdPtr);
  ap_PrintSymbol(
     ap_AddressToSymbol(
        ap_ThreadEntryPoint(*ThreadIdPtr)));
}

probe thread
{
  on_entry
  {
     log(ap_ThreadId()) with PrintThreadInfo;
  }
}

Note that the thread entry point symbol will probably be a system function.

17.28 GNAT turns SIGSEGV into CONSTRAINT_ERROR; can I use Aprobe to get a core dump?

Yes. Here's a probe which stubs (disables) the call the GNAT runtime makes to sigaction() to register a signal handler. This allows the default action to occur when the signal occurs.

#include <signal.h>

probe thread
{
   probe "sigaction()" in "libthread.so"
   {
      ap_BooleanT Stubbed = FALSE;

      on_entry
      {
         Stubbed = FALSE;   // reset for each call
         if ($1 == SIGSEGV)
         {
            printf ("Stubbing sigaction(SIGSEGV)\n");
            Stubbed = TRUE;
            ap_StubRoutine;
         }
      }
      on_exit if (Stubbed) $0 = 0;
   }
}
 

17.29 How can I get Aprobe actions to happen when my program dumps core?

First, you should be running with sigsegv.ual: it will provide a traceback and exit actions in these cases. If you want to add additional exit actions, such as a predefined probe snapshot, see Q15.6, or copy and extend $APROBE/probes/sigsegv.apc to build your own probe.

17.30 Is there a way to find out where a signal occurs when it doesn't cause a core dump?

The sigsegv.ual predefined probe will log a traceback for the following signals:

  • 3 - SIGQUIT
  • 4 - SIGILL
  • 8 - SIGFPE
  • 10 - SIGBUS
  • 11 - SIGSEGV
  • 15 - SIGTERM

17.31 How can I reduce the overhead of my probes?

The most obvious way is to use #pragma nofloat in probes that don't use floating point; this eliminates the need to save/restore floating point registers. See also Aprobe Performance Considerations in Chapter 4 of the Aprobe User's Guide.

probe thread
{
  probe "your_routine"
  {
    #pragma nofloat
    // Your probes
  }
}

17.32 Can I use Aprobe on JOVIAL or Fortran programs?

Yes, but there will be no "debug" information found, so you won't be able to use named target expressions (e.g., "$x", "$*") or do on_line probes. Furthermore, no type information is available for parameters, etc., like "$1".

17.33 How can I log a composite object without using debug information?

Declare or #include a C type that maps to the structure you want, then cast your target expression to a dereference of a pointer to this C type. For example:

typedef struct
{
   int Field1;
   float Field2;
} MyStruct;

probe thread
{
  probe "foo"
  {
    on_entry
    {
      if (((MyStruct *) $1)->Field1 > 0)
      {
        log(*((MyStruct *) $1));
      }
    }
  }
}

or perhaps a bit cleaner is:

probe thread
{
  probe "foo"
  {
    on_entry
    {
      MyStruct *Param1 = (MyStruct *)$1;
      if (Param1->Field1 > 0)
      {
        log(*Param1);
      }
    }
  }
}

17.34 How can I cast a value to a type name from the program?

"I have part of my program without debug info, but I know the type of a parameter passed in that "no debug" part, and furthermore, I know that the type name is defined in a part that does have debug info. How can I cast an "unknown-type" parameter to the known type name?"

This is similar to the previous question, except that instead of defining the type in your APC, you refer to the type in your program by its name and file, wrapped in "typeof", within your probe's declarative part, as follows:

probe thread
{
  probe "foo"
  {
    typedef typeof ($(MyStruct, "-file debug_part.c")) MyStruct;
    on_entry
    {
      MyStruct *Param1 = (MyStruct *)$1;
      if (Param1->Field1 > 0)
      {
        log(*Param1);
      }
    }
  }
}

17.35 Is there a special editor or editor mode for APC?

No, but APC is pretty close to C. The C mode for Emacs, Lemmy, or another editor works pretty well. Contact OC Systems if you think we should put work into this.

17.36 How do I execute a probe only if a certain data condition is met?

In Aprobe version 2, you could do something like:

    probe .outer_routine
    on entry
       if $r3 = 3 then
          probe .inner_routine
            null; -- inner_routine stuff
          end probe;
       end if;
    end probe;

Probes in Aprobe 2 were executable, but in Aprobe 3 they are declarative. You declare a named probe, and make explicit calls to enable or disable it. For example:

probe thread
{
  probe "outer_routine"
  {
    // Note that this probe has a name "InnerProbe"
    probe "inner_routine"
    {
      ; // Inner routine stuff
    } InnerProbe;

    // Entry to outer_routine
    on_entry
    {
      if ($param1 == 3)
      {
        // We can enable or disable the probe
        ap_EnableProbe (InnerProbe);
      }
      else
      {
        // Disable the inner probe
        ap_DisableProbe (InnerProbe);
      }
    }
  }
}

17.37 How can I interactively modify the parameters to a routine in my application?

The basic approach is simple. In the little C example "t.c" below, main() calls Test() every 5 seconds, passing it a float and an integer. Subprogram "Test" prints these values out. In t.apc we put a probe on Test, and replace the parameters with values we retrieve from the environment. The trick is how to retrieve values from the environment.

One obvious way is to prompt on stdout and read from stdin. This may work for some applications, but not many. A more general approach is to check whether the user has created a file "Test.cfg" in the directory where the program is run and, if so, read the new parameter values from it with fscanf(). This approach works well as long as the overhead of the fopen() call on every entry to "Test" is acceptable. When it is not, one could move this call someplace else and store the new values in global APC variables.

Note that this "read-a-file" approach can be used for a wide range of program interaction. One could simply use the presence of a file as a "switch" to enable or disable certain probes.

t.c
#include <stdio.h>   /* printf */
#include <unistd.h>  /* sleep */

void Test(float parm1, int parm2)
{
  printf("Test(%f,%d)\n", parm1, parm2);
}

int main(void)
{
  while(1)
  {
    Test(0.0, 0);
    sleep(5);
  }
}

t.apc

#include <stdio.h>

#define CONFIG_FILE "Test.cfg"

probe thread
{
  probe "Test"
  {
    on_entry
    {
      FILE *fd = fopen(CONFIG_FILE, "r");

      if (fd != NULL)
      {
        // We have a file with new values
        float Parm1;
        int   Parm2;

        // Only update the target parameters if both values were read
        if (fscanf(fd, "Test(%f,%d)", &Parm1, &Parm2) == 2)
        {
          $parm1 = Parm1;
          $parm2 = Parm2;
        }
        fclose(fd);
        remove(CONFIG_FILE);
      }
    }
  }
}
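Checking how many values fscanf() converted matters here: if Test.cfg is malformed, Parm1 and Parm2 are left uninitialized and would otherwise be written into the target. The parsing step can be tried outside Aprobe with sscanf(); this is a minimal sketch, and `parse_test_cfg` is a hypothetical helper name:

```c
#include <assert.h>
#include <stdio.h>

/* Parse a line of the form "Test(<float>,<int>)". Returns 1 and stores
   both values only if both fields were converted; otherwise returns 0
   and leaves the outputs untouched. */
static int parse_test_cfg(const char *line, float *parm1, int *parm2)
{
    float f;
    int n;
    assert(line != NULL && parm1 != NULL && parm2 != NULL);
    if (sscanf(line, "Test(%f,%d)", &f, &n) != 2)
        return 0;
    *parm1 = f;
    *parm2 = n;
    return 1;
}
```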
 

17.38 I'm trying to stub a function called by my program, but APC can't seem to find it.

The Ada code looks like:

    function Plock (N : in Types.Integer_T) return Types.Integer_T;
    pragma Import (C, Plock, "plock");

plock() is a system call that locks a process's text segment, data segment, or both into memory (or unlocks them). I get a warning message from apc stating: Function "....plock[1] not found in the modules(s) provided to apc, and also an error message from apc stating: Could not resolve function name: "......plock[1]"

plock() is a system function; it is not defined within your application, so you must say which library it is in. The following will work:

probe thread
{
  probe "plock()" in "libc.so"
  {
    on_entry ap_StubRoutine;
    on_exit $0 = 0;   // Or whatever return you want
  }
}

17.39 I only want to probe malloc() if it's called by realloc(). How would I do that?

Here's one way, which also illustrates some other useful idioms.

#define MyCallerFunctionId               \
    ap_SymbolToFunction(                 \
       ap_AddressToSymbol(               \
          ap_LocationAddress(            \
             ap_CallerLocation(          \
                ap_CurrentLocation))))

#define NamedFunctionId(SYMBOL,MODULE)  \
    ap_SymbolToFunction (               \
      ap_SymbolNameToId(                \
        ap_ModuleNameToId (MODULE),     \
        SYMBOL,                         \
        ap_NoName,                      \
        ap_FunctionSymbol))

probe program
{
  int MallocCalls = 0;
  int ReallocCalls = 0;

  ap_FunctionIdT ReallocFunctionId = NamedFunctionId("realloc()", "libc.so");

  probe thread
  {
    int NestingLevel = 0;

    probe "malloc()" in "libc.so"
    {
      #pragma nofloat
      on_entry
      {
        ap_FunctionIdT CallerFunctionId = MyCallerFunctionId;

        // As written, this counts only the calls NOT made from realloc();
        // remove the '!' to count only the calls made from realloc().
        if (! ap_FunctionIdsEqual(CallerFunctionId, ReallocFunctionId))
        {
           MallocCalls++;
        }
      }
    }

    probe "realloc()" in "libc.so"
    {
      #pragma nofloat
      on_entry
        ReallocCalls++;
    }
  }

  on_exit // from program:
  {
    log("Heap statistics on program exit");
    log("-------------------------------");
    log("Number of calls to \"malloc()\"  => ", MallocCalls);
    log("Number of calls to \"realloc()\" => ", ReallocCalls);
  }
}
 

17.40 I have a GNAT Ada procedure that I'm stubbing out, but want to return a string value. The procedure has a declaration similar to the one below. What's the APC?

   procedure Read_Foo (File : in  File_Type;
                       Item : out String;
                       Size : out Integer);

For routines like this, although the Item is an out parameter, GNAT implements it as if it were an in parameter (but modifiable) since the bounds of the string must already be set. The following probe shows an example of changing this:

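#include <string.h>

static const char *NewString = "Aprobe string";

probe thread
{
  probe "read_package.read_foo"
  {
    on_entry
    {
      // Copy without a trailing null; Item must be at least
      // strlen(NewString) characters long.
      memcpy ((char *) $item.P_ARRAY, NewString, strlen (NewString));
      ap_StubRoutine;
    }
    on_exit
    {
      $return.size = strlen (NewString);
    }
  }
}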
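An Ada String out parameter has fixed bounds and no null terminator, so whatever is copied into Item must not exceed its length. A plain-C sketch of a bounded copy, assuming you have already obtained Item's length from its bounds (how you do that depends on the available debug information; `copy_into_fixed` is a hypothetical helper):

```c
#include <assert.h>
#include <string.h>

/* Copy at most dest_len characters of src into dest, without writing a
   null terminator (Ada strings have none). Returns the number of
   characters copied, which a probe could report through the Size
   out parameter. Characters beyond that count are left unchanged. */
static size_t copy_into_fixed(char *dest, size_t dest_len, const char *src)
{
    assert(dest != NULL && src != NULL);
    size_t n = strlen(src);
    if (n > dest_len)
        n = dest_len;
    memcpy(dest, src, n);
    return n;
}
```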

17.41 Is there a simple probe that just traces the lines in one routine?

The following gives output similar to:

   MyPackage.MyRoutine line: 120
   MyPackage.MyRoutine line: 122

when formatted:

probe thread
{
  // Replace your name here
  probe "MyPackage.MyRoutine"
  {
    on_line (all)
    {
      log ("MyPackage.MyRoutine line: ",
      ap_LineIdToNumber (ap_CurrentLineId));
    }
  }
}

17.42 How do I reference enumeration literals in APC?

Here is an example:

a.cpp

#include <iostream.h>
 #define VALUE satu
 
 enum TYPE { sund, mond, tues, wedn, thur, frid, satu };
 
 int main (void)
 {
   TYPE bar = satu;
   cout << "Hello World\n";
 }
 
a.apc

probe thread
{
  probe "main"
  {
    on_line (11)
    {
      if ($bar == $satu)
      {
        log ("Match");
      }
      else
      {
        log ("No Match");
      }
    }
  }
}

If the enumeration literals are defined in a class, you can qualify them. So for:

class a
{
  enum TYPE { sund, mond, tues, wedn, thur, frid, satu };

  private:
    TYPE bar;

  public:
    void seta(){ bar = VALUE; }
};

a test;

You could use

   if ($test.bar == ($("a::satu")))

17.43 Why does including <math.h> in my APC keep it from compiling? (I want to call the "pow()" function in my probe.)

The problem here is that "log" is an Aprobe directive and it is also defined as a function in the mathematical library. So, you need a small workaround to use any function other than 'log' from the mathematical library. Here is an example:

#undef log               /* 1. undefine definition in aprobe.h */
 #include <math.h>        /* 2. process math.h */
 #undef log               /* 3. remove math.h's log define (AIX) */
 #define log aPl          /* 4. restore aprobe's definition */
 
 probe thread
 {
   probe "main"
   {
     on_exit
     {
       log("pow(2,3) = ", pow(2,3));
     }
   }
 }
 

The workaround is to add the preprocessor lines numbered 1 through 4 above.

If you need to use the math.h log function in an APC file, omit steps 3 and 4 above, and use 'aPl' instead of Aprobe's log operation everywhere thereafter. That is:

#undef log               /* 1. undefine definition in aprobe.h */
 #include <math.h>        /* 2. process math.h */
 
 probe thread
 {
   probe "main"
   {
     on_exit
     {
       aPl("log(2.0) = ", log(2.0));
     }
   }
 }
 

In either case, when compiling your APC file on Unix, you must pass the linker flag "-lm" as follows:

apc xxx.apc -linker -lm

because calling routines from the math library (libm) requires linking with -lm.

You can see the macros for the keywords that Aprobe uses (e.g., #define log aPl) at the top of aprobe.h, preceded by #ifdef APROBE_KEYWORDS, which is only defined when the file is being processed by the APC compiler.

17.44 How do I query an environment variable from within a probe?

Call getenv() , as in the following example:

#include <stdlib.h> /* defines getenv() */
#include <string.h> /* defines strcmp() */
ap_NameT LOG_LEVEL = NULL;

static ap_BooleanT IsSevereLogLevel()
{
  return LOG_LEVEL && (strcmp(LOG_LEVEL, "severe") == 0);
}

probe program {
  on_entry
    LOG_LEVEL=getenv("LOG_LEVEL");  /* can set LOG_LEVEL to NULL */

  probe thread
  {
    probe "main()"
    {
      on_entry
        if (IsSevereLogLevel()) printf("Severe\n");
    }
  }
}
 
 

17.45 The above looks like a useful utility. How can I structure my probes so it can be shared?

Here's one way, if your "utility" is pure C and doesn't use Aprobe-specific features.

  1. Write "loglevel.h" and "loglevel.c" in the obvious way, e.g.
loglevel.h

extern void InitializeLogLevel(void);
extern ap_BooleanT IsSevereLogLevel(void);

loglevel.c

#include <stdlib.h>                /* defines getenv() */
#include <string.h>                /* defines strcmp() */
#include <aprobe.h>                /* defines ap_NameT */

static ap_NameT LOG_LEVEL = NULL;

void InitializeLogLevel(void)
{
    LOG_LEVEL = getenv("LOG_LEVEL");  /* can set LOG_LEVEL to NULL */
}

ap_BooleanT IsSevereLogLevel(void)
{
  return LOG_LEVEL && (strcmp(LOG_LEVEL, "severe") == 0);
}

  2. Compile loglevel.c into loglevel.o. If you #include aprobe.h, just put $APROBE/include in your include path:
cc -c -I$APROBE/include loglevel.c

  3. Write the probe:
t.apc

#include "loglevel.h"
probe program
{
  on_entry InitializeLogLevel();

  probe thread
  {
    probe "main()"
    {
      on_entry
        if (IsSevereLogLevel()) printf("Severe\n");
    }
  }
}

  4. Compile the probe, referencing loglevel.o:
apc -g t.apc loglevel.o

17.46 Can I define functions in one APC file and call them from another APC file?

Yes. See also Q17.22 and Q17.23. This is how our predefined probes are structured. The difference is that you must provide both UALs on the aprobe command line. One could restructure the above example like so:

  1. Define the header file:

loglevel.h

extern ap_BooleanT IsSevereLogLevel(void);

  2. Write the probe:

loglevel.apc

#include <stdlib.h>                   /* defines getenv() */
#include <string.h>                   /* defines strcmp() */

static ap_NameT LOG_LEVEL = NULL;

// the externally callable function:
ap_BooleanT IsSevereLogLevel(void)
{
  return LOG_LEVEL && (strcmp(LOG_LEVEL, "severe") == 0);
}

// initialization of data accessed by the above function:
probe program
{
  on_entry
    LOG_LEVEL = getenv("LOG_LEVEL");  /* can set LOG_LEVEL to NULL */
}

  3. Compile the probe into loglevel.ual, exporting IsSevereLogLevel:
apc -g loglevel.apc -e IsSevereLogLevel

  4. Write the "client" probe:
t.apc

#include "loglevel.h"
probe thread
{
  probe "main()"
  {
    on_entry
      if (IsSevereLogLevel()) printf("Severe\n");
  }
}

  5. Compile the client probe, referencing loglevel.ual:
apc -g t.apc loglevel.ual

  6. When you run an application, you need both t and loglevel:
aprobe -u t -u loglevel my_program

17.47 I am trying to write an aprobe that will call an Ada routine in a package body, but the routine never seems to get called. Why?

Presumably the routine is called, but the probe on that routine is not triggered. That's because we disable probes while in an action. This is pretty easy to understand given an example. Suppose you have the following probe:

probe thread
{
   probe "printf()" in "libc.so"
   {
      on_entry printf ("We're in printf\n");
   }
}

Obviously, if Aprobe didn't do anything special, you would end up in an infinite loop: your code would call printf(), which would call the entry action for printf, which would call printf, which would call the entry action, and so on. So what we do is disable the probes while you're in an action. That way the call to printf() from your probe doesn't trigger the probe on printf itself.

In your case you are calling a routine while probes are disabled, so the probe on that routine doesn't get triggered. Of course you can manually turn probes back on yourself (although it is then your responsibility to avoid an infinite loop). The description of this in aprobe.h was improved in version 3.1.7 to the following:

These two routines

     extern void ap_IncrementDisableProbesCount (ap_ThreadContextPtrT);
     extern void ap_DecrementDisableProbesCount (ap_ThreadContextPtrT);

can be used to turn off / on probes for the thread. Normally when a probe is hit, Aprobe disables further probes in the thread for the duration of the action. This is to prevent recursive loops (for instance imagine if a probe on "printf()" called "printf()" and we did nothing about it). Sometimes you may want to temporarily enable probes. For instance, suppose on_entry to routine A you make a call to another routine in your application (say B) which calls routine C. You have a probe on C which you want to happen. You could bracket the call as follows:


on_entry
{
   // Turn on probes before the call
   ap_DecrementDisableProbesCount (ap_ThreadContextPtr);
   // Make the call
   $B (1, 2, 3);
   // Turn probes back off
   ap_IncrementDisableProbesCount (ap_ThreadContextPtr);
}

So, your probe becomes:

probe thread
{
    probe "test.adb":"test.x[1]"
    {
      on_entry
      {
        ...
      }
      on_exit
      {
        ...
        ap_DecrementDisableProbesCount (ap_ThreadContextPtr);
        $("test.y[1]");
        ap_IncrementDisableProbesCount (ap_ThreadContextPtr);
      }
    }
}
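The effect of the disable count can be modelled in a few lines of plain C. This is a toy sketch, assuming a single per-thread counter gates every probe; the names are hypothetical, and only the decrement/call/increment bracketing mirrors ap_DecrementDisableProbesCount and ap_IncrementDisableProbesCount:

```c
#include <assert.h>

/* A toy model (not Aprobe's implementation) of a per-thread counter
   that suppresses probes while an action is running. */
static _Thread_local int disable_count = 0;
static int probe_on_C_hits = 0;

/* Stand-in for an instrumented routine C: its probe action runs only
   when probes are enabled (count is zero). */
static void routine_C(void)
{
    if (disable_count == 0) {
        disable_count++;          /* entering an action disables probes */
        probe_on_C_hits++;        /* the probe action itself */
        disable_count--;
    }
}

/* Routine B, called from the action on A, calls C. */
static void routine_B(void)
{
    routine_C();
}

/* Entry action on A. With reenable set, it brackets the call to B:
   decrement, call, increment. */
static void entry_action_on_A(int reenable)
{
    disable_count++;              /* done automatically when A's probe fires */
    if (reenable)
        disable_count--;          /* like ap_DecrementDisableProbesCount */
    routine_B();
    if (reenable)
        disable_count++;          /* like ap_IncrementDisableProbesCount */
    disable_count--;
}
```

Without the bracketing the probe on C never fires from inside A's action; with it, C's probe fires once, and the counter is balanced again afterwards.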

17.48 How can I log a string passed to a library function like strdup() where there's no debug information?

In the absence of debug information all parameters would be assumed to be of type 'int' and only positional ($1, $2, etc.) references will be allowed.

If you know the type of such parameter you could cast it to the right type. The strdup() function doesn't have debug information, but you could still compile and use the following apc file:

probe thread
{
   probe extern:"strdup()" in "libc.so"
   {
      on_entry
        log("strdup(", ap_StringValue($1), ")");
   }
}

Note that ap_StringValue is a macro which among other things casts the argument to a string.

For a complete list of subprograms that you can probe in shared libraries do:

aprobe -u info -p -sa <your_executable_here>

It is best not to mix apc code that relies on debug information with the apc code that should compile without it. This way when you compile the apc code that doesn't require debug info you may omit the -x option altogether and you would not have any warnings from the apc compiler.

17.49 Can I use Aprobe to change the command run by a call to system() from my application to run my own little script instead?

Yes: replace the parameter to system() with a command that runs your script. In this example, a new command string is built with ap_CatenateStrings and passed to system() in place of the original. Imagine the possibilities...

my_ls.apc

// change these 2 lines to work on a different command:
static char cmd_to_change[] = "/bin/ls";
static char my_script[]     = "/tmp/my_ls ";

probe thread
{
  probe "system()" in "libc.so" // or libc.a(shr.o) for AIX
  {
    ap_NameT new_command = NULL;

    on_entry
    {
      char *command = (char *)$1;

      // for debugging, give some info about where we are:
      log("system() called with ", ap_StringValue($1));
      ap_LogTraceback(99);

      // make sure we only replace the right command
      {
        char *cmdpos = strstr(command, cmd_to_change);
        if (cmdpos == command)
        { // replace it
          char *argstring = command + strlen(cmd_to_change);
          new_command = ap_CatenateStrings(my_script, argstring, NULL);
          $1 = (int)new_command;
          log("*** changed to: ", ap_StringValue($1));
        }
      }
    }

    on_exit
    {
      // indicate the return code for the command:
      log("system() returns ", $0);
      // free our string, if we made one:
      if (new_command != NULL)
      {
        ap_StrFree(new_command);
        new_command = NULL;
      }
    }
  }
}
my_ls script

echo "MY_LS: --->"
ls -ltF
echo "<---- MY_LS"
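The string manipulation in the probe can be checked outside Aprobe. Below is a plain-C sketch of the same prefix replacement, using malloc() in place of ap_CatenateStrings (the function name `replace_command_prefix` is hypothetical):

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Returns a newly allocated command in which a leading old_prefix is
   replaced by new_prefix, or NULL if command does not start with
   old_prefix (or allocation fails). The caller frees the result. */
static char *replace_command_prefix(const char *command,
                                    const char *old_prefix,
                                    const char *new_prefix)
{
    assert(command != NULL && old_prefix != NULL && new_prefix != NULL);
    size_t old_len = strlen(old_prefix);
    if (strncmp(command, old_prefix, old_len) != 0)
        return NULL;                          /* not the command we want */

    const char *args = command + old_len;     /* everything after the prefix */
    size_t new_len = strlen(new_prefix);
    size_t args_len = strlen(args);
    char *result = malloc(new_len + args_len + 1);
    if (result == NULL)
        return NULL;
    memcpy(result, new_prefix, new_len);
    memcpy(result + new_len, args, args_len + 1);  /* includes the NUL */
    return result;
}
```

Like the strstr() test in the probe, a plain prefix match also accepts commands such as "/bin/lsof"; checking that the character after the prefix is a space or the end of the string would make it stricter.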

17.50 Is there a way to catch and suppress exceptions?

We support suppressing C++ exceptions only in Aprobe on AIX. The syntax is:

probe "fred"
{
  on_exit
      if (ap_ProbeActionReason ==
        ap_CppExceptionPropagated)
          ap_SuppressException;
}

You can catch exceptions in the on_exit section of your probes. To catch exceptions all you have to do is to distinguish between a normal exit from your subprogram and an exception exit from it as both would trigger your probe's on_exit actions. For example, if subprogram "fred()" may leave via exception you could test for this as follows:

  probe thread
  {
    probe "fred()"
    {
      on_exit
        switch(ap_ProbeActionReason)
        {
          case ap_AdaExceptionPropagated:
          case ap_CppExceptionPropagated:
            log("Exception exit from fred()\n");
        }
    }
  }

If you need to, you can find other action reasons defined in aprobe.h.

The example above works well when you know where the exception may be raised. When you don't, you can log all exceptions raised in your program with the following probe:

probe thread
{
   ap_LogExceptionsInThread;
}

There are also other macros for this: ap_PrintExceptionsInThread and ap_PrintAndLogExceptionsInThread. These are all defined in aprobe.h.

17.51 Can I track stack usage with Aprobe?

A probe to track stack usage is available for AIX (stack_usage_aix.apc). It should be easy to extend for Linux.

17.52 Is there a way to access local variables that doesn't depend on a hard-coded line number?

Yes. Function-relative line numbers are supported using an expression consisting of a constant offset from the special values 'first' and 'last'. For example:

   probe "Outer"
   {
      // Assume that 30 is the relative line number of the line
      // just after the call to Inner
      on_line (first + 30)
      {
         $i = 99;
      }
   }

To be sure you're using the right value, you'll have to know the probe-able lines in your function (see [#q17.61 Q17.61]). The offset is then the difference between the first line and the probe-able line you want. For example, if the first line is 12 and you want line 22, use on_line (first + 10).

Now if other parts of the file change, your probe will still work unless you modify Outer itself (which is less of a concern, since that's the function you're working with anyway).

17.53 Can I use Aprobe to query a caller's local data that wouldn't be visible by normal visibility rules?

One approach is to save the address of the variable while it is in scope, and then change the variable through that address from another probe:

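   probe thread
   {
      // Address of Outer's local i; valid only while Outer is active.
      int *i = NULL;

      probe "Outer"
      {
         on_line (first)
         {
            // Store the address of i
            i = &$i;
         }
         on_exit
         {
            // Outer's frame is going away, so forget the address
            i = NULL;
         }
      }

      probe "Inner"
      {
         on_entry
         {
            // Change the value of i, but only while Outer is active
            if (i != NULL)
               *i = 100;
         }
      }
   }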

This is harder for types that aren't simple scalars such as integers. The typeof expression can be useful here, for example to declare a pointer to a type described in the debug information:

   probe thread
   {
      typeof ($("myrecordt", "-file types.ads")) *RecordPtr;
   }
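The same pattern can be sketched in plain C, outside Aprobe. It is only an analogy: in the APC version Aprobe stores and uses the address, not the program itself. The names saved_i, outer(), and inner() are hypothetical.

```c
#include <stddef.h>

/* Hypothetical stand-in for the probe-level "int *i": it holds the
   address of outer()'s local variable only while outer() is running. */
static int *saved_i = NULL;

/* Plays the role of the "Inner" probe's on_entry action. */
static void inner(void)
{
    if (saved_i != NULL)
        *saved_i = 100;   /* modify the caller's local through its address */
}

/* Plays the role of "Outer": i is not visible to inner() by scope,
   but inner() can still reach it through saved_i. */
static int outer(void)
{
    int i = 1;
    saved_i = &i;         /* like "i = &$i;" in on_line (first) */
    inner();
    saved_i = NULL;       /* drop the address before i goes out of scope */
    return i;             /* now 100, changed by inner() */
}
```

Clearing the saved address before the owning function returns is what keeps the pattern safe; a stale address into a finished stack frame would corrupt memory.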

17.54 In APC I can reference some class members as fields of class objects, but others I cannot. Why?

Here are some general limitations and workarounds for accessing class data and methods:

  1. Class static data is not part of the object; it is a global and is referenced using a qualified name, like
          $("Screen::nNumScreens")
     If you're unsure of the full name of a static data item, you can use:
          apinfo -d myprog.exe
  2. A class object is always called $this within a method. However, static class methods do not have a $this argument.
  3. To see what's really in the class object, use "log(*$this);" on_entry to a method.
  4. If you're unsure of the full method name in class "Class", you can use
          apcgen -L <dll-or-exe> | grep "Class::"
     or
          apinfo -sa myprog.exe | grep "Class::"

Here's a simple example:

////////////////////////////////////////////////////////////
// TestStatic.apc
////////////////////////////////////////////////////////////
probe program
{
  on_entry
    printf ("  p. Static1.exe execution has started\n");
  on_exit
    printf ( "  p. Static1.exe execution has completed\n");

}
probe thread
{
  probe "Screen::Screen"
  {
    on_entry
      printf ("  p. New screen has been constructed!\n");
  }
  probe "Screen::~Screen"
  {
    on_entry
      printf ("  p. A SCREEN HAS BEEN DESTRUCTED!\n");
  }
  probe "Screen::Update(void)"
  {
    on_entry
    {
      printf ("  p. A screen update has started!\n");
      printf ("  p. Within Update, Current nNumScreens =%d\n", $("Screen::nNumScreens"));
    }
  }
  probe "Screen::GetNumScreens(void)"
  {
    on_entry
    {
      printf ("  p. GetNumScreens has started!\n");
      printf ("  p. Current nNumScreens = %d\n", $("Screen::nNumScreens"));
    }
  }
  probe "main()"
  {
    on_entry
      printf ("  p. Main() has been started!\n");
  }
}

17.55 How can I enable and disable probes externally while my program runs?

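One way is to check periodically for the existence of a file. In the example below, a periodic action looks for the file every 15 seconds; each time it finds it, it toggles whether the memset() probe is enabled and deletes the file, so creating the file once turns the probe on and creating it again turns it off. If you'd rather keep the probe enabled for as long as the file exists, set the flag from the file's presence instead, and delete the file yourself when you want to disable the action again.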

#include <stdio.h>

static ap_BooleanT MemsetProbeEnabled = FALSE;

probe thread
{
   probe extern:"memset()"
   {
      // We are not using floating point registers.
      // Use nofloat pragma to avoid saving them and
      // speed things up a little.
      #pragma nofloat
      on_entry
         if (MemsetProbeEnabled)
         {
            // Log parameters, traceback, etc.
         }
   }
}

#define CONFIG_FILE "/tmp/memset.cfg"

// Called periodically: each time CONFIG_FILE appears,
// toggle the probe and delete the file.
static void PeriodicAction(void *EP)
{
   FILE *fd = fopen(CONFIG_FILE, "r");

   if (fd != NULL)
   {
      // Toggle the value of MemsetProbeEnabled
      MemsetProbeEnabled = !MemsetProbeEnabled;

      fclose(fd);
      remove(CONFIG_FILE);
   }
}

probe program
{
   on_entry
      ap_DoPeriodically(
         PeriodicAction,
         15, // interval in seconds
         NULL);
}
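The toggling logic in PeriodicAction() can be tried out on its own in plain C. This sketch assumes nothing about Aprobe: probe_enabled and poll_trigger_file() are hypothetical stand-ins for MemsetProbeEnabled and PeriodicAction(), and the scheduling that ap_DoPeriodically() provides is left out.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical flag, standing in for MemsetProbeEnabled. */
static bool probe_enabled = false;

/* One polling step, like PeriodicAction(): if the trigger file exists,
   toggle the flag and delete the file so that creating it again toggles
   the flag back.  Returns true if the flag was toggled. */
static bool poll_trigger_file(const char *path)
{
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return false;            /* no trigger file: leave the flag alone */

    fclose(f);
    probe_enabled = !probe_enabled;
    remove(path);                /* consume the trigger */
    return true;
}
```

Calling poll_trigger_file() from a timer or polling loop gives the behavior described above: each time the file appears, the flag flips once.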

17.56 AIX: How do I convert my pre-version-3 APC file to the current format?

Aprobe version 2, which was delivered with OC Systems' PowerAda and OATS products as well as being sold separately for C and C++, was fundamentally different in its processing and expression of APC.

The best way isn't to "convert" at all, but to understand what the probes in your old APC file are trying to do, read the current documentation about Aprobe, and then write a probe to do the same thing in Aprobe version 4. This answer will just enumerate a few of the key differences, and rely on you to look in the user's guide for details:

Version 2 Aprobe was available only for the AIX platform, and used low-level AIX register and symbol names. Aprobe versions 3 and newer support multiple platforms.

In Version 2, "aprobe" actually compiled each APC file at run-time. In Version 4, you use the new `apc' program to compile the APC file(s) into a linkable UAL file, and name the UAL files on the aprobe command line.

Version 4 APC is C with a few extra keywords. Version 2 was an invented language based on Ada syntax. So, for example, instead of case $r3 is ... you'd write switch($$r3) { ...

In Version 4 there's an underscore to make "on entry" one word: "on_entry", "on_exit", "on_line".

In Version 2 you could write

probe .sym1, .sym2, on entry ...

In Version 4 each probe can name only one symbol, but there is the new concept of a "probe type" or "typedef probe" which may be defined and then applied to many symbols. So you'd write:

typedef probe { on_entry ... } CoolProbeT;
CoolProbeT  Sym1Probe("sym1");
CoolProbeT Sym2Probe("sym2");

In Version 2 APC there were only registers ($r3). In Version 4 you can reference parameters by position ($1, $2, etc.) and the return value on_exit as $0, not to mention accessing program variables by their source names.

Because version 2 APC was so low-level, there was another tool "apgen" which read an "apg" file that supported a few operations on source-level variables and generated APC to access them. In Version 4, you can reference a source-level name anywhere, provided that name is available in the debug information of the executable provided to the apc compiler.

In Version 2 a `format' was required for each log statement, and was a special syntax that could be named or unnamed in-line. In version 4 a format routine is just a C routine which can be automatically generated based on the types of the `log' arguments.

Here is a reply given to a customer who asked this question:

> It is my understanding that the new aprobe is more "C like" than "ADA like".
>Beyond that, I could use a little help.

That's true - it is basically ANSI C with some extra keywords. I take it you have gone through the examples in $APROBE/examples/evaluate to get yourself acquainted with the syntax? If not do that first and then come back to your larger problem.

> I wasn't sure if the [Aprobe v2] words format and bytes were aprobe terms.

Yes, they are. In v2 you had `format start' and `format finish': these have been made consistent with all other probes in v4, so you would use:

probe format
{
  on_entry
  {
    // Put the equivalent actions to the format start here
  }

  on_exit
  {
    // Put the equivalent actions to the format finish here
  }
}

The bytes operator was a v2 thing. In v4 you would express the code in terms of C, so you would probably use a char array (char []):

  on_entry
  {
    char CmdText [200];
  }

> I wasn't sure about the $function.

This is where v4 is much better than v2. Since you are writing your probes in C, you can just include the header files and call the functions directly. For instance, to call the `creat' function, all you need to do is:

// Include the header files
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>

probe format
{
  on_entry
  {
    int fd;
    fd = creat ("Filename", 0644);
  }
}
 

The same goes for access, system, printf, sprintf, write, etc. You'll find your probes look much better!

> The probe part I did I am pretty sure is wrong.

`probe' is quite different and you have to account for the different names used by the newer compiler and hardware, if any. Here's an example which will be close:

probe thread {
   // I'm guessing on the name here: If you have trouble finding the
   // routine, run `aprobe -u info.ual -p -s <exe name> > syms' and all of
   // the routines will be placed in the syms file.
   probe "Queuing_Services.Read_From_Q[2]"
   {
      // Store the parameters on entry since the registers aren't
      // available on_exit
      int   SrNum = $2;            // Second parameter was $r4 on AIX
      int   Length = $3;           // Third parameter was $r5 on AIX
      char *Data = (char *) $4;    // Fourth parameter was $r6 on AIX

      on_exit
      {
         // Log the data
         log (SrNum, Length, Data [0 .. Length - 1]) with DitFormat;
      }
   }
}

Your format routine should be defined above this; in v4 they are regular C routines but the important thing is that they take pointers to the data, so:

void DitFormat (int *SrNum, int *Length, char *Data)
{
   // Do your processing here
}
      

A couple of comments on the new file: it is recommended that you use C++-style comments (//), since they are less error-prone, unless you wish to keep the code common with some existing C code.

Make sure your format routine has only pointers for its parameters.

Hope this helps - like I said, make sure you understand how to write simple probes, logs and logs with formats and then you should be fine to tackle this exercise.

17.57 (Unix) Is there a probe to see when my application "exec's" another program?

[faq_exec.apc Here] is the source for a probe that should do the trick. It will record calls to all of the exec routines, including the file, calling user/group IDs, file user, group and mode information and the environment. It was written for Solaris but should work on other Unixes.

To compile, just save it to your local disk and run apc faq_exec.apc.

To use this probe you will need to have a new or existing workspace for the process you want to watch. Then either,

  • Copy the exec.ual file into the workspace directory, or
  • Use the Add Ual option from the Setup menu in the main RC window. In the "Ual file" field type or browse to the exec.ual file. Uncheck the copy UAL checkbox and (optionally) give it a description like "Record exec calls" and click OK. The UAL will be listed in the list on the left hand side of the window; check the checkbox for it and click Build.

Although the first option is simpler, using Add Ual will make it easier to turn on or off later.

Now rerun, format, and look for the exec calls. If necessary, the probe can be extended to record additional parameters that help identify a particular call.

17.58 How can I cast an enumeration value to print its numeric value?

The "obvious" direct cast doesn't work. The trick is to copy the enumeration's underlying byte into a character variable, which you can then safely cast to int, as shown below.


probe thread {
  probe "qts.write_to_q" {
    on_exit
    {
       /* this doesn't work: log("rc=", $rc, "=", (int)($rc)) */
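       // Copy the enumeration's single byte; unsigned char keeps values
       // above 127 from turning negative where plain char is signed.
       unsigned char rc_val = *(unsigned char *)&$rc;
       log("rc=", $rc, "=",  (int)rc_val);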
    }
  }
}
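The same trick in plain C, assuming (as the example above appears to) that the enumeration occupies a single byte. OneByteEnum and enum_numeric_value() are hypothetical. Reading any object through an unsigned char pointer is always permitted in C, and unsigned char avoids negative results for values above 127 on platforms where plain char is signed.

```c
/* Hypothetical layout: an enumeration object occupying a single byte,
   as a compiler may choose for a small enumeration type. */
typedef struct {
    unsigned char storage;
} OneByteEnum;

/* Read the enumeration's byte and widen it to int. */
static int enum_numeric_value(const void *enum_object)
{
    unsigned char raw = *(const unsigned char *)enum_object;
    return (int)raw;
}
```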

17.59 How can I detect memory overwrites on dynamically allocated (malloc'd) memory?

A crash can happen because memory allocated using malloc() or its variants is being corrupted by code that writes past the end (or before the beginning) of the memory that's returned, corrupting malloc's internal pointers or adjacent data.

The predefined "memcheck" probe detects this by putting a "fence" at the end of allocated memory, and checking the fence is intact when the memory is freed: see [#q15.18 Q15.18].
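The fence idea itself can be illustrated with a small malloc() wrapper. This is a minimal sketch, not the memcheck implementation: fenced_malloc(), fenced_free(), the 8-byte fence, and the 0xFD fill value are all hypothetical choices, and it only checks for writes past the end of a block, not before its beginning.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define FENCE_SIZE 8
#define FENCE_BYTE 0xFD   /* arbitrary, easily recognized fill value */

/* Header kept in front of each block; the union keeps the user's
   memory as well aligned as malloc's own result. */
typedef union {
    size_t size;
    max_align_t align;
} FenceHeader;

/* Allocate size usable bytes followed by FENCE_SIZE fence bytes. */
void *fenced_malloc(size_t size)
{
    FenceHeader *hdr = malloc(sizeof *hdr + size + FENCE_SIZE);
    if (hdr == NULL)
        return NULL;
    hdr->size = size;
    unsigned char *user = (unsigned char *)(hdr + 1);
    memset(user + size, FENCE_BYTE, FENCE_SIZE);   /* lay down the fence */
    return user;
}

/* Free a block from fenced_malloc().  Returns 1 if the fence was intact,
   0 if something wrote past the end of the block. */
int fenced_free(void *ptr)
{
    if (ptr == NULL)
        return 1;
    FenceHeader *hdr = (FenceHeader *)ptr - 1;
    const unsigned char *fence = (const unsigned char *)ptr + hdr->size;
    int intact = 1;
    for (size_t k = 0; k < FENCE_SIZE; k++)
        if (fence[k] != FENCE_BYTE)
            intact = 0;
    free(hdr);
    return intact;
}
```

The check only happens at free time, so the report tells you which block was overrun, not which write did it; that is the trade-off of the fence approach.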

17.60 How do I know when my application has forked?

You can use ap_AddNewProcessCallback to add a callback that runs when Aprobe detects your new process. Pass it a handler, which will be called in the child process. For instance, a handler could be:


static void MyNewProcessHandler (ap_ThreadContextPtrT ThreadContext)
{
   log ("Here is my new process");
}
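For comparison, POSIX provides a similar hook outside Aprobe: the child handler registered with pthread_atfork() runs in the new process right after fork(). The sketch below uses it to show the idea; it is a different mechanism from ap_AddNewProcessCallback, and fork_and_check() and child_handler() are hypothetical names.

```c
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Set only in the child, by the handler below. */
static int in_new_process = 0;

/* Runs in the child immediately after fork() returns there. */
static void child_handler(void)
{
    in_new_process = 1;
}

/* Forks once and reports whether the child saw its handler run.
   Returns 1 if it did, 0 if not, -1 on error. */
int fork_and_check(void)
{
    if (pthread_atfork(NULL, NULL, child_handler) != 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(in_new_process ? 0 : 1);   /* child: report via exit status */

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```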

17.61 How do I know what lines I can probe in a function?

The most reliable way is to use:
   apcgen -qlines -p function_name -x module_name

This generates an on_line section for each line in the given function. You can redirect the output to a file and edit the file with your on_line actions.

For an executable module you can use:
   apinfo -l exe_name
which lists all the symbols and their lines, if any. This output is simply the "raw" line information, sorted by code offset, so is not as useful for writing probes, though the output may be a good reference for use with test coverage or a debugger.

17.62 Is there a routine available to find symbol ids by mangled name, or one that will demangle for us?

You can generally pass a mangled name to ap_SymbolNameToId() and you'll get the correct symbol ID. However, there is also the following (defined in aprobe.h, of course):


extern void ap_Demangle(
   ap_DemangledNameT *Result,
   ap_NameT          MangledName,
   ap_BooleanT       IsSubprogram,
   ap_CompilerKindT  CompilerKind);

Here is an example of how to use it:


{
   ap_DemangledNameT DemangledName;

   ap_Demangle(
      &DemangledName,
      ".sec_fdk_Nam_Svc_Def__ELAB",
      TRUE,
      ap_AIXpa4_CompilerKind);


   // Now we can use DemangledName.FullName
   SymbolId =
      ap_SymbolNameToId(
         ap_ApplicationModuleId(),
         DemangledName.FullName,
         ap_ExternSymbol,
         ap_FunctionSymbol);
}

17.63 Is there a way to suppress (or force) the warning when probing a symbol that is undefined?

Yes, this was introduced in RootCause 2.1.3/Aprobe 4.3.3 (February 2004). To do it, specify #pragma optional in column 1 immediately inside the probe (or typedef probe), for example:


 probe thread {
    probe extern:"PrintDebug()" {
 #pragma optional
     ...
    }
 }
 

Conversely, there is also a #pragma required, which forces a warning when the module is undefined. By default, no warning is generated for probes on missing modules. For example:


 probe thread {
    probe extern:"open()" in "libpthread.so" {
 #pragma required
    }
 }
 
would force a warning if libpthread.so was not among the libraries loaded by the application.

This was possible prior to version 2.1.3 but was harder since it required use of a typedef probe and programmatic checking and instrumentation using the Aprobe API. (See for example AllocationFunctions[] array in memwatch.apc.)

17.64 Can I call a C++ method from a probe?

Yes, if:

  • You know the full method name, and
  • You have a this pointer for that method's class available (or else the method is static).

In these cases, you call it just like a C function (see [#q17.21 Q17.21]), except that you pass the this pointer as the first parameter. For example, suppose you have a class that looks something like this:


class Example {
public:
   void doIt(const string& s);
   void debugIt(const string& s);
};

Suppose you want to call debugIt() on entry to doIt(). The following works:


probe thread {
  probe "Example::doIt" {
    on_entry {
      $("Example::debugIt")($this, &($s));
    }
  }
}

(Note the & when passing the string parameter: APC automatically dereferences reference parameters, so you need to "restore" the reference.)

This is, of course, a very simple example. In many real cases you have template instances with long and subtly different names. In such cases, you can use apcgen -vL to list the methods in an individual object file, "grep" for the methods you're looking for, and try to match up the line numbers.

When you have dynamically dispatched calls, you are limited to methods in common base classes, or else you need to use some conditional test to determine which specific method to call.

Often the best choice is to use a separate extern "C" C++ module as an interface between your probe and the call, as described in [#q20.8 Q20.8].

As always, if you have problems or questions, contact .

17.65 How do I print/change a C++ std::string object?

This is provided by the predefined probe cppstring.ual and its associated header file $APROBE/include/cppstring.h You can learn more from the example at $APROBE/examples/predefined_probes/cppstring/

This is a good example of how to combine some simple C++ with a probe to avoid having to reverse-engineer the C++ std::string implementation.

18. Writing Java Probes

18.1 How do I use Aprobe on a Java application?

See Chapter 5 of the Aprobe User's Guide.

18.2 Can I change the return value of a Java function?

Yes. Here's a simple application, a probe, and the xmj file:

// The application Simple.java
public class Simple
{
   int doIt ()
   {
      return 10;
   }

   public static void main (String[] args)
   {
      System.out.println ("doIt returns " + new Simple ().doIt ());
   }
}

// The probe SimpleProbe.java
public class SimpleProbe extends com.ocsystems.aprobe.ProbeMethod
{
   public Object onExit (Object returnValue)
   {
      return new Integer (11);
   }
}

<!-- The xmj file simple.xmj -->
<probe_deployment>
   <probe class="SimpleProbe" parameters="readonly">
      <target value="Simple::doIt"/>
   </probe>
</probe_deployment>


$ javac Simple.java
$ javac -classpath $APROBE/lib/aprobe.jar SimpleProbe.java
$ apjava -u simple.xmj -java Simple
doIt returns 11

18.3 Can I throw an arbitrary Java exception from my probe?

Unfortunately not. Java requires that all exceptions other than RuntimeException and its descendants be either declared by the method or caught. We cannot specify that the base Aprobe Patch class throws a specific exception, because then every method that called it would have to either catch the exception or declare that it throws it. However, you can throw any RuntimeException.

18.4 When using a Java custom probe, can I get output to appear in the Trace Display tree?

Yes, there are a few ways:

  1. Use the methods in com.ocsystems.aprobe.Logger to log objects (including strings).
  2. Use the com.ocsystems.aprobe.TraceBean.logComment method to log a comment. You'll get an exception if you have de-selected trace for the run because you are calling a native method directly.
  3. Write some custom apc to go along with the custom java; have the custom apc define specific format routines for the logged data and export some native methods; have the probe bean call the exported native methods. Needless to say this option is about as advanced as you can get and we don't really document it. No user has got to the stage of doing it yet. If you are there, .

18.5 Is it possible to "stub" a Java method so it does not execute the code in the original method?

Yes, starting with RootCause version 2.1.3a (April 2004). To stub a method, simply call the stub() method at the end of the onEntry probe method, for example:


import com.ocsystems.aprobe.*;
public class TestProbe1 extends ProbeMethod
{
   public boolean onEntry(Object[] parameters)
   {
      stub();
      return true;
   }
}

18.6 Is there any way to probe classes from rt.jar, e.g., java.io.*?

Sorry, but you cannot probe any classes in the bootpath, which includes rt.jar. This limitation is imposed by the JVM: you cannot call methods that are not in the bootpath from within bootpath classes. In other words, a probe could never be applied, because the probe class would be loaded by a child class loader, and the parent (bootstrap) loader has no visibility of it. In informal discussions, engineers in Sun's JVM group called this a bad limitation of the JVM, because it made bytecode patching, which was a "preferred" technology, very difficult.

We have considered the idea of having a bridge to native code in the bootpath classes, with the native code calling the probes, but the technical issues are difficult.

For some problems, instead of probing these classes it's possible to probe the native methods underneath. For example, probe the file access routines in the libc library rather than the java.io methods.

18.7 How do I call another method in the same class instance from within my Java method probe?

The 'this' object is the first parameter (params[0]). So if you're probing a method in class SquareID, and you want to call otherMethod() there, then it'd be something like:


...
  SquareID id = (SquareID) params[0];

  id.otherMethod();
  return true;
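...

Note that the probe code has to import the SquareID class, too. Because a class in the unnamed (default) package cannot be imported, SquareID must be in a named package; mypackage below is a placeholder for that package:
import mypackage.SquareID;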

See Custom Java Probes in the RootCause Java user guide for more basic information.

18.8 Can I add custom Java probes within the RootCause GUI?

No. Most or all of it must be done from the command line. In the GUI you can click the "Custom" button in the setup options, but this only brings up a help dialog with instructions on how to set up the XMJ file and the corresponding Java code. You can cut and paste from this dialog to create your .xmj file in the workspace. After that, you would probably use the workspace and intercept mechanism only to deliver your probes to the application in an automated fashion. You could also apply these probes directly to your application using the apjava command; RootCause just hides this from the user of the application.

18.9 Can I change the value of parameters passed to a Java method?

Yes, starting with RootCause version 2.1.3a (April 2004). There are two parts:

  1. In the deployment descriptor XML file, indicate that the parameters are read/write (not the default of read-only):
<?xml version="1.0" encoding="UTF-8"?>
<probe_deployment>
   <probe class="TestParamsProbe">
        <target value="ParamsTester::callIt(java.lang.String,boolean)"
                parameters="readwrite" />
   </probe>
</probe_deployment>

  2. In the probe itself, simply assign new Objects to the params vector:
import com.ocsystems.aprobe.*;
public class TestParamsProbe extends ProbeMethod
{
   public boolean onEntry (Object[] params)
   {
      // params [0], the 'this' parameter, can't and won't be changed.
      params [1] = new String ("This is a new string");
      params [2] = new Boolean (true);

      return true;
   }

   public Object onExit (Object returnValue)
   {
      int value = ((Integer) returnValue).intValue ();

      return new Integer (value + 1);
   }
}

18.10 Can I log any Java variables other than method parameters?

The Variables pane in the RootCause Trace Setup dialog only supports logging Java parameters (all or none). In a custom probe, you can access individual parameters by position, and the return value. From a custom Java probe, you can access public class data just as you would from another class in your Java application. There is no access to method local data or class private data.

18.11 Is there a way to define nested probes in Java similar to that supported in APC?

Yes. In APC you'd write something like:


   probe "a()" {
     probe "b()" {
       on_entry
          do_something();
     }
   }

For Java it's not quite as clean as with APC because of the split between the probes in Java and the definition in XML. The file Example14.java has two Probe Methods; the MyUmbrellaProbe is the equivalent of the "a()" in the above example. It creates a new MyNestedMethodProbe probe (i.e., "b()") in its onEntry method. The file Example14.xml is the probe deployment descriptor. We just define both targets in it. Note that you don't specify the hierarchy in the XML: it's defined by the Java probe.

19. Logging Data

19.1 What's the difference between "logging" and "printing"?

Printing you understand. You call "printf()" or "puts()" and it displays what you passed to it directly to standard output (or some other file if you used fprintf()) as soon as the call is executed.

Logging, as implemented by the "log" directive in APC, is more complicated. It writes the data you specified within the parentheses to a memory-mapped APD file, and associates a "format routine" with that data. The format routine is not called, and the data is not displayed, until later when the "apformat" command is run over the APD file.

Another important difference between printing and logging is that the Aprobe log mechanism is lock-free, whereas printing requires a lock to get exclusive access for the printing thread. This gives a significant advantage to the log operation in multi-threaded applications where performance and deadlock are considerations.
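The general lock-free technique can be sketched with C11 atomics: each writer reserves a region of the buffer with one atomic add and then copies its data into that region, so no thread ever waits for a lock. This illustrates the idea only and is not Aprobe's implementation; log_buffer, LOG_CAPACITY, and log_bytes() are hypothetical, and formatting (the apformat step) would happen later by reading the buffer.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <string.h>

#define LOG_CAPACITY 4096

/* Hypothetical in-memory log standing in for a memory-mapped file. */
unsigned char log_buffer[LOG_CAPACITY];
static atomic_size_t log_next = 0;   /* offset of the next free byte */

/* Reserve len bytes with a single atomic add (no lock), then copy the
   data into the reserved region.  Returns the record's offset, or
   (size_t)-1 if it does not fit, in which case the data is dropped. */
size_t log_bytes(const void *data, size_t len)
{
    size_t offset = atomic_fetch_add(&log_next, len);
    if (offset + len > LOG_CAPACITY)
        return (size_t)-1;
    memcpy(log_buffer + offset, data, len);
    return offset;
}
```

Because each reservation is a distinct range, concurrent writers never overlap, and a slow writer cannot block a fast one.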

19.2 Why do I get data mismatch warnings logging to my very simple format routine?

All parameters to a format routine must be *addresses*. So if you do

        log((int) x) with myformat;

then you must have

        static void myformat(int *i) { ... }

If you had declared "myformat(int i)" then you would get a warning from the C compiler invoked from `apc'.

19.3 Why do my format routine parameters (usually) have to be pointers to the type logged?

The short answer is, "Because that's how it works." There are two real reasons. The first has to do with the whole logging/formatting concept. Data is copied to a memory-mapped file when logged. When formatting, we memory-map the APD file. To pass the data to the format routine directly, we'd have to allocate temporary space of the right size and copy it again.

It's much more elegant to pass everything -- scalars, structs, and arrays -- by pointer. That way, when you log an `int' value, you write it to the APD file, and when you format it, you just pass its address in the memory-mapped apd file directly to the format routine. This allows ints, arrays of ints, and structs to all work the same way.

The second reason is related to the first, and has to do with the fact that C doesn't have an array "type", but rather treats any adjacent locations in memory as an array. Here's what our chief designer has to say on this subject:

When designing the APC extensions such as 'log' statements we had to make sure that they would work with any data types, including scalars, structs and arrays. It was array types that gave us the most problems, mostly due to the fact that C has very little support for arrays.

Even though one can declare an array with a given number of elements, such declarations are limited as to where they can appear (e.g. you can not use a pointer to an array declaration inside of a formal parameter list) and operations for array types are essentially the same as operations for pointer types.

Now consider these 2 log statements below:

int foo[10];

log(foo[0]) with MyFormat;
log(foo[0..9]) with MyFormat;

The format for the first log statement could have used 'int' like you suggest, but what about the second log statement? Of course, we could have treated the first log statement differently from the second one, since the first one clearly logs one element, while the other logs a range of elements. If we did so we would use 'int' in the format declaration for the first 'log' statement and 'int *' for the second. But even so, you would still have cases like this:

log(foo[0..0]) with ...        // Do you use 'int *' here or 'int'?
log(foo[Var1..Var2]) with ...  // We don't even know the number of elements here.

The requirement that all formats take pointers to the data as arguments allowed us to avoid making any distinction between the way we log scalars and arrays. If this seems confusing, you can always use a simpler interface, where you don't have to provide any formatting routine at all:

log("foo[0] => ", foo[0]);

If this doesn't make sense to you, you are not alone. Some of us didn't like the way this had to be done either, unfortunately no one came up with a better solution than the one we have right now. If you have such suggestions, feel free to share them with us.
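A toy model of the log-then-format sequence shows why passing addresses keeps scalars and arrays uniform. It is not Aprobe's formatter: apd, log_copy(), and format_ints() are hypothetical names. Data is copied once into an aligned buffer standing in for the APD file, and the format routine receives a pointer straight into that buffer, so no second copy is needed. format_ints() takes an explicit count because a bare C pointer carries no length, the same C limitation described above.

```c
#include <stdalign.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define APD_SIZE 256

/* Hypothetical stand-in for the memory-mapped APD file. */
static alignas(max_align_t) unsigned char apd[APD_SIZE];
static size_t apd_used = 0;

/* "Log": copy len bytes into the buffer at a suitably aligned offset and
   return the address where they were stored (NULL if there's no room). */
const void *log_copy(const void *data, size_t len)
{
    size_t align = alignof(max_align_t);
    size_t start = (apd_used + align - 1) / align * align;
    if (start + len > APD_SIZE)
        return NULL;
    memcpy(apd + start, data, len);
    apd_used = start + len;
    return apd + start;
}

/* A format routine in the pointer style: the same signature handles one
   int or a range of ints.  Returns the sum so the result is checkable. */
long format_ints(const int *values, size_t count)
{
    long sum = 0;
    for (size_t k = 0; k < count; k++) {
        printf("value[%zu] = %d\n", k, values[k]);
        sum += values[k];
    }
    return sum;
}
```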

19.4 How can I control the size of the APD file produced?

This is specified as a parameter to the aprobe command. By default there is a single 256M file. You can specify the number of files (see the next Q.) and/or the maximum size of each file. You set the maximum size of each file (in bytes) with "-s n_bytes". You set the number of files with "-n num_files", where num_files must be in the range 0-9. If you specify 0, all logged output is discarded. If you specify 2 or more, but don't explicitly set the size with "-s", the maximum size is set to 2 megabytes.
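The sizing rules above can be written out as a small function, which makes the total capacity (number of files times the per-file size) easy to work out. This is a sketch under stated assumptions: ApdSettings and effective_apd_settings() are hypothetical, "256M" and "2 megabytes" are taken as binary megabytes (1,048,576 bytes), and the function does not reject file counts outside 0-9.

```c
#include <stdbool.h>

#define MEGABYTE (1024UL * 1024UL)   /* assumption: binary megabytes */

/* Effective APD file settings, following the rules in the answer above. */
typedef struct {
    unsigned long num_files;       /* 0 means all logged output is discarded */
    unsigned long bytes_per_file;  /* maximum size of each file */
} ApdSettings;

/* num_files: value given with -n (1 if -n was not used).
   size_given/size_bytes: whether -s was used, and its value. */
ApdSettings effective_apd_settings(unsigned long num_files,
                                   bool size_given, unsigned long size_bytes)
{
    ApdSettings s = { num_files, 0 };

    if (num_files == 0)
        return s;                              /* nothing is kept */
    if (size_given)
        s.bytes_per_file = size_bytes;
    else if (num_files >= 2)
        s.bytes_per_file = 2 * MEGABYTE;       /* multi-file default */
    else
        s.bytes_per_file = 256 * MEGABYTE;     /* single-file default */
    return s;
}
```

For example, -n 4 without -s gives four 2 MB files, or 8 MB in total, while the default keeps a single 256 MB file.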

19.5 What is an "APD ring"?

The "APD ring" is how the aprobe logging mechanism deals with large quantities of data. By default there's a single APD file produced by aprobe, with a maximum size of 256 MB. If you try to log more than that, the last (newest) data is lost.

If you specify more than one file, the files conceptually form a "ring" so that the most recent data is always kept, and the oldest data is lost. The ring is really more like a fixed-length stack where data falls off the bottom when additional data is pushed onto a full stack.

Details are described under "APD File" in Appendix B (Files Reference) of the Aprobe User's Guide.
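To make the ring behavior concrete, here is a toy model in plain C. It counts records rather than bytes and knows nothing about the real APD file format; Ring, ring_log(), ring_oldest(), and count_switches() are hypothetical. The optional on_switch hook plays the part of the file-switch callback described in the next question.

```c
#include <stddef.h>

#define RING_FILES 3      /* number of files in the ring */
#define FILE_CAPACITY 4   /* records per file; tiny for illustration */

/* Hypothetical model of an APD ring: each "file" holds a few records. */
typedef struct {
    int records[RING_FILES][FILE_CAPACITY];
    size_t count[RING_FILES];            /* records currently in each file */
    size_t current;                      /* index of the file being written */
    void (*on_switch)(size_t new_file);  /* optional file-switch hook */
} Ring;

/* Append one record.  When the current file is full, move on to the next
   file in the ring, discarding its (oldest) contents, and call the hook. */
void ring_log(Ring *r, int value)
{
    if (r->count[r->current] == FILE_CAPACITY) {
        r->current = (r->current + 1) % RING_FILES;
        r->count[r->current] = 0;
        if (r->on_switch != NULL)
            r->on_switch(r->current);
    }
    r->records[r->current][r->count[r->current]++] = value;
}

/* Oldest surviving record: the first record of the first non-empty file
   after the current one, wrapping around to the current file last. */
int ring_oldest(const Ring *r)
{
    for (size_t k = 1; k <= RING_FILES; k++) {
        size_t f = (r->current + k) % RING_FILES;
        if (r->count[f] > 0)
            return r->records[f][0];
    }
    return -1;   /* nothing logged yet */
}

/* Example hook: count how many times the ring moved to a new file. */
size_t switch_count = 0;

void count_switches(size_t new_file)
{
    (void)new_file;
    switch_count++;
}
```

With three files of four records each, logging the values 0 through 13 causes three file switches and leaves 4 as the oldest surviving value: records 0-3 fell off when the ring wrapped back to the first file.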

19.6 How can I control what goes into each APD file?

You can't log data to whatever file you want, but you can register a callback routine that is called whenever the logging mechanism changes to a new file in the ring. This is illustrated by the example in $APROBE/examples/learn/apd_ring included with Aprobe.

19.7 How can I reduce the time that is spent logging data in my probes?

See the section "Log Statement Overhead", under "Aprobe Performance Considerations", in Chapter 4 of the Aprobe User's Guide.

19.8 How can I log data so it's guaranteed to be available when I format, even if the APD ring wraps around?

The appropriate place for such data is the persistent APD file. You can log to it like this: log (...) with blahformat to ap_PersistentLogMethod;

Since the persistent file is always formatted first, this also means the data is already available in the on_entry part of your probe format, earlier than it would be if you logged it to the regular APD files.

20. Other Aprobe Questions

20.1 Where does aprobe get its "time" from (e.g., for the profile probe)?

On AIX, Aprobe reads the realtime clock directly using read_real_time(), then converts to ap_TimeT using time_base_to_time(), both defined in sys/time.h.

On Linux, Aprobe just calls gettimeofday() defined in sys/time.h.
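A minimal sketch of measuring an interval with gettimeofday(), as on Linux. elapsed_usec() and time_call_usec() are hypothetical helpers, not part of Aprobe. Note that gettimeofday() reports wall-clock time, so a measurement can be distorted if the system clock is adjusted while it runs.

```c
#include <stddef.h>
#include <sys/time.h>

/* Microseconds between two gettimeofday() readings.  tv_usec may be
   smaller in the later reading; the signed arithmetic handles that. */
long long elapsed_usec(const struct timeval *start, const struct timeval *end)
{
    return (long long)(end->tv_sec - start->tv_sec) * 1000000LL
         + (long long)(end->tv_usec - start->tv_usec);
}

/* Time one call to fn() using two wall-clock readings. */
long long time_call_usec(void (*fn)(void))
{
    struct timeval start, end;
    gettimeofday(&start, NULL);
    fn();
    gettimeofday(&end, NULL);
    return elapsed_usec(&start, &end);
}
```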

20.2 Why do my threads execute in different order under aprobe?

Almost certainly it's timing. Each time a thread is created, aprobe collects some information. This can delay thread creation somewhat and change the order in which threads are executed. Also, your probes take some time, and delay a thread that executes a probe relative to another that does not.

20.3 It looks like if I run "aprobe -if", both the probe program and probe format get executed, which messes up initialization. How can I avoid this?

There's a function ap_CurrentAprobeState() that returns either ap_AprobeRunTime or ap_AprobeFormatTime. So you can do:

   if (ap_CurrentAprobeState() == ap_AprobeFormatTime) { ... } 

in your probe format. This is the preferable way.

  probe program {
    on_entry {
      DumpInfo();
      // Don't run the program.  Exit after printing all the info.
      // (MAGIC exit code tells runtime this is *not* an error)
      exit(APROBE_MAGIC_EXIT_CODE);
    }
  }

  probe format {
    on_entry {
      if (ap_CurrentAprobeState () == ap_AprobeFormatTime)
      {
        DumpInfo();
        /* Don't do any formatting.  Exit after printing all the info. */
        exit(0);
      }
    }
  }

20.4 I have a probe on_exit to a function to change the struct that is returned. It causes a core-dump when the probed function called as a procedure. What's the problem?

On some architectures, a structure returned by value is written to space on the stack allocated by the caller. However, if the caller is discarding the returned value by calling the function as a procedure, no space is allocated. In this case, a probe which may normally attempt to change the return value should not do so, as it will likely corrupt memory. In order to allow users to handle this problem, the following macro is provided:

#define ap_StructValueReturnExpected private

This would be used as a boolean expression in an on_exit part as follows:

probe "UpdateCoordinates()"
{
  on_exit
    if (ap_StructValueReturnExpected)
      $return.x = $return.y = $return.z = 0;
}

20.5 I want to capture the address of a target expression on entry in a pointer to the right target type. How do I declare this?

There are (at least) three possibilities, illustrated in the APC file below:

probe thread {
  // Method 1: Use the APC "typeof" operator on the type name directly as a
  //           target expression:
  probe "foo" {
    typeof($MyStructT) *Param1 = &($MyS);
    on_exit {
      log("Param1 => ", *Param1);
    }
  }
  // Method 2: Use the APC "typeof" operator on the target
  //           expression for the parameter name:
  probe "foo" {
    typeof($MyS) *Param1 = &($MyS);
    on_exit {
      log("Param1 => ", *Param1);
    }
  }
  // Method 3: Use the "typeof" operator on the target expression
  //           for the positional parameter:
  probe "foo" {
    typeof($1) *Param1 = &($1);
    on_exit {
      log("Param1 => ", *Param1);
    }
  }
}

This applies whether you're capturing a parameter or global value, or even assigning an APC value to a target expression. The type declaration is the important point here.

Of course target expressions apply only if you have debug information available for the definition of the various names. Otherwise, you must reproduce or include the C type declaration directly in the APC, and reference it there.

20.6 I want to probe a method in a template class. How do I refer to the method in the function probe on that method?

This can be tricky. What you need is a list of all the functions as Aprobe will reference them. The info.ual predefined probe is provided precisely for this purpose, and "apcgen -L" also works. For example, if your executable is named "myprog.exe" and the method you want to probe is called Method, try:

  aprobe -u info -p -s myprog.exe | grep Method

or

  apcgen -Lv myprog.exe | grep Method

This lists each function name that can be probed, along with the file and line where it is declared. Template instances can still be tricky to identify, but this is the best approach available at the moment.

20.7 In what order do separate probes on the same function execute?

A user wrote: "What I want to do is:

probe thread
{
   probe "myfunc()"
   {
      ap_BooleanT IsEnabled = TRUE;
      on_entry
         if (some_expression)
         {
            IsEnabled = FALSE;
            return;
         }
      on_exit
         if (! IsEnabled) return;
      on_entry
         do_something();
      on_exit
         do_something();
   }
}

The first on_entry/on_exit pair would be the wrapper part and would prevent any second on_entry/on_exit pair from executing. Can I count on the first pair executing in order?"

Here's the answer: yes. on_entry and on_exit parts should execute in lexical order. If you have multiple probes on the same routine, their on_entry parts also execute in lexical order, but their on_exit parts execute in reverse order to ensure proper nesting. Probe program on_entry actions execute before probe thread on_entry actions, and those execute before any subprogram probes.

UALs on the aprobe command line (or in the RootCause workspace's aprobe script) are initialized in reverse order, i.e., right to left. Similarly, when two UALs probe the same function, the entry actions run right to left and the exit actions run in the opposite order, for example:

$ aprobe -u t1 -u t2 t.exe
enter t2:main()
enter t1:main()
exit t1:main()
exit t2:main()
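The ordering rule can be modeled in a few lines of plain C. This is a hypothetical sketch, not Aprobe's implementation: run_probes() takes the probes in the order their entry actions run (right to left for UALs on the command line) and runs the exit actions in reverse, which reproduces the nesting in the output above.

```c
#include <stdio.h>
#include <string.h>

/* Append "<verb> <name>;" to trace without overflowing it. */
static void append(char *trace, size_t size, const char *verb, const char *name)
{
    size_t used = strlen(trace);
    snprintf(trace + used, size - used, "%s %s;", verb, name);
}

/* Hypothetical model: names[] lists the probes on one function in the
   order their entry actions run. Exit actions run in reverse order, so
   each probe's entry/exit pair nests inside the one before it. */
void run_probes(const char *names[], int n, char *trace, size_t size)
{
    trace[0] = '\0';
    for (int i = 0; i < n; i++)
        append(trace, size, "enter", names[i]);
    /* ...the probed function's own code would run here... */
    for (int i = n - 1; i >= 0; i--)
        append(trace, size, "exit", names[i]);
}
```

Passing {"t2", "t1"} (the entry order for aprobe -u t1 -u t2) yields enter t2, enter t1, exit t1, exit t2, matching the example.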

20.8 Is it possible to reference C++ files from my application from within my UAL?

Yes, but it's a bit tricky. On Linux you can specify "-linker g++" and on AIX "-linker xlC" on the apc command line to indicate that the UAL should be linked with the C++ compiler rather than directly with the linker ld. This causes C++ static initialization to be performed and includes the C++ runtime shared library. This in turn allows you to link an object code archive of already-compiled C++ files with your APC, and use extern "C" { ... } to access it. This is done by the cppstring.ual predefined probe illustrated in $APROBE/examples/predefined_probes/cppstring/.

If your C++ code is already linked into a shared library, then you can link to it when you build your UAL, for example:

 # link to /path/to/my/lib/libmy.so:
 apc my.apc -linker "-L/path/to/my/lib -lmy"

There are probably other ways as well: at base, a UAL is just a shared library built from gcc-compiled C files.

20.9 Can I force a snapshot of my predefined probe data by sending a signal to my application?

Yes. The following apc code registers for SIGPROF and does a snapshot:

 #include <stdio.h>
 #include <signal.h>
 #include "memwatch.h"

 static void Handler (int sig, siginfo_t *siginfo, void *ucp)
 {
    printf ("Taking snapshot on signal %d\n", sig);
    ap_Memwatch_DoSnapshot ("snapshot signal");
 }

 probe program
 {
    on_entry
    {
       ap_RegisterSignalHandler (SIGPROF,
                                 ap_CallBeforeUserAction,
                                 Handler);
    }
 }

If this file were named memwatch_sig.apc, you would compile it with:

apc memwatch_sig.apc memwatch.ual

and then use memwatch_sig.ual instead of memwatch.ual when running. Then send the signal (kill -PROF pid) to generate a snapshot.
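The three-argument Handler above has the same shape as a POSIX sigaction() handler installed with SA_SIGINFO. As a plain C sketch of that mechanism outside Aprobe (on_prof(), install_prof_handler() and snapshots_requested are hypothetical names), the handler below keeps to async-signal-safe work by only recording the request.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Incremented once per SIGPROF; read later outside the handler. */
static volatile sig_atomic_t snapshots_requested = 0;

/* Same (int, siginfo_t *, void *) signature as Handler above. */
static void on_prof(int sig, siginfo_t *info, void *ucontext)
{
    (void)sig; (void)info; (void)ucontext;
    snapshots_requested++;
}

/* Install on_prof for SIGPROF; returns 0 on success, -1 on error. */
int install_prof_handler(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_prof;   /* SA_SIGINFO selects the 3-argument form */
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGPROF, &sa, NULL);
}
```

Running kill -PROF pid against such a process increments snapshots_requested once per signal.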

20.10 How do I log multi-dimensional Ada arrays?

Aprobe only supports getting one slice at a time -- for the right-most index. For individual elements, therefore, it's trivial:

  log ($available_overlays [1] [1]);

or you could use a single slice:

  log ($available_overlays [1] [1 .. 10]);

This approach extends to arrays with more dimensions. Since the elements are stored contiguously, you could also cheat and cast the array to a one-dimensional array, if you're careful about how you compute the indices.
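To see why the cast trick works, here is the same idea in C, which stores arrays in row-major order: element [i][j] of a ROWS x COLS array is at flat position i * COLS + j. This is an illustration only; flat_index() and flatten() are hypothetical helpers, and you should confirm the layout your Ada compiler uses before relying on it in a probe.

```c
#include <stddef.h>
#include <string.h>

#define ROWS 3
#define COLS 4

/* Row-major layout: element [i][j] sits at flat position i * cols + j. */
size_t flat_index(size_t i, size_t j, size_t cols)
{
    return i * cols + j;
}

/* Copy the 2-D array into a 1-D buffer; this works because the rows
   are stored back to back with no gaps between them. */
void flatten(int a[ROWS][COLS], int out[ROWS * COLS])
{
    memcpy(out, a, sizeof(int) * ROWS * COLS);
}
```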

20.11 AIX: Why isn't my ual world readable?

The apc command does a 'chmod 640' on the ual it generates after a successful link. This is necessary because the permissions affect how the shared module is loaded at run time. Here's an excerpt from the AIX 'info' output for 'dlopen()', which is the runtime routine used to load UALs when running aprobe:

  • If the module being loaded has read-other permission, the module is loaded into the global shared library segment. Modules loaded into the global shared library segment are not unloaded even if they are no longer being used. Use the slibclean command to remove unused modules from the global shared library segment.

We don't want individual users' UALs loaded into the global shared library segment: multiple edit/apc/aprobe cycles could result in bizarre behavior, and the slibclean command that clears unused modules out can only be run by an account with root privileges.
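To check from a program whether a given file would fall under that rule, test its read-other permission bit. A minimal sketch using POSIX stat(); is_world_readable() is a hypothetical helper, not part of Aprobe:

```c
#include <sys/stat.h>

/* Returns 1 if path has the read-other permission bit set, 0 if it
   does not, or -1 if stat() fails (for example, no such file). */
int is_world_readable(const char *path)
{
    struct stat st;
    if (stat(path, &st) != 0)
        return -1;
    return (st.st_mode & S_IROTH) ? 1 : 0;
}
```

A UAL left at mode 640 by apc returns 0 here.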

20.12 AIX: When I use pthreads calls in my probes, the UAL won't link. Do I need to explicitly specify the library or change my compiler_profiles file?

We strongly advise against linking probes with a thread library since it can cause major problems when run against a single threaded application. The recommended approach on AIX, although a little painful, is to look up the symbol dynamically and call it by pointer. Here is an example for pthread_attr_getstacksize:


 #include <sys/types.h>   // pthread_attr_t, size_t (header only; no library linked)

 // Define a type to map to the routine
 typedef int (*pthread_attr_getstacksize_subprogram_T)
    (pthread_attr_t *, size_t *);

 // Declare a variable to hold the address
 static pthread_attr_getstacksize_subprogram_T
    pthread_attr_getstacksize_subprogram_ptr = NULL;

 // Defined below
 static ap_ModuleIdT PthreadModuleId(void);

 probe program
 {
    on_entry
    {
       pthread_attr_getstacksize_subprogram_ptr =
          (pthread_attr_getstacksize_subprogram_T)
             ap_FunctionPointer (ap_FunctionNameToId (PthreadModuleId (),
                                 "pthread_attr_getstacksize()",
                                 ap_NoName));

       // Call it - note don't do this on program entry until you have the
       // fix for that! (Attributes and Size are assumed declared elsewhere.)
       if (pthread_attr_getstacksize_subprogram_ptr)
       {
          pthread_attr_getstacksize_subprogram_ptr (&Attributes, &Size);
       }
    }
 }

The PthreadModuleId () routine would look something like:

 static ap_ModuleIdT PthreadModuleId(void)
 {
    ap_ModuleIdT Result;
    /* First, the 4.3 case */
    Result = ap_ModuleNameToId("libpthreads.a(shr_xpg5.o)");

    if (ap_IsNoModuleId(Result))
    {
       /* Didn't find it in shr_xpg5.o, so if we don't find it in shr.o ... */
       Result = ap_ModuleNameToId("libpthreads.a(shr.o)");
       /* ...we'll give back that null result. */
    }
    return Result;
 }

On Linux a similar approach can be used; there the module is always "libpthread.so".
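On Linux the same lookup-by-name can also be done with the standard POSIX dlopen()/dlsym() calls rather than the Aprobe API. In this sketch (lookup_getstacksize() is a hypothetical name), dlopen(NULL, ...) returns a handle covering the program and the libraries already loaded into it, so the lookup only finds pthread_attr_getstacksize if a loaded object already defines it and never pulls the thread library into a single-threaded application. On glibc 2.34 and later these functions live in libc itself; on older releases you may need to link with -ldl.

```c
#include <dlfcn.h>
#include <pthread.h>
#include <stddef.h>

/* Function-pointer type matching pthread_attr_getstacksize(). */
typedef int (*getstacksize_fn)(const pthread_attr_t *, size_t *);

/* Find pthread_attr_getstacksize among objects already loaded into the
   process. Returns NULL if none of them defines it. */
getstacksize_fn lookup_getstacksize(void)
{
    void *self = dlopen(NULL, RTLD_LAZY);   /* handle for the whole program */
    if (self == NULL)
        return NULL;
    getstacksize_fn fn =
        (getstacksize_fn)dlsym(self, "pthread_attr_getstacksize");
    dlclose(self);                          /* the main program stays loaded */
    return fn;
}
```

As in the AIX probe, check the pointer for NULL before calling through it.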

20.13 Is there a way I can manage thread-specific data without using native thread-management routines?

Yes. Defining and referencing thread-specific data is built into Aprobe. Here is an example:


int *GetThreadSpecificInt();

probe thread
{
   int ThreadSpecificItem = 0;

   int *GetThreadSpecificInt()
   {
      return &ThreadSpecificItem;
   }
}

Now you can call GetThreadSpecificInt() from anywhere to get hold of the thread-specific data item. This should work equally well on all platforms and is usually much faster than using pthread functions.

You can report or take actions when each thread starts and stops as well:


probe thread
{
   on_entry
     printf("Entering thread\n");
   on_exit
     printf("Exiting thread\n");
}

The predefined probes in $APROBE/probes have many sophisticated examples of this. A simple example is available on Unix platforms in $APROBE/examples/evaluate/5.threads.
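For comparison only, ordinary C11 code gets a similar per-thread variable from the _Thread_local storage class. This sketch is not how probe thread is implemented; get_thread_specific_int() and worker() are hypothetical names, and the program must be built with thread support (for example gcc -pthread).

```c
#include <pthread.h>
#include <stddef.h>

/* Each thread gets its own copy of this variable. */
static _Thread_local int thread_specific_item = 0;

int *get_thread_specific_int(void)
{
    return &thread_specific_item;
}

/* Bump this thread's copy n times, then report its final value. */
static void *worker(void *arg)
{
    int n = *(int *)arg;
    for (int i = 0; i < n; i++)
        (*get_thread_specific_int())++;
    *(int *)arg = *get_thread_specific_int();
    return NULL;
}
```

Two workers started with 3 and 5 each report their own count, and the main thread's copy is untouched.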

20.14 How does using Aprobe for C++ differ from using Aprobe for C or Ada?

These are interesting differences:

  • memory
  • objects
  • mangling
  • exceptions
  • generics/templates
Aprobe tries to make the probe interface common, but language differences may get in the way:
  • C++ applications tend to be much bigger, which can make the RootCause GUI very slow.
  • C++ calls extra procedures, like class constructors and destructors.
  • C++ has objects whose members Aprobe tries to access, with varying degrees of success.
  • C++ has name mangling that Aprobe tries to hide, with varying degrees of success.
  • Aprobe does not support throwing C++ exceptions as it does for Ada exceptions.
  • C++ throws objects with exceptions, while Aprobe only logs the object's address.
  • C++ uses standard templates whose expanded form is all that Aprobe sees.
  • C++ has multiple inheritance whose rules are resolved by the compiler.

These differences can make Aprobe a little harder to use on a C++ application, or a little less satisfying when a probe logs data to be formatted for easy reading. For example, constructors and destructors may get profiled or traced but mostly just clutter the report; objects may be shown with member addresses instead of member data; mangled names sometimes show up in reports or apc input; exception object content may be needed but unavailable; output may show the internal form of an expanded template rather than the source form written by the programmer; and a probe's references to inherited data may need to be compiled by the C++ compiler to be correct.

OC Systems is developing a strategy whereby C++ code can be linked with the probes to circumvent many of these problems; contact us to learn more.

20.15 Why does my C++ application crash when run with Aprobe?

If your application is bigger than, say, 100 MB, chances are that it's running out of memory. You can verify this by running the "apsymbols" command on it, for example apsymbols c2.eab. If apsymbols crashes, that's the problem; if it doesn't, the problem might be elsewhere. See [#q13.11 Q13.11].

There are two known reasons why aprobe may cause the application to run out of memory:

  • Demangler memory leaks - On AIX, the IBM C++ runtime is used to "demangle" the C++ symbols. There was a memory leak in all versions before 7.0.0.3. You can use the command "lslpp -l xlC.aix50.rte" on AIX to see which version is installed on your AIX box.
  • Old Aprobe version - Aprobe creates a symbol table in memory. Prior to version 4.3.4a (RootCause 2.1.4a) it used the same memory as the application itself, so there could be insufficient memory for both aprobe and the application. Run aprobe -h | head to see what version you have.

See the next question for possible workarounds.

20.16 (AIX) My application, aprobe, or its tools run out of memory. What can I do?

This is a side effect of having huge C++ applications, as described above. On AIX there's a way to give your application more memory: AIX supports a concept called the Large Address-Space Model, which may be applied via an environment variable when running, for example:


   LDR_CNTRL=MAXDATA=0x20000000 c2.eab
or
   LDR_CNTRL=MAXDATA=0x20000000 apformat c2.apd

This allocates two 256 MB memory segments (3 and 4) for your application's data. If you need even more memory you could try 0x30000000, but this may not work at run time because some applications hard-code use of segment 5.
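The arithmetic behind these values: each segment of the 32-bit AIX address space is 256 MB (0x10000000), so a MAXDATA value divided by that size gives the number of segments reserved, counting up from segment 3 as described above. A small sketch with hypothetical helper names:

```c
/* Each segment of the 32-bit AIX address space is 256 MB. */
#define AIX_SEGMENT_SIZE 0x10000000UL

/* Number of whole segments reserved by an LDR_CNTRL MAXDATA value. */
unsigned long maxdata_segments(unsigned long maxdata)
{
    return maxdata / AIX_SEGMENT_SIZE;
}

/* Highest segment number used, assuming the data segments start at
   segment 3. Returns 0 if MAXDATA reserves no whole segment. */
unsigned long last_data_segment(unsigned long maxdata)
{
    unsigned long n = maxdata_segments(maxdata);
    return n ? 3 + n - 1 : 0;
}
```

So 0x20000000 gives segments 3 and 4, while 0x30000000 reaches segment 5, the one some applications hard-code.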

20.17 My application, aprobe, or its tools are very slow starting up. What can I do?

This is, again, because of the huge symbol tables in ERAM C++ programs. The workaround is to use Aprobe's ADI (Aprobe Debug Information) mechanism to pre-construct the symbol table for an executable. Here's how it works:

  1. Assume m2.exe is an executable that still has its symbol table and line information.
  2. Create an ADI file for that executable using the apmkadi command, for example:
       cd /u/m2
       apmkadi -o m2.adi m2.exe
  3. Reference the ADI file just like a UAL, for example:
       aprobe -u m2.adi -u trace.ual m2.exe
  4. Run as you do now.
  5. Explicitly reference the ADI file when you format, for example:
       apformat -u m2.adi m2.apd
  6. If an ADI file of the default name is found in the same directory as the executable, and its checksum matches, it is used automatically.

20.18 (AIX) Why is the C++ exception raised in my libxml++-1.0.a library not reported by exceptions.ual?

The AIX C++ compiler, unlike other compilers, generates a copy of the C++ runtime exception-catching function in every shared library, rather than only in the C++ runtime library. Aprobe automatically instruments this function, "__Throw", in the predefined libC.a library, but not in user-provided libraries. For those, you must use a special probe, cppexcmodules.apc, edited to name your library or libraries.

20.19 Why don't my on_line probes work?

This is likely because the code you are probing was compiled with optimization. Check your Makefile to see whether CFLAGS or CXXFLAGS contain -O.

20.20 How do I probe a C application's CPU usage?

Unless you're an Aprobe or RootCause power user, the way to do this is with the statprof predefined probe (Unix platforms only). If possible, use it in an environment where the application terminates normally or with Ctrl-C (but not "kill -9"). Simply put "-u statprof" on the command-line or in the .apo file, and when you format a table will be generated showing what functions used what percentage of CPU. Details are in the user's guide.

If your application doesn't terminate normally you'll need to force a snapshot, as described [#SNAPSHOT below]. If the output of statprof says something like:


  56.7    0.59     Other functions (not in profiled module)

then you can see the usage across all modules by re-running with -u statprof -p -c, where -c means "coarse" and shows the usage of all modules. If the usage was mostly in, say, "libXm.a(shr4.o)", then you can rerun again to analyze just that module with -u statprof -p "libXm.a(shr4.o)".

20.21 How do I probe a C application's memory usage?

Aprobe has predefined uals for watching memory use:
  • memcheck
  • memwatch
  • memstat
Some read configuration files, though a ual will generate a default configuration file if it doesn't read a user file.

The memcheck ual watches for problems such as writing beyond the limits of an allocated memory area. memcheck requires no configuration files and simply checks the standard allocation and deallocation routines. It checks the validity of allocated data on normal program termination, on a memory signal, or on explicit request via a call to [#SNAPSHOT ap_Memcheck_DoCheckpoint].

The memwatch ual can detect things like unfreed memory accumulating. It doesn't have a configuration file, but requires that the program terminate normally to dump its data. If the program doesn't terminate normally you can use dbx to force a [#SNAPSHOT snapshot].

The memstat probe is used primarily with the RootCause GUI because it requires some configuration, but is much more usable with respect to overhead and analysis. For more details on this probe, see RootCause Memory Tracking Probes on the web site.

20.22 How can I interactively debug an application in real time?

Debugging a real-time application with dbx (or gdb) is usually tricky, because the debugger must attach to the process in real-time. Aside from the problem of hitting a moving target, hitting the target stops the process. With Aprobe, both problems are easily solved using a custom probe. The model for the probe is below, but an introduction to the concept is needed:

The Aprobe solution is to write a probe which monitors for a reason to debug, and forks a copy of the real-time process when the monitor sees a need. The parent process then continues, while the copy stalls itself in the probe so dbx can attach to the copy. Here is the model, and a talk-through follows the model:

 #include <stdio.h>
 #include <stdlib.h>
 #include <sys/types.h>
 #include <unistd.h>
 probe thread {
  probe "somewhere_where_there_can_be_a_problem" {
    on_line (where_there_can_be_a_problem) { // or on_entry or on_exit
      if (the elusive problem the user is watching for has occurred) {
        // here is the guts of the probe
        int normal = $some_reference; // save a normal state, explained below
        pid_t child;
        child = fork();
        if (child > 0) fprintf(stderr,  // parent: report and carry on
           "Oops -- such-and-such happened -- gdb xxx %d\n", (int)child);
        else if (child == 0) while (++child) { // copy: child counts seconds
          if (child > 600) exit(1); // kill if unused in ten minutes
          if ($some_reference==normal) sleep(1); // stay in the probe
          else {$some_reference = normal; break;} // leave the probe
        }
      }
    }
  }
 }
 
The 10-minute stall loop stops counting as soon as dbx attaches. If the user finishes digging and detaches dbx, loop counting would resume and the probe would kill the application copy if the user forgot to kill it. But if the user wants to set breakpoints and resume the application copy out of the stall to debug it, the method is to use dbx-set to change a chosen piece of static data and dbx-continue. The probe sees a state change, restores the saved state, and returns from the probe. This is the only way the throwaway child process would execute beyond the probe.

Debugging the forked process over a breakpointed path goes beyond interactive data digging at the point of a problem, and may not be needed for every problem. If not, there is no need to choose a static integer visible to dbx and the probe.

This "living dump" concept is useful for distributed applications, because the parent application process is unaffected by this probe. The whole distributed operation should be unaffected. Yet the user would have an attachable copy of a troubled process that might have stalled itself while the cause of a problem was still visible. Digging for the problem can be leisurely, since it makes no difference if the parent process continues or ends.

20.23 How do I get the size of my "std::list<std::string>" object generated by g++?

Different compilers have different low-level implementations for these, so it's best to just call the C++ size() method if possible. This worked on our RH8 gcc 2.95.2 system:


probe thread
{
   probe extern:"::myroutine(void)"
   {
      on_entry
      {
         // The list is in a variable called my_list.
         // We need to call list.size ():
         log ($("list<basic_string<char,string_char_traits<char>,__default_alloc_template<false,0> >,allocator<basic_string<char,string_char_traits<char>,__default_alloc_template<false,0> > > >::size(void)const") (&$my_list));
      }
   }
}

I found the routine's fully qualified name using apsymbols (or apcgen) and grepping for "size".

20.24 What do I do if my program dumps core when run with Aprobe?

For possible reasons for such crashes, see questions [#q13.11 Q13.11], [#q13.14 Q13.14] and [#q20.15 Q20.15]. If you have a core file, keep reading.

The first thing to check is whether any probes you have written are responsible for illegal memory references. These will cause core dumps just as in any C or C++ program. If you have a machine-level debugger installed you can usually use it to get a stack trace. On AIX:
   dbx /full/path/of/your-application /path/to/core
On Linux:
   gdb /full/path/of/your-application -c /path/to/core.12345

(That is, the first argument is the name of your executable, and the second is the path to the core file it dropped, which should be in the program's working directory.) Then enter the where command, which gives the stack trace at the point of the core dump.

(On AIX you need to have the bos.adt.debug fileset installed.)

If the stack trace includes a function name which looks like:
   OnExit_0094_L0013(...
then the core dump probably occurred in one of your own probes. Look at the integer in the third part of the name: this is the line number of the 'probe' directive in the .apc file (in this case, 13). You may also see names beginning 'OnEntry' or 'OnOffset'.
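If you have many stack traces to scan, the decoding can be scripted. This sketch assumes every generated name follows the OnExit_0094_L0013 pattern shown above, with the line number after "_L"; probe_line_from_name() is a hypothetical helper.

```c
#include <stdio.h>
#include <string.h>

/* Given a generated probe-function name such as "OnExit_0094_L0013",
   return the .apc line number encoded after "_L" (13 here), or -1 if
   the name does not follow that pattern. */
int probe_line_from_name(const char *name)
{
    const char *p = strstr(name, "_L");
    int line;
    /* %d reads decimal, so leading zeros such as "0013" give 13 */
    if (p == NULL || sscanf(p + 2, "%d", &line) != 1)
        return -1;
    return line;
}
```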

If dbx complains that the core file doesn't match your application, you should run:

On AIX:    dbx $APROBE/bin/aprobe.exe /find/the/core-file
On Linux:    gdb $APROBE/bin/aprobe -c /find/the/core-file

Send the output of the where command to support@ocsystems.com; it should give us a clue. Remember to state which version of RootCause/Aprobe you are running (this is reported by 'apconfig' or 'aprobe -h | head').

AIX only: slibclean to correct shared module problems:
Lastly, run 'slibclean' and see if that fixes the problem. 'slibclean' is an AIX utility that removes unused shared modules from the system's memory. It requires root access, but some sites elect to make it 'setuid' so it can be run by ordinary users.

Allowing full core files
In the event dbx complains about a truncated core file, you should verify that your environment allows full core dumps. This entails the following steps:

  1. Login as the same user that runs the application and run: ulimit -c
  2. If this does not report 'unlimited', then the account ulimit for core files needs to be set with: ulimit -c unlimited
  3. If this command returns an error, contact your sysadmin to adjust the account's 'hard core file limit'. If you are running your application from a login shell, you will need to logout and login again for the change to take effect.
  4. AIX only: check that the operating system allows full core dumps with:
       lsattr -E -l sys0 | grep fullcore
    This should report fullcore true. If not, the sysadmin needs to enable full core files through smitty System Environments->Change / Show Characteristics of Operating System->Enable full CORE dump.
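The same check can be made from inside a program with the POSIX getrlimit() call, and a process may raise its own soft core limit up to the hard limit with setrlimit(); going beyond the hard limit is what needs the sysadmin in step 3. A sketch with hypothetical helper names; the change applies only to the calling process and the children it starts afterwards.

```c
#include <sys/resource.h>

/* 1 if the soft core-file limit is unlimited, 0 if finite, -1 on error. */
int core_limit_is_unlimited(void)
{
    struct rlimit rl;
    if (getrlimit(RLIMIT_CORE, &rl) != 0)
        return -1;
    return rl.rlim_cur == RLIM_INFINITY;
}

/* Raise this process's soft core limit as far as the hard limit allows.
   Returns 0 on success, -1 on error. */
int raise_core_limit(void)
{
    struct rlimit rl;
    if (getrlimit(RLIMIT_CORE, &rl) != 0)
        return -1;
    rl.rlim_cur = rl.rlim_max;
    return setrlimit(RLIMIT_CORE, &rl);
}
```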

21. Licensing

OC Systems spends a surprising amount of time helping users get licensing set up on their machines. Here are a few of the most common questions and answers.

21.1 What do we do with a license key that looks like "ocs-Aprobe-48833..."?

This is a decimal format key for use in the prompt that appears during installation. It is a single text string with no blanks or line breaks.

  • If you haven't yet installed RootCause/Aprobe, go ahead and install it and give this key at the prompt. When installation has completed, RootCause will be ready to use.
  • If you've already got an installation, append the exact single text line to the file $APROBE/licenses/license.dat.

21.2 What do we do with a license key that looks like "FEATURE ..."?

This is a human-readable format key, and it can't be used at the prompt that appears during installation.

  • If you haven't yet installed RootCause/Aprobe, go ahead and install it, but give no key. Just confirm that it's OK to proceed without a key. Then proceed to the next step.
  • If you've already got an installation, append the exact text lines given in the mail message to the file $APROBE/licenses/license.dat.

21.3 How do I start a second license server just for Aprobe?

When there is already a license server running on the machine and you want to start another one just for Aprobe, here's how to do it. This should apply to all Unix hosts.

The procedure for running a second license server on the same host is very simple.

When you are issued a concurrent-use license for Aprobe, it will include a line like the following:

SERVER my.server.name 000347b371fe

You should amend this line by adding a third parameter to the SERVER directive, which will be the port the license server will listen on for client requests. The server and clients all read this same file. The default port for FlexLM is 27000 but any available port number can be specified. One convention is to use the next available port higher than 27000, for example:

SERVER my.server.name 000347b371fe 27001

This parameter is the only one needed to support multiple flex servers on the same host.

21.4 AIX: How do I start lmgrd when the machine boots?

These instructions apply to any services which need to be started at boot time on AIX, not just lmgrd.

The details of these instructions may or may not be applicable to your own situation, depending on the exact configuration of your systems. You should consult your local policies and support organizations and convince yourself that the suggestions made here are appropriate before putting them into practice.

That said, this is fairly basic stuff.

We will use the 'mkitab' command to add an entry to the /etc/inittab file. This command is used instead of simply editing inittab because it helps to ensure that the integrity of inittab is maintained. If you were to make even a small error while editing inittab, the system might become unbootable; the mkitab command helps to alleviate this risk.

With root authority, execute the following command:

mkitab -i rcnfs rclocal:2:once:/etc/rc.local

This adds an entry to inittab immediately following the 'rcnfs' entry, which instructs the init program to run /etc/rc.local without waiting for it to complete before proceeding with the rest of system initialization. Thus you will probably be able to take advantage, in /etc/rc.local, of services which may only be available on NFS filesystems (again, depending on the exact configuration of the system, whose inittab may not even mention rcnfs, in which case you would need to determine the correct point to add your local startup script).

You should then create the /etc/rc.local file, set its execute permission, and add to it the appropriate commands to start lmgrd and log its output, as well as whatever other site-specific initializations you may need to perform, not limited to OCS products.

Reboot the system and verify correct operation.