
RootCause

Selected Topics


This chapter contains discussions of various RootCause topics that may be of interest to you, the RootCause user.

RootCause and Efficiency Concerns

RootCause should preferably be installed on a local file system. It will work when installed on a remotely mounted file system, but performance may suffer.

The RootCause workspace should be created on a file system that is local to the machine on which the traced process will be run. The data logged by RootCause is written to the workspace. If the workspace is remote, then the logged data will have to be transmitted across the network, increasing the overhead of logging as much as tenfold. See also, "RootCause Data Management".
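Before creating a workspace, you may want to confirm that the target directory is on a local file system. The following is a minimal sketch, assuming a POSIX "df -P" and assuming that network mounts appear with a device name of the form host:/path or //host/share. The function names is_remote_device and workspace_is_local are hypothetical helpers, not part of RootCause.

```shell
# Heuristic sketch: classify a df "Filesystem" field as remote or local.
# Assumption: NFS mounts look like host:/export/path and SMB-style
# mounts look like //host/share; anything else is treated as local.
is_remote_device() {
    case "$1" in
        //*|*:/*) return 0 ;;   # looks like a network mount
        *)        return 1 ;;   # assume local
    esac
}

# workspace_is_local DIR
# Succeeds if DIR appears to live on a local file system.
workspace_is_local() {
    # df -P prints a header line, then one line whose first field is the device.
    device=$(df -P "$1" | awk 'NR == 2 { print $1 }')
    ! is_remote_device "$device"
}
```

For example, "workspace_is_local /path/to/workspace || echo 'workspace looks remote'" prints a warning when the heuristic detects a network mount. The check is only a heuristic; consult your system administrator if in doubt.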

RootCause adds probes to the application in memory. These probes are optimized machine code, so while they are fast, they must of course add overhead to the execution of the application. RootCause only "patches" the traced functions and methods. For Java, RootCause inserts byte code to only trace the methods of interest, not all methods.

Furthermore, RootCause tracing applies "load shedding" to automatically turn off tracing of functions that introduce high trace overhead. The user can then remove such functions from the trace specification for the next run. By using this mechanism and adjusting the load shedding level, one can quickly reach an acceptable level of overhead. See "RootCause Overhead Management".

Typically, we have seen that one can add a 5% load and still get a useful trace. In general, you will have to iterate to define a good trace that adds a reasonable load so the application can still run in the operational environment. RootCause supports this workflow by allowing one to choose (and remove) trace items from the viewer, which speeds the removal of "noise" routines (those that add little value to the trace).
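When iterating on a trace, it can help to put a number on the added load. The sketch below is one simple way to do so, assuming you have measured the same workload twice (once without tracing and once with it) and that elapsed time is an acceptable stand-in for load. The function name overhead_pct is hypothetical and is not a RootCause command.

```shell
# overhead_pct BASELINE TRACED
# Prints the extra time of the traced run as a percentage of the baseline.
# Both arguments are elapsed times in the same unit (for example, seconds).
overhead_pct() {
    awk -v base="$1" -v traced="$2" 'BEGIN {
        if (base <= 0) { exit 1 }    # avoid dividing by zero
        printf "%.1f\n", (traced - base) / base * 100
    }'
}
```

For example, "overhead_pct 120 126" prints 5.0, meaning the traced run took 5% longer than the untraced run.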

Note that a program being probed by RootCause will take somewhat longer to start. Typically, a RootCause session adds a few extra seconds to application startup. This overhead is incurred because RootCause does as much as possible up-front to reduce the runtime penalty later.

Solaris SETUID, and Security Concerns

This section briefly describes how RootCause / Aprobe can be used with certain "secure" applications on Solaris. These mechanisms are not yet provided for other platforms; contact OC Systems for more information.

The Solaris operating system provides a secure environment for debugging and running your applications. RootCause and Aprobe do not interfere with this mechanism but extend it to work safely in a number of environments that require it.

For the purposes of this document, a secure application is one that has the setuid bit set. We discuss how the Solaris security mechanism works with these applications and how Aprobe and RootCause provide their own extensions to the Solaris security protections to allow you to safely run probes on these applications without compromising system security.

Note that this document does not discuss applications with the setgid (group) bit set. At the time of writing, Aprobe and RootCause do not support running such applications.

Avoiding Solaris Warnings

Even if you do not wish to probe secure applications, you may want to place libapaudit.so in the secure location anyway to eliminate error messages. If you do not do this and try to run RootCause on an application that has the SETUID bit set, you will get an error message something like:

ld.so.1: mail: warning: /opt/RootCause/lib/libapaudit.so: open failed: illegal insecure pathname
ld.so.1: mail: fatal: /opt/RootCause/lib/libapaudit.so: audit initialization failure: disabled. 

Although these look like fatal errors, the application runs without error; only the loading of libapaudit.so fails.

Placing libapaudit.so in the secure location as described below will allow libapaudit.so to load for SETUID applications like /usr/bin/mail, so it can determine whether to probe the new process or not.

Note that just placing libapaudit.so in the secure location does not allow one to actually probe the SETUID application unless one is running as the effective user.

The secure path for dynamically-loaded libraries is different on each version of Solaris. This logic is encapsulated in a script, rootcause_libpath.

The simplest usage is:

  1. Log on as root so you have write access to /usr/lib and its subdirectories.

  2. Set up for using RootCause, e.g.,

    . /opt/RootCause/setup

    (see "The Setup Script").

  3. Run the command:

    rootcause_libpath -c

    This will copy the appropriate library to the secure locations. These locations are under /usr/lib, so you must be super-user. The script assumes that you are set up for RootCause, so you must run the RootCause setup script first. You should see output like:

    /usr/lib/libapaudit.so correctly installed.
    /usr/lib/secure/libapaudit.so correctly installed.
    /usr/lib/64/libapaudit.so correctly installed.
    /usr/lib/secure/64/libapaudit.so correctly installed.

  4. Log off root on this machine.

  5. You will need to do this on each machine on which you use RootCause.

  6. After doing this, run rootcause_off and then rootcause_on again to pick up the new values.
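After completing the steps above, you may want to confirm that the library is in place. The following is a minimal sketch that looks for libapaudit.so in the four locations shown in the sample output. Because the secure path differs between Solaris versions, treat that list as an assumption for your system. The function name check_libapaudit and its optional PREFIX argument are hypothetical; they are not part of RootCause.

```shell
# check_libapaudit [PREFIX]
# Reports which of the four locations from the sample output hold
# libapaudit.so. PREFIX (default: empty) is a hypothetical helper argument
# that lets you try the check against a scratch directory tree.
check_libapaudit() {
    prefix=${1:-}
    missing=0
    for dir in /usr/lib /usr/lib/secure /usr/lib/64 /usr/lib/secure/64; do
        if [ -f "$prefix$dir/libapaudit.so" ]; then
            echo "$dir/libapaudit.so present"
        else
            echo "$dir/libapaudit.so MISSING"
            missing=1
        fi
    done
    return "$missing"   # 0 only if all four copies were found
}
```

Running check_libapaudit with no arguments inspects the real /usr/lib tree and does not require root access, since it only reads.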

Description of Solaris Security

This section briefly describes the Solaris security measures that are relevant to RootCause / Aprobe. Note that each version of Solaris has its own subtle variations on this. All examples given are for Solaris 8 and later; however, with the exception of Solaris 2.5.1, RootCause and Aprobe can be expected to behave identically on older versions as far as security goes. (Solaris 2.5.1 has overly tight restrictions that were corrected in later versions.)

The first concept that must be understood is that every executable run has two users associated with it at runtime. The first is the "real" user, the logged in user - the user shown when you use the command "id". The second is the "effective" user which really governs the permissions you have during runtime.

(One important point is that if the real user is root, all security mechanisms are effectively disabled because they are moot. One practical result of this is that you may use Aprobe on any application if you are logged in as root).

Normally the real and the effective user are the same. If, however, the setuid bit is set on an application, the operating system changes the effective user to match the owner of that application. Most commonly this is the root user and is done to give a regular user temporary access to a limited set of secure resources.
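To see the two users for your current shell, you can compare the real and effective user IDs. This is a minimal sketch, assuming a POSIX-conforming "id" command that accepts the -u and -r options (on older Solaris releases this may be /usr/xpg4/bin/id rather than /usr/bin/id). The function names are hypothetical.

```shell
# show_users
# Prints the real and effective numeric user IDs of the current process.
show_users() {
    echo "real uid:      $(id -ru)"
    echo "effective uid: $(id -u)"
}

# same_real_and_effective
# Succeeds when the real and effective users are the same (the normal case).
same_real_and_effective() {
    [ "$(id -ru)" = "$(id -u)" ]
}
```

In an ordinary login shell the two values match. They differ only inside a process started from an executable with the setuid bit set.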

Let's take the "/usr/bin/at" command as an example. The output from "ls -l" might look like this:

   -rwsr-xr-x   1 root sys  37876 Jul 10  2000 /usr/bin/at

Note that instead of an 'x' where we would expect the owner's executable bit, we see an 's'. This means that the application will run with the effective user root, with all the permissions that this allows.
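The same check can be scripted. The sketch below shows two hypothetical helpers: one inspects a mode string as printed by "ls -l", and the other uses the standard "test -u" operator on a real file. Neither is part of RootCause or Aprobe.

```shell
# mode_has_setuid MODE_STRING
# Succeeds if an "ls -l" mode string such as -rwsr-xr-x shows the setuid bit,
# which appears as 's' (or 'S' when the owner execute bit is off) in the
# fourth character position.
mode_has_setuid() {
    case "$1" in
        ???[sS]*) return 0 ;;
        *)        return 1 ;;
    esac
}

# file_has_setuid PATH
# Same check against a real file, using the standard test -u operator.
file_has_setuid() {
    [ -u "$1" ]
}
```

For example, "file_has_setuid /usr/bin/at && echo secure" prints "secure" on a system where that file has the setuid bit set.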

What would happen if we were allowed to attach a debugger to this application? Suddenly we would be able to cause the application to execute arbitrary instructions as if it were root! To prevent this, the operating system will prevent the debugger interface from being used in such a situation. (Again, if you are actually logged in as root, you will be allowed access.)

Another aspect of security for these applications is where they load their libraries from. Obviously the application can have a set of specific libraries linked in and these can be safely loaded. But the runtime linker also provides a way to load arbitrary additional shared libraries through the LD_PRELOAD and LD_AUDIT environment variables. Once again it would be a security risk if any library could be specified, so the operating system only allows libraries in "secure" paths to be loaded by these environment variables.

Impact of Security Measures on Aprobe

When we run the "aprobe" command on an executable, we start out life as a debugger, patching in the probes that we've specified. Once this is done, the "aprobe" executable detaches from the application and goes away. As was mentioned above, Solaris will not allow the use of the debugger interface on a secure application. Aprobe will specifically check for this so it can give a more friendly warning if you try to run it:

$ aprobe /usr/bin/at 
(E) /usr/bin/at
This file is owned by root and has the setuid bit set.  
You need to use the secure version of aprobe (saprobe) to run this
application under Aprobe. Please see the section on secure applications
in the Aprobe user's guide.

As this error describes, there is a secure version of Aprobe that allows us to run on these applications. In fact, there are three ways we could run this application:

  1. Log in as root. As was mentioned above, security restrictions are moot for the root user and so Aprobe will run fine.

  2. If you could rebuild or relink the application, you could link in the libdal.so file that allows an executable to patch itself. The use of this is outside the boundaries of this document but you can find more details in the Aprobe user's guide.

  3. Use the secure version of Aprobe mentioned above - saprobe. The secure version itself has the setuid bit set so that it runs as root and can attach to the application.

It doesn't take much thought to realize that option (3), if implemented blindly, could leave a big security hole in your system. But, of course, it isn't implemented blindly. When you run saprobe on an application, the application must be listed in $APROBE/lib/secure_applications. This file is created so that it is only writable by root, and we check that this is still the case at runtime before allowing its use. Let's see what happens when we try to run without an entry for it:

$ saprobe /usr/bin/at
(W) /usr/bin/at
You are running a secure application but the secure_applications file
did not contain an entry for it.
(F) Aprobe will not run this application due to security restrictions. Please see the section on secure applications in the Aprobe user's guide.

The second level of checking is that the files loaded by Aprobe - the runtime libraries and the UALs - must all be owned by root and not writable by anyone else. Additionally, for all UALs except the default system_ual, an entry for them must exist in the secure_applications file under that application. If it doesn't:

$ saprobe -u trace /usr/bin/at
(W) "/app1/aprobeinst/fred/aprobe_sun_50/ual_lib/trace.ual":
This ual is not valid for your secure application. It must be listed in the secure_applications file under this application. 
(F) Aprobe will not run this application due to security restrictions. Please see the section on secure applications in the Aprobe user's guide.
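If you see this kind of failure, it may also be worth confirming the ownership requirement described above (files owned by root and not writable by anyone else). The following is a minimal sketch using the standard "find" command. It is not the check that saprobe itself performs; the function name owned_and_locked and its optional OWNER argument are hypothetical.

```shell
# owned_and_locked PATH [OWNER]
# Succeeds if PATH is owned by OWNER (default: root) and is not writable
# by group or by others.
owned_and_locked() {
    owner=${2:-root}
    # find prints the path only when every condition holds;
    # -prune stops it from descending if PATH is a directory.
    [ -n "$(find "$1" -prune -user "$owner" ! -perm -020 ! -perm -002 -print 2>/dev/null)" ]
}
```

For example, "owned_and_locked $APROBE/lib/secure_applications || echo 'check ownership and permissions'" flags a file that does not meet the requirement.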

The format of the secure_applications file is defined in its header, and it is simple. For each application we allow, there is an "APPLICATION" keyword followed by any number of "FILE" keywords. Another APPLICATION keyword automatically ends the list of allowed files. For instance:

APPLICATION /usr/bin/at
FILE /app1/aprobe/inst/fred/aprobe_sun_50/ual_lib/trace.ual
FILE /opt/product/probes/myprobe.ual
APPLICATION /usr/bin/another_app ...
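To see which files are permitted for a given application, you can read the file with a short script. The sketch below assumes only what is described above: lines that begin with the APPLICATION or FILE keyword followed by a single path with no embedded spaces. All other lines, such as the file's header, are ignored. The function names files_for_app and ual_allowed are hypothetical and are not part of Aprobe.

```shell
# files_for_app SECURE_APPLICATIONS_FILE APPLICATION_PATH
# Prints each FILE entry listed under the given APPLICATION entry.
files_for_app() {
    awk -v app="$2" '
        $1 == "APPLICATION" { in_app = ($2 == app); next }  # a new section starts
        $1 == "FILE" && in_app { print $2 }                 # entry for our application
    ' "$1"
}

# ual_allowed SECURE_APPLICATIONS_FILE APPLICATION_PATH UAL_PATH
# Succeeds if UAL_PATH is listed under APPLICATION_PATH.
ual_allowed() {
    files_for_app "$1" "$2" | grep -Fxq -- "$3"
}
```

For example, "files_for_app $APROBE/lib/secure_applications /usr/bin/at" lists the UALs allowed for /usr/bin/at, one per line.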

Impact of Security Measures on RootCause

RootCause builds on top of Aprobe and so has the same protections described above. However, the RootCause intercept mechanism is based on the LD_AUDIT environment variable and must be managed appropriately.

By default, if you set LD_AUDIT to a specific path, Solaris will not load that audit library when a secure application is run. Annoyingly, later versions of Solaris give a misleading error message that describes this as a fatal condition, which it isn't.

If, however, the audit library is in a secure location and the LD_AUDIT environment variable is appropriately set, it will be loaded by the runtime linker. The path to that library varies between versions of the O/S but, on Solaris 8 and higher, is /usr/lib/secure.

So, to allow RootCause to intercept secure applications, the audit library is placed in this location. In order that this does not create a security risk in itself, RootCause ensures that it will only run an application under RootCause if the workspace's script file is secure. If it isn't, you'll get an error message and the application will be run without RootCause.

By this mechanism, we safely control access to the scripts that will execute Aprobe and trigger the protections that Aprobe introduces.

Using the Secure Version of RootCause / Aprobe

The first step that must be taken is to provide appropriate ownership, permissions and location of certain RootCause files. A normal installation of RootCause does not have a secure version of Aprobe, it doesn't locate the audit libraries in secure paths and it may not have appropriate ownership of runtime libraries and UALs.

To create a secure environment, you must log in as root and run the rootcause_libpath script. This takes a number of parameters and must be run on each machine on which you wish to use the secure version of RootCause.

There are two main parts to this:

  1. Creation of the secure Aprobe files. This must be performed once for a given installation of RootCause / Aprobe. In many networks it must be done on the machine that the installation is directly mounted on (e.g. many NFS mounted filesystems do not allow root write access from across the network). The command to update the installation is

    rootcause_libpath -s

    This is described in more detail in "Avoiding Solaris Warnings".

  2. Creation of the secure RootCause files. This must be performed once on each machine on which you wish to intercept secure applications. The command to do this is

    rootcause_libpath -c

    Note that you can combine this and the "-s" option where appropriate.

A secondary step for RootCause is to define the workspace as secure. When creating a workspace, check the "Secure Application" checkbox to mark the workspace as secure. This will create runtime scripts that invoke the secure version of Aprobe. If, at a later time, you wish to change the security property of the workspace, you can change it in the Aprobe options tab of the RootCause options dialog (accessed from the Setup menu).

Note that if you build a secure workspace for a non-secure application or vice-versa, you will get error messages at runtime.

64 bit applications

64 bit applications are not yet supported by RootCause. If you require this support, please let us know.

Logging Controls

One of the most fundamental features of RootCause is a robust and fast logging mechanism, both for persistent and wraparound data collection.

RootCause chooses sane defaults for logging, but you may want to change them. There are several main user-selectable options for logging application data provided in the RootCause Options Dialog.

See "RootCause Data Management" for more information.

Multiple Application Tracing

Each application puts its trace data into an application specific workspace. This mapping of application to workspace is defined in the registry.

When viewing trace data, RootCause can add trace data from other applications/workspaces, so that you can view a fully integrated process trace. The traces are automatically ordered so there is a coherent time line for all traced applications.

RootCause collects data into separate files to eliminate contention for a single logging buffer. For example, if you are tracing 10 processes and all 10 are trying to write to the same buffer, then there will be much contention for that buffer and performance would suffer. RootCause solves this problem by logging the data into independent application specific workspaces and then combining the traces in the GUI viewer.
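The idea of combining independently written logs into one time line can be illustrated outside RootCause with the standard "sort" command. This is only a conceptual sketch: it assumes a hypothetical text format in which each line starts with a numeric timestamp and each input file is already in time order. It is not the RootCause trace format, and RootCause itself performs the merge in its viewer.

```shell
# merge_traces FILE...
# Each FILE is assumed to hold lines of the form "<numeric timestamp> <text>",
# already sorted by timestamp (a hypothetical format, not RootCause's own).
# Prints a single stream ordered by timestamp.
merge_traces() {
    # -m merges already-sorted inputs; -n -k 1,1 compares the first field numerically.
    sort -m -n -k 1,1 "$@"
}
```

Because each process writes only to its own file, no process waits on another while logging; the ordering work is deferred to the moment the traces are viewed together.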

A trace is merged with an existing trace using the Add Selected Process Data operation in the Trace Display Popup Menu of the Trace Display window. You can then use Save As XML or Save As Text to save this merged trace for future examination.

This is illustrated by the Advanced demo delivered with RootCause in $APROBE/demo/RootCause/Advanced. See the README.html file in that directory for a detailed description of that application, the separate Java and C++ portions, and the merging of combined traces.

The ability to view a single time line trace of multiple processes (even on SMP computers) is a very powerful feature of RootCause.

Multiple Executions of a Single Application

It is not uncommon in production environments for a single application to have multiple processes executing simultaneously. RootCause handles this by tracing each process independently.

As mentioned previously, each application has a workspace. The workspace contains a number of Process Data Sets.

RootCause automatically reuses the oldest of these process data sets upon each new invocation of the registered application. The number of process data sets to keep is specified with "Keep logged data for N previous processes" in the RootCause Options Dialog.

So if you wish to trace a total of 10 simultaneous executions of your application, you will tell RootCause to create at least 10 process data sets in the workspace. Note that this mechanism can also be used to save serial executions of a process. For example, if you would like to trace the last 4 executions of the registered application, tell RootCause to keep 4 previous processes.

See "RootCause Data Management" for more information.

Libraries with No Debug Information

The RootCause Console GUI takes advantage of Aprobe's APC translator to provide function prototype information for C object modules in shadow header files.

A shadow header file is a legal C header file, containing C type and function prototype definitions and C preprocessor directives (such as #include). The information in this file supplements the information in a compiled object module of the same name, resulting in more useful traces and custom probes.

When you click on the name of a compiled module, say "libc.so", in the Trace Setup Dialog, that module is opened and searched for debug information provided by the compiler. Then a shadow header file corresponding to that module (in this case, "libc.so.h") is searched for and, if found, the information in it is correlated with the symbols read from the module. As a result, otherwise "unknown" functions are grouped according to the header file from which they are read, and have parameter type (and often name) information.

Shadow header files are searched for in a "shadow" subdirectory of the .rootcause Directory (e.g., ~/.rootcause/shadow/libm.so.h), and if not found there, in $APROBE/shadow.
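To find out which shadow header would be picked up for a module, you can mimic this search order in a script. The sketch below assumes the .rootcause directory is in your home directory, as in the example path above, and that APROBE is set by the RootCause setup script. The function name find_shadow_header is hypothetical.

```shell
# find_shadow_header MODULE
# Prints the path of the first shadow header found for MODULE (for example,
# libm.so), looking in ~/.rootcause/shadow first and then in $APROBE/shadow.
find_shadow_header() {
    for dir in "$HOME/.rootcause/shadow" "${APROBE:-}/shadow"; do
        if [ -f "$dir/$1.h" ]; then
            echo "$dir/$1.h"
            return 0
        fi
    done
    return 1   # no shadow header in either location
}
```

For example, "find_shadow_header libm.so" prints the path of the header that would be used, or fails silently if there is none.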

OC Systems provides only one or two sample shadow header files on each platform. You're encouraged to add your own, and to contact OC Systems if you need help developing a header file for a particular library. Note that you don't have to provide all the prototypes in the library, only those you need. Conversely, if there are a few extras that aren't in the shadowed library that's okay, too -- they'll be ignored.

The easiest way to create such a file is simply to add #include preprocessor directives for existing C header files provided with your system or compiler. Note that these must be C header files ending in .h, not C++ header files. These are preprocessed using the same environment (include path and preprocessor definitions) as the APC files, but you can edit the files and add your own #define directives as necessary.
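As a starting point, a shadow header that consists only of #include directives can be generated with a few lines of shell. This is a minimal sketch; the function name make_shadow_header is hypothetical, and the choice of which system headers to include for a given library is up to you.

```shell
# make_shadow_header OUTPUT_FILE HEADER...
# Writes a shadow header containing one #include line per system header,
# creating the target directory if needed.
make_shadow_header() {
    out=$1
    shift
    mkdir -p "$(dirname "$out")" || return 1
    for hdr in "$@"; do
        printf '#include <%s>\n' "$hdr"
    done > "$out"
}
```

For example, "make_shadow_header ~/.rootcause/shadow/libm.so.h math.h" creates a one-line shadow header for libm.so that includes math.h. You can then edit the file to add #define directives or further prototypes.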

Your Application and Different JREs

If you've defined traces for a Java class in a workspace but, after running the application under RootCause, the RootCause Log shows only an APP_STARTED (but not an APP_TRACED) event for the java program, this indicates that the program wasn't recognized as Java. There could be several reasons for this:

  1. The version of Java you're running isn't supported. Check the program name in the rootcause log entry against the supported JREs identified in "System Requirements".

  2. The JRE hierarchy in which RootCause looks for files was unusual, so RootCause could not find the necessary files. In this case you can explicitly register the JRE with the java executable using the rootcause register command. This is done automatically when applying a workspace to a class in the GUI, but the JRE in the execution environment may be different.

  3. The program which is running your class isn't "Java", but some other program which loads a Java plug-in or DLL. See "Using RootCause on an Application with an Embedded JVM".

Using RootCause on an Application with an Embedded JVM

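RootCause currently supports probing applications that process Java by using the Sun version 1.2, 1.3 or 1.4 Java runtime ("libjvm.so") library. To do this, add the libjvm library to the workspace for the native executable as a dynamic module, as described for embedded Application Servers in "RootCause J2EE Support".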

Tracing Java and C++ In One Program

RootCause is designed to support both Java and compiled-language probes and traces in a single application. To do this, you will need a license for both RootCause for Java and RootCause for C++; contact OC Systems if you have questions about this. The RootCause GUI itself is an example of mixing Java and C in an application. It is implemented in Java, but has significant portions of its functionality implemented in C, which is dynamically loaded by Java. To see the Java/C interaction in a trace, one would:

  1. Open a Java Workspace for the Java main class of the application.

  2. Use Workspace->Add Dynamic Module to specify the dynamic C/C++ library that will be loaded.

  3. Click Setup to show the Java classes and dynamically loaded module, and define your traces as usual.

Another common scenario is when a C++ application creates another process to act as its GUI, and communicates with it by sockets. In this case, one creates separate workspaces for the compiled and Java parts of the application, and merges the results, as described in "Multiple Application Tracing".

RootCause J2EE Support

RootCause will work with any J2EE-compliant Enterprise Java Application Server that uses a standard JRE from Sun (version 1.2 or higher). This includes Sun iPlanet 6.5 and AS7, BEA WebLogic 5.1, 6.1 and 7, and JBOSS 3. It will also work with standalone Web Servers such as TomCat.

RootCause can trace an Application Server that is run as a standalone Java JVM (using the java executable) or it can trace a JVM that is embedded within a native executable.

If the Application Server runs as a standalone Java JVM, you can create a workspace just like any other Java application. Make sure RootCause is enabled in the shell or environment in which you are running the Application Server JVM. Run the Application Server, and find the Java APP_START event in the Trace Display window.

Note: you may need to increase the application server's Java heap size to accommodate RootCause tracing overhead; check your app server documentation.

In the New Workspace Dialog, there is an option for "J2EE Server Directory". Enter the directory where deployable Enterprise Java Bean (EJB) and Servlet classes and jars reside. RootCause will automatically add EJB and Servlet classes and jars that are specified in any J2EE compliant XML deployment descriptors.

Once a Java workspace has been created and opened, the J2EE Modules directory can be changed to another location, or the current directory can be searched again for updated or new J2EE applications. This can be done using Update J2EE Modules in the Workspace Menu.

If the Application Server runs embedded within a native executable, you can create a workspace for the native executable, and then add the libjvm library as a dynamic module. First create a workspace for the executable that runs the Application Server as you would for any other. Then open the Trace Setup window.

An Application Server might run an embedded JVM, but already have libjvm library loaded as a dynamic module. If this is the case, the libjvm library will show up in the list of loaded libraries in the Trace Setup Dialog.

If libjvm does not appear as a statically-loaded module in Trace Setup, you must find the server version of the libjvm library (libjvm.so on Solaris, libjvm.dll on Windows). Once this module has been found, it can be added using Add Dynamic Module in the Workspace Menu.

Once the libjvm module is shown in the Trace Setup window, you can complete the J2EE configuration from the main workspace window using Update J2EE Modules.

RootCause Shipped as Part of Your Application

RootCause is designed to solve problems from a single occurrence while simultaneously reducing support costs. While you can wait until a user reports a problem and then use RootCause to debug it, it is an intended use of RootCause that you include it as part of your application, so your application is always logging trace data. Whenever a user encounters a problem, they merely send you the RootCause collect file, and the root cause analysis of the problem is performed from that file. This greatly simplifies the reporting and debugging of problems. In some cases, for particularly difficult problems, you may have to send a more focused trace to the user site to complete the analysis of the problem, but the RootCause workflow is optimized to do this.

If you plan to include RootCause as part of your shipped application, we suggest that you contact OC Systems support to enter into a discussion with one of our technical staff. It is not difficult, but we can discuss various issues with you to save time and effort.



Copyright 2006 OC Systems, Inc.