Applications of Aprobe

1. Overview

Aprobe is a unique tool that lets you add code to, and remove code from, an executable without access to the source. It has applications across the various roles in software development, from requirements verification to debugging in the operational environment.

To get a feel for what aprobe can do, you should have already executed the demo. It is also suggested that you read the whitepaper.

Within a probe, you can reference the first parameter of the probed function by the name "$1". For example, the first parameter to fred can be examined as follows:

    probe thread
    {
      probe "fred"
      {
         log("First parameter is ", $1);
      }
    }

Also, global variables may be referenced by their names, e.g.,

    log("Global var is ", $global_var_name);

where global_var_name is a global variable.

Note that aprobe automatically determines the correct number of bytes to log and the format for the logged data.

This version also does not support probing specific lines in the user's application. It is nonetheless a very useful tool; our previous version of aprobe had neither of these two features, and that product is currently being used in large, complex projects around the world.

The following sections provide a quick introduction for various roles in software development.

Using aprobe tends to be an iterative process. It is suggested that you start out with small probes, evolving towards the final set of probes to accomplish your task.

Lastly, remember that probes are independent and can be added or removed at each new run of the application. Over time, you will acquire a library of useful probes that are specific to your application.

If something is missing, or you need a different media format, or you have any other installation or configuration problems, please contact OC Systems by email at support@ocsystems.com or by telephone at (703)359-8160.

2. Test & Integration

A successful T&I effort requires application knowledge as well as programming and systems programming knowledge. The T&I team will likely not have all of that knowledge within the team, so one of its goals is to minimize the disruption of others while it acquires that knowledge. Aprobe excels here, decoupling the T&I effort from the actual development effort.

Before you can do any T&I, you should understand the significant health and performance metrics for the system. Then you will write probes to measure those. Some of those probes will not actually print anything out unless something abnormal occurs. Others will log data to be processed offline to provide metrics and progress statements about the system.

So, get the system architects to talk to the developers; this can be done informally to identify the key routines and the data that are relevant to major system-level activities. Discuss what reports the architects desire (e.g., message arrival times). Be careful not to try to define everything at once; the danger is in trying to overachieve. Aprobe lends itself very well to an incremental approach: adding and removing logged data and reports as the project and its needs evolve. Specifically, much of the debugging data collected early on will not be needed later.

Since aprobe is minimally invasive, you can change the collected data without perturbing the application. Use an evolutionary approach.

In summary, coordinate with systems engineering and application specialists as to what data should be collected, ask development where in the software such data is available, and then produce standard reports from the data. Accept that this process will evolve over time. The goal is that T&I has a set of probes that are included with each and every system run. Much of the collected data may only be examined when an anomaly occurs, yet the data will be collected all the time. Often, new probes will be added on a temporary basis to help development track down specific problems or to investigate specific system aspects. The T&I probes will become very valuable over time. Remember to keep them under some kind of configuration management.
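The "silent unless something abnormal occurs" style of probe described above can be sketched in plain C. This is only an illustration of the logic that such a probe body might contain, not aprobe's API: the metric (a queue depth), the limit, and all names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical health metric: depth of a message queue.
   The limit is an assumption chosen for illustration. */
#define QUEUE_DEPTH_LIMIT 100

/* Running summary that a T&I probe might keep for offline reports. */
typedef struct {
    long samples;    /* number of observations */
    long max_depth;  /* largest depth seen so far */
    long alarms;     /* observations above the limit */
} queue_health;

/* Record one observation. Prints a line and returns 1 only when the
   value is abnormal; otherwise stays silent and returns 0. */
int check_queue_depth(queue_health *h, long depth)
{
    assert(h != NULL);
    h->samples++;
    if (depth > h->max_depth)
        h->max_depth = depth;
    if (depth > QUEUE_DEPTH_LIMIT) {
        h->alarms++;
        printf("ABNORMAL: queue depth %ld exceeds limit %d\n",
               depth, QUEUE_DEPTH_LIMIT);
        return 1;
    }
    return 0;
}
```

The summary structure is updated on every call, so the data is available for later reports even though nothing is printed during normal operation.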

Before aprobe, the hardest part of T&I was actually collecting the data. With aprobe, collecting the data is simple, so you can concentrate on the real job: deciding what data gives a good view of the system.

3. Debugging From A Single Occurrence

Modern software engineering principles emphasize that software should be composed of functions/subprograms that are easily readable and small enough to be comprehensible (small enough is often interpreted as fitting on a single page).

The most useful application of aprobe for those in a developer or debugging role is to dump out the parameters (and possibly the times) for specific subprograms in the application. The times can be used to infer performance and to find performance bottlenecks. The parameters, along with browsing the source code, provide a very good understanding of the dynamics of the system, which makes this an excellent debugging technique. Virtually all problems can be quickly debugged if you can see the actual parameters and times for the right set of subprograms/functions.

You will likely run many times, dumping different parameters and times on each run, so don't put in too many at the beginning or you will be flooded with data. A modern CPU can execute more than 40 million functions a second. When we say aprobe is non-invasive, we don't mean that you can log more than 40 million data items per second without affecting something! You can log more than enough, though.

Use the demo and the examples to learn about how to dump parameters and use the time functions. Then, try it on a few routines of interest in your application.

Aprobe can also be used to actually affect your application. Perhaps you have a good idea of what is going on in the application and what is wrong, but want to check out a fix to be sure. Depending on the fix, it is often quite simple to code the fix into a probe and quickly try it on the application.

4. Fault Injection

Fault injection usually means corrupting some data item during the execution of the application.

There are many reasons to inject faults into an application, although usually the aim is to test the application for robustness. Aprobe is a good tool for injecting faults, because these faults can easily be added in any environment, for any particular run, without actually modifying any executable.

With aprobe, it is easy to change data at any particular location in your program (without aprobe, this requires rebuilds of the application), so the main task here is to identify where one wants to corrupt the data and what the corrupted values should be.

After executing the demos, it should be quite clear how to corrupt application data at specific points in the application's execution using aprobe. Because the full C language is available as the aprobe language, quite sophisticated fault scenarios can be developed, although fault injection scenarios are commonly just a few lines of code.
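As an illustration of how small a fault scenario can be, here is a minimal sketch in plain C of the decision logic: corrupt a data item on one chosen call and leave it alone otherwise. How the pointer to the application data is obtained inside a real probe is not shown; the structure and function names are hypothetical.

```c
#include <assert.h>

/* Hypothetical fault plan: corrupt the value seen on one particular call. */
typedef struct {
    int target_call;    /* 1-based call number to corrupt */
    int corrupt_value;  /* value to substitute on that call */
    int calls_seen;     /* calls observed so far */
} fault_plan;

/* Called with a pointer to the data item at the chosen point in the
   program. Overwrites the item on the target call only.
   Returns 1 if a fault was injected, 0 otherwise. */
int maybe_inject(fault_plan *plan, int *data)
{
    assert(plan != 0 && data != 0);
    plan->calls_seen++;
    if (plan->calls_seen == plan->target_call) {
        *data = plan->corrupt_value;
        return 1;
    }
    return 0;
}
```

Keeping the plan in a small structure makes it easy to vary where and what to corrupt from run to run, which matches the main task identified above.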

5. Requirements Verification

Aprobe can collect data to verify that certain requirements are being met. The best way to do this is to "Write Probes instead of Prose". Take the list of requirements and for each one (or at least as many as possible), write probes from the requirements prose. Later, you can run the application with those probes and automatically verify whether the application meets (or does not meet) requirements.

You will need to talk to developers to ask such questions as "What routine is called to refresh the screen?", but it is not hard to get such questions answered as you are not really placing additional demands on them. Contrast this with the normal situation of getting development to make special versions of the applications that collect the requested data.

Start with one requirement and write a probe for it. You can insert and remove this probe in any particular run. As you add more and more of these requirements probes, you can rerun the system with them occasionally and check progress against requirements and use these requirements probes as a kind of regression check.
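To make "Write Probes instead of Prose" concrete, consider a hypothetical requirement: "the screen shall be refreshed at least every 100 ms". Assuming a probe has logged the time of each call to the refresh routine, the check itself reduces to a few lines of C. The requirement, the units, and the function name are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Count the gaps between consecutive refresh times that exceed
   max_gap_ms. A result of 0 means the requirement was met for this run. */
size_t count_late_refreshes(const long *times_ms, size_t count,
                            long max_gap_ms)
{
    size_t late = 0;
    assert(times_ms != NULL || count == 0);
    for (size_t i = 1; i < count; i++) {
        if (times_ms[i] - times_ms[i - 1] > max_gap_ms)
            late++;
    }
    return late;
}
```

Run against the data from each system run, a check like this doubles as the regression check mentioned above.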

See the OCS whitepaper for a more in-depth discussion and some examples about requirements.

6. Performance Tuning

System-level performance problems tend to be observed at a very high level (for example, end-to-end response time), yet to determine how to fix them, it is necessary to get measurements from "within" the applications themselves. With aprobe, you can easily get timings for pretty much any part of your application: COTS software, application code, and system calls.

So if you are interested in performance issues, try the demo; this will show you how to collect timings from pretty much anywhere in your application. From there, start with a few points in your application and then add others as appropriate.

It is very likely that your system is distributed. In this case, you will want to determine which clock is synchronized across the system, so that you can collate the data from different CPUs. You may want to use a different clock than the one provided by aprobe.h, or you may just want to do a single log of that system clock and the aprobe one, so you can determine the difference and do the time corrections at formatting time. (On most distributed systems that we have seen, the built-in aprobe clock is already maintained by the application to be coherent across the system, so there is likely not a problem here.)
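The correction at formatting time is simple arithmetic. The sketch below, in plain C, assumes that one pair of readings (system clock and probe clock) was logged at nearly the same instant and that the offset between the two clocks stays constant for the run. The type and function names are hypothetical, and the units (microseconds) are an assumption.

```c
#include <assert.h>

/* One paired reading taken at (nearly) the same instant: the system-wide
   synchronized clock and the local probe clock, both in microseconds. */
typedef struct {
    long long system_clock_us;
    long long probe_clock_us;
} clock_pair;

/* Offset to add to a probe-clock timestamp to express it in system time. */
long long clock_offset(clock_pair p)
{
    return p.system_clock_us - p.probe_clock_us;
}

/* Correct an array of probe timestamps in place at formatting time. */
void correct_timestamps(long long *stamps, int count, clock_pair p)
{
    long long off = clock_offset(p);
    assert(stamps != 0 || count == 0);
    for (int i = 0; i < count; i++)
        stamps[i] += off;
}
```

If the clocks drift noticeably during a run, a single paired reading is not enough; logging a pair at the start and at the end would allow a linear correction instead.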

As for how to start: first, decide what you want to measure in the system. This may sound a bit trite, but you will soon need to ask other people questions, and if you know specifically what you want, they will help you much better than if you take up their time with things you should already have decided. For example, in an end-to-end response time problem, you might want to gather the time into and out of each box.

Second, talk to the developers and find out which functions (or lines in the source code) are executed whenever those things happen. You will be surprised at how many times the developers have exactly a routine that was designed specifically to handle that event (this is the way good software is written).

Third, go off and write a probe to collect the time at those points. This is actually the simplest part of this process. The probe will look something like:

    probe thread
    {
      probe "MyClass::Read(int)"
      {
        on_entry
        {
          log("READ entered at ", ap_GetCurrentTime());
        }
      }
    }

Fourth, compile the probe, run the application with your probes, and get the output using apformat. (This is simpler than it sounds. Again, try the demo.)
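Once the times into and out of each box have been collected, finding where the time goes is straightforward. The following plain C sketch assumes the timestamps are already on a common clock and in microseconds; the structure, field, and function names are hypothetical and are not part of aprobe.

```c
#include <assert.h>
#include <stddef.h>

/* One "box" (machine or process) on the end-to-end path. */
typedef struct {
    const char *name;
    long long time_in;   /* when the request entered the box */
    long long time_out;  /* when the request left the box */
} box_sample;

/* Return the index of the box where the request spent the most time. */
size_t slowest_box(const box_sample *boxes, size_t count)
{
    size_t worst = 0;
    assert(boxes != NULL && count > 0);
    for (size_t i = 1; i < count; i++) {
        long long d = boxes[i].time_out - boxes[i].time_in;
        long long w = boxes[worst].time_out - boxes[worst].time_in;
        if (d > w)
            worst = i;
    }
    return worst;
}

/* End-to-end response time: entering the first box to leaving the last. */
long long end_to_end(const box_sample *boxes, size_t count)
{
    assert(boxes != NULL && count > 0);
    return boxes[count - 1].time_out - boxes[0].time_in;
}
```

The slowest box is the natural place to add the next, more detailed set of timing probes.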

Later, examine the umbrella concept in aprobe. Basically, an umbrella in aprobe is a nested probe and this nested probe allows one to very easily apply probes to a specific call tree. This can be very useful in isolating performance problems. There are samples in this distribution in the \Examples and the \Learn subdirectories.