OCS: Debug at User Sites

Performing remote debugging at user sites

Project:

A major contractor built a widely distributed system for the U.S. Department of Defense that assesses military readiness for a variety of emergency situations.

Problem:

To remain effective, this system needed close to 100 percent uptime. But the budget wouldn't cover flying senior support engineers to multiple sites to track down every bug that appeared during operational test.

Solution:

The contractor was already using Aprobe technology to find bugs in its integration testing lab, and soon realized that the same technology could be extended to remote debugging.

The contractor defined probes in its own test lab, then sent them by e-mail or FTP to user sites. A technician at each site loaded the probes into the Aprobe directory and restarted the system.

As the application ran, trace data was logged. The trace captured code-level, system-level and hardware/software configuration details, minimizing the data each site had to supply manually.
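The idea of a trace that carries its own configuration details can be illustrated with a minimal sketch. This is not Aprobe's API or trace format; the `TraceLog` class, the `environment_snapshot` helper, and the `load_readiness_report` function name are all hypothetical, and Python is used only for illustration. The sketch shows why a site has little to report manually: the environment is captured once in a header and travels with every batch of events.

```python
import json
import platform
import sys
import time


def environment_snapshot():
    """Collect configuration details once so the site need not report them by hand."""
    return {
        "os": platform.system(),
        "os_release": platform.release(),
        "machine": platform.machine(),
        "python": sys.version.split()[0],
    }


class TraceLog:
    """Hypothetical in-memory trace: an environment header plus a list of events."""

    def __init__(self):
        self.header = environment_snapshot()
        self.events = []

    def record(self, function, detail):
        # Each event carries a timestamp and the name of the code location.
        self.events.append(
            {"time": time.time(), "function": function, "detail": detail}
        )

    def to_json(self):
        # One self-contained document that could be sent back to support staff.
        return json.dumps({"header": self.header, "events": self.events}, indent=2)


log = TraceLog()
log.record("load_readiness_report", "entered")  # hypothetical function name
log.record("load_readiness_report", "returned 3 rows")
payload = log.to_json()
```

Because the header and events are serialized together, whoever receives `payload` can read the events in order without asking the site what it was running.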

At any point, the trace data could be e-mailed back to support staff, who would step through the trace. This helped them zero in on bugs quickly.

They would also use Aprobe to create a temporary patch to test a fix. When the fix worked, it could be left in place until the next build was ready.

Remarks:

The contractor was able to debug the problem in the customer's environment without burdening the customer. Doing remote debugging avoided the high cost and delays of sending senior support staff to perform on-site troubleshooting. The probes had virtually no impact on system performance. These application-specific probes are being used throughout the life of the military system to ensure rapid time-to-resolution of any issues.