PowerAda Symbol Naming Conventions

A central issue in the use of many of the PowerAda tools is the association between identifiers in your Ada program and the linker symbols contained in a corresponding Ada object file, shared library, or executable. This section describes the mapping from Ada names to symbols, and gives some hints for finding one from the other.

The Basics

In AIX and Linux, the C compiler generates a unique symbol for each function. This symbol is just the C function name; in AIX, it is preceded by a dot. If the function is passed as a parameter or its address is needed for some other reason, a function descriptor is created for the function, denoted by the name of the function (without the leading dot in AIX).

Each symbol encountered by the linker must be unique. When duplicate symbols are encountered, only the code or data associated with the first occurrence of the symbol is used, and the others are ignored (with warnings).

C requires that all function names be unique. This is not true in languages that allow overloading, such as C++ and Ada. In C++, the transformation applied to a function name to support overloading is generally called mangling: a mangled symbol is the function name with some additional characters added to distinguish it from other overloads. This concept is equally applicable to Ada.

The remainder of this section describes the PowerAda mangling algorithm. AIX symbols are used for most of the examples. The examples apply equally to Linux, but remember to strip the leading dot for Linux.

The Details

The linker symbol for the overloading of TEXT_IO.PUT_LINE that takes no file type parameter is:

.lib_text_io__put_line__1

It is fairly easy to see the resemblance here. Each symbol is of the format

compilation_unit "__" description

where a compilation_unit always starts with either ".lib_" or ".sec_" followed by the Ada unit name in lowercase.

A symbol starts with ".lib_" if it is declared in a package specification, or is a library subprogram. If it is declared in a package body, or nested in a library subprogram's declarative part, then it starts with ".sec_".

For subprograms, the description is just the subprogram name, possibly with the addition of a sequence number if the subprogram is overloaded, as above.

The symbol name for a non-overloaded library subprogram named diners would be .lib_diners__diners.
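
The format described so far can be sketched in a few lines of Python. This is only an illustration of the rules as stated above, not the compiler's actual implementation. The function name subprogram_symbol and its parameters are hypothetical, and the lowercasing of the subprogram name is inferred from the examples. The sequence number must be supplied by the caller, since this section does not say how it is assigned.

```python
def subprogram_symbol(unit, name, in_spec=True, seq=None, aix=True):
    """Build a subprogram symbol of the form compilation_unit "__" description.

    unit    -- Ada unit name without dots, e.g. "TEXT_IO"
    name    -- subprogram name, e.g. "PUT_LINE"
    in_spec -- True selects "lib_" (package specification or library
               subprogram); False selects "sec_" (package body or nested)
    seq     -- sequence number of an overloaded subprogram, or None
    aix     -- True keeps the leading dot; Linux symbols omit it
    """
    prefix = "lib_" if in_spec else "sec_"
    # Assumption: the description is lowercased, as in the examples above.
    symbol = prefix + unit.lower() + "__" + name.lower()
    if seq is not None:
        symbol += "__" + str(seq)
    return ("." if aix else "") + symbol


print(subprogram_symbol("TEXT_IO", "PUT_LINE", seq=1))
print(subprogram_symbol("diners", "diners"))
print(subprogram_symbol("diners", "diners", aix=False))
```

The first two calls reproduce the two symbols shown above; the third shows the Linux form of the same symbol, without the leading dot.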

Wherever Ada names appear, they may be selected names, which contain dots. An embedded dot is not legal in a linker symbol, so each dot is replaced by a single underscore, and the letter immediately following that underscore is written in uppercase. For example, the symbol for the procedure PKG_BODY.SUBUNIT.PROC would be

.sec_pkg_body_Subunit__proc

And a subprogram called NESTED declared within PROC would be:

.sec_pkg_body_Subunit__proc_Nested
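
The dot-replacement rule can be sketched as follows. This is a minimal illustration, assuming that all other letters are lowercased (as in the examples) and that a nested subprogram is encoded the same way as a selected name such as PROC.NESTED. The helper name encode_selected_name is hypothetical.

```python
def encode_selected_name(name):
    """Replace each dot with an underscore followed by an uppercase letter."""
    parts = name.lower().split(".")
    # The first component stays lowercase; each later component is joined
    # with "_" and has only its first letter capitalized.
    return parts[0] + "".join("_" + p[:1].upper() + p[1:] for p in parts[1:])


unit = encode_selected_name("PKG_BODY.SUBUNIT")
print(".sec_" + unit + "__" + encode_selected_name("PROC"))
print(".sec_" + unit + "__" + encode_selected_name("PROC.NESTED"))
```

Because a plain underscore is always followed by a lowercase letter under this assumption, an uppercase letter after an underscore marks a position where the Ada name had a dot.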

There are many other kinds of symbols in an object file besides subprogram entry points, however. In those cases the compilation_unit part is the same, but the description has a special value. Here is a summary of the special description values:

ELAB               compilation unit elaboration
DATA               compilation unit static read/write data
LIT                compilation unit read-only (literal) data
Xexception_name    a declaration of the exception exception_name
Hsubprogram_name   the exception-handling portion of the subprogram whose symbol does not have the 'H'

Here are some other points to note:

DATA and LIT symbols do not start with a dot ('.').
Ignore symbols ending in "__G".
The compilation_unit portion of a subprogram's symbol name is that of the unit containing the subprogram's specification, which may be different from the unit containing its body.
Nested package specification names never appear in a symbol. Nested package body names appear as part of a symbol only if the package body is a subunit.
For items declared in a generic (specification or body), determine all symbols as if the items are declared at the point of the generic instantiation.
Exception symbols are always at the compilation unit level.
A renaming never creates a new symbol: determine the symbol by tracing the renaming back to the real declaration.
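
Going the other way, a symbol can be split back into its parts. The sketch below is a rough classifier built only from the rules in this section. It relies on the fact that Ada identifiers cannot contain two adjacent underscores, so the first "__" separates the compilation unit from the description. It also assumes that the 'X' and 'H' markers are uppercase and therefore distinguishable from the lowercase subprogram names. The function name, the returned labels, and the symbol .lib_stacks__Xoverflow are hypothetical.

```python
SPECIAL = {
    "ELAB": "elaboration",
    "DATA": "read/write data",
    "LIT": "read-only data",
}


def classify_symbol(symbol):
    """Return (kind, unit, detail), or None if the symbol does not fit."""
    # Accept both the AIX form (leading dot) and the dotless form.
    body = symbol[1:] if symbol.startswith(".") else symbol
    if body[:4] not in ("lib_", "sec_") or "__" not in body:
        return None
    if body.endswith("__G"):
        return ("ignored", None, None)
    # The first "__" ends the compilation unit name.
    unit, description = body[4:].split("__", 1)
    if description in SPECIAL:
        return (SPECIAL[description], unit, None)
    if description.startswith("X"):
        return ("exception", unit, description[1:])
    if description.startswith("H"):
        return ("exception handler", unit, description[1:])
    return ("subprogram", unit, description)


print(classify_symbol(".lib_text_io__put_line__1"))
print(classify_symbol("lib_text_io__DATA"))
print(classify_symbol(".lib_stacks__Xoverflow"))
```

The unit returned is the mangled form, so a result such as pkg_body_Subunit still has to be read back as PKG_BODY.SUBUNIT by hand.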

Finding Symbols

With the above information you can often determine the linker symbol for an Ada name with no trouble. However, with overloading, nested subprograms, generic instantiations, etc. the above rules are often not enough, or at least there are easier ways. Probably the easiest way is to use the nm command:

$ nm myprog.exe | grep put_line | grep text

This will print out all the symbols that contain both "put_line" and "text". Since only those symbols that are referenced are included in the executable, there may be only one symbol that matches the rules for subprograms described above.
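
When the grep output is long, the same search can be narrowed with a stricter pattern. The sketch below filters saved nm output for subprogram symbols of one unit and one name. It is only an illustration: the function name candidate_symbols is hypothetical, the sample lines are invented, and the column layout of real nm output differs between platforms, which is why every whitespace-separated field is tested rather than a fixed column. It handles plain unit names only, not subunits.

```python
import re


def candidate_symbols(nm_output, unit, name):
    """Return symbols matching [.]lib_|sec_ unit "__" name ["__" number]."""
    pattern = re.compile(
        r"^\.?(?:lib|sec)_" + re.escape(unit.lower())
        + "__" + re.escape(name.lower()) + r"(?:__\d+)?$"
    )
    found = []
    for line in nm_output.splitlines():
        for field in line.split():
            if pattern.match(field) and field not in found:
                found.append(field)
    return found


# Invented sample; real output would come from a command such as
# "nm myprog.exe > symbols.txt".
sample = """\
.lib_text_io__put_line__1 T 268439552
.lib_text_io__put_line__2 T 268439700
.lib_text_io__put T 268440000
lib_text_io__DATA D 536871936
"""
for symbol in candidate_symbols(sample, "TEXT_IO", "PUT_LINE"):
    print(symbol)
```

If more than one overload remains, the sequence numbers still have to be matched to the Ada declarations by other means.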