9. API checking
- 9.1. Including headers
- 9.2. Specifying APIs to tcc
- 9.3. API Checking Examples
- 9.4. Redeclaring Objects in APIs
- 9.5. Defining Objects in APIs
- 9.6. Stepping Outside an API
- 9.7. Using the System Headers
- 9.8. API usage analysis
9.1. Including headers
The token syntax described in the previous annex provides the means of describing an API specification independently of any particular implementation of the API. Every object in the API specification is described using the appropriate #pragma token statement. These statements are arranged in TenDRA header files corresponding to the headers comprising the API. Each API consists of a separate set of header files. For example, if the ANSI C89 API is used, the statement:
#include <sys/types.h>
will lead to a "header not found" error, whereas the header will be found in the POSIX API.
Where relationships exist between APIs these have been made explicit in the headers. For example, the POSIX version of stdio.h consists of the ANSI version plus some extra objects. This is implemented by making the TenDRA header describing the POSIX version of stdio.h include the ANSI C89 version of stdio.h.
9.2. Specifying APIs to tcc
The API against which a program is to be checked is specified to tcc by means of a command-line option of the form -Yapi, where api is the API name. For example, ANSI X3.159 is specified by -Yc89 (this is the default API) and POSIX 1003.1 is specified by -Yposix (for a full list of the supported APIs see Chapter 2).
Extension APIs, such as X11, require special attention. The API for a program is never just X11, but X11 plus some base API, for example, X11 plus POSIX or X11 plus XPG3. These composite APIs may be specified by, for example, passing the options -Yposix -Yx5_lib (in that order) to tcc to specify POSIX 1003.1 plus X11 (Release 5) Xlib. The rule is that base APIs, such as POSIX, override the existing API, whereas extension APIs, such as X11, extend it. The command-line option -info causes tcc to print the API currently in use. For example:
% tcc -Yposix -Yx5_lib -info file.c
will result in the message:
tcc: Information: API is X11 Release 5 Xlib plus POSIX (1003.1).
9.3. API Checking Examples
As an example of the TenDRA compiler's API checking capacities, consider the following program which prints the names and inode numbers of all the files in the current directory:
#include <stdio.h>
#include <sys/types.h>
#include <dirent.h>

int main ()
{
    DIR *d = opendir ( "." );
    struct dirent *e;
    if ( d = NULL ) return ( 1 );
    while ( e = readdir(d), e != NULL ) {
        printf ( "%s %lu\n", e->d_name, e->d_ino );
    }
    closedir ( d );
    return ( 0 );
}
A first attempted compilation using strict checking:
% tcc -Xs a.c
results in messages to the effect that the headers <sys/types.h> and <dirent.h> cannot be found, plus a number of consequential errors. This is because tcc is checking the program against the default API, that is, against the ANSI API, and the program is certainly not ANSI compliant. It does look as if it might be POSIX compliant, however, so a second attempted compilation might be:
% tcc -Xs -Yposix a.c
This results in one error and three warnings. Dealing with the warnings first: the return values of the calls to printf and closedir are being discarded, and the variable d has been set and not used. The discarded function returns are deliberate, so they can be made explicit by casting them to void. The discarded assignment to d requires a little more thought: it is due to the mistyping d = NULL instead of d == NULL on line 9. The error is more interesting. In full the error message reads:
"a.c":11
    printf ( "%s %lu\n", e->d_name, e->d_ino!!!! );
Error:ISO[6.5.2.1]{ANSI[3.5.2.1]}: The identifier 'd_ino' is not a member of 'struct/union posix.dirent.dirent'.
ISO[6.3.2.3]{ANSI[3.3.2.3]}: The second operand of '->' must be a member of the struct/union pointed to by the first.
That is, struct dirent does not have a field called d_ino. In fact this is true: while the d_name field of struct dirent is specified in POSIX, the d_ino field is an XPG3 extension. (This example shows that the TenDRA representation of APIs is able to differentiate between APIs at a very fine level.) Therefore a third attempted compilation might be:
% tcc -Xs -Yxpg3 a.c
This leads to another error message concerning the printf statement: the types unsigned long and (the promotion of) ino_t are incompatible. This is due to a mismatch between the printf format string %lu and the type of e->d_ino. POSIX only says that ino_t is an arithmetic type, not a specific type like unsigned long. The TenDRA representation of POSIX reflects this abstract nature of ino_t, so that the potential portability error is detected. In fact it is impossible to give a printf string which works for all possible implementations of ino_t. The best that can be done is to cast e->d_ino to some fixed type like unsigned long and print that.
Hence the corrected, XPG3 conformant program reads:
#include <stdio.h>
#include <sys/types.h>
#include <dirent.h>

int main ()
{
    DIR *d = opendir ( "." );
    struct dirent *e;
    if ( d == NULL ) return ( 1 );
    while ( e = readdir(d), e != NULL ) {
        ( void ) printf ( "%s %lu\n", e->d_name, ( unsigned long ) e->d_ino );
    }
    ( void ) closedir ( d );
    return ( 0 );
}
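Where the same conversion is needed in several places, it can be kept in one function so that only one line depends on the choice of unsigned long. The following is a minimal sketch; the helper name format_entry and its contract (the caller supplies a buffer large enough for the result) are assumptions for illustration, not part of the program above. An ino_t value too large for unsigned long would wrap when cast, so the approach assumes unsigned long is wide enough for the inode numbers in use.

```c
#include <stdio.h>
#include <sys/types.h>

/* Hypothetical helper: writes "name inode" into buf, which the caller
   must make large enough, and returns the number of characters written.
   POSIX only guarantees that ino_t is an arithmetic type, so the value is
   converted to the fixed type unsigned long before being matched against
   the %lu conversion. */
int format_entry ( char *buf, const char *name, ino_t ino )
{
    return sprintf ( buf, "%s %lu", name, ( unsigned long ) ino );
}
```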
9.4. Redeclaring Objects in APIs
Of course, it is possible to redeclare the functions declared in the TenDRA API descriptions within the program, provided they are consistent. However, what constitutes a consistent redeclaration in the fully abstract TenDRA machine is not as straightforward as it might seem; an interesting example is malloc in the ANSI API. This is defined by the prototype:
void *malloc ( size_t );
where size_t is a target-dependent unsigned integral type. The redeclaration:
void *malloc ();
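is only correct if size_t is its own integral promotion (a call through a declaration without a prototype passes each argument after the default argument promotions have been applied), and therefore is not correct in general.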
Since it is not always desirable to remove these redeclarations (some machines may not have all the necessary functions declared in their system headers), the TenDRA compiler has a facility to accept inconsistent redeclarations of API functions, which can be enabled by using the pragma:
#pragma TenDRA incompatible interface declaration allow
This pragma suppresses the consistency checking of redeclarations of API functions. Replacing allow by warning causes a warning to be printed. In both cases the TenDRA API description of the function takes precedence. The normal behaviour of flagging inconsistent redeclarations as errors can be restored by replacing allow by disallow in the pragma above. (There are also equivalent command-line options to tcc of the form -X:interface_decl=status, where status can be check, warn or dont.)
9.5. Defining Objects in APIs
Since the program API is meant to define the interface between what the program defines and what the target machine defines, the TenDRA compiler normally raises an error if any attempt is made to define an object from the API in the program itself. A subtle example of this is given by compiling the program:
#include <errno.h>
extern int errno;
with the ANSI API. ANSI states that errno is an assignable lvalue of type int, and the TenDRA description of the API therefore states precisely that. The declaration of errno as an extern int is thus an inconsistent specification of errno, but a consistent implementation. Accepting the lesser of two evils, the error reported is that an attempt has been made to define errno despite the fact that it is part of the API.
Note that if this same program is compiled using the POSIX API, in which errno is explicitly specified to be an extern int, the program merely contains a consistent redeclaration of errno and so does not raise an error.
The neatest workaround for the ANSI case, which preserves the declaration for those machines which need it, is as follows: if errno is anything other than an extern int, it must be defined by a macro. Therefore:
#include <errno.h>
#ifndef errno
extern int errno;
#endif
should always work.
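For illustration, the guarded declaration can be placed in a complete translation unit that also uses errno. The helper parse_long below is hypothetical; it only exists to exercise errno through the standard strtol interface, which sets errno to ERANGE on overflow. On hosts where errno is a macro, as is common in C libraries that support threads, the #ifndef branch is skipped entirely.

```c
#include <errno.h>
#include <stdlib.h>

#ifndef errno
extern int errno;    /* only reached where errno is not a macro */
#endif

/* Hypothetical helper: parses a complete decimal long from s into *out.
   Returns 1 on success, or 0 if s is not entirely a number or the value is
   out of range (strtol then sets errno to ERANGE). */
int parse_long ( const char *s, long *out )
{
    char *end;
    errno = 0;
    *out = strtol ( s, &end, 10 );
    if ( end == s || *end != '\0' ) return 0;
    return errno != ERANGE;
}
```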
In most other examples, the definitions are more obvious. For example, a programmer might provide a memory allocator containing versions of malloc, free, etc.:
#include <stdlib.h>

void *malloc ( size_t sz )
{
    ....
}

void free ( void *ptr )
{
    ....
}
If this is deliberate then the TenDRA compiler needs to be told to ignore the API definitions of these objects and to use those provided instead. This is done by listing the objects to be ignored using the pragma:
#pragma ignore malloc free ....
(also see section G.10). This should be placed between the inclusion of the API header and the object definitions. The provided definitions are checked for conformance with the API specifications. There are special forms of this pragma to enable field selectors and objects in the tag namespace to be defined. For example, if we wish to provide a definition of the type div_t from stdlib.h, we need to ignore three objects: the type itself and its two field selectors, quot and rem. The definition would therefore take the form:
#include <stdlib.h>
#pragma ignore div_t div_t.quot div_t.rem
typedef struct {
    int quot;
    int rem;
} div_t;
Similarly, if we wish to define struct lconv from locale.h, the definition would take the form:
#include <locale.h>
#pragma ignore TAG lconv TAG lconv.decimal_point ....
struct lconv {
    char *decimal_point;
    ....
};
to take into account that lconv lies in the tag namespace. By defining objects in the API in this way, we are actually constructing a less general version of the API. This will potentially restrict the portability of the resultant program, and so should not be done without good reason.
9.6. Stepping Outside an API
Using the TenDRA compiler to check a program against a standard API will only be effective if the appropriate API description is available to the program being tested (just as a program can only be compiled on a conventional machine if the program API is implemented on that machine). What can be done for a program whose API is not supported depends on the degree to which the program API differs from an existing TenDRA API description. If the program API is POSIX with a small extension, say, then it may be possible to express that extension to the TenDRA compiler. For large unsupported program APIs, it may be possible to use the system headers on a particular machine to allow for partial program checking (see section 9.7).
For small API extensions the ideal method would be to use the token syntax described in Annex G to express the program API to the TenDRA compiler; however, this is not currently encouraged because the syntax of such API descriptions is not yet firmly fixed. For the time being, it may be possible to use C to express much of the information the TenDRA compiler needs to check the program. For example, POSIX specifies that sys/stat.h contains a number of macros, S_ISDIR, S_ISREG, and so on, which are used to test whether a file is a directory, a regular file, etc. Suppose that a program is basically POSIX conformant, but uses the additional macro S_ISLNK to test whether the file is a symbolic link (this is in COSE and AES, but not POSIX). A proper TenDRA description of S_ISLNK would contain the information that it was a macro taking a mode_t and returning an int; however, for checking purposes it is sufficient merely to give the types. This can be done by pretending that S_ISLNK is a function:
#ifdef __TenDRA__
/* For TenDRA checking purposes only */
extern int S_ISLNK ( mode_t ); /* actually a macro */
#endif
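Such a stand-in declaration is enough for tcc to check calls to S_ISLNK. A minimal sketch of a caller follows, assuming that lstat (needed for a symbolic-link test because stat follows links) is also missing from the TenDRA POSIX 1003.1 description and is declared the same way; the helper is_symlink is a hypothetical name. Compilers other than tcc are assumed not to define __TenDRA__, so they see only the system headers.

```c
#include <sys/types.h>
#include <sys/stat.h>

#ifdef __TenDRA__
/* For TenDRA checking purposes only: assumed extensions to POSIX 1003.1 */
extern int S_ISLNK ( mode_t );                      /* actually a macro */
extern int lstat ( const char *, struct stat * );   /* assumed missing */
#endif

/* Hypothetical helper: returns nonzero if path names a symbolic link and
   0 otherwise (including when lstat fails). lstat is used rather than
   stat because stat follows the link. */
int is_symlink ( const char *path )
{
    struct stat st;
    if ( lstat ( path, &st ) != 0 ) return 0;
    return S_ISLNK ( st.st_mode ) != 0;
}
```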
More complex examples might require an object in the API to be defined in order to provide more information about it (see section 9.5). For example, suppose that a program is basically ANSI compliant, but assumes that FILE is a structure with a field file_no of type int (representing the file number), rather than a generic type. This might be expressed by:
#ifdef __TenDRA__
/* For TenDRA checking purposes only */
#pragma ignore FILE
typedef struct {
    /* there may be other fields here */
    int file_no;
    /* there may be other fields here */
} FILE;
#endif
The methods of API description above are what might be called "example implementations" rather than the "abstract implementations" of the actual TenDRA API descriptions. They should only be used as a last resort, when there is no alternative way of expressing the program within a standard API. For example, there may be no need to access the file_no field of a FILE directly, since POSIX provides a function, fileno, for this purpose. Extending an API in general reduces the number of potential target machines for the corresponding program.
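Following that advice, code that needs the file number can call fileno instead of reading a FILE field, so it stays within the POSIX API and makes no assumption about the layout of FILE. The helper stream_fd below is a hypothetical name used only for illustration.

```c
#include <stdio.h>

/* Hypothetical helper: returns the file descriptor underlying a stream.
   POSIX fileno is used instead of reading a FILE field such as file_no,
   so nothing is assumed about how FILE is represented. */
int stream_fd ( FILE *fp )
{
    return fileno ( fp );
}
```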
9.7. Using the System Headers
One possibility if a program API is not supported by the TenDRA compiler is to use the set of system headers on the particular machine on which tcc happens to be running. Of course, this means that the API checking facilities of the TenDRA compiler will not be effective, but it is possible that the other program checking aspects will be of use.
The system headers are not, and indeed are not intended to be, portable. A simple-minded approach to portability checking with the system headers could lead to more portability problems being found in the system headers than in the program itself. A more sophisticated approach involves applying different compilation modes to the system headers and to the program. The program itself can be checked very rigorously, while the system headers have very lax checks applied.
This could be done directly, by putting a wrapper around each system header describing the mode to be applied to that header. However, the mechanism of named compilation modes (see 2.2) provides an alternative solution. In addition to the normal -Idir command-line option, tcc also supports the option -Nname:dir, which is identical except that it also associates the identifier name with the directory dir. Once a directory has been named in this way, the name can be used in a directive:
#pragma TenDRA directory name use environment mode
which tells tcc to apply the named compilation mode, mode, to any files included from the directory, name. This is the mechanism used to specify the checks to be applied to the system headers.
The system headers may be specified to tcc using the -Ysystem command-line option. This specifies /usr/include as the directory to search for headers and passes a system start-up file to tcc. This system start-up file contains any macro definitions which are necessary for tcc to navigate the system headers correctly, plus a description of the compilation mode to be used in compiling the system headers.
In fact, before searching /usr/include, tcc searches another directory for system headers. This is intended to hold modified versions of any system headers which cause particular problems or require extra information. For example:
- A version of stdio.h is provided for all systems, which contains the declarations of printf and similar functions necessary for tcc to apply its printf-string checks (see 3.3.2).
- A version of stdlib.h is provided for all systems, which includes the declarations of exit and similar functions necessary for tcc to apply its flow analysis correctly (see 5.7).
- Versions of stdarg.h and varargs.h which work with tcc are provided for all systems. Most system versions of these headers rely on built-in functions which are recognised by cc (but not by tcc).
The user can also use this directory to modify any system headers which cause problems. For example, not all system headers declare all the functions they should, so it might be desirable to add these declarations.
It should be noted that the system headers and the TenDRA API headers do not mix well. Both are parts of coherent systems of header files, and unless the intersection is very small, it is not usually possible to combine parts of these systems sensibly.
Even a separation, such as compiling some modules of a program using a TenDRA API description and others using the system headers, can lead to problems in the intermodular linking phase (see Chapter 9). There will almost certainly be type inconsistency errors since the TenDRA headers and the system headers will have different representations of the same object.
9.8. API usage analysis
The abstract standard headers provided with the tool are the basis for the API usage analysis checking on dump files described in Chapter 9. The declarations in each abstract header file are enclosed by the following pragmas:
#pragma TenDRA declaration block API_name begin
#pragma TenDRA declaration block end
API_name has a standard form, e.g. api__ansi__stdio for stdio.h in the ANSI API.
This information is output in the dump format as the start and end of a header scope, i.e.
SSH position ref_no = <API_name>
SEH position ref_no
The first occurrence of each identifier in the dump output contains scope information; in the case of an identifier declared in the abstract headers, this scope information will normally refer to a header scope. Since each use of the identifier can be traced back to its declaration, this provides a means of tracking API usage within the application when the abstract headers are used. The disadvantages of this method are that only APIs for which abstract headers are available can be used, and that objects which are not part of the standard APIs are not available. If an application requires such an identifier (or indeed attempts to use a standard API identifier for which the appropriate header has not been included), the resulting errors may distort or even completely halt the dump output, resulting in incomplete or incorrect analysis.
The second method of API analysis allows compilation of the application against the system headers, thereby overcoming the problems of non-standard API usage mentioned above. The dump of the application can be scanned to determine the identifiers which are used but not defined within the application itself. These identifiers form the program's external API with the system headers and libraries, and can be compared with API reference information, provided by dump output files produced from the abstract standard headers, to determine the application's API usage.
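The second method amounts to a set computation: take the identifiers the application uses, remove those it defines itself, and keep those that appear in the API reference information. The sketch below shows only that computation and is not the tool's implementation; plain string arrays stand in for the identifier lists that would really be extracted from dump files, and the function names are hypothetical. Linear search keeps it short; an analysis over a large application would more likely use a hash table or sorted lists.

```c
#include <stddef.h>
#include <string.h>

/* Returns nonzero if name occurs in list[0..n-1]. */
static int contains ( const char *const *list, size_t n, const char *name )
{
    size_t i;
    for ( i = 0; i < n; i++ ) {
        if ( strcmp ( list[i], name ) == 0 ) return 1;
    }
    return 0;
}

/* Hypothetical analysis step: copies into out[] each identifier that the
   application uses, does not define itself, and that appears in the API
   reference list. Each identifier is reported once, in order of first use.
   out must have room for n_used entries; the count copied is returned. */
size_t api_usage ( const char *const *used, size_t n_used,
                   const char *const *defined, size_t n_defined,
                   const char *const *api, size_t n_api,
                   const char **out )
{
    size_t i, k = 0;
    for ( i = 0; i < n_used; i++ ) {
        if ( !contains ( defined, n_defined, used[i] ) &&
             contains ( api, n_api, used[i] ) &&
             !contains ( out, k, used[i] ) ) {
            out[k++] = used[i];
        }
    }
    return k;
}
```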
Analysis performed on the set of dump files produced for an entire application can detect the objects, types, etc. from external APIs which are used by the application. The API usage analysis is enabled by passing one or more -api_checkAPI flags to tcc, where API may be any of the standard APIs listed in section 2.1. The -api_check_outFILE flag may be used to direct the API analysis information to the file FILE (by default it is written to stdout). The APIs used to perform API usage analysis may be different from those used to process the application. Annex G.8 contains details of the methods used to perform the API usage analysis.