C/C++ Checker Reference Manual
- i. Introduction
- 1. Configuring the Checker
- 2. Integral Types
- 3. Type Checking
- 4. Control Flow Analysis
- 5. Operator Analysis
- 6. Variable Analysis
- 7. Discard Analysis
- 8. Preprocessing checks
- 8.1. Preprocessor directives
- 8.2. Indented Preprocessing Directives
- 8.3. Multiple macro definitions
- 8.4. Macro arguments
- 8.5. Unmatched quotes
- 8.6. Include depth
- 8.7. Text after #endif
- 8.8. Text after #
- 8.9. New line at end of file
- 8.10. Conditional Compilation
- 8.11. Target dependent conditional inclusion
- 8.12. Unused headers
- 9. API checking
- 10. Intermodular analysis
© The TenDRA Project.
© DERA.
Revision History
kate | Split off tcc Compilation Modes to create a tccmodes manpage. Merged the appendix of CLI portability options into the body text. Merged in linking for symbol table dump files from The C/C++ Symbol Table Dump document. Merged in the compilation scheme for linking C++ spec files from the C/C++ Producer Configuration Guide. |
kate | Moved out the DRA producers as a standalone tool. |
kate | Moved out the description of the symbol table semantics into a separate document, The C/C++ Symbol Table Dump. Removed the API listing. |
kate | Split off various sections to the C/C++ Producer Implementation, C++ and Portability and Style Guide documents. |
asmodai | Converted to a new build system. |
DERA | tdfc2 1.8.2; TenDRA 4.1.2 release. |
i. Introduction
The C program static checker was originally developed as a programming tool to aid the construction of portable programs using the Application Programming Interface (API) model of software portability; the principle underlying this approach being:
If a program is written to conform to an abstract API specification, then that program will be portable to any machine which implements the API specification correctly.
This approach gave the tool an unusually powerful basis for static checking of C programs, and a large amount of development work has resulted in the production of the TenDRA C static checker (invoked as tcc -ch). The terms TenDRA C checker and tcc -ch are used interchangeably in this document.
Responsibilities of the C static checker are:
- strict interface checking. In particular, the checker can analyse programs against abstract APIs to check their conformance to the specification. Abstract versions of most standard APIs are provided with the tool; alternatively users can define their own abstract APIs using the syntax described in Annex G;
- checking of integer sizes, overflows and implicit integer conversions, including potential 64-bit problems, against a 16 bit or 32 bit architecture profile;
- strict ISO C90 standard checking, plus configurable support for many non-ISO dialect features;
- extensive type checking, including prototype-style checking for traditionally defined functions, conversion checking, type checking on printf and scanf style argument strings and type checking between translation units;
- variable analysis, including detection of unused variables, use of uninitialised variables, dependencies on order of evaluation in expressions and detection of unused function returns, computed values and static variables;
- detection of unused header files;
- configurable tests for detecting many other common programming errors;
- complete standard API usage analysis. No API definitions are built into the checker; these are provided externally. A complete list of API definitions available to tcc is documented by tccenv;
- support for user-defined checking profiles. No checking profiles are built into the checker; these are provided externally. A complete list of profiles exposed as -X modes to tcc as startup files is documented by tccmodes.
1. Configuring the Checker
- 1.1. Individual command line checking options
- 1.2. Customising checking profiles
- 1.3. Scoping checking profiles
- 1.4. Other checks
This section describes the built-in checking modes and the design of customised environments.
There are several methods available for configuring the checker. Most configuration is provided by built-in "modes" which are selected by using the relevant -X
command line option for tcc. These modes are documented by tccmodes.
More detailed customisation may require special #pragma
statements to be incorporated into the source code to be analysed (this commonly takes the form of a startup file). The configuration options generally act independently of one another and unless explicitly forbidden in the descriptions below, they may be combined in any way.
1.1. Individual command line checking options
Some of the checks available can be controlled using a command line option of the form -Xopt,opt,..., where the various opt options give a comma-separated list of commands. These commands have the form test=status, where test is the name of the check, and status is either check (apply the check and give an error if it fails), warn (apply the check and give a warning if it fails) or dont (do not apply the check). The names of checks can be found with their descriptions in Chapters 3 - 8; for example, the check for implicit function declarations described in 3.4.1 may be switched on using -X:implicit_func=check.
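For example, several of these commands may be combined in a single option (the source file name here is purely illustrative):

tcc -ch -X:implicit_func=check,fall_thru=warn file.c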
The command line options for portability checking are:
Check | Reference | Command Line Option |
---|---|---|
Weak Prototype Analysis | 3.3.1 | -X:weak_proto=status |
Implicit Function Declaration | 3.4 | -X:implicit_func=status |
Printf String Checking | 3.2.2 | -X:printf=status |
Incompatible Void Returns | 3.2.2 | -X:void_ret=status |
Unreachable Code | 5.2 | -X:unreached=status |
Case Fall Through | 5.3 | -X:fall_thru=status |
Conversion Analysis | 3.2 | -X:convert_all=status |
Integer ↔ Integer Conversion | 3.2.1 | -X:convert_int=status |
| | -X:convert_int_implicit=status |
| | -X:convert_int_explicit=status |
Integer ↔ Pointer Conversion | 3.2.2 | -X:convert_int_ptr=status |
Pointer ↔ Pointer Conversion | 3.2.3 | -X:convert_ptr=status |
Complete struct/union Analysis | 8.3 | -X:complete_struct=status |
Variable Analysis | 5.6 | -X:variable=status |
Discard Analysis | 5.8 | -X:discard_all=status |
Discarded Function Returns | 5.8.1 | -X:discard_func_ret=status |
Discarded Values | 5.8.2 | -X:discard_value=status |
Unused Statics | 5.8.3 | -X:unused_static=status |
where status can be check, warn or dont.
1.2. Customising checking profiles
The individual checks performed by the C static checker are generally controlled by #pragma
directives. The reason for this is that the ISO standard places no restrictions on the syntax following a #pragma
preprocessing directive, and most compilers/checkers can be configured to ignore any unknown #pragma
directives they encounter.
Most of these directives begin:
#pragma TenDRA ...
and are always checked for syntactic correctness. The individual directives, together with the checks they control, are described in Chapters 3 - 8. Section 2.2 describes the method of constructing a new checking profile from these individual checks.
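For example, a custom checking profile may be assembled as a start-up file of such directives. The particular selection below is purely illustrative, using directives that are described later in this document:

/* local_checks.h - a hypothetical user-defined checking profile */
#pragma TenDRA integer overflow analysis warning
#pragma TenDRA conversion analysis ( int-int implicit ) on
#pragma TenDRA weak prototype analysis on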
1.3. Scoping checking profiles
Almost all the available checks are scoped (exceptions will be mentioned in the description of the check). Scopes may be controlled by the same #pragma TenDRA begin
directive described by the C/C++ Producer Configuration Guide.
1.4. Other checks
Several checks of varying utility have been implemented in the C++ producer but do not as yet have individual directives controlling their use. These can be enabled en masse using the directive:
#pragma TenDRA++ catch all allow
It is intended that this directive will be phased out as these checks are assigned controlling directives. It is possible to achieve finer control over these checks by enabling their individual error messages as described above.
2. Integral Types
- 2.1. Integer promotion rules
- 2.2. Arithmetic operations on integer types
- 2.3. Interaction with the integer conversion checks
- 2.4. Target dependent integral types
- 2.5. Integer overflow checks
- 2.6. Integer operator checks
- 2.7. Support for 64 bit integer types (long long)
The checks described in the previous chapter involved the detection of conversions which could result in undefined values. Certain conversions involving integral types, however, are defined in the ISO C standard and so might be considered safe and unlikely to cause problems. This unfortunately is not the case: some of these conversions may still result in a change in value; the actual size of each integral type is implementation-dependent; and the "old-style" integer conversion rules which predate the ISO standard are still in common use. The checker provides support for both ISO and traditional integer promotion rules. The set of rules used may be specified independently of the two integral range scenarios, 16 bit (default) and 32 bit, described in section 2.1.2.
The means of specifying alternative sets of promotion rules, their interaction with the conversion checks described in section 3.2, and the additional checks which may be performed on integers and integer operations are described in the remainder of this chapter.
2.1. Integer promotion rules
The ISO C standard rules may be summarised as follows: long integral types promote to themselves; other integral types promote to whichever of int or unsigned int they fit into. In full the promotions are:

Type | Promotes to |
---|---|
char | int |
signed char | int |
unsigned char | int |
short | int |
unsigned short | int or unsigned int (see below) |
int | int |
unsigned int | unsigned int |
long | long |
unsigned long | unsigned long |
Note that even with these simple built-in types, there is a degree of uncertainty, namely concerning the promotion of unsigned short. On most machines, int is strictly larger than short, so the promotion of unsigned short is int. However, it is possible for short and int to have the same size, in which case the promotion is unsigned int. When using the ISO C promotion rules, the checker usually avoids making assumptions about the implementation by treating the promotion of unsigned short as an abstract integral type. If, however, the -Y32bit option is specified, int is assumed to be strictly larger than short, and unsigned short promotes to int.
The traditional C integer promotion rules are often referred to as the signed promotion rules. Under these rules, long integral types promote to themselves, as in ISO C, but the other integral types promote to unsigned int if they are qualified by unsigned, and to int otherwise. Thus the signed promotion rules may be represented as follows:

Type | Promotes to |
---|---|
char, signed char, short | int |
unsigned char, unsigned short | unsigned int |
int | int |
unsigned int | unsigned int |
long | long |
unsigned long | unsigned long |
The traditional promotion rules are applied in the Xt built-in environment only. All of the other built-in environments specify the ISO C promotion rules. Users may also specify their own rules for integer promotions and minimum integer ranges; the methods for doing this are described in Annex H.
2.2. Arithmetic operations on integer types
The ISO C standard rules for calculating the type of an arithmetic operation involving two integer types are as follows: work out the integer promotions of the types of the two operands, then:
- If either promoted type is unsigned long, the result type is unsigned long;
- Otherwise, if one promoted type is long and the other is unsigned int, then if a long int can represent all values of an unsigned int, the result type is long; otherwise the result type is unsigned long;
- Otherwise, if either promoted type is long, the result type is long;
- Otherwise, if either promoted type is unsigned int, the result type is unsigned int;
- Otherwise the result type is int.
Both promoted values are converted to the result type, and the operation is then applied.
2.3. Interaction with the integer conversion checks
A simple-minded implementation of the integer conversion checks described in 3.2 would interact badly with these rules. Consider, for example, adding two values of type char:
char f ( char a, char b )
{
    char c = a + b ;
    return ( c ) ;
}
The various stages in the calculation of c are as follows: a and b are converted to their promotion type, int, and added together to give an int result, which is converted to a char and assigned to c. The conversions of a and b from char to int are always safe, and so present no difficulties to the integer conversion checks. The conversion of the result from int to char, however, is precisely the type of value-destroying conversion which these checks are designed to detect.
Obviously, an integer conversion check which flagged all char arithmetic would never be used, thereby losing the potential to detect many subtle portability errors. For this reason, the integer conversion checks are more sophisticated. In all typed languages, the type is used for two purposes: for static type checking and for expressing information about the actual representation of data on the target machine. Essentially it is a confusion between these two roles which leads to the problems above. The C promotion and arithmetic rules are concerned with how data is represented and manipulated, rather than the underlying abstract types of this data. When a and b are promoted to int prior to being added together, this is only a change in representation; at the conceptual level they are still chars. Again, when they are added, the result may be represented as an int, but conceptually it is a char. Thus the assignment to c, an actual char, is just a change in representation, not a change in conceptual type.
So each expression may be regarded as having two types: a conceptual type which stands for what the expression means, and a representational type which stands for how the expression is to be represented as data on the target machine. In the vast majority of expressions these types coincide; however, the integral promotion and arithmetic conversions are changes of representational, not conceptual, types. The integer conversion checks are concerned with detecting changes of conceptual type, since it is these which are most likely to be due to actual programming errors.
It is possible to define integral types within the TenDRA extensions to C in which the split between concept and representation is made explicit. The pragma:
#pragma TenDRA keyword TYPE for type representation
may be used to introduce a keyword TYPE for this purpose (as with all such pragmas, the precise keyword to be used is left to the user). Once this has been done, TYPE ( r, t ) may be used to represent a type which is conceptually of type t but is represented as data like type r. Both t and r must be integral types. For example:
TYPE ( int, char ) a ;
declares a variable a which is represented as an int, but is conceptually a char.
In order to maintain compatibility with other compilers, it is necessary to give TYPE a sensible alternative definition. For all but conversion checking purposes, TYPE ( r, t ) is identical to r, so a suitable definition is:
#ifdef __TenDRA__
#pragma TenDRA keyword TYPE for type representation
#else
#define TYPE( r, t ) r
#endif
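As an illustrative sketch (the variable names are hypothetical, and the implicit integer conversion checks of section 3.2 are assumed to be enabled), the conversion analysis then operates on the conceptual types:

TYPE ( int, char ) a ;   /* represented as an int, conceptually a char */
int i = 0 ;
char c = a ;             /* change of representation only: not flagged */
i = a ;                  /* conceptual char widened to int: safe, not flagged */
a = i ;                  /* conceptually int to char: flagged by the conversion analysis */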
2.4. Target dependent integral types
Since the checker uses only information about the minimum guaranteed ranges of integral types, integer values whose actual type is unknown may arise. Integer values of undetermined type generally arise in one of two ways: through the use of integer literals and from API types which are not completely specified.
2.4.1. Integer literals
The ISO C rules on the type of integer literals are set out as follows. For each class of integer literals a list of types is given. The type of an integer literal is then the first type in the appropriate list which is large enough to contain the value of the integer literal. The class of the integer literal depends on whether it is decimal, hexadecimal or octal, and whether it is qualified by U (or u) or L (or l) or both. The rules may be summarised as follows:

Literal class | Candidate types |
---|---|
unsuffixed decimal | int, long, unsigned long |
unsuffixed octal or hexadecimal | int, unsigned int, long, unsigned long |
suffixed by U | unsigned int, unsigned long |
suffixed by L | long, unsigned long |
suffixed by U and L | unsigned long |
These rules are applied in all the built-in checking modes except Xt. Traditional C does not have the U and L qualifiers, so if the Xt mode is used, these qualifiers are ignored and all integer literals are treated as int, long or unsigned long, depending on the size of the number.
If a number fits into the minimal range for the first type of the appropriate list, then it is of that type; otherwise its type is undetermined and is said to be target dependent. The checker treats target dependent types as abstract integral types which may lead to integer conversion problems. For example, in:
int f ( int n ) { return ( n & 0xff00 ) ; }
the type of 0xff00 is target dependent, since it does not fit into the minimal range for int specified by the ISO C standard (this is detected by the integer overflow analysis described in section 4.6). The arithmetic conversion resulting from the & operation is detected by the checker's conversion analysis. Note that if the -Y32bit option is specified to tcc, an int is assumed to contain at least 32 bits. In this case, 0xff00 fits into the type int, and so this is the type of the integer literal. No invalid integer conversion is then detected.
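One hedged way to remove the target dependency is to give the literal a determinate type using a suffix, and to make the remaining conversions explicit. Since 0xff00 fits into the minimum range of unsigned int, the suffixed literal 0xff00U has a determined type:

int f ( int n )
{
    /* 0xff00U fits the minimal unsigned int range, so its type is not target dependent */
    return ( int ) ( ( unsigned int ) n & 0xff00U ) ;
}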
2.4.2. Abstract API types
Target dependent integral types also occur in API specifications and may be encountered when checking against one of the implementation-independent APIs provided with the checker. The commonest example of this is size_t, which is stated by the ISO C standard to be a target dependent unsigned integral type, and which arises naturally within the language as the type of a sizeof
expression.
The checker has its own internal version of size_t
, wchar_t
and ptrdiff_t
for evaluating static compile-time expressions. These internal types are compatible with the ISO C specification of size_t
, wchar_t
and ptrdiff_t
, and thus are compatible with any conforming definitions of these types found in included files. However, when checking the following program against the system headers, a warning is produced on some machines concerning the implicit conversion of an unsigned int
to type size_t
:
#include <stdlib.h> int main() { size_t size ; size = sizeof( int ) ; }
The system header on the machine in question actually defines size_t to be a signed int (this of course contravenes the ISO C standard), but the compile time function sizeof returns the checker's internal version of size_t, which is an abstract unsigned integral type. By using the pragma:
#pragma TenDRA set size_t:signed int
the checker can be instructed to use a different internal definition of size_t when evaluating the sizeof function, and the error does not arise. Equivalent options are also available for the ptrdiff_t and wchar_t types.
2.5. Integer overflow checks
Given the complexity of the rules governing the types of integers and results of integer operations, as well as the variation of integral ranges with machine architecture, it is hardly surprising that unexpected results of integer operations are at the root of many programming problems. These problems can often be hard to track down and may suddenly appear in an application which was previously considered "safe" when it is moved to a new system. Since the checker supports the concept of a guaranteed minimum size of an integer, it is able to detect many potential problems involving integer constants. The pragma:
#pragma TenDRA integer overflow analysis status
where status is on, warning or off, controls a set of checks on arithmetic expressions involving integer constants. These checks cover overflow, use of constants exceeding the minimum guaranteed size for their type, and division by zero. They are not enabled in the default mode.
There are two special cases of integer overflow for which checking is controlled separately (an example follows the list):
- Bitfield sizes. Obviously, the size of a bitfield must be smaller than or equal to the minimum size of its integral type. A bitfield which is too large is flagged as an error in the default mode. The check on bitfield sizes is controlled by:

  #pragma TenDRA bitfield overflow permit

  where permit is one of allow, disallow or warning.
- Octal and hexadecimal escape sequences. According to the ISO C standard, the value of an octal or hexadecimal escape sequence shall be in the range of representable values for the type unsigned char for an integer character constant, or the unsigned type corresponding to wchar_t for a wide character constant. The check on escape sequence sizes is controlled by:

  #pragma TenDRA character escape overflow permit

  where the options for permit are allow, warning and disallow. The check is switched on by default.
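For example, with the default 16 bit minimum range for int, both of the following would be reported (a minimal sketch with hypothetical names):

struct s {
    int big : 20 ;       /* bitfield wider than the 16 bit minimum size of int */
} ;

char esc = '\777' ;      /* octal escape 777 (value 511) exceeds the range of unsigned char */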
2.6. Integer operator checks
The results of some integer operations are undefined by the ISO C standard for certain argument types. Others are implementation-defined or simply likely to produce unexpected results. In the default mode such operations are processed silently; however, a set of checks on operations involving integer constants may be controlled using:
#pragma TenDRA integer operator analysis status
where status is replaced by on, warning or off. This pragma enables checks on the following operations (an example follows the list):
- shift operations where an expression is shifted by a negative number or by an amount greater than or equal to the width in bits of the expression being shifted;
- right shift operations with a negative value of signed integral type as the first argument;
- division operations with a negative operand;
- tests for an unsigned value strictly greater than or less than 0 (these are always true or false respectively);
- conversion of a negative constant value to an unsigned type;
- application of the unary - operator to an unsigned value.
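For example, with integer operator analysis enabled, each of the following constant operations would be reported (a minimal sketch with hypothetical names):

void g ( void )
{
    unsigned int u ;
    u = 1U << 40 ;          /* shift by more than the minimum width of unsigned int */
    u = -1 ;                /* negative constant converted to an unsigned type */
    if ( u < 0 ) u = 0 ;    /* comparison always false for an unsigned value */
}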
2.7. Support for 64 bit integer types (long long)
Although the use of long long to specify a 64 bit integer type is not supported by the ISO C90 standard, it is becoming increasingly popular in programming use. By default, tcc does not support the use of long long, but the checker can be configured to support the long long type to different degrees using the following pragmas:
#pragma TenDRA longlong type permit
where permit is one of allow (the long long type is accepted), disallow (errors are produced when long long types are detected) or warning (long long types are accepted but a warning is raised).
#pragma TenDRA set longlong type : type_name
where type_name is long or long long.
The first pragma determines the behaviour of the checker if the type long long is encountered as a type specifier. In the disallow case, an error is raised and the type specifier is mapped to long; otherwise the type is stored as long long, although a message alerting the user to the use of long long is raised in the warning mode. The second pragma determines the semantics of long long. If the type specified is long long, then long long is treated as a separate integer type and, if code generation is enabled, long long types appear in the output. Otherwise the type is mapped to long, and all objects declared long long are output as if they had been declared long (a warning is produced when this occurs). In either case, long long is treated as a distinct integer type for the purpose of integer conversion checking.
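A configuration which accepts long long with a warning, while keeping it as a genuine 64 bit type in any generated output, might therefore look like the following sketch (guarded so that other compilers never see the pragmas):

#ifdef __TenDRA__
#pragma TenDRA longlong type warning
#pragma TenDRA set longlong type : long long
#endif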
Extensions to the integer promotion and arithmetic conversion rules are required for the long long
type. These have been implemented as follows:
- the types of integer arithmetic operations where neither argument has long long type are unaffected;
- long long and unsigned long long both promote to themselves;
- the result type of arithmetic operations with one or more arguments of type unsigned long long is unsigned long long;
- otherwise, if either argument has type signed long long, the overall type is long long if both arguments can be represented in this form; otherwise the type is unsigned long long.
There are now three cases where the type of an integer arithmetic operation is not completely determined by the types of its arguments.
3. Type Checking
- 3.1. Type specifications
- 3.2. Type conversions
- 3.3. Function type checking
- 3.4. Overriding type checking
Type checking is relevant to two main areas of C. It ensures that all declarations referring to the same object are consistent (clearly a prerequisite for a well-defined program). It is also the key to determining when an undefined or unexpected value has been produced due to the type conversions which arise from certain operations in C. Conversions may be explicit (conversion is specified by a cast) or implicit. Generally explicit conversions may be regarded more leniently, since the programmer was obviously aware of the conversion, whereas the implications of an implicit conversion may not have been considered.
3.1. Type specifications
3.1.1. Incompatible type qualifiers
The declarations
const int a ; int a ;
are not compatible according to the ISO C standard, because the qualifier const is present in one declaration but not in the other. Similar rules hold for volatile-qualified types. By default, tdfc2 produces an error when declarations of the same object contain different type qualifiers. The check is controlled using:
#pragma TenDRA incompatible type qualifier permit
where the options for permit are allow, disallow or warning.
3.1.2. Elaborated type specifiers
In elaborated type specifiers, the class key (class, struct, union or enum) should agree with any previous declaration of the type (except that class and struct are interchangeable). This requirement can be relaxed using the directive:
#pragma TenDRA ignore struct/union/enum tag on
In ISO C and C++ it is not possible to give a forward declaration of an enumeration type. This constraint can be relaxed using the directive:
#pragma TenDRA forward enum declaration allow
Until the end of its definition, an enumeration type is treated as an incomplete type (as with class types). In enumeration definitions, and a couple of other contexts where comma-separated lists are required, the directive:
#pragma TenDRA extra , allow
can be used to allow a trailing comma at the end of the list.
The directive:
#pragma TenDRA complete struct/union analysis on
can be used to enable a check that every class or union has been completed within each translation unit in which it is declared.
3.1.3. Incomplete structures and unions
ISO C allows for structures or unions to be declared but not defined, provided they are not used in a context where it is necessary to know the complete structure. For example:
struct tag *p;
is allowed, despite the fact that struct tag is incomplete. The TenDRA C checker has an option to detect such incomplete structures or unions, controlled by:
#pragma TenDRA complete struct/union analysis status
where status is on to give an error when an incomplete structure or union is detected, warning to give a warning, or off to disable the check.
The check can also be controlled by passing the command-line option -X:complete_struct=state to tdfc2, where state is check, warn or dont.
The only place where the checker can actually detect that a structure or union is incomplete is at the end of the source file. This is because it is possible to complete a structure after it has been used. For example, in:
struct tag *p;
struct tag { int a; int b; };
struct tag is complete despite the fact that it was incomplete in the definition of p.
3.2. Type conversions
The only types which may be interconverted legally in C are integral types, floating point types and pointer types. Even if these rules are observed, the results of some conversions can be surprising and may vary on different machines. The checker can detect three categories of conversion: integer to integer conversions, pointer to integer and integer to pointer conversions, and pointer to pointer conversions.
In the default mode, the checker allows all integer to integer conversions, explicit integer to pointer and pointer to integer conversions and the explicit pointer to pointer conversions defined by the ISO C standard (all conversions between pointers to function types and other pointers are undefined according to the ISO C standard).
Checks to detect these conversions are controlled by the pragma:
#pragma TenDRA conversion analysis status
Unless explicitly stated to the contrary, throughout the rest of the document where status appears in a pragma statement it represents one of on (enable the check and produce errors), warning (enable the check but produce only warnings), or off (disable the check). The checks may also be controlled using the command line option -X:test=state, where test is one of convert_all, convert_int, convert_int_explicit, convert_int_implicit, convert_int_ptr and convert_ptr, and state is check, warn or dont.
Due to the serious nature of implicit pointer to integer conversions, implicit pointer to pointer conversions and undefined explicit pointer to pointer conversions, such conversions are flagged as errors by default. These conversion checks are not controlled by the global conversion analysis pragma above, but must be controlled by the relevant individual pragmas given in sections 3.2.2 and 4.5.
3.2.1. Integer to integer conversions
All integer to integer conversions are allowed in C, however some can result in a loss of accuracy and so may be usefully detected. For example, conversions from int to long never result in a loss of accuracy, but conversions from long to int may. The detection of these shortening conversions is controlled by:
#pragma TenDRA conversion analysis ( int-int ) status
Checks on explicit conversions and implicit conversions may be controlled independently using:
#pragma TenDRA conversion analysis ( int-int explicit ) status
and
#pragma TenDRA conversion analysis ( int-int implicit ) status
Objects of enumerated type are specified by the ISO C standard to be compatible with an implementation-defined integer type. However, assigning a value of an integral type other than an appropriate enumeration constant to an object of enumeration type is not really in keeping with the spirit of enumerations. The check to detect the implicit integer to enum type conversions which arise from such assignments is controlled using:
#pragma TenDRA conversion analysis ( int-enum implicit ) status
Note that only implicit conversions are flagged; if the conversion is made explicit, by using a cast, no errors are raised.
As usual, status must be replaced by on, warning or off in all the pragmas listed above.
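For example, with the int-enum implicit check enabled (the names here are hypothetical):

enum colour { red, green, blue } ;

enum colour c1 = 1 ;                   /* implicit int to enum conversion: flagged */
enum colour c2 = ( enum colour ) 1 ;   /* explicit cast: not flagged */
enum colour c3 = green ;               /* enumeration constant: always accepted */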
The interaction of the integer conversion checks with the integer promotion and arithmetic rules is an extremely complex issue, which is further discussed in Chapter 4. The directives:
#pragma TenDRA conversion analysis (int-int explicit) on
#pragma TenDRA conversion analysis (int-int implicit) on
will check for unsafe explicit or implicit conversions between arithmetic types. Similarly conversions between pointers and arithmetic types can be checked using:
#pragma TenDRA conversion analysis (int-pointer explicit) on
#pragma TenDRA conversion analysis (int-pointer implicit) on
or equivalently:
#pragma TenDRA conversion analysis (pointer-int explicit) on
#pragma TenDRA conversion analysis (pointer-int implicit) on
3.2.2. Pointer to integer and integer to pointer conversions
Integer to pointer and pointer to integer conversions are generally unportable and should always be specified by means of an explicit cast. The exception is that the integer zero and null pointers are deemed to be inter-convertible. As in the integer to integer conversion case, explicit and implicit pointer to integer and integer to pointer conversions may be controlled separately using:
#pragma TenDRA conversion analysis ( int-pointer explicit ) status
and
#pragma TenDRA conversion analysis ( int-pointer implicit ) status
or both checks may be controlled together by:
#pragma TenDRA conversion analysis ( int-pointer ) status
where status may be on, warning or off, and pointer-int may be substituted for int-pointer.
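For example, with these checks enabled (hypothetical names):

void g ( int *p )
{
    long n = ( long ) p ;   /* explicit pointer to integer conversion: reported when enabled */
    p = ( int * ) n ;       /* explicit integer to pointer conversion: likewise */
    p = 0 ;                 /* the integer zero is a null pointer constant: always permitted */
}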
3.2.3. Pointer to pointer conversions
According to the ISO C standard, section 6.3.4, the only legal pointer to pointer conversions are explicit conversions between:
- a pointer to an object or incomplete type and a pointer to a different object or incomplete type. The resulting pointer may not be valid if it is improperly aligned for the type pointed to;
- a pointer to a function of one type and a pointer to a function of another type. If a converted pointer, used to call a function, has a type that is incompatible with the type of the called function, the behaviour is undefined.
Except for conversions to and from the generic pointer which are discussed below, all other conversions, including implicit pointer to pointer conversions, are extremely unportable.
All pointer to pointer conversions may be flagged as errors using:
#pragma TenDRA conversion analysis ( pointer-pointer ) status
Explicit and implicit pointer to pointer conversions may be controlled separately using:
#pragma TenDRA conversion analysis ( pointer-pointer explicit ) status
and
#pragma TenDRA conversion analysis ( pointer-pointer implicit ) status
where, as before, status may be on, warning or off.
Conversion between a pointer to a function type and a pointer to a non-function type is undefined by the ISO C standard and should generally be avoided. The checker can however be configured to treat function pointers as object pointers for conversion using:
#pragma TenDRA function pointer as pointer permit
Unless explicitly stated to the contrary, throughout the rest of the document where permit appears in a pragma statement it represents one of allow (allow the construct and do not produce errors), warning (allow the construct but produce warnings when it is detected), or disallow (produce errors if the construct is detected). Here, allow performs no checks on function pointer ↔ pointer conversions, warning produces a warning when such conversions are detected, and disallow produces an error for them.
The generic pointer, void *, is a special case. All conversions of pointers to object or incomplete types to or from a generic pointer are allowed. Some older dialects of C used char * as a generic pointer. This dialect feature may be allowed, allowed with a warning, or disallowed using the pragma:
#pragma TenDRA compatible type : char * == void * permit
where permit is allow, warning or disallow as before.
3.2.4. Additional conversions
There are some further variants which can be used to enable useful sets of conversion checks. For example:
#pragma TenDRA conversion analysis (int-int) on
enables both implicit and explicit arithmetic conversion checks. The directives:
#pragma TenDRA conversion analysis (int-pointer) on
#pragma TenDRA conversion analysis (pointer-int) on
#pragma TenDRA conversion analysis (pointer-pointer) on
are equivalent to their corresponding explicit forms (because the implicit forms are illegal by default). The directive:
#pragma TenDRA conversion analysis on
is equivalent to the four directives just given. It enables checks on implicit and explicit arithmetic conversions, explicit arithmetic to pointer conversions and explicit pointer conversions.
The default settings for these checks are determined by the implicit and explicit conversions allowed in C++. Note that there are differences between the conversions allowed in C and C++. For example, an arithmetic type can be converted implicitly to an enumeration type in C, but not in C++. The directive:
#pragma TenDRA conversion analysis (int-enum implicit) on
can be used to control the status of this conversion. The level of severity for an error message arising from such a conversion is the maximum of the severity set by this directive and that set by the int-int implicit
directive above.
The implicit pointer conversions described above do not include conversions to and from the generic pointer void *
, which have their own controlling directives. A pointer of type void *
can be converted implicitly to another pointer type in C but not in C++; this is controlled by the directive:
#pragma TenDRA++ conversion analysis (void*-pointer implicit) on
The reverse conversion, from a pointer type to void *
is allowed in both C and C++, and has a controlling directive:
#pragma TenDRA++ conversion analysis (pointer-void* implicit) on
In ISO C and C++, a function pointer can only be cast to other function pointers, not to object pointers or void *
. Many dialects however allow function pointers to be cast to and from other pointers. This behaviour can be controlled using the directive:
#pragma TenDRA function pointer as pointer allow
which causes function pointers to be treated in the same way as all other pointers.
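For example, with this dialect feature allowed, a conversion such as the following (hypothetical names) is accepted; under the default rules it would be reported:

void ( *fp ) ( void ) ;
void *vp = ( void * ) fp ;   /* function pointer converted as if it were an object pointer */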
The integer conversion checks described above only apply to unsafe conversions. A simple-minded check for shortening conversions is not adequate, as is shown by the following example:
char a = 1, b = 2 ; char c = a + b ;
the sum a + b is evaluated as an int which is then shortened to a char. Any check which does not distinguish this sort of "safe" shortening conversion from unsafe shortening conversions such as:
int a = 1, b = 2 ; char c = a + b ;
is not likely to be very useful. The producer therefore associates two types with each integral expression; the first is the normal, representation type and the second is the underlying, semantic type. Thus in the first example, the representation type of a + b is int, but semantically it is still a char. The conversion analysis is based on the semantic types.
The C producer supports a directive:
#pragma TenDRA keyword identifier for type representation
whereby a keyword can be introduced which can be used to explicitly declare a type with given representation and semantic components. Unfortunately this makes the C++ grammar ambiguous, so it has not yet been implemented in the C++ producer.
It is possible to allow individual conversions by means of conversion tokens. A procedure token which takes one rvalue expression program parameter and returns an rvalue expression, such as:
#pragma token PROC ( EXP : t : ) EXP : s : conv #
can be regarded as mapping expressions of type t
to expressions of type s
. The directive:
#pragma TenDRA conversion identifier-list allow
can be used to nominate such a token as a conversion token. That is to say, if the conversion, whether explicit or implicit, from t
to s
cannot be done by other means, it is done by applying the token conv
, so:
t a ; s b = a ; // maps to conv ( a )
Note that, unlike conversion functions, conversion tokens can be applied to any types.
3.2.5. Example: 64-bit portability issues
64-bit machines form the "next frontier" of program portability. Most of the problems involved in 64-bit portability are type conversion problems. The assumptions that were safe on a 32-bit machine are not necessarily true on a 64-bit machine: int may not be the same size as long, pointers may not be the same size as int, and so on. This example illustrates the way in which the checker's conversion analysis tests can detect potential 64-bit portability problems.
Consider the following code:
#include <stdio.h>

void print ( string, offset, scale )
char *string;
unsigned int offset;
int scale;
{
    string += ( scale * offset );
    ( void ) puts ( string );
    return;
}

int main ()
{
    char *s = "hello there";
    print ( s + 4, 2U, -2 );
    return ( 0 );
}
This appears to be fairly simple: the offset of 2U scaled by -2 cancels out the offset in s + 4, so the program just prints hello there. Indeed, this is what happens on most machines. When ported to a particular 64-bit machine, however, it core dumps. The fairly subtle reason is that the composite offset, scale * offset, is actually calculated as an unsigned int by the ISO C arithmetic conversion rules. So the answer is not -4. Strictly speaking it is undefined, but on virtually all machines it will be UINT_MAX - 3. The fact that adding this offset to string is equivalent to adding -4 is only true on machines on which pointers have the same size as unsigned int. If a pointer contains 64 bits and an unsigned int contains 32 bits, the result is 2^32 bytes out.
So the error occurs because of the failure to spot that the offset being added to string is unsigned. All mixed integer type arithmetic involves some argument conversion. In the case above, scale is converted to an unsigned int and that is multiplied by offset to give an unsigned int result. If the implicit int->int conversion checks (see §3.2.1) are enabled, this conversion is detected and the problem may be avoided.
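A hedged fix, assuming long is wide enough to hold the product, is to perform the offset arithmetic in a signed type before applying it to the pointer:

string += ( long ) scale * ( long ) offset ;   /* -2L * 2L == -4L, on 32 and 64 bit machines alike */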
3.3. Function type checking
The importance of function type checking in C lies in the conversions which can result from type mismatches between the arguments in a function call and the parameter types assumed by its definition or between the specified type of the function return and the values returned within the function definition. Until the introduction of function prototypes into ISO standard C, there was little scope for detecting the correct typing of functions. Traditional C allows for absolutely no type checking of function arguments, so that totally bizarre functions, such as:
int f ( n ) int n ; { return ( f ( "hello", "there" ) ) ; }
are allowed, although their effect is undefined. However, the move to fully prototyped programs has been relatively slow. This is partially due to an understandable reluctance to change existing, working programs, but the desire to maintain compatibility with existing C compilers, some of which still do not support prototypes, is also a powerful factor. Prototypes are allowed in the checker's default mode but tdfc2 can be configured to allow, allow with a warning or disallow prototypes, using:
#pragma TenDRA prototype permit
where permit is allow, disallow or warning.
Even if prototypes are not supported the checker has a facility, described below, for detecting incorrectly typed functions.
3.3.1. Type checking non-prototyped functions
The checker offers a method for applying prototype-like checks to traditionally defined functions, by introducing the concept of weak prototypes. A weak prototype contains function parameter type information, but has none of the automatic argument conversions associated with a normal prototype. Instead weak prototypes imply the usual argument promotion passing rules for non-prototyped functions. The type information required for a weak prototype can be obtained in three ways:
- A weak prototype may be declared using the syntax:

  int f WEAK ( char, char * ) ;

  where WEAK represents any keyword which has been introduced using:

  #pragma TenDRA keyword WEAK for weak

  An alternative definition of the keyword must be provided for other compilers. For example, the following definition would make system compilers interpret weak prototypes as normal (strong) prototypes:

  #ifdef __TenDRA__
  #pragma TenDRA keyword WEAK for weak
  #else
  #define WEAK
  #endif

  The difference between conventional prototypes and weak prototypes can be illustrated by considering the normal prototype for f:

  int f ( char, char * ) ;

  When the prototype is present, the first argument to f would be passed as a char. Using the weak prototype, however, results in the first argument being passed as the integral promotion of char, that is to say, as an int.

  There is one limitation on the declaration of weak prototypes: declarations of the form:

  int f WEAK () ;

  are not allowed. If a function has no arguments, this should be stated explicitly as:

  int f WEAK ( void ) ;

  whereas if the argument list is not specified, weak prototypes should be avoided and a traditional declaration used instead:

  extern int f ();

  The checker may be configured to allow, allow with a warning or disallow weak prototype declarations using:

  #pragma TenDRA prototype ( weak ) permit

  where permit is replaced by allow, warning or disallow as appropriate. Weak prototypes are not permitted in the default mode.
Information can be deduced from a function definition. For example, the function definition:
int f ( c, s ) char c ; char *s ; { ... }
is said to have weak prototype:
int f WEAK( char, char * ) ;
The checker automatically constructs a weak prototype for each traditional function definition it encounters and if the weak prototype analysis mode is enabled (see below) all subsequent calls of the function are checked against this weak prototype.
For example, in the bizarre function in §3.3, the weak prototype:
int f WEAK ( int ) ;
is constructed for f. The subsequent call to
f
:f ( "hello", "there" ) ;
is then rejected by comparison with this weak prototype - not only is
f
called with the wrong number of arguments, but the first argument has a type incompatible with (the integral promotion of)int
. -
Information may be deduced from the calls of a function. For example, in:
extern void f (); void g () { f ( 3 ); f ( "hello" ); }
we can infer from the first call of
f
thatf
takes one integral argument. We cannot deduce the type of this argument, only that it is an integral type whose promotion isint
(since this is how the argument is passed). We can therefore infer a partial weak prototype forf
:void f WEAK ( t );
for some integral type t which promotes to int. Similarly, from the second call of f we can infer the weak prototype:
void f WEAK ( char * );
(the argument passing rules are much simpler in this case). Clearly the two inferred prototypes are incompatible, so an error is raised.
Note that prototypes inferred from function calls alone cannot ensure that the uses of the function within a source file are correct, merely that they are consistent. The presence of an explicit function declaration or definition is required for a definitive "right" prototype.
Null pointers cause particular problems with weak prototypes inferred from function calls. For example, in:

#include <stdio.h>

extern void f ();
void g () { f ( "hello" ); f ( NULL ); }

the argument in the first call of f is char * whereas in the second it is int (because NULL is defined to be 0). Whereas NULL can be converted to char *, it is not necessarily passed to procedures in the same way (for example, it may be that pointers have 64 bits and ints have 32 bits). It is almost always necessary to cast NULL to the appropriate pointer type in weak procedure calls.
Functions for which explicitly declared weak prototypes are provided are always type-checked by the checker. Weak prototypes deduced from function declarations or calls are used for type checking if the weak prototype analysis mode is enabled using:
#pragma TenDRA weak prototype analysis status
where status is one of on, warning or off as usual. Weak prototype analysis is not performed in the default mode.
There is also an equivalent command line option of the form -X:weak_proto=state, where state can be check, warn or dont.
This section ends with two examples which demonstrate some of the less obvious consequences of weak prototype analysis.
3.3.1.1. Example 1: An obscure type mismatch
As stated above, the promotion and conversion rules for weak prototypes are precisely those for traditionally declared and defined functions. Consider the program:
void f ( n )
long n;
{
    printf ( "%ld\n", n );
}

void g ()
{
    f ( 3 );
}
The literal constant 3 is an int and hence is passed as such to f. f is however expecting a long, which can lead to problems on some machines. Introducing a strong prototype declaration of f for those compilers which understand them:
#ifdef __STDC__
void f ( long );
#endif
will produce correct code: the arguments to a function declared with a prototype are converted to the appropriate types, so that the literal is actually passed as 3L. This solves the problem for compilers which understand prototypes, but does not actually detect the underlying error. Weak prototypes, because they use the traditional argument passing rules, do detect the error. The constructed weak prototype:
void f WEAK ( long ) ;
conveys the type information that f is expecting a long, but accepts the function arguments as passed rather than converting them. Hence, the error of passing an int argument to a function expecting a long is detected.
Many programs, seeking to have prototype checks while preserving compilability with non-prototype compilers, adopt a compromise approach of traditional definitions plus prototype declarations for those compilers which understand them, as in the example above. While this ensures correct argument passing in the prototype case, as the example shows it may obscure errors in the non-prototype case.
3.3.1.2. Example 2: Weak prototype checks in defined programs
In most cases a program which fails to compile with the weak prototype analysis enabled is undefined. ISO standard C does however contain an anomalous rule on equivalence of representation. For example, in:
extern void f () ; void g () { f ( 3 ) ; f ( 4U ) ; }
the TenDRA checker detects an error: in one instance f is being passed an int, whereas in the other it is being passed an unsigned int. However, the ISO C standard states that, for values which fit into both types, the representation of a number as an int is equal to that as an unsigned int, and that values with the same representation are interchangeable in procedure arguments. Thus the program is defined. The justification for raising an error or warning for this program is that the prototype analysis is based on types, not some weaker notion of equivalence of representation. The program may be defined, but it is not type correct.
Another case in which a program is defined, but not correct, is where an unnecessary extra argument is passed to a function. For example, in:
void f ( a )
int a ;
{
    printf ( "%d\n", a ) ;
}

void g ()
{
    f ( 3, 4 ) ;
}
the call of f is defined, but is almost certainly a mistake.
3.3.2. Weak function prototypes
The C producer supports a concept, weak prototypes, whereby type checking can be applied to the arguments of a non-prototype function. This checking can be enabled using the directive:
#pragma TenDRA weak prototype analysis on
The concept of weak prototypes is not applicable to C++, where all functions are prototyped. The C++ producer does allow the syntax for explicit weak prototype declarations, but treats them as if they were normal prototypes. These declarations are denoted by means of a keyword, WEAK
say, introduced by the directive:
#pragma TenDRA keyword identifier for weak
preceding the (
of the function declarator. The directives:
#pragma TenDRA prototype allow
#pragma TenDRA prototype (weak) allow
which can be used in the C producer to warn of prototype or weak prototype declarations, are similarly ignored by the C++ producer.
The C producer also allows the directives:
#pragma TenDRA argument type-id as type-id
#pragma TenDRA argument type-id as ...
#pragma TenDRA extra ... allow
#pragma TenDRA incompatible promoted function argument allow
which control the compatibility of function types. These directives are ignored by the C++ producer (some of them would make sense in the context of C++ but would over-complicate function overloading).
3.3.3. printf and scanf argument checking
The C producer includes a number of checks that the arguments in a call to a function in the printf
or scanf
families match the given format string. The check is implemented by using the directives:
#pragma TenDRA type identifier for ... printf
#pragma TenDRA type identifier for ... scanf
to introduce a type representing a printf
or scanf
format string. For most purposes this type is treated as const char *
, but when it appears in a function declaration it alerts the producer that any extra arguments passed to that function should match the format string passed as the corresponding argument. The TenDRA API headers conditionally declare printf
, scanf
and similar functions in something like the form:
#ifdef __NO_PRINTF_CHECKS
typedef const char *__printf_string ;
#else
#pragma TenDRA type __printf_string for ... printf
#endif

int printf ( __printf_string, ... ) ;
int fprintf ( FILE *, __printf_string, ... ) ;
int sprintf ( char *, __printf_string, ... ) ;
These declarations can be skipped, effectively disabling this check, by defining the __NO_PRINTF_CHECKS macro.
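So, to disable the checks for a particular translation unit, it suffices to define the macro before the header is included:

#define __NO_PRINTF_CHECKS
#include <stdio.h>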
These printf and scanf format string checks have not yet been implemented in the C++ producer due to the presence of an alternative, type checked, I/O package, namely <iostream>. The format string types are simply treated as const char *.
3.3.4. Checking printf strings
Normally functions which take a variable number of arguments offer only limited scope for type checking. For example, given the prototype:
int execl ( const char *, const char *, ... ) ;
the first two arguments may be checked, but we have no hold on any subsequent arguments (in fact in this example they should all be const char *, but C does not allow this information to be expressed). Two classes of functions of this form, namely the printf and scanf families, are so common that they warrant special treatment. If one of these functions is called with a constant format string, then it is possible to use this string to deduce the types of the extra arguments that it is expect ing. For example, in:
printf ( "%ld", 4 ) ;
the format string indicates that printf is expecting a single additional argument of type long. We can therefore deduce a quasi-prototype which this particular call to printf should conform to, namely:
int printf ( const char *, long ) ;
In fact this is a mixture of a strong prototype and a weak prototype. The first argument comes from the actual prototype of printf, and hence is strong. All subsequent arguments correspond to the ellipsis part of the printf prototype, and are passed by the normal promotion rules. Hence the long component of the inferred prototype is weak (see 3.3.1). This means that the error in the call to printf, where the integer literal is passed as an int when a long is expected, is detected.
In order for this check to take place, the function declaration needs to tell the checker that the function is like printf. This is done by introducing a special type, PSTRING say, to stand for a printf string, using:
#pragma TenDRA type PSTRING for ... printf
For most purposes this is equivalent to:
typedef const char *PSTRING;
except that when a function declaration:
int f ( PSTRING, ... );
is encountered, the checker knows to deduce the types of the arguments corresponding to the ... from the PSTRING argument (the precise rules it applies are those set out in the XPG4 definition of fprintf). If this mechanism is used to apply printf style checks to user defined functions, an alternative definition of PSTRING for conventional compilers must be provided. For example:
#ifdef __TenDRA__
#pragma TenDRA type PSTRING for ... printf
#else
typedef const char *PSTRING;
#endif
There are similar rules with scanf in place of printf.
The TenDRA descriptions of the standard APIs use this mechanism to describe those functions, namely printf, fprintf and sprintf, and scanf, fscanf and sscanf, which are of these forms. This means that the checks are switched on for these functions by default. However, these descriptions are under the control of a macro, __NO_PRINTF_CHECKS, which, if defined before stdio.h is included, effectively switches the checks off. This macro is defined in the start-up files for certain checking modes, so that the checks are disabled in these modes (see chapter 2). The checks can be enabled in these cases by #undef'ing the macro before including stdio.h. There are equivalent command-line options to tdfc2 of the form -X:printf=state, where state can be check or dont, which respectively undefine and define this macro.
3.3.5. Function return checking
Function returns normally present no difficulties. The return value is converted, as if by assignment, to the function return type, so that the problem is essentially one of type conversion (see 3.2). There is however one anomalous case. A plain return statement, without a return value, is allowed in functions returning a non-void type, the value returned being undefined. For example, in:
int f ( int c ) { if ( c ) return ( 1 ) ; return ; }
the value returned when c is zero is undefined. The test for detecting such void returns is controlled by:
#pragma TenDRA incompatible void return permit
where permit may be allow, warning or disallow as usual.
There are also equivalent command line options to tdfc2 of the form -X:void_ret=state, where state can be check, warn or dont. Incompatible void returns are allowed in the default mode and, of course, plain return statements in functions returning void are always legal.
This check also detects functions which do not contain a return statement, but fall out of the bottom of the function as in:
int f ( int c ) { if ( c ) return ( 1 ) ; }
Occasionally it may be the case that such a function is legal, because the end of the function is never reached. Unreachable code is discussed in section 4.1.
3.3.6. Overloaded functions
Older dialects of C++ did not report ambiguous overloaded function resolutions, but instead resolved the call to the first of the most viable candidates to be declared. This behaviour can be controlled using the directive:
#pragma TenDRA++ ambiguous overload resolution allow
There are occasions when the resolution of an overloaded function call is not clear. The directive:
#pragma TenDRA++ overload resolution allow
can be used to report the resolution of any such call (whether explicit or implicit) where there is more than one viable candidate.
An interesting consequence of compiling C++ in a target independent manner is that certain overload resolutions can only be determined at install-time. For example, in:
int f ( int ) ;
int f ( unsigned int ) ;
int f ( long ) ;
int f ( unsigned long ) ;

int a = f ( sizeof ( int ) ) ;	// which f?
the type of the sizeof operator, size_t, is target dependent, but its promotion must be one of the types int, unsigned int, long or unsigned long. Thus the call to f always has a unique resolution, but what it is is target dependent. The equivalent directives:
#pragma TenDRA++ conditional overload resolution allow
#pragma TenDRA++ conditional overload resolution (complete) allow
can be used to warn about such target dependent overload resolutions. By default, such resolutions are only allowed if there is a unique resolution for each possible implementation of the argument types (note that, for simplicity, the possibility of long long implementation types is ignored). The directive:
#pragma TenDRA++ conditional overload resolution (incomplete) allow
can be used to allow target dependent overload resolutions which only have resolutions for some of the possible implementation types (if one of the f declarations above was removed, for example). If the implementation does not match one of these types then an install-time error is given.
There are restrictions on the set of candidate functions involved in a target dependent overload resolution. Most importantly, it should be possible to bring their return types to a common type, as if by a series of ?: operations. This common type is the type of the target dependent call. By this means, target dependent types are prevented from propagating further out into the program. Note that since sets of overloaded functions usually have the same semantics, this does not usually present a problem.
3.4. Overriding type checking
There are several commonly used features of C, some of which are even allowed by the ISO C standard, which can circumvent or hinder the type-checking of a program. The checker may be configured either to enforce the absence of these features or to support them with or without a warning, as described below.
3.4.1. Implicit Function Declarations
The ISO C standard states that any undeclared function is implicitly assumed to return int. For example, in ISO C:
int f ( int c )
{
    return ( g( c ) + 1 ) ;
}
the undeclared function g is inferred to have a declaration:
extern int g () ;
This can potentially lead to program errors. The definition of f would be valid if g actually returned double, but incorrect code would be produced. Again, an explicit declaration might give us more information about the function argument types, allowing more checks to be applied.
Therefore the best chance of detecting bugs in a program and ensuring its portability comes from having each function declared before it is used. This means detecting implicit declarations and replacing them by explicit declarations. By default implicit function declarations are allowed; however, the pragma:
#pragma TenDRA implicit function declaration status
may be used to determine how tdfc2 handles implicit function declarations. Status is replaced by on to allow implicit declarations, warning to allow implicit declarations but to produce a warning when they occur, or off to prevent implicit declarations and raise an error where they would normally be used.
(There are also equivalent command-line options to tcc of the form -X:implicit_func=state, where state can be check, warn or dont.)
This test assumes an added significance in API checking. If a programmer wishes to check that a certain program uses nothing outside the POSIX API, then implicitly declared functions are a potential danger area. A function from outside POSIX could be used without being detected because it has been implicitly declared. Therefore, the detection of implicitly declared functions is vital to rigorous API checking.
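For example, in the following sketch (the function name vendor_spawn is invented for the example), an implicitly declared function from outside the target API would pass unnoticed if implicit declarations were allowed:

/* No declaration of vendor_spawn is in scope. With implicit
   declarations allowed it is assumed to return int, and its
   absence from the checked API goes undetected. */
int start ( const char *cmd )
{
    return ( vendor_spawn ( cmd ) );
}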
3.4.2. Function Parameters
Many systems pass function arguments of differing types in the same way and programs are sometimes written to take advantage of this feature. The checker has a number of options to resolve type mismatches which may arise in this way and would otherwise be flagged as errors:
- Type-type compatibility
When comparing function prototypes for compatibility, the function parameter types must be compared. If the parameter types would otherwise be incompatible, they are treated as compatible if they have previously been introduced with a type-type parameter compatibility pragma (see the sketch after this list), i.e.:
#pragma TenDRA argument type-name as type-name
where type-name is the name of any type. This pragma is transitive and the second type in the pragma is taken to be the final type of the parameter.
- Type-ellipsis compatibility
Two function prototypes with different numbers of arguments are compatible if:
- both prototypes have an ellipsis;
- each parameter type common to both prototypes is compatible;
- each extra parameter type in the prototype with more parameters is either specified in a type-ellipsis compatibility pragma, or is type-type compatible (see above) with a type that is specified in a type-ellipsis compatibility pragma.
Type-ellipsis compatibility is introduced using the pragma:
#pragma TenDRA argument type-name as ...
where again type-name is the name of any type.
- Ellipsis compatibility
If, when comparing two function prototypes for compatibility, one has an ellipsis and the other does not, but otherwise the two types would be compatible, then if an `extra' ellipsis is allowed, the types are treated as compatible. The pragma controlling ellipsis compatibility is:
#pragma TenDRA extra ... permit
where permit may be allow, disallow or warning as usual.
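The following minimal sketch shows the type-type form in use; the choice of long and int is purely illustrative, and the mismatched declarations are the kind of legacy code the pragma is intended to accept:

#ifdef __TenDRA__
#pragma TenDRA argument long as int
#endif

int h ( long );
int h ( int );	/* accepted: long and int parameters are
		   treated as compatible under the pragma */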
3.4.3. Incompatible promoted function arguments
Mixing the use of prototypes with old-fashioned function definitions can result in incorrect code. For example, in the program below the function argument promotion rules are applied to the definition of f, making it incompatible with the earlier prototype (a is converted to the integer promotion of char, i.e. int).
int f ( char );

int f ( a )
char a;
{
    ...
}
An incompatible type error is raised in the default checking mode. The check for incompatible types which arise from mixtures of prototyped and non-prototyped function declarations and definitions is controlled using:
#pragma TenDRA incompatible promoted function argument permit
where permit may be replaced by allow, warning or disallow as normal. The parameter type in the resulting function type is the promoted parameter type.
4. Control Flow Analysis
- 4.1. Unreachable code analysis
- 4.2. Case fall through
- 4.3. Enumerations controlling switch statements
- 4.4. Empty if statements
- 4.5. Use of assignments as control expressions
- 4.6. Constant control expressions
- 4.7. Conditional and iteration statements
- 4.8. Exception analysis
The checker has a number of features which can be used to help track down potential programming errors relating to the use of variables within a source file and the flow of control through the program. Examples of this are detecting sections of unused code, and flagging expressions that depend upon the order of evaluation where the order is not defined.
4.1. Unreachable code analysis
Consider the following function definition:
int f ( int n )
{
    if ( n ) {
        return ( 1 );
    } else {
        return ( 0 );
    }
    return ( 2 );
}
The final return statement is redundant since it can never be reached. The test for unreachable code is controlled by:
#pragma TenDRA unreachable code permit
where permit is replaced by disallow
to give an error if unreached code is detected, warning
to give a warning, or allow
to disable the test (this is the default).
There are also equivalent command-line options to tcc of the form -X:unreached=state, where state can be check, warn or dont.
Annotations to the code in the form of user-defined keywords may be used to indicate that a certain statement is genuinely reached or unreached. These keywords are introduced using:
#pragma TenDRA keyword REACHED for set reachable
#pragma TenDRA keyword UNREACHED for set unreachable
The statement REACHED then indicates that this portion of the program is actually reachable, whereas UNREACHED indicates that it is unreachable. For example, one way of fixing the program above might be to say that the final return is reachable (this is a blatant lie, but never mind). This would be done as follows:
int f ( int n )
{
    if ( n ) {
        return ( 1 );
    } else {
        return ( 0 );
    }
    REACHED
    return ( 2 );
}
An example of the use of UNREACHED might be in the function below, which falls out of the bottom without a return statement. We might know that, because it is never called with c equal to zero, the end of the function is never reached. This could be indicated as follows:
int f ( int c )
{
    if ( c ) return ( 1 );
    UNREACHED
}
As always, if new keywords are introduced into a program then definitions need to be provided for conventional compilers. In this case, this can be done as follows:
#ifdef __TenDRA__
#pragma TenDRA keyword REACHED for set reachable
#pragma TenDRA keyword UNREACHED for set unreachable
#else
#define REACHED
#define UNREACHED
#endif
The fact that certain functions, such as exit, do not return a value can be exploited in the flow analysis routines. The equivalent directives:
#pragma TenDRA bottom identifier
#pragma TenDRA++ type identifier for bottom
can be used to introduce a typedef declaration for the type, bottom, returned by such functions. The TenDRA API headers declare exit and similar functions in this way, for example:
#pragma TenDRA bottom __bottom
__bottom exit ( int ) ;
__bottom abort ( void ) ;
The bottom type is compatible with void in function declarations to allow such functions to be redeclared in their conventional form.
4.2. Case fall through
Another flow analysis check concerns fall through in case statements. For example, in:
void f ( int n )
{
    switch ( n ) {
        case 1 : puts ( "one" );
        case 2 : puts ( "two" );
    }
}
the control falls through from the first case to the second. This may be due to an error in the program (a missing break statement), or be deliberate. Even in the latter case, the code is not particularly maintainable as it stands - there is always the risk when adding a new case that it will interrupt this carefully contrived flow. Thus it is customary to comment all case fall throughs to serve as a warning.
In the default mode, the TenDRA C checker ignores all such fall throughs. A check to detect fall through in case statements is controlled by:
#pragma TenDRA fall into case permit
where permit is allow (no errors), warning (warn about case fall through) or disallow (raise errors for case fall through).
There are also equivalent command-line options to tcc of the form -X:fall_thru=state, where state can be check, warn or dont.
Deliberate case fall throughs can be indicated by means of a keyword, which has been introduced using:
#pragma TenDRA keyword FALL_THROUGH for fall into case
Then, if the example above were deliberate, this could be indicated by:
void f ( int n ) { switch ( n ) { case 1 : puts ( "one" ); FALL_THROUGH case 2 : puts ( "two" ); } }
Note that FALL_THROUGH is inserted between the two cases, rather than at the end of the list of statements following the first case.
If a keyword is introduced in this way, then an alternative definition needs to be introduced for conventional compilers. This might be done as follows:
#ifdef __TenDRA__
#pragma TenDRA keyword FALL_THROUGH for fall into case
#else
#define FALL_THROUGH
#endif
4.3. Enumerations controlling switch statements
Enumerations are commonly used as control expressions in switch statements. When case labels for some of the enumeration constants belonging to the enumeration type do not exist and there is no default label, the switch statement has no effect for certain possible values of the control expression. Checks to detect such switch statements are controlled by:
#pragma TenDRA enum switch analysis status
where status is on (raise an error), warning (produce a warning), or off (the default mode, in which no errors are produced).
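For example, a minimal sketch of the kind of switch statement this check reports (the enumeration and function are invented for the example):

#include <stdio.h>

enum colour { RED, GREEN, BLUE };

void describe ( enum colour c )
{
    switch ( c ) {	/* reported: no case for BLUE and no default */
        case RED :   puts ( "red" );   break;
        case GREEN : puts ( "green" ); break;
    }
}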
4.4. Empty if statements
Consider the following C statements:
if ( var1 == 1 ) ;
var2 = 0 ;
The conditional statement serves no purpose here and the second statement will always be executed regardless of the value of var1. This is almost certainly not what the programmer intended to write. A test for if statements with no body is controlled by:
#pragma TenDRA extra ; after conditional permit
with the usual allow (this is the default setting), warning and disallow options for permit.
4.5. Use of assignments as control expressions
Using the C assignment operator, =, when the equality operator == was intended is an extremely common problem. The pragma:
#pragma TenDRA assignment as bool permit
is used to control the treatment of assignments used as the controlling expression of a conditional statement or a loop, e.g.
if( var = 1 ) { ...
The options for permit are allow, warning and disallow. The default setting allows assignments to be used as control statements without raising an error.
4.6. Constant control expressions
Statements with constant control expressions are not really conditional at all since the value of the control statement can be evaluated statically. Although this feature is sometimes used in loops, relying on a break, goto or return statement to end the loop, it may be useful to detect all constant control expressions to check that they are deliberate. The check for statically constant control expressions is controlled using:
#pragma TenDRA const conditional permit
where permit may be replaced by disallow to give an error when constant control expressions are encountered, warning to replace the error by a warning, or the check may be switched off using allow (this is the default).
4.7. Conditional and iteration statements
The directive:
#pragma TenDRA const conditional allow
can be used to enable a check for constant expressions used in conditional contexts. A literal constant is allowed in the condition of a while, for or do statement to allow for such common constructs as:
while ( true ) {
    // while statement body
}
and target dependent constant expressions are allowed in the condition of an if statement, but otherwise constant conditions are reported according to the status of this check.
The common error of writing = rather than == in conditions can be detected using the directive:
#pragma TenDRA assignment as bool allow
which can be used to disallow such assignment expressions in contexts where a boolean is expected. The error message can be suppressed by enclosing the assignment within parentheses.
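For example, in the following sketch (the helper next_token is invented for the example) the assignment really is intended as the condition, and the extra parentheses suppress the diagnostic:

extern char *next_token ( void );	/* hypothetical lexer routine */

void scan ( void )
{
    char *p;
    while ( ( p = next_token () ) ) {	/* deliberate: not reported */
        /* process the token */ ;
    }
}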
Another common error associated with iteration statements, particularly with certain brace styles, is the accidental insertion of an extra semicolon as in:
for ( init ; cond ; step ) ; {
    // for statement body
}
The directive:
#pragma TenDRA extra ; after conditional allow
can be used to enable a check for such suspicious empty iteration statement bodies (it actually checks for ;{).
4.8. Exception analysis
The ISO C++ rules do not require exception specifications to be checked statically. This is to facilitate the integration of large systems where a single change in an exception specification could have ramifications throughout the system. However it is often useful to apply such checks, which can be enabled using the directive:
#pragma TenDRA++ throw analysis on
This detects any potentially uncaught exceptions and other exception problems. In the error messages arising from this check, an uncaught exception of type ... means that an uncaught exception of an unknown type (arising, for example, from a function without an exception specification) may be thrown. For example:
void f ( int ) throw ( int ) ;
void g ( int ) throw ( long ) ;
void h ( int ) ;

void e () throw ( int )
{
    f ( 1 ) ;	// OK
    g ( 2 ) ;	// uncaught 'long' exception
    h ( 3 ) ;	// uncaught '...' exception
}
5. Operator Analysis
5.1. Order of evaluation
The ISO C standard specifies certain points in the expression syntax at which all prior expressions encountered are guaranteed to have been evaluated. These positions are called sequence points and occur:
- after the arguments and function expression of a function call have been evaluated, but before the call itself;
- after the first operand of a logical && or || operator;
- after the first operand of the conditional operator, ?:;
- after the first operand of the comma operator;
- at the end of any full expression (a full expression may take one of the following forms: an initialiser; the expression in an expression statement; the controlling expression in an if, while, do or switch statement; each of the three optional expressions of a for statement; or the optional expression of a return statement).
Between two sequence points, however, the order in which the operands of an operator are evaluated, and the order in which side effects take place, is unspecified - any order which conforms to the operator precedence rules is permitted. For example:
var = i + arr[ i++ ] ;
may evaluate to different values on different machines, depending on which argument of the + operator is evaluated first. The checker can detect expressions which depend on the order of evaluation of sub-expressions between sequence points; these are flagged as errors or warnings when the variable analysis is enabled.
5.2. Operator precedence
The ISO C standard, section 6.3, provides a set of rules governing the order in which operators within expressions should be applied. These rules are said to specify the operator precedence and are summarised in the table below. Operators on the same line have the same precedence and the rows are in order of decreasing precedence. Note that the unary +, -, * and & operators have higher precedence than the binary forms and thus appear higher in the table.
The precedence of operators is not always intuitive and often leads to unexpected results when expressions are evaluated. A particularly common example is to write:
if ( var & TEST == 1 ) {
    ...
} else {
    ...
}
assuming that the control expression will be evaluated as:
( ( var & TEST ) == 1 )
However, the == operator has a higher precedence than the bitwise & operator, and the control expression is evaluated as:
( var & ( TEST == 1 ) )
which in general will give a different result.
Operators | Precedence |
---|---|
function call() [] -> . ++(postfix) --(postfix) | highest |
! ~ ++ -- + - * & (type) sizeof | |
* / % | |
+ (binary) - (binary) | |
<< >> | |
< <= > >= | |
== != | |
& | |
^ | |
| | |
&& | |
|| | |
?: | |
= += -= *= /= %= &= ^= |= <<= >>= | |
, | lowest |
The TenDRA C checker can be configured to flag expressions containing operators whose precedence is commonly confused. The check is switched off by default. The directive:
#pragma TenDRA operator precedence analysis on
can be used to enable a check for expressions where the operator precedence is not necessarily what might be expected. The intended precedence can be clarified by means of explicit parentheses. The precedence levels checked are as follows:
- && versus ||.
- << and >> versus binary + and -.
- Binary & versus binary +, -, ==, !=, >, >=, < and <=.
- ^ versus binary &, +, -, ==, !=, >, >=, < and <=.
- | versus binary ^, &, +, -, ==, !=, >, >=, < and <=.
Also checked are expressions such as a < b < c, which do not have their normal mathematical meaning. For example, in:
d = a << b + c ; // precedence is a << ( b + c )
the precedence is counter-intuitive, although strangely enough, it isn't in:
cout << b + c ; // precedence is cout << ( b + c )
Other dubious arithmetic operations can be checked for using the directive:
#pragma TenDRA integer operator analysis on
This includes checks for operations, such as division by a negative value, which are implementation dependent, and those, such as testing whether an unsigned value is less than zero, which serve no purpose.
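For example, a minimal sketch of the kind of expressions reported (the variable names are illustrative):

#include <stdlib.h>

void check ( unsigned int u, int n )
{
    if ( u < 0 ) {	/* always false: serves no purpose */
        abort ();
    }
    n = n / -2 ;	/* rounding is implementation dependent in C90 */
    ( void ) n ;
}

Similarly the directive: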
#pragma TenDRA++ pointer operator analysis on
checks for dubious pointer operations. This includes very simple bounds checking for arrays and checking that only the simple literal 0 is used in null pointer constants:
char *p = 1 - 1 ; // valid, but weird
The directive:
#pragma TenDRA integer overflow analysis on
is used to control the treatment of overflows in the evaluation of integer constant expressions. This includes the detection of division by zero.
5.3. Floating point equality
Due to the rounding errors that occur in the handling of floating point values, comparison for equality between two floating point values is a hazardous and unpredictable operation. Tests for equality of two floating point numbers are controlled by:
#pragma TenDRA floating equality permit
where permit is allow, warning or disallow. By default the check is switched off.
5.4. Operand of sizeof
According to the ISO C standard, section 6.3.3.4, the operand of the sizeof operator is not itself evaluated. If the operand has any side-effects these will not occur. When the variable analysis is enabled, the checker detects the use of expressions with side-effects in the operand of the sizeof operator.
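A minimal sketch of the kind of operand detected:

#include <stddef.h>

int i = 0;
size_t s = sizeof ( i++ );	/* reported: the operand is not
				   evaluated, so i is never incremented */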
6. Variable Analysis
- 6.1. Variable lifetime analysis
- 6.2. Modification between sequence points
- 6.3. Unused variables
- 6.4. Values set and not used
- 6.5. Variable which has not been set is used
- 6.6. Variable shadowing
- 6.7. Overriding the variable analysis
The variable analysis checks are controlled by:
#pragma TenDRA variable analysis status
where status is on, warning or off as usual. The checks are switched off in the default mode.
There are also equivalent command line options to tdfc2 of the form -X:variable=state, where state can be check, warn or dont.
The variable analysis is concerned with the evaluation of expressions and the use of local variables, including function arguments. Occasionally it may not be possible to statically perform a full analysis on an expression or variable and in these cases the messages produced indicate that there may be a problem. If a full analysis is possible a definite error or warning is produced. The individual checks are listed in sections 6.1 to 6.6, and section 6.7 describes the source annotations which can be used to fine-tune the variable analysis.
6.1. Variable lifetime analysis
The directive:
#pragma TenDRA variable analysis on
enables checks on the uses of automatic variables and function parameters. These checks detect:
- if a variable is not used in its scope;
- if the value of a variable is used before it has been assigned to;
- if a variable is assigned to twice without an intervening use;
- if a variable is assigned to twice without an intervening sequence point;
as illustrated by the variables a, b, c and d respectively in:
void f ()
{
    int a ;		// a never used
    int b ;
    int c = b ;		// b not initialised
    c = 0 ;		// c assigned to twice
    int d = 0 ;
    d = ++d ;		// d assigned to twice
}
The second, and more particularly the third, of these checks requires some fairly sophisticated flow analysis, so any hints which can be picked up from exhaustive switch statements etc. are likely to increase the accuracy of the errors detected.
In a non-static member function the various non-static data members are analysed as if they were automatic variables. It is checked that each member is initialised in a constructor. A common source of initialisation problems in a constructor is that the base classes and members are initialised in the canonical order of virtual bases, non-virtual direct bases and members in the order of their declaration, rather than in the order in which their initialisers appear in the constructor definition. Therefore a check that the initialisers appear in the canonical order is also applied.
It is possible to change the state of a variable during the variable analysis using the directives:
#pragma TenDRA set expression
#pragma TenDRA discard expression
The first asserts that the variable given by the expression has been assigned to; the second asserts that the variable is not used. An alternative way of expressing this is by means of keywords:
SET ( expression )
DISCARD ( expression )
introduced using the directives:
#pragma TenDRA keyword identifier for set
#pragma TenDRA keyword identifier for discard variable
respectively. These expressions can appear in expression statements and as the first argument of a comma expression.
The variable flow analysis checks have not yet been completely implemented. They may not detect errors in certain circumstances and for extremely convoluted code may occasionally give incorrect errors.
6.2. Modification between sequence points
The ISO C standard states that if an object is modified more than once, or is modified and accessed other than to determine the new value, between two sequence points, then the behaviour is undefined. Thus the result of:
var = arr[i++] + i++ ;
is undefined, since the value of i is being incremented twice between sequence points. This behaviour is detected by the variable analysis.
6.3. Unused variables
As part of the variable analysis, a simple test is applied to each local variable at the end of its scope to determine whether it has been used in that scope. For example, in:
int f ( int n )
{
    int r;
    return ( 0 );
}
both the function argument n and the local variable r are unused.
6.4. Values set and not used
This is a more complex test since it is applied to every instance of setting the variable. For example, in:
int f ( int n )
{
    int r = 1;
    r = 5;
    return ( r );
}
r is first set to 1, and this value is not used before it is overwritten by 5 (the second value is used however). This test requires some flow analysis. For example, if the program is modified to:
int f ( int n )
{
    int r = 1;
    if ( n == 3 ) {
        r = 5;
    }
    return ( r );
}
the initial value of r is used when n != 3, so no error is detected. However in:
int f ( int n )
{
    int r = 1;
    if ( n == 3 ) {
        r = 5;
    } else {
        r = 6;
    }
    return ( r );
}
the initial value of r is overwritten regardless of the result of the conditional, and hence is unused.
6.5. Variable which has not been set is used
This test also requires some flow analysis, for example in:
int f ( int n )
{
    int r;
    if ( n == 3 ) {
        r = 5;
    }
    return ( r );
}
the use of the variable r as a return value is reported because there are paths leading to this statement in which r is not set (i.e. when n != 3). However, in:
int f ( int n )
{
    int r;
    if ( n == 3 ) {
        r = 5;
    } else {
        r = 6;
    }
    return ( r );
}
r is always set before it is used, so no error is detected.
6.6. Variable shadowing
It is quite legal in C to have a variable in an inner scope with the same name as a variable in an outer scope. These variables are distinct and, whilst in the inner scope, the declaration in the outer scope is not visible - it is shadowed by the local variable of the same name. Confusion can arise if this was not what the programmer intended. The checker can therefore be configured to detect shadowing in three cases: a local variable shadowing a global variable; a local variable shadowing a local variable with a wider scope; and a local variable shadowing a typedef name, by using:
#pragma TenDRA variable hiding analysis status
If status is on, an error is raised when a local variable that shadows another variable is declared; if warning is used, the error is replaced by a warning; and the off option restores the default behaviour (shadowing is permitted and no errors are produced). In C++, enabling this check also reports local variable declarations which hide data members in member functions.
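For example, a minimal sketch of the kinds of shadowing reported:

int n;			/* file scope */

void f ( void )
{
    int n = 0;		/* hides the global n: reported */
    {
        int n = 1;	/* hides the outer local n: reported */
        ( void ) n;
    }
    ( void ) n;
}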
6.7. Overriding the variable analysis
Although many of the problems discovered by the variable analysis are genuine mistakes, some may be as the result of deliberate decisions by the program writer. In this case, more information needs to be provided to the checker to convey the programmer's intentions. Four constructs are provided for this purpose: the discard variable, the set variable, the exhaustive switch and the non-returning function.
6.7.1. Discarding variables
Actively discarding a variable counts as a use of that variable in the variable analysis, and so can be used to suppress messages concerning unused variables and values assigned to variables. There are two distinct methods to indicate that the variable x is to be discarded. The first uses a pragma:
#pragma TenDRA discard x;
which the checker treats as if it were a C statement, ending in a semicolon. Having a statement which is noticed by one compiler but ignored by another can lead to problems. For example, in:
if ( n == 3 )
#pragma TenDRA discard x;
puts ( "n is three" );
tdfc2 believes that x is discarded if n == 3 and that the message is always printed, whereas other compilers will ignore the #pragma statement and think that the message is printed if n == 3. An alternative, in many ways neater, solution is to introduce a new keyword for discarding variables. For example, to introduce the keyword DISCARD for this purpose, the pragma:
#pragma TenDRA keyword DISCARD for discard variable
should be used. The variable x can then be discarded by means of the statement:
DISCARD ( x );
A dummy definition for DISCARD needs to be given for conventional compilers in order to maintain compilability with them. For example, a complete definition of DISCARD might be:
#ifdef __TenDRA__
#pragma TenDRA keyword DISCARD for discard variable
#else
#define DISCARD(x) (( void ) 0 )
#endif
Discarding a variable changes its assignment state to unset, so that any subsequent uses of the variable, without an intervening assignment to it, lead to a variable used before being set error. This feature can be exploited if the same variable is used for distinct purposes in different parts of its scope, by causing the variable analysis to treat the different uses separately. For example, in:
void f ( void )
{
    int i = 0;
    while ( i++ < 10 ) {
        puts ( "hello" );
    }
    while ( i++ < 10 ) {
        puts ( "goodbye" );
    }
}
which is intended to print both messages ten times, the two uses of i as a loop counter are independent - they could have been implemented with different variables. By discarding i after the first loop, the second loop can be analysed separately. In this way, the error of failing to reset i to 0 can be detected.
6.7.2. Setting variables
In addition to discarding variables, it is also possible to set them. In deliberately setting a variable, the programmer is telling the checker to assume that some value will always have been assigned to the variable by that point, so that any variable used without being set errors can be suppressed. This construct is particularly useful in programs with complex flow control, to help out the variable analysis. For example, in:
void f ( int n )
{
    int r;
    if ( n != 0 ) r = n;
    if ( n > 2 ) {
        printf ( "%d\n", r );
    }
}
r is only used if n > 2, in which case we also have n != 0, so that r has already been initialised. However, in its flow analysis, the TenDRA C checker treats all the conditionals it meets as if they were independent and does not look for any such complex dependencies (indeed it is possible to think of examples where such analysis would be impossible). Instead, it needs the programmer to clarify the flow of the program by asserting that r will be set if the second condition is true.
Programmers may assert that the variable, r, is set either by means of a pragma:
#pragma TenDRA set r;
or by using, for example:
SET ( r );
where SET is a keyword which has previously been introduced to stand for the variable setting construct using:
#pragma TenDRA keyword SET for set
(cf. DISCARD above).
6.7.3. Exhaustive switch statements
A special case of a flow control construct which may be used to set the value of a variable is a switch statement. Consider the program:
char *f ( int n )
{
    char *r;
    switch ( n ) {
        case 1 : r = "one"; break;
        case 2 : r = "two"; break;
        case 3 : r = "three"; break;
    }
    return ( r );
}
This leads to an error indicating that r is used but not set, because it is not set if n lies outside the three cases in the switch statement. However, the programmer might know that f is only ever called with these three values, and hence that r is always set before it is used. This information could be expressed by asserting that r is set at the end of the switch construct (see above), but it would be better to express the cause of this setting rather than just its effect. The reason why r is always set is that the switch statement is exhaustive - there are case statements for all the possible values of n.
Programmers may assert that a switch statement is exhaustive by means of a pragma immediately following it. For example, in the above case it would take the form:
....
switch ( n )
#pragma TenDRA exhaustive
{
    case 1 : r = "one"; break;
    ....
Again, there is an option to introduce a keyword, EXHAUSTIVE say, for exhaustive switch statements using:
#pragma TenDRA keyword EXHAUSTIVE for exhaustive
Using this form, the example program becomes:
switch ( n ) EXHAUSTIVE {
    case 1 : r = "one"; break;
In order to maintain compatibility with existing compilers, a dummy definition for EXHAUSTIVE must be introduced for them to use. For example, a complete definition of EXHAUSTIVE might be:
#ifdef __TenDRA__
#pragma TenDRA keyword EXHAUSTIVE for exhaustive
#else
#define EXHAUSTIVE
#endif
6.7.4. Switch statements
A switch statement is said to be exhaustive if its control statement is guaranteed to take one of the values of its case labels, or if it has a default label. The TenDRA C and C++ producers allow a switch statement to be asserted to be exhaustive using the syntax:
switch ( cond ) EXHAUSTIVE {
    // switch statement body
}
where EXHAUSTIVE is either the directive:
#pragma TenDRA exhaustive
or a keyword introduced using:
#pragma TenDRA keyword identifier for exhaustive
Knowing whether a switch statement is exhaustive or not means that checks relying on flow analysis (including variable usage checks) can be applied more precisely.
In certain circumstances it is possible to deduce whether a switch statement is exhaustive or not. For example, the directive:
#pragma TenDRA enum switch analysis on
enables a check on switch statements on values of enumeration type. Such statements should be exhaustive, either explicitly by using the EXHAUSTIVE keyword or declaring a default label, or implicitly by having a case label for each enumerator. Conversely, the value of each case label should equal the value of an enumerator. For the purposes of this check, boolean values are treated as if they were declared using an enumeration type of the form:
enum bool { false = 0, true = 1 } ;
A common source of errors in switch statements is the fall-through from one case or default statement to the next. A check for this can be enabled using:
#pragma TenDRA fall into case allow
case or default labels where fall-through from the previous statement is intentional can be marked by preceding them with a keyword, FALL_THRU say, introduced using the directive:
#pragma TenDRA keyword identifier for fall into case
6.7.5. Non-returning functions
Consider a modified version of the program above, in which calls to f with an argument other than 1, 2 or 3 cause an error message to be printed:
extern void error ( const char * );

char *f ( int n )
{
    char *r;
    switch ( n ) {
        case 1 : r = "one"; break;
        case 2 : r = "two"; break;
        case 3 : r = "three"; break;
        default : error( "Illegal value" );
    }
    return ( r );
}
This causes an error because, in the default case, r is not set before it is used. However, depending on the semantics of the function error, the return statement may never be reached in this case. This is because the fact that a function returns void can mean one of two distinct things:
- That the function does not return a value. This is the usual meaning of void.
- That the function never returns; for example the library function, exit, uses void in this sense.
If error never returns, then the program above is correct; otherwise, an unset value of r may be returned.
Therefore, we need to be able to declare the fact that a function never returns. This is done by introducing a new type to stand for the non-returning meaning of void (some compilers use volatile void for this purpose), by means of the pragma:
#pragma TenDRA type VOID for bottom
to introduce a type VOID (although any identifier may be used) with this meaning. The declaration of error can then be expressed as:
extern VOID error ( const char * );
In order to maintain compatibility with existing compilers, a definition of VOID needs to be supplied. For example:
#ifdef __TenDRA__
#pragma TenDRA type VOID for bottom
#else
typedef void VOID;
#endif
The largest class of non-returning functions occurs in the various standard APIs - for example, exit and abort. The TenDRA descriptions of these APIs contain this information. The information that a function does not return is taken into account in all flow analysis contexts. For example, in:
#include <stdlib.h>

int f ( int n )
{
    exit ( EXIT_FAILURE );
    return ( n );
}
n is unused because the return statement is not reached (a fact that can also be determined by the unreachable code analysis in §4.1).
6.7.6. Return statements
In C, but not in C++, it is possible to have a return statement without an expression in a function which does not return void. It is possible to enable this behaviour using the directive:
#pragma TenDRA incompatible void return allow
Note that this check includes the implicit return caused by falling off the end of a function. The effect of such a return statement is undefined. The C++ rule that falling off the end of main is equivalent to returning a value of 0 overrides this check.
7. Discard Analysis
- 7.1. Discarded function returns
- 7.2. Discarded computed values
- 7.3. Unused static variables and procedures
- 7.4. Discarded expressions
- 7.5. Overriding the discard analysis
A couple of examples of what might be termed discard analysis have already been described - discarded (unused) local variables and discarded (unused) assignments to local variables (see sections 6.3 and 6.4). The checker can perform three more types of discard analysis: discarded function returns, discarded computations and unused static variables and procedures. These three tests may be controlled as a group using:
#pragma TenDRA discard analysis status
where status is on, warning or off.
In addition, each of the component tests may be switched on and off independently using pragmas of the form:
#pragma TenDRA discard analysis (function return) status
#pragma TenDRA discard analysis (value) status
#pragma TenDRA discard analysis (static) status
There are also equivalent command line options to tcc of the form -X:test=state, where test can be discard_all, discard_func_ret, discard_value or unused_static, and state can be check, warn or dont. These checks are all switched off in the default mode.
Detailed descriptions of the individual checks follow in sections 7.1 to 7.3. Section 7.5 describes the facilities for fine-tuning the discard analysis.
7.1. Discarded function returns
Functions which return a value which is not used form the commonest instances of discarded values. For example, in:
#include <stdio.h>

int main ()
{
    puts ( "hello" );
    return ( 0 );
}
the function, puts, returns an int value, indicating whether an error has occurred, which is ignored.
7.2. Discarded computed values
A rarer instance of a discarded object, and one which is almost always an error, is where a value is computed but not used. For example, in:
int f ( int n )
{
    int r = 4;
    if ( n == 3 ) {
        r == 5;
    }
    return ( r );
}
the value r == 5 is computed but not used. This is actually because it is a misprint for r = 5.
7.3. Unused static variables and procedures
The final example of discarded values, which perhaps more properly belongs with the variable analysis tests mentioned above, is for static objects which are unused in the source module in which they are defined. Of course this means that they are unused in the entire program. Such objects can usually be removed.
7.4. Discarded expressions
The directive:
#pragma TenDRA discard analysis on
can be used to enable a check for values which are calculated but not used. There are three checks controlled by this directive, each of which can be controlled independently. The directive:
#pragma TenDRA discard analysis (function return) on
checks for functions which return a value which is not used. The check needs to be enabled for both the declaration and the call of the function in order for a discarded function return to be reported. Discarded returns for overloaded operator functions are never reported. The directive:
#pragma TenDRA discard analysis (value) on
checks for other expressions which are not used. Finally, the directive:
#pragma TenDRA discard analysis (static) on
checks for variables with internal linkage which are defined but not used.
An unused function return or other expression can be asserted to be deliberately discarded by explicitly casting it to void or, equivalently, by preceding it with a keyword introduced using the directive:
#pragma TenDRA keyword identifier for discard value
A static variable can be asserted to be deliberately unused by including it in a list of identifiers in a directive of the form:
#pragma TenDRA suspend static identifier-list
7.5. Overriding the discard analysis
As with the variable analysis, certain constructs may be used to provide the checker with extra information about a program, to convey the programmer's intentions more clearly.
7.5.1. Discarding function returns and computed values
Unwanted function returns and, more rarely, discarded computed values, may be actively ignored to indicate to the discard analysis that the value is being discarded deliberately. This can be done using the traditional method of casting the value to void:
( void ) puts ( "hello" );
or by introducing a keyword, IGNORE say, for discarding a value. This is done using a pragma of the form:
#pragma TenDRA keyword IGNORE for discard value
The example discarded value then becomes:
IGNORE puts ( "hello" );
Of course it is necessary to introduce a definition of IGNORE for conventional compilers in order to maintain compilability. A suitable definition might be:
#ifdef __TenDRA__
#pragma TenDRA keyword IGNORE for discard value
#else
#define IGNORE ( void )
#endif
7.5.2. Preserving unused statics
Occasionally unused static values are introduced deliberately into programs. The fact that the static variables or procedures x, y and z are deliberately unused may be indicated by introducing the pragma:
#pragma TenDRA suspend static x y z
at the outer level after the definition of all three objects.
8. Preprocessing checks
- 8.1. Preprocessor directives
- 8.2. Indented Preprocessing Directives
- 8.3. Multiple macro definitions
- 8.4. Macro arguments
- 8.5. Unmatched quotes
- 8.6. Include depth
- 8.7. Text after #endif
- 8.8. Text after #
- 8.9. New line at end of file
- 8.10. Conditional Compilation
- 8.11. Target dependent conditional inclusion
- 8.12. Unused headers
This chapter describes tdfc2's capabilities for checking the preprocessing constructs that arise in C.
8.1. Preprocessor directives
By default, the TenDRA C checker understands those preprocessor directives specified by the ISO C standard, section 6.8, i.e. #if, #ifdef, #ifndef, #elif, #else, #endif, #error, #line and #pragma. As has been mentioned, #pragma statements play a significant role in the checker. While any recognised #pragma statements are processed, all unknown pragma statements are ignored by default. The check to detect unknown pragma statements is controlled by:
#pragma TenDRA unknown pragma permit
The options for permit are disallow (raise an error if an unknown pragma is encountered), warning (allow unknown pragmas with a warning), or allow (allow unknown pragmas without comment).
In addition, the common non-ISO preprocessor directives, #file, #ident, #assert, #unassert and #weak, may be permitted using:
#pragma TenDRA directive dir allow
where dir is one of file, ident, assert, unassert or weak. If allow is replaced by warning then the directive is allowed, but a warning is issued. In either case, the modifier (ignore) may be added to indicate that, although the directive is allowed, its effect is ignored. Thus for example:
#pragma TenDRA directive ident (ignore) allow
causes the checker to ignore any #ident directives without raising any errors.
The directive dir can also be disallowed using:
#pragma TenDRA directive dir disallow
Any other unknown preprocessing directives cause the checker to raise an error in the default mode. The directive:
#pragma TenDRA unknown directive allow
may be used to force the checker to ignore such directives without raising any errors. disallow and warning variants are also available.
8.2. Indented Preprocessing Directives
The ISO C standard allows white space to occur before the # in a preprocessing directive, and between the # and the directive name. Many older preprocessors have problems with such directives. The checker's treatment of such directives can be set using:
#pragma TenDRA indented # directive permit
which detects white space before the #, and:
#pragma TenDRA indented directive after # permit
which detects white space between the # and the directive name. The options for permit are allow, warning or disallow as usual. The default checking profile allows both forms of indented directives.
8.3. Multiple macro definitions
The ISO C standard states that, for two definitions of a function-like macro to be equal, both the spelling of the parameters and the macro definition must be equal. Thus, for example, in:
#define f( x ) ( x )
#define f( y ) ( y )
the two definitions of f are not equal, despite the fact that they are clearly equivalent. Tchk has an alternative definition of macro equality which allows for consistent substitution of parameter names. The type of macro equality used is controlled by:
#pragma TenDRA weak macro equality permit
where permit is allow (use the alternative definition of macro equality), warning (as for allow, but raise a warning), or disallow (use the ISO C definition of macro equality - this is the default setting).
More generally, the pragma:
#pragma TenDRA extra macro definition allow
allows macros to be redefined, both consistently and inconsistently. If the definitions are incompatible, the first definition is overwritten. This pragma has a disallow variant, which resets the check to its default mode.
8.4. Macro arguments
According to the ISO C standard, section 6.8.3, if a macro argument contains a sequence of preprocessing tokens that would otherwise act as a preprocessing directive, the behaviour is undefined. Tchk allows preprocessing directives in macro arguments by default. The check to detect such macro arguments is controlled by:
#pragma TenDRA directive as macro argument permit
where permit is allow, warning or disallow.
The ISO C standard, section 6.8.3.2, also states that each # preprocessing token in the replacement list for a function-like macro shall be followed by a parameter as the next preprocessing token in the replacement list. By default, if tdfc2 encounters a # in a function-like macro replacement list which is not followed by a parameter of the macro, an error is raised. The checker's behaviour in this situation is controlled by:
#pragma TenDRA no ident after # permit
where the options for permit are allow (do not raise errors), disallow (default mode) and warning (raise warnings instead of errors).
8.5. Unmatched quotes
The ISO C standard, section 6.1, states that if a ' or " character matches the category of preprocessing tokens described as single non-whitespace-characters that do not lexically match the other preprocessing token categories, then the behaviour is undefined. For example:
#define a 'b
would result in undefined behaviour. By default the ' character is ignored by tdfc2. A check to detect such statements may be controlled by:
#pragma TenDRA unmatched quote permit
The usual allow, warning and disallow options are available.
8.6. Include depth
Most preprocessors set a maximum depth of #include directives (which may be limited by the maximum number of files which can be open on the host system). By default, the checker supports a depth equal to this maximum number. However, a smaller maximum depth can be set using:
#pragma TenDRA includes depth n
where n can be any positive integral constant.
8.7. Text after #endif
The ISO C standard, section 6.8, specifies that #endif and #else preprocessor directives do not take any arguments, but should be followed by a newline. In the default checking mode, tdfc2 raises an error when #endif or #else statements are not directly followed by a new line. This behaviour may be modified using:
#pragma TenDRA text after directive permit
where permit is allow (no errors are raised and any text on the same line as the #endif or #else statement is ignored), warning or disallow.
8.8. Text after #
The ISO C standard specifies that a # occurring outside of a macro replacement list must be followed by a new line or by a preprocessing directive, and this is enforced by the checker in default mode. The check is controlled by:
#pragma TenDRA no directive/nline after ident permit
where permit may be allow, disallow or warning.
8.9. New line at end of file
The ISO C standard, section 5.1.1.2, states that source files must end with new lines. Files which do not end in new lines are flagged as errors by the checker in default mode. The behaviour can be modified using:
#pragma TenDRA no nline after file end permit
where permit has the usual allow, disallow and warning options.
8.10. Conditional Compilation
Tchk generally treats conditional compilation in the same way as other compilers and checkers. For example, consider:
#if expr
....	/* First branch */
#else
....	/* Second branch */
#endif
the expression, expr, is evaluated: if it is non-zero the first branch of the conditional is processed; if it is zero the second branch is processed instead.
Sometimes, however, tdfc2 may be unable to evaluate the expression statically because of the abstract types and expressions which arise from the minimum integer range assumptions or the abstract standard headers used by the tool (see target-dependent types in section 4.5). For example, consider the following ISO compliant program:
#include <stdio.h>
#include <limits.h>

int main ()
{
#if ( CHAR_MIN == 0 )
    puts ("char is unsigned");
#else
    puts ("char is signed");
#endif
    return ( 0 );
}
The TenDRA representation of the ISO API merely states that CHAR_MIN - the least value which fits into a char - is a target dependent integral constant. Hence, whether or not it equals zero is again target dependent, so the checker needs to maintain both branches. By contrast, any conventional compiler is compiling to a particular target machine on which CHAR_MIN is a specific integral constant. It can therefore always determine which branch of the conditional it should compile.
In order to allow both branches to be maintained in these cases, it has been necessary for tdfc2 to impose certain restrictions on the form of the conditional branches and the positions in which such target-dependent conditionals may occur. These may be summarised as:
- Target-dependent conditionals may not appear at the outer level. If the checker encounters a target-dependent conditional at the outer level an error is produced. In order to continue checking in the rest of the file an arbitrary assumption must be made about which branch of the conditional to process; tdfc2 assumes that the conditional is true and the first branch is used.
- The branches of allowable target-dependent conditionals may not contain declarations or definitions.
8.11. Target dependent conditional inclusion
One of the effects of trying to compile code in a target independent manner is that it is not always possible to evaluate the condition in a #if directive completely. Thus the conditional inclusion needs to be preserved until the installer phase. This can only be done if the target dependent #if is more structured than is normally required for preprocessing directives. There are two cases. In the first, where the #if appears in a statement, it is treated as if it were an if statement with braces enclosing its branches; that is:
#if cond
true_statements
#else
false_statements
#endif
maps to:
if ( cond ) {
    true_statements
} else {
    false_statements
}
In the second case, where the #if appears in a list of declarations, an error is normally given. This can however be overridden by the directive:
#pragma TenDRA++ conditional declaration allow
which causes both branches of the #if to be analysed.
8.12. Unused headers
Header files which are included, but from which nothing is used within the other source files comprising the translation unit, might just as well not have been included. Tchk can detect top level include files which are unnecessary by analysing the tdfc2dump output for the file. This check is enabled by passing the -Wd,-H command line flag to tcc. Errors are written to stderr in a simple ASCII form by default, or to the unified dump file in dump format if the -D command line option is used.
9. API checking
- 9.1. Including headers
- 9.2. Specifying APIs to tcc
- 9.3. API Checking Examples
- 9.4. Redeclaring Objects in APIs
- 9.5. Defining Objects in APIs
- 9.6. Stepping Outside an API
- 9.7. Using the System Headers
- 9.8. API usage analysis
9.1. Including headers
The token syntax described in the previous annex provides the means of describing an API specification independently of any particular implementation of the API. Every object in the API specification is described using the appropriate #pragma token statement. These statements are arranged in TenDRA header files corresponding to the headers comprising the API. Each API consists of a separate set of header files. For example, if the ANSI C89 API is used, the statement:
#include <sys/types.h>
will lead to a header not found error, whereas the header will be found in the POSIX API.
Where relationships exist between APIs these have been made explicit in the headers. For example, the POSIX version of stdio.h consists of the ANSI version plus some extra objects. This is implemented by making the TenDRA header describing the POSIX version of stdio.h include the ANSI C89 version of stdio.h.
9.2. Specifying APIs to tcc
The API against which a program is to be checked is specified to tcc by means of a command-line option of the form -Yapi, where api is the API name. For example, ANSI X3.159 is specified by -Yc89 (this is the default API) and POSIX 1003.1 is specified by -Yposix (for a full list of the supported APIs see chapter 2).
Extension APIs, such as X11, require special attention. The API for a program is never just X11, but X11 plus some base API, for example, X11 plus POSIX or X11 plus XPG3. These composite APIs may be specified by, for example, passing the options -Yposix -Yx5_lib (in that order) to tcc to specify POSIX 1003.1 plus X11 (Release 5) Xlib. The rule is that base APIs, such as POSIX, override the existing API, whereas extension APIs, such as X11, extend it. The command-line option -info causes tcc to print the API currently in use. For example:
% tcc -Yposix -Yx5_lib -info file.c
will result in the message:
tcc: Information: API is X11 Release 5 Xlib plus POSIX (1003.1).
9.3. API Checking Examples
As an example of the TenDRA compiler's API checking capabilities, consider the following program, which prints the names and inode numbers of all the files in the current directory:
#include <stdio.h>
#include <sys/types.h>
#include <dirent.h>

int main ()
{
    DIR *d = opendir ( "." );
    struct dirent *e;
    if ( d = NULL ) return ( 1 );
    while ( e = readdir(d), e != NULL ) {
        printf ( "%s %lu\n", e->d_name, e->d_ino );
    }
    closedir ( d );
    return ( 0 );
}
A first attempted compilation using strict checking:
% tcc -Xs a.c
results in messages to the effect that the headers <sys/types.h>
and <dirent.h>
cannot be found, plus a number of consequential errors. This is because tcc is checking the program against the default API, that is against the ANSI API, and the program is certainly not ANSI compliant. It does look as if it might be POSIX compliant however, so a second attempted compilation might be:
% tcc -Xs -Yposix a.c
This results in one error and three warnings. Dealing with the warnings first, the returns of the calls of printf
and closedir
are being discarded and the variable d
has been set and not used. The discarded function returns are deliberate, so they can be made explicit by casting them to void
. The discarded assignment to d
requires a little more thought - it is due to the mistyping d = NULL
instead of d == NULL
on line 9. The error is more interesting. In full the error message reads:
"a.c":11 printf ( "%s %lu\n", e->d_name, e->d_ino!!!! ); Error:ISO[6.5.2.1]{ANSI[3.5.2.1]}: The identifier 'd_ino' is not a member of 'struct/union posix.dirent.dirent'. ISO[6.3.2.3]{ANSI[3.3.2.3]}: The second operand of '->' must be a member of the struct/union pointed to by the first.
That is, struct dirent
does not have a field called d_ino
. In fact this is true; while the d_name
field of struct dirent
is specified in POSIX, the d_ino
field is an XPG3 extension (This example shows that the TenDRA representation of APIs is able to differentiate between APIs at a very fine level). Therefore a third attempted compilation might be:
% tcc -Xs -Yxpg3 a.c
This leads to another error message concerning the printf
statement, that the types unsigned long
and (the promotion of) ino_t
are incompatible. This is due to a mismatch between the printf
format string %lu
and the type of e->d_ino
. POSIX only says that ino_t
is an arithmetic type, not a specific type like unsigned long
. The TenDRA representation of POSIX reflects this abstract nature of ino_t
, so that the potential portability error is detected. In fact it is impossible to give a printf
string which works for all possible implementations of ino_t
. The best that can be done is to cast e->d_ino
to some fixed type like unsigned long
and print that.
Hence the corrected, XPG3 conformant program reads:
#include <stdio.h>
#include <sys/types.h>
#include <dirent.h>

int main ()
{
    DIR *d = opendir ( "." );
    struct dirent *e;
    if ( d == NULL ) return ( 1 );
    while ( e = readdir(d), e != NULL ) {
        ( void ) printf ( "%s %lu\n", e->d_name,
                          ( unsigned long ) e->d_ino );
    }
    ( void ) closedir ( d );
    return ( 0 );
}
9.4. Redeclaring Objects in APIs
Of course, it is possible to redeclare the functions declared in the TenDRA API descriptions within the program, provided they are consistent. However, what constitutes a consistent redeclaration in the fully abstract TenDRA machine is not as straightforward as it might seem; an interesting example is malloc in the ANSI API. This is defined by the prototype:
void *malloc ( size_t );
where size_t
is a target dependent unsigned integral type. The redeclaration:
void *malloc ();
is only correct if size_t
is its own integral promotion, and therefore is not correct in general.
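A redeclaration which simply repeats the API prototype is, of course, always consistent:

#include <stdlib.h>

/* consistent: repeats the API prototype exactly */
void *malloc ( size_t );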
Since it is not always desirable to remove these redeclarations (some machines may not have all the necessary functions declared in their system headers) the TenDRA compiler has a facility to accept inconsistent redeclarations of API functions which can be enabled by using the pragma:
#pragma TenDRA incompatible interface declaration allow
This pragma suppresses the consistency checking of redeclarations of API functions. Replacing allow by warning causes a warning to be printed instead. In both cases the TenDRA API description of the function takes precedence. The normal behaviour of flagging inconsistent redeclarations as errors can be restored by replacing allow
by disallow
in the pragma above. (There are also equivalent command-line options to tcc of the form -X:interface_decl=status
, where status can be check
, warn
or dont
.)
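As a sketch, the pragma might be used to bracket just the offending redeclaration:

#include <stdlib.h>

#pragma TenDRA incompatible interface declaration warning
void *malloc ();    /* inconsistent in general; now only a warning */
#pragma TenDRA incompatible interface declaration disallow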
9.5. Defining Objects in APIs
Since the program API is meant to define the interface between what the program defines and what the target machine defines, the TenDRA compiler normally raises an error if any attempt is made to define an object from the API in the program itself. A subtle example of this is given by compiling the program:
#include <errno.h>

extern int errno;
with the ANSI API. ANSI states that errno
is an assignable lvalue of type int
, and the TenDRA description of the API therefore states precisely that. The declaration of errno
as an extern int
is therefore an inconsistent specification of errno
, but a consistent implementation. Accepting the lesser of two evils, the error reported is therefore that an attempt has been made to define errno
despite the fact that it is part of the API.
Note that if this same program is compiled using the POSIX API, in which errno
is explicitly specified to be an extern int
, the program merely contains a consistent redeclaration of errno
and so does not raise an error.
The neatest workaround for the ANSI case, which preserves the declaration for those machines which need it, is as follows: if errno
is anything other than an extern int
it must be defined by a macro. Therefore:
#include <errno.h>

#ifndef errno
extern int errno;
#endif
should always work.
In most other examples, the definitions are more obvious. For example, a programmer might provide a memory allocator containing versions of malloc
, free
etc.:
#include <stdlib.h>

void *malloc ( size_t sz )
{
    ....
}

void free ( void *ptr )
{
    ....
}
If this is deliberate then the TenDRA compiler needs to be told to ignore the API definitions of these objects and to use those provided instead. This is done by listing the objects to be ignored using the pragma:
#pragma ignore malloc free ....
(also see section G.10). This should be placed between the API specification and the object definitions. The provided definitions are checked for conformance with the API specifications. There are special forms of this pragma to enable field selectors and objects in the tag namespace to be defined. For example, if we wish to provide a definition of the type div_t
from stdlib.h
we need to ignore three objects - the type itself and its two field selectors - quot
and rem
. The definition would therefore take the form:
#include <stdlib.h>

#pragma ignore div_t div_t.quot div_t.rem

typedef struct {
    int quot;
    int rem;
} div_t;
Similarly if we wish to define struct lconv
from locale.h
the definition would take the form:
#include <locale.h>

#pragma ignore TAG lconv TAG lconv.decimal_point ....

struct lconv {
    char *decimal_point;
    ....
};
to take into account that lconv
lies in the tag name space. By defining objects in the API in this way, we are actually constructing a less general version of the API. This will potentially restrict the portability of the resultant program, and so should not be done without good reason.
9.6. Stepping Outside an API
Using the TenDRA compiler to check a program against a standard API will only be effective if the appropriate API description is available to the program being tested (just as a program can only be compiled on a conventional machine if the program API is implemented on that machine). What can be done for a program whose API is not supported depends on the degree to which the program API differs from an existing TenDRA API description. If the program API is POSIX with a small extension, say, then it may be possible to express that extension to the TenDRA compiler. For large unsupported program APIs it may be possible to use the system headers on a particular machine to allow for partial program checking (see section H.7).
For small API extensions the ideal method would be to use the token syntax described in Annex G to express the program API to the TenDRA compiler; however, this is not currently encouraged because the syntax of such API descriptions is not yet firmly fixed. For the time being it may be possible to use C to express much of the information the TenDRA compiler needs to check the program. For example, POSIX specifies that sys/stat.h
contains a number of macros, S_ISDIR
, S_ISREG
, and so on, which are used to test whether a file is a directory, a regular file, etc. Suppose that a program is basically POSIX conformant, but uses the additional macro S_ISLNK
to test whether the file is a symbolic link (this is in COSE and AES, but not POSIX). A proper TenDRA description of S_ISLNK
would contain the information that it was a macro taking a mode_t
and returning an int
, however for checking purposes it is sufficient to merely give the types. This can be done by pretending that S_ISLNK
is a function:
#ifdef __TenDRA__
/* For TenDRA checking purposes only */
extern int S_ISLNK ( mode_t );    /* actually a macro */
#endif
More complex examples might require an object in the API to be defined in order to provide more information about it (see H.5). For example, suppose that a program is basically ANSI compliant, but assumes that FILE
is a structure with a field file_no
of type int
(representing the file number), rather than a generic type. This might be expressed by:
#ifdef __TenDRA__
/* For TenDRA checking purposes only */
#pragma ignore FILE
typedef struct {
    /* there may be other fields here */
    int file_no;
    /* there may be other fields here */
} FILE;
#endif
The methods of API description above are what might be called example implementations
rather than the abstract implementations
of the actual TenDRA API descriptions. They should only be used as a last resort, when there is no alternative way of expressing the program within a standard API. For example, there may be no need to access the file_no
field of a FILE
directly, since POSIX provides a function, fileno
, for this purpose. Extending an API in general reduces the number of potential target machines for the corresponding program.
9.7. Using the System Headers
One possibility if a program API is not supported by the TenDRA compiler is to use the set of system headers on the particular machine on which tcc happens to be running. Of course, this means that the API checking facilities of the TenDRA compiler will not be effective, but it is possible that the other program checking aspects will be of use.
The system headers are not, and indeed are not intended to be, portable. A simple-minded approach to portability checking with the system headers could lead to more portability problems being found in the system headers than in the program itself. A more sophisticated approach involves applying different compilation modes to the system headers and to the program. The program itself can be checked very rigorously, while the system headers have very lax checks applied.
This could be done directly, by putting a wrapper around each system header describing the mode to be applied to that header. However, the mechanism of named compilation modes (see 2.2) provides an alternative solution. In addition to the normal -Idir command-line option, tcc also supports the option -Nname:dir, which is identical except that it also associates the identifier name with the directory dir. Once a directory has been named in this way, the name can be used in a directive:
#pragma TenDRA directory name use environment mode
which tells tcc to apply the named compilation mode, mode, to any files included from the directory, name. This is the mechanism used to specify the checks to be applied to the system headers.
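As a sketch (the directory name sys and the mode name lax_mode are illustrative, and lax_mode is assumed to have been defined as a named compilation mode as described in 2.2), a directory might be named on the command line:

% tcc -ch -Nsys:/usr/include file.c

with a start-up file containing:

#pragma TenDRA directory sys use environment lax_mode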
The system headers may be specified to tcc using the -Ysystem
command-line option. This specifies /usr/include
as the directory to search for headers and passes a system start-up file to tcc. This system start-up file contains any macro definitions which are necessary for tcc to navigate the system headers correctly, plus a description of the compilation mode to be used in compiling the system headers.
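For example (the file name is illustrative), a program might be checked against the system headers with:

% tcc -ch -Ysystem file.c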
In fact, before searching /usr/include, tcc searches another directory for system headers. This is intended to hold modified versions of any system headers which cause particular problems or require extra information. For example:
- A version of stdio.h is provided for all systems, which contains the declarations of printf and similar functions necessary for tcc to apply its printf-string checks (see 3.3.2).
- A version of stdlib.h is provided for all systems, which includes the declarations of exit and similar functions necessary for tcc to apply its flow analysis correctly (see 5.7).
- Versions of stdarg.h and varargs.h which work with tcc are provided for all systems. Most system headers contain built-in functions which are recognised by cc (but not tcc) to deal with these.
The user can also use this directory to modify any system headers which cause problems. For example, not all system headers declare all the functions they should, so it might be desirable to add these declarations.
It should be noted that the system headers and the TenDRA API headers do not mix well. Both are parts of coherent systems of header files, and unless the intersection is very small, it is not usually possible to combine parts of these systems sensibly.
Even a separation, such as compiling some modules of a program using a TenDRA API description and others using the system headers, can lead to problems in the intermodular linking phase (see Chapter 9). There will almost certainly be type inconsistency errors since the TenDRA headers and the system headers will have different representations of the same object.
9.8. API usage analysis
The abstract standard headers provided with the tool are the basis for the API usage analysis checking on dump files described in Chapter 9. The declarations in each abstract header file are enclosed by the following pragmas:
#pragma TenDRA declaration block API_name begin
....
#pragma TenDRA declaration block end
API_name has a standard form e.g. api__ansi__stdio
for stdio.h
in the ANSI API.
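For example, the abstract ANSI version of stdio.h takes a form along these lines (a sketch; the individual #pragma token statements are elided):

#pragma TenDRA declaration block api__ansi__stdio begin
/* #pragma token statements describing FILE, printf, etc. */
#pragma TenDRA declaration block end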
This information is output in the dump format as the start and end of a header scope, i.e.
SSH position ref_no = <API_name>
SEH position ref_no
The first occurrence of each identifier in the dump output contains scope information; in the case of an identifier declared in the abstract headers, this scope information will normally refer to a header scope. Since each use of an identifier can be traced back to its declaration, this provides a means of tracking API usage within the application when the abstract headers are used. The disadvantages of this method are that only APIs for which abstract headers are available can be checked, and that objects which are not part of the standard APIs are not available. If an application requires such an identifier (or indeed attempts to use a standard API identifier for which the appropriate header has not been included), the resulting errors may distort or even completely halt the dump output, resulting in incomplete or incorrect analysis.
The second method of API analysis allows compilation of the application against the system headers, thereby overcoming the problems of non-standard API usage mentioned above. The dump of the application can be scanned to determine the identifiers which are used but not defined within the application itself. These identifiers form the program's external API with the system headers and libraries, and can be compared with API reference information, provided by dump output files produced from the abstract standard headers, to determine the application's API usage.
Analysis performed on the set of dump files produced for an entire application can detect the objects, types, etc. from external APIs which are used by the application. The API usage analysis is enabled by passing one or more -api_checkAPI
flags to tcc, where API may be any of the standard APIs listed in section 2.1. The -api_check_outFILE
flag may be used to direct the API analysis information to the file FILE (by default it is written to stdout). The APIs used to perform API usage analysis may be different from those used to process the application. Annex G.8 contains details of the methods used to perform the API usage analysis.
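For example (a sketch; the file names are illustrative), a program's POSIX usage might be analysed, with the results written to usage.txt rather than stdout, by:

% tcc -ch -api_checkposix -api_check_outusage.txt file.c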
10. Intermodular analysis
All the checks discussed in earlier chapters have been concerned with a single source file. However, tcc also contains a linking phase in which it is able to perform intermodular checks (i.e. checks between source files). In the linking phase, the files generated from each translation unit processed are combined into a single file containing information on all external objects within the application. Type consistency checks are then applied to ensure that the definitions and declarations of each object are consistent, and that external objects and functions have at most one definition.
There are two types of file provided by tdfc2 for analysis; symbol table dump files and C++ spec files.
10.1. Linking symbol table dump files
The amount of information about an object stored in a dump file depends on the compilation mode used to produce that file. For example, if extra prototype checks are enabled (see section 3.3), the dump file contains any information inferred about a function from its traditional style definition or from applications of that function. Thus, if one file contains:
extern void f () ;

void g ()
{
    f ( "hello" ) ;
}
and another contains:
void f ( n )
int n ;
{
    return ;
}
then the inferred prototype:
void f WEAK ( char * ) ;
from the call of f
would be included in the first dump file, whereas the weak prototype deduced from the definition of f
:
void f WEAK ( int ) ;
would be included in the second. When these two dump files are linked, the inconsistency is discovered and an error is reported.
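Assuming the two translation units above are held in files a.c and b.c (names illustrative), the inconsistency should be reported when they are checked together, since the checker enables intermodular analysis by default:

% tcc -ch a.c b.c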
10.2. Linking C++ spec files
The overall compilation scheme controlled by tcc
, as it relates to the C++ producer, can be described as follows:
Each C++ source file, a.cc
say, is processed using tcpplus
to give an output TDF capsule, a.j
, which is passed to the installer phase of tcc
. The capsule is linked with any target dependent token definition libraries, translated to assembler and assembled to give a binary object file, a.o
. The various object files comprising the program are then linked with the system libraries to give a final executable, a.out
.
A C++ spec file is a dump of the C++ producer's internal representation of a translation unit. Such files can be written to, and read from, disk to perform such operations as intermodule analysis.
In addition to this main compilation scheme, tcpplus
can be made to output a C++ spec file for each C++ source file, a.K
say. These C++ spec files can be linked, using tcpplus
in its spec linker mode, to give an additional TDF capsule, x.j
say, and a combined C++ spec file, x.K
. The main purpose of this C++ spec linking is to perform intermodule checks on the program; however, in the course of this checking, exported templates which are defined in one module and used in another are instantiated. This extra code is output to x.j
, which is then installed and linked in the normal way.
Note that intermodule checks, and hence intermodule template instantiations, are only performed if the -im
option is passed to tcc
.
The TenDRA checker is similar to the compiler except that it disables TDF output and has intermodule analysis enabled by default.
The C++ spec linking routines have not yet been completely implemented, and so are disabled in the current version of the C++ producer.
Note that the format of a C++ spec file is specific to the C++ producer and may change between releases to reflect modifications in the internal type system. The C producer has a similar dump format, called a C spec file; however, the two are incompatible. If intermodule analysis between C and C++ source files is required, the tdfc2dump symbol table dump format should be used.
10.3. Template compilation
The C++ producer makes the distinction between exported templates, which may be used in one module and defined in another, and non-exported templates, which must be defined in every module in which they are used. As in the ISO C++ standard, the export
keyword is used to distinguish between the two cases. In the past, different compilers have had different template compilation models; either all templates were exported or no templates were exported. The latter is easily emulated - if the export
keyword is not used then no templates will be exported. To emulate the former behaviour the directive:
#pragma TenDRA++ implicit export template on
can be used to treat all templates as if they had been declared using the export
keyword.
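For example (a sketch; the template max2 is purely illustrative), with the directive in force an ordinary template definition behaves as if it had been exported:

#pragma TenDRA++ implicit export template on

/* treated as if declared 'export template < class T > ...' */
template < class T > T max2 ( T a, T b )
{
    return ( a > b ? a : b ) ;
}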
The automatic instantiation of exported templates has not yet been implemented correctly. It is intended that such instantiations will be generated during intermodule analysis (where they conceptually belong). At present it is necessary to work round this using explicit instantiations.