C/C++ Checker Reference Manual
- i. Introduction
- 1. Configuring the Checker
- 2. Integral Types
- 3. Type Checking
- 4. Control Flow Analysis
- 5. Operator Analysis
- 6. Variable Analysis
- 7. Discard Analysis
- 8. Preprocessing checks
- 8.1. Preprocessor directives
- 8.2. Indented Preprocessing Directives
- 8.3. Multiple macro definitions
- 8.4. Macro arguments
- 8.5. Unmatched quotes
- 8.6. Include depth
- 8.7. Text after #endif
- 8.8. Text after #
- 8.9. New line at end of file
- 8.10. Conditional Compilation
- 8.11. Target dependent conditional inclusion
- 8.12. Unused headers
- 9. API checking
- 10. Intermodular analysis
© The TenDRA Project.
© DERA.
Revision History
kate | Split off tcc Compilation Modes to create a tccmodes manpage. Merged the appendix of CLI portability options into the body text. Merged in linking for symbol table dump files from The C/C++ Symbol Table Dump document. Merged in the compilation scheme for linking C++ spec files from the C/C++ Producer Configuration Guide. |
kate | Moved out the DRA producers as a standalone tool. |
kate | Moved out the description of the symbol table semantics into a separate document, The C/C++ Symbol Table Dump. Removed the API listing. |
kate | Split off various sections to the C/C++ Producer Implementation, C++ and Portability and Style Guide documents. |
asmodai | Converted to a new build system. |
DERA | tdfc2 1.8.2; TenDRA 4.1.2 release. |
i. Introduction
The C program static checker was originally developed as a programming tool to aid the construction of portable programs using the Application Programming Interface (API) model of software portability; the principle underlying this approach being:
If a program is written to conform to an abstract API specification, then that program will be portable to any machine which implements the API specification correctly.
This approach gave the tool an unusually powerful basis for static checking of C programs, and a large amount of development work has resulted in the production of the TenDRA C static checker (invoked as tcc -ch). The terms TenDRA C checker and tcc -ch are used interchangeably in this document.
Responsibilities of the C static checker are:
- strict interface checking. In particular, the checker can analyse programs against abstract APIs to check their conformance to the specification. Abstract versions of most standard APIs are provided with the tool; alternatively users can define their own abstract APIs using the syntax described in Annex G;
- checking of integer sizes, overflows and implicit integer conversions, including potential 64-bit problems, against a 16 bit or 32 bit architecture profile;
- strict ISO C90 standard checking, plus configurable support for many non-ISO dialect features;
- extensive type checking, including prototype-style checking for traditionally defined functions, conversion checking, type checking on printf and scanf style argument strings and type checking between translation units;
- variable analysis, including detection of unused variables, use of uninitialised variables, dependencies on order of evaluation in expressions and detection of unused function returns, computed values and static variables;
- detection of unused header files;
- configurable tests for detecting many other common programming errors;
- complete standard API usage analysis. No API definitions are built into the checker; these are provided externally. A complete list of API definitions available to tcc is documented by tccenv;
- support for user-defined checking profiles. No checking profiles are built into the checker; these are provided externally. A complete list of profiles exposed as -X modes to tcc as startup files is documented by tccmodes.
1. Configuring the Checker
- 1.1. Individual command line checking options
- 1.2. Customising checking profiles
- 1.3. Scoping checking profiles
- 1.4. Other checks
This section describes the built-in checking modes and the design of customised environments.
There are several methods available for configuring the checker. Most configuration is provided by built-in "modes" which are selected by using the relevant -X
command line option for tcc. These modes are documented by tccmodes.
More detailed customisation may require special #pragma
statements to be incorporated into the source code to be analysed (this commonly takes the form of a startup file). The configuration options generally act independently of one another and unless explicitly forbidden in the descriptions below, they may be combined in any way.
1.1. Individual command line checking options
Some of the checks available can be controlled using a command line option of the form -Xopt,opt,..., where the various opt options give a comma-separated list of commands. These commands have the form test=status, where test is the name of the check, and status is either check (apply the check and give an error if it fails), warn (apply the check and give a warning if it fails) or dont (do not apply the check). The names of checks can be found with their descriptions in Chapters 3 - 8; for example, the check for implicit function declarations described in 3.4.1 may be switched on using -X:implicit_func=check.
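For example, several of these commands may be combined in a single option (the source file name here is purely illustrative):

tcc -ch -X:implicit_func=check,fall_thru=warn file.c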
The command line options for portability checking are:
Check | Reference | Command Line Option |
---|---|---|
Weak Prototype Analysis | 3.3.1 | -X:weak_proto=status |
Implicit Function Declaration | 3.4 | -X:implicit_func=status |
Printf String Checking | 3.2.2 | -X:printf=status |
Incompatible Void Returns | 3.2.2 | -X:void_ret=status |
Unreachable Code | 5.2 | -X:unreached=status |
Case Fall Through | 5.3 | -X:fall_thru=status |
Conversion Analysis | 3.2 | -X:convert_all=status |
Integer ↔ Integer Conversion | 3.2.1 | -X:convert_int=status |
| | -X:convert_int_implicit=status |
| | -X:convert_int_explicit=status |
Integer ↔ Pointer Conversion | 3.2.2 | -X:convert_int_ptr=status |
Pointer ↔ Pointer Conversion | 3.2.3 | -X:convert_ptr=status |
Complete struct/union Analysis | 8.3 | -X:complete_struct=status |
Variable Analysis | 5.6 | -X:variable=status |
Discard Analysis | 5.8 | -X:discard_all=status |
Discarded Function Returns | 5.8.1 | -X:discard_func_ret=status |
Discarded Values | 5.8.2 | -X:discard_value=status |
Unused Statics | 5.8.3 | -X:unused_static=status |
where status can be check, warn or dont.
1.2. Customising checking profiles
The individual checks performed by the C static checker are generally controlled by #pragma
directives. The reason for this is that the ISO standard places no restrictions on the syntax following a #pragma
preprocessing directive, and most compilers/checkers can be configured to ignore any unknown #pragma
directives they encounter.
Most of these directives begin:
#pragma TenDRA ...
and are always checked for syntactic correctness. The individual directives, together with the checks they control, are described in Chapters 3 - 8. Section 2.2 describes the method of constructing a new checking profile from these individual checks.
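For example, a custom checking profile may be assembled as a start-up file of such directives. The particular selection below is purely illustrative, using directives that are described later in this document:

/* local_checks.h - a hypothetical user-defined checking profile */
#pragma TenDRA integer overflow analysis warning
#pragma TenDRA conversion analysis ( int-int implicit ) on
#pragma TenDRA weak prototype analysis on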
1.3. Scoping checking profiles
Almost all the available checks are scoped (exceptions will be mentioned in the description of the check). Scopes may be controlled by the same #pragma TenDRA begin
directive described by the C/C++ Producer Configuration Guide.
1.4. Other checks
Several checks of varying utility have been implemented in the C++ producer but do not as yet have individual directives controlling their use. These can be enabled en masse using the directive:
#pragma TenDRA++ catch all allow
It is intended that this directive will be phased out as these checks are assigned controlling directives. It is possible to achieve finer control over these checks by enabling their individual error messages as described above.
2. Integral Types
- 2.1. Integer promotion rules
- 2.2. Arithmetic operations on integer types
- 2.3. Interaction with the integer conversion checks
- 2.4. Target dependent integral types
- 2.5. Integer overflow checks
- 2.6. Integer operator checks
- 2.7. Support for 64 bit integer types (long long)
The checks described in the previous chapter involved the detection of conversions which could result in undefined values. Certain conversions involving integral types, however, are defined in the ISO C standard and so might be considered safe and unlikely to cause problems. This unfortunately is not the case: some of these conversions may still result in a change in value; the actual size of each integral type is implementation-dependent; and the "old-style" integer conversion rules which predate the ISO standard are still in common use. The checker provides support for both ISO and traditional integer promotion rules. The set of rules used may be specified independently of the two integral range scenarios, 16 bit (default) and 32 bit, described in section 2.1.2.
The means of specifying alternative sets of promotion rules, their interaction with the conversion checks described in section 3.2, and the additional checks which may be performed on integers and integer operations are described in the remainder of this chapter.
2.1. Integer promotion rules
The ISO C standard rules may be summarised as follows: long integral types promote to themselves; other integral types promote to whichever of int or unsigned int they fit into. In full the promotions are:

Type | Promotes to |
---|---|
char | int |
signed char | int |
unsigned char | int |
short | int |
unsigned short | int or unsigned int (see below) |
int | int |
unsigned int | unsigned int |
long | long |
unsigned long | unsigned long |
Note that even with these simple built-in types, there is a degree of uncertainty, namely concerning the promotion of unsigned short. On most machines, int is strictly larger than short, so the promotion of unsigned short is int. However, it is possible for short and int to have the same size, in which case the promotion is unsigned int. When using the ISO C promotion rules, the checker usually avoids making assumptions about the implementation by treating the promotion of unsigned short as an abstract integral type. If, however, the -Y32bit option is specified, int is assumed to be strictly larger than short, and unsigned short promotes to int.
The traditional C integer promotion rules are often referred to as the signed promotion rules. Under these rules, long integral types promote to themselves, as in ISO C, but the other integral types promote to unsigned int if they are qualified by unsigned, and to int otherwise. Thus the signed promotion rules may be represented as follows:

Type | Promotes to |
---|---|
char, signed char, short | int |
unsigned char, unsigned short | unsigned int |
int | int |
unsigned int | unsigned int |
long | long |
unsigned long | unsigned long |
The traditional promotion rules are applied in the Xt built-in environment only. All of the other built-in environments specify the ISO C promotion rules. Users may also specify their own rules for integer promotions and minimum integer ranges; the methods for doing this are described in Annex H.
2.2. Arithmetic operations on integer types
The ISO C standard rules for calculating the type of an arithmetic operation involving two integer types are as follows: work out the integer promotions of the types of the two operands, then:
- If either promoted type is unsigned long, the result type is unsigned long;
- Otherwise, if one promoted type is long and the other is unsigned int, then if a long int can represent all values of an unsigned int, the result type is long; otherwise the result type is unsigned long;
- Otherwise, if either promoted type is long, the result type is long;
- Otherwise, if either promoted type is unsigned int, the result type is unsigned int;
- Otherwise the result type is int.
Both promoted values are converted to the result type, and the operation is then applied.
2.3. Interaction with the integer conversion checks
A simple-minded implementation of the integer conversion checks described in 3.2 would interact badly with these rules. Consider, for example, adding two values of type char:
char f ( char a, char b )
{
    char c = a + b ;
    return ( c ) ;
}
The various stages in the calculation of c are as follows: a and b are converted to their promotion type, int, and added together to give an int result, which is converted to a char and assigned to c. The conversions of a and b from char to int are always safe, and so present no difficulties to the integer conversion checks. The conversion of the result from int to char, however, is precisely the type of value-destroying conversion which these checks are designed to detect.
Obviously, an integer conversion check which flagged all char arithmetic would never be used, thereby losing the potential to detect many subtle portability errors. For this reason, the integer conversion checks are more sophisticated. In all typed languages, the type is used for two purposes: for static type checking and for expressing information about the actual representation of data on the target machine. Essentially it is a confusion between these two roles which leads to the problems above. The C promotion and arithmetic rules are concerned with how data is represented and manipulated, rather than the underlying abstract types of this data. When a and b are promoted to int prior to being added together, this is only a change in representation; at the conceptual level they are still chars. Again, when they are added, the result may be represented as an int, but conceptually it is a char. Thus the assignment to c, an actual char, is just a change in representation, not a change in conceptual type.
So each expression may be regarded as having two types: a conceptual type which stands for what the expression means, and a representational type which stands for how the expression is to be represented as data on the target machine. In the vast majority of expressions these types coincide; however, the integral promotion and arithmetic conversions are changes of representational, not conceptual, types. The integer conversion checks are concerned with detecting changes of conceptual type, since it is these which are most likely to be due to actual programming errors.
It is possible to define integral types within the TenDRA extensions to C in which the split between concept and representation is made explicit. The pragma:
#pragma TenDRA keyword TYPE for type representation
may be used to introduce a keyword TYPE for this purpose (as with all such pragmas, the precise keyword to be used is left to the user). Once this has been done, TYPE ( r, t ) may be used to represent a type which is conceptually of type t but is represented as data like type r. Both t and r must be integral types. For example:
TYPE ( int, char ) a ;
declares a variable a which is represented as an int, but is conceptually a char.
In order to maintain compatibility with other compilers, it is necessary to give TYPE a sensible alternative definition. For all but conversion checking purposes, TYPE ( r, t ) is identical to r, so a suitable definition is:
#ifdef __TenDRA__
#pragma TenDRA keyword TYPE for type representation
#else
#define TYPE( r, t ) r
#endif
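As an illustrative sketch (the variable names are hypothetical, and the implicit integer conversion checks of section 3.2 are assumed to be enabled), the conversion analysis then operates on the conceptual types:

TYPE ( int, char ) a ;   /* represented as an int, conceptually a char */
int i = 0 ;
char c = a ;             /* change of representation only: not flagged */
i = a ;                  /* conceptual char widened to int: safe, not flagged */
a = i ;                  /* conceptually int to char: flagged by the conversion analysis */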
2.4. Target dependent integral types
Since the checker uses only information about the minimum guaranteed ranges of integral types, integer values whose actual type is unknown may arise. Integer values of undetermined type generally arise in one of two ways: through the use of integer literals and from API types which are not completely specified.
2.4.1. Integer literals
The ISO C rules on the type of integer literals are set out as follows. For each class of integer literals a list of types is given. The type of an integer literal is then the first type in the appropriate list which is large enough to contain the value of the integer literal. The class of the integer literal depends on whether it is decimal, hexadecimal or octal, and whether it is qualified by U (or u) or L (or l) or both. The rules may be summarised as follows:

Literal class | Candidate types |
---|---|
unsuffixed decimal | int, long, unsigned long |
unsuffixed octal or hexadecimal | int, unsigned int, long, unsigned long |
suffixed by U | unsigned int, unsigned long |
suffixed by L | long, unsigned long |
suffixed by U and L | unsigned long |
These rules are applied in all the built-in checking modes except Xt. Traditional C does not have the U and L qualifiers, so if the Xt mode is used, these qualifiers are ignored and all integer literals are treated as int, long or unsigned long, depending on the size of the number.
If a number fits into the minimal range for the first type of the appropriate list, then it is of that type; otherwise its type is undetermined and is said to be target dependent. The checker treats target dependent types as abstract integral types which may lead to integer conversion problems. For example, in:
int f ( int n ) { return ( n & 0xff00 ) ; }
the type of 0xff00 is target dependent, since it does not fit into the minimal range for int specified by the ISO C standard (this is detected by the integer overflow analysis described in section 4.6). The arithmetic conversion resulting from the & operation is detected by the checker's conversion analysis. Note that if the -Y32bit option is specified to tcc, an int is assumed to contain at least 32 bits. In this case, 0xff00 fits into the type int, and so this is the type of the integer literal. No invalid integer conversion is then detected.
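One hedged way to remove the target dependency is to give the literal a determinate type using a suffix, and to make the remaining conversions explicit. Since 0xff00 fits into the minimum range of unsigned int, the suffixed literal 0xff00U has a determined type:

int f ( int n )
{
    /* 0xff00U fits the minimal unsigned int range, so its type is not target dependent */
    return ( int ) ( ( unsigned int ) n & 0xff00U ) ;
}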
2.4.2. Abstract API types
Target dependent integral types also occur in API specifications and may be encountered when checking against one of the implementation-independent APIs provided with the checker. The commonest example of this is size_t, which is stated by the ISO C standard to be a target dependent unsigned integral type, and which arises naturally within the language as the type of a sizeof
expression.
The checker has its own internal version of size_t
, wchar_t
and ptrdiff_t
for evaluating static compile-time expressions. These internal types are compatible with the ISO C specification of size_t
, wchar_t
and ptrdiff_t
, and thus are compatible with any conforming definitions of these types found in included files. However, when checking the following program against the system headers, a warning is produced on some machines concerning the implicit conversion of an unsigned int
to type size_t
:
#include <stdlib.h> int main() { size_t size ; size = sizeof( int ) ; }
The system header on the machine in question actually defines size_t to be a signed int (this of course contravenes the ISO C standard), but the compile time function sizeof returns the checker's internal version of size_t, which is an abstract unsigned integral type. By using the pragma:
#pragma TenDRA set size_t:signed int
the checker can be instructed to use a different internal definition of size_t when evaluating the sizeof function, and the error does not arise. Equivalent options are also available for the ptrdiff_t and wchar_t types.
2.5. Integer overflow checks
Given the complexity of the rules governing the types of integers and results of integer operations, as well as the variation of integral ranges with machine architecture, it is hardly surprising that unexpected results of integer operations are at the root of many programming problems. These problems can often be hard to track down and may suddenly appear in an application which was previously considered "safe" when it is moved to a new system. Since the checker supports the concept of a guaranteed minimum size of an integer, it is able to detect many potential problems involving integer constants. The pragma:
#pragma TenDRA integer overflow analysis status
where status is on, warning or off, controls a set of checks on arithmetic expressions involving integer constants. These checks cover overflow, use of constants exceeding the minimum guaranteed size for their type, and division by zero. They are not enabled in the default mode.
There are two special cases of integer overflow for which checking is controlled separately (an example follows the list):
- Bitfield sizes. Obviously, the size of a bitfield must be smaller than or equal to the minimum size of its integral type. A bitfield which is too large is flagged as an error in the default mode. The check on bitfield sizes is controlled by:

  #pragma TenDRA bitfield overflow permit

  where permit is one of allow, disallow or warning.
- Octal and hexadecimal escape sequences. According to the ISO C standard, the value of an octal or hexadecimal escape sequence shall be in the range of representable values for the type unsigned char for an integer character constant, or the unsigned type corresponding to wchar_t for a wide character constant. The check on escape sequence sizes is controlled by:

  #pragma TenDRA character escape overflow permit

  where the options for permit are allow, warning and disallow. The check is switched on by default.
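For example, with the default 16 bit minimum range for int, both of the following would be reported (a minimal sketch with hypothetical names):

struct s {
    int big : 20 ;       /* bitfield wider than the 16 bit minimum size of int */
} ;

char esc = '\777' ;      /* octal escape 777 (value 511) exceeds the range of unsigned char */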
2.6. Integer operator checks
The results of some integer operations are undefined by the ISO C standard for certain argument types. Others are implementation-defined or simply likely to produce unexpected results. In the default mode such operations are processed silently; however, a set of checks on operations involving integer constants may be controlled using:
#pragma TenDRA integer operator analysis status
where status is replaced by on, warning or off. This pragma enables checks on the following operations (an example follows the list):
- shift operations where an expression is shifted by a negative number or by an amount greater than or equal to the width in bits of the expression being shifted;
- right shift operations with a negative value of signed integral type as the first argument;
- division operations with a negative operand;
- tests for an unsigned value strictly greater than or less than 0 (these are always true or false respectively);
- conversion of a negative constant value to an unsigned type;
- application of the unary - operator to an unsigned value.
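For example, with integer operator analysis enabled, each of the following constant operations would be reported (a minimal sketch with hypothetical names):

void g ( void )
{
    unsigned int u ;
    u = 1U << 40 ;          /* shift by more than the minimum width of unsigned int */
    u = -1 ;                /* negative constant converted to an unsigned type */
    if ( u < 0 ) u = 0 ;    /* comparison always false for an unsigned value */
}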
2.7. Support for 64 bit integer types (long long)
Although the use of long long to specify a 64 bit integer type is not supported by the ISO C90 standard, it is becoming increasingly popular in programming use. By default, tcc does not support the use of long long, but the checker can be configured to support the long long type to different degrees using the following pragmas:
#pragma TenDRA longlong type permit
where permit is one of allow (the long long type is accepted), disallow (errors are produced when long long types are detected) or warning (long long types are accepted but a warning is raised).
#pragma TenDRA set longlong type : type_name
where type_name is long or long long.
The first pragma determines the behaviour of the checker if the type long long is encountered as a type specifier. In the disallow case, an error is raised and the type specifier is mapped to long; otherwise the type is stored as long long, although a message alerting the user to the use of long long is raised in the warning mode. The second pragma determines the semantics of long long. If the type specified is long long, then long long is treated as a separate integer type and, if code generation is enabled, long long types appear in the output. Otherwise the type is mapped to long, and all objects declared long long are output as if they had been declared long (a warning is produced when this occurs). In either case, long long is treated as a distinct integer type for the purpose of integer conversion checking.
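A configuration which accepts long long with a warning, while keeping it as a genuine 64 bit type in any generated output, might therefore look like the following sketch (guarded so that other compilers never see the pragmas):

#ifdef __TenDRA__
#pragma TenDRA longlong type warning
#pragma TenDRA set longlong type : long long
#endif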
Extensions to the integer promotion and arithmetic conversion rules are required for the long long
type. These have been implemented as follows:
- the types of integer arithmetic operations where neither argument has long long type are unaffected;
- long long and unsigned long long both promote to themselves;
- the result type of arithmetic operations with one or more arguments of type unsigned long long is unsigned long long;
- otherwise, if either argument has type signed long long, the overall type is long long if both arguments can be represented in this form; otherwise the type is unsigned long long.
There are now three cases where the type of an integer arithmetic operation is not completely determined by the types of its arguments.
3. Type Checking
- 3.1. Type specifications
- 3.2. Type conversions
- 3.3. Function type checking
- 3.4. Overriding type checking
Type checking is relevant to two main areas of C. It ensures that all declarations referring to the same object are consistent (clearly a prerequisite for a well-defined program). It is also the key to determining when an undefined or unexpected value has been produced due to the type conversions which arise from certain operations in C. Conversions may be explicit (conversion is specified by a cast) or implicit. Generally explicit conversions may be regarded more leniently, since the programmer was obviously aware of the conversion, whereas the implications of an implicit conversion may not have been considered.
3.1. Type specifications
3.1.1. Incompatible type qualifiers
The declarations
const int a ; int a ;
are not compatible according to the ISO C standard, because the qualifier const is present in one declaration but not in the other. Similar rules hold for volatile-qualified types. By default, tdfc2 produces an error when declarations of the same object contain different type qualifiers. The check is controlled using:
#pragma TenDRA incompatible type qualifier permit
where the options for permit are allow, disallow or warning.
3.1.2. Elaborated type specifiers
In elaborated type specifiers, the class key (class, struct, union or enum) should agree with any previous declaration of the type (except that class and struct are interchangeable). This requirement can be relaxed using the directive:
#pragma TenDRA ignore struct/union/enum tag on
In ISO C and C++ it is not possible to give a forward declaration of an enumeration type. This constraint can be relaxed using the directive:
#pragma TenDRA forward enum declaration allow
Until the end of its definition, an enumeration type is treated as an incomplete type (as with class types). In enumeration definitions, and a couple of other contexts where comma-separated lists are required, the directive:
#pragma TenDRA extra , allow
can be used to allow a trailing comma at the end of the list.
The directive:
#pragma TenDRA complete struct/union analysis on
can be used to enable a check that every class or union has been completed within each translation unit in which it is declared.
3.1.3. Incomplete structures and unions
ISO C allows for structures or unions to be declared but not defined, provided they are not used in a context where it is necessary to know the complete structure. For example:
struct tag *p;
is allowed, despite the fact that struct tag is incomplete. The TenDRA C checker has an option to detect such incomplete structures or unions, controlled by:
#pragma TenDRA complete struct/union analysis status
where status is on to give an error when an incomplete structure or union is detected, warning to give a warning, or off to disable the check.
The check can also be controlled by passing the command-line option -X:complete_struct=state to tdfc2, where state is check, warn or dont.
The only place where the checker can actually detect that a structure or union is incomplete is at the end of the source file. This is because it is possible to complete a structure after it has been used. For example, in:
struct tag *p;
struct tag { int a; int b; };
struct tag is complete despite the fact that it was incomplete in the definition of p.
3.2. Type conversions
The only types which may be interconverted legally in C are integral types, floating point types and pointer types. Even if these rules are observed, the results of some conversions can be surprising and may vary on different machines. The checker can detect three categories of conversion: integer to integer conversions, pointer to integer and integer to pointer conversions, and pointer to pointer conversions.
In the default mode, the checker allows all integer to integer conversions, explicit integer to pointer and pointer to integer conversions and the explicit pointer to pointer conversions defined by the ISO C standard (all conversions between pointers to function types and other pointers are undefined according to the ISO C standard).
Checks to detect these conversions are controlled by the pragma:
#pragma TenDRA conversion analysis status
Unless explicitly stated to the contrary, throughout the rest of the document where status appears in a pragma statement it represents one of on (enable the check and produce errors), warning (enable the check but produce only warnings), or off (disable the check). The checks may also be controlled using the command line option -X:test=state, where test is one of convert_all, convert_int, convert_int_explicit, convert_int_implicit, convert_int_ptr and convert_ptr, and state is check, warn or dont.
Due to the serious nature of implicit pointer to integer conversions, implicit pointer to pointer conversions and undefined explicit pointer to pointer conversions, such conversions are flagged as errors by default. These conversion checks are not controlled by the global conversion analysis pragma above, but must be controlled by the relevant individual pragmas given in sections 3.2.2 and 4.5.
3.2.1. Integer to integer conversions
All integer to integer conversions are allowed in C, however some can result in a loss of accuracy and so may be usefully detected. For example, conversions from int to long never result in a loss of accuracy, but conversions from long to int may. The detection of these shortening conversions is controlled by:
#pragma TenDRA conversion analysis ( int-int ) status
Checks on explicit conversions and implicit conversions may be controlled independently using:
#pragma TenDRA conversion analysis ( int-int explicit ) status
and
#pragma TenDRA conversion analysis ( int-int implicit ) status
Objects of enumerated type are specified by the ISO C standard to be compatible with an implementation-defined integer type. However, assigning a value of an integral type other than an appropriate enumeration constant to an object of enumeration type is not really in keeping with the spirit of enumerations. The check to detect the implicit integer to enum type conversions which arise from such assignments is controlled using:
#pragma TenDRA conversion analysis ( int-enum implicit ) status
Note that only implicit conversions are flagged; if the conversion is made explicit, by using a cast, no errors are raised.
As usual, status must be replaced by on, warning or off in all the pragmas listed above.
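For example, with the int-enum implicit check enabled (the names here are hypothetical):

enum colour { red, green, blue } ;

enum colour c1 = 1 ;                   /* implicit int to enum conversion: flagged */
enum colour c2 = ( enum colour ) 1 ;   /* explicit cast: not flagged */
enum colour c3 = green ;               /* enumeration constant: always accepted */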
The interaction of the integer conversion checks with the integer promotion and arithmetic rules is an extremely complex issue, which is further discussed in Chapter 4. The directives:
#pragma TenDRA conversion analysis (int-int explicit) on
#pragma TenDRA conversion analysis (int-int implicit) on
will check for unsafe explicit or implicit conversions between arithmetic types. Similarly conversions between pointers and arithmetic types can be checked using:
#pragma TenDRA conversion analysis (int-pointer explicit) on
#pragma TenDRA conversion analysis (int-pointer implicit) on
or equivalently:
#pragma TenDRA conversion analysis (pointer-int explicit) on
#pragma TenDRA conversion analysis (pointer-int implicit) on
3.2.2. Pointer to integer and integer to pointer conversions
Integer to pointer and pointer to integer conversions are generally unportable and should always be specified by means of an explicit cast. The exception is that the integer zero and null pointers are deemed to be inter-convertible. As in the integer to integer conversion case, explicit and implicit pointer to integer and integer to pointer conversions may be controlled separately using:
#pragma TenDRA conversion analysis ( int-pointer explicit ) status
and
#pragma TenDRA conversion analysis ( int-pointer implicit ) status
or both checks may be controlled together by:
#pragma TenDRA conversion analysis ( int-pointer ) status
where status may be on, warning or off, and pointer-int may be substituted for int-pointer.
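For example, with these checks enabled (hypothetical names):

void g ( int *p )
{
    long n = ( long ) p ;   /* explicit pointer to integer conversion: reported when enabled */
    p = ( int * ) n ;       /* explicit integer to pointer conversion: likewise */
    p = 0 ;                 /* the integer zero is a null pointer constant: always permitted */
}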
3.2.3. Pointer to pointer conversions
According to the ISO C standard, section 6.3.4, the only legal pointer to pointer conversions are explicit conversions between:
- a pointer to an object or incomplete type and a pointer to a different object or incomplete type. The resulting pointer may not be valid if it is improperly aligned for the type pointed to;
- a pointer to a function of one type and a pointer to a function of another type. If a converted pointer, used to call a function, has a type that is incompatible with the type of the called function, the behaviour is undefined.
Except for conversions to and from the generic pointer which are discussed below, all other conversions, including implicit pointer to pointer conversions, are extremely unportable.
All pointer to pointer conversions may be flagged as errors using:
#pragma TenDRA conversion analysis ( pointer-pointer ) status
Explicit and implicit pointer to pointer conversions may be controlled separately using:
#pragma TenDRA conversion analysis ( pointer-pointer explicit ) status
and
#pragma TenDRA conversion analysis ( pointer-pointer implicit ) status
where, as before, status may be on, warning or off.
Conversion between a pointer to a function type and a pointer to a non-function type is undefined by the ISO C standard and should generally be avoided. The checker can however be configured to treat function pointers as object pointers for conversion using:
#pragma TenDRA function pointer as pointer permit
Unless explicitly stated to the contrary, throughout the rest of the document where permit appears in a pragma statement it represents one of allow (allow the construct and do not produce errors), warning (allow the construct but produce warnings when it is detected), or disallow (produce errors if the construct is detected). Here, allow performs no checks on function pointer ↔ pointer conversions, warning produces a warning when such conversions are detected, and disallow produces an error for them.
The generic pointer, void *, is a special case. All conversions of pointers to object or incomplete types to or from a generic pointer are allowed. Some older dialects of C used char * as a generic pointer. This dialect feature may be allowed, allowed with a warning, or disallowed using the pragma:
#pragma TenDRA compatible type : char * == void * permit
where permit is allow, warning or disallow as before.
3.2.4. Additional conversions
There are some further variants which can be used to enable useful sets of conversion checks. For example:
#pragma TenDRA conversion analysis (int-int) on
enables both implicit and explicit arithmetic conversion checks. The directives:
#pragma TenDRA conversion analysis (int-pointer) on
#pragma TenDRA conversion analysis (pointer-int) on
#pragma TenDRA conversion analysis (pointer-pointer) on
are equivalent to their corresponding explicit forms (because the implicit forms are illegal by default). The directive:
#pragma TenDRA conversion analysis on
is equivalent to the four directives just given. It enables checks on implicit and explicit arithmetic conversions, explicit arithmetic to pointer conversions and explicit pointer conversions.
The default settings for these checks are determined by the implicit and explicit conversions allowed in C++. Note that there are differences between the conversions allowed in C and C++. For example, an arithmetic type can be converted implicitly to an enumeration type in C, but not in C++. The directive:
#pragma TenDRA conversion analysis (int-enum implicit) on
can be used to control the status of this conversion. The level of severity for an error message arising from such a conversion is the maximum of the severity set by this directive and that set by the int-int implicit
directive above.
The implicit pointer conversions described above do not include conversions to and from the generic pointer void *
, which have their own controlling directives. A pointer of type void *
can be converted implicitly to another pointer type in C but not in C++; this is controlled by the directive:
#pragma TenDRA++ conversion analysis (void*-pointer implicit) on
The reverse conversion, from a pointer type to void *
is allowed in both C and C++, and has a controlling directive:
#pragma TenDRA++ conversion analysis (pointer-void* implicit) on
In ISO C and C++, a function pointer can only be cast to other function pointers, not to object pointers or void *
. Many dialects however allow function pointers to be cast to and from other pointers. This behaviour can be controlled using the directive:
#pragma TenDRA function pointer as pointer allow
which causes function pointers to be treated in the same way as all other pointers.
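For example, with this dialect feature allowed, a conversion such as the following (hypothetical names) is accepted; under the default rules it would be reported:

void ( *fp ) ( void ) ;
void *vp = ( void * ) fp ;   /* function pointer converted as if it were an object pointer */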
The integer conversion checks described above only apply to unsafe conversions. A simple-minded check for shortening conversions is not adequate, as is shown by the following example:
char a = 1, b = 2 ; char c = a + b ;
the sum a + b is evaluated as an int which is then shortened to a char. Any check which does not distinguish this sort of "safe" shortening conversion from unsafe shortening conversions such as:
int a = 1, b = 2 ; char c = a + b ;
is not likely to be very useful. The producer therefore associates two types with each integral expression; the first is the normal, representation type and the second is the underlying, semantic type. Thus in the first example, the representation type of a + b is int, but semantically it is still a char. The conversion analysis is based on the semantic types.
The C producer supports a directive:
#pragma TenDRA keyword identifier for type representation
whereby a keyword can be introduced which can be used to explicitly declare a type with given representation and semantic components. Unfortunately this makes the C++ grammar ambiguous, so it has not yet been implemented in the C++ producer.
It is possible to allow individual conversions by means of conversion tokens. A procedure token which takes one rvalue expression program parameter and returns an rvalue expression, such as:
#pragma token PROC ( EXP : t : ) EXP : s : conv #
can be regarded as mapping expressions of type t
to expressions of type s
. The directive:
#pragma TenDRA conversion identifier-list allow
can be used to nominate such a token as a conversion token. That is to say, if the conversion, whether explicit or implicit, from t
to s
cannot be done by other means, it is done by applying the token conv
, so:
t a ; s b = a ; // maps to conv ( a )
Note that, unlike conversion functions, conversion tokens can be applied to any types.
3.2.5. Example: 64-bit portability issues
64-bit machines form the "next frontier" of program portability. Most of the problems involved in 64-bit portability are type conversion problems. The assumptions that were safe on a 32-bit machine are not necessarily true on a 64-bit machine: int may not be the same size as long, pointers may not be the same size as int, and so on. This example illustrates the way in which the checker's conversion analysis tests can detect potential 64-bit portability problems.
Consider the following code:
#include <stdio.h>

void print ( string, offset, scale )
char *string;
unsigned int offset;
int scale;
{
    string += ( scale * offset );
    ( void ) puts ( string );
    return;
}

int main ()
{
    char *s = "hello there";
    print ( s + 4, 2U, -2 );
    return ( 0 );
}
This appears to be fairly simple: the offset of 2U scaled by -2 cancels out the offset in s + 4, so the program just prints hello there. Indeed, this is what happens on most machines. When ported to a particular 64-bit machine, however, it core dumps. The fairly subtle reason is that the composite offset, scale * offset, is actually calculated as an unsigned int by the ISO C arithmetic conversion rules. So the answer is not -4. Strictly speaking it is undefined, but on virtually all machines it will be UINT_MAX - 3. The fact that adding this offset to string is equivalent to adding -4 is only true on machines on which pointers have the same size as unsigned int. If a pointer contains 64 bits and an unsigned int contains 32 bits, the result is 2^32 bytes out.
So the error occurs because of the failure to spot that the offset being added to string is unsigned. All mixed integer type arithmetic involves some argument conversion. In the case above, scale is converted to an unsigned int and that is multiplied by offset to give an unsigned int result. If the implicit int->int conversion checks (see §3.2.1) are enabled, this conversion is detected and the problem may be avoided.
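A hedged fix, assuming long is wide enough to hold the product, is to perform the offset arithmetic in a signed type before applying it to the pointer:

string += ( long ) scale * ( long ) offset ;   /* -2L * 2L == -4L, on 32 and 64 bit machines alike */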
3.3. Function type checking
The importance of function type checking in C lies in the conversions which can result from type mismatches between the arguments in a function call and the parameter types assumed by its definition or between the specified type of the function return and the values returned within the function definition. Until the introduction of function prototypes into ISO standard C, there was little scope for detecting the correct typing of functions. Traditional C allows for absolutely no type checking of function arguments, so that totally bizarre functions, such as:
int f ( n ) int n ; { return ( f ( "hello", "there" ) ) ; }
are allowed, although their effect is undefined. However, the move to fully prototyped programs has been relatively slow. This is partially due to an understandable reluctance to change existing, working programs, but the desire to maintain compatibility with existing C compilers, some of which still do not support prototypes, is also a powerful factor. Prototypes are allowed in the checker's default mode but tdfc2 can be configured to allow, allow with a warning or disallow prototypes, using:
#pragma TenDRA prototype permit
where permit is allow, disallow or warning.
Even if prototypes are not supported the checker has a facility, described below, for detecting incorrectly typed functions.
3.3.1. Type checking non-prototyped functions
The checker offers a method for applying prototype-like checks to traditionally defined functions, by introducing the concept of weak prototypes. A weak prototype contains function parameter type information, but has none of the automatic argument conversions associated with a normal prototype. Instead weak prototypes imply the usual argument promotion passing rules for non-prototyped functions. The type information required for a weak prototype can be obtained in three ways:
- A weak prototype may be declared using the syntax:

  int f WEAK ( char, char * ) ;

  where WEAK represents any keyword which has been introduced using:

  #pragma TenDRA keyword WEAK for weak

  An alternative definition of the keyword must be provided for other compilers. For example, the following definition would make system compilers interpret weak prototypes as normal (strong) prototypes:

  #ifdef __TenDRA__
  #pragma TenDRA keyword WEAK for weak
  #else
  #define WEAK
  #endif

  The difference between conventional prototypes and weak prototypes can be illustrated by considering the normal prototype for f:

  int f ( char, char * ) ;

  When the prototype is present, the first argument to f would be passed as a char. Using the weak prototype, however, results in the first argument being passed as the integral promotion of char, that is to say, as an int.

  There is one limitation on the declaration of weak prototypes: declarations of the form:

  int f WEAK () ;

  are not allowed. If a function has no arguments, this should be stated explicitly as:

  int f WEAK ( void ) ;

  whereas if the argument list is not specified, weak prototypes should be avoided and a traditional declaration used instead:

  extern int f ();

  The checker may be configured to allow, allow with a warning or disallow weak prototype declarations using:

  #pragma TenDRA prototype ( weak ) permit

  where permit is replaced by allow, warning or disallow as appropriate. Weak prototypes are not permitted in the default mode.
Information can be deduced from a function definition. For example, the function definition:
int f ( c, s ) char c ; char *s ; { ... }
is said to have weak prototype:
int f WEAK( char, char * ) ;
The checker automatically constructs a weak prototype for each traditional function definition it encounters and if the weak prototype analysis mode is enabled (see below) all subsequent calls of the function are checked against this weak prototype.
For example, in the bizarre function in §3.3, the weak prototype:
int f WEAK ( int ) ;
is constructed for f. The subsequent call to
f
:f ( "hello", "there" ) ;
is then rejected by comparison with this weak prototype - not only is
f
called with the wrong number of arguments, but the first argument has a type incompatible with (the integral promotion of)int
. -
Information may be deduced from the calls of a function. For example, in:
extern void f (); void g () { f ( 3 ); f ( "hello" ); }
we can infer from the first call of
f
thatf
takes one integral argument. We cannot deduce the type of this argument, only that it is an integral type whose promotion isint
(since this is how the argument is passed). We can therefore infer a partial weak prototype forf
:void f WEAK ( t );
for some integral type t which promotes to int. Similarly, from the second call of f we can infer the weak prototype:
void f WEAK ( char * );
(the argument passing rules are much simpler in this case). Clearly the two inferred prototypes are incompatible, so an error is raised.
Note that prototypes inferred from function calls alone cannot ensure that the uses of the function within a source file are correct, merely that they are consistent. The presence of an explicit function declaration or definition is required for a definitive "right" prototype.
Null pointers cause particular problems with weak prototypes inferred from function calls. For example, in:

#include <stdio.h>

extern void f ();
void g () { f ( "hello" ); f ( NULL ); }

the argument in the first call of f is char * whereas in the second it is int (because NULL is defined to be 0). Whereas NULL can be converted to char *, it is not necessarily passed to procedures in the same way (for example, it may be that pointers have 64 bits and ints have 32 bits). It is almost always necessary to cast NULL to the appropriate pointer type in weak procedure calls.
Functions for which explicitly declared weak prototypes are provided are always type-checked by the checker. Weak prototypes deduced from function declarations or calls are used for type checking if the weak prototype analysis mode is enabled using:
#pragma TenDRA weak prototype analysis status
where status is one of on, warning or off as usual. Weak prototype analysis is not performed in the default mode.
There is also an equivalent command line option of the form -X:weak_proto=state, where state can be check, warn or dont.
This section ends with two examples which demonstrate some of the less obvious consequences of weak prototype analysis.
3.3.1.1. Example 1: An obscure type mismatch
As stated above, the promotion and conversion rules for weak prototypes are precisely those for traditionally declared and defined functions. Consider the program:
void f ( n )
long n;
{
    printf ( "%ld\n", n );
}

void g ()
{
    f ( 3 );
}
The literal constant 3 is an int and hence is passed as such to f. f is however expecting a long, which can lead to problems on some machines. Introducing a strong prototype declaration of f for those compilers which understand them:
#ifdef __STDC__
void f ( long );
#endif
will produce correct code: the arguments to a function declared with a prototype are converted to the appropriate types, so that the literal is actually passed as 3L. This solves the problem for compilers which understand prototypes, but does not actually detect the underlying error. Weak prototypes, because they use the traditional argument passing rules, do detect the error. The constructed weak prototype:
void f WEAK ( long ) ;
conveys the type information that f is expecting a long, but accepts the function arguments as passed rather than converting them. Hence, the error of passing an int argument to a function expecting a long is detected.
Many programs, seeking to have prototype checks while preserving compilability with non-prototype compilers, adopt a compromise approach of traditional definitions plus prototype declarations for those compilers which understand them, as in the example above. While this ensures correct argument passing in the prototype case, as the example shows it may obscure errors in the non-prototype case.
3.3.1.2. Example 2: Weak prototype checks in defined programs
In most cases a program which fails to compile with the weak prototype analysis enabled is undefined. ISO standard C does however contain an anomalous rule on equivalence of representation. For example, in:
extern void f () ; void g () { f ( 3 ) ; f ( 4U ) ; }
the TenDRA checker detects an error: in one instance f is being passed an int, whereas in the other it is being passed an unsigned int. However, the ISO C standard states that, for values which fit into both types, the representation of a number as an int is equal to that as an unsigned int, and that values with the same representation are interchangeable in procedure arguments. Thus the program is defined. The justification for raising an error or warning for this program is that the prototype analysis is based on types, not some weaker notion of equivalence of representation. The program may be defined, but it is not type correct.
Another case in which a program is defined, but not correct, is where an unnecessary extra argument is passed to a function. For example, in:
void f ( a )
int a ;
{
    printf ( "%d\n", a ) ;
}

void g ()
{
    f ( 3, 4 ) ;
}
the call of f is defined, but is almost certainly a mistake.
3.3.2. Weak function prototypes
The C producer supports a concept, weak prototypes, whereby type checking can be applied to the arguments of a non-prototype function. This checking can be enabled using the directive:
#pragma TenDRA weak prototype analysis on
The concept of weak prototypes is not applicable to C++, where all functions are prototyped. The C++ producer does allow the syntax for explicit weak prototype declarations, but treats them as if they were normal prototypes. These declarations are denoted by means of a keyword, WEAK
say, introduced by the directive:
#pragma TenDRA keyword identifier for weak
preceding the (
of the function declarator. The directives:
#pragma TenDRA prototype allow
#pragma TenDRA prototype (weak) allow
which can be used in the C producer to warn of prototype or weak prototype declarations, are similarly ignored by the C++ producer.
The C producer also allows the directives:
#pragma TenDRA argument type-id as type-id
#pragma TenDRA argument type-id as ...
#pragma TenDRA extra ... allow
#pragma TenDRA incompatible promoted function argument allow
which control the compatibility of function types. These directives are ignored by the C++ producer (some of them would make sense in the context of C++ but would over-complicate function overloading).
3.3.3. printf and scanf argument checking
The C producer includes a number of checks that the arguments in a call to a function in the printf
or scanf
families match the given format string. The check is implemented by using the directives:
#pragma TenDRA type identifier for ... printf
#pragma TenDRA type identifier for ... scanf
to introduce a type representing a printf
or scanf
format string. For most purposes this type is treated as const char *
, but when it appears in a function declaration it alerts the producer that any extra arguments passed to that function should match the format string passed as the corresponding argument. The TenDRA API headers conditionally declare printf
, scanf
and similar functions in something like the form:
#ifdef __NO_PRINTF_CHECKS
typedef const char *__printf_string ;
#else
#pragma TenDRA type __printf_string for ... printf
#endif

int printf ( __printf_string, ... ) ;
int fprintf ( FILE *, __printf_string, ... ) ;
int sprintf ( char *, __printf_string, ... ) ;
These declarations can be skipped, effectively disabling this check, by defining the __NO_PRINTF_CHECKS macro.
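So, to disable the checks for a particular translation unit, it suffices to define the macro before the header is included:

#define __NO_PRINTF_CHECKS
#include <stdio.h>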
These printf and scanf format string checks have not yet been implemented in the C++ producer due to the presence of an alternative, type checked, I/O package, namely <iostream>. The format string types are simply treated as const char *.
3.3.4. Checking printf strings
Normally functions which take a variable number of arguments offer only limited scope for type checking. For example, given the prototype:
int execl ( const char *, const char *, ... ) ;
the first two arguments may be checked, but we have no hold on any subsequent arguments (in fact in this example they should all be const char *, but C does not allow this information to be expressed). Two classes of functions of this form, namely the printf and scanf families, are so common that they warrant special treatment. If one of these functions is called with a constant format string, then it is possible to use this string to deduce the types of the extra arguments that it is expect ing. For example, in:
printf ( "%ld", 4 ) ;
the format string indicates that printf is expecting a single additional argument of type long. We can therefore deduce a quasi-prototype which this particular call to printf should conform to, namely:
int printf ( const char *, long ) ;
In fact this is a mixture of a strong prototype and a weak prototype. The first argument comes from the actual prototype of printf, and hence is strong. All subsequent arguments correspond to the ellipsis part of the printf prototype, and are passed by the normal promotion rules. Hence the long component of the inferred prototype is weak (see 3.3.1). This means that the error in the call to printf, where the integer literal is passed as an int when a long is expected, is detected.
In order for this check to take place, the function declaration needs to tell the checker that the function is like printf. This is done by introducing a special type, PSTRING say, to stand for a printf string, using:
#pragma TenDRA type PSTRING for ... printf
For most purposes this is equivalent to:
typedef const char *PSTRING;
except that when a function declaration:
int f ( PSTRING, ... );
is encountered, the checker knows to deduce the types of the arguments corresponding to the ... from the PSTRING argument (the precise rules it applies are those set out in the XPG4 definition of fprintf). If this mechanism is used to apply printf style checks to user defined functions, an alternative definition of PSTRING for conventional compilers must be provided. For example:
#ifdef __TenDRA__
#pragma TenDRA type PSTRING for ... printf
#else
typedef const char *PSTRING;
#endif
There are similar rules with scanf in place of printf.
The TenDRA descriptions of the standard APIs use this mechanism to describe those functions, namely printf, fprintf and sprintf, and scanf, fscanf and sscanf, which are of these forms. This means that the checks are switched on for these functions by default. However, these descriptions are under the control of a macro, __NO_PRINTF_CHECKS, which, if defined before stdio.h is included, effectively switches the checks off. This macro is defined in the start-up files for certain checking modes, so that the checks are disabled in these modes (see chapter 2). The checks can be enabled in these cases by #undef'ing the macro before including stdio.h. There are equivalent command-line options to tdfc2 of the form -X:printf=state, where state can be check or dont, which respectively undefine and define this macro.
3.3.5. Function return checking
Function returns normally present no difficulties. The return value is converted, as if by assignment, to the function return type, so that the problem is essentially one of type conversion (see 3.2). There is however one anomalous case. A plain return statement, without a return value, is allowed in functions returning a non-void type, the value returned being undefined. For example, in:
int f ( int c ) { if ( c ) return ( 1 ) ; return ; }
the value returned when c is zero is undefined. The test for detecting such void returns is controlled by:
#pragma TenDRA incompatible void return permit
where permit may be allow, warning or disallow as usual.
There are also equivalent command line options to tdfc2 of the form -X:void_ret=state, where state can be check, warn or dont. Incompatible void returns are allowed in the default mode and, of course, plain return statements in functions returning void are always legal.
This check also detects functions which do not contain a return statement, but fall out of the bottom of the function as in:
int f ( int c ) { if ( c ) return ( 1 ) ; }
Occasionally it may be the case that such a function is legal, because the end of the function is never reached. Unreachable code is discussed in section 4.1.
3.3.6. Overloaded functions
Older dialects of C++ did not report ambiguous overloaded function resolutions, but instead resolved the call to the first of the most viable candidates to be declared. This behaviour can be controlled using the directive:
#pragma TenDRA++ ambiguous overload resolution allow
There are occasions when the resolution of an overloaded function call is not clear. The directive:
#pragma TenDRA++ overload resolution allow
can be used to report the resolution of any such call (whether explicit or implicit) where there is more than one viable candidate.
An interesting consequence of compiling C++ in a target independent manner is that certain overload resolutions can only be determined at install-time. For example, in:
int f ( int ) ;
int f ( unsigned int ) ;
int f ( long ) ;
int f ( unsigned long ) ;

int a = f ( sizeof ( int ) ) ;	// which f?
the type of the sizeof operator, size_t, is target dependent, but its promotion must be one of the types int, unsigned int, long or unsigned long. Thus the call to f always has a unique resolution, but what it is is target dependent. The equivalent directives:
#pragma TenDRA++ conditional overload resolution allow
#pragma TenDRA++ conditional overload resolution (complete) allow
can be used to warn about such target dependent overload resolutions. By default, such resolutions are only allowed if there is a unique resolution for each possible implementation of the argument types (note that, for simplicity, the possibility of long long implementation types is ignored). The directive:
#pragma TenDRA++ conditional overload resolution (incomplete) allow
can be used to allow target dependent overload resolutions which only have resolutions for some of the possible implementation types (if one of the f declarations above was removed, for example). If the implementation does not match one of these types then an install-time error is given.
There are restrictions on the set of candidate functions involved in a target dependent overload resolution. Most importantly, it should be possible to bring their return types to a common type, as if by a series of ?: operations. This common type is the type of the target dependent call. By this means, target dependent types are prevented from propagating further out into the program. Note that since sets of overloaded functions usually have the same semantics, this does not usually present a problem.
3.4. Overriding type checking
There are several commonly used features of C, some of which are even allowed by the ISO C standard, which can circumvent or hinder the type-checking of a program. The checker may be configured either to enforce the absence of these features or to support them with or without a warning, as described below.
3.4.1. Implicit Function Declarations
The ISO C standard states that any undeclared function is implicitly assumed to return int. For example, in ISO C:
int f ( int c )
{
    return ( g( c ) + 1 ) ;
}
the undeclared function g is inferred to have a declaration:
extern int g () ;
This can potentially lead to program errors. The definition of f would be valid if g actually returned double, but incorrect code would be produced. Again, an explicit declaration might give us more information about the function argument types, allowing more checks to be applied.
Therefore the best chance of detecting bugs in a program and ensuring its portability comes from having each function declared before it is used. This means detecting implicit declarations and replacing them by explicit declarations. By default implicit function declarations are allowed; however, the pragma:
#pragma TenDRA implicit function declaration status
may be used to determine how tdfc2 handles implicit function declarations. Status is replaced by on to allow implicit declarations, warning to allow implicit declarations but to produce a warning when they occur, or off to prevent implicit declarations and raise an error where they would normally be used.
(There are also equivalent command-line options to tcc of the form -X:implicit_func=state, where state can be check, warn or dont.)
This test assumes an added significance in API checking. If a programmer wishes to check that a certain program uses nothing outside the POSIX API, then implicitly declared functions are a potential danger area. A function from outside POSIX could be used without being detected because it has been implicitly declared. Therefore, the detection of implicitly declared functions is vital to rigorous API checking.
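For example, in the following sketch (the function name vendor_spawn is invented for the example), an implicitly declared function from outside the target API would pass unnoticed if implicit declarations were allowed:

/* No declaration of vendor_spawn is in scope. With implicit
   declarations allowed it is assumed to return int, and its
   absence from the checked API goes undetected. */
int start ( const char *cmd )
{
    return ( vendor_spawn ( cmd ) );
}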
3.4.2. Function Parameters
Many systems pass function arguments of differing types in the same way and programs are sometimes written to take advantage of this feature. The checker has a number of options to resolve type mismatches which may arise in this way and would otherwise be flagged as errors:
- Type-type compatibility
When comparing function prototypes for compatibility, the function parameter types must be compared. If the parameter types would otherwise be incompatible, they are treated as compatible if they have previously been introduced with a type-type parameter compatibility pragma (see the sketch after this list), i.e.:
#pragma TenDRA argument type-name as type-name
where type-name is the name of any type. This pragma is transitive and the second type in the pragma is taken to be the final type of the parameter.
- Type-ellipsis compatibility
Two function prototypes with different numbers of arguments are compatible if:
- both prototypes have an ellipsis;
- each parameter type common to both prototypes is compatible;
- each extra parameter type in the prototype with more parameters is either specified in a type-ellipsis compatibility pragma, or is type-type compatible (see above) with a type that is specified in a type-ellipsis compatibility pragma.
Type-ellipsis compatibility is introduced using the pragma:
#pragma TenDRA argument type-name as ...
where again type-name is the name of any type.
- Ellipsis compatibility
If, when comparing two function prototypes for compatibility, one has an ellipsis and the other does not, but otherwise the two types would be compatible, then if an `extra' ellipsis is allowed, the types are treated as compatible. The pragma controlling ellipsis compatibility is:
#pragma TenDRA extra ... permit
where permit may be allow, disallow or warning as usual.
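The following minimal sketch shows the type-type form in use; the choice of long and int is purely illustrative, and the mismatched declarations are the kind of legacy code the pragma is intended to accept:

#ifdef __TenDRA__
#pragma TenDRA argument long as int
#endif

int h ( long );
int h ( int );	/* accepted: long and int parameters are
		   treated as compatible under the pragma */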
3.4.3. Incompatible promoted function arguments
Mixing the use of prototypes with old-fashioned function definitions can result in incorrect code. For example, in the program below the function argument promotion rules are applied to the definition of f, making it incompatible with the earlier prototype (a is converted to the integer promotion of char, i.e. int).
int f ( char );

int f ( a )
char a;
{
    ...
}
An incompatible type error is raised in the default checking mode. The check for incompatible types which arise from mixtures of prototyped and non-prototyped function declarations and definitions is controlled using:
#pragma TenDRA incompatible promoted function argument permit
where permit may be replaced by allow, warning or disallow as normal. The parameter type in the resulting function type is the promoted parameter type.
4. Control Flow Analysis
- 4.1. Unreachable code analysis
- 4.2. Case fall through
- 4.3. Enumerations controlling switch statements
- 4.4. Empty if statements
- 4.5. Use of assignments as control expressions
- 4.6. Constant control expressions
- 4.7. Conditional and iteration statements
- 4.8. Exception analysis
The checker has a number of features which can be used to help track down potential programming errors relating to the use of variables within a source file and the flow of control through the program. Examples of this are detecting sections of unused code, and flagging expressions that depend upon the order of evaluation where the order is not defined.
4.1. Unreachable code analysis
Consider the following function definition:
int f ( int n )
{
    if ( n ) {
        return ( 1 );
    } else {
        return ( 0 );
    }
    return ( 2 );
}
The final return statement is redundant since it can never be reached. The test for unreachable code is controlled by:
#pragma TenDRA unreachable code permit
where permit is replaced by disallow
to give an error if unreached code is detected, warning
to give a warning, or allow
to disable the test (this is the default).
There are also equivalent command-line options to tcc of the form -X:unreached=state, where state can be check, warn or dont.
Annotations to the code in the form of user-defined keywords may be used to indicate that a certain statement is genuinely reached or unreached. These keywords are introduced using:
#pragma TenDRA keyword REACHED for set reachable
#pragma TenDRA keyword UNREACHED for set unreachable
The statement REACHED then indicates that this portion of the program is actually reachable, whereas UNREACHED indicates that it is unreachable. For example, one way of fixing the program above might be to say that the final return is reachable (this is a blatant lie, but never mind). This would be done as follows:
int f ( int n )
{
    if ( n ) {
        return ( 1 );
    } else {
        return ( 0 );
    }
    REACHED
    return ( 2 );
}
An example of the use of UNREACHED might be in the function below, which falls out of the bottom without a return statement. We might know that, because it is never called with c equal to zero, the end of the function is never reached. This could be indicated as follows:
int f ( int c )
{
    if ( c ) return ( 1 );
    UNREACHED
}
As always, if new keywords are introduced into a program then definitions need to be provided for conventional compilers. In this case, this can be done as follows:
#ifdef __TenDRA__
#pragma TenDRA keyword REACHED for set reachable
#pragma TenDRA keyword UNREACHED for set unreachable
#else
#define REACHED
#define UNREACHED
#endif
The fact that certain functions, such as exit, do not return a value can be exploited in the flow analysis routines. The equivalent directives:
#pragma TenDRA bottom identifier
#pragma TenDRA++ type identifier for bottom
can be used to introduce a typedef declaration for the type, bottom, returned by such functions. The TenDRA API headers declare exit and similar functions in this way, for example:
#pragma TenDRA bottom __bottom
__bottom exit ( int ) ;
__bottom abort ( void ) ;
The bottom type is compatible with void in function declarations to allow such functions to be redeclared in their conventional form.
4.2. Case fall through
Another flow analysis check concerns fall through in case statements. For example, in:
void f ( int n )
{
    switch ( n ) {
        case 1 : puts ( "one" );
        case 2 : puts ( "two" );
    }
}
the control falls through from the first case to the second. This may be due to an error in the program (a missing break statement), or be deliberate. Even in the latter case, the code is not particularly maintainable as it stands - there is always the risk when adding a new case that it will interrupt this carefully contrived flow. Thus it is customary to comment all case fall throughs to serve as a warning.
In the default mode, the TenDRA C checker ignores all such fall throughs. A check to detect fall through in case statements is controlled by:
#pragma TenDRA fall into case permit
where permit is allow (no errors), warning (warn about case fall through) or disallow (raise errors for case fall through).
There are also equivalent command-line options to tcc of the form -X:fall_thru=state, where state can be check, warn or dont.
Deliberate case fall throughs can be indicated by means of a keyword, which has been introduced using:
#pragma TenDRA keyword FALL_THROUGH for fall into case
Then, if the example above were deliberate, this could be indicated by:
void f ( int n ) { switch ( n ) { case 1 : puts ( "one" ); FALL_THROUGH case 2 : puts ( "two" ); } }
Note that FALL_THROUGH is inserted between the two cases, rather than at the end of the list of statements following the first case.
If a keyword is introduced in this way, then an alternative definition needs to be introduced for conventional compilers. This might be done as follows:
#ifdef __TenDRA__
#pragma TenDRA keyword FALL_THROUGH for fall into case
#else
#define FALL_THROUGH
#endif
4.3. Enumerations controlling switch statements
Enumerations are commonly used as control expressions in switch statements. When case labels for some of the enumeration constants belonging to the enumeration type do not exist and there is no default label, the switch statement has no effect for certain possible values of the control expression. Checks to detect such switch statements are controlled by:
#pragma TenDRA enum switch analysis status
where status is on (raise an error), warning (produce a warning), or off (the default mode, in which no errors are produced).
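For example, a minimal sketch of the kind of switch statement this check reports (the enumeration and function are invented for the example):

#include <stdio.h>

enum colour { RED, GREEN, BLUE };

void describe ( enum colour c )
{
    switch ( c ) {	/* reported: no case for BLUE and no default */
        case RED :   puts ( "red" );   break;
        case GREEN : puts ( "green" ); break;
    }
}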
4.4. Empty if statements
Consider the following C statements:
if ( var1 == 1 ) ;
var2 = 0 ;
The conditional statement serves no purpose here and the second statement will always be executed regardless of the value of var1. This is almost certainly not what the programmer intended to write. A test for if statements with no body is controlled by:
#pragma TenDRA extra ; after conditional permit
with the usual allow (this is the default setting), warning and disallow options for permit.
4.5. Use of assignments as control expressions
Using the C assignment operator, =, when the equality operator == was intended is an extremely common problem. The pragma:
#pragma TenDRA assignment as bool permit
is used to control the treatment of assignments used as the controlling expression of a conditional statement or a loop, e.g.
if( var = 1 ) { ...
The options for permit are allow, warning and disallow. The default setting allows assignments to be used as control statements without raising an error.
4.6. Constant control expressions
Statements with constant control expressions are not really conditional at all since the value of the control statement can be evaluated statically. Although this feature is sometimes used in loops, relying on a break, goto or return statement to end the loop, it may be useful to detect all constant control expressions to check that they are deliberate. The check for statically constant control expressions is controlled using:
#pragma TenDRA const conditional permit
where permit may be replaced by disallow to give an error when constant control expressions are encountered, warning to replace the error by a warning, or the check may be switched off using allow (this is the default).
4.7. Conditional and iteration statements
The directive:
#pragma TenDRA const conditional allow
can be used to enable a check for constant expressions used in conditional contexts. A literal constant is allowed in the condition of a while, for or do statement to allow for such common constructs as:
while ( true ) {
    // while statement body
}
and target dependent constant expressions are allowed in the condition of an if statement, but otherwise constant conditions are reported according to the status of this check.
The common error of writing = rather than == in conditions can be detected using the directive:
#pragma TenDRA assignment as bool allow
which can be used to disallow such assignment expressions in contexts where a boolean is expected. The error message can be suppressed by enclosing the assignment within parentheses.
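For example, in the following sketch (the helper next_token is invented for the example) the assignment really is intended as the condition, and the extra parentheses suppress the diagnostic:

extern char *next_token ( void );	/* hypothetical lexer routine */

void scan ( void )
{
    char *p;
    while ( ( p = next_token () ) ) {	/* deliberate: not reported */
        /* process the token */ ;
    }
}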
Another common error associated with iteration statements, particularly with certain brace styles, is the accidental insertion of an extra semicolon as in:
for ( init ; cond ; step ) ; {
    // for statement body
}
The directive:
#pragma TenDRA extra ; after conditional allow
can be used to enable a check for such suspicious empty iteration statement bodies (it actually checks for ;{).
4.8. Exception analysis
The ISO C++ rules do not require exception specifications to be checked statically. This is to facilitate the integration of large systems where a single change in an exception specification could have ramifications throughout the system. However it is often useful to apply such checks, which can be enabled using the directive:
#pragma TenDRA++ throw analysis on
This detects any potentially uncaught exceptions and other exception problems. In the error messages arising from this check, an uncaught exception of type ... means that an uncaught exception of an unknown type (arising, for example, from a function without an exception specification) may be thrown. For example:
void f ( int ) throw ( int ) ;
void g ( int ) throw ( long ) ;
void h ( int ) ;

void e () throw ( int )
{
    f ( 1 ) ;	// OK
    g ( 2 ) ;	// uncaught 'long' exception
    h ( 3 ) ;	// uncaught '...' exception
}
5. Operator Analysis
5.1. Order of evaluation
The ISO C standard specifies certain points in the expression syntax at which all prior expressions encountered are guaranteed to have been evaluated. These positions are called sequence points and occur:
- after the arguments and function expression of a function call have been evaluated, but before the call itself;
- after the first operand of a logical && or || operator;
- after the first operand of the conditional operator, ?:;
- after the first operand of the comma operator;
- at the end of any full expression (a full expression may take one of the following forms: an initialiser; the expression in an expression statement; the controlling expression in an if, while, do or switch statement; each of the three optional expressions of a for statement; or the optional expression of a return statement).
Between two sequence points, however, the order in which the operands of an operator are evaluated, and the order in which side effects take place, is unspecified - any order which conforms to the operator precedence rules is permitted. For example:
var = i + arr[ i++ ] ;
may evaluate to different values on different machines, depending on which argument of the + operator is evaluated first. The checker can detect expressions which depend on the order of evaluation of sub-expressions between sequence points; these are flagged as errors or warnings when the variable analysis is enabled.
5.2. Operator precedence
The ISO C standard, section 6.3, provides a set of rules governing the order in which operators within expressions should be applied. These rules are said to specify the operator precedence and are summarised in the table below. Operators on the same line have the same precedence and the rows are in order of decreasing precedence. Note that the unary +, -, * and & operators have higher precedence than the binary forms and thus appear higher in the table.
The precedence of operators is not always intuitive and often leads to unexpected results when expressions are evaluated. A particularly common example is to write:
if ( var & TEST == 1 ) {
    ...
} else {
    ...
}
assuming that the control expression will be evaluated as:
( ( var & TEST ) == 1 )
However, the == operator has a higher precedence than the bitwise & operator, and the control expression is evaluated as:
( var & ( TEST == 1 ) )
which in general will give a different result.
Operators | Precedence |
---|---|
function call() [] -> . ++(postfix) --(postfix) | highest |
! ~ ++ -- + - * & (type) sizeof | |
* / % | |
+ (binary) - (binary) | |
<< >> | |
< <= > >= | |
== != | |
& | |
^ | |
| | |
&& | |
|| | |
?: | |
= += -= *= /= %= &= ^= |= <<= >>= | |
, | lowest |
The TenDRA C checker can be configured to flag expressions containing operators whose precedence is commonly confused. The check is switched off by default. The directive:
#pragma TenDRA operator precedence analysis on
can be used to enable a check for expressions where the operator precedence is not necessarily what might be expected. The intended precedence can be clarified by means of explicit parentheses. The precedence levels checked are as follows:
- && versus ||.
- << and >> versus binary + and -.
- Binary & versus binary +, -, ==, !=, >, >=, < and <=.
- ^ versus binary &, +, -, ==, !=, >, >=, < and <=.
- | versus binary ^, &, +, -, ==, !=, >, >=, < and <=.
Also checked are expressions such as a < b < c, which do not have their normal mathematical meaning. For example, in:
d = a << b + c ; // precedence is a << ( b + c )
the precedence is counter-intuitive, although strangely enough, it isn't in:
cout << b + c ; // precedence is cout << ( b + c )
Other dubious arithmetic operations can be checked for using the directive:
#pragma TenDRA integer operator analysis on
This includes checks for operations, such as division by a negative value, which are implementation dependent, and those, such as testing whether an unsigned value is less than zero, which serve no purpose.
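For example, a minimal sketch of the kind of expressions reported (the variable names are illustrative):

#include <stdlib.h>

void check ( unsigned int u, int n )
{
    if ( u < 0 ) {	/* always false: serves no purpose */
        abort ();
    }
    n = n / -2 ;	/* rounding is implementation dependent in C90 */
    ( void ) n ;
}

Similarly the directive: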
#pragma TenDRA++ pointer operator analysis on
checks for dubious pointer operations. This includes very simple bounds checking for arrays and checking that only the simple literal 0 is used in null pointer constants:
char *p = 1 - 1 ; // valid, but weird
The directive:
#pragma TenDRA integer overflow analysis on
is used to control the treatment of overflows in the evaluation of integer constant expressions. This includes the detection of division by zero.
5.3. Floating point equality
Due to the rounding errors that occur in the handling of floating point values, comparison for equality between two floating point values is a hazardous and unpredictable operation. Tests for equality of two floating point numbers are controlled by:
#pragma TenDRA floating equality permit
where permit is allow, warning or disallow. By default the check is switched off.
5.4. Operand of sizeof
According to the ISO C standard, section 6.3.3.4, the operand of the sizeof operator is not itself evaluated. If the operand has any side-effects these will not occur. When the variable analysis is enabled, the checker detects the use of expressions with side-effects in the operand of the sizeof operator.
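A minimal sketch of the kind of operand detected:

#include <stddef.h>

int i = 0;
size_t s = sizeof ( i++ );	/* reported: the operand is not
				   evaluated, so i is never incremented */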
6. Variable Analysis
- 6.1. Variable lifetime analysis
- 6.2. Modification between sequence points
- 6.3. Unused variables
- 6.4. Values set and not used
- 6.5. Variable which has not been set is used
- 6.6. Variable shadowing
- 6.7. Overriding the variable analysis
The variable analysis checks are controlled by:
#pragma TenDRA variable analysis status
where status is on, warning or off as usual. The checks are switched off in the default mode.
There are also equivalent command line options to tdfc2 of the form -X:variable=state, where state can be check, warn or dont.
The variable analysis is concerned with the evaluation of expressions and the use of local variables, including function arguments. Occasionally it may not be possible to statically perform a full analysis on an expression or variable and in these cases the messages produced indicate that there may be a problem. If a full analysis is possible a definite error or warning is produced. The individual checks are listed in sections 6.1 to 6.6, and section 6.7 describes the source annotations which can be used to fine-tune the variable analysis.
6.1. Variable lifetime analysis
The directive:
#pragma TenDRA variable analysis on
enables checks on the uses of automatic variables and function parameters. These checks detect:
- if a variable is not used in its scope;
- if the value of a variable is used before it has been assigned to;
- if a variable is assigned to twice without an intervening use;
- if a variable is assigned to twice without an intervening sequence point;
as illustrated by the variables a, b, c and d respectively in:
void f ()
{
    int a ;		// a never used
    int b ;
    int c = b ;		// b not initialised
    c = 0 ;		// c assigned to twice
    int d = 0 ;
    d = ++d ;		// d assigned to twice
}
The second, and more particularly the third, of these checks requires some fairly sophisticated flow analysis, so any hints which can be picked up from exhaustive switch statements etc. are likely to increase the accuracy of the errors detected.
In a non-static member function the various non-static data members are analysed as if they were automatic variables. It is checked that each member is initialised in a constructor. A common source of initialisation problems in a constructor is that the base classes and members are initialised in the canonical order of virtual bases, non-virtual direct bases and members in the order of their declaration, rather than in the order in which their initialisers appear in the constructor definition. Therefore a check that the initialisers appear in the canonical order is also applied.
It is possible to change the state of a variable during the variable analysis using the directives:
#pragma TenDRA set expression
#pragma TenDRA discard expression
The first asserts that the variable given by the expression has been assigned to; the second asserts that the variable is not used. An alternative way of expressing this is by means of keywords:
SET ( expression )
DISCARD ( expression )
introduced using the directives:
#pragma TenDRA keyword identifier for set
#pragma TenDRA keyword identifier for discard variable
respectively. These expressions can appear in expression statements and as the first argument of a comma expression.
The variable flow analysis checks have not yet been completely implemented. They may not detect errors in certain circumstances and for extremely convoluted code may occasionally give incorrect errors.
6.2. Modification between sequence points
The ISO C standard states that if an object is modified more than once, or is modified and accessed other than to determine the new value, between two sequence points, then the behaviour is undefined. Thus the result of:
var = arr[i++] + i++ ;
is undefined, since the value of i is being incremented twice between sequence points. This behaviour is detected by the variable analysis.
6.3. Unused variables
As part of the variable analysis, a simple test is applied to each local variable at the end of its scope to determine whether it has been used in that scope. For example, in:
int f ( int n )
{
    int r;
    return ( 0 );
}
both the function argument n and the local variable r are unused.
6.4. Values set and not used
This is a more complex test since it is applied to every instance of setting the variable. For example, in:
int f ( int n )
{
    int r = 1;
    r = 5;
    return ( r );
}
r is first set to 1, and this value is not used before it is overwritten by 5 (the second value is used however). This test requires some flow analysis. For example, if the program is modified to:
int f ( int n )
{
    int r = 1;
    if ( n == 3 ) {
        r = 5;
    }
    return ( r );
}
the initial value of r is used when n != 3, so no error is detected. However in:
int f ( int n )
{
    int r = 1;
    if ( n == 3 ) {
        r = 5;
    } else {
        r = 6;
    }
    return ( r );
}
the initial value of r is overwritten regardless of the result of the conditional, and hence is unused.
6.5. Variable which has not been set is used
This test also requires some flow analysis, for example in:
int f ( int n )
{
    int r;
    if ( n == 3 ) {
        r = 5;
    }
    return ( r );
}
the use of the variable r as a return value is reported because there are paths leading to this statement in which r is not set (i.e. when n != 3). However, in:
int f ( int n )
{
    int r;
    if ( n == 3 ) {
        r = 5;
    } else {
        r = 6;
    }
    return ( r );
}
r is always set before it is used, so no error is detected.
6.6. Variable shadowing
It is quite legal in C to have a variable in an inner scope with the same name as a variable in an outer scope. These variables are distinct and, whilst in the inner scope, the declaration in the outer scope is not visible - it is shadowed by the local variable of the same name. Confusion can arise if this was not what the programmer intended. The checker can therefore be configured to detect shadowing in three cases: a local variable shadowing a global variable; a local variable shadowing a local variable with a wider scope; and a local variable shadowing a typedef name, by using:
#pragma TenDRA variable hiding analysis status
If status is on, an error is raised when a local variable that shadows another variable is declared; if warning is used, the error is replaced by a warning; and the off option restores the default behaviour (shadowing is permitted and no errors are produced). In C++, enabling this check also reports local variable declarations which hide data members in member functions.
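For example, a minimal sketch of the kinds of shadowing reported:

int n;			/* file scope */

void f ( void )
{
    int n = 0;		/* hides the global n: reported */
    {
        int n = 1;	/* hides the outer local n: reported */
        ( void ) n;
    }
    ( void ) n;
}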
6.7. Overriding the variable analysis
Although many of the problems discovered by the variable analysis are genuine mistakes, some may be as the result of deliberate decisions by the program writer. In this case, more information needs to be provided to the checker to convey the programmer's intentions. Four constructs are provided for this purpose: the discard variable, the set variable, the exhaustive switch and the non-returning function.
6.7.1. Discarding variables
Actively discarding a variable counts as a use of that variable in the variable analysis, and so can be used to suppress messages concerning unused variables and values assigned to variables. There are two distinct methods to indicate that the variable x is to be discarded. The first uses a pragma:
#pragma TenDRA discard x;
which the checker treats as if it were a C statement, ending in a semicolon. Having a statement which is noticed by one compiler but ignored by another can lead to problems. For example, in:
if ( n == 3 )
#pragma TenDRA discard x;
puts ( "n is three" );
tdfc2 believes that x is discarded if n == 3 and that the message is always printed, whereas other compilers will ignore the #pragma statement and think that the message is printed if n == 3. An alternative, in many ways neater, solution is to introduce a new keyword for discarding variables. For example, to introduce the keyword DISCARD for this purpose, the pragma:
#pragma TenDRA keyword DISCARD for discard variable
should be used. The variable x can then be discarded by means of the statement:
DISCARD ( x );
A dummy definition for DISCARD needs to be given for conventional compilers in order to maintain compilability with them. For example, a complete definition of DISCARD might be:
#ifdef __TenDRA__
#pragma TenDRA keyword DISCARD for discard variable
#else
#define DISCARD(x) (( void ) 0 )
#endif
Discarding a variable changes its assignment state to unset, so that any subsequent uses of the variable, without an intervening assignment to it, lead to a variable used before being set error. This feature can be exploited if the same variable is used for distinct purposes in different parts of its scope, by causing the variable analysis to treat the different uses separately. For example, in:
void f ( void )
{
    int i = 0;
    while ( i++ < 10 ) {
        puts ( "hello" );
    }
    while ( i++ < 10 ) {
        puts ( "goodbye" );
    }
}
which is intended to print both messages ten times, the two uses of i as a loop counter are independent - they could have been implemented with different variables. By discarding i after the first loop, the second loop can be analysed separately. In this way, the error of failing to reset i to 0 can be detected.
6.7.2. Setting variables
In addition to discarding variables, it is also possible to set them. In deliberately setting a variable, the programmer is telling the checker to assume that some value will always have been assigned to the variable by that point, so that any variable used without being set errors can be suppressed. This construct is particularly useful in programs with complex flow control, to help out the variable analysis. For example, in:
void f ( int n )
{
    int r;
    if ( n != 0 ) r = n;
    if ( n > 2 ) {
        printf ( "%d\n", r );
    }
}
r is only used if n > 2, in which case we also have n != 0, so that r has already been initialised. However, in its flow analysis, the TenDRA C checker treats all the conditionals it meets as if they were independent and does not look for any such complex dependencies (indeed it is possible to think of examples where such analysis would be impossible). Instead, it needs the programmer to clarify the flow of the program by asserting that r will be set if the second condition is true.
Programmers may assert that the variable, r, is set either by means of a pragma:
#pragma TenDRA set r;
or by using, for example:
SET ( r );
where SET is a keyword which has previously been introduced to stand for the variable setting construct using:
#pragma TenDRA keyword SET for set
(cf. DISCARD above).
6.7.3. Exhaustive switch statements
A special case of a flow control construct which may be used to set the value of a variable is a switch statement. Consider the program:
char *f ( int n )
{
    char *r;
    switch ( n ) {
        case 1 : r = "one"; break;
        case 2 : r = "two"; break;
        case 3 : r = "three"; break;
    }
    return ( r );
}
This leads to an error indicating that r is used but not set, because it is not set if n lies outside the three cases in the switch statement. However, the programmer might know that f is only ever called with these three values, and hence that r is always set before it is used. This information could be expressed by asserting that r is set at the end of the switch construct (see above), but it would be better to express the cause of this setting rather than just its effect. The reason why r is always set is that the switch statement is exhaustive - there are case statements for all the possible values of n.
Programmers may assert that a switch statement is exhaustive by means of a pragma immediately following it. For example, in the above case it would take the form:
....
switch ( n )
#pragma TenDRA exhaustive
{
    case 1 : r = "one"; break;
    ....
Again, there is an option to introduce a keyword, EXHAUSTIVE say, for exhaustive switch statements using:
#pragma TenDRA keyword EXHAUSTIVE for exhaustive
Using this form, the example program becomes:
switch ( n ) EXHAUSTIVE {
    case 1 : r = "one"; break;
In order to maintain compatibility with existing compilers, a dummy definition for EXHAUSTIVE must be introduced for them to use. For example, a complete definition of EXHAUSTIVE might be:
#ifdef __TenDRA__
#pragma TenDRA keyword EXHAUSTIVE for exhaustive
#else
#define EXHAUSTIVE
#endif
6.7.4. Switch statements
A switch statement is said to be exhaustive if its control statement is guaranteed to take one of the values of its case labels, or if it has a default label. The TenDRA C and C++ producers allow a switch statement to be asserted to be exhaustive using the syntax:
switch ( cond ) EXHAUSTIVE {
    // switch statement body
}
where EXHAUSTIVE is either the directive:
#pragma TenDRA exhaustive
or a keyword introduced using:
#pragma TenDRA keyword identifier for exhaustive
Knowing whether a switch statement is exhaustive or not means that checks relying on flow analysis (including variable usage checks) can be applied more precisely.
In certain circumstances it is possible to deduce whether a switch statement is exhaustive or not. For example, the directive:
#pragma TenDRA enum switch analysis on
enables a check on switch statements on values of enumeration type. Such statements should be exhaustive, either explicitly by using the EXHAUSTIVE keyword or declaring a default label, or implicitly by having a case label for each enumerator. Conversely, the value of each case label should equal the value of an enumerator. For the purposes of this check, boolean values are treated as if they were declared using an enumeration type of the form:
enum bool { false = 0, true = 1 } ;
A common source of errors in switch statements is the fall-through from one case or default statement to the next. A check for this can be enabled using:
#pragma TenDRA fall into case allow
case or default labels where fall-through from the previous statement is intentional can be marked by preceding them with a keyword, FALL_THRU say, introduced using the directive:
#pragma TenDRA keyword identifier for fall into case
6.7.5. Non-returning functions
Consider a modified version of the program above, in which calls to f with an argument other than 1, 2 or 3 cause an error message to be printed:
extern void error ( const char * );

char *f ( int n )
{
    char *r;
    switch ( n ) {
        case 1 : r = "one"; break;
        case 2 : r = "two"; break;
        case 3 : r = "three"; break;
        default : error( "Illegal value" );
    }
    return ( r );
}
This causes an error because, in the default case, r is not set before it is used. However, depending on the semantics of the function error, the return statement may never be reached in this case. This is because the fact that a function returns void can mean one of two distinct things:
- That the function does not return a value. This is the usual meaning of void.
- That the function never returns; for example the library function, exit, uses void in this sense.
If error never returns, then the program above is correct; otherwise, an unset value of r may be returned.
Therefore, we need to be able to declare the fact that a function never returns. This is done by introducing a new type to stand for the non-returning meaning of void (some compilers use volatile void for this purpose), by means of the pragma:
#pragma TenDRA type VOID for bottom
to introduce a type VOID (although any identifier may be used) with this meaning. The declaration of error can then be expressed as:
extern VOID error ( const char * );
In order to maintain compatibility with existing compilers, a definition of VOID needs to be supplied. For example:
#ifdef __TenDRA__
#pragma TenDRA type VOID for bottom
#else
typedef void VOID;
#endif
The largest class of non-returning functions occurs in the various standard APIs - for example, exit and abort. The TenDRA descriptions of these APIs contain this information. The information that a function does not return is taken into account in all flow analysis contexts. For example, in:
#include <stdlib.h>

int f ( int n )
{
    exit ( EXIT_FAILURE );
    return ( n );
}
n is unused because the return statement is not reached (a fact that can also be determined by the unreachable code analysis in §4.1).
6.7.6. Return statements
In C, but not in C++, it is possible to have a return statement without an expression in a function which does not return void. It is possible to enable this behaviour using the directive:
#pragma TenDRA incompatible void return allow
Note that this check includes the implicit return caused by falling off the end of a function. The effect of such a return statement is undefined. The C++ rule that falling off the end of main is equivalent to returning a value of 0 overrides this check.
7. Discard Analysis
- 7.1. Discarded function returns
- 7.2. Discarded computed values
- 7.3. Unused static variables and procedures
- 7.4. Discarded expressions
- 7.5. Overriding the discard analysis
A couple of examples of what might be termed discard analysis have already been described - discarded (unused) local variables and discarded (unused) assignments to local variables (see sections 6.3 and 6.4). The checker can perform three more types of discard analysis: discarded function returns, discarded computations and unused static variables and procedures. These three tests may be controlled as a group using:
#pragma TenDRA discard analysis status
where status is on, warning or off.
In addition, each of the component tests may be switched on and off independently using pragmas of the form:
#pragma TenDRA discard analysis (function return) status
#pragma TenDRA discard analysis (value) status
#pragma TenDRA discard analysis (static) status
There are also equivalent command line options to tcc of the form -X:test=state, where test can be discard_all, discard_func_ret, discard_value or unused_static, and state can be check, warn or dont. These checks are all switched off in the default mode.
Detailed descriptions of the individual checks follow in sections 7.1 to 7.3. Section 7.5 describes the facilities for fine-tuning the discard analysis.
7.1. Discarded function returns
Functions which return a value which is not used form the commonest instances of discarded values. For example, in:
#include <stdio.h>

int main ()
{
    puts ( "hello" );
    return ( 0 );
}
the function, puts, returns an int value, indicating whether an error has occurred, which is ignored.
7.2. Discarded computed values
A rarer instance of a discarded object, and one which is almost always an error, is where a value is computed but not used. For example, in:
int f ( int n )
{
    int r = 4;
    if ( n == 3 ) {
        r == 5;
    }
    return ( r );
}
the value r == 5 is computed but not used. This is actually because it is a misprint for r = 5.
7.3. Unused static variables and procedures
The final example of discarded values, which perhaps more properly belongs with the variable analysis tests mentioned above, is for static objects which are unused in the source module in which they are defined. Of course this means that they are unused in the entire program. Such objects can usually be removed.
7.4. Discarded expressions
The directive:
#pragma TenDRA discard analysis on
can be used to enable a check for values which are calculated but not used. There are three checks controlled by this directive, each of which can be controlled independently. The directive:
#pragma TenDRA discard analysis (function return) on
checks for functions which return a value which is not used. The check needs to be enabled for both the declaration and the call of the function in order for a discarded function return to be reported. Discarded returns for overloaded operator functions are never reported. The directive:
#pragma TenDRA discard analysis (value) on
checks for other expressions which are not used. Finally, the directive:
#pragma TenDRA discard analysis (static) on
checks for variables with internal linkage which are defined but not used.
An unused function return or other expression can be asserted to be deliberately discarded by explicitly casting it to void or, equivalently, by preceding it with a keyword introduced using the directive:
#pragma TenDRA keyword identifier for discard value
A static variable can be asserted to be deliberately unused by including it in a list of identifiers in a directive of the form:
#pragma TenDRA suspend static identifier-list
7.5. Overriding the discard analysis
As with the variable analysis, certain constructs may be used to provide the checker with extra information about a program, to convey the programmer's intentions more clearly.
7.5.1. Discarding function returns and computed values
Unwanted function returns and, more rarely, discarded computed values, may be actively ignored to indicate to the discard analysis that the value is being discarded deliberately. This can be done using the traditional method of casting the value to void:
( void ) puts ( "hello" );
or by introducing a keyword, IGNORE say, for discarding a value. This is done using a pragma of the form:
#pragma TenDRA keyword IGNORE for discard value
The example discarded value then becomes:
IGNORE puts ( "hello" );
Of course it is necessary to introduce a definition of IGNORE for conventional compilers in order to maintain compilability. A suitable definition might be:
#ifdef __TenDRA__
#pragma TenDRA keyword IGNORE for discard value
#else
#define IGNORE ( void )
#endif
7.5.2. Preserving unused statics
Occasionally unused static values are introduced deliberately into programs. The fact that the static variables or procedures x, y and z are deliberately unused may be indicated by introducing the pragma:
#pragma TenDRA suspend static x y z
at the outer level after the definition of all three objects.
8. Preprocessing checks
- 8.1. Preprocessor directives
- 8.2. Indented Preprocessing Directives
- 8.3. Multiple macro definitions
- 8.4. Macro arguments
- 8.5. Unmatched quotes
- 8.6. Include depth
- 8.7. Text after #endif
- 8.8. Text after #
- 8.9. New line at end of file
- 8.10. Conditional Compilation
- 8.11. Target dependent conditional inclusion
- 8.12. Unused headers
This chapter describes tdfc2's capabilities for checking the preprocessing constructs that arise in C.
8.1. Preprocessor directives
By default, the TenDRA C checker understands those preprocessor directives specified by the ISO C standard, section 6.8, i.e. #if, #ifdef, #ifndef, #elif, #else, #endif, #error, #line and #pragma. As has been mentioned, #pragma statements play a significant role in the checker. While any recognised #pragma statements are processed, all unknown pragma statements are ignored by default. The check to detect unknown pragma statements is controlled by:
#pragma TenDRA unknown pragma permit
The options for permit are disallow (raise an error if an unknown pragma is encountered), warning (allow unknown pragmas with a warning), or allow (allow unknown pragmas without comment).
In addition, the common non-ISO preprocessor directives, #file, #ident, #assert, #unassert and #weak, may be permitted using:
#pragma TenDRA directive dir allow
where dir is one of file, ident, assert, unassert or weak. If allow is replaced by warning then the directive is allowed, but a warning is issued. In either case, the modifier (ignore) may be added to indicate that, although the directive is allowed, its effect is ignored. Thus for example:
#pragma TenDRA directive ident (ignore) allow
causes the checker to ignore any #ident directives without raising any errors.
The directive dir can also be disallowed using:
#pragma TenDRA directive dir disallow
Any other unknown preprocessing directives cause the checker to raise an error in the default mode. The directive:
#pragma TenDRA unknown directive allow
may be used to force the checker to ignore such directives without raising any errors. disallow and warning variants are also available.
8.2. Indented Preprocessing Directives
The ISO C standard allows white space to occur before the # in a preprocessing directive, and between the # and the directive name. Many older preprocessors have problems with such directives. The checker's treatment of such directives can be set using:
#pragma TenDRA indented # directive permit
which detects white space before the #, and:
#pragma TenDRA indented directive after # permit
which detects white space between the # and the directive name. The options for permit are allow, warning or disallow as usual. The default checking profile allows both forms of indented directives.
8.3. Multiple macro definitions
The ISO C standard states that, for two definitions of a function-like macro to be equal, both the spelling of the parameters and the macro definition must be equal. Thus, for example, in:
#define f( x ) ( x )
#define f( y ) ( y )
the two definitions of f are not equal, despite the fact that they are clearly equivalent. Tchk has an alternative definition of macro equality which allows for consistent substitution of parameter names. The type of macro equality used is controlled by:
#pragma TenDRA weak macro equality permit
where permit is allow (use the alternative definition of macro equality), warning (as for allow, but raise a warning), or disallow (use the ISO C definition of macro equality - this is the default setting).
More generally, the pragma:
#pragma TenDRA extra macro definition allow
allows macros to be redefined, both consistently and inconsistently. If the definitions are incompatible, the first definition is overwritten. This pragma has a disallow variant, which resets the check to its default mode.
8.4. Macro arguments
According to the ISO C standard, section 6.8.3, if a macro argument contains a sequence of preprocessing tokens that would otherwise act as a preprocessing directive, the behaviour is undefined. Tchk allows preprocessing directives in macro arguments by default. The check to detect such macro arguments is controlled by:
#pragma TenDRA directive as macro argument permit
where permit is allow, warning or disallow.
The ISO C standard, section 6.8.3.2, also states that each # preprocessing token in the replacement list for a function-like macro shall be followed by a parameter as the next preprocessing token in the replacement list. By default, if tdfc2 encounters a # in a function-like macro replacement list which is not followed by a parameter of the macro, an error is raised. The checker's behaviour in this situation is controlled by:
#pragma TenDRA no ident after # permit
where the options for permit are allow (do not raise errors), disallow (default mode) and warning (raise warnings instead of errors).
8.5. Unmatched quotes
The ISO C standard, section 6.1, states that if a ' or " character matches the category of preprocessing tokens described as single non-whitespace-characters that do not lexically match the other preprocessing token categories, then the behaviour is undefined. For example:
#define a 'b
would result in undefined behaviour. By default the ' character is ignored by tdfc2. A check to detect such statements may be controlled by:
#pragma TenDRA unmatched quote permit
The usual allow, warning and disallow options are available.
8.6. Include depth
Most preprocessors set a maximum depth of #include directives (which may be limited by the maximum number of files which can be open on the host system). By default, the checker supports a depth equal to this maximum number. However, a smaller maximum depth can be set using:
#pragma TenDRA includes depth n
where n can be any positive integral constant.
8.7. Text after #endif
The ISO C standard, section 6.8, specifies that #endif and #else preprocessor directives do not take any arguments, but should be followed by a newline. In the default checking mode, tdfc2 raises an error when #endif or #else statements are not directly followed by a new line. This behaviour may be modified using:
#pragma TenDRA text after directive permit
where permit is allow (no errors are raised and any text on the same line as the #endif or #else statement is ignored), warning or disallow.
8.8. Text after #
The ISO C standard specifies that a # occurring outside of a macro replacement list must be followed by a new line or by a preprocessing directive, and this is enforced by the checker in default mode. The check is controlled by:
#pragma TenDRA no directive/nline after ident permit
where permit may be allow, disallow or warning.
8.9. New line at end of file
The ISO C standard, section 5.1.1.2, states that source files must end with new lines. Files which do not end in new lines are flagged as errors by the checker in default mode. The behaviour can be modified using:
#pragma TenDRA no nline after file end permit
where permit has the usual allow, disallow and warning options.
8.10. Conditional Compilation
Tchk generally treats conditional compilation in the same way as other compilers and checkers. For example, consider:
#if expr
....	/* First branch */
#else
....	/* Second branch */
#endif
the expression, expr, is evaluated: if it is non-zero the first branch of the conditional is processed; if it is zero the second branch is processed instead.
Sometimes, however, tdfc2 may be unable to evaluate the expression statically because of the abstract types and expressions which arise from the minimum integer range assumptions or the abstract standard headers used by the tool (see target-dependent types in section 4.5). For example, consider the following ISO compliant program:
#include <stdio.h>
#include <limits.h>

int main ()
{
#if ( CHAR_MIN == 0 )
    puts ("char is unsigned");
#else
    puts ("char is signed");
#endif
    return ( 0 );
}
The TenDRA representation of the ISO API merely states that CHAR_MIN - the least value which fits into a char - is a target dependent integral constant. Hence, whether or not it equals zero is again target dependent, so the checker needs to maintain both branches. By contrast, any conventional compiler is compiling to a particular target machine on which CHAR_MIN is a specific integral constant. It can therefore always determine which branch of the conditional it should compile.
In order to allow both branches to be maintained in these cases, it has been necessary for tdfc2 to impose certain restrictions on the form of the conditional branches and the positions in which such target-dependent conditionals may occur. These may be summarised as:
- Target-dependent conditionals may not appear at the outer level. If the checker encounters a target-dependent conditional at the outer level an error is produced. In order to continue checking in the rest of the file an arbitrary assumption must be made about which branch of the conditional to process; tdfc2 assumes that the conditional is true and the first branch is used.
- The branches of allowable target-dependent conditionals may not contain declarations or definitions.
8.11. Target dependent conditional inclusion
One of the effects of trying to compile code in a target independent manner is that it is not always possible to evaluate the condition in a #if directive completely. Thus the conditional inclusion needs to be preserved until the installer phase. This can only be done if the target dependent #if is more structured than is normally required for preprocessing directives. There are two cases. In the first, where the #if appears in a statement, it is treated as if it were an if statement with braces enclosing its branches; that is:
#if cond
true_statements
#else
false_statements
#endif
maps to:
if ( cond ) {
    true_statements
} else {
    false_statements
}
In the second case, where the #if appears in a list of declarations, an error is normally given. This can however be overridden by the directive:
#pragma TenDRA++ conditional declaration allow
which causes both branches of the #if to be analysed.
8.12. Unused headers
Header files which are included, but from which nothing is used within the other source files comprising the translation unit, might just as well not have been included. Tchk can detect top level include files which are unnecessary by analysing the tdfc2dump output for the file. This check is enabled by passing the -Wd,-H command line flag to tcc. Errors are written to stderr in a simple ASCII form by default, or to the unified dump file in dump format if the -D command line option is used.
9. API checking
- 9.1. Including headers
- 9.2. Specifying APIs to tcc
- 9.3. API Checking Examples
- 9.4. Redeclaring Objects in APIs
- 9.5. Defining Objects in APIs
- 9.6. Stepping Outside an API
- 9.7. Using the System Headers
- 9.8. API usage analysis
9.1. Including headers
The token syntax described in the previous annex provides the means of describing an API specification independently of any particular implementation of the API. Every object in the API specification is described using the appropriate #pragma token statement. These statements are arranged in TenDRA header files corresponding to the headers comprising the API. Each API consists of a separate set of header files. For example, if the ANSI C89 API is used, the statement:
#include <sys/types.h>
will lead to a header not found error, whereas the header will be found in the POSIX API.
Where relationships exist between APIs these have been made explicit in the headers. For example, the POSIX version of stdio.h consists of the ANSI version plus some extra objects. This is implemented by making the TenDRA header describing the POSIX version of stdio.h include the ANSI C89 version of stdio.h.
9.2. Specifying APIs to tcc
The API against which a program is to be checked is specified to tcc by means of a command-line option of the form -Yapi, where api is the API name. For example, ANSI X3.159 is specified by -Yc89 (this is the default API) and POSIX 1003.1 is specified by -Yposix (for a full list of the supported APIs see chapter 2).
Extension APIs, such as X11, require special attention. The API for a program is never just X11, but X11 plus some base API, for example, X11 plus POSIX or X11 plus XPG3. These composite APIs may be specified by, for example, passing the options -Yposix -Yx5_lib (in that order) to tcc to specify POSIX 1003.1 plus X11 (Release 5) Xlib. The rule is that base APIs, such as POSIX, override the existing API, whereas extension APIs, such as X11, extend it. The command-line option -info causes tcc to print the API currently in use. For example:
% tcc -Yposix -Yx5_lib -info file.c
will result in the message:
tcc: Information: API is X11 Release 5 Xlib plus POSIX (1003.1).
9.3. API Checking Examples
As an example of the TenDRA compiler's API checking capabilities, consider the following program, which prints the names and inode numbers of all the files in the current directory:
#include <stdio.h>
#include <sys/types.h>
#include <dirent.h>

int main ()
{
    DIR *d = opendir ( "." );
    struct dirent *e;
    if ( d = NULL ) return ( 1 );
    while ( e = readdir(d), e != NULL ) {
        printf ( "%s %lu\n", e->d_name, e->d_ino );
    }
    closedir ( d );
    return ( 0 );
}
A first attempted compilation using strict checking:
% tcc -Xs a.c
results in messages to the effect that the headers <sys/types.h>
and <dirent.h>
cannot be found, plus a number of consequential errors. This is because tcc is checking the program against the default API, that is against the ANSI API, and the program is certainly not ANSI compliant. It does look as if it might be POSIX compliant however, so a second attempted compilation might be:
% tcc -Xs -Yposix a.c
This results in one error and three warnings. Dealing with the warnings first, the returns of the calls of printf
and closedir
are being discarded and the variable d
has been set and not used. The discarded function returns are deliberate, so they can be made explicit by casting them to void
. The discarded assignment to d
requires a little more thought - it is due to the mistyping d = NULL
instead of d == NULL
on line 9. The error is more interesting. In full the error message reads:
"a.c":11 printf ( "%s %lu\n", e->d_name, e->d_ino!!!! ); Error:ISO[6.5.2.1]{ANSI[3.5.2.1]}: The identifier 'd_ino' is not a member of 'struct/union posix.dirent.dirent'. ISO[6.3.2.3]{ANSI[3.3.2.3]}: The second operand of '->' must be a member of the struct/union pointed to by the first.
That is, struct dirent
does not have a field called d_ino
. In fact this is true; while the d_name
field of struct dirent
is specified in POSIX, the d_ino
field is an XPG3 extension (This example shows that the TenDRA representation of APIs is able to differentiate between APIs at a very fine level). Therefore a third attempted compilation might be:
% tcc -Xs -Yxpg3 a.c
This leads to another error message concerning the printf
statement, that the types unsigned long
and (the promotion of) ino_t
are incompatible. This is due to a mismatch between the printf
format string %lu
and the type of e->d_ino
. POSIX only says that ino_t
is an arithmetic type, not a specific type like unsigned long
. The TenDRA representation of POSIX reflects this abstract nature of ino_t
, so that the potential portability error is detected. In fact it is impossible to give a printf
string which works for all possible implementations of ino_t
. The best that can be done is to cast e->d_ino
to some fixed type like unsigned long
and print that.
Hence the corrected, XPG3 conformant program reads:
#include <stdio.h>
#include <sys/types.h>
#include <dirent.h>

int main ()
{
    DIR *d = opendir ( "." );
    struct dirent *e;
    if ( d == NULL ) return ( 1 );
    while ( e = readdir(d), e != NULL ) {
        ( void ) printf ( "%s %lu\n", e->d_name,
                          ( unsigned long ) e->d_ino );
    }
    ( void ) closedir ( d );
    return ( 0 );
}
9.4. Redeclaring Objects in APIs
Of course, it is possible to redeclare the functions declared in the TenDRA API descriptions within the program, provided they are consistent. However, what constitutes a consistent redeclaration in the fully abstract TenDRA machine is not as straightforward as it might seem; an interesting example is malloc in the ANSI API. This is defined by the prototype:
void *malloc ( size_t );
where size_t
is a target dependent unsigned integral type. The redeclaration:
void *malloc ();
is only correct if size_t
is its own integral promotion, and therefore is not correct in general.
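A redeclaration which simply repeats the API prototype is, of course, always consistent:

#include <stdlib.h>

/* consistent: repeats the API prototype exactly */
void *malloc ( size_t );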
Since it is not always desirable to remove these redeclarations (some machines may not have all the necessary functions declared in their system headers) the TenDRA compiler has a facility to accept inconsistent redeclarations of API functions which can be enabled by using the pragma:
#pragma TenDRA incompatible interface declaration allow
This pragma suppresses the consistency checking of redeclarations of API functions. Replacing allow by warning causes a warning to be printed instead. In both cases the TenDRA API description of the function takes precedence. The normal behaviour of flagging inconsistent redeclarations as errors can be restored by replacing allow
by disallow
in the pragma above. (There are also equivalent command-line options to tcc of the form -X:interface_decl=status
, where status can be check
, warn
or dont
.)
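As a sketch, the pragma might be used to bracket just the offending redeclaration:

#include <stdlib.h>

#pragma TenDRA incompatible interface declaration warning
void *malloc ();    /* inconsistent in general; now only a warning */
#pragma TenDRA incompatible interface declaration disallow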
9.5. Defining Objects in APIs
Since the program API is meant to define the interface between what the program defines and what the target machine defines, the TenDRA compiler normally raises an error if any attempt is made to define an object from the API in the program itself. A subtle example of this is given by compiling the program:
#include <errno.h>

extern int errno;
with the ANSI API. ANSI states that errno
is an assignable lvalue of type int
, and the TenDRA description of the API therefore states precisely that. The declaration of errno
as an extern int
is therefore an inconsistent specification of errno
, but a consistent implementation. Accepting the lesser of two evils, the error reported is therefore that an attempt has been made to define errno
despite the fact that it is part of the API.
Note that if this same program is compiled using the POSIX API, in which errno
is explicitly specified to be an extern int
, the program merely contains a consistent redeclaration of errno
and so does not raise an error.
The neatest workaround for the ANSI case, which preserves the declaration for those machines which need it, is as follows: if errno
is anything other than an extern int
it must be defined by a macro. Therefore:
#include <errno.h>

#ifndef errno
extern int errno;
#endif
should always work.
In most other examples, the definitions are more obvious. For example, a programmer might provide a memory allocator containing versions of malloc
, free
etc.:
#include <stdlib.h>

void *malloc ( size_t sz )
{
    ....
}

void free ( void *ptr )
{
    ....
}
If this is deliberate then the TenDRA compiler needs to be told to ignore the API definitions of these objects and to use those provided instead. This is done by listing the objects to be ignored using the pragma:
#pragma ignore malloc free ....
(also see section G.10). This should be placed between the API specification and the object definitions. The provided definitions are checked for conformance with the API specifications. There are special forms of this pragma to enable field selectors and objects in the tag namespace to be defined. For example, if we wish to provide a definition of the type div_t
from stdlib.h
we need to ignore three objects - the type itself and its two field selectors - quot
and rem
. The definition would therefore take the form:
#include <stdlib.h>

#pragma ignore div_t div_t.quot div_t.rem

typedef struct {
    int quot;
    int rem;
} div_t;
Similarly if we wish to define struct lconv
from locale.h
the definition would take the form:
#include <locale.h>

#pragma ignore TAG lconv TAG lconv.decimal_point ....

struct lconv {
    char *decimal_point;
    ....
};
to take into account that lconv
lies in the tag name space. By defining objects in the API in this way, we are actually constructing a less general version of the API. This will potentially restrict the portability of the resultant program, and so should not be done without good reason.
9.6. Stepping Outside an API
Using the TenDRA compiler to check a program against a standard API will only be effective if the appropriate API description is available to the program being tested (just as a program can only be compiled on a conventional machine if the program API is implemented on that machine). What can be done for a program whose API is not supported depends on the degree to which the program API differs from an existing TenDRA API description. If the program API is POSIX with a small extension, say, then it may be possible to express that extension to the TenDRA compiler. For large unsupported program APIs it may be possible to use the system headers on a particular machine to allow for partial program checking (see section H.7).
For small API extensions the ideal method would be to use the token syntax described in Annex G to express the program API to the TenDRA compiler; however, this is not currently encouraged because the syntax of such API descriptions is not yet firmly fixed. For the time being it may be possible to use C to express much of the information the TenDRA compiler needs to check the program. For example, POSIX specifies that sys/stat.h
contains a number of macros, S_ISDIR
, S_ISREG
, and so on, which are used to test whether a file is a directory, a regular file, etc. Suppose that a program is basically POSIX conformant, but uses the additional macro S_ISLNK
to test whether the file is a symbolic link (this is in COSE and AES, but not POSIX). A proper TenDRA description of S_ISLNK
would contain the information that it was a macro taking a mode_t
and returning an int
, however for checking purposes it is sufficient to merely give the types. This can be done by pretending that S_ISLNK
is a function:
#ifdef __TenDRA__
/* For TenDRA checking purposes only */
extern int S_ISLNK ( mode_t );    /* actually a macro */
#endif
More complex examples might require an object in the API to be defined in order to provide more information about it (see H.5). For example, suppose that a program is basically ANSI compliant, but assumes that FILE
is a structure with a field file_no
of type int
(representing the file number), rather than a generic type. This might be expressed by:
#ifdef __TenDRA__
/* For TenDRA checking purposes only */
#pragma ignore FILE
typedef struct {
    /* there may be other fields here */
    int file_no;
    /* there may be other fields here */
} FILE;
#endif
The methods of API description above are what might be called example implementations
rather than the abstract implementations
of the actual TenDRA API descriptions. They should only be used as a last resort, when there is no alternative way of expressing the program within a standard API. For example, there may be no need to access the file_no
field of a FILE
directly, since POSIX provides a function, fileno
, for this purpose. Extending an API in general reduces the number of potential target machines for the corresponding program.
9.7. Using the System Headers
One possibility if a program API is not supported by the TenDRA compiler is to use the set of system headers on the particular machine on which tcc happens to be running. Of course, this means that the API checking facilities of the TenDRA compiler will not be effective, but it is possible that the other program checking aspects will be of use.
The system headers are not, and indeed are not intended to be, portable. A simple-minded approach to portability checking with the system headers could lead to more portability problems being found in the system headers than in the program itself. A more sophisticated approach involves applying different compilation modes to the system headers and to the program. The program itself can be checked very rigorously, while the system headers have very lax checks applied.
This could be done directly, by putting a wrapper around each system header describing the mode to be applied to that header. However, the mechanism of named compilation modes (see 2.2) provides an alternative solution. In addition to the normal -Idir command-line option, tcc also supports the option -Nname:dir, which is identical except that it also associates the identifier name with the directory dir. Once a directory has been named in this way, the name can be used in a directive:
#pragma TenDRA directory name use environment mode
which tells tcc to apply the named compilation mode, mode, to any files included from the directory, name. This is the mechanism used to specify the checks to be applied to the system headers.
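As a sketch (the directory name sys and the mode name lax_mode are illustrative, and lax_mode is assumed to have been defined as a named compilation mode as described in 2.2), a directory might be named on the command line:

% tcc -ch -Nsys:/usr/include file.c

with a start-up file containing:

#pragma TenDRA directory sys use environment lax_mode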
The system headers may be specified to tcc using the -Ysystem
command-line option. This specifies /usr/include
as the directory to search for headers and passes a system start-up file to tcc. This system start-up file contains any macro definitions which are necessary for tcc to navigate the system headers correctly, plus a description of the compilation mode to be used in compiling the system headers.
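For example (the file name is illustrative), a program might be checked against the system headers with:

% tcc -ch -Ysystem file.c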
In fact, before searching /usr/include, tcc searches another directory for system headers. This is intended to hold modified versions of any system headers which cause particular problems or require extra information. For example:
- A version of stdio.h is provided for all systems, which contains the declarations of printf and similar functions necessary for tcc to apply its printf-string checks (see 3.3.2).
- A version of stdlib.h is provided for all systems, which includes the declarations of exit and similar functions necessary for tcc to apply its flow analysis correctly (see 5.7).
- Versions of stdarg.h and varargs.h which work with tcc are provided for all systems. Most system headers contain built-in functions which are recognised by cc (but not tcc) to deal with these.
The user can also use this directory to modify any system headers which cause problems. For example, not all system headers declare all the functions they should, so it might be desirable to add these declarations.
It should be noted that the system headers and the TenDRA API headers do not mix well. Both are parts of coherent systems of header files, and unless the intersection is very small, it is not usually possible to combine parts of these systems sensibly.
Even a separation, such as compiling some modules of a program using a TenDRA API description and others using the system headers, can lead to problems in the intermodular linking phase (see Chapter 9). There will almost certainly be type inconsistency errors since the TenDRA headers and the system headers will have different representations of the same object.
9.8. API usage analysis
The abstract standard headers provided with the tool are the basis for the API usage analysis checking on dump files described in Chapter 9. The declarations in each abstract header file are enclosed by the following pragmas:
#pragma TenDRA declaration block API_name begin
....
#pragma TenDRA declaration block end
API_name has a standard form e.g. api__ansi__stdio
for stdio.h
in the ANSI API.
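For example, the abstract ANSI version of stdio.h takes a form along these lines (a sketch; the individual #pragma token statements are elided):

#pragma TenDRA declaration block api__ansi__stdio begin
/* #pragma token statements describing FILE, printf, etc. */
#pragma TenDRA declaration block end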
This information is output in the dump format as the start and end of a header scope, i.e.
SSH position ref_no = <API_name>
SEH position ref_no
The first occurrence of each identifier in the dump output contains scope information; in the case of an identifier declared in the abstract headers, this scope information will normally refer to a header scope. Since each use of an identifier can be traced back to its declaration, this provides a means of tracking API usage within the application when the abstract headers are used. The disadvantages of this method are that only APIs for which abstract headers are available can be checked, and that objects which are not part of the standard APIs are not available. If an application requires such an identifier (or indeed attempts to use a standard API identifier for which the appropriate header has not been included), the resulting errors may distort or even completely halt the dump output, resulting in incomplete or incorrect analysis.
The second method of API analysis allows compilation of the application against the system headers, thereby overcoming the problems of non-standard API usage mentioned above. The dump of the application can be scanned to determine the identifiers which are used but not defined within the application itself. These identifiers form the program's external API with the system headers and libraries, and can be compared with API reference information, provided by dump output files produced from the abstract standard headers, to determine the application's API usage.
Analysis performed on the set of dump files produced for an entire application can detect the objects, types, etc. from external APIs which are used by the application. The API usage analysis is enabled by passing one or more -api_checkAPI
flags to tcc, where API may be any of the standard APIs listed in section 2.1. The -api_check_outFILE
flag may be used to direct the API analysis information to the file FILE (by default it is written to stdout). The APIs used to perform API usage analysis may be different from those used to process the application. Annex G.8 contains details of the methods used to perform the API usage analysis.
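For example (a sketch; the file names are illustrative), a program's POSIX usage might be analysed, with the results written to usage.txt rather than stdout, by:

% tcc -ch -api_checkposix -api_check_outusage.txt file.c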
10. Intermodular analysis
All the checks discussed in earlier chapters have been concerned with a single source file. However, tcc also contains a linking phase in which it is able to perform intermodular checks (i.e. checks between source files). In the linking phase, the files generated from each translation unit processed are combined into a single file containing information on all external objects within the application. Type consistency checks are then applied to ensure that the definitions and declarations of each object are consistent, and that external objects and functions have at most one definition.
There are two types of file provided by tdfc2 for analysis; symbol table dump files and C++ spec files.
10.1. Linking symbol table dump files
The amount of information about an object stored in a dump file depends on the compilation mode used to produce that file. For example, if extra prototype checks are enabled (see section 3.3), the dump file contains any information inferred about a function from its traditional style definition or from applications of that function. Thus, if one file contains:
extern void f () ;

void g ()
{
    f ( "hello" ) ;
}
and another contains:
void f ( n )
int n ;
{
    return ;
}
then the inferred prototype:
void f WEAK ( char * ) ;
from the call of f
would be included in the first dump file, whereas the weak prototype deduced from the definition of f
:
void f WEAK ( int ) ;
would be included in the second. When these two dump files are linked, the inconsistency is discovered and an error is reported.
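Assuming the two translation units above are held in files a.c and b.c (names illustrative), the inconsistency should be reported when they are checked together, since the checker enables intermodular analysis by default:

% tcc -ch a.c b.c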
10.2. Linking C++ spec files
The overall compilation scheme controlled by tcc
, as it relates to the C++ producer, can be described as follows:
Each C++ source file, a.cc
say, is processed using tcpplus
to give an output TDF capsule, a.j
, which is passed to the installer phase of tcc
. The capsule is linked with any target dependent token definition libraries, translated to assembler and assembled to give a binary object file, a.o
. The various object files comprising the program are then linked with the system libraries to give a final executable, a.out
.
A C++ spec file is a dump of the C++ producer's internal representation of a translation unit. Such files can be written to, and read from, disk to perform such operations as intermodule analysis.
In addition to this main compilation scheme, tcpplus
can be made to output a C++ spec file for each C++ source file, a.K
say. These C++ spec files can be linked, using tcpplus
in its spec linker mode, to give an additional TDF capsule, x.j
say, and a combined C++ spec file, x.K
. The main purpose of this C++ spec linking is to perform intermodule checks on the program; however, in the course of this checking, exported templates which are defined in one module and used in another are instantiated. This extra code is output to x.j
, which is then installed and linked in the normal way.
Note that intermodule checks, and hence intermodule template instantiations, are only performed if the -im
option is passed to tcc
.
The TenDRA checker is similar to the compiler except that it disables TDF output and has intermodule analysis enabled by default.
The C++ spec linking routines have not yet been completely implemented, and so are disabled in the current version of the C++ producer.
Note that the format of a C++ spec file is specific to the C++ producer and may change between releases to reflect modifications in the internal type system. The C producer has a similar dump format, called a C spec file; however, the two are incompatible. If intermodule analysis between C and C++ source files is required, the tdfc2dump symbol table dump format should be used.
10.3. Template compilation
The C++ producer makes the distinction between exported templates, which may be used in one module and defined in another, and non-exported templates, which must be defined in every module in which they are used. As in the ISO C++ standard, the export
keyword is used to distinguish between the two cases. In the past, different compilers have had different template compilation models; either all templates were exported or no templates were exported. The latter is easily emulated - if the export
keyword is not used then no templates will be exported. To emulate the former behaviour the directive:
#pragma TenDRA++ implicit export template on
can be used to treat all templates as if they had been declared using the export
keyword.
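For example (a sketch; the template max2 is purely illustrative), with the directive in force an ordinary template definition behaves as if it had been exported:

#pragma TenDRA++ implicit export template on

/* treated as if declared 'export template < class T > ...' */
template < class T > T max2 ( T a, T b )
{
    return ( a > b ? a : b ) ;
}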
The automatic instantiation of exported templates has not yet been implemented correctly. It is intended that such instantiations will be generated during intermodule analysis (where they conceptually belong). At present it is necessary to work round this using explicit instantiations.