2. Integral Types
- 2.1. Integer promotion rules
- 2.2. Arithmetic operations on integer types
- 2.3. Interaction with the integer conversion checks
- 2.4. Target dependent integral types
- 2.5. Integer overflow checks
- 2.6. Integer operator checks
- 2.7. Support for 64 bit integer types (long long)
The checks described in the previous chapter involved the detection of conversions which could result in undefined values. Certain conversions involving integral types, however, are defined in the ISO C standard and so might be considered safe and unlikely to cause problems. This unfortunately is not the case: some of these conversions may still result in a change in value; the actual size of each integral type is implementation-dependent; and the "old-style" integer conversion rules which predate the ISO standard are still in common use. The checker provides support for both ISO and traditional integer promotion rules. The set of rules used may be specified independently of the two integral range scenarios, 16 bit (default) and 32 bit, described in section 2.1.2.
The means of specifying alternative sets of promotion rules, their interaction with the conversion checks described in section 3.2, and the additional checks which may be performed on integers and integer operations are described in the remainder of this chapter.
2.1. Integer promotion rules
The ISO C standard rules may be summarised as follows: long integral types promote to themselves; other integral types promote to whichever of int or unsigned int they fit into. In full the promotions are:
Note that even with these simple built-in types, there is a degree of uncertainty, namely concerning the promotion of unsigned short. On most machines, int is strictly larger than short so the promotion of unsigned short is int. However, it is possible for short and int to have the same size, in which case the promotion is unsigned int. When using the ISO C promotion rules, the checker usually avoids making assumptions about the implementation by treating the promotion of unsigned short as an abstract integral type. If, however, the -Y32bit option is specified, int is assumed to be strictly larger than short, and unsigned short promotes to int.
The traditional C integer promotion rules are often referred to as the signed promotion rules. Under these rules, long integral types promote to themselves, as in ISO C, but the other integral types promote to unsigned int if they are qualified by unsigned, and int otherwise. Thus the signed promotion rules may be represented as follows:
The traditional promotion rules are applied in the Xt built-in environment only. All of the other built-in environments specify the ISO C promotion rules. Users may also specify their own rules for integer promotions and minimum integer ranges; the methods for doing this are described in Annex H.
2.2. Arithmetic operations on integer types
The ISO C standard rules for calculating the type of an arithmetic operation involving two integer types are as follows: work out the integer promotions of the types of the two operands, then:
- If either promoted type is unsigned long, the result type is unsigned long;
- Otherwise, if one promoted type is long and the other is unsigned int, then if a long int can represent all values of an unsigned int, the result type is long; otherwise the result type is unsigned long;
- Otherwise, if either promoted type is long, the result type is long;
- Otherwise, if either promoted type is unsigned int, the result type is unsigned int;
- Otherwise the result type is int.
Both promoted values are converted to the result type, and the operation is then applied.
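The five rules above can be sketched as a function over the promoted operand types. This is an illustration of the rules as stated, not the checker's implementation; the enumeration, the function name arith_result and the parameter long_holds_all_uint are all invented for this example. The parameter stands for the implementation-dependent question of whether long can represent every unsigned int value.

```c
enum int_type { T_INT, T_UINT, T_LONG, T_ULONG };

/* Result type for two already-promoted operand types, applying the rules in order. */
enum int_type arith_result(enum int_type a, enum int_type b, int long_holds_all_uint)
{
    if (a == T_ULONG || b == T_ULONG)
        return T_ULONG;
    if ((a == T_LONG && b == T_UINT) || (a == T_UINT && b == T_LONG))
        return long_holds_all_uint ? T_LONG : T_ULONG;
    if (a == T_LONG || b == T_LONG)
        return T_LONG;
    if (a == T_UINT || b == T_UINT)
        return T_UINT;
    return T_INT;
}
```

Only the second rule depends on the target, which is why the result of mixing long and unsigned int cannot be determined from the operand types alone.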
2.3. Interaction with the integer conversion checks
A simple-minded implementation of the integer conversion checks described in 3.2 would interact badly with these rules. Consider, for example, adding two values of type char:
char f ( char a, char b ) { char c = a + b ; return ( c ) ; }
The various stages in the calculation of c are as follows: a and b are converted to their promotion type, int, added together to give an int result, which is converted to a char and assigned to c. The conversions of a and b from char to int are always safe, and so present no difficulties to the integer conversion checks. The conversion of the result from int to char, however, is precisely the type of value destroying conversion which these checks are designed to detect.
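The stages described above can be written out explicitly. The sketch below uses unsigned char rather than char, because converting an out-of-range value to a signed type is implementation-defined, whereas conversion to an unsigned type is fully specified; the function name add_uchar is invented for this example.

```c
#include <limits.h>

/* Promote, add as int, then convert the int result back to the narrow type. */
unsigned char add_uchar(unsigned char a, unsigned char b)
{
    int sum = a + b;             /* both operands are promoted to int before the addition */
    return (unsigned char)sum;   /* the conversion back may discard the high bits */
}
```

With 8 bit characters, add_uchar(200, 100) computes 300 as an int and returns 44, which is the kind of change in value the conversion checks exist to detect.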
Obviously, an integer conversion check which flagged all char arithmetic would never be used, thereby losing the potential to detect many subtle portability errors. For this reason, the integer conversion checks are more sophisticated. In all typed languages, the type is used for two purposes: for static type checking and for expressing information about the actual representation of data on the target machine. Essentially it is a confusion between these two roles which leads to the problems above. The C promotion and arithmetic rules are concerned with how data is represented and manipulated, rather than the underlying abstract types of this data. When a and b are promoted to int prior to being added together, this is only a change in representation; at the conceptual level they are still chars. Again, when they are added, the result may be represented as an int, but conceptually it is a char. Thus the assignment to c, an actual char, is just a change in representation, not a change in conceptual type.
So each expression may be regarded as having two types: a conceptual type which stands for what the expression means, and a representational type which stands for how the expression is to be represented as data on the target machine. In the vast majority of expressions these types coincide; however, the integral promotion and arithmetic conversions are changes of representational, not conceptual, type. The integer conversion checks are concerned with detecting changes of conceptual type, since it is these which are most likely to be due to actual programming errors.
It is possible to define integral types within the TenDRA extensions to C in which the split between concept and representation is made explicit. The pragma:
#pragma TenDRA keyword TYPE for type representation
may be used to introduce a keyword TYPE for this purpose (as with all such pragmas, the precise keyword to be used is left to the user). Once this has been done, TYPE ( r, t ) may be used to represent a type which is conceptually of type t but is represented as data like type r. Both t and r must be integral types. For example:
TYPE ( int, char ) a ;
declares a variable a which is represented as an int, but is conceptually a char.
In order to maintain compatibility with other compilers, it is necessary to give TYPE a sensible alternative definition. For all but conversion checking purposes, TYPE ( r, t ) is identical to r, so a suitable definition is:
#ifdef __TenDRA__
#pragma TenDRA keyword TYPE for type representation
#else
#define TYPE( r, t ) r
#endif
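As a sketch of how this definition might be used, the following function declares its operands and result as conceptually char but represented as int. The function add_chars is invented for this example, and how the checker reports on this exact code is not stated here; with any other compiler the macro reduces every TYPE ( int, char ) to plain int.

```c
#ifdef __TenDRA__
#pragma TenDRA keyword TYPE for type representation
#else
#define TYPE( r, t ) r   /* other compilers see only the representation type */
#endif

/* Conceptually adds two chars; the values are held as ints. */
TYPE ( int, char ) add_chars ( TYPE ( int, char ) a, TYPE ( int, char ) b )
{
    TYPE ( int, char ) c = a + b ;
    return ( c ) ;
}
```

Because the fallback expands to r, the object code produced by other compilers is the same as if int had been written directly.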
2.4. Target dependent integral types
Since the checker uses only information about the minimum guaranteed ranges of integral types, integer values for which the actual type of the values is unknown may arise. Integer values of undetermined type generally arise in one of two ways: through the use of integer literals and from API types which are not completely specified.
2.4.1. Integer literals
The ISO C rules on the type of integer literals are set out as follows. For each class of integer literals a list of types is given. The type of an integer literal is then the first type in the appropriate list which is large enough to contain the value of the integer literal. The class of the integer literal depends on whether it is decimal, hexadecimal or octal, and whether it is qualified by U (or u) or L (or l) or both. The rules may be summarised as follows:
These rules are applied in all the built-in checking modes except Xt. Traditional C does not have the U and L qualifiers, so if the Xt mode is used, these qualifiers are ignored and all integer literals are treated as int, long or unsigned long, depending on the size of the number.
If a number fits into the minimal range for the first type of the appropriate list, then it is of that type; otherwise its type is undetermined and is said to be target dependent. The checker treats target dependent types as abstract integral types which may lead to integer conversion problems. For example, in:
int f ( int n ) { return ( n & 0xff00 ) ; }
the type of 0xff00 is target dependent, since it does not fit into the minimal range for int specified by the ISO C standard (this is detected by the integer overflow analysis described in section 4.6). The arithmetic conversions resulting from the & operation are detected by the checker's conversion analysis. Note that if the -Y32bit option is specified to tcc, an int is assumed to contain at least 32 bits. In this case, 0xff00 fits into the type int, and so this is the type of the integer literal. No invalid integer conversion is then detected.
2.4.2. Abstract API types
Target dependent integral types also occur in API specifications and may be encountered when checking against one of the implementation-independent APIs provided with the checker. The commonest example of this is size_t, which is stated by the ISO C standard to be a target dependent unsigned integral type, and which arises naturally within the language as the type of a sizeof expression.
The checker has its own internal version of size_t, wchar_t and ptrdiff_t for evaluating static compile-time expressions. These internal types are compatible with the ISO C specification of size_t, wchar_t and ptrdiff_t, and thus are compatible with any conforming definitions of these types found in included files. However, when checking the following program against the system headers, a warning is produced on some machines concerning the implicit conversion of an unsigned int to type size_t:
#include <stdlib.h>
int main() {
    size_t size ;
    size = sizeof( int ) ;
}
The system header on the machine in question actually defines size_t to be a signed int (this of course contravenes the ISO C standard) but the compile time function sizeof returns the checker's internal version of size_t which is an abstract unsigned integral type. By using the pragma:
#pragma TenDRA set size_t:signed int
the checker can be instructed to use a different internal definition of size_t when evaluating the sizeof function and the error does not arise. Equivalent options are also available for the ptrdiff_t and wchar_t types.
2.5. Integer overflow checks
Given the complexity of the rules governing the types of integers and results of integer operations, as well as the variation of integral ranges with machine architecture, it is hardly surprising that unexpected results of integer operations are at the root of many programming problems. These problems can often be hard to track down and may suddenly appear in an application which was previously considered safe when it is moved to a new system. Since the checker supports the concept of a guaranteed minimum size of an integer, it is able to detect many potential problems involving integer constants. The pragma:
#pragma TenDRA integer overflow analysis status
where status is on, warning or off, controls a set of checks on arithmetic expressions involving integer constants. These checks cover overflow, use of constants exceeding the minimum guaranteed size for their type and division by zero. They are not enabled in the default mode.
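The idea of checking constants against a guaranteed minimum range, rather than the range of the host, can be sketched as follows. This is an illustration of the concept only and not the checker's algorithm; the macro and function names are invented, and the limits used are the minimum magnitudes ISO C guarantees for int (the 16 bit scenario).

```c
/* Minimum range ISO C guarantees for int, held as long so the tests cannot overflow. */
#define MIN_GUARANTEED_INT_MAX 32767L
#define MIN_GUARANTEED_INT_MIN (-32767L)

/* 1 if v lies within the range every conforming int is guaranteed to have. */
int fits_min_int(long v)
{
    return v >= MIN_GUARANTEED_INT_MIN && v <= MIN_GUARANTEED_INT_MAX;
}

/* 1 if a + b could leave the guaranteed int range.
   Both operands are expected to lie within that range already. */
int add_may_overflow_min_int(long a, long b)
{
    return !fits_min_int(a + b);
}
```

Under this view the constant 40000 and the expression 32767 + 1 are both suspect, even though neither causes any difficulty on a machine with a 32 bit int.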
There are two special cases of integer overflow for which checking is controlled separately:
- Bitfield sizes. Obviously, the size of a bitfield must be smaller than or equal to the minimum size of its integral type. A bitfield which is too large is flagged as an error in the default mode. The check on bitfield sizes is controlled by:
#pragma TenDRA bitfield overflow permit
where permit is one of allow, disallow or warning.
- Octal and hexadecimal escape sequences. According to the ISO C standard, the value of an octal or hexadecimal escape sequence shall be in the range of representable values for the type unsigned char for an integer character constant, or the unsigned type corresponding to wchar_t for a wide character constant. The check on escape sequence sizes is controlled by:
#pragma TenDRA character escape overflow permit
where the options for permit are allow, warning and disallow. The check is switched on by default.
2.6. Integer operator checks
The results of some integer operations are undefined by the ISO C standard for certain argument types. Others are implementation-defined or simply likely to produce unexpected results. In the default mode such operations are processed silently; however, a set of checks on operations involving integer constants may be controlled using:
#pragma TenDRA integer operator analysis status
where status is replaced by on, warning or off. This pragma enables checks on:
- shift operations where an expression is shifted by a negative number or by an amount greater than or equal to the width in bits of the expression being shifted;
- right shift operation with a negative value of signed integral type as the first argument;
- division operation with a negative operand;
- test for an unsigned value strictly greater than or less than 0 (these are always true or false respectively);
- conversion of a negative constant value to an unsigned type;
- application of unary - operator to an unsigned value.
2.7. Support for 64 bit integer types (long long)
Although the use of long long to specify a 64 bit integer type is not supported by the ISO C90 standard, it is becoming increasingly popular in programming use. By default, tcc does not support the use of long long, but the checker can be configured to support the long long type to different degrees using the following pragmas:
#pragma TenDRA longlong type permit
where permit is one of allow (long long type accepted), disallow (errors produced when long long types are detected) or warning (long long types are accepted but a warning is raised).
#pragma TenDRA set longlong type : type_name
where type_name is long or long long.
The first pragma determines the behaviour of the checker if the type long long is encountered as a type specifier. In the disallow case, an error is raised and the type specifier is mapped to long; otherwise the type is stored as long long, although a message alerting the user to the use of long long is raised in the warning mode. The second pragma determines the semantics of long long. If the type specified is long long, then long long is treated as a separate integer type and, if code generation is enabled, long long types appear in the output. Otherwise the type is mapped to long and all objects declared long long are output as if they had been declared long (a warning is produced when this occurs). In either case, long long is treated as a distinct integer type for the purpose of integer conversion checking.
Extensions to the integer promotion and arithmetic conversion rules are required for the long long type. These have been implemented as follows:
- the types of integer arithmetic operations where neither argument has long long type are unaffected;
- long long and unsigned long long both promote to themselves;
- the result type of arithmetic operations with one or more arguments of type unsigned long long is unsigned long long;
- otherwise if either argument has type signed long long the overall type is long long if both arguments can be represented in this form, otherwise the type is unsigned long long.
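The extended rules listed above can be sketched in the same style as the standard arithmetic rules. The enumerations, the function name long_long_result and the parameter both_fit_in_long_long are invented for this illustration; the parameter stands for the target-dependent question of whether both arguments can be represented as long long.

```c
enum ll_kind { NOT_LONG_LONG, SIGNED_LONG_LONG, UNSIGNED_LONG_LONG };
enum ll_result { RESULT_UNAFFECTED, RESULT_LONG_LONG, RESULT_UNSIGNED_LONG_LONG };

/* Result of an arithmetic operation under the long long extensions described above. */
enum ll_result long_long_result(enum ll_kind a, enum ll_kind b, int both_fit_in_long_long)
{
    if (a == NOT_LONG_LONG && b == NOT_LONG_LONG)
        return RESULT_UNAFFECTED;            /* the ordinary rules apply unchanged */
    if (a == UNSIGNED_LONG_LONG || b == UNSIGNED_LONG_LONG)
        return RESULT_UNSIGNED_LONG_LONG;
    /* At least one argument is signed long long and neither is unsigned long long. */
    return both_fit_in_long_long ? RESULT_LONG_LONG : RESULT_UNSIGNED_LONG_LONG;
}
```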
There are now three cases where the type of an integer arithmetic operation is not completely determined from the type of its arguments, i.e.