Integral Types

2. Integral Types

2.1. Integer promotion rules
2.2. Arithmetic operations on integer types
2.3. Interaction with the integer conversion checks
2.4. Target dependent integral types
1. 2.4.1. Integer literals
2. 2.4.2. Abstract API types
2.5. Integer overflow checks
2.6. Integer operator checks
2.7. Support for 64 bit integer types (long long)

The checks described in the previous chapter involved the detection of conversions which could result in undefined values. Certain conversions involving integral types, however, are defined in the ISO C standard and so might be considered safe and unlikely to cause problems. This unfortunately is not the case: some of these conversions may still result in a change in value; the actual size of each integral type is implementation-dependent; and the "old-style" integer conversion rules which predate the ISO standard are still in common use. The checker provides support for both ISO and traditional integer promotion rules. The set of rules used may be specified independently of the two integral range scenarios, 16 bit(default) and 32 bit, described in section 2.1.2.

The means of specifying and alternative sets of promotion rules, their interaction with the conversion checks described in section 3.2 and the additional checks which may be performed on integers and integer operations are described in the remainder of this chapter.

2.1. Integer promotion rules

The ISO C standard rules may be summarised as follows: long integral types promote to themselves; other integral types promote to whichever of int or unsigned int they fit into. In full the promotions are:

char → int signed char → int unsigned char → int short → int unsigned short → int or unsigned int int → int unsigned int → unsigned int long → long unsigned long → unsigned long

Note that even with these simple built-in types, there is a degree of uncertainty, namely concerning the promotion of unsigned short. On most machines, int is strictly larger than short so the promotion of unsigned short is int. However, it is possible for short and int to have the same size, in which case the promotion is unsigned int. When using the ISO C promotion rules, the checker usually avoids making assumptions about the implementation by treating the promotion of unsigned short as an abstract integral type. If, however, the -Y32bit option is specified, int is assumed to be strictly larger than short, and unsigned short promotes to int.

The traditional C integer promotion rules are often referred to as the signed promotion rules. Under these rules, long integral types promote to themselves, as in ISO C, but the other integral types promote to unsigned int if they are qualified by unsigned, and int otherwise. Thus the signed promotion rules may be represented as follows:

char → int signed char → int unsigned char → unsigned int short → int unsigned short → unsigned int int → int unsigned int → unsigned int long → long unsigned long → unsigned long

The traditional promotion rules are applied in the Xt built-in environment only. All of the other built-in environments specify the ISO C promotion rules. Users may also specify their own rules for integer promotions and minimum integer ranges; the methods for doing this are described in Annex H.

2.2. Arithmetic operations on integer types

The ISO C standard rules for calculating the type of an arithmetic operation involving two integer types is as follows - work out the integer promotions of the types of the two operands, then:

If either promoted type is unsigned long, the result type is unsigned long;
Otherwise, if one promoted type is long and the other is unsigned int, then if a long int can represent all values of an unsigned int, the result type is long; otherwise the result type is unsigned long;
Otherwise, if either promoted type is long, the result type is long;
Otherwise, if either promoted type is unsigned int, the result type is unsigned int;
Otherwise the result type is int.

Both promoted values are converted to the result type, and the operation is then applied.

2.3. Interaction with the integer conversion checks

A simple-minded implementation of the integer conversion checks described in 3.2 would interact badly with these rules. Consider, for example, adding two values of type char:

char f ( char a, char b )
{
	char c = a + b ;
	return ( c ) ;
}

The various stages in the calculation of c are as follows - a and b are converted to their promotion type, int, added together to give an int result, which is converted to a char and assigned to c. The conversions of a and b from char to int are always safe, and so present no difficulties to the integer conversion checks. The conversion of the result from int to char, however, is precisely the type of value destroying conversion which these checks are designed to detect.

Obviously, an integer conversion check which flagged all char arithmetic would never be used, thereby losing the potential to detect many subtle portability errors. For this reason, the integer conversion checks are more sophisticated. In all typed languages, the type is used for two purposes - for static type checking and for expressing information about the actual representation of data on the target machine. Essentially it is a confusion between these two roles which leads to the problems above. The C promotion and arithmetic rules are concerned with how data is represented and manipulated, rather than the underlying abstract types of this data. When a and b are promoted to int prior to being added together, this is only a change in representation; at the conceptual level they are still char's. Again, when they are added, the result may be represented as an int, but conceptually it is a char. Thus the assignment to c, an actual char, is just a change in representation, not a change in conceptual type.

So each expression may be regarded as having two types - a conceptual type which stands for what the expression means, and a representational type which stands for how the expression is to represented as data on the target machine. In the vast majority of expressions, these types coincide, however the integral promotion and arithmetic conversions are changes of representational, not conceptual, types. The integer conversion checks are concerned with detecting changes of conceptual type, since it is these which are most likely to be due to actual programming errors.

It is possible to define integral types within the TenDRA extensions to C in which the split between concept and representation is made explicit. The pragma:

#pragma TenDRA keyword TYPE for type representation

may be used to introduce a keyword TYPE for this purpose (as with all such pragmas, the precise keyword to be used is left to the user). Once this has been done, TYPE ( r, t ) may be used to represent a type which is conceptually of type t but is represented as data like type r. Both t and r must be integral types. For example:

TYPE ( int, char ) a ;

declares a variable a which is represented as an int, but is conceptually a char.

In order to maintain compatibility with other compilers, it is necessary to give TYPE a sensible alternative definition. For all but conversion checking purposes, TYPE ( r, t ) is identical to r, so a suitable definition is:

#ifdef __TenDRA__
#pragma TenDRA keyword TYPE for type representation
#else
#define TYPE( r, t ) r
#endif

2.4. Target dependent integral types

Since the checker uses only information about the minimum guaranteed ranges of integral types, integer values for which the actual type of the values is unknown may arise. Integer values of undetermined type generally arise in one of two ways: through the use of integer literals and from API types which are not completely specified.

2.4.1. Integer literals

The ISO C rules on the type of integer literals are set out as follows. For each class of integer literals a list of types is given. The type of an integer literal is then the first type in the appropriate list which is large enough to contain the value of the integer literal. The class of the integer literal depends on whether it is decimal, hexadecimal or octal, and whether it is qualified by U (or u) or L (or l) or both. The rules may be summarised as follows:

decimal → int or long or unsigned long hex or octal → int or unsigned int or long or unsigned long any + U → unsigned int or unsigned long any + L → long or unsigned long any + UL → unsigned long

These rules are applied in all the built-in checking modes except Xt. Traditional C does not have the U and L qualifiers, so if the Xt mode is used, these qualifiers are ignored and all integer literals are treated as int, long or unsigned long, depending on the size of the number.

If a number fits into the minimal range for the first type of the appropriate list, then it is of that type; otherwise its type is undetermined and is said to be target dependent. The checker treats target dependent types as abstract integral types which may lead to integer conversion problems. For example, in:

int f ( int n ) {
	return ( n & 0xff00 ) ;
}

the type of 0xff00 is target dependent, since it does not fit into the minimal range for int specified by the ISO C standard (this is detected by the integer overflow analysis described in section 4.6). The arithmetic conversions resulting from the & operation is detected by the checker's conversion analysis. Note that if the -Y32bit option is specified to tcc, an int is assumed to contain at least 32 bits. In this case, 0xff00 fits into the type int, and so this is the type of the integer literal. No invalid integer conversions is then detected.

2.4.2. Abstract API types

Target dependent integral types also occur in API specifications and may be encountered when checking against one of the implementation-independent APIs provided with the checker. The commonest example of this is size_t, which is stated by the ISO C standard to be a target dependent unsigned integral type, and which arises naturally within the language as the type of a sizeof expression.

The checker has its own internal version of size_t, wchar_t and ptrdiff_t for evaluating static compile-time expressions. These internal types are compatible with the ISO C specification of size_t, wchar_t and ptrdiff_t, and thus are compatible with any conforming definitions of these types found in included files. However, when checking the following program against the system headers, a warning is produced on some machines concerning the implicit conversion of an unsigned int to type size_t:

#include <stdlib.h>
int main() {
	size_t size ;
	size = sizeof( int ) ;
}

The system header on the machine in question actually defines size_t to be a signed int (this of course contravenes the ISO C standard) but the compile time function sizeof returns the checker's internal version of size_t which is an abstract unsigned integral type. By using the pragma:

#pragma TenDRA set size_t:signed int

the checker can be instructed to use a different internal definition of size_t when evaluating the sizeof function and the error does not arise. Equivalent options are also available for the ptrdiff_t and wchar_t types.

2.5. Integer overflow checks

Given the complexity of the rules governing the types of integers and results of integer operations, as well as the variation of integral ranges with machine architecture, it is hardly surprising that unexpected results of integer operations are at the root of many programming problems. These problems can often be hard to track down and may suddenly appear in an application which was previously considered safe, when it is moved to a new system. Since the checker supports the concept of a guaranteed minimum size of an integer it is able to detect many potential problems involving integer constants. The pragma:

#pragma TenDRA integer overflow analysis status

where status is on, warning or off, controls a set of checks on arithmetic expressions involving integer constants. These checks cover overflow, use of constants exceeding the minimum guaranteed size for their type and division by zero. They are not enabled in the default mode.

There are two special cases of integer overflow for which checking is controlled seperately:

Bitfield sizes. Obviously, the size of a bitfield must be smaller than or equal to the minimum size of its integral type. A bitfield which is too large is flagged as an error in the default mode. The check on bitfield sizes is controlled by:
```
#pragma TenDRA bitfield overflow permit
```
where permit is one of allow, disallow or warning.
Octal and hexadecimal escape sequences. According to the ISO C standard, the value of an octal or hexadecimal escape sequence shall be in the range of representable values for the type unsigned char for an integer character constant, or the unsigned type corresponding to wchar_t for a wide character constant. The check on escape sequence sizes is controlled by:
```
#pragma TenDRA character escape overflow permit
```
where the options for permit are allow, warning and disallow. The check is switched on by default.

2.6. Integer operator checks

The results of some integer operations are undefined by the ISO C standard for certain argument types. Others are implementation-defined or simply likely to produce unexpected results.In the default mode such operations are processed silently, however a set of checks on operations involving integer constants may be controlled using:

#pragma TenDRA integer operator analysis status

where status is replaced by on, warning or off. This pragma enabled checks on:

shift operations where an expression is shifted by a negative number or by an amount greater than or equal to the width in bits of the expression being shifted;
right shift operation with a negative value of signed integral type as the first argument;
division operation with a negative operand;
test for an unsigned value strictly greater than or less than 0 (these are always true or false respectively);
conversion of a negative constant value to an unsigned type;
application of unary - operator to an unsigned value.

2.7. Support for 64 bit integer types (`long long`)

Although the use of long long to specify a 64 bit integer type is not supported by the ISO C90 standard it is becoming increasingly popular as in programming use. By default, tcc does not support the use of long long but the checker can be configured to support the long long type to different degrees using the following pragmas:

#pragma TenDRA longlong type permit

where permit is one of allow (long long type accepted), disallow (errors produced when long long types are detected) or warning (long long types are accepted but a warning is raised).

#pragma TenDRA set longlong type : type_name

where type_name is long or long long.

The first pragma determines the behaviour of the checker if the type long long is encountered as a type specifier. In the disallow case, an error is raised and the type specifier mapped to long, otherwise the type is stored as long long although a message alerting the user to the use of long long is raised in the warning mode. The second pragma determines the semantics of long long. If the type specified is long long, then long long is treated as a separate integer type and if code generation is enabled, long long types appears in the output. Otherwise the type is mapped to long and all objects declared long long are output as if they had been declared long (a warning is produced when this occurs). In either case, long long is treated as a distinct integer type for the purpose of integer conversion checking.

Extensions to the integer promotion and arithmetic conversion rules are required for the long long type. These have been implemented as follows:

the types of integer arithmetic operations where neither argument has long long type are unaffected;
long long and unsigned long long both promote to themselves;
the result type of arithmetic operations with one or more arguments of type unsigned long long is unsigned long long;
otherwise if either argument has type signed long long the overall type is long long if both arguments can be represented in this form, otherwise the type is unsigned long long.

There are now three cases where the type of an integer arithmetic operation is not completely determined from the type of its arguments, i.e.

signed long long + unsigned long = signed long long or unsigned long long; signed long long + unsigned int = signed long long or unsigned long long; signed int + unsigned short = signed int or unsigned int ( as before ).