5. Configuration for types

5.1. The Portability Table
5.2. Specifying integer literal types
5.3. Extended integral types
5.4. Bitfield types
5.5. Type declarations
5.6. Type compatibility
5.7. Incomplete types
5.8. Built-in types
5.9. Sign of char

5.1. The Portability Table

The portability table is used by the checker to describe the minimum assumptions about the representation of the integral types. It contains information on the minimum integer sizes and the minimum range of values that can be represented by each integer type, the sign of plain char, and whether signed types can be assumed to be symmetric (for example, [-127,127]) or maximum (for example, [-128,127]). The format for this file is documented by tdfc2portability.

The minimum integer ranges are deduced from the minimum integer sizes as follows. Suppose b is the minimum number of bits that will be used to represent a certain integral type, then:

For unsigned integer types the minimum range is [0, 2^b - 1];
For signed integer types, if signed_range is maximum the minimum range is [-2^(b-1), 2^(b-1) - 1]. Otherwise, if signed_range is symmetric, the minimum range is [-(2^(b-1) - 1), 2^(b-1) - 1];
For the type char which is not specified as signed or unsigned, if char_type is signed then char is treated as signed, if char_type is unsigned then char is treated as unsigned, and if char_type is either, the minimum range of char is the intersection of the minimum ranges of signed char and unsigned char.
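
The three rules above can be sketched in C as follows. This is only an illustration of the arithmetic, not the checker's implementation; the enum, struct and function names are hypothetical, and the code assumes 1 <= b <= 62 so that the values fit in a 64-bit integer.

```c
#include <stdint.h>

/* Hypothetical names mirroring the signed_range and char_type settings. */
enum signed_range { RANGE_SYMMETRIC, RANGE_MAXIMUM };
enum char_type { CHAR_IS_SIGNED, CHAR_IS_UNSIGNED, CHAR_IS_EITHER };

struct int_range { int64_t lo; int64_t hi; };

/* Unsigned type with b bits: [0, 2^b - 1]. Assumes 1 <= b <= 62. */
struct int_range unsigned_min_range(int b)
{
    struct int_range r = { 0, ((int64_t)1 << b) - 1 };
    return r;
}

/* Signed type with b bits: maximum or symmetric range. Assumes 1 <= b <= 62. */
struct int_range signed_min_range(int b, enum signed_range sr)
{
    int64_t half = (int64_t)1 << (b - 1);   /* 2^(b-1) */
    struct int_range r;
    r.hi = half - 1;
    r.lo = (sr == RANGE_MAXIMUM) ? -half : -(half - 1);
    return r;
}

/* Plain char: signed, unsigned, or the intersection of both ranges. */
struct int_range plain_char_min_range(int b, enum signed_range sr,
                                      enum char_type ct)
{
    struct int_range s = signed_min_range(b, sr);
    struct int_range u = unsigned_min_range(b);
    struct int_range r;
    if (ct == CHAR_IS_SIGNED) return s;
    if (ct == CHAR_IS_UNSIGNED) return u;
    r.lo = (s.lo > u.lo) ? s.lo : u.lo;     /* larger lower bound */
    r.hi = (s.hi < u.hi) ? s.hi : u.hi;     /* smaller upper bound */
    return r;
}
```

For an 8 bit char with char_type set to either, the sketch gives [0, 127], which is the set of values representable whichever sign the target chooses.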

5.2. Specifying integer literal types

By default tdfc2 assumes that all integer ranges conform to the minimum ranges prescribed by the ISO C standard, i.e. char contains at least 8 bits, short and int contain at least 16 bits and long contains at least 32 bits. If the -Y32bit flag is passed to the checker it assumes that integers conform to the minimum ranges commonly found on most 32 bit machines, i.e. int contains at least 32 bits and int is strictly larger than short so that the integral promotion of unsigned short is int under the ISO C standard integer promotion rules.
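
The remark about unsigned short can be checked on any given compiler with a short C11 sketch such as the one below. It is not part of tdfc2; the function names are hypothetical, and it reports only what the compiler building it does.

```c
#include <limits.h>

/* Returns 1 when unsigned short promotes to int and 0 when it promotes
 * to unsigned int. Unary plus applies the integral promotions. */
int ushort_promotes_to_int(void)
{
    unsigned short us = 0;
    return _Generic(+us, int: 1, unsigned int: 0);
}

/* The ISO C rule: the promoted type is int exactly when int can
 * represent every value of unsigned short. */
int int_holds_all_ushort(void)
{
    return USHRT_MAX <= INT_MAX;
}
```

On a target where int is strictly larger than short, both functions return 1, which is the situation that -Y32bit assumes.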

The integer literal pragmas are used to define the method of computing the type of an integer literal. Integer literals cannot be used in a program unless the class to which they belong has been described using an integer literal pragma. Each built-in checking mode includes some integer literal pragmas describing the semantics appropriate for that mode. If these built-in modes are inappropriate, then the user must describe the semantics using the pragma below:

#pragma integer literal literal_class lit_class_type_list

The literal_class identifies the type of literal integer involved. The possibilities are:

decimal
octal
hexadecimal

Each of these types can optionally be followed by unsigned and/or long to specify an unsigned and/or long type respectively.

The values of the integer literals of any particular class are divided into contiguous sub-ranges specified by the lit_class_type_list which takes the form below:

lit_class_type_list :
	* int_type_spec
	integer_constant int_type_spec | lit_class_type_list

int_type_spec :
	: type_name
	* warning? : identifier
	** :

The first integer constant, i1 say, identifies the range [0, i1], the second, i2 say, identifies the range [i1 + 1, i2]. The symbol * specifies the unlimited range upwards from the last integer constant. Each integer constant must be strictly greater than its predecessor.

Associated with each sub-range is an int_type_spec which is either a type, a procedure token identifier with an optional warning (see G.9) or a failure. For each sub-range:

If the int_type_spec is a type name, then it must be an integral type and specifies the type associated with literals in that sub-range.
If the int_type_spec is an identifier, then the type of integer is computed by a procedure token of that name which takes the integer value as a parameter and delivers its type. The procedure token must have been declared previously as
```
#pragma token PROC ( VARIETY ) VARIETY
```
Since the type of the integer is computed by a procedure token which may be implemented differently on different targets, there is the option of producing a warning whenever the token is applied.
If the int_type_spec is **, then any integer literal lying in the associated sub-range will cause the checker to raise an error.

For example:

#pragma integer literal decimal 0x7fff : int | 0x7fffffff : long | * : unsigned long

divides unsuffixed decimal literals into three ranges: literals in the range [0, 0x7fff] are of type int, literals in the range [0x8000, 0x7fffffff] are of type long and the remainder are of type unsigned long.
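
The effect of this example pragma can be pictured as a lookup over contiguous sub-ranges. The C sketch below models that lookup only; the enum and function names are hypothetical and nothing here is tdfc2 code.

```c
/* The three type names used by the example pragma. */
enum lit_type { LIT_INT, LIT_LONG, LIT_UNSIGNED_LONG };

/* Each test covers the values above the previous bound, so the
 * sub-ranges are [0, 0x7fff], [0x8000, 0x7fffffff] and the rest. */
enum lit_type classify_decimal(unsigned long long value)
{
    if (value <= 0x7fffULL)
        return LIT_INT;
    if (value <= 0x7fffffffULL)
        return LIT_LONG;
    return LIT_UNSIGNED_LONG;   /* the "*" entry: unlimited upwards */
}
```

Under this model 32767 is classified as int, 32768 as long and 2147483648 as unsigned long.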

There are four pre-defined procedure tokens supplied with the compiler which are used in the startup files to provide the default specification for integer literals:

~lit_int is the external identification of a token that returns the integer type according to the rules of ISO C for an unsuffixed decimal;
~lit_hex is the external identification of a token that returns the integer type according to the rules of ISO C for an unsuffixed hexadecimal;
~lit_unsigned is the external identification of a token that returns the integer type according to the rules of ISO C for integers suffixed by U only;
~lit_long is the external identification of a token that returns the integer type according to the rules of ISO C for integers suffixed by L only.

5.3. Extended integral types

The long long integral types are not part of ISO C or C++ by default; however, support for them can be enabled using the directive:

#pragma TenDRA longlong type allow

This support includes allowing long long in type specifiers and allowing LL and ll as integer literal suffixes.

There is a further directive given by the two cases:

#pragma TenDRA set longlong type : long long
#pragma TenDRA set longlong type : long

which can be used to control the implementation of the long long types. Either they can be mapped to the default representation, which is guaranteed to contain at least 64 bits, or they can be mapped to the corresponding long types.
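
A source file using these directives might look like the sketch below. The `__TenDRA__` guard is an assumption about the macro predefined by the compiler; it is there so that other compilers skip the pragmas. The function names are hypothetical.

```c
#include <limits.h>

#ifdef __TenDRA__   /* assumed predefined macro; other compilers skip this */
#pragma TenDRA longlong type allow
#pragma TenDRA set longlong type : long long
#endif

/* Width in bits of long long on the compiler actually in use. */
int longlong_bits(void)
{
    return (int)(sizeof(long long) * CHAR_BIT);
}

/* An LL-suffixed literal whose shifted value needs more than 32 bits. */
long long two_to_the_40(void)
{
    return 1LL << 40;
}
```

If long long were instead mapped to long on a target with a 32 bit long, the shift in two_to_the_40 would no longer be representable, so code of this kind relies on the default mapping.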

Because these long long types are not an intrinsic part of C++ the implementation does not integrate them into the language as fully as is possible. This is to prevent the presence or otherwise of long long types affecting the semantics of code which does not use them. For example, it would be possible to extend the rules for the types of integer literals, integer promotion types and arithmetic types to say that if the given value does not fit into the standard integral types then the extended types are tried. This has not been done, although these rules could be implemented by changing the definitions of the standard tokens used to determine these types. By default, only the rules for arithmetic types involving a long long operand and for LL integer literals mention long long types.

5.4. Bitfield types

The C++ rules on bitfield types differ slightly from the C rules. Firstly any integral or enumeration type is allowed in a bitfield, and secondly the bitfield width may exceed the underlying type size (the extra bits being treated as padding). These properties can be controlled using the directives:

#pragma TenDRA extra bitfield int type allow
#pragma TenDRA bitfield overflow allow

respectively.

The ISO C standard only allows signed int, unsigned int and their equivalent types as type specifiers in bitfields. Using the default checking profile, tdfc2 raises errors for other integral types used as type specifiers in bitfields. This behaviour may be modified using the pragma:

#pragma TenDRA extra int bitfield type permit

where permit is one of allow (no errors raised), warning (allow non-int bitfields through with a warning) or disallow (raise errors for non-int bitfields).

If non-int bitfields are allowed, the bitfield is treated as if it had been declared with an int type of the same signedness as the given type. The use of the type char as a bitfield type still generally causes an error, since whether a plain char is treated as signed or unsigned is implementation-dependent. The pragma:

#pragma TenDRA character set-sign

where set-sign is signed, unsigned or either, can be used to specify the signedness of a plain char bitfield. If set-sign is signed or unsigned, the bitfield is treated as though it were declared signed char or unsigned char respectively. If set-sign is either, the sign of the bitfield is target-dependent and the use of a plain char bitfield causes an error.
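
As an illustration, the sketch below declares a bitfield with a non-int type. The `__TenDRA__` guard is an assumption about the compiler's predefined macro, and the structure and function names are hypothetical.

```c
#ifdef __TenDRA__   /* assumed predefined macro; other compilers skip this */
#pragma TenDRA extra int bitfield type allow
#endif

struct flags {
    unsigned int ready : 1;    /* type permitted by ISO C */
    unsigned short mode : 3;   /* non-int type: rejected by default */
};

/* Stores the low three bits of mode in the bitfield and reads them back. */
unsigned int pack_mode(unsigned int mode)
{
    struct flags f = { 0, 0 };
    f.mode = mode & 7u;
    return f.mode;
}
```

With the pragma set to allow, the mode member would be treated as an unsigned int bitfield of width 3, following the signedness rule described above.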

5.5. Type declarations

C does not allow multiple definitions of a typedef name, whereas C++ allows multiple consistent definitions. This behaviour can be controlled using the directive:

#pragma TenDRA extra type definition allow

In accordance with the ISO C standard, in default mode tdfc2 does not allow a type to be defined more than once using a typedef. The pragma:

#pragma TenDRA extra type definition permit

where permit is allow (silently accepts redefinitions, provided they are consistent), warning or disallow, can be used to change this behaviour.
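
A minimal sketch of a consistent redefinition follows. The `__TenDRA__` guard is an assumption about the compiler's predefined macro, and word_t and the function name are hypothetical. Compilers implementing C11 or later accept the repeated typedef without any directive.

```c
#ifdef __TenDRA__   /* assumed predefined macro; other compilers skip this */
#pragma TenDRA extra type definition allow
#endif

typedef unsigned long word_t;
typedef unsigned long word_t;   /* consistent redefinition of the same name */

/* Returns 1 when word_t has the same size as unsigned long. */
int word_is_unsigned_long_sized(void)
{
    return sizeof(word_t) == sizeof(unsigned long);
}
```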

5.6. Type compatibility

The directive:

#pragma TenDRA incompatible type qualifier allow

allows objects to be redeclared with different cv-qualifiers (normally such redeclarations would be incompatible). The composite type is qualified using the join of the cv-qualifiers in the various redeclarations.

The directive:

#pragma TenDRA compatible type : type-id == type-id : allow

asserts that the given two types are compatible. Currently the only implemented version is char * == void * which enables char * to be used as a generic pointer as it was in older dialects of C.

5.7. Incomplete types

Some dialects of C allow incomplete arrays as member types. These are generally used as a place-holder at the end of a structure to allow for the allocation of an arbitrarily sized array. Support for this feature can be enabled using the directive:

#pragma TenDRA incomplete type as object type allow
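
The place-holder idiom described above might be written as in the sketch below, which uses the C99 spelling of an incomplete array member. The `__TenDRA__` guard is an assumption about the compiler's predefined macro, and struct message and message_new are hypothetical names.

```c
#include <stdlib.h>
#include <string.h>

#ifdef __TenDRA__   /* assumed predefined macro; other compilers skip this */
#pragma TenDRA incomplete type as object type allow
#endif

struct message {
    size_t length;
    char text[];    /* incomplete array type as the last member */
};

/* Allocates the structure and the trailing array in a single block.
 * Returns NULL if the allocation fails; the caller frees the result. */
struct message *message_new(const char *s)
{
    size_t n = strlen(s);
    struct message *m = malloc(sizeof *m + n + 1);
    if (m != NULL) {
        m->length = n;
        memcpy(m->text, s, n + 1);   /* copies the terminating null too */
    }
    return m;
}
```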

The ISO C standard (Section 6.1.2.5) states that an incomplete type, e.g. an undefined structure or union type, is not an object type and that array elements must be of object type. The default behaviour of the checker causes errors when incomplete types are used to specify array element types. The pragma:

#pragma TenDRA incomplete type as object type permit

can be used to alter the treatment of array declarations with incomplete element types. permit is one of allow, disallow or warning as usual.

5.8. Built-in types

The definitions of implementation dependent integral types which arise naturally within the language - the type of the difference of two pointers, ptrdiff_t, and the type of the sizeof operator, size_t - given in the <stddef.h> header can be overridden using the directives:

#pragma TenDRA set ptrdiff_t : type-id
#pragma TenDRA set size_t : type-id

These directives are useful when targeting a specific machine on which the definitions of these types are known; while they may not affect the code generated, they can cut down on spurious conversion warnings. Note that although these types are built into the producer, they are not visible to the user unless an appropriate header is included (with the exception of the keyword wchar_t in ISO C++); however, directives of the form:

#pragma TenDRA++ type identifier for type-name

can be used to make these types visible. They are equivalent to a typedef declaration of identifier as the given built-in type, ptrdiff_t, size_t or wchar_t.

5.9. Sign of char

Whether plain char is signed or unsigned is implementation-dependent. By default the signedness is determined by the definition of the ~char token; however, this can be overridden in the producer either by means of the portability table or by the directive:

#pragma TenDRA character character-sign

where character-sign can be signed, unsigned or either (the default). Again this directive is useful primarily when targeting a specific machine on which the signedness of char is known.
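
When the target is known, its choice can be confirmed with a small probe such as the sketch below before fixing the directive. The `__TenDRA__` guard is an assumption about the compiler's predefined macro, and TARGET_CHAR_UNSIGNED and the function name are hypothetical; the macro stands for whatever mechanism a build uses to record the target's choice.

```c
#include <limits.h>

/* TARGET_CHAR_UNSIGNED is a hypothetical build-time macro. */
#if defined(__TenDRA__) && defined(TARGET_CHAR_UNSIGNED)
#pragma TenDRA character unsigned
#endif

/* Returns 1 when plain char is signed on the compiler in use, else 0. */
int plain_char_is_signed(void)
{
    return CHAR_MIN < 0;
}
```

The probe reports only the behaviour of the compiler that builds it, so it is a guide to choosing between signed and unsigned rather than a substitute for the directive.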