6. Configuration for literals
- 6.1. Integer literals
- 6.2. Character literals
- 6.3. Writeable String literals
- 6.4. Concatenation of character string literals and wide character string literals
- 6.5. Escape sequences
6.1. Integer literals
The rules for finding the type of an integer literal can be described using directives of the form:
#pragma TenDRA integer literal literal-spec
where:
literal-spec :
    literal-base literal-suffix? literal-type-list

literal-base :
    octal
    decimal
    hexadecimal

literal-suffix :
    unsigned
    long
    unsigned long
    long long
    unsigned long long

literal-type-list :
    * literal-type-spec
    integer-literal literal-type-spec | literal-type-list
    ? literal-type-spec | literal-type-list

literal-type-spec :
    : type-id
    * allow? : identifier
    * * allow? :
Each directive gives a literal base and suffix, describing the form of an integer literal, and a list of possible types for literals of this form. This list gives a mapping from the value of the literal to the type to be used to represent the literal. There are three cases for the literal type: it may be a given integral type, it may be calculated using a given literal type token (see C/C++ Producer Implementation), or it may cause an error to be raised. There are also three cases for describing a literal range: it may be given by values less than or equal to a given integer literal, it may be given by values which are guaranteed to fit into a given integral type, or it may match any value. For example:
#pragma token PROC ( VARIETY c ) VARIETY l_i # ~lit_int
#pragma TenDRA integer literal decimal 32767 : int | ** : l_i
describes how to find the type of a decimal literal with no suffix. Values less than or equal to 32767 have type int; larger values have a target dependent type calculated using the token ~lit_int. Introducing a warning into the directive will cause a warning to be printed if the token is used to calculate the value.
Note that this scheme extends that implemented by the C producer, because of the need for more accurate information in the C++ producer. For example, the specification above does not fully express the ISO rule that the type of a decimal integer is the first of the types int, long and unsigned long which it fits into (it only expresses the first step). However with the C++ extensions it is possible to write:
#pragma token PROC ( VARIETY c ) VARIETY l_i # ~lit_int
#pragma TenDRA integer literal decimal ? : int | ? : long |\
    ? : unsigned long | ** : l_i
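The lookup that this second directive describes can be modelled in ordinary C++. The following is a minimal sketch, not the producer's implementation: the function name decimal_literal_type is hypothetical, and the final fallback string simply stands in for whatever type the ~lit_int token would compute on a given target.

```cpp
#include <limits>
#include <string>

// Hypothetical model of the directive
//   decimal ? : int | ? : long | ? : unsigned long | ** : l_i
// Each "?" entry is tried in order; the first type whose range holds the
// value wins. The last entry stands in for the target dependent token.
std::string decimal_literal_type(unsigned long long value)
{
    if (value <= static_cast<unsigned long long>(std::numeric_limits<int>::max()))
        return "int";
    if (value <= static_cast<unsigned long long>(std::numeric_limits<long>::max()))
        return "long";
    if (value <= std::numeric_limits<unsigned long>::max())
        return "unsigned long";
    return "target dependent (token)";
}
```

Calling decimal_literal_type(32767) selects int on any conforming target, since int must be able to hold that value; the results for larger values depend on the widths of int and long on the machine in question.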
6.2. Character literals
By default, a simple character literal has type int in C and type char in C++. The type of such literals can be controlled using the directive:
#pragma TenDRA++ set character literal : type-id
The type of a wide character literal is given by the implementation defined type wchar_t. By default, the definition of this type is taken from the target machine's <stddef.h> C header (note that in ISO C++, wchar_t is actually a keyword, but its underlying representation must be the same as in C). This definition can be overridden in the producer by means of the directive:
#pragma TenDRA set wchar_t : type-id
for an integral type type-id.
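For comparison, the default ISO C++ types that these directives override can be checked at compile time. This is a small sketch in standard C++11, independent of the TenDRA pragmas; the constant names are illustrative only.

```cpp
#include <type_traits>

// In ISO C++ a simple character literal has type char (in C it would be int).
constexpr bool simple_literal_is_char =
    std::is_same<decltype('a'), char>::value;

// A wide character literal has type wchar_t, which is a keyword in C++.
constexpr bool wide_literal_is_wchar_t =
    std::is_same<decltype(L'a'), wchar_t>::value;

static_assert(simple_literal_is_char, "'a' should have type char in C++");
static_assert(wide_literal_is_wchar_t, "L'a' should have type wchar_t");
```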
6.3. Writeable String literals
By default, character string literals have type char [n] in C and older dialects of C++, but type const char [n] in ISO C++. Similarly wide string literals have type wchar_t [n] or const wchar_t [n]. Whether string literals are const or not can be controlled using the two directives:
#pragma TenDRA++ set string literal : const
#pragma TenDRA++ set string literal : no const
In the case where literals are const, the array-to-pointer conversion is allowed to cast away the const to allow for a degree of backwards compatibility. The status of this deprecated conversion can be controlled using the directive:
#pragma TenDRA writeable string literal allow
(yes, I know that that should be writable). Note that this directive has a slightly different meaning in the C producer.
The ISO C standard, section 6.1.4, states that if the program attempts to modify a string literal of either form, the behaviour is undefined. Assignments to string literals of the form:
"abc" = '3';
always result in errors. Other attempts to modify members of string literals, e.g.
"abc"[1] = '3';
are permitted in the default checking mode. This behaviour can be changed using:
#pragma TenDRA writeable string literal permit
where permit may be allow, warning or disallow.
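Since modifying a string literal is undefined behaviour, portable code usually copies the literal into an array first. The sketch below assumes an ISO C++11 compiler rather than any particular TenDRA checking mode, and the function name patched_copy is hypothetical.

```cpp
#include <string>
#include <type_traits>

// In ISO C++ the literal "abc" is an lvalue of type const char [4].
static_assert(std::is_same<decltype("abc"), const char (&)[4]>::value,
              "string literals are const in ISO C++");

// Copy the literal into a writable array and modify the copy, leaving the
// literal itself untouched.
std::string patched_copy()
{
    char buffer[] = "abc";   // array initialised from the literal
    buffer[1] = '3';         // well defined: modifies the copy only
    return std::string(buffer);
}
```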
6.4. Concatenation of character string literals and wide character string literals
Adjacent string literal tokens of similar types (either both character string literals or both wide string literals) are concatenated at an early stage in the parser; however, it is unspecified what happens if a character string literal token is adjacent to a wide string literal token. By default this gives an error, but the directive:
#pragma TenDRA unify incompatible string literal allow
can be used to enable the strings to be concatenated to give a wide string literal.
If a ' or " character does not have a matching closing quote on the same line then it is undefined whether an implementation should report an unterminated string or treat the quote as a single unknown character. By default, the C++ producer treats this as an unterminated string, but this behaviour can be controlled using the directive:
#pragma TenDRA unmatched quote allow
The ISO C standard, section 6.1.4, states that if a character string literal is adjacent to a wide character string literal, the behaviour is undefined. By default, this is flagged as an error by the checker. If the pragma:
#pragma TenDRA unify incompatible string literal permit
is used, with permit set to allow or warning, the character string literal is converted to a wide character string literal and the strings are concatenated, although in the warning case a warning is output. The disallow version of the pragma restores the default behaviour.
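For reference, C99 and C++11 later defined the mixed case so that the result is a wide string literal, which matches what the allow form of the directive produces. The sketch below assumes a C++11 or later compiler; the variable names are illustrative.

```cpp
#include <string>

// Both tokens are wide string literals: always concatenated.
const std::wstring same_kind = L"abc" L"def";

// Mixed tokens: undefined in the older standards the text describes, but
// C++11 treats the unprefixed literal as wide, giving L"abcdef".
const std::wstring mixed_kind = "abc" L"def";
```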
6.5. Escape sequences
By default, if the character following the \ in an escape sequence is not one of those listed in the ISO C or C++ standards then an error is given. This behaviour, which is left unspecified by the standards, can be controlled by the directive:
#pragma TenDRA unknown escape allow
The result is that the \ in unknown escape sequences is ignored, so that \z is interpreted as z, for example. Individual escape sequences can be enabled or disabled using the directives:
#pragma TenDRA++ escape character-literal as character-literal allow
#pragma TenDRA++ escape character-literal disallow
so that, for example:
#pragma TenDRA++ escape 'e' as '\033' allow
#pragma TenDRA++ escape 'a' disallow
sets \e to be the ASCII escape character and disables the alert character \a.
By default, if the value of a character, given for example by a \x escape sequence, does not fit into its type then an error is given. This implementation dependent behaviour can however be controlled by the directive:
#pragma TenDRA character escape overflow allow
the value being converted to its type in the normal way.
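What conversion "in the normal way" amounts to can be illustrated with an ordinary integral conversion. This is only a sketch: it assumes 8-bit characters and uses unsigned char so that the result is fully defined (for a signed char type the result of an out-of-range conversion was implementation defined before C++20). The function name narrow_escape_value is hypothetical and not part of the producer.

```cpp
#include <climits>

static_assert(CHAR_BIT == 8, "this sketch assumes 8-bit characters");

// Model of an out-of-range escape value, such as '\x141', being converted to
// unsigned char: the conversion is modulo 2^8, so only the low byte survives.
constexpr unsigned char narrow_escape_value(unsigned long value)
{
    return static_cast<unsigned char>(value);
}
```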
The ISO C standard specifies a small set of escape sequences in strings, for example \n as newline. Unknown escape sequences lead to an error in the default mode; however, the severity of the error may be altered using:
#pragma TenDRA unknown escape permit
where permit is allow (silently replaces the unknown escape sequence, \z say, by z), warning or disallow.
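The effect of the allow and disallow settings can be sketched as a small decoder. This is a hypothetical model rather than the producer's code: decode_escapes and its allow_unknown parameter are invented for illustration, and only a few of the standard escapes are handled.

```cpp
#include <stdexcept>
#include <string>

// Decode backslash escapes in `text`. When `allow_unknown` is true, an
// unrecognised escape such as \z drops the backslash and keeps the character;
// otherwise it is reported as an error.
std::string decode_escapes(const std::string& text, bool allow_unknown)
{
    std::string out;
    for (std::string::size_type i = 0; i < text.size(); ++i) {
        if (text[i] != '\\' || i + 1 == text.size()) {
            out += text[i];          // ordinary character (or trailing backslash)
            continue;
        }
        const char c = text[++i];    // character following the backslash
        switch (c) {
        case 'n':  out += '\n'; break;
        case 't':  out += '\t'; break;
        case '\\': out += '\\'; break;
        case '"':  out += '"';  break;
        case '\'': out += '\''; break;
        default:
            if (!allow_unknown)
                throw std::invalid_argument("unknown escape sequence");
            out += c;                // \z is interpreted as z
        }
    }
    return out;
}
```

With allow_unknown set, the input a\zb decodes to azb; with it cleared, the same input raises an exception, which plays the role of the default error.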