12. Token syntax
The C and C++ producers allow place-holders for various categories of syntactic classes to be expressed using directives of the form:
#pragma TenDRA token token-spec
or simply:
#pragma token token-spec
These place-holders are represented as TDF tokens and hence are called tokens. These tokens stand for a certain type, expression or whatever which is to be represented by a certain named TDF token in the producer output. This mechanism is used, for example, to allow C API specifications to be represented target independently. The types, functions and expressions comprising the API can be described using #pragma token directives, and the target dependent definitions of these tokens, representing the implementation of the API on a particular machine, can be linked in later. This mechanism is described in detail elsewhere.
A summary of the grammar for the #pragma token directives accepted by the C++ producer is given in tdfc2pragma.
12.1. Token specifications
A token specification is divided into two components, a token-introduction giving the token sort, and a token-identification giving the internal and external token names:
token-spec :
    token-introduction token-identification

token-introduction :
    exp-token
    statement-token
    type-token
    member-token
    procedure-token

token-identification :
    token-namespace? identifier # external-identifier?

token-namespace :
    TAG

external-identifier :
    -
    preproc-token-list
The TAG qualifier is used to indicate that the internal name lies in the C tag namespace. This only makes sense for structure and union types. The external token name can be given by any sequence of preprocessing tokens. These tokens are not macro expanded. If no external name is given then the internal name is used. The special external name - is used to indicate that the token does not have an associated external name, and hence is local to the current translation unit. Such a local token must be defined. White space in the external name (other than at the start or end) is used to indicate that a TDF unique name should be used. The white space serves as a separator for the unique name components.
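For illustration only (the internal and external names below are invented, not taken from any real API description), a generic type token and a structure token in the tag namespace might be introduced as follows; the white space in each external name indicates that a TDF unique name built from the given components should be used:

#pragma token TYPE abc_type # example abc_type
#pragma token STRUCT TAG div_type # example div_type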
12.1.1. Expression tokens
Expression tokens are specified as follows:
exp-token :
    EXP exp-storage? : type-id :
    NAT
    INTEGER
representing an expression of the given type, a non-negative integer constant and a general integer constant, respectively. Each expression has an associated storage class:
exp-storage :
    lvalue
    rvalue
    const
indicating whether it is an lvalue, an rvalue or a compile-time constant expression. An absent exp-storage is equivalent to rvalue. All expression tokens lie in the macro namespace; that is, they may potentially be defined as macros.
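As an invented illustration (none of these names come from a real API), an rvalue of type int, a modifiable lvalue, a compile-time constant and a non-negative integer constant might be specified as:

#pragma token EXP rvalue : int : eof_val # example eof_val
#pragma token EXP lvalue : int : err_no # example err_no
#pragma token EXP const : unsigned long : clock_rate # example clock_rate
#pragma token NAT buf_size # example buf_size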
For backwards compatibility with the C producer, the directive:
#pragma TenDRA++ rvalue token as const allow
causes rvalue tokens to be treated as const tokens.
12.1.2. Statement tokens
Statement tokens are specified as follows:
statement-token :
    STATEMENT
All statement tokens lie in the macro namespace.
12.1.3. Type tokens
Type tokens are specified as follows:
type-token :
    TYPE
    VARIETY
    VARIETY signed
    VARIETY unsigned
    FLOAT
    ARITHMETIC
    SCALAR
    CLASS
    STRUCT
    UNION
representing a generic type, an integral type, a signed integral type, an unsigned integral type, a floating point type, an arithmetic (integral or floating point) type, a scalar (arithmetic or pointer) type, a class type, a structure type and a union type respectively.
Floating-point, arithmetic and scalar token types have not yet been implemented correctly in either the C or C++ producers.
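For example (again with invented names), a generic type, an unsigned integral type and a structure type might be specified as:

#pragma token TYPE file_type # example file_type
#pragma token VARIETY unsigned size_type # example size_type
#pragma token STRUCT TAG stat_rec # example stat_rec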
12.1.4. Member tokens
Member tokens are specified as follows:
member-token :
    MEMBER access-specifier? member-type-id : type-id :
where an access-specifier of public is assumed if none is given. The member type is given by:
member-type-id :
    type-id
    type-id % constant-expression
where % is used to denote bitfield members (since : is used as a separator). The second type denotes the structure or union the given member belongs to. Different types can have members with the same internal name, but the external token name must be unique. Note that only non-static data members can be represented in this form.
Two declarations for the same MEMBER token (including token definitions) should have the same type; however, the directive:
#pragma TenDRA++ incompatible member declaration allow
allows declarations with different types, provided these types have the same size and alignment requirements.
12.1.5. Procedure tokens
Procedure, or high-level, tokens are specified in one of three ways:
procedure-token :
    general-procedure
    simple-procedure
    function-procedure
All procedure tokens (except ellipsis functions - see below) lie in the macro namespace. The most general form of procedure token specifies two sets of parameters. The bound parameters are those which are used in encoding the actual TDF output, and the program parameters are those which are specified in the program. The program parameters are expressed in terms of the bound parameters. A program parameter can be an expression token parameter, a statement token parameter, a member token parameter, a procedure token parameter or any type. The bound parameters are deduced from the program parameters by a similar process to that used in template argument deduction.
general-procedure :
    PROC { bound-toks? | prog-pars? } token-introduction

bound-toks :
    bound-token
    bound-token , bound-toks

bound-token :
    token-introduction token-namespace? identifier

prog-pars :
    program-parameter
    program-parameter , prog-pars

program-parameter :
    EXP identifier
    STATEMENT identifier
    TYPE type-id
    MEMBER type-id : identifier
    PROC identifier
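As an invented sketch of the general form (the names are illustrative only), the following token describes a dereference-like operation; the bound parameters t and p are deduced from the single program parameter p, the expression actually supplied in the program:

#pragma token PROC { TYPE t, EXP rvalue : t * : p | EXP p } EXP rvalue : t : deref_tok # example deref_tok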
The simplest form of a general-procedure is one in which the prog-pars correspond precisely to the bound-toks. In this case the syntax:
simple-procedure :
    PROC ( simple-toks? ) token-introduction

simple-toks :
    simple-token
    simple-token , simple-toks

simple-token :
    token-introduction token-namespace? identifier?
may be used. Note that the parameter names are optional.
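For example (names invented), a negation-like operation in which both the type and the expression are given explicitly in the program might be specified as:

#pragma token PROC ( TYPE t, EXP rvalue : t : x ) EXP rvalue : t : neg_tok # example neg_tok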
A function token is specified as follows:
function-procedure :
    FUNC type-id :
where the given type is a function type. This has two effects: firstly, a function with the given type is declared; secondly, if the function type has the form:
r ( p1, ...., pn )
a procedure token with sort:
PROC ( EXP rvalue : p1 :, ...., EXP rvalue : pn : ) EXP rvalue : r :
is declared. For ellipsis function types only the function, not the token, is declared. Note that the token behaves like a macro definition of the corresponding function. Unless explicitly enclosed in a linkage specification, a function declared using a FUNC token has C linkage. Note that it is possible for two FUNC tokens to have the same internal name, because of function overloading; however, external names must be unique.
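A hypothetical example (the names are illustrative) of a function token taking an int and returning an int would be:

#pragma token FUNC int ( int ) : abs_tok # example abs_tok

which, by the rule above, declares the function abs_tok and a procedure token of sort PROC ( EXP rvalue : int : ) EXP rvalue : int :.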
The directive:
#pragma TenDRA incompatible interface declaration allow
can be used to allow incompatible redeclarations of functions declared using FUNC tokens. The token declaration takes precedence.
Some of the more complex forms of PROC tokens, such as tokens with PROC parameters, have not been implemented in either the C or C++ producers.
12.2. Token arguments
As mentioned above, the program parameters for a PROC token are those specified in the program itself. These arguments are expressed as a comma-separated list enclosed in brackets, the form of each argument being determined by the corresponding program parameter.
An EXP argument is an assignment expression. This must be an lvalue for lvalue tokens and a constant expression for const tokens. The argument is converted to the token type (for lvalue tokens this is essentially a conversion between the corresponding reference types). A NAT or INTEGER argument is an integer constant expression. In the former case this must be non-negative.
A STATEMENT argument is a statement. This statement should not contain any labels or any goto or return statements.
A type argument is a type identifier. This must name a type of the correct category for the corresponding token. For example, a VARIETY token requires an integral type.
A member argument must describe the offset of a member or nested member of the given structure or union type. The type of the member should agree with that of the MEMBER token. The general form of a member offset can be described in terms of member selectors and array indexes as follows:
member-offset :
    ::? id-expression
    member-offset . ::? id-expression
    member-offset [ constant-expression ]
A PROC argument is an identifier. This identifier must name a PROC token of the appropriate sort.
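Using the invented neg_tok and deref_tok tokens sketched in the previous section, the corresponding token arguments might be written as follows (this is illustrative only):

int i = 5 ;
int *p = &i ;
int a = neg_tok ( int, i ) ;    /* TYPE argument followed by EXP argument */
int b = deref_tok ( p ) ;       /* single EXP argument; t is deduced as int */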
12.3. Defining tokens
Given a token specification of a syntactic object and a normal language definition of the same object (including macro definitions if the token lies in the macro namespace), the producers attempt to unify the two by defining the TDF token in terms of the given definition. Whether the token specification occurs before or after the language definition is immaterial. Unification also takes place in situations where, for example, two types are known to be compatible. Multiple consistent explicit token definitions are allowed by default where the language permits them; this behaviour is controlled by the directive:
#pragma TenDRA compatible token allow
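As a hedged illustration (the names and definitions are invented), a type token might be unified with a typedef, and an expression token, which lies in the macro namespace, with a macro:

#pragma token TYPE off_type # example off_type
typedef long off_type ;         /* unifies with, and defines, the off_type token */

#pragma token EXP rvalue : int : eof_val # example eof_val
#define eof_val ( -1 )          /* the macro acts as the token definition */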
The default unification behaviour may be modified using the directives:
#pragma TenDRA no_def token-list
#pragma TenDRA define token-list
#pragma TenDRA reject token-list
or equivalently:
#pragma no_def token-list
#pragma define token-list
#pragma ignore token-list
which set the state of the tokens given in token-list. A state of no_def means that no unification is attempted and that any attempt to explicitly define the token results in an error. A state of define means that unification takes place and that the token must be defined somewhere in the translation unit. A state of reject means that unification takes place as normal, but any resulting token definition is discarded and not output to the TDF capsule.
If a token with the state define is not defined, then the behaviour depends on the sort of the token. A FUNC token is implicitly defined in terms of its underlying function, such as:
#define f( a1, ...., an ) ( f ) ( a1, ...., an )
Other undefined tokens cause an error. This behaviour can be modified using the directives:
#pragma TenDRA++ implicit token definition allow
#pragma TenDRA++ no token definition allow
respectively.
The primitive operations, no_def, define and reject, can also be expressed using the context-sensitive directive:
#pragma TenDRA interface token-list
or equivalently:
#pragma interface token-list
By default this is equivalent to no_def, but may be modified by inclusion using one of the directives:
#pragma TenDRA extend header-name
#pragma TenDRA implement header-name
or equivalently:
#pragma extend interface header-name
#pragma implement interface header-name
These are equivalent to:
#include header-name
except that the form [....] is allowed as a header name. This is equivalent to <....> except that it starts the directory search after the point at which the including file was found, rather than at the start of the path (i.e. it is equivalent to the #include_next directive found in some preprocessors). The effect of the extend directive on the state of the interface directive is as follows:
no_def -> no_def
define -> reject
reject -> reject
The effect of the implement directive is as follows:
no_def -> define
define -> define
reject -> reject
That is to say, an implement directive will cause all the tokens in the given header to be defined and their definitions output. Any tokens included in this header by extend may be defined, but their definitions will not be output. This is precisely the behaviour which is required to ensure that each token is defined exactly once in an API library build.
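A hypothetical implementation unit in such a build (file and token names are purely illustrative) might therefore take the following shape:

/* api/stdio.h might contain:
       #pragma extend interface [api/stddef.h]
       #pragma token TYPE file_type # example file_type            */

#pragma implement interface [api/stdio.h]
typedef struct my_file file_type ;  /* definition is output to the TDF capsule */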
The lists of tokens in the directives above are expressed in the form:
token-list :
    token-id token-list?
    # preproc-token-list
where a token-id represents an internal token name:
token-id :
    token-namespace? identifier
    type-id . identifier
Note that member tokens are specified by means of both the member name and its parent type. In this type specifier, TAG, rather than class, struct or union, may be used in elaborated type specifiers for structure and union tokens. If the token-id names an overloaded function then the directive is applied to all FUNC tokens of that name. It is possible to be more selective using the # form, which allows the external token name to be specified. Such an entry must be the last in a token-list.
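Using the invented names from the earlier examples, such token-lists might look like:

#pragma TenDRA no_def size_type TAG div_type
#pragma TenDRA define struct time_rec . t_sec
#pragma TenDRA reject # example neg_tok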
A related directive has the form:
#pragma TenDRA++ undef token token-list
which undefines all the given tokens so that they are no longer visible.
As noted above, a macro is only considered as a token definition if the token lies in the macro namespace. Tokens which are not in the macro namespace, such as types and members, cannot be defined using macros. Occasionally, API implementations do define member selectors as macros in terms of other member selectors. Such a token needs to be explicitly defined using a directive of the form:
#pragma TenDRA member definition type-id : identifier member-offset
where member-offset is as above.
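A hypothetical illustration (all names are invented): an API member token sec_part is implemented as a macro selecting a nested member, so the token must be defined explicitly:

#pragma token MEMBER long : struct time_pair : sec_part # example sec_part

/* Implementation: */
struct time_pair { struct { long tv_sec ; long tv_nsec ; } tp_val ; } ;
#define sec_part tp_val . tv_sec

/* The macro cannot act as the token definition, so the member token
   is defined explicitly in terms of the member offset: */
#pragma TenDRA member definition struct time_pair : sec_part tp_val . tv_sec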