7. Notes
- 7.1. Binding
- 7.2. Character codes
- 7.3. Constant evaluation
- 7.4. Division and modulus
- 7.5. Equality of EXPs
- 7.6. Equality of SHAPEs
- 7.7. Equality of ALIGNMENTs
- 7.8. Exceptions and jumps
- 7.9. Procedures
- 7.10. Frames
- 7.11. Lifetimes
- 7.12. Alloca
- 7.13. Memory Model
- 7.14. Order of evaluation
- 7.15. Original pointers
- 7.16. Overlapping
- 7.17. Incomplete assignment
- 7.18. Representing integers
- 7.19. Overflow and Integers
- 7.20. Representing floating point
- 7.21. Floating point errors
- 7.22. Rounding and floating point
- 7.23. Floating point accuracy
- 7.24. Representing bitfields
- 7.25. Permitted limits
- 7.26. Least Upper Bound
- 7.27. Read-only areas
- 7.28. Tag and Token signatures
- 7.29. Dynamic initialisation
7.1. Binding
The following constructions introduce TAGs: identify, variable, make_proc, make_general_proc, make_id_tagdec, make_var_tagdec, common_tagdec.
During the evaluation of identify and variable a value, v, is produced which is bound to the TAG during the evaluation of an EXP or EXPs. The TAG is “in scope” for these EXPs. This means that in the EXP a use of the TAG is permissible and will refer to the declaration.
The make_proc and make_general_proc constructions introduce TAGs which are bound to the actual parameters on each call of the procedure. These TAGs are “in scope” for the body of the procedure.
If a make_proc or make_general_proc construction occurs in the body of another make_proc or make_general_proc, the TAGs of the inner procedure are not in scope in the outer procedure, nor are the TAGs of the outer in scope in the inner.
The apply_general_proc construction permits the introduction of TAGs whose scope is the postlude argument. These are bound to the values of caller parameters after the evaluation of the body of the procedure.
The make_id_tagdec, make_var_tagdec and common_tagdec constructions introduce TAGs which are “in scope” throughout all the tagdef UNITs. These TAGs may have values defined for them in the tagdef UNITs, or values may be supplied by linking.
The following constructions introduce LABELs: conditional, repeat, labelled.
The constructions themselves define EXPs for which these LABELs are “in scope”. This means that in the EXPs a use of the LABEL is permissible and will refer to the introducing construction.
TAGs, LABELs and TOKENs (as TOKEN parameters) introduced in the body of a TOKEN definition are systematically renamed in their scope each time the TOKEN definition is applied. The scope will be completely included by the TOKEN definition.
Each of the values introduced in a UNIT will be named by a different TAG, and the labelling constructions will use different labels, so no visibility rules are needed. The set of TAGs and LABELs used in a simple UNIT are considered separately from those in another simple UNIT, so no question of visibility arises. The compound and link UNITs provide a method of relating the items in one simple UNIT to those in another, but this is through the intermediary of another set of TAGs and TOKENs at the CAPSULE level.
7.2. Character codes
TDF does not have a concept of characters. It transmits integers of various sizes. So if a producer wishes to communicate characters to an installer, it will usually have to do so by encoding them in some way as integers.
An ANSI C producer sending a TDF program to a set of normal C environments may well choose to encode its characters using the ASCII codes, an EBCDIC based producer transmitting to a known set of EBCDIC environments might use the code directly, and a wide character producer might likewise choose a specific encoding. For some programs this way of proceeding is necessary, because the codes are used both to represent characters and for arithmetic, so the particular encoding is enforced. In these cases it will not be possible to translate the characters to another encoding because the character codes will be used in the TDF as ordinary integers, which must not be translated.
Some producers may wish to transmit true characters, in the sense that something is needed to represent particular printing shapes and nothing else. These representations will have to be transformed into the correct character encoding on the target machine.
Probably the best way to do this is to use TOKENs. A fixed representation for the printing marks could be chosen in terms of integers and TOKENs introduced to represent the translation from these integers to local character codes, and from strings of integers to strings of local character codes. These definitions could be bound on the target machine and the installer should be capable of translating these constructions into efficient machine code. To make this a standard, unique TOKENs should be used.
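As a small illustration of this idea (plain C, not TDF; the fixed codes and the function standing in for a target-bound TOKEN are invented for the example), a producer-chosen code for each printing mark can be mapped to the local character code on the target:

#include <stdio.h>

/* Hypothetical fixed codes chosen by the producer for two printing marks. */
enum mark { MARK_A = 1, MARK_PLUS = 2 };

/* Stand-in for a TOKEN bound on the target: translate a fixed code into
   the local character code (this target is assumed to use ASCII). */
static int to_local_char(enum mark m)
{
    switch (m) {
    case MARK_A:    return 'A';   /* 65 on an ASCII target, 193 on an EBCDIC one */
    case MARK_PLUS: return '+';
    }
    return '?';
}

int main(void)
{
    printf("%d %d\n", to_local_char(MARK_A), to_local_char(MARK_PLUS));
    return 0;
}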
But this raises the question, who chooses the fixed representation and the unique TOKENs and their specification? Clearly TDF provides a mechanism for performing the standardisation without itself defining a standard.
Here TDF gives rise to the need for extra standards, especially in the specification of globally named unique TOKENs.
7.3. Constant evaluation
Some constructions require an EXP argument which is “constant at install time”. For an EXP to satisfy this condition it must be constructed according to the following rules, after substitution of token definitions and selection of exp_cond branches.
If it contains obtain_tag then the tag will be introduced within the EXP, or defined with make_id_tagdef within the current capsule.
It may not contain any of the following constructions: apply_proc, apply_general_proc, assign_with_mode, contents_with_mode, continue, current_env, error_jump, goto_local_lv, make_local_lv, move_some, repeat, round_as_state.
Unless it is the EXP argument of a TAGDEF, a “constant at install time” may not contain env_offset or env_size.
Any use of contents or assign will be applied only to POINTERs derived from variable constructions.
If it contains labelled there will only be jumps to the LABELs from within starter, not from within any of the places.
Any use of obtain_tag defined with make_id_tagdef will occur after the end of the make_id_tagdef.
Note specifically that a constant EXP forming the defining value of a TAGDEF construct may contain env_offset and/or env_size.
7.4. Division and modulus
Two classes of division (D) and remainder (M) construct are defined. The two classes have the same definition if both operands have the same sign. Neither is defined if the second argument is zero.
Class 1:
p D1 q = n
where:
p = n*q + (p M1 q)
sign(p M1 q) = sign(q)
0 <= |p M1 q| < |q|
Class 2:
p D2 q = n
where:
p = n*q + (p M2 q)
sign(p M2 q) = sign(p)
0 <= |p M2 q| < |q|
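As an aside (not part of the TDF definition), a minimal C sketch of the two classes, assuming C's native / and % truncate toward zero as ISO C requires:

#include <stdio.h>

/* Class 2: the remainder takes the sign of the dividend p; this is
   exactly C's truncating division. */
long div2(long p, long q) { return p / q; }
long mod2(long p, long q) { return p % q; }

/* Class 1: the remainder takes the sign of the divisor q (floor-style). */
long mod1(long p, long q)
{
    long m = p % q;
    if (m != 0 && ((m < 0) != (q < 0)))
        m += q;
    return m;
}
long div1(long p, long q) { return (p - mod1(p, q)) / q; }

int main(void)
{
    /* p = -7, q = 2: class 1 gives -4 remainder 1, class 2 gives -3 remainder -1 */
    printf("D1 %ld M1 %ld\n", div1(-7, 2), mod1(-7, 2));
    printf("D2 %ld M2 %ld\n", div2(-7, 2), mod2(-7, 2));
    return 0;
}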
7.5. Equality of EXPs
A definition of equality of EXPs would be a considerable part of a formal specification of TDF, and is not given here.
7.6. Equality of SHAPEs
- Two SHAPEs are equal if they are both BOTTOM, or both TOP, or both PROC.
- Two SHAPEs are equal if they are both integer or both floating, or both bitfield, and the corresponding parameters are equal.
- Two SHAPEs are equal if they are both NOF, the numbers of items are equal and the SHAPE parameters are equal.
- Two OFFSETs or two POINTERs are equal if their ALIGNMENT parameters are pairwise equal.
- Two COMPOUNDs are equal if their OFFSET EXPs are equal.
- No other pairs of SHAPEs are equal.
7.7. Equality of ALIGNMENTs
Two ALIGNMENTs are equal if and only if they are equal sets.
7.8. Exceptions and jumps
TDF allows simply for labels and jumps within a procedure, by means of the conditional, labelled and repeat constructions, and the goto, case and various test constructions. But there are two more complex jumping situations.
First there is the jump, known to stay within a procedure, but to a computed destination. Many languages have discouraged this kind of construction, but it is still available in Cobol (implicitly), and it can be used to provide other facilities (see below). TDF allows it by means of the local label value, of SHAPE POINTER({code}). TDF is arranged so that this can usually be implemented as the address of the label. The goto_local_lv construction just jumps to the label.
The other kind of construction needed is the jump out of a procedure to a label which is still active, restoring the environment of the destination procedure: the long jump. Related to this is the notion of exception. Unfortunately long jumps and exceptions do not co-exist well. Exceptions are commonly organised so that any necessary destruction operations are performed as the stack of frames is traversed; long jumps commonly go directly to the destination. TDF must provide some facility which can express both of these concepts. Furthermore exceptions come in several different versions, according to how the exception handlers are discriminated and whether exception handling is initiated if there is no handler which will catch the exception.
Fortunately the normal implementations of these concepts provide a suggestion as to how they can be introduced into TDF. The local label value provides the destination address, the environment (produced by current_env) provides the stack frame for the destination, and the stack re-setting needed by the local label jumps themselves provides the necessary stack information. If more information is needed, such as which exception handlers are active, this can be created by producing the appropriate TDF.
So TDF takes the long jump as the basic construction, and its parameters are a local label value and an environment. Everything else can be built in terms of these.
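For orientation only, the familiar C setjmp/longjmp idiom plays the same role as a long jump given a local label value and an environment; this sketch is ordinary ISO C, not TDF:

#include <setjmp.h>
#include <stdio.h>

static jmp_buf destination;   /* stands for "label value plus environment" */

static void inner(void)
{
    /* jump out of this procedure back to the still-active caller frame */
    longjmp(destination, 1);
}

int main(void)
{
    if (setjmp(destination) == 0) {
        inner();               /* never returns normally */
    } else {
        printf("arrived via long jump\n");
    }
    return 0;
}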
The TDF arithmetic constructions allow one to specify a LABEL as destination if the result of the operation is exceptional. This is sufficient for the kind of explicit exception handling found in C++ and, in principle, could also be used to implement the kind of “automatic” exception detection and handling found in Ada, for example.
However many architectures have facilities for automatically trapping on exceptions without explicit testing. To take advantage of this, there is a trap ERROR_TREATMENT with associated ERROR_CODEs. The action taken on an exception with trap ERROR_TREATMENT will be to “throw” the ERROR_CODE. Since each language has its own idea of how to interpret the ERROR_CODE and handle exceptions, the onus is on the producer writer to describe how to throw an ERROR_CODE.
The producer writer must give a definition of a TOKEN, ~Throw: NAT -> EXP, where the NAT will be the error_val of some ERROR_CODE. The expansion of this token will be consistent with the interpretation of the relevant ERROR_CODE and the method of handling exceptions. Usually this will consist of decoding the ERROR_CODE and doing a long_jump on some globals set up by the procedure in which the exception handling takes place.
The translator writer will provide a parameterless EXP TOKEN, ~Set_signal_handler. This TOKEN will use ~Throw and must be applied before any possible exceptions. This implies that the definitions of both ~Throw and ~Set_signal_handler must be bound before translation of any CAPSULE which uses them, presumably by linking with some TDF libraries.
These tokens are specified in more detail in the companion document, TDF Token Register.
7.9. Procedures
The var_intro of a make_proc, if present, may be used under one of two different circumstances. In one circumstance, the POINTER TAG provided by the var_intro is used to access the actual var_param of an apply_proc. If this is the case, all uses of apply_proc which have the effect of calling this procedure will have the var_param option present, and they will all have precisely the same number of params as params_intro in the make_proc. The body of the make_proc can access elements of the var_param by using OFFSET arithmetic relative to the POINTER TAG. This provides a method of supplying a variable number of parameters, by composing them into a compound value which is supplied as the var_param.
However, this has proved to be unsatisfactory for the implementation of a variable number of parameters in C - one cannot choose the POINTER alignment of the TAG a priori in non-prototype calls.
An alternative circumstance for using var_intro is where all uses of apply_proc which have the effect of calling this procedure may have more params present than the number of params_intro, and none of them will have their var_param option present. The body of the make_proc can access the additional params by using installer-defined TOKENs specified in the companion document TDF Token Register, analogous to the use of variable argument lists in C. A local variable v of shape ~va_list must be initialised to ~__va_start(p), where p is the POINTER obtained from the var_intro. Successive elements of the params list can then be obtained by successive applications of ~va_arg(v, s) where s is the SHAPE of the element obtained. Finally, ~va_end(v) completes the use of v.
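For comparison, this is the ordinary C variable-argument idiom which the ~va_list, ~__va_start, ~va_arg and ~va_end tokens mirror (plain ISO C, not TDF):

#include <stdarg.h>
#include <stdio.h>

static int sum(int count, ...)
{
    va_list v;
    int total = 0;
    va_start(v, count);              /* cf. ~__va_start(p) */
    for (int i = 0; i < count; i++)
        total += va_arg(v, int);     /* cf. ~va_arg(v, s)  */
    va_end(v);                       /* cf. ~va_end(v)     */
    return total;
}

int main(void)
{
    printf("%d\n", sum(3, 1, 2, 3)); /* prints 6 */
    return 0;
}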
The definition of caller parameters in general procedures addresses this difficulty in a different way, by describing the layout of caller parameters qualified by PROCPROPS var_callers. This allows both the call and the body to have closely associated views of the OFFSETs within a parameter set, regardless of whether or not the particular parameter has been named. The installer-defined TOKEN ~next_caller_offset provides access to successive caller parameters, by using OFFSETs relative to the current frame pointer current_env, adjusting for any differences there may be between the closely associated views. The caller_intro list of the make_general_proc must not be empty, so that the sequence of OFFSETs can start with an appropriate env_offset. Similar considerations apply to accessing within the callee parameters, using the installer-defined TOKEN ~next_callee_offset.
All uses of return, untidy_return and tail_call in a procedure will return values of the same SHAPE, and this will be the result_shape specified in all uses of apply_proc or apply_general_proc calling the procedure.
The use of untidy_return gives a generalisation of local_alloc. It extends the validity of pointers allocated by local_alloc within the immediately enclosing procedure into the calling procedure. The original space of these pointers may be invalidated by local_free just as if it had been generated by local_alloc in the calling procedure.
The PROCPROPS check_stack may be used to check that the limit set by set_stack_limit is not exceeded by the allocation of the static locals of a procedure body to be obeyed. If it is exceeded then the producer-defined TOKEN ~Throw: NAT -> EXP will be invoked as ~Throw(error_val(stack_overflow)). Note that this will not include any space generated by local_alloc; an explicit test is required to check this.
Any SHAPE is permitted as the result_shape in an apply_proc or apply_general_proc.
7.10. Frames
TDF states that while a particular procedure activation is current, it is possible to create a POINTER, by using current_env, which gives access to all the declared variables and identifications of the activation which are alive and which have been marked as visible. The construction env_offset gives the OFFSET of one of these relative to such a POINTER. These constructions may serve several purposes.
One significant purpose is to implement such languages as Pascal which have procedures declared inside other procedures. One way of implementing this is by means of a “display”, that is, a tuple of frame pointers of active procedures.
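As an illustrative sketch only (plain C, not TDF; the struct layout is invented), a producer can model a nested procedure by passing the enclosing frame explicitly, much as current_env and env_offset expose a frame pointer and an offset within it:

#include <stdio.h>

struct outer_frame { int counter; };      /* variables of the outer procedure */

static void inner(struct outer_frame *env)
{
    env->counter += 1;                     /* access via frame pointer + offset */
}

static void outer(void)
{
    struct outer_frame frame = { 0 };      /* the outer activation's frame */
    inner(&frame);
    inner(&frame);
    printf("%d\n", frame.counter);         /* prints 2 */
}

int main(void) { outer(); return 0; }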
Another purpose is to find active variables satisfying some criterion in all the procedure activations. This is commonly required for garbage collection. TDF does not force the installer to implement a frame pointer register, since some machines do not work best in this way. Instead, a frame pointer is created only if required by current_env. The implication of this is that this sort of garbage collection needs the collaboration of the producer to create TDF which makes the correct calls on current_env and env_offset and place suitable values in known positions.
Programs compiled especially to provide good diagnostic information can also use these operations.
In general any program which wishes to manipulate the frames of procedures other than the current one can use current_env and env_offset to do so.
A frame consists of three components: the caller parameters, the callee parameters and the locals of the procedure involved. Since each component may have different internal methods of access within the frame, each has a different special frame alignment associated with pointers within them. These are callers_alignment, callees_alignment and locals_alignment. The POINTER produced by current_env will be some union of these special alignments depending on how the procedure was defined.
Each of these frame alignments is considered to contain any ALIGNMENT produced by alignment from any SHAPE. Note that this does not say that they are the set union of all such ALIGNMENTs. This is because the interpretation of pointer and offset operations (notably add_to_ptr) may be different depending on the implementation of the frames; they may involve extra indirections.
Accordingly, because of the constraints on add_to_ptr, an OFFSET produced by env_offset can only be added to a POINTER produced by current_env. It is a further constraint that such an OFFSET will only be added to a POINTER produced from current_env used on the procedure which declared the TAG.
7.11. Lifetimes
TAGs are bound to values during the evaluation of EXPs, which are specified by the construction which introduces the TAG. The evaluation of these EXPs is called the lifetime of the activation of the TAG.
Note that lifetime is a different concept from that of scope. For example, if the EXP contains the application of a procedure, the evaluation of the body of the procedure is within the lifetime of the TAG, but the TAG will not be in scope.
A similar concept applies to LABELs.
7.12. Alloca
The constructions involving alloca (last_local, local_alloc, local_free, local_free_all) as well as the untidy_return construction imply a stack-like implementation which is related to procedure calls. They may be implemented using the same stack as the procedure frames, if there is such a stack, or it may be more convenient to implement them separately. However note that if the alloca mechanism is implemented as a stack, this may be an upward or a downward growing stack.
The state of this notional stack is referred to here as the alloca state. The construction local_alloc creates a new space on the alloca stack, the size of this space being given by an OFFSET. In the special case that this OFFSET is zero, local_alloc in effect gives the current alloca state (normally a POINTER to the top of the stack).
A use of local_free_all returns the alloca state to what it was on entry to the current procedure.
The construction last_local gives a POINTER to the top item on the stack, but it is necessary to give the size of this (as an OFFSET) because this cannot be deduced if the stack is upward growing. This top item will be the whole of an item previously allocated with local_alloc.
The construction local_free returns the state of the alloca machine to what it was when its parameter POINTER was allocated. The OFFSET parameter will be the same value as that with which the POINTER was allocated.
The ALIGNMENT of the POINTER delivered by local_alloc is alloca_alignment. This shall include the set union of all the ALIGNMENTs which can be produced by alignment from any SHAPE.
The use of alloca_alignment arises so that the alloca stack can hold any kind of value. The sizes of spaces allocated must be rounded up to the appropriate ALIGNMENT. Since this includes all value ALIGNMENTs, a value of any ALIGNMENT can be assigned into this space. Note that there is no necessary relation with the special frame alignments (see section 7.10), though they must both contain all the ALIGNMENTs which can be produced by alignment from any SHAPE.
Stack pushing is local_alloc. Stack popping can be performed by use of last_local and local_free. Remembering the state of the alloca stack and returning to it can be performed by using local_alloc with a zero OFFSET and local_free.
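A toy model of this notional stack in C may make the push and pop operations concrete; the arena, its fixed size and the simplified local_free (which in real TDF also takes the OFFSET of the allocation) are assumptions made purely for illustration:

#include <stddef.h>
#include <stdio.h>

static char arena[1024];
static char *alloca_state = arena;        /* current "top of stack" */

static char *local_alloc(size_t bytes)    /* size 0 just reports the state */
{
    char *p = alloca_state;
    alloca_state += bytes;
    return p;
}

static void local_free(char *p)           /* return to an earlier state */
{
    alloca_state = p;
}

int main(void)
{
    char *mark = local_alloc(0);          /* remember the alloca state */
    local_alloc(100);
    local_alloc(200);
    local_free(mark);                     /* pop everything since mark */
    printf("%d\n", (int)(alloca_state - arena));   /* prints 0 */
    return 0;
}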
Note that stack pushing can also be achieved by the use of a procedure call with untidy_return.
A transfer of control to a local label by means of goto, goto_local_lv, any test construction or any error_jump will not change the alloca stack.
If an installer implements identify and variable by creating space on a stack when they come into existence, rather than doing the allocation for identify and variable at the start of a procedure activation, then it may have to consider making the alloca stack into a second stack.
7.13. Memory Model
The layout of data in memory is entirely determined by the calculation of OFFSETs relative to POINTERs. That is, it is determined by OFFSET arithmetic and the add_to_ptr construction.
A POINTER is parameterised by the ALIGNMENT of the data indicated. An ALIGNMENT is a set of all the different kinds of basic value which can be indicated by a POINTER. That is, it is a set chosen from all VARIETYs, all FLOATING_VARIETYs, all BITFIELD_VARIETYs, proc, code, pointer and offset. There are also three special ALIGNMENTs, frame_alignment, alloca_alignment and var_param_alignment.
The parameter of a POINTER will not consist entirely of BITFIELD_VARIETYs.
The implication of this is that the ALIGNMENT of all procedures is the same, the ALIGNMENT of all POINTERs is the same and the ALIGNMENT of all OFFSETs is the same.
At present this corresponds to the state of affairs for all machines. But it is certainly possible that, for example, 64-bit pointers might be aligned on 64-bit boundaries while 32-bit pointers are aligned on 32-bit boundaries. In this case it will become necessary to add different kinds of pointer to TDF. This will not present a problem, because, to use such pointers, similar changes will have to be made in languages to distinguish the kinds of pointer if they are to be mixed.
The difference between two POINTERs is measured by an OFFSET. Hence an OFFSET is parameterised by two ALIGNMENTs, that of the starting POINTER and that of the end POINTER. The ALIGNMENT set of the first must include the ALIGNMENT set of the second.
The parameters of an OFFSET may consist entirely of BITFIELD_VARIETYs.
The operations on OFFSETs are subject to various constraints on ALIGNMENTs. It is important not to read into offset arithmetic what is not there. Accordingly some rules of the algebra of OFFSETs are given below.
- offset_add is associative.
- offset_mult corresponds to repeated offset_addition.
- offset_max is commutative, associative and idempotent.
- offset_add distributes over offset_max where they form legal expressions.
- offset_test(prob, >=, a, b) continues if offset_max(a, b) = a.
7.13.1. Simple model
An example of the representation of OFFSET arithmetic is given below. This is not a definition, but only an example. In order to make this clear a machine with bit addressing is hypothesized. This machine is referred to as the simple model.
In this machine ALIGNMENTs will be represented by the number by which the bit address of data must be divisible. For example, 8-bit bytes might have an ALIGNMENT of 8, longs of 32 and doubles of 64. OFFSETs will be represented by the displacement in bits from a POINTER. POINTERs will be represented by the bit address of the data. Only one memory space will exist. Then in this example a possible conforming implementation would be as follows.
- add_to_ptr is addition.
- offset_add is addition.
- offset_div and offset_div_by_int are exact division.
- offset_max is maximum.
- offset_mult is multiply.
- offset_negate is negate.
- offset_pad(a, x) is ((x + a - 1) / a) * a.
- offset_subtract is subtract.
- offset_test is integer_test.
- offset_zero is 0.
- shape_offset(s) is the minimum number of bits needed to be moved to move a value of SHAPE s.
Note that these operations only exist where the constraints on the parameters are satisfied. Elsewhere the operations are undefined.
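By way of illustration only, the simple model's offset_pad can be written down directly in C, with alignments and offsets as plain bit counts (the type names are invented for the example):

#include <stdio.h>

typedef long bit_offset;
typedef long bit_alignment;     /* e.g. 8 for bytes, 32 for longs, 64 for doubles */

static bit_offset offset_pad(bit_alignment a, bit_offset x)
{
    return ((x + a - 1) / a) * a;      /* round x up to a multiple of a */
}

int main(void)
{
    printf("%ld\n", offset_pad(32, 13));   /* prints 32 */
    return 0;
}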
All the computations in this representation are obvious, but there is one point to make concerning offset_max, which has the following arguments and result.
arg1: EXP OFFSET(x, y)
arg2: EXP OFFSET(z, y)
-> EXP OFFSET(unite_alignments(x, z), y)
The SHAPEs could have been chosen to be:
arg1: EXP OFFSET(x, y)
arg2: EXP OFFSET(z, t)
-> EXP OFFSET(unite_alignments(x, z), intersect_alignments(y, t))
where unite_alignments is set union and intersect_alignments is set intersection. This would have expressed the most general reality. The representation of unite_alignments(x, z) is the maximum of the representations of x and z in the simple model. Unfortunately the representation of intersect_alignments(y, t) is not the minimum of the representations of y and t. In other words the simple model representation is not a homomorphism if intersect_alignments is used. Because the choice of representation in the installer is an important consideration the actual definition was chosen instead. It seems unlikely that this will affect practical programs significantly.
7.13.2. Comparison of pointers and offsets
Two POINTERs to the same ALIGNMENT, a, are equal if and only if the result of subtract_ptrs applied to them is equal to offset_zero(a).
The comparison of OFFSETs is reduced to the definition of offset_max and the equality of OFFSETs by the note in offset_test.
7.13.3. Circular types in languages
It is assumed that circular types in programming languages will always involve the SHAPEs PROC or POINTER(x) on the circular path in their TDF representation. Since the ALIGNMENT of POINTER is {pointer} and does not involve the ALIGNMENT of the thing pointed at, circular SHAPEs are not needed. The circularity is always broken in ALIGNMENT (or PROC).
7.13.4. Special alignments
There are seven special ALIGNMENTs. One of these is code_alignment, the ALIGNMENT of the POINTER delivered by make_local_lv.
The ALIGNMENT of a parameter of SHAPE s is given by parameter_alignment(s), which will always contain alignment(s).
The other five special ALIGNMENTs are alloca_alignment, callees_alignment, callers_alignment, locals_alignment and var_param_alignment. Each of these contains the set union of all the ALIGNMENTs which can be produced by alignment from any SHAPE. But they need not be equal to that set union, nor need there be any relation between them.
In particular they are not equal (in the sense of equality of ALIGNMENTs; see section 7.7). Each of these five refers to the alignments of various components of a frame.
Notice that pointers and offsets such as POINTER(callees_alignment(true)) and OFFSET(callees_alignment(true), x) etc. can have some special representation and that add_to_ptr and offset_add can operate correctly on these representations. However it is necessary that:
alignment(POINTER(A)) = {pointer}
for any ALIGNMENT A.
7.13.5. Atomic assignment
At least one VARIETY shall exist such that assign and assign_with_mode are atomic operations. This VARIETY shall be specified as part of the installer specification. It shall be capable of representing the numbers 0 to 127.
Note that it is not necessary for this to be the same VARIETY on each machine. Normal practice will be to use a TOKEN for this VARIETY and choose the definition of the TOKEN on the target machine.
7.14. Order of evaluation
The order of evaluation is specified in certain constructions in terms of equivalent effect with a canonical order of evaluation. These constructions are conditional, identify, labelled, repeat, sequence and variable. Let these be called the order-specifying constructions.
The constructions which change control also specify a canonical order. These are apply_proc, apply_general_proc, case, goto, goto_local_lv, long_jump, return, untidy_return, return_to_label, tail_call, the test constructions and all constructions containing the error_jump and trap ERROR_TREATMENTs.
The order of evaluation of the components of other constructions is as follows. The components may be evaluated in any order and with their components - down to the TDF leaf level - interleaved in any order. The constituents of the order specifying constructions may also be interleaved in any order, but the order of the operations within an order specifying operation shall be equivalent in effect to a canonical order.
Note that the rule specifying when error_jumps or traps are to be taken (error_jump) relaxes the strict rule that everything has to be “as if” completed by the end of certain constructions. Without this rule pipelines would have to stop at such points, in order to be sure of processing any errors. Since this is not normally needed, it would be an expensive requirement. Hence this rule. However a construction will be required to force errors to be processed in the cases where this is important.
7.15. Original pointers
Certain constructions are specified as producing original pointers. They allocate space to hold values and produce pointers indicating that new space. All other pointer values are derived pointers, which are produced from original pointers by a sequence of add_to_ptr operations. Counting original pointers as being derived from themselves, every pointer is derived from just one original pointer.
A null pointer is counted as an original pointer.
If procedures are called which come from outside the TDF world (such as calloc) it is part of their interface with TDF to state if they produce original pointers, and what is the lifetime of the pointer.
As a special case, original pointers can be produced by using current_env and env_offset (see current_env).
Note that:
add_to_ptr(p, offset_add(q, r))
is equivalent to:
add_to_ptr(add_to_ptr(p, q), r)
In the case that p is the result of current_env and q is the result of env_offset:
add_to_ptr(p, q)
is defined to be an original pointer. For any such expression, q will be produced by env_offset applied to a TAG introduced in the procedure in which current_env was used to make p.
7.16. Overlapping
In the case of move_some, or assign or assign_with_mode in which arg2 is a contents or contents_with_mode, it is possible that the source and destination of the transfer might overlap.
In this case, if the operation is move_some or assign_with_mode and the TRANSFER_MODE contains overlap, then the transfer shall be performed correctly, that is, as if the data were copied from the source to an independent place and then to the destination.
In all cases, if the source and destination do not overlap the transfer shall be performed correctly.
Otherwise the effect is undefined.
7.17. Incomplete assignment
If the arg2 component of an assign or assign_with_mode operation is left by means of a jump, the question arises as to what value is in the destination of the transfer.
If the TRANSFER_MODE complete is used, the destination shall be left unchanged if the arg2 component is left by means of a jump. If complete is not used and arg2 is left by a jump, the destination may be affected in any way.
7.18. Representing integers
Integer VARIETYs shall be represented by a range of integers which includes those specified by the given bounds. This representation shall be twos-complement.
If the lower bound of the VARIETY is non-negative, the representing range shall be from 0 to 2^(8n) - 1 for some n. n is called the number of bytes in the representation. The number of bits in the representation is 8n.
If the lower bound of the VARIETY is negative, the representing range shall be from -2^(8n-1) to 2^(8n-1) - 1 for some n. n is called the number of bytes in the representation. The number of bits in the representation is 8n.
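As a minimal sketch of these rules (the function and the search over byte counts are illustrative assumptions, not part of TDF), an installer might choose the number of bytes n from the bounds of a VARIETY as follows:

#include <stdio.h>

/* Smallest n such that var_limits(lower, upper) fits the representing range. */
static unsigned rep_bytes(long long lower, long long upper)
{
    for (unsigned n = 1; n < 8; n++) {
        long long hi_u = (1LL << (8 * n)) - 1;       /* 2^(8n) - 1   */
        long long hi_s = (1LL << (8 * n - 1)) - 1;   /* 2^(8n-1) - 1 */
        long long lo_s = -hi_s - 1;                  /* -2^(8n-1)    */
        if (lower >= 0 ? upper <= hi_u : (lower >= lo_s && upper <= hi_s))
            return n;
    }
    return 8;
}

int main(void)
{
    printf("%u\n", rep_bytes(0, 200));      /* 1 byte: range 0..255    */
    printf("%u\n", rep_bytes(-100, 100));   /* 1 byte: range -128..127 */
    return 0;
}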
Installers may limit the size of VARIETY that they implement. A statement of such limits shall be part of the specification of the installer. In no case may such limits be less than 64 bits, signed or unsigned.
It is intended that, at some future date, no such upper limit will be allowed.
Operations are performed in the representing VARIETY. If the result of an operation does not lie within the bounds of the stated VARIETY, but does lie in the representation, the value produced in that representation shall be as if the VARIETY had the lower and upper bounds of the representation. The implication of this is usually that a number in a VARIETY is represented by that same number in the representation.
If the bounds of a VARIETY, v, include those of a VARIETY, w, the representing VARIETY for v shall include or be equal to the representing VARIETY for w.
The representations of two VARIETYs of the form var_limits(0, 2^n - 1) and var_limits(-2^(n-1), 2^(n-1) - 1) shall have the same number of bits, and the mapping of their ALIGNMENTs into the target alignment shall be the same.
7.19. Overflow and Integers
It is necessary first to define what overflow means for integer operations and second to specify what happens when it occurs. The intention of TDF is to permit the simplest possible implementation of common constructions on all common machines while allowing precise effects to be achieved, if necessary at extra cost.
Integer varieties may be represented in the computer by a range of integers which includes the bounds given for the variety. An arithmetic operation may therefore yield a result which is within the stated variety, or outside the stated variety but inside the range of representing values, or outside that range. Most machines provide instructions to detect the latter case; testing for the second case is possible but a little more costly.
In the first two cases the result is defined to be the value in the representation. Overflow occurs only in the third case.
If the ERROR_TREATMENT is impossible, overflow will not occur. If it should happen to do so, the effect of the operation is undefined.
If the ERROR_TREATMENT is error_jump, a LABEL is provided to jump to if overflow occurs.
If the ERROR_TREATMENT is trap(overflow), a producer-defined TOKEN, ~Throw: NAT -> EXP, must be provided. On an overflow, the installer will arrange that ~Throw(error_val(overflow)) is evaluated.
The wrap ERROR_TREATMENT is provided so that a useful defined result may be produced in certain cases where it is usually easily available on most machines. This result is available on the assumption that machines use binary arithmetic for integers. This is certainly so at present, and there is no close prospect of other bases being used.
If a precise result is required further arithmetic and testing may be needed which the installer may be able to optimise away if the word lengths happen to suit the problem. In extreme cases it may be necessary to use a larger variety.
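For illustration (ordinary ISO C, not TDF), this is the kind of explicit test a producer might emit when a precise overflow check is required rather than relying on wrap or a trap:

#include <limits.h>
#include <stdio.h>

/* Detect signed int addition overflow before performing the addition. */
static int add_checked(int a, int b, int *overflow)
{
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b)) {
        *overflow = 1;
        return 0;
    }
    *overflow = 0;
    return a + b;
}

int main(void)
{
    int ov;
    int r = add_checked(INT_MAX, 1, &ov);
    printf("overflow=%d result=%d\n", ov, r);
    return 0;
}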
7.20. Representing floating point
FLOATING_VARIETYs shall be implemented by a representation which has at least the properties specified.
Installers may limit the size of FLOATING_VARIETY which they implement. A statement of such limits shall be part of the specification of an installer.
The limit may also permit or exclude infinities.
Any installer shall implement at least one FLOATING_VARIETY with the following properties (c.f. IEEE doubles):
- mantissa_digs shall not be less than 53.
- min_exponent shall not be less than 1023.
- max_exponent shall not be less than 1022.
Operations are performed and overflows detected in the representing FLOATING_VARIETY.
7.21. Floating point errors
The only permitted ERROR_TREATMENTs for operations delivering FLOATING_VARIETYs are impossible, error_jump and trap(overflow).
The kinds of floating point error which can occur depend on the machine architecture (especially whether it has IEEE floating point) and on the definitions in the ABI being obeyed.
Possible floating point errors depend on the state of the machine and may include overflow, divide by zero, underflow, invalid operation and inexact. The setting of this state is performed outside TDF (at present).
If an error_jump or trap is taken as the result of a floating point error the operations to test what kind of error it was are outside the TDF definition (at present).
7.22. Rounding and floating point
Each machine has a rounding state which shall be one of to_nearest, toward_larger, toward_smaller, toward_zero. For each operation delivering a FLOATING_VARIETY, except for make_floating, any rounding necessary shall be performed according to the rounding state.
7.23. Floating point accuracy
While it is understood that most implementations will use IEEE floating arithmetic operations, there are machines which use other formats and operations. It is intended that they should not be excluded from having TDF implementations.
For TDF to have reasonably consistent semantics across many platforms, one must have some minimum requirements on the accuracies of the results of the floating point operations defined in TDF. The provisional requirements sketched below would certainly be satisfied by an IEEE implementation.
Let @ be some primitive dyadic arithmetic operator and @' be its TDF floating-point implementation. Let F be some non-complex FLOATING_VARIETY and F' be a representational variety of F.
Condition 1:
If a, b and a @ b can all be represented exactly in F, then they will also be represented exactly in F'. Extending the '-notation in the obvious manner:
(a @ b)' = (a' @' b')
This equality will also hold using the TDF equality test, i.e.:
(a @ b)' =' (a' @' b')
Condition 2:
The operator @' is monotonic in the sense apposite to the operator @. For example, consider the operator +; if x is any number and a and b are as above:
(x > b) => ((a' +' x') >= (a + b)')
and:
(x < b) => ((a' +' x') <= (a + b)')
and so on, reflecting the weakening of the ordering after the operation from > to >= and < to <=. Once again, the inequalities will hold for their TDF equivalents e.g., >=' and >'.
Similar conditions can be expressed for the monadic operations.
For the floating-point test operation, there are obvious analogues to both conditions. The weakening of the ordering in the monotonicity condition, however, may lead to surprising results, arising mainly from the uncertainty of the result of equality between floating numbers which cannot be represented exactly in F.
Accuracy requirements for complex FLOATING_VARIETYs could follow directly by considering the above conditions applied to real and imaginary parts independently. The following proviso is added for some complex operations, however, to allow for possible intermediate error conditions. With floating_div, floating_mult and floating_power for complex FLOATING_VARIETYs, errors are guaranteed not to occur only if the square of the modulus of each argument is representable and the square of the modulus of the result is representable. Whenever these additional constraints are not met, the operation will either complete with the accuracy conditions above applying, or it will complete according to the ERROR_TREATMENT specified.
7.24. Representing bitfields
BITFIELD_VARIETYs specify a number of bits and shall be represented by exactly that number of bits in twos-complement notation. Producers may expect them to be packed as closely as possible.
Installers may limit the number of bits permitted in BITFIELD_VARIETYs. Such a limit shall be not less than 32 bits, signed or unsigned.
It is intended that, at some future date, no such upper limit will be allowed.
Some offsets of which the second parameter contains a BITFIELD alignment are subject to a constraint defined below. This constraint is referred to as variety_enclosed.
The intent of this constraint is to force BITFIELDs to be implemented (in memory) as being included in some properly aligned VARIETY value. The constraint applies to:
x: offset(p, b)
and to:
sh = bitfield(bfvar_bits(s, n))
where alignment(sh) is included in b. The constraint is as follows:
There will exist a VARIETY, v, and r: offset(p, q), where v is in q, such that:
offset_pad(b, r) <= x
and:
offset_pad(b, r + sz(v)) >= offset_pad(b, x + sz(sh))
where the comparisons are in the sense of offset_test, + is offset_add and sz is shape_offset.
7.25. Permitted limits
An installer may specify limits on the sizes of some of the data SHAPEs which it implements. In each case there is a minimum set of limits such that all installers shall implement at least the specified SHAPEs. Part of the description of an installer shall be the limits it imposes. Installers are encouraged not to impose limits if possible, though it is not expected that this will be feasible for floating point numbers.
7.26. Least Upper Bound
The LUB of two SHAPEs, a and b, is defined as follows:
- If a and b are equal shapes, then a.
- If a is BOTTOM then b.
- If b is BOTTOM then a.
- Otherwise TOP.
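A hypothetical sketch of this rule, using an invented miniature SHAPE type (real SHAPE equality is that of section 7.6):

#include <stdio.h>

typedef enum { BOTTOM, TOP, INTEGER, FLOATING } shape;

static shape lub(shape a, shape b)
{
    if (a == b)      return a;   /* equal shapes: either one */
    if (a == BOTTOM) return b;   /* BOTTOM is the identity   */
    if (b == BOTTOM) return a;
    return TOP;                  /* otherwise TOP            */
}

int main(void)
{
    printf("%d\n", lub(BOTTOM, INTEGER) == INTEGER);   /* prints 1 */
    printf("%d\n", lub(INTEGER, FLOATING) == TOP);     /* prints 1 */
    return 0;
}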
7.27. Read-only areas
Consider three scenarios in increasingly static order:
- Dynamic loading. A new module is loaded, initialising procedures are obeyed and the results of these are then marked as read-only.
- Normal loading. An ld program is obeyed which produces various (possibly circular) structures which are put into an area which will be read-only when the program is obeyed.
- Using ROM. Data structures are created (again possibly circular) and burnt into ROM for use by a separate program.
In each case a program is obeyed to create a structure, which is then frozen. The special case when the data is, say, just a string is not sufficiently general.
This TDF specification takes the attitude that the use of read-only areas is a property of how TDF is used - a part of the installation process - and there should not be TDF constructions to say that some values in a CAPSULE are read-only. Such constructions could not be sufficiently general.
7.28. Tag and Token signatures
In a TDF program there will usually be references to TAGs which are not defined in TDF; their definitions are intended to be supplied by a host system in system specific libraries.
These TAGs will be declared (but not defined) in a TDF CAPSULE and will be specified by external linkages of the CAPSULE with EXTERNALs containing either TDFIDENTs or UNIQUEs. In previous versions of TDF, the external names required by system linking could only be derived from those EXTERNALs.
Version 4.0 gives an alternative method of constructing extra-TDF names. Each global TAG declaration can now contain a STRING signature field which may be used to derive the external name required by the system.
This addition is principally motivated by the various “name mangling” schemes of C++. The STRING signature can be constructed by concatenations and token expansions. Suitable usages of TOKENs can ensure that the particular form of name-mangling can be deferred to installation time and hence allow, at least conceptually, linking with different C++ libraries.
As well as TAG declarations, TAG definitions are allowed to have signatures. The restriction that the signature (if present) of a TAG definition must be identical to that of its corresponding declaration could allow type checking across separately compiled CAPSULEs.
Similar considerations apply to TOKENs; although token names are totally internal to TDF, it would allow one to check that a token declared in one CAPSULE has the same “type” as its definition in another.
7.29. Dynamic initialisation
The dynamic initialisation of global variables is required for languages like C++. Prior to version 4.0, the only initialisations permissible were load-time ones; in particular no procedure calls were allowed in forming the initialising value. Version 4.0 introduces the constructor initial_value to remedy this situation.
Several different implementation strategies could be considered for this. Basically, one must ensure that all the initial_value expressions are transformed into assignments to globals in some procedure. One might expect that there would be one such procedure invented for each CAPSULE and that somehow this procedure is called before the main program.
This raises the problem of how to name this procedure so that it can be identified as being a special initialising procedure. Some UNIX linkers reserve a name like __init specially so that all instances of it from different modules can be called before the main procedure. Other cases may require a pre-pass on the .o files prior to system linking.
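As an illustration of that transformation (the names capsule_init and f are invented for the example, and the call is made explicit here rather than being arranged by the linker under a reserved name such as __init):

#include <stdio.h>

static int f(void) { return 42; }

int x;                            /* the global, statically zero to begin with */

static void capsule_init(void)    /* invented per-capsule initialising procedure */
{
    x = f();                      /* the dynamic initialisation of x */
}

int main(void)
{
    capsule_init();               /* in practice run before main by the linker's scheme */
    printf("%d\n", x);
    return 0;
}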