13. TDF expansions of offsets
13.1. Offsets within compound types
Consider the C structure defined by:
typedef struct{ int i; double d; char c;} mystruct;
Given that sh_int, sh_char and sh_double are the SHAPEs for int, char and double, the SHAPE of mystruct
is constructed by:
SH_mystruct = compound(S_c)
where:
S_c = offset_add(O_c, shape_offset(sh_char))
where:
O_c = offset_pad(alignment(sh_char), S_d)
where:
S_d = offset_add(O_d, shape_offset(sh_double))
where:
O_d = offset_pad(alignment(sh_double), S_i)
where: [n]
S_i = offset_add(O_i, shape_offset(sh_int))
and:
O_i = offset_zero(alignment(sh_int))
Each of S_c, S_d and S_i gives the minimum "size" of the space required to upto and including the field c, d and i respectively. Each of O_c, O_d and O_i gives the OFFSET "displacement" from a pointer to a mystruct
required to select the fields c, d and i respectively. The C program fragment:
mystruct s; .... s.d = 1.0; ...
would translate to something like:
variable(empty, tag_s, make_value(compound(S_c)), sequence(..., assign(add_to_ptr(obtain_tag(tag_s), O_d), 1.0), ... ) )
Each of the OFFSET expressions above are ideal candidates for tokenisation; a producer would probably define tokens for each of them and use exp_apply_token to expand them at each of their uses.
From the definition, we find that:
S_c = shape_offset(SH_mystruct) i.e. an OFFSET(alignment(sh_int) xc8 alignment(sh_char) xc8 alignment(sh_double), {})
This would not be the OFFSET required to describe sizeof(mystruct)
in C, since this is defined to be the difference between successive elements an array of mystruct
s. The sizeof
OFFSET would have to pad S_c to the alignment of SH_mystruct:
offset_pad(alignment(SH_mystruct), S_c)
This is the OFFSET that one would use to compute the displacement of an element of an array of mystruct
s using offset_mult with the index.
The most common use of OFFSETs is in add_to_ptr to compute the address of a structure or array element. Looking again at its signature in a slightly different form:
arg1: EXP POINTER(y xc8 A) arg2: EXP OFFSET(y, z) -> EXP POINTER(z) ... for any ALIGNMENT A
one sees that arg2
can measure an OFFSET from a value of a "smaller" alignment than the value pointed at by arg1
. If arg2
were O_d, for example, then arg1
could be a pointer to any structure of the form struct {int i, double d,...} not just mystruct
. The general principle is that an OFFSET to a field constructed in this manner is independent of any fields after it, corresponding to normal usage in both languages and machines. A producer for a language which conflicts with this would have to produce less obvious TDF, perhaps by re-ordering the fields, padding the offsets by later alignments or taking maxima of the sizes of the fields.
13.2. Bitfield offsets
Bitfield offsets are governed by rather stricter rules. In order to extract or assign a bitfield, we have to find an integer variety which entirely contains the bitfield. Suppose that we wished to extract a bitfield by:
bitfield_contents(v, p:POINTER(X), b:OFFSET(Y, B))
Y must be an alignment(I) where I is some integer SHAPE, contained in X. Further, this has to be equivalent to:
bitfield_contents(v, add_ptr(p, d:OFFSET(Y,Y)), b':OFFSET(Y, B))
for some d and b' such that:
offset_pad(v, shape_offset(I)) >= b'
and
offset_add(offset_pad(v, offset_mult(d, sizeof(I)), b') = b
Clearly, we have a limitation on the length of bitfields to the maximum integer variety available; in addition, we cannot have a bitfield which overlaps two such varieties.
The difficulties inherent in this may be illustrated by attempting to construct an array of bitfields using the nof constructor. Assuming a standard architecture, we find that we cannot usefully define an object of SHAPE nof(N, bitfield(bfvar_bits(b, M))) without padding this shape out to some integer variety which can contain M bits. In addition, they can only be usefully indexed (using bitfield_contents)either if M is some power of 2 or M*N is less than the length of the maximum integer variety. Thus a producer must be sure that these conditions hold if he is to generate and index this object simply. Even here he is in some dificulty, since he does not know the representational varieties of the integers available to him; also it is difficult for him to ensure that the alignment of the entire array is in some sense minimal. Similar difficulties occur with bitfields in structures - they are not restricted to arrays.
The solution to this conundrum in its full generality requires knowledge of the available representational varieties. Particular languages have restrictions which means that sub-optimal solutions will satisfy its specification on the use of bitfields. For example, C is satisfied with bitfields of maximum length 32 and simple alignment constraints. However, for the general optimal solution, I can see no reasonable alternative to the installer defining some tokens to produce bitfield offsets which are guaranteed to obey the alignment rules and also give optimal packing of fields and alignments of the total object for the platform in question. I believe that three tokens are sufficient to do this; these are analogous to the constructors offset_zero, offset_pad and offset_mult with ordinary alignments and their signatures could be:
~Bitfield_offset_zero: n: NAT issigned: BOOL -> EXP OFFSET(A, bfvar_bits(issigned, n))
Here the result is a zero offset to the bitfield with “minimum” integer variety alignment A.
~Bitfield_offset_pad: n: NAT issigned: BOOL sh: SHAPE -> EXP OFFSET(alignment(sh) xc8 A, bfvar_bits(issigned, n))
Here the result is the shape_offset of sh
padded with the `minimum' alignment A so that it can accomodate the bitfield. Note that this may involve padding sh
with the alignment of the maximum integer variety if there are not enough bits left at the end of sh.
~Bitfield_offset_mult: n: NAT issigned: BOOL ind: EXP INTEGER(v) -> EXP OFFSET(A, bfvar_bits(issigned, n))
Here the result is an offset which gives the displacement of ind
th element of an array of n
-bit bitfields with `minimum' alignment A. Note that this will correspond to a normal multiplication only if n
is a power of 2 or ind
* n
<= length of the maximum integer variety.
These tokens can be expressed in TDF if the lengths of the available varieties are known, i.e., they are installer defined.[o] They ought to be used in place of offset_zero, offset_pad and offset_mult whereever the alignment or shape (required to construct a SHAPE or an argument to the bitfield constructs) is a pure bitfield. The constructor nof should never be used on a pure bitfield; instead it should be replaced by:
S = compound(~Bitfield_offset_mult(M, b, N))
to give a shape, S, representing an array of N M-bit bitfields. This may not be just N*M bits; for example ~Bitfield_offset_mult may be implemented to pack an array of 3-bit bitfields as 10 fields to a 32-bit word. In any case, one would expect that normal rules for offset arithmetic are preserved, e.g.
offset_add(~Bitfield_offset_pad(M, b, S), size(bitfield(bfvar_bits(b, N)))) = ~Bitfield_offset_mult(M, b, N + 1) where size(X) = offset_pad(alignment(X), shape_offset(X))
- [n]
I could equally have given simply shape_offset(sh_int) for S_i, but the above formulation is more uniform with respect to selection OFFSETs.
- [o]
For most architectures, these definition are dependent only on a few constants such as the maximum length of bitfield., expessed as tokens for the target. The precise specification of such target dependent tokens is of current interest outside the scope of this document.