TDF Diagnostic Specification
© , , The TenDRA Project.
© DERA.
First published .
Revision History
| DERA | TenDRA 4.1.2 release. |
i. Introduction
The TDF diagnostic information is intended to convey all the information used by current source-level debuggers that would conventionally be part of an object file. Any particular installer will use only those parts of this information that its native object format can represent.
The version of the diagnostics described here is the first version. It has only been tested with TDF produced from C programs. There are known to be certain deficiencies relative to other languages (in particular to FORTRAN). A later version will correct these deficiencies. The changes already envisaged are detailed in §3, and would have minimal (if any) impact on C producers.
The diagnostic system introduces one new type of TDF linkable entity, and currently adds two new units to the bitstream representation of TDF.
Much of the actual annotation of procedure bodies is currently done by reserved TOKENs, which installers recognize specially. These TOKENs are described in §2.
There is a resemblance between the TDF diagnostic information and Unix International's DWARF format. DWARF has similar aims to the TDF diagnostics, and ensuring that complete DWARF information could be generated provided a useful check during the development of the TDF diagnostics. However, the TDF diagnostics are intended to be architecture (and format) neutral. No inference should be made about any link (present or future) between DWARF and TDF diagnostics.
1. Diagnostic SORTs
- 1.1. DIAG_DESCRIPTOR
- 1.2. DIAG_UNIT
- 1.3. DIAG_TAG
- 1.4. DIAG_TAGDEF
- 1.5. DIAG_TYPE_UNIT
- 1.6. DIAG_TYPE
- 1.7. ENUM_VALUES
- 1.8. DIAG_FIELD
- 1.9. DIAG_TQ
- 1.10. FILENAME
- 1.11. SOURCEMARK
As a summary of this section:
- DIAG_TYPEs describe programming language types (e.g. arrays, structs...). DIAG_TQs are qualifiers of DIAG_TYPEs used for attributes like volatile and const.
- FILENAMEs and SOURCEMARKs describe source files and locations within them.
- DIAG_TAGs associate integers with DIAG_TYPEs. They are used in a similar manner to normal TDF TAGs, and are held in a (TDF) linkable unit called a DIAG_TYPE_UNIT.
- DIAG_UNITs hold a collection of DIAG_DESCRIPTORs, used for information outside procedure bodies.
1.1. DIAG_DESCRIPTOR
| Number of encoding bits: | 2 |
| Is coding extendable: | yes |
DIAG_DESCRIPTORs are used to associate names in the source program with diagnostic items.
1.1.1. diag_desc_id
| Encoding number: | 1 |
src_name: TDFSTRING(k, n)
whence: SOURCEMARK
found_at: EXP POINTER(al)
type: DIAG_TYPE
-> DIAG_DESCRIPTOR Generates a descriptor for an identifier (of DIAG_TYPE type), whose source name was src_name from source location whence. The EXP found_at describes how to access the value. Note that the EXP need not be unique (e.g. FORTRAN EQUIVALENCE might be implemented this way).
1.1.2. diag_desc_struct
| Encoding number: | 2 |
src_name: TDFSTRING(k, n)
whence: SOURCEMARK
new_type: DIAG_TYPE
-> DIAG_DESCRIPTOR Generates a descriptor whose source name was src_name. new_type must be either a DIAG_STRUCT, DIAG_UNION or DIAG_ENUM.
This construct is obsolete.
1.1.3. diag_desc_typedef
| Encoding number: | 3 |
src_name: TDFSTRING(k, n)
whence: SOURCEMARK
new_type: DIAG_TYPE
-> DIAG_DESCRIPTOR Generates a descriptor for a type new_type whose source name was src_name. Note that diag_desc_typedef is used for associating a name with a type, rather than for any name given in the initial description of the type (e.g. in C this is used for typedef, not for struct/union/enum tags).
1.2. DIAG_UNIT
| Number of encoding bits: | 0 |
| Is coding extendable: | no |
| Unit identification: | diagdef |
A DIAG_UNIT is a TDF unit containing DIAG_DESCRIPTORs. A DIAG_UNIT is used to contain descriptions of items outside procedure bodies (e.g. global variables, global type definitions).
1.2.1. build_diag_unit
| Encoding number: | 0 |
no_labels: TDFINT
descriptors: SLIST(DIAG_DESCRIPTOR)
-> DIAG_UNIT Create a DIAG_UNIT containing DIAG_DESCRIPTORs. no_labels is the number of local labels used in descriptors (for conditionals).
1.3. DIAG_TAG
| Number of encoding bits: | 1 |
| Is coding extendable: | yes |
| Linkable entity identification: | diagtag |
DIAG_TAGs are used inter alia to break cyclic diagnostic types. They are (TDF) linkable entities. A DIAG_TAG is made from a number, and used in use_diag_tag to obtain the DIAG_TYPE associated with that number by make_diag_tagdef.
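As an illustration of how the tag indirection breaks a cycle, the following Python sketch models the C declaration `struct node { struct node *next; }`. The classes, the `tagdefs` dictionary and the `resolve` helper are hypothetical stand-ins for the constructs of this section; they are not part of TDF or of any installer.

```python
from dataclasses import dataclass, field

# Hypothetical in-memory stand-ins for DIAG_TYPE constructs.
@dataclass
class UseDiagTag:          # models use_diag_tag(dtag)
    num: int

@dataclass
class DiagPtr:             # models diag_ptr(object, qualifier); qualifier omitted
    object: object

@dataclass
class DiagStruct:          # models diag_struct; tdf_shape omitted
    src_name: str
    fields: list = field(default_factory=list)   # (field_name, DIAG_TYPE) pairs

# Models the DIAG_TAGDEFs of a DIAG_TYPE_UNIT: tag number -> DIAG_TYPE.
tagdefs = {}

NODE_TAG = 1   # arbitrary tag number chosen for this example
# struct node { struct node *next; };
tagdefs[NODE_TAG] = DiagStruct("node", [("next", DiagPtr(UseDiagTag(NODE_TAG)))])

def resolve(t):
    """Follow a use_diag_tag reference to the type it names."""
    return tagdefs[t.num] if isinstance(t, UseDiagTag) else t

node = resolve(UseDiagTag(NODE_TAG))
next_type = node.fields[0][1]   # a pointer to a tag reference, not to the struct itself
```

The structure never contains itself directly: the pointer holds only the tag number, and the cycle is closed when the tag is looked up.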
1.3.1. make_diag_tag
| Encoding number: | 1 |
num: TDFINT
-> DIAG_TAG Create a DIAG_TAG from num.
1.4. DIAG_TAGDEF
| Number of encoding bits: | 1 |
| Is coding extendable: | yes |
DIAG_TAGDEFs associate DIAG_TAGs with DIAG_TYPEs.
1.4.1. make_diag_tagdef
| Encoding number: | 1 |
tno: TDFINT
dtype: DIAG_TYPE
-> DIAG_TAGDEF Associates tag number tno with dtype.
1.5. DIAG_TYPE_UNIT
| Number of encoding bits: | 0 |
| Is coding extendable: | no |
| Unit identification: | diagtype |
A DIAG_TYPE_UNIT is a TDF unit containing DIAG_TAGDEFs.
1.5.1. build_diagtype_unit
| Encoding number: | 0 |
no_labels: TDFINT
tagdefs: SLIST(DIAG_TAGDEF)
-> DIAG_TYPE_UNIT Create a DIAG_TYPE_UNIT containing DIAG_TAGDEFs. no_labels is the number of local labels used in tagdefs (for conditionals).
1.6. DIAG_TYPE
| Sortname: | foreign_sort("diag_type") |
| Number of encoding bits: | 4 |
| Is coding extendable: | yes |
DIAG_TYPEs are used to provide diagnostic information about data types.
1.6.1. diag_type_apply_token
| Encoding number: | 1 |
token_value: TOKEN
token_args: BITSTREAM
-> DIAG_TYPE The token is applied to the arguments to give a DIAG_TYPE. If there is a definition for token_value in the CAPSULE then token_args is a BITSTREAM encoding of the SORTs of its parameters, in the order specified.
1.6.2. diag_array
| Encoding number: | 2 |
element_type: DIAG_TYPE
stride: EXP OFFSET(p,p)
lower_bound: EXP INTEGER(v)
upper_bound: EXP INTEGER(v)
index_type: DIAG_TYPE
-> DIAG_TYPE An array of element_type objects. stride is the OFFSET between elements of the array (i.e. p is described by element_type). The bounds are in general not runtime constants, hence the values are EXPs (not, say, SIGNED_NATs). The VARIETY v is described by index_type. As in TDF, there is no multi-dimensional array primitive.
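Since there is no multi-dimensional primitive, a producer would presumably describe such an array by nesting. The Python sketch below models the C declaration `int m[3][5]` in that way. The classes are hypothetical stand-ins; the stride and bounds are plain integers rather than EXPs, the 4-byte `int` is an assumed size, and treating both bounds as inclusive is an assumption of this example.

```python
from dataclasses import dataclass

@dataclass
class DiagVariety:     # models diag_variety(var); var reduced to a name
    name: str

@dataclass
class DiagArray:       # models diag_array; stride is a byte count here, not an EXP OFFSET
    element_type: object
    stride: int
    lower_bound: int
    upper_bound: int
    index_type: object

INT = DiagVariety("int")
INT_SIZE = 4           # assumed size of int in bytes, for illustration only

# C: int m[3][5];  each row is itself an array of 5 ints
row = DiagArray(INT, INT_SIZE, 0, 4, INT)
matrix = DiagArray(row, 5 * INT_SIZE, 0, 2, INT)

def dimensions(t):
    """Collect the extent of each nesting level, outermost first."""
    dims = []
    while isinstance(t, DiagArray):
        dims.append(t.upper_bound - t.lower_bound + 1)
        t = t.element_type
    return dims

print(dimensions(matrix))   # [3, 5]
```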
1.6.3. diag_bitfield
| Encoding number: | 3 |
type: DIAG_TYPE
number_of_bits: NAT
-> DIAG_TYPE Describes a bitfield of number_of_bits bits, which when extracted will have DIAG_TYPE type.
1.6.4. diag_enum
| Encoding number: | 4 |
base_type: DIAG_TYPE
enum_name: TDFSTRING(k, n)
values: LIST(ENUM_VALUES)
-> DIAG_TYPE An enumeration to be stored in an object of type base_type. If enum_name is a string containing zero characters this signifies no source tag.
1.6.5. diag_floating_variety
| Encoding number: | 5 |
var: FLOATING_VARIETY
-> DIAG_TYPE Creates a DIAG_TYPE to describe a FLOATING_VARIETY var.
1.6.6. diag_loc
| Encoding number: | 6 |
object: DIAG_TYPE
qualifier: DIAG_TQ
-> DIAG_TYPE Records the existence of an item of DIAG_TYPE object, qualified by qualifier. diag_loc is used for variables (which may of course not actually occupy a memory location).
1.6.7. diag_proc
| Encoding number: | 7 |
params: LIST(DIAG_TYPE)
optional_args: BOOL
result_type: DIAG_TYPE
-> DIAG_TYPE Describes a procedure whose parameters have the types listed in params and whose result has type result_type. optional_args is true if and only if the make_proc which this diag_proc describes had vartag present.
1.6.8. diag_ptr
| Encoding number: | 8 |
object: DIAG_TYPE
qualifier: DIAG_TQ
-> DIAG_TYPE Describes a pointer to an object of DIAG_TYPE object. The DIAG_TQ qualifier qualifies the pointer, not the object pointed to.
1.6.9. diag_struct
| Encoding number: | 9 |
tdf_shape: SHAPE
src_name: TDFSTRING(k, n)
fields: LIST(DIAG_FIELD)
-> DIAG_TYPE Describes a structure. If src_name is a string containing zero characters this signifies no source tag for the whole structure. tdf_shape allows the total size to be computed.
1.6.10. diag_type_null
| Encoding number: | 10 |
-> DIAG_TYPE A null DIAG_TYPE.
1.6.11. diag_union
| Encoding number: | 11 |
tdf_shape: SHAPE
src_name: TDFSTRING(k, n)
fields: LIST(DIAG_FIELD)
-> DIAG_TYPE Describes a union. If src_name is a string containing zero characters this signifies no source tag for the whole union. tdf_shape allows the total size to be computed.
1.6.12. diag_variety
| Encoding number: | 12 |
var: VARIETY
-> DIAG_TYPE Creates a DIAG_TYPE to describe an integer VARIETY var.
1.6.13. use_diag_tag
| Encoding number: | 13 |
dtag: DIAG_TAG
-> DIAG_TYPE Obtains the DIAG_TYPE associated with DIAG_TAG dtag.
1.7. ENUM_VALUES
| Number of encoding bits: | 0 |
| Is coding extendable: | no |
1.7.1. make_enum_values_list
| Encoding number: | 0 |
value: EXP sh
src_name: TDFSTRING(k, n)
-> ENUM_VALUES ENUM_VALUES describe elements of an enumerated type. src_name is the source language name. value evaluates to a value of SHAPE sh. Note that all members of a LIST(ENUM_VALUES) must have the same sh.
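The same-shape requirement can be pictured as a simple consistency check. The Python sketch below models the C declaration `enum colour { RED, GREEN = 5, BLUE }`. The `EnumValue` class and `check_same_shape` function are hypothetical; the EXP is reduced to an integer and the SHAPE to a string.

```python
from dataclasses import dataclass

@dataclass
class EnumValue:       # models make_enum_values_list(value, src_name)
    value: int         # stands in for an EXP that evaluates to the value
    shape: str         # stands in for the SHAPE sh of that EXP
    src_name: str

def check_same_shape(values):
    """Return the common shape of values, or raise if they disagree."""
    shapes = {v.shape for v in values}
    if len(shapes) > 1:
        raise ValueError(f"mixed shapes in enumeration: {sorted(shapes)}")
    return shapes.pop() if shapes else None

# C: enum colour { RED, GREEN = 5, BLUE };
colour = [
    EnumValue(0, "int", "RED"),
    EnumValue(5, "int", "GREEN"),
    EnumValue(6, "int", "BLUE"),
]
common = check_same_shape(colour)
```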
1.8. DIAG_FIELD
| Number of encoding bits: | 0 |
| Is coding extendable: | no |
1.8.1. make_diag_field
| Encoding number: | 0 |
field_name: TDFSTRING(k, n)
found_at: EXP OFFSET(ALIGNMENT whole, ALIGNMENT this_field)
field_type: DIAG_TYPE
-> DIAG_FIELD DIAG_FIELDs describe one field of a structure or union. field_name is the source language name. found_at is the OFFSET between whole (the enclosing structure or union), and this field (this_field). field_type is the DIAG_TYPE of the field.
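To make the role of found_at concrete, the Python sketch below uses the standard `ctypes` module to report where each member of the C declaration `struct pair { char c; int i; }` sits within the structure on the host machine. This is only an analogy: in TDF the offset is an abstract EXP OFFSET, whereas `ctypes` yields byte counts for the host ABI. The `Pair` class and `describe_fields` function are names invented for this example.

```python
import ctypes

class Pair(ctypes.Structure):
    # C: struct pair { char c; int i; };
    _fields_ = [("c", ctypes.c_char), ("i", ctypes.c_int)]

def describe_fields(struct_type):
    """Return (field_name, byte_offset, byte_size) for each member."""
    out = []
    for name, _ctype in struct_type._fields_:
        descr = getattr(struct_type, name)   # ctypes field descriptor
        out.append((name, descr.offset, descr.size))
    return out

fields = describe_fields(Pair)
total_size = ctypes.sizeof(Pair)   # plays the role of the size given by tdf_shape
for name, offset, size in fields:
    print(f"{name}: offset {offset}, size {size}")
```

The offset of `i` is normally larger than the size of `c` because of alignment padding, which is why each field carries its own offset rather than relying on the order of declaration.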
1.9. DIAG_TQ
| Number of encoding bits: | 2 |
| Is coding extendable: | yes |
DIAG_TQs are type qualifiers, used to qualify DIAG_TYPEs. A DIAG_TQ is constructed from diag_tq_null and the various add_diag_XXX operations.
1.9.1. add_diag_const
| Encoding number: | 1 |
qual: DIAG_TQ
-> DIAG_TQ Marks a DIAG_TQ type qualifier as being const in the ANSI C sense.
1.9.2. add_diag_volatile
| Encoding number: | 2 |
qual: DIAG_TQ
-> DIAG_TQ Marks a DIAG_TQ type qualifier as being volatile in the ANSI C sense.
1.9.3. diag_tq_null
| Encoding number: | 3 |
-> DIAG_TQ Create a null DIAG_TQ type qualifier.
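A qualifier is therefore built by starting from diag_tq_null and applying the add operations in turn. The Python sketch below mirrors that construction for a `const volatile` qualifier. Representing the result as a pair of flags is an assumption of this example; the specification does not say whether the order of application is significant, and the sketch treats it as irrelevant.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DiagTQ:
    is_const: bool = False
    is_volatile: bool = False

def diag_tq_null():
    """Models diag_tq_null: a qualifier with nothing set."""
    return DiagTQ()

def add_diag_const(qual):
    """Models add_diag_const: returns a copy of qual marked const."""
    return DiagTQ(True, qual.is_volatile)

def add_diag_volatile(qual):
    """Models add_diag_volatile: returns a copy of qual marked volatile."""
    return DiagTQ(qual.is_const, True)

# const volatile, built in both possible orders
cv = add_diag_volatile(add_diag_const(diag_tq_null()))
vc = add_diag_const(add_diag_volatile(diag_tq_null()))
```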
1.10. FILENAME
| Sortname: | foreign_sort("~diag_file") |
| Number of encoding bits: | 2 |
| Is coding extendable: | yes |
FILENAMEs record details of source files used in producing a CAPSULE. They can be tokenised for abbreviation.
1.10.1. filename_apply_token
| Encoding number: | 1 |
token_value: TOKEN
token_args: BITSTREAM
-> FILENAME The token is applied to the arguments to give a FILENAME. If there is a definition for token_value in the CAPSULE then token_args is a BITSTREAM encoding of the SORTs of its parameters, in the order specified.
1.10.2. make_filename
| Encoding number: | 2 |
date: NAT
machine: TDFSTRING(k1, n1)
file: TDFSTRING(k2, n2)
-> FILENAME Create a FILENAME for file file, dated date (a UNIX timestamp; seconds since 1 Jan 1970) on machine machine.
1.11. SOURCEMARK
| Number of encoding bits: | 1 |
| Is coding extendable: | yes |
A SOURCEMARK records a location in the source program. Present SOURCEMARKs assume that a location can be described by one or two numbers within a FILENAME.
1.11.1. make_sourcemark
| Encoding number: | 1 |
file: FILENAME
line_no: NAT
char_offset: NAT
-> SOURCEMARK Create a SOURCEMARK referencing the char_offset'th character on line line_no in file file.
char_offset is counted from 1, 0 meaning that no character offset is available.
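The Python sketch below shows how a consumer might present a FILENAME and a SOURCEMARK, honouring the convention that a char_offset of 0 means no offset is available. The classes, the `show` function and the sample values (`examplehost`, `main.c`) are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass
class Filename:        # models make_filename(date, machine, file)
    date: int          # seconds since 1 Jan 1970
    machine: str
    file: str

@dataclass
class Sourcemark:      # models make_sourcemark(file, line_no, char_offset)
    file: Filename
    line_no: int
    char_offset: int   # counted from 1; 0 means "not available"

def show(mark):
    """Render a location as file:line, adding :column only when known."""
    text = f"{mark.file.file}:{mark.line_no}"
    if mark.char_offset != 0:
        text += f":{mark.char_offset}"
    return text

src = Filename(date=0, machine="examplehost", file="main.c")   # hypothetical values
stamp = datetime.fromtimestamp(src.date, tz=timezone.utc).isoformat()

print(show(Sourcemark(src, 12, 0)))   # main.c:12
print(show(Sourcemark(src, 12, 7)))   # main.c:12:7
print(stamp)                          # 1970-01-01T00:00:00+00:00
```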
2. Reserved diagnostic TOKENs
Reserved TOKENs were used for diagnostic extensions to EXPs, to avoid adding new constructs to the contents of an existing UNIT. All other parts of the diagnostic system occur in other UNITs.
2.1. ~exp_to_source
body: EXP sh
from: SOURCEMARK
to: SOURCEMARK
-> EXP sh Records that the EXP body arose from translating the program text between SOURCEMARK from and SOURCEMARK to (inclusive).
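A debugger might use such a range to decide whether a given source position belongs to an expression. The Python sketch below tests membership with both ends inclusive. Positions are reduced to (line_no, char_offset) pairs within one file, and falling back to line numbers when a character offset is 0 (not available) is an assumption of this example, not something the specification prescribes.

```python
def covers(start, end, point):
    """True if point lies between start and end, both ends inclusive.

    Each argument is a (line_no, char_offset) pair within the same file.
    When any of the three offsets is 0 (not available), only the line
    numbers are compared; that fallback is an assumption of this sketch.
    """
    if 0 in (start[1], end[1], point[1]):
        return start[0] <= point[0] <= end[0]
    return start <= point <= end   # tuples compare by line, then by offset

print(covers((10, 1), (12, 5), (12, 5)))    # True: the end position is included
print(covers((10, 1), (12, 5), (12, 6)))    # False: one character past the end
print(covers((10, 0), (12, 0), (12, 40)))   # True: offsets unknown, line 12 is in range
```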
2.2. ~diag_id_source
body: EXP sh
name: TDFSTRING(k, n)
access: EXP POINTER(al)
type: DIAG_TYPE
-> EXP sh Within the EXP body a variable named name of DIAG_TYPE type can be accessed via the EXP access.
2.3. ~diag_type_scope
body: EXP sh
name: TDFSTRING(k, n)
type: DIAG_TYPE
-> EXP sh Within the EXP body a source language type named name of DIAG_TYPE type is valid.
2.4. ~diag_tag_scope
body: EXP sh
name: TDFSTRING(k, n)
type: DIAG_TYPE
-> EXP sh This TOKEN is obsolete.
3. Proposed changes
It is thought likely that the new TDF entities described above will eventually be incorporated into the main TDF specification.
In several places below the absence of "standardised methods" is noted. These are cases where TDF can express some operation in several ways, and the installer cannot be expected to spot all of them and generate new diagnostic info.
3.1. Language features currently missing
The following sections list some of the language features known not to be supported by the current specification. It is not intended to be exhaustive.
3.1.1. Data types
- Complex numbers.
- Fortran alternate RETURNs.
3.1.2. C++ requirements
- The reference type is not yet present.
- The accessibility attributes (public, private and protected) are not yet present.
- No member function information, and no specification of how to deal with name mangling. Pointer to member may need special recognition.
- No operations for describing classes and inheritance.
3.1.3. FORTRAN requirements
- Main PROGRAM attribute missing.
- Fortran optional parameters may need special treatment.
- Use of COMMON is not explicit in TDF.
- Fortran77 etc. has a string type, which could be implemented in several ways (other languages need this, but they may differ on the same machine).
3.1.4. Other requirements
- No standardised method for describing static link info. TDF can express such programs, but the link could be stored in several ways.
- No standardised method for describing arrays with non-constant bounds, and/or where the bounds are present in the running image. (The upper_bound and lower_bound EXPs are sufficiently powerful, but some rules are needed.)
- No way to name a lexical block.
- Formal parameters with default values cannot have the default made visible.
- Variables which are constant, and have been inlined everywhere, may be a problem.
- No standardised method of describing the discriminant part of a discriminated union (in TDF probably represented by a struct containing the discriminant and the union).
- The distinction between ANSI prototyped and non-prototyped functions (this is a real problem for functions taking float).
- No standardised method for PASCAL sets.
- No standardised method for subrange types.
- PASCAL and Modula have a WITH construct to change semantics of record field lookup. No standardised method for documenting this.
3.2. Areas for further abstraction
3.2.1. Compilation related
How a running program has been created from several components is of interest when debugging. The present system cannot record all details of how a program has been created. In particular there is no indication of the source language of any piece of TDF, nor of the full name of any of the source files.
3.2.2. C related
At present there is no defined link between the fundamental C types and the VARIETYs etc. used for them. Present installers for 32 bit machines cannot distinguish between int and long when generating diagnostics, other than by means of the standard token names which form part of the C producer language interface.
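The difficulty can be seen by modelling an integer VARIETY as a pair of bounds. In the Python sketch below the bit widths for the two data models are assumptions chosen for illustration; they are not taken from this specification, and `variety` is a hypothetical helper.

```python
# Assumed bit widths under two common C data models (illustrative only).
DATA_MODELS = {
    "ILP32": {"int": 32, "long": 32},
    "LP64":  {"int": 32, "long": 64},
}

def variety(c_type, model):
    """Return the (lower, upper) integer range a signed C type would map to."""
    bits = DATA_MODELS[model][c_type]
    return (-(2 ** (bits - 1)), 2 ** (bits - 1) - 1)

same_on_32 = variety("int", "ILP32") == variety("long", "ILP32")
same_on_64 = variety("int", "LP64") == variety("long", "LP64")
print(same_on_32, same_on_64)   # True False
```

When the two ranges coincide, nothing in the range itself records which C type the producer started from, so the distinction has to come from elsewhere, such as the token names mentioned above.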
3.2.3. Naming of types
At present some DIAG_TYPEs have names and some do not. I suspect we should make a separate is_named operation and remove the other names.
3.3. Postscript - ANDF-DE
As this section makes clear, the TDF Diagnostic Specification was only ever really intended to deal with C. As of 1997, a more extensive diagnostic extension to TDF, ANDF-DE, is under development by DDC-I. This has been designed with the requirements of C, C++ and Ada in mind. It is intended that eventually ANDF-DE will be incorporated into the TDF specification, and the diagnostic format described here will be deprecated.