TDF Diagnostic Specification

Katherine Flavel, The TenDRA Project (copyeditor)
Jeroen Ruigrok van der Werven, The TenDRA Project (copyeditor)
DERA

First published 1998.

Revision History

1998-07-30

DERA

TenDRA 4.1.2 release.

i. Introduction

The TDF diagnostic information is intended to convey all the information used by current source-level debuggers that would conventionally be part of an object file. Any particular installer will use only those parts of this information that its native object format can represent.

The version of the diagnostics described here is the first version. It has only been tested with TDF produced from C programs. There are known to be certain deficiencies relative to other languages (in particular to FORTRAN). A later version will correct these deficiencies. The changes already envisaged are detailed in §3, and would have minimal (if any) impact on C producers.

The diagnostic system introduces one new kind of TDF linkable entity, and currently adds two new units to the bitstream representation of TDF.

Much of the actual annotation of procedure bodies is currently done by reserved TOKENs, which installers recognize specially. These TOKENs are described in §2.

There is a resemblance between the TDF diagnostic information and Unix International's DWARF format. DWARF has similar aims to the TDF diagnostics, and ensuring that complete DWARF information could be generated provided a useful check during the development of the TDF diagnostics. However, the TDF diagnostics are intended to be architecture (and format) neutral. No inference should be made about any link (present or future) between DWARF and TDF diagnostics.

1. Diagnostic SORTs

As a summary of this section:

DIAG_TYPEs describe programming language types (e.g. arrays, structs...). DIAG_TQs are qualifiers of DIAG_TYPEs used for attributes like volatile and const.
FILENAMEs and SOURCEMARKs describe source files and locations within them.
DIAG_TAGs associate integers with DIAG_TYPEs. They are used in a similar manner to normal TDF TAGs, and are held in a (TDF) linkable unit called a DIAG_TYPE_UNIT.
DIAG_UNITs hold a collection of DIAG_DESCRIPTORs, used for information outside procedure bodies.

1.1. DIAG_DESCRIPTOR

Number of encoding bits:	2
Is coding extendable:	yes

DIAG_DESCRIPTORs are used to associate names in the source program with diagnostic items.

1.1.1. diag_desc_id

Encoding number:

src_name:        TDFSTRING(k, n)
whence:          SOURCEMARK
found_at:        EXP POINTER(al)
type:            DIAG_TYPE
                       -> DIAG_DESCRIPTOR

Generates a descriptor for an identifier (of DIAG_TYPE type), whose source name was src_name from source location whence. The EXP found_at describes how to access the value. Note that the EXP need not be unique (e.g. FORTRAN EQUIVALENCE might be implemented this way).

1.1.2. diag_desc_struct

Encoding number:

src_name:        TDFSTRING(k, n)
whence:          SOURCEMARK
new_type:        DIAG_TYPE
                       -> DIAG_DESCRIPTOR

Generates a descriptor whose source name was src_name. new_type must be either a DIAG_STRUCT, DIAG_UNION or DIAG_ENUM.

This construct is obsolete.

1.1.3. diag_desc_typedef

Encoding number:

src_name:        TDFSTRING(k, n)
whence:          SOURCEMARK
new_type:        DIAG_TYPE
                       -> DIAG_DESCRIPTOR

Generates a descriptor for a type new_type whose source name was src_name. Note that diag_desc_typedef is used for associating a name with a type, rather than for any name given in the initial description of the type (e.g. in C this is used for typedef, not for struct/union/enum tags).

1.2. DIAG_UNIT

Number of encoding bits:	0
Is coding extendable:	no
Unit identification:	`diagdef`

A DIAG_UNIT is a TDF unit containing DIAG_DESCRIPTORs. A DIAG_UNIT is used to contain descriptions of items outside procedure bodies (e.g. global variables, global type definitions).

1.2.1. build_diag_unit

Encoding number:

no_labels:       TDFINT
descriptors:     SLIST(DIAG_DESCRIPTOR)
           -> DIAG_UNIT

Create a DIAG_UNIT containing DIAG_DESCRIPTORs. no_labels is the number of local labels used in descriptors (for conditionals).

1.3. DIAG_TAG

Number of encoding bits:	1
Is coding extendable:	yes
Linkable entity identification:	`diagtag`

DIAG_TAGs are used inter alia to break cyclic diagnostic types. They are (TDF) linkable entities. A DIAG_TAG is made from a number, and used in use_diag_tag to obtain the DIAG_TYPE associated with that number by make_diag_tagdef.

1.3.1. make_diag_tag

Encoding number:

num:             TDFINT
                       -> DIAG_TAG

Create a DIAG_TAG from num.

1.4. DIAG_TAGDEF

Number of encoding bits:	1
Is coding extendable:	yes

DIAG_TAGDEFs associate DIAG_TAGs with DIAG_TYPEs.

1.4.1. make_diag_tagdef

Encoding number:

tno:             TDFINT
dtype:           DIAG_TYPE
                     -> DIAG_TAGDEF

Associates tag number tno with dtype.
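
As an illustration of how tags can break a cyclic type, the following Python sketch models a C `struct node { struct node *next; }`. The tuple encoding of types and the dictionary used as a tag table are assumptions made for this example only; they are not the TDF bitstream encoding, and the Python functions merely borrow the names of the constructs they imitate.

```python
# Hypothetical in-memory model: tuples stand in for DIAG_TYPE constructs.

tagdefs = {}  # tag number -> DIAG_TYPE, as bound by make_diag_tagdef


def make_diag_tagdef(tno, dtype):
    """Associate tag number tno with dtype."""
    tagdefs[tno] = dtype


def use_diag_tag(tno):
    """Refer to the type bound to tno without expanding it."""
    return ("use_diag_tag", tno)


def resolve(dtype):
    """Expand one level of tag indirection, if there is one."""
    if dtype[0] == "use_diag_tag":
        return tagdefs[dtype[1]]
    return dtype


# The pointer refers back to tag 1, so the description stays finite
# even though the type it describes is cyclic.
node = ("diag_struct", "node", [("next", ("diag_ptr", use_diag_tag(1)))])
make_diag_tagdef(1, node)

next_type = node[2][0][1]        # DIAG_TYPE of the field "next"
pointee = resolve(next_type[1])  # follow the tag once
```

Here `pointee` is the same object as `node`: the cycle is closed only when the tag is looked up, never inside the type description itself.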

1.5. DIAG_TYPE_UNIT

Number of encoding bits:	0
Is coding extendable:	no
Unit identification:	`diagtype`

A DIAG_TYPE_UNIT is a TDF unit containing DIAG_TAGDEFs.

1.5.1. build_diagtype_unit

Encoding number:

no_labels:       TDFINT
tagdefs:         SLIST(DIAG_TAGDEF)
                     -> DIAG_TYPE_UNIT

Create a DIAG_TYPE_UNIT containing DIAG_TAGDEFs. no_labels is the number of local labels used in tagdefs (for conditionals).

1.6. DIAG_TYPE

Sortname:	`foreign_sort("diag_type")`
Number of encoding bits:	4
Is coding extendable:	yes

DIAG_TYPEs are used to provide diagnostic information about data types.

1.6.1. diag_type_apply_token

Encoding number:

token_value:     TOKEN
token_args:      BITSTREAM
                     -> DIAG_TYPE

The token is applied to the arguments to give a DIAG_TYPE. If there is a definition for token_value in the CAPSULE then token_args is a BITSTREAM encoding of the SORTs of its parameters, in the order specified.

1.6.2. diag_array

Encoding number:

element_type:    DIAG_TYPE
stride:          EXP OFFSET(p,p)
lower_bound:     EXP INTEGER(v)
upper_bound:     EXP INTEGER(v)
index_type:      DIAG_TYPE
                     -> DIAG_TYPE

An array of element_type objects. stride is the OFFSET between elements of the array (i.e. p is described by element_type). The bounds are in general not runtime constants, hence the values are EXPs (not, say, SIGNED_NATs). The VARIETY v is described by index_type. As in TDF, there is no multi-dimensional array primitive; a multi-dimensional array is described as an array whose element_type is itself a diag_array.
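
Since there is no multi-dimensional primitive, a C declaration such as `int a[3][4]` would be described by nesting one array description inside another. The Python sketch below shows the idea. It assumes, purely for illustration, that strides are byte counts, that `int` occupies 4 bytes, that the bounds are known integers (in TDF they are EXPs), and that upper_bound is inclusive; the class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Scalar:
    name: str
    size: int  # size in bytes (assumption of this sketch)


@dataclass(frozen=True)
class DiagArray:
    element_type: Union["DiagArray", Scalar]
    stride: int       # distance between consecutive elements, in bytes
    lower_bound: int
    upper_bound: int  # treated as inclusive here (assumption)


def size_of(t):
    """Total size in bytes of a Scalar or a DiagArray."""
    if isinstance(t, Scalar):
        return t.size
    return t.stride * (t.upper_bound - t.lower_bound + 1)


def offset_of(t, indices):
    """Byte offset of the element selected by one index per dimension."""
    offset = 0
    for i in indices:
        offset += (i - t.lower_bound) * t.stride
        t = t.element_type
    return offset


int32 = Scalar("int", 4)
row = DiagArray(int32, stride=4, lower_bound=0, upper_bound=3)  # int[4]
matrix = DiagArray(row, stride=size_of(row), lower_bound=0, upper_bound=2)
```

With these definitions, `offset_of(matrix, [1, 2])` walks one dimension per index: one row stride of 16 bytes plus two element strides of 4 bytes.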

1.6.3. diag_bitfield

Encoding number:

type:            DIAG_TYPE
number_of_bits:  NAT
                     -> DIAG_TYPE

Describes a bitfield of number_of_bits bits, which when extracted will have DIAG_TYPE type.

1.6.4. diag_enum

Encoding number:

base_type:       DIAG_TYPE
enum_name:       TDFSTRING(k, n)
values:          LIST(ENUM_VALUES) 
                     -> DIAG_TYPE

An enumeration to be stored in an object of type base_type. If enum_name is a string containing zero characters, this signifies no source tag.

1.6.5. diag_floating_variety

Encoding number:

var:             FLOATING_VARIETY
                     -> DIAG_TYPE

Creates a DIAG_TYPE to describe a FLOATING_VARIETY var.

1.6.6. diag_loc

Encoding number:

object:          DIAG_TYPE
qualifier:       DIAG_TQ
                     -> DIAG_TYPE

Records the existence of an item of DIAG_TYPE object, qualified by qualifier. diag_loc is used for variables (which may of course not actually occupy a memory location).

1.6.7. diag_proc

Encoding number:

params:          LIST(DIAG_TYPE)
optional_args:   BOOL
result_type:     DIAG_TYPE
                     -> DIAG_TYPE

Describes a procedure whose parameters have the types listed in params and whose result has type result_type. optional_args is true if and only if the make_proc which this diag_proc describes had vartag present.

1.6.8. diag_ptr

Encoding number:

object:          DIAG_TYPE
qualifier:       DIAG_TQ
                     -> DIAG_TYPE

Describes a pointer to an object of DIAG_TYPE object. The DIAG_TQ qualifier applies to the pointer itself, not to the object pointed to.
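
The difference between qualifying the pointer and qualifying the object it points to corresponds to the difference between `char *const p` and `const char *p` in C. The Python sketch below renders both from a toy tuple model. The tuple layout, the `render` helper and, in particular, the use of a diag_loc wrapper to carry the qualifier of the pointed-to object are assumptions of this example; the specification does not prescribe how a producer expresses a qualified pointee.

```python
def render(dtype, name):
    """Render a tiny subset of DIAG_TYPEs as a C declaration (sketch only)."""
    kind = dtype[0]
    if kind == "base":                      # ("base", c_type_name)
        return f"{dtype[1]} {name}"
    if kind == "diag_loc":                  # ("diag_loc", object, qualifiers)
        quals = " ".join(dtype[2])
        inner = render(dtype[1], name)
        return f"{quals} {inner}" if quals else inner
    if kind == "diag_ptr":                  # ("diag_ptr", object, qualifiers)
        quals = " ".join(dtype[2])
        declarator = f"*{quals} {name}" if quals else f"*{name}"
        return render(dtype[1], declarator)
    raise ValueError(f"unsupported construct: {kind}")


# Qualifier on diag_ptr: the pointer itself is const.
pointer_is_const = ("diag_ptr", ("base", "char"), ["const"])

# Qualifier on the pointee (assumed here to be carried by diag_loc).
pointee_is_const = ("diag_ptr", ("diag_loc", ("base", "char"), ["const"]), [])
```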

1.6.9. diag_struct

Encoding number:

tdf_shape:       SHAPE
src_name:        TDFSTRING(k, n)
fields:          LIST(DIAG_FIELD) 
                     -> DIAG_TYPE

Describes a structure. If src_name is a string containing zero characters, this signifies no source tag for the whole structure. tdf_shape allows the total size to be computed.

1.6.10. diag_type_null

Encoding number:

                   -> DIAG_TYPE

A null DIAG_TYPE.

1.6.11. diag_union

Encoding number:

tdf_shape:       SHAPE
src_name:        TDFSTRING(k, n)
fields:          LIST(DIAG_FIELD)
                     -> DIAG_TYPE

Describes a union. If src_name is a string containing zero characters, this signifies no source tag for the whole union. tdf_shape allows the total size to be computed.

1.6.12. diag_variety

Encoding number:

var:             VARIETY
                     -> DIAG_TYPE

Creates a DIAG_TYPE to describe an integer VARIETY var.

1.6.13. use_diag_tag

Encoding number:

dtag:            DIAG_TAG
                     -> DIAG_TYPE

Obtains the DIAG_TYPE associated with DIAG_TAG dtag.

1.7. ENUM_VALUES

Number of encoding bits:	0
Is coding extendable:	no

1.7.1. make_enum_values_list

Encoding number:

value:           EXP sh
src_name:        TDFSTRING(k, n)
                   -> ENUM_VALUES

ENUM_VALUES describe elements of an enumerated type. src_name is the source language name. value evaluates to a value of SHAPE sh. Note that all members of a LIST(ENUM_VALUES) must have the same sh.
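
The requirement that every member of a LIST(ENUM_VALUES) has the same sh is easy to check mechanically. The following Python sketch shows one way a tool might do so; representing each member as a `(src_name, value, shape)` triple and each SHAPE as a string is an assumption of this example.

```python
def check_enum_values(values):
    """Return the common shape of a list of (src_name, value, shape) triples.

    Raises ValueError if the members do not all share one shape.
    Returns None for an empty list.
    """
    shapes = {shape for _, _, shape in values}
    if len(shapes) > 1:
        raise ValueError(f"mixed shapes in enumeration: {sorted(shapes)}")
    return shapes.pop() if shapes else None


# enum colour { RED, GREEN = 5 }, with a hypothetical shape name.
colour = [("RED", 0, "signed_int"), ("GREEN", 5, "signed_int")]
```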

1.8. DIAG_FIELD

Number of encoding bits:	0
Is coding extendable:	no

1.8.1. make_diag_field

Encoding number:

field_name:      TDFSTRING(k, n)
found_at:        EXP OFFSET(ALIGNMENT whole, ALIGNMENT this_field)
field_type:      DIAG_TYPE
                     -> DIAG_FIELD

DIAG_FIELDs describe one field of a structure or union. field_name is the source language name. found_at is the OFFSET between whole (the enclosing structure or union), and this field (this_field). field_type is the DIAG_TYPE of the field.
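
A debugger can use the found_at offsets to map an address inside a structure back to a field name. The Python sketch below illustrates this for a C `struct { char c; int i; }`. It assumes byte offsets, a 4-byte `int` aligned to 4 bytes, and an explicit size for each field; in TDF the offsets are EXPs and the sizes would be derived from the field types, so all of the numbers here are illustrative.

```python
# Each entry: (field_name, found_at in bytes, field_type, size in bytes).
# The layout below is an assumed one for struct { char c; int i; }.
fields = [
    ("c", 0, "char", 1),
    ("i", 4, "int", 4),
]


def field_at(fields, byte_offset):
    """Return the name of the field containing byte_offset, or None.

    None means the offset falls in padding or outside the structure.
    """
    for name, found_at, _ftype, size in fields:
        if found_at <= byte_offset < found_at + size:
            return name
    return None
```

For a union, every field would have the same found_at, so the first matching field in the list is returned; a real tool would need to report all of them.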

1.9. DIAG_TQ

Number of encoding bits:	2
Is coding extendable:	yes

DIAG_TQs are type qualifiers, used to qualify DIAG_TYPEs. A DIAG_TQ is constructed from diag_tq_null and the various add_diag_XXX operations.

1.9.1. add_diag_const

Encoding number:

qual:            DIAG_TQ
                     -> DIAG_TQ

Marks a DIAG_TQ type qualifier as being const in the ANSI C sense.

1.9.2. add_diag_volatile

Encoding number:

qual:            DIAG_TQ
                     -> DIAG_TQ

Marks a DIAG_TQ type qualifier as being volatile in the ANSI C sense.

1.9.3. diag_tq_null

Encoding number:

                     -> DIAG_TQ

Create a null DIAG_TQ type qualifier.
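
The way a DIAG_TQ is built up from diag_tq_null can be pictured as adding members to an initially empty set. The Python sketch below models this; representing a qualifier as a `frozenset` of strings is an assumption of the example and says nothing about the actual encoding.

```python
def diag_tq_null():
    """The null qualifier: neither const nor volatile."""
    return frozenset()


def add_diag_const(qual):
    """Return qual with const added."""
    return qual | {"const"}


def add_diag_volatile(qual):
    """Return qual with volatile added."""
    return qual | {"volatile"}


# const volatile, built in two steps from the null qualifier.
cv = add_diag_const(add_diag_volatile(diag_tq_null()))
```

In this model the order of application does not matter, and applying the same operation twice has no further effect.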

1.10. FILENAME

Sortname:	`foreign_sort("~diag_file")`
Number of encoding bits:	2
Is coding extendable:	yes

FILENAMEs record details of source files used in producing a CAPSULE. They can be tokenised for abbreviation.

1.10.1. filename_apply_token

Encoding number:

token_value:     TOKEN
token_args:      BITSTREAM
                     -> FILENAME

The token is applied to the arguments to give a FILENAME. If there is a definition for token_value in the CAPSULE then token_args is a BITSTREAM encoding of the SORTs of its parameters, in the order specified.

1.10.2. make_filename

Encoding number:

date:            NAT
machine:         TDFSTRING(k1, n1)
file:            TDFSTRING(k2, n2)
                     -> FILENAME

Create a FILENAME for file file, dated date (a UNIX timestamp; seconds since 1 Jan 1970) on machine machine.
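
Because date is a UNIX timestamp, a tool displaying a FILENAME can convert it to a calendar date with standard library routines. The Python sketch below does this; the `describe_filename` helper and its output format are hypothetical, and the time is shown in UTC to avoid depending on the local time zone.

```python
from datetime import datetime, timezone


def describe_filename(date, machine, file):
    """Format the three components of a FILENAME for display."""
    stamp = datetime.fromtimestamp(date, tz=timezone.utc)
    return f"{machine}:{file} ({stamp:%Y-%m-%d %H:%M:%S} UTC)"
```

For example, `describe_filename(0, "host", "main.c")` describes a file dated at the start of the UNIX epoch.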

1.11. SOURCEMARK

Number of encoding bits:	1
Is coding extendable:	yes

A SOURCEMARK records a location in the source program. Present SOURCEMARKs assume that a location can be described by one or two numbers within a FILENAME.

1.11.1. make_sourcemark

Encoding number:

file:            FILENAME
line_no:         NAT
char_offset:     NAT
           -> SOURCEMARK

Create a SOURCEMARK referencing the char_offset'th character on line line_no in file file.

char_offset is counted from 1, 0 meaning that no character offset is available.
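
A consumer therefore has to treat a char_offset of 0 as "unknown" rather than as a column. The Python sketch below shows one way to format a SOURCEMARK accordingly; the helper name and the `file:line:column` output style are assumptions of this example.

```python
def format_sourcemark(file, line_no, char_offset):
    """Format a source location, omitting the column when it is unavailable."""
    if char_offset == 0:
        # 0 means that no character offset was recorded.
        return f"{file}:{line_no}"
    return f"{file}:{line_no}:{char_offset}"
```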

2. Reserved TOKENs

Reserved TOKENs were used for diagnostic extensions to EXPs, to avoid adding new constructs to the contents of an existing UNIT. All other parts of the diagnostic system occur in other UNITs.

2.1. ~exp_to_source

body:            EXP sh
from:            SOURCEMARK
to:              SOURCEMARK
           -> EXP sh

Records that the EXP body arose from translating the part of the program between SOURCEMARK from and SOURCEMARK to (inclusive).
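
Since the range is inclusive at both ends, a tool deciding whether a given source position belongs to an annotated EXP must accept positions equal to either bound. The Python sketch below illustrates the test. It assumes that all three marks are in the same file and carry a real character offset (not 0), and it represents each mark as a `(line_no, char_offset)` pair; the function name is hypothetical.

```python
def covers(start, end, mark):
    """True if mark lies between start and end, both bounds included.

    Each argument is a (line_no, char_offset) pair; Python compares
    tuples element by element, so lines are compared before offsets.
    """
    return start <= mark <= end
```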

2.2. ~diag_id_source

body:            EXP sh
name:            TDFSTRING(k, n)
access:          EXP POINTER(al)
type:            DIAG_TYPE
           -> EXP sh

Within the EXP body a variable named name of DIAG_TYPE type can be accessed via the EXP access.

2.3. ~diag_type_scope

body:            EXP sh
name:            TDFSTRING(k, n)
type:            DIAG_TYPE
           -> EXP sh

Within the EXP body a source language type named name of DIAG_TYPE type is valid.

2.4. ~diag_tag_scope

body:            EXP sh
name:            TDFSTRING(k, n)
type:            DIAG_TYPE
           -> EXP sh

This TOKEN is obsolete.

3. Proposed changes

It is thought likely that the new TDF entities described above will eventually be incorporated into the main TDF specification.

In several places below the absence of "standardised methods" is noted. These are cases where TDF can express some operation in several ways, and the installer cannot be expected to spot all of them and generate new diagnostic info.

3.1. Language features currently missing

The following sections list some of the language features known not to be supported by the current specification. It is not intended to be exhaustive.

3.1.1. Data types

Complex numbers.
Fortran alternate RETURNs.

3.1.2. C++ requirements

The reference type is not yet present.
The accessibility attributes (public, private and protected) are not yet present.
No member function information, and no specification of how to deal with name mangling. Pointer to member may need special recognition.
No operations for describing classes and inheritance.

3.1.3. FORTRAN requirements

Main PROGRAM attribute missing.
Fortran optional parameters may need special treatment.
Use of COMMON is not explicit in TDF.
Fortran77 etc. has a string type, which could be implemented in several ways (other languages need this, but they may differ on the same machine).

3.1.4. Other requirements

No standardised method for describing static link info. TDF can express such programs, but the link could be stored in several ways.
No standardised method for describing arrays with non-constant bounds, or arrays whose bounds are present in the running image. (The upper_bound and lower_bound EXPs are sufficiently powerful, but some rules are needed.)
No way to name a lexical block.
Formal parameters with default values cannot have the default made visible.
Variables which are constant, and have been inlined everywhere, may be a problem.
No standardised method of describing the discriminant part of a discriminated union (in TDF probably represented by a struct containing the discriminant and the union).
The distinction between ANSI prototyped and non-prototyped functions cannot be expressed (this is a real problem for functions taking float).
No standardised method for PASCAL sets.
No standardised method for subrange types.
PASCAL and Modula have a WITH construct to change semantics of record field lookup. No standardised method for documenting this.

3.2. Areas for further abstraction

3.2.1. Compilation related

How a running program has been created from several components is of interest when debugging. The present system cannot record all details of how a program has been created. In particular there is no indication of the source language of any piece of TDF, nor of the full name of any of the source files.

3.2.2. C related

At present there is no defined link between the fundamental C types and the VARIETYs etc. used for them. Present installers for 32 bit machines cannot distinguish between int and long when generating diagnostics, other than by means of the standard token names which form part of the C producer language interface.

3.2.3. Naming of types

At present some DIAG_TYPEs have names and some do not. I suspect we should make a separate is_named operation and remove the other names.

3.3. Postscript - ANDF-DE

As this section makes clear, the TDF Diagnostic Specification was only ever really intended to deal with C. As of 1997, a more extensive diagnostic extension to TDF, ANDF-DE, is under development by DDC-I. This has been designed with the requirements of C, C++ and Ada in mind. It is intended that eventually ANDF-DE will be incorporated into the TDF specification, and the diagnostic format described here will be deprecated.