Orientation Guide
© The TenDRA Project.
© Rob Andrews.
First published .
Revision History
kate | Merged suite purpose overview into the Orientation Guide. | |
kate | Initial revision of the Orientation Guide. | |
kate | Added suite purpose overview verbatim from Rob Andrews's unofficial TenDRA homepage. | |
ra | Initial revision of the Suite Purpose Overview. |
i. Introduction
TenDRA is a relatively complex system. This document attempts to orientate the reader amongst the maze of file types and tools, and to help show how it all fits together. The focus here is primarily on what files live where, and how they are involved; the details of each stage are not discussed at length.
A few examples are given, and it is recommended that you follow along and experiment with various alterations whilst reading. This is not a comprehensive guide; it only covers the areas which seemed to be most relevant.
1. Suite Overview
- 1.1. What is TDF?
- 1.2. What is TenDRA?
- 1.3. Using the TenDRA Compiler
- 1.4. TDF Producers
- 1.5. TDF Installers
- 1.6. TDF Tools
- 1.7. Compiler Writing Tools
1.1. What is TDF?
TDF (standing for TenDRA Distribution Format) is the compiler intermediate language, which lies at the heart of the TenDRA technology. Unlike most intermediate languages, which tend to be abstractions of assembler languages, TDF is an abstraction of high level languages. The current release is based on TDF Issue 4.0, with experimental extensions to handle debugging in languages such as C++ and Ada (these extensions are not used by default).
The TDF Specification gives a technical description of the TDF language. This is supplemented by the TDF Diagnostic Specification. This is an extension to the core TDF specification, which describes how information sufficient to allow for the debugging of C programs can be embedded into a TDF capsule (it is this that the experimental extensions mentioned above are intended to replace).
The companion document, The TDF Token Register, describes the globally reserved “standard tokens”.
The Guide to the TDF specification gives an overview and commentary on the TDF language, explaining some of the more difficult concepts.
For those who know a bit of history, TDF was the technology adopted by OSF as their ANDF (Architecture Neutral Distribution Format), and TDF Issue 4.0 (Revision 1) is the base document for The Open Group XANDF standard. Thus the terms TDF, ANDF and XANDF are largely synonymous; TDF is used in documentation since it is the term closest to our hearts.
1.2. What is TenDRA?
TenDRA is the name of the compiler technology built around the TDF intermediate language. The design and intended uses of TDF have affected how the TenDRA technology has developed. For example, the original emphasis of OSF's ANDF concept was on distribution, but this raised the question of program portability. The current TenDRA technology is far more about portability than it is about distribution, although TDF could still be used as a distribution format.
The rigid enforcement of an interface level between the compiler front-ends and the compiler back-ends, and the goal of producing target independent TDF (suitable for distribution) have produced a flexible, clean compiler technology. It has pulled many of the questions about program portability into sharp focus in a way that a more conventional compiler could not.
1.3. Using the TenDRA Compiler
The main user interface to the TenDRA compiler, tcc, can be used as a direct replacement for the system compiler, cc(1). This is described in the TCC Users' Guide.
There is an alternative user interface which just applies the static program checks and disables code generation. It thus corresponds to lint(1) in the same way that tcc corresponds to cc(1).
The chief difference between tcc and other compilers is the degree of precision it requires in specifying the compilation environment. This environment consists of two, largely orthogonal, components: the language checks to be applied, and the API to be checked against. For example, the -Xc option specifies ISO C with no extensions and no extra checks, the -Xa option specifies ISO C with common extensions, and -Xs specifies ISO C with no extensions and lots of extra checks. Similarly, -Yc89 specifies the ANSI C89 API (excluding Amendment 1), -Yposix specifies the POSIX 1003.1 API, etc. It is also possible to make tcc use the system headers on the host machine by means of the -Ysystem option. The -Yc++ option is required to enable the C++ facilities. The default mode is equivalent to -Xc -Yc89.
How to configure the C compiler checks is described in more detail in the C/C++ Checker Reference Manual. The extra checks available in C++ are described in the C/C++ Producer Configuration Guide.
1.4. TDF Producers
A tool which compiles a high-level language to TDF is called a producer. The TenDRA software contains producers for the C and C++ languages. The original TenDRA C producer (tdfc) has now been superseded by a new C producer (tdfc2) based on the C++ producer (tcpplus).
The design of both producers has been guided by the goal of trying to ensure program portability by means of static program analysis. Some thoughts on this subject are set out in the document TDF and Portability.
The first component of this is ensuring that the language implemented by the producer accurately reflects the corresponding language standard (ISO C, including Amendment 1, or the draft ISO C++ standard). The producers both include references to the standards documents within their error messages, so that a specific error can be tied to a specific clause within the standard. The producers have been tested using both the Plum Hall and Perennial C and C++ compiler validation suites.
The C++ producer implements most of the language sections of the November 1997 draft ISO C++ standard. The known problem areas are:
- Automatic inter-module instantiation of templates is not yet fully implemented.
- The current implementation of exception handling is not optimal with respect to performance.
- Temporaries are not always destroyed in precisely the right place.
- Partially constructed objects are not destroyed properly.
- The visibility of friend functions is not right.
Of the standard C++ library, only a few headers (<new>, <typeinfo> and <exception>) have been implemented. If a complete implementation of the standard C++ library is required, it must be obtained from elsewhere. See the C/C++ Producer Configuration Guide for more details.
1.5. TDF Installers
A tool which compiles TDF to a machine language is called an installer. TDF installers for a number of Unix systems and processors are included within the release (see the list of supported platforms). Each installer consists of code from three levels:
- Code which is common to all installers. A large portion of each installer is derived from a common section, which reads the input TDF capsule and applies various TDF -> TDF transformations to optimise the code. Each installer has a configuration file which indicates which of these transformations are appropriate to its particular processor.
- Code which is specific to a particular processor. Each installer also has some processor-specific code, which applies optimisations and other transformations that are too tied to a particular processor to warrant inclusion in the common section. This section also includes register allocation.
- Code which is specific to a particular processor/operating system pair. Even within the installers for a single processor, there may be differences between different operating systems. These differences are usually cosmetic, such as the precise assembler format.
The installers differ in their levels of reliability and performance tuning, due to the differing priorities in building up an installer base. The Intel and SPARC installers are the most reliable and have been subject to the most performance tuning.
All the installers fully support the C subset of TDF (i.e. code generated by the C producer). The Mips/Ultrix installer does not support the initial_value construct (used in dynamic initialisation), but otherwise all the installers fully support the C++ subset. The Intel and SPARC installers fully support the entire TDF specification, as checked by the OSF AVS (ANDF Validation Suite).
1.6. TDF Tools
There are various tools included within the software for viewing, generating and transforming TDF. tspec excepted, the use of these components is integrated into the user interface, tcc, but they may also be called directly.
- tspec
  The API checking facilities of the TenDRA compiler are implemented by means of abstract interface specifications generated using the tspec tool.
  This tool and specifications for a number of common APIs are included with the release. Part of the installation process consists of pre-compiling the implementations of those APIs implemented on the target machine into TDF libraries. This is performed automatically using tcc to combine the tspec specification with the implementation given in the system headers.
- tld
  tld is the TDF linker. It combines a number of TDF capsules into a single capsule. It also can be used to create and manipulate libraries of TDF capsules.
  This functionality is provided by tcc, but tld may be called directly.
- disp
  disp is the TDF pretty printer. It translates the bitstream comprising a TDF capsule into a human readable form.
  This functionality is provided by tcc, but disp may be called directly.
- tnc
  tnc is the TDF notation compiler. It acts as a sort of TDF “unstructured assembler”, and can translate TDF capsules to and from a human readable form.
  This functionality is provided by tcc, but tnc may be called directly.
- tpl
  tpl is the PL_TDF compiler. It is a TDF “structured assembler” in the lineage of PL360. tpl provides a more user-friendly way of generating TDF capsules from scratch than that offered by tnc.
  This functionality is provided by tcc, but tpl may be called directly.
1.7. Compiler Writing Tools
A number of compiler writing tools, which were used in the development of the TenDRA compiler technology, are also bundled with the TenDRA software release. These include the following:
2. Project Organisation
2.1. A tour of $PREFIX
The tendra.base.mk makefile provides several related $PREFIX_* variables which are used to specify various locations on the filesystem for installation. Each of these corresponds to different uses:
Variable | Default | Filetypes | Description |
---|---|---|---|
PREFIX | /usr/local | – | A convenience to specify the base for everything. |
PREFIX_BIN | ${PREFIX}/bin | Executables | User-facing binaries |
PREFIX_LIB | ${PREFIX}/lib | .so, .a | User-facing libraries. These are used by users in their own code. |
PREFIX_INCLUDE | ${PREFIX}/include | .h | Headers corresponding to the user-facing libraries. These are used by users in their own code. |
PREFIX_LIBEXEC | ${PREFIX}/libexec | Executables | Internal binaries, used by tcc. This should not be in $PATH. |
PREFIX_SHARE | ${PREFIX}/share | Plaintext | A convenience to specify the base for platform-independent resources. |
PREFIX_MAN | ${PREFIX}/man | roff | User-facing manpages, consumed by man(1). |
PREFIX_TSPEC | ${PREFIX_SHARE}/tspec | .c, .h | tspec-generated abstract API implementations. These get combined with the system headers and compiled to produce concrete API implementations, output to PREFIX_API. |
PREFIX_STARTUP | ${PREFIX_LIB}/tcc/startup | .pf, .h, environments | tcc portability tables and strictness profiles. These are platform-independent. |
PREFIX_ENV | ${PREFIX_LIB}/tcc/env | environments | tcc environments. These are platform-specific. |
PREFIX_LPI | ${PREFIX_LIB}/tcc/lpi | .tl | TDF implementations of LPI tokens (e.g. c.tl, the C language types, etc). These are platform-specific. |
PREFIX_API | ${PREFIX_LIB}/tcc/api | .tl | TDF implementations of API tokens (e.g. c89.tl, the ANSI C89 API). These are platform-specific. |
PREFIX_SYS | ${PREFIX_LIB}/tcc/sys | .a, .o, etc | System interface miscellany (e.g. crt*.o). These are platform-specific. |
PREFIX_TMP | /tmp | tcc-XXXXXX/ | Temporary workspace for tcc. |
During a build these may be overridden individually, but they default to values based on $PREFIX, and so overriding just that suffices for most situations. The best choices for these paths depend on the filesystem layout for each particular system.
2.2. Dependencies
TenDRA comprises a suite of related tools, each packaged separately. The dependencies required for building these are:
Note that cc here represents the system compiler, which may or may not itself be an installation of TenDRA.
The runtime dependencies are:
These runtime dependencies require the respective tools to be installed under $PREFIX in order to be used. However, once installed, they know where to find their own resources, and so need not be deployed into the system's $PATH, or set up using ldconfig(8), etc. In other words, TenDRA may be installed to some temporary $PREFIX and run from there. The installation does not require root.
Finally, as is typical with compilers, some of the tools are written using themselves; these have generated code which occasionally needs to be rebuilt by developers. The dependencies for rebuilding are not documented here, as they are not relevant to package managers or system administrators building TenDRA; they are of interest to TenDRA's developers only. Typically a developer working on a particular area would regenerate just that part as required.
2.3. Bootstrap
The goal of bootstrapping is to produce the bare minimum required for the system to be able to build itself. This is necessary because the system compiler alone is unable to build the whole of TenDRA, as many parts are written using TenDRA-specific mechanisms. For example, the API implementations which tspec produces are encoded using the #pragma token definition syntax which is only meaningful to the tdfc2 producer.
The phases of bootstrapping TenDRA are simply:
- Use the system compiler to build just the TenDRA tools required to be able to recompile itself, and the rest of the TenDRA system. This stage may be omitted if the system compiler is itself TenDRA (when upgrading, for example).
- Use the bootstrapped minimal TenDRA compiler to build everything required for final use.
This is simpler than for many older systems, which often take into account system compilers that only implement a subset of standard C. Such a situation is unlikely to occur now, and so the bootstrap process is simplified by being able to rely on code which is written in standard C.
2.3.1. Building the bootstrap compiler
The dependencies which must be satisfied using the system compiler are as illustrated below. Note that relatively few projects are required; many tools are omitted (disp, for example) because they are not required for the next stage of rebuilding.
Bootstrapping is of course traditional for compilers; it is a natural step for any general purpose compiler to make minimal use of the system compiler, and then proceed to rebuild using itself.
The runtime dependencies for using this minimal bootstrap compiler are the same as the dependencies for those same tools in general use, as illustrated in .
2.3.2. Rebuilding using the bootstrap compiler
TODO
3. Building
3.1. Building APIs
The API checking is one of the more interesting areas of TenDRA. An overview of the process of building APIs is given in . This diagram expands on the top-right and bottom-right quadrants of the TDF compilation phases described in TDF and Portability.
An API is described by a set of abstract specifications, written at a level of abstraction similar to that of the standard they represent. For example, size_t for the C89 API is defined to be an unsigned arithmetic type, but exactly which type is left to the implementation. See for details on specifying APIs.
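As a rough illustration of what coding against such an abstract specification means, the sketch below relies only on size_t being some unsigned type, and never on which one. The function names size_t_is_unsigned and count_char are hypothetical, introduced here for illustration only; they are not part of TenDRA or of any API.

```c
#include <assert.h>  /* assert */
#include <stddef.h>  /* size_t */
#include <string.h>  /* strlen */

/* Returns 1 if size_t is unsigned, as the C89 API requires.
 * For an unsigned type, (size_t) -1 wraps to the largest value. */
int size_t_is_unsigned(void)
{
    return (size_t) -1 > (size_t) 0;
}

/* Counts the occurrences of c in s. The code treats size_t purely as
 * "an unsigned arithmetic type"; it makes no assumption that it is
 * unsigned int, unsigned long, or any other particular type. */
size_t count_char(const char *s, char c)
{
    size_t i, n = 0;
    size_t len;

    assert(s != NULL);
    len = strlen(s);

    for (i = 0; i < len; i++) {
        if (s[i] == c) {
            n++;
        }
    }

    return n;
}
```

Code written this way should behave the same whichever concrete type an implementation chooses for size_t, which is the property the abstract specification is meant to capture.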
These abstract specifications are converted by tspec into API Source and API Includes. The API Includes contain #pragma token statements which create tokens that correspond to the various things the API defines. Details of these are documented in the tdfc2 guide. These are used later on, during compilation of users' programs.
The generated API Source from tspec contains implementations of just the symbols present in each header for an API (as opposed to all the extensions your system probably provides), guarded by preprocessor conditions. These guards are of a standard form; for example, ssize_t.c from the posix1 API:
    /* AUTOMATICALLY GENERATED BY tspec 2.8 */
    #ifndef __WRONG_POSIX1
    #ifndef __WRONG_POSIX1_SYS_TYPES_H_SSIZE_T
    #if #include ( sys/types.h )
    #define __BUILDING_TDF_POSIX1_SYS_TYPES_H_SSIZE_T
    #include <sys/types.h>
    #endif
    #endif

    #ifndef __BUILDING_TDF_POSIX1_SYS_TYPES_H_SSIZE_T
    #pragma TenDRA no token definition allow
    #endif
    #pragma implement interface <../shared/posix1.api/ssize_t.h>
    #endif
In the API specifications fed to tspec, ssize_t is a subset; the __WRONG_POSIX1_SYS_TYPES_H_SSIZE_T guard above is provided so that it may be excluded if your system does not have a compliant implementation of ssize_t.
Non-compliance for a particular machine is indicated by setting __WRONG_* macros in the start-up files for that machine. Hence ULTRIX, which (apparently) has an ssize_t incompatible with the posix1 API's, defines __WRONG_POSIX1_SYS_TYPES_H_SSIZE_T, and so ssize_t is omitted when compiling the tspec API Source into the API TDF Tokens. See Porting TenDRA to Different Operating Systems for a worked example of making use of these macros.
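The effect of these guards can be sketched in ordinary C. The macro names WRONG_DEMO, WRONG_DEMO_SSIZE_T and BUILDING_DEMO_SSIZE_T below are hypothetical stand-ins that only mirror the nesting of the real __WRONG_* guards (one per API, one per item); they are not macros that tspec generates, and the function demo_builds_ssize_t exists purely to make the outcome observable.

```c
/*
 * Hypothetical guard macros. A start-up file for a non-compliant
 * machine would define one of these; here both are left undefined,
 * so the item is treated as compliant and is built.
 */
/* #define WRONG_DEMO */
/* #define WRONG_DEMO_SSIZE_T */

#ifndef WRONG_DEMO
#ifndef WRONG_DEMO_SSIZE_T
#define BUILDING_DEMO_SSIZE_T
#endif
#endif

/* Returns 1 if the item would be built for this machine,
 * or 0 if either guard caused it to be omitted. */
int demo_builds_ssize_t(void)
{
#ifdef BUILDING_DEMO_SSIZE_T
    return 1;
#else
    return 0;
#endif
}
```

Uncommenting either #define line makes demo_builds_ssize_t return 0, which corresponds to the item being left out of the API for that machine.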
The compiled API TDF Tokens are linked together into .tl libraries; each library represents an API. As explained above, the contents of these libraries are the intersection of the sets of the things defined in that particular API and what your system provides.
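The intersection described here can be pictured with a small sketch. This is not how tcc or tld build the libraries; it simply computes which names from an API's list also appear in a system's list. The function names contains and intersect, and any item names passed to them, are hypothetical.

```c
#include <stddef.h>  /* size_t */
#include <string.h>  /* strcmp */

/* Returns 1 if name appears among the n entries of list, else 0. */
static int contains(const char *const *list, size_t n, const char *name)
{
    size_t i;

    for (i = 0; i < n; i++) {
        if (strcmp(list[i], name) == 0) {
            return 1;
        }
    }

    return 0;
}

/* Stores in out every name that is both specified by the API and
 * provided by the system, keeping the API's order, and returns how
 * many were stored. out must have room for napi entries. */
size_t intersect(const char *const *api, size_t napi,
                 const char *const *sys, size_t nsys,
                 const char **out)
{
    size_t i, n = 0;

    for (i = 0; i < napi; i++) {
        if (contains(sys, nsys, api[i])) {
            out[n++] = api[i];
        }
    }

    return n;
}
```

With an API list of size_t, ssize_t and pid_t, and a system list of pid_t, size_t and off_t, the result holds size_t and pid_t: ssize_t is dropped because the system does not provide it, and off_t is dropped because the API does not specify it.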
3.2. Production of TDF
Based on the files installed, the process of compilation is outlined in . These steps may be seen by executing tcc -dry.
C is used as an example here, though similar things apply for any other producer. The main difference would be in API checking.
It is important to note that during compilation the system headers are not used at all. Instead, the various prototypes and such which would be brought in by #include statements are prepended from the API Includes. These can be seen by running tcc -E:
    clarion% tcc -Yc89 -E hello.c
    #line 1 "hello.c"
    ...
    #line 13 "$PREFIX_TSPEC/TenDRA/include/shared/c89.api/size_t.h"
    ...
    #pragma token VARIETY unsigned size_t # size_t
    #pragma token VARIETY __size_t # __size_t
    #pragma promote size_t : __size_t
    #pragma no_def size_t __size_t
    ...
    #line 25 "$PREFIX_TSPEC/TenDRA/include/c89.api/stdio.h"
    #pragma token EXP rvalue : FILE * : stdin # c89.stdio.stdin
    #pragma token EXP rvalue : FILE * : stdout # c89.stdio.stdout
    #pragma token EXP rvalue : FILE * : stderr # c89.stdio.stderr
    ...
    #pragma token FUNC int ( __local_printf_string, ... ) : printf # c89.stdio.printf
    #line 3 "hello.c"
    int main(void) {
        printf("hello, world\n");
        return 0;
    }
Here I've omitted most of the things <stdio.h> defines, just to keep the example small.
Note that the contents of the tspec API Includes are portable; they ought to be the same for any system, since they were generated from the tspec API Specification sources, which did not involve anything system-specific. Since these are included verbatim for a given API at the top of a user's C program, we can infer that the TDF capsule produced (hello.j) is itself portable. This is what TenDRA is all about: producing a portable binary. The steps following (namely the call to trans and beyond) may therefore be performed on a different system than the one on which hello.j was produced, even though that system may have a different implementation of the APIs used. As long as the target system provides the same subset in its $api.tl, the code will link and execute as expected. For details on this, see .
3.3. Installation of TDF
The final step of linking also brings in any system-specific libraries which may be required (such as crt0.o), and of course any user-specified libraries, if given. These are illustrated representatively and their exact details differ per platform.