4. Descriptions and Results of Project phases

  4.1. Linux installation
    4.1.1. Linux/i386 installation (including the source code for commands)
    4.1.2. Linux/Alpha installation
  4.2. TenDRA installation
    4.2.1. TenDRA installation on Intel/i386
    4.2.2. TenDRA installation on DEC/Alpha
  4.3. Build environment with TenDRA
    4.3.1. Definition of the set of Linux commands
    4.3.2. Setting up the build environment
    4.3.3. NAT-NAT/i386 and DRA-NAT/i386 build problems
  4.4. Definition of the API for the commands
  4.5. ANDFization of the commands
    4.5.1. Dealing with ANDF constraints
    4.5.2. Undocumented dependencies to the OS / the underlying hardware
    4.5.3. Upgrade to TenDRA 4.0
    4.5.4. Holes in source code portability (64-bit vs 32-bit issues)
  4.6. Installation of the API for the commands
    4.6.1. Installation of the API on Linux/i386
    4.6.2. Installation of the API on Linux/Alpha
    4.6.3. API installation problems
  4.7. Installation and validation of the commands
    4.7.1. Miscellaneous problems encountered at validation
    4.7.2. Recent upgrades of our original source code for Linux commands

In the next paragraphs, we describe the way we accomplished the various tasks of the project and we summarize their results.

4.1. Linux installation

At the beginning of the project, we installed the Linux operating system release 1.1 on an Intel/i386 machine. In December 1995, after a few months of work on the Intel/i386 platform, we installed Linux on the second platform for the project, which is a Dec/Alpha. Linux was first released on this platform at the beginning of 1995.

4.1.1. Linux/i386 installation (including the source code for commands)

A machine with Linux 1.1.59 from the Slackware distribution, including the native compilation chain and libraries from GNU, was set up for the project.

The Linux system is available on several anonymous ftp sites. The one we used was sunsite.unc.edu, where a distribution of the sources and binaries of the Intel/Linux commands from Slackware was available under the /pub/Linux/distributions/slackware directory. Note that the current Slackware distribution at the time of writing of this report is based on Linux 1.2.

In the Slackware Linux distribution for Intel/ix86, the delivery of the source code for commands is split into a large number of packages. The contents of each source package must be compiled and installed individually. For example, the awk(1) command, actually gawk(1), belongs to the bin package which contains 56 commands, while the bc(1) command belongs to the bc package which contains only this one command. Consequently, we did not download the whole set of sources for the Linux commands, but selected a few packages containing the sources of the commands we intended to build first. We also had a look at the Caldera Linux source distribution, and it appeared to be organized in the same way.

A Slackware Linux package for source code distribution is made of one or more compressed tar files (usually only one), optional patch file(s), and a shell script. The execution of this shell script installs the source files from the tar file(s), applies patches if necessary, optionally performs a self-configuration step, runs the makefile(s) for the compilation, and finally generates a binary package which holds the resulting executables. This procedure has been adapted to fit the NAT-NAT, DRA-NAT and DRA-DRA development steps on two platforms, as described in the section Setting up the build environment.

Note that each package has a private version number. Thus packages can be maintained and released independently. Moreover, some packages (e.g. the bin package) are a collection of several “subpackages”, each of which has its own version number.

4.1.2. Linux/Alpha installation

The Linux Operating System port to the Digital Alpha architecture started two years ago. The first user-installable distribution was available in January 95, from the BLADE distribution, and was a 32-bit version. Then came a 64-bit version which was made compatible with Digital Unix with respect to basic C language types:

Type      Size
int       32-bit
long      64-bit
pointer   64-bit
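The sizes in the table above correspond to what is commonly called the LP64 data model, as opposed to the ILP32 model of Linux/i386. As an illustration only (this is not part of the project code, and the function names are hypothetical), a short C sketch can name the model from the three widths and report the one the compiler targets:

```c
#include <assert.h>
#include <limits.h>
#include <string.h>

/* Name the data model implied by the bit widths of int, long and pointers.
   The names "ILP32" and "LP64" are the usual conventions, not terms taken
   from the report. */
const char *data_model(unsigned int_bits, unsigned long_bits, unsigned ptr_bits)
{
    assert(int_bits > 0 && long_bits > 0 && ptr_bits > 0);
    if (int_bits == 32 && long_bits == 32 && ptr_bits == 32)
        return "ILP32";               /* e.g. Linux/i386 */
    if (int_bits == 32 && long_bits == 64 && ptr_bits == 64)
        return "LP64";                /* the sizes listed in the table above */
    return "other";
}

/* Report the data model of the platform this file is compiled for. */
const char *native_data_model(void)
{
    return data_model((unsigned)(sizeof(int) * CHAR_BIT),
                      (unsigned)(sizeof(long) * CHAR_BIT),
                      (unsigned)(sizeof(void *) * CHAR_BIT));
}

/* Convenience test: non-zero when compiled for an LP64 target. */
int native_is_lp64(void)
{
    return strcmp(native_data_model(), "LP64") == 0;
}
```

Code that stores a pointer or a long in an int behaves identically under ILP32 but silently truncates under LP64, which is why the distinction matters when moving commands from the first platform to the second.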

While it is still under development, Linux/Alpha is now robust and includes most of the capabilities provided by the Linux/Intel system.

The BLADE distribution was the first available distribution for Linux/Alpha. For the project, we retrieved the November 95 BLADE_0.3 release, based on the Linux 1.3 development kernel, at the following site:

ftp.digital.com:/pub/DEC/Linux-Alpha

This release consists of more than thirty 3.5'' floppy images (not including X-Window). The source code for the commands is not a part of this distribution. Since then, several new versions of the boot firmware, kernel, compiler and libraries have been released but, as we encountered only minor problems with BLADE_0.3, we did not upgrade our system.

Since December 95, another distribution of Linux for Dec/Alpha is also available from the RedHat company; the current version is:

ftp.redhat.com:/pub/redhat/redhat-2.1/axp

This distribution includes all the source packages for the components it is made from, along with some fixes and additions, in both binary and source forms. It is possible to unpack a RedHat Linux/Alpha 2.1 set while not running the RedHat Linux, but, as a proprietary packaging format is used, one should install the packaging tools (rpm(1)) first.

At the time we set up the machine, Linux/Alpha was operational only on a few variants of Digital Alpha-based systems. So, we selected an entry-level and rather inexpensive board, the Digital AXPpci 33 Alpha PC motherboard, around which we built a machine. Our Linux/Alpha system currently comprises the following:

  • an 8-slot enclosure with 200W power supply and fan

  • Digital AXPpci 33 motherboard, Windows NT (ARC) firmware, PS/2 style keyboard interface, 233 MHz Alpha processor

  • 2x16MB, 36-bit, 70ns SIMMs

  • 256 KB, 20ns cache [optional part]

  • a Number 9 GXE VGA display adapter (ISA)

  • a dumb VGA display

  • a PS/2 style keyboard

  • a 3.5''/1440K floppy disk drive

  • a SCSI-2 hard disk (a DECpc 2.0GB disk from Digital)

  • a 3COM Ethernet Link-II (aka 3c503) controller (ISA).

We installed the BLADE_0.3 distribution on our machine, including the C compilation chain and libraries. In order to use the 3COM Ethernet board, we had to rebuild the kernel. We used almost all of the default kernel build parameters, except for the Ethernet adapter, for the settings for the TGA graphics support (switched to “no”) and for the NFS-client feature (selected). Note that a kernel rebuild takes more than half an hour on our system.

A very interesting feature of the current releases of Linux/Alpha is that they provide an almost perfect binary compatibility with Digital Unix. This was of great help to us, as will be described later.

Among the various updates to the Linux/Alpha boot loader, kernel, C compiler, libraries and commands made by the Linux/Alpha development teams, we have used only a few:

  • upgrade of the sed command: some sed scripts used for modifying the system headers when building the APIs with TenDRA caused the original sed command to abort.

  • upgrade of a few system headers, extracted from the azstarnet inc-and-libs-0.38.tar.gz file.

These two updates were downloaded from the site

ftp.azstarnet.com:/pub/linux/axp

We encountered a few problems with the BLADE_0.3 release on the AXPpci Alpha board:

  • The floppy disk driver sometimes entered a time-out, as indicated by a console message.

  • Some shell scripts failed until a #!/bin/sh line, or equivalent, was inserted. According to a member of the Linux/Alpha development team, this was caused by the kernel command loader and has been fixed in newer kernel releases. We worked around this problem by patching a number of shell script files included with various source packages we were building on Linux/Alpha; we realized too late that it would have been preferable to upgrade the kernel.

  • Linux/Alpha failed to mount an NFS file system served by an HP-UX release 8 machine. Fortunately, this problem disappeared when using a server running Solaris or HP-UX release 9, so we had to move our development tree to such a host.

  • As mentioned above, the sets of source files were shared through NFS between a Linux/Intel, a Digital Unix and a Linux/Alpha platform. Occasionally, Linux/Alpha lost access to a file that had been updated recently by another NFS client: the error message “Stale NFS file handle” was displayed. Unmounting/remounting the NFS file system usually cured such problems.

  • During kernel rebuilds, the compilation of at least one file failed because of lack of memory: in the makefiles for kernel rebuild, the gcc compiler is called with the -pipe option, which speeds up the build but is not safe when compiling large source files. We wrote a small shell script which redoes a compilation without the -pipe option. (This problem was fixed by a subsequent kernel release: tcpip.c was split into several parts...)

4.2. TenDRA installation

We first installed the TenDRA technology on the Linux/i386 platform, from the April 1995 snapshot. Later, we installed the November 1995 release, the first to include support for the Dec/Alpha machine, in order to start work on the Linux/Alpha platform. However, because this snapshot was not upward compatible with the previous TenDRA release, we also had to install it on the Linux/i386 platform. We did not upgrade to the February 1996 snapshot, though it is compatible with the previous one. We only used it in a few cases when we had a bug in a command and wanted to make sure that it was not due to a problem already fixed in TenDRA.

4.2.1. TenDRA installation on Intel/i386

The TenDRA snapshot from April 1995, based on TDF 3.1, included support for the Linux/i386 platform. So, the installation on our machine was straightforward. We only had to recompile the tcc driver, and to modify some environment and startup files to fine tune the level of checking.

When we started to work on the second platform, we had to install the November 1995 snapshot of the TenDRA technology (see below). This was DRA's first snapshot based on TDF 4.0, and it included significant changes to the installation procedure. We had some difficulty installing this snapshot, due to a few bugs in the new installation procedures, but once installed the technology appeared to work well.

4.2.2. TenDRA installation on DEC/Alpha

The TenDRA snapshot from November 1995 was the first snapshot with support for the Dec/Alpha platform, but was not upward compatible with the one installed on the Intel platform (TenDRA 4.0 versus TenDRA 3.1). So, the TenDRA snapshot from November 1995 was installed on both the Intel/i386 and the Dec/Alpha platforms.

DRA provide support for the DigitalUnix/Alpha platform, not for the Linux/Alpha one. However, we benefited from the compatibility between Digital Unix and Linux to solve this problem. We made three different installations of the TenDRA technology for Alpha, among which the 2nd was fully operational for Linux/Alpha:

  • First installation on native Digital Unix/Alpha.

  • Second installation, still on Digital Unix, but for cross-development for Linux/Alpha (termed “lin_alpha_cross”).

  • Third installation on Linux/Alpha (termed “lin_alpha”).

The first installation was straightforward and worked very well. The main purpose was to ensure that the TenDRA technology worked correctly on Dec/Alpha.

For the second installation, we created a new target platform termed “lin_alpha_cross”. Most of the “lin_alpha_cross” files are shared with Digital Unix, using symbolic links to directories or files, since only a few files differ between the two targets. The main purpose of these changes was to use Linux/Alpha system header files when compiling, instead of the Digital Unix system header files, and Linux/Alpha libraries and startup files when link-editing. For example, we changed three files inside the <target_platform>/private/env directory named default, system and tcc_diag. For the same reason, we created a specific “lin_alpha_cross” subdirectory in lib/system to hold some replacement system header files when cross-compiling with the -Ysystem option (i.e. in DRA-NAT mode). The target dependent directories and files used when (cross-)building APIs for Linux/Alpha, e.g. located under the src/apis/libs directory, were also made specific.

Using this installation, we could cross-compile and cross-link on Digital Unix for Linux, without any problem. The binary compatibility between Linux/Alpha and Digital Unix was thus a key factor of success.

The third TenDRA installation for Linux/Alpha was readily derived from the previous one. We benefited again from the binary compatibility with Digital Unix: we ran the TenDRA compilation chain, built for Digital Unix, on top of Linux/Alpha, without having to port or recompile it. However, to do so, we had to copy and install the shared library tools of Digital Unix on Linux/Alpha, because TenDRA uses shared libraries, for which there is currently no support in Linux/Alpha. The Digital Unix shared libraries mechanism works fine under Linux! This trick could have been avoided if we had re-linked the TenDRA tools under Digital Unix using its statically-linked libraries.

In order to use the Linux/Alpha native assembler and link-editor instead of those from Digital Unix, we wrote a front-end shell script to the TenDRA installer (trans). This shell script calls the actual trans tool with an option to output a source assembly file instead of the “binary assembly” files used by the Digital Unix as1 tool. Similarly, we wrote a front-end shell script which emulates the call made by tcc to as1 by a call to the Linux as tool. We give below the changes to the settings in <lin_alpha>/private/env/default for the third installation:

+TRANS "/..../linux/1.3.45/alpha/private/bin/trans.sh"
+AS1   "/..../linux/1.3.45/alpha/private/bin/as1.sh"
+AS    "/usr/bin/as -nocpp" # seems unused
+LD    "/usr/bin/ld -G8 -O1"

However, despite these modifications, the port of the TenDRA installer to Linux/Alpha could not be completed. In some cases, the TenDRA installer appeared to generate assembly instructions that are not recognized by the Linux/Alpha assembler. For example, the following lines could not be assembled properly by Linux/Alpha:

.extern __ctype_ 8
    Error: Rest of line ignored. First ignored character is '8'
stq $fp, 8($sp)
    Warning: Illegal operands
bis $17,$17,$fp
    Warning: Illegal operands
.frame $fp, 360, $26, 0
    Error: bad absolute expression; zero assumed

We now understand it is not surprising that we were unable to use the TenDRA installer for Digital Unix/Alpha on Linux/Alpha. ANDF installer output needs to be tailored for different target operating systems according to the assembler and/or link editor interfaces supported by the target operating system. In our case we attempted to use the assembler interface, and the errors and warnings above are examples where this interface differs between Digital Unix/Alpha and Linux/Alpha. Debugger support and even some details of the procedure calling conventions may also need to be taken into account when tailoring an ANDF installer to a different operating system.

Since we had already set up a cross-development TenDRA environment for Linux/Alpha, hosted by Digital Unix, we continued to use it and discontinued use of the “lin_alpha” installation. We actually installed TenDRA on an NFS file server (used by the Linux/i386, the Digital Unix and the Linux/Alpha platforms).

4.3. Build environment with TenDRA

4.3.1. Definition of the set of Linux commands

The set of commands was not defined once and for all. It was defined on the Intel/i386 platform, during the first part of the project, in two steps, level 1 and level 2, as described in §2. Each step was performed as an incremental process. Each time new commands were selected, a whole cycle of API definition (§4.4), command ANDFization (§4.5), API installation (§4.6) and command installation and validation (§4.7) was performed.

During the first phase of the project, a number of commands have been compiled in DRA-DRA mode on the Unixware platform (see Validation of TenDRA Capability to Implement a UNIX-like Operating System). We started by locating these commands in the packages from the Slackware Linux source distribution for Intel/ix86. Since these commands were among the simplest ones to ANDFize on Unixware, they were good candidates to start with. Provided that a full binary installation was made on our Linux/i386 platform, a command could be located in a package by searching for its name in the list of packages under the /var/adm/packages directory. This directory contains one text file per installed package, which records the names of the commands it contains (actually the relative installation path from / is provided for each command). When the name of the package which holds a command has been found, we just had to connect to the ftp site, find the directory of the same name in the Slackware source distribution, and download the files under this directory.
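The lookup described above amounts to scanning each package file for a path whose last component is the command name. The following C sketch illustrates the idea under stated assumptions: it takes the text of one package file as a string, assumes one relative installation path per line, and uses a hypothetical function name. It is not the procedure actually used in the project, which could equally be done with standard text-search tools.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return 1 if the package listing contains a path whose last component
   is exactly cmd, 0 otherwise.
   Assumption: the listing holds one relative path per line, e.g. "bin/gawk". */
int listing_contains_command(const char *listing, const char *cmd)
{
    size_t cmd_len;
    const char *line;

    assert(listing != NULL && cmd != NULL);
    cmd_len = strlen(cmd);
    line = listing;

    while (*line != '\0') {
        size_t len = strcspn(line, "\n");   /* length of the current line */
        const char *base = line;            /* start of the last component */
        size_t base_len;

        for (size_t i = 0; i < len; i++)
            if (line[i] == '/')
                base = line + i + 1;

        base_len = (size_t)(line + len - base);
        if (base_len == cmd_len && strncmp(base, cmd, cmd_len) == 0)
            return 1;

        line += len;
        if (*line == '\n')
            line++;                         /* skip the line terminator */
    }
    return 0;
}
```

Matching on the whole last component rather than on a substring avoids false hits such as finding awk inside gawk.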

Among the 103 Unixware commands ANDFized during the previous phase of this project, we found 59 commands with similar names in Linux, scattered among 10 packages: bc, bin, bsdgames, diff, find, grep, gzip, sh_utils, txtutils and util. Moreover, these packages also contained some additional commands which appeared to be good candidates for easy ANDFization. However, we excluded a few of them which were compiled but not delivered in the binary package, or which seemed too dependent on the target platform (e.g. the fdformat command which formats floppy disks). We also selected four additional packages, tar, cpio, xlock and xgames, in order to complete the level 1 set of commands.

The definition of the level 2 set of commands was more difficult than for level 1, because we had to reject a number of commands, for various reasons discussed below.

First, we tried to include more X11 commands in the level 2 set, as we had successfully ANDFized a few of them for level 1. However, this turned out not to be so easy: the sources for these commands had not been packaged by Slackware, but were provided inside a huge collection of sources named XFree86 (from the XFree86 Project, Inc.), itself derived from the X Consortium X11R6 code. An attempt to perform the first step of the build of XFree86 on Linux/i386, which consisted of producing Makefiles from Imakefiles, failed. With some rewriting, we managed to produce a Makefile for a simple XFree86 command, xclock, and successfully compiled it. However, we did not spend much time on understanding the installation procedures of XFree86 and, since it would have taken us too much time per command to rewrite every Makefile, we set XFree86 aside.

Then we excluded one package, groff, because it was mostly coded in C++. We also found that some native Linux header files, used in several packages, offer BSD compatibility but in a way that could not be straightforwardly adapted to TenDRA. This issue is discussed in §4.4.

We have included in the level 2 set of commands some commands which represent quite large amounts of source code: m4, elvis (a vi clone), joe (another editor), less, perl and elm. The ultimate step of this experiment would have been to build “monsters” such as bash and emacs.

Finally, we evaluated the number of commands distributed with a Linux system, and we found about 700 executable binary files in the /usr/bin, /bin, /usr/X11/bin, /usr/openwin/bin, /usr/games, /sbin and /usr/sbin directories. We examined some of these commands in the Slackware packages, and concluded that there could be further candidates for ANDFization, but time constraints prevented us from including them. Also, we did not port to the 2nd platform, nor validate, all the commands operational on the 1st platform. Here again, time is the main reason why we did not complete the port. We estimate that an additional 1.5 engineer-months would have been sufficient to complete the task, apart from some commands which may have been difficult to port.

The level 1 and level 2 sets of commands together comprise 236 commands: these commands were installed and validated on Linux/i386 (cf. section 4.7 on this point).

In the list below, the commands in bold (149) have been installed and validated on both Linux/i386 and Linux/Alpha, as opposed to the commands listed in plain characters, which are available on the first platform only.

The few (13) commands listed in italic were ported to both platforms, but their validation failed, or was not completed, on the second platform.

Finally, in the “p/m” statements, p is the best-case number of commands we ported, while m is the maximum number of them with respect to a given Slackware Linux package.

Package   Total (p/m)  Commands ported
aaa_base  9/9          fromdos, funzip, mtools, todos, unzip, unzipsfx, zip, zipnote, zipsplit.
ash       1/1          ash.
bc        1/1          bc.
bin       48/56        at, bban, bpe, chgrp, chmod, chown, compress, cp, crond, crontab, ctags, dd, df, dircolors, du, ed, elvis, elvprsv, elvrec, file, fiz, fmt, gawk, ginstall, indent, ln, ls, man, mkdir, mkfifo, mknod, mv, patch, ref, rm, rmdir, sed, shar, sysvbanner, time, touch, tput, unarj, unshar, uudecode, uuencode, which, zoo.
bsdgames  13/36        bcd, caesar, factor, fish, monop, morse, number, paranoia, ppt, primes, rain, worm, worms.
byacc     1/1          byacc.
cpio      2/2          cpio, mt-GNU.
diff      4/4          cmp, diff, diff3, sdiff.
elm       9/9          answer, elm, elmalias, fastmail, filter, frm, newalias, newmail, readmsg.
find      6/6          bigram, code, find, frcode, locate, xargs.
flex      1/1          flex.
getty     2/2          getty, uugetty.
grep      1/1          grep.
gzip      1/1          gzip.
ispell    6/6          buildash, icombine, ijoin, ispell, sq, unsq.
joe       2/2          joe, termidx.
less      2/2          less, lesskey.
m4        2/2          ansi2knr, m4.
perl      4/4          a2p, perl4.036, sperl4.036, tperl4.036.
ps        11/12        free, fuser, killall, ps, pstree, psupdate, tload, uptime, vmstat, w.procps, w.bassman.
rcs       8/8          ci, co, ident, merge, rcs, rcsdiff, rcsmerge, rlog.
sh_utils  24/24        basename, date, dirname, echo, env, expr, id, logname, nice, pathchk, printenv, printf, pwd, sleep, stty, su, tee, test, tty, uname, users, who, whoami, yes.
sudo      2/2          sudo.bin, visudo.
tar       3/3          tar, rmt, testpad.
tcpip     7/31         (from the net-tools subset) arp, ifconfig, plipconfig, rarp, route, netstat, slattach.
txtutils  22/22        cat, cksum, comm, csplit, cut, expand, fold, head, join, nl, od, paste, pr, sort, split, sum, tac, tail, tr, unexpand, uniq, wc.
util      35/57        agetty, arch, banner, chfn, chroot, chsh, col, colcrt, colrm, column, ddate, frag, hexdump, hostname, ipcrm, ipcs, last, login, mesg, more, newgrp, passwd, rdev, readprofile, renice, rev, setsid, sln, strings, swapon, ul, vipw, wall, zdump, zic.
xgames    8/13         maze, xcolormap, spider, xtetris, xlander, xminesweep, xroach, xvier.
xlock     1/1          xlock.

4.3.2. Setting up the build environment

The environments for the NAT-NAT, DRA-NAT and DRA-DRA builds have been set up using principles similar to those used during the Unixware port:

  • One single reference source tree, then a dedicated work tree per (build, target platform). For the 1st target (Linux/i386), each work tree holds symbolic links to the source tree, while binaries are built inside a work tree as plain files. In addition, a procedure is used to replace a link to the source tree by a link to a patch tree when a source file has to be modified during the port to TenDRA. This is very similar to the environment we had on Unixware. The major difference is that each package has its own set of source/work/patch file trees. This is more modular but requires more manipulations.

  • For the 2nd target (Linux/Alpha): we usually created only a work tree for the DRA-DRA build. It initially contained source files only, which are symbolic links to their equivalent in the DRA-DRA/i386 work tree. By “source files” we mean here the Makefiles and the ANDF - .j - files having been generated from the original .c files by the TenDRA producer, during the DRA-DRA build for Linux/i386.

  • A shell script used as a pseudo cc (e.g. pseudo gcc) during the DRA-NAT and DRA-DRA builds. This avoids the necessity to modify most of the original makefiles when building the commands. The pseudo cc used during the build for the 2nd platform substitutes the (usually unique) input_file.c by input_file.j.
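The substitution performed by the pseudo cc for the 2nd platform can be illustrated by the following C sketch. The project used a shell script; this fragment only shows the renaming rule, and the function name and buffer handling are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy one compiler argument into dst, replacing a trailing ".c" by ".j"
   so that the ANDF file produced on the first platform is used in place
   of the C source. Arguments that do not name a C file are copied as is.
   Returns 0 on success, -1 if dst is too small. */
int andf_name(const char *arg, char *dst, size_t dst_size)
{
    size_t len;

    assert(arg != NULL && dst != NULL);
    len = strlen(arg);
    if (len + 1 > dst_size)
        return -1;

    memcpy(dst, arg, len + 1);              /* includes the terminating NUL */
    if (len > 2 && dst[len - 2] == '.' && dst[len - 1] == 'c')
        dst[len - 1] = 'j';
    return 0;
}
```

Applied to every argument of the original compiler command line, this leaves options untouched and redirects only the source file, which is why most makefiles need no modification.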

One specific feature of the sources and build procedures of the Linux commands is that they have often been designed to support a variety of target platforms and UNIX variants at source level. Thus, when building a command for the first time, there is usually a preliminary self-configuration step which examines the system header files, and produces a local header file (or a customized Makefile) which summarizes the target system peculiarities by means of #define (or -D) statements. We ran such self-configuration scripts before creating the NAT-NAT, DRA-NAT and DRA-DRA work trees: this assumes that our second platform for porting (Linux/Alpha) provides APIs similar to those of the 1st one (Linux/i386). Eventually, we had to revise the settings chosen by the self-configuration.

4.3.3. NAT-NAT/i386 and DRA-NAT/i386 build problems

These two builds of the commands were only performed on the Linux/i386 platform, as a sanity check and cleanup of the source code.

We faced only one problem during the NAT-NAT/i386 build of a few commands.

  • Some header files (e.g. linux/autoconf.h) were not found when attempting to compile some administrative commands. To gain access to such headers, the preliminary step of a kernel rebuild can be done; alternatively, one could manually establish the proper symbolic links for /usr/include/linux and /usr/include/asm: they should point to their equivalent inside the /usr/src/linux/include directory.

We faced a limited number of problems during the DRA-NAT/i386 builds of the commands. We list these problems below:

  • The link-edit of some commands failed because one symbol was undefined: _alloca. In the native compiler (gcc), _alloca is implemented as a built-in function. In the TenDRA compiler, this can also be the case, provided that the header file alloca.h is explicitly included. So we modified the relevant source files to include this header file.

  • The source code for some commands appeared to use, through the inclusion of a system header file or under #ifdef i386 conditional instruction, some assembly code. The related commands were thus excluded from our set of commands, except for a few of them for which we found a C variant to the assembly code.

  • Re-declaration of an array, for which the dimension was computed using sizeof. The following code sums up the problem:

    extern int lnum[sizeof(short)];
    int lnum[sizeof(short)]; /* bis */

We sent a Change Request, array_sizeof(262), concerning this problem, which applied to the apr-95 and nov-95 TenDRA releases. It has now been fixed.

  • Name conflict between a function and its arguments. The following code sums up the problem:

    char *fields(fields)
    char *fields;
    { return fields; }

We sent a Change Request, func_var(262), concerning this problem, which applied to the apr-95 and nov-95 TenDRA releases. It has now been fixed.

  • Use of custom options of the native compiler (gcc), e.g. -fpcc-struct-return

    This option was used in the Makefile for the getty package. The gcc man page says that this option provides intercallability with modules (e.g. library modules) compiled with a pcc compiler. We concluded that this was not relevant when compiling for a Linux target platform, since gcc is used to compile the libraries, and we ignored it. Similarly, we ignored, i.e. filtered out in our pseudo-gcc for DRA-NAT/DRA-DRA builds, many other gcc options such as -fomit-frame-pointer, -pipe, -g while we adapted to TenDRA style some others, such as -static (for gcc) to -Wl,-static (for tcc).
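The option handling described in the last item can be sketched in C as follows. The project's pseudo-gcc was a shell script, so this is only an illustration of the filtering rule; the function name is hypothetical and the list of dropped options is limited to those named above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* gcc options that are simply dropped before calling tcc. */
static const char *const dropped_options[] = {
    "-fpcc-struct-return",   /* gcc's PCC struct-return option */
    "-fomit-frame-pointer",
    "-pipe",
    "-g"
};

/* Copy argv to out, dropping gcc-only options and rewriting -static to the
   tcc form -Wl,-static. out must have room for argc entries.
   Returns the number of arguments written to out. */
int filter_gcc_args(int argc, const char *const *argv, const char **out)
{
    int i, n = 0;

    assert(argc >= 0 && argv != NULL && out != NULL);
    for (i = 0; i < argc; i++) {
        size_t k;
        int drop = 0;

        for (k = 0; k < sizeof dropped_options / sizeof dropped_options[0]; k++)
            if (strcmp(argv[i], dropped_options[k]) == 0)
                drop = 1;
        if (drop)
            continue;

        out[n++] = (strcmp(argv[i], "-static") == 0) ? "-Wl,-static" : argv[i];
    }
    return n;
}
```

Keeping the dropped options in a single table makes it easy to extend the filter when another package turns out to pass a further gcc-specific flag.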

4.4. Definition of the API for the commands

We started the experiment with an xpg3 API, and decided to put all other symbols we needed in an extension API. However, after a few compilations of Linux commands, it became clear that most of the symbols we were adding to the extension API were in fact part of some other standard APIs, such as svid3.

So, we redefined our base API to be a merge of the xpg3, svid3, gcc and bsd_extn APIs delivered with TenDRA, limiting the extension API to symbols specific to the Linux commands interface. In fact, some of the symbols in the extension API are defined in the standard cose API, but since this API is only very partially supported by Linux, and sometimes conflicts with definitions provided in other APIs, it was not worth including it in the base API.

For the level 2 set of commands, we downloaded some packages using a BSD-like interface, and we tried to include the symbol definitions for these commands in our extension API. However, this appeared to be very difficult, since we found that the Linux implementation of some BSD interfaces redefines symbols from the POSIX API, in an incompatible way. This is reflected in the Linux header files by conditional definitions, selected with the _BSD_SOURCE macro for example, or by replacement header files, such as bsd/signal.h instead of signal.h. The incompatible definitions we found were for the jmp_buf type, the setjmp(), getpgrp(), wait(), waitpid(), wait3() and wait4() functions, and finally the signal() function redefined as bsd_signal().

This problem could have been resolved by removing the conflicting symbols from our base API, and creating conflict_posix and conflict_bsd extension APIs with these symbols. The compilation of the commands based on a BSD-like API would have used the conflict_bsd API, in addition to the base and regular extension APIs, and would have been link-edited with the libbsd library provided by Linux. Since this would have taken a lot of time, we preferred not to modify our API and we set aside these commands, unless we found a simple work-around: selecting at build-time, or recoding to, a POSIX adherence for them (refer to next section).
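As an illustration of what recoding to a POSIX adherence can look like for the signal() case, the sketch below requests BSD-like handler behaviour explicitly through the POSIX sigaction() interface instead of relying on a redefined signal(). This is an assumed example, not code taken from the ported commands, and the function names are hypothetical.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* request the POSIX interface explicitly */
#endif
#include <assert.h>
#include <signal.h>
#include <string.h>

static volatile sig_atomic_t deliveries = 0;

static void count_delivery(int signo)
{
    (void)signo;
    deliveries++;
}

/* Install a handler that stays installed after each delivery and restarts
   interrupted system calls, i.e. the behaviour usually expected from a
   BSD-style signal(). Returns 0 on success, -1 on error. */
int install_bsd_like_handler(int signo)
{
    struct sigaction sa;

    assert(signo > 0);
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = count_delivery;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;
    return sigaction(signo, &sa, NULL);
}

/* Raise signo twice and return how many times the handler ran,
   or -1 if the handler could not be installed. */
int deliveries_after_two_raises(int signo)
{
    deliveries = 0;
    if (install_bsd_like_handler(signo) != 0)
        return -1;
    raise(signo);
    raise(signo);
    return (int)deliveries;
}
```

Because the semantics are spelled out in the sigaction structure, the code no longer depends on which variant of signal() the header files select.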

Finally, in order to compile some X11 commands, a separate API including the x5_lib, x5_t, x5_mu, x5_aw and x5_mit standard APIs, has been created. Since Linux is based on X11R6, an extension API has also been created, which includes the few symbols we had to define for the X11 commands we built.

We found one inconsistency between the Linux header files and the standard API provided with TenDRA for the <sys/socket.h> header file, defined in the bsd_extn API: we had to change almost every use of caddr_t to struct sockaddr *. We also found a few inconsistencies between the Linux/i386 and Linux/Alpha header files, which have been resolved by some corrections to the Linux native header files, in the API definition, and in the source code for one command (more).

4.5. ANDFization of the commands

We encountered different kinds of problems when compiling the set of commands with TenDRA on the Linux/i386 platform. Among these problems, only one was related to a bug in the TenDRA technology; the others were related either to ANDF constraints or to more general portability issues.

4.5.1. Dealing with ANDF constraints

We list below problems we encountered while ANDFizing Linux commands, which are related to the use of the TenDRA technology as a replacement to a classic compiler. We start with the only bug found in TenDRA during this process, then we roughly follow the order in which the various issues were encountered.

  • Redefinition of an API token as a macro

    In the code below, alarm is defined as a macro, but it is also a token in our API:

    #include <unistd.h> /* for alarm() */
    extern int debug() ;
    #define D_RUN 1
    #define alarm(d) alarm(d); debug(D_RUN, "alarm set: %s:%u",\
        __FILE__, __LINE__)
    long xx() { return alarm((long)5); }

    This code is indeed illegal, but tdfc entered an infinite loop. The problem was reported to DRA as loop_tdfc_alarm(276), and has now been fixed.

  • Added missing startup macros

    When the TenDRA compiler (tcc) was used, a number of startup flags, defined with the native compiler (gcc), were missing. The linux, __linux__, unix and __X11_P_HEADERS flags, plus a number of flags defined in the native features.h header file, such as _POSIX_SOURCE, were added in a startup file for tcc.

  • Added missing function prototypes and fixed type mismatches

    We used a tcc option to warn about missing function prototypes, and we fixed them by either including the appropriate header files or adding prototypes for locally defined functions. We added casts on some calls to library functions. Then, every remaining undeclared symbol was added to the extension API. Note that more than half of the changes we made in the source code for Linux commands consisted in adding such prototypes.

  • Resolved one conflict with an API symbol

    The function mkdir(), local to the file mtools, has been renamed to avoid a conflict with the API symbol defined in <sys/stat.h>.

  • Illegal use of target-dependent condition

    In the code below, INT_MAX is a target dependent token, which cannot be used to conditionally define a preprocessor macro:

    #if (INT_MAX <= 65535)
    #define longdiff(a, b) /* (definition 1 for the macro) */
    #else
    #define longdiff(a, b) /* (definition 2 for the macro) */
    #endif

    We fixed it by replacing the macro definition with a static function:

    static int longdiff(time_t a, time_t b) {
    #if (INT_MAX <= 65535)
        /* ... (definition 1 for the function) */
    #else
        /* ... (definition 2 for the function) */
    #endif
    }

    This constraint arises from the way ANDF is used to achieve portability between targets which may have different values for INT_MAX. The constraint is that a target-dependent #if is permitted only where a statement is permissible, and both alternatives must be legal statement lists.

    This constraint unfortunately prevents target-dependent macro definitions in the style shown above. DRA is currently considering whether the constraint may be eased in a subsequent version of TenDRA to permit certain well-formed cases such as this.

  • POSIX.1 or SVID interfaces versus BSD interfaces

    Three commands, time and crontab from the bin package and ash (a simple shell), were configured to use some BSD interfaces which had not been included in our API, as discussed in §4.4.

    For the time command, we found that the support for POSIX interfaces was provided in the source code, so we used it.

    Similarly, prior to building ash, we modified the related configuration file and Makefile, in order to elect svid3-like interfaces instead of the default bsd ones.

    For the crontab command, we fixed the problem by removing in the source code some (simple) calls to the BSD wait4() function, and by using the XPG3 waitpid() function instead.

  • For a few commands which use the curses interface, e.g. bpe, we chose the svid3 variant instead of the BSD one (they are both supported by Linux). Makefiles for building these commands with TenDRA have been changed to use the libncurses library instead of the libcurses library at link-edit time. Note that the sources for the elvis editor (from the bin package) embed a small custom version of the curses interface.

  • For some commands, the initial self-configuration step performed prior to entering the actual build defines the path for another command, because the latter is called by the first one by means of exec() or system(). While Linux provides the <paths.h> header file for this purpose, we found some files which do not include this header file, and others which need to call a command for which there is no path definition in the regular header. An example of such a situation is elm, which calls an editor (e.g. vi). When we detected such situations, we either modified the source code to include and use <paths.h>, or we added a definition inside the alternate <paths.h> specified in our extension API.

  • Pointer/Integer conversion

    The TenDRA compiler can be configured to issue a warning on every pointer/integer conversion. This is done with the pragma instruction:

    #pragma TenDRA conversion analysis (int-pointer) warning

    However, due to the very large number of occurrences of these warnings, we had to cancel this mode, and decided to postpone their analysis until after the validation step.

    For example, we encountered uses of -1 (minus one) to give a special meaning to a pointer value, while only 0 (NULL) is accepted for this purpose. (Note: 64-bit issues are discussed later in this section.)

  • Underspecified type in svid3 API

    We found one command which makes the assumption that the daddr_t type, defined in the svid3 API, is an arithmetic type. The source code casts a daddr_t value into an int, while daddr_t is defined in the API as:

    +TYPE daddr_t;

    We fixed this problem by stating that daddr_t is indeed an arithmetic type, which is correct for both Linux/i386 and Linux/Alpha. We initially modified the reference svid3 API, but later we did it more cleanly, moving the daddr_t definition to our extension API prior to changing it to:

    +TYPE (int) daddr_t;

    Also, prior to fixing this in the API, we found that casting an integer value to a daddr_t type was not rejected by the TenDRA compiler, while it is obviously illegal. A bug report has been sent to DRA, and this has now been fixed.

  • Recoding of a source file dealing with target platform byte ordering issues.

    A source file used a local BYTEORDER macro, set up during the initial self-configuration step of the build of the command, to support different byte orderings. However, Linux already provides for this purpose a __BYTE_ORDER macro, defined in the <bytesex.h> header file. So, we added the __BYTE_ORDER symbol to our API, and replaced all the occurrences of the BYTEORDER macro by references to the __BYTE_ORDER API macro. We also had to rewrite some code, because with TenDRA some instructions are illegal after a target-dependent condition.

  • The termio and termios interfaces are both provided with Linux, and share the same set of macros to define indexes in the c_cc array from either the termio or termios structures. On Linux/i386, these indexes are the same for the two structures, while on Linux/Alpha they differ. TenDRA provides a way to support different variants of the same object, using version numbers, and this should have solved our problem. However, since we had never used this feature before, we did not spend time to see how we could use it in our API. Instead, we made a temporary fix, which consisted in renaming the constants in the termio interface. For two of the commands we ported, more and ispell, which use the termio interface, we changed their sources to reference the new macros.
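
The replacement of wait4() by waitpid() mentioned above for the crontab command can be sketched as follows. This is a minimal illustration, not the actual crontab code: it assumes that the original calls passed a null resource-usage pointer, which is what makes them "simple" to convert, and the function names wait_for_child and run_child are hypothetical.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Wait for one child and return its exit code, or -1 on error or
 * abnormal termination. A BSD-style call such as
 *     wait4(pid, &status, 0, NULL)
 * requests no resource usage (an assumption about the original code),
 * so waitpid() can replace it directly. */
int wait_for_child(pid_t pid)
{
    int status;

    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR)
            return -1;       /* real failure, e.g. ECHILD */
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}

/* Hypothetical driver: fork a child that exits with `code`. */
int run_child(int code)
{
    pid_t pid = fork();

    if (pid < 0)
        return -1;           /* fork failed */
    if (pid == 0)
        _exit(code);         /* child: terminate immediately */
    return wait_for_child(pid);
}
```

A call to wait4() that actually uses its struct rusage argument would need more work, since waitpid() does not report resource usage.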

4.5.2. Undocumented dependencies to the OS / the underlying hardware

Some commands are platform dependent, and are not easily (sometimes: not at all) portable from one platform to another. However, the Linux/i386 and Linux/Alpha OS's are very similar; furthermore, some hardware architectures built around the DEC Alpha chip are not much different from the Intel-based PC: for example, our Linux/Alpha platform includes ISA adapters for graphics and Ethernet. In such a favorable situation, the Linux/i386 ifconfig command (from the tcpip/net-tools subset), which displays hardware information on the network interfaces such as their (ISA) “Base address”, could probably have been easily ported to Linux/Alpha (/AXP pci). Also, the perl command, which includes optional support for undocumented system calls, may or may not be portable between two Linux platforms, depending on the system calls they implement.

On the other hand, changing the format for binary files, e.g. switching from a classic Linux a.out format to the Digital Unix “Extended COFF”, may require changes in some common commands such as strings or file. When using a classic compiler, some (or even all in a favorable case) of these changes may be hidden inside the system header files, e.g. <a.out.h>, but the TenDRA compilation chain, when used in DRA-DRA mode, is often more rigid.

4.5.3. Upgrade to TenDRA 4.0

When we had ANDFized the whole set of commands, we upgraded to TenDRA 4.0 in order to work on the Linux/Alpha platform (see §4.2.2). However, we had to ANDFize again the set of commands with the new TenDRA version, since it was not upward compatible with the previous one. In fact, we only re-ANDFized the commands we tried to install on the Dec/Alpha platform, at the time we needed them. We did not encounter any problem when doing this.

4.5.4. Holes in source code portability (64-bit vs 32-bit issues)

During the installation and validation of these commands on the second platform, we found a number of bugs related to portability problems, which are out of the scope of TenDRA. All these bugs were due to code that assumes a 32-bit platform and breaks on 64-bit platforms. Some of the bugs we found were already fixed in early Linux/Alpha releases, such as the Blade release; others were still there. We fixed the source of the commands, re-ANDFized them, and installed and validated these commands again on the two platforms. We give below the portability issues we encountered:

  • Wrong int <-> pointer conversion

    On Linux/Alpha (and Digital Unix/Alpha), a pointer type is 8 bytes wide, so it cannot fit in an int type, which is only 4 bytes wide. Fortunately, the long type on Linux/Alpha is, as usual, as large as a pointer, and thus can be used as a replacement for an int each time an explicit pointer<->integer conversion is used. This is a common type of portability fix we had to make in the source code for Linux commands, subsequent to encountering a Linux/Alpha-only problem at validation time.

  • Incorrect assumptions on sizes of int, long and size_t

    Although in many cases int and long types are equivalent, we give below three examples of code we found where it makes a difference:

    /* #1 */ { int i; printf("i value is: %ld\n", i); }
    /* #2 */ extern char *malloc(int);
    /* #3 */ { long l ; printf("%08lx", l); }

    All three work perfectly on Linux/i386, while they cause, or could cause, damage under Linux/Alpha.

    In the first two examples, the called function (printf or malloc) reads an 8-byte long argument, while the caller passed only a 4-byte int. Note that the correct prototype for malloc is void *malloc(size_t), and that size_t is as wide as a long on Linux/Alpha. We fixed the error on such printf statements with a cast to long for the argument, and the error on malloc by replacing its local-and-wrong declaration by the inclusion of the <stdlib.h> header file.

    In the third case, the instruction was used to print a fixed number of digits. However, a long, 8-bytes on an Alpha platform, may hold a value that prints up to 16 digits, thus putting unexpected digits in the output. In the code where we found the problem, the fix was to truncate the value to a 4-byte value.
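
The fixes described in this section can be summarized in a short sketch. It is an illustration under stated assumptions, not code taken from the commands: the function names are hypothetical, and the claim that long is as wide as a pointer holds for Linux/i386 and Linux/Alpha but is not guaranteed by the C language.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>   /* declares malloc(size_t); no local prototype needed */

/* Keep a pointer in an integer variable: long instead of int.
 * Assumption: long is at least as wide as a pointer on the target. */
long ptr_to_long(void *p)
{
    assert(sizeof(long) >= sizeof(void *));
    return (long)p;
}

void *long_to_ptr(long v)
{
    return (void *)v;
}

/* Example #1 fixed: the argument is cast so that it matches "%ld". */
int format_int(char *buf, size_t size, int i)
{
    return snprintf(buf, size, "i value is: %ld\n", (long)i);
}

/* Example #3 fixed: keep only the low 32 bits, so that exactly
 * 8 hexadecimal digits are printed whatever the width of long. */
int format_hex8(char *buf, size_t size, long l)
{
    return snprintf(buf, size, "%08lx", (unsigned long)l & 0xffffffffUL);
}
```

Example #2 needs no code of its own: including <stdlib.h>, as done at the top of the sketch, replaces the local declaration of malloc.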

4.6. Installation of the API for the commands

4.6.1. Installation of the API on Linux/i386

The API is made of two parts, a base API, which is a merge between xpg3, svid3, gcc and bsd_extn APIs, and an extension API, which completes the interface required for the commands (see §4.4).

The base API was installed on the Linux/i386 platform, without any problem. However, we left some tokens undefined when some parts of the API were not part of the actual Linux API. Then, the extension API was installed, as we extended it to cover more and more commands. These installations required a few patches to the system header files, some of which were already provided with the TenDRA snapshot. These installations went very well, with only a few problems, listed in a following paragraph. Then, when we moved to a new TenDRA snapshot for the Linux/Alpha platform, we re-installed the API, without any problem.

We made a small number of modifications to the API during the port of the commands on the Linux/Alpha platform. However, each modification we made required re-installation of some parts of the API.

4.6.2. Installation of the API on Linux/Alpha

As for the Linux/i386 platform, we had to apply some patches to the system header files in order to install the API. A number of these patches were actually identical to the patches we made on the Linux/i386 header files. So, instead of copying and editing, by hand, these files again, we chose to implement these patches by means of sed scripts, which could be applied to both Linux/i386 and Linux/Alpha header files. Most of these scripts are now common to the two platforms, although a few of them are specific to one. These sed scripts not only facilitate corrections to the system header files, but would also be useful if we need to upgrade from one Linux version to another.

The Linux/Alpha system header files do not differ much from the Linux/i386 ones. However, since the Linux/Alpha port was derived from a more recent Linux/i386 version (Linux 1.3) than the one we used (Linux 1.1), we could not clearly distinguish between the changes which come from standard Linux evolutions and those which have been introduced during the port of Linux to Digital Alpha. One important modification was that some definitions found on Linux/i386 in some <linux/*> or <sys/*> header files have now been moved into <asm/*> header files. We had to take such changes into account when adapting the Linux/i386 modifications to Linux/Alpha.

Finally, we did not port to Linux/Alpha the extension API to the TenDRA x5/* APIs. This would have required installation of X11 on our Linux/Alpha box, which consists of 28 additional floppy images! This part of the API was only required for 9 commands from our set of commands, which we did not install on Linux/Alpha.

4.6.3. API installation problems

We list below the problems we found when building the API on the Linux/i386 or the Linux/Alpha platforms, and the solution we adopted.

  • A macro, makedev, added to our extension API, was defined in the Linux/i386 sys/sysmacros.h header file. This file contains the lines:

    #define major(dev) ...
    #define minor(dev) ...
    #define makedev(major,minor) ...

    The identifiers major and minor used to name the formal parameters of the makedev macro are also the name of two macros defined in this header file, which we included in our API. The clash on the names is reported as an error by tdfc when building the API. We did not check whether this was a TenDRA bug or not, but we bypassed the problem by using an alternate version of sys/sysmacros.h, in which the formal parameters of the makedev macro have been renamed.

  • The <bytesex.h> header file contained the following lines:

    #undef __BYTE_ORDER
    #define __BYTE_ORDER 1234

    The #undef line prevents tcc from finding the definition of the __BYTE_ORDER constant. This is a constraint that applies only when building API token libraries. It is a necessary consequence of using C macro definitions to obtain ANDF token definitions. We bypassed this behaviour by commenting out the #undef line in a replacement header file.

  • The Linux/Alpha header files do not follow the standard APIs in some cases. For example, the Linux/Alpha <sys/stat.h> header file defines the field st_dev from the stat structure as unsigned int instead of dev_t as defined in XPG/3. Since dev_t is equivalent to unsigned int on Linux/Alpha, we were able to modify the system header file to use the correct type.

  • The Linux/Alpha header files sometimes use an int type, in places where a long type is used on Linux/i386. In such cases, we decided to patch the Linux/i386 header files to use an int type, since long and int are equivalent on a 32-bit platform.

  • The reverse situation, where Linux/i386 uses an int and Linux/Alpha uses a long, has also been found. In this case, we preferred to modify the API to accept both types, using the tspec +TYPE (int) ... notation.

  • On the Linux/i386 platform, we extended the API with some symbols from the <termio.h> header file. The <termio.h> system header file has changed between Linux/i386 and Linux/Alpha, and some definitions were incompatible with the extension API. We eventually found a solution, which involved a fix to the Linux header files.

  • The Linux system supports two variants of the curses library, one defined by the <curses.h> header file for a BSD API, and the other defined by the <ncurses.h> header file for a svid3 API. We used the latter to build the API, since we do not support the BSD API.

  • The TenDRA svid3 API defines the constant RLIM_INFINITY, from the sys/resource.h header file, as follows:

    +CONST int RLIM_INFINITY;

    However, this constant is used to assign variables of type rlimit_t, which is, on Dec/Alpha, defined as a long, thus 8-bytes wide. So the problem was: while this constant was defined with the value 0x7fffffffffffffffL, it was actually truncated to fit within a (32-bit) int. We fixed this bug by replacing the definition of RLIM_INFINITY by:

    +CONST rlimit_t RLIM_INFINITY;
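
The effect of the RLIM_INFINITY truncation can be reproduced with a small sketch. The names below are hypothetical, fixed-width types stand in for the int and long of the Dec/Alpha platform, and an unsigned 32-bit type is used for the narrow case so that the conversion is well defined in C.

```c
#include <stdint.h>

/* Hypothetical stand-in for the Linux/Alpha limit value quoted above. */
#define SKETCH_RLIM_INFINITY INT64_C(0x7fffffffffffffff)

/* Models "+CONST int RLIM_INFINITY": only the low 32 bits survive. */
uint32_t as_32bit_const(int64_t value)
{
    return (uint32_t)value;
}

/* Models "+CONST rlimit_t RLIM_INFINITY" with a 64-bit rlimit_t:
 * the value is kept intact. */
int64_t as_64bit_const(int64_t value)
{
    return value;
}
```

With the 32-bit variant, the limit collapses to 0xffffffff, which no longer compares equal to the 64-bit value that the system header file defines.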

4.7. Installation and validation of the commands

On the Linux/i386 platform, we used the TenDRA compiler to produce the ANDF files, and then translate them into binary executable files, in the same invocation of the compiler. A tcc option was used to preserve the intermediate ANDF files.

On the Linux/Alpha platform, we used the ANDF files produced on the Linux/i386 platform, and translated them into binary executable files. For this platform, we ran the TenDRA compiler on a Dec/Alpha platform as a cross-compiler for the Linux/Alpha platform (see §4.2.2).

In order to validate the commands we built, we used several different methods, depending on the commands.

We found that a very limited number of commands were packaged with rather extensive self-validation tests, which we used to validate these commands.

We tested some other commands interactively (e.g. elvis, bpe, ispell, elm, ...). However, for most commands we had to write small tests. Even a basic test requires several shell script lines: it took us several weeks to write tests for more than 200 commands and then to run them.

Finally, a small number of commands, 10 in all, were not tested: getty/uugetty, sudo.bin/visudo, readprofile, swapon, mt-GNU, rmt, plipconfig and slattach. As none of these commands were actually installed on the 2nd platform, there is no real penalty.

On the Linux/i386 platform, 236 commands were installed, then validated. Conversely, on Linux/Alpha, we installed and validated only a subset of these commands: about 150, as previously mentioned. While we found only a few problems during the validation on Linux/i386, we faced a number of validation failures on Linux/Alpha, thus requiring much more investigation. Some of these problems were due to bugs in the TenDRA technology, and have all been fixed by now. Most of the others have already been discussed in previous sections.

We list below miscellaneous problems, encountered at validation time on either the 1st or the 2nd platforms, which are not really related to TenDRA, nor do they depend on portability issues in the original source code.

4.7.1. Miscellaneous problems encountered at validation

  • For two commands, sln and stty, we found different behaviour due to an undetected missing symbol in our API. For example, the sln command is written such that, depending on whether the S_ISLINK symbol is defined or not, it generates a symbolic link or a hard link. The missing symbol was defined in our API to fix the problem.

  • The more command on Linux/Alpha, of which the source code uses the non-POSIX termio interface, waits for 4 input characters before processing them. The reason is that in Linux/Alpha (as in Digital Unix) some of the termio.c_cc[] control characters overlap with some others, depending on the input mode being used (i.e. either the “canonical” mode or the “half-cooked” mode). So such control characters, which are the ones at VMIN and VTIME indexes, must always be updated when switching from one input mode to the other: we changed the source code for the more command accordingly.

  • We found two bugs in the Linux/Alpha libc library, in the getegid and times functions. The 1st one was already known and fixed in a later release, while the 2nd one was not (we received a fix for it a few days later, from a member of the Linux/Alpha project).

  • The csplit command, from the txtutils package, defines the memchr() function, also defined in the libc library. The validation of this command failed on Alpha, until we removed this local recoding for memchr() and used the regular library entry point instead.

  • The environment on Linux/Alpha for using the zic command was not correct, since that command is not available in the Linux/Alpha BLADE_0.3 distribution.

  • The environment on Linux/Alpha for running the bpe command was not correct, since we had used the (available) ncurses library when link-editing this command: at run-time the related terminfo data base was lacking. We simply had to replicate on Linux/Alpha the Linux/i386 terminfo files to cure this problem.
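
The fix made to the more command can be sketched as follows. This is not the actual patch: it uses the POSIX termios structure rather than the termio structure used by more, the function names are hypothetical, and it assumes that the slots sharing storage with VMIN and VTIME are those of VEOF and VEOL, as in traditional System V termio.

```c
#include <termios.h>

/* Switch the settings in *t to non-canonical input, returning one
 * character at a time. c_cc[VMIN] and c_cc[VTIME] are always written,
 * because they may share their slots with other control characters. */
void enter_noncanonical(struct termios *t)
{
    t->c_lflag &= ~(tcflag_t)ICANON;
    t->c_cc[VMIN] = 1;   /* return as soon as 1 byte is available */
    t->c_cc[VTIME] = 0;  /* no inter-byte timeout */
}

/* Switch back to canonical input and restore the control characters
 * that may have been overwritten through VMIN and VTIME (assumed here
 * to be VEOF and VEOL). */
void enter_canonical(struct termios *t, const struct termios *saved)
{
    t->c_lflag |= ICANON;
    t->c_cc[VEOF] = saved->c_cc[VEOF];
    t->c_cc[VEOL] = saved->c_cc[VEOL];
}
```

In a real program, the structure would be read with tcgetattr() before the change and applied with tcsetattr() afterwards; the sketch only manipulates the structure in memory.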

4.7.2. Recent upgrades of our original source code for Linux commands

We initially found on the net a limited number of patches for the commands in source form; such patches had been made during the port of Linux to Digital Alpha. For example, the following patch for the source code for the col command was found on ftp.azstarnet.com, but it is not an example of interest, since we had already made this fix ourselves during the initial ANDFization of the col command:

util-linux-2.5/text-utils/col.c:

+ #include <malloc.h>

Conversely, neither the BLADE_0.3 nor the BLADE_0.2 distributions include source code for commands; we discovered very recently that such code was available (on ftp.digital.com) for the very first, 32-bit Linux/Alpha distribution. Nevertheless, for three commands we used source code from the RedHat 2.1/beta and 2.1 Linux/Alpha distributions. This allowed us to fix incorrect behavior of the zic command; we also experimented, without success (due to lack of time), with some partial upgrades of the source code for zdump and cpio.