1. sid's Organisation
- 1.1. The main Function
- 1.2. Adding a New Output Language: main_language_list
- 1.3. Code Organisation and Conventions
When you call sid, it performs these main steps:
- Reads the grammar .sid file and stores its internal representation.
- Reads the output-language-specific .act file and completes the internal representation of the grammar with the action code. (After this step, sid works only on the internal representation.)
- Transforms and optimises the grammar. Most notably, it removes left recursion and tries to transform the given context-free grammar into an equivalent LL(1) grammar.
- Outputs the parser.
1.1. The main Function
Let's look at some parts of the code in main.c. In main, you can see HANDLE and WITH: these are macros from libexds that emulate an exception mechanism in C.
The function main itself doesn't do much: it initialises some structures, calls main_init to process the command-line options, then calls main_1 to do all the interesting work. Leaving aside initialisation and error handling, main_1 calls the following, in order:
- sid_parse_grammar(): parses the .sid file and converts it into an internal representation.
- grammar_check_complete(&grammar): verifies that all rules are accessible.
- (*(main_language->input_proc))(output_closure, &grammar): parses the action file and completes the internal representation of the grammar.
- grammar_remove_left_recursion(&grammar): TODO
- grammar_compute_first_set(&grammar): computes the first set of each rule in the grammar.
- grammar_factor(&grammar): TODO
- grammar_simplify(&grammar): TODO
- grammar_compute_inlining(&grammar): TODO
- grammar_check_collisions(&grammar): TODO
- grammar_recompute_alt_names(&grammar): TODO
- (*(main_language->output_proc))(output_closure, &grammar): outputs the parser in the chosen language.
You may wonder what main_language is. It is explained in the next section.
1.2. Adding a New Output Language: main_language_list
The global variable main_language makes it easy to change the output language. It is a pointer to a structure of type LangListT and always points to an element of the table main_language_list, which contains callbacks for the various stages of processing.
Each entry in the table has the following members, in order. The first is the option name. The second is a pointer to the initialisation function. The third is a pointer to the top-level input routine for the action file. The fourth is an integer giving the number of input files (2 for C output, 1 for test). The fifth is a pointer to the language-specific top-level output function, and the last is the number of output files. Don't remove the last line of the table: it serves as a guard. To add a new output language, add a line to main_language_list and implement the new top-level functions.
1.3. Code Organisation and Conventions
If you are reading this guide, you probably want to modify sid. This section describes how sid is organised and how to modify the code while keeping it readable.
sid defines many types for the internal representation of a grammar. These types are defined in the header files; their names begin with a capital letter and end with T, e.g. RuleT. If a type MytypeT is declared in myfile.h, then every function that directly touches the members of MytypeT has a name beginning with mytype_ and is defined in myfile.c. No other function should touch the members of MytypeT directly: to work with an object of a given type, use the interface declared in the header where the type is declared.