# The G language

The G language is designed to be logically as close as possible to
a reasonable subset of C, while at the same time being easy to parse
and stripped of features that are not essential to bootstrapping.

In G all variables have the same type, which is analogous to the `int`
type in C and corresponds to the native integer type available on the
machine. Signedness is not a property of the type: it is determined by
the operators applied to a value. The type is used both for integers
and for pointers. Since there is only one type, function signatures
are entirely defined by the number of parameters they accept.

Each G file must contain only printable and whitespace ASCII
characters. They are assembled into tokens, each of which is either a
sequence of non-whitespace characters bounded by whitespace
characters, or a string of characters bounded by `"`
guards. Whitespace characters (except those in strings) are irrelevant
and are dropped after tokenization. Comments are introduced by a `#`
(unless it appears inside a string) and continue to the end of the line.
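
The tokenization rules above can be modelled with a short Python sketch. It is not the reference tokenizer: the function name `tokenize` is hypothetical, escape sequences inside strings are not interpreted, and it assumes that `#` starts a comment only where a new token could begin.

```python
def tokenize(source):
    """Split G source text into tokens (illustrative sketch only)."""
    tokens = []
    i = 0
    n = len(source)
    while i < n:
        ch = source[i]
        if ch.isspace():
            i += 1  # whitespace only separates tokens
        elif ch == "#":
            # ASSUMPTION: a comment starts only at a token boundary
            while i < n and source[i] != "\n":
                i += 1  # the comment runs to the end of the line
        elif ch == '"':
            # ASSUMPTION: no escaped quotes; raises ValueError if unterminated
            end = source.index('"', i + 1)
            tokens.append(source[i:end + 1])  # keep the guards in the token
            i = end + 1
        else:
            start = i
            while i < n and not source[i].isspace():
                i += 1
            tokens.append(source[start:i])
    return tokens
```

Under this model a `#` inside a string does not start a comment, and a name written directly against a `;` (for example `foo;`) stays a single token, which is why G code always separates the two with whitespace.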

## Top level structure

In a G source file, the following top-level constructs are available.

 * **Constant definition**

        const NAME VALUE

   declares a compile-time constant with name `NAME` and value `VALUE`
   (which can be a previously defined constant, a decimal number or a
   hexadecimal number). Constants are not materialized into variables,
   but they have the same type, so they cannot hold values outside the
   bounds of the native integer type.

   Examples:

        const DOZEN 12
        const BYTE_MAX 0xff

 * **Global variable definition**

        $NAME

   defines a global variable called `NAME`, which is initialized to
   zero. There must be no space between the dollar sign and the name
   (i.e., they have to be in the same token).

   Example:

        $global_var

 * **Function declaration**

        ifun NAME PARAM_NUM

   declares a function named `NAME` with `PARAM_NUM` parameters
   without defining it. It has to be matched by a later function
   definition (introduced with `fun`); an earlier definition is also
   accepted, although the declaration is useless in that case. The
   same function can be declared as many times as one wishes, as long
   as all declarations have the same number of parameters (which must
   also be equal to the number of parameters in the function
   definition).

   Examples:

        ifun setup 0
        ifun sum_two_numbers 2

 * **Function definition**

        fun NAME PARAM_NUM { BODY }

   defines a function named `NAME` with `PARAM_NUM` parameters. `BODY`
   must be the code to be executed when the function is called (see
   "Function body" below).
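
The consistency rule between `ifun` and `fun` can be illustrated with a small Python model. This is only a sketch of the rule as stated above, not the reference compiler's logic; the helper name `check_functions` and the triple format are hypothetical.

```python
def check_functions(items):
    """Check (keyword, name, param_num) triples taken from ifun/fun lines.

    Returns the set of names that were declared but never defined.
    Raises ValueError when two lines disagree on the parameter count.
    """
    arity = {}
    defined = set()
    for keyword, name, param_num in items:
        # The first occurrence fixes the parameter count for this name.
        if arity.setdefault(name, param_num) != param_num:
            raise ValueError(name + ": conflicting parameter counts")
        if keyword == "fun":
            defined.add(name)
    return set(arity) - defined
```

For instance, feeding it `ifun setup 0`, `ifun sum_two_numbers 2` and `fun sum_two_numbers 2` reports `setup` as still lacking a definition.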

## Function body

The body of a function is a sequence of commands that are to be
executed by a stack machine. At the beginning of the function the
stack is empty, but there is no obligation to leave it empty at the
end. In general each command pops a few (possibly zero) operands, does
some computation with them and pushes the result. All stack elements
have the same type as any variable.

Interleaved with stack commands, other commands are available to
manage execution flow or introduce local variables. In general, all
non-stack commands require the stack to be empty at the point where
they are executed.

In particular, the following stack commands are supported.

 * **Push a constant value** Writing an integral value (either in
   decimal or hexadecimal form) causes it to be pushed on the stack.

 * **Push variable value** Writing a variable name (either local or
   global) causes its current value to be pushed on the stack.

 * **Call a function by name** Writing a function name causes a number
   of arguments equal to the number of parameters the function accepts
   to be popped from the stack. The function is then called with those
   arguments, and its return value is pushed on the stack. Every
   function returns a value: there is no concept of a function
   returning `void`. If this value has no meaning, it is up to the
   caller to ignore it.

 * **Push variable or function address** Writing a variable or function
   name prefixed with a `@` sign (without spaces: they have to be in
   the same token) causes its address to be pushed on the stack.

 * **Call a function by address** Writing a backslash `\` followed
   (without spaces) by a number (or a compile-time constant) causes a
   value to be popped from the stack and interpreted as the address of
   a function taking the number of parameters specified after the
   backslash. The function call procedure is then followed as above
   (the appropriate number of arguments is popped from the stack, the
   function is called and its return value is pushed).

 * **Flush the stack** Writing a single `;` character causes the stack
   to be flushed (i.e., all values are popped and discarded). This is
   required before using non-stack commands, like execution flow
   control and variable introduction commands.

 * **Return from function** Writing a token `ret` causes the function
   to immediately return. If the stack is not empty, its top value is
   the function return value. If the stack is empty, the return value
   is unspecified.
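
To make the stack discipline concrete, here is a minimal Python model of a subset of the stack commands: constants, variable values, calls by name, `;` and `ret`. It is a sketch under simplifying assumptions, not the reference implementation: the names `run` and `BUILTINS` are hypothetical, the `@` and `\` commands are left out, and the unspecified return value is modelled as `None`.

```python
BUILTINS = {
    # name -> (number of parameters, implementation)
    "+": (2, lambda a, b: a + b),
    "-": (2, lambda a, b: a - b),  # earliest pushed minus latest pushed
}

def run(tokens, functions, variables):
    """Execute stack commands; return the value produced by `ret`."""
    stack = []
    for tok in tokens:
        if tok == ";":
            stack.clear()  # flush: pop and discard everything
        elif tok == "ret":
            # Top of the stack is the return value; None models "unspecified".
            return stack[-1] if stack else None
        elif tok in functions:
            param_num, impl = functions[tok]
            popped = [stack.pop() for _ in range(param_num)]  # latest first
            stack.append(impl(*reversed(popped)))  # pass earliest first
        elif tok in variables:
            stack.append(variables[tok])  # push the current value
        else:
            stack.append(int(tok, 0))  # decimal or 0x-prefixed hexadecimal
    return None  # fell off the end: also modelled as unspecified
```

With this model, `run("7 2 - ret".split(), BUILTINS, {})` yields `5`, while `run("1 2 + ; ret".split(), BUILTINS, {})` yields `None` because the flush leaves nothing for `ret` to return.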

The following non-stack commands are supported. When they are
executed, the stack must be empty (and, correspondingly, they leave it
empty).

 * **Code block** At any point a scoped block can be opened with the
   command `{` and closed with the corresponding `}`. All variable
   definitions inside the block expire when the block is closed. The
   stack does not need to be empty at the end of the block, but if it
   is not, it is flushed.

 * **Local variable definition** Writing `$NAME` introduces a new
   local variable named `NAME`. Its initial value is unspecified
   (unlike global variables, which are initialized to zero).

 * **Conditional block** The `if` token introduces a conditional
   block. It must be followed by one or more stack commands (except
   `ret`) and then by a code block, defined as above. When the
   conditional block is executed, the guard stack commands are
   evaluated first and must leave exactly one value on the stack: if
   that value is zero, control is transferred directly to the end of
   the block. Otherwise, the block is executed normally. Unlike in C,
   the `if` command must guard a block; it cannot guard a single
   statement. Optionally, the token `else` followed by another block
   can appear after the first block. That block is executed if the
   guard evaluates to zero and skipped if it does not.

 * **Repeated block** The `while` token introduces a repeated
   block. Its syntax is identical to that of the conditional block,
   except that `while` is used instead of `if`. Its semantics are also
   identical, except that at the end of the block execution the guard
   expression is evaluated again, and if it is still non-zero, the
   block is executed again. The `else` block cannot appear in this
   case.

   G does not directly support `for` blocks and `continue`, `break`
   and `goto` statements. If needed, they have to be emulated with
   appropriate flags.
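
The flag technique mentioned above can be sketched as follows. The example is written in Python rather than G, restricted to the control structures G offers (`while` plus nested `if`/`else`), so that it can be run as is; the function `find_first` and the flag name `running` are hypothetical.

```python
def find_first(values, target):
    """Return the index of the first occurrence of target, or -1."""
    i = 0
    result = -1
    running = 1  # flag standing in for `break`
    while running:
        if i == len(values):
            running = 0  # ran off the end: ordinary loop exit
        else:
            if values[i] == target:
                result = i
                running = 0  # this is where C would `break`
            else:
                i = i + 1
    return result
```

The guard is a single flag, so the loop needs no operators beyond a comparison; every exit path simply clears the flag and lets the guard be evaluated again.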

The function's formal parameters are not directly available as named
variables. However, they can be retrieved with the `param` predefined
function, described below.

## Predefined functions

The following functions are always available in a G program, without
having to be manually defined.

 * `=` (2 arguments) assigns the latest pushed argument to the
   variable at the address specified by the earliest pushed
   argument. It is important to notice that this is not completely
   equivalent to the C assignment operator, because G has no
   equivalent of the C lvalue concept. Thus `a b =` is equivalent to
   C's `*(int*)a = b`. If you want to assign the value of `b` to `a`,
   then you need to write `@a b =` in G, which is equivalent to C's
   `*(int*)&a = b`.

 * `param` (1 argument) returns the `n`-th formal parameter of the
   enclosing function, where `n` is the value passed to `param`. If
   `n` is greater than or equal to the number of parameters, the
   behaviour is unspecified. The zeroth parameter is the one that was
   pushed *last* on the stack before calling the enclosing
   function. Therefore in the following snippet:

        fun test 2 {
          $a $b
          @a 0 param = ;
          @b 1 param = ;
        }

        fun test2 0 {
          0 1 test ;
        }

   `a` will have value `1` and `b` will have value `0` inside `test`.

 * `+` (2 arguments) returns the sum of its two arguments.

 * `-` (2 arguments) returns the difference of its two arguments (the
   earliest pushed minus the latest pushed).

 * TODO
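
The numbering used by `param` is easy to get backwards, so here is a tiny Python model of it. It only illustrates the ordering rule stated above; `make_param` is a hypothetical helper, and out-of-range indices (unspecified in G) are not modelled.

```python
def make_param(pushed_args):
    """Build a `param` function for one call.

    pushed_args lists the arguments in the order the caller pushed them.
    """
    def param(n):
        # Parameter 0 is the argument pushed last, so count from the end.
        return pushed_args[len(pushed_args) - 1 - n]
    return param

# Model of the call `0 1 test`: 0 is pushed first, 1 is pushed last.
param = make_param([0, 1])
a = param(0)  # the value pushed last
b = param(1)  # the value pushed first
```

Here `a` ends up as `1` and `b` as `0`, matching the `test` example given for `param`.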

## Common coding suggestions

This section is not normative, but G programmers are encouraged to
follow it so that G programs remain as readable as possible.

 * The `;` command clearly bears a similarity with the C semicolon,
   which is used to close statements. In G there are no statements, so
   there is no need to close them; as a result, `;` commands are only
   strictly required before a non-stack command. However, it is
   suggested to still use them in the C way, to improve readability
   and to drop the stack as soon as it is not needed anymore (this
   also prevents leftover stack elements from being used in a
   subsequent unrelated expression).

   For example, the following code assigns `0` to the variable `a`
   and `1` to the variable `b`:

        @a 0 = ;
        @b 1 = ;

   The first semicolon (and possibly the second one too, depending on
   what comes later) can be removed without changing the program
   behaviour:

        @a 0 =
        @b 1 = ;

   In this second case, the second assignment is executed while still
   having the value of `a` at the bottom of the stack. Thus the
   program is still correct, but for the reasons above the first
   snippet is encouraged. Incidentally, newlines are irrelevant too:
   the same code could have been written, with or without the
   semicolon, on one line:

        @a 0 = @b 1 = ;

   And, of course, this is even less encouraged.

 * The usage of the `param` function might be a bit non-obvious,
   because it causes parameters to materialize inside the function in
   an order that might appear to be the "opposite" of what seems
   sensible. The suggested idiomatic way to use `param` is thus the
   following: suppose that you want to define a function analogous to
   the C declaration

        int func(int a, int b, int c);

   Then it is suggested to define it in G in this way:

        fun func 3 {
          $a
          $b
          $c
          @a 2 param = ;  # Notice param arguments are in
          @b 1 param = ;  # decreasing order
          @c 0 param = ;

          # Do not use param anymore in function body; just use a, b
          # and c
        }

   and call it by pushing arguments on the stack in the same order
   they appear in the C declaration:

        fun func2 0 {
          0 1 2 func ;
        }

   `a` will take value `0`, `b` will take value `1` and `c` will take
   value `2` inside `func`.

   Also, the machine code that the reference G compiler produces for
   `func` is ABI-compatible with the C declaration of `func` when the
   C compiler uses the `cdecl` calling convention, which permits easy
   interaction between C and G in later stages of `asmc`.

 * G's very simple type system, while allowing a very simple syntax
   and compiler, completely leaves the burden of organizing structured
   data types on the programmer. Fortunately the task is not that
   difficult with a little bit of code organization (which is, in the
   end, not very different from what happens in a C program, except
   that you do not have the syntactic sugar coating). Suppose that you
   need a structure like this one in C:

        typedef struct {
          int first;
          int second;
          int third;
        } MyStruct;

   You can use the following code in G:

        const MYSTRUCT_FIRST 0
        const MYSTRUCT_SECOND 4
        const MYSTRUCT_THIRD 8
        const SIZEOF_MYSTRUCT 12

   Then, using `ptr` to denote a pointer to this structure, the
   following C code:

        MyStruct *ptr;
        ptr = malloc(sizeof(MyStruct));
        ptr->first = 0;
        ptr->second = ptr->third;
        free(ptr);

   is roughly equivalent to this G code:

        $ptr
        @ptr SIZEOF_MYSTRUCT malloc = ;
        ptr MYSTRUCT_FIRST take_addr 0 = ;
        ptr MYSTRUCT_SECOND take_addr ptr MYSTRUCT_THIRD take = ;
        ptr free ;

   The library routines `take` and `take_addr` are defined in
   `utils.g` and do the right thing here (`take_addr` is actually
   completely equivalent to `+` and `take` is just `+` followed by
   dereferencing; giving them different names makes their intent
   explicit).

   The G syntax is a bit more verbose and requires some care in
   maintaining the offset tables for all structures (be careful not to
   get confused between multiples of 4 and feel free to use
   hexadecimal if it makes things easier for you), but all in all if
   you know how to do things in C, converting to G is rather
   straightforward.
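
The offset-table technique for structures can be tried out with the following Python sketch, which models memory as a byte array. It is an illustration only: the 4-byte little-endian word, the `heap` array, the `store` helper and the fixed address `16` standing in for the result of `malloc` are all assumptions made for the example, and `take` and `take_addr` are re-implemented here from their description rather than taken from `utils.g`.

```python
import struct

heap = bytearray(64)  # toy memory: an address is an index into it

MYSTRUCT_FIRST = 0
MYSTRUCT_SECOND = 4
MYSTRUCT_THIRD = 8
SIZEOF_MYSTRUCT = 12

def take_addr(base, offset):
    """Address of a field: the same computation as `+`."""
    return base + offset

def store(address, value):
    """Model of `=`: write one 4-byte word at address (assumed word size)."""
    struct.pack_into("<i", heap, address, value)

def take(base, offset):
    """Value of a field: `+` followed by a dereference."""
    return struct.unpack_from("<i", heap, take_addr(base, offset))[0]

ptr = 16  # pretend that malloc returned this address
store(take_addr(ptr, MYSTRUCT_FIRST), 0)
store(take_addr(ptr, MYSTRUCT_THIRD), 42)
# ptr->second = ptr->third
store(take_addr(ptr, MYSTRUCT_SECOND), take(ptr, MYSTRUCT_THIRD))
```

After these calls `take(ptr, MYSTRUCT_SECOND)` returns `42`. A wrong entry in the offset table would silently read or overwrite a neighbouring field, which is why the table deserves the care recommended above.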

## Examples

Let us discuss a few simple G programs and provide their C equivalents
to better illustrate them.

    fun sum_two_numbers 2 {            int sum_two_numbers(int p1, int p0) {
      $x                                 int x;
      $y                                 int y;
      @x 1 param = ;                     x = p1;
      @y 0 param = ;                     y = p0;

      $sum                               int sum;
      @sum x y + = ;                     sum = x+y;
      sum ret ;                          return sum;
    }                                  }

Incidentally, `sum_two_numbers` does exactly the same thing as the
built-in function `+`, but it was an easy starting example.

    const FROM 20                      #define FROM 20
    const TO 0x64                      #define TO 0x64

    ifun sum_numbers 2                 int sum_numbers(int, int);

    fun main 0 {                       int main(void) {
      "The sum of numbers from "         platform_log("The sum of numbers from ", 1);
        1 platform_log ;
      FROM itoa 1 platform_log ;         platform_log(itoa(FROM), 1);
      " to " 1 platform_log ;            platform_log(" to ", 1);
      TO itoa 1 platform_log ;           platform_log(itoa(TO), 1);
      " is " 1 platform_log ;            platform_log(" is ", 1);
      FROM TO sum_numbers itoa           platform_log(itoa(sum_numbers(FROM, TO)), 1);
        1 platform_log ;
      "\n" 1 platform_log ;              platform_log("\n", 1);
    }                                  }

    # Return the sum of numbers        // Return the sum of numbers
    # in an interval                   // in an interval
    fun sum_numbers 2 {                int sum_numbers(int p1, int p0) {
      $from                              int from;
      $to                                int to;
      @from 1 param = ;                  from = p1;
      @to 0 param = ;                    to = p0;

      $i                                 int i;
      $sum                               int sum;
      @i from = ;                        i = from;
      @sum 0 = ;                         sum = 0;
      while i to <= {                    while (i <= to) {
        @sum sum i + = ;                   sum = sum + i;
        @i i 1 + = ;                       i = i + 1;
      }                                  }

      sum ret ;                          return sum;
    }                                  }

Of course C has shorter forms like `+=` and `++`, but I did not
use them in this example to better explain the analogy with G. The
function `itoa` returns a number formatted as a decimal string, while
the function `platform_log` dumps a string to the console.

    # Function is supposed to be       // Function is supposed to be
    # defined elsewhere                // defined elsewhere
    ifun process 1                     int process(int);

    fun long_operation 2 {             int long_operation(int p1, int p0) {
      $input                             int input;
      $callback                          int callback;
      @input 1 param = ;                 input = p1;
      @callback 0 param = ;              callback = p0;

      # Do a very long operation         // Do a very long operation
      $result                            int result;
      @result input process = ;          result = process(input);

      # Call back at the end             // Call back at the end
      result callback \1 ;               ((int (*)(int))callback)(result);
    }                                  }

    fun end_callback 1 {               int end_callback(int p0) {
      $result                            int result;
      @result 0 param = ;                result = p0;

      "Result is " 1 platform_log ;      platform_log("Result is ", 1);
      result itoa 1 platform_log ;       platform_log(itoa(result), 1);
    }                                  }

    fun start_operation 0 {            int start_operation(void) {
      $input                             int input;

      input @end_callback                long_operation(input, (int)&end_callback);
        long_operation ;
    }                                  }

The snippet above shows the use of the backslash operator to perform
an indirect function call.