
The G language

The G language is conceived to be logically as similar as possible to a reasonable subset of C, while at the same time being easy to parse and stripped of features that are not essential to bootstrapping.

In G all variables have the same type, which is analogous to the int type in C (although signedness is not defined by the type: it is determined by the operators) and corresponds to the native integer type available on the machine. It is used both as an integer and as a pointer. Since there is only one type, function signatures are entirely defined by the number of parameters they accept.

Each G file must contain only printable and whitespace ASCII characters. These are assembled into tokens, each of which is either a run of non-whitespace characters bounded by whitespace, or a string of characters bounded by " guards. Whitespace characters (except those inside strings) are irrelevant and are dropped after tokenization. Comments are introduced by a # (except when it appears inside a string) and continue to the end of the line.
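The tokenization rule can be sketched in C as follows. This is only an illustrative model, not the reference implementation: the name next_token is hypothetical, and the sketch assumes that strings and comments always start at a token boundary and that a string cannot contain an escaped " character, neither of which is stated above.

```c
#include <ctype.h>
#include <stddef.h>

/* Copies the next token of src, starting at *pos, into out (cap >= 1 bytes,
 * longer tokens are truncated). Returns 1 if a token was found and 0 at the
 * end of the input. Hypothetical helper, for illustration only. */
int next_token(const char *src, size_t *pos, char *out, size_t cap) {
    size_t i = *pos;
    size_t n = 0;

    /* Skip whitespace and comments (a # runs to the end of the line). */
    for (;;) {
        while (src[i] != '\0' && isspace((unsigned char)src[i])) i++;
        if (src[i] != '#') break;
        while (src[i] != '\0' && src[i] != '\n') i++;
    }
    if (src[i] == '\0') {
        *pos = i;
        return 0;
    }

    if (src[i] == '"') {
        /* String token: everything up to the closing guard, guards included.
         * Assumption: no escaped " inside strings. */
        do {
            if (n + 1 < cap) out[n++] = src[i];
            i++;
        } while (src[i] != '\0' && src[i] != '"');
        if (src[i] == '"') {
            if (n + 1 < cap) out[n++] = src[i];
            i++;
        }
    } else {
        /* Ordinary token: a run of non-whitespace characters. */
        while (src[i] != '\0' && !isspace((unsigned char)src[i])) {
            if (n + 1 < cap) out[n++] = src[i];
            i++;
        }
    }
    out[n] = '\0';
    *pos = i;
    return 1;
}
```

Calling next_token repeatedly with the same pos variable walks through a source buffer one token at a time; whitespace and comments never reach the caller.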

Top level structure

In a G source file, the following constructions are available.

  • Constant definition

    const NAME VALUE
    

declares a compile-time constant with name NAME and value VALUE (which can be a previously defined constant, a decimal expression or a hexadecimal expression). Constants are not materialized into variables, but have the same type, so they cannot hold values outside the bounds of the native integer type.

Examples:

    const DOZEN 12
    const BYTE_MAX 0xff

  • Global variable definition

    $NAME
    

defines a global variable called NAME, which is initialized to zero. There must be no space between the dollar sign and the name (i.e., they have to be in the same token).

Example:

    $global_var

  • Function declaration

    ifun NAME PARAM_NUM
    

declares a function named NAME with PARAM_NUM parameters without defining it. It has to be matched by a later function definition (introduced with fun); an earlier definition is also accepted, although the declaration is useless in that case. The same function can be declared as many times as one wishes, as long as all declarations have the same number of parameters (which must also be equal to the number of parameters in the function definition).

Examples:

    ifun setup 0
    ifun sum_two_numbers 2

  • Function definition

    fun NAME PARAM_NUM { BODY }
    

defines a function named NAME with PARAM_NUM parameters. BODY must be the code to be executed when the function is called (see "Function body" below).

Function body

The body of a function is a sequence of commands that are to be executed by a stack machine. At the beginning of the function the stack is empty, but there is no obligation to leave it empty at the end. In general each command pops a few (possibly zero) operands, does some computation with them and pushes the result. All stack elements have the same type as any variable.

Interleaved with stack commands, other commands are available to manage execution flow or introduce local variables. In general, all non-stack commands require the stack to be empty at their point of execution.

In particular, the following stack commands are supported.

  • Push a constant value. Writing an integral value (either in decimal or hexadecimal form) causes it to be pushed on the stack.

  • Push variable value. Writing a variable name (either local or global) causes its current value to be pushed on the stack.

  • Call a function by name. Writing a function name causes a number of arguments equal to the number of parameters the function accepts to be popped from the stack. The function is then called with those arguments, and its return value is pushed on the stack. All functions always return a value; there is no concept of a function returning void. If this value has no meaning, it is up to the caller to ignore it.

  • Push variable address. Writing a variable or function name prefixed by a @ sign (without spaces: they have to be in the same token) causes its address to be pushed on the stack.

  • Call a function by address. Writing a backslash \ followed (without spaces) by a number (or a compile-time constant) causes a value to be popped from the stack and interpreted as the address of a function taking the number of parameters specified after the backslash. The function call procedure is then followed as above (the appropriate number of arguments is popped from the stack, the function is called and its return value is pushed).

  • Flush the stack. Writing a single ; character causes the stack to be flushed (i.e., all values are popped and discarded). This is required before using non-stack commands, like execution flow control and variable introduction commands.

  • Return from function. Writing the token ret causes the function to return immediately. If the stack is not empty, its top value is the function return value. If the stack is empty, the return value is unspecified.
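As a rough mental model of the stack commands above, the operand stack can be simulated in C. The Stack type and its helpers are hypothetical names introduced for this sketch; it has no bounds checking, and the real compiler emits machine code rather than interpreting anything. The demo function mirrors the G sequence 10 3 - 4 +, using the operand order of the predefined - function (earliest pushed minus latest pushed).

```c
#include <stddef.h>

/* Toy operand stack; every element has the single G type, modelled as long. */
typedef struct {
    long items[64];
    size_t size;
} Stack;

void push(Stack *s, long v) { s->items[s->size++] = v; }
long pop(Stack *s) { return s->items[--s->size]; }
void flush(Stack *s) { s->size = 0; }  /* models the ; command */

/* Models the G sequence:  10 3 - 4 +
 * Returns the value left on top of the stack. */
long demo(Stack *s) {
    long latest;
    long earliest;

    push(s, 10);
    push(s, 3);

    /* - pops two operands and pushes earliest minus latest. */
    latest = pop(s);
    earliest = pop(s);
    push(s, earliest - latest);

    push(s, 4);

    /* + pops two operands and pushes their sum. */
    latest = pop(s);
    earliest = pop(s);
    push(s, earliest + latest);

    return s->items[s->size - 1];
}
```

After demo returns, one value is still on the stack; a call to flush plays the role of the ; command that has to precede any non-stack command.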

The following non-stack commands are supported. When they are executed, the stack must be empty (and, correspondingly, they leave it empty).

  • Code block. At any point a scoped block can be begun with the command { and closed with the corresponding }. All variable definitions inside the block expire when the block is closed. The stack does not need to be empty at the end of the block, but if it is not, it is flushed.

  • Local variable definition. Writing $NAME introduces a new variable with name NAME. Its initial value is unspecified (unlike global variables).

  • Conditional block. The if token introduces a conditional block. It must be followed by one or more stack commands (except ret) and then by a code block, defined as above. When the conditional block is executed, first the guard stack commands are evaluated and must leave exactly one value on the stack: if that value is zero, control is transferred directly to the end of the block. Otherwise, the block is executed normally. Unlike in C, the if command must guard a block; it cannot guard a single statement. Optionally, the token else followed by another block can appear. That block is executed if the initial guard evaluates to zero and skipped if it does not.

  • Repeated block. The while token introduces a repeated block. Its syntax is identical to that of the conditional block, except that while is used instead of if. Its semantics are also identical, except that at the end of the block execution the guard expression is evaluated again, and if it is still non-zero, the block is executed again. The else block cannot appear in this case.

G does not directly support for loops, nor the continue, break and goto statements. If needed, they have to be emulated with appropriate flags.
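The flag technique can be illustrated in C, restricted on purpose to the constructs that map directly onto G: a while loop guarded by a single value and if/else blocks. The function find_first is a hypothetical example, not part of any G library; a C programmer would normally use break here, and the done flag takes its place.

```c
/* Returns the index of the first element equal to target, or -1 if there is
 * none. Written without break or for, the way the loop would have to be
 * structured in G. */
int find_first(const int *values, int count, int target) {
    int i = 0;
    int done = 0;     /* flag that replaces break */
    int result = -1;

    while (!done) {
        if (i >= count) {
            done = 1;               /* ran off the end: stop looping */
        } else {
            if (values[i] == target) {
                result = i;
                done = 1;           /* what break would do in C */
            } else {
                i = i + 1;
            }
        }
    }
    return result;
}
```

The same shape (one flag variable tested by the while guard, set from inside nested if blocks) carries over to G, since it relies only on while, if and else.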

The function's formal parameters are not directly available as named variables. However, they can be retrieved with the param predefined function, described below.

Predefined functions

The following functions are always available in a G program, without having to be manually defined.

  • = (2 arguments) assigns the latest pushed argument to the variable at the address specified by the earliest pushed argument. It is important to notice that this is not completely equivalent to the C assignment operator, because G has no equivalent for the C lvalue concept. Thus a b = is rather equivalent to C's *(int*)a = b. If you want to assign the value of b to a, then you need to write @a b = in G, which becomes equivalent to C's *(int*)&a = b.

  • param (1 argument) returns the n-th formal parameter of the enclosing function, where n is the value passed to param. If n is greater than or equal to the number of parameters, the behaviour is unspecified. The zeroth parameter is the one that was pushed last on the stack before calling the enclosing function. Therefore in the following snippet:

    fun test 2 {
      $a $b
      @a 0 param = ;
      @b 1 param = ;
    }
    
    fun test2 0 {
      0 1 test ;
    }
    

a will have value 1 and b will have value 0 inside test.

  • + (2 arguments) returns the sum of its two arguments.

  • - (2 arguments) returns the difference of its two arguments (the earliest pushed minus the latest pushed).

  • TODO
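The semantics of = and param described above can be modelled in C as follows. The names g_assign and g_param are hypothetical and exist only for this sketch; intptr_t stands in for G's single native type, and the pushed array is an assumed representation of the caller's arguments in push order. The value that G's = leaves on the stack is not specified above, so the model returns nothing.

```c
#include <stddef.h>
#include <stdint.h>

/* Model of  addr value =  : store value at the address held in addr.
 * G has no lvalues, so the target is always given as an address. */
void g_assign(intptr_t addr, intptr_t value) {
    *(intptr_t *)addr = value;
}

/* Model of  n param  : pushed[] holds the caller's arguments in the order
 * they were pushed, so parameter 0 is the last element.
 * The caller must ensure n < count (unspecified behaviour otherwise). */
intptr_t g_param(const intptr_t *pushed, size_t count, size_t n) {
    return pushed[count - 1 - n];
}
```

With pushed holding 0 and 1, as in the call 0 1 test, g_param returns 1 for n = 0 and 0 for n = 1, matching the values of a and b in the test example.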

Common coding suggestions

This section is not normative, but G programmers are encouraged to follow it so that G programs remain as readable as possible.

  • The ; command clearly bears a similarity with the C semicolon, which is used to close statements. In G there are no statements, so there is no need to close them; as a result, in general ; commands are only really required when you need to call a non-stack command. However, it is suggested to still use them in the C way, to improve readability and to drop the stack as soon as it is not needed anymore (this also prevents leftover stack elements from being used in a subsequent unrelated expression).

For example, the following code assigns 0 to the variable a and 1 to the variable b:

    @a 0 = ;
    @b 1 = ;

The first semicolon (and possibly the second one too, depending on what comes later) can be removed without changing the program behaviour:

    @a 0 =
    @b 1 = ;

In this second case, the second assignment is executed while the value left by the first one is still at the bottom of the stack. Thus the program is still correct, but for the reasons above the first snippet is encouraged. Incidentally, newlines are irrelevant too: the same code could have been written, with or without the first semicolon, on one line:

    @a 0 = @b 1 = ;

And, of course, this is even less encouraged.

  • The usage of the param function might be a bit non-obvious, because it causes parameters to materialize inside the function in an order that might appear to be the "opposite" of what seems sensible. The suggested idiomatic way to use param is thus the following: suppose that you want to define a function analogous to the C declaration

    int func(int a, int b, int c);
    

Then it is suggested to define it in G in this way:

    fun func 3 {
      $a
      $b
      $c
      @a 2 param = ;  # Notice param arguments are in
      @b 1 param = ;  # decreasing order
      @c 0 param = ;

      # Do not use param anymore in function body; just use a, b
      # and c
    }

and call it by pushing arguments on the stack in the same order they appear in the C declaration:

    fun func2 0 {
      0 1 2 func ;
    }

a will take value 0, b will take value 1 and c will take value 2 inside func.

Also, the reference G compiler will produce for func machine code that is ABI-compatible with the C declaration for func if the C compiler uses the cdecl calling convention, which permits easy interaction between C and G in later stages of asmc.

  • G's very simple type system, while allowing a very simple syntax and compiler, completely leaves the burden of organizing structured data types on the programmer. Fortunately the task is not that difficult with a little bit of code organization (which is, in the end, not very different from what happens in a C program, except that you do not have the syntactic sugar coating). Suppose that you need a structure like this one in C:

    typedef struct {
      int first;
      int second;
      int third;
    } MyStruct;
    

You can use the following code in G:

    const MYSTRUCT_FIRST 0
    const MYSTRUCT_SECOND 4
    const MYSTRUCT_THIRD 8
    const SIZEOF_MYSTRUCT 12

Then, using ptr to denote a pointer to this structure, the following C code:

    MyStruct *ptr;
    ptr = malloc(sizeof(MyStruct));
    ptr->first = 0;
    ptr->second = ptr->third;
    free(ptr);

is roughly equivalent to this G code:

    $ptr
    @ptr SIZEOF_MYSTRUCT malloc = ;
    ptr MYSTRUCT_FIRST take_addr 0 = ;
    ptr MYSTRUCT_SECOND take_addr ptr MYSTRUCT_THIRD take = ;
    ptr free ;

The library routines take and take_addr are defined in utils.g and do the right thing here (take_addr is actually completely equivalent to + and take is just + followed by dereferencing; it is useful to give them different names to remark their meaning).
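The actual definitions live in utils.g and are not reproduced here; the following is only a C model of the description above. The type word_t is an assumption standing in for G's native integer, and the function bodies are a sketch of "plus" and "plus followed by dereferencing", not a transcription of the library code.

```c
#include <stdint.h>

/* Stand-in for G's single native type, wide enough to hold a pointer. */
typedef intptr_t word_t;

/* Address of the field at byte offset `offset` inside the object at `base`:
 * exactly the same computation as +. */
word_t take_addr(word_t base, word_t offset) {
    return base + offset;
}

/* Value of the field at byte offset `offset`: + followed by a dereference. */
word_t take(word_t base, word_t offset) {
    return *(word_t *)take_addr(base, offset);
}
```

In this model, take is used to read a field and take_addr to obtain the address that = needs in order to write one.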

The G syntax is a bit more verbose and requires some care in maintaining the offset tables for all structures (be careful not to get confused between multiples of 4 and feel free to use hexadecimal if it makes things easier for you), but all in all if you know how to do things in C, converting to G is rather straightforward.
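When a C compiler is at hand, one possible way to double-check a hand-written offset table is to compare it with offsetof. The sketch below assumes 4-byte fields (hence int32_t instead of int), which is what the multiples of 4 above imply; the function offsets_match is a hypothetical helper, and the enum simply restates the G constants.

```c
#include <stddef.h>
#include <stdint.h>

/* C counterpart of the structure, with fields fixed at 4 bytes each. */
typedef struct {
    int32_t first;
    int32_t second;
    int32_t third;
} MyStruct;

/* The same values as the G constants. */
enum {
    MYSTRUCT_FIRST = 0,
    MYSTRUCT_SECOND = 4,
    MYSTRUCT_THIRD = 8,
    SIZEOF_MYSTRUCT = 12
};

/* Returns 1 if the hand-written table agrees with the compiler's layout. */
int offsets_match(void) {
    return offsetof(MyStruct, first) == (size_t)MYSTRUCT_FIRST
        && offsetof(MyStruct, second) == (size_t)MYSTRUCT_SECOND
        && offsetof(MyStruct, third) == (size_t)MYSTRUCT_THIRD
        && sizeof(MyStruct) == (size_t)SIZEOF_MYSTRUCT;
}
```

A check of this kind only helps for structures that also exist on the C side; for G-only structures the table remains the single source of truth.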

Examples

Let us discuss a few simple G programs and provide their C equivalents to better illustrate them.

fun sum_two_numbers 2 {            int sum_two_numbers(int p1, int p0) {
  $x                                 int x;
  $y                                 int y;
  @x 1 param = ;                     x = p1;
  @y 0 param = ;                     y = p0;

  $sum                               int sum;
  @sum x y + = ;                     sum = x+y;
  sum ret ;                          return sum;
}                                  }

Incidentally, sum_two_numbers does exactly the same thing as the built-in function +, but it was an easy starting example.

const FROM 20                      #define FROM 20
const TO 0x64                      #define TO 0x64

ifun sum_numbers 2                 int sum_numbers(int, int);

fun main 0 {                       int main(void) {
  "The sum of numbers from "         platform_log("The sum of numbers from ", 1);
    1 platform_log ;
  FROM itoa 1 platform_log ;         platform_log(itoa(FROM), 1);
  " to " 1 platform_log ;            platform_log(" to ", 1);
  TO itoa 1 platform_log ;           platform_log(itoa(TO), 1);
  " is " 1 platform_log ;            platform_log(" is ", 1);
  FROM TO sum_numbers itoa           platform_log(itoa(sum_numbers(FROM, TO)), 1);
    1 platform_log ;
  "\n" 1 platform_log ;              platform_log("\n", 1);
}                                  }

# Return the sum of numbers        // Return the sum of numbers
# in an interval                   // in an interval
fun sum_numbers 2 {                int sum_numbers(int p1, int p0) {
  $from                              int from;
  $to                                int to;
  @from 1 param = ;                  from = p1;
  @to 0 param = ;                    to = p0;

  $i                                 int i;
  $sum                               int sum;
  @i from = ;                        i = from;
  @sum 0 = ;                         sum = 0;
  while i to <= {                    while (i <= to) {
    @sum sum i + = ;                   sum = sum + i;
    @i i 1 + = ;                       i = i + 1;
  }                                  }

  sum ret ;                          return sum;
}                                  }

Of course C has more compact expressions like += and ++, but I did not use them in this example to better explain the analogy with G. The function itoa returns a number formatted as a decimal string, while the function platform_log dumps a string to the console.

# Function is supposed to be       // Function is supposed to be
# defined elsewhere                // defined elsewhere
ifun process 1                     int process(int);

fun long_operation 2 {             int long_operation(int p1, int p0) {
  $input                             int input;
  $callback                          int callback;
  @input 1 param = ;                 input = p1;
  @callback 0 param = ;              callback = p0;

  # Do a very long operation         // Do a very long operation
  $result                            int result;
  @result input process = ;          result = process(input);

  # Call back at the end             // Call back at the end
  result callback \1 ;               ((int (*)(int))callback)(result);
}                                  }

fun end_callback 1 {               int end_callback(int p0) {
  $result                            int result;
  @result 0 param = ;                result = p0;

  "Result is " 1 platform_log ;      platform_log("Result is ", 1);
  result itoa 1 platform_log ;       platform_log(itoa(result), 1);
}                                  }

fun start_operation 0 {            int start_operation(void) {
  $input                             int input;

  input @end_callback                long_operation(input, (int)&end_callback);
    long_operation ;
}                                  }

The snippet above shows the use of the backslash operator to perform an indirect function call.