The G language is designed to be logically as similar as possible to a reasonable subset of C, while being easy to parse and stripped of features that are not essential for bootstrapping.
In G all variables have the same type, which is analogous to the int type in C (although signedness is not a property of the type: it is determined by the operators applied to a value) and corresponds to the native integer type of the machine. It is used both as an integer and as a pointer. Since there is only one type, a function signature is entirely determined by the number of parameters the function accepts.
Each G file must contain only printable and whitespace ASCII characters. These characters are grouped into tokens, each of which is either a run of non-whitespace characters bounded by whitespace, or a string of characters delimited by double quotes ("). Whitespace characters (except those inside strings) are otherwise irrelevant and are dropped after tokenization. Comments are introduced by a # (except inside strings) and continue to the end of the line.
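The tokenization rules above can be sketched in C. This is a minimal illustration under stated assumptions, not the reference compiler's tokenizer: the function name g_tokenize, the 64-character token limit and the error code are hypothetical, a # is only treated as a comment when it starts a token, and a string is assumed to end at the first closing double quote (no escape handling).

```c
#include <ctype.h>
#include <string.h>

#define G_MAX_TOKEN 64              /* hypothetical limit, NUL included */

/* Split G source text into tokens. Returns the number of tokens stored
   in tokens[], or -1 on error (unterminated string, token too long,
   too many tokens). */
int g_tokenize(const char *src, char tokens[][G_MAX_TOKEN], int max_tokens)
{
    int count = 0;
    const char *p = src;

    while (*p != '\0') {
        if (isspace((unsigned char)*p)) {   /* whitespace only separates */
            p++;
            continue;
        }
        if (*p == '#') {                    /* comment: skip to end of line */
            while (*p != '\0' && *p != '\n')
                p++;
            continue;
        }
        const char *start = p;
        if (*p == '"') {                    /* string: up to the closing quote */
            p++;
            while (*p != '\0' && *p != '"')
                p++;
            if (*p != '"')
                return -1;                  /* unterminated string */
            p++;                            /* keep both quotes in the token */
        } else {                            /* run of non-whitespace */
            while (*p != '\0' && !isspace((unsigned char)*p))
                p++;
        }
        size_t len = (size_t)(p - start);
        if (count >= max_tokens || len >= G_MAX_TOKEN)
            return -1;
        memcpy(tokens[count], start, len);
        tokens[count][len] = '\0';
        count++;
    }
    return count;
}
```

For instance, the text @a 0 = ; # comment yields the four tokens @a, 0, = and ;, while "hello world" stays a single token including its quotes.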
In G source code, the following constructs are available.
Constant definition
const NAME VALUE
declares a compile-time constant with name NAME and value VALUE (which can be a previously defined constant, a decimal number or a hexadecimal number). Constants are not materialized as variables, but they have the same type as variables, so they cannot hold values outside the range of the native integer type.
Examples:
const DOZEN 12
const BYTE_MAX 0xff
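When VALUE is a number, its range can be checked while it is parsed. The sketch below is a hypothetical helper, g_parse_number, that assumes uintptr_t stands in for the native integer type, that literals carry no sign, and that the 0x prefix may be written in either case; resolving a previously defined constant by name is left out.

```c
#include <stdint.h>

/* Parse a numeric G constant: decimal ("12") or hexadecimal ("0xff").
   Returns 1 and stores the value in *out on success; returns 0 if tok is
   not a number (e.g. a constant name such as DOZEN) or if the value does
   not fit in the native word, modelled here as uintptr_t. */
int g_parse_number(const char *tok, uintptr_t *out)
{
    uintptr_t base = 10;
    uintptr_t value = 0;
    const char *p = tok;

    if (p[0] == '0' && (p[1] == 'x' || p[1] == 'X')) {
        base = 16;
        p += 2;
    }
    if (*p == '\0')
        return 0;                       /* empty token or a bare "0x" */

    for (; *p != '\0'; p++) {
        uintptr_t digit;
        if (*p >= '0' && *p <= '9')
            digit = (uintptr_t)(*p - '0');
        else if (base == 16 && *p >= 'a' && *p <= 'f')
            digit = (uintptr_t)(*p - 'a' + 10);
        else if (base == 16 && *p >= 'A' && *p <= 'F')
            digit = (uintptr_t)(*p - 'A' + 10);
        else
            return 0;                   /* not a digit in this base */
        if (value > (UINTPTR_MAX - digit) / base)
            return 0;                   /* outside the native integer bounds */
        value = value * base + digit;
    }
    *out = value;
    return 1;
}
```

With the examples above, this helper would give 12 for DOZEN and 255 for BYTE_MAX.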
Global variable definition
$NAME
defines a global variable called NAME, which is initialized to zero. There must be no space between the dollar sign and the name (i.e., they have to be in the same token).
Example:
$global_var
Function declaration
ifun NAME PARAM_NUM
declares a function named NAME with PARAM_NUM parameters without defining it. It must be matched by a later (or also an earlier, although the declaration is useless in that case) function definition (introduced with fun). The same function can be declared as many times as one wishes, as long as all declarations have the same number of parameters (which must also equal the number of parameters in the function definition).
Examples:
ifun setup 0
ifun sum_two_numbers 2
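The consistency rule above implies some bookkeeping in the compiler, which can be sketched as a small table keyed by function name. Everything here is hypothetical (names, limits, error codes); in particular, the text does not say what happens when a function is defined twice, so rejecting that is an assumption of the sketch.

```c
#include <string.h>

#define G_MAX_FUNCS 64

/* What this sketch remembers about each function name. */
struct g_func {
    const char *name;   /* not copied: must outlive the table */
    int param_num;
    int defined;        /* 1 once the `fun` definition has been seen */
};

struct g_func_table {
    struct g_func funcs[G_MAX_FUNCS];
    int count;
};

/* Record an `ifun` (is_definition == 0) or a `fun` (is_definition == 1).
   Returns 0 on success, -1 if the parameter count disagrees with an earlier
   declaration or definition, -2 for a second definition of the same
   function (an assumption), -3 if the table is full. */
int g_record_function(struct g_func_table *t, const char *name,
                      int param_num, int is_definition)
{
    for (int i = 0; i < t->count; i++) {
        struct g_func *f = &t->funcs[i];
        if (strcmp(f->name, name) != 0)
            continue;
        if (f->param_num != param_num)
            return -1;
        if (is_definition) {
            if (f->defined)
                return -2;
            f->defined = 1;
        }
        return 0;
    }
    if (t->count >= G_MAX_FUNCS)
        return -3;
    t->funcs[t->count].name = name;
    t->funcs[t->count].param_num = param_num;
    t->funcs[t->count].defined = is_definition;
    t->count++;
    return 0;
}

/* At the end of compilation: index of a declared but never defined
   function, or -1 if every declaration was matched by a definition. */
int g_first_undefined(const struct g_func_table *t)
{
    for (int i = 0; i < t->count; i++)
        if (!t->funcs[i].defined)
            return i;
    return -1;
}
```

Declaring sum_two_numbers with 2 parameters and later with 3 would make g_record_function return -1, mirroring the rule that all declarations and the definition must agree.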
Function definition
fun NAME PARAM_NUM { BODY }
defines a function named NAME with PARAM_NUM parameters. BODY is the code to be executed when the function is called (see "Function body" below).
Function body
The body of a function is a sequence of commands that are executed by a stack machine. At the beginning of the function the stack is empty, but there is no obligation to leave it empty at the end. In general each command pops a few (possibly zero) operands, does some computation with them and pushes the result. All stack elements have the same type as variables.
Interleaved with stack commands, other commands are available to manage the execution flow or to introduce local variables. In general, all non-stack commands require the stack to be empty at their point of execution.
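The execution model just described can be pictured with a small C model of the evaluation stack. This is a sketch under stated assumptions, not the reference compiler: g_word (an intptr_t) stands in for the single G type, and the names g_stack, g_push, g_pop, g_apply, g_example_add and the fixed capacity are hypothetical.

```c
#include <stdint.h>

typedef intptr_t g_word;            /* stand-in for the single G type */

#define G_STACK_MAX 256

/* Evaluation stack of one function invocation; it starts empty. */
struct g_stack {
    g_word v[G_STACK_MAX];
    int top;                        /* number of values currently held */
};

void g_stack_init(struct g_stack *s) { s->top = 0; }

int g_push(struct g_stack *s, g_word x)
{
    if (s->top >= G_STACK_MAX)
        return -1;                  /* overflow */
    s->v[s->top++] = x;
    return 0;
}

int g_pop(struct g_stack *s, g_word *x)
{
    if (s->top == 0)
        return -1;                  /* underflow */
    *x = s->v[--s->top];
    return 0;
}

/* A generic stack command: pop n operands, compute, push one result.
   op sees the operands in push order (ops[0] was pushed first). */
int g_apply(struct g_stack *s, int n, g_word (*op)(const g_word *ops))
{
    if (n < 0 || s->top < n || s->top - n >= G_STACK_MAX)
        return -1;
    s->top -= n;
    g_word result = op(&s->v[s->top]);
    s->v[s->top++] = result;
    return 0;
}

/* Non-stack commands may only run when the stack is empty. */
int g_stack_is_empty(const struct g_stack *s) { return s->top == 0; }

/* Hypothetical two-operand command for the usage example. */
g_word g_example_add(const g_word *ops) { return ops[0] + ops[1]; }
```

In this model the G fragment 2 3 + corresponds to two g_push calls followed by g_apply with n = 2, which leaves the single value 5 on the stack.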
In particular, the following stack commands are supported.
Push a constant value: writing an integral value (either in decimal or hexadecimal form) pushes it on the stack.
Push variable value: writing a variable name (either local or global) pushes its current value on the stack.
Call a function by name: writing a function name pops from the stack as many arguments as the function has parameters. The function is then called with these arguments, and its return value is pushed on the stack. All functions always return a value: there is no concept of a function returning void. If the value has no meaning, it is up to the caller to ignore it.
Push variable address: writing a variable or function name prefixed by an @ sign (without spaces: they have to be in the same token) pushes its address on the stack.
Call a function by address: writing a backslash \ followed (without spaces) by a number (or a compile-time constant) pops a value from the stack and interprets it as the address of a function taking the number of parameters specified after the backslash. The call then proceeds as above (the appropriate number of arguments is popped from the stack, the function is called and its return value is pushed).
Flush the stack: writing a single ; character flushes the stack (i.e., all values are popped and discarded). This is required before using non-stack commands, such as execution flow control and variable introduction commands.
Return from function: writing the token ret makes the function return immediately. If the stack is not empty, its top value is the function's return value. If the stack is empty, the return value is unspecified.
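The two kinds of call, the flush and the return can be modelled together in C. In this sketch an "address" pushed by @name is an index into a table of function bodies; that, the helper names (g_call_by_name, g_call_by_address, g_flush, g_ret) and the sample functions g_double and g_diff are all hypothetical, since real compiled G code pushes machine addresses.

```c
#include <stdint.h>

typedef intptr_t g_word;            /* stand-in for the single G type */

#define G_STACK_MAX 256

struct g_stack {
    g_word v[G_STACK_MAX];
    int top;                        /* number of values currently held */
};

/* A function body: receives its arguments in push order (args[0] was
   pushed first) and always returns a value. */
typedef g_word (*g_body)(const g_word *args);

/* NAME: the compiler knows f and its parameter count n statically.
   Pop n arguments, call f, push the return value. */
int g_call_by_name(struct g_stack *s, g_body f, int n)
{
    if (n < 0 || s->top < n || s->top - n >= G_STACK_MAX)
        return -1;
    s->top -= n;
    g_word result = f(&s->v[s->top]);
    s->v[s->top++] = result;        /* pushed even if the caller ignores it */
    return 0;
}

/* Hypothetical functions; an "address" here is an index into g_code. */
static g_word g_double(const g_word *a) { return 2 * a[0]; }
static g_word g_diff(const g_word *a)   { return a[0] - a[1]; }
static const g_body g_code[] = { g_double, g_diff };
enum { ADDR_DOUBLE = 0, ADDR_DIFF = 1 };

/* \N: pop the address (it was pushed last), then call with N arguments.
   N comes from the call site, because an address carries no arity. */
int g_call_by_address(struct g_stack *s, int n)
{
    if (s->top < 1)
        return -1;
    g_word addr = s->v[s->top - 1];
    if (addr < 0 || addr >= (g_word)(sizeof g_code / sizeof g_code[0]))
        return -1;
    s->top--;
    return g_call_by_name(s, g_code[addr], n);
}

/* ; : pop and discard everything. */
void g_flush(struct g_stack *s) { s->top = 0; }

/* ret: the top value, if any, is the return value. G leaves it
   unspecified on an empty stack; this model reports that through *ok. */
g_word g_ret(const struct g_stack *s, int *ok)
{
    *ok = s->top > 0;
    return *ok ? s->v[s->top - 1] : 0;
}
```

Under these assumptions, 9 4 @diff \2 leaves 5 on the stack, and a following ret would return that 5 unless a ; empties the stack first.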
The following non-stack commands are supported. When they are executed, the stack must be empty (and, correspondingly, they leave it empty).
Code block: at any point a scoped block can be opened with the command { and closed with the corresponding }. All variable definitions inside the block expire when the block is closed. The stack does not need to be empty at the end of the block, but if it is not, it is flushed.
Local variable definition: writing $NAME introduces a new variable named NAME. Its initial value is unspecified (unlike global variables, which are initialized to zero).
Conditional block: the if token introduces a conditional block. It must be followed by one or more stack commands (except ret) and then by a code block, as defined above. When the conditional block is executed, the guard stack commands are evaluated first and must leave exactly one value on the stack: if that value is zero, control is transferred directly to the end of the block; otherwise the block is executed normally. Unlike C, the if command must guard a block; it cannot guard a single expression. Optionally, the token else followed by another block can appear. That block is executed if the guard evaluates to zero and skipped otherwise.
Repeated block: the while token introduces a repeated block. Its syntax is identical to that of the conditional block, except that while is used instead of if. Its semantics are also identical, except that at the end of each execution of the block the guard is evaluated again, and if it is still non-zero, the block is executed again. The else block cannot appear in this case.
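One way to realize these semantics is to lower each guarded block into a compare-and-jump sequence. The sketch below appends i386-flavoured assembly text to a buffer; the instruction choice, the label scheme and the function names g_emit_if_else and g_emit_while are illustrative assumptions, not the output of the reference G compiler. The guard code is assumed to leave its single value on the machine stack.

```c
#include <stdio.h>
#include <string.h>

/* Append text for `if GUARD { THEN } else { ELSE }` to out, which must
   already hold a C string. Returns snprintf's result, or -1 if full. */
int g_emit_if_else(char *out, size_t size, int label, const char *guard,
                   const char *then_code, const char *else_code)
{
    size_t len = strlen(out);
    if (len >= size)
        return -1;
    return snprintf(out + len, size - len,
                    "%s"                    /* guard: leaves one value */
                    "  pop eax\n"
                    "  test eax, eax\n"
                    "  jz .else_%d\n"       /* zero: skip THEN */
                    "%s"
                    "  jmp .end_%d\n"       /* THEN done: skip ELSE */
                    ".else_%d:\n"
                    "%s"
                    ".end_%d:\n",
                    guard, label, then_code, label, label, else_code, label);
}

/* Append text for `while GUARD { BODY }`: the guard is re-evaluated
   before every iteration, and a zero value leaves the loop. */
int g_emit_while(char *out, size_t size, int label, const char *guard,
                 const char *body)
{
    size_t len = strlen(out);
    if (len >= size)
        return -1;
    return snprintf(out + len, size - len,
                    ".loop_%d:\n"
                    "%s"
                    "  pop eax\n"
                    "  test eax, eax\n"
                    "  jz .end_%d\n"
                    "%s"
                    "  jmp .loop_%d\n"
                    ".end_%d:\n",
                    label, guard, label, body, label, label);
}
```

An if without else can reuse g_emit_if_else with an empty else_code; the extra label it produces is harmless.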
G does not directly support for loops, nor continue, break and goto statements. If needed, they have to be emulated with appropriate flags.
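For example, a search loop that would normally use break can be rewritten with a flag. The sketch below stays in C for readability but restricts itself to the control flow G offers (while and if/else only); the function g_style_find is hypothetical.

```c
/* Index of the first element of v equal to needle, or -1. No for, break,
   continue or goto: the `found` flag does the job of `break`, and the
   manual increment replaces the `for` header. */
int g_style_find(const int *v, int n, int needle)
{
    int i = 0;
    int found = 0;
    int result = -1;
    while (!found && i < n) {
        if (v[i] == needle) {
            found = 1;          /* "break": the guard fails on the next check */
        } else {
            i = i + 1;          /* the "for" increment */
        }
    }
    if (found) {
        result = i;
    }
    return result;
}
```

The same pattern gives a single exit point, which also fits the G style of ending a function with one ret.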
The function's formal parameters are not directly available as named variables. However, they can be retrieved with the predefined param function, described below.
The following functions are always available in a G program, without having to be manually defined.
=
(2 arguments) assigns the latest pushed argument to the variable at the address given by the earliest pushed argument. Note that this is not completely equivalent to the C assignment operator, because G has no equivalent of the C lvalue concept. Thus a b = is instead equivalent to C's *(int*)a = b. If you want to assign the value of b to a, you need to write @a b = in G, which is equivalent to C's *(int*)&a = b.
param
(1 argument) returns the n-th formal parameter of the enclosing function, where n is the value passed to param. If n is greater than or equal to the number of parameters, the behaviour is unspecified. The zeroth parameter is the one that was pushed last on the stack before the enclosing function was called. Therefore, in the following snippet:
fun test 2 {
$a $b
@a 0 param = ;
@b 1 param = ;
}
fun test2 0 {
0 1 test ;
}
a will have value 1 and b will have value 0 inside test.
+
(2 arguments) returns the sum of its two arguments.
-
(2 arguments) returns the difference of its two arguments (the
earliest pushed minus the latest pushed).
TODO
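The argument order of = and - can be made concrete with C equivalents. In this sketch g_word (an intptr_t) stands in for the single G type and the names g_assign and g_minus are hypothetical. The text only says that every function returns some value; having = return the stored value is an assumption.

```c
#include <stdint.h>

typedef intptr_t g_word;     /* stand-in for the single G type */

/* `=`: a is the earliest pushed argument, b the latest. Stores b at the
   address a, i.e. *(int*)a = b. Returning b is an assumption. */
g_word g_assign(g_word a, g_word b)
{
    *(g_word *)a = b;
    return b;
}

/* `-`: earliest pushed minus latest pushed, so `10 3 -` yields 7. */
g_word g_minus(g_word earliest, g_word latest)
{
    return earliest - latest;
}
```

Here @y 4 = corresponds to g_assign((g_word)&y, 4), while p 5 = with p holding an address corresponds to g_assign(p, 5) and modifies the variable p points to, not p itself.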
This section is not normative, but G programmers are encouraged to follow it so that G programs remain as readable as possible.
The ; command clearly bears a similarity to the C semicolon, which is used to terminate statements. In G there are no statements, so there is no need to terminate them; as a result, ; commands are in general only really required before a non-stack command. However, it is suggested to still use them in the C way, to improve readability and to drop the stack as soon as it is no longer needed (this also prevents leftover stack elements from being used in a subsequent unrelated expression). For example, the following code assigns 0 to the variable a and 1 to the variable b:
@a 0 = ;
@b 1 = ;
The first semicolon (and possibly the second one too, depending on what comes next) can be removed without changing the program's behaviour:
@a 0 =
@b 1 = ;
In this second case, the second assignment is executed while the value of a is still at the bottom of the stack. The program is therefore still correct, but for the reasons above the first snippet is preferred. Incidentally, newlines are irrelevant too: the same code could have been written, with or without the semicolon, on one line:
@a 0 = @b 1 = ;
And, of course, this is even less encouraged.
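The leftover value can be observed by tracking the stack depth command by command. The hypothetical helper g_depth_after below relies only on what this document states: every function call pops its arguments and pushes exactly one result, and ; empties the stack. It knows only the predefined functions listed above, treats every other token as a push, and does not handle strings containing spaces.

```c
#include <string.h>

/* Number of values a token pops, or -1 if it is a plain push (number,
   variable, @address). User functions would need a symbol table. */
static int g_arity(const char *tok)
{
    if (strcmp(tok, "=") == 0 || strcmp(tok, "+") == 0 || strcmp(tok, "-") == 0)
        return 2;
    if (strcmp(tok, "param") == 0)
        return 1;
    return -1;
}

/* Stack depth after running the whitespace-separated commands in code,
   or -1 on underflow. */
int g_depth_after(const char *code)
{
    char buf[256];
    int depth = 0;
    strncpy(buf, code, sizeof buf - 1);
    buf[sizeof buf - 1] = '\0';
    for (char *tok = strtok(buf, " \t\n"); tok != NULL; tok = strtok(NULL, " \t\n")) {
        int n = g_arity(tok);
        if (strcmp(tok, ";") == 0)
            depth = 0;                  /* flush */
        else if (n < 0)
            depth++;                    /* push */
        else if (depth < n)
            return -1;                  /* not enough arguments */
        else
            depth = depth - n + 1;      /* pop arguments, push result */
    }
    return depth;
}
```

For @a 0 = @b 1 = the helper reports a depth of 2: the result of the first assignment is still sitting under the result of the second.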
The usage of the param function might be a bit non-obvious, because it makes parameters materialize inside the function in what might appear to be the "opposite" of the sensible order. The suggested idiomatic way to use param is thus the following: suppose that you want to define a function analogous to the C declaration
int func(int a, int b, int c);
Then it is suggested to define it in G in this way:
fun func 3 {
$a
$b
$c
@a 2 param = ; # Notice param arguments are in
@b 1 param = ; # decreasing order
@c 0 param = ;
# Do not use param anymore in the function body; just use a, b
# and c
}
and call it by pushing arguments on the stack in the same order they appear in the C declaration:
fun func2 0 {
0 1 2 func ;
}
a will take value 0, b will take value 1 and c will take value 2 inside func.
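The numbering rule of param can be checked with a small model of a call frame. In this sketch the frame simply keeps the arguments in the order the caller pushed them; the struct g_frame and the function g_param are hypothetical, and returning 0 for an out-of-range n is an arbitrary choice where G leaves the behaviour unspecified.

```c
#include <stdint.h>

typedef intptr_t g_word;            /* stand-in for the single G type */

/* Arguments in the order the caller pushed them (pushed[0] first). */
struct g_frame {
    const g_word *pushed;
    int param_num;
};

/* `param`: parameter 0 is the argument pushed last, parameter
   param_num - 1 the one pushed first. */
g_word g_param(const struct g_frame *f, g_word n)
{
    if (n < 0 || n >= f->param_num)
        return 0;                   /* unspecified in G */
    return f->pushed[f->param_num - 1 - n];
}
```

For 0 1 2 func, g_param returns 0 for n = 2, 1 for n = 1 and 2 for n = 0, which is why the idiom reads the parameters in decreasing order to fill a, b and c.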
Also, for func the reference G compiler produces machine code that is ABI-compatible with the C declaration of func if the C compiler uses the cdecl calling convention, which permits easy interaction between C and G in later stages of asmc.
G's very simple type system, while allowing a very simple syntax and compiler, leaves the burden of organizing structured data entirely to the programmer. Fortunately the task is not that difficult with a little bit of code organization (which, in the end, is not very different from what happens in a C program, except that you do not have the syntactic sugar). Suppose that you need a structure like this one in C:
typedef struct {
int first;
int second;
int third;
} MyStruct;
You can use the following code in G (assuming the native integer is 4 bytes wide):
const MYSTRUCT_FIRST 0
const MYSTRUCT_SECOND 4
const MYSTRUCT_THIRD 8
const SIZEOF_MYSTRUCT 12
Then, using ptr to denote a pointer to this structure, the following C code:
MyStruct *ptr;
ptr = malloc(sizeof(MyStruct));
ptr->first = 0;
ptr->second = ptr->third;
free(ptr);
is roughly equivalent to this G code:
$ptr
@ptr SIZEOF_MYSTRUCT malloc = ;
ptr MYSTRUCT_FIRST take_addr 0 = ;
ptr MYSTRUCT_SECOND take_addr ptr MYSTRUCT_THIRD take = ;
ptr free ;
The library routines take and take_addr are defined in utils.g and do the right thing here (take_addr is actually completely equivalent to +, while take is just + followed by dereferencing; it is useful to give them different names to highlight their meaning).
The G syntax is a bit more verbose and requires some care in maintaining the offset tables for all structures (be careful not to get confused between multiples of 4 and feel free to use hexadecimal if it makes things easier for you), but all in all if you know how to do things in C, converting to G is rather straightforward.
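Since later stages of asmc interoperate with C, one hedge against offset mistakes is to check the G constants against a C definition of the same structure at compile time. The sketch below uses int32_t fields so that the 4-byte offsets hold even on a 64-bit host; take_addr and take are hypothetical C renderings of the description above, not the actual utils.g code.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

typedef intptr_t g_word;            /* holds pointers as integers, like G */

typedef struct {
    int32_t first;
    int32_t second;
    int32_t third;
} MyStruct;

enum { MYSTRUCT_FIRST = 0, MYSTRUCT_SECOND = 4, MYSTRUCT_THIRD = 8, SIZEOF_MYSTRUCT = 12 };

/* Catch offset-table mistakes at compile time (C11). */
_Static_assert(offsetof(MyStruct, second) == MYSTRUCT_SECOND, "bad offset");
_Static_assert(offsetof(MyStruct, third) == MYSTRUCT_THIRD, "bad offset");
_Static_assert(sizeof(MyStruct) == SIZEOF_MYSTRUCT, "bad size");

/* take_addr: address arithmetic only. take: the same, then a load. */
g_word take_addr(g_word ptr, g_word offset) { return ptr + offset; }
g_word take(g_word ptr, g_word offset) { return *(int32_t *)(ptr + offset); }

/* ptr->second = ptr->third, written the G way. */
void copy_third_to_second(g_word ptr)
{
    *(int32_t *)take_addr(ptr, MYSTRUCT_SECOND) = (int32_t)take(ptr, MYSTRUCT_THIRD);
}
```

If one of the offsets were mistyped (for example MYSTRUCT_THIRD as 12), the corresponding _Static_assert would stop the build instead of letting the program read the wrong field.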
Let us discuss a few simple G programs and provide their C equivalents to better illustrate them.
fun sum_two_numbers 2 {
    $x
    $y
    @x 1 param = ;
    @y 0 param = ;
    $sum
    @sum x y + = ;
    sum ret ;
}
C equivalent:
int sum_two_numbers(int p1, int p0) {
    int x;
    int y;
    x = p1;
    y = p0;
    int sum;
    sum = x+y;
    return sum;
}
Incidentally, sum_two_numbers does exactly the same thing as the built-in function +, but it makes an easy first example.
const FROM 20
const TO 0x64
ifun sum_numbers 2
fun main 0 {
    "The sum of numbers from " 1 platform_log ;
    FROM itoa 1 platform_log ;
    " to " 1 platform_log ;
    TO itoa 1 platform_log ;
    " is " 1 platform_log ;
    FROM TO sum_numbers itoa 1 platform_log ;
    "\n" 1 platform_log ;
}
# Return the sum of numbers
# in an interval
fun sum_numbers 2 {
    $from
    $to
    @from 1 param = ;
    @to 0 param = ;
    $i
    $sum
    @i from = ;
    @sum 0 = ;
    while i to <= {
        @sum sum i + = ;
        @i i 1 + = ;
    }
    sum ret ;
}
C equivalent:
#define FROM 20
#define TO 0x64
int sum_numbers(int, int);
int main(void) {
    platform_log("The sum of numbers from ", 1);
    platform_log(itoa(FROM), 1);
    platform_log(" to ", 1);
    platform_log(itoa(TO), 1);
    platform_log(" is ", 1);
    platform_log(itoa(sum_numbers(FROM, TO)), 1);
    platform_log("\n", 1);
}
// Return the sum of numbers
// in an interval
int sum_numbers(int p1, int p0) {
    int from;
    int to;
    from = p1;
    to = p0;
    int i;
    int sum;
    i = from;
    sum = 0;
    while (i <= to) {
        sum = sum + i;
        i = i + 1;
    }
    return sum;
}
Of course C has shorter forms like += and ++, but I did not use them in this example, to make the analogy with G clearer. The function itoa returns a number formatted as a decimal string, while the function platform_log writes a string to the console.
# Function is supposed to be
# defined elsewhere
ifun process 1
fun long_operation 2 {
    $input
    $callback
    @input 1 param = ;
    @callback 0 param = ;
    # Do a very long operation
    $result
    @result input process = ;
    # Call back at the end
    result callback \1 ;
}
fun end_callback 1 {
    $result
    @result 0 param = ;
    "Result is " 1 platform_log ;
    result itoa 1 platform_log ;
}
fun start_operation 0 {
    $input
    input @end_callback long_operation ;
}
C equivalent:
// Function is supposed to be
// defined elsewhere
int process(int);
int long_operation(int p1, int p0) {
    int input;
    int callback;
    input = p1;
    callback = p0;
    // Do a very long operation
    int result;
    result = process(input);
    // Call back at the end
    ((int (*)(int))callback)(result);
}
int end_callback(int p0) {
    int result;
    result = p0;
    platform_log("Result is ", 1);
    platform_log(itoa(result), 1);
}
int start_operation(void) {
    int input;
    long_operation(input, (int)&end_callback);
}
The snippet above shows how the backslash operator performs an indirect function call.