I don't recall the exact moment when I decided to start the project that turned out to be much, much more work and patience then I ever expected, but it must have been around 1 or 2 years ago.
It has its roots in my experiments with turning AVR-based microcontrollers (Arduino, Teensy) into something resembling a PC or terminal. Back then, the first step was that I tried to generate a VGA signal and display characters on a VGA monitor directly from the MCU without specialized "graphics hardware". Surprisingly, after a lot of glitches and noise, it worked. I connected the pins of an old PS/2 keyboard to the same MCU and a few lines of code later, I could type letters on the screen and had a minimal command line. These early little successes gave me a lot of motivation.
When entering new territory, first make small attempts at something you never did before to get some first feelings of success.
When I moved to the Teensy, this was my first hands-on encounter with an ARM processor; although I quickly found out that it doesn't really understand the ARM instruction set, but a mini version of it called Thumb. I pasted some code into my bare-bones character generator that could disassemble Thumb, and later wrote a very basic text editor in C that had an "assemble" function that generated Thumb code from assembler source. It didn't support all opcodes, nor macros or other fancy assembler features, but it worked. I soldered an SD card directly to this little computer and linking Arduino's SDFat library, I could read and write files! Later I found out that I could generate up to 256 colors by constructing a resistor ladder that would turn 8 output bits into blazing RRGGGBB encoded pixels, and a similar approach works for 8-bit sound. I spent some nights programming rotating 3D cubes, loading pixel images from SD and even streaming music from a WAV file. It was good. This "bare metal" kind of directness was what I had been missing from computers since the end of the C64 and Amiga times.
But it was extremely limited because MCUs typically only have a few kilobytes (!) of RAM. Very fast, on-die SRAM, but most have something like 8, 16 or (Teensy 3.1) 64kB. Not enough to store a display framebuffer in a satisfying resolution (>320x256). Also, MCUs typically operate at speeds of less than 100 MHz.
Since these experiments, I really craved a computer system that was dead simple, "brutal" and direct, but at the same time in the GHz/GB arena so that I could manipulate interesting audiovisual content and make use of the internet in a meaningful way.
I still haven't found a system like that, but I have an approximation.
At the same time, I had a nagging feeling that I wanted a simpler operating system that had less layers between me and the hardware, and whose complexity was low enough so someone with limited time could wrap their head around it and become productive quickly. By productive I mean "get graphics and text on the screen, sound from the speakers and packets on the network" and not "run 100 mysterious background services, fill your RAM with a bloated office package and load advertising from the web".
I figured I needed a few components to make the system do something interesting:
The platform I chose to implement these things was the Raspberry Pi 2 (actually first the Pi 1 because I had several lying around, but this is a differnt story). The Raspberry Pi 2 fulfills, in theory, mostly, the requirements I had for the hardware:
What I dislike about the RPi 2:
Loading a bare-metal kernel on the RPi 2 is relatively easy. You put a file "kernel7.img" on the first FAT partition of an MBR partioned SD card. You also put the required binary blobs and a "config.txt" file on the card. Then you plug it in and pray.
How do you create a "kernel" file?
kernel7.img is an ARM binary file that you can generate from an ELF executable using objcopy from the ARM version of GNU binutils:
arm-none-eabi-objcopy build/interim-arm.elf -O binary build/kernel7.img
The secret sauce in the RPi2 loads your binary not at address 0x0, but at address 0x8000. To make everything work, if you compile your binary with GCC, for example, you need a linker file (a file with the extension .ld, passed to gcc with the -T parameter) like this that specifies the start address at the right place:
ENTRY(_start)
SECTIONS
{
. = 0x8000;
.text : {
KEEP(*(.text.boot))
*(.text)
}
.rodata : {
*(.rodata)
}
.data : {
*(.data)
}
.bss : {
_bss_start = ALIGN(4096);
bss = ALIGN(4096);
*(.bss)
}
_bss_end = .;
}
Linker files look weird, no? Some lessons I learned here:
If your kernel is written in a high level language like C, this language needs some setup before code written in it can run reliably on the target platform. Which is at least:
Usually, and I also did it like this, you do this setup in an assembler fragment that is linked in at the start address of your kernel. This performs the critical setup -- on the way probably switching on some CPU features -- and then jump to the "main" function of your C code; then you are in C-land.
How to set up the stack?
Setting up the stack requires you to think about the general memory layout of the system. I decided to have a flat memory layout (virtual addresses = physical addresses, to avoid indirection / knots in my brain) and make the stack and heap (memory that is allocated by the language; explanation further down below) grow in opposite direction from each other.
I did it like this:
What else to do in the assembler setup fragment?
Switch on VFP/NEON, the floating point and vector units of the ARM, respectively. Without this, if you use any floating point code, the CPU will crash and you will wonder why.
In the ARM Architecture Reference Manual (ARM ARM), they also set up the MMU (memory management unit) in assembler, but I found this too cumbersome and adapted mvrn's C code to do this instead.
Most hardware configuration that is ARM-related, like switching on the VFP, is done via the MRC and MCR instructions. First you read from the control/coprocessor registers using MRC to a temporary register, then you twiddle the bits and then you write everything back with MCR. Example:
mrc p15, #0, r1, c1, c0, #2
orr r1, r1, #(0xf << 20)
mcr p15, #0, r1, c1, c0, #2
ldr r3, =main // load address of main to R3
blx r3 // branch to R3 and store PC in link register
The main() function does more hardware configuration:
Then I launch the REPL, which is the system's main loop.
Finally, the parts of the OS written in Interim Lisp are loaded by evaluating:
(eval (read (recv (open "/sd/shell.l"))))
File: reader.c
The Reader parses ASCII/UTF-8 strings into S-Expressions, which are internally represented as trees of the Cell struct (defined in minilisp.h).
A Cell consists of 3 machine words: value/addr (CAR), size/next (CDR) and the tag, which specifies the type of the Cell.
typedef struct Cell {
union {
jit_word_t value;
void* addr;
};
union {
jit_word_t size;
void* next;
};
jit_word_t tag;
} Cell;
Carves cells from the heap.
Cell* cell_alloc() {
if (free_list_avail>free_list_consumed) {
// serve from free list
int idx = free_list_consumed;
free_list_consumed++;
Cell* res = free_list[idx];
return res;
} else {
// grow heap
Cell* res = &cell_heap[cells_used];
cells_used++;
if (cells_used>MAX_CELLS) {
// out of memory
printf("[cell_alloc] failed: MAX_CELLS used.\n");
exit(1);
}
return res;
}
}
File: compiler.c
File: jit_arm_raw.c
Generates ARM instructions.
void jit_movr(int dreg, int sreg) {
uint32_t op = 0xe1a00000;
op |= (sreg<<0);
op |= (dreg<<12);
code[code_idx++] = op;
}
File: writer.c
Turns Cell trees into human- (and Reader-) readable strings.
File: alloc.c
Finds all reachable objects and frees the rest.
Interim is single-tasked. Cooperative multi-tasking is emulated by round-robin executing a list of functions in the global tasks list. Tasks can be spawned by appending a function to this list.
(def mouse (open "/mouse"))
(let mouse-info (recv mouse))
(def mouse-dx (car (car mouse-info)))
(def mouse-dy (cdr (car mouse-info)))
(def freenode (open "/net/tcp/62.231.75.133/6667"))
(let packet (recv net))
(if (size packet) (do
…
))
Interim's syntax is defined as follows.
An atom is one of the following:
deadbeef01
]A pair is a link between two atoms; it has a left-hand side (traditionally called "car", "contents of address register") and a right-hand side (traditionally called "cdr", "contents of data register").
A list atom is a pair whose left side points to an atom of any type and whose right side points to the next list atom or an "empty list" atom (a pair of two "nothings"). Lists are entered and displayed simply by separating atoms with whitespace and wrapping the result in parentheses: (1 2 3 4).
Interim programs are simply Interim lists. The first item of a progam list must be the symbol of the function to apply to the following parameters; the remaining items of the list are the parameters:
(+ 1 2) ; evaluates to 3.
(- 5 4) ; evaluates to 1.
(def a 23) ; evaluates to 23. as a side-effect, the symbol "a" is defined as the number 23.
(def double (fn x (+ x x))) ; defines the function "double" that takes one parameter called "x"
(double a) ; now evaluates to 46.
A number of built-in symbols are available at system startup.
(def <symbol> <atom>)
Definition. Binds an atom to a symbol (globally).
(let <symbol> <atom>)
Definition. Binds an atom to a symbol (function-locally).
(fn <arg1 argn …> <body>)
Function. Defines a function with parameter symbols ("arguments") which can be used as placeholders anywhere in the function body.
(<symbol> <arg1 argn…>)
Application. Apply arguments to the function bound to the symbol.
(+), (-), (*), (/), (%)
Add, subtract, multiply, divide, modulus. These operate on integers. A non-nothing non-integer is interpreted as 1. Nothing is interpreted as 0. This allows logic to be constructed from arithmetic.
(gt <a> <b>), (lt <a> <b>), (eq <a> <b>)
Compare two values (a and b). Gt returns 1 if a is greater than b and 0 if not. Lt does the opposite. Eq returns 1 if a and b are equal (they represent the same number or object) and 0 if not.
(if <condition> <then> <else>)
Conditional branch. If condition evaluates to non-zero, is evaluated, or else is evaluted. The value of the evaluated branch is returned.
(while <condition> <then>)
Conditional loop. As long as condition evaluates to non-zero, is evaluated over and over again. The value of the last evaluation is returned.
(do <expr1> <exprn> …)
Sequencing. Expressions are evaluated in the given order; The value of the last evaluation is returned.
(cons <a> <b>)
Returns a pair of a and b: (a.b).
(car <pair>)
Returns the left side of a pair.
(cdr <pair>)
Returns the right side of a pair.
(quote <s-expression>)
Returns the given S-expression unmodified without evaluating it.
(list <a> <b> …)
Constructs a list from the given parameters.
(read <string>)
Deserialization. Parses a string (text) into atoms.
(write <op>)
Serialization. Returns a string representation of the given atom.
(symbols)
Returns a list of all visible symbols.
Strings (text), byte strings (binary data) and files have a uniform interface in Interim OS. All system resources (see Section 3.6) in Interim OS are exposed as file systems which can be addressed using (open ). The opened "file" behaves exactly like a byte string; thus, a string created at run-time is a nameless file in memory.
(alloc 65536)
Returns a 64-kilobyte string of zeroes.
(size <str>)
Returns the length of a string in bytes.
(substr <str> <offset> <length>)
Returns a substring of length that starts at bytes in the given string.
(put <str> <offset> <byte>)
Replaces the byte at position in the given string with a copy of .
(get <str> <offset>)
Returns the byte at position in the given string.
(open "/path/to/file")
Returns a stream that is an interface to the file at the specified path.
(send <sym> <str2>)
Sends (appends) a string to the given symbol representing a stream.
(recv <sym> <length>)
Receives (consumes) up to bytes from the stream represented by .