1 year ago · 17a6b3b4e9
--- a/Readme.md
+++ b/Readme.md
@@ -49,6 +49,16 @@ list of emulated hardware:
 
				 [KolibriOS](https://copy.sh/v86/?profile=kolibrios) —
			
 
				 [QNX](https://copy.sh/v86/?profile=qnx)
			
 
				 
			
 
				+## Docs
			
 
				+
			
 
				+[How it works](docs/how-it-works.md) —
			
 
				+[Networking](docs/networking.md) —
			
 
				+[Archlinux guest setup](docs/archlinux.md) —
			
 
				+[Windows 2000/XP guest setup](docs/windows-xp.md) —
			
 
				+[9p filesystem](docs/filesystem.md) —
			
 
				+[Linux rootfs on 9p](docs/linux-9p-image.md) —
			
 
				+[Profiling](docs/profiling.md)
			
 
				+
			
 
				 ## Compatibility
			
 
				 
			
 
				 Here's an overview of the operating systems supported in v86:
			
--- a/docs/filesystem.md
+++ b/docs/filesystem.md
@@ -1,9 +1,7 @@
 
				 A 9p filesystem is supported by the emulator, using a virtio transport. Using
			
 
				-it, files can be exchanged with the guest OS, see
			
 
				-[`create_file`](/src/browser/starter.js#L1179-L1199)
			
 
				-and
			
 
				-[`read_file`](/src/browser/starter.js#L1209-L1228). It can
			
 
				-be enabled by passing the following options to `V86Starter`:
			
 
				+it, files can be exchanged with the guest OS, see `create_file` and `read_file`
			
 
				+in [`starter.js`](https://github.com/copy/v86/blob/master/src/browser/starter.js).
			
 
				+It can be enabled by passing the following options to `V86`:
			
 
				 
			
 
				 ```javascript
			
 
				 filesystem: {
			
--- a/docs/how-it-works.md
+++ b/docs/how-it-works.md
@@ -0,0 +1,81 @@
 
				+Here's an overview of v86's workings. For details, check the
			
 
				+[source](https://github.com/copy/v86/tree/master/src).
			
 
				+
			
 
				+The major limitations of WebAssembly are (for the purpose of making emulators with jit):
			
 
				+
			
 
				+- structured control flow (no arbitrary jumps)
			
 
				+- no control over registers (you can't keep hardware registers in wasm locals across functions)
			
 
				+- no mmap (paging needs to be fully emulated)
			
 
				+- no patching
			
 
				+- module generation is fairly slow, but at least it's asynchronous, so other things can keep running
			
 
				+- there is some memory overhead per module, so you can't generate more than a few thousand modules
			
 
				+
			
 
				+v86 has an interpreted mode, which collects entry points (targets of function
			
 
				+calls and indirect jumps). It also measures the hotness per page, so that
			
 
				+compilation is focused on code that is often executed. Once a page is
			
 
				+considered hot, code is generated for the entire page and up to `MAX_PAGES`
			
 
				+that are directly reachable from it.
			
 
				+
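The bookkeeping described above could be sketched roughly as follows. This is an illustration only, not v86's actual code: `HotnessTracker`, `recordEntry` and the threshold value are hypothetical names and numbers, and the real counters live in v86's Rust sources.

```javascript
// Hypothetical sketch: per-page hotness counting plus entry-point collection.
const HOT_THRESHOLD = 3; // assumed value, chosen only for this example

class HotnessTracker {
    constructor(threshold = HOT_THRESHOLD) {
        this.threshold = threshold;
        this.hits = new Map();        // page number -> number of recorded entries
        this.entryPoints = new Map(); // page number -> Set of offsets within the page
    }

    // Called by the interpreter when control reaches `addr` through a
    // function call or an indirect jump. Returns true exactly once per page,
    // at the moment the page reaches the threshold and should be compiled.
    recordEntry(addr) {
        const page = addr >>> 12;
        if (!this.entryPoints.has(page)) {
            this.entryPoints.set(page, new Set());
        }
        this.entryPoints.get(page).add(addr & 0xFFF);

        const count = (this.hits.get(page) || 0) + 1;
        this.hits.set(page, count);
        return count === this.threshold;
    }
}

const tracker = new HotnessTracker();
const results = [0x1234, 0x1238, 0x1234, 0x1234].map(a => tracker.recordEntry(a));
```

Once `recordEntry` returns true, a compiler would take `entryPoints.get(page)` as the set of targets for the page's dispatch table.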
			
 
				+v86 generates a single function with a big switch statement (brtable), to
			
 
				+ensure that all functions and targets of indirect jumps are reachable from
			
 
				+other modules. The remaining control flow is handled using the "stackifier"
			
 
				+algorithm (well-explained in
			
 
				+[this blog post](https://medium.com/leaningtech/solving-the-structured-control-flow-problem-once-and-for-all-5123117b1ee2)).
			
 
				+At the moment, there is no linking of wasm modules. The current module is
			
 
				+exited, and the main loop detects if a new module can be entered.
			
 
				+
			
 
				+In practice, I found that browsers don't handle this structure (deep brtables,
			
 
				+with locals being used across the entire function) very well, and `MAX_PAGES`
			
 
				+has to be set fairly low, otherwise memory usage blows up. It's likely that
			
 
				+improvements are possible (generating fewer entry points, splitting code across
			
 
				+multiple functions).
			
 
				+
			
 
				+Code generation happens in two passes. The first pass finds all basic block
			
 
				+boundaries, the second generates code for each basic block. Instruction
			
 
				+decoding is generated by a [set of
			
 
				+scripts](https://github.com/copy/v86/tree/master/gen) from a [table of
			
 
				+instructions](https://github.com/copy/v86/blob/master/gen/x86_table.js). It's
			
 
				+also used to [generate
			
 
				+tests](https://github.com/copy/v86/blob/master/tests/nasm/create_tests.js).
			
 
				+
			
 
				+To handle paging, v86 generates code similar to this (see `gen_safe_read`):
			
 
				+
			
 
				+```
			
 
				+entry <- tlb[addr >> 12 << 2]
			
 
				+if entry & MASK == TLB_VALID && (addr & 0xFFF) <= 0xFFC - bytes: goto fast
			
 
				+entry <- safe_read_jit_slow(addr, instruction_pointer)
			
 
				+if page_fault: goto exit-with-pagefault
			
 
				+fast: mem[(entry & ~0xFFF) ^ addr]
			
 
				+```
			
 
				+
			
 
				+There is a 4 MB cache that acts like a tlb. It contains the physical address,
			
 
				+read-only bit, whether the page contains code (in order to invalidate it on
			
 
				+write) and whether the page points to mmio. Any of those cases are handled in
			
 
				+the slow path (`safe_read_jit_slow`), as well as walking the page tables and
			
 
				+triggering page faults. The fast path is taken in the vast majority of cases.
			
 
				+
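To make the fast and slow paths concrete, here is a minimal sketch for single-byte reads (so no page-boundary check is needed). The flag bit layout, `READ_MASK`, `pageTable`, `slowRead8` and `read8` are assumptions made for this example and do not match v86's real definitions. What it does take from the text above is the entry encoding implied by `(entry & ~0xFFF) ^ addr`, and the size: one 32-bit entry for each of the 2^20 virtual pages gives exactly 4 MB.

```javascript
// Hypothetical flag bits in the low 12 bits of a TLB entry.
const TLB_VALID = 1 << 0;
const TLB_READONLY = 1 << 1; // only relevant for writes, unused in this read sketch
const TLB_HAS_CODE = 1 << 2; // only relevant for writes, unused in this read sketch
const TLB_MMIO = 1 << 3;
const READ_MASK = TLB_VALID | TLB_MMIO; // a read needs VALID set and MMIO clear

const tlb = new Int32Array(1 << 20);        // 2^20 entries * 4 bytes = 4 MB
const mem = new Uint8Array(16 * 4096);      // tiny guest "physical" memory
const pageTable = new Map([[0x80000, 3]]);  // stand-in for real page tables
let slowPathCalls = 0;

// Slow path: look up the mapping, fault if missing, then fill the TLB.
function slowRead8(addr) {
    slowPathCalls++;
    const virtPage = addr >>> 12;
    if (!pageTable.has(virtPage)) {
        throw new Error("page fault at 0x" + addr.toString(16));
    }
    const physPage = pageTable.get(virtPage);
    // Store (phys ^ virt) in the upper bits, so XOR with addr yields the physical address.
    tlb[virtPage] = ((physPage ^ virtPage) << 12) | TLB_VALID;
    return mem[(physPage << 12) | (addr & 0xFFF)];
}

function read8(addr) {
    const entry = tlb[addr >>> 12];
    if ((entry & READ_MASK) === TLB_VALID) {
        return mem[((entry & ~0xFFF) ^ addr) >>> 0]; // fast path
    }
    return slowRead8(addr);
}

mem[(3 << 12) | 0x10] = 0xAB;
const first = read8(0x80000010);  // misses the TLB, takes the slow path
const second = read8(0x80000010); // hits the TLB
```

Storing the XOR of the physical and virtual page means the fast path needs a single `xor` rather than a mask plus an `or`.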
			
 
				+The remaining code generation is mostly a straightforward, 1-to-1 translation
			
 
				+of x86 to wasm. The only analysis done is to optimise generation of conditional
			
 
				+jumps immediately after arithmetic instructions, e.g.:
			
 
				+
			
 
				+```
			
 
				+cmp eax, 52
			
 
				+setb eax
			
 
				+```
			
 
				+
			
 
				+becomes:
			
 
				+
			
 
				+```
			
 
				+... // code for cmp
			
 
				+eax <- eax < 52
			
 
				+```
			
 
				+
			
 
				+A lazy flag mechanism is used to speed up arithmetic (applies to both jit and
			
 
				+interpreted mode, see
			
 
				+[`arith.rs`](https://github.com/copy/v86/blob/master/src/rust/cpu/arith.rs) and
			
 
				+[`misc_instr.rs`](https://github.com/copy/v86/blob/master/src/rust/cpu/misc_instr.rs)).
			
 
				+There's a work-in-progress pull request that tries to elide most lazy-flags updates:
			
 
				+https://github.com/copy/v86/pull/466
			
 
				+
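For readers unfamiliar with lazy flags, the general idea is to record the operands and result of the last arithmetic instruction and derive individual flags only when something reads them. The sketch below shows this for a 32-bit `add` only. The names (`lazy`, `add32`, `getCarry`, ...) are made up for this example, and v86's real implementation in `arith.rs` and `misc_instr.rs` differs in detail.

```javascript
// State left behind by the last arithmetic instruction (all values as int32).
const lazy = { op1: 0, op2: 0, result: 0 };

// 32-bit add: does the arithmetic and records inputs, but computes no flags.
function add32(a, b) {
    const result = (a + b) | 0;
    lazy.op1 = a | 0;
    lazy.op2 = b | 0;
    lazy.result = result;
    return result;
}

// Flags are derived on demand. These formulas are valid for `add` only.
function getCarry() {
    // An unsigned add wrapped around iff the result is below the first operand.
    return (lazy.result >>> 0) < (lazy.op1 >>> 0);
}
function getZero() {
    return lazy.result === 0;
}
function getSign() {
    return lazy.result < 0;
}
function getOverflow() {
    // Signed overflow iff both operands differ in sign from the result.
    return ((lazy.op1 ^ lazy.result) & (lazy.op2 ^ lazy.result)) < 0;
}

add32(0xFFFFFFFF, 1);
const wrap = { carry: getCarry(), zero: getZero(), overflow: getOverflow() };

add32(0x7FFFFFFF, 1);
const signed = { carry: getCarry(), sign: getSign(), overflow: getOverflow() };
```

Most arithmetic results are overwritten before any flag is read, so deferring the flag computation saves work on the common path.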
			
 
				+FPU instructions are emulated using softfloat (very slow, but unfortunately
			
 
				+some code relies on 80 bit floats).
			
--- a/docs/networking.md
+++ b/docs/networking.md
@@ -1,7 +1,11 @@
 
				+# v86 networking
			
 
				+
			
 
				 Emulating a network card is supported. It can be used by passing the
			
 
				-`network_relay_url` option to `V86Starter`. The url must point to a running
			
 
				+`network_relay_url` option to `V86`. The url must point to a running
			
 
				 WebSockets Proxy. The source code for WebSockets Proxy can be found at
			
 
				-https://github.com/benjamincburns/websockproxy.
			
 
				+[benjamincburns/websockproxy](https://github.com/benjamincburns/websockproxy).
			
 
				+An alternative, Node-based implementation is
			
 
				+[krishenriksen/node-relay](https://github.com/krishenriksen/node-relay).
			
 
				 
			
 
				 The network card could also be controlled programmatically, but this is
			
 
				 currently not exposed.
			
@@ -13,3 +17,31 @@ browser-compatible `WebSocket` constructor being present in the global scope.
 
				 throttling built-in by default which will degrade the networking.
			
 
				 `bellenottelling/websockproxy` docker image has this throttling removed via
			
 
				 [websockproxy/issues/4#issuecomment-317255890](https://github.com/benjamincburns/websockproxy/issues/4#issuecomment-317255890).
			
 
				+
			
 
				+### Interaction with state images
			
 
				+
			
 
				+When using state images, v86 randomises the MAC address after the state has
			
 
				+been loaded, so that multiple VMs don't receive the same address. However, the
			
 
				+guest OS is not aware that the MAC address has changed, which prevents it from
			
 
				+sending and receiving packets correctly. There are several workarounds:
			
 
				+
			
 
				+- Unload the network driver before saving the state. On Linux, unloading can be
			
 
				+  done using `rmmod ne2k-pci` or `echo 0000:00:05.0 >
			
 
				+  /sys/bus/pci/drivers/ne2k-pci/unbind` and loading (after the state has been
			
 
				+  loaded) using `modprobe ne2k-pci` or `echo 0000:00:05.0 >
			
 
				+  /sys/bus/pci/drivers/ne2k-pci/bind`
			
 
				+- Pass `preserve_mac_from_state_image: true` to the V86 constructor. This
			
 
				+  causes MAC addresses to be shared between all VMs with the same state image.
			
 
				+- Pass `mac_address_translation: true` to the V86 constructor. This causes v86
			
 
				+  to present the old MAC address to the guest OS, but translate it to a
			
 
				+  randomised MAC address in outgoing packets (and vice-versa for incoming
			
 
				+  packets). This mechanism currently only supports the ethernet, ipv4, dhcp and
			
 
				+  arp protocols. See `translate_mac_address` in
			
 
				+  [`src/ne2k.js`](https://github.com/copy/v86/blob/master/src/ne2k.js). This is
			
 
				+  currently used in Windows, ReactOS and SerenityOS profiles.
			
 
				+- Some OSes don't cache the MAC address when the driver loads and therefore
			
 
				+  don't need any of the above workarounds. This seems to be the case for Haiku,
			
 
				+  OpenBSD and FreeBSD.
			
 
				+
			
 
				+Note that the same applies to IP addresses, so a dhcp client should only be run
			
 
				+after the state has been loaded.
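
As a rough illustration of the two constructor-based workarounds above, the following sketch builds an options object for `V86`. The helper `buildNetworkOptions`, its strategy names, the relay URL and the state file name are all placeholders invented for this example. Only `network_relay_url`, `preserve_mac_from_state_image` and `mac_address_translation` come from the text above, and `initial_state` is assumed to be the option used to load a state image.

```javascript
// Hypothetical helper: picks one MAC address workaround for a VM restored from a state image.
function buildNetworkOptions(base, macStrategy) {
    switch (macStrategy) {
        case "preserve":
            // All VMs started from the same state image share one MAC address.
            return { ...base, preserve_mac_from_state_image: true };
        case "translate":
            // The guest keeps seeing its old MAC; v86 rewrites it in packets.
            return { ...base, mac_address_translation: true };
        case "randomise":
            // Default behaviour: the guest has to reload its network driver.
            return { ...base };
        default:
            throw new Error("unknown MAC strategy: " + macStrategy);
    }
}

const options = buildNetworkOptions({
    network_relay_url: "wss://relay.example.invalid/", // placeholder URL
    initial_state: { url: "state.bin" },               // placeholder file name
}, "translate");

// In a page that has loaded v86, and with the usual remaining options
// (wasm path, BIOS images, ...) added, this would then be used as:
// const emulator = new V86(options);
```

Setting at most one of the two flags at a time keeps the behaviour easy to reason about.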
			
--- a/docs/profiling.md
+++ b/docs/profiling.md
@@ -0,0 +1,7 @@
 
				+v86 has a built-in profiler, which instruments generated code to count certain
			
 
				+events and types of instructions. It can be used by building with `make
			
 
				+debug-with-profiler` and opening debug.html.
			
 
				+
			
 
				+For debugging networking, packet logging is available in the UI in both debug
			
 
				+and release builds. The resulting `traffic.hex` file can be loaded in Wireshark
			
 
				+using File -> Import from Hex Dump -> tick direction indication, timestamp %s.%f.