cell/docs/spec/mach.md
2026-02-20 21:54:19 -06:00

Title: Register VM
Description: Binary encoding of the Mach bytecode interpreter

Overview

The Mach VM is a register-based virtual machine that interprets the Mcode IR instruction set directly, encoded as compact 32-bit binary bytecode. It is modeled after Lua's register VM: operands are register indices rather than stack positions, so operations need no separate push and pop instructions, which reduces instruction count.

The Mach serializer (mach.c) converts streamlined mcode JSON into binary instructions. Since the Mach bytecode is a direct encoding of the mcode, the Mcode IR reference is the authoritative instruction set documentation.

Instruction Formats

All instructions are 32 bits wide. Four encoding formats are used:

iABC — Three-Register

[op: 8][A: 8][B: 8][C: 8]

Used for operations on three registers: R(A) = R(B) op R(C).

iABx — Register + Constant

[op: 8][A: 8][Bx: 16]

Used for loading constants: R(A) = K(Bx).

iAsBx — Register + Signed Offset

[op: 8][A: 8][sBx: 16]

Used for conditional jumps: if R(A) then jump by sBx.

isJ — Signed Jump

[op: 8][sJ: 24]

Used for unconditional jumps with a 24-bit signed offset.
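The four formats can be sketched as a handful of pack and unpack helpers. This is a minimal illustration, not the code in mach.c: the helper names are hypothetical, and two details are assumptions because the layouts above do not state them. The sketch places op in the lowest 8 bits with the remaining fields following upward, and it stores signed offsets in two's complement.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t MachInstr32;

/* ASSUMPTION: op occupies bits 0-7, A bits 8-15, and the remaining
   fields fill the upper bits. The spec gives field widths only. */
static inline MachInstr32 mk_abc(uint8_t op, uint8_t a, uint8_t b, uint8_t c) {
    return (uint32_t)op | ((uint32_t)a << 8) | ((uint32_t)b << 16) | ((uint32_t)c << 24);
}

static inline MachInstr32 mk_abx(uint8_t op, uint8_t a, uint16_t bx) {
    return (uint32_t)op | ((uint32_t)a << 8) | ((uint32_t)bx << 16);
}

/* ASSUMPTION: signed fields are stored in two's complement. */
static inline MachInstr32 mk_asbx(uint8_t op, uint8_t a, int32_t sbx) {
    assert(sbx >= -32768 && sbx <= 32767);      /* must fit in 16 bits */
    return mk_abx(op, a, (uint16_t)((uint32_t)sbx & 0xFFFFu));
}

static inline MachInstr32 mk_sj(uint8_t op, int32_t sj) {
    assert(sj >= -(1 << 23) && sj < (1 << 23)); /* must fit in 24 bits */
    return (uint32_t)op | (((uint32_t)sj & 0xFFFFFFu) << 8);
}

static inline uint8_t  get_op(MachInstr32 i) { return (uint8_t)(i & 0xFFu); }
static inline uint8_t  get_a(MachInstr32 i)  { return (uint8_t)((i >> 8) & 0xFFu); }
static inline uint8_t  get_b(MachInstr32 i)  { return (uint8_t)((i >> 16) & 0xFFu); }
static inline uint8_t  get_c(MachInstr32 i)  { return (uint8_t)((i >> 24) & 0xFFu); }
static inline uint16_t get_bx(MachInstr32 i) { return (uint16_t)(i >> 16); }

/* Sign-extend the 16-bit field: flip the sign bit, then subtract its weight. */
static inline int32_t get_sbx(MachInstr32 i) {
    uint32_t raw = i >> 16;
    return (int32_t)(raw ^ 0x8000u) - 0x8000;
}

/* Same trick for the 24-bit jump offset. */
static inline int32_t get_sj(MachInstr32 i) {
    uint32_t raw = i >> 8;
    return (int32_t)(raw ^ 0x800000u) - 0x800000;
}
```

With this layout a dispatch loop would read get_op() first and then pick the matching accessors for the format that opcode uses.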

Registers

Each function frame has a fixed number of register slots, determined at compile time:

  • R(0) — this binding
  • R(1)..R(arity) — function arguments
  • R(arity+1).. — local variables and temporaries
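The slot layout above maps to simple index arithmetic. The helpers below are hypothetical names used only for illustration; they assume arguments are numbered from zero by the caller and that nr_slots counts the this slot, the arguments, and all locals.

```c
#include <assert.h>
#include <stdint.h>

enum { REG_THIS = 0 };  /* R(0) holds the this binding */

/* Register index of argument n (0-based): arguments start at R(1). */
static inline uint16_t reg_for_arg(uint16_t arity, uint16_t n) {
    assert(n < arity);
    return (uint16_t)(1 + n);
}

/* First register available for locals and temporaries. */
static inline uint16_t reg_first_local(uint16_t arity) {
    return (uint16_t)(arity + 1);
}

/* Number of slots left for locals and temporaries in a frame.
   ASSUMPTION: nr_slots includes R(0) and the argument registers. */
static inline uint16_t local_slot_count(uint16_t arity, uint16_t nr_slots) {
    assert(nr_slots >= arity + 1);
    return (uint16_t)(nr_slots - (arity + 1));
}
```

For a two-argument function with eight slots, the arguments sit in R(1) and R(2), and R(3) through R(7) are free for locals and temporaries.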

JSCodeRegister

The compiled output for a function:

struct JSCodeRegister {
  uint16_t arity;           // argument count
  uint16_t nr_slots;        // total register count
  uint32_t cpool_count;     // constant pool size
  JSValue *cpool;           // constant pool
  uint32_t instr_count;     // instruction count
  MachInstr32 *instructions; // 32-bit instruction array
  uint32_t func_count;      // nested function count
  JSCodeRegister **functions; // nested function table
  JSValue name;             // function name
  uint16_t disruption_pc;   // disruption handler offset
};

The constant pool holds all non-immediate values referenced by LOADK instructions: strings, large numbers, and other constants.

Constant Pool Index Overflow

Named property instructions (LOAD_FIELD, STORE_FIELD, DELETE) use the iABC format, where the constant pool key index occupies an 8-bit field (max 255). When a key's index does not fit in that field, as happens once a function references more than 256 unique property names, the serializer automatically falls back to a two-instruction sequence:

  1. LOADK tmp, key_index — load the key string into a temporary register (iABx, 16-bit index)
  2. LOAD_DYNAMIC / STORE_DYNAMIC / DELETEINDEX — use the register-based variant

This is transparent to the mcode compiler and streamline optimizer.
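The fallback can be sketched for the load case as follows. This is an illustration under stated assumptions rather than the serializer's actual code: the opcode numbers, the emit_load_field name, the operand order (A = destination, B = object, C = key), the bit layout, and the caller-supplied scratch register are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t MachInstr32;

/* Hypothetical opcode numbers, for illustration only. */
enum { OP_LOADK = 1, OP_LOAD_FIELD = 2, OP_LOAD_DYNAMIC = 3 };

/* ASSUMPTION: op in bits 0-7, A in bits 8-15, remaining fields above. */
static inline MachInstr32 enc_abc(uint8_t op, uint8_t a, uint8_t b, uint8_t c) {
    return (uint32_t)op | ((uint32_t)a << 8) | ((uint32_t)b << 16) | ((uint32_t)c << 24);
}

static inline MachInstr32 enc_abx(uint8_t op, uint8_t a, uint16_t bx) {
    return (uint32_t)op | ((uint32_t)a << 8) | ((uint32_t)bx << 16);
}

/* Emit R(dest) = R(obj)[K(key_index)] into out and return the number of
   instructions written. tmp is a scratch register chosen by the caller. */
static inline size_t emit_load_field(MachInstr32 *out, uint8_t dest, uint8_t obj,
                                     uint32_t key_index, uint8_t tmp) {
    if (key_index <= 0xFFu) {
        /* Key index fits the 8-bit C field: single iABC instruction. */
        out[0] = enc_abc(OP_LOAD_FIELD, dest, obj, (uint8_t)key_index);
        return 1;
    }
    /* Otherwise load the key through the 16-bit Bx field, then use the
       register-based variant. */
    assert(key_index <= 0xFFFFu);
    out[0] = enc_abx(OP_LOADK, tmp, (uint16_t)key_index);
    out[1] = enc_abc(OP_LOAD_DYNAMIC, dest, obj, tmp);
    return 2;
}
```

Because the choice depends only on the key index, earlier stages can emit the named form everywhere and leave the decision to serialization time.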

Arithmetic Dispatch

Arithmetic ops (ADD, SUB, MUL, DIV, MOD, POW) are executed inline without calling the polymorphic reg_vm_binop() helper. Because mcode's type guard dispatch guarantees that both operands are numbers, only two paths are needed:

  1. Int-int fast path: JS_VALUE_IS_BOTH_INT → native integer arithmetic with int32 overflow check. Overflow promotes to float64.
  2. Float fallback: JS_ToFloat64 → native floating-point operation. Non-finite results produce null.

DIV and MOD check for a zero divisor (→ null). POW calls pow() and handles a non-finite result from finite inputs the same way.
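The two paths can be illustrated with a toy tagged value. The Value type and the vm_add and vm_div names below are stand-ins invented for this sketch; the real VM works on JSValue and uses JS_VALUE_IS_BOTH_INT and JS_ToFloat64. As a simplification, the division here always goes through the float path.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>

/* Toy stand-in for JSValue, for illustration only. */
typedef enum { V_NULL, V_INT, V_FLOAT } Tag;
typedef struct { Tag tag; union { int32_t i; double f; } u; } Value;

static inline Value v_null(void) { Value v; v.tag = V_NULL; v.u.i = 0; return v; }
static inline Value v_int(int32_t i) { Value v; v.tag = V_INT; v.u.i = i; return v; }

/* Non-finite results produce null. */
static inline Value v_float(double f) {
    Value v;
    if (!isfinite(f)) return v_null();
    v.tag = V_FLOAT;
    v.u.f = f;
    return v;
}

static inline double to_f64(Value v) {
    return v.tag == V_INT ? (double)v.u.i : v.u.f;
}

static inline Value vm_add(Value a, Value b) {
    if (a.tag == V_INT && b.tag == V_INT) {
        /* Int-int fast path: compute in 64 bits to detect int32 overflow. */
        int64_t r = (int64_t)a.u.i + (int64_t)b.u.i;
        if (r >= INT32_MIN && r <= INT32_MAX) return v_int((int32_t)r);
        return v_float((double)r);  /* overflow promotes to float64 */
    }
    /* Float fallback. */
    return v_float(to_f64(a) + to_f64(b));
}

static inline Value vm_div(Value a, Value b) {
    double d = to_f64(b);
    if (d == 0.0) return v_null();  /* zero divisor produces null */
    return v_float(to_f64(a) / d);
}
```

The sketch relies on the same precondition as the text: both operands are already known to be numbers, so no other tags are checked.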

Comparison ops (EQ through GE) and bitwise ops still use reg_vm_binop() for their slow paths, as they handle a wider range of type combinations (string comparisons, null equality, etc.).

String Concatenation

CONCAT uses a three-tier dispatch that optimizes self-assign patterns (concat R(A), R(A), R(C), where dest equals the left operand):

  1. In-place append: If R(A) is a mutable heap text (S bit clear) with length + rhs_length <= cap56, characters are appended directly. Zero allocation, zero GC.
  2. Growth allocation (JS_ConcatStringGrow): Allocates a new text with 2x capacity and does not stone the result, leaving it mutable for subsequent appends.
  3. Exact-fit stoned (JS_ConcatString): Used when dest differs from the left operand (normal non-self-assign concat).

The stone_text instruction (iABC, B=0, C=0) sets the S bit on a mutable heap text in R(A). For non-pointer values or already-stoned text, it is a no-op. This instruction is emitted by the streamline optimizer at escape points; see Streamline — insert_stone_text and Stone Memory — Mutable Text.
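The CONCAT tiers and the stone_text behavior can be sketched with a toy text type. Everything here is illustrative: the Text struct, text_new, vm_concat, and toy_stone_text are hypothetical names, the stoned flag stands in for the S bit, and "2x capacity" is read as twice the combined length, which the text does not confirm. Memory is never freed because the real VM leaves that to its GC.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Toy heap text: stoned stands in for the S bit. */
typedef struct {
    bool stoned;
    size_t len, cap;
    char *data;
} Text;

static inline Text *text_new(const char *s, size_t n, size_t cap, bool stoned) {
    Text *t = malloc(sizeof *t);
    assert(t != NULL && cap >= n);
    t->data = malloc(cap ? cap : 1);
    assert(t->data != NULL);
    memcpy(t->data, s, n);
    t->len = n;
    t->cap = cap;
    t->stoned = stoned;
    return t;
}

/* Returns the text to store in R(A). self_assign is true when the
   destination register is also the left operand. */
static inline Text *vm_concat(Text *lhs, const Text *rhs, bool self_assign) {
    size_t total = lhs->len + rhs->len;
    if (self_assign && !lhs->stoned && total <= lhs->cap) {
        /* Tier 1: in-place append, no allocation. */
        memcpy(lhs->data + lhs->len, rhs->data, rhs->len);
        lhs->len = total;
        return lhs;
    }
    /* Tier 2 (self-assign): new mutable text with doubled capacity.
       Tier 3 (otherwise): exact-fit text, stoned immediately. */
    size_t cap = self_assign ? total * 2 : total;
    Text *out = text_new(lhs->data, lhs->len, cap, !self_assign);
    memcpy(out->data + out->len, rhs->data, rhs->len);
    out->len = total;
    return out;
}

/* Sets the flag; already-stoned text is left unchanged. */
static inline void toy_stone_text(Text *t) {
    t->stoned = true;
}
```

In a loop that repeatedly appends to the same register, tier 2 runs only when capacity is exhausted and tier 1 handles the appends in between, which is why the growth path leaves its result mutable.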