cell/docs/spec/mcode.md
2026-02-20 21:54:19 -06:00

Mcode IR

Instruction set reference for the JSON-based intermediate representation.

Overview

Mcode is the intermediate representation at the center of the ƿit compilation pipeline. All source code is lowered to mcode before execution or native compilation. The mcode instruction set is the authoritative reference for the operations supported by the ƿit runtime — the Mach VM bytecode is a direct binary encoding of these same instructions.

Source → Tokenize → Parse → Fold → Mcode → Streamline → Machine

Mcode is produced by mcode.cm, optimized by streamline.cm, and then either serialized to 32-bit bytecode for the Mach VM (mach.c) or lowered to QBE/LLVM IL for native compilation (qbe_emit.cm). See Compilation Pipeline for the full overview.

Module Structure

An .mcode file is a JSON object representing a compiled module:

| Field | Type | Description |
| --- | --- | --- |
| name | string | Module name (typically the source filename) |
| filename | string | Source filename |
| data | object | Constant pool: string and number literals used by instructions |
| main | function | The top-level function (module body) |
| functions | array | Nested function definitions (referenced by function dest, id) |
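As a rough illustration of the layout above, the following Python sketch checks that a parsed module carries the five documented fields. The `check_module` helper and the sample module are hypothetical; only the field names come from the table, and the sketch assumes a function record is represented as a JSON object.

```python
import json

# Top-level fields from the Module Structure table, with the Python type
# each JSON value is expected to parse into (assumption: "function" means
# a function record, i.e. a JSON object).
MODULE_FIELDS = {
    "name": str,
    "filename": str,
    "data": dict,
    "main": dict,
    "functions": list,
}

def check_module(module):
    """Return a list of problems found in a parsed module (hypothetical helper)."""
    problems = []
    for field, expected in MODULE_FIELDS.items():
        if field not in module:
            problems.append(f"missing field: {field}")
        elif not isinstance(module[field], expected):
            problems.append(f"wrong type for field: {field}")
    return problems

# A made-up module, for illustration only.
sample = json.loads("""
{
  "name": "demo.cm",
  "filename": "demo.cm",
  "data": {},
  "main": {"name": "<main>", "instructions": []},
  "functions": []
}
""")
```

Calling `check_module(sample)` yields an empty list; a module lacking fields yields one message per missing field.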

Function Record

Each function (both main and entries in functions) has:

| Field | Type | Description |
| --- | --- | --- |
| name | string | Function name ("<anonymous>" for lambdas) |
| filename | string | Source filename |
| nr_args | integer | Number of parameters |
| nr_slots | integer | Total register slots needed (args + locals + temporaries) |
| nr_close_slots | integer | Number of closure slots captured from parent scope |
| disruption_pc | integer | Instruction index of the disruption handler (0 if none) |
| instructions | array | Instruction arrays and label strings |

Slot 0 is reserved. Slots 1 through nr_args hold parameters. Remaining slots up to nr_slots - 1 are locals and temporaries.
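The slot layout can be expressed as a small classifier. This is a sketch only; `slot_role` is a hypothetical helper, not part of the toolchain.

```python
def slot_role(slot, nr_args, nr_slots):
    """Classify a register slot according to the layout described above."""
    if not 0 <= slot < nr_slots:
        raise ValueError(f"slot {slot} is outside 0..{nr_slots - 1}")
    if slot == 0:
        return "reserved"
    if slot <= nr_args:
        return "parameter"
    return "local_or_temporary"
```

For a function with `nr_args = 2` and `nr_slots = 5`, slots 1 and 2 are parameters and slots 3 and 4 are locals or temporaries.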

Instruction Format

Each instruction is a JSON array. The first element is the instruction name (string), followed by operands. Where present, the last two elements are line and column numbers for source mapping (several examples in this document omit them):

["add_int", dest, a, b, line, col]
["load_field", dest, obj, "key", line, col]
["jump", "label_name"]

Operands are register slot numbers (integers), constant values (strings, numbers), or label names (strings).
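A minimal decoding sketch for the three examples above. The `OPERAND_COUNT` table and the `decode` helper are assumptions made for illustration; the real loader may determine operand counts differently.

```python
# Hypothetical operand counts for the three instructions shown above.
OPERAND_COUNT = {"add_int": 3, "load_field": 3, "jump": 1}

def decode(instr):
    """Split an instruction array into (name, operands, line, col).

    Anything after the operands is treated as the line/column pair;
    if the pair is absent, line and col are returned as None.
    """
    name = instr[0]
    count = OPERAND_COUNT[name]
    operands = instr[1:1 + count]
    rest = instr[1 + count:]
    line, col = rest if len(rest) == 2 else (None, None)
    return name, operands, line, col
```

With this sketch, `decode(["add_int", 3, 1, 2, 10, 4])` separates the three slot operands from the position `(10, 4)`, while `decode(["jump", "done"])` reports no position.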

Instruction Reference

Loading and Constants

| Instruction | Operands | Description |
| --- | --- | --- |
| access | dest, name | Load variable by name (intrinsic or environment) |
| int | dest, value | Load integer constant |
| true | dest | Load boolean true |
| false | dest | Load boolean false |
| null | dest | Load null |
| move | dest, src | Copy register value |
| function | dest, id | Load nested function by index |
| regexp | dest, pattern | Create regexp object |

Arithmetic — Integer

| Instruction | Operands | Description |
| --- | --- | --- |
| add_int | dest, a, b | dest = a + b (integer) |
| sub_int | dest, a, b | dest = a - b (integer) |
| mul_int | dest, a, b | dest = a * b (integer) |
| div_int | dest, a, b | dest = a / b (integer) |
| mod_int | dest, a, b | dest = a % b (integer) |
| neg_int | dest, src | dest = -src (integer) |

Arithmetic — Float

| Instruction | Operands | Description |
| --- | --- | --- |
| add_float | dest, a, b | dest = a + b (float) |
| sub_float | dest, a, b | dest = a - b (float) |
| mul_float | dest, a, b | dest = a * b (float) |
| div_float | dest, a, b | dest = a / b (float) |
| mod_float | dest, a, b | dest = a % b (float) |
| neg_float | dest, src | dest = -src (float) |

Arithmetic — Generic

| Instruction | Operands | Description |
| --- | --- | --- |
| pow | dest, a, b | dest = a ^ b (exponentiation) |

Text

| Instruction | Operands | Description |
| --- | --- | --- |
| concat | dest, a, b | dest = a ~ b (text concatenation) |
| stone_text | slot | Stone a mutable text value (see below) |

The stone_text instruction is emitted by the streamline optimizer's escape analysis pass (insert_stone_text). It freezes a mutable text value before it escapes its defining slot — for example, before a move, setarg, store_field, push, or put. The instruction is only inserted when the slot is provably T_TEXT; non-text values never need stoning. See Streamline Optimizer — insert_stone_text for details.

At the VM level, stone_text is a single-operand instruction (iABC with B=0, C=0). If the slot holds a heap text without the S bit set, it sets the S bit. For all other values (integers, booleans, already-stoned text, etc.), it is a no-op.
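The VM-level behavior described above can be modeled in a few lines. This is a sketch under stated assumptions: `HeapText` and its `stoned` flag are stand-ins for the VM's heap text object and its S bit, not the actual runtime types.

```python
from dataclasses import dataclass

@dataclass
class HeapText:
    """Stand-in for a heap-allocated text value (hypothetical type)."""
    chars: str
    stoned: bool = False  # models the S bit

def stone_text(slots, slot):
    """Set the S bit on an unstoned heap text; do nothing for any other value."""
    value = slots[slot]
    if isinstance(value, HeapText) and not value.stoned:
        value.stoned = True
```

Applying `stone_text` to a slot holding an integer, a boolean, or an already-stoned text leaves the slot unchanged.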

Comparison — Integer

| Instruction | Operands | Description |
| --- | --- | --- |
| eq_int | dest, a, b | dest = a == b (integer) |
| ne_int | dest, a, b | dest = a != b (integer) |
| lt_int | dest, a, b | dest = a < b (integer) |
| le_int | dest, a, b | dest = a <= b (integer) |
| gt_int | dest, a, b | dest = a > b (integer) |
| ge_int | dest, a, b | dest = a >= b (integer) |

Comparison — Float

| Instruction | Operands | Description |
| --- | --- | --- |
| eq_float | dest, a, b | dest = a == b (float) |
| ne_float | dest, a, b | dest = a != b (float) |
| lt_float | dest, a, b | dest = a < b (float) |
| le_float | dest, a, b | dest = a <= b (float) |
| gt_float | dest, a, b | dest = a > b (float) |
| ge_float | dest, a, b | dest = a >= b (float) |

Comparison — Text

| Instruction | Operands | Description |
| --- | --- | --- |
| eq_text | dest, a, b | dest = a == b (text) |
| ne_text | dest, a, b | dest = a != b (text) |
| lt_text | dest, a, b | dest = a < b (lexicographic) |
| le_text | dest, a, b | dest = a <= b (lexicographic) |
| gt_text | dest, a, b | dest = a > b (lexicographic) |
| ge_text | dest, a, b | dest = a >= b (lexicographic) |

Comparison — Boolean

| Instruction | Operands | Description |
| --- | --- | --- |
| eq_bool | dest, a, b | dest = a == b (boolean) |
| ne_bool | dest, a, b | dest = a != b (boolean) |

Comparison — Special

| Instruction | Operands | Description |
| --- | --- | --- |
| is_identical | dest, a, b | Object identity check (same reference) |
| eq_tol | dest, a, b | Equality with tolerance |
| ne_tol | dest, a, b | Inequality with tolerance |

Type Checks

These instructions are inlined from intrinsic function calls. Each sets dest to true or false.

| Instruction | Operands | Description |
| --- | --- | --- |
| is_int | dest, src | Check if integer |
| is_num | dest, src | Check if number (integer or float) |
| is_text | dest, src | Check if text |
| is_bool | dest, src | Check if logical |
| is_null | dest, src | Check if null |
| is_array | dest, src | Check if array |
| is_func | dest, src | Check if function |
| is_record | dest, src | Check if record (object) |
| is_stone | dest, src | Check if stone (immutable) |
| is_proxy | dest, src | Check if function proxy (arity 2) |

Logical

| Instruction | Operands | Description |
| --- | --- | --- |
| not | dest, src | Logical NOT |
| and | dest, a, b | Logical AND |
| or | dest, a, b | Logical OR |

Bitwise

| Instruction | Operands | Description |
| --- | --- | --- |
| bitand | dest, a, b | Bitwise AND |
| bitor | dest, a, b | Bitwise OR |
| bitxor | dest, a, b | Bitwise XOR |
| bitnot | dest, src | Bitwise NOT |
| shl | dest, a, b | Shift left |
| shr | dest, a, b | Arithmetic shift right |
| ushr | dest, a, b | Unsigned shift right |

Property Access

Memory operations come in typed variants. The compiler selects the appropriate variant based on type_tag and access_kind annotations from parse and fold.

| Instruction | Operands | Description |
| --- | --- | --- |
| load_field | dest, obj, key | Load record property by string key |
| store_field | obj, val, key | Store record property by string key |
| load_index | dest, obj, idx | Load array element by integer index |
| store_index | obj, val, idx | Store array element by integer index |
| load_dynamic | dest, obj, key | Load property (dispatches at runtime) |
| store_dynamic | obj, val, key | Store property (dispatches at runtime) |
| delete | obj, key | Delete property |
| in | dest, obj, key | Check if property exists |
| length | dest, src | Get length of array or text |

Object and Array Construction

| Instruction | Operands | Description |
| --- | --- | --- |
| record | dest | Create empty record {} |
| array | dest, n | Create empty array (elements added via push) |
| push | arr, val | Push value to array |
| pop | dest, arr | Pop value from array |

Function Calls

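Function calls are decomposed into three instructions: frame, setarg, and invoke. The goframe and goinvoke instructions are the counterparts of frame and invoke for async/concurrent calls: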

| Instruction | Operands | Description |
| --- | --- | --- |
| frame | dest, fn, argc | Allocate call frame for fn with argc arguments |
| setarg | frame, idx, val | Set argument idx in call frame |
| invoke | frame, result | Execute the call, store result |
| goframe | dest, fn, argc | Allocate frame for async/concurrent call |
| goinvoke | frame, result | Invoke async/concurrent call |
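To make the decomposition concrete, here is a sketch of an emitter for an ordinary call. The `emit_call` helper is hypothetical. It assumes that argument index 0 carries the receiver and that explicit arguments start at index 1, which matches the pattern shown under Function Proxy Decomposition but is not stated as a general rule in this document.

```python
def emit_call(frame_slot, fn_slot, this_slot, arg_slots, result_slot):
    """Emit frame/setarg/invoke for one call (line and column omitted).

    Assumption: index 0 is the receiver slot and argc counts only the
    explicit arguments.
    """
    code = [["frame", frame_slot, fn_slot, len(arg_slots)]]
    code.append(["setarg", frame_slot, 0, this_slot])
    for index, arg in enumerate(arg_slots, start=1):
        code.append(["setarg", frame_slot, index, arg])
    code.append(["invoke", frame_slot, result_slot])
    return code
```

A two-argument call therefore expands to five instructions: one frame, three setarg (receiver plus two arguments), and one invoke.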

Variable Resolution

| Instruction | Operands | Description |
| --- | --- | --- |
| access | dest, name | Load variable (intrinsic or module environment) |
| get | dest, level, slot | Get closure variable from parent scope |
| put | level, slot, src | Set closure variable in parent scope |
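One way to picture get and put is as reads and writes against an enclosing frame. The sketch below assumes that level counts the number of parent hops (level 1 being the immediately enclosing function); the document does not specify this, and the `Frame` class is purely illustrative.

```python
class Frame:
    """Illustrative activation record with a link to its enclosing frame."""
    def __init__(self, nr_slots, parent=None):
        self.slots = [None] * nr_slots
        self.parent = parent

def _ancestor(frame, level):
    # Assumption: level is the number of parent links to follow.
    for _ in range(level):
        frame = frame.parent
    return frame

def op_get(frame, dest, level, slot):
    """get dest, level, slot: copy a parent-scope slot into a local slot."""
    frame.slots[dest] = _ancestor(frame, level).slots[slot]

def op_put(frame, level, slot, src):
    """put level, slot, src: copy a local slot into a parent-scope slot."""
    _ancestor(frame, level).slots[slot] = frame.slots[src]
```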

Control Flow

| Instruction | Operands | Description |
| --- | --- | --- |
| LABEL | name | Define a named label (not executed) |
| jump | label | Unconditional jump |
| jump_true | cond, label | Jump if cond is true |
| jump_false | cond, label | Jump if cond is false |
| jump_not_null | val, label | Jump if val is not null |
| return | src | Return value from function |
| disrupt | | Trigger disruption (error) |

Typed Instruction Design

A key design principle of mcode is that every type check is an explicit instruction. Arithmetic and comparison operations come in type-specialized variants (add_int, add_float, eq_text, etc.) rather than a single polymorphic instruction.

When type information is available from the fold stage, the compiler emits the typed variant directly. When the type is unknown, the compiler emits a type-check/dispatch pattern:

["is_int", check, a]
["jump_false", check, "float_path"]
["add_int", dest, a, b]
["jump", "done"]
["LABEL", "float_path"]
["add_float", dest, a, b]
["LABEL", "done"]

The Streamline Optimizer eliminates dead branches when types are statically known, collapsing the dispatch to a single typed instruction.
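To show how the dispatch pattern behaves at run time, the sketch below interprets just the opcodes it uses. The `run` function is a toy written for this example, not the mcode interpreter, and the slot numbers (dest = 1, a = 2, b = 3, check = 4) are arbitrary.

```python
DISPATCH = [
    ["is_int", 4, 2],
    ["jump_false", 4, "float_path"],
    ["add_int", 1, 2, 3],
    ["jump", "done"],
    ["LABEL", "float_path"],
    ["add_float", 1, 2, 3],
    ["LABEL", "done"],
]

def run(instructions, slots):
    """Interpret a tiny mcode subset; return the names of executed opcodes."""
    labels = {ins[1]: i for i, ins in enumerate(instructions) if ins[0] == "LABEL"}
    trace = []
    pc = 0
    while pc < len(instructions):
        ins = instructions[pc]
        op = ins[0]
        pc += 1
        if op == "LABEL":          # labels are markers, never executed
            continue
        trace.append(op)
        if op == "is_int":
            value = slots[ins[2]]
            slots[ins[1]] = isinstance(value, int) and not isinstance(value, bool)
        elif op == "jump_false":
            if slots[ins[1]] is False:
                pc = labels[ins[2]]
        elif op == "jump":
            pc = labels[ins[1]]
        elif op in ("add_int", "add_float"):
            slots[ins[1]] = slots[ins[2]] + slots[ins[3]]
        else:
            raise ValueError(f"unsupported opcode: {op}")
    return trace
```

With integer operands the trace passes through add_int and skips add_float; with float operands it takes the float_path branch instead.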

Intrinsic Inlining

The mcode compiler recognizes calls to built-in intrinsic functions and emits direct opcodes instead of the generic frame/setarg/invoke call sequence:

| Source call | Emitted instruction |
| --- | --- |
| is_array(x) | is_array dest, src |
| is_function(x) | is_func dest, src |
| is_object(x) | is_record dest, src |
| is_stone(x) | is_stone dest, src |
| is_integer(x) | is_int dest, src |
| is_text(x) | is_text dest, src |
| is_number(x) | is_num dest, src |
| is_logical(x) | is_bool dest, src |
| is_null(x) | is_null dest, src |
| length(x) | length dest, src |
| push(arr, val) | push arr, val |
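The table above amounts to a lookup from callee name to opcode. A minimal sketch, assuming a hypothetical `lower_call` hook that returns None when the caller should fall back to the generic frame/setarg/invoke sequence:

```python
# Callee name -> opcode, taken from the table above (push is handled separately
# because it has no dest operand).
INTRINSIC_OPCODES = {
    "is_array": "is_array",
    "is_function": "is_func",
    "is_object": "is_record",
    "is_stone": "is_stone",
    "is_integer": "is_int",
    "is_text": "is_text",
    "is_number": "is_num",
    "is_logical": "is_bool",
    "is_null": "is_null",
    "length": "length",
}

def lower_call(name, dest, arg_slots):
    """Return inlined instructions for a recognized intrinsic call, else None."""
    if name == "push" and len(arg_slots) == 2:
        return [["push", arg_slots[0], arg_slots[1]]]
    opcode = INTRINSIC_OPCODES.get(name)
    if opcode is not None and len(arg_slots) == 1:
        return [[opcode, dest, arg_slots[0]]]
    return None  # not an intrinsic: emit the generic call sequence instead
```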

Function Proxy Decomposition

When the compiler encounters a method call obj.method(args), it emits a branching pattern to handle ƿit's function proxy protocol. An arity-2 function used as a proxy target receives the method name and argument array instead of a normal method call:

["is_proxy", check, obj]
["jump_false", check, "record_path"]

["access", name_slot, "method"]
["array", args_arr, N, arg0, arg1]
["null", null_slot]
["frame", f, obj, 2]
["setarg", f, 0, null_slot]
["setarg", f, 1, name_slot]
["setarg", f, 2, args_arr]
["invoke", f, dest]
["jump", "done"]

["LABEL", "record_path"]
["load_field", method, obj, "method"]
["frame", f2, method, N]
["setarg", f2, 0, obj]
["setarg", f2, 1, arg0]
["invoke", f2, dest]

["LABEL", "done"]
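The two branches can also be read as a runtime rule. The Python model below is only an analogy: it treats a two-parameter callable as a proxy and a dict as a record, and it passes the record as the first argument on the record path to mirror the setarg at index 0. The proxy path in the mcode passes null at index 0, which this model simply omits.

```python
import inspect

def call_method(obj, name, args):
    """Model of obj.name(args) under the function proxy protocol (illustrative)."""
    if callable(obj) and len(inspect.signature(obj).parameters) == 2:
        # Proxy path: the function receives the method name and the argument array.
        return obj(name, list(args))
    # Record path: look up the method and call it with obj as the receiver.
    method = obj[name]
    return method(obj, *args)
```

For example, a proxy defined as `lambda name, args: (name, args)` sees the method name and the collected arguments, while a record such as `{"inc": lambda this, x: x + 1}` has its own method invoked.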

Labels and Control Flow

Control flow uses named labels instead of numeric offsets:

["LABEL", "loop_start"]
["add_int", 1, 1, 2]
["jump_false", 3, "loop_end"]
["jump", "loop_start"]
["LABEL", "loop_end"]

Labels are collected into a name-to-index map during loading, enabling O(1) jump resolution. The Mach serializer converts label names to numeric offsets in the binary bytecode.
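A sketch of the label-to-offset conversion, applied to the loop example above. This is not the Mach serializer: `resolve_labels` is a hypothetical helper that drops LABEL entries and rewrites jump targets to absolute instruction indexes, whereas the real binary encoding may use a different offset scheme.

```python
# Position of the label operand within each jump instruction.
JUMP_LABEL_POS = {"jump": 1, "jump_true": 2, "jump_false": 2, "jump_not_null": 2}

def resolve_labels(instructions):
    """Strip LABEL entries and replace label names with instruction indexes."""
    offsets = {}
    kept = []
    for ins in instructions:
        if ins[0] == "LABEL":
            offsets[ins[1]] = len(kept)  # index of the next real instruction
        else:
            kept.append(ins)
    resolved = []
    for ins in kept:
        pos = JUMP_LABEL_POS.get(ins[0])
        if pos is not None:
            ins = ins[:pos] + [offsets[ins[pos]]] + ins[pos + 1:]
        resolved.append(ins)
    return resolved

LOOP = [
    ["LABEL", "loop_start"],
    ["add_int", 1, 1, 2],
    ["jump_false", 3, "loop_end"],
    ["jump", "loop_start"],
    ["LABEL", "loop_end"],
]
```

Here loop_start resolves to index 0 and loop_end to index 3, one past the last remaining instruction.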

Nop Convention

The streamline optimizer replaces eliminated instructions with nop strings (e.g., _nop_tc_1, _nop_bl_2). Nop strings are skipped during interpretation and native code emission but preserved in the instruction array to maintain positional stability for jump targets.
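The convention can be sketched as follows. The helpers are hypothetical, and the sketch assumes that a nop is any string entry beginning with `_nop_`, as in the two examples given above.

```python
def is_nop(entry):
    """Assumption: nops are string entries whose name starts with '_nop_'."""
    return isinstance(entry, str) and entry.startswith("_nop_")

def eliminate(instructions, index, tag):
    """Replace one instruction with a nop string, keeping the array length."""
    instructions[index] = f"_nop_{tag}"

def executable(instructions):
    """Yield (index, instruction) pairs, skipping nop strings."""
    return [(i, ins) for i, ins in enumerate(instructions) if not is_nop(ins)]
```

Because eliminated entries are overwritten in place rather than deleted, every surviving instruction keeps its original index.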

Internal Structures

JSMCode (Mcode Interpreter)

struct JSMCode {
  uint16_t nr_args;        // argument count
  uint16_t nr_slots;       // register count
  cJSON **instrs;          // instruction array
  uint32_t instr_count;    // number of instructions

  struct {
    const char *name;      // label name
    uint32_t index;        // instruction index
  } *labels;
  uint32_t label_count;

  struct JSMCode **functions; // nested functions
  uint32_t func_count;

  cJSON *json_root;        // keeps JSON alive
  const char *name;        // function name
  const char *filename;    // source file
  uint16_t disruption_pc;  // disruption handler offset
};

JSCodeRegister (Mach VM Bytecode)

struct JSCodeRegister {
  uint16_t arity;           // argument count
  uint16_t nr_slots;        // total register count
  uint32_t cpool_count;     // constant pool size
  JSValue *cpool;           // constant pool
  uint32_t instr_count;     // instruction count
  MachInstr32 *instructions; // 32-bit instruction array
  uint32_t func_count;      // nested function count
  struct JSCodeRegister **functions; // nested function table
  JSValue name;             // function name
  uint16_t disruption_pc;   // disruption handler offset
};

The Mach serializer (mach.c) converts the JSON mcode into compact 32-bit instructions with a constant pool. See Register VM for the binary encoding formats.