Files
cell/docs/spec/mcode.md
2026-02-10 16:37:11 -06:00

3.9 KiB

title, description
title description
Mcode IR JSON-based intermediate representation

Overview

Mcode is a JSON-based intermediate representation that can be interpreted directly. It represents the same operations as the Mach register VM but uses string-based instruction dispatch rather than binary opcodes. Mcode is intended as an intermediate step toward native code compilation.

Pipeline

Source → Tokenize → Parse (AST) → Fold → Mcode (JSON) → Streamline → Interpret
                                                                    → QBE → Native

Mcode is produced by mcode.cm, which lowers the folded AST to JSON instruction arrays. The streamline optimizer (streamline.cm) then eliminates redundant operations. The result can be interpreted by mcode.c, or lowered to QBE IL by qbe_emit.cm for native compilation. See Compilation Pipeline for the full overview.

JSMCode Structure

struct JSMCode {
  uint16_t nr_args;        // argument count
  uint16_t nr_slots;       // register count
  cJSON **instrs;          // pre-flattened instruction array
  uint32_t instr_count;    // number of instructions

  struct {
    const char *name;      // label name
    uint32_t index;        // instruction index
  } *labels;
  uint32_t label_count;

  struct JSMCode **functions; // nested functions
  uint32_t func_count;

  cJSON *json_root;        // keeps JSON alive
  const char *name;        // function name
  const char *filename;    // source file
  uint16_t disruption_pc;  // exception handler offset
};

Instruction Format

Each instruction is a JSON array. The first element is the instruction name (string), followed by operands (typically [op, dest, ...args, line, col]):

["access", 3, 5, 1, 9]
["load_index", 10, 4, 9, 5, 11]
["store_dynamic", 4, 11, 12, 6, 3]
["frame", 15, 14, 1, 7, 7]
["setarg", 15, 0, 16, 7, 7]
["invoke", 15, 13, 7, 7]

Typed Load/Store

Memory operations come in typed variants for optimization:

  • load_index dest, obj, idx — array element by integer index
  • load_field dest, obj, key — record property by string key
  • load_dynamic dest, obj, key — unknown; dispatches at runtime
  • store_index obj, val, idx — array element store
  • store_field obj, val, key — record property store
  • store_dynamic obj, val, key — unknown; dispatches at runtime

The compiler selects the appropriate variant based on type_tag and access_kind annotations from parse and fold.

Decomposed Calls

Function calls are split into separate instructions:

  • frame dest, fn, argc — allocate call frame
  • setarg frame, idx, val — set argument
  • invoke frame, result — execute the call

Labels

Control flow uses named labels instead of numeric offsets:

["LABEL", "loop_start"]
["ADD", 1, 1, 2]
["JMPFALSE", 3, "loop_end"]
["JMP", "loop_start"]
["LABEL", "loop_end"]

Labels are collected into a name-to-index map during loading, enabling O(1) jump resolution.

Differences from Mach

Property Mcode Mach
Instructions cJSON arrays 32-bit binary
Dispatch String comparison Switch on opcode byte
Constants Inline in JSON Separate constant pool
Jump targets Named labels Numeric offsets
Memory Heap (cJSON nodes) Off-heap (malloc)

Purpose

Mcode serves as an inspectable, debuggable intermediate format:

  • Human-readable — the JSON representation can be printed and examined
  • Language-independent — any tool that produces the correct JSON can target the ƿit runtime
  • Compilation target — the Mach compiler can consume mcode as input, and future native code generators can work from the same representation

The cost of string-based dispatch makes mcode slower than the binary Mach VM, so it is primarily useful during development and as a compilation intermediate rather than for production execution.