--- title: "Mcode IR" description: "JSON-based intermediate representation" --- ## Overview Mcode is a JSON-based intermediate representation that can be interpreted directly. It represents the same operations as the Mach register VM but uses string-based instruction dispatch rather than binary opcodes. Mcode is intended as an intermediate step toward native code compilation. ## Pipeline ``` Source → Tokenize → Parse (AST) → Fold → Mcode (JSON) → Streamline → Interpret → QBE → Native ``` Mcode is produced by `mcode.cm`, which lowers the folded AST to JSON instruction arrays. The streamline optimizer (`streamline.cm`) then eliminates redundant operations. The result can be interpreted by `mcode.c`, or lowered to QBE IL by `qbe_emit.cm` for native compilation. See [Compilation Pipeline](pipeline.md) for the full overview. ## JSMCode Structure ```c struct JSMCode { uint16_t nr_args; // argument count uint16_t nr_slots; // register count cJSON **instrs; // pre-flattened instruction array uint32_t instr_count; // number of instructions struct { const char *name; // label name uint32_t index; // instruction index } *labels; uint32_t label_count; struct JSMCode **functions; // nested functions uint32_t func_count; cJSON *json_root; // keeps JSON alive const char *name; // function name const char *filename; // source file uint16_t disruption_pc; // exception handler offset }; ``` ## Instruction Format Each instruction is a JSON array. The first element is the instruction name (string), followed by operands (typically `[op, dest, ...args, line, col]`): ```json ["access", 3, 5, 1, 9] ["load_index", 10, 4, 9, 5, 11] ["store_dynamic", 4, 11, 12, 6, 3] ["frame", 15, 14, 1, 7, 7] ["setarg", 15, 0, 16, 7, 7] ["invoke", 15, 13, 7, 7] ``` ### Typed Load/Store Memory operations come in typed variants for optimization: - `load_index dest, obj, idx` — array element by integer index - `load_field dest, obj, key` — record property by string key - `load_dynamic dest, obj, key` — unknown; dispatches at runtime - `store_index obj, val, idx` — array element store - `store_field obj, val, key` — record property store - `store_dynamic obj, val, key` — unknown; dispatches at runtime The compiler selects the appropriate variant based on `type_tag` and `access_kind` annotations from parse and fold. ### Decomposed Calls Function calls are split into separate instructions: - `frame dest, fn, argc` — allocate call frame - `setarg frame, idx, val` — set argument - `invoke frame, result` — execute the call ## Labels Control flow uses named labels instead of numeric offsets: ```json ["LABEL", "loop_start"] ["ADD", 1, 1, 2] ["JMPFALSE", 3, "loop_end"] ["JMP", "loop_start"] ["LABEL", "loop_end"] ``` Labels are collected into a name-to-index map during loading, enabling O(1) jump resolution. ## Differences from Mach | Property | Mcode | Mach | |----------|-------|------| | Instructions | cJSON arrays | 32-bit binary | | Dispatch | String comparison | Switch on opcode byte | | Constants | Inline in JSON | Separate constant pool | | Jump targets | Named labels | Numeric offsets | | Memory | Heap (cJSON nodes) | Off-heap (malloc) | ## Purpose Mcode serves as an inspectable, debuggable intermediate format: - **Human-readable** — the JSON representation can be printed and examined - **Language-independent** — any tool that produces the correct JSON can target the ƿit runtime - **Compilation target** — the Mach compiler can consume mcode as input, and future native code generators can work from the same representation The cost of string-based dispatch makes mcode slower than the binary Mach VM, so it is primarily useful during development and as a compilation intermediate rather than for production execution.