| title | description |
|---|---|
| Mcode IR | JSON-based intermediate representation |
Overview
Mcode is a JSON-based intermediate representation that can be interpreted directly. It represents the same operations as the Mach register VM but uses string-based instruction dispatch rather than binary opcodes. Mcode is intended as an intermediate step toward native code compilation.
Pipeline
Source → Tokenize → Parse (AST) → Fold → Mcode (JSON) → Streamline → Interpret
                                                                   → QBE → Native
Mcode is produced by mcode.cm, which lowers the folded AST to JSON instruction arrays. The streamline optimizer (streamline.cm) then eliminates redundant operations. The result can be interpreted by mcode.c, or lowered to QBE IL by qbe_emit.cm for native compilation. See Compilation Pipeline for the full overview.
JSMCode Structure
struct JSMCode {
    uint16_t nr_args;            // argument count
    uint16_t nr_slots;           // register count
    cJSON **instrs;              // pre-flattened instruction array
    uint32_t instr_count;        // number of instructions
    struct {
        const char *name;        // label name
        uint32_t index;          // instruction index
    } *labels;
    uint32_t label_count;
    struct JSMCode **functions;  // nested functions
    uint32_t func_count;
    cJSON *json_root;            // keeps JSON alive
    const char *name;            // function name
    const char *filename;        // source file
    uint16_t disruption_pc;      // exception handler offset
};
Instruction Format
Each instruction is a JSON array. The first element is the instruction name (a string); the remaining elements are its operands, typically followed by the source line and column, giving the shape [op, dest, ...args, line, col]:
["access", 3, 5, 1, 9]
["load_index", 10, 4, 9, 5, 11]
["store_dynamic", 4, 11, 12, 6, 3]
["frame", 15, 14, 1, 7, 7]
["setarg", 15, 0, 16, 7, 7]
["invoke", 15, 13, 7, 7]
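The layout above can be illustrated with a small C sketch. It assumes that every instruction ends in a line/col pair, which the text only describes as typical. `McodeInstr` and `mcode_split` are hypothetical names, and a plain `int` array stands in for the cJSON nodes the real loader holds.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flattened view of one instruction such as
 * ["load_index", 10, 4, 9, 5, 11]. Not the real loader's type. */
typedef struct {
    const char *op;            /* element 0: instruction name */
    const int  *operands;      /* elements between the name and the position */
    size_t      operand_count;
    int         line;          /* second-to-last element */
    int         col;           /* last element */
} McodeInstr;

/* Split the numeric tail of an instruction (everything after the name)
 * into operands and the trailing line/col pair.
 * Assumption: the tail always ends with line and col. */
static McodeInstr mcode_split(const char *op, const int *tail, size_t tail_len)
{
    assert(tail_len >= 2); /* need at least line and col */
    McodeInstr in;
    in.op = op;
    in.operands = tail;
    in.operand_count = tail_len - 2;
    in.line = tail[tail_len - 2];
    in.col = tail[tail_len - 1];
    return in;
}
```

Applied to `["load_index", 10, 4, 9, 5, 11]`, this yields three operands (dest 10, obj 4, idx 9) at line 5, column 11.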
Typed Load/Store
Memory operations come in typed variants for optimization:
- load_index dest, obj, idx: array element by integer index
- load_field dest, obj, key: record property by string key
- load_dynamic dest, obj, key: unknown; dispatches at runtime
- store_index obj, val, idx: array element store
- store_field obj, val, key: record property store
- store_dynamic obj, val, key: unknown; dispatches at runtime
The compiler selects the appropriate variant based on type_tag and access_kind annotations from parse and fold.
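The selection step might look roughly like the following C sketch. The annotation values `"index"` and `"field"` are assumptions made for illustration, since the actual encoding of access_kind is not specified here. `select_load`, `select_store`, and `emit_load` are hypothetical helpers, and the emitted text omits the trailing line and column.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Map a (hypothetical) access_kind annotation to a load instruction name.
 * A missing or unrecognized annotation falls back to the dynamic variant. */
static const char *select_load(const char *access_kind)
{
    if (access_kind && strcmp(access_kind, "index") == 0) return "load_index";
    if (access_kind && strcmp(access_kind, "field") == 0) return "load_field";
    return "load_dynamic";
}

/* Same mapping for stores. */
static const char *select_store(const char *access_kind)
{
    if (access_kind && strcmp(access_kind, "index") == 0) return "store_index";
    if (access_kind && strcmp(access_kind, "field") == 0) return "store_field";
    return "store_dynamic";
}

/* Format a load instruction as JSON text (line/col left out for brevity).
 * Returns the value of snprintf. */
static int emit_load(char *buf, size_t n, const char *access_kind,
                     int dest, int obj, int key)
{
    assert(buf != NULL && n > 0);
    return snprintf(buf, n, "[\"%s\", %d, %d, %d]",
                    select_load(access_kind), dest, obj, key);
}
```

The point of the typed variants is that the interpreter or native backend can skip the runtime check whenever the front end already knows the access kind.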
Decomposed Calls
Function calls are split into separate instructions:
- frame dest, fn, argc: allocate call frame
- setarg frame, idx, val: set argument
- invoke frame, result: execute the call
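A toy C model of the three steps is sketched below. Everything here is a simplification: `Slot`, `Frame`, and the `op_*` helpers are hypothetical, values are plain `long`s, and callees are C function pointers rather than nested JSMCode functions.

```c
#include <assert.h>
#include <stdlib.h>

typedef long (*NativeFn)(const long *args, int argc);

/* Hypothetical call frame: callee plus argument storage. */
typedef struct {
    NativeFn fn;
    int      argc;
    long    *args;
} Frame;

/* Toy register slot that can hold a number, a function, or a frame. */
typedef struct {
    long     num;
    NativeFn fn;
    Frame   *frame;
} Slot;

/* frame dest, fn, argc: allocate a frame for the callee in register fn. */
static void op_frame(Slot *r, int dest, int fn, int argc)
{
    Frame *f = malloc(sizeof *f);
    assert(f != NULL);
    f->fn = r[fn].fn;
    f->argc = argc;
    f->args = calloc(argc > 0 ? (size_t)argc : 1, sizeof *f->args);
    assert(f->args != NULL);
    r[dest].frame = f;
}

/* setarg frame, idx, val: copy register val into argument idx. */
static void op_setarg(Slot *r, int frame, int idx, int val)
{
    assert(idx >= 0 && idx < r[frame].frame->argc);
    r[frame].frame->args[idx] = r[val].num;
}

/* invoke frame, result: run the callee, store its result, free the frame. */
static void op_invoke(Slot *r, int frame, int result)
{
    Frame *f = r[frame].frame;
    r[result].num = f->fn(f->args, f->argc);
    free(f->args);
    free(f);
    r[frame].frame = NULL;
}

/* Example callee: doubles its first argument. */
static long twice(const long *args, int argc)
{
    return argc > 0 ? 2 * args[0] : 0;
}
```

With `twice` in register 14 and the argument in register 16, the sequence `op_frame(r, 15, 14, 1)`, `op_setarg(r, 15, 0, 16)`, `op_invoke(r, 15, 13)` mirrors the frame/setarg/invoke example in the Instruction Format section and leaves the result in register 13.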
Labels
Control flow uses named labels instead of numeric offsets:
["LABEL", "loop_start"]
["ADD", 1, 1, 2]
["JMPFALSE", 3, "loop_end"]
["JMP", "loop_start"]
["LABEL", "loop_end"]
Labels are collected into a name-to-index map during loading, enabling O(1) jump resolution.
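One way to get constant-time lookups is a small hash table, sketched here in C. This is an illustration, not the loader's actual data structure: `LabelMap`, `label_put`, and `label_get` are hypothetical, the capacity is fixed, and the sketch assumes a label's index is the position of its LABEL instruction.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define LABEL_CAP 64 /* power of two; hypothetical fixed capacity */

typedef struct {
    const char *name;  /* NULL marks an empty bucket */
    uint32_t    index; /* instruction index the label refers to */
} LabelEntry;

typedef struct {
    LabelEntry buckets[LABEL_CAP];
    uint32_t   count;
} LabelMap;

/* FNV-1a string hash. */
static uint32_t hash_name(const char *s)
{
    uint32_t h = 2166136261u;
    for (; *s; s++) {
        h ^= (unsigned char)*s;
        h *= 16777619u;
    }
    return h;
}

/* Insert or update a label; linear probing resolves collisions. */
static void label_put(LabelMap *m, const char *name, uint32_t index)
{
    assert(m->count < LABEL_CAP / 2); /* keep the table sparse */
    uint32_t i = hash_name(name) & (LABEL_CAP - 1);
    while (m->buckets[i].name && strcmp(m->buckets[i].name, name) != 0)
        i = (i + 1) & (LABEL_CAP - 1);
    if (!m->buckets[i].name)
        m->count++;
    m->buckets[i].name = name;
    m->buckets[i].index = index;
}

/* Look up a label: returns 1 and writes *index on success, 0 otherwise. */
static int label_get(const LabelMap *m, const char *name, uint32_t *index)
{
    uint32_t i = hash_name(name) & (LABEL_CAP - 1);
    while (m->buckets[i].name) {
        if (strcmp(m->buckets[i].name, name) == 0) {
            *index = m->buckets[i].index;
            return 1;
        }
        i = (i + 1) & (LABEL_CAP - 1);
    }
    return 0;
}
```

For the loop example above, a loader would record `loop_start` at index 0 and `loop_end` at index 4 in one pass, after which each jump resolves its target with a single expected-constant-time lookup.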
Differences from Mach
| Property | Mcode | Mach |
|---|---|---|
| Instructions | cJSON arrays | 32-bit binary |
| Dispatch | String comparison | Switch on opcode byte |
| Constants | Inline in JSON | Separate constant pool |
| Jump targets | Named labels | Numeric offsets |
| Memory | Heap (cJSON nodes) | Off-heap (malloc) |
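The dispatch row can be made concrete with a short C sketch that contrasts the two styles. The instruction names (`"add"`, `"sub"`, `"mul"`) and the opcode numbering are placeholders chosen for illustration and are not taken from either VM.

```c
#include <assert.h>
#include <string.h>

/* Mcode-style dispatch: compare the instruction name against known strings.
 * Each executed instruction may cost several strcmp calls. */
static long exec_by_name(const char *op, long a, long b)
{
    if (strcmp(op, "add") == 0) return a + b;
    if (strcmp(op, "sub") == 0) return a - b;
    if (strcmp(op, "mul") == 0) return a * b;
    assert(0 && "unknown instruction name");
    return 0;
}

/* Hypothetical opcode numbering for the binary form. */
enum { OP_ADD, OP_SUB, OP_MUL };

/* Mach-style dispatch: switch on an opcode byte, which a compiler can
 * typically lower to a jump table. */
static long exec_by_opcode(unsigned char opcode, long a, long b)
{
    switch (opcode) {
    case OP_ADD: return a + b;
    case OP_SUB: return a - b;
    case OP_MUL: return a * b;
    default:
        assert(0 && "unknown opcode");
        return 0;
    }
}
```

Both functions compute the same results; the difference lies only in how the handler is found.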
Purpose
Mcode serves as an inspectable, debuggable intermediate format:
- Human-readable — the JSON representation can be printed and examined
- Language-independent — any tool that produces the correct JSON can target the ƿit runtime
- Compilation target — the Mach compiler can consume mcode as input, and future native code generators can work from the same representation
The cost of string-based dispatch makes mcode slower than the binary Mach VM, so it is primarily useful during development and as a compilation intermediate rather than for production execution.