update docs

This commit is contained in:
2026-02-12 16:34:45 -06:00
parent 0ba2783b48
commit 7b46c6e947
5 changed files with 324 additions and 209 deletions

View File

@@ -1,23 +1,260 @@
---
title: "Mcode IR"
description: "JSON-based intermediate representation"
description: "Instruction set reference for the JSON-based intermediate representation"
---
## Overview
Mcode is a JSON-based intermediate representation that can be interpreted directly. It represents the same operations as the Mach register VM but uses string-based instruction dispatch rather than binary opcodes. Mcode is intended as an intermediate step toward native code compilation.
## Pipeline
Mcode is the intermediate representation at the center of the ƿit compilation pipeline. All source code is lowered to mcode before execution or native compilation. The mcode instruction set is the **authoritative reference** for the operations supported by the ƿit runtime — the Mach VM bytecode is a direct binary encoding of these same instructions.
```
Source → Tokenize → Parse (AST) → Fold → Mcode (JSON) → Streamline → Mach VM (default)
→ Mcode Interpreter
→ QBE → Native
Source → Tokenize → Parse → Fold → Mcode → Streamline → Machine
```
Mcode is produced by `mcode.cm`, which lowers the folded AST to JSON instruction arrays. The streamline optimizer (`streamline.cm`) then eliminates redundant operations. The result is serialized to binary bytecode by the Mach compiler (`mach.c`), interpreted directly by `mcode.c`, or lowered to QBE IL by `qbe_emit.cm` for native compilation. See [Compilation Pipeline](pipeline.md) for the full overview.
Mcode is produced by `mcode.cm`, optimized by `streamline.cm`, then either serialized to 32-bit bytecode for the Mach VM (`mach.c`), or lowered to QBE/LLVM IL for native compilation (`qbe_emit.cm`). See [Compilation Pipeline](pipeline.md) for the full overview.
### Function Proxy Decomposition
## Instruction Format
Each instruction is a JSON array. The first element is the instruction name (string), followed by operands. The last two elements are line and column numbers for source mapping:
```json
["add_int", dest, a, b, line, col]
["load_field", dest, obj, "key", line, col]
["jump", "label_name"]
```
Operands are register slot numbers (integers), constant values (strings, numbers), or label names (strings).
## Instruction Reference
### Loading and Constants
| Instruction | Operands | Description |
|-------------|----------|-------------|
| `access` | `dest, name` | Load variable by name (intrinsic or environment) |
| `int` | `dest, value` | Load integer constant |
| `true` | `dest` | Load boolean `true` |
| `false` | `dest` | Load boolean `false` |
| `null` | `dest` | Load `null` |
| `move` | `dest, src` | Copy register value |
| `function` | `dest, id` | Load nested function by index |
| `regexp` | `dest, pattern` | Create regexp object |
### Arithmetic — Integer
| Instruction | Operands | Description |
|-------------|----------|-------------|
| `add_int` | `dest, a, b` | `dest = a + b` (integer) |
| `sub_int` | `dest, a, b` | `dest = a - b` (integer) |
| `mul_int` | `dest, a, b` | `dest = a * b` (integer) |
| `div_int` | `dest, a, b` | `dest = a / b` (integer) |
| `mod_int` | `dest, a, b` | `dest = a % b` (integer) |
| `neg_int` | `dest, src` | `dest = -src` (integer) |
### Arithmetic — Float
| Instruction | Operands | Description |
|-------------|----------|-------------|
| `add_float` | `dest, a, b` | `dest = a + b` (float) |
| `sub_float` | `dest, a, b` | `dest = a - b` (float) |
| `mul_float` | `dest, a, b` | `dest = a * b` (float) |
| `div_float` | `dest, a, b` | `dest = a / b` (float) |
| `mod_float` | `dest, a, b` | `dest = a % b` (float) |
| `neg_float` | `dest, src` | `dest = -src` (float) |
### Arithmetic — Generic
| Instruction | Operands | Description |
|-------------|----------|-------------|
| `pow` | `dest, a, b` | `dest = a ^ b` (exponentiation) |
### Text
| Instruction | Operands | Description |
|-------------|----------|-------------|
| `concat` | `dest, a, b` | `dest = a ~ b` (text concatenation) |
### Comparison — Integer
| Instruction | Operands | Description |
|-------------|----------|-------------|
| `eq_int` | `dest, a, b` | `dest = a == b` (integer) |
| `ne_int` | `dest, a, b` | `dest = a != b` (integer) |
| `lt_int` | `dest, a, b` | `dest = a < b` (integer) |
| `le_int` | `dest, a, b` | `dest = a <= b` (integer) |
| `gt_int` | `dest, a, b` | `dest = a > b` (integer) |
| `ge_int` | `dest, a, b` | `dest = a >= b` (integer) |
### Comparison — Float
| Instruction | Operands | Description |
|-------------|----------|-------------|
| `eq_float` | `dest, a, b` | `dest = a == b` (float) |
| `ne_float` | `dest, a, b` | `dest = a != b` (float) |
| `lt_float` | `dest, a, b` | `dest = a < b` (float) |
| `le_float` | `dest, a, b` | `dest = a <= b` (float) |
| `gt_float` | `dest, a, b` | `dest = a > b` (float) |
| `ge_float` | `dest, a, b` | `dest = a >= b` (float) |
### Comparison — Text
| Instruction | Operands | Description |
|-------------|----------|-------------|
| `eq_text` | `dest, a, b` | `dest = a == b` (text) |
| `ne_text` | `dest, a, b` | `dest = a != b` (text) |
| `lt_text` | `dest, a, b` | `dest = a < b` (lexicographic) |
| `le_text` | `dest, a, b` | `dest = a <= b` (lexicographic) |
| `gt_text` | `dest, a, b` | `dest = a > b` (lexicographic) |
| `ge_text` | `dest, a, b` | `dest = a >= b` (lexicographic) |
### Comparison — Boolean
| Instruction | Operands | Description |
|-------------|----------|-------------|
| `eq_bool` | `dest, a, b` | `dest = a == b` (boolean) |
| `ne_bool` | `dest, a, b` | `dest = a != b` (boolean) |
### Comparison — Special
| Instruction | Operands | Description |
|-------------|----------|-------------|
| `is_identical` | `dest, a, b` | Object identity check (same reference) |
| `eq_tol` | `dest, a, b` | Equality with tolerance |
| `ne_tol` | `dest, a, b` | Inequality with tolerance |
### Type Checks
Inlined from intrinsic function calls. Each sets `dest` to `true` or `false`.
| Instruction | Operands | Description |
|-------------|----------|-------------|
| `is_int` | `dest, src` | Check if integer |
| `is_num` | `dest, src` | Check if number (integer or float) |
| `is_text` | `dest, src` | Check if text |
| `is_bool` | `dest, src` | Check if logical |
| `is_null` | `dest, src` | Check if null |
| `is_array` | `dest, src` | Check if array |
| `is_func` | `dest, src` | Check if function |
| `is_record` | `dest, src` | Check if record (object) |
| `is_stone` | `dest, src` | Check if stone (immutable) |
| `is_proxy` | `dest, src` | Check if function proxy (arity 2) |
### Logical
| Instruction | Operands | Description |
|-------------|----------|-------------|
| `not` | `dest, src` | Logical NOT |
| `and` | `dest, a, b` | Logical AND |
| `or` | `dest, a, b` | Logical OR |
### Bitwise
| Instruction | Operands | Description |
|-------------|----------|-------------|
| `bitand` | `dest, a, b` | Bitwise AND |
| `bitor` | `dest, a, b` | Bitwise OR |
| `bitxor` | `dest, a, b` | Bitwise XOR |
| `bitnot` | `dest, src` | Bitwise NOT |
| `shl` | `dest, a, b` | Shift left |
| `shr` | `dest, a, b` | Arithmetic shift right |
| `ushr` | `dest, a, b` | Unsigned shift right |
### Property Access
Memory operations come in typed variants. The compiler selects the appropriate variant based on `type_tag` and `access_kind` annotations from parse and fold.
| Instruction | Operands | Description |
|-------------|----------|-------------|
| `load_field` | `dest, obj, key` | Load record property by string key |
| `store_field` | `obj, val, key` | Store record property by string key |
| `load_index` | `dest, obj, idx` | Load array element by integer index |
| `store_index` | `obj, val, idx` | Store array element by integer index |
| `load_dynamic` | `dest, obj, key` | Load property (dispatches at runtime) |
| `store_dynamic` | `obj, val, key` | Store property (dispatches at runtime) |
| `delete` | `obj, key` | Delete property |
| `in` | `dest, obj, key` | Check if property exists |
| `typeof` | `dest, src` | Get type name as text |
| `length` | `dest, src` | Get length of array or text |
### Object and Array Construction
| Instruction | Operands | Description |
|-------------|----------|-------------|
| `record` | `dest` | Create empty record `{}` |
| `array` | `dest, n, ...elems` | Create array with `n` elements |
| `push` | `arr, val` | Push value to array |
| `pop` | `dest, arr` | Pop value from array |
### Function Calls
Function calls are decomposed into three instructions:
| Instruction | Operands | Description |
|-------------|----------|-------------|
| `frame` | `dest, fn, argc` | Allocate call frame for `fn` with `argc` arguments |
| `setarg` | `frame, idx, val` | Set argument `idx` in call frame |
| `invoke` | `frame, result` | Execute the call, store result |
| `goframe` | `dest, fn, argc` | Allocate frame for async/concurrent call |
| `goinvoke` | `frame, result` | Invoke async/concurrent call |
### Variable Resolution
| Instruction | Operands | Description |
|-------------|----------|-------------|
| `access` | `dest, name` | Load variable (intrinsic or module environment) |
| `set_var` | `name, src` | Set top-level variable by name |
| `get` | `dest, level, slot` | Get closure variable from parent scope |
| `put` | `level, slot, src` | Set closure variable in parent scope |
### Control Flow
| Instruction | Operands | Description |
|-------------|----------|-------------|
| `LABEL` | `name` | Define a named label (not executed) |
| `jump` | `label` | Unconditional jump |
| `jump_true` | `cond, label` | Jump if `cond` is true |
| `jump_false` | `cond, label` | Jump if `cond` is false |
| `jump_not_null` | `val, label` | Jump if `val` is not null |
| `return` | `src` | Return value from function |
| `disrupt` | — | Trigger disruption (error) |
## Typed Instruction Design
A key design principle of mcode is that **every type check is an explicit instruction**. Arithmetic and comparison operations come in type-specialized variants (`add_int`, `add_float`, `eq_text`, etc.) rather than a single polymorphic instruction.
When type information is available from the fold stage, the compiler emits the typed variant directly. When the type is unknown, the compiler emits a type-check/dispatch pattern:
```json
["is_int", check, a]
["jump_false", check, "float_path"]
["add_int", dest, a, b]
["jump", "done"]
["LABEL", "float_path"]
["add_float", dest, a, b]
["LABEL", "done"]
```
The [Streamline Optimizer](streamline.md) eliminates dead branches when types are statically known, collapsing the dispatch to a single typed instruction.
## Intrinsic Inlining
The mcode compiler recognizes calls to built-in intrinsic functions and emits direct opcodes instead of the generic frame/setarg/invoke call sequence:
| Source call | Emitted instruction |
|-------------|-------------------|
| `is_array(x)` | `is_array dest, src` |
| `is_function(x)` | `is_func dest, src` |
| `is_object(x)` | `is_record dest, src` |
| `is_stone(x)` | `is_stone dest, src` |
| `is_integer(x)` | `is_int dest, src` |
| `is_text(x)` | `is_text dest, src` |
| `is_number(x)` | `is_num dest, src` |
| `is_logical(x)` | `is_bool dest, src` |
| `is_null(x)` | `is_null dest, src` |
| `length(x)` | `length dest, src` |
| `push(arr, val)` | `push arr, val` |
## Function Proxy Decomposition
When the compiler encounters a method call `obj.method(args)`, it emits a branching pattern to handle ƿit's function proxy protocol. An arity-2 function used as a proxy target receives the method name and argument array instead of a normal method call:
@@ -25,9 +262,8 @@ When the compiler encounters a method call `obj.method(args)`, it emits a branch
["is_proxy", check, obj]
["jump_false", check, "record_path"]
// Proxy path: call obj(name, [args...]) with this=null
["access", name_slot, "method"]
["array", args_arr, N, arg0, arg1, ...]
["array", args_arr, N, arg0, arg1]
["null", null_slot]
["frame", f, obj, 2]
["setarg", f, 0, null_slot]
@@ -41,21 +277,38 @@ When the compiler encounters a method call `obj.method(args)`, it emits a branch
["frame", f2, method, N]
["setarg", f2, 0, obj]
["setarg", f2, 1, arg0]
...
["invoke", f2, dest]
["LABEL", "done"]
```
The streamline optimizer can eliminate the dead branch when the type of `obj` is statically known.
## Labels and Control Flow
## JSMCode Structure
Control flow uses named labels instead of numeric offsets:
```json
["LABEL", "loop_start"]
["add_int", 1, 1, 2]
["jump_false", 3, "loop_end"]
["jump", "loop_start"]
["LABEL", "loop_end"]
```
Labels are collected into a name-to-index map during loading, enabling O(1) jump resolution. The Mach serializer converts label names to numeric offsets in the binary bytecode.
## Nop Convention
The streamline optimizer replaces eliminated instructions with nop strings (e.g., `_nop_tc_1`, `_nop_bl_2`). Nop strings are skipped during interpretation and native code emission but preserved in the instruction array to maintain positional stability for jump targets.
## Internal Structures
### JSMCode (Mcode Interpreter)
```c
struct JSMCode {
uint16_t nr_args; // argument count
uint16_t nr_slots; // register count
cJSON **instrs; // pre-flattened instruction array
cJSON **instrs; // instruction array
uint32_t instr_count; // number of instructions
struct {
@@ -70,74 +323,25 @@ struct JSMCode {
cJSON *json_root; // keeps JSON alive
const char *name; // function name
const char *filename; // source file
uint16_t disruption_pc; // exception handler offset
uint16_t disruption_pc; // disruption handler offset
};
```
## Instruction Format
### JSCodeRegister (Mach VM Bytecode)
Each instruction is a JSON array. The first element is the instruction name (string), followed by operands (typically `[op, dest, ...args, line, col]`):
```json
["access", 3, 5, 1, 9]
["load_index", 10, 4, 9, 5, 11]
["store_dynamic", 4, 11, 12, 6, 3]
["frame", 15, 14, 1, 7, 7]
["setarg", 15, 0, 16, 7, 7]
["invoke", 15, 13, 7, 7]
```c
struct JSCodeRegister {
uint16_t arity; // argument count
uint16_t nr_slots; // total register count
uint32_t cpool_count; // constant pool size
JSValue *cpool; // constant pool
uint32_t instr_count; // instruction count
MachInstr32 *instructions; // 32-bit instruction array
uint32_t func_count; // nested function count
JSCodeRegister **functions; // nested function table
JSValue name; // function name
uint16_t disruption_pc; // disruption handler offset
};
```
### Typed Load/Store
Memory operations come in typed variants for optimization:
- `load_index dest, obj, idx` — array element by integer index
- `load_field dest, obj, key` — record property by string key
- `load_dynamic dest, obj, key` — unknown; dispatches at runtime
- `store_index obj, val, idx` — array element store
- `store_field obj, val, key` — record property store
- `store_dynamic obj, val, key` — unknown; dispatches at runtime
The compiler selects the appropriate variant based on `type_tag` and `access_kind` annotations from parse and fold.
### Decomposed Calls
Function calls are split into separate instructions:
- `frame dest, fn, argc` — allocate call frame
- `setarg frame, idx, val` — set argument
- `invoke frame, result` — execute the call
## Labels
Control flow uses named labels instead of numeric offsets:
```json
["LABEL", "loop_start"]
["ADD", 1, 1, 2]
["JMPFALSE", 3, "loop_end"]
["JMP", "loop_start"]
["LABEL", "loop_end"]
```
Labels are collected into a name-to-index map during loading, enabling O(1) jump resolution.
## Differences from Mach
| Property | Mcode | Mach |
|----------|-------|------|
| Instructions | cJSON arrays | 32-bit binary |
| Dispatch | String comparison | Switch on opcode byte |
| Constants | Inline in JSON | Separate constant pool |
| Jump targets | Named labels | Numeric offsets |
| Memory | Heap (cJSON nodes) | Off-heap (malloc) |
## Purpose
Mcode serves as an inspectable, debuggable intermediate format:
- **Human-readable** — the JSON representation can be printed and examined
- **Language-independent** — any tool that produces the correct JSON can target the ƿit runtime
- **Compilation target** — the Mach compiler can consume mcode as input, and future native code generators can work from the same representation
The cost of string-based dispatch makes mcode slower than the binary Mach VM, so it is primarily useful during development and as a compilation intermediate rather than for production execution.
The Mach serializer (`mach.c`) converts the JSON mcode into compact 32-bit instructions with a constant pool. See [Register VM](mach.md) for the binary encoding formats.