---
title: "Mcode IR"
description: "Instruction set reference for the JSON-based intermediate representation"
---

## Overview

Mcode is the intermediate representation at the center of the ƿit compilation pipeline. All source code is lowered to mcode before execution or native compilation. The mcode instruction set is the **authoritative reference** for the operations supported by the ƿit runtime — the Mach VM bytecode is a direct binary encoding of these same instructions.

```
Source → Tokenize → Parse → Fold → Mcode → Streamline → Machine
```

Mcode is produced by `mcode.cm`, optimized by `streamline.cm`, then either serialized to 32-bit bytecode for the Mach VM (`mach.c`), or lowered to QBE/LLVM IL for native compilation (`qbe_emit.cm`). See [Compilation Pipeline](pipeline.md) for the full overview.

## Instruction Format

Each instruction is a JSON array. The first element is the instruction name (a string), followed by its operands. The last two elements are normally the line and column numbers used for source mapping; the `jump` example below is shown without them:

```json
["add_int", dest, a, b, line, col]
["load_field", dest, obj, "key", line, col]
["jump", "label_name"]
```

Operands are register slot numbers (integers), constant values (strings or numbers), or label names (strings). Several examples on this page use symbolic names such as `dest` and `a` in place of slot numbers, and most omit the trailing line and column.

## Instruction Reference

### Loading and Constants

| Instruction | Operands | Description |
|-------------|----------|-------------|
| `access` | `dest, name` | Load variable by name (intrinsic or environment) |
| `int` | `dest, value` | Load integer constant |
| `true` | `dest` | Load boolean `true` |
| `false` | `dest` | Load boolean `false` |
| `null` | `dest` | Load `null` |
| `move` | `dest, src` | Copy register value |
| `function` | `dest, id` | Load nested function by index |
| `regexp` | `dest, pattern` | Create regexp object |

### Arithmetic — Integer

| Instruction | Operands | Description |
|-------------|----------|-------------|
| `add_int` | `dest, a, b` | `dest = a + b` (integer) |
| `sub_int` | `dest, a, b` | `dest = a - b` (integer) |
| `mul_int` | `dest, a, b` | `dest = a * b` (integer) |
| `div_int` | `dest, a, b` | `dest = a / b` (integer) |
| `mod_int` | `dest, a, b` | `dest = a % b` (integer) |
| `neg_int` | `dest, src` | `dest = -src` (integer) |

### Arithmetic — Float

| Instruction | Operands | Description |
|-------------|----------|-------------|
| `add_float` | `dest, a, b` | `dest = a + b` (float) |
| `sub_float` | `dest, a, b` | `dest = a - b` (float) |
| `mul_float` | `dest, a, b` | `dest = a * b` (float) |
| `div_float` | `dest, a, b` | `dest = a / b` (float) |
| `mod_float` | `dest, a, b` | `dest = a % b` (float) |
| `neg_float` | `dest, src` | `dest = -src` (float) |

### Arithmetic — Generic

| Instruction | Operands | Description |
|-------------|----------|-------------|
| `pow` | `dest, a, b` | `dest = a ^ b` (exponentiation) |

### Text

| Instruction | Operands | Description |
|-------------|----------|-------------|
| `concat` | `dest, a, b` | `dest = a ~ b` (text concatenation) |

### Comparison — Integer

| Instruction | Operands | Description |
|-------------|----------|-------------|
| `eq_int` | `dest, a, b` | `dest = a == b` (integer) |
| `ne_int` | `dest, a, b` | `dest = a != b` (integer) |
| `lt_int` | `dest, a, b` | `dest = a < b` (integer) |
| `le_int` | `dest, a, b` | `dest = a <= b` (integer) |
| `gt_int` | `dest, a, b` | `dest = a > b` (integer) |
| `ge_int` | `dest, a, b` | `dest = a >= b` (integer) |

### Comparison — Float

| Instruction | Operands | Description |
|-------------|----------|-------------|
| `eq_float` | `dest, a, b` | `dest = a == b` (float) |
| `ne_float` | `dest, a, b` | `dest = a != b` (float) |
| `lt_float` | `dest, a, b` | `dest = a < b` (float) |
| `le_float` | `dest, a, b` | `dest = a <= b` (float) |
| `gt_float` | `dest, a, b` | `dest = a > b` (float) |
| `ge_float` | `dest, a, b` | `dest = a >= b` (float) |

### Comparison — Text

| Instruction | Operands | Description |
|-------------|----------|-------------|
| `eq_text` | `dest, a, b` | `dest = a == b` (text) |
| `ne_text` | `dest, a, b` | `dest = a != b` (text) |
| `lt_text` | `dest, a, b` | `dest = a < b` (lexicographic) |
| `le_text` | `dest, a, b` | `dest = a <= b` (lexicographic) |
| `gt_text` | `dest, a, b` | `dest = a > b` (lexicographic) |
| `ge_text` | `dest, a, b` | `dest = a >= b` (lexicographic) |

### Comparison — Boolean

| Instruction | Operands | Description |
|-------------|----------|-------------|
| `eq_bool` | `dest, a, b` | `dest = a == b` (boolean) |
| `ne_bool` | `dest, a, b` | `dest = a != b` (boolean) |

### Comparison — Special

| Instruction | Operands | Description |
|-------------|----------|-------------|
| `is_identical` | `dest, a, b` | Object identity check (same reference) |
| `eq_tol` | `dest, a, b` | Equality with tolerance |
| `ne_tol` | `dest, a, b` | Inequality with tolerance |

### Type Checks

Inlined from intrinsic function calls. Each sets `dest` to `true` or `false`.

| Instruction | Operands | Description |
|-------------|----------|-------------|
| `is_int` | `dest, src` | Check if integer |
| `is_num` | `dest, src` | Check if number (integer or float) |
| `is_text` | `dest, src` | Check if text |
| `is_bool` | `dest, src` | Check if logical |
| `is_null` | `dest, src` | Check if null |
| `is_array` | `dest, src` | Check if array |
| `is_func` | `dest, src` | Check if function |
| `is_record` | `dest, src` | Check if record (object) |
| `is_stone` | `dest, src` | Check if stone (immutable) |
| `is_proxy` | `dest, src` | Check if function proxy (arity 2) |

### Logical

| Instruction | Operands | Description |
|-------------|----------|-------------|
| `not` | `dest, src` | Logical NOT |
| `and` | `dest, a, b` | Logical AND |
| `or` | `dest, a, b` | Logical OR |

### Bitwise

| Instruction | Operands | Description |
|-------------|----------|-------------|
| `bitand` | `dest, a, b` | Bitwise AND |
| `bitor` | `dest, a, b` | Bitwise OR |
| `bitxor` | `dest, a, b` | Bitwise XOR |
| `bitnot` | `dest, src` | Bitwise NOT |
| `shl` | `dest, a, b` | Shift left |
| `shr` | `dest, a, b` | Arithmetic shift right |
| `ushr` | `dest, a, b` | Unsigned shift right |

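
The two right shifts differ only when the left operand is negative. The C sketch below illustrates that difference; the 32-bit operand width, the masking of the shift count to 0..31, and the helper names `mcode_shr` and `mcode_ushr` are assumptions made for illustration, not documented mcode semantics.

```c
#include <stdint.h>

/* Hypothetical helpers. Assumptions: 32-bit operands, and the shift
   count is masked to 0..31. Neither is documented mcode behavior. */

/* shr: arithmetic shift right; vacated high bits copy the sign bit. */
int32_t mcode_shr(int32_t a, int32_t b) {
    unsigned n = (unsigned)b & 31u;
    /* >> on a negative signed value is implementation-defined in C, so
       shift the (non-negative) complement and complement the result back. */
    if (a < 0)
        return ~(~a >> n);
    return a >> n;
}

/* ushr: logical shift right; vacated high bits are filled with zeros. */
uint32_t mcode_ushr(int32_t a, int32_t b) {
    unsigned n = (unsigned)b & 31u;
    return (uint32_t)a >> n;
}
```

Under these assumptions, shifting -8 right by 1 gives -4 with `shr` and 2147483644 (`0x7FFFFFFC`) with `ushr`; for non-negative operands both return the same value.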
### Property Access

Memory operations come in typed variants. The compiler selects the appropriate variant based on `type_tag` and `access_kind` annotations from parse and fold.

| Instruction | Operands | Description |
|-------------|----------|-------------|
| `load_field` | `dest, obj, key` | Load record property by string key |
| `store_field` | `obj, val, key` | Store record property by string key |
| `load_index` | `dest, obj, idx` | Load array element by integer index |
| `store_index` | `obj, val, idx` | Store array element by integer index |
| `load_dynamic` | `dest, obj, key` | Load property (dispatches at runtime) |
| `store_dynamic` | `obj, val, key` | Store property (dispatches at runtime) |
| `delete` | `obj, key` | Delete property |
| `in` | `dest, obj, key` | Check if property exists |
| `length` | `dest, src` | Get length of array or text |

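
The difference between the typed and dynamic variants can be sketched in C. In this illustration `load_dynamic` inspects the runtime tags of `obj` and `key`, then does the same work that `load_index` or `load_field` would do had the compiler known the types. The tagged `Value` type, the helper names, and the choice to yield `null` for out-of-range indices, missing keys, and unsupported combinations are all hypothetical; the real runtime's value representation and error behavior may differ.

```c
#include <stddef.h>
#include <string.h>

/* Toy value model for illustration only; it is not the runtime's JSValue. */
typedef enum { V_NULL, V_INT, V_TEXT, V_ARRAY, V_RECORD } Tag;

typedef struct Value Value;
typedef struct { const char *key; const Value *val; } Field;

struct Value {
    Tag tag;
    long i;               /* V_INT payload */
    const char *s;        /* V_TEXT payload */
    const Value *items;   /* V_ARRAY elements */
    const Field *fields;  /* V_RECORD fields */
    size_t len;           /* element or field count */
};

static const Value NULL_VALUE = { V_NULL, 0, NULL, NULL, NULL, 0 };

/* Roughly what load_index does once obj is known to be an array. */
static Value load_index(const Value *arr, long idx) {
    if (idx < 0 || (size_t)idx >= arr->len)
        return NULL_VALUE;               /* assumption: out of range yields null */
    return arr->items[idx];
}

/* Roughly what load_field does once obj is known to be a record. */
static Value load_field(const Value *rec, const char *key) {
    for (size_t k = 0; k < rec->len; k++)
        if (strcmp(rec->fields[k].key, key) == 0)
            return *rec->fields[k].val;
    return NULL_VALUE;                   /* assumption: missing key yields null */
}

/* load_dynamic: inspect runtime tags, then take one of the typed paths. */
Value load_dynamic(const Value *obj, const Value *key) {
    if (obj->tag == V_ARRAY && key->tag == V_INT)
        return load_index(obj, key->i);
    if (obj->tag == V_RECORD && key->tag == V_TEXT)
        return load_field(obj, key->s);
    return NULL_VALUE;                   /* assumption: other combinations yield null */
}
```

The runtime check is exactly the cost the typed variants avoid when parse and fold have already pinned down `type_tag` and `access_kind`.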
### Object and Array Construction

| Instruction | Operands | Description |
|-------------|----------|-------------|
| `record` | `dest` | Create empty record `{}` |
| `array` | `dest, n` | Create empty array (elements added via `push`) |
| `push` | `arr, val` | Push value to array |
| `pop` | `dest, arr` | Pop value from array |

### Function Calls

A function call is decomposed into three instructions: `frame`, `setarg`, and `invoke`. The `goframe` and `goinvoke` pair plays the same roles for async/concurrent calls. In the method-call sequence shown under Function Proxy Decomposition, `setarg` index 0 is set to the receiver (or `null` on the proxy path) and the `argc` arguments follow at indices 1 through `argc`.

| Instruction | Operands | Description |
|-------------|----------|-------------|
| `frame` | `dest, fn, argc` | Allocate call frame for `fn` with `argc` arguments |
| `setarg` | `frame, idx, val` | Set argument `idx` in call frame |
| `invoke` | `frame, result` | Execute the call, store result |
| `goframe` | `dest, fn, argc` | Allocate frame for async/concurrent call |
| `goinvoke` | `frame, result` | Invoke async/concurrent call |

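
A minimal C model of the `frame`/`setarg`/`invoke` split may help. It assumes values are plain `long`s, callees are C function pointers, and (following the method-call sequence under Function Proxy Decomposition) slot 0 holds the receiver while arguments occupy slots 1 through `argc`. `Frame`, `op_frame`, `op_setarg`, `op_invoke`, and `sum_args` are hypothetical names, and `goframe`/`goinvoke` are not modeled.

```c
#include <assert.h>

#define MAX_ARGS 8  /* toy limit for this sketch */

typedef struct Frame Frame;
typedef long (*Fn)(const Frame *f);

/* Hypothetical call frame: slot 0 = receiver, slots 1..argc = arguments. */
struct Frame {
    Fn fn;                     /* callee captured by `frame` */
    int argc;                  /* declared argument count */
    long slots[MAX_ARGS + 1];
};

/* frame dest, fn, argc: prepare a frame for fn (unset slots start at 0). */
void op_frame(Frame *dest, Fn fn, int argc) {
    assert(argc >= 0 && argc <= MAX_ARGS);
    dest->fn = fn;
    dest->argc = argc;
    for (int k = 0; k <= MAX_ARGS; k++)
        dest->slots[k] = 0;
}

/* setarg frame, idx, val: fill one slot of the pending call. */
void op_setarg(Frame *f, int idx, long val) {
    assert(idx >= 0 && idx <= f->argc);
    f->slots[idx] = val;
}

/* invoke frame, result: run the callee and store its return value. */
void op_invoke(const Frame *f, long *result) {
    *result = f->fn(f);
}

/* Example callee: sums its arguments and ignores the receiver in slot 0. */
long sum_args(const Frame *f) {
    long total = 0;
    for (int k = 1; k <= f->argc; k++)
        total += f->slots[k];
    return total;
}
```

Calling `sum_args` with 40 and 2 is one `op_frame` with `argc` 2, three `op_setarg` calls for slots 0 to 2, and one `op_invoke`, which stores 42 in the result. One plausible benefit of the split is that each argument can be written into the callee's frame as soon as it is evaluated, with no intermediate argument list.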
### Variable Resolution

| Instruction | Operands | Description |
|-------------|----------|-------------|
| `access` | `dest, name` | Load variable (intrinsic or module environment) |
| `set_var` | `name, src` | Set top-level variable by name |
| `get` | `dest, level, slot` | Get closure variable from parent scope |
| `put` | `level, slot, src` | Set closure variable in parent scope |

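
`get` and `put` address a captured variable by a pair: how many scopes to climb (`level`) and which register to use in that scope (`slot`). The sketch assumes each activation keeps a pointer to its lexically enclosing scope and that `level` counts links from the current scope, so level 1 is the immediate parent. The `Scope` type and the `op_get`/`op_put` names are hypothetical, and the real runtime may store closure state differently.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical scope record: an activation's registers plus a link
   to the lexically enclosing scope. */
typedef struct Scope {
    struct Scope *parent;
    long *slots;
    int nr_slots;
} Scope;

/* Follow `level` parent links (assumption: level 1 = immediate parent). */
static Scope *scope_at(Scope *s, int level) {
    for (int k = 0; k < level; k++) {
        assert(s->parent != NULL);
        s = s->parent;
    }
    return s;
}

/* get dest, level, slot: read a register of an enclosing scope. */
long op_get(Scope *cur, int level, int slot) {
    Scope *s = scope_at(cur, level);
    assert(slot >= 0 && slot < s->nr_slots);
    return s->slots[slot];
}

/* put level, slot, src: write a register of an enclosing scope. */
void op_put(Scope *cur, int level, int slot, long src) {
    Scope *s = scope_at(cur, level);
    assert(slot >= 0 && slot < s->nr_slots);
    s->slots[slot] = src;
}
```

Because the pair is resolved at compile time, the runtime only follows a fixed number of links; no name lookup is involved, unlike `access` and `set_var`.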
### Control Flow

| Instruction | Operands | Description |
|-------------|----------|-------------|
| `LABEL` | `name` | Define a named label (not executed) |
| `jump` | `label` | Unconditional jump |
| `jump_true` | `cond, label` | Jump if `cond` is true |
| `jump_false` | `cond, label` | Jump if `cond` is false |
| `jump_not_null` | `val, label` | Jump if `val` is not null |
| `return` | `src` | Return value from function |
| `disrupt` | — | Trigger disruption (error) |

## Typed Instruction Design

A key design principle of mcode is that **every type check is an explicit instruction**. Arithmetic and comparison operations come in type-specialized variants (`add_int`, `add_float`, `eq_text`, etc.) rather than a single polymorphic instruction.

When type information is available from the fold stage, the compiler emits the typed variant directly. When the type is unknown, the compiler emits a type-check/dispatch pattern:

```json
["is_int", check, a]
["jump_false", check, "float_path"]
["add_int", dest, a, b]
["jump", "done"]
["LABEL", "float_path"]
["add_float", dest, a, b]
["LABEL", "done"]
```

The [Streamline Optimizer](streamline.md) eliminates dead branches when types are statically known, collapsing the dispatch to a single typed instruction.

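
Written out in C, the dispatch pattern might look like the sketch below. `Num` is a toy number representation (an assumption; the runtime's value encoding is different), and converting an integer operand to float on the mixed path is also an assumption. Unlike the JSON example, which checks only `a` to keep the pattern short, the sketch checks both operands before taking the integer path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy number: either an integer or a float (not the runtime's encoding). */
typedef struct {
    bool is_int;
    int64_t i;   /* valid when is_int */
    double f;    /* valid when !is_int */
} Num;

static double as_float(Num n) { return n.is_int ? (double)n.i : n.f; }

/* Typed instructions: each trusts that its operands have the right type. */
static Num add_int(Num a, Num b) {
    assert(a.is_int && b.is_int);
    return (Num){ true, a.i + b.i, 0.0 };   /* overflow handling not modeled */
}

static Num add_float(Num a, Num b) {
    return (Num){ false, 0, as_float(a) + as_float(b) };
}

/* The is_int / jump_false / add_int / add_float pattern for an add
   whose operand types are unknown at compile time. */
Num add_dispatch(Num a, Num b) {
    if (a.is_int && b.is_int)   /* is_int + jump_false */
        return add_int(a, b);   /* add_int, then jump to "done" */
    return add_float(a, b);     /* "float_path": add_float */
}
```

When both operand types are known statically, only one branch of `add_dispatch` can ever run, which corresponds to the single typed instruction left behind after Streamline removes the dead branch.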
## Intrinsic Inlining

The mcode compiler recognizes calls to built-in intrinsic functions and emits direct opcodes instead of the generic frame/setarg/invoke call sequence:

| Source call | Emitted instruction |
|-------------|-------------------|
| `is_array(x)` | `is_array dest, src` |
| `is_function(x)` | `is_func dest, src` |
| `is_object(x)` | `is_record dest, src` |
| `is_stone(x)` | `is_stone dest, src` |
| `is_integer(x)` | `is_int dest, src` |
| `is_text(x)` | `is_text dest, src` |
| `is_number(x)` | `is_num dest, src` |
| `is_logical(x)` | `is_bool dest, src` |
| `is_null(x)` | `is_null dest, src` |
| `length(x)` | `length dest, src` |
| `push(arr, val)` | `push arr, val` |

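
The table amounts to a lookup from callee name to opcode. The C sketch below encodes it directly; `inline_opcode` and `InlineRule` are hypothetical names (the compiler itself, `mcode.cm`, is not written in C), and falling back to a normal call when the argument count does not match is an assumption.

```c
#include <stddef.h>
#include <string.h>

/* Mapping from the Intrinsic Inlining table: intrinsic -> opcode, arity. */
typedef struct {
    const char *intrinsic;
    const char *opcode;
    int argc;
} InlineRule;

static const InlineRule RULES[] = {
    { "is_array",    "is_array",  1 },
    { "is_function", "is_func",   1 },
    { "is_object",   "is_record", 1 },
    { "is_stone",    "is_stone",  1 },
    { "is_integer",  "is_int",    1 },
    { "is_text",     "is_text",   1 },
    { "is_number",   "is_num",    1 },
    { "is_logical",  "is_bool",   1 },
    { "is_null",     "is_null",   1 },
    { "length",      "length",    1 },
    { "push",        "push",      2 },
};

/* Returns the opcode to emit for a call to `name` with `argc` arguments,
   or NULL when the call should use frame/setarg/invoke instead. */
const char *inline_opcode(const char *name, int argc) {
    for (size_t k = 0; k < sizeof RULES / sizeof RULES[0]; k++)
        if (strcmp(RULES[k].intrinsic, name) == 0 && RULES[k].argc == argc)
            return RULES[k].opcode;
    return NULL;
}
```

Any call for which this returns `NULL` would be compiled to the generic `frame`/`setarg`/`invoke` sequence.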
## Function Proxy Decomposition

When the compiler encounters a method call `obj.method(args)`, it emits a branching pattern to handle ƿit's function proxy protocol. If `obj` is a function proxy (an arity-2 function), it is called with the method name and an array of the arguments instead of receiving a normal method call; otherwise the method is loaded from the record and called directly:

```json
["is_proxy", check, obj]
["jump_false", check, "record_path"]

["access", name_slot, "method"]
["array", args_arr, N, arg0, arg1]
["null", null_slot]
["frame", f, obj, 2]
["setarg", f, 0, null_slot]
["setarg", f, 1, name_slot]
["setarg", f, 2, args_arr]
["invoke", f, dest]
["jump", "done"]

["LABEL", "record_path"]
["load_field", method, obj, "method"]
["frame", f2, method, N]
["setarg", f2, 0, obj]
["setarg", f2, 1, arg0]
["invoke", f2, dest]

["LABEL", "done"]
```

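
Seen from the runtime side, the two branches reduce to a single decision. In this C sketch an `Obj` is either a proxy (a non-`NULL` `proxy` pointer, standing in for an arity-2 function) or a record of named methods, and the argument array is modeled as a pointer plus a count. The types, the function names, and the `-1` result for a missing method are hypothetical; error handling is not modeled.

```c
#include <stddef.h>
#include <string.h>

/* Toy model for illustration; all names are hypothetical. */
typedef struct Obj Obj;
typedef long (*Method)(const Obj *self, const long *args, int argc);
typedef long (*ProxyFn)(const char *name, const long *args, int argc);

typedef struct { const char *name; Method fn; } MethodEntry;

struct Obj {
    ProxyFn proxy;               /* non-NULL: obj is a function proxy */
    const MethodEntry *methods;  /* otherwise: a record of methods */
    int nmethods;
};

/* obj.name(args...) following the two paths of the mcode above. */
long call_method(const Obj *obj, const char *name, const long *args, int argc) {
    if (obj->proxy != NULL)                   /* is_proxy + jump_false */
        return obj->proxy(name, args, argc);  /* proxy path: name + argument array */
    for (int k = 0; k < obj->nmethods; k++)   /* record_path: load_field */
        if (strcmp(obj->methods[k].name, name) == 0)
            return obj->methods[k].fn(obj, args, argc);  /* receiver passed first */
    return -1;                                /* missing method: not modeled */
}

/* Example proxy: encodes the method-name length and argument count. */
long counting_proxy(const char *name, const long *args, int argc) {
    (void)args;
    return (long)strlen(name) * 100 + argc;
}

/* Example record method: adds its two arguments. */
long add_two(const Obj *self, const long *args, int argc) {
    (void)self;
    return argc == 2 ? args[0] + args[1] : 0;
}
```

The proxy never sees a method lookup: it receives the name as data, which is why the compiler loads the method name into a register on that path instead of emitting `load_field`.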
## Labels and Control Flow

Control flow uses named labels instead of numeric offsets:

```json
["LABEL", "loop_start"]
["add_int", 1, 1, 2]
["lt_int", 3, 1, 4]
["jump_false", 3, "loop_end"]
["jump", "loop_start"]
["LABEL", "loop_end"]
```

Labels are collected into a name-to-index map during loading, enabling O(1) jump resolution. The Mach serializer converts label names to numeric offsets in the binary bytecode.

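
The loading step can be sketched as two passes over the instruction array: the first records the index of every `LABEL`, and the second replaces each jump's label name with that index, which is what converting names to numeric offsets amounts to. The simplified `Instr` layout, the fixed-size open-addressing table with FNV-1a hashing (chosen here to get O(1) average lookup), and all function names are assumptions for illustration; the actual `JSMCode` stores labels as an array of name/index pairs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified instruction (hypothetical layout). */
typedef struct {
    const char *op;     /* "LABEL", "jump", "jump_false", ... */
    const char *label;  /* label operand for LABEL and jumps, else NULL */
    int target;         /* filled by resolve_jumps; -1 if none */
} Instr;

#define MAP_CAP 64      /* power of two; must exceed the number of labels */

typedef struct { const char *name; int index; } LabelSlot;

static uint32_t fnv1a(const char *s) {
    uint32_t h = 2166136261u;
    while (*s) {
        h ^= (unsigned char)*s++;
        h *= 16777619u;
    }
    return h;
}

/* Open addressing with linear probing: returns the slot holding `name`,
   or the empty slot where it would be inserted. */
static LabelSlot *map_find(LabelSlot *map, const char *name) {
    uint32_t k = fnv1a(name) & (MAP_CAP - 1);
    while (map[k].name != NULL && strcmp(map[k].name, name) != 0)
        k = (k + 1) & (MAP_CAP - 1);
    return &map[k];
}

/* Returns 0 on success, -1 if a jump names an undefined label. */
int resolve_jumps(Instr *code, int n) {
    LabelSlot map[MAP_CAP] = {{ NULL, 0 }};
    int used = 0;
    for (int i = 0; i < n; i++) {             /* pass 1: record label positions */
        code[i].target = -1;
        if (strcmp(code[i].op, "LABEL") == 0) {
            LabelSlot *s = map_find(map, code[i].label);
            if (s->name == NULL) {
                used++;
                assert(used < MAP_CAP);       /* keep one empty slot so probing stops */
            }
            s->name = code[i].label;
            s->index = i;
        }
    }
    for (int i = 0; i < n; i++) {             /* pass 2: names -> indices */
        if (code[i].label == NULL || strcmp(code[i].op, "LABEL") == 0)
            continue;
        LabelSlot *s = map_find(map, code[i].label);
        if (s->name == NULL)
            return -1;
        code[i].target = s->index;
    }
    return 0;
}
```

A resolved jump lands on the `LABEL` instruction itself; since labels are not executed, execution simply continues with the instruction after it.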
## Nop Convention

The streamline optimizer replaces eliminated instructions with nop strings (e.g., `_nop_tc_1`, `_nop_bl_2`). Nop strings are skipped during interpretation and native code emission but preserved in the instruction array to maintain positional stability for jump targets.

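
A small C sketch of how an executor might skip nops. It assumes every nop string begins with `_nop_` (consistent with the `_nop_tc_1` and `_nop_bl_2` examples, but not a documented rule) and represents each instruction by its name only; `is_nop`, `next_live`, and `count_live` are hypothetical helpers.

```c
#include <string.h>

/* Assumption: nop strings share the "_nop_" prefix. */
static int is_nop(const char *op) {
    return strncmp(op, "_nop_", 5) == 0;
}

/* Advance pc past any nops; returns n if only nops remain. */
int next_live(const char *const *ops, int n, int pc) {
    while (pc < n && is_nop(ops[pc]))
        pc++;
    return pc;
}

/* Number of instructions that would actually execute or be emitted. */
int count_live(const char *const *ops, int n) {
    int live = 0;
    for (int i = 0; i < n; i++)
        if (!is_nop(ops[i]))
            live++;
    return live;
}
```

Because an eliminated instruction is overwritten in place rather than removed, every remaining instruction keeps its index, so label positions computed before a pass stay valid after it.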
## Internal Structures

### JSMCode (Mcode Interpreter)

```c
#include <stdint.h>
#include "cJSON.h"                 // cJSON types

struct JSMCode {
    uint16_t nr_args;              // argument count
    uint16_t nr_slots;             // register count
    cJSON **instrs;                // instruction array
    uint32_t instr_count;          // number of instructions

    struct {
        const char *name;          // label name
        uint32_t index;            // instruction index
    } *labels;
    uint32_t label_count;

    struct JSMCode **functions;    // nested functions
    uint32_t func_count;

    cJSON *json_root;              // keeps JSON alive
    const char *name;              // function name
    const char *filename;          // source file
    uint16_t disruption_pc;        // disruption handler offset
};
```

### JSCodeRegister (Mach VM Bytecode)

```c
#include <stdint.h>
// JSValue and MachInstr32 are defined in the runtime's own headers.

struct JSCodeRegister {
    uint16_t arity;                     // argument count
    uint16_t nr_slots;                  // total register count
    uint32_t cpool_count;               // constant pool size
    JSValue *cpool;                     // constant pool
    uint32_t instr_count;               // instruction count
    MachInstr32 *instructions;          // 32-bit instruction array
    uint32_t func_count;                // nested function count
    struct JSCodeRegister **functions;  // nested function table
    JSValue name;                       // function name
    uint16_t disruption_pc;             // disruption handler offset
};
```

The Mach serializer (`mach.c`) converts the JSON mcode into compact 32-bit instructions with a constant pool. See [Register VM](mach.md) for the binary encoding formats.