This commit is contained in:
2026-02-08 08:25:48 -06:00
parent bae4e957e9
commit a4f3b025c5
27 changed files with 2044 additions and 8 deletions

90
docs/spec/mcode.md Normal file
View File

@@ -0,0 +1,90 @@
---
title: "Mcode IR"
description: "JSON-based intermediate representation"
---
## Overview
Mcode is a JSON-based intermediate representation that can be interpreted directly. It represents the same operations as the Mach register VM but uses string-based instruction dispatch rather than binary opcodes. Mcode is intended as an intermediate step toward native code compilation.
## Pipeline
```
Source → Tokenize → Parse (AST) → Mcode (JSON) → Interpret
→ Compile to Mach (planned)
→ Compile to native (planned)
```
Mcode is produced by the `JS_Mcode` compiler pass, which emits a cJSON tree. The mcode interpreter walks this tree directly, dispatching on instruction name strings.
## JSMCode Structure
```c
struct JSMCode {
uint16_t nr_args; // argument count
uint16_t nr_slots; // register count
cJSON **instrs; // pre-flattened instruction array
uint32_t instr_count; // number of instructions
struct {
const char *name; // label name
uint32_t index; // instruction index
} *labels;
uint32_t label_count;
struct JSMCode **functions; // nested functions
uint32_t func_count;
cJSON *json_root; // keeps JSON alive
const char *name; // function name
const char *filename; // source file
uint16_t disruption_pc; // exception handler offset
};
```
## Instruction Format
Each instruction is a JSON array. The first element is the instruction name (string), followed by operands:
```json
["LOADK", 0, 42]
["ADD", 2, 0, 1]
["JMPFALSE", 3, "else_label"]
["CALL", 0, 2, 1]
```
The instruction set mirrors the Mach VM opcodes — same operations, same register semantics, but with string dispatch instead of numeric opcodes.
## Labels
Control flow uses named labels instead of numeric offsets:
```json
["LABEL", "loop_start"]
["ADD", 1, 1, 2]
["JMPFALSE", 3, "loop_end"]
["JMP", "loop_start"]
["LABEL", "loop_end"]
```
Labels are collected into a name-to-index map during loading, enabling O(1) jump resolution.
## Differences from Mach
| Property | Mcode | Mach |
|----------|-------|------|
| Instructions | cJSON arrays | 32-bit binary |
| Dispatch | String comparison | Switch on opcode byte |
| Constants | Inline in JSON | Separate constant pool |
| Jump targets | Named labels | Numeric offsets |
| Memory | Heap (cJSON nodes) | Off-heap (malloc) |
## Purpose
Mcode serves as an inspectable, debuggable intermediate format:
- **Human-readable** — the JSON representation can be printed and examined
- **Language-independent** — any tool that produces the correct JSON can target the ƿit runtime
- **Compilation target** — the Mach compiler can consume mcode as input, and future native code generators can work from the same representation
The cost of string-based dispatch makes mcode slower than the binary Mach VM, so it is primarily useful during development and as a compilation intermediate rather than for production execution.