---
title: "Mcode IR"
description: "JSON-based intermediate representation"
---

## Overview

Mcode is a JSON-based intermediate representation that can be interpreted directly. It represents the same operations as the Mach register VM but uses string-based instruction dispatch rather than binary opcodes. Mcode is intended as an intermediate step toward native code compilation.

## Pipeline

```
Source → Tokenize → Parse (AST) → Fold → Mcode (JSON) → Streamline → Interpret
                                                                   → QBE → Native
```

Mcode is produced by `mcode.cm`, which lowers the folded AST to JSON instruction arrays. The streamline optimizer (`streamline.cm`) then eliminates redundant operations. The result can be interpreted by `mcode.c`, or lowered to QBE IL by `qbe_emit.cm` for native compilation. See [Compilation Pipeline](pipeline.md) for the full overview.

## JSMCode Structure

```c
struct JSMCode {
    uint16_t nr_args;            // argument count
    uint16_t nr_slots;           // register count
    cJSON **instrs;              // pre-flattened instruction array
    uint32_t instr_count;        // number of instructions

    struct {
        const char *name;        // label name
        uint32_t index;          // instruction index
    } *labels;
    uint32_t label_count;

    struct JSMCode **functions;  // nested functions
    uint32_t func_count;

    cJSON *json_root;            // keeps JSON alive
    const char *name;            // function name
    const char *filename;        // source file
    uint16_t disruption_pc;      // exception handler offset
};
```

## Instruction Format

Each instruction is a JSON array. The first element is the instruction name (string), followed by operands (typically `[op, dest, ...args, line, col]`):

```json
["access", 3, 5, 1, 9]
["load_index", 10, 4, 9, 5, 11]
["store_dynamic", 4, 11, 12, 6, 3]
["frame", 15, 14, 1, 7, 7]
["setarg", 15, 0, 16, 7, 7]
["invoke", 15, 13, 7, 7]
```

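The typical layout can be illustrated with a small decoding helper. This is only a sketch: `InstrView` and `instr_view` are hypothetical names, and it assumes every value after the instruction name is an integer and that the last two are `line` and `col`. That holds for the examples above but not for every instruction (label instructions, for example, carry string operands).

```c
#include <stddef.h>

/* Hypothetical decoded view of one instruction: [op, ...operands, line, col]. */
typedef struct {
    const char *op;            /* instruction name, e.g. "load_index" */
    const int  *operands;      /* values between the name and the source position */
    size_t      operand_count; /* number of entries in operands */
    int         line;          /* second-to-last value */
    int         col;           /* last value */
} InstrView;

/* Splits the numeric tail of an instruction into operands and position.
   Returns 0 on success, -1 if fewer than two numbers follow the name. */
int instr_view(const char *op, const int *nums, size_t n, InstrView *out) {
    if (n < 2) {
        return -1; /* no room for line and col */
    }
    out->op = op;
    out->operands = nums;
    out->operand_count = n - 2;
    out->line = nums[n - 2];
    out->col = nums[n - 1];
    return 0;
}
```

Applied to `["load_index", 10, 4, 9, 5, 11]`, this yields three operands (`10, 4, 9`) with source position line 5, column 11.
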
### Typed Load/Store

Memory operations come in typed variants for optimization:

- `load_index dest, obj, idx`: array element by integer index
- `load_field dest, obj, key`: record property by string key
- `load_dynamic dest, obj, key`: access kind unknown at compile time; dispatches at runtime
- `store_index obj, val, idx`: array element store
- `store_field obj, val, key`: record property store
- `store_dynamic obj, val, key`: access kind unknown at compile time; dispatches at runtime

The compiler selects the appropriate variant based on `type_tag` and `access_kind` annotations from parse and fold.

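The selection step might look roughly like the following. This is a sketch under assumptions: the actual values of `type_tag` and `access_kind` are not listed in this document, so the `KeyKind` enum and both function names are hypothetical stand-ins for whatever the annotations encode.

```c
/* Hypothetical summary of what the compiler knows about the key operand.
   The real type_tag / access_kind values are not shown in this document. */
typedef enum {
    KEY_UNKNOWN, /* nothing known at compile time */
    KEY_INTEGER, /* key is known to be an integer index */
    KEY_TEXT     /* key is known to be a string property name */
} KeyKind;

/* Picks the load instruction name for a given key kind. */
const char *select_load(KeyKind key) {
    switch (key) {
    case KEY_INTEGER: return "load_index";
    case KEY_TEXT:    return "load_field";
    default:          return "load_dynamic"; /* fall back to runtime dispatch */
    }
}

/* Picks the store instruction name for a given key kind. */
const char *select_store(KeyKind key) {
    switch (key) {
    case KEY_INTEGER: return "store_index";
    case KEY_TEXT:    return "store_field";
    default:          return "store_dynamic"; /* fall back to runtime dispatch */
    }
}
```
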
### Decomposed Calls

Function calls are split into separate instructions:

- `frame dest, fn, argc`: allocate call frame
- `setarg frame, idx, val`: set argument
- `invoke frame, result`: execute the call

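To make the decomposition concrete, the sketch below writes the `frame` / `setarg` / `invoke` sequence for a single call as text, one instruction per line. `emit_call` is a hypothetical helper, not part of `mcode.cm`; it assumes all operands are slot numbers and that every instruction of the call carries the same `line` and `col`, as in the example in the Instruction Format section.

```c
#include <stddef.h>
#include <stdio.h>

/* Writes the instruction sequence for one call into buf, one JSON array per line.
   Returns the number of bytes written (excluding the terminating NUL),
   or -1 if buf is too small. */
int emit_call(char *buf, size_t cap, int frame, int fn,
              const int *args, int argc, int result, int line, int col) {
    size_t used = 0;

    /* frame dest, fn, argc: allocate the call frame */
    int n = snprintf(buf, cap, "[\"frame\", %d, %d, %d, %d, %d]\n",
                     frame, fn, argc, line, col);
    if (n < 0 || (size_t)n >= cap) return -1;
    used = (size_t)n;

    /* setarg frame, idx, val: one instruction per argument */
    for (int i = 0; i < argc; i++) {
        n = snprintf(buf + used, cap - used,
                     "[\"setarg\", %d, %d, %d, %d, %d]\n",
                     frame, i, args[i], line, col);
        if (n < 0 || (size_t)n >= cap - used) return -1;
        used += (size_t)n;
    }

    /* invoke frame, result: execute the call */
    n = snprintf(buf + used, cap - used, "[\"invoke\", %d, %d, %d, %d]\n",
                 frame, result, line, col);
    if (n < 0 || (size_t)n >= cap - used) return -1;
    used += (size_t)n;

    return (int)used;
}
```

Calling `emit_call(buf, sizeof buf, 15, 14, args, 1, 13, 7, 7)` with `args = {16}` reproduces the three call instructions shown in the Instruction Format example.
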
## Labels

Control flow uses named labels instead of numeric offsets:

```json
["LABEL", "loop_start"]
["ADD", 1, 1, 2]
["JMPFALSE", 3, "loop_end"]
["JMP", "loop_start"]
["LABEL", "loop_end"]
```

Labels are collected into a name-to-index map during loading, enabling O(1) jump resolution.

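How the loader builds this map is not shown here. One way to get expected constant-time lookups is a small open-addressing hash table, sketched below. Everything in it is an assumption for illustration: the names `LabelMap`, `label_put`, and `label_get`, the fixed capacity, and the FNV-1a hash.

```c
#include <stdint.h>
#include <string.h>

#define LABEL_SLOTS 64 /* power of two; assumed larger than the label count */

typedef struct {
    const char *name;  /* NULL marks an empty slot */
    uint32_t    index; /* instruction index the label refers to */
} LabelSlot;

typedef struct {
    LabelSlot slots[LABEL_SLOTS];
} LabelMap;

/* 32-bit FNV-1a hash of a NUL-terminated string. */
static uint32_t fnv1a(const char *s) {
    uint32_t h = 2166136261u;
    while (*s) {
        h ^= (unsigned char)*s++;
        h *= 16777619u;
    }
    return h;
}

/* Inserts or updates a label. The name pointer is stored, not copied, so it
   must outlive the map. Returns 0 on success, -1 if the table is full. */
int label_put(LabelMap *m, const char *name, uint32_t index) {
    uint32_t i = fnv1a(name) & (LABEL_SLOTS - 1);
    for (uint32_t probes = 0; probes < LABEL_SLOTS; probes++) {
        LabelSlot *s = &m->slots[i];
        if (s->name == NULL || strcmp(s->name, name) == 0) {
            s->name = name;
            s->index = index;
            return 0;
        }
        i = (i + 1) & (LABEL_SLOTS - 1); /* linear probing */
    }
    return -1;
}

/* Looks up a label. Returns 0 and stores its index on a hit, -1 on a miss. */
int label_get(const LabelMap *m, const char *name, uint32_t *index) {
    uint32_t i = fnv1a(name) & (LABEL_SLOTS - 1);
    for (uint32_t probes = 0; probes < LABEL_SLOTS; probes++) {
        const LabelSlot *s = &m->slots[i];
        if (s->name == NULL) return -1; /* empty slot ends the probe chain */
        if (strcmp(s->name, name) == 0) {
            *index = s->index;
            return 0;
        }
        i = (i + 1) & (LABEL_SLOTS - 1);
    }
    return -1;
}
```

A zero-initialized `LabelMap` is empty. A loader could call `label_put` once per label instruction while scanning the instruction array, and a jump would then resolve its target with a single `label_get`.
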
## Differences from Mach

| Property | Mcode | Mach |
|----------|-------|------|
| Instructions | cJSON arrays | 32-bit binary |
| Dispatch | String comparison | Switch on opcode byte |
| Constants | Inline in JSON | Separate constant pool |
| Jump targets | Named labels | Numeric offsets |
| Memory | Heap (cJSON nodes) | Off-heap (malloc) |

## Purpose

Mcode serves as an inspectable, debuggable intermediate format:

- **Human-readable**: the JSON representation can be printed and examined
- **Language-independent**: any tool that produces the correct JSON can target the ƿit runtime
- **Compilation target**: the Mach compiler can consume mcode as input, and future native code generators can work from the same representation

The cost of string-based dispatch makes mcode slower than the binary Mach VM, so it is primarily useful during development and as a compilation intermediate rather than for production execution.

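The dispatch cost can be seen in a toy comparison of the two styles. This is an illustrative sketch only: the two-instruction subset, the opcode numbering, and the function names are hypothetical, and neither function reflects the actual code in `mcode.c` or the Mach VM.

```c
#include <stdint.h>
#include <string.h>

enum { OP_ADD = 0, OP_SUB = 1 }; /* hypothetical opcode numbering */

/* String dispatch: the name is compared against each candidate in turn,
   so the work grows with the number of instructions and their name lengths.
   Returns 0 on success, -1 for an unknown instruction. */
int exec_named(const char *op, int a, int b, int *out) {
    if (strcmp(op, "ADD") == 0) { *out = a + b; return 0; }
    if (strcmp(op, "SUB") == 0) { *out = a - b; return 0; }
    return -1;
}

/* Opcode dispatch: the instruction is already a small integer, so the
   compiler can lower the switch to a jump table or a few comparisons.
   Returns 0 on success, -1 for an unknown opcode. */
int exec_opcode(uint8_t op, int a, int b, int *out) {
    switch (op) {
    case OP_ADD: *out = a + b; return 0;
    case OP_SUB: *out = a - b; return 0;
    default:     return -1;
    }
}
```

Both functions compute the same results; only the work needed to decide which operation to run differs.
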