Merge branch 'optimize_mcode'

This commit is contained in:
2026-02-21 19:42:19 -06:00
34 changed files with 53477 additions and 121900 deletions

View File

@@ -30,6 +30,10 @@ Each stage has a corresponding CLI tool that lets you see its output.
| streamline | `streamline.ce --ir` | Human-readable canonical IR |
| disasm | `disasm.ce` | Source-interleaved disassembly |
| disasm | `disasm.ce --optimized` | Optimized source-interleaved disassembly |
| diff | `diff_ir.ce` | Mcode vs streamline instruction diff |
| xref | `xref.ce` | Cross-reference / call creation graph |
| cfg | `cfg.ce` | Control flow graph (basic blocks) |
| slots | `slots.ce` | Slot data flow / use-def chains |
| all | `ir_report.ce` | Structured optimizer flight recorder |
All tools take a source file as input and run the pipeline up to the relevant stage.
@@ -141,6 +145,160 @@ Function creation instructions include a cross-reference annotation showing the
3 function 5, 12 :235 ; -> [12] helper_fn
```
## diff_ir.ce
Compares mcode IR (before optimization) with streamline IR (after optimization), showing what the optimizer changed. Useful for understanding which instructions were eliminated, specialized, or rewritten.
```bash
cell diff_ir <file> # diff all functions
cell diff_ir --fn <N|name> <file> # diff only one function
cell diff_ir --summary <file> # counts only
```
| Flag | Description |
|------|-------------|
| (none) | Show all diffs with source interleaving |
| `--fn <N\|name>` | Filter to specific function by index or name |
| `--summary` | Show only eliminated/rewritten counts per function |
### Output Format
Changed instructions are shown in diff style with `-` (before) and `+` (after) lines:
```
=== [0] <anonymous> (args=1, slots=40) ===
17 eliminated, 51 rewritten
--- line 4: if (n <= 1) { ---
- 1 is_int 4, 1 :4
+ 1 is_int 3, 1 :4 (specialized)
- 3 is_int 5, 2 :4
+ 3 _nop_tc_1 (eliminated)
```
Summary mode gives a quick overview:
```
[0] <anonymous>: 17 eliminated, 51 rewritten
[1] <anonymous>: 65 eliminated, 181 rewritten
total: 86 eliminated, 250 rewritten across 4 functions
```
## xref.ce
Cross-reference / call graph tool. Shows which functions create other functions (via `function` instructions), building a creation tree.
```bash
cell xref <file> # full creation tree
cell xref --callers <N> <file> # who creates function [N]?
cell xref --callees <N> <file> # what does [N] create/call?
cell xref --dot <file> # DOT graph for graphviz
cell xref --optimized <file> # use optimized IR
```
| Flag | Description |
|------|-------------|
| (none) | Indented creation tree from main |
| `--callers <N>` | Show which functions create function [N] |
| `--callees <N>` | Show what function [N] creates (use -1 for main) |
| `--dot` | Output DOT format for graphviz |
| `--optimized` | Use optimized IR instead of raw mcode |
### Output Format
Default tree view:
```
demo_disasm.cm
[0] <anonymous>
[1] <anonymous>
[2] <anonymous>
```
Caller/callee query:
```
Callers of [0] <anonymous>:
demo_disasm.cm at line 3
```
DOT output can be piped to graphviz: `cell xref --dot file.cm | dot -Tpng -o xref.png`
## cfg.ce
Control flow graph tool. Identifies basic blocks from labels and jumps, computes edges, and detects loop back-edges.
```bash
cell cfg --fn <N|name> <file> # text CFG for function
cell cfg --dot --fn <N|name> <file> # DOT output for graphviz
cell cfg <file> # text CFG for all functions
cell cfg --optimized <file> # use optimized IR
```
| Flag | Description |
|------|-------------|
| `--fn <N\|name>` | Filter to specific function by index or name |
| `--dot` | Output DOT format for graphviz |
| `--optimized` | Use optimized IR instead of raw mcode |
### Output Format
```
=== [0] <anonymous> ===
B0 [pc 0-2, line 4]:
0 access 2, 1
1 is_int 4, 1
2 jump_false 4, "rel_ni_2"
-> B3 "rel_ni_2" (jump)
-> B1 (fallthrough)
B1 [pc 3-4, line 4]:
3 is_int 5, 2
4 jump_false 5, "rel_ni_2"
-> B3 "rel_ni_2" (jump)
-> B2 (fallthrough)
```
Each block shows its ID, PC range, source lines, instructions, and outgoing edges. Loop back-edges (target PC <= source PC) are annotated.
## slots.ce
Slot data flow analysis. Builds use-def chains for every slot in a function, showing where each slot is defined and used. Optionally captures type information from streamline.
```bash
cell slots --fn <N|name> <file> # slot summary for function
cell slots --slot <N> --fn <N|name> <file> # trace slot N
cell slots <file> # slot summary for all functions
```
| Flag | Description |
|------|-------------|
| `--fn <N\|name>` | Filter to specific function by index or name |
| `--slot <N>` | Show chronological DEF/USE trace for a specific slot |
### Output Format
Summary shows each slot with its def count, use count, inferred type, and first definition. Dead slots (defined but never used) are flagged:
```
=== [0] <anonymous> (args=1, slots=40) ===
slot defs uses type first-def
s0 0 0 - (this)
s1 0 10 - (arg 0)
s2 1 6 - pc 0: access
s10 1 0 - pc 29: invoke <- dead
```
Slot trace (`--slot N`) shows every DEF and USE in program order:
```
=== slot 3 in [0] <anonymous> ===
DEF pc 5: le_int 3, 1, 2 :4
DEF pc 11: le_float 3, 1, 2 :4
DEF pc 17: le_text 3, 1, 2 :4
USE pc 31: jump_false 3, "if_else_0" :4
```
## seed.ce
Regenerates the boot seed files in `boot/`. These are pre-compiled mcode IR (JSON) files that bootstrap the compilation pipeline on cold start.

View File

@@ -93,3 +93,13 @@ Arithmetic ops (ADD, SUB, MUL, DIV, MOD, POW) are executed inline without callin
DIV and MOD check for zero divisor (→ null). POW uses `pow()` with non-finite handling for finite inputs.
Comparison ops (EQ through GE) and bitwise ops still use `reg_vm_binop()` for their slow paths, as they handle a wider range of type combinations (string comparisons, null equality, etc.).
## String Concatenation
CONCAT has a three-tier dispatch for self-assign patterns (`concat R(A), R(A), R(C)` where dest equals the left operand):
1. **In-place append**: If `R(A)` is a mutable heap text (S bit clear) with `length + rhs_length <= cap56`, characters are appended directly. Zero allocation, zero GC.
2. **Growth allocation** (`JS_ConcatStringGrow`): Allocates a new text with 2x capacity and does **not** stone the result, leaving it mutable for subsequent appends.
3. **Exact-fit stoned** (`JS_ConcatString`): Used when dest differs from the left operand (normal non-self-assign concat).
The `stone_text` instruction (iABC, B=0, C=0) sets the S bit on a mutable heap text in `R(A)`. For non-pointer values or already-stoned text, it is a no-op. This instruction is emitted by the streamline optimizer at escape points; see [Streamline — insert_stone_text](streamline.md#7-insert_stone_text-mutable-text-escape-analysis) and [Stone Memory — Mutable Text](stone.md#mutable-text-concatenation).

View File

@@ -101,6 +101,11 @@ Operands are register slot numbers (integers), constant values (strings, numbers
| Instruction | Operands | Description |
|-------------|----------|-------------|
| `concat` | `dest, a, b` | `dest = a ~ b` (text concatenation) |
| `stone_text` | `slot` | Stone a mutable text value (see below) |
The `stone_text` instruction is emitted by the streamline optimizer's escape analysis pass (`insert_stone_text`). It freezes a mutable text value before it escapes its defining slot — for example, before a `move`, `setarg`, `store_field`, `push`, or `put`. The instruction is only inserted when the slot is provably `T_TEXT`; non-text values never need stoning. See [Streamline Optimizer — insert_stone_text](streamline.md#7-insert_stone_text-mutable-text-escape-analysis) for details.
At the VM level, `stone_text` is a single-operand instruction (iABC with B=0, C=0). If the slot holds a heap text without the S bit set, it sets the S bit. For all other values (integers, booleans, already-stoned text, etc.), it is a no-op.
### Comparison — Integer

View File

@@ -77,6 +77,30 @@ Messages between actors are stoned before delivery, ensuring actors never share
Literal objects and arrays that can be determined at compile time may be allocated directly in stone memory.
## Mutable Text Concatenation
String concatenation in a loop (`s = s + "x"`) is optimized to O(n) amortized by leaving concat results **unstoned** with over-allocated capacity. On the next concatenation, if the destination text is mutable (S bit clear) and has enough room, the VM appends in-place with zero allocation.
### How It Works
When the VM executes `concat dest, dest, src` (same destination and left operand — a self-assign pattern):
1. **Inline fast path**: If `dest` holds a heap text, is not stoned, and `length + src_length <= capacity` — append characters in place, update length, done. No allocation, no GC possible.
2. **Growth path** (`JS_ConcatStringGrow`): Allocate a new text with `capacity = max(new_length * 2, 16)`, copy both operands, and return the result **without stoning** it. The 2x growth factor means a loop of N concatenations does O(log N) allocations totaling O(N) character copies.
3. **Exact-fit path** (`JS_ConcatString`): When `dest != left` (not self-assign), the existing exact-fit stoned path is used. This is the normal case for expressions like `var c = a + b`.
### Safety Invariant
**An unstoned heap text is uniquely referenced by exactly one slot.** This is enforced by the `stone_text` mcode instruction, which the [streamline optimizer](streamline.md#7-insert_stone_text-mutable-text-escape-analysis) inserts before any instruction that would create a second reference to the value (move, store, push, setarg, put). Two VM-level guards cover cases where the compiler cannot prove the type: `get` (closure reads) and `return` (inter-frame returns).
### Why Over-Allocation Is GC-Safe
- The copying collector copies based on `cap56` (the object header's capacity field), not `length`. Over-allocated capacity survives GC.
- `js_alloc_string` zero-fills the packed data region, so padding beyond `length` is always clean.
- String comparisons, hashing, and interning all use `length`, not `cap56`. Extra capacity is invisible to string operations.
## Relationship to GC
The Cheney copying collector only operates on the mutable heap. During collection, when the collector encounters a pointer to stone memory (S bit set), it skips it — stone objects are roots that never move. This means stone memory acts as a permanent root set with zero GC overhead.

View File

@@ -164,7 +164,44 @@ Removes `move a, a` instructions where the source and destination are the same s
**Nop prefix:** `_nop_mv_`
### 7. eliminate_unreachable (dead code after return)
### 7. insert_stone_text (mutable text escape analysis)
Inserts `stone_text` instructions before mutable text values escape their defining slot. This pass supports the mutable text concatenation optimization (see [Stone Memory — Mutable Text](stone.md#mutable-text-concatenation)), which leaves `concat` results unstoned with excess capacity so that subsequent `s = s + x` can append in-place.
The invariant is: **an unstoned heap text is uniquely referenced by exactly one slot.** This pass ensures that whenever a text value is copied or shared (via move, store, push, function argument, closure write, etc.), it is stoned first.
**Algorithm:**
1. **Compute liveness.** Build `first_ref[slot]` and `last_ref[slot]` arrays by scanning all instructions. Extend live ranges for backward jumps (loops): if a backward jump targets label L at position `lpos`, every slot referenced between `lpos` and the jump has its `last_ref` extended to the jump position.
2. **Forward walk with type tracking.** Walk instructions using `track_types` to maintain per-slot types. At each escape point, if the escaping slot is provably `T_TEXT`, insert `stone_text slot` before the instruction.
3. **Move special case.** For `move dest, src`: only insert `stone_text src` if the source is `T_TEXT` **and** `last_ref[src] > i` (the source slot is still live after the move, meaning both slots alias the same text). If the source is dead after the move, the value transfers uniquely — no stoning needed.
**Escape points and the slot that gets stoned:**
| Instruction | Stoned slot | Why it escapes |
|---|---|---|
| `move` | source (if still live) | Two slots alias the same value |
| `store_field` | value | Stored to object property |
| `store_index` | value | Stored to array element |
| `store_dynamic` | value | Dynamic property store |
| `push` | value | Pushed to array |
| `setarg` | value | Passed as function argument |
| `put` | source | Written to outer closure frame |
**Not handled by this pass** (handled by VM guards instead):
| Instruction | Reason |
|---|---|
| `get` (closure read) | Value arrives from outer frame; type may be T_UNKNOWN at compile time |
| `return` | Return value's type may be T_UNKNOWN; VM stones at inter-frame boundary |
These two cases use runtime `stone_mutable_text` guards in the VM because the streamline pass cannot always prove the slot type across frame boundaries.
**Nop prefix:** none (inserts instructions, does not create nops)
### 8. eliminate_unreachable (dead code after return)
Nops instructions after `return` until the next real label. Only `return` is treated as a terminal instruction; `disrupt` is not, because the disruption handler code immediately follows `disrupt` and must remain reachable.
@@ -172,13 +209,13 @@ The mcode compiler emits a label at disruption handler entry points (see `emit_l
**Nop prefix:** `_nop_ur_`
### 8. eliminate_dead_jumps (jump-to-next-label elimination)
### 9. eliminate_dead_jumps (jump-to-next-label elimination)
Removes `jump L` instructions where `L` is the immediately following label (skipping over any intervening nop strings). These are common after other passes eliminate conditional branches, leaving behind jumps that fall through naturally.
**Nop prefix:** `_nop_dj_`
### 9. diagnose_function (compile-time diagnostics)
### 10. diagnose_function (compile-time diagnostics)
Optional pass that runs when `_warn` is set on the mcode input. Performs a forward type-tracking scan and emits diagnostics for provably wrong operations. Diagnostics are collected in `ir._diagnostics` as `{severity, file, line, col, message}` records.
@@ -219,6 +256,7 @@ eliminate_type_checks → uses param_types + write_types
simplify_algebra
simplify_booleans
eliminate_moves
insert_stone_text → escape analysis for mutable text
eliminate_unreachable
eliminate_dead_jumps
diagnose_function → optional, when _warn is set
@@ -286,7 +324,9 @@ move 2, 7 // i = temp
subtract 2, 2, 6 // i = i - 1 (direct)
```
The `+` operator is excluded from target slot propagation when it would use the full text+num dispatch (i.e., when neither operand is a known number), because writing both `concat` and `add` to the variable's slot would pollute its write type. When the known-number shortcut applies, `+` uses `emit_numeric_binop` and would be safe for target propagation, but this is not currently implemented — the exclusion is by operator kind, not by dispatch path.
The `+` operator uses target slot propagation when the target slot equals the left operand (`target == left_slot`), i.e. for self-assign patterns like `s = s + x`. In this case both `concat` and `add` write to the same slot that already holds the left operand, so write-type pollution is acceptable — the value is being updated in place. For other cases (target differs from left operand), `+` still allocates a temp to avoid polluting the target slot's write type with both T_TEXT and T_NUM.
This enables the VM's in-place append fast path for string concatenation: when `concat dest, dest, src` has the same destination and left operand, the VM can append directly to a mutable text's excess capacity without allocating.
## Debugging Tools
@@ -375,7 +415,7 @@ This was implemented and tested but causes a bootstrap failure during self-hosti
### Target Slot Propagation for Add with Known Numbers
When the known-number add shortcut applies (one operand is a literal number), the generated code uses `emit_numeric_binop` which has a single write path. Target slot propagation should be safe in this case, but is currently blocked by the blanket `kind != "+"` exclusion. Refining the exclusion to check whether the shortcut will apply (by testing `is_known_number` on either operand) would enable direct writes for patterns like `i = i + 1`.
When the known-number add shortcut applies (one operand is a literal number), the generated code uses `emit_numeric_binop` which has a single write path. Target slot propagation is already enabled for the self-assign case (`i = i + 1`), but when the target differs from the left operand and neither operand is a known number, a temp is still used. Refining the exclusion to check `is_known_number` would enable direct writes for the remaining non-self-assign cases like `j = i + 1`.
### Forward Type Narrowing from Typed Operations