streamline mcode

2026-02-12 09:43:13 -06:00
parent 68fb440502
commit 900db912a5
22 changed files with 1475 additions and 93 deletions
--- a/docs/spec/pipeline.md
+++ b/docs/spec/pipeline.md
@@ -47,16 +47,22 @@ Lowers the AST to a JSON-based intermediate representation with explicit operati
 - **Typed load/store**: Emits `load_index` (array by integer), `load_field` (record by string), or `load_dynamic` (unknown) based on type information from fold.
 - **Decomposed calls**: Function calls are split into `frame` (create call frame) + `setarg` (set arguments) + `invoke` (execute call).
 - **Intrinsic access**: Intrinsic functions are loaded via `access` with an intrinsic marker rather than global lookup.
+- **Intrinsic inlining**: Type-check intrinsics (`is_array`, `is_text`, `is_number`, `is_integer`, `is_logical`, `is_null`, `is_function`, `is_object`, `is_stone`), `length`, and `push` are emitted as direct opcodes instead of frame/setarg/invoke call sequences.

 See [Mcode IR](mcode.md) for instruction format details.

 ### Streamline (`streamline.cm`)

-Optimizes the Mcode IR. Operates per-function:
+Optimizes the Mcode IR through a series of independent passes. Operates per-function:

-- **Redundant instruction elimination**: Removes no-op patterns and redundant moves.
-- **Dead code removal**: Eliminates instructions whose results are never used.
-- **Type-based narrowing**: When type information is available, narrows `load_dynamic`/`store_dynamic` to typed variants.
+1. **Backward type inference**: Infers parameter types from how they are used in typed operators. Immutable `def` parameters keep their inferred type across label join points.
+2. **Type-check elimination**: When a slot's type is known, eliminates `is_<type>` + conditional jump pairs. Narrows `load_dynamic`/`store_dynamic` to typed variants.
+3. **Algebraic simplification**: Rewrites identity operations (add 0, multiply 1, divide 1) and folds same-slot comparisons.
+4. **Boolean simplification**: Fuses `not` + conditional jump into a single jump with inverted condition.
+5. **Move elimination**: Removes self-moves (`move a, a`).
+6. **Dead jump elimination**: Removes jumps to the immediately following label.
+
+See [Streamline Optimizer](streamline.md) for detailed pass descriptions, including the currently disabled unreachable-code pass.

 ### QBE Emit (`qbe_emit.cm`)

@@ -107,6 +113,14 @@ Generates QBE IL that can be compiled to native code.
 | `qbe.cm` | QBE IL operation templates |
 | `internal/bootstrap.cm` | Pipeline orchestrator |

+## Debug Tools
+
+| File | Purpose |
+|------|---------|
+| `dump_mcode.cm` | Print raw Mcode IR before streamlining |
+| `dump_stream.cm` | Print IR after streamlining with before/after stats |
+| `dump_types.cm` | Print streamlined IR with type annotations |
+
 ## Test Files

 | File | Tests |
@@ -116,3 +130,5 @@ Generates QBE IL that can be compiled to native code.
 | `mcode_test.ce` | Typed load/store, decomposed calls |
 | `streamline_test.ce` | Optimization counts, IR before/after |
 | `qbe_test.ce` | End-to-end QBE IL generation |
+| `test_intrinsics.cm` | Inlined intrinsic opcodes (is_array, length, push, etc.) |
+| `test_backward.cm` | Backward type propagation for parameters |
--- a/docs/spec/streamline.md
+++ b/docs/spec/streamline.md
@@ -0,0 +1,202 @@
+---
+title: "Streamline Optimizer"
+description: "Mcode IR optimization passes"
+---
+
+## Overview
+
+The streamline optimizer (`streamline.cm`) runs a series of independent passes over the Mcode IR to eliminate redundant operations. Each pass is a standalone function that can be enabled, disabled, or reordered. Apart from the parameter-type map that `infer_param_types` hands to `eliminate_type_checks`, passes communicate only through the instruction array they mutate in place, replacing eliminated instructions with nop strings (e.g., `_nop_tc_1`).
+
+The optimizer runs after `mcode.cm` generates the IR and before the result is lowered to the Mach VM or emitted as QBE IL.
+
+```
+Fold (AST) → Mcode (JSON IR) → Streamline → Mach VM / QBE
+```
+
+## Type Lattice
+
+The optimizer tracks a type for each slot in the register file:
+
+| Type | Meaning |
+|------|---------|
+| `unknown` | No type information |
+| `int` | Integer |
+| `float` | Floating-point |
+| `num` | Number (subsumes int and float) |
+| `text` | String |
+| `bool` | Logical (true/false) |
+| `null` | Null value |
+| `array` | Array |
+| `record` | Record (object) |
+| `function` | Function |
+| `blob` | Binary blob |
+
+Subsumption: `int` and `float` both satisfy a `num` check.
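+
+The subsumption rule can be sketched as a small predicate. This is a minimal Python model, not the `streamline.cm` implementation; the function name `satisfies` is hypothetical, and types are represented by the lowercase names from the table above.
+
+```python
+# Hypothetical model of the lattice check; type names follow the table above.
+NUMERIC = ("int", "float")
+
+
+def satisfies(known: str, checked: str) -> bool:
+    """True if a slot whose type is `known` is guaranteed to pass an is_<checked> test."""
+    if known == "unknown":
+        return False  # nothing can be proven about an untyped slot
+    if known == checked:
+        return True
+    # Subsumption: int and float both satisfy a num check.
+    return checked == "num" and known in NUMERIC
+```
+
+The relation is deliberately one-way: a slot known only to be `num` does not satisfy an `int` check, because the value may be a float.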
+
+## Passes
+
+### 1. infer_param_types (backward type inference)
+
+Scans all typed operators to determine what types their operands must be. For example, `add_int dest, a, b` implies both `a` and `b` are integers.
+
+When a parameter slot (1..nr_args) is consistently inferred as a single type, that type is recorded. Since parameters are immutable (`def`), the inferred type holds for the entire function and persists across label join points (loop headers, branch targets).
+
+Backward inference rules:
+
+| Operator class | Operand type inferred |
+|---|---|
+| `add_int`, `sub_int`, `mul_int`, `div_int`, `mod_int`, `eq_int`, integer comparisons, bitwise operators | T_INT |
+| `add_float`, `sub_float`, `mul_float`, `div_float`, `mod_float`, float comparisons | T_FLOAT |
+| `concat`, text comparisons | T_TEXT |
+| `eq_bool`, `ne_bool`, `not`, `and`, `or` | T_BOOL |
+| `store_index` (object operand) | T_ARRAY |
+| `store_index` (index operand) | T_INT |
+| `store_field` (object operand) | T_RECORD |
+| `push` (array operand) | T_ARRAY |
+
+When a slot receives conflicting inferences (e.g., it is used in both `add_int` and `concat` across different type-dispatch branches), the result is `unknown`. A conflict between T_INT and T_FLOAT is the exception: it produces `num`.
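+
+A minimal Python sketch of the inference, under stated assumptions: instructions are lists `[op, dest, src1, src2]`, labels and nops are plain strings, `not` is `[op, dest, src]`, and `push` is `[op, arr, val]` as in the tables in this document. Only a subset of the rule table is encoded; `store_index`/`store_field` are left out because their operand order is not specified here. Widening any pair of numeric types to `num` generalizes the documented T_INT/T_FLOAT rule and is an assumption.
+
+```python
+# Hypothetical sketch of infer_param_types; the IR encoding is an assumption.
+NUM = {"int", "float", "num"}
+
+# Opcodes whose two source operands (list positions 2 and 3) must share one type.
+BINARY_RULES = {
+    **{op: "int" for op in ("add_int", "sub_int", "mul_int", "div_int", "mod_int", "eq_int")},
+    **{op: "float" for op in ("add_float", "sub_float", "mul_float", "div_float", "mod_float")},
+    "concat": "text",
+    **{op: "bool" for op in ("eq_bool", "ne_bool", "and", "or")},
+}
+
+
+def join(a, b):
+    """Merge two inferences for one slot: equal types agree, numerics widen, else unknown."""
+    if a == b:
+        return a
+    if a in NUM and b in NUM:
+        return "num"
+    return "unknown"
+
+
+def implied_types(instr):
+    """(slot, type) pairs that one instruction forces on its operands."""
+    op = instr[0]
+    if op in BINARY_RULES:
+        return [(instr[2], BINARY_RULES[op]), (instr[3], BINARY_RULES[op])]
+    if op == "not":
+        return [(instr[2], "bool")]
+    if op == "push":
+        return [(instr[1], "array")]
+    return []
+
+
+def infer_param_types(instrs, nr_args):
+    seen = {}
+    for instr in instrs:
+        if isinstance(instr, str):  # labels and nop strings carry no operands
+            continue
+        for slot, ty in implied_types(instr):
+            if 1 <= slot <= nr_args:  # only parameter slots are recorded
+                seen[slot] = join(seen[slot], ty) if slot in seen else ty
+    return {slot: ty for slot, ty in seen.items() if ty != "unknown"}
+```
+
+For example, `infer_param_types([["add_int", 3, 1, 2], ["concat", 4, 2, 2]], 2)` returns `{1: "int"}`: slot 2 is used as both `int` and `text`, so its inference collapses to `unknown` and is dropped.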
+
+**Nop prefix:** none (analysis only, does not modify instructions)
+
+### 2. eliminate_type_checks (type-check + jump elimination)
+
+Forward pass that tracks the known type of each slot. When a type check (`is_int`, `is_text`, `is_num`, etc.) is followed by a conditional jump on its result and the checked slot's type is already known, the check is eliminated and the jump is either removed or converted into an unconditional jump.
+
+Three cases:
+
+- **Known match** (e.g., `is_int` on a slot known to be `int`): both the check and the conditional jump are eliminated (nop'd).
+- **Known mismatch** (e.g., `is_text` on a slot known to be `int`): the check is nop'd and the conditional jump is rewritten to an unconditional `jump`.
+- **Unknown**: the check remains, but on fallthrough, the slot's type is narrowed to the checked type (enabling downstream eliminations).
+
+This pass also narrows `load_dynamic`/`store_dynamic` when the key slot's type is known: a `text` key selects `load_field`/`store_field`, and an `int` key selects `load_index`/`store_index`.
+
+At label join points, all type information is reset except for parameter types seeded by the backward inference pass.
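+
+The three cases can be sketched in Python, assuming the compiler emits each check as `is_<T> t, x` followed by `jump_false t, L` (branch away when the check fails). That pairing, the list-based instruction encoding, and the small `RESULT_TYPES` and `NO_DEST` tables are assumptions for illustration; `load_dynamic` narrowing is omitted.
+
+```python
+import itertools
+
+# Hypothetical encoding: is_<T> is [op, dest, src]; the branch is [jump_false, dest, label].
+CHECKS = {"is_int": "int", "is_num": "num", "is_text": "text", "is_bool": "bool",
+          "is_null": "null", "is_array": "array", "is_record": "record",
+          "is_func": "function"}
+RESULT_TYPES = {"int": "int", "add_int": "int", "add_float": "float", "concat": "text"}
+NO_DEST = {"jump", "jump_true", "jump_false", "push", "return"}  # assumed not to write slot 1
+
+
+def check_outcome(known, checked):
+    """True: check always passes. False: always fails. None: decided at run time."""
+    if known == "unknown":
+        return None
+    if known == checked or (checked == "num" and known in ("int", "float")):
+        return True
+    if known == "num" and checked in ("int", "float"):
+        return None  # a num may or may not be an int
+    return False
+
+
+def eliminate_type_checks(instrs, param_types):
+    ids = itertools.count(1)
+    types = dict(param_types)
+    i = 0
+    while i < len(instrs):
+        instr = instrs[i]
+        nxt = instrs[i + 1] if i + 1 < len(instrs) else None
+        if isinstance(instr, str):
+            if not instr.startswith("_nop_"):  # real label: join point
+                types = dict(param_types)      # keep only inferred parameter types
+            i += 1
+            continue
+        op = instr[0]
+        if (op in CHECKS and isinstance(nxt, list)
+                and nxt[0] == "jump_false" and nxt[1] == instr[1]):
+            dest, src, checked = instr[1], instr[2], CHECKS[op]
+            outcome = check_outcome(types.get(src, "unknown"), checked)
+            if outcome is True:       # known match: drop check and branch
+                instrs[i] = f"_nop_tc_{next(ids)}"
+                instrs[i + 1] = f"_nop_tc_{next(ids)}"
+            elif outcome is False:    # known mismatch: always branch
+                instrs[i] = f"_nop_tc_{next(ids)}"
+                instrs[i + 1] = ["jump", nxt[2]]
+            else:                     # unknown: fallthrough implies the check passed
+                types[dest] = "bool"
+                if dest != src:
+                    types[src] = checked
+            i += 2
+            continue
+        if op == "move":
+            types[instr[1]] = types.get(instr[2], "unknown")
+        elif op not in NO_DEST:
+            types[instr[1]] = RESULT_TYPES.get(op, "unknown")
+        i += 1
+    return instrs
+```
+
+On `[["add_int", 3, 1, 2], ["is_int", 4, 3], ["jump_false", 4, "L1"]]` with no parameter types, slot 3 is known to be `int` after the addition, so both the check and the branch become `_nop_tc_` strings.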
+
+**Nop prefix:** `_nop_tc_`
+
+### 3. simplify_algebra (algebraic identity + comparison folding)
+
+Tracks known constant values alongside types. Rewrites identity operations:
+
+| Pattern | Rewrite |
+|---------|---------|
+| `add_int dest, x, 0` | `move dest, x` |
+| `add_int dest, 0, x` | `move dest, x` |
+| `sub_int dest, x, 0` | `move dest, x` |
+| `mul_int dest, x, 1` | `move dest, x` |
+| `mul_int dest, 1, x` | `move dest, x` |
+| `mul_int dest, x, 0` | `int dest, 0` |
+| `div_int dest, x, 1` | `move dest, x` |
+| `add_float dest, x, 0` | `move dest, x` |
+| `mul_float dest, x, 1` | `move dest, x` |
+| `div_float dest, x, 1` | `move dest, x` |
+
+Float multiplication by zero is intentionally not rewritten: `x * 0` is NaN rather than 0 when `x` is NaN or an infinity, so folding it to a constant would change results.
+
+Same-slot comparison folding:
+
+| Pattern | Rewrite |
+|---------|---------|
+| `eq_* dest, x, x` | `true dest` |
+| `le_* dest, x, x` | `true dest` |
+| `ge_* dest, x, x` | `true dest` |
+| `is_identical dest, x, x` | `true dest` |
+| `ne_* dest, x, x` | `false dest` |
+| `lt_* dest, x, x` | `false dest` |
+| `gt_* dest, x, x` | `false dest` |
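+
+Both tables can be sketched as a single forward scan in Python. This is illustrative only: it assumes instructions are lists `[op, dest, a, b]`, that constants are loaded by `int dest, n` (as in the `mul_int` rewrite above) or a hypothetical `float dest, x`, and that labels are plain strings at which known constants are forgotten.
+
+```python
+# Hypothetical sketch of simplify_algebra; the constant-load opcodes are assumptions.
+RIGHT_IDENTITY = {"add_int": 0, "sub_int": 0, "mul_int": 1, "div_int": 1,
+                  "add_float": 0, "mul_float": 1, "div_float": 1}
+LEFT_IDENTITY = {"add_int": 0, "mul_int": 1}
+ALWAYS_TRUE = ("eq_", "le_", "ge_")
+ALWAYS_FALSE = ("ne_", "lt_", "gt_")
+
+
+def simplify_algebra(instrs):
+    consts = {}  # slot -> constant value known at this point
+    for i, instr in enumerate(instrs):
+        if isinstance(instr, str):
+            if not instr.startswith("_nop_"):
+                consts = {}  # label: values may differ on other incoming paths
+            continue
+        op = instr[0]
+        if len(instr) == 4:  # binary form: [op, dest, a, b]
+            dest, a, b = instr[1], instr[2], instr[3]
+            if op in RIGHT_IDENTITY and consts.get(b) == RIGHT_IDENTITY[op]:
+                instrs[i] = ["move", dest, a]
+            elif op in LEFT_IDENTITY and consts.get(a) == LEFT_IDENTITY[op]:
+                instrs[i] = ["move", dest, b]
+            elif op == "mul_int" and consts.get(b) == 0:
+                instrs[i] = ["int", dest, 0]  # mul_float is deliberately excluded
+            elif a == b and (op.startswith(ALWAYS_TRUE) or op == "is_identical"):
+                instrs[i] = ["true", dest]
+            elif a == b and op.startswith(ALWAYS_FALSE):
+                instrs[i] = ["false", dest]
+        new = instrs[i]
+        if new[0] in ("int", "float"):
+            consts[new[1]] = new[2]   # record the loaded constant
+        elif len(new) > 1:
+            consts.pop(new[1], None)  # position 1 may be a destination: forget it
+    return instrs
+```
+
+With slot 1 holding the constant 0, `mul_float 2, 3, 1` is left untouched while `mul_int 9, 6, 1` becomes `int 9, 0`, matching the NaN/Inf caveat above.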
+
+**Nop prefix:** none (rewrites in place, does not create nops)
+
+### 4. simplify_booleans (not + jump fusion)
+
+Peephole pass that eliminates unnecessary `not` instructions:
+
+| Pattern | Rewrite |
+|---------|---------|
+| `not d, x; jump_false d, L` | nop; `jump_true x, L` |
+| `not d, x; jump_true d, L` | nop; `jump_false x, L` |
+| `not d1, x; not d2, d1` | nop; `move d2, x` |
+
+This is particularly effective on `if (!cond)` patterns, which the compiler generates as `not; jump_false`. After this pass, they become a single `jump_true`.
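+
+A sketch of the peephole in Python, assuming instructions are lists (`[not, d, x]`, `[jump_false, d, L]`), nops are strings, and the two instructions are adjacent. Like the rewrite table, it does not check whether `d` is read again later; the function name matches the pass, but the body is illustrative rather than the `streamline.cm` source.
+
+```python
+import itertools
+
+# Hypothetical encoding: [not, d, x] and [jump_true | jump_false, d, label].
+INVERT = {"jump_false": "jump_true", "jump_true": "jump_false"}
+
+
+def simplify_booleans(instrs):
+    ids = itertools.count(1)
+    for i in range(len(instrs) - 1):
+        cur, nxt = instrs[i], instrs[i + 1]
+        if not (isinstance(cur, list) and isinstance(nxt, list) and cur[0] == "not"):
+            continue
+        d, x = cur[1], cur[2]
+        if nxt[0] in INVERT and nxt[1] == d:
+            # not d, x; jump_false d, L  ->  jump_true x, L (and vice versa)
+            instrs[i] = f"_nop_bl_{next(ids)}"
+            instrs[i + 1] = [INVERT[nxt[0]], x, nxt[2]]
+        elif nxt[0] == "not" and nxt[2] == d:
+            # not d1, x; not d2, d1  ->  move d2, x
+            instrs[i] = f"_nop_bl_{next(ids)}"
+            instrs[i + 1] = ["move", nxt[1], x]
+    return instrs
+```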
+
+**Nop prefix:** `_nop_bl_`
+
+### 5. eliminate_moves (self-move elimination)
+
+Removes `move a, a` instructions where the source and destination are the same slot. These can arise when earlier passes rewrite binary operations into moves; for example, `simplify_algebra` turns `add_int x, x, 0` into `move x, x`.
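+
+A minimal Python sketch of the pass, assuming a hypothetical encoding where instructions are lists such as `["move", dest, src]` and nops are strings:
+
+```python
+import itertools
+
+
+def eliminate_moves(instrs):
+    """Replace self-moves [move, a, a] with _nop_mv_ strings; everything else is kept."""
+    ids = itertools.count(1)
+    for i, instr in enumerate(instrs):
+        if isinstance(instr, list) and instr[0] == "move" and instr[1] == instr[2]:
+            instrs[i] = f"_nop_mv_{next(ids)}"
+    return instrs
+```
+
+A move between different slots, such as `move 3, 2`, is left in place.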
+
+**Nop prefix:** `_nop_mv_`
+
+### 6. eliminate_unreachable (dead code after return/disrupt)
+
+*Currently disabled.* Replaces every instruction after a `return` or `disrupt` with a nop, up to the next real label.
+
+Disabled because disruption handler code is placed after the `return`/`disrupt` instruction without a label boundary. The VM dispatches to handlers via the `disruption_pc` offset, not through normal control flow. Re-enabling this pass requires the mcode compiler to emit labels at disruption handler entry points.
+
+**Nop prefix:** `_nop_ur_`
+
+### 7. eliminate_dead_jumps (jump-to-next-label elimination)
+
+Removes `jump L` instructions where `L` is the immediately following label (skipping over any intervening nop strings). These are common after other passes eliminate conditional branches, leaving behind jumps that fall through naturally.
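+
+A Python sketch of the check, assuming labels are plain strings in the instruction array, jumps are lists `["jump", label]`, and nops are strings beginning with `_nop_`:
+
+```python
+import itertools
+
+
+def is_nop(instr):
+    return isinstance(instr, str) and instr.startswith("_nop_")
+
+
+def eliminate_dead_jumps(instrs):
+    ids = itertools.count(1)
+    for i, instr in enumerate(instrs):
+        if not (isinstance(instr, list) and instr[0] == "jump"):
+            continue
+        j = i + 1
+        while j < len(instrs) and is_nop(instrs[j]):
+            j += 1  # skip nops left behind by earlier passes
+        if j < len(instrs) and instrs[j] == instr[1]:
+            instrs[i] = f"_nop_dj_{next(ids)}"  # target is the next label: execution falls through
+    return instrs
+```
+
+Only the first label after the jump is compared, so `jump L3` followed by labels `L2` and `L3` is kept.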
+
+**Nop prefix:** `_nop_dj_`
+
+## Pass Composition
+
+All passes run in sequence in `optimize_function`:
+
+```
+infer_param_types       → returns param_types map
+eliminate_type_checks   → uses param_types
+simplify_algebra
+simplify_booleans
+eliminate_moves
+(eliminate_unreachable) → disabled
+eliminate_dead_jumps
+```
+
+Each pass is independent and can be commented out for testing or benchmarking.
+
+## Intrinsic Inlining
+
+Before streamlining, `mcode.cm` recognizes calls to built-in intrinsic functions and emits direct opcodes instead of the generic frame/setarg/invoke call sequence. This reduces a 6-instruction call pattern to a single instruction:
+
+| Call | Emitted opcode |
+|------|---------------|
+| `is_array(x)` | `is_array dest, src` |
+| `is_function(x)` | `is_func dest, src` |
+| `is_object(x)` | `is_record dest, src` |
+| `is_stone(x)` | `is_stone dest, src` |
+| `is_integer(x)` | `is_int dest, src` |
+| `is_text(x)` | `is_text dest, src` |
+| `is_number(x)` | `is_num dest, src` |
+| `is_logical(x)` | `is_bool dest, src` |
+| `is_null(x)` | `is_null dest, src` |
+| `length(x)` | `length dest, src` |
+| `push(arr, val)` | `push arr, val` |
+
+These inlined opcodes have corresponding Mach VM implementations in `mach.c`.
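+
+The lowering decision can be sketched as a lookup keyed by the intrinsic's name. This Python sketch is hypothetical and is not the `mcode.cm` source; the `call` instruction it returns for other functions is a placeholder for the generic frame/setarg/invoke sequence, whose exact operands are not described here.
+
+```python
+# Hypothetical sketch; the opcode mapping follows the table above.
+UNARY_INTRINSICS = {
+    "is_array": "is_array", "is_function": "is_func", "is_object": "is_record",
+    "is_stone": "is_stone", "is_integer": "is_int", "is_text": "is_text",
+    "is_number": "is_num", "is_logical": "is_bool", "is_null": "is_null",
+    "length": "length",
+}
+
+
+def lower_call(name, dest, args):
+    """Return the instruction list emitted for a call to `name`."""
+    if name in UNARY_INTRINSICS and len(args) == 1:
+        return [[UNARY_INTRINSICS[name], dest, args[0]]]
+    if name == "push" and len(args) == 2:
+        return [["push", args[0], args[1]]]  # push arr, val: no destination slot
+    # Placeholder standing in for the frame/setarg/invoke call sequence.
+    return [["call", dest, name, *args]]
+```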
+
+## Debugging Tools
+
+Three dump tools inspect the IR at different stages:
+
+- **`dump_mcode.cm`** — prints the raw Mcode IR after `mcode.cm`, before streamlining
+- **`dump_stream.cm`** — prints the IR after streamlining, with before/after instruction counts
+- **`dump_types.cm`** — prints the streamlined IR with type annotations on each instruction
+
+Usage:
+```
+./cell --core . dump_mcode.cm <file.ce|file.cm>
+./cell --core . dump_stream.cm <file.ce|file.cm>
+./cell --core . dump_types.cm <file.ce|file.cm>
+```
+
+## Nop Convention
+
+Eliminated instructions are replaced with strings matching `_nop_<prefix>_<counter>`. The prefix identifies which pass created the nop. Nop strings are:
+
+- Skipped during interpretation (the VM ignores them)
+- Skipped during QBE emission
+- Not counted in instruction statistics
+- Preserved in the instruction array to maintain positional stability for jump targets
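+
+A small Python sketch of how a tool might honor this convention, assuming nops are strings shaped like `_nop_<prefix>_<counter>` and real instructions are lists. Excluding labels from the count is an assumption; the text above only states that nops are not counted.
+
+```python
+from collections import Counter
+
+
+def is_nop(instr):
+    return isinstance(instr, str) and instr.startswith("_nop_")
+
+
+def instruction_stats(instrs):
+    """Count real instructions and tally nops by the pass prefix that created them."""
+    live = sum(1 for instr in instrs if isinstance(instr, list))  # labels and nops excluded
+    # "_nop_tc_3".split("_") == ["", "nop", "tc", "3"], so index 2 is the prefix.
+    by_pass = Counter(instr.split("_")[2] for instr in instrs if is_nop(instr))
+    return live, by_pass
+```
+
+Because a nop occupies the array slot of the instruction it replaced, the index of every other entry is unchanged.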