Merge branch 'mcode_streamline' into runtime_rework

2026-02-13 15:42:20 -06:00
parent f2556c5622 3795533554
commit db73eb4eeb
31 changed files with 113406 additions and 121977 deletions
--- a/docs/spec/pipeline.md
+++ b/docs/spec/pipeline.md
@@ -27,7 +27,8 @@ Splits source text into tokens. Handles string interpolation by re-tokenizing te
 Converts tokens into an AST. Also performs semantic analysis:

 - **Scope records**: For each scope (global, function), builds a record mapping variable names to their metadata: `make` (var/def/function/input), `function_nr`, `nr_uses`, `closure` flag, and `level`.
-- **Type tags**: When the right-hand side of a `def` is a syntactically obvious type, stamps `type_tag` on the scope record entry. Derivable types: `"integer"`, `"number"`, `"text"`, `"array"`, `"record"`, `"function"`, `"logical"`, `"null"`.
+- **Type tags**: When the right-hand side of a `def` is a syntactically obvious type, stamps `type_tag` on the scope record entry. Derivable types: `"integer"`, `"number"`, `"text"`, `"array"`, `"record"`, `"function"`, `"logical"`. For `def` variables, type tags are also inferred from usage patterns: push (`x[] = v`) implies array, property access (`x.foo = v`) implies record, integer key implies array, text key implies record.
+- **Type error detection**: For `def` variables with known type tags, provably wrong operations are reported as compile errors: property access on arrays, push on non-arrays, text keys on arrays, integer keys on records. Only `def` variables are checked because `var` can be reassigned.
 - **Intrinsic resolution**: Names used but not locally bound are recorded in `ast.intrinsics`. Name nodes referencing intrinsics get `intrinsic: true`.
 - **Access kind**: Subscript (`[`) nodes get `access_kind`: `"index"` for numeric subscripts, `"field"` for string subscripts, omitted otherwise.
 - **Tail position**: Return statements where the expression is a call get `tail: true`.
@@ -40,8 +41,8 @@ Operates on the AST. Performs constant folding and type analysis:
 - **Constant propagation**: Tracks `def` bindings whose values are known constants.
 - **Type propagation**: Extends `type_tag` through operations. When both operands of an arithmetic op have known types, the result type is known. Propagates type tags to reference sites.
 - **Intrinsic specialization**: When an intrinsic call's argument types are known, stamps a `hint` on the call node. For example, `length(x)` where x is a known array gets `hint: "array_length"`. Type checks like `is_array(known_array)` are folded to `true`.
-- **Purity marking**: Stamps `pure: true` on expressions with no side effects (literals, name references, arithmetic on pure operands).
-- **Dead code elimination**: Removes unreachable branches when conditions are known constants.
+- **Purity analysis**: Expressions with no side effects are marked pure (literals, name references, arithmetic on pure operands, calls to pure intrinsics). The pure intrinsic set contains only `is_*` sensory functions — they are the only intrinsics guaranteed to never disrupt regardless of argument types. Other intrinsics like `text`, `number`, and `length` can disrupt on wrong argument types and are excluded.
+- **Dead code elimination**: Removes unreachable branches when conditions are known constants. Removes unused `var`/`def` declarations with pure initializers. Removes standalone calls to pure intrinsics where the result is discarded.

 ### Mcode (`mcode.cm`)

@@ -51,6 +52,8 @@ Lowers the AST to a JSON-based intermediate representation with explicit operati
 - **Decomposed calls**: Function calls are split into `frame` (create call frame) + `setarg` (set arguments) + `invoke` (execute call).
 - **Intrinsic access**: Intrinsic functions are loaded via `access` with an intrinsic marker rather than global lookup.
 - **Intrinsic inlining**: Type-check intrinsics (`is_array`, `is_text`, `is_number`, `is_integer`, `is_logical`, `is_null`, `is_function`, `is_object`, `is_stone`), `length`, and `push` are emitted as direct opcodes instead of frame/setarg/invoke call sequences.
+- **Disruption handler labels**: When a function has a disruption handler, a label is emitted before the handler code. This allows the streamline optimizer's unreachable code elimination to safely nop dead code after `return` without accidentally eliminating the handler.
+- **Tail call marking**: When a return statement's expression is a call and the function has no disruption handler, the final `invoke` is renamed to `tail_invoke`. This marks the call site for future tail call optimization (TCO). Functions with disruption handlers cannot use TCO because the handler frame must remain on the stack.

 See [Mcode IR](mcode.md) for the instruction format and complete instruction reference.

@@ -58,12 +61,13 @@ See [Mcode IR](mcode.md) for the instruction format and complete instruction ref

 Optimizes the Mcode IR through a series of independent passes. Operates per-function:

-1. **Backward type inference**: Infers parameter types from how they are used in typed operators. Immutable `def` parameters keep their inferred type across label join points.
+1. **Backward type inference**: Infers parameter types from how they are used in typed operators and unambiguously numeric generic arithmetic (`subtract`, `eq_int`, `store_index`, `load_field`, `push`, `pop`, etc.). Immutable `def` parameters keep their inferred type across label join points.
 2. **Type-check elimination**: When a slot's type is known, eliminates `is_<type>` + conditional jump pairs. Narrows `load_dynamic`/`store_dynamic` to typed variants.
 3. **Algebraic simplification**: Rewrites identity operations (add 0, multiply 1, divide 1) and folds same-slot comparisons.
 4. **Boolean simplification**: Fuses `not` + conditional jump into a single jump with inverted condition.
 5. **Move elimination**: Removes self-moves (`move a, a`).
-6. **Dead jump elimination**: Removes jumps to the immediately following label.
+6. **Unreachable elimination**: Nops dead code after `return` until the next label.
+7. **Dead jump elimination**: Removes jumps to the immediately following label.

 See [Streamline Optimizer](streamline.md) for detailed pass descriptions.

--- a/docs/spec/streamline.md
+++ b/docs/spec/streamline.md
@@ -37,7 +37,7 @@ Subsumption: `int` and `float` both satisfy a `num` check.

 ### 1. infer_param_types (backward type inference)

-Scans all typed operators to determine what types their operands must be. For example, `add_int dest, a, b` implies both `a` and `b` are integers.
+Scans typed operators and generic arithmetic to determine what types their operands must be. For example, `subtract dest, a, b` implies both `a` and `b` are numbers.

 When a parameter slot (1..nr_args) is consistently inferred as a single type, that type is recorded. Since parameters are immutable (`def`), the inferred type holds for the entire function and persists across label join points (loop headers, branch targets).

@@ -45,20 +45,67 @@ Backward inference rules:

 | Operator class | Operand type inferred |
 |---|---|
-| `add_int`, `sub_int`, `mul_int`, `div_int`, `mod_int`, `eq_int`, comparisons, bitwise | T_INT |
-| `add_float`, `sub_float`, `mul_float`, `div_float`, `mod_float`, float comparisons | T_FLOAT |
+| `subtract`, `multiply`, `divide`, `modulo`, `pow`, `negate` | T_NUM |
+| `eq_int`, `ne_int`, `lt_int`, `gt_int`, `le_int`, `ge_int`, bitwise ops | T_INT |
+| `eq_float`, `ne_float`, `lt_float`, `gt_float`, `le_float`, `ge_float` | T_FLOAT |
 | `concat`, text comparisons | T_TEXT |
 | `eq_bool`, `ne_bool`, `not`, `and`, `or` | T_BOOL |
 | `store_index` (object operand) | T_ARRAY |
 | `store_index` (index operand) | T_INT |
 | `store_field` (object operand) | T_RECORD |
 | `push` (array operand) | T_ARRAY |
+| `load_index` (object operand) | T_ARRAY |
+| `load_index` (index operand) | T_INT |
+| `load_field` (object operand) | T_RECORD |
+| `pop` (array operand) | T_ARRAY |

-When a slot appears with conflicting type inferences (e.g., used in both `add_int` and `concat` across different type-dispatch branches), the result is `unknown`. INT + FLOAT conflicts produce `num`.
+Note: `add` is excluded from backward inference because it is polymorphic — it handles both numeric addition and text concatenation. Only operators that are unambiguously numeric can infer T_NUM.
+
+When a slot appears with conflicting type inferences, the result is `unknown`. INT + FLOAT conflicts produce `num`.

 **Nop prefix:** none (analysis only, does not modify instructions)
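
The backward inference rules and the conflict handling above can be sketched in Python as follows. This is a minimal illustration, not the `streamline.cm` implementation: the tuple instruction layout `(op, dest, a, b)`, the operand positions, and the `RULES` table are assumptions made for the example, and only a few rows of the table are modelled. Treating `num` combined with `int` or `float` as `num` is also an assumption; the text only states the INT + FLOAT case.

```python
from enum import Enum

class T(Enum):
    UNKNOWN = "unknown"
    INT = "int"
    FLOAT = "float"
    NUM = "num"
    TEXT = "text"
    ARRAY = "array"
    RECORD = "record"

# Hypothetical layout: instr[0] is the opcode, the dict maps an operand
# position in the tuple to the type that operand must have.
RULES = {
    "subtract":   {2: T.NUM, 3: T.NUM},
    "eq_int":     {2: T.INT, 3: T.INT},
    "eq_float":   {2: T.FLOAT, 3: T.FLOAT},
    "concat":     {2: T.TEXT, 3: T.TEXT},
    "load_index": {2: T.ARRAY, 3: T.INT},
    "load_field": {2: T.RECORD},
    "push":       {1: T.ARRAY},
    # "add" is deliberately absent: it is polymorphic.
}

def merge(a, b):
    """Combine a previous inference `a` (or None) with a new one `b`."""
    if a is None or a == b:
        return b
    numeric = {T.INT, T.FLOAT, T.NUM}
    if a in numeric and b in numeric:
        return T.NUM          # INT + FLOAT (and, by assumption, NUM mixes)
    return T.UNKNOWN          # any other conflict; UNKNOWN is sticky

def infer_param_types(instrs, nr_args):
    seen = {}
    for instr in instrs:
        if isinstance(instr, str):        # labels and nop strings
            continue
        for pos, ty in RULES.get(instr[0], {}).items():
            slot = instr[pos]
            if 1 <= slot <= nr_args:      # only parameter slots
                seen[slot] = merge(seen.get(slot), ty)
    return {s: t for s, t in seen.items() if t is not T.UNKNOWN}
```

In this sketch a parameter used by both `eq_int` and `eq_float` ends up as `num`, while one used by both `concat` and `subtract` is dropped from the result.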

-### 2. eliminate_type_checks (type-check + jump elimination)
+### 2. infer_slot_write_types (slot write-type invariance)
+
+Scans all instructions to determine which non-parameter slots have a consistent write type. If every instruction that writes to a given slot produces the same type, that type is globally invariant and can safely persist across label join points.
+
+This analysis is sound because:
+- `alloc_slot()` in mcode.cm is monotonically increasing — temp slots are never reused
+- All local variable declarations must be at function body level and initialized — slots are written before any backward jumps to loop headers
+- `move` is conservatively treated as T_UNKNOWN, avoiding unsound transitive assumptions
+
+Write type mapping:
+
+| Instruction class | Write type |
+|---|---|
+| `int` | T_INT |
+| `true`, `false` | T_BOOL |
+| `null` | T_NULL |
+| `access` | type of literal value |
+| `array` | T_ARRAY |
+| `record` | T_RECORD |
+| `function` | T_FUNCTION |
+| `length` | T_INT |
+| bitwise ops | T_INT |
+| `concat` | T_TEXT |
+| bool ops, comparisons, `in` | T_BOOL |
+| generic arithmetic (`add`, `subtract`, `negate`, etc.) | T_UNKNOWN |
+| `move`, `load_field`, `load_index`, `load_dynamic`, `pop`, `get` | T_UNKNOWN |
+| `invoke`, `tail_invoke` | T_UNKNOWN |
+
+The result is a map of slot→type for slots where all writes agree on a single known type. Parameter slots (1..nr_args) and slot 0 are excluded.
+
+Common patterns this enables:
+
+- **Length variables** (`var len = length(arr)`): written by `length` (T_INT) only → invariant T_INT
+- **Boolean flags** (`var found = false; ... found = true`): written by `false` and `true` → invariant T_BOOL
+- **Locally-created containers** (`var arr = []`): written by `array` only → invariant T_ARRAY
+
+Note: Loop counters (`var i = 0; i = i + 1`) are NOT invariant because `add` produces T_UNKNOWN. By contrast, a function parameter used in unambiguously numeric arithmetic (`subtract`, `multiply`, etc.) is inferred as T_NUM by backward inference, and that type persists across labels.
+
+**Nop prefix:** none (analysis only, does not modify instructions)
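
A rough Python sketch of the write-type scan described above, under stated assumptions: instructions are tuples whose second element is the destination slot, labels and nops are plain strings, and the `NO_WRITE` set of non-writing opcodes is hypothetical. Only part of the write-type table is modelled, and any opcode not listed is conservatively treated as an unknown-typed write.

```python
# Subset of the write-type table; names mirror the table above.
WRITE_TYPES = {
    "int": "T_INT", "length": "T_INT",
    "true": "T_BOOL", "false": "T_BOOL",
    "null": "T_NULL", "array": "T_ARRAY",
    "record": "T_RECORD", "function": "T_FUNCTION",
    "concat": "T_TEXT",
}
# Hypothetical set of opcodes that do not write a destination slot.
NO_WRITE = {"jump", "return", "store_field", "store_index", "push"}

def infer_slot_write_types(instrs, nr_args):
    types = {}
    for instr in instrs:
        if isinstance(instr, str):            # labels and nop strings
            continue
        op = instr[0]
        if op in NO_WRITE or len(instr) < 2:
            continue
        dest = instr[1]
        if dest <= nr_args:                   # slot 0 and parameters
            continue
        ty = WRITE_TYPES.get(op, "T_UNKNOWN") # add, move, invoke, ...
        prev = types.get(dest, ty)
        # One disagreeing write poisons the slot for good.
        types[dest] = ty if prev == ty else "T_UNKNOWN"
    return {s: t for s, t in types.items() if t != "T_UNKNOWN"}
```

With this model, a flag written only by `false` and `true` stays T_BOOL, while a counter written by `int` and then `add` is excluded from the result.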
+
+### 3. eliminate_type_checks (type-check + jump elimination)

 Forward pass that tracks the known type of each slot. When a type check (`is_int`, `is_text`, `is_num`, etc.) is followed by a conditional jump, and the slot's type is already known, the check and jump can be eliminated or converted to an unconditional jump.

@@ -70,30 +117,13 @@ Three cases:

 This pass also reduces `load_dynamic`/`store_dynamic` to `load_field`/`store_field` or `load_index`/`store_index` when the key slot's type is known.

-At label join points, all type information is reset except for parameter types seeded by the backward inference pass.
+At label join points, all type information is reset except for parameter types from backward inference and write-invariant types from slot write-type analysis.

 **Nop prefix:** `_nop_tc_`

-### 3. simplify_algebra (algebraic identity + comparison folding)
+### 4. simplify_algebra (same-slot comparison folding)

-Tracks known constant values alongside types. Rewrites identity operations:
-
-| Pattern | Rewrite |
-|---------|---------|
-| `add_int dest, x, 0` | `move dest, x` |
-| `add_int dest, 0, x` | `move dest, x` |
-| `sub_int dest, x, 0` | `move dest, x` |
-| `mul_int dest, x, 1` | `move dest, x` |
-| `mul_int dest, 1, x` | `move dest, x` |
-| `mul_int dest, x, 0` | `int dest, 0` |
-| `div_int dest, x, 1` | `move dest, x` |
-| `add_float dest, x, 0` | `move dest, x` |
-| `mul_float dest, x, 1` | `move dest, x` |
-| `div_float dest, x, 1` | `move dest, x` |
-
-Float multiplication by zero is intentionally not optimized because it is not safe with NaN and Inf values.
-
-Same-slot comparison folding:
+Tracks known constant values. Folds same-slot comparisons:

 | Pattern | Rewrite |
 |---------|---------|
@@ -107,7 +137,7 @@ Same-slot comparison folding:

 **Nop prefix:** none (rewrites in place, does not create nops)

-### 4. simplify_booleans (not + jump fusion)
+### 5. simplify_booleans (not + jump fusion)

 Peephole pass that eliminates unnecessary `not` instructions:

@@ -121,21 +151,21 @@ This is particularly effective on `if (!cond)` patterns, which the compiler gene

 **Nop prefix:** `_nop_bl_`

-### 5. eliminate_moves (self-move elimination)
+### 6. eliminate_moves (self-move elimination)

 Removes `move a, a` instructions where the source and destination are the same slot. These can arise from earlier passes rewriting binary operations into moves.

 **Nop prefix:** `_nop_mv_`

-### 6. eliminate_unreachable (dead code after return/disrupt)
+### 7. eliminate_unreachable (dead code after return)

-*Currently disabled.* Nops instructions after `return` or `disrupt` until the next real label.
+Nops instructions after `return` until the next real label. Only `return` is treated as a terminal instruction; `disrupt` is not, because the disruption handler code immediately follows `disrupt` and must remain reachable.

-Disabled because disruption handler code is placed after the `return`/`disrupt` instruction without a label boundary. The VM dispatches to handlers via the `disruption_pc` offset, not through normal control flow. Re-enabling this pass requires the mcode compiler to emit labels at disruption handler entry points.
+The mcode compiler emits a label at disruption handler entry points (see `emit_label(gen_label("disruption"))` in mcode.cm), which provides the label boundary that stops this pass from eliminating handler code.

 **Nop prefix:** `_nop_ur_`
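
The pass can be sketched in Python as below. This is an illustration only: it assumes instructions are tuples, labels are plain strings, and nops are strings starting with `_nop_`, which is a simplified model of the Mcode IR rather than its exact format.

```python
def eliminate_unreachable(instrs):
    out = []
    dead = False      # True while we are between a return and a label
    counter = 0
    for instr in instrs:
        is_str = isinstance(instr, str)
        if is_str and not instr.startswith("_nop_"):
            dead = False                  # a real label ends the dead region
            out.append(instr)
        elif dead and not is_str:
            counter += 1
            out.append(f"_nop_ur_{counter}")
        else:
            out.append(instr)
            # Only return is terminal; disrupt is deliberately not.
            if not is_str and instr[0] == "return":
                dead = True
    return out
```

Because a label precedes the disruption handler, the handler's instructions fall outside the dead region even when they directly follow a `return`.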

-### 7. eliminate_dead_jumps (jump-to-next-label elimination)
+### 8. eliminate_dead_jumps (jump-to-next-label elimination)

 Removes `jump L` instructions where `L` is the immediately following label (skipping over any intervening nop strings). These are common after other passes eliminate conditional branches, leaving behind jumps that fall through naturally.
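
A small Python sketch of this peephole, assuming the same simplified model (tuple instructions, string labels, nops as strings starting with `_nop_`). The `nop_prefix` default is a placeholder chosen for the example, since this excerpt does not show the pass's actual prefix.

```python
def eliminate_dead_jumps(instrs, nop_prefix="_nop_dj_"):
    out = list(instrs)
    counter = 0
    for i, instr in enumerate(out):
        if isinstance(instr, str) or instr[0] != "jump":
            continue
        # Skip any nop strings between the jump and the next item.
        j = i + 1
        while j < len(out) and isinstance(out[j], str) and out[j].startswith("_nop_"):
            j += 1
        # The jump is dead if the next real item is its own target label.
        if j < len(out) and out[j] == instr[1]:
            counter += 1
            out[i] = f"{nop_prefix}{counter}"
    return out
```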

@@ -146,12 +176,13 @@ Removes `jump L` instructions where `L` is the immediately following label (skip
 All passes run in sequence in `optimize_function`:

 ```
-infer_param_types      → returns param_types map
-eliminate_type_checks   → uses param_types
+infer_param_types        → returns param_types map
+infer_slot_write_types   → returns write_types map
+eliminate_type_checks    → uses param_types + write_types
 simplify_algebra
 simplify_booleans
 eliminate_moves
-(eliminate_unreachable) → disabled
+eliminate_unreachable
 eliminate_dead_jumps
 ```

@@ -177,6 +208,16 @@ Before streamlining, `mcode.cm` recognizes calls to built-in intrinsic functions

 These inlined opcodes have corresponding Mach VM implementations in `mach.c`.

+## Unified Arithmetic
+
+Arithmetic operations use generic opcodes: `add`, `subtract`, `multiply`, `divide`, `modulo`, `pow`, `negate`. There are no type-dispatched variants (e.g., no `add_int`/`add_float`).
+
+The Mach VM dispatches at runtime with an int-first fast path via `reg_vm_binop()`: it checks `JS_VALUE_IS_BOTH_INT` first for fast integer arithmetic, then falls back to float conversion or text concatenation (for `add` only), and otherwise reports a type error.
+
+Bitwise operations (`shl`, `shr`, `ushr`, `bitand`, `bitor`, `bitxor`, `bitnot`) remain integer-only and disrupt if operands are not integers.
+
+The QBE/native backend maps generic arithmetic to helper calls (`qbe.add`, `qbe.sub`, etc.). The vision for the native path is that with sufficient type inference, the backend can unbox proven-numeric values to raw registers, operate directly, and only rebox at boundaries (returns, calls, stores).
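
The int-first dispatch order described for `reg_vm_binop()` can be modelled in Python roughly as follows. This is a sketch of the ordering only, not the C implementation: the `Disruption` exception and the `binop` helper are hypothetical names, only three operators are covered, and Python integers do not overflow the way fixed-width VM integers would.

```python
import operator

class Disruption(Exception):
    """Stand-in for a runtime type error (hypothetical name)."""

OPS = {"add": operator.add, "subtract": operator.sub, "multiply": operator.mul}

def is_int(v):
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(v, int) and not isinstance(v, bool)

def is_num(v):
    return is_int(v) or isinstance(v, float)

def binop(op, a, b):
    fn = OPS[op]
    if is_int(a) and is_int(b):           # int-first fast path
        return fn(a, b)
    if is_num(a) and is_num(b):           # fall back to float conversion
        return fn(float(a), float(b))
    if op == "add" and isinstance(a, str) and isinstance(b, str):
        return a + b                      # text concatenation, add only
    raise Disruption(f"{op}: unsupported operand types")
```

The point of the ordering is that the common all-integer case pays for a single check before doing the arithmetic.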
+
 ## Debugging Tools

 Three dump tools inspect the IR at different stages:
@@ -192,6 +233,124 @@ Usage:
 ./cell --core . dump_types.cm <file.ce|file.cm>
 ```

+## Tail Call Marking
+
+When a function's return expression is a call (`stmt.tail == true` from the parser) and the function has no disruption handler, mcode.cm renames the final `invoke` instruction to `tail_invoke`. This is semantically identical to `invoke` in the current Mach VM, but marks the call site for future tail call optimization (TCO).
+
+The disruption handler restriction exists because TCO would discard the current frame, but the handler must remain on the stack to catch disruptions from the callee.
+
+`tail_invoke` is handled by the same passes as `invoke` in streamline (type tracking, algebraic simplification) and executes identically in the VM.
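
As a hedged illustration of the marking rule, the Python sketch below renames the last `invoke` of an already-lowered call sequence. The function name, the tuple instruction layout, and the operand values are assumptions for the example; the real logic lives in `mcode.cm`.

```python
def mark_tail_call(call_instrs, is_tail, has_disruption_handler):
    """Return a copy of the call sequence with the final invoke renamed."""
    out = list(call_instrs)
    if not is_tail or has_disruption_handler:
        return out                        # marking does not apply
    for i in range(len(out) - 1, -1, -1):
        instr = out[i]
        if not isinstance(instr, str) and instr[0] == "invoke":
            out[i] = ("tail_invoke",) + tuple(instr[1:])
            break
    return out
```

Only the opcode changes; the operands are copied through untouched, which matches the statement that `tail_invoke` currently executes identically to `invoke`.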
+
+## Type Propagation Architecture
+
+Type information flows through three compilation stages, each building on the previous:
+
+### Stage 1: Parse-time type tags (parse.cm)
+
+The parser assigns `type_tag` strings to scope variable entries when the type is syntactically obvious:
+
+- **From initializers**: `def a = []` → `type_tag: "array"`, `def n = 42` → `type_tag: "integer"`, `def r = {}` → `type_tag: "record"`
+- **From usage patterns** (def only): `def x = null; x[] = v` infers `type_tag: "array"` from the push. `def x = null; x.foo = v` infers `type_tag: "record"` from property access.
+- **Type error detection** (def only): When a `def` variable has a known `type_tag`, provably wrong operations are compile errors:
+  - Property access (`.`) on array
+  - Push (`[]`) on non-array
+  - Text key on array
+  - Integer key on record
+
+Only `def` (constant) variables participate in type inference and error detection. `var` variables can be reassigned, making their initializer type unreliable.
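
The usage-based inference and the four error cases above can be summarized in a small Python sketch. The function name and the usage labels (`push`, `property`, `int_key`, `text_key`) are hypothetical, and the sketch ignores how `parse.cm` actually walks the AST.

```python
# What each usage pattern implies when no type_tag is known yet.
IMPLIED = {
    "push": "array",       # x[] = v
    "int_key": "array",    # x[0]
    "property": "record",  # x.foo = v
    "text_key": "record",  # x["foo"]
}

def check_def_usage(type_tag, usage):
    """Return (type_tag, error) for one use of a def variable."""
    if type_tag is None:
        return IMPLIED[usage], None       # infer the tag from usage
    wrong = (
        (usage == "push" and type_tag != "array")
        or (usage in ("property", "text_key") and type_tag == "array")
        or (usage == "int_key" and type_tag == "record")
    )
    if wrong:
        return type_tag, f"{usage} on {type_tag}"
    return type_tag, None
```

Combinations not listed as errors above, such as property access on a text value, are left unreported in this sketch.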
+
+### Stage 2: Fold-time type propagation (fold.cm)
+
+The fold pass extends type information through the AST:
+
+- **Intrinsic folding**: `is_array(known_array)` folds to `true`. `length(known_array)` gets `hint: "array_length"`.
+- **Purity analysis**: Literals, name references, arithmetic on pure operands, and calls to `is_*` intrinsics with pure arguments are considered pure. This enables dead code elimination for unused `var`/`def` bindings with pure initializers, and elimination of standalone pure call statements.
+- **Dead code**: Unused pure `var`/`def` declarations are removed. Standalone calls to pure intrinsics (where the result is discarded) are removed. Unreachable branches with constant conditions are removed.
+
+The `pure_intrinsics` set currently contains only `is_*` sensory functions (`is_array`, `is_text`, `is_number`, `is_integer`, `is_function`, `is_logical`, `is_null`, `is_object`, `is_stone`). Other intrinsics like `text`, `number`, and `length` can disrupt on wrong argument types, so they are excluded — removing a call that would disrupt changes observable behavior.
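
A minimal Python sketch of the purity check, assuming a hypothetical dictionary-shaped AST (`kind`, `operands`, `callee`, `args`); the real node layout in `fold.cm` is not shown in this document.

```python
PURE_INTRINSICS = frozenset({
    "is_array", "is_text", "is_number", "is_integer", "is_function",
    "is_logical", "is_null", "is_object", "is_stone",
})

def is_pure(node):
    kind = node["kind"]
    if kind in ("literal", "name"):
        return True
    if kind == "arith":
        return all(is_pure(op) for op in node["operands"])
    if kind == "call":
        # The callee must be a pure intrinsic and every argument pure.
        return node["callee"] in PURE_INTRINSICS and all(
            is_pure(arg) for arg in node["args"]
        )
    return False                          # anything else is assumed impure
```

Under this model `is_array(x)` is pure and may be dropped when its result is unused, while `length(x)` is kept because it could disrupt.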
+
+### Stage 3: Streamline-time type tracking (streamline.cm)
+
+The streamline optimizer uses a numeric type lattice (`T_INT`, `T_FLOAT`, `T_TEXT`, etc.) for fine-grained per-instruction tracking:
+
+- **Backward inference** (pass 1): Scans typed operators and unambiguously numeric generic arithmetic to infer parameter types. Since parameters are `def` (immutable), inferred types persist across label boundaries.
+- **Write-type invariance** (pass 2): Scans all instructions to find local slots where every write produces the same type. These invariant types persist across label boundaries alongside parameter types.
+- **Forward tracking** (pass 3): `track_types` follows instruction execution order, tracking the type of each slot. Known-type operations set their destination type (e.g., `concat` → T_TEXT, `length` → T_INT). Generic arithmetic produces T_UNKNOWN. Type checks on unknown slots narrow the type on fallthrough.
+- **Type check elimination** (pass 3): When a slot's type is already known, `is_<type>` + conditional jump pairs are eliminated or converted to unconditional jumps.
+- **Dynamic access narrowing** (pass 3): `load_dynamic`/`store_dynamic` are narrowed to `load_field`/`store_field` or `load_index`/`store_index` when the key type is known.
+
+Type information resets at label join points (since control flow merges could bring different types), except for parameter types from backward inference and write-invariant types from slot write-type analysis.
+
+## Future Work
+
+### Copy Propagation
+
+A basic-block-local copy propagation pass would replace uses of a copied variable with its source, enabling further move elimination. An implementation was attempted but hit an unsolved bug: operand replacement in 2-position instructions produces incorrect code during self-hosting, while the same replacement logic works correctly for 3-position instructions. The root cause is not yet understood. See the project memory files for detailed notes.
+
+### Expanded Purity Analysis
+
+The current purity set is conservative (only `is_*`). It could be expanded by:
+
+- **Argument-type-aware purity**: If all arguments to an intrinsic are known to be the correct types (via type_tag or slot_types), the call cannot disrupt and is safe to eliminate. For example, `length(known_array)` is pure but `length(unknown)` is not.
+- **User function purity**: Analyze user-defined function bodies during pre_scan. A function is pure if its body contains only pure expressions and calls to known-pure functions. This requires fixpoint iteration for mutual recursion.
+- **Callback-aware purity**: Intrinsics like `filter`, `find`, `reduce`, `some`, `every` are pure if their callback argument is pure.
+
+### Forward Type Narrowing from Typed Operations
+
+With unified arithmetic (generic `add`/`subtract`/`multiply`/`divide`/`modulo`/`negate` instead of typed variants), narrowing operand types forward from typed arithmetic instructions is no longer applicable. Typed comparisons (`eq_int`, `lt_float`, etc.) still exist and their operands have known types, but these are already handled by backward inference.
+
+### Guard Hoisting for Parameters
+
+When a type check on a parameter passes (falls through), the parameter's type could be promoted to `param_types` so it persists across label boundaries. This would allow the first type check on a parameter to prove its type for the entire function. However, this is unsound for polymorphic parameters — if a function is called with different argument types, the first check would wrongly eliminate checks for subsequent types.
+
+A safe version would require proving that a parameter is monomorphic (called with only one type across all call sites), which requires interprocedural analysis.
+
+**Note:** For local variables (non-parameters), the write-type invariance analysis (pass 2) achieves a similar effect safely — if every write to a slot produces the same type, that type persists across labels without needing to hoist any guard.
+
+### Tail Call Optimization
+
+`tail_invoke` instructions are currently marked but execute identically to `invoke`. Actual TCO would reuse the current call frame instead of creating a new one. This requires:
+
+- Ensuring argument count matches (or the frame can be resized)
+- No live locals needed after the call (guaranteed by tail position)
+- No disruption handler on the current function (already enforced by the marking)
+- VM support in mach.c to rewrite the frame in place
+
+### Interprocedural Type Inference
+
+Currently all type inference is intraprocedural (within a single function). Cross-function analysis could:
+
+- Infer return types from function bodies
+- Propagate argument types from call sites to callees
+- Specialize functions for known argument types (cloning)
+
+### Strength Reduction
+
+Common patterns that could be lowered to cheaper operations when operand types are known:
+
+- `multiply x, 2` with proven-int operands → shift left
+- `divide x, 2` with proven-int → arithmetic shift right
+- `modulo x, power_of_2` with proven-int → bitwise and
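
The Python check below illustrates when these rewrites agree with the original operations. It is a sanity sketch, not a statement about Cell semantics: Python's `//` and `%` are floor division and floor modulo, and whether `divide` and `modulo` behave that way for integers is an assumption the optimizer would have to confirm. In particular, an arithmetic shift right rounds toward negative infinity, so it differs from a truncating or fractional division on odd or negative inputs.

```python
def shift_equivalents(x, k):
    """Return the three rewritten forms for multiplier/divisor 2**k."""
    return (
        x << k,                # multiply x, 2**k
        x >> k,                # floor-divide x by 2**k
        x & ((1 << k) - 1),    # floor-modulo x by 2**k
    )

def reference(x, k):
    p = 1 << k
    return (x * p, x // p, x % p)

# The rewrites match floor semantics for every integer, negative included.
for x in range(-64, 65):
    for k in range(1, 5):
        assert shift_equivalents(x, k) == reference(x, k)

# They do not match truncating division: -7 / 2 truncates to -3,
# but the arithmetic shift gives -4.
truncated = int(-7 / 2)
shifted = -7 >> 1
```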
+
+### Numeric Unboxing (QBE/native path)
+
+With unified arithmetic and backward type inference, the native backend could identify regions where numeric values remain in registers without boxing/unboxing:
+
+1. **Guard once**: When backward inference proves a parameter is T_NUM, emit a single type guard at function entry.
+2. **Unbox**: Convert the tagged JSValue to a raw double register.
+3. **Operate**: Use native FP/int instructions directly (no function calls, no tag checks).
+4. **Rebox**: Convert back to tagged JSValue only at rebox points (function returns, calls, stores to arrays/records).
+
+This requires inserting `unbox`/`rebox` IR annotations (no-ops in the Mach VM, meaningful only to QBE).
+
+### Loop-Invariant Code Motion
+
+Type checks that are invariant across loop iterations (checking a variable that doesn't change in the loop body) could be hoisted above the loop. This would require identifying loop boundaries and proving invariance.
+
+### Algebraic Identity Optimization
+
+With unified arithmetic, algebraic identities (x+0→x, x*1→x, x*0→0, x/1→x) require knowing operand values at compile time. Since generic `add`/`multiply` operate on any numeric type, the constant-tracking logic in `simplify_algebra` could be extended to handle these for known-constant slots. The x*0→0 rewrite would additionally need a proven-integer operand, because multiplication by zero is not safe to fold for floats (NaN and Inf).
+
 ## Nop Convention

 Eliminated instructions are replaced with strings matching `_nop_<prefix>_<counter>`. The prefix identifies which pass created the nop. Nop strings are: