guard hoisting

2026-02-13 06:32:58 -06:00
parent 36fd0a35f9
commit f7e2ff13b5
10 changed files with 31941 additions and 29049 deletions
--- a/docs/spec/streamline.md
+++ b/docs/spec/streamline.md
@@ -62,7 +62,47 @@ When a slot appears with conflicting type inferences (e.g., used in both `add_in

 **Nop prefix:** none (analysis only, does not modify instructions)

-### 2. eliminate_type_checks (type-check + jump elimination)
+### 2. infer_slot_write_types (slot write-type invariance)
+
+Scans all instructions to determine which non-parameter slots have a consistent write type. If every instruction that writes to a given slot produces the same type, that type is globally invariant and can safely persist across label join points.
+
+This analysis is sound because:
+- `alloc_slot()` in mcode.cm returns monotonically increasing slot numbers, so temp slots are never reused
+- All local variable declarations must appear at function body level and be initialized, so their slots are written before any backward jump to a loop header
+- `move` is conservatively treated as T_UNKNOWN instead of inheriting its source slot's type, avoiding unsound transitive assumptions
+
+Write type mapping:
+
+| Instruction class | Write type |
+|---|---|
+| `int` | T_INT |
+| `true`, `false` | T_BOOL |
+| `null` | T_NULL |
+| `access` | type of literal value |
+| `array` | T_ARRAY |
+| `record` | T_RECORD |
+| `function` | T_FUNCTION |
+| `length` | T_INT |
+| int arithmetic, `neg_int`, bitwise ops | T_INT |
+| float arithmetic, `neg_float` | T_FLOAT |
+| `concat` | T_TEXT |
+| bool ops, comparisons, `in` | T_BOOL |
+| generic arithmetic (`add`, `subtract`, etc.) | T_UNKNOWN |
+| `move`, `load_field`, `load_index`, `load_dynamic`, `pop`, `get` | T_UNKNOWN |
+| `invoke`, `tail_invoke` | T_UNKNOWN |
+
+The result is a map of slot→type for slots where all writes agree on a single known type. Parameter slots (1..nr_args) and slot 0 are excluded.
+
+Common patterns this enables:
+
+- **Loop counters** (`var i = 0; ... i = i + 1`): written by `int` (T_INT) and `add_int` (T_INT) → invariant T_INT
+- **Length variables** (`var len = length(arr)`): written by `length` (T_INT) only → invariant T_INT
+- **Boolean flags** (`var found = false; ... found = true`): written by `false` and `true` → invariant T_BOOL
+- **Locally-created containers** (`var arr = []`): written by `array` only → invariant T_ARRAY
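A minimal Python sketch of the analysis follows. It assumes a hypothetical instruction encoding, not the real mcode format: labels and nops are plain strings, and every other instruction is a tuple `(op, dest, *operands)` with `dest = None` when nothing is written. The op-to-type table is deliberately partial, and `lt` is an assumed name for a comparison opcode.

```python
# Hypothetical encoding (assumption, not the real mcode format): labels and nops
# are strings; instructions are tuples (op, dest, *operands), dest None if unused.
T_UNKNOWN, T_INT, T_FLOAT, T_TEXT, T_BOOL = "unknown", "int", "float", "text", "bool"
T_NULL, T_ARRAY, T_RECORD, T_FUNCTION = "null", "array", "record", "function"

# Partial write-type table; any op not listed writes T_UNKNOWN
# (move, load_*, pop, get, invoke, generic add/subtract, ...).
WRITE_TYPES = {
    "int": T_INT, "length": T_INT, "add_int": T_INT, "neg_int": T_INT,
    "neg_float": T_FLOAT, "concat": T_TEXT,
    "true": T_BOOL, "false": T_BOOL, "not": T_BOOL, "in": T_BOOL,
    "lt": T_BOOL,  # hypothetical comparison opcode name
    "null": T_NULL, "array": T_ARRAY, "record": T_RECORD, "function": T_FUNCTION,
}


def literal_type(value):
    # bool is checked before int because Python's bool is a subclass of int.
    if isinstance(value, bool):
        return T_BOOL
    if isinstance(value, int):
        return T_INT
    if isinstance(value, float):
        return T_FLOAT
    if isinstance(value, str):
        return T_TEXT
    if value is None:
        return T_NULL
    return T_UNKNOWN


def write_type(ins):
    if ins[0] == "access":
        return literal_type(ins[2])  # access writes the type of its literal
    return WRITE_TYPES.get(ins[0], T_UNKNOWN)


def infer_slot_write_types(instrs, nr_args):
    seen = {}
    for ins in instrs:
        if isinstance(ins, str):
            continue  # labels and nops write nothing
        dest = ins[1]
        if dest is None or dest <= nr_args:
            continue  # slot 0 and parameter slots 1..nr_args are excluded
        t = write_type(ins)
        prev = seen.get(dest)
        # Any disagreement between writes demotes the slot to T_UNKNOWN for good.
        seen[dest] = t if prev is None or prev == t else T_UNKNOWN
    return {slot: t for slot, t in seen.items() if t != T_UNKNOWN}


# Usage: parameter arr in slot 1 (nr_args = 1); locals i=3, len=4, found=5.
code = [
    ("int", 3, 0),             # var i = 0
    ("length", 4, 1),          # var len = length(arr)
    ("false", 5),              # var found = false
    ("int", 7, 1),             # constant 1
    "L_loop",
    ("lt", 6, 3, 4),           # i < len
    ("true", 5),               # found = true
    ("add_int", 3, 3, 7),      # i = i + 1
    ("move", 8, 3),            # move always writes T_UNKNOWN
    ("jump", None, "L_loop"),
]
write_types = infer_slot_write_types(code, nr_args=1)
```

In this example the loop counter, length, and boolean flag slots all come out invariant, while slot 8 is dropped because it is only written by `move`.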
+
+**Nop prefix:** none (analysis only, does not modify instructions)
+
+### 3. eliminate_type_checks (type-check + jump elimination)

 Forward pass that tracks the known type of each slot. When a type check (`is_int`, `is_text`, `is_num`, etc.) is followed by a conditional jump, and the slot's type is already known, the check and jump can be eliminated or converted to an unconditional jump.

@@ -74,11 +114,11 @@ Three cases:

 This pass also reduces `load_dynamic`/`store_dynamic` to `load_field`/`store_field` or `load_index`/`store_index` when the key slot's type is known.

-At label join points, all type information is reset except for parameter types seeded by the backward inference pass.
+At label join points, all type information is reset except for parameter types from backward inference and write-invariant types from slot write-type analysis.
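The sketch below shows how the invariant maps seed the forward pass and survive join points. It is an illustration, not a transcription of the pass. It assumes a hypothetical tuple encoding, an assumed `jump_false` opcode that consumes the check's result, and a check temp that nothing else reads. The three branches here are illustrative and may not match the spec's three cases exactly.

```python
# Hypothetical encoding (assumption): labels are strings, nops are strings with a
# "_nop_" prefix, instructions are tuples (op, dest, *operands); dest None if unused.
T_INT, T_FLOAT, T_TEXT, T_BOOL = "int", "float", "text", "bool"

# Types under which each is_<type> check passes (a partial, illustrative table).
CHECKS = {"is_int": {T_INT}, "is_num": {T_INT, T_FLOAT},
          "is_text": {T_TEXT}, "is_bool": {T_BOOL}}
# Minimal write-type table; ops not listed leave the destination's type unknown.
WRITES = {"int": T_INT, "add_int": T_INT, "concat": T_TEXT,
          "true": T_BOOL, "false": T_BOOL}


def is_label(ins):
    return isinstance(ins, str) and not ins.startswith("_nop_")


def eliminate_type_checks(instrs, param_types, write_types):
    invariant = {**param_types, **write_types}  # facts that survive join points
    known = dict(invariant)
    out, n, i = [], 0, 0
    while i < len(instrs):
        ins = instrs[i]
        if isinstance(ins, str):
            if is_label(ins):
                known = dict(invariant)  # join point: forget everything else
            out.append(ins)
            i += 1
            continue
        op, dest = ins[0], ins[1]
        nxt = instrs[i + 1] if i + 1 < len(instrs) else None
        if (op in CHECKS and isinstance(nxt, tuple)
                and nxt[0] == "jump_false" and nxt[2] == dest):
            slot, target = ins[2], nxt[3]
            if slot in known and known[slot] in CHECKS[op]:
                # Check always passes, so the jump is never taken: nop both.
                out += [f"_nop_tc_{n}", f"_nop_tc_{n + 1}"]
                n += 2
            elif slot in known:
                # Check always fails, so the jump is always taken.
                out += [f"_nop_tc_{n}", ("jump", None, target)]
                n += 1
            else:
                # Type unknown: keep both; falling through means the check passed.
                out += [ins, nxt]
                known[dest] = T_BOOL
                if len(CHECKS[op]) == 1:
                    known[slot] = next(iter(CHECKS[op]))
            i += 2
            continue
        if dest is not None:
            if op in WRITES:
                known[dest] = WRITES[op]
            else:
                known.pop(dest, None)
        out.append(ins)
        i += 1
    return out


# Usage: local i lives in slot 3; slot 1 is a parameter of unknown type.
code = [
    ("int", 3, 0),                       # var i = 0
    "L_loop",                            # join point
    ("is_int", 9, 3),
    ("jump_false", None, 9, "L_slow"),
    ("is_text", 10, 1),
    ("jump_false", None, 10, "L_slow"),
    ("return", None, 3),
]
optimized = eliminate_type_checks(code, param_types={}, write_types={3: T_INT})
```

With `write_types={3: T_INT}`, the `is_int` check on slot 3 after `L_loop` is removed. With an empty `write_types`, the label reset discards slot 3's type and the check is kept.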

 **Nop prefix:** `_nop_tc_`

-### 3. simplify_algebra (algebraic identity + comparison folding)
+### 4. simplify_algebra (algebraic identity + comparison folding)

 Tracks known constant values alongside types. Rewrites identity operations:

@@ -111,7 +151,7 @@ Same-slot comparison folding:

 **Nop prefix:** none (rewrites in place, does not create nops)

-### 4. simplify_booleans (not + jump fusion)
+### 5. simplify_booleans (not + jump fusion)

 Peephole pass that eliminates unnecessary `not` instructions:

@@ -125,13 +165,13 @@ This is particularly effective on `if (!cond)` patterns, which the compiler gene

 **Nop prefix:** `_nop_bl_`

-### 5. eliminate_moves (self-move elimination)
+### 6. eliminate_moves (self-move elimination)

 Removes `move a, a` instructions where the source and destination are the same slot. These can arise from earlier passes rewriting binary operations into moves.

 **Nop prefix:** `_nop_mv_`

-### 6. eliminate_unreachable (dead code after return)
+### 7. eliminate_unreachable (dead code after return)

 Nops instructions after `return` until the next real label. Only `return` is treated as a terminal instruction; `disrupt` is not, because the disruption handler code immediately follows `disrupt` and must remain reachable.

@@ -139,7 +179,7 @@ The mcode compiler emits a label at disruption handler entry points (see `emit_l

 **Nop prefix:** `_nop_ur_`

-### 7. eliminate_dead_jumps (jump-to-next-label elimination)
+### 8. eliminate_dead_jumps (jump-to-next-label elimination)

 Removes `jump L` instructions where `L` is the immediately following label (skipping over any intervening nop strings). These are common after other passes eliminate conditional branches, leaving behind jumps that fall through naturally.

@@ -150,8 +190,9 @@ Removes `jump L` instructions where `L` is the immediately following label (skip
 All passes run in sequence in `optimize_function`:

 ```
-infer_param_types      → returns param_types map
-eliminate_type_checks   → uses param_types
+infer_param_types        → returns param_types map
+infer_slot_write_types   → returns write_types map
+eliminate_type_checks    → uses param_types + write_types
 simplify_algebra
 simplify_booleans
 eliminate_moves
@@ -237,11 +278,12 @@ The `pure_intrinsics` set currently contains only `is_*` sensory functions (`is_
 The streamline optimizer uses a numeric type lattice (`T_INT`, `T_FLOAT`, `T_TEXT`, etc.) for fine-grained per-instruction tracking:

 - **Backward inference** (pass 1): Scans typed operators to infer parameter types. Since parameters are `def` (immutable), inferred types persist across label boundaries.
- **Forward tracking** (pass 2): `track_types` follows instruction execution order, tracking the type of each slot. Typed arithmetic results set their destination type. Type checks on unknown slots narrow the type on fallthrough.
- **Type check elimination** (pass 2): When a slot's type is already known, `is_<type>` + conditional jump pairs are eliminated or converted to unconditional jumps.
- **Dynamic access narrowing** (pass 2): `load_dynamic`/`store_dynamic` are narrowed to `load_field`/`store_field` or `load_index`/`store_index` when the key type is known.
+- **Write-type invariance** (pass 2): Scans all instructions to find local slots where every write produces the same type. These invariant types persist across label boundaries alongside parameter types.
+- **Forward tracking** (pass 3): `track_types` follows instruction execution order, tracking the type of each slot. Typed arithmetic results set their destination type. Type checks on unknown slots narrow the type on fallthrough.
+- **Type check elimination** (pass 3): When a slot's type is already known, `is_<type>` + conditional jump pairs are eliminated or converted to unconditional jumps.
+- **Dynamic access narrowing** (pass 3): `load_dynamic`/`store_dynamic` are narrowed to `load_field`/`store_field` or `load_index`/`store_index` when the key type is known.

-Type information resets at label join points (since control flow merges could bring different types), except for parameter types from backward inference.
+Type information resets at label join points (since control flow merges could bring different types), except for parameter types from backward inference and write-invariant types from slot write-type analysis.

 ## Future Work

@@ -267,6 +309,8 @@ When a type check on a parameter passes (falls through), the parameter's type co

 A safe version would require proving that a parameter is monomorphic (called with only one type across all call sites), which requires interprocedural analysis.

+**Note:** For local variables (non-parameters), the write-type invariance analysis (pass 2) achieves a similar effect safely: if every write to a slot produces the same type, that type persists across labels without hoisting any guard.
+
 ### Tail Call Optimization

 `tail_invoke` instructions are currently marked but execute identically to `invoke`. Actual TCO would reuse the current call frame instead of creating a new one. This requires: