Merge branch 'quicken_mcode' into gen_dylib

2026-02-16 00:35:40 -06:00
parent f4f56ed470 ff61ab1f50
commit cd6e357b6e
29 changed files with 160180 additions and 877679 deletions
--- a/docs/spec/mach.md
+++ b/docs/spec/mach.md
@@ -82,3 +82,14 @@ Named property instructions (`LOAD_FIELD`, `STORE_FIELD`, `DELETE`) use the iABC
 2. `LOAD_DYNAMIC` / `STORE_DYNAMIC` / `DELETEINDEX` — use the register-based variant

 This is transparent to the mcode compiler and streamline optimizer.
+
+## Arithmetic Dispatch
+
+Arithmetic ops (ADD, SUB, MUL, DIV, MOD, POW) are executed inline without calling the polymorphic `reg_vm_binop()` helper. Since mcode's type guard dispatch guarantees both operands are numbers:
+
+1. **Int-int fast path**: `JS_VALUE_IS_BOTH_INT` → native integer arithmetic with int32 overflow check. Overflow promotes to float64.
+2. **Float fallback**: `JS_ToFloat64` → native floating-point operation. Non-finite results produce null.
+
+DIV and MOD check for zero divisor (→ null). POW uses `pow()` with non-finite handling for finite inputs.
+
+Comparison ops (EQ through GE) and bitwise ops still use `reg_vm_binop()` for their slow paths, as they handle a wider range of type combinations (string comparisons, null equality, etc.).
--- a/docs/spec/streamline.md
+++ b/docs/spec/streamline.md
@@ -45,11 +45,10 @@ Backward inference rules:

 | Operator class | Operand type inferred |
 |---|---|
-| `subtract`, `multiply`, `divide`, `modulo`, `pow`, `negate` | T_NUM |
-| `eq_int`, `ne_int`, `lt_int`, `gt_int`, `le_int`, `ge_int`, bitwise ops | T_INT |
-| `eq_float`, `ne_float`, `lt_float`, `gt_float`, `le_float`, `ge_float` | T_FLOAT |
-| `concat`, text comparisons | T_TEXT |
-| `eq_bool`, `ne_bool`, `not`, `and`, `or` | T_BOOL |
+| `add`, `subtract`, `multiply`, `divide`, `modulo`, `pow`, `negate` | T_NUM |
+| bitwise ops (`bitand`, `bitor`, `bitxor`, `shl`, `shr`, `ushr`, `bitnot`) | T_INT |
+| `concat` | T_TEXT |
+| `not`, `and`, `or` | T_BOOL |
 | `store_index` (object operand) | T_ARRAY |
 | `store_index` (index operand) | T_INT |
 | `store_field` (object operand) | T_RECORD |
@@ -59,9 +58,11 @@ Backward inference rules:
 | `load_field` (object operand) | T_RECORD |
 | `pop` (array operand) | T_ARRAY |

-Note: `add` is excluded from backward inference because it is polymorphic — it handles both numeric addition and text concatenation. Only operators that are unambiguously numeric can infer T_NUM.
+Typed comparison operators (`eq_int`, `lt_float`, `lt_text`, etc.) and typed boolean comparisons (`eq_bool`, `ne_bool`) are excluded from backward inference. These ops always appear inside guard dispatch patterns (`is_type` + `jump_false` + typed_op), where mutually exclusive branches use the same slot with different types. Including them would merge conflicting types (e.g., T_INT from `lt_int` + T_FLOAT from `lt_float` + T_TEXT from `lt_text`) into T_UNKNOWN, losing all type information. Only unconditionally executed ops contribute to backward inference.

-When a slot appears with conflicting type inferences, the result is `unknown`. INT + FLOAT conflicts produce `num`.
+Note: `add` infers T_NUM even though it is polymorphic (numeric addition or text concatenation). When `add` appears in the IR, both operands have already passed a `is_num` guard, so they are guaranteed to be numeric. The text concatenation path uses `concat` instead.
+
+When a slot appears with conflicting type inferences, the merge widens: INT + FLOAT → NUM, INT + NUM → NUM, FLOAT + NUM → NUM. Incompatible types (e.g., NUM + TEXT) produce `unknown`.

 **Nop prefix:** none (analysis only, does not modify instructions)

@@ -88,8 +89,9 @@ Write type mapping:
 | `length` | T_INT |
 | bitwise ops | T_INT |
 | `concat` | T_TEXT |
+| `negate` | T_NUM |
+| `add`, `subtract`, `multiply`, `divide`, `modulo`, `pow` | T_NUM |
 | bool ops, comparisons, `in` | T_BOOL |
-| generic arithmetic (`add`, `subtract`, `negate`, etc.) | T_UNKNOWN |
 | `move`, `load_field`, `load_index`, `load_dynamic`, `pop`, `get` | T_UNKNOWN |
 | `invoke`, `tail_invoke` | T_UNKNOWN |

@@ -100,8 +102,9 @@ Common patterns this enables:
 - **Length variables** (`var len = length(arr)`): written by `length` (T_INT) only → invariant T_INT
 - **Boolean flags** (`var found = false; ... found = true`): written by `false` and `true` → invariant T_BOOL
 - **Locally-created containers** (`var arr = []`): written by `array` only → invariant T_ARRAY
+- **Numeric accumulators** (`var sum = 0; sum = sum - x`): written by `access 0` (T_INT) and `subtract` (T_NUM) → merges to T_NUM

-Note: Loop counters (`var i = 0; i = i + 1`) are NOT invariant because `add` produces T_UNKNOWN. However, if `i` is a function parameter used in arithmetic, backward inference from `subtract`/`multiply`/etc. will infer T_NUM for it, which persists across labels.
+Note: Loop counters using `+` (`var i = 0; i = i + 1`) may not achieve write-type invariance because the `+` operator emits a guard dispatch with both `concat` (T_TEXT) and `add` (T_NUM) paths writing to the same temp slot, producing T_UNKNOWN. However, when one operand is a known number literal, `mcode.cm` emits a numeric-only path (see "Known-Number Add Shortcut" below), avoiding the text dispatch. Other arithmetic ops (`-`, `*`, `/`, `%`, `**`) always emit a single numeric write path and work cleanly with write-type analysis.

 **Nop prefix:** none (analysis only, does not modify instructions)

@@ -109,9 +112,11 @@ Note: Loop counters (`var i = 0; i = i + 1`) are NOT invariant because `add` pro

 Forward pass that tracks the known type of each slot. When a type check (`is_int`, `is_text`, `is_num`, etc.) is followed by a conditional jump, and the slot's type is already known, the check and jump can be eliminated or converted to an unconditional jump.

-Three cases:
+Five cases:

 - **Known match** (e.g., `is_int` on a slot known to be `int`): both the check and the conditional jump are eliminated (nop'd).
+- **Subsumption match** (e.g., `is_num` on a slot known to be `int` or `float`): since `int` and `float` are subtypes of `num`, both the check and jump are eliminated.
+- **Subsumption partial** (e.g., `is_int` on a slot known to be `num`): the `num` type could be `int` or `float`, so the check must remain. On fallthrough, the slot narrows to the checked subtype (`int`). This is NOT a mismatch — `num` values can pass an `is_int` check.
 - **Known mismatch** (e.g., `is_text` on a slot known to be `int`): the check is nop'd and the conditional jump is rewritten to an unconditional `jump`.
 - **Unknown**: the check remains, but on fallthrough, the slot's type is narrowed to the checked type (enabling downstream eliminations).

@@ -212,12 +217,44 @@ These inlined opcodes have corresponding Mach VM implementations in `mach.c`.

 Arithmetic operations use generic opcodes: `add`, `subtract`, `multiply`, `divide`, `modulo`, `pow`, `negate`. There are no type-dispatched variants (e.g., no `add_int`/`add_float`).

-The Mach VM dispatches at runtime with an int-first fast path via `reg_vm_binop()`: it checks `JS_VALUE_IS_BOTH_INT` first for fast integer arithmetic, then falls back to float conversion, text concatenation (for `add` only), or type error.
+The Mach VM handles arithmetic inline with a two-tier fast path. Since mcode's type guard dispatch guarantees both operands are numbers by the time arithmetic executes, the VM does not need polymorphic dispatch:
+
+1. **Int-int fast path**: `JS_VALUE_IS_BOTH_INT` → native integer arithmetic with overflow check. If the result fits int32, returns int32; otherwise promotes to float64.
+2. **Float fallback**: `JS_ToFloat64` both operands → native floating-point arithmetic. Non-finite results (infinity, NaN) produce null.
+
+Division and modulo additionally check for zero divisor (→ null). Power uses `pow()` with non-finite handling.
+
+The legacy `reg_vm_binop()` function remains available for comparison operators and any non-mcode bytecode paths, but arithmetic ops no longer call it.

 Bitwise operations (`shl`, `shr`, `ushr`, `bitand`, `bitor`, `bitxor`, `bitnot`) remain integer-only and disrupt if operands are not integers.

 The QBE/native backend maps generic arithmetic to helper calls (`qbe.add`, `qbe.sub`, etc.). The vision for the native path is that with sufficient type inference, the backend can unbox proven-numeric values to raw registers, operate directly, and only rebox at boundaries (returns, calls, stores).

+## Known-Number Add Shortcut
+
+The `+` operator is the only arithmetic op that is polymorphic at the mcode level — `emit_add_decomposed` in `mcode.cm` emits a guard dispatch that checks for text (→ `concat`) before numeric (→ `add`). This dual dispatch means the temp slot is written by both `concat` (T_TEXT) and `add` (T_NUM), producing T_UNKNOWN in write-type analysis.
+
+When either operand is a known number literal (e.g., `i + 1`, `x + 0.5`), `emit_add_decomposed` skips the text dispatch entirely and emits `emit_numeric_binop("add")` — a single `is_num` guard + `add` with no `concat` path. This is safe because text concatenation requires both operands to be text; a known number can never participate in concat.
+
+This optimization eliminates 6-8 instructions from the add block (two `is_text` checks, two conditional jumps, `concat`, `jump`) and produces a clean single-type write path that works with write-type analysis.
+
+Other arithmetic ops (`subtract`, `multiply`, etc.) always use `emit_numeric_binop` and never have this problem.
+
+## Target Slot Propagation
+
+For simple local variable assignments (`i = expr`), the mcode compiler passes the variable's register slot as a `target` to the expression compiler. Binary operations that use `emit_numeric_binop` (subtract, multiply, divide, modulo, pow) can write directly to the target slot instead of allocating a temp and emitting a `move`:
+
+```
+// Before: i = i - 1
+subtract 7, 2, 6    // temp = i - 1
+move 2, 7           // i = temp
+
+// After: i = i - 1
+subtract 2, 2, 6    // i = i - 1 (direct)
+```
+
+The `+` operator is excluded from target slot propagation when it would use the full text+num dispatch (i.e., when neither operand is a known number), because writing both `concat` and `add` to the variable's slot would pollute its write type. When the known-number shortcut applies, `+` uses `emit_numeric_binop` and would be safe for target propagation, but this is not currently implemented — the exclusion is by operator kind, not by dispatch path.
+
 ## Debugging Tools

 Three dump tools inspect the IR at different stages:
@@ -295,6 +332,18 @@ The current purity set is conservative (only `is_*`). It could be expanded by:
 - **User function purity**: Analyze user-defined function bodies during pre_scan. A function is pure if its body contains only pure expressions and calls to known-pure functions. This requires fixpoint iteration for mutual recursion.
 - **Callback-aware purity**: Intrinsics like `filter`, `find`, `reduce`, `some`, `every` are pure if their callback argument is pure.

+### Move Type Resolution in Write-Type Analysis
+
+Currently, `move` instructions produce T_UNKNOWN in write-type analysis. This prevents type propagation through moves — e.g., a slot written by `access 0` (T_INT) and `move` from an `add` result (T_NUM) merges to T_UNKNOWN instead of T_NUM.
+
+A two-pass approach would fix this: first compute write types for all non-move instructions, then resolve moves by looking up the source slot's computed type. If the source has a known type, merge it into the destination; if unknown, skip the move (don't poison the destination with T_UNKNOWN).
+
+This was implemented and tested but causes a bootstrap failure during self-hosting convergence. The root cause is not yet understood — the optimizer modifies its own bytecode, and the move resolution changes the type landscape enough to produce different code on each pass, preventing convergence. Further investigation is needed; the fix is correct in isolation but interacts badly with the self-hosting fixed-point iteration.
+
+### Target Slot Propagation for Add with Known Numbers
+
+When the known-number add shortcut applies (one operand is a literal number), the generated code uses `emit_numeric_binop` which has a single write path. Target slot propagation should be safe in this case, but is currently blocked by the blanket `kind != "+"` exclusion. Refining the exclusion to check whether the shortcut will apply (by testing `is_known_number` on either operand) would enable direct writes for patterns like `i = i + 1`.
+
 ### Forward Type Narrowing from Typed Operations

 With unified arithmetic (generic `add`/`subtract`/`multiply`/`divide`/`modulo`/`negate` instead of typed variants), this approach is no longer applicable. Typed comparisons (`eq_int`, `lt_float`, etc.) still exist and their operands have known types, but these are already handled by backward inference.