---
title: "Streamline Optimizer"
description: "Mcode IR optimization passes"
---

## Overview

The streamline optimizer (`streamline.cm`) runs a series of independent passes over the Mcode IR to eliminate redundant operations. Each pass is a standalone function that can be enabled, disabled, or reordered. Passes communicate only through the instruction array they mutate in place, replacing eliminated instructions with nop strings (e.g., `_nop_tc_1`).

The optimizer runs after `mcode.cm` generates the IR and before the result is lowered to the Mach VM or emitted as QBE IL.

```
Fold (AST) → Mcode (JSON IR) → Streamline → Mach VM / QBE
```

## Type Lattice

The optimizer tracks a type for each slot in the register file:

| Type | Meaning |
|------|---------|
| `unknown` | No type information |
| `int` | Integer |
| `float` | Floating-point |
| `num` | Number (subsumes int and float) |
| `text` | String |
| `bool` | Logical (true/false) |
| `null` | Null value |
| `array` | Array |
| `record` | Record (object) |
| `function` | Function |
| `blob` | Binary blob |

Subsumption: `int` and `float` both satisfy a `num` check.

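As a rough model of the subsumption rule (a Python sketch, not the actual `streamline.cm` code; the `satisfies` helper and `T_*` names are assumptions made for illustration):

```python
# Hypothetical model of the lattice's subsumption rule.
T_UNKNOWN, T_INT, T_FLOAT, T_NUM, T_TEXT = "unknown", "int", "float", "num", "text"

def satisfies(slot_type, checked_type):
    """True if a slot of slot_type always passes an is_<checked_type> check."""
    if slot_type == checked_type:
        return True
    # int and float are subtypes of num
    return checked_type == T_NUM and slot_type in (T_INT, T_FLOAT)

assert satisfies(T_INT, T_NUM)      # int passes is_num
assert not satisfies(T_NUM, T_INT)  # a num might be float, so the check stays
```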
## Passes

### 1. infer_param_types (backward type inference)

Scans typed operators and generic arithmetic to determine what types their operands must be. For example, `subtract dest, a, b` implies both `a` and `b` are numbers.

When a parameter slot (1..nr_args) is consistently inferred as a single type, that type is recorded. Since parameters are immutable (`def`), the inferred type holds for the entire function and persists across label join points (loop headers, branch targets).

Backward inference rules:

| Operator class | Operand type inferred |
|---|---|
| `add`, `subtract`, `multiply`, `divide`, `modulo`, `pow`, `negate` | T_NUM |
| bitwise ops (`bitand`, `bitor`, `bitxor`, `shl`, `shr`, `ushr`, `bitnot`) | T_INT |
| `concat` | T_TEXT |
| `not`, `and`, `or` | T_BOOL |
| `store_index` (object operand) | T_ARRAY |
| `store_index` (index operand) | T_INT |
| `store_field` (object operand) | T_RECORD |
| `push` (array operand) | T_ARRAY |
| `load_index` (object operand) | T_ARRAY |
| `load_index` (index operand) | T_INT |
| `load_field` (object operand) | T_RECORD |
| `pop` (array operand) | T_ARRAY |

Typed comparison operators (`eq_int`, `lt_float`, `lt_text`, etc.) and typed boolean comparisons (`eq_bool`, `ne_bool`) are excluded from backward inference. These ops always appear inside guard dispatch patterns (`is_type` + `jump_false` + typed op), where mutually exclusive branches use the same slot with different types. Including them would merge conflicting types (e.g., T_INT from `lt_int` + T_FLOAT from `lt_float` + T_TEXT from `lt_text`) into T_UNKNOWN, losing all type information. Only unconditionally executed ops contribute to backward inference.

Note: `add` infers T_NUM even though it is polymorphic (numeric addition or text concatenation). When `add` appears in the IR, both operands have already passed an `is_num` guard, so they are guaranteed to be numeric. The text concatenation path uses `concat` instead.

When a slot appears with conflicting type inferences, the merge widens: INT + FLOAT → NUM, INT + NUM → NUM, FLOAT + NUM → NUM. Incompatible types (e.g., NUM + TEXT) produce `unknown`.

**Nop prefix:** none (analysis only, does not modify instructions)

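The widening merge can be sketched as follows (Python model with hypothetical names; the real logic lives in `streamline.cm`):

```python
# Hypothetical widening merge for conflicting type inferences on one slot.
T_UNKNOWN, T_INT, T_FLOAT, T_NUM, T_TEXT = "unknown", "int", "float", "num", "text"

def merge(a, b):
    """Merge two inferred types for the same slot, widening when they differ."""
    if a == b:
        return a
    numeric = {T_INT, T_FLOAT, T_NUM}
    if a in numeric and b in numeric:
        return T_NUM        # INT + FLOAT -> NUM, INT + NUM -> NUM, ...
    return T_UNKNOWN        # incompatible types (e.g., NUM + TEXT)

assert merge(T_INT, T_FLOAT) == T_NUM
assert merge(T_INT, T_NUM) == T_NUM
assert merge(T_NUM, T_TEXT) == T_UNKNOWN
```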
### 2. infer_slot_write_types (slot write-type invariance)

Scans all instructions to determine which non-parameter slots have a consistent write type. If every instruction that writes to a given slot produces the same type, that type is globally invariant and can safely persist across label join points.

This analysis is sound because:

- `alloc_slot()` in mcode.cm is monotonically increasing — temp slots are never reused
- All local variable declarations must be at function body level and initialized — slots are written before any backward jumps to loop headers
- `move` is conservatively treated as T_UNKNOWN, avoiding unsound transitive assumptions

Write type mapping:

| Instruction class | Write type |
|---|---|
| `int` | T_INT |
| `true`, `false` | T_BOOL |
| `null` | T_NULL |
| `access` | type of literal value |
| `array` | T_ARRAY |
| `record` | T_RECORD |
| `function` | T_FUNCTION |
| `length` | T_INT |
| bitwise ops | T_INT |
| `concat` | T_TEXT |
| `negate` | T_NUM |
| `add`, `subtract`, `multiply`, `divide`, `modulo`, `pow` | T_NUM |
| bool ops, comparisons, `in` | T_BOOL |
| `move`, `load_field`, `load_index`, `load_dynamic`, `pop`, `get` | T_UNKNOWN |
| `invoke`, `tail_invoke` | T_UNKNOWN |

The result is a map of slot→type for slots where all writes agree on a single known type. Parameter slots (1..nr_args) and slot 0 are excluded.

Common patterns this enables:

- **Length variables** (`var len = length(arr)`): written by `length` (T_INT) only → invariant T_INT
- **Boolean flags** (`var found = false; ... found = true`): written by `false` and `true` → invariant T_BOOL
- **Locally-created containers** (`var arr = []`): written by `array` only → invariant T_ARRAY
- **Numeric accumulators** (`var sum = 0; sum = sum - x`): written by `access 0` (T_INT) and `subtract` (T_NUM) → merges to T_NUM

Note: Loop counters using `+` (`var i = 0; i = i + 1`) may not achieve write-type invariance because the `+` operator emits a guard dispatch with both `concat` (T_TEXT) and `add` (T_NUM) paths writing to the same temp slot, producing T_UNKNOWN. However, when one operand is a known number literal, `mcode.cm` emits a numeric-only path (see "Known-Number Add Shortcut" below), avoiding the text dispatch. Other arithmetic ops (`-`, `*`, `/`, `%`, `**`) always emit a single numeric write path and work cleanly with write-type analysis.

**Nop prefix:** none (analysis only, does not modify instructions)

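A minimal model of the scan (Python; the instruction tuples, `WRITE_TYPE` table, and slot layout are simplifying assumptions, not the actual IR encoding):

```python
# Hypothetical sketch of slot write-type invariance. Each instruction is
# modeled as (opcode, dest_slot, ...); unknown opcodes write "unknown".
WRITE_TYPE = {"length": "int", "true": "bool", "false": "bool",
              "concat": "text", "array": "array", "move": "unknown"}

def invariant_write_types(instructions, nr_args):
    slot_types = {}
    for op, dest, *rest in instructions:
        if dest <= nr_args:            # skip slot 0 and parameter slots
            continue
        t = WRITE_TYPE.get(op, "unknown")
        prev = slot_types.get(dest)
        slot_types[dest] = t if prev in (None, t) else "unknown"
    # keep only slots whose every write agrees on a single known type
    return {s: t for s, t in slot_types.items() if t != "unknown"}

# var found = false; ... found = true  -> invariant bool; a move poisons slot 5
code = [("false", 3), ("true", 3), ("length", 4, 2), ("move", 5, 3)]
assert invariant_write_types(code, 2) == {3: "bool", 4: "int"}
```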
### 3. eliminate_type_checks (type-check + jump elimination)

Forward pass that tracks the known type of each slot. When a type check (`is_int`, `is_text`, `is_num`, etc.) is followed by a conditional jump, and the slot's type is already known, the check and jump can be eliminated or converted to an unconditional jump.

Five cases:

- **Known match** (e.g., `is_int` on a slot known to be `int`): both the check and the conditional jump are eliminated (nop'd).
- **Subsumption match** (e.g., `is_num` on a slot known to be `int` or `float`): since `int` and `float` are subtypes of `num`, both the check and jump are eliminated.
- **Subsumption partial** (e.g., `is_int` on a slot known to be `num`): the `num` type could be `int` or `float`, so the check must remain. On fallthrough, the slot narrows to the checked subtype (`int`). This is NOT a mismatch — `num` values can pass an `is_int` check.
- **Known mismatch** (e.g., `is_text` on a slot known to be `int`): the check is nop'd and the conditional jump is rewritten to an unconditional `jump`.
- **Unknown**: the check remains, but on fallthrough, the slot's type is narrowed to the checked type (enabling downstream eliminations).

This pass also reduces `load_dynamic`/`store_dynamic` to `load_field`/`store_field` or `load_index`/`store_index` when the key slot's type is known.

At label join points, all type information is reset except for parameter types from backward inference and write-invariant types from slot write-type analysis.

**Nop prefix:** `_nop_tc_`

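The five cases can be modeled as a small decision function (Python sketch; the `decide` helper and action names are hypothetical, not the actual pass structure):

```python
# Hypothetical decision table for is_<checked> on a slot of known type.
def decide(known, checked):
    """Return the action taken for an is_<checked> check + conditional jump."""
    subtypes = {"num": {"int", "float"}}
    if known == checked or known in subtypes.get(checked, set()):
        return "eliminate"          # known match / subsumption match
    if checked in subtypes.get(known, set()):
        return "keep_and_narrow"    # subsumption partial: num may be int
    if known != "unknown":
        return "jump_always"        # known mismatch: rewrite to plain jump
    return "keep_and_narrow"        # unknown: keep check, narrow on fallthrough

assert decide("int", "int") == "eliminate"        # known match
assert decide("int", "num") == "eliminate"        # subsumption match
assert decide("num", "int") == "keep_and_narrow"  # subsumption partial
assert decide("int", "text") == "jump_always"     # known mismatch
```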
### 4. simplify_algebra (same-slot comparison folding)

Tracks known constant values. Folds same-slot comparisons:

| Pattern | Rewrite |
|---------|---------|
| `eq_* dest, x, x` | `true dest` |
| `le_* dest, x, x` | `true dest` |
| `ge_* dest, x, x` | `true dest` |
| `is_identical dest, x, x` | `true dest` |
| `ne_* dest, x, x` | `false dest` |
| `lt_* dest, x, x` | `false dest` |
| `gt_* dest, x, x` | `false dest` |

**Nop prefix:** none (rewrites in place, does not create nops)

### 5. simplify_booleans (not + jump fusion)

Peephole pass that eliminates unnecessary `not` instructions:

| Pattern | Rewrite |
|---------|---------|
| `not d, x; jump_false d, L` | nop; `jump_true x, L` |
| `not d, x; jump_true d, L` | nop; `jump_false x, L` |
| `not d1, x; not d2, d1` | nop; `move d2, x` |

This is particularly effective on `if (!cond)` patterns, which the compiler generates as `not; jump_false`. After this pass, they become a single `jump_true`.

**Nop prefix:** `_nop_bl_`

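The not + jump fusion can be sketched as a peephole over instruction pairs (Python model; the tuple encoding and nop counter are assumptions, and the `not`/`not` case is omitted for brevity):

```python
# Hypothetical peephole: fuse `not d, x; jump_false/true d, L`.
FLIP = {"jump_false": "jump_true", "jump_true": "jump_false"}

def fuse_not_jumps(code):
    out, n = list(code), 0
    for i in range(len(out) - 1):
        a, b = out[i], out[i + 1]
        if (isinstance(a, tuple) and a[0] == "not"
                and isinstance(b, tuple) and b[0] in FLIP and b[1] == a[1]):
            n += 1
            out[i] = "_nop_bl_" + str(n)            # nop the not
            out[i + 1] = (FLIP[b[0]], a[2], b[2])   # jump on the raw operand
    return out

# if (!cond): not d, x; jump_false d, L  =>  jump_true x, L
code = [("not", 5, 2), ("jump_false", 5, "L1")]
assert fuse_not_jumps(code) == ["_nop_bl_1", ("jump_true", 2, "L1")]
```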
### 6. eliminate_moves (self-move elimination)

Removes `move a, a` instructions where the source and destination are the same slot. These can arise from earlier passes rewriting binary operations into moves.

**Nop prefix:** `_nop_mv_`

### 7. eliminate_unreachable (dead code after return)

Nops instructions after `return` until the next real label. Only `return` is treated as a terminal instruction; `disrupt` is not, because the disruption handler code immediately follows `disrupt` and must remain reachable.

The mcode compiler emits a label at disruption handler entry points (see `emit_label(gen_label("disruption"))` in mcode.cm), which provides the label boundary that stops this pass from eliminating handler code.

**Nop prefix:** `_nop_ur_`

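The scan can be sketched as follows (Python model; the instruction tuples and `("label", name)` encoding are assumptions, not the actual `streamline.cm` representation):

```python
# Hypothetical sketch: nop everything between a return and the next label.
def eliminate_unreachable(code):
    out, dead, n = [], False, 0
    for ins in code:
        if isinstance(ins, str):        # existing nop strings pass through
            out.append(ins)
            continue
        if ins[0] == "label":
            dead = False                # handler entry points stay reachable
        if dead:
            n += 1
            out.append("_nop_ur_" + str(n))
        else:
            out.append(ins)
        if ins[0] == "return":
            dead = True
    return out

code = [("return", 0), ("move", 1, 2), ("label", "L1"), ("move", 3, 4)]
assert eliminate_unreachable(code) == [
    ("return", 0), "_nop_ur_1", ("label", "L1"), ("move", 3, 4)]
```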
### 8. eliminate_dead_jumps (jump-to-next-label elimination)

Removes `jump L` instructions where `L` is the immediately following label (skipping over any intervening nop strings). These are common after other passes eliminate conditional branches, leaving behind jumps that fall through naturally.

**Nop prefix:** `_nop_dj_`

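A sketch of the scan (Python model under the same assumed tuple encoding; nop strings are skipped when looking for the next label):

```python
# Hypothetical sketch: a jump to the label that immediately follows it
# (across nop strings) falls through naturally and can be nop'd.
def eliminate_dead_jumps(code):
    out, n = list(code), 0
    for i, ins in enumerate(out):
        if isinstance(ins, tuple) and ins[0] == "jump":
            j = i + 1
            while j < len(out) and isinstance(out[j], str):  # skip nops
                j += 1
            if j < len(out) and out[j] == ("label", ins[1]):
                n += 1
                out[i] = "_nop_dj_" + str(n)
    return out

code = [("jump", "L2"), "_nop_tc_1", ("label", "L2"), ("return", 0)]
assert eliminate_dead_jumps(code) == [
    "_nop_dj_1", "_nop_tc_1", ("label", "L2"), ("return", 0)]
```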
## Pass Composition

All passes run in sequence in `optimize_function`:

```
infer_param_types       → returns param_types map
infer_slot_write_types  → returns write_types map
eliminate_type_checks   → uses param_types + write_types
simplify_algebra
simplify_booleans
eliminate_moves
eliminate_unreachable
eliminate_dead_jumps
```

Each pass is independent and can be commented out for testing or benchmarking.

## Intrinsic Inlining

Before streamlining, `mcode.cm` recognizes calls to built-in intrinsic functions and emits direct opcodes instead of the generic frame/setarg/invoke call sequence. This reduces a 6-instruction call pattern to a single instruction:

| Call | Emitted opcode |
|------|---------------|
| `is_array(x)` | `is_array dest, src` |
| `is_function(x)` | `is_func dest, src` |
| `is_object(x)` | `is_record dest, src` |
| `is_stone(x)` | `is_stone dest, src` |
| `is_integer(x)` | `is_int dest, src` |
| `is_text(x)` | `is_text dest, src` |
| `is_number(x)` | `is_num dest, src` |
| `is_logical(x)` | `is_bool dest, src` |
| `is_null(x)` | `is_null dest, src` |
| `length(x)` | `length dest, src` |
| `push(arr, val)` | `push arr, val` |

These inlined opcodes have corresponding Mach VM implementations in `mach.c`.

## Unified Arithmetic

Arithmetic operations use generic opcodes: `add`, `subtract`, `multiply`, `divide`, `modulo`, `pow`, `negate`. There are no type-dispatched variants (e.g., no `add_int`/`add_float`).

The Mach VM handles arithmetic inline with a two-tier fast path. Since mcode's type guard dispatch guarantees both operands are numbers by the time arithmetic executes, the VM does not need polymorphic dispatch:

1. **Int-int fast path**: `JS_VALUE_IS_BOTH_INT` → native integer arithmetic with overflow check. If the result fits int32, returns int32; otherwise promotes to float64.
2. **Float fallback**: `JS_ToFloat64` on both operands → native floating-point arithmetic. Non-finite results (infinity, NaN) produce null.

Division and modulo additionally check for a zero divisor (→ null). Power uses `pow()` with non-finite handling.

The legacy `reg_vm_binop()` function remains available for comparison operators and any non-mcode bytecode paths, but arithmetic ops no longer call it.

Bitwise operations (`shl`, `shr`, `ushr`, `bitand`, `bitor`, `bitxor`, `bitnot`) remain integer-only and disrupt if operands are not integers.

The QBE/native backend maps generic arithmetic to helper calls (`qbe.add`, `qbe.sub`, etc.). The vision for the native path is that with sufficient type inference, the backend can unbox proven-numeric values to raw registers, operate directly, and only rebox at boundaries (returns, calls, stores).

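The two tiers can be modeled in Python (the real implementation is C in `mach.c`; here Python ints and floats stand in for tagged JSValues, and `None` stands in for null):

```python
# Hypothetical model of the VM's two-tier arithmetic fast path for add.
import math

INT32_MIN, INT32_MAX = -2**31, 2**31 - 1

def vm_add(a, b):
    if isinstance(a, int) and isinstance(b, int):   # int-int fast path
        r = a + b
        # overflow check: promote to float64 if the result exceeds int32
        return r if INT32_MIN <= r <= INT32_MAX else float(r)
    r = float(a) + float(b)                         # float fallback
    return r if math.isfinite(r) else None          # non-finite -> null

assert vm_add(1, 2) == 3
assert vm_add(INT32_MAX, 1) == float(2**31)         # overflow promotes
assert vm_add(1.5, 2) == 3.5
assert vm_add(float("inf"), 1) is None              # non-finite -> null
```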
## Known-Number Add Shortcut

The `+` operator is the only arithmetic op that is polymorphic at the mcode level — `emit_add_decomposed` in `mcode.cm` emits a guard dispatch that checks for text (→ `concat`) before numeric (→ `add`). This dual dispatch means the temp slot is written by both `concat` (T_TEXT) and `add` (T_NUM), producing T_UNKNOWN in write-type analysis.

When either operand is a known number literal (e.g., `i + 1`, `x + 0.5`), `emit_add_decomposed` skips the text dispatch entirely and emits `emit_numeric_binop("add")` — a single `is_num` guard + `add` with no `concat` path. This is safe because text concatenation requires both operands to be text; a known number can never participate in concat.

This optimization eliminates 6-8 instructions from the add block (two `is_text` checks, two conditional jumps, `concat`, `jump`) and produces a clean single-type write path that works with write-type analysis.

Other arithmetic ops (`subtract`, `multiply`, etc.) always use `emit_numeric_binop` and never have this problem.

## Target Slot Propagation

For simple local variable assignments (`i = expr`), the mcode compiler passes the variable's register slot as a `target` to the expression compiler. Binary operations that use `emit_numeric_binop` (subtract, multiply, divide, modulo, pow) can write directly to the target slot instead of allocating a temp and emitting a `move`:

```
// Before: i = i - 1
subtract 7, 2, 6    // temp = i - 1
move 2, 7           // i = temp

// After: i = i - 1
subtract 2, 2, 6    // i = i - 1 (direct)
```

The `+` operator is excluded from target slot propagation when it would use the full text+num dispatch (i.e., when neither operand is a known number), because writing both `concat` and `add` to the variable's slot would pollute its write type. When the known-number shortcut applies, `+` uses `emit_numeric_binop` and would be safe for target propagation, but this is not currently implemented — the exclusion is by operator kind, not by dispatch path.

## Debugging Tools

CLI tools inspect the IR at different stages:

- **`cell mcode --pretty`** — prints the raw Mcode IR after `mcode.cm`, before streamlining
- **`cell streamline --stats`** — prints the IR after streamlining, with before/after instruction counts
- **`cell streamline --types`** — prints the streamlined IR with type annotations on each instruction

Usage:

```
cell mcode --pretty <file.ce|file.cm>
cell streamline --stats <file.ce|file.cm>
cell streamline --types <file.ce|file.cm>
```

## Tail Call Marking

When a function's return expression is a call (`stmt.tail == true` from the parser) and the function has no disruption handler, mcode.cm renames the final `invoke` instruction to `tail_invoke`. This is semantically identical to `invoke` in the current Mach VM, but marks the call site for future tail call optimization.

The disruption handler restriction exists because TCO would discard the current frame, but the handler must remain on the stack to catch disruptions from the callee.

`tail_invoke` is handled by the same passes as `invoke` in streamline (type tracking, algebraic simplification) and executes identically in the VM.

## Type Propagation Architecture

Type information flows through three compilation stages, each building on the previous:

### Stage 1: Parse-time type tags (parse.cm)

The parser assigns `type_tag` strings to scope variable entries when the type is syntactically obvious:

- **From initializers**: `def a = []` → `type_tag: "array"`, `def n = 42` → `type_tag: "integer"`, `def r = {}` → `type_tag: "record"`
- **From usage patterns** (def only): `def x = null; x[] = v` infers `type_tag: "array"` from the push. `def x = null; x.foo = v` infers `type_tag: "record"` from property access.
- **Type error detection** (def only): When a `def` variable has a known type_tag, provably wrong operations are compile errors:
  - Property access (`.`) on an array
  - Push (`[]`) on a non-array
  - Text key on an array
  - Integer key on a record

Only `def` (constant) variables participate in type inference and error detection. `var` variables can be reassigned, making their initializer type unreliable.

### Stage 2: Fold-time type propagation (fold.cm)

The fold pass extends type information through the AST:

- **Intrinsic folding**: `is_array(known_array)` folds to `true`. `length(known_array)` gets `hint: "array_length"`.
- **Purity analysis**: Expressions involving only `is_*` intrinsic calls with pure arguments are considered pure. This enables dead code elimination for unused `var`/`def` bindings with pure initializers, and elimination of standalone pure call statements.
- **Dead code**: Unused pure `var`/`def` declarations are removed. Standalone calls to pure intrinsics (where the result is discarded) are removed. Unreachable branches with constant conditions are removed.

The `pure_intrinsics` set currently contains only the `is_*` sensory functions (`is_array`, `is_text`, `is_number`, `is_integer`, `is_function`, `is_logical`, `is_null`, `is_object`, `is_stone`). Other intrinsics like `text`, `number`, and `length` can disrupt on wrong argument types, so they are excluded — removing a call that would disrupt changes observable behavior.

### Stage 3: Streamline-time type tracking (streamline.cm)

The streamline optimizer uses a numeric type lattice (`T_INT`, `T_FLOAT`, `T_TEXT`, etc.) for fine-grained per-instruction tracking:

- **Backward inference** (pass 1): Scans typed operators to infer parameter types. Since parameters are `def` (immutable), inferred types persist across label boundaries.
- **Write-type invariance** (pass 2): Scans all instructions to find local slots where every write produces the same type. These invariant types persist across label boundaries alongside parameter types.
- **Forward tracking** (pass 3): `track_types` follows instruction execution order, tracking the type of each slot. Known-type operations set their destination type (e.g., `concat` → T_TEXT, `length` → T_INT). Generic arithmetic produces T_UNKNOWN. Type checks on unknown slots narrow the type on fallthrough.
- **Type check elimination** (pass 3): When a slot's type is already known, `is_<type>` + conditional jump pairs are eliminated or converted to unconditional jumps.
- **Dynamic access narrowing** (pass 3): `load_dynamic`/`store_dynamic` are narrowed to `load_field`/`store_field` or `load_index`/`store_index` when the key type is known.

Type information resets at label join points (since control-flow merges could bring different types), except for parameter types from backward inference and write-invariant types from slot write-type analysis.

## Future Work

### Copy Propagation

A basic-block-local copy propagation pass would replace uses of a copied variable with its source, enabling further move elimination. An implementation was attempted but encountered an unsolved bug where 2-position instruction operand replacement produces incorrect code during self-hosting (the replacement logic for 3-position instructions works correctly). The root cause is not yet understood. See the project memory files for detailed notes.

### Expanded Purity Analysis

The current purity set is conservative (only `is_*`). It could be expanded by:

- **Argument-type-aware purity**: If all arguments to an intrinsic are known to be the correct types (via type_tag or slot_types), the call cannot disrupt and is safe to eliminate. For example, `length(known_array)` is pure but `length(unknown)` is not.
- **User function purity**: Analyze user-defined function bodies during pre_scan. A function is pure if its body contains only pure expressions and calls to known-pure functions. This requires fixpoint iteration for mutual recursion.
- **Callback-aware purity**: Intrinsics like `filter`, `find`, `reduce`, `some`, `every` are pure if their callback argument is pure.

### Move Type Resolution in Write-Type Analysis

Currently, `move` instructions produce T_UNKNOWN in write-type analysis. This prevents type propagation through moves — e.g., a slot written by `access 0` (T_INT) and `move` from an `add` result (T_NUM) merges to T_UNKNOWN instead of T_NUM.

A two-pass approach would fix this: first compute write types for all non-move instructions, then resolve moves by looking up the source slot's computed type. If the source has a known type, merge it into the destination; if unknown, skip the move (don't poison the destination with T_UNKNOWN).

This was implemented and tested but causes a bootstrap failure during self-hosting convergence. The root cause is not yet understood — the optimizer modifies its own bytecode, and the move resolution changes the type landscape enough to produce different code on each pass, preventing convergence. Further investigation is needed; the fix is correct in isolation but interacts badly with the self-hosting fixed-point iteration.

### Target Slot Propagation for Add with Known Numbers

When the known-number add shortcut applies (one operand is a literal number), the generated code uses `emit_numeric_binop`, which has a single write path. Target slot propagation should be safe in this case, but is currently blocked by the blanket `kind != "+"` exclusion. Refining the exclusion to check whether the shortcut will apply (by testing `is_known_number` on either operand) would enable direct writes for patterns like `i = i + 1`.

### Forward Type Narrowing from Typed Operations

With unified arithmetic (generic `add`/`subtract`/`multiply`/`divide`/`modulo`/`negate` instead of typed variants), this approach is no longer applicable. Typed comparisons (`eq_int`, `lt_float`, etc.) still exist and their operands have known types, but these are already handled by backward inference.

### Guard Hoisting for Parameters

When a type check on a parameter passes (falls through), the parameter's type could be promoted to `param_types` so it persists across label boundaries. This would allow the first type check on a parameter to prove its type for the entire function. However, this is unsound for polymorphic parameters — if a function is called with different argument types, the first check would wrongly eliminate checks for subsequent types.

A safe version would require proving that a parameter is monomorphic (called with only one type across all call sites), which requires interprocedural analysis.

**Note:** For local variables (non-parameters), the write-type invariance analysis (pass 2) achieves a similar effect safely — if every write to a slot produces the same type, that type persists across labels without needing to hoist any guard.

### Tail Call Optimization

`tail_invoke` instructions are currently marked but execute identically to `invoke`. Actual TCO would reuse the current call frame instead of creating a new one. This requires:

- Ensuring the argument count matches (or the frame can be resized)
- No live locals needed after the call (guaranteed by tail position)
- No disruption handler on the current function (already enforced by the marking)
- VM support in mach.c to rewrite the frame in place

### Interprocedural Type Inference

Currently all type inference is intraprocedural (within a single function). Cross-function analysis could:

- Infer return types from function bodies
- Propagate argument types from call sites to callees
- Specialize functions for known argument types (cloning)

### Strength Reduction

Common patterns that could be lowered to cheaper operations when operand types are known:

- `multiply x, 2` with proven-int operands → shift left
- `divide x, 2` with proven-int → arithmetic shift right
- `modulo x, power_of_2` with proven-int → bitwise and

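These rewrites would amount to a small pattern table. A Python sketch under the same assumed tuple encoding (hypothetical — this pass does not exist yet):

```python
# Hypothetical strength-reduction rewrites for proven-int operands with a
# known constant right-hand side.
def reduce_strength(op, dest, a, b_const):
    if op == "multiply" and b_const == 2:
        return ("shl", dest, a, 1)               # x * 2  -> x << 1
    if op == "divide" and b_const == 2:
        return ("shr", dest, a, 1)               # x / 2  -> x >> 1 (arithmetic)
    if op == "modulo" and b_const > 0 and b_const & (b_const - 1) == 0:
        return ("bitand", dest, a, b_const - 1)  # x % 2^k -> x & (2^k - 1)
    return (op, dest, a, b_const)                # no cheaper form known

assert reduce_strength("multiply", 3, 2, 2) == ("shl", 3, 2, 1)
assert reduce_strength("modulo", 3, 2, 8) == ("bitand", 3, 2, 7)
assert reduce_strength("add", 3, 2, 8) == ("add", 3, 2, 8)
```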
### Numeric Unboxing (QBE/native path)

With unified arithmetic and backward type inference, the native backend can identify regions where numeric values remain in registers without boxing/unboxing:

1. **Guard once**: When backward inference proves a parameter is T_NUM, emit a single type guard at function entry.
2. **Unbox**: Convert the tagged JSValue to a raw double register.
3. **Operate**: Use native FP/int instructions directly (no function calls, no tag checks).
4. **Rebox**: Convert back to a tagged JSValue only at rebox points (function returns, calls, stores to arrays/records).

This requires inserting `unbox`/`rebox` IR annotations (no-ops in the Mach VM, meaningful only to QBE).

### Loop-Invariant Code Motion

Type checks that are invariant across loop iterations (checking a variable that doesn't change in the loop body) could be hoisted above the loop. This would require identifying loop boundaries and proving invariance.

### Algebraic Identity Optimization

With unified arithmetic, algebraic identities (x+0→x, x*1→x, x*0→0, x/1→x) require knowing operand values at compile time. Since generic `add`/`multiply` operate on any numeric type, the constant-tracking logic in `simplify_algebra` could be extended to handle these for known-constant slots.

## Nop Convention

Eliminated instructions are replaced with strings matching `_nop_<prefix>_<counter>`. The prefix identifies which pass created the nop. Nop strings are:

- Skipped during interpretation (the VM ignores them)
- Skipped during QBE emission
- Not counted in instruction statistics
- Preserved in the instruction array to maintain positional stability for jump targets
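Consumers can recognize nops with a single predicate. A Python sketch (the actual checks live in the VM and the QBE emitter, not in Python):

```python
# Hypothetical nop predicate: eliminated instructions are plain strings
# of the form _nop_<prefix>_<counter>, while real instructions are not.
def is_nop(ins):
    return isinstance(ins, str) and ins.startswith("_nop_")

assert is_nop("_nop_tc_1")        # eliminated by eliminate_type_checks
assert is_nop("_nop_dj_3")        # eliminated by eliminate_dead_jumps
assert not is_nop(("move", 1, 2)) # real instructions pass through
```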