---
title: "Streamline Optimizer"
description: "Mcode IR optimization passes"
---

## Overview

The streamline optimizer (`streamline.cm`) runs a series of independent passes over the Mcode IR to eliminate redundant operations. Each pass is a standalone function that can be enabled, disabled, or reordered. Passes communicate only through the instruction array they mutate in place, replacing eliminated instructions with nop strings (e.g., `_nop_tc_1`).

The optimizer runs after `mcode.cm` generates the IR and before the result is lowered to the Mach VM or emitted as QBE IL.

```
Fold (AST) → Mcode (JSON IR) → Streamline → Mach VM / QBE
```

## Type Lattice

The optimizer tracks a type for each slot in the register file:

| Type | Meaning |
|------|---------|
| `unknown` | No type information |
| `int` | Integer |
| `float` | Floating-point |
| `num` | Number (subsumes int and float) |
| `text` | String |
| `bool` | Logical (true/false) |
| `null` | Null value |
| `array` | Array |
| `record` | Record (object) |
| `function` | Function |
| `blob` | Binary blob |

Subsumption: `int` and `float` both satisfy a `num` check.

## Passes

### 1. infer_param_types (backward type inference)

Scans all typed operators to determine what types their operands must be. For example, `add_int dest, a, b` implies both `a` and `b` are integers.

When a parameter slot (1..nr_args) is consistently inferred as a single type, that type is recorded. Since parameters are immutable (`def`), the inferred type holds for the entire function and persists across label join points (loop headers, branch targets).
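
The backward scan described above can be sketched roughly as follows. This is an illustrative Python model, not the code in `streamline.cm`: the instruction shape (`[op, dest, a, b]` lists with integer slot numbers), the `OPERAND_TYPE` table, and the `merge` helper are all assumptions made for the example. Widening `int` and `float` to `num` follows the subsumption rule in the type lattice; any other disagreement collapses to `unknown`.

```python
# Hypothetical IR shape: each instruction is a list [op, dest, a, b] whose
# operands are integer slot numbers. Strings (nops, labels) are skipped.

# Operand type implied by a few typed operators (illustrative subset).
OPERAND_TYPE = {
    "add_int": "int", "sub_int": "int", "mul_int": "int",
    "add_float": "float", "sub_float": "float", "mul_float": "float",
    "concat": "text",
}

NUMERIC = {"int", "float", "num"}


def merge(old, new):
    """Combine two inferences for the same slot."""
    if old is None or old == new:
        return new
    if old in NUMERIC and new in NUMERIC:
        return "num"      # int and float disagree: widen to num
    return "unknown"      # any other disagreement: give up


def infer_param_types(instructions, nr_args):
    """Return {param_slot: inferred_type} without modifying instructions."""
    param_types = {}
    for instr in instructions:
        if isinstance(instr, str):
            continue                      # nop string or label
        implied = OPERAND_TYPE.get(instr[0])
        if implied is None:
            continue                      # untyped operator
        for slot in instr[2:]:            # operands only, not dest
            if 1 <= slot <= nr_args:      # parameters live in 1..nr_args
                param_types[slot] = merge(param_types.get(slot), implied)
    return param_types
```

In this model, a parameter used only by `add_int` ends up as `int`, one used by both `add_int` and `add_float` ends up as `num`, and one used by both `add_float` and `concat` ends up as `unknown`.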

Backward inference rules:

| Operator class | Operand type inferred |
|---|---|
| `add_int`, `sub_int`, `mul_int`, `div_int`, `mod_int`, `eq_int`, comparisons, bitwise | T_INT |
| `add_float`, `sub_float`, `mul_float`, `div_float`, `mod_float`, float comparisons | T_FLOAT |
| `concat`, text comparisons | T_TEXT |
| `eq_bool`, `ne_bool`, `not`, `and`, `or` | T_BOOL |
| `store_index` (object operand) | T_ARRAY |
| `store_index` (index operand) | T_INT |
| `store_field` (object operand) | T_RECORD |
| `push` (array operand) | T_ARRAY |

When a slot appears with conflicting type inferences (e.g., used in both `add_int` and `concat` across different type-dispatch branches), the result is `unknown`. INT + FLOAT conflicts produce `num`.

**Nop prefix:** none (analysis only, does not modify instructions)

### 2. eliminate_type_checks (type-check + jump elimination)

Forward pass that tracks the known type of each slot. When a type check (`is_int`, `is_text`, `is_num`, etc.) is followed by a conditional jump, and the slot's type is already known, the check and jump can be eliminated or converted to an unconditional jump.

Three cases:

- **Known match** (e.g., `is_int` on a slot known to be `int`): both the check and the conditional jump are eliminated (nop'd).
- **Known mismatch** (e.g., `is_text` on a slot known to be `int`): the check is nop'd and the conditional jump is rewritten to an unconditional `jump`.
- **Unknown**: the check remains, but on fallthrough, the slot's type is narrowed to the checked type (enabling downstream eliminations).

This pass also reduces `load_dynamic`/`store_dynamic` to `load_field`/`store_field` or `load_index`/`store_index` when the key slot's type is known.

At label join points, all type information is reset except for parameter types seeded by the backward inference pass.

**Nop prefix:** `_nop_tc_`

### 3. simplify_algebra (algebraic identity + comparison folding)

Tracks known constant values alongside types.
Rewrites identity operations:

| Pattern | Rewrite |
|---------|---------|
| `add_int dest, x, 0` | `move dest, x` |
| `add_int dest, 0, x` | `move dest, x` |
| `sub_int dest, x, 0` | `move dest, x` |
| `mul_int dest, x, 1` | `move dest, x` |
| `mul_int dest, 1, x` | `move dest, x` |
| `mul_int dest, x, 0` | `int dest, 0` |
| `div_int dest, x, 1` | `move dest, x` |
| `add_float dest, x, 0` | `move dest, x` |
| `mul_float dest, x, 1` | `move dest, x` |
| `div_float dest, x, 1` | `move dest, x` |

Float multiplication by zero is intentionally not optimized because it is not safe with NaN and Inf values.

Same-slot comparison folding:

| Pattern | Rewrite |
|---------|---------|
| `eq_* dest, x, x` | `true dest` |
| `le_* dest, x, x` | `true dest` |
| `ge_* dest, x, x` | `true dest` |
| `is_identical dest, x, x` | `true dest` |
| `ne_* dest, x, x` | `false dest` |
| `lt_* dest, x, x` | `false dest` |
| `gt_* dest, x, x` | `false dest` |

**Nop prefix:** none (rewrites in place, does not create nops)

### 4. simplify_booleans (not + jump fusion)

Peephole pass that eliminates unnecessary `not` instructions:

| Pattern | Rewrite |
|---------|---------|
| `not d, x; jump_false d, L` | nop; `jump_true x, L` |
| `not d, x; jump_true d, L` | nop; `jump_false x, L` |
| `not d1, x; not d2, d1` | nop; `move d2, x` |

This is particularly effective on `if (!cond)` patterns, which the compiler generates as `not; jump_false`. After this pass, they become a single `jump_true`.

**Nop prefix:** `_nop_bl_`

### 5. eliminate_moves (self-move elimination)

Removes `move a, a` instructions where the source and destination are the same slot. These can arise from earlier passes rewriting binary operations into moves.

**Nop prefix:** `_nop_mv_`

### 6. eliminate_unreachable (dead code after return/disrupt)

*Currently disabled.* Nops instructions after `return` or `disrupt` until the next real label.
Disabled because disruption handler code is placed after the `return`/`disrupt` instruction without a label boundary. The VM dispatches to handlers via the `disruption_pc` offset, not through normal control flow. Re-enabling this pass requires the mcode compiler to emit labels at disruption handler entry points.

**Nop prefix:** `_nop_ur_`

### 7. eliminate_dead_jumps (jump-to-next-label elimination)

Removes `jump L` instructions where `L` is the immediately following label (skipping over any intervening nop strings). These are common after other passes eliminate conditional branches, leaving behind jumps that fall through naturally.

**Nop prefix:** `_nop_dj_`

## Pass Composition

All passes run in sequence in `optimize_function`:

```
infer_param_types        → returns param_types map
eliminate_type_checks    → uses param_types
simplify_algebra
simplify_booleans
eliminate_moves
(eliminate_unreachable)  → disabled
eliminate_dead_jumps
```

Each pass is independent and can be commented out for testing or benchmarking.

## Intrinsic Inlining

Before streamlining, `mcode.cm` recognizes calls to built-in intrinsic functions and emits direct opcodes instead of the generic frame/setarg/invoke call sequence. This reduces a 6-instruction call pattern to a single instruction:

| Call | Emitted opcode |
|------|---------------|
| `is_array(x)` | `is_array dest, src` |
| `is_function(x)` | `is_func dest, src` |
| `is_object(x)` | `is_record dest, src` |
| `is_stone(x)` | `is_stone dest, src` |
| `is_integer(x)` | `is_int dest, src` |
| `is_text(x)` | `is_text dest, src` |
| `is_number(x)` | `is_num dest, src` |
| `is_logical(x)` | `is_bool dest, src` |
| `is_null(x)` | `is_null dest, src` |
| `length(x)` | `length dest, src` |
| `push(arr, val)` | `push arr, val` |

These inlined opcodes have corresponding Mach VM implementations in `mach.c`.
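
As a rough illustration of the table above, intrinsic inlining amounts to a name lookup at call-emission time. The Python sketch below is only a model: `emit_call`, the list-based instruction shape, and the operand layout of the fallback `frame`/`setarg`/`invoke` instructions are hypothetical, and the fallback is deliberately shorter than the real 6-instruction sequence.

```python
# Intrinsic name -> (opcode, has_dest), transcribed from the table above.
INTRINSICS = {
    "is_array": ("is_array", True),
    "is_function": ("is_func", True),
    "is_object": ("is_record", True),
    "is_stone": ("is_stone", True),
    "is_integer": ("is_int", True),
    "is_text": ("is_text", True),
    "is_number": ("is_num", True),
    "is_logical": ("is_bool", True),
    "is_null": ("is_null", True),
    "length": ("length", True),
    "push": ("push", False),   # push arr, val has no destination slot
}


def emit_call(name, dest, args):
    """Return the list of instructions emitted for a call to `name`."""
    entry = INTRINSICS.get(name)
    if entry is not None:
        opcode, has_dest = entry
        operands = ([dest] if has_dest else []) + list(args)
        return [[opcode] + operands]      # one direct opcode

    # Generic fallback. The operand layout here is a placeholder and the
    # real call sequence is longer; it only stands in for "not inlined".
    seq = [["frame", name, len(args)]]
    seq += [["setarg", i, arg] for i, arg in enumerate(args)]
    seq.append(["invoke", dest])
    return seq
```

For example, `emit_call("is_function", 4, [2])` yields the single instruction `["is_func", 4, 2]`, while a call to a non-intrinsic name falls through to the multi-instruction path.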
## Debugging Tools

Three dump tools inspect the IR at different stages:

- **`dump_mcode.cm`**: prints the raw Mcode IR after `mcode.cm`, before streamlining
- **`dump_stream.cm`**: prints the IR after streamlining, with before/after instruction counts
- **`dump_types.cm`**: prints the streamlined IR with type annotations on each instruction

Usage:

```
./cell --core . dump_mcode.cm
./cell --core . dump_stream.cm
./cell --core . dump_types.cm
```

## Nop Convention

Eliminated instructions are replaced with strings matching `_nop_<prefix>_<counter>` (e.g., `_nop_tc_1`). The prefix identifies which pass created the nop. Nop strings are:

- Skipped during interpretation (the VM ignores them)
- Skipped during QBE emission
- Not counted in instruction statistics
- Preserved in the instruction array to maintain positional stability for jump targets
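
A minimal Python sketch of this convention, using self-move elimination as the example pass. The instruction shape (lists for real instructions, plain strings for nops and labels), the 1-based counter, and the helper names are assumptions made for illustration, not the actual `streamline.cm` code.

```python
def is_nop(instr):
    """True for nop strings left behind by an optimizer pass."""
    return isinstance(instr, str) and instr.startswith("_nop_")


def eliminate_moves(instructions):
    """Replace `move a, a` with a nop string in place; return how many."""
    counter = 0
    for i, instr in enumerate(instructions):
        if (isinstance(instr, list) and instr[0] == "move"
                and instr[1] == instr[2]):
            counter += 1
            # Overwrite rather than delete so every index stays valid.
            instructions[i] = f"_nop_mv_{counter}"
    return counter


def count_instructions(instructions):
    """Count real instructions. Strings (nops and, in this model,
    labels) are excluded from the total."""
    return sum(1 for instr in instructions if isinstance(instr, list))
```

Because the pass overwrites entries instead of removing them, the array length never changes, which is what keeps position-based jump targets stable.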