Files
cell/docs/spec/streamline.md
2026-02-12 09:43:13 -06:00

203 lines
8.1 KiB
Markdown

---
title: "Streamline Optimizer"
description: "Mcode IR optimization passes"
---
## Overview
The streamline optimizer (`streamline.cm`) runs a series of independent passes over the Mcode IR to eliminate redundant operations. Each pass is a standalone function that can be enabled, disabled, or reordered. Passes communicate only through the instruction array they mutate in place, replacing eliminated instructions with nop strings (e.g., `_nop_tc_1`).
The optimizer runs after `mcode.cm` generates the IR and before the result is lowered to the Mach VM or emitted as QBE IL.
```
Fold (AST) → Mcode (JSON IR) → Streamline → Mach VM / QBE
```
## Type Lattice
The optimizer tracks a type for each slot in the register file:
| Type | Meaning |
|------|---------|
| `unknown` | No type information |
| `int` | Integer |
| `float` | Floating-point |
| `num` | Number (subsumes int and float) |
| `text` | String |
| `bool` | Logical (true/false) |
| `null` | Null value |
| `array` | Array |
| `record` | Record (object) |
| `function` | Function |
| `blob` | Binary blob |
Subsumption: `int` and `float` both satisfy a `num` check.
## Passes
### 1. infer_param_types (backward type inference)
Scans all typed operators to determine what types their operands must be. For example, `add_int dest, a, b` implies both `a` and `b` are integers.
When a parameter slot (1..nr_args) is consistently inferred as a single type, that type is recorded. Since parameters are immutable (`def`), the inferred type holds for the entire function and persists across label join points (loop headers, branch targets).
Backward inference rules:
| Operator class | Operand type inferred |
|---|---|
| `add_int`, `sub_int`, `mul_int`, `div_int`, `mod_int`, `eq_int`, comparisons, bitwise | T_INT |
| `add_float`, `sub_float`, `mul_float`, `div_float`, `mod_float`, float comparisons | T_FLOAT |
| `concat`, text comparisons | T_TEXT |
| `eq_bool`, `ne_bool`, `not`, `and`, `or` | T_BOOL |
| `store_index` (object operand) | T_ARRAY |
| `store_index` (index operand) | T_INT |
| `store_field` (object operand) | T_RECORD |
| `push` (array operand) | T_ARRAY |
When a slot appears with conflicting type inferences (e.g., used in both `add_int` and `concat` across different type-dispatch branches), the result is `unknown`. INT + FLOAT conflicts produce `num`.
**Nop prefix:** none (analysis only, does not modify instructions)
### 2. eliminate_type_checks (type-check + jump elimination)
Forward pass that tracks the known type of each slot. When a type check (`is_int`, `is_text`, `is_num`, etc.) is followed by a conditional jump, and the slot's type is already known, the check and jump can be eliminated or converted to an unconditional jump.
Three cases:
- **Known match** (e.g., `is_int` on a slot known to be `int`): both the check and the conditional jump are eliminated (nop'd).
- **Known mismatch** (e.g., `is_text` on a slot known to be `int`): the check is nop'd and the conditional jump is rewritten to an unconditional `jump`.
- **Unknown**: the check remains, but on fallthrough, the slot's type is narrowed to the checked type (enabling downstream eliminations).
This pass also reduces `load_dynamic`/`store_dynamic` to `load_field`/`store_field` or `load_index`/`store_index` when the key slot's type is known.
At label join points, all type information is reset except for parameter types seeded by the backward inference pass.
**Nop prefix:** `_nop_tc_`
### 3. simplify_algebra (algebraic identity + comparison folding)
Tracks known constant values alongside types. Rewrites identity operations:
| Pattern | Rewrite |
|---------|---------|
| `add_int dest, x, 0` | `move dest, x` |
| `add_int dest, 0, x` | `move dest, x` |
| `sub_int dest, x, 0` | `move dest, x` |
| `mul_int dest, x, 1` | `move dest, x` |
| `mul_int dest, 1, x` | `move dest, x` |
| `mul_int dest, x, 0` | `int dest, 0` |
| `div_int dest, x, 1` | `move dest, x` |
| `add_float dest, x, 0` | `move dest, x` |
| `mul_float dest, x, 1` | `move dest, x` |
| `div_float dest, x, 1` | `move dest, x` |
Float multiplication by zero is intentionally not optimized because it is not safe with NaN and Inf values.
Same-slot comparison folding:
| Pattern | Rewrite |
|---------|---------|
| `eq_* dest, x, x` | `true dest` |
| `le_* dest, x, x` | `true dest` |
| `ge_* dest, x, x` | `true dest` |
| `is_identical dest, x, x` | `true dest` |
| `ne_* dest, x, x` | `false dest` |
| `lt_* dest, x, x` | `false dest` |
| `gt_* dest, x, x` | `false dest` |
**Nop prefix:** none (rewrites in place, does not create nops)
### 4. simplify_booleans (not + jump fusion)
Peephole pass that eliminates unnecessary `not` instructions:
| Pattern | Rewrite |
|---------|---------|
| `not d, x; jump_false d, L` | nop; `jump_true x, L` |
| `not d, x; jump_true d, L` | nop; `jump_false x, L` |
| `not d1, x; not d2, d1` | nop; `move d2, x` |
This is particularly effective on `if (!cond)` patterns, which the compiler generates as `not; jump_false`. After this pass, they become a single `jump_true`.
**Nop prefix:** `_nop_bl_`
### 5. eliminate_moves (self-move elimination)
Removes `move a, a` instructions where the source and destination are the same slot. These can arise from earlier passes rewriting binary operations into moves.
**Nop prefix:** `_nop_mv_`
### 6. eliminate_unreachable (dead code after return/disrupt)
*Currently disabled.* Nops instructions after `return` or `disrupt` until the next real label.
Disabled because disruption handler code is placed after the `return`/`disrupt` instruction without a label boundary. The VM dispatches to handlers via the `disruption_pc` offset, not through normal control flow. Re-enabling this pass requires the mcode compiler to emit labels at disruption handler entry points.
**Nop prefix:** `_nop_ur_`
### 7. eliminate_dead_jumps (jump-to-next-label elimination)
Removes `jump L` instructions where `L` is the immediately following label (skipping over any intervening nop strings). These are common after other passes eliminate conditional branches, leaving behind jumps that fall through naturally.
**Nop prefix:** `_nop_dj_`
## Pass Composition
All passes run in sequence in `optimize_function`:
```
infer_param_types → returns param_types map
eliminate_type_checks → uses param_types
simplify_algebra
simplify_booleans
eliminate_moves
(eliminate_unreachable) → disabled
eliminate_dead_jumps
```
Each pass is independent and can be commented out for testing or benchmarking.
## Intrinsic Inlining
Before streamlining, `mcode.cm` recognizes calls to built-in intrinsic functions and emits direct opcodes instead of the generic frame/setarg/invoke call sequence. This reduces a 6-instruction call pattern to a single instruction:
| Call | Emitted opcode |
|------|---------------|
| `is_array(x)` | `is_array dest, src` |
| `is_function(x)` | `is_func dest, src` |
| `is_object(x)` | `is_record dest, src` |
| `is_stone(x)` | `is_stone dest, src` |
| `is_integer(x)` | `is_int dest, src` |
| `is_text(x)` | `is_text dest, src` |
| `is_number(x)` | `is_num dest, src` |
| `is_logical(x)` | `is_bool dest, src` |
| `is_null(x)` | `is_null dest, src` |
| `length(x)` | `length dest, src` |
| `push(arr, val)` | `push arr, val` |
These inlined opcodes have corresponding Mach VM implementations in `mach.c`.
## Debugging Tools
Three dump tools inspect the IR at different stages:
- **`dump_mcode.cm`** — prints the raw Mcode IR after `mcode.cm`, before streamlining
- **`dump_stream.cm`** — prints the IR after streamlining, with before/after instruction counts
- **`dump_types.cm`** — prints the streamlined IR with type annotations on each instruction
Usage:
```
./cell --core . dump_mcode.cm <file.ce|file.cm>
./cell --core . dump_stream.cm <file.ce|file.cm>
./cell --core . dump_types.cm <file.ce|file.cm>
```
## Nop Convention
Eliminated instructions are replaced with strings matching `_nop_<prefix>_<counter>`. The prefix identifies which pass created the nop. Nop strings are:
- Skipped during interpretation (the VM ignores them)
- Skipped during QBE emission
- Not counted in instruction statistics
- Preserved in the instruction array to maintain positional stability for jump targets