cell/streamline.md at f7e2ff13b5123e26bf217644457e14b10a848487


John Alanbrook f7e2ff13b5 guard hoisting

2026-02-13 06:32:58 -06:00


title	description
Streamline Optimizer	Mcode IR optimization passes

Overview

The streamline optimizer (streamline.cm) runs a series of passes over the Mcode IR to eliminate redundant operations. Each pass is a standalone function. The two analysis passes return type maps that eliminate_type_checks consumes. The transformation passes can be enabled, disabled, or reordered, and they communicate only through the instruction array, which they mutate in place, replacing eliminated instructions with nop strings (e.g., _nop_tc_1).

The optimizer runs after mcode.cm generates the IR and before the result is lowered to the Mach VM or emitted as QBE IL.

Fold (AST) → Mcode (JSON IR) → Streamline → Mach VM / QBE

Type Lattice

The optimizer tracks a type for each slot in the register file:

Type	Meaning
`unknown`	No type information
`int`	Integer
`float`	Floating-point
`num`	Number (subsumes int and float)
`text`	String
`bool`	Logical (true/false)
`null`	Null value
`array`	Array
`record`	Record (object)
`function`	Function
`blob`	Binary blob

Subsumption: int and float both satisfy a num check.
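
As an illustration of the subsumption rule, here is a minimal sketch in Python (not Cell). The string tags and the `satisfies` helper are hypothetical stand-ins for the optimizer's own representation, which this document does not show.

```python
# Hypothetical string tags standing in for the optimizer's type constants.
T_UNKNOWN, T_INT, T_FLOAT, T_NUM, T_TEXT = "unknown", "int", "float", "num", "text"

def satisfies(slot_type, checked_type):
    """True when a slot of slot_type is guaranteed to pass a check for checked_type."""
    if slot_type == T_UNKNOWN:
        return False                      # nothing is known, so nothing is guaranteed
    if slot_type == checked_type:
        return True
    # Subsumption: int and float both satisfy a num check.
    return checked_type == T_NUM and slot_type in (T_INT, T_FLOAT)
```

Note that the relation is one-directional: a slot known only as num is not guaranteed to pass an int check.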

Passes

1. infer_param_types (backward type inference)

Scans all typed operators to determine what types their operands must be. For example, add_int dest, a, b implies both a and b are integers.

When a parameter slot (1..nr_args) is consistently inferred as a single type, that type is recorded. Since parameters are immutable (def), the inferred type holds for the entire function and persists across label join points (loop headers, branch targets).

Backward inference rules:

Operator class	Operand type inferred
`add_int`, `sub_int`, `mul_int`, `div_int`, `mod_int`, `eq_int` and other int comparisons, bitwise ops	T_INT
`add_float`, `sub_float`, `mul_float`, `div_float`, `mod_float`, float comparisons	T_FLOAT
`concat`, text comparisons	T_TEXT
`eq_bool`, `ne_bool`, `not`, `and`, `or`	T_BOOL
`store_index` (object operand)	T_ARRAY
`store_index` (index operand)	T_INT
`store_field` (object operand)	T_RECORD
`push` (array operand)	T_ARRAY
`load_index` (object operand)	T_ARRAY
`load_index` (index operand)	T_INT
`load_field` (object operand)	T_RECORD
`pop` (array operand)	T_ARRAY

When a slot appears with conflicting type inferences (e.g., used in both add_int and concat across different type-dispatch branches), the result is unknown. The one exception is a conflict between T_INT and T_FLOAT, which widens to num.

Nop prefix: none (analysis only, does not modify instructions)
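
The backward inference and conflict rules can be sketched roughly as follows. This is Python rather than Cell, and it assumes (hypothetically) that each instruction is a list of the form `[op, dest, a, b]` while labels and nops are plain strings. Only a small subset of the rule table is included.

```python
T_UNKNOWN, T_INT, T_FLOAT, T_NUM, T_TEXT = "unknown", "int", "float", "num", "text"

# Hypothetical subset of the rule table: opcode -> type required of operands a and b.
OPERAND_TYPE = {"add_int": T_INT, "sub_int": T_INT, "add_float": T_FLOAT, "concat": T_TEXT}

def merge(old, new):
    """Combine a previous inference (or None) with a new one for the same slot."""
    if old is None or old == new:
        return new
    if {old, new} <= {T_INT, T_FLOAT, T_NUM}:
        return T_NUM          # int/float conflicts widen to num
    return T_UNKNOWN          # any other conflict gives up on the slot

def infer_param_types(instructions, nr_args):
    """Return {slot: type} for parameter slots 1..nr_args with a usable inference."""
    inferred = {}
    for instr in instructions:
        if isinstance(instr, str):          # labels and nops carry no operands
            continue
        required = OPERAND_TYPE.get(instr[0])
        if required is None:
            continue
        for slot in instr[2:4]:             # assumed layout: [op, dest, a, b]
            if 1 <= slot <= nr_args:
                inferred[slot] = merge(inferred.get(slot), required)
    return {s: t for s, t in inferred.items() if t != T_UNKNOWN}
```

Because the scan ignores control flow entirely, the result holds for the whole function, which is what lets it survive label join points.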

2. infer_slot_write_types (slot write-type invariance)

Scans all instructions to determine which non-parameter slots have a consistent write type. If every instruction that writes to a given slot produces the same type, that type is globally invariant and can safely persist across label join points.

This analysis is sound because:

alloc_slot() in mcode.cm is monotonically increasing — temp slots are never reused
All local variable declarations must be at function body level and initialized — slots are written before any backward jumps to loop headers
move is conservatively treated as T_UNKNOWN, avoiding unsound transitive assumptions

Write type mapping:

Instruction class	Write type
`int`	T_INT
`true`, `false`	T_BOOL
`null`	T_NULL
`access`	type of literal value
`array`	T_ARRAY
`record`	T_RECORD
`function`	T_FUNCTION
`length`	T_INT
int arithmetic, `neg_int`, bitwise ops	T_INT
float arithmetic, `neg_float`	T_FLOAT
`concat`	T_TEXT
bool ops, comparisons, `in`	T_BOOL
generic arithmetic (`add`, `subtract`, etc.)	T_UNKNOWN
`move`, `load_field`, `load_index`, `load_dynamic`, `pop`, `get`	T_UNKNOWN
`invoke`, `tail_invoke`	T_UNKNOWN

The result is a map of slot→type for slots where all writes agree on a single known type. Parameter slots (1..nr_args) and slot 0 are excluded.

Common patterns this enables:

Loop counters (var i = 0; ... i = i + 1): written by int (T_INT) and add_int (T_INT) → invariant T_INT
Length variables (var len = length(arr)): written by length (T_INT) only → invariant T_INT
Boolean flags (var found = false; ... found = true): written by false and true → invariant T_BOOL
Locally-created containers (var arr = []): written by array only → invariant T_ARRAY

Nop prefix: none (analysis only, does not modify instructions)
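
A minimal sketch of the write-type scan, in Python rather than Cell. It assumes, hypothetically, that every non-string entry in the array is a list `[op, dest, ...]` that writes `dest`. A real implementation would also have to skip opcodes that write nothing, such as jumps and stores.

```python
T_UNKNOWN, T_INT, T_BOOL, T_ARRAY = "unknown", "int", "bool", "array"

# Hypothetical subset of the write-type table: opcode -> type written to dest.
WRITE_TYPE = {"int": T_INT, "add_int": T_INT, "length": T_INT,
              "true": T_BOOL, "false": T_BOOL, "array": T_ARRAY}

def infer_slot_write_types(instructions, nr_args):
    """Return {slot: type} for non-parameter slots whose writes all agree."""
    seen = {}
    for instr in instructions:
        if isinstance(instr, str):          # labels and nops write nothing
            continue
        op, dest = instr[0], instr[1]
        if dest <= nr_args:                 # slot 0 and parameter slots are excluded
            continue
        written = WRITE_TYPE.get(op, T_UNKNOWN)   # move, invoke, etc. map to unknown
        if seen.get(dest, written) != written:
            written = T_UNKNOWN             # disagreement poisons the slot for good
        seen[dest] = written
    return {s: t for s, t in seen.items() if t != T_UNKNOWN}
```

In the loop-counter pattern, slot 3 below is written by `int` and `add_int` and stays T_INT, while slot 4 loses its array type as soon as a `move` writes to it.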

3. eliminate_type_checks (type-check + jump elimination)

Forward pass that tracks the known type of each slot. When a type check (is_int, is_text, is_num, etc.) is followed by a conditional jump, and the slot's type is already known, the check and jump can be eliminated or converted to an unconditional jump.

Three cases:

Known match (e.g., is_int on a slot known to be int): both the check and the conditional jump are eliminated (nop'd).
Known mismatch (e.g., is_text on a slot known to be int): the check is nop'd and the conditional jump is rewritten to an unconditional jump.
Unknown: the check remains, but on fallthrough, the slot's type is narrowed to the checked type (enabling downstream eliminations).

This pass also reduces load_dynamic/store_dynamic to load_field/store_field or load_index/store_index when the key slot's type is known.

At label join points, all type information is reset except for parameter types from backward inference and write-invariant types from slot write-type analysis.

Nop prefix: _nop_tc_
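
The three cases can be sketched as below. This is a simplified Python illustration under several assumptions that are not taken from streamline.cm: instructions are lists, the check has the shape `[is_<type>, d, x]`, the conditional jump is `[jump_false, d, L]`, and only int and text are modelled (no num subsumption, no invalidation when a tracked slot is overwritten).

```python
CHECKS = {"is_int": "int", "is_text": "text"}   # hypothetical subset of is_* checks

def eliminate_type_checks(instructions, known):
    """Rewrite `is_<type> d, x; jump_false d, L` pairs in place using known slot types."""
    types = dict(known)        # slot -> type, seeded from the two analysis passes
    counter = 0
    for i in range(len(instructions) - 1):
        check, jump = instructions[i], instructions[i + 1]
        if isinstance(check, str):
            if not check.startswith("_nop_"):
                types = dict(known)     # label: keep only types that survive joins
            continue
        if (check[0] not in CHECKS or isinstance(jump, str)
                or jump[0] != "jump_false" or jump[1] != check[1]):
            continue
        wanted, slot = CHECKS[check[0]], check[2]
        if slot not in types:
            types[slot] = wanted        # unknown: keep the check, narrow on fallthrough
            continue
        counter += 1
        instructions[i] = f"_nop_tc_{counter}"
        if types[slot] == wanted:       # known match: the jump is never taken
            counter += 1
            instructions[i + 1] = f"_nop_tc_{counter}"
        else:                           # known mismatch: the jump is always taken
            instructions[i + 1] = ["jump", jump[2]]
    return instructions
```

The narrowing in the unknown case is what lets a second check on the same slot, within the same basic block, be removed.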

4. simplify_algebra (algebraic identity + comparison folding)

Tracks known constant values alongside types. Rewrites identity operations:

Pattern	Rewrite
`add_int dest, x, 0`	`move dest, x`
`add_int dest, 0, x`	`move dest, x`
`sub_int dest, x, 0`	`move dest, x`
`mul_int dest, x, 1`	`move dest, x`
`mul_int dest, 1, x`	`move dest, x`
`mul_int dest, x, 0`	`int dest, 0`
`div_int dest, x, 1`	`move dest, x`
`add_float dest, x, 0`	`move dest, x`
`mul_float dest, x, 1`	`move dest, x`
`div_float dest, x, 1`	`move dest, x`

Float multiplication by zero is intentionally not optimized because it is not safe with NaN and Inf values.
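
A small Python sketch of the integer rows of the identity table, together with a check of why the float-by-zero case is left alone. The `constants` map (slot to known integer value) and the list-shaped instructions are assumptions made for illustration.

```python
import math

def simplify_algebra_int(instr, constants):
    """Rewrite one int instruction when a known-constant operand makes it an identity."""
    op, dest, a, b = instr
    ca, cb = constants.get(a), constants.get(b)   # None when the value is not known
    if op == "add_int" and cb == 0:
        return ["move", dest, a]
    if op == "add_int" and ca == 0:
        return ["move", dest, b]
    if op == "sub_int" and cb == 0:
        return ["move", dest, a]
    if op == "mul_int" and cb == 0:
        return ["int", dest, 0]
    if op == "mul_int" and cb == 1:
        return ["move", dest, a]
    if op == "mul_int" and ca == 1:
        return ["move", dest, b]
    if op == "div_int" and cb == 1:
        return ["move", dest, a]
    return instr

# Under IEEE 754 arithmetic, x * 0.0 is not always zero, so folding it would be wrong.
assert math.isnan(float("inf") * 0.0)
assert math.isnan(float("nan") * 0.0)
```

Moves produced here with identical source and destination are the ones that the later eliminate_moves pass cleans up.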

Same-slot comparison folding:

Pattern	Rewrite
`eq_* dest, x, x`	`true dest`
`le_* dest, x, x`	`true dest`
`ge_* dest, x, x`	`true dest`
`is_identical dest, x, x`	`true dest`
`ne_* dest, x, x`	`false dest`
`lt_* dest, x, x`	`false dest`
`gt_* dest, x, x`	`false dest`

Nop prefix: none (rewrites in place, does not create nops)

5. simplify_booleans (not + jump fusion)

Peephole pass that eliminates unnecessary not instructions:

Pattern	Rewrite
`not d, x; jump_false d, L`	nop; `jump_true x, L`
`not d, x; jump_true d, L`	nop; `jump_false x, L`
`not d1, x; not d2, d1`	nop; `move d2, x`

This is particularly effective on if (!cond) patterns, which the compiler generates as not; jump_false. After this pass, they become a single jump_true.

Nop prefix: _nop_bl_
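
The peephole patterns in the table could look roughly like this in Python. The list-shaped instructions are an assumption, and the sketch also assumes that the `not` destination is a temporary that nothing else reads, since the `not` itself is removed.

```python
def simplify_booleans(instructions):
    """Fuse `not d, x` with an immediately following jump or `not` that reads d."""
    flipped = {"jump_false": "jump_true", "jump_true": "jump_false"}
    counter = 0
    for i in range(len(instructions) - 1):
        first, second = instructions[i], instructions[i + 1]
        if isinstance(first, str) or isinstance(second, str) or first[0] != "not":
            continue
        d, x = first[1], first[2]
        if second[0] in flipped and second[1] == d:
            rewritten = [flipped[second[0]], x, second[2]]   # invert the jump sense
        elif second[0] == "not" and second[2] == d:
            rewritten = ["move", second[1], x]               # double negation
        else:
            continue
        counter += 1
        instructions[i] = f"_nop_bl_{counter}"
        instructions[i + 1] = rewritten
    return instructions
```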

6. eliminate_moves (self-move elimination)

Removes move a, a instructions where the source and destination are the same slot. These can arise from earlier passes rewriting binary operations into moves.

Nop prefix: _nop_mv_
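
For illustration, the pass amounts to a single scan. The Python sketch below assumes list-shaped `[move, dest, src]` instructions, which is not confirmed by this document.

```python
def eliminate_moves(instructions):
    """Replace `move a, a` with a nop string; leave everything else untouched."""
    counter = 0
    for i, instr in enumerate(instructions):
        if not isinstance(instr, str) and instr[0] == "move" and instr[1] == instr[2]:
            counter += 1
            instructions[i] = f"_nop_mv_{counter}"
    return instructions
```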

7. eliminate_unreachable (dead code after return)

Replaces the instructions that follow a return with nops, up to the next real label. Only return is treated as a terminal instruction; disrupt is not, because the disruption handler code immediately follows disrupt and must remain reachable.

The mcode compiler emits a label at disruption handler entry points (see emit_label(gen_label("disruption")) in mcode.cm), which provides the label boundary that stops this pass from eliminating handler code.

Nop prefix: _nop_ur_
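
A minimal Python sketch of this scan, assuming that labels and nops are both plain strings and that nops are recognizable by their `_nop_` prefix. The label name in the usage example is made up.

```python
def eliminate_unreachable(instructions):
    """Nop out instructions between a return and the next real label."""
    dead = False
    counter = 0
    for i, instr in enumerate(instructions):
        if isinstance(instr, str):
            if not instr.startswith("_nop_"):
                dead = False            # a real label makes the code reachable again
            continue
        if dead:
            counter += 1
            instructions[i] = f"_nop_ur_{counter}"
        elif instr[0] == "return":      # disrupt is deliberately not treated as terminal
            dead = True
    return instructions
```

For example, with a hypothetical handler label `disruption_1`, everything after that label survives:

```python
code = [["return", 1], ["int", 2, 0], "disruption_1", ["int", 4, 1]]
eliminate_unreachable(code)
# code is now [["return", 1], "_nop_ur_1", "disruption_1", ["int", 4, 1]]
```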

8. eliminate_dead_jumps (jump-to-next-label elimination)

Removes jump L instructions where L is the immediately following label (skipping over any intervening nop strings). These are common after other passes eliminate conditional branches, leaving behind jumps that fall through naturally.

Nop prefix: _nop_dj_
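
Sketched in Python, under the assumption that an unconditional jump is the list `[jump, L]` and that labels and nops are plain strings:

```python
def eliminate_dead_jumps(instructions):
    """Nop out `jump L` when L is the next label, ignoring nops in between."""
    counter = 0
    for i, instr in enumerate(instructions):
        if isinstance(instr, str) or instr[0] != "jump":
            continue
        j = i + 1
        # Skip over nop strings left behind by earlier passes.
        while (j < len(instructions) and isinstance(instructions[j], str)
               and instructions[j].startswith("_nop_")):
            j += 1
        if j < len(instructions) and instructions[j] == instr[1]:
            counter += 1
            instructions[i] = f"_nop_dj_{counter}"
    return instructions
```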

Pass Composition

All passes run in sequence in optimize_function:

infer_param_types        → returns param_types map
infer_slot_write_types   → returns write_types map
eliminate_type_checks    → uses param_types + write_types
simplify_algebra
simplify_booleans
eliminate_moves
eliminate_unreachable
eliminate_dead_jumps

Each pass is independent and can be commented out for testing or benchmarking.

Intrinsic Inlining

Before streamlining, mcode.cm recognizes calls to built-in intrinsic functions and emits direct opcodes instead of the generic frame/setarg/invoke call sequence. This reduces a 6-instruction call pattern to a single instruction:

Call	Emitted opcode
`is_array(x)`	`is_array dest, src`
`is_function(x)`	`is_func dest, src`
`is_object(x)`	`is_record dest, src`
`is_stone(x)`	`is_stone dest, src`
`is_integer(x)`	`is_int dest, src`
`is_text(x)`	`is_text dest, src`
`is_number(x)`	`is_num dest, src`
`is_logical(x)`	`is_bool dest, src`
`is_null(x)`	`is_null dest, src`
`length(x)`	`length dest, src`
`push(arr, val)`	`push arr, val`

These inlined opcodes have corresponding Mach VM implementations in mach.c.
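
The one-argument rows of the table can be read as a simple lookup. The Python sketch below is illustrative only: `emit_intrinsic` is a hypothetical helper, not a function from mcode.cm, and `push` is left out because it takes two operands and has no destination.

```python
# Intrinsic name -> direct opcode, taken from the one-argument rows of the table.
INTRINSIC_OPCODES = {
    "is_array": "is_array", "is_function": "is_func", "is_object": "is_record",
    "is_stone": "is_stone", "is_integer": "is_int", "is_text": "is_text",
    "is_number": "is_num", "is_logical": "is_bool", "is_null": "is_null",
    "length": "length",
}

def emit_intrinsic(name, dest, args):
    """Return one direct instruction for a known intrinsic call, or None."""
    opcode = INTRINSIC_OPCODES.get(name)
    if opcode is None or len(args) != 1:
        return None     # caller falls back to the generic frame/setarg/invoke sequence
    return [opcode, dest, args[0]]
```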

Debugging Tools

Three dump tools inspect the IR at different stages:

dump_mcode.cm — prints the raw Mcode IR after mcode.cm, before streamlining
dump_stream.cm — prints the IR after streamlining, with before/after instruction counts
dump_types.cm — prints the streamlined IR with type annotations on each instruction

Usage:

./cell --core . dump_mcode.cm <file.ce|file.cm>
./cell --core . dump_stream.cm <file.ce|file.cm>
./cell --core . dump_types.cm <file.ce|file.cm>

Tail Call Marking

When a function's return expression is a call (stmt.tail == true from the parser) and the function has no disruption handler, mcode.cm renames the final invoke instruction to tail_invoke. This is semantically identical to invoke in the current Mach VM, but marks the call site for future tail call optimization.

The disruption handler restriction exists because TCO would discard the current frame, but the handler must remain on the stack to catch disruptions from the callee.
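
The marking rule is small enough to sketch directly. The Python below is an illustration, not the mcode.cm code: `mark_tail_call` and its boolean parameters are hypothetical, and instructions are assumed to be lists whose first element is the opcode.

```python
def mark_tail_call(instructions, is_tail, has_disruption_handler):
    """Rename the final invoke to tail_invoke when the marking conditions hold."""
    if not is_tail or has_disruption_handler:
        return instructions             # the handler must stay on the stack
    for i in range(len(instructions) - 1, -1, -1):
        instr = instructions[i]
        if not isinstance(instr, str) and instr[0] == "invoke":
            instructions[i] = ["tail_invoke"] + instr[1:]   # operands are unchanged
            break
    return instructions
```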

tail_invoke is handled by the same passes as invoke in streamline (type tracking, algebraic simplification) and executes identically in the VM.

Type Propagation Architecture

Type information flows through three compilation stages, each building on the previous:

Stage 1: Parse-time type tags (parse.cm)

The parser assigns type_tag strings to scope variable entries when the type is syntactically obvious:

From initializers: def a = [] → type_tag: "array", def n = 42 → type_tag: "integer", def r = {} → type_tag: "record"
From usage patterns (def only): def x = null; x[] = v infers type_tag: "array" from the push. def x = null; x.foo = v infers type_tag: "record" from property access.
Type error detection (def only): When a def variable has a known type_tag, provably wrong operations are compile errors:
- Property access (.) on array
- Push ([]) on non-array
- Text key on array
- Integer key on record

Only def (constant) variables participate in type inference and error detection. var variables can be reassigned, making their initializer type unreliable.

Stage 2: Fold-time type propagation (fold.cm)

The fold pass extends type information through the AST:

Intrinsic folding: is_array(known_array) folds to true. length(known_array) gets hint: "array_length".
Purity analysis: Expressions involving only is_* intrinsic calls with pure arguments are considered pure. This enables dead code elimination for unused var/def bindings with pure initializers, and elimination of standalone pure call statements.
Dead code: Unused pure var/def declarations are removed. Standalone calls to pure intrinsics (where the result is discarded) are removed. Unreachable branches with constant conditions are removed.

The pure_intrinsics set currently contains only is_* sensory functions (is_array, is_text, is_number, is_integer, is_function, is_logical, is_null, is_object, is_stone). Other intrinsics like text, number, and length can disrupt on wrong argument types, so they are excluded — removing a call that would disrupt changes observable behavior.
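
A rough Python sketch of the purity rule described above. The AST node shape (dicts with `kind`, `callee`, and `args` keys) is invented for illustration and does not reflect the real fold.cm data structures.

```python
PURE_INTRINSICS = {"is_array", "is_text", "is_number", "is_integer", "is_function",
                   "is_logical", "is_null", "is_object", "is_stone"}

def is_pure(node):
    """True when evaluating node cannot disrupt or have side effects."""
    kind = node["kind"]
    if kind in ("literal", "name"):     # constants and variable reads
        return True
    if kind == "call":
        return (node["callee"] in PURE_INTRINSICS
                and all(is_pure(arg) for arg in node["args"]))
    return False                        # anything else is conservatively impure
```

Under this rule `is_array(x)` is pure, while `is_array(length(x))` is not, because `length` can disrupt on a wrong argument type.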

Stage 3: Streamline-time type tracking (streamline.cm)

The streamline optimizer uses a numeric type lattice (T_INT, T_FLOAT, T_TEXT, etc.) for fine-grained per-instruction tracking:

Backward inference (pass 1): Scans typed operators to infer parameter types. Since parameters are def (immutable), inferred types persist across label boundaries.
Write-type invariance (pass 2): Scans all instructions to find local slots where every write produces the same type. These invariant types persist across label boundaries alongside parameter types.
Forward tracking (pass 3): track_types follows instruction execution order, tracking the type of each slot. Typed arithmetic results set their destination type. Type checks on unknown slots narrow the type on fallthrough.
Type check elimination (pass 3): When a slot's type is already known, is_<type> + conditional jump pairs are eliminated or converted to unconditional jumps.
Dynamic access narrowing (pass 3): load_dynamic/store_dynamic are narrowed to load_field/store_field or load_index/store_index when the key type is known.

Type information resets at label join points (since control flow merges could bring different types), except for parameter types from backward inference and write-invariant types from slot write-type analysis.

Future Work

Copy Propagation

A basic-block-local copy propagation pass would replace uses of a copied variable with its source, enabling further move elimination. An implementation was attempted but encountered an unsolved bug where 2-position instruction operand replacement produces incorrect code during self-hosting (the replacement logic for 3-position instructions works correctly). The root cause is not yet understood. See the project memory files for detailed notes.

Expanded Purity Analysis

The current purity set is conservative (only is_*). It could be expanded by:

Argument-type-aware purity: If all arguments to an intrinsic are known to be the correct types (via type_tag or slot_types), the call cannot disrupt and is safe to eliminate. For example, length(known_array) is pure but length(unknown) is not.
User function purity: Analyze user-defined function bodies during pre_scan. A function is pure if its body contains only pure expressions and calls to known-pure functions. This requires fixpoint iteration for mutual recursion.
Callback-aware purity: Intrinsics like filter, find, reduce, some, every are pure if their callback argument is pure.

Forward Type Narrowing from Typed Operations

After a typed operation like add_int dest, a, b executes successfully, we know a and b are integers. This could be used to eliminate subsequent type checks on the same slots within a basic block. An implementation was attempted but caused intermittent GC crashes during self-hosting, suggesting the type narrowing interacted badly with the runtime's garbage collector (possibly through changed instruction timing or register pressure). The approach is sound in principle but needs careful investigation of the GC interaction.

Guard Hoisting for Parameters

When a type check on a parameter passes (falls through), the parameter's type could be promoted to param_types so it persists across label boundaries. This would allow the first type check on a parameter to prove its type for the entire function. However, this is unsound for polymorphic parameters — if a function is called with different argument types, the first check would wrongly eliminate checks for subsequent types.

A safe version would require proving that a parameter is monomorphic (called with only one type across all call sites), which requires interprocedural analysis.

Note: For local variables (non-parameters), the write-type invariance analysis (pass 2) achieves a similar effect safely — if every write to a slot produces the same type, that type persists across labels without needing to hoist any guard.

Tail Call Optimization

tail_invoke instructions are currently marked but execute identically to invoke. Actual TCO would reuse the current call frame instead of creating a new one. This requires:

Ensuring argument count matches (or the frame can be resized)
No live locals needed after the call (guaranteed by tail position)
No disruption handler on the current function (already enforced by the marking)
VM support in mach.c to rewrite the frame in place

Interprocedural Type Inference

Currently all type inference is intraprocedural (within a single function). Cross-function analysis could:

Infer return types from function bodies
Propagate argument types from call sites to callees
Specialize functions for known argument types (cloning)

Strength Reduction

Common patterns that could be lowered to cheaper operations:

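mul_int x, 2 → add_int x, x (or a left shift by one)
div_int x, 2 → arithmetic shift right by one (exact only if div_int rounds toward negative infinity, or if x is known to be non-negative)
mod_int x, power_of_2 → bitwise and with power_of_2 - 1 (same caveat for negative x)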
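
Whether the shift and mask rewrites are exact depends on how division and modulo round for negative operands, which this document does not specify for div_int and mod_int. The Python check below illustrates the difference using Python's own integers. `trunc_div` is a hypothetical helper modelling C-style rounding toward zero.

```python
import math

def trunc_div(x, y):
    """Integer division rounding toward zero; adequate for the small values used here."""
    return int(x / y)

for x in range(-9, 10):
    assert x * 2 == x + x == x << 1     # doubling is safe for every x
    assert x >> 1 == x // 2             # arithmetic shift matches floor division
    assert x & 7 == x % 8               # masking matches floor modulo
    if x >= 0:                          # for non-negative x all conventions agree
        assert x >> 1 == trunc_div(x, 2)
        assert x & 7 == int(math.fmod(x, 8))

# For negative odd x, shifting and truncating division disagree.
assert -7 >> 1 == -4 and trunc_div(-7, 2) == -3
assert -7 & 7 == 1 and int(math.fmod(-7, 8)) == -7
```

If the operand is a slot with a known non-negative range (a loop counter, for instance), the rewrite is safe under either convention.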

Loop-Invariant Code Motion

Type checks that are invariant across loop iterations (checking a variable that doesn't change in the loop body) could be hoisted above the loop. This would require identifying loop boundaries and proving invariance.

Nop Convention

Eliminated instructions are replaced with strings matching _nop_<prefix>_<counter>. The prefix identifies which pass created the nop. Nop strings are:

Skipped during interpretation (the VM ignores them)
Skipped during QBE emission
Not counted in instruction statistics
Preserved in the instruction array to maintain positional stability for jump targets
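
A small Python sketch of how a tool might recognize nops and attribute them to passes. The regular expression follows the `_nop_<prefix>_<counter>` shape described above. The helper names and the list-or-string instruction representation are assumptions.

```python
import re
from collections import Counter

NOP_PATTERN = re.compile(r"^_nop_([a-z]+)_(\d+)$")

def is_nop(instr):
    """True for strings such as _nop_tc_1; labels and real instructions are not nops."""
    return isinstance(instr, str) and NOP_PATTERN.match(instr) is not None

def nops_by_pass(instructions):
    """Count nops per pass prefix, e.g. {'tc': 2, 'dj': 1}."""
    return dict(Counter(NOP_PATTERN.match(i).group(1) for i in instructions if is_nop(i)))
```

Because nops stay in the array, indices of the surviving instructions do not shift when an instruction is eliminated.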
