---
title: "Compiler Inspection Tools"
description: "Tools for inspecting and debugging the compiler pipeline"
weight: 50
type: "docs"
---

ƿit includes a set of tools for inspecting the compiler pipeline at every stage. These are useful for debugging, testing optimizations, and understanding what the compiler does with your code.

Pipeline Overview

The compiler runs in stages:

source → tokenize → parse → fold → mcode → streamline → output

Each stage has a corresponding dump tool that lets you see its output.

| Stage | Tool | What it shows |
| --- | --- | --- |
| fold | dump_ast.cm | Folded AST as JSON |
| mcode | dump_mcode.cm | Raw mcode IR before optimization |
| streamline | dump_stream.cm | Before/after instruction counts + IR |
| streamline | dump_types.cm | Optimized IR with type annotations |
| streamline | streamline.ce | Full optimized IR as JSON |
| all | ir_report.ce | Structured optimizer flight recorder |

All tools take a source file as input and run the pipeline up to the relevant stage.

Quick Start

# see raw mcode IR
./cell --core . dump_mcode.cm myfile.ce

# see what the optimizer changed
./cell --core . dump_stream.cm myfile.ce

# full optimizer report with events
./cell --core . ir_report.ce --full myfile.ce

dump_ast.cm

Prints the folded AST as JSON. This is the output of the parser and constant folder, before mcode generation.

./cell --core . dump_ast.cm <file.ce|file.cm>

dump_mcode.cm

Prints the raw mcode IR before any optimization. Shows the instruction array as formatted text with opcode, operands, and program counter.

./cell --core . dump_mcode.cm <file.ce|file.cm>

dump_stream.cm

Shows a before/after comparison of the optimizer. For each function, prints:

  • Instruction count before and after
  • Number of eliminated instructions
  • The streamlined IR (nops hidden by default)

./cell --core . dump_stream.cm <file.ce|file.cm>

dump_types.cm

Shows the optimized IR with type annotations. Each instruction is followed by the known types of its slot operands, inferred by walking the instruction stream.

./cell --core . dump_types.cm <file.ce|file.cm>

streamline.ce

Runs the full pipeline (tokenize, parse, fold, mcode, streamline) and outputs the optimized IR as JSON. Useful for piping to jq or saving for comparison.

./cell --core . streamline.ce <file.ce|file.cm>

ir_report.ce

The optimizer flight recorder. Runs the full pipeline with structured logging and outputs machine-readable, diff-friendly JSON. This is the most detailed tool for understanding what the optimizer did and why.

./cell --core . ir_report.ce [options] <file.ce|file.cm>

Options

| Flag | Description |
| --- | --- |
| --summary | Per-pass JSON summaries with instruction counts and timing (default) |
| --events | Include rewrite events showing each optimization applied |
| --types | Include type delta records showing inferred slot types |
| --ir-before=PASS | Print canonical IR before a specific pass |
| --ir-after=PASS | Print canonical IR after a specific pass |
| --ir-all | Print canonical IR before and after all passes |
| --full | Everything: summary + events + types + ir-all |

With no flags, --summary is the default.

Output Format

Output is line-delimited JSON. Each line is a self-contained JSON object with a type field:

type: "pass" — Per-pass summary with categorized instruction counts before and after:

{
  "type": "pass",
  "pass": "eliminate_type_checks",
  "fn": "fib",
  "ms": 0.12,
  "changed": true,
  "before": {"instr": 77, "nop": 0, "guard": 16, "branch": 28, ...},
  "after":  {"instr": 77, "nop": 1, "guard": 15, "branch": 28, ...},
  "changes": {"guards_removed": 1, "nops_added": 1}
}

type: "event" — Individual rewrite event with before/after instructions and reasoning:

{
  "type": "event",
  "pass": "eliminate_type_checks",
  "rule": "incompatible_type_forces_jump",
  "at": 3,
  "before": [["is_int", 5, 2, 4, 9], ["jump_false", 5, "rel_ni_2", 4, 9]],
  "after": ["_nop_tc_1", ["jump", "rel_ni_2", 4, 9]],
  "why": {"slot": 2, "known_type": "float", "checked_type": "int"}
}

type: "types" — Inferred type information for a function:

{
  "type": "types",
  "fn": "fib",
  "param_types": {},
  "slot_types": {"25": "null"}
}

type: "ir" — Canonical IR text for a function at a specific point:

{
  "type": "ir",
  "when": "before",
  "pass": "all",
  "fn": "fib",
  "text": "fn fib (args=1, slots=26)\n  @0    access          s2, 2\n  ..."
}
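
Because every line is a self-contained JSON object, a report can be consumed with a few lines of script. The following is a minimal sketch in Python, not part of the toolchain: the helper name `load_report` is hypothetical, and the sample records are abridged from the examples above with the elided (`...`) fields left out.

```python
import json
from collections import defaultdict

def load_report(lines):
    """Group line-delimited JSON records by their "type" field."""
    by_type = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        record = json.loads(line)
        by_type[record["type"]].append(record)
    return dict(by_type)

# Stand-in for real ir_report.ce output: one JSON object per line.
# Only the fields spelled out in the examples above are included.
sample_lines = [json.dumps(r) for r in (
    {"type": "pass", "pass": "eliminate_type_checks", "fn": "fib",
     "ms": 0.12, "changed": True,
     "before": {"instr": 77, "nop": 0, "guard": 16, "branch": 28},
     "after": {"instr": 77, "nop": 1, "guard": 15, "branch": 28},
     "changes": {"guards_removed": 1, "nops_added": 1}},
    {"type": "types", "fn": "fib", "param_types": {},
     "slot_types": {"25": "null"}},
)]

report = load_report(sample_lines)
for rec in report["pass"]:
    before, after = rec["before"], rec["after"]
    # Per-category change; a negative number means instructions were removed.
    delta = {k: after[k] - before[k] for k in before if after[k] != before[k]}
    print(rec["pass"], rec["fn"], delta)
# prints: eliminate_type_checks fib {'nop': 1, 'guard': -1}
```

With a real report, the same function can be fed `sys.stdin` or an open file instead of `sample_lines`.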

Rewrite Rules

Each pass records events with named rules:

eliminate_type_checks:

  • known_type_eliminates_guard — type already known, guard removed
  • incompatible_type_forces_jump — type conflicts, conditional jump becomes unconditional
  • num_subsumes_int_float — num check satisfied by int or float
  • dynamic_to_field — load_dynamic/store_dynamic narrowed to field access
  • dynamic_to_index — load_dynamic/store_dynamic narrowed to index access

simplify_algebra:

  • add_zero, sub_zero, mul_one, div_one — identity operations become moves
  • mul_zero — multiplication by zero becomes constant
  • self_eq, self_ne — same-slot comparisons become constants

simplify_booleans:

  • not_jump_false_fusion — not + jump_false fused into jump_true
  • not_jump_true_fusion — not + jump_true fused into jump_false
  • double_not — not + not collapsed to move

eliminate_moves:

  • self_move — move to same slot becomes nop

eliminate_dead_jumps:

  • jump_to_next — jump to immediately following label becomes nop
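
To illustrate how rules such as self_move and jump_to_next can rewrite instructions while recording events, here is a toy sketch in Python. It is not the compiler's implementation. It assumes the instruction layout seen in the event example above (a list starting with the opcode, followed by operands and two trailing position numbers; plain strings for labels and nops). The operand order of `move`, the nop names `_nop_mv_N` and `_nop_dj_N`, and the function name `eliminate_trivial` are assumptions made for illustration, and the two rules are combined into one loop here even though they belong to separate passes.

```python
def eliminate_trivial(instrs):
    """Replace self-moves and jumps to the next label with nop strings.

    Each rewritten entry is replaced in place, so the array keeps its
    length and every index still points at the same position.
    """
    out = list(instrs)
    events = []
    counter = 0
    for i, ins in enumerate(out):
        if isinstance(ins, str):
            continue  # labels and existing nops are left alone
        op = ins[0]
        if op == "move" and ins[1] == ins[2]:
            # Assumed layout: ["move", dest, src, ...]
            counter += 1
            out[i] = f"_nop_mv_{counter}"
            events.append({"rule": "self_move", "at": i})
        elif op == "jump" and i + 1 < len(out) and out[i + 1] == ins[1]:
            # The jump target is the label stored in the very next entry.
            counter += 1
            out[i] = f"_nop_dj_{counter}"
            events.append({"rule": "jump_to_next", "at": i})
    return out, events

# Hypothetical four-entry program, made up for this example.
program = [
    ["move", 3, 3, 1, 1],
    ["jump", "done", 2, 1],
    "done",
    ["return", 3, 3, 1],
]
out, events = eliminate_trivial(program)
print(out)
print(events)
```

Replacing instead of deleting is what keeps array indices comparable before and after a pass.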

Canonical IR Format

The --ir-all, --ir-before, and --ir-after flags produce a deterministic text representation of the IR:

fn fib (args=1, slots=26)
  @0    access          s2, 2
  @1    is_int          s4, s1                          ; [guard]
  @2    jump_false      s4, "rel_ni_2"                  ; [branch]
  @3    --- nop (tc) ---
  @4    jump            "rel_ni_2"                      ; [branch]
  @5    lt_int          s3, s1, s2
  @6    jump            "rel_done_4"                    ; [branch]
      rel_ni_2:
  @8    is_num          s4, s1                          ; [guard]

Properties:

  • @N is the raw array index, stable across passes (passes replace, never insert or delete)
  • sN prefix distinguishes slot operands from literal values
  • String operands are quoted
  • Labels appear as indented headers with a colon
  • Category tags in brackets: [guard], [branch], [load], [store], [call], [arith], [move], [const]
  • Nops shown as --- nop (reason) --- with reason codes: tc (type check), bl (boolean), mv (move), dj (dead jump), ur (unreachable)
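
Since the format is deterministic, it can be parsed line by line. The sketch below is a hypothetical Python reader based only on the sample shown above, not a specification of the format. The function name `parse_canonical_ir` is invented for this example, and splitting operands on commas is a simplification that would break on a quoted string containing a comma.

```python
import re

HEADER = re.compile(r"^fn (\S+) \(args=(\d+), slots=(\d+)\)$")
# "@N", then the instruction body, then an optional "; [category]" tag.
INSTR = re.compile(r"^\s*@(\d+)\s+(.*?)\s*(?:;\s*\[(\w+)\])?$")
NOP = re.compile(r"^--- nop \((\w+)\) ---$")
LABEL = re.compile(r"^\s+(\S+):$")

def parse_canonical_ir(text):
    """Return (function header, list of entries) for one canonical IR dump."""
    fn = None
    entries = []
    for line in text.splitlines():
        if not line.strip():
            continue
        m = HEADER.match(line)
        if m:
            fn = {"name": m.group(1), "args": int(m.group(2)),
                  "slots": int(m.group(3))}
            continue
        m = INSTR.match(line)
        if m:
            index, body, tag = int(m.group(1)), m.group(2), m.group(3)
            nop = NOP.match(body)
            if nop:
                entries.append({"at": index, "kind": "nop",
                                "reason": nop.group(1)})
            else:
                op, _, rest = body.partition(" ")
                operands = [o.strip() for o in rest.split(",")] if rest.strip() else []
                entries.append({"at": index, "kind": "instr", "op": op,
                                "operands": operands, "tag": tag})
            continue
        m = LABEL.match(line)
        if m:
            entries.append({"kind": "label", "name": m.group(1)})
    return fn, entries

sample = """fn fib (args=1, slots=26)
  @0    access          s2, 2
  @1    is_int          s4, s1                          ; [guard]
  @2    jump_false      s4, "rel_ni_2"                  ; [branch]
  @3    --- nop (tc) ---
      rel_ni_2:
"""
fn, entries = parse_canonical_ir(sample)
guards = [e["at"] for e in entries if e.get("tag") == "guard"]
print(fn["name"], "guards at", guards)
```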

Examples

# which passes changed something?
./cell --core . ir_report.ce --summary myfile.ce | jq 'select(.changed)'

# list all rewrite rules that fired
./cell --core . ir_report.ce --events myfile.ce | jq 'select(.type == "event") | .rule'

# print the canonical IR before and after optimization, ready for diffing
./cell --core . ir_report.ce --ir-all myfile.ce | jq -r 'select(.type == "ir") | .text'

# full report for analysis
./cell --core . ir_report.ce --full myfile.ce > report.json
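
The same kind of analysis can be scripted once the report is saved. The following Python sketch counts how often each rewrite rule fired and produces a unified diff of the IR text. The helper names `rule_counts` and `ir_diff` are hypothetical and the records are made-up illustration data. It also assumes that the report contains an "ir" record with "when": "after" matching the documented "when": "before" record; only the "before" form appears in the examples above.

```python
import difflib
from collections import Counter

def rule_counts(records):
    """Count rewrite events per rule name."""
    return Counter(r["rule"] for r in records if r.get("type") == "event")

def ir_diff(records, fn):
    """Unified diff of the canonical IR text of one function.

    Assumes one "before" and one "after" record with "pass": "all".
    """
    texts = {r["when"]: r["text"] for r in records
             if r.get("type") == "ir" and r.get("fn") == fn
             and r.get("pass") == "all"}
    return list(difflib.unified_diff(
        texts["before"].splitlines(), texts["after"].splitlines(),
        "before", "after", lineterm=""))

# Made-up records for illustration; real ones come from ir_report.ce.
records = [
    {"type": "event", "pass": "eliminate_type_checks",
     "rule": "incompatible_type_forces_jump", "at": 3},
    {"type": "event", "pass": "eliminate_moves", "rule": "self_move", "at": 7},
    {"type": "event", "pass": "eliminate_moves", "rule": "self_move", "at": 9},
    {"type": "ir", "when": "before", "pass": "all", "fn": "fib",
     "text": "fn fib (args=1, slots=26)\n  @0    move            s3, s3"},
    {"type": "ir", "when": "after", "pass": "all", "fn": "fib",
     "text": "fn fib (args=1, slots=26)\n  @0    --- nop (mv) ---"},
]

counts = rule_counts(records)
diff = ir_diff(records, "fib")
print(counts.most_common())
print("\n".join(diff))
```

To run this against report.json, build `records` by applying `json.loads` to each non-empty line of the file, since the report is line-delimited JSON rather than a single JSON document.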

ir_stats.cm

A utility module used by ir_report.ce and available for custom tooling. Not a standalone tool.

var ir_stats = use("ir_stats")

ir_stats.detailed_stats(func)                    // categorized instruction counts
ir_stats.ir_fingerprint(func)                    // djb2 hash of instruction array
ir_stats.canonical_ir(func, name, opts)          // deterministic text representation
ir_stats.type_snapshot(slot_types)               // frozen copy of type map
ir_stats.type_delta(before_types, after_types)   // compute type changes
ir_stats.category_tag(op)                        // classify an opcode
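
For readers unfamiliar with djb2, the hash named in the ir_fingerprint comment, here is a sketch of the classic algorithm in Python. Only the djb2 recurrence itself (start at 5381, multiply by 33, add the next byte) is standard. How ir_stats serializes the instruction array and what integer width it uses are not documented here, so the compact JSON encoding and the 32-bit mask below are assumptions, and the resulting values should not be expected to match what ir_stats.ir_fingerprint returns.

```python
import json

def djb2(data: bytes) -> int:
    """Classic djb2: h = h * 33 + byte, starting from 5381."""
    h = 5381
    for byte in data:
        h = (h * 33 + byte) & 0xFFFFFFFF  # 32-bit width is an assumption
    return h

def fingerprint(instructions) -> int:
    """Hash an instruction array via a compact JSON encoding (assumed)."""
    encoded = json.dumps(instructions, separators=(",", ":")).encode("utf-8")
    return djb2(encoded)

# Instruction entries copied from the event example above.
before = [["is_int", 5, 2, 4, 9], ["jump_false", 5, "rel_ni_2", 4, 9]]
after = ["_nop_tc_1", ["jump", "rel_ni_2", 4, 9]]

# Equal arrays always hash equally; a changed array almost always differs.
print(fingerprint(before) == fingerprint(list(before)))
print(fingerprint(before), fingerprint(after))
```

A fingerprint like this gives a cheap way to check whether a pass changed a function without comparing instruction arrays element by element.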

Instruction Categories

detailed_stats classifies each instruction into one of these categories:

| Category | Opcodes |
| --- | --- |
| load | load_field, load_index, load_dynamic, get, access (non-constant) |
| store | store_field, store_index, store_dynamic, set_var, put, push |
| branch | jump, jump_true, jump_false, jump_not_null |
| call | invoke, goinvoke |
| guard | is_int, is_text, is_num, is_bool, is_null, is_array, is_func, is_record, is_stone |
| arith | add_int, sub_int, ..., add_float, ..., concat, neg_int, neg_float, bitwise ops |
| move | move |
| const | int, true, false, null, access (with constant value) |
| label | string entries that are not nops |
| nop | strings starting with _nop_ |
| other | everything else (frame, setarg, array, record, function, return, etc.) |
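
The table above can be mirrored by a small lookup when post-processing IR outside the compiler. The following Python sketch is an illustration, not the ir_stats implementation. The function name `categorize` and the `access_is_constant` parameter are hypothetical, and the arith set contains only the opcodes spelled out in the table, so opcodes covered by its elisions would land in "other" here.

```python
from collections import Counter

CATEGORIES = {
    "load": {"load_field", "load_index", "load_dynamic", "get"},
    "store": {"store_field", "store_index", "store_dynamic",
              "set_var", "put", "push"},
    "branch": {"jump", "jump_true", "jump_false", "jump_not_null"},
    "call": {"invoke", "goinvoke"},
    "guard": {"is_int", "is_text", "is_num", "is_bool", "is_null",
              "is_array", "is_func", "is_record", "is_stone"},
    # Only the names listed explicitly in the table; elided ones are omitted.
    "arith": {"add_int", "sub_int", "add_float", "concat",
              "neg_int", "neg_float"},
    "move": {"move"},
    "const": {"int", "true", "false", "null"},
}

def categorize(instr, access_is_constant=False):
    """Classify one instruction array entry into a category name."""
    if isinstance(instr, str):
        # String entries are either nops or labels.
        return "nop" if instr.startswith("_nop_") else "label"
    op = instr[0]
    if op == "access":
        # The caller must say whether the accessed value is a constant.
        return "const" if access_is_constant else "load"
    for name, ops in CATEGORIES.items():
        if op in ops:
            return name
    return "other"

# Entries taken from the event example earlier in this document.
entries = [
    ["is_int", 5, 2, 4, 9],
    ["jump_false", 5, "rel_ni_2", 4, 9],
    "_nop_tc_1",
    ["jump", "rel_ni_2", 4, 9],
    "rel_ni_2",
]
tally = Counter(categorize(e) for e in entries)
print(dict(tally))
# prints: {'guard': 1, 'branch': 2, 'nop': 1, 'label': 1}
```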