cell/docs/compiler-tools.md
2026-02-21 01:21:53 -06:00

title: Compiler Inspection Tools
description: Tools for inspecting and debugging the compiler pipeline
weight: 50
type: docs

ƿit includes a set of tools for inspecting the compiler pipeline at every stage. These are useful for debugging, testing optimizations, and understanding what the compiler does with your code.

Pipeline Overview

The compiler runs in stages:

source → tokenize → parse → fold → mcode → streamline → output

Each stage has a corresponding CLI tool that lets you see its output.

| Stage | Tool | What it shows |
|---|---|---|
| tokenize | tokenize.ce | Token stream as JSON |
| parse | parse.ce | Unfolded AST as JSON |
| fold | fold.ce | Folded AST as JSON |
| mcode | mcode.ce | Raw mcode IR as JSON |
| mcode | mcode.ce --pretty | Human-readable mcode IR |
| streamline | streamline.ce | Full optimized IR as JSON |
| streamline | streamline.ce --types | Optimized IR with type annotations |
| streamline | streamline.ce --stats | Per-function summary stats |
| streamline | streamline.ce --ir | Human-readable canonical IR |
| disasm | disasm.ce | Source-interleaved disassembly |
| disasm | disasm.ce --optimized | Optimized source-interleaved disassembly |
| diff | diff_ir.ce | Mcode vs streamline instruction diff |
| xref | xref.ce | Cross-reference / call creation graph |
| cfg | cfg.ce | Control flow graph (basic blocks) |
| slots | slots.ce | Slot data flow / use-def chains |
| all | ir_report.ce | Structured optimizer flight recorder |

All tools take a source file as input and run the pipeline up to the relevant stage.
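
As a rough illustration of the stage ordering, the Python sketch below builds one command line per stage without running anything. `stage_commands` is a hypothetical helper, and invoking tokenize and parse as `cell tokenize` and `cell parse` is an assumption made by analogy with the `cell fold`, `cell mcode`, and `cell streamline` invocations documented here.

```python
# Stage order as listed in the pipeline overview.
STAGES = ["tokenize", "parse", "fold", "mcode", "streamline"]


def stage_commands(source_path, upto="streamline"):
    """Return one argv list per stage, up to and including `upto`.

    Hypothetical helper: it only builds the command lines and does not
    run them. The `cell <stage> <file>` shape for tokenize and parse is
    assumed, not confirmed.
    """
    if upto not in STAGES:
        raise ValueError(f"unknown stage: {upto}")
    last = STAGES.index(upto)
    return [["cell", stage, source_path] for stage in STAGES[: last + 1]]
```

Each returned list could be handed to `subprocess.run` to capture a stage's output for later comparison.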

Quick Start

# see raw mcode IR (pretty-printed)
cell mcode --pretty myfile.ce

# source-interleaved disassembly
cell disasm myfile.ce

# see optimized IR with type annotations
cell streamline --types myfile.ce

# full optimizer report with events
cell ir_report --full myfile.ce

fold.ce

Prints the folded AST as JSON. This is the output of the parser and constant folder, before mcode generation.

cell fold <file.ce|file.cm>

mcode.ce

Prints mcode IR. Default output is JSON; use --pretty for human-readable format with opcodes, operands, and program counter.

cell mcode <file.ce|file.cm>            # JSON (default)
cell mcode --pretty <file.ce|file.cm>   # human-readable IR

streamline.ce

Runs the full pipeline (tokenize, parse, fold, mcode, streamline) and outputs the optimized IR as JSON. Useful for piping to jq or saving for comparison.

cell streamline <file.ce|file.cm>                # full JSON (default)
cell streamline --stats <file.ce|file.cm>        # summary stats per function
cell streamline --ir <file.ce|file.cm>           # human-readable IR
cell streamline --check <file.ce|file.cm>        # warnings only
cell streamline --types <file.ce|file.cm>        # IR with type annotations
cell streamline --diagnose <file.ce|file.cm>     # compile-time diagnostics

| Flag | Description |
|---|---|
| (none) | Full optimized IR as JSON (backward compatible) |
| --stats | Per-function summary: args, slots, instruction counts by category, nops eliminated |
| --ir | Human-readable canonical IR (same format as ir_report.ce) |
| --check | Warnings only (e.g. nr_slots > 200 approaching 255 limit) |
| --types | Optimized IR with inferred type annotations per slot |
| --diagnose | Run compile-time diagnostics (type errors and warnings) |

Flags can be combined.

disasm.ce

Source-interleaved disassembly. Shows mcode or optimized IR with source lines interleaved, making it easy to see which instructions were generated from which source code.

cell disasm <file>                         # disassemble all functions (mcode)
cell disasm --optimized <file>             # disassemble optimized IR (streamline)
cell disasm --fn 87 <file>                 # show only function 87
cell disasm --fn my_func <file>            # show only functions named "my_func"
cell disasm --line 235 <file>              # show instructions generated from line 235

| Flag | Description |
|---|---|
| (none) | Raw mcode IR with source interleaving (default) |
| --optimized | Use optimized IR (streamline) instead of raw mcode |
| --fn <N\|name> | Filter to specific function by index or name substring |
| --line <N> | Show only instructions generated from a specific source line |

Output Format

Each function is shown with a header giving its index, name, argument count, slot count, closure count, and the source line where the function begins. Instructions are grouped by source line, with the source text shown before each group:

=== [87] <anonymous> (args=0, slots=12, closures=0) [line 234] ===

  --- line 235: var result = compute(x, y) ---
  0     access         2, "compute"                           :235
  1     get            3, 1, 0                                :235
  2     get            4, 1, 1                                :235
  3     invoke         3, 2, 2                                :235

  --- line 236: if (result > 0) { ---
  4     access         5, 0                                   :236
  5     gt             6, 4, 5                                :236
  6     jump_false     6, "else_1"                            :236

Each instruction line shows:

  • Program counter (left-aligned)
  • Opcode
  • Operands (comma-separated)
  • Source line number (:N suffix, right-aligned)

Function creation instructions include a cross-reference annotation showing the target function's name:

  3     function       5, 12                                  :235  ; -> [12] helper_fn
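
For scripting against this output, the line layout above can be picked apart with a regular expression. The Python sketch below is based only on the sample lines shown here, so it may not cover every form disasm.ce can print; `parse_disasm_line` is a hypothetical helper, not part of the toolchain.

```python
import re

# pc, opcode, operands, ":<line>" suffix, optional "; ..." annotation.
LINE_RE = re.compile(
    r'^\s*(?P<pc>\d+)\s+(?P<op>\S+)\s*(?P<operands>.*?)\s*:(?P<line>\d+)'
    r'(?:\s*;\s*(?P<note>.*))?$'
)
# An operand is either a quoted string or a bare token.
OPERAND_RE = re.compile(r'"[^"]*"|[^,\s]+')


def parse_operand(token):
    """Unquote strings, convert integers, leave anything else as text."""
    if token.startswith('"'):
        return token[1:-1]
    try:
        return int(token)
    except ValueError:
        return token


def parse_disasm_line(text):
    """Parse one instruction line; return None for headers and source lines."""
    m = LINE_RE.match(text)
    if m is None:
        return None
    return {
        "pc": int(m.group("pc")),
        "op": m.group("op"),
        "operands": [parse_operand(t) for t in OPERAND_RE.findall(m.group("operands"))],
        "line": int(m.group("line")),
        "note": m.group("note"),
    }
```

Function headers (`=== ... ===`) and source separators (`--- line N: ... ---`) do not start with a program counter, so they come back as `None` and can be skipped.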

diff_ir.ce

Compares mcode IR (before optimization) with streamline IR (after optimization), showing what the optimizer changed. Useful for understanding which instructions were eliminated, specialized, or rewritten.

cell diff_ir <file>                  # diff all functions
cell diff_ir --fn <N|name> <file>    # diff only one function
cell diff_ir --summary <file>        # counts only

| Flag | Description |
|---|---|
| (none) | Show all diffs with source interleaving |
| --fn <N\|name> | Filter to specific function by index or name |
| --summary | Show only eliminated/rewritten counts per function |

Output Format

Changed instructions are shown in diff style with - (before) and + (after) lines:

=== [0] <anonymous> (args=1, slots=40) ===
  17 eliminated, 51 rewritten

  --- line 4: if (n <= 1) { ---
  - 1     is_int         4, 1                          :4
  + 1     is_int         3, 1                          :4  (specialized)
  - 3     is_int         5, 2                          :4
  + 3     _nop_tc_1                                         (eliminated)

Summary mode gives a quick overview:

  [0] <anonymous>:                       17 eliminated, 51 rewritten
  [1] <anonymous>:                       65 eliminated, 181 rewritten
  total: 86 eliminated, 250 rewritten across 4 functions
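
The summary lines are regular enough to post-process, for example to find the function the optimizer touched most. The Python sketch below assumes the exact line shape shown above; `parse_summary` is a hypothetical helper.

```python
import re

# "[index] name:   N eliminated, M rewritten"
ROW_RE = re.compile(r"^\s*\[(\d+)\]\s+(.*?):\s+(\d+) eliminated, (\d+) rewritten\s*$")


def parse_summary(text):
    """Return one dict per function row; the 'total:' line is ignored."""
    rows = []
    for line in text.splitlines():
        m = ROW_RE.match(line)
        if m:
            rows.append({
                "index": int(m.group(1)),
                "name": m.group(2),
                "eliminated": int(m.group(3)),
                "rewritten": int(m.group(4)),
            })
    return rows
```

With the rows in hand, `max(rows, key=lambda r: r["rewritten"])` picks the most heavily rewritten function.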

xref.ce

Cross-reference / call graph tool. Shows which functions create other functions (via function instructions), building a creation tree.

cell xref <file>                     # full creation tree
cell xref --callers <N> <file>       # who creates function [N]?
cell xref --callees <N> <file>       # what does [N] create/call?
cell xref --dot <file>               # DOT graph for graphviz
cell xref --optimized <file>         # use optimized IR

| Flag | Description |
|---|---|
| (none) | Indented creation tree from main |
| --callers <N> | Show which functions create function [N] |
| --callees <N> | Show what function [N] creates (use -1 for main) |
| --dot | Output DOT format for graphviz |
| --optimized | Use optimized IR instead of raw mcode |

Output Format

Default tree view:

demo_disasm.cm
  [0] <anonymous>
  [1] <anonymous>
  [2] <anonymous>

Caller/callee query:

Callers of [0] <anonymous>:
  demo_disasm.cm at line 3

DOT output can be piped to graphviz: cell xref --dot file.cm | dot -Tpng -o xref.png

cfg.ce

Control flow graph tool. Identifies basic blocks from labels and jumps, computes edges, and detects loop back-edges.

cell cfg --fn <N|name> <file>        # text CFG for function
cell cfg --dot --fn <N|name> <file>  # DOT output for graphviz
cell cfg <file>                      # text CFG for all functions
cell cfg --optimized <file>          # use optimized IR

| Flag | Description |
|---|---|
| --fn <N\|name> | Filter to specific function by index or name |
| --dot | Output DOT format for graphviz |
| --optimized | Use optimized IR instead of raw mcode |

Output Format

=== [0] <anonymous> ===
  B0 [pc 0-2, line 4]:
    0     access         2, 1
    1     is_int         4, 1
    2     jump_false     4, "rel_ni_2"
    -> B3 "rel_ni_2" (jump)
    -> B1 (fallthrough)

  B1 [pc 3-4, line 4]:
    3     is_int         5, 2
    4     jump_false     5, "rel_ni_2"
    -> B3 "rel_ni_2" (jump)
    -> B2 (fallthrough)

Each block shows its ID, PC range, source lines, instructions, and outgoing edges. Loop back-edges (target PC <= source PC) are annotated.
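
The block-splitting idea can be sketched in a few lines of Python. This is an illustration, not cfg.ce's implementation. It assumes the instruction array layout suggested by the ir_report event records (instructions as arrays, labels and nops as plain strings), that an unconditional `jump` carries its label as the first operand and conditional jumps as the second, and that `return` ends a block. `build_cfg` and its helpers are hypothetical names.

```python
BRANCH_OPS = {"jump", "jump_true", "jump_false", "jump_not_null"}


def is_label(entry):
    """Labels are string entries that are not nops."""
    return isinstance(entry, str) and not entry.startswith("_nop_")


def jump_target(instr):
    # Assumed layout: ["jump", label, ...] or [cond_op, slot, label, ...]
    return instr[1] if instr[0] == "jump" else instr[2]


def build_cfg(instrs):
    """Return (blocks, edges).

    blocks: list of (start_pc, end_pc), inclusive.
    edges:  list of (from_block, to_block, kind, is_back_edge).
    """
    label_pc = {e: pc for pc, e in enumerate(instrs) if is_label(e)}
    # Leaders: the first entry, every label, and whatever follows a branch or return.
    leaders = {0} | set(label_pc.values())
    for pc, e in enumerate(instrs):
        if isinstance(e, list) and (e[0] in BRANCH_OPS or e[0] == "return"):
            leaders.add(pc + 1)
    starts = sorted(pc for pc in leaders if pc < len(instrs))
    ends = starts[1:] + [len(instrs)]
    blocks = [(s, e - 1) for s, e in zip(starts, ends)]
    block_of = {s: i for i, s in enumerate(starts)}

    edges = []
    for i, (_, end) in enumerate(blocks):
        last = instrs[end]
        op = last[0] if isinstance(last, list) else None
        if op in BRANCH_OPS:
            target = block_of[label_pc[jump_target(last)]]
            # Back-edge test from the text: target PC <= source PC.
            edges.append((i, target, "jump", blocks[target][0] <= end))
        if op not in ("jump", "return") and i + 1 < len(blocks):
            edges.append((i, i + 1, "fallthrough", False))
    return blocks, edges
```

On a small loop, the unconditional jump back to the loop label is the only edge flagged as a back-edge.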

slots.ce

Slot data flow analysis. Builds use-def chains for every slot in a function, showing where each slot is defined and used. Optionally captures type information from streamline.

cell slots --fn <N|name> <file>              # slot summary for function
cell slots --slot <N> --fn <N|name> <file>   # trace slot N
cell slots <file>                            # slot summary for all functions

| Flag | Description |
|---|---|
| --fn <N\|name> | Filter to specific function by index or name |
| --slot <N> | Show chronological DEF/USE trace for a specific slot |

Output Format

Summary shows each slot with its def count, use count, inferred type, and first definition. Dead slots (defined but never used) are flagged:

=== [0] <anonymous> (args=1, slots=40) ===
  slot    defs    uses    type        first-def
  s0      0       0       -           (this)
  s1      0       10      -           (arg 0)
  s2      1       6       -           pc 0: access
  s10     1       0       -           pc 29: invoke  <- dead

Slot trace (--slot N) shows every DEF and USE in program order:

=== slot 3 in [0] <anonymous> ===
  DEF  pc 5:     le_int         3, 1, 2                       :4
  DEF  pc 11:    le_float       3, 1, 2                       :4
  DEF  pc 17:    le_text        3, 1, 2                       :4
  USE  pc 31:    jump_false     3, "if_else_0"                :4

seed.ce

Regenerates the boot seed files in boot/. These are pre-compiled mcode IR (JSON) files that bootstrap the compilation pipeline on cold start.

cell seed                # regenerate all boot seeds
cell seed --clean        # also clear the build cache after

The script compiles each pipeline module (tokenize, parse, fold, mcode, streamline) and internal/bootstrap.cm through the current pipeline, encodes the output as JSON, and writes it to boot/<name>.cm.mcode.

Regenerate seeds:

  • Before a release or distribution
  • When the pipeline source changes in a way that the existing seeds can no longer compile (e.g. language-level changes)

Seeds do not need regenerating during normal development: the engine recompiles pipeline modules from source automatically via the content-addressed cache.

ir_report.ce

The optimizer flight recorder. Runs the full pipeline with structured logging and outputs machine-readable, diff-friendly JSON. This is the most detailed tool for understanding what the optimizer did and why.

cell ir_report [options] <file.ce|file.cm>

Options

| Flag | Description |
|---|---|
| --summary | Per-pass JSON summaries with instruction counts and timing (default) |
| --events | Include rewrite events showing each optimization applied |
| --types | Include type delta records showing inferred slot types |
| --ir-before=PASS | Print canonical IR before a specific pass |
| --ir-after=PASS | Print canonical IR after a specific pass |
| --ir-all | Print canonical IR before and after all passes |
| --full | Everything: summary + events + types + ir-all |

With no flags, --summary is the default.

Output Format

Output is line-delimited JSON. Each line is a self-contained JSON object with a type field:

type: "pass" — Per-pass summary with categorized instruction counts before and after:

{
  "type": "pass",
  "pass": "eliminate_type_checks",
  "fn": "fib",
  "ms": 0.12,
  "changed": true,
  "before": {"instr": 77, "nop": 0, "guard": 16, "branch": 28, ...},
  "after":  {"instr": 77, "nop": 1, "guard": 15, "branch": 28, ...},
  "changes": {"guards_removed": 1, "nops_added": 1}
}

type: "event" — Individual rewrite event with before/after instructions and reasoning:

{
  "type": "event",
  "pass": "eliminate_type_checks",
  "rule": "incompatible_type_forces_jump",
  "at": 3,
  "before": [["is_int", 5, 2, 4, 9], ["jump_false", 5, "rel_ni_2", 4, 9]],
  "after": ["_nop_tc_1", ["jump", "rel_ni_2", 4, 9]],
  "why": {"slot": 2, "known_type": "float", "checked_type": "int"}
}

type: "types" — Inferred type information for a function:

{
  "type": "types",
  "fn": "fib",
  "param_types": {},
  "slot_types": {"25": "null"}
}

type: "ir" — Canonical IR text for a function at a specific point:

{
  "type": "ir",
  "when": "before",
  "pass": "all",
  "fn": "fib",
  "text": "fn fib (args=1, slots=26)\n  @0    access          s2, 2\n  ..."
}
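
Because every line is a self-contained object, a report can be consumed one line at a time. The Python sketch below assumes only the fields shown in the records above (`type`, `pass`, `fn`, `changed`, `rule`); `summarize_report` is a hypothetical helper.

```python
import json
from collections import Counter


def summarize_report(lines):
    """Collect (fn, pass) pairs that changed something and count fired rules.

    `lines` is any iterable of text lines, e.g. an open report file.
    """
    changed_passes = []
    rule_counts = Counter()
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue  # tolerate blank lines
        record = json.loads(raw)
        kind = record.get("type")
        if kind == "pass" and record.get("changed"):
            changed_passes.append((record.get("fn"), record["pass"]))
        elif kind == "event":
            rule_counts[record["rule"]] += 1
    return changed_passes, rule_counts
```

A report saved with `cell ir_report --full myfile.ce > report.json` could be passed in as `summarize_report(open("report.json"))`.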

Rewrite Rules

Each pass records events with named rules:

eliminate_type_checks:

  • known_type_eliminates_guard — type already known, guard removed
  • incompatible_type_forces_jump — type conflicts, conditional jump becomes unconditional
  • num_subsumes_int_float — num check satisfied by int or float
  • dynamic_to_field — load_dynamic/store_dynamic narrowed to field access
  • dynamic_to_index — load_dynamic/store_dynamic narrowed to index access

simplify_algebra:

  • add_zero, sub_zero, mul_one, div_one — identity operations become moves
  • mul_zero — multiplication by zero becomes constant
  • self_eq, self_ne — same-slot comparisons become constants

simplify_booleans:

  • not_jump_false_fusion — not + jump_false fused into jump_true
  • not_jump_true_fusion — not + jump_true fused into jump_false
  • double_not — not + not collapsed to move

eliminate_moves:

  • self_move — move to same slot becomes nop

eliminate_dead_jumps:

  • jump_to_next — jump to immediately following label becomes nop
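
To make the shape of a rewrite concrete, here is a toy Python pass covering the two simplest rules, self_move and jump_to_next. It is an illustration only, not the optimizer's code. The instruction layout (arrays for instructions, strings for labels and nops) follows the event records above; the nop names `_nop_mv_N` and `_nop_dj_N` are assumed by analogy with `_nop_tc_1`, and `eliminate_trivial` is a hypothetical name.

```python
def eliminate_trivial(instrs):
    """Apply self_move and jump_to_next; return (new_instrs, events).

    Entries are replaced in place, never inserted or deleted, so every
    index in the result refers to the same position as in the input.
    """
    out = list(instrs)  # shallow copy: the caller's list is left untouched
    events = []
    for pc, entry in enumerate(out):
        if not isinstance(entry, list):
            continue  # labels and existing nops are plain strings
        rule = code = None
        if entry[0] == "move" and entry[1] == entry[2]:
            rule, code = "self_move", "mv"
        elif entry[0] == "jump" and pc + 1 < len(out) and out[pc + 1] == entry[1]:
            rule, code = "jump_to_next", "dj"
        if rule is None:
            continue
        nop = f"_nop_{code}_{len(events) + 1}"  # assumed naming scheme
        out[pc] = nop
        events.append({"type": "event", "rule": rule, "at": pc,
                       "before": [entry], "after": [nop]})
    return out, events
```

Replacing rather than deleting keeps instruction indexes stable, which is what lets events refer to positions with a plain `at` field.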

Canonical IR Format

The --ir-all, --ir-before, and --ir-after flags produce a deterministic text representation of the IR:

fn fib (args=1, slots=26)
  @0    access          s2, 2
  @1    is_int          s4, s1                          ; [guard]
  @2    jump_false      s4, "rel_ni_2"                  ; [branch]
  @3    --- nop (tc) ---
  @4    jump            "rel_ni_2"                      ; [branch]
  @5    lt_int          s3, s1, s2
  @6    jump            "rel_done_4"                    ; [branch]
      rel_ni_2:
  @8    is_num          s4, s1                          ; [guard]

Properties:

  • @N is the raw array index, stable across passes (passes replace, never insert or delete)
  • sN prefix distinguishes slot operands from literal values
  • String operands are quoted
  • Labels appear as indented headers with a colon
  • Category tags in brackets: [guard], [branch], [load], [store], [call], [arith], [move], [const]
  • Nops shown as --- nop (reason) --- with reason codes: tc (type check), bl (boolean), mv (move), dj (dead jump), ur (unreachable)
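
Since the format is deterministic, simple text processing goes a long way. As a minimal Python sketch, the function below counts the bracketed category tags in a block of canonical IR text; it assumes tags always appear at the end of a line after a semicolon, as in the sample above, and `count_category_tags` is a hypothetical helper.

```python
import re
from collections import Counter

# Matches a trailing "; [tag]" annotation.
TAG_RE = re.compile(r";\s*\[(\w+)\]\s*$")


def count_category_tags(ir_text):
    """Count category tags such as [guard] or [branch] in canonical IR text."""
    counts = Counter()
    for line in ir_text.splitlines():
        m = TAG_RE.search(line)
        if m:
            counts[m.group(1)] += 1
    return counts
```

Comparing the counts from a before and an after dump gives a quick view of how many guards or branches a pass removed.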

Examples

# what passes changed something?
cell ir_report --summary myfile.ce | jq 'select(.changed)'

# list all rewrite rules that fired
cell ir_report --events myfile.ce | jq 'select(.type == "event") | .rule'

# print canonical IR text before and after all passes
cell ir_report --ir-all myfile.ce | jq -r 'select(.type == "ir") | .text'

# full report for analysis
cell ir_report --full myfile.ce > report.json

ir_stats.cm

A utility module used by ir_report.ce and available for custom tooling. Not a standalone tool.

var ir_stats = use("ir_stats")

ir_stats.detailed_stats(func)                    // categorized instruction counts
ir_stats.ir_fingerprint(func)                    // djb2 hash of instruction array
ir_stats.canonical_ir(func, name, opts)          // deterministic text representation
ir_stats.type_snapshot(slot_types)               // frozen copy of type map
ir_stats.type_delta(before_types, after_types)   // compute type changes
ir_stats.category_tag(op)                        // classify an opcode
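
For readers unfamiliar with djb2, the Python sketch below shows the hash itself and one way a fingerprint over an instruction array could be built from it. How ir_stats actually serializes instructions before hashing is not documented here, so hashing the compact JSON encoding is an assumption, as are the 32-bit truncation and the function names.

```python
import json


def djb2(text):
    """Classic djb2: start at 5381, then h = h * 33 + byte for each byte."""
    h = 5381
    for byte in text.encode("utf-8"):
        h = (h * 33 + byte) & 0xFFFFFFFF  # keep the value within 32 bits (assumed width)
    return h


def fingerprint(instrs):
    """Hash a compact JSON encoding of the instruction array (assumed scheme)."""
    return djb2(json.dumps(instrs, separators=(",", ":")))
```

Comparing the fingerprint before and after a pass is a cheap way to tell whether the pass changed anything.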

Instruction Categories

detailed_stats classifies each instruction into one of these categories:

| Category | Opcodes |
|---|---|
| load | load_field, load_index, load_dynamic, get, access (non-constant) |
| store | store_field, store_index, store_dynamic, set_var, put, push |
| branch | jump, jump_true, jump_false, jump_not_null |
| call | invoke, goinvoke |
| guard | is_int, is_text, is_num, is_bool, is_null, is_array, is_func, is_record, is_stone |
| arith | add_int, sub_int, ..., add_float, ..., concat, neg_int, neg_float, bitwise ops |
| move | move |
| const | int, true, false, null, access (with constant value) |
| label | string entries that are not nops |
| nop | strings starting with _nop_ |
| other | everything else (frame, setarg, array, record, function, return, etc.) |