---
title: "Compiler Inspection Tools"
description: "Tools for inspecting and debugging the compiler pipeline"
weight: 50
type: "docs"
---

ƿit includes a set of tools for inspecting the compiler pipeline at every stage. These are useful for debugging, testing optimizations, and understanding what the compiler does with your code.

## Pipeline Overview

The compiler runs in stages:

```
source → tokenize → parse → fold → mcode → streamline → output
```

Each stage has a corresponding CLI tool that lets you see its output.

| Stage       | Tool                      | What it shows                            |
|-------------|---------------------------|------------------------------------------|
| tokenize    | `tokenize.ce`             | Token stream as JSON                     |
| parse       | `parse.ce`                | Unfolded AST as JSON                     |
| fold        | `fold.ce`                 | Folded AST as JSON                       |
| mcode       | `mcode.ce`                | Raw mcode IR as JSON                     |
| mcode       | `mcode.ce --pretty`       | Human-readable mcode IR                  |
| streamline  | `streamline.ce`           | Full optimized IR as JSON                |
| streamline  | `streamline.ce --types`   | Optimized IR with type annotations       |
| streamline  | `streamline.ce --stats`   | Per-function summary stats               |
| streamline  | `streamline.ce --ir`      | Human-readable canonical IR              |
| disasm      | `disasm.ce`               | Source-interleaved disassembly           |
| disasm      | `disasm.ce --optimized`   | Optimized source-interleaved disassembly |
| diff        | `diff_ir.ce`              | Mcode vs streamline instruction diff     |
| xref        | `xref.ce`                 | Cross-reference / call creation graph    |
| cfg         | `cfg.ce`                  | Control flow graph (basic blocks)        |
| slots       | `slots.ce`                | Slot data flow / use-def chains          |
| all         | `ir_report.ce`            | Structured optimizer flight recorder     |

All tools take a source file as input and run the pipeline up to the relevant stage.

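
For scripted use, such as saving the output of several stages for comparison, the tools can be driven from an external script. The following is a minimal Python sketch, not part of ƿit. It assumes the `cell <tool> [flags] <file>` command shape used in the Quick Start section and that `cell` is on `PATH`; the helper names `build_command` and `run_tool` are hypothetical. Only tools whose `cell` invocation appears in this document are listed.

```python
import subprocess

# Tool names whose "cell <tool>" invocation is shown in this document.
# tokenize and parse are left out because no invocation is shown for them.
KNOWN_TOOLS = {
    "fold", "mcode", "streamline", "disasm", "diff_ir",
    "xref", "cfg", "slots", "ir_report",
}


def build_command(tool, source, flags=()):
    """Return the argv list for one inspection tool (hypothetical helper)."""
    if tool not in KNOWN_TOOLS:
        raise ValueError(f"unknown tool: {tool}")
    return ["cell", tool, *flags, source]


def run_tool(tool, source, flags=()):
    """Run the tool and return its stdout as text (assumes `cell` is on PATH)."""
    result = subprocess.run(
        build_command(tool, source, flags),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit status
    )
    return result.stdout
```

For example, `build_command("mcode", "myfile.ce", ["--pretty"])` yields the same argument list as typing `cell mcode --pretty myfile.ce` in a shell.
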
## Quick Start

```bash
# see raw mcode IR (pretty-printed)
cell mcode --pretty myfile.ce

# source-interleaved disassembly
cell disasm myfile.ce

# see optimized IR with type annotations
cell streamline --types myfile.ce

# full optimizer report with events
cell ir_report --full myfile.ce
```

## fold.ce

Prints the folded AST as JSON. This is the output of the parser and constant folder, before mcode generation.

```bash
cell fold 
```

## mcode.ce

Prints mcode IR. Default output is JSON; use `--pretty` for human-readable format with opcodes, operands, and program counter.

```bash
cell mcode             # JSON (default)
cell mcode --pretty    # human-readable IR
```

## streamline.ce

Runs the full pipeline (tokenize, parse, fold, mcode, streamline) and outputs the optimized IR as JSON. Useful for piping to `jq` or saving for comparison.

```bash
cell streamline               # full JSON (default)
cell streamline --stats       # summary stats per function
cell streamline --ir          # human-readable IR
cell streamline --check       # warnings only
cell streamline --types       # IR with type annotations
cell streamline --diagnose    # compile-time diagnostics
```

| Flag | Description |
|------|-------------|
| (none) | Full optimized IR as JSON (backward compatible) |
| `--stats` | Per-function summary: args, slots, instruction counts by category, nops eliminated |
| `--ir` | Human-readable canonical IR (same format as `ir_report.ce`) |
| `--check` | Warnings only (e.g. `nr_slots > 200` approaching 255 limit) |
| `--types` | Optimized IR with inferred type annotations per slot |
| `--diagnose` | Run compile-time diagnostics (type errors and warnings) |

Flags can be combined.

## disasm.ce

Source-interleaved disassembly. Shows mcode or optimized IR with source lines interleaved, making it easy to see which instructions were generated from which source code.

```bash
cell disasm                  # disassemble all functions (mcode)
cell disasm --optimized      # disassemble optimized IR (streamline)
cell disasm --fn 87          # show only function 87
cell disasm --fn my_func     # show only functions named "my_func"
cell disasm --line 235       # show instructions generated from line 235
```

| Flag | Description |
|------|-------------|
| (none) | Raw mcode IR with source interleaving (default) |
| `--optimized` | Use optimized IR (streamline) instead of raw mcode |
| `--fn ` | Filter to specific function by index or name substring |
| `--line ` | Show only instructions generated from a specific source line |

### Output Format

Functions are shown with a header including argument count, slot count, and the source line where the function begins. Instructions are grouped by source line, with the source text shown before each group:

```
=== [87] (args=0, slots=12, closures=0) [line 234] ===

  --- line 235: var result = compute(x, y) ---
     0 access       2, "compute"        :235
     1 get          3, 1, 0             :235
     2 get          4, 1, 1             :235
     3 invoke       3, 2, 2             :235

  --- line 236: if (result > 0) { ---
     4 access       5, 0                :236
     5 gt           6, 4, 5             :236
     6 jump_false   6, "else_1"         :236
```

Each instruction line shows:

- Program counter (left-aligned)
- Opcode
- Operands (comma-separated)
- Source line number (`:N` suffix, right-aligned)

Function creation instructions include a cross-reference annotation showing the target function's name:

```
     3 function     5, 12               :235  ; -> [12] helper_fn
```

## diff_ir.ce

Compares mcode IR (before optimization) with streamline IR (after optimization), showing what the optimizer changed. Useful for understanding which instructions were eliminated, specialized, or rewritten.

```bash
cell diff_ir              # diff all functions
cell diff_ir --fn         # diff only one function
cell diff_ir --summary    # counts only
```

| Flag | Description |
|------|-------------|
| (none) | Show all diffs with source interleaving |
| `--fn ` | Filter to specific function by index or name |
| `--summary` | Show only eliminated/rewritten counts per function |

### Output Format

Changed instructions are shown in diff style with `-` (before) and `+` (after) lines:

```
=== [0] (args=1, slots=40) ===
  17 eliminated, 51 rewritten

  --- line 4: if (n <= 1) { ---
  -    1 is_int       4, 1        :4
  +    1 is_int       3, 1        :4  (specialized)
  -    3 is_int       5, 2        :4
  +    3 _nop_tc_1                    (eliminated)
```

Summary mode gives a quick overview:

```
[0] : 17 eliminated, 51 rewritten
[1] : 65 eliminated, 181 rewritten
total: 86 eliminated, 250 rewritten across 4 functions
```

## xref.ce

Cross-reference / call graph tool. Shows which functions create other functions (via `function` instructions), building a creation tree.

```bash
cell xref                # full creation tree
cell xref --callers      # who creates function [N]?
cell xref --callees      # what does [N] create/call?
cell xref --dot          # DOT graph for graphviz
cell xref --optimized    # use optimized IR
```

| Flag | Description |
|------|-------------|
| (none) | Indented creation tree from main |
| `--callers ` | Show which functions create function [N] |
| `--callees ` | Show what function [N] creates (use -1 for main) |
| `--dot` | Output DOT format for graphviz |
| `--optimized` | Use optimized IR instead of raw mcode |

### Output Format

Default tree view:

```
demo_disasm.cm
  [0]
  [1]
  [2]
```

Caller/callee query:

```
Callers of [0] :
  demo_disasm.cm at line 3
```

DOT output can be piped to graphviz: `cell xref --dot file.cm | dot -Tpng -o xref.png`

## cfg.ce

Control flow graph tool. Identifies basic blocks from labels and jumps, computes edges, and detects loop back-edges.

```bash
cell cfg --fn          # text CFG for function
cell cfg --dot --fn    # DOT output for graphviz
cell cfg               # text CFG for all functions
cell cfg --optimized   # use optimized IR
```

| Flag | Description |
|------|-------------|
| `--fn ` | Filter to specific function by index or name |
| `--dot` | Output DOT format for graphviz |
| `--optimized` | Use optimized IR instead of raw mcode |

### Output Format

```
=== [0] ===
  B0 [pc 0-2, line 4]:
    0 access       2, 1
    1 is_int       4, 1
    2 jump_false   4, "rel_ni_2"
    -> B3 "rel_ni_2" (jump)
    -> B1 (fallthrough)

  B1 [pc 3-4, line 4]:
    3 is_int       5, 2
    4 jump_false   5, "rel_ni_2"
    -> B3 "rel_ni_2" (jump)
    -> B2 (fallthrough)
```

Each block shows its ID, PC range, source lines, instructions, and outgoing edges. Loop back-edges (target PC <= source PC) are annotated.

## slots.ce

Slot data flow analysis. Builds use-def chains for every slot in a function, showing where each slot is defined and used. Optionally captures type information from streamline.

```bash
cell slots --fn           # slot summary for function
cell slots --slot --fn    # trace slot N
cell slots                # slot summary for all functions
```

| Flag | Description |
|------|-------------|
| `--fn ` | Filter to specific function by index or name |
| `--slot ` | Show chronological DEF/USE trace for a specific slot |

### Output Format

Summary shows each slot with its def count, use count, inferred type, and first definition. Dead slots (defined but never used) are flagged:

```
=== [0] (args=1, slots=40) ===
  slot  defs  uses  type  first-def
  s0    0     0     -     (this)
  s1    0     10    -     (arg 0)
  s2    1     6     -     pc 0: access
  s10   1     0     -     pc 29: invoke  <- dead
```

Slot trace (`--slot N`) shows every DEF and USE in program order:

```
=== slot 3 in [0] ===
  DEF  pc 5:   le_int     3, 1, 2          :4
  DEF  pc 11:  le_float   3, 1, 2          :4
  DEF  pc 17:  le_text    3, 1, 2          :4
  USE  pc 31:  jump_false 3, "if_else_0"   :4
```

## seed.ce

Regenerates the boot seed files in `boot/`. These are pre-compiled mcode IR (JSON) files that bootstrap the compilation pipeline on cold start.

```bash
cell seed            # regenerate all boot seeds
cell seed --clean    # also clear the build cache after
```

The script compiles each pipeline module (tokenize, parse, fold, mcode, streamline) and `internal/bootstrap.cm` through the current pipeline, encodes the output as JSON, and writes it to `boot/.cm.mcode`.

**When to regenerate seeds:**

- Before a release or distribution
- When the pipeline source changes so that the existing seeds can no longer compile the new source (e.g. language-level changes)
- Seeds do NOT need regenerating for normal development: the engine recompiles pipeline modules from source automatically via the content-addressed cache

## ir_report.ce

The optimizer flight recorder. Runs the full pipeline with structured logging and outputs machine-readable, diff-friendly JSON. This is the most detailed tool for understanding what the optimizer did and why.

```bash
cell ir_report [options] 
```

### Options

| Flag | Description |
|------|-------------|
| `--summary` | Per-pass JSON summaries with instruction counts and timing (default) |
| `--events` | Include rewrite events showing each optimization applied |
| `--types` | Include type delta records showing inferred slot types |
| `--ir-before=PASS` | Print canonical IR before a specific pass |
| `--ir-after=PASS` | Print canonical IR after a specific pass |
| `--ir-all` | Print canonical IR before and after all passes |
| `--full` | Everything: summary + events + types + ir-all |

With no flags, `--summary` is the default.

### Output Format

Output is line-delimited JSON.

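
Because the output is line-delimited JSON, a consumer can parse it one line at a time without loading a single large document. The following is a minimal Python sketch of such a consumer, not part of ƿit. It groups records by their top-level `type` field; the function name `group_records` is hypothetical, and the sample lines are hand-written, abbreviated illustrations rather than real tool output.

```python
import json
from collections import defaultdict


def group_records(lines):
    """Parse line-delimited JSON and group the records by their "type" field."""
    groups = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        record = json.loads(line)
        groups[record.get("type", "unknown")].append(record)
    return dict(groups)


# Hand-written, abbreviated sample records (illustrative only).
sample = [
    '{"type": "pass", "pass": "eliminate_type_checks", "fn": "fib", "changed": true}',
    '{"type": "event", "pass": "eliminate_type_checks", "rule": "incompatible_type_forces_jump", "at": 3}',
    '{"type": "pass", "pass": "simplify_algebra", "fn": "fib", "changed": false}',
]

groups = group_records(sample)
changed_passes = [r["pass"] for r in groups["pass"] if r["changed"]]
print(changed_passes)  # ['eliminate_type_checks']
```

In practice the `lines` argument could be an open file or `sys.stdin`, so the report can be piped straight into the script.
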
Each line is a self-contained JSON object with a `type` field:

**`type: "pass"`**: Per-pass summary with categorized instruction counts before and after:

```json
{
  "type": "pass",
  "pass": "eliminate_type_checks",
  "fn": "fib",
  "ms": 0.12,
  "changed": true,
  "before": {"instr": 77, "nop": 0, "guard": 16, "branch": 28, ...},
  "after": {"instr": 77, "nop": 1, "guard": 15, "branch": 28, ...},
  "changes": {"guards_removed": 1, "nops_added": 1}
}
```

**`type: "event"`**: Individual rewrite event with before/after instructions and reasoning:

```json
{
  "type": "event",
  "pass": "eliminate_type_checks",
  "rule": "incompatible_type_forces_jump",
  "at": 3,
  "before": [["is_int", 5, 2, 4, 9], ["jump_false", 5, "rel_ni_2", 4, 9]],
  "after": ["_nop_tc_1", ["jump", "rel_ni_2", 4, 9]],
  "why": {"slot": 2, "known_type": "float", "checked_type": "int"}
}
```

**`type: "types"`**: Inferred type information for a function:

```json
{
  "type": "types",
  "fn": "fib",
  "param_types": {},
  "slot_types": {"25": "null"}
}
```

**`type: "ir"`**: Canonical IR text for a function at a specific point:

```json
{
  "type": "ir",
  "when": "before",
  "pass": "all",
  "fn": "fib",
  "text": "fn fib (args=1, slots=26)\n @0 access s2, 2\n ..."
}
```

### Rewrite Rules

Each pass records events with named rules:

**eliminate_type_checks:**

- `known_type_eliminates_guard`: type already known, guard removed
- `incompatible_type_forces_jump`: type conflicts, conditional jump becomes unconditional
- `num_subsumes_int_float`: num check satisfied by int or float
- `dynamic_to_field`: load_dynamic/store_dynamic narrowed to field access
- `dynamic_to_index`: load_dynamic/store_dynamic narrowed to index access

**simplify_algebra:**

- `add_zero`, `sub_zero`, `mul_one`, `div_one`: identity operations become moves
- `mul_zero`: multiplication by zero becomes constant
- `self_eq`, `self_ne`: same-slot comparisons become constants

**simplify_booleans:**

- `not_jump_false_fusion`: not + jump_false fused into jump_true
- `not_jump_true_fusion`: not + jump_true fused into jump_false
- `double_not`: not + not collapsed to move

**eliminate_moves:**

- `self_move`: move to same slot becomes nop

**eliminate_dead_jumps:**

- `jump_to_next`: jump to immediately following label becomes nop

### Canonical IR Format

The `--ir-all`, `--ir-before`, and `--ir-after` flags produce a deterministic text representation of the IR:

```
fn fib (args=1, slots=26)
  @0 access s2, 2
  @1 is_int s4, s1 ; [guard]
  @2 jump_false s4, "rel_ni_2" ; [branch]
  @3 --- nop (tc) ---
  @4 jump "rel_ni_2" ; [branch]
  @5 lt_int s3, s1, s2
  @6 jump "rel_done_4" ; [branch]
  rel_ni_2:
  @8 is_num s4, s1 ; [guard]
```

Properties:

- `@N` is the raw array index, stable across passes (passes replace, never insert or delete)
- `sN` prefix distinguishes slot operands from literal values
- String operands are quoted
- Labels appear as indented headers with a colon
- Category tags in brackets: `[guard]`, `[branch]`, `[load]`, `[store]`, `[call]`, `[arith]`, `[move]`, `[const]`
- Nops shown as `--- nop (reason) ---` with reason codes: `tc` (type check), `bl` (boolean), `mv` (move), `dj` (dead jump), `ur` (unreachable)

### Examples

```bash
# what passes changed something?
cell ir_report --summary myfile.ce | jq 'select(.changed)'

# list all rewrite rules that fired
cell ir_report --events myfile.ce | jq 'select(.type == "event") | .rule'

# diff IR before and after optimization
cell ir_report --ir-all myfile.ce | jq -r 'select(.type == "ir") | .text'

# full report for analysis
cell ir_report --full myfile.ce > report.json
```

## ir_stats.cm

A utility module used by `ir_report.ce` and available for custom tooling. Not a standalone tool.

```javascript
var ir_stats = use("ir_stats")

ir_stats.detailed_stats(func)                     // categorized instruction counts
ir_stats.ir_fingerprint(func)                     // djb2 hash of instruction array
ir_stats.canonical_ir(func, name, opts)           // deterministic text representation
ir_stats.type_snapshot(slot_types)                // frozen copy of type map
ir_stats.type_delta(before_types, after_types)    // compute type changes
ir_stats.category_tag(op)                         // classify an opcode
```

### Instruction Categories

`detailed_stats` classifies each instruction into one of these categories:

| Category | Opcodes |
|----------|---------|
| load | `load_field`, `load_index`, `load_dynamic`, `get`, `access` (non-constant) |
| store | `store_field`, `store_index`, `store_dynamic`, `set_var`, `put`, `push` |
| branch | `jump`, `jump_true`, `jump_false`, `jump_not_null` |
| call | `invoke`, `goinvoke` |
| guard | `is_int`, `is_text`, `is_num`, `is_bool`, `is_null`, `is_array`, `is_func`, `is_record`, `is_stone` |
| arith | `add_int`, `sub_int`, ..., `add_float`, ..., `concat`, `neg_int`, `neg_float`, bitwise ops |
| move | `move` |
| const | `int`, `true`, `false`, `null`, `access` (with constant value) |
| label | string entries that are not nops |
| nop | strings starting with `_nop_` |
| other | everything else (`frame`, `setarg`, `array`, `record`, `function`, `return`, etc.) |
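
For external tooling that post-processes IR dumped as JSON, the table above can be mirrored as a lookup. The following Python sketch is an illustration under stated assumptions, not the `ir_stats.cm` implementation. It assumes instructions are either strings (labels and nops) or arrays with the opcode first, as in the `event` records shown earlier. Opcodes the table elides (`...`, bitwise ops, `etc.`) are left out, so unlisted arithmetic opcodes fall through to `other`. How `access` is determined to be constant is not specified here, so the caller passes that in. The names `category_of` and `count_categories` are hypothetical.

```python
# Opcode sets copied from the category table; elided entries are omitted on purpose.
CATEGORIES = {
    "load": {"load_field", "load_index", "load_dynamic", "get"},
    "store": {"store_field", "store_index", "store_dynamic", "set_var", "put", "push"},
    "branch": {"jump", "jump_true", "jump_false", "jump_not_null"},
    "call": {"invoke", "goinvoke"},
    "guard": {"is_int", "is_text", "is_num", "is_bool", "is_null",
              "is_array", "is_func", "is_record", "is_stone"},
    "arith": {"add_int", "sub_int", "add_float", "concat", "neg_int", "neg_float"},
    "move": {"move"},
    "const": {"int", "true", "false", "null"},
}


def category_of(instr, access_is_constant=False):
    """Classify one instruction entry (a string, or a list with the opcode first)."""
    if isinstance(instr, str):
        # String entries are either nop markers or labels.
        return "nop" if instr.startswith("_nop_") else "label"
    op = instr[0]
    if op == "access":
        # Assumption: the caller knows whether this access loads a constant.
        return "const" if access_is_constant else "load"
    for name, opcodes in CATEGORIES.items():
        if op in opcodes:
            return name
    return "other"


def count_categories(instructions):
    """Tally categories over an instruction array."""
    counts = {}
    for instr in instructions:
        name = category_of(instr)
        counts[name] = counts.get(name, 0) + 1
    return counts
```

Applied to the `after` array of the example event, `count_categories(["_nop_tc_1", ["jump", "rel_ni_2", 4, 9]])` gives one `nop` and one `branch`.
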