---
title: "Compiler Inspection Tools"
description: "Tools for inspecting and debugging the compiler pipeline"
weight: 50
type: "docs"
---

ƿit includes a set of tools for inspecting the compiler pipeline at every stage. These are useful for debugging, testing optimizations, and understanding what the compiler does with your code.

## Pipeline Overview

The compiler runs in stages:

```
source → tokenize → parse → fold → mcode → streamline → output
```

Each stage has a corresponding CLI tool that lets you see its output. The remaining rows in the table list analysis tools that work on the IR these stages produce.

| Stage       | Tool                      | What it shows                          |
|-------------|---------------------------|----------------------------------------|
| tokenize    | `tokenize.ce`             | Token stream as JSON                   |
| parse       | `parse.ce`                | Unfolded AST as JSON                   |
| fold        | `fold.ce`                 | Folded AST as JSON                     |
| mcode       | `mcode.ce`                | Raw mcode IR as JSON                   |
| mcode       | `mcode.ce --pretty`       | Human-readable mcode IR               |
| streamline  | `streamline.ce`           | Full optimized IR as JSON              |
| streamline  | `streamline.ce --types`   | Optimized IR with type annotations     |
| streamline  | `streamline.ce --stats`   | Per-function summary stats             |
| streamline  | `streamline.ce --ir`      | Human-readable canonical IR            |
| disasm      | `disasm.ce`               | Source-interleaved disassembly          |
| disasm      | `disasm.ce --optimized`   | Optimized source-interleaved disassembly |
| diff        | `diff_ir.ce`              | Mcode vs streamline instruction diff   |
| xref        | `xref.ce`                 | Cross-reference / call creation graph  |
| cfg         | `cfg.ce`                  | Control flow graph (basic blocks)      |
| slots       | `slots.ce`                | Slot data flow / use-def chains        |
| all         | `ir_report.ce`            | Structured optimizer flight recorder   |

All tools take a source file as input and run the pipeline up to the relevant stage.

## Quick Start

```bash
# see raw mcode IR (pretty-printed)
cell mcode --pretty myfile.ce

# source-interleaved disassembly
cell disasm myfile.ce

# see optimized IR with type annotations
cell streamline --types myfile.ce

# full optimizer report with events
cell ir_report --full myfile.ce
```

## fold.ce

Prints the folded AST as JSON. This is the output of the parser and constant folder, before mcode generation.

```bash
cell fold <file.ce|file.cm>
```

## mcode.ce

Prints mcode IR. Default output is JSON; use `--pretty` for human-readable format with opcodes, operands, and program counter.

```bash
cell mcode <file.ce|file.cm>            # JSON (default)
cell mcode --pretty <file.ce|file.cm>   # human-readable IR
```

## streamline.ce

Runs the full pipeline (tokenize, parse, fold, mcode, streamline) and outputs the optimized IR as JSON. Useful for piping to `jq` or saving for comparison.

```bash
cell streamline <file.ce|file.cm>                # full JSON (default)
cell streamline --stats <file.ce|file.cm>        # summary stats per function
cell streamline --ir <file.ce|file.cm>           # human-readable IR
cell streamline --check <file.ce|file.cm>        # warnings only
cell streamline --types <file.ce|file.cm>        # IR with type annotations
cell streamline --diagnose <file.ce|file.cm>     # compile-time diagnostics
```

| Flag | Description |
|------|-------------|
| (none) | Full optimized IR as JSON (backward compatible) |
| `--stats` | Per-function summary: args, slots, instruction counts by category, nops eliminated |
| `--ir` | Human-readable canonical IR (same format as `ir_report.ce`) |
| `--check` | Warnings only (e.g. `nr_slots > 200` approaching 255 limit) |
| `--types` | Optimized IR with inferred type annotations per slot |
| `--diagnose` | Run compile-time diagnostics (type errors and warnings) |

Flags can be combined.

## disasm.ce

Source-interleaved disassembly. Shows mcode or optimized IR with source lines interleaved, making it easy to see which instructions were generated from which source code.

```bash
cell disasm <file>                         # disassemble all functions (mcode)
cell disasm --optimized <file>             # disassemble optimized IR (streamline)
cell disasm --fn 87 <file>                 # show only function 87
cell disasm --fn my_func <file>            # show only functions named "my_func"
cell disasm --line 235 <file>              # show instructions generated from line 235
```

| Flag | Description |
|------|-------------|
| (none) | Raw mcode IR with source interleaving (default) |
| `--optimized` | Use optimized IR (streamline) instead of raw mcode |
| `--fn <N\|name>` | Filter to specific function by index or name substring |
| `--line <N>` | Show only instructions generated from a specific source line |

### Output Format

Each function is shown with a header giving its index, name, argument count, slot count, closure count, and the source line where the function begins. Instructions are grouped by source line, with the source text shown before each group:

```
=== [87] <anonymous> (args=0, slots=12, closures=0) [line 234] ===

  --- line 235: var result = compute(x, y) ---
  0     access         2, "compute"                           :235
  1     get            3, 1, 0                                :235
  2     get            4, 1, 1                                :235
  3     invoke         3, 2, 2                                :235

  --- line 236: if (result > 0) { ---
  4     access         5, 0                                   :236
  5     gt             6, 4, 5                                :236
  6     jump_false     6, "else_1"                            :236
```

Each instruction line shows:
- Program counter (left-aligned)
- Opcode
- Operands (comma-separated)
- Source line number (`:N` suffix, right-aligned)

Function creation instructions include a cross-reference annotation showing the target function's name:

```
  3     function       5, 12                                  :235  ; -> [12] helper_fn
```

## diff_ir.ce

Compares mcode IR (before optimization) with streamline IR (after optimization), showing what the optimizer changed. Useful for understanding which instructions were eliminated, specialized, or rewritten.

```bash
cell diff_ir <file>                  # diff all functions
cell diff_ir --fn <N|name> <file>    # diff only one function
cell diff_ir --summary <file>        # counts only
```

| Flag | Description |
|------|-------------|
| (none) | Show all diffs with source interleaving |
| `--fn <N\|name>` | Filter to specific function by index or name |
| `--summary` | Show only eliminated/rewritten counts per function |

### Output Format

Changed instructions are shown in diff style with `-` (before) and `+` (after) lines:

```
=== [0] <anonymous> (args=1, slots=40) ===
  17 eliminated, 51 rewritten

  --- line 4: if (n <= 1) { ---
  - 1     is_int         4, 1                          :4
  + 1     is_int         3, 1                          :4  (specialized)
  - 3     is_int         5, 2                          :4
  + 3     _nop_tc_1                                         (eliminated)
```

Summary mode gives a quick overview:

```
  [0] <anonymous>:                       17 eliminated, 51 rewritten
  [1] <anonymous>:                       65 eliminated, 181 rewritten
  total: 86 eliminated, 250 rewritten across 4 functions
```
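
Because summary mode is plain text, a short script can rank functions by how much the optimizer changed. The following is a minimal Python sketch, assuming the line layout shown above (`[N] name: X eliminated, Y rewritten`). The regular expression and the `parse_summary` helper are illustrative and not part of the toolchain.

```python
import re

# Matches lines such as "  [0] <anonymous>:    17 eliminated, 51 rewritten".
# The layout is assumed from the sample output, not taken from a format spec.
LINE_RE = re.compile(
    r"^\s*\[(?P<idx>-?\d+)\]\s+(?P<name>.+?):\s+"
    r"(?P<elim>\d+) eliminated, (?P<rewr>\d+) rewritten\s*$"
)

def parse_summary(text):
    """Return a list of (index, name, eliminated, rewritten) tuples."""
    rows = []
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if m:  # the "total:" line and blank lines do not match and are skipped
            rows.append((int(m["idx"]), m["name"], int(m["elim"]), int(m["rewr"])))
    return rows

sample = """\
  [0] <anonymous>:                       17 eliminated, 51 rewritten
  [1] <anonymous>:                       65 eliminated, 181 rewritten
  total: 86 eliminated, 250 rewritten across 4 functions
"""

rows = parse_summary(sample)
# Sort by eliminated count, largest first.
ranked = sorted(rows, key=lambda r: r[2], reverse=True)
for idx, name, elim, rewr in ranked:
    print(f"[{idx}] {name}: {elim} eliminated, {rewr} rewritten")
```

In practice the text would come from the output of `cell diff_ir --summary <file>` rather than from an inline sample.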

## xref.ce

Cross-reference / call graph tool. Shows which functions create other functions (via `function` instructions), building a creation tree.

```bash
cell xref <file>                     # full creation tree
cell xref --callers <N> <file>       # who creates function [N]?
cell xref --callees <N> <file>       # what does [N] create/call?
cell xref --dot <file>               # DOT graph for graphviz
cell xref --optimized <file>         # use optimized IR
```

| Flag | Description |
|------|-------------|
| (none) | Indented creation tree from main |
| `--callers <N>` | Show which functions create function [N] |
| `--callees <N>` | Show what function [N] creates (use -1 for main) |
| `--dot` | Output DOT format for graphviz |
| `--optimized` | Use optimized IR instead of raw mcode |

### Output Format

Default tree view:

```
demo_disasm.cm
  [0] <anonymous>
  [1] <anonymous>
  [2] <anonymous>
```

Caller/callee query:

```
Callers of [0] <anonymous>:
  demo_disasm.cm at line 3
```

DOT output can be piped to graphviz: `cell xref --dot file.cm | dot -Tpng -o xref.png`

## cfg.ce

Control flow graph tool. Identifies basic blocks from labels and jumps, computes edges, and detects loop back-edges.

```bash
cell cfg --fn <N|name> <file>        # text CFG for function
cell cfg --dot --fn <N|name> <file>  # DOT output for graphviz
cell cfg <file>                      # text CFG for all functions
cell cfg --optimized <file>          # use optimized IR
```

| Flag | Description |
|------|-------------|
| `--fn <N\|name>` | Filter to specific function by index or name |
| `--dot` | Output DOT format for graphviz |
| `--optimized` | Use optimized IR instead of raw mcode |

### Output Format

```
=== [0] <anonymous> ===
  B0 [pc 0-2, line 4]:
    0     access         2, 1
    1     is_int         4, 1
    2     jump_false     4, "rel_ni_2"
    -> B3 "rel_ni_2" (jump)
    -> B1 (fallthrough)

  B1 [pc 3-4, line 4]:
    3     is_int         5, 2
    4     jump_false     5, "rel_ni_2"
    -> B3 "rel_ni_2" (jump)
    -> B2 (fallthrough)
```

Each block shows its ID, PC range, source lines, instructions, and outgoing edges. Loop back-edges (target PC <= source PC) are annotated.
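
The block-splitting and back-edge rule described above can be sketched in a few lines of Python. This is not the `cfg.ce` implementation. It assumes a simplified encoding in which string entries are labels (or `_nop_` markers), instructions are tuples of `(opcode, operands...)` with the jump target as the last operand, and the PC is the array index. The function name `build_cfg` is hypothetical.

```python
COND_JUMPS = {"jump_true", "jump_false", "jump_not_null"}
JUMPS = COND_JUMPS | {"jump"}

def is_label(entry):
    # String entries are labels unless they are nop markers.
    return isinstance(entry, str) and not entry.startswith("_nop_")

def build_cfg(code):
    """Return (blocks, edges): blocks are (start_pc, end_pc) ranges,
    edges are (from_block, to_block, kind) triples."""
    if not code:
        return [], []
    # A block starts at pc 0, at every label, and after every jump or return.
    leaders = {0}
    for pc, entry in enumerate(code):
        if is_label(entry):
            leaders.add(pc)
        elif not isinstance(entry, str) and entry[0] in JUMPS | {"return"}:
            if pc + 1 < len(code):
                leaders.add(pc + 1)
    starts = sorted(leaders)
    blocks = [(s, e - 1) for s, e in zip(starts, starts[1:] + [len(code)])]
    label_pc = {entry: pc for pc, entry in enumerate(code) if is_label(entry)}
    block_of = {start: i for i, (start, _) in enumerate(blocks)}

    edges = []
    for i, (start, end) in enumerate(blocks):
        last = code[end]
        op = None if isinstance(last, str) else last[0]
        if op in JUMPS:
            target = label_pc[last[-1]]
            # Back-edge rule from the text: target PC <= source PC.
            kind = "back" if target <= end else "jump"
            edges.append((i, block_of[target], kind))
        if op not in ("jump", "return") and i + 1 < len(blocks):
            edges.append((i, i + 1, "fallthrough"))
    return blocks, edges

# A small counting loop, written by hand for illustration.
code = [
    ("int", 1, 0),              # pc 0
    "loop",                     # pc 1 (label)
    ("lt_int", 3, 1, 2),        # pc 2
    ("jump_false", 3, "done"),  # pc 3
    ("add_int", 1, 1, 4),       # pc 4
    ("jump", "loop"),           # pc 5
    "done",                     # pc 6 (label)
    ("return", 1),              # pc 7
]
blocks, edges = build_cfg(code)
print(blocks)
print(edges)
```

The unconditional `jump` at pc 5 targets the label at pc 1, so its edge is classified as a back-edge, which is how the loop is detected.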

## slots.ce

Slot data flow analysis. Builds use-def chains for every slot in a function, showing where each slot is defined and used. Optionally captures type information from streamline.

```bash
cell slots --fn <N|name> <file>              # slot summary for function
cell slots --slot <N> --fn <N|name> <file>   # trace slot N
cell slots <file>                            # slot summary for all functions
```

| Flag | Description |
|------|-------------|
| `--fn <N\|name>` | Filter to specific function by index or name |
| `--slot <N>` | Show chronological DEF/USE trace for a specific slot |

### Output Format

Summary shows each slot with its def count, use count, inferred type, and first definition. Dead slots (defined but never used) are flagged:

```
=== [0] <anonymous> (args=1, slots=40) ===
  slot    defs    uses    type        first-def
  s0      0       0       -           (this)
  s1      0       10      -           (arg 0)
  s2      1       6       -           pc 0: access
  s10     1       0       -           pc 29: invoke  <- dead
```

Slot trace (`--slot N`) shows every DEF and USE in program order:

```
=== slot 3 in [0] <anonymous> ===
  DEF  pc 5:     le_int         3, 1, 2                       :4
  DEF  pc 11:    le_float       3, 1, 2                       :4
  DEF  pc 17:    le_text        3, 1, 2                       :4
  USE  pc 31:    jump_false     3, "if_else_0"                :4
```
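
To make the idea of a use-def chain concrete, here is a minimal Python sketch that records, for each slot, the PCs where it is defined and where it is used. It is not how `slots.ce` is implemented. The `ROLES` table is an assumption made for illustration: it covers only a handful of opcodes and guesses that the first operand of most instructions is the destination, as the traces above suggest.

```python
from collections import defaultdict

# Operand roles per opcode: "d" = slot defined, "u" = slot used,
# "-" = not a slot (literal or label). Hypothetical and deliberately partial.
ROLES = {
    "access":     ("d", "-"),
    "move":       ("d", "u"),
    "le_int":     ("d", "u", "u"),
    "add_int":    ("d", "u", "u"),
    "jump_false": ("u", "-"),
    "return":     ("u",),
}

def use_def_chains(code):
    """Map slot number -> {"defs": [pc, ...], "uses": [pc, ...]}."""
    chains = defaultdict(lambda: {"defs": [], "uses": []})
    for pc, entry in enumerate(code):
        if isinstance(entry, str):  # labels and nops touch no slots
            continue
        op, operands = entry[0], entry[1:]
        for role, operand in zip(ROLES.get(op, ()), operands):
            if role == "d":
                chains[operand]["defs"].append(pc)
            elif role == "u":
                chains[operand]["uses"].append(pc)
    return dict(chains)

def dead_slots(chains):
    """Slots that are defined at least once but never used."""
    return sorted(s for s, c in chains.items() if c["defs"] and not c["uses"])

code = [
    ("access", 2, 1),           # pc 0: s2 = literal 1
    ("le_int", 3, 1, 2),        # pc 1: s3 = s1 <= s2
    ("jump_false", 3, "else"),  # pc 2: reads s3
    ("add_int", 4, 1, 2),       # pc 3: s4 is written but never read
    "else",                     # pc 4: label
    ("return", 1),              # pc 5: reads s1
]
chains = use_def_chains(code)
print(chains[3])
print(dead_slots(chains))
```

Slot 1 has uses but no definitions here because it stands in for an argument, matching the `(arg 0)` row in the summary above.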

## seed.ce

Regenerates the boot seed files in `boot/`. These are pre-compiled mcode IR (JSON) files that bootstrap the compilation pipeline on cold start.

```bash
cell seed                # regenerate all boot seeds
cell seed --clean        # also clear the build cache after
```

The script compiles each pipeline module (tokenize, parse, fold, mcode, streamline) and `internal/bootstrap.cm` through the current pipeline, encodes the output as JSON, and writes it to `boot/<name>.cm.mcode`.

**When to regenerate seeds:**
- Before a release or distribution
- When the pipeline source changes in a way that prevents the existing seeds from compiling it (e.g. language-level changes)

Seeds do NOT need regenerating for normal development: the engine recompiles pipeline modules from source automatically via the content-addressed cache.

## ir_report.ce

The optimizer flight recorder. Runs the full pipeline with structured logging and outputs machine-readable, diff-friendly JSON. This is the most detailed tool for understanding what the optimizer did and why.

```bash
cell ir_report [options] <file.ce|file.cm>
```

### Options

| Flag | Description |
|------|-------------|
| `--summary` | Per-pass JSON summaries with instruction counts and timing (default) |
| `--events` | Include rewrite events showing each optimization applied |
| `--types` | Include type delta records showing inferred slot types |
| `--ir-before=PASS` | Print canonical IR before a specific pass |
| `--ir-after=PASS` | Print canonical IR after a specific pass |
| `--ir-all` | Print canonical IR before and after all passes |
| `--full` | Everything: summary + events + types + ir-all |

With no flags, `--summary` is the default.

### Output Format

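Output is line-delimited JSON. Each line is a self-contained JSON object with a `type` field. The examples below are wrapped across several lines for readability, and `...` marks content omitted for brevity.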

**`type: "pass"`** — Per-pass summary with categorized instruction counts before and after:

```json
{
  "type": "pass",
  "pass": "eliminate_type_checks",
  "fn": "fib",
  "ms": 0.12,
  "changed": true,
  "before": {"instr": 77, "nop": 0, "guard": 16, "branch": 28, ...},
  "after":  {"instr": 77, "nop": 1, "guard": 15, "branch": 28, ...},
  "changes": {"guards_removed": 1, "nops_added": 1}
}
```

**`type: "event"`** — Individual rewrite event with before/after instructions and reasoning:

```json
{
  "type": "event",
  "pass": "eliminate_type_checks",
  "rule": "incompatible_type_forces_jump",
  "at": 3,
  "before": [["is_int", 5, 2, 4, 9], ["jump_false", 5, "rel_ni_2", 4, 9]],
  "after": ["_nop_tc_1", ["jump", "rel_ni_2", 4, 9]],
  "why": {"slot": 2, "known_type": "float", "checked_type": "int"}
}
```

**`type: "types"`** — Inferred type information for a function:

```json
{
  "type": "types",
  "fn": "fib",
  "param_types": {},
  "slot_types": {"25": "null"}
}
```

**`type: "ir"`** — Canonical IR text for a function at a specific point:

```json
{
  "type": "ir",
  "when": "before",
  "pass": "all",
  "fn": "fib",
  "text": "fn fib (args=1, slots=26)\n  @0    access          s2, 2\n  ..."
}
```
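
Since every line is an independent JSON object, a report can be consumed with any JSON library as well as with `jq`. The sketch below is one possible way to do this in Python: it groups records by `type` and computes which instruction categories changed in each pass. The helper names `load_report` and `count_deltas` are hypothetical, and the sample records are the examples above with the elided fields left out.

```python
import json
from collections import defaultdict

def load_report(lines):
    """Group ir_report records by their "type" field."""
    by_type = defaultdict(list)
    for line in lines:
        line = line.strip()
        if line:  # skip blank lines
            record = json.loads(line)
            by_type[record["type"]].append(record)
    return by_type

def count_deltas(pass_record):
    """Return {category: after - before} for categories whose count changed."""
    after = pass_record["after"]
    return {
        cat: after.get(cat, 0) - n
        for cat, n in pass_record["before"].items()
        if after.get(cat, 0) != n
    }

# Sample records mirroring the documented examples, one JSON object per line.
sample = [json.dumps(r) for r in [
    {"type": "pass", "pass": "eliminate_type_checks", "fn": "fib",
     "ms": 0.12, "changed": True,
     "before": {"instr": 77, "nop": 0, "guard": 16, "branch": 28},
     "after": {"instr": 77, "nop": 1, "guard": 15, "branch": 28},
     "changes": {"guards_removed": 1, "nops_added": 1}},
    {"type": "event", "pass": "eliminate_type_checks",
     "rule": "incompatible_type_forces_jump", "at": 3,
     "before": [["is_int", 5, 2, 4, 9], ["jump_false", 5, "rel_ni_2", 4, 9]],
     "after": ["_nop_tc_1", ["jump", "rel_ni_2", 4, 9]],
     "why": {"slot": 2, "known_type": "float", "checked_type": "int"}},
    {"type": "types", "fn": "fib", "param_types": {},
     "slot_types": {"25": "null"}},
]]

report = load_report(sample)
for p in report["pass"]:
    print(p["pass"], p["fn"], count_deltas(p))
print([e["rule"] for e in report["event"]])
```

With a real report, `load_report` can be passed an open file, since iterating a file yields one line at a time.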

### Rewrite Rules

Each pass records events with named rules:

**eliminate_type_checks:**
- `known_type_eliminates_guard` — type already known, guard removed
- `incompatible_type_forces_jump` — type conflicts, conditional jump becomes unconditional
- `num_subsumes_int_float` — num check satisfied by int or float
- `dynamic_to_field` — load_dynamic/store_dynamic narrowed to field access
- `dynamic_to_index` — load_dynamic/store_dynamic narrowed to index access

**simplify_algebra:**
- `add_zero`, `sub_zero`, `mul_one`, `div_one` — identity operations become moves
- `mul_zero` — multiplication by zero becomes constant
- `self_eq`, `self_ne` — same-slot comparisons become constants

**simplify_booleans:**
- `not_jump_false_fusion` — not + jump_false fused into jump_true
- `not_jump_true_fusion` — not + jump_true fused into jump_false
- `double_not` — not + not collapsed to move

**eliminate_moves:**
- `self_move` — move to same slot becomes nop

**eliminate_dead_jumps:**
- `jump_to_next` — jump to immediately following label becomes nop
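
As an illustration of how a rule and its event relate, the Python sketch below applies a simplified version of `jump_to_next` to an instruction array. It is a sketch under assumptions, not the optimizer's code: instructions are lists of the form shown in the event example (opcode, operands, then two trailing numbers assumed to be source position), the nop name `_nop_dj_N` is extrapolated from `_nop_tc_1`, and only a label directly adjacent to the jump is considered.

```python
def eliminate_dead_jumps(code):
    """Replace each `jump` whose target label is the very next entry with a nop.

    Returns (new_code, events). Entries are replaced in place, never inserted
    or deleted, so array indices stay stable.
    """
    out = list(code)
    events = []
    nop_count = 0
    for pc in range(len(out) - 1):
        entry, following = out[pc], out[pc + 1]
        if isinstance(entry, str) or entry[0] != "jump":
            continue
        if following == entry[1]:  # the next entry is the target label
            nop_count += 1
            nop = f"_nop_dj_{nop_count}"  # assumed naming scheme
            events.append({
                "type": "event",
                "pass": "eliminate_dead_jumps",
                "rule": "jump_to_next",
                "at": pc,
                "before": [list(entry)],
                "after": [nop],
            })
            out[pc] = nop
    return out, events

code = [
    ["is_int", 4, 1, 4, 9],   # index 0
    ["jump", "done", 4, 9],   # index 1: target is the next entry
    "done",                   # index 2 (label)
    ["jump", "exit", 5, 1],   # index 3: target is further away, kept
    ["move", 2, 1, 6, 1],     # index 4
    "exit",                   # index 5 (label)
]
new_code, events = eliminate_dead_jumps(code)
print(new_code[1], len(new_code) == len(code))
print(events[0]["rule"], events[0]["at"])
```

The array keeps its length because the jump is overwritten rather than removed.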

### Canonical IR Format

The `--ir-all`, `--ir-before`, and `--ir-after` flags produce a deterministic text representation of the IR:

```
fn fib (args=1, slots=26)
  @0    access          s2, 2
  @1    is_int          s4, s1                          ; [guard]
  @2    jump_false      s4, "rel_ni_2"                  ; [branch]
  @3    --- nop (tc) ---
  @4    jump            "rel_ni_2"                      ; [branch]
  @5    lt_int          s3, s1, s2
  @6    jump            "rel_done_4"                    ; [branch]
      rel_ni_2:
  @8    is_num          s4, s1                          ; [guard]
```

Properties:
- `@N` is the raw array index, stable across passes (passes replace, never insert or delete)
- `sN` prefix distinguishes slot operands from literal values
- String operands are quoted
- Labels appear as indented headers with a colon
- Category tags in brackets: `[guard]`, `[branch]`, `[load]`, `[store]`, `[call]`, `[arith]`, `[move]`, `[const]`
- Nops shown as `--- nop (reason) ---` with reason codes: `tc` (type check), `bl` (boolean), `mv` (move), `dj` (dead jump), `ur` (unreachable)
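
Because the format is deterministic, the text can be scanned with simple patterns. The following Python sketch tallies category tags and nop reason codes in a canonical IR listing. The regular expressions are assumptions derived from the sample above rather than a grammar published by the tool, and `tally` is a hypothetical helper.

```python
import re
from collections import Counter

# "@N", then the instruction body, then an optional "; [tag]" suffix.
INSTR_RE = re.compile(r"^\s*@(\d+)\s+(.*?)\s*(?:;\s*\[(\w+)\])?\s*$")
NOP_RE = re.compile(r"^--- nop \((\w+)\) ---$")

def tally(text):
    """Count instructions per category tag, plus nops per reason code."""
    counts = Counter()
    for line in text.splitlines():
        m = INSTR_RE.match(line)
        if not m:
            continue  # function header or label line
        body, tag = m.group(2), m.group(3)
        nop = NOP_RE.match(body)
        if nop:
            counts["nop:" + nop.group(1)] += 1
        elif tag:
            counts[tag] += 1
        else:
            counts["untagged"] += 1
    return counts

sample = '''fn fib (args=1, slots=26)
  @0    access          s2, 2
  @1    is_int          s4, s1                          ; [guard]
  @2    jump_false      s4, "rel_ni_2"                  ; [branch]
  @3    --- nop (tc) ---
  @4    jump            "rel_ni_2"                      ; [branch]
  @5    lt_int          s3, s1, s2
  @6    jump            "rel_done_4"                    ; [branch]
      rel_ni_2:
  @8    is_num          s4, s1                          ; [guard]
'''

counts = tally(sample)
for key, n in sorted(counts.items()):
    print(key, n)
```

The input would normally be the `text` field of a `type: "ir"` record.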

### Examples

```bash
# what passes changed something?
cell ir_report --summary myfile.ce | jq 'select(.changed)'

# list all rewrite rules that fired
cell ir_report --events myfile.ce | jq 'select(.type == "event") | .rule'

# print canonical IR text before and after optimization
cell ir_report --ir-all myfile.ce | jq -r 'select(.type == "ir") | .text'

# full report for analysis
cell ir_report --full myfile.ce > report.json
```

## ir_stats.cm

A utility module used by `ir_report.ce` and available for custom tooling. Not a standalone tool.

```javascript
var ir_stats = use("ir_stats")

ir_stats.detailed_stats(func)                    // categorized instruction counts
ir_stats.ir_fingerprint(func)                    // djb2 hash of instruction array
ir_stats.canonical_ir(func, name, opts)          // deterministic text representation
ir_stats.type_snapshot(slot_types)               // frozen copy of type map
ir_stats.type_delta(before_types, after_types)   // compute type changes
ir_stats.category_tag(op)                        // classify an opcode
```

### Instruction Categories

`detailed_stats` classifies each instruction into one of these categories:

| Category | Opcodes |
|----------|---------|
| load     | `load_field`, `load_index`, `load_dynamic`, `get`, `access` (non-constant) |
| store    | `store_field`, `store_index`, `store_dynamic`, `set_var`, `put`, `push` |
| branch   | `jump`, `jump_true`, `jump_false`, `jump_not_null` |
| call     | `invoke`, `goinvoke` |
| guard    | `is_int`, `is_text`, `is_num`, `is_bool`, `is_null`, `is_array`, `is_func`, `is_record`, `is_stone` |
| arith    | `add_int`, `sub_int`, ..., `add_float`, ..., `concat`, `neg_int`, `neg_float`, bitwise ops |
| move     | `move` |
| const    | `int`, `true`, `false`, `null`, `access` (with constant value) |
| label    | string entries that are not nops |
| nop      | strings starting with `_nop_` |
| other    | everything else (`frame`, `setarg`, `array`, `record`, `function`, `return`, etc.) |
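
The table can be read as a lookup from opcode to category. The Python sketch below mirrors it for illustration only; it is not the `ir_stats.cm` implementation. The `arith` set contains just the opcodes named explicitly in the table (the elided ones are left out), and the rule used to split `access` into `const` and `load` is an assumption: a numeric operand is treated as a constant, anything else as a load.

```python
from collections import Counter

CATEGORIES = {
    "load":   {"load_field", "load_index", "load_dynamic", "get"},
    "store":  {"store_field", "store_index", "store_dynamic",
               "set_var", "put", "push"},
    "branch": {"jump", "jump_true", "jump_false", "jump_not_null"},
    "call":   {"invoke", "goinvoke"},
    "guard":  {"is_int", "is_text", "is_num", "is_bool", "is_null",
               "is_array", "is_func", "is_record", "is_stone"},
    # Partial: only the arithmetic opcodes the table names explicitly.
    "arith":  {"add_int", "sub_int", "add_float", "concat",
               "neg_int", "neg_float"},
    "move":   {"move"},
    "const":  {"int", "true", "false", "null"},
}
OP_TO_CATEGORY = {op: cat for cat, ops in CATEGORIES.items() for op in ops}

def access_is_constant(instr):
    # Assumption: a numeric second operand means a constant value.
    value = instr[2]
    return isinstance(value, (int, float)) and not isinstance(value, bool)

def categorize(entry):
    """Classify one instruction-array entry into a category name."""
    if isinstance(entry, str):
        return "nop" if entry.startswith("_nop_") else "label"
    op = entry[0]
    if op == "access":
        return "const" if access_is_constant(entry) else "load"
    return OP_TO_CATEGORY.get(op, "other")

code = [
    ["access", 2, "compute", 235, 3],
    ["access", 5, 0, 236, 3],
    ["is_int", 4, 1, 4, 9],
    ["jump_false", 4, "rel_ni_2", 4, 9],
    "_nop_tc_1",
    "rel_ni_2",
    ["frame", 6, 2, 1, 5, 1],
]
stats = Counter(categorize(entry) for entry in code)
print([categorize(entry) for entry in code])
```

For real counts, `ir_stats.detailed_stats(func)` remains the authoritative source, since it knows the full opcode list.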