---
title: "Compiler Inspection Tools"
description: "Tools for inspecting and debugging the compiler pipeline"
weight: 50
type: "docs"
---

ƿit includes a set of tools for inspecting the compiler pipeline at every stage. These are useful for debugging, testing optimizations, and understanding what the compiler does with your code.

## Pipeline Overview

The compiler runs in stages:

```
source → tokenize → parse → fold → mcode → streamline → output
```

Each stage has a corresponding CLI tool that lets you see its output.

| Stage       | Tool                      | What it shows                          |
|-------------|---------------------------|----------------------------------------|
| tokenize    | `tokenize.ce`             | Token stream as JSON                   |
| parse       | `parse.ce`                | Unfolded AST as JSON                   |
| fold        | `fold.ce`                 | Folded AST as JSON                     |
| mcode       | `mcode.ce`                | Raw mcode IR as JSON                   |
| mcode       | `mcode.ce --pretty`       | Human-readable mcode IR                |
| streamline  | `streamline.ce`           | Full optimized IR as JSON              |
| streamline  | `streamline.ce --types`   | Optimized IR with type annotations     |
| streamline  | `streamline.ce --stats`   | Per-function summary stats             |
| streamline  | `streamline.ce --ir`      | Human-readable canonical IR            |
| all         | `ir_report.ce`            | Structured optimizer flight recorder   |

Each tool in the table takes a source file as input and runs the pipeline up to the relevant stage.
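To compare stages side by side, it can help to snapshot every stage's output in one go. The following is a minimal Python sketch (not ƿit), assuming `cell` is on `PATH` and that `tokenize` and `parse` are invoked the same way as the `cell fold`, `cell mcode`, and `cell streamline` commands documented on this page. The helper names `stage_command` and `snapshot` are made up for illustration.

```python
import subprocess
from pathlib import Path

# Stage names in pipeline order, taken from the table above.
STAGES = ["tokenize", "parse", "fold", "mcode", "streamline"]


def stage_command(stage, source, flags=()):
    """Build the argv for one stage tool, e.g. ['cell', 'mcode', '--pretty', 'myfile.ce'].

    Assumption: every stage follows the `cell <stage> [flags] <file>` pattern.
    """
    if stage not in STAGES:
        raise ValueError(f"unknown stage: {stage}")
    return ["cell", stage, *flags, str(source)]


def snapshot(source, out_dir):
    """Run every stage on `source` and save each stage's stdout to out_dir/<stage>.json."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for stage in STAGES:
        result = subprocess.run(stage_command(stage, source),
                                capture_output=True, text=True, check=True)
        (out / f"{stage}.json").write_text(result.stdout)


# Example (requires the `cell` binary):
#   snapshot("myfile.ce", "snapshots/")
print(stage_command("mcode", "myfile.ce", ["--pretty"]))
```

Saving one file per stage makes it easy to run an ordinary `diff` between two snapshots taken before and after a compiler change.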

## Quick Start

```bash
# see raw mcode IR (pretty-printed)
cell mcode --pretty myfile.ce

# see optimized IR with type annotations
cell streamline --types myfile.ce

# full optimizer report with events
cell ir_report --full myfile.ce
```

## fold.ce

Prints the folded AST as JSON. This is the output of the parser and constant folder, before mcode generation.

```bash
cell fold <file.ce|file.cm>
```

## mcode.ce

Prints mcode IR. The default output is JSON; use `--pretty` for a human-readable listing with opcodes, operands, and program counter.

```bash
cell mcode <file.ce|file.cm>            # JSON (default)
cell mcode --pretty <file.ce|file.cm>   # human-readable IR
```

## streamline.ce

Runs the full pipeline (tokenize, parse, fold, mcode, streamline) and outputs the optimized IR as JSON. Useful for piping to `jq` or saving for comparison.

```bash
cell streamline <file.ce|file.cm>                # full JSON (default)
cell streamline --stats <file.ce|file.cm>        # summary stats per function
cell streamline --ir <file.ce|file.cm>           # human-readable IR
cell streamline --check <file.ce|file.cm>        # warnings only
cell streamline --types <file.ce|file.cm>        # IR with type annotations
```

| Flag | Description |
|------|-------------|
| (none) | Full optimized IR as JSON (backward compatible) |
| `--stats` | Per-function summary: args, slots, instruction counts by category, nops eliminated |
| `--ir` | Human-readable canonical IR (same format as `ir_report.ce`) |
| `--check` | Warnings only (e.g. `nr_slots > 200`, which approaches the 255-slot limit) |
| `--types` | Optimized IR with inferred type annotations per slot |

Flags can be combined.

## seed.ce

Regenerates the boot seed files in `boot/`. These are pre-compiled mcode IR (JSON) files that bootstrap the compilation pipeline on cold start.

```bash
cell seed                # regenerate all boot seeds
cell seed --clean        # also clear the build cache after
```

The script compiles each pipeline module (tokenize, parse, fold, mcode, streamline) and `internal/bootstrap.cm` through the current pipeline, encodes the output as JSON, and writes it to `boot/<name>.cm.mcode`.

**When to regenerate seeds:**
- Before a release or distribution
- When the pipeline source changes in a way that the existing seeds cannot compile (e.g. language-level changes)

Seeds do not need regenerating during normal development: the engine recompiles pipeline modules from source automatically via the content-addressed cache.

## ir_report.ce

The optimizer flight recorder. Runs the full pipeline with structured logging and outputs machine-readable, diff-friendly JSON. This is the most detailed tool for understanding what the optimizer did and why.

```bash
cell ir_report [options] <file.ce|file.cm>
```

### Options

| Flag | Description |
|------|-------------|
| `--summary` | Per-pass JSON summaries with instruction counts and timing (default) |
| `--events` | Include rewrite events showing each optimization applied |
| `--types` | Include type delta records showing inferred slot types |
| `--ir-before=PASS` | Print canonical IR before a specific pass |
| `--ir-after=PASS` | Print canonical IR after a specific pass |
| `--ir-all` | Print canonical IR before and after all passes |
| `--full` | Everything: summary + events + types + ir-all |

With no flags, `--summary` is the default.

### Output Format

Output is line-delimited JSON: each line is a self-contained JSON object with a `type` field. The examples below are pretty-printed for readability, and `...` marks elided fields.

**`type: "pass"`** — Per-pass summary with categorized instruction counts before and after:

```json
{
  "type": "pass",
  "pass": "eliminate_type_checks",
  "fn": "fib",
  "ms": 0.12,
  "changed": true,
  "before": {"instr": 77, "nop": 0, "guard": 16, "branch": 28, ...},
  "after":  {"instr": 77, "nop": 1, "guard": 15, "branch": 28, ...},
  "changes": {"guards_removed": 1, "nops_added": 1}
}
```

**`type: "event"`** — Individual rewrite event with before/after instructions and reasoning:

```json
{
  "type": "event",
  "pass": "eliminate_type_checks",
  "rule": "incompatible_type_forces_jump",
  "at": 3,
  "before": [["is_int", 5, 2, 4, 9], ["jump_false", 5, "rel_ni_2", 4, 9]],
  "after": ["_nop_tc_1", ["jump", "rel_ni_2", 4, 9]],
  "why": {"slot": 2, "known_type": "float", "checked_type": "int"}
}
```

**`type: "types"`** — Inferred type information for a function:

```json
{
  "type": "types",
  "fn": "fib",
  "param_types": {},
  "slot_types": {"25": "null"}
}
```

**`type: "ir"`** — Canonical IR text for a function at a specific point:

```json
{
  "type": "ir",
  "when": "before",
  "pass": "all",
  "fn": "fib",
  "text": "fn fib (args=1, slots=26)\n  @0    access          s2, 2\n  ..."
}
```
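Because every record is a single JSON object on its own line, a report can be consumed with a few lines of scripting. The sketch below is Python and is only one possible way to read the output; the helper names `read_records` and `count_deltas` are hypothetical, and the sample input is the `pass` record shown above with the elided counts left out.

```python
import json
from collections import defaultdict


def read_records(lines):
    """Parse line-delimited JSON and group the records by their `type` field."""
    by_type = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        record = json.loads(line)
        by_type[record["type"]].append(record)
    return by_type


def count_deltas(pass_record):
    """Return {category: after - before} for categories whose count changed."""
    before, after = pass_record["before"], pass_record["after"]
    keys = set(before) | set(after)
    deltas = {k: after.get(k, 0) - before.get(k, 0) for k in keys}
    return {k: d for k, d in sorted(deltas.items()) if d != 0}


# Sample input: one record per line, as the tool is described to emit them.
sample = [
    '{"type": "pass", "pass": "eliminate_type_checks", "fn": "fib", "ms": 0.12, '
    '"changed": true, "before": {"instr": 77, "nop": 0, "guard": 16, "branch": 28}, '
    '"after": {"instr": 77, "nop": 1, "guard": 15, "branch": 28}}',
    '{"type": "types", "fn": "fib", "param_types": {}, "slot_types": {"25": "null"}}',
]

records = read_records(sample)
for rec in records["pass"]:
    print(rec["pass"], rec["fn"], count_deltas(rec))
# prints: eliminate_type_checks fib {'guard': -1, 'nop': 1}
```

For real reports, pass an open file or `sys.stdin` to `read_records` instead of the sample list.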

### Rewrite Rules

Each pass records events with named rules:

**eliminate_type_checks:**
- `known_type_eliminates_guard` — type already known, guard removed
- `incompatible_type_forces_jump` — type conflicts, conditional jump becomes unconditional
- `num_subsumes_int_float` — num check satisfied by int or float
- `dynamic_to_field` — load_dynamic/store_dynamic narrowed to field access
- `dynamic_to_index` — load_dynamic/store_dynamic narrowed to index access

**simplify_algebra:**
- `add_zero`, `sub_zero`, `mul_one`, `div_one` — identity operations become moves
- `mul_zero` — multiplication by zero becomes constant
- `self_eq`, `self_ne` — same-slot comparisons become constants

**simplify_booleans:**
- `not_jump_false_fusion` — not + jump_false fused into jump_true
- `not_jump_true_fusion` — not + jump_true fused into jump_false
- `double_not` — not + not collapsed to move

**eliminate_moves:**
- `self_move` — move to same slot becomes nop

**eliminate_dead_jumps:**
- `jump_to_next` — jump to immediately following label becomes nop
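
A quick way to see which of these rules matter for a given file is to tally `event` records by pass and rule. The following Python sketch assumes one JSON record per line, as described under Output Format. The `rule_counts` helper is hypothetical, and the sample events are hand-written with only the fields needed for counting.

```python
import json
from collections import Counter


def rule_counts(lines):
    """Count how often each (pass, rule) pair appears among "event" records."""
    counts = Counter()
    for line in lines:
        if not line.strip():
            continue
        record = json.loads(line)
        if record.get("type") == "event":
            counts[(record["pass"], record["rule"])] += 1
    return counts


# Hand-written sample using rule names from the lists above (not real tool output).
sample = [
    '{"type": "event", "pass": "eliminate_type_checks", "rule": "known_type_eliminates_guard", "at": 1}',
    '{"type": "event", "pass": "eliminate_type_checks", "rule": "known_type_eliminates_guard", "at": 8}',
    '{"type": "event", "pass": "simplify_algebra", "rule": "add_zero", "at": 12}',
    '{"type": "pass", "pass": "simplify_algebra", "fn": "fib", "changed": true}',
]

for (pass_name, rule), n in rule_counts(sample).most_common():
    print(f"{n:3d}  {pass_name}.{rule}")
```

Records of other types are skipped, so the same function works on `--events` and `--full` output alike.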

### Canonical IR Format

The `--ir-all`, `--ir-before`, and `--ir-after` flags produce a deterministic text representation of the IR:

```
fn fib (args=1, slots=26)
  @0    access          s2, 2
  @1    is_int          s4, s1                          ; [guard]
  @2    jump_false      s4, "rel_ni_2"                  ; [branch]
  @3    --- nop (tc) ---
  @4    jump            "rel_ni_2"                      ; [branch]
  @5    lt_int          s3, s1, s2
  @6    jump            "rel_done_4"                    ; [branch]
      rel_ni_2:
  @8    is_num          s4, s1                          ; [guard]
```

Properties:
- `@N` is the raw array index, which is stable across passes (passes replace instructions in place and never insert or delete them)
- `sN` prefix distinguishes slot operands from literal values
- String operands are quoted
- Labels appear as indented headers with a colon
- Category tags in brackets: `[guard]`, `[branch]`, `[load]`, `[store]`, `[call]`, `[arith]`, `[move]`, `[const]`
- Nops shown as `--- nop (reason) ---` with reason codes: `tc` (type check), `bl` (boolean), `mv` (move), `dj` (dead jump), `ur` (unreachable)
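
Since the format is deterministic, it is straightforward to parse for custom tooling. The Python sketch below is inferred only from the sample listing and the properties above, so it is an approximation rather than the format's official grammar. The `parse_line` helper is hypothetical, and operands are split naively on commas.

```python
import re

NOP_RE = re.compile(r'^\s*@(\d+)\s+--- nop \((\w+)\) ---\s*$')
INSTR_RE = re.compile(r'^\s*@(\d+)\s+(\S+)\s*(.*?)\s*(?:;\s*\[(\w+)\])?\s*$')
LABEL_RE = re.compile(r'^\s*(\S+):\s*$')


def parse_line(line):
    """Classify one line of canonical IR text and pull out its fields."""
    m = NOP_RE.match(line)
    if m:
        return {"kind": "nop", "index": int(m.group(1)), "reason": m.group(2)}
    m = INSTR_RE.match(line)
    if m:
        index, op, operands, tag = m.groups()
        # Naive split: a quoted string operand containing a comma would be split too.
        ops = [o.strip() for o in operands.split(",")] if operands else []
        return {"kind": "instr", "index": int(index), "op": op,
                "operands": ops, "tag": tag}
    m = LABEL_RE.match(line)
    if m:
        return {"kind": "label", "name": m.group(1)}
    return {"kind": "other", "text": line.strip()}  # e.g. the "fn ..." header


listing = '''fn fib (args=1, slots=26)
  @1    is_int          s4, s1                          ; [guard]
  @3    --- nop (tc) ---
      rel_ni_2:
'''
for text_line in listing.splitlines():
    print(parse_line(text_line))
```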

### Examples

```bash
# what passes changed something?
cell ir_report --summary myfile.ce | jq 'select(.changed)'

# list all rewrite rules that fired
cell ir_report --events myfile.ce | jq 'select(.type == "event") | .rule'

# print canonical IR text before and after all passes (input for a diff)
cell ir_report --ir-all myfile.ce | jq -r 'select(.type == "ir") | .text'

# full report for analysis
cell ir_report --full myfile.ce > report.json
```
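
For an actual before/after diff, the `ir` records can be fed to a standard diff routine. The Python sketch below uses `difflib` from the standard library. It assumes that the `when` field takes the values `"before"` and `"after"` (only `"before"` appears in the sample record on this page), and the `ir_diff` helper and sample IR text are made up for illustration.

```python
import difflib
import json


def ir_diff(lines, fn):
    """Unified diff between the "before" and "after" IR text of function `fn`."""
    text = {}
    for line in lines:
        if not line.strip():
            continue
        rec = json.loads(line)
        if rec.get("type") == "ir" and rec.get("fn") == fn:
            # Keep the first "before" and the last "after" seen for this function.
            if rec["when"] == "before":
                text.setdefault("before", rec["text"])
            elif rec["when"] == "after":  # assumed value, see lead-in
                text["after"] = rec["text"]
    return list(difflib.unified_diff(
        text.get("before", "").splitlines(),
        text.get("after", "").splitlines(),
        fromfile=f"{fn} (before)", tofile=f"{fn} (after)", lineterm=""))


# Hand-written sample records (not real tool output).
before = "fn f (args=1, slots=3)\n  @0    move            s2, s2                          ; [move]"
after = "fn f (args=1, slots=3)\n  @0    --- nop (mv) ---"
sample = [
    json.dumps({"type": "ir", "when": "before", "pass": "all", "fn": "f", "text": before}),
    json.dumps({"type": "ir", "when": "after", "pass": "all", "fn": "f", "text": after}),
]
print("\n".join(ir_diff(sample, "f")))
```

Because `@N` indices are stable across passes, changed instructions line up in the diff instead of shifting everything after them.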

## ir_stats.cm

A utility module used by `ir_report.ce` and available for custom tooling. Not a standalone tool.

```javascript
var ir_stats = use("ir_stats")

ir_stats.detailed_stats(func)                    // categorized instruction counts
ir_stats.ir_fingerprint(func)                    // djb2 hash of instruction array
ir_stats.canonical_ir(func, name, opts)          // deterministic text representation
ir_stats.type_snapshot(slot_types)               // frozen copy of type map
ir_stats.type_delta(before_types, after_types)   // compute type changes
ir_stats.category_tag(op)                        // classify an opcode
```
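
For readers unfamiliar with djb2, the hash itself is simple: start at 5381 and fold in each byte as `hash * 33 + byte`. The Python sketch below illustrates the idea only. How `ir_fingerprint` actually serializes the instruction array and what width it truncates to are not documented here, so the compact JSON serialization and the 32-bit mask are assumptions.

```python
import json


def djb2(text):
    """Classic djb2 string hash (hash * 33 + byte), truncated to 32 bits."""
    h = 5381
    for byte in text.encode("utf-8"):
        h = (h * 33 + byte) & 0xFFFFFFFF  # 32-bit truncation is an assumption
    return h


def fingerprint(instructions):
    """Hash a compact JSON serialization of the instruction array (assumed encoding)."""
    return djb2(json.dumps(instructions, separators=(",", ":")))


# Instruction arrays borrowed from the "event" record shown earlier on this page.
a = [["is_int", 5, 2, 4, 9], ["jump_false", 5, "rel_ni_2", 4, 9]]
b = ["_nop_tc_1", ["jump", "rel_ni_2", 4, 9]]
print(fingerprint(a) == fingerprint(a))  # identical input gives an identical fingerprint
print(fingerprint(a) == fingerprint(b))
```

A cheap fingerprint like this is useful for checking whether a pass changed a function at all without comparing the instruction arrays element by element.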

### Instruction Categories

`detailed_stats` classifies each instruction into one of these categories:

| Category | Opcodes |
|----------|---------|
| load     | `load_field`, `load_index`, `load_dynamic`, `get`, `access` (non-constant) |
| store    | `store_field`, `store_index`, `store_dynamic`, `set_var`, `put`, `push` |
| branch   | `jump`, `jump_true`, `jump_false`, `jump_not_null` |
| call     | `invoke`, `goinvoke` |
| guard    | `is_int`, `is_text`, `is_num`, `is_bool`, `is_null`, `is_array`, `is_func`, `is_record`, `is_stone` |
| arith    | `add_int`, `sub_int`, ..., `add_float`, ..., `concat`, `neg_int`, `neg_float`, bitwise ops |
| move     | `move` |
| const    | `int`, `true`, `false`, `null`, `access` (with constant value) |
| label    | string entries that are not nops |
| nop      | strings starting with `_nop_` |
| other    | everything else (`frame`, `setarg`, `array`, `record`, `function`, `return`, etc.) |
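
The table can be read as a lookup from opcode to category. The Python sketch below mirrors it for illustration and is not the `ir_stats` implementation. The `arith` set contains only the opcodes the table names explicitly, so the elided arithmetic and bitwise opcodes fall through to `other` here. Whether an `access` is constant must be supplied by the caller through the hypothetical `access_is_const` parameter.

```python
from collections import Counter

# Opcode sets copied from the table above (arith is deliberately incomplete).
CATEGORIES = {
    "load": {"load_field", "load_index", "load_dynamic", "get"},
    "store": {"store_field", "store_index", "store_dynamic", "set_var", "put", "push"},
    "branch": {"jump", "jump_true", "jump_false", "jump_not_null"},
    "call": {"invoke", "goinvoke"},
    "guard": {"is_int", "is_text", "is_num", "is_bool", "is_null",
              "is_array", "is_func", "is_record", "is_stone"},
    "arith": {"add_int", "sub_int", "add_float", "concat", "neg_int", "neg_float"},
    "move": {"move"},
    "const": {"int", "true", "false", "null"},
}


def categorize(instr, access_is_const=False):
    """Return the category of one instruction (a string entry or an opcode array)."""
    if isinstance(instr, str):
        return "nop" if instr.startswith("_nop_") else "label"
    op = instr[0]
    if op == "access":
        return "const" if access_is_const else "load"
    for category, opcodes in CATEGORIES.items():
        if op in opcodes:
            return category
    return "other"


# Instructions borrowed from the "event" record earlier on this page, plus a label.
code = [["is_int", 5, 2, 4, 9], "_nop_tc_1", ["jump", "rel_ni_2", 4, 9], "rel_ni_2"]
print(Counter(categorize(i) for i in code))
```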