Files
cell/docs/compiler-tools.md
2026-02-18 10:34:47 -06:00

281 lines
10 KiB
Markdown

---
title: "Compiler Inspection Tools"
description: "Tools for inspecting and debugging the compiler pipeline"
weight: 50
type: "docs"
---
ƿit includes a set of tools for inspecting the compiler pipeline at every stage. These are useful for debugging, testing optimizations, and understanding what the compiler does with your code.
## Pipeline Overview
The compiler runs in stages:
```
source → tokenize → parse → fold → mcode → streamline → output
```
Each stage has a corresponding CLI tool that lets you see its output.
| Stage | Tool | What it shows |
|-------------|---------------------------|----------------------------------------|
| tokenize | `tokenize.ce` | Token stream as JSON |
| parse | `parse.ce` | Unfolded AST as JSON |
| fold | `fold.ce` | Folded AST as JSON |
| mcode | `mcode.ce` | Raw mcode IR as JSON |
| mcode | `mcode.ce --pretty` | Human-readable mcode IR |
| streamline | `streamline.ce` | Full optimized IR as JSON |
| streamline | `streamline.ce --types` | Optimized IR with type annotations |
| streamline | `streamline.ce --stats` | Per-function summary stats |
| streamline | `streamline.ce --ir` | Human-readable canonical IR |
| all | `ir_report.ce` | Structured optimizer flight recorder |
All tools take a source file as input and run the pipeline up to the relevant stage.
## Quick Start
```bash
# see raw mcode IR (pretty-printed)
cell mcode --pretty myfile.ce
# see optimized IR with type annotations
cell streamline --types myfile.ce
# full optimizer report with events
cell ir_report --full myfile.ce
```
## fold.ce
Prints the folded AST as JSON. This is the output of the parser and constant folder, before mcode generation.
```bash
cell fold <file.ce|file.cm>
```
## mcode.ce
Prints mcode IR. Default output is JSON; use `--pretty` for human-readable format with opcodes, operands, and program counter.
```bash
cell mcode <file.ce|file.cm> # JSON (default)
cell mcode --pretty <file.ce|file.cm> # human-readable IR
```
## streamline.ce
Runs the full pipeline (tokenize, parse, fold, mcode, streamline) and outputs the optimized IR as JSON. Useful for piping to `jq` or saving for comparison.
```bash
cell streamline <file.ce|file.cm> # full JSON (default)
cell streamline --stats <file.ce|file.cm> # summary stats per function
cell streamline --ir <file.ce|file.cm> # human-readable IR
cell streamline --check <file.ce|file.cm> # warnings only
cell streamline --types <file.ce|file.cm> # IR with type annotations
```
| Flag | Description |
|------|-------------|
| (none) | Full optimized IR as JSON (backward compatible) |
| `--stats` | Per-function summary: args, slots, instruction counts by category, nops eliminated |
| `--ir` | Human-readable canonical IR (same format as `ir_report.ce`) |
| `--check` | Warnings only (e.g. `nr_slots > 200` approaching 255 limit) |
| `--types` | Optimized IR with inferred type annotations per slot |
Flags can be combined.
## seed.ce
Regenerates the boot seed files in `boot/`. These are pre-compiled mcode IR (JSON) files that bootstrap the compilation pipeline on cold start.
```bash
cell seed # regenerate all boot seeds
cell seed --clean # also clear the build cache after
```
The script compiles each pipeline module (tokenize, parse, fold, mcode, streamline) and `internal/bootstrap.cm` through the current pipeline, encodes the output as JSON, and writes it to `boot/<name>.cm.mcode`.
**When to regenerate seeds:**
- Before a release or distribution
- When the pipeline source changes in a way the existing seeds can't compile the new source (e.g. language-level changes)
- Seeds do NOT need regenerating for normal development — the engine recompiles pipeline modules from source automatically via the content-addressed cache
## ir_report.ce
The optimizer flight recorder. Runs the full pipeline with structured logging and outputs machine-readable, diff-friendly JSON. This is the most detailed tool for understanding what the optimizer did and why.
```bash
cell ir_report [options] <file.ce|file.cm>
```
### Options
| Flag | Description |
|------|-------------|
| `--summary` | Per-pass JSON summaries with instruction counts and timing (default) |
| `--events` | Include rewrite events showing each optimization applied |
| `--types` | Include type delta records showing inferred slot types |
| `--ir-before=PASS` | Print canonical IR before a specific pass |
| `--ir-after=PASS` | Print canonical IR after a specific pass |
| `--ir-all` | Print canonical IR before and after all passes |
| `--full` | Everything: summary + events + types + ir-all |
With no flags, `--summary` is the default.
### Output Format
Output is line-delimited JSON. Each line is a self-contained JSON object with a `type` field:
**`type: "pass"`** — Per-pass summary with categorized instruction counts before and after:
```json
{
"type": "pass",
"pass": "eliminate_type_checks",
"fn": "fib",
"ms": 0.12,
"changed": true,
"before": {"instr": 77, "nop": 0, "guard": 16, "branch": 28, ...},
"after": {"instr": 77, "nop": 1, "guard": 15, "branch": 28, ...},
"changes": {"guards_removed": 1, "nops_added": 1}
}
```
**`type: "event"`** — Individual rewrite event with before/after instructions and reasoning:
```json
{
"type": "event",
"pass": "eliminate_type_checks",
"rule": "incompatible_type_forces_jump",
"at": 3,
"before": [["is_int", 5, 2, 4, 9], ["jump_false", 5, "rel_ni_2", 4, 9]],
"after": ["_nop_tc_1", ["jump", "rel_ni_2", 4, 9]],
"why": {"slot": 2, "known_type": "float", "checked_type": "int"}
}
```
**`type: "types"`** — Inferred type information for a function:
```json
{
"type": "types",
"fn": "fib",
"param_types": {},
"slot_types": {"25": "null"}
}
```
**`type: "ir"`** — Canonical IR text for a function at a specific point:
```json
{
"type": "ir",
"when": "before",
"pass": "all",
"fn": "fib",
"text": "fn fib (args=1, slots=26)\n @0 access s2, 2\n ..."
}
```
### Rewrite Rules
Each pass records events with named rules:
**eliminate_type_checks:**
- `known_type_eliminates_guard` — type already known, guard removed
- `incompatible_type_forces_jump` — type conflicts, conditional jump becomes unconditional
- `num_subsumes_int_float` — num check satisfied by int or float
- `dynamic_to_field` — load_dynamic/store_dynamic narrowed to field access
- `dynamic_to_index` — load_dynamic/store_dynamic narrowed to index access
**simplify_algebra:**
- `add_zero`, `sub_zero`, `mul_one`, `div_one` — identity operations become moves
- `mul_zero` — multiplication by zero becomes constant
- `self_eq`, `self_ne` — same-slot comparisons become constants
**simplify_booleans:**
- `not_jump_false_fusion` — not + jump_false fused into jump_true
- `not_jump_true_fusion` — not + jump_true fused into jump_false
- `double_not` — not + not collapsed to move
**eliminate_moves:**
- `self_move` — move to same slot becomes nop
**eliminate_dead_jumps:**
- `jump_to_next` — jump to immediately following label becomes nop
### Canonical IR Format
The `--ir-all`, `--ir-before`, and `--ir-after` flags produce a deterministic text representation of the IR:
```
fn fib (args=1, slots=26)
@0 access s2, 2
@1 is_int s4, s1 ; [guard]
@2 jump_false s4, "rel_ni_2" ; [branch]
@3 --- nop (tc) ---
@4 jump "rel_ni_2" ; [branch]
@5 lt_int s3, s1, s2
@6 jump "rel_done_4" ; [branch]
rel_ni_2:
@8 is_num s4, s1 ; [guard]
```
Properties:
- `@N` is the raw array index, stable across passes (passes replace, never insert or delete)
- `sN` prefix distinguishes slot operands from literal values
- String operands are quoted
- Labels appear as indented headers with a colon
- Category tags in brackets: `[guard]`, `[branch]`, `[load]`, `[store]`, `[call]`, `[arith]`, `[move]`, `[const]`
- Nops shown as `--- nop (reason) ---` with reason codes: `tc` (type check), `bl` (boolean), `mv` (move), `dj` (dead jump), `ur` (unreachable)
### Examples
```bash
# what passes changed something?
cell ir_report --summary myfile.ce | jq 'select(.changed)'
# list all rewrite rules that fired
cell ir_report --events myfile.ce | jq 'select(.type == "event") | .rule'
# diff IR before and after optimization
cell ir_report --ir-all myfile.ce | jq -r 'select(.type == "ir") | .text'
# full report for analysis
cell ir_report --full myfile.ce > report.json
```
## ir_stats.cm
A utility module used by `ir_report.ce` and available for custom tooling. Not a standalone tool.
```javascript
var ir_stats = use("ir_stats")
ir_stats.detailed_stats(func) // categorized instruction counts
ir_stats.ir_fingerprint(func) // djb2 hash of instruction array
ir_stats.canonical_ir(func, name, opts) // deterministic text representation
ir_stats.type_snapshot(slot_types) // frozen copy of type map
ir_stats.type_delta(before_types, after_types) // compute type changes
ir_stats.category_tag(op) // classify an opcode
```
### Instruction Categories
`detailed_stats` classifies each instruction into one of these categories:
| Category | Opcodes |
|----------|---------|
| load | `load_field`, `load_index`, `load_dynamic`, `get`, `access` (non-constant) |
| store | `store_field`, `store_index`, `store_dynamic`, `set_var`, `put`, `push` |
| branch | `jump`, `jump_true`, `jump_false`, `jump_not_null` |
| call | `invoke`, `goinvoke` |
| guard | `is_int`, `is_text`, `is_num`, `is_bool`, `is_null`, `is_array`, `is_func`, `is_record`, `is_stone` |
| arith | `add_int`, `sub_int`, ..., `add_float`, ..., `concat`, `neg_int`, `neg_float`, bitwise ops |
| move | `move` |
| const | `int`, `true`, `false`, `null`, `access` (with constant value) |
| label | string entries that are not nops |
| nop | strings starting with `_nop_` |
| other | everything else (`frame`, `setarg`, `array`, `record`, `function`, `return`, etc.) |