---
title: "Compiler Inspection Tools"
description: "Tools for inspecting and debugging the compiler pipeline"
weight: 50
type: "docs"
---

ƿit includes a set of tools for inspecting the compiler pipeline at every stage. These are useful for debugging, testing optimizations, and understanding what the compiler does with your code.

## Pipeline Overview

The compiler runs in stages:

```
source → tokenize → parse → fold → mcode → streamline → output
```

Each stage has a corresponding CLI tool that lets you see its output.

| Stage | Tool | What it shows |
|-------------|---------------------------|----------------------------------------|
| tokenize | `tokenize.ce` | Token stream as JSON |
| parse | `parse.ce` | Unfolded AST as JSON |
| fold | `fold.ce` | Folded AST as JSON |
| mcode | `mcode.ce` | Raw mcode IR as JSON |
| mcode | `mcode.ce --pretty` | Human-readable mcode IR |
| streamline | `streamline.ce` | Full optimized IR as JSON |
| streamline | `streamline.ce --types` | Optimized IR with type annotations |
| streamline | `streamline.ce --stats` | Per-function summary stats |
| streamline | `streamline.ce --ir` | Human-readable canonical IR |
| disasm | `disasm.ce` | Source-interleaved disassembly |
| disasm | `disasm.ce --optimized` | Optimized source-interleaved disassembly |
| diff | `diff_ir.ce` | Mcode vs streamline instruction diff |
| xref | `xref.ce` | Cross-reference / call creation graph |
| cfg | `cfg.ce` | Control flow graph (basic blocks) |
| slots | `slots.ce` | Slot data flow / use-def chains |
| all | `ir_report.ce` | Structured optimizer flight recorder |

All tools take a source file as input and run the pipeline up to the relevant stage.

## Quick Start

```bash
# see raw mcode IR (pretty-printed)
cell mcode --pretty myfile.ce

# source-interleaved disassembly
cell disasm myfile.ce

# see optimized IR with type annotations
cell streamline --types myfile.ce

# full optimizer report with events
cell ir_report --full myfile.ce
```

## fold.ce

Prints the folded AST as JSON. This is the output of the parser and constant folder, before mcode generation.

```bash
cell fold <file.ce|file.cm>
```

## mcode.ce

Prints mcode IR. The default output is JSON; use `--pretty` for a human-readable format that shows opcodes, operands, and the program counter.

```bash
cell mcode <file.ce|file.cm>           # JSON (default)
cell mcode --pretty <file.ce|file.cm>  # human-readable IR
```

## streamline.ce

Runs the full pipeline (tokenize, parse, fold, mcode, streamline) and outputs the optimized IR as JSON. Useful for piping to `jq` or saving for comparison.

```bash
cell streamline <file.ce|file.cm>             # full JSON (default)
cell streamline --stats <file.ce|file.cm>     # summary stats per function
cell streamline --ir <file.ce|file.cm>        # human-readable IR
cell streamline --check <file.ce|file.cm>     # warnings only
cell streamline --types <file.ce|file.cm>     # IR with type annotations
cell streamline --diagnose <file.ce|file.cm>  # compile-time diagnostics
```

| Flag | Description |
|------|-------------|
| (none) | Full optimized IR as JSON (backward compatible) |
| `--stats` | Per-function summary: args, slots, instruction counts by category, nops eliminated |
| `--ir` | Human-readable canonical IR (same format as `ir_report.ce`) |
| `--check` | Warnings only (e.g. `nr_slots > 200` approaching 255 limit) |
| `--types` | Optimized IR with inferred type annotations per slot |
| `--diagnose` | Run compile-time diagnostics (type errors and warnings) |

Flags can be combined.

## disasm.ce

Source-interleaved disassembly. Shows mcode or optimized IR with source lines interleaved, making it easy to see which instructions were generated from which source code.

```bash
cell disasm <file>                # disassemble all functions (mcode)
cell disasm --optimized <file>    # disassemble optimized IR (streamline)
cell disasm --fn 87 <file>        # show only function 87
cell disasm --fn my_func <file>   # show only functions named "my_func"
cell disasm --line 235 <file>     # show instructions generated from line 235
```

| Flag | Description |
|------|-------------|
| (none) | Raw mcode IR with source interleaving (default) |
| `--optimized` | Use optimized IR (streamline) instead of raw mcode |
| `--fn <N\|name>` | Filter to specific function by index or name substring |
| `--line <N>` | Show only instructions generated from a specific source line |

### Output Format

Functions are shown with a header including argument count, slot count, closure count, and the source line where the function begins. Instructions are grouped by source line, with the source text shown before each group:

```
=== [87] <anonymous> (args=0, slots=12, closures=0) [line 234] ===

--- line 235: var result = compute(x, y) ---
0 access 2, "compute" :235
1 get 3, 1, 0 :235
2 get 4, 1, 1 :235
3 invoke 3, 2, 2 :235

--- line 236: if (result > 0) { ---
4 access 5, 0 :236
5 gt 6, 4, 5 :236
6 jump_false 6, "else_1" :236
```

Each instruction line shows:
- Program counter (left-aligned)
- Opcode
- Operands (comma-separated)
- Source line number (`:N` suffix, right-aligned)

Function creation instructions include a cross-reference annotation showing the target function's name:

```
3 function 5, 12 :235 ; -> [12] helper_fn
```

## diff_ir.ce

Compares mcode IR (before optimization) with streamline IR (after optimization), showing what the optimizer changed. Useful for understanding which instructions were eliminated, specialized, or rewritten.

```bash
cell diff_ir <file>                  # diff all functions
cell diff_ir --fn <N|name> <file>    # diff only one function
cell diff_ir --summary <file>        # counts only
```

| Flag | Description |
|------|-------------|
| (none) | Show all diffs with source interleaving |
| `--fn <N\|name>` | Filter to specific function by index or name |
| `--summary` | Show only eliminated/rewritten counts per function |

### Output Format

Changed instructions are shown in diff style with `-` (before) and `+` (after) lines:

```
=== [0] <anonymous> (args=1, slots=40) ===
17 eliminated, 51 rewritten

--- line 4: if (n <= 1) { ---
- 1 is_int 4, 1 :4
+ 1 is_int 3, 1 :4 (specialized)
- 3 is_int 5, 2 :4
+ 3 _nop_tc_1 (eliminated)
```

Summary mode gives a quick overview:

```
[0] <anonymous>: 17 eliminated, 51 rewritten
[1] <anonymous>: 65 eliminated, 181 rewritten
total: 86 eliminated, 250 rewritten across 4 functions
```

## xref.ce

Cross-reference / call graph tool. Shows which functions create other functions (via `function` instructions), building a creation tree.

```bash
cell xref <file>                 # full creation tree
cell xref --callers <N> <file>   # who creates function [N]?
cell xref --callees <N> <file>   # what does [N] create/call?
cell xref --dot <file>           # DOT graph for graphviz
cell xref --optimized <file>     # use optimized IR
```

| Flag | Description |
|------|-------------|
| (none) | Indented creation tree from main |
| `--callers <N>` | Show which functions create function [N] |
| `--callees <N>` | Show what function [N] creates (use -1 for main) |
| `--dot` | Output DOT format for graphviz |
| `--optimized` | Use optimized IR instead of raw mcode |

### Output Format

Default tree view:

```
demo_disasm.cm
[0] <anonymous>
[1] <anonymous>
[2] <anonymous>
```

Caller/callee query:

```
Callers of [0] <anonymous>:
demo_disasm.cm at line 3
```

DOT output can be piped to graphviz: `cell xref --dot file.cm | dot -Tpng -o xref.png`

## cfg.ce

Control flow graph tool. Identifies basic blocks from labels and jumps, computes edges, and detects loop back-edges.

```bash
cell cfg --fn <N|name> <file>         # text CFG for function
cell cfg --dot --fn <N|name> <file>   # DOT output for graphviz
cell cfg <file>                       # text CFG for all functions
cell cfg --optimized <file>           # use optimized IR
```

| Flag | Description |
|------|-------------|
| `--fn <N\|name>` | Filter to specific function by index or name |
| `--dot` | Output DOT format for graphviz |
| `--optimized` | Use optimized IR instead of raw mcode |

### Output Format

```
=== [0] <anonymous> ===
B0 [pc 0-2, line 4]:
0 access 2, 1
1 is_int 4, 1
2 jump_false 4, "rel_ni_2"
-> B3 "rel_ni_2" (jump)
-> B1 (fallthrough)

B1 [pc 3-4, line 4]:
3 is_int 5, 2
4 jump_false 5, "rel_ni_2"
-> B3 "rel_ni_2" (jump)
-> B2 (fallthrough)
```

Each block shows its ID, PC range, source lines, instructions, and outgoing edges. Loop back-edges (target PC <= source PC) are annotated.

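The back-edge rule above (target PC <= source PC) can be sketched in a few lines of Python. This is an illustration, not the code in `cfg.ce`. It assumes the instruction array mixes label strings, `_nop_` strings, and list-shaped instructions with the opcode first, that the PC is the array index, and that a jump's label is its first string operand. The names `back_edges`, `label_positions`, and `jump_target` are hypothetical.

```python
# Branch opcodes as listed in the instruction category table.
BRANCH_OPS = {"jump", "jump_true", "jump_false", "jump_not_null"}


def label_positions(instrs):
    """Map each label name to its index in the instruction array."""
    return {
        entry: pc
        for pc, entry in enumerate(instrs)
        if isinstance(entry, str) and not entry.startswith("_nop_")
    }


def jump_target(instr):
    """Return the first string operand of a branch (assumed to be its label)."""
    for operand in instr[1:]:
        if isinstance(operand, str):
            return operand
    return None


def back_edges(instrs):
    """Return (source_pc, target_pc) pairs where the target is not after the source."""
    labels = label_positions(instrs)
    edges = []
    for pc, instr in enumerate(instrs):
        if isinstance(instr, list) and instr[0] in BRANCH_OPS:
            target = labels.get(jump_target(instr))
            if target is not None and target <= pc:
                edges.append((pc, target))
    return edges


# Made-up loop: the jump at pc 4 returns to the label at pc 0.
loop = [
    "loop_0",
    ["lt_int", 3, 1, 2],
    ["jump_false", 3, "done_1"],
    ["add_int", 1, 1, 2],
    ["jump", "loop_0"],
    "done_1",
]
print(back_edges(loop))  # [(4, 0)]
```

The forward jump to `done_1` is not reported because its target lies after the jump itself.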
## slots.ce

Slot data flow analysis. Builds use-def chains for every slot in a function, showing where each slot is defined and used. Optionally captures type information from streamline.

```bash
cell slots --fn <N|name> <file>              # slot summary for function
cell slots --slot <N> --fn <N|name> <file>   # trace slot N
cell slots <file>                            # slot summary for all functions
```

| Flag | Description |
|------|-------------|
| `--fn <N\|name>` | Filter to specific function by index or name |
| `--slot <N>` | Show chronological DEF/USE trace for a specific slot |

### Output Format

Summary shows each slot with its def count, use count, inferred type, and first definition. Dead slots (defined but never used) are flagged:

```
=== [0] <anonymous> (args=1, slots=40) ===
slot defs uses type first-def
s0 0 0 - (this)
s1 0 10 - (arg 0)
s2 1 6 - pc 0: access
s10 1 0 - pc 29: invoke <- dead
```

Slot trace (`--slot N`) shows every DEF and USE in program order:

```
=== slot 3 in [0] <anonymous> ===
DEF pc 5: le_int 3, 1, 2 :4
DEF pc 11: le_float 3, 1, 2 :4
DEF pc 17: le_text 3, 1, 2 :4
USE pc 31: jump_false 3, "if_else_0" :4
```

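To make the use-def idea concrete, here is a minimal Python sketch of how def and use positions could be collected and dead slots flagged. It is not the algorithm in `slots.ce`. It assumes instructions are `(op, operand, ...)` tuples, that slots are written as `sN` strings (as in the canonical IR), that the first slot operand is the destination, and that conditional jumps only read their operand. Real opcodes such as stores will not all follow that rule, and every function name here is hypothetical.

```python
from collections import defaultdict

# Assumption: these ops read their first operand instead of writing it.
USE_ONLY_OPS = {"jump_true", "jump_false", "jump_not_null"}


def is_slot(operand):
    """True for operands written as s0, s1, ... (slot references)."""
    return isinstance(operand, str) and operand[:1] == "s" and operand[1:].isdigit()


def use_def_chains(instrs):
    """Return {slot: {"defs": [pc, ...], "uses": [pc, ...]}} in program order."""
    chains = defaultdict(lambda: {"defs": [], "uses": []})
    for pc, (op, *operands) in enumerate(instrs):
        slots = [x for x in operands if is_slot(x)]
        if op in USE_ONLY_OPS:
            writes, reads = [], slots
        else:
            writes, reads = slots[:1], slots[1:]
        for slot in writes:
            chains[slot]["defs"].append(pc)
        for slot in reads:
            chains[slot]["uses"].append(pc)
    return dict(chains)


def dead_slots(chains):
    """Slots that are defined at least once but never used."""
    dead = [s for s, c in chains.items() if c["defs"] and not c["uses"]]
    return sorted(dead, key=lambda s: int(s[1:]))


instrs = [
    ("le_int", "s3", "s1", "s2"),
    ("move", "s4", "s3"),
    ("jump_false", "s3", '"if_else_0"'),
]
chains = use_def_chains(instrs)
print(chains["s3"])        # {'defs': [0], 'uses': [1, 2]}
print(dead_slots(chains))  # ['s4']
```

Slots such as `s1` and `s2` have uses but no defs in this fragment, which matches how argument slots appear in the summary above.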
## seed.ce

Regenerates the boot seed files in `boot/`. These are pre-compiled mcode IR (JSON) files that bootstrap the compilation pipeline on cold start.

```bash
cell seed            # regenerate all boot seeds
cell seed --clean    # also clear the build cache after
```

The script compiles each pipeline module (tokenize, parse, fold, mcode, streamline) and `internal/bootstrap.cm` through the current pipeline, encodes the output as JSON, and writes it to `boot/<name>.cm.mcode`.

**When to regenerate seeds:**
- Before a release or distribution
- When the pipeline source changes in a way that the existing seeds cannot compile (e.g. language-level changes)

Seeds do NOT need regenerating for normal development: the engine recompiles pipeline modules from source automatically via the content-addressed cache.

## ir_report.ce

The optimizer flight recorder. Runs the full pipeline with structured logging and outputs machine-readable, diff-friendly JSON. This is the most detailed tool for understanding what the optimizer did and why.

```bash
cell ir_report [options] <file.ce|file.cm>
```

### Options

| Flag | Description |
|------|-------------|
| `--summary` | Per-pass JSON summaries with instruction counts and timing (default) |
| `--events` | Include rewrite events showing each optimization applied |
| `--types` | Include type delta records showing inferred slot types |
| `--ir-before=PASS` | Print canonical IR before a specific pass |
| `--ir-after=PASS` | Print canonical IR after a specific pass |
| `--ir-all` | Print canonical IR before and after all passes |
| `--full` | Everything: summary + events + types + ir-all |

With no flags, `--summary` is the default.

### Output Format

Output is line-delimited JSON. Each line is a self-contained JSON object with a `type` field:

**`type: "pass"`**: Per-pass summary with categorized instruction counts before and after:

```json
{
  "type": "pass",
  "pass": "eliminate_type_checks",
  "fn": "fib",
  "ms": 0.12,
  "changed": true,
  "before": {"instr": 77, "nop": 0, "guard": 16, "branch": 28, ...},
  "after": {"instr": 77, "nop": 1, "guard": 15, "branch": 28, ...},
  "changes": {"guards_removed": 1, "nops_added": 1}
}
```

**`type: "event"`**: Individual rewrite event with before/after instructions and reasoning:

```json
{
  "type": "event",
  "pass": "eliminate_type_checks",
  "rule": "incompatible_type_forces_jump",
  "at": 3,
  "before": [["is_int", 5, 2, 4, 9], ["jump_false", 5, "rel_ni_2", 4, 9]],
  "after": ["_nop_tc_1", ["jump", "rel_ni_2", 4, 9]],
  "why": {"slot": 2, "known_type": "float", "checked_type": "int"}
}
```

**`type: "types"`**: Inferred type information for a function:

```json
{
  "type": "types",
  "fn": "fib",
  "param_types": {},
  "slot_types": {"25": "null"}
}
```

**`type: "ir"`**: Canonical IR text for a function at a specific point:

```json
{
  "type": "ir",
  "when": "before",
  "pass": "all",
  "fn": "fib",
  "text": "fn fib (args=1, slots=26)\n @0 access s2, 2\n ..."
}
```

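Because each output line is a standalone JSON object, a report can be consumed with a few lines of Python instead of `jq`. The sketch below is one way to do that, using only the field names shown above (`type`, `pass`, `changed`, `rule`). The sample records are trimmed, made-up stand-ins for real output, and `read_records` and `summarize` are hypothetical helper names.

```python
import json
from collections import Counter


def read_records(lines):
    """Parse line-delimited JSON, skipping blank lines."""
    return [json.loads(line) for line in lines if line.strip()]


def summarize(records):
    """Return (records per type, passes that changed something, rule hit counts)."""
    by_type = Counter(r["type"] for r in records)
    changed = sorted(
        {r["pass"] for r in records if r["type"] == "pass" and r.get("changed")}
    )
    rules = Counter(r["rule"] for r in records if r["type"] == "event")
    return by_type, changed, rules


# Made-up sample lines; a real report would come from `cell ir_report --full`.
sample = [
    '{"type": "pass", "pass": "eliminate_type_checks", "fn": "fib", "changed": true}',
    '{"type": "pass", "pass": "simplify_algebra", "fn": "fib", "changed": false}',
    '{"type": "event", "pass": "eliminate_type_checks", "rule": "incompatible_type_forces_jump", "at": 3}',
]

by_type, changed, rules = summarize(read_records(sample))
print(dict(by_type))  # {'pass': 2, 'event': 1}
print(changed)        # ['eliminate_type_checks']
print(dict(rules))    # {'incompatible_type_forces_jump': 1}
```

To process a saved report, pass an open file object to `read_records` in place of the `sample` list.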
### Rewrite Rules

Each pass records events with named rules:

**eliminate_type_checks:**
- `known_type_eliminates_guard`: type already known, guard removed
- `incompatible_type_forces_jump`: type conflicts, conditional jump becomes unconditional
- `num_subsumes_int_float`: num check satisfied by int or float
- `dynamic_to_field`: load_dynamic/store_dynamic narrowed to field access
- `dynamic_to_index`: load_dynamic/store_dynamic narrowed to index access

**simplify_algebra:**
- `add_zero`, `sub_zero`, `mul_one`, `div_one`: identity operations become moves
- `mul_zero`: multiplication by zero becomes constant
- `self_eq`, `self_ne`: same-slot comparisons become constants

**simplify_booleans:**
- `not_jump_false_fusion`: not + jump_false fused into jump_true
- `not_jump_true_fusion`: not + jump_true fused into jump_false
- `double_not`: not + not collapsed to move

**eliminate_moves:**
- `self_move`: move to same slot becomes nop

**eliminate_dead_jumps:**
- `jump_to_next`: jump to immediately following label becomes nop

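As an illustration of how such rules operate, the following Python sketch mimics `self_move` and `jump_to_next` on a list-shaped instruction array. It is not the optimizer's code. The array layout (opcode first, label strings inline) is inferred from the event records above, and the nop names `_nop_mv_N` and `_nop_dj_N` are an assumption modeled on the `_nop_tc_1` entry shown earlier.

```python
def eliminate_self_moves(instrs):
    """Replace `move sN, sN` with a nop string, keeping the array length."""
    out = list(instrs)
    count = 0
    for i, instr in enumerate(out):
        if isinstance(instr, list) and instr[0] == "move" and instr[1] == instr[2]:
            count += 1
            out[i] = f"_nop_mv_{count}"  # assumed naming scheme
    return out


def eliminate_jump_to_next(instrs):
    """Replace a jump whose target label is the very next entry with a nop."""
    out = list(instrs)
    count = 0
    for i, instr in enumerate(out[:-1]):
        if isinstance(instr, list) and instr[0] == "jump" and out[i + 1] == instr[1]:
            count += 1
            out[i] = f"_nop_dj_{count}"  # assumed naming scheme
    return out


before = [["move", 3, 3], ["jump", "next_0"], "next_0", ["move", 4, 3]]
after = eliminate_jump_to_next(eliminate_self_moves(before))
print(after)  # ['_nop_mv_1', '_nop_dj_1', 'next_0', ['move', 4, 3]]
```

Both functions replace entries in place of the originals and never insert or delete, so instruction indices stay the same before and after the rewrite.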
### Canonical IR Format

The `--ir-all`, `--ir-before`, and `--ir-after` flags produce a deterministic text representation of the IR:

```
fn fib (args=1, slots=26)
@0 access s2, 2
@1 is_int s4, s1 ; [guard]
@2 jump_false s4, "rel_ni_2" ; [branch]
@3 --- nop (tc) ---
@4 jump "rel_ni_2" ; [branch]
@5 lt_int s3, s1, s2
@6 jump "rel_done_4" ; [branch]
rel_ni_2:
@8 is_num s4, s1 ; [guard]
```

Properties:
- `@N` is the raw array index, stable across passes (passes replace, never insert or delete)
- `sN` prefix distinguishes slot operands from literal values
- String operands are quoted
- Labels appear as indented headers with a colon
- Category tags in brackets: `[guard]`, `[branch]`, `[load]`, `[store]`, `[call]`, `[arith]`, `[move]`, `[const]`
- Nops shown as `--- nop (reason) ---` with reason codes: `tc` (type check), `bl` (boolean), `mv` (move), `dj` (dead jump), `ur` (unreachable)

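Since the format is line-oriented and deterministic, custom tooling can parse it with regular expressions. The Python sketch below handles the three line shapes shown above: instruction, nop, and label. It is based only on the sample output, so the exact spacing rules are an assumption, operands are split naively on commas (a quoted string containing a comma would be split incorrectly), and `parse_ir_line` is a hypothetical name.

```python
import re

NOP_RE = re.compile(r"^@(\d+)\s+--- nop \((\w+)\) ---$")
INSTR_RE = re.compile(r"^@(\d+)\s+(\S+)\s*(.*?)\s*(?:;\s*\[(\w+)\])?$")


def parse_ir_line(line):
    """Parse one canonical IR line into a dict, or return None if unrecognized."""
    line = line.strip()
    m = NOP_RE.match(line)
    if m:
        return {"index": int(m.group(1)), "op": "nop", "reason": m.group(2)}
    m = INSTR_RE.match(line)
    if m:
        raw = m.group(3)
        operands = [part.strip() for part in raw.split(",")] if raw else []
        return {
            "index": int(m.group(1)),
            "op": m.group(2),
            "operands": operands,
            "tag": m.group(4),  # None when the line carries no category tag
        }
    if line.endswith(":"):
        return {"label": line[:-1]}
    return None


print(parse_ir_line("@1 is_int s4, s1 ; [guard]"))
# {'index': 1, 'op': 'is_int', 'operands': ['s4', 's1'], 'tag': 'guard'}
print(parse_ir_line("@3 --- nop (tc) ---"))
# {'index': 3, 'op': 'nop', 'reason': 'tc'}
print(parse_ir_line("rel_ni_2:"))
# {'label': 'rel_ni_2'}
```

The function header line (`fn fib (args=1, slots=26)`) is not handled and yields `None`.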
### Examples

```bash
# which passes changed something?
cell ir_report --summary myfile.ce | jq 'select(.changed)'

# list all rewrite rules that fired
cell ir_report --events myfile.ce | jq 'select(.type == "event") | .rule'

# print canonical IR text before and after the passes, ready for diffing
cell ir_report --ir-all myfile.ce | jq -r 'select(.type == "ir") | .text'

# full report for analysis
cell ir_report --full myfile.ce > report.json
```

## ir_stats.cm

A utility module used by `ir_report.ce` and available for custom tooling. Not a standalone tool.

```javascript
var ir_stats = use("ir_stats")

ir_stats.detailed_stats(func)                   // categorized instruction counts
ir_stats.ir_fingerprint(func)                   // djb2 hash of instruction array
ir_stats.canonical_ir(func, name, opts)         // deterministic text representation
ir_stats.type_snapshot(slot_types)              // frozen copy of type map
ir_stats.type_delta(before_types, after_types)  // compute type changes
ir_stats.category_tag(op)                       // classify an opcode
```

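For readers unfamiliar with djb2, the hash behind `ir_fingerprint` can be sketched in Python. The core recurrence (start at 5381, multiply by 33, add each byte) is the standard djb2 definition. How `ir_stats` serializes the instruction array and whether it truncates to 32 bits are not stated here, so the JSON serialization and the 32-bit mask below are assumptions, and `fingerprint_sketch` is a hypothetical name.

```python
import json


def djb2(text):
    """Classic djb2 string hash: h = h * 33 + byte, starting from 5381."""
    h = 5381
    for byte in text.encode("utf-8"):
        h = (h * 33 + byte) & 0xFFFFFFFF  # assumed 32-bit truncation
    return h


def fingerprint_sketch(instrs):
    """Hash a compact JSON rendering of the instruction array (assumed encoding)."""
    return djb2(json.dumps(instrs, separators=(",", ":")))


print(djb2(""))   # 5381
print(djb2("a"))  # 177670

instrs = [["is_int", 5, 2, 4, 9], ["jump_false", 5, "rel_ni_2", 4, 9]]
print(fingerprint_sketch(instrs) == fingerprint_sketch(list(instrs)))  # True
```

A fingerprint like this gives a cheap way to tell whether a pass altered a function: equal arrays always hash to the same value, so a differing hash means the IR changed.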
### Instruction Categories

`detailed_stats` classifies each instruction into one of these categories:

| Category | Opcodes |
|----------|---------|
| load | `load_field`, `load_index`, `load_dynamic`, `get`, `access` (non-constant) |
| store | `store_field`, `store_index`, `store_dynamic`, `set_var`, `put`, `push` |
| branch | `jump`, `jump_true`, `jump_false`, `jump_not_null` |
| call | `invoke`, `goinvoke` |
| guard | `is_int`, `is_text`, `is_num`, `is_bool`, `is_null`, `is_array`, `is_func`, `is_record`, `is_stone` |
| arith | `add_int`, `sub_int`, ..., `add_float`, ..., `concat`, `neg_int`, `neg_float`, bitwise ops |
| move | `move` |
| const | `int`, `true`, `false`, `null`, `access` (with constant value) |
| label | string entries that are not nops |
| nop | strings starting with `_nop_` |
| other | everything else (`frame`, `setarg`, `array`, `record`, `function`, `return`, etc.) |

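The table above translates naturally into a lookup. The Python sketch below shows one possible shape for such a classifier. It is not the `ir_stats` implementation. It includes only the arith opcodes the table names explicitly (the elided and bitwise ones are left out and fall into `other`), and its handling of `access` (string operand means load, anything else means constant) is an assumption drawn from the disassembly samples. `categorize` and `count_categories` are hypothetical names.

```python
from collections import Counter

CATEGORY_OPS = {
    "load": ["load_field", "load_index", "load_dynamic", "get"],
    "store": ["store_field", "store_index", "store_dynamic", "set_var", "put", "push"],
    "branch": ["jump", "jump_true", "jump_false", "jump_not_null"],
    "call": ["invoke", "goinvoke"],
    "guard": ["is_int", "is_text", "is_num", "is_bool", "is_null",
              "is_array", "is_func", "is_record", "is_stone"],
    # Only the opcodes the table spells out; elided ones are not guessed.
    "arith": ["add_int", "sub_int", "add_float", "concat", "neg_int", "neg_float"],
    "move": ["move"],
    "const": ["int", "true", "false", "null"],
}
OP_CATEGORY = {op: cat for cat, ops in CATEGORY_OPS.items() for op in ops}


def categorize(entry):
    """Classify one instruction-array entry (string or opcode-first list)."""
    if isinstance(entry, str):
        return "nop" if entry.startswith("_nop_") else "label"
    op = entry[0]
    if op == "access":
        # Assumption: a string operand is a name lookup, otherwise a constant.
        return "load" if isinstance(entry[2], str) else "const"
    return OP_CATEGORY.get(op, "other")


def count_categories(instrs):
    """Tally entries per category, similar in spirit to detailed_stats."""
    return Counter(categorize(entry) for entry in instrs)


instrs = [
    ["access", 2, "compute"],
    ["access", 5, 0],
    ["is_int", 4, 1],
    ["jump_false", 4, "rel_ni_2"],
    "_nop_tc_1",
    "rel_ni_2",
    ["return", 3],
]
print(dict(count_categories(instrs)))
# {'load': 1, 'const': 1, 'guard': 1, 'branch': 1, 'nop': 1, 'label': 1, 'other': 1}
```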