Merge branch 'sem_grab'

2026-02-18 10:35:25 -06:00
16 changed files with 664 additions and 572 deletions


@@ -15,65 +15,51 @@ The compiler runs in stages:
source → tokenize → parse → fold → mcode → streamline → output
```
Each stage has a corresponding dump tool that lets you see its output.
Each stage has a corresponding CLI tool that lets you see its output.
| Stage | Tool | What it shows |
|-------------|-------------------|----------------------------------------|
| fold | `dump_ast.cm` | Folded AST as JSON |
| mcode | `dump_mcode.cm` | Raw mcode IR before optimization |
| streamline | `dump_stream.cm` | Before/after instruction counts + IR |
| streamline | `dump_types.cm` | Optimized IR with type annotations |
| streamline | `streamline.ce` | Full optimized IR as JSON |
| all | `ir_report.ce` | Structured optimizer flight recorder |
| Stage | Tool | What it shows |
|-------------|---------------------------|----------------------------------------|
| tokenize | `tokenize.ce` | Token stream as JSON |
| parse | `parse.ce` | Unfolded AST as JSON |
| fold | `fold.ce` | Folded AST as JSON |
| mcode | `mcode.ce` | Raw mcode IR as JSON |
| mcode | `mcode.ce --pretty` | Human-readable mcode IR |
| streamline | `streamline.ce` | Full optimized IR as JSON |
| streamline | `streamline.ce --types` | Optimized IR with type annotations |
| streamline | `streamline.ce --stats` | Per-function summary stats |
| streamline | `streamline.ce --ir` | Human-readable canonical IR |
| all | `ir_report.ce` | Structured optimizer flight recorder |
All tools take a source file as input and run the pipeline up to the relevant stage.
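The "run the pipeline up to the relevant stage" behavior can be pictured with a small shell sketch. The function name `stages_up_to` is hypothetical and not part of the `cell` CLI; it only walks the stage list from the pipeline diagram and stops at the requested stage.

```bash
# Hypothetical helper (not part of cell): print the stages that a tool for
# the given stage would run, in pipeline order.
stages_up_to() {
  local target="$1"
  local stage out=""
  for stage in tokenize parse fold mcode streamline; do
    # Append the stage, separating entries with a single space.
    out="${out:+$out }$stage"
    if [ "$stage" = "$target" ]; then
      printf '%s\n' "$out"
      return 0
    fi
  done
  # Reached only when the name is not one of the five pipeline stages.
  printf 'unknown stage: %s\n' "$target" >&2
  return 1
}
```

Under this sketch, `stages_up_to fold` lists the tokenize, parse, and fold stages, which matches the description of `fold.ce` as the output of the parser and constant folder.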
## Quick Start
```bash
# see raw mcode IR
./cell --core . dump_mcode.cm myfile.ce
# see raw mcode IR (pretty-printed)
cell mcode --pretty myfile.ce
# see what the optimizer changed
./cell --core . dump_stream.cm myfile.ce
# see optimized IR with type annotations
cell streamline --types myfile.ce
# full optimizer report with events
./cell --core . ir_report.ce --full myfile.ce
cell ir_report --full myfile.ce
```
## dump_ast.cm
## fold.ce
Prints the folded AST as JSON. This is the output of the parser and constant folder, before mcode generation.
```bash
./cell --core . dump_ast.cm <file.ce|file.cm>
cell fold <file.ce|file.cm>
```
## dump_mcode.cm
## mcode.ce
Prints the raw mcode IR before any optimization. Shows the instruction array as formatted text with opcode, operands, and program counter.
Prints mcode IR. Default output is JSON; use `--pretty` for human-readable format with opcodes, operands, and program counter.
```bash
./cell --core . dump_mcode.cm <file.ce|file.cm>
```
## dump_stream.cm
Shows a before/after comparison of the optimizer. For each function, prints:
- Instruction count before and after
- Number of eliminated instructions
- The streamlined IR (nops hidden by default)
```bash
./cell --core . dump_stream.cm <file.ce|file.cm>
```
## dump_types.cm
Shows the optimized IR with type annotations. Each instruction is followed by the known types of its slot operands, inferred by walking the instruction stream.
```bash
./cell --core . dump_types.cm <file.ce|file.cm>
cell mcode <file.ce|file.cm> # JSON (default)
cell mcode --pretty <file.ce|file.cm> # human-readable IR
```
## streamline.ce
@@ -81,10 +67,11 @@ Shows the optimized IR with type annotations. Each instruction is followed by th
Runs the full pipeline (tokenize, parse, fold, mcode, streamline) and outputs the optimized IR as JSON. Useful for piping to `jq` or saving for comparison.
```bash
./cell --core . streamline.ce <file.ce|file.cm> # full JSON (default)
./cell --core . streamline.ce --stats <file.ce|file.cm> # summary stats per function
./cell --core . streamline.ce --ir <file.ce|file.cm> # human-readable IR
./cell --core . streamline.ce --check <file.ce|file.cm> # warnings only
cell streamline <file.ce|file.cm> # full JSON (default)
cell streamline --stats <file.ce|file.cm> # summary stats per function
cell streamline --ir <file.ce|file.cm> # human-readable IR
cell streamline --check <file.ce|file.cm> # warnings only
cell streamline --types <file.ce|file.cm> # IR with type annotations
```
| Flag | Description |
@@ -93,6 +80,7 @@ Runs the full pipeline (tokenize, parse, fold, mcode, streamline) and outputs th
| `--stats` | Per-function summary: args, slots, instruction counts by category, nops eliminated |
| `--ir` | Human-readable canonical IR (same format as `ir_report.ce`) |
| `--check` | Warnings only (e.g. `nr_slots > 200` approaching 255 limit) |
| `--types` | Optimized IR with inferred type annotations per slot |
Flags can be combined.
@@ -101,8 +89,8 @@ Flags can be combined.
Regenerates the boot seed files in `boot/`. These are pre-compiled mcode IR (JSON) files that bootstrap the compilation pipeline on cold start.
```bash
./cell --core . seed.ce # regenerate all boot seeds
./cell --core . seed.ce --clean # also clear the build cache after
cell seed # regenerate all boot seeds
cell seed --clean # also clear the build cache after
```
The script compiles each pipeline module (tokenize, parse, fold, mcode, streamline) and `internal/bootstrap.cm` through the current pipeline, encodes the output as JSON, and writes it to `boot/<name>.cm.mcode`.
@@ -117,7 +105,7 @@ The script compiles each pipeline module (tokenize, parse, fold, mcode, streamli
The optimizer flight recorder. Runs the full pipeline with structured logging and outputs machine-readable, diff-friendly JSON. This is the most detailed tool for understanding what the optimizer did and why.
```bash
./cell --core . ir_report.ce [options] <file.ce|file.cm>
cell ir_report [options] <file.ce|file.cm>
```
### Options
@@ -246,16 +234,16 @@ Properties:
```bash
# what passes changed something?
./cell --core . ir_report.ce --summary myfile.ce | jq 'select(.changed)'
cell ir_report --summary myfile.ce | jq 'select(.changed)'
# list all rewrite rules that fired
./cell --core . ir_report.ce --events myfile.ce | jq 'select(.type == "event") | .rule'
cell ir_report --events myfile.ce | jq 'select(.type == "event") | .rule'
# diff IR before and after optimization
./cell --core . ir_report.ce --ir-all myfile.ce | jq -r 'select(.type == "ir") | .text'
cell ir_report --ir-all myfile.ce | jq -r 'select(.type == "ir") | .text'
# full report for analysis
./cell --core . ir_report.ce --full myfile.ce > report.json
cell ir_report --full myfile.ce > report.json
```
## ir_stats.cm


@@ -130,9 +130,9 @@ Seeds are used during cold start (empty cache) to compile the pipeline modules f
| File | Purpose |
|------|---------|
| `dump_mcode.cm` | Print raw Mcode IR before streamlining |
| `dump_stream.cm` | Print IR after streamlining with before/after stats |
| `dump_types.cm` | Print streamlined IR with type annotations |
| `mcode.ce --pretty` | Print raw Mcode IR before streamlining |
| `streamline.ce --types` | Print streamlined IR with type annotations |
| `streamline.ce --stats` | Print IR after streamlining with before/after stats |
## Test Files


@@ -257,17 +257,17 @@ The `+` operator is excluded from target slot propagation when it would use the
## Debugging Tools
Three dump tools inspect the IR at different stages:
CLI tools inspect the IR at different stages:
- **`dump_mcode.cm`** — prints the raw Mcode IR after `mcode.cm`, before streamlining
- **`dump_stream.cm`** — prints the IR after streamlining, with before/after instruction counts
- **`dump_types.cm`** — prints the streamlined IR with type annotations on each instruction
- **`cell mcode --pretty`** — prints the raw Mcode IR after `mcode.cm`, before streamlining
- **`cell streamline --stats`** — prints the IR after streamlining, with before/after instruction counts
- **`cell streamline --types`** — prints the streamlined IR with type annotations on each instruction
Usage:
```
./cell --core . dump_mcode.cm <file.ce|file.cm>
./cell --core . dump_stream.cm <file.ce|file.cm>
./cell --core . dump_types.cm <file.ce|file.cm>
cell mcode --pretty <file.ce|file.cm>
cell streamline --stats <file.ce|file.cm>
cell streamline --types <file.ce|file.cm>
```
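For scripts that still call the removed dump tools, the renames in this change can be captured in a small lookup. This is a minimal sketch: the helper name `new_cell_command` is hypothetical and not part of `cell`, and the mapping is taken only from the replacements shown in this change.

```bash
# Hypothetical helper (not part of cell): map a removed dump tool to the
# CLI invocation that replaces it in this change.
new_cell_command() {
  case "$1" in
    dump_ast.cm)    echo "cell fold" ;;
    dump_mcode.cm)  echo "cell mcode --pretty" ;;
    dump_stream.cm) echo "cell streamline --stats" ;;
    dump_types.cm)  echo "cell streamline --types" ;;
    *)
      # Anything else has no replacement recorded in this change.
      echo "no replacement known for: $1" >&2
      return 1
      ;;
  esac
}
```

A wrapper script could then build the new command line with something like `$(new_cell_command dump_types.cm) myfile.ce`, assuming `cell` is on the `PATH`.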
## Tail Call Marking