# Fix Compilation Pipeline Bootstrap

## Problem

After merging `fix_gc` into `pitweb`, the compilation pipeline `.cm` source files
(`tokenize.cm`, `parse.cm`, `fold.cm`, `mcode.cm`, `streamline.cm`) cannot bootstrap themselves.

The old pitweb pipeline mcode compiles the merged `.cm` source without errors, but the
resulting new pipeline mcode is **semantically broken**: it can't even compile
`var x = 42; print(x)`.

Both branches worked independently. The merge introduced no syntax errors, but the old
pitweb compiler produces incorrect bytecode from the merged pipeline source. This is a
classic bootstrapping problem: the new pipeline needs a compatible compiler to build
itself, but the only available compiler (old pitweb) miscompiles it.

## Current State

- `boot/tokenize.cm.mcode` through `boot/streamline.cm.mcode` contain the **old pitweb**
  pipeline mcode (pre-merge). These work correctly: 641/641 vm_suite tests pass.
- All other boot mcode files (engine, bootstrap, seed_bootstrap, plus core modules like
  fd, time, toml, etc.) are compiled from the merged source and work correctly.
- The merged pipeline `.cm` source has changes from `fix_gc` that are **not active**; the
  runtime still uses the old pitweb pipeline mcode.

## What Changed in the Pipeline

The `fix_gc` merge brought these changes to the pipeline `.cm` files:

- **mcode.cm**: type-guarded arithmetic (`emit_add_decomposed` now generates `is_text`/`is_num`
  checks instead of letting the VM dispatch); `emit_numeric_binop` for subtract, multiply, etc.;
  a `sensory_ops` lookup table; and count arguments for array/record literals
  (`["array", dest, count]` instead of `["array", dest, 0]`)
- **fold.cm**: lookup tables (`binary_ops`, `unary_ops`, `assign_ops`, etc.) replacing
  if-chains; combined `"array"` and `"text literal"` handling
- **tokenize.cm**: ~500 lines of changes
- **streamline.cm**: ~700 lines of changes
- **parse.cm**: ~40 lines of changes (minor)

## Regen Flags

`regen.ce` now has two modes:

```
./cell --dev --seed regen        # default: skip pipeline files
./cell --dev --seed regen --all  # include pipeline files (tokenize/parse/fold/mcode/streamline)
```

The default mode is safe: it regenerates everything except the five pipeline files,
preserving the working old pitweb pipeline mcode.

## How to Fix

The goal is to get the merged pipeline `.cm` source to produce working mcode when
compiled by the current (old pitweb) pipeline. The process:

1. Start from the current repo state (old pitweb pipeline mcode in `boot/`).
2. Edit one or more pipeline `.cm` files to fix the issue.
3. Regen with `--all` to recompile everything, including the pipeline files:
   ```
   ./cell --dev --seed regen --all
   ```
4. Test the new pipeline with a simple sanity check:
   ```
   rm -rf .cell/build/*
   echo 'var x = 42; print(x)' > /tmp/test.ce
   ./cell --dev --seed /tmp/test
   ```
5. If that works, run the full test suite:
   ```
   rm -rf .cell/build/*
   ./cell --dev vm_suite
   ```
6. If the tests pass, regen again (this time the new pipeline compiles itself):
   ```
   ./cell --dev --seed regen --all
   ```
7. Repeat steps 4-6 until the result is **idempotent**: two consecutive `regen --all` runs
   produce identical boot mcode and all tests pass.
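
Steps 3-7 can be scripted. The sketch below is a hypothetical helper, not something in the repo: `boot_digest` and `regen_until_stable` are illustrative names, the digest assumes GNU coreutils `sha256sum` (on macOS, `shasum -a 256` plays the same role), and the loop assumes that `./cell --dev vm_suite` exits non-zero on failure and that the sanity program prints a line containing just `42`. Hashing every `.mcode` file under `boot/` into one digest is a cheap way to check the "two consecutive runs are identical" condition from step 7.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of steps 3-7: regen, sanity-check, test, and repeat
# until two consecutive `regen --all` runs leave the boot mcode unchanged.

# Print one digest covering every .mcode file under a directory.
# Assumes GNU coreutils `sha256sum`.
boot_digest() {
  local dir="${1:-boot}"
  (
    cd "$dir" || exit 1
    # Sort so the digest does not depend on directory listing order.
    find . -type f -name '*.mcode' | LC_ALL=C sort | xargs sha256sum
  ) | sha256sum | cut -d' ' -f1
}

# Returns 0 once a regen reproduces the previous regen's boot mcode exactly
# and both checks pass. Assumes `./cell` exits non-zero when vm_suite fails
# and that the sanity program prints a line that is exactly "42".
regen_until_stable() {
  local max_rounds="${1:-5}" prev="" cur round
  for ((round = 1; round <= max_rounds; round++)); do
    ./cell --dev --seed regen --all || return 1

    rm -rf .cell/build/*
    echo 'var x = 42; print(x)' > /tmp/test.ce
    if ! ./cell --dev --seed /tmp/test | grep -qx '42'; then
      echo "sanity check failed (round $round)"
      return 1
    fi

    rm -rf .cell/build/*
    if ! ./cell --dev vm_suite; then
      echo "vm_suite failed (round $round)"
      return 1
    fi

    # prev starts empty, so round 1 never counts as a fixed point:
    # idempotency means two consecutive regen outputs are identical.
    cur="$(boot_digest boot)"
    if [ "$cur" = "$prev" ]; then
      echo "idempotent after $round round(s)"
      return 0
    fi
    prev="$cur"
  done
  echo "no fixed point after $max_rounds rounds"
  return 1
}
```

After sourcing these functions into a bash shell at the repo root, `regen_until_stable 5` runs at most five rounds. If it stops on a failed check, the old pitweb pipeline mcode can be restored with the `git checkout HEAD^1` command under Debugging Tips before trying another source edit.
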
## Debugging Tips

- The old pitweb pipeline mcode is always available via:
  ```
  git checkout HEAD^1 -- boot/tokenize.cm.mcode boot/parse.cm.mcode \
    boot/fold.cm.mcode boot/mcode.cm.mcode boot/streamline.cm.mcode
  ```
- Use `--seed` mode for testing compilation: it bypasses the engine entirely and
  loads the pipeline directly from boot mcode.
- The failure mode is silent: the old compiler compiles the new source without errors
  but produces wrong bytecode. Start debugging with the simplest failing case
  (`var x = 42; print(x)`) and work up.
- The most likely culprits are the `mcode.cm` changes (type-guarded arithmetic, array/record
  count args), since these change the bytecode format. The `fold.cm` changes (lookup tables)
  are more likely to be safe refactors.
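
One way to test the hunch that `mcode.cm` is the culprit is to run the sanity program with only one stage's freshly compiled mcode in place and the other four restored from `HEAD^1`. This is a sketch under assumptions, not an established procedure for this repo: the function names and the stash directory `/tmp/new-pipeline` are hypothetical, the sanity check assumes the program prints a line that is exactly `42`, and a mixed pipeline can also fail simply because the merge changed what one stage hands to the next. A failure therefore points at a stage rather than proving that stage was miscompiled.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for isolating which pipeline stage's new mcode breaks
# the sanity program. Run from the repo root right after `regen --all`.

STAGES=(tokenize parse fold mcode streamline)

# Copy the freshly regenerated pipeline mcode somewhere safe, because the
# `git checkout HEAD^1` below overwrites boot/<stage>.cm.mcode.
stash_new_pipeline() {
  local dest="${1:-/tmp/new-pipeline}" s
  mkdir -p "$dest" || return 1
  for s in "${STAGES[@]}"; do
    cp "boot/$s.cm.mcode" "$dest/" || return 1
  done
}

# Restore all five old pitweb stages, then copy in one new stage.
use_only_new_stage() {
  local keep="$1" src="${2:-/tmp/new-pipeline}" s
  for s in "${STAGES[@]}"; do
    git checkout HEAD^1 -- "boot/$s.cm.mcode" || return 1
  done
  cp "$src/$keep.cm.mcode" boot/ || return 1
}

# For each stage, report whether the sanity program still prints 42 when
# that stage is the only one using new mcode.
check_each_stage() {
  local src="${1:-/tmp/new-pipeline}" s
  for s in "${STAGES[@]}"; do
    use_only_new_stage "$s" "$src" || return 1
    rm -rf .cell/build/*
    echo 'var x = 42; print(x)' > /tmp/test.ce
    if ./cell --dev --seed /tmp/test | grep -qx '42'; then
      echo "$s: ok with only this stage new"
    else
      echo "$s: FAILS with only this stage new"
    fi
  done
}
```

Call `stash_new_pipeline` once after `regen --all`, then `check_each_stage`. Afterwards `boot/` holds a mixed set, so either copy the stashed files back (`cp /tmp/new-pipeline/*.cm.mcode boot/`) or restore the full old set with the `git checkout` command in the first tip.
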