Fix Compilation Pipeline Bootstrap
Problem
After merging fix_gc into pitweb, the compilation pipeline .cm source files
(tokenize.cm, parse.cm, fold.cm, mcode.cm, streamline.cm) cannot bootstrap themselves.
The old pitweb pipeline mcode compiles the merged .cm source without errors, but the
resulting new pipeline mcode is semantically broken — it can't even compile
var x = 42; print(x).
Both branches worked independently. The merge introduced no syntax errors, but the old pitweb compiler produces incorrect bytecode from the merged pipeline source. This is a classic bootstrapping problem: the new pipeline needs a compatible compiler to build itself, but the only available compiler (old pitweb) miscompiles it.
Current State
- boot/tokenize.cm.mcode through boot/streamline.cm.mcode contain the old pitweb pipeline mcode (pre-merge). These pass 641/641 vm_suite tests.
- All other boot mcode files (engine, bootstrap, seed_bootstrap, plus core modules like fd, time, toml, etc.) are compiled from the merged source and work correctly.
- The merged pipeline .cm source has changes from fix_gc that are not active: the runtime uses the old pitweb pipeline mcode.
The old pitweb pipeline is NOT fully working. While it passes the test suite, it miscompiles nested function declarations. This breaks:
- toml.encode(): the encoder uses nested function declarations inside encode_toml
- Shop.save_lock(): calls toml.encode(), so any lock.toml mutation fails
- Any other .cm module that uses nested named function declarations
This means the ID-based package symbol naming (Phase 2 in the plan) is blocked: it
needs save_lock() to persist package IDs to lock.toml.
The shop.cm changes for ID-based naming are already written and correct — they just need a working pipeline underneath. Once the pipeline is fixed, the ID system will work.
What Changed in the Pipeline
The fix_gc merge brought these changes to the pipeline .cm files:
- mcode.cm: Type-guarded arithmetic (emit_add_decomposed now generates is_text/is_num checks instead of letting the VM dispatch), emit_numeric_binop for subtract/multiply/etc., sensory_ops lookup table, array/record literal count args (["array", dest, count] instead of ["array", dest, 0])
- fold.cm: Lookup tables (binary_ops, unary_ops, assign_ops, etc.) replacing if-chains, combined "array" and "text literal" handling
- tokenize.cm: ~500 lines of changes
- streamline.cm: ~700 lines of changes
- parse.cm: ~40 lines of changes (minor)
Regen Flags
regen.ce now has two modes:
./cell --dev --seed regen # default: skip pipeline files
./cell --dev --seed regen --all # include pipeline files (tokenize/parse/fold/mcode/streamline)
The default mode is safe — it regenerates everything except the 5 pipeline files, preserving the working old pitweb pipeline mcode.
How to Fix
The goal is to get the merged pipeline .cm source to produce working mcode when
compiled by the current (old pitweb) pipeline. The process:
1. Start from the current repo state (old pitweb pipeline mcode in boot/)
2. Edit one or more pipeline .cm files to fix the issue
3. Regen with --all to recompile everything including pipeline:
       ./cell --dev --seed regen --all
4. Test the new pipeline with a simple sanity check:
       rm -rf .cell/build/*
       echo 'var x = 42; print(x)' > /tmp/test.ce
       ./cell --dev --seed /tmp/test
5. If that works, run the full test suite:
       rm -rf .cell/build/*
       ./cell --dev vm_suite
6. If tests pass, regen again (the new pipeline compiles itself):
       ./cell --dev --seed regen --all
7. Repeat steps 4-6 until idempotent: two consecutive regen --all runs produce identical boot mcode and all tests pass.
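The idempotence condition in the last step (two consecutive regen --all runs producing identical boot mcode) could be checked with a small helper along these lines. This is only a sketch: the function names snapshot_boot and boot_unchanged, the snapshot directory, and the assumption that every relevant file matches boot/*.mcode are illustrative and not part of the repo.

```shell
#!/bin/sh
# Hypothetical helpers for checking that a regen pass reached a fixed point.
# Assumption: all boot mcode files live directly in one directory and end in .mcode.

# snapshot_boot BOOT_DIR SNAP_DIR
# Copy every *.mcode file from BOOT_DIR into a fresh SNAP_DIR.
snapshot_boot() {
    rm -rf "$2"
    mkdir -p "$2" && cp "$1"/*.mcode "$2"/
}

# boot_unchanged BOOT_DIR SNAP_DIR
# Succeed only if every *.mcode file in BOOT_DIR is byte-identical to its
# copy in SNAP_DIR. Files that exist only in SNAP_DIR are not detected.
boot_unchanged() {
    for f in "$1"/*.mcode; do
        cmp -s "$f" "$2/$(basename "$f")" || return 1
    done
    return 0
}

# Possible usage around one regen pass (the /tmp path is illustrative):
#   snapshot_boot boot /tmp/boot.prev
#   ./cell --dev --seed regen --all
#   boot_unchanged boot /tmp/boot.prev && echo "boot mcode is stable"
```

A byte-for-byte comparison is used instead of checksums so the sketch depends only on cp and cmp. If the regen output ever embeds timestamps or other run-specific data, this check would report a change even when the pipeline is semantically stable.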
Debugging Tips
- The old pitweb pipeline mcode is always available via:
      git checkout HEAD^1 -- boot/tokenize.cm.mcode boot/parse.cm.mcode \
          boot/fold.cm.mcode boot/mcode.cm.mcode boot/streamline.cm.mcode
- Use --seed mode for testing compilation: it bypasses the engine entirely and loads the pipeline directly from boot mcode.
- The failure mode is silent: the old compiler compiles the new source without errors but produces wrong bytecode.
- Known broken patterns with the old pitweb pipeline:
  - var x = 42; print(x) fails when compiled by the regenned pipeline mcode
  - Nested named function declarations (function foo() {} inside another function) produce "not a function" errors; this breaks toml.encode()
  - Test with:
        echo 'var toml = use("toml"); print(toml.encode({a: 1}))' > /tmp/t.ce && ./cell --dev /tmp/t.ce
- The most likely culprits are the mcode.cm changes (type-guarded arithmetic, array/record count args) since these change the bytecode format. The fold.cm changes (lookup tables) are more likely safe refactors.
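To separate the nested-declaration miscompile from anything toml-specific, a smaller repro that contains nothing but one nested named function may be useful. The sketch below writes such a file. The helper name write_nested_repro and the /tmp path are hypothetical, and the .ce syntax is inferred from the snippets in this document, so it may need adjusting.

```shell
#!/bin/sh
# write_nested_repro FILE
# Write a tiny .ce program whose only interesting feature is a named
# function declared inside another function.
# Assumption: the syntax below is valid .ce; it is modelled on the
# 'function foo() {}' and 'var x = 42; print(x)' snippets in this document.
write_nested_repro() {
    cat > "$1" <<'EOF'
function outer() {
    function inner() { return 7; }
    return inner();
}
print(outer());
EOF
}

# Possible usage (the path is illustrative):
#   write_nested_repro /tmp/nested.ce && ./cell --dev /tmp/nested.ce
```

If the notes above are right, the old pitweb pipeline should report a "not a function" error for this file. A run that gets past the call to inner() would suggest that the fault lies in something more specific to encode_toml.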