# Fix Compilation Pipeline Bootstrap

## Problem

After merging `fix_gc` into `pitweb`, the compilation pipeline `.cm` source files (tokenize.cm, parse.cm, fold.cm, mcode.cm, streamline.cm) cannot bootstrap themselves. The old pitweb pipeline mcode compiles the merged `.cm` source without errors, but the resulting new pipeline mcode is **semantically broken**: it can't even compile `var x = 42; print(x)`.

Both branches worked independently. The merge introduced no syntax errors, but the old pitweb compiler produces incorrect bytecode from the merged pipeline source.

This is a classic bootstrapping problem: the new pipeline needs a compatible compiler to build itself, but the only available compiler (old pitweb) miscompiles it.

## Current State

- `boot/tokenize.cm.mcode` through `boot/streamline.cm.mcode` contain the **old pitweb** pipeline mcode (pre-merge). These pass 641/641 vm_suite tests.
- All other boot mcode files (engine, bootstrap, seed_bootstrap, plus core modules such as fd, time and toml) are compiled from the merged source and work, except where they hit the nested-function miscompilation described below (this affects `toml`).
- The fix_gc changes in the merged pipeline `.cm` source are **not active**: the runtime still uses the old pitweb pipeline mcode.

**The old pitweb pipeline is NOT fully working.** Although it passes the test suite, it miscompiles nested function declarations. This breaks:

- `toml.encode()`: the encoder uses nested `function` declarations inside `encode_toml`
- `Shop.save_lock()`: it calls `toml.encode()`, so any lock.toml mutation fails
- Any other `.cm` module that uses nested named function declarations

This means the **ID-based package symbol naming** (Phase 2 in the plan) is blocked: it needs `save_lock()` to persist package IDs to lock.toml. The shop.cm changes for ID-based naming are already written and correct; they just need a working pipeline underneath. Once the pipeline is fixed, the ID system will work.
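Before (or between) `regen --all` attempts, it can help to confirm that `boot/` still holds the pre-merge pipeline mcode described under Current State. The sketch below does that with `git diff --quiet`. The helper name `pipeline_is_old_pitweb` and the `PIPELINE_MCODE` variable are hypothetical, and it assumes, as the debugging tips do, that HEAD is still the merge commit so `HEAD^1` is the pitweb side.

```shell
#!/bin/sh
# Sketch (hypothetical helper): report whether boot/ still holds the
# pre-merge (old pitweb) pipeline mcode. Assumes HEAD is the merge commit,
# so HEAD^1 is the pitweb side; pass a different ref as $1 to override.
PIPELINE_MCODE="boot/tokenize.cm.mcode boot/parse.cm.mcode boot/fold.cm.mcode boot/mcode.cm.mcode boot/streamline.cm.mcode"

pipeline_is_old_pitweb() {
  ref=${1:-HEAD^1}
  # git diff --quiet exits 0 only if the working-tree copies of these files
  # match $ref. Any non-zero status (including a bad ref) means "not confirmed".
  # $PIPELINE_MCODE is left unquoted on purpose so it splits into five paths.
  git diff --quiet "$ref" -- $PIPELINE_MCODE
}

# Example:
#   if pipeline_is_old_pitweb; then echo "boot/ has old pitweb pipeline mcode"; fi
```

Comparing against the working tree (rather than the index) means uncommitted output from a `regen --all` run is detected too.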
## What Changed in the Pipeline

The fix_gc merge brought these changes to the pipeline `.cm` files:

- **mcode.cm**: Type-guarded arithmetic (`emit_add_decomposed` now generates `is_text`/`is_num` checks instead of letting the VM dispatch), `emit_numeric_binop` for subtract/multiply/etc., a `sensory_ops` lookup table, and count arguments for array/record literals (`["array", dest, count]` instead of `["array", dest, 0]`)
- **fold.cm**: Lookup tables (`binary_ops`, `unary_ops`, `assign_ops`, etc.) replacing if-chains, and combined `"array"` and `"text literal"` handling
- **tokenize.cm**: ~500 lines of changes
- **streamline.cm**: ~700 lines of changes
- **parse.cm**: ~40 lines of changes (minor)

## Regen Flags

`regen.ce` now has two modes:

```
./cell --dev --seed regen        # default: skip pipeline files
./cell --dev --seed regen --all  # include pipeline files (tokenize/parse/fold/mcode/streamline)
```

The default mode is safe: it regenerates everything except the 5 pipeline files, leaving the old pitweb pipeline mcode (which passes the test suite) in place.

## How to Fix

The goal is to get the merged pipeline `.cm` source to produce working mcode when compiled by the current (old pitweb) pipeline. The process:

1. Start from the current repo state (old pitweb pipeline mcode in `boot/`). If an earlier attempt overwrote it, restore it with the `git checkout HEAD^1` command listed under Debugging Tips.
2. Edit one or more pipeline `.cm` files to fix the issue.
3. Regen with `--all` to recompile everything, including the pipeline:

   ```
   ./cell --dev --seed regen --all
   ```

4. Test the new pipeline with a simple sanity check:

   ```
   rm -rf .cell/build/*
   echo 'var x = 42; print(x)' > /tmp/test.ce
   ./cell --dev --seed /tmp/test
   ```

5. If that works, run the full test suite:

   ```
   rm -rf .cell/build/*
   ./cell --dev vm_suite
   ```

6. If tests pass, regen again (this time the new pipeline compiles itself):

   ```
   ./cell --dev --seed regen --all
   ```

7. Repeat steps 4-6 until the result is **idempotent**: two consecutive `regen --all` runs produce identical boot mcode and all tests pass.
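Steps 3-7 amount to a fixed-point iteration: keep regenerating until a regen leaves boot mcode unchanged, with tests passing after every round. A minimal POSIX `sh` sketch of that loop follows. The helper names (`boot_hash`, `regen_until_fixed_point`) and the default round limit are hypothetical; it assumes all boot mcode matches `boot/*.mcode` and that the test command exits non-zero on failure.

```shell
#!/bin/sh
# Sketch only: automate steps 3-7 by regenerating until boot mcode stops
# changing. Helper names are hypothetical; regen/test commands are strings.

# Print one checksum line (CRC, size, name) per boot mcode file, so a
# change to any file changes the output. Assumes boot/*.mcode matches.
boot_hash() {
  cksum "${1:-boot}"/*.mcode
}

# Usage: regen_until_fixed_point REGEN_CMD TEST_CMD [MAX_ROUNDS]
# Returns 0 once a regen leaves boot mcode unchanged and tests pass.
regen_until_fixed_point() {
  regen_cmd=$1
  test_cmd=$2
  max=${3:-5}
  prev=$(boot_hash) || return 1
  i=1
  while [ "$i" -le "$max" ]; do
    # Child shells, so an `exit` inside a command cannot end this function.
    sh -c "$regen_cmd" || { echo "regen failed (round $i)" >&2; return 1; }
    sh -c "$test_cmd" || { echo "tests failed (round $i)" >&2; return 1; }
    cur=$(boot_hash) || return 1
    if [ "$cur" = "$prev" ]; then
      echo "fixed point reached after $i regen(s)"
      return 0
    fi
    prev=$cur
    i=$((i + 1))
  done
  echo "boot mcode still changing after $max regens" >&2
  return 1
}

# Example (assumes vm_suite exits non-zero on failure):
#   regen_until_fixed_point './cell --dev --seed regen --all' \
#     'rm -rf .cell/build/* && ./cell --dev vm_suite'
```

Comparing checksums of file contents, rather than timestamps, means a regen that rewrites files with identical bytes still counts as converged.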
## Debugging Tips

- The old pitweb pipeline mcode is the version in the merge's first parent. While HEAD is still the merge commit, restore it with:

  ```
  git checkout HEAD^1 -- boot/tokenize.cm.mcode boot/parse.cm.mcode \
    boot/fold.cm.mcode boot/mcode.cm.mcode boot/streamline.cm.mcode
  ```

- Use `--seed` mode for testing compilation: it bypasses the engine entirely and loads the pipeline directly from boot mcode.
- The failure mode is silent: the old compiler compiles the new source without errors but produces wrong bytecode.
- Known broken patterns caused by the old pitweb pipeline:
  - `var x = 42; print(x)` fails when compiled by the pipeline mcode that `regen --all` produces (the merged source compiled by the old pipeline)
  - Nested named function declarations (`function foo() {}` inside another function) produce "not a function" errors, which breaks `toml.encode()`
    - Test with: `echo 'var toml = use("toml"); print(toml.encode({a: 1}))' > /tmp/t.ce && ./cell --dev /tmp/t.ce`
- The most likely culprits are the mcode.cm changes (type-guarded arithmetic, array/record count args), since these change the bytecode format. The fold.cm changes (lookup tables) are more likely safe refactors.
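The two known broken patterns can be bundled into a quick smoke test to run after each regen. This is a sketch, not part of the repo: `smoke_test` and the `CELL` override are hypothetical. It assumes a healthy run of the first program prints `42`, and that a miscompiled program either exits non-zero or prints "not a function" (stdout and stderr are both captured).

```shell
#!/bin/sh
# Sketch: smoke-test the two known broken patterns after a regen.
# CELL is a hypothetical override; it defaults to ./cell as in these notes.
CELL=${CELL:-./cell}

smoke_test() {
  rm -rf .cell/build/*   # clear cached builds, as in steps 4 and 5
  status=0

  # Pattern 1: top-level var + print, compiled in --seed mode.
  # Assumes a healthy run prints 42.
  echo 'var x = 42; print(x)' > /tmp/test.ce
  if out=$("$CELL" --dev --seed /tmp/test 2>&1) &&
     printf '%s\n' "$out" | grep -q 42; then
    echo "PASS var/print"
  else
    echo "FAIL var/print: $out"
    status=1
  fi

  # Pattern 2: nested named functions, exercised via toml.encode().
  echo 'var toml = use("toml"); print(toml.encode({a: 1}))' > /tmp/t.ce
  if out=$("$CELL" --dev /tmp/t.ce 2>&1) &&
     ! printf '%s\n' "$out" | grep -q 'not a function'; then
    echo "PASS nested functions (toml.encode)"
  else
    echo "FAIL nested functions (toml.encode): $out"
    status=1
  fi

  return "$status"
}
```

Checking the printed output as well as the exit status matters here because, as noted above, the miscompilation itself produces no compile-time error.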