Fix Compilation Pipeline Bootstrap

Problem

After merging fix_gc into pitweb, the compilation pipeline .cm source files (tokenize.cm, parse.cm, fold.cm, mcode.cm, streamline.cm) cannot bootstrap themselves.

The old pitweb pipeline mcode compiles the merged .cm source without errors, but the resulting new pipeline mcode is semantically broken — it can't even compile var x = 42; print(x).

Both branches worked independently. The merge introduced no syntax errors, but the old pitweb compiler produces incorrect bytecode from the merged pipeline source. This is a classic bootstrapping problem: the new pipeline needs a compatible compiler to build itself, but the only available compiler (old pitweb) miscompiles it.

Current State

  • boot/tokenize.cm.mcode through boot/streamline.cm.mcode contain the old pitweb pipeline mcode (pre-merge). These pass 641/641 vm_suite tests.
  • All other boot mcode files (engine, bootstrap, seed_bootstrap, plus core modules like fd, time, toml, etc.) are compiled from the merged source and work correctly.
  • The merged pipeline .cm source has changes from fix_gc that are not active — the runtime uses the old pitweb pipeline mcode.

The old pitweb pipeline is NOT fully working. While it passes the test suite, it miscompiles nested function declarations. This breaks:

  • toml.encode() — the encoder uses nested function declarations inside encode_toml
  • Shop.save_lock() — calls toml.encode(), so any lock.toml mutation fails
  • Any other .cm module that uses nested named function declarations

This means the ID-based package symbol naming (Phase 2 in the plan) is blocked: it needs save_lock() to persist package IDs to lock.toml.

The shop.cm changes for ID-based naming are already written and correct — they just need a working pipeline underneath. Once the pipeline is fixed, the ID system will work.

What Changed in the Pipeline

The fix_gc merge brought these changes to the pipeline .cm files:

  • mcode.cm: Type-guarded arithmetic (emit_add_decomposed now generates is_text/is_num checks instead of letting the VM dispatch), emit_numeric_binop for subtract/multiply/etc., sensory_ops lookup table, array/record literal count args (["array", dest, count] instead of ["array", dest, 0])
  • fold.cm: Lookup tables (binary_ops, unary_ops, assign_ops, etc.) replacing if-chains, combined "array" and "text literal" handling
  • tokenize.cm: ~500 lines of changes
  • streamline.cm: ~700 lines of changes
  • parse.cm: ~40 lines of changes (minor)

Regen Flags

regen.ce now has two modes:

./cell --dev --seed regen          # default: skip pipeline files
./cell --dev --seed regen --all    # include pipeline files (tokenize/parse/fold/mcode/streamline)

The default mode is safe: it regenerates everything except the 5 pipeline files, preserving the old pitweb pipeline mcode that currently passes vm_suite.
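One way to confirm that the default mode leaves the pipeline files alone is to snapshot them before a regen and compare afterwards. The sketch below is only an illustration: the helper names snapshot_pipeline and pipeline_unchanged and the /tmp/boot.prev directory are invented for this example and are not part of the repo. Only the boot/*.cm.mcode file names and the regen command come from this document.

```shell
# Hypothetical helper: copy the five pipeline mcode files from boot
# directory $1 into snapshot directory $2.
snapshot_pipeline() {
    mkdir -p "$2" || return 1
    for stage in tokenize parse fold mcode streamline; do
        cp "$1/$stage.cm.mcode" "$2/" || return 1
    done
}

# Hypothetical helper: succeed only if every pipeline mcode file in $1 is
# byte-identical to its copy in snapshot directory $2.
pipeline_unchanged() {
    for stage in tokenize parse fold mcode streamline; do
        cmp -s "$1/$stage.cm.mcode" "$2/$stage.cm.mcode" || return 1
    done
}

# Intended use, run from the repo root:
#   snapshot_pipeline boot /tmp/boot.prev
#   ./cell --dev --seed regen
#   pipeline_unchanged boot /tmp/boot.prev && echo "pipeline mcode untouched"
```

If pipeline_unchanged fails after a default regen, the pipeline mcode was overwritten and can be restored from git before continuing.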

How to Fix

The goal is to get the merged pipeline .cm source to produce working mcode when compiled by the current (old pitweb) pipeline. The process:

  1. Start from the current repo state (old pitweb pipeline mcode in boot/)
  2. Edit one or more pipeline .cm files to fix the issue
  3. Regen with --all to recompile everything including pipeline:
    ./cell --dev --seed regen --all
    
  4. Test the new pipeline with a simple sanity check:
    rm -rf .cell/build/*
    echo 'var x = 42; print(x)' > /tmp/test.ce
    ./cell --dev --seed /tmp/test
    
  5. If that works, run the full test suite:
    rm -rf .cell/build/*
    ./cell --dev vm_suite
    
  6. If tests pass, regen again (the new pipeline compiles itself):
    ./cell --dev --seed regen --all
    
  7. Repeat steps 4-6 until the pipeline reaches a fixed point: two consecutive regen --all runs produce identical boot mcode and all tests pass.
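The stopping condition in step 7 can be checked mechanically by fingerprinting the five pipeline mcode files before and after a regen. This is a minimal sketch, not something the repo provides: pipeline_fingerprint is a hypothetical helper name, and it uses the POSIX cksum utility purely as a cheap change detector.

```shell
# Hypothetical helper: print one "checksum size" line per pipeline mcode
# file in boot directory $1. Fails if any of the five files is missing.
pipeline_fingerprint() {
    for stage in tokenize parse fold mcode streamline; do
        cksum < "$1/$stage.cm.mcode" || return 1
    done
}

# Intended use, run from the repo root once the test suite passes:
#   before=$(pipeline_fingerprint boot)
#   ./cell --dev --seed regen --all
#   after=$(pipeline_fingerprint boot)
#   [ "$before" = "$after" ] && echo "fixed point reached"
```

Matching fingerprints only show that the pipeline reproduces itself. They say nothing about correctness, so the vm_suite run in step 5 is still required.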

Debugging Tips

  • The old pitweb pipeline mcode is always available via:
    git checkout HEAD^1 -- boot/tokenize.cm.mcode boot/parse.cm.mcode \
      boot/fold.cm.mcode boot/mcode.cm.mcode boot/streamline.cm.mcode
    
  • Use --seed mode for testing compilation — it bypasses the engine entirely and loads the pipeline directly from boot mcode.
  • The failure mode is silent: the old compiler compiles the new source without errors but produces wrong bytecode.
  • Known broken patterns:
    • var x = 42; print(x) fails when compiled by the regenned pipeline mcode (the merged source compiled by the old pitweb pipeline)
    • With the old pitweb pipeline itself, nested named function declarations (function foo() {} inside another function) produce "not a function" errors. This is what breaks toml.encode()
    • Test the nested-declaration case with: echo 'var toml = use("toml"); print(toml.encode({a: 1}))' > /tmp/t.ce && ./cell --dev /tmp/t.ce
  • The most likely culprits are the mcode.cm changes (type-guarded arithmetic, array/record count args) since these change the bytecode format. The fold.cm changes (lookup tables) are more likely safe refactors.
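Because miscompilation is silent, a sanity check is more trustworthy when it compares the program's actual output with the expected text instead of relying on the exit status alone. The helper below is a hedged sketch: expect_output is a hypothetical name, and the usage lines assume that print writes exactly 42 to stdout with no other output, which this document does not state.

```shell
# Hypothetical helper: run a command and succeed only if it exits
# successfully AND its stdout is exactly the expected text.
# Usage: expect_output EXPECTED COMMAND [ARGS...]
expect_output() {
    expected=$1
    shift
    actual=$("$@") || return 1
    [ "$actual" = "$expected" ]
}

# Intended use (assumes print emits just "42" on stdout):
#   echo 'var x = 42; print(x)' > /tmp/test.ce
#   expect_output 42 ./cell --dev --seed /tmp/test || echo "pipeline miscompiles"
```

Stderr is deliberately left untouched so that any diagnostics from the failing run stay visible.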