---
title: "Compilation Pipeline"
description: "Overview of the compilation stages and optimizations"
---

Overview

The compilation pipeline transforms source code through several stages, each adding information or lowering the representation toward execution. All backends share the same path through Mcode and Streamline. There are three execution backends: the Mach register VM (default), the Mcode interpreter (debug), and native code via QBE (experimental).

Source → Tokenize → Parse → Fold → Mcode → Streamline → Mach VM (default)
                                                      → Mcode Interpreter
                                                      → QBE → Native
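The shared front half and the three interchangeable backends can be sketched as plain function composition. This is a minimal Python sketch, not cell's actual orchestrator. The stage functions are hypothetical stand-ins that only record their own names, so the run order is visible, and `compile_and_run` is an illustrative name.

```python
from typing import Callable, Dict, List

Stage = Callable[[List[str]], List[str]]

def make_stage(name: str) -> Stage:
    """Return a toy stage that appends its name, so the run order is visible."""
    def stage(ir: List[str]) -> List[str]:
        return ir + [name]
    return stage

# Every backend consumes the same streamlined output.
FRONT_END: List[Stage] = [
    make_stage(name)
    for name in ("tokenize", "parse", "fold", "mcode", "streamline")
]

BACKENDS: Dict[str, Stage] = {
    "mach": make_stage("mach"),           # default
    "mcode": make_stage("mcode_interp"),  # selected with --mcode
    "qbe": make_stage("qbe_emit"),        # selected with --emit-qbe
}

def compile_and_run(source: str, backend: str = "mach") -> List[str]:
    ir = [f"source:{source}"]
    for stage in FRONT_END:  # shared path: identical for every backend
        ir = stage(ir)
    return BACKENDS[backend](ir)

print(compile_and_run("script.ce"))
```

Only the last step differs between backends, which is why a bug found with the Mcode interpreter usually also affects the Mach VM.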

Stages

Tokenize (tokenize.cm)

Splits source text into tokens. Handles string interpolation by re-tokenizing template literal contents. Produces a token array with position information (line, column).
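The interpolation handling can be sketched in Python under one stated assumption: templates are JavaScript-style backtick literals with `${...}` holes. The real template syntax is not given here. The `Token` shape and `tokenize` function are hypothetical, and plain strings, comments, and multi-character operators are left out. The key idea is that each hole's contents go back through the same tokenizer, starting from the hole's real line and column.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Token:
    kind: str      # "number", "name", "punct", or "template"
    value: object
    line: int
    col: int
    # For templates: literal text chunks alternating with nested token lists.
    parts: list = field(default_factory=list)

PUNCT = set("+-*/=<>()[]{},;.")

def tokenize(src: str, line: int = 1, col: int = 1) -> List[Token]:
    tokens: List[Token] = []
    i = 0

    def advance(n: int) -> None:
        """Consume n characters, keeping line and column in sync."""
        nonlocal i, line, col
        for ch in src[i:i + n]:
            if ch == "\n":
                line, col = line + 1, 1
            else:
                col += 1
        i += n

    def scan(pred) -> int:
        """Return the index just past the run of characters matching pred."""
        j = i
        while j < len(src) and pred(src[j]):
            j += 1
        return j

    while i < len(src):
        ch, start = src[i], (line, col)
        if ch.isspace():
            advance(1)
        elif ch.isdigit():
            j = scan(str.isdigit)
            tokens.append(Token("number", int(src[i:j]), *start))
            advance(j - i)
        elif ch.isalpha() or ch == "_":
            j = scan(lambda c: c.isalnum() or c == "_")
            tokens.append(Token("name", src[i:j], *start))
            advance(j - i)
        elif ch == "`":
            advance(1)  # opening backtick
            parts: list = []
            chunk = ""
            while i < len(src) and src[i] != "`":
                if src.startswith("${", i):
                    parts.append(chunk)
                    chunk = ""
                    advance(2)  # skip "${"
                    depth, j = 1, i
                    while depth:  # find the matching "}"
                        if j >= len(src):
                            raise SyntaxError(f"unterminated interpolation at {line}:{col}")
                        depth += {"{": 1, "}": -1}.get(src[j], 0)
                        j += 1
                    # Re-tokenize the hole's contents from its real position.
                    parts.append(tokenize(src[i:j - 1], line, col))
                    advance(j - i)  # the expression plus its closing "}"
                else:
                    chunk += src[i]
                    advance(1)
            if i >= len(src):
                raise SyntaxError(f"unterminated template at {start[0]}:{start[1]}")
            parts.append(chunk)
            advance(1)  # closing backtick
            tokens.append(Token("template", None, *start, parts=parts))
        elif ch in PUNCT:
            tokens.append(Token("punct", ch, *start))
            advance(1)
        else:
            raise SyntaxError(f"unexpected {ch!r} at {line}:{col}")
    return tokens
```

Because nested tokens keep real positions, an error inside `${...}` can be reported at the right column of the original line.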

Parse (parse.cm)

Converts tokens into an AST. Also performs semantic analysis:

  • Scope records: For each scope (global, function), builds a record mapping variable names to their metadata: make (var/def/function/input), function_nr, nr_uses, closure flag, and level.
  • Type tags: When the right-hand side of a def has a syntactically obvious type (such as a literal), stamps type_tag on the scope record entry. Derivable types: "integer", "number", "text", "array", "record", "function", "logical", "null".
  • Intrinsic resolution: Names used but not locally bound are recorded in ast.intrinsics. Name nodes referencing intrinsics get intrinsic: true.
  • Access kind: Subscript ([) nodes get access_kind: "index" when the subscript is numeric and "field" when it is a string; the property is omitted otherwise.
  • Tail position: Return statements where the expression is a call get tail: true.
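The type-tag, access-kind, and tail-position rules can be sketched over dictionary-shaped AST nodes. This is a Python illustration under assumptions. The node shapes and the helpers `derive_type_tag`, `declare`, `access_kind`, and `mark_tail` are hypothetical, and scope entries carry only a subset of the fields listed above.

```python
from typing import Optional

# Literal node kinds whose type is obvious from syntax alone
# (hypothetical node shapes; the real parser's AST may differ).
LITERAL_TAGS = {
    "text": "text",
    "array": "array",
    "record": "record",
    "function": "function",
    "true": "logical",
    "false": "logical",
    "null": "null",
}

def derive_type_tag(rhs: dict) -> Optional[str]:
    """Return a type_tag for a syntactically obvious right-hand side."""
    if rhs["kind"] == "number":
        return "integer" if isinstance(rhs["value"], int) else "number"
    return LITERAL_TAGS.get(rhs["kind"])  # None: not obvious, leave untagged

def declare(scope: dict, name: str, make: str, rhs: Optional[dict] = None) -> dict:
    """Add a simplified scope-record entry; only defs receive a type_tag."""
    entry = {"make": make, "nr_uses": 0, "closure": False}
    if make == "def" and rhs is not None:
        tag = derive_type_tag(rhs)
        if tag is not None:
            entry["type_tag"] = tag
    scope[name] = entry
    return entry

def access_kind(key: dict) -> Optional[str]:
    """Classify a subscript key: numeric -> "index", string -> "field"."""
    if key["kind"] == "number":
        return "index"
    if key["kind"] == "text":
        return "field"
    return None  # unknown key type: access_kind is omitted

def mark_tail(stmt: dict) -> dict:
    """Flag return statements whose expression is a call."""
    if stmt["kind"] == "return" and (stmt.get("expr") or {}).get("kind") == "call":
        stmt["tail"] = True
    return stmt
```

A `var` is never tagged in this sketch, because only immutable `def` bindings can keep a type for their whole lifetime.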

Fold (fold.cm)

Operates on the AST. Performs constant folding and type analysis:

  • Constant folding: Evaluates arithmetic on known constants at compile time (e.g., 5 + 10 becomes 15).
  • Constant propagation: Tracks def bindings whose values are known constants.
  • Type propagation: Extends type_tag through operations. When both operands of an arithmetic op have known types, the result type is known. Propagates type tags to reference sites.
  • Intrinsic specialization: When an intrinsic call's argument types are known, stamps a hint on the call node. For example, length(x) where x is a known array gets hint: "array_length". Type checks like is_array(known_array) are folded to true.
  • Purity marking: Stamps pure: true on expressions with no side effects (literals, name references, arithmetic on pure operands).
  • Dead code elimination: Removes unreachable branches when conditions are known constants.
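This minimal Python sketch covers three of the transformations: constant folding, constant propagation through `def`, and dead-branch elimination. The node shapes and function names are hypothetical. Type propagation, intrinsic specialization, and purity marking are omitted. Only `def` bindings feed the constant environment, because a `def` cannot be reassigned.

```python
import operator
from typing import Optional

ARITH = {"+": operator.add, "-": operator.sub, "*": operator.mul}
COMPARE = {"<": operator.lt, "==": operator.eq}
CONSTANT_KINDS = {"number", "true", "false"}

def const_node(value) -> dict:
    if isinstance(value, bool):  # check bool first: bool is an int subclass
        return {"kind": "true" if value else "false"}
    return {"kind": "number", "value": value}

def fold(node: Optional[dict], env: dict) -> Optional[dict]:
    if node is None:
        return None
    kind = node["kind"]
    if kind == "name":
        # Constant propagation: replace references to constant defs.
        return env.get(node["name"], node)
    if kind == "binary":
        left, right = fold(node["left"], env), fold(node["right"], env)
        op = node["op"]
        if left["kind"] == right["kind"] == "number" and (op in ARITH or op in COMPARE):
            fn = ARITH.get(op) or COMPARE[op]
            return const_node(fn(left["value"], right["value"]))
        return {**node, "left": left, "right": right}
    if kind == "if":
        cond = fold(node["cond"], env)
        if cond["kind"] == "true":  # dead code elimination
            return fold(node["then"], env)
        if cond["kind"] == "false":
            return fold(node.get("else"), env)
        return {**node, "cond": cond, "then": fold(node["then"], env),
                "else": fold(node.get("else"), env)}
    return node  # literals, calls, etc. are left unchanged in this sketch

def fold_program(stmts: list) -> list:
    env, out = {}, []
    for stmt in stmts:
        if stmt["kind"] == "def":
            value = fold(stmt["value"], env)
            if value["kind"] in CONSTANT_KINDS:
                env[stmt["name"]] = value  # def is immutable, so this is safe
            out.append({**stmt, "value": value})
        else:
            folded = fold(stmt, env)
            if folded is not None:  # an eliminated if with no else vanishes
                out.append(folded)
    return out
```

For `def x = 5 + 10` followed by `if (x < 20) small() else big()`, the def folds to 15, the condition folds to true, and only the call to `small` survives.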

Mcode (mcode.cm)

Lowers the AST to a JSON-based intermediate representation with explicit operations. Key design principle: every type check is an explicit instruction, so downstream optimizers can see and eliminate it.

  • Typed load/store: Emits load_index (array by integer), load_field (record by string), or load_dynamic (unknown) based on type information from fold.
  • Decomposed calls: Function calls are split into frame (create call frame) + setarg (set arguments) + invoke (execute call).
  • Intrinsic access: Intrinsic functions are loaded via access with an intrinsic marker rather than global lookup.
  • Intrinsic inlining: Type-check intrinsics (is_array, is_text, is_number, is_integer, is_logical, is_null, is_function, is_object, is_stone), length, and push are emitted as direct opcodes instead of frame/setarg/invoke call sequences.

See Mcode IR for instruction format details.
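The lowering rules can be sketched as a small Python class. The opcode names come from this page, but several details are assumptions: the operand order, the slot allocation, and the `{"intrinsic": name}` marker on `access`. The `Lowerer` class is hypothetical and does not reproduce the real instruction format.

```python
from typing import Dict, List

# Intrinsics lowered to a single opcode (names taken from the list above).
INLINED = {
    "is_array", "is_text", "is_number", "is_integer", "is_logical",
    "is_null", "is_function", "is_object", "is_stone", "length", "push",
}
LOAD_OPS = {"index": "load_index", "field": "load_field"}

class Lowerer:
    def __init__(self) -> None:
        self.code: List[list] = []
        self.slots: Dict[str, int] = {}  # variable name -> slot
        self.next_slot = 0

    def new_slot(self) -> int:
        self.next_slot += 1
        return self.next_slot - 1

    def emit(self, *ins) -> None:
        self.code.append(list(ins))

    def lower(self, node: dict) -> int:
        """Lower an expression and return the slot that holds its value."""
        kind = node["kind"]
        if kind == "name":
            if node.get("intrinsic"):
                dest = self.new_slot()
                self.emit("access", dest, {"intrinsic": node["name"]})
                return dest
            if node["name"] not in self.slots:
                self.slots[node["name"]] = self.new_slot()
            return self.slots[node["name"]]
        if kind == "subscript":
            obj, key = self.lower(node["object"]), self.lower(node["key"])
            dest = self.new_slot()
            # Typed load when the access kind is known, otherwise load_dynamic.
            op = LOAD_OPS.get(node.get("access_kind"), "load_dynamic")
            self.emit(op, dest, obj, key)
            return dest
        if kind == "call":
            callee = node["callee"]
            if callee.get("intrinsic") and callee["name"] in INLINED:
                args = [self.lower(a) for a in node["args"]]
                dest = self.new_slot()
                self.emit(callee["name"], dest, *args)  # direct opcode, no frame
                return dest
            fn = self.lower(callee)
            args = [self.lower(a) for a in node["args"]]
            frame = self.new_slot()
            self.emit("frame", frame, fn, len(args))
            for i, arg in enumerate(args):
                self.emit("setarg", frame, i, arg)
            dest = self.new_slot()
            self.emit("invoke", frame, dest)
            return dest
        raise ValueError(f"unsupported node kind {kind!r}")
```

With this sketch, `length(xs)` becomes one instruction, while `f(a, b)` becomes `frame`, two `setarg`s, and `invoke`. That explicit shape is what later passes can inspect.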

Streamline (streamline.cm)

Optimizes the Mcode IR through a series of independent passes. Operates per function:

  1. Backward type inference: Infers parameter types from how they are used in typed operators. Immutable def parameters keep their inferred type across label join points.
  2. Type-check elimination: When a slot's type is known, eliminates is_<type> + conditional jump pairs. Narrows load_dynamic/store_dynamic to typed variants.
  3. Algebraic simplification: Rewrites identity operations (add 0, multiply 1, divide 1) and folds same-slot comparisons.
  4. Boolean simplification: Fuses not + conditional jump into a single jump with inverted condition.
  5. Move elimination: Removes self-moves (move a, a).
  6. Dead jump elimination: Removes jumps to the immediately following label.

See Streamline Optimizer for detailed pass descriptions.
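Four of the passes can be sketched in Python over a flat instruction list. The instruction shapes are hypothetical (`["is_array", dest, src]`, `["jump_false", cond, label]`, `["int", dest, value]`, `["add", dest, a, b]`, and so on). A precomputed map of known slot types stands in for backward inference. Boolean simplification and same-slot comparison folding are omitted. The sample shows how passes compose: algebraic simplification turns `x + 0` into a self-move, which move elimination then deletes.

```python
from typing import Dict, List

Code = List[list]

def eliminate_type_checks(code: Code, known: Dict[int, str]) -> Code:
    """Drop is_<type> + jump_false pairs whose outcome is already known."""
    out, i = [], 0
    while i < len(code):
        ins = code[i]
        nxt = code[i + 1] if i + 1 < len(code) else None
        if (ins[0].startswith("is_") and known.get(ins[2]) == ins[0][3:]
                and nxt is not None and nxt[0] == "jump_false" and nxt[1] == ins[1]):
            i += 2  # the check always passes, so the jump is never taken
            continue
        out.append(ins)
        i += 1
    return out

IDENTITY = {"add": 0, "multiply": 1, "divide": 1}
NO_DEST = {"label", "jump", "jump_false", "return"}

def simplify_algebra(code: Code) -> Code:
    """Rewrite x + 0, x * 1 and x / 1 as moves (assumes numeric operands)."""
    consts: Dict[int, object] = {}
    out = []
    for ins in code:
        op = ins[0]
        if op == "label":
            consts.clear()  # straight-line facts do not survive a join point
        elif op in IDENTITY and ins[3] in consts and consts[ins[3]] == IDENTITY[op]:
            ins = ["move", ins[1], ins[2]]
        out.append(ins)
        if op == "int":
            consts[ins[1]] = ins[2]
        elif op not in NO_DEST:
            consts.pop(ins[1], None)  # destination overwritten, forget its value
    return out

def eliminate_self_moves(code: Code) -> Code:
    return [ins for ins in code if not (ins[0] == "move" and ins[1] == ins[2])]

def eliminate_dead_jumps(code: Code) -> Code:
    """Remove a jump whose target label is the very next instruction."""
    return [ins for i, ins in enumerate(code)
            if not (ins[0] == "jump" and i + 1 < len(code)
                    and code[i + 1] == ["label", ins[1]])]

def streamline(code: Code, known: Dict[int, str]) -> Code:
    code = eliminate_type_checks(code, known)
    code = simplify_algebra(code)
    code = eliminate_self_moves(code)
    return eliminate_dead_jumps(code)

code = [
    ["is_array", 2, 0], ["jump_false", 2, "fail"],  # slot 0 is a known array
    ["int", 3, 0],
    ["add", 1, 1, 3],                                # x = x + 0
    ["jump", "done"],
    ["label", "done"],
    ["return", 1],
]
print(streamline(code, known={0: "array"}))
```

The sketch leaves the now-unused `["int", 3, 0]` in place. Removing it would take a dead-store pass that this sketch does not include.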

QBE Emit (qbe_emit.cm)

Lowers optimized Mcode IR to QBE intermediate language (IL) for native code compilation. Each Mcode function becomes a QBE function that calls into the cell runtime (cell_rt_* functions) for operations that need runtime support, such as allocation and intrinsic dispatch.

String constants are interned in a data section. Integer constants are NaN-boxed inline.
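NaN-boxing stores non-float values inside the payload bits of a quiet NaN, so a constant can be materialized as one 64-bit immediate. The layout below is a hypothetical example chosen for illustration, not cell's actual encoding: tag 0x7FF9 in the top 16 bits and a 32-bit integer payload. The helper names are invented. The sketch also shows string interning as QBE data definitions, assuming plain ASCII text without quotes or backslashes.

```python
import math
import struct
from typing import Dict, List, Tuple

# Hypothetical layout, for illustration only: a boxed integer is a quiet NaN
# whose top 16 bits are 0x7FF9, with the integer in the low 32 bits.
TAG_MASK = 0xFFFF_0000_0000_0000
INT_TAG = 0x7FF9_0000_0000_0000
CANONICAL_NAN = 0x7FF8_0000_0000_0000

def box_float(x: float) -> int:
    """Reinterpret a double's bits; real NaNs are canonicalized."""
    if math.isnan(x):
        return CANONICAL_NAN  # keeps real NaNs out of the integer tag space
    return struct.unpack("<Q", struct.pack("<d", x))[0]

def box_int(n: int) -> int:
    if not -2**31 <= n < 2**31:
        raise OverflowError("integer does not fit the 32-bit payload")
    return INT_TAG | (n & 0xFFFF_FFFF)

def is_boxed_int(v: int) -> bool:
    return (v & TAG_MASK) == INT_TAG

def unbox_int(v: int) -> int:
    low = v & 0xFFFF_FFFF
    return low - (1 << 32) if low & 0x8000_0000 else low  # sign-extend

def emit_int_constant(dest: str, n: int) -> str:
    """Materialize a boxed integer as a single inline QBE immediate."""
    return f"    {dest} =l copy {box_int(n)}"

def intern_strings(strings: List[str]) -> Tuple[Dict[str, str], List[str]]:
    """Give each distinct string one NUL-terminated QBE data definition."""
    names: Dict[str, str] = {}
    data: List[str] = []
    for s in strings:
        if s not in names:
            names[s] = f"$str{len(names)}"
            data.append(f'data {names[s]} = {{ b "{s}", b 0 }}')
    return names, data
```

Canonicalizing real NaNs matters in any scheme like this. Otherwise a computed NaN could carry bits that look like a tagged integer.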

QBE Macros (qbe.cm)

Provides operation implementations as QBE IL templates. Each arithmetic, comparison, and type operation is defined as a function that emits the corresponding QBE instructions, handling type dispatch (separate integer, float, and text paths) with guard checks that test operand types before a path is taken.
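A Python sketch of one such template: an add with an inline integer fast path and a runtime fallback. Several names here are hypothetical. The sketch assumes a NaN-boxing layout in which integers carry tag 0x7FF9 in the top 16 bits and a 32-bit payload. It also assumes a runtime helper `$cell_rt_add`. Only the integer path is inlined; floats, text, and overflow all take the runtime call. The IL relies on QBE allowing an `l` value where `w` is expected (only the low 32 bits are read), and it joins the two paths with a `phi`.

```python
from typing import List

# Hypothetical NaN-boxing constants. The mask is written as a negative
# number so it stays within the signed 64-bit immediate range.
INT_TAG = 0x7FF9 << 48
TAG_MASK = (0xFFFF << 48) - (1 << 64)

def emit_add(dest: str, a: str, b: str, uid: int) -> List[str]:
    """Emit QBE IL for dest = a + b with an inline integer fast path."""
    p = f"add{uid}"  # unique prefix for this expansion's temporaries and labels
    return [
        # Guard: take the fast path only if both operands are boxed integers.
        f"    %{p}_ta =l and {a}, {TAG_MASK}",
        f"    %{p}_tb =l and {b}, {TAG_MASK}",
        f"    %{p}_ia =w ceql %{p}_ta, {INT_TAG}",
        f"    %{p}_ib =w ceql %{p}_tb, {INT_TAG}",
        f"    %{p}_ok =w and %{p}_ia, %{p}_ib",
        f"    jnz %{p}_ok, @{p}_int, @{p}_slow",
        f"@{p}_int",
        # Sign-extend the 32-bit payloads and add them as 64-bit values.
        f"    %{p}_xa =l extsw {a}",
        f"    %{p}_xb =l extsw {b}",
        f"    %{p}_sum =l add %{p}_xa, %{p}_xb",
        # Overflow guard: the sum must survive a 32-bit round trip.
        f"    %{p}_chk =l extsw %{p}_sum",
        f"    %{p}_fits =w ceql %{p}_chk, %{p}_sum",
        f"    jnz %{p}_fits, @{p}_box, @{p}_slow",
        f"@{p}_box",
        f"    %{p}_lo =l extuw %{p}_sum",
        f"    %{p}_boxed =l or %{p}_lo, {INT_TAG}",
        f"    jmp @{p}_done",
        f"@{p}_slow",
        # Floats, text and overflow fall back to the runtime.
        f"    %{p}_rt =l call $cell_rt_add(l {a}, l {b})",
        f"@{p}_done",
        f"    {dest} =l phi @{p}_box %{p}_boxed, @{p}_slow %{p}_rt",
    ]

print("\n".join(emit_add("%r", "%x", "%y", 1)))
```

The `uid` prefix keeps temporaries and labels unique, so the same template can be expanded many times inside one QBE function.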

Execution Backends

Mach VM (default)

Binary 32-bit register VM. The Mach serializer (mach.c) converts streamlined mcode JSON into compact 32-bit bytecode with a constant pool. Used for production execution and bootstrapping.

./cell script.ce

Debug the mach bytecode output:

./cell --core . --dump-mach script.ce
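The serializer's job can be sketched as packing each instruction into one 32-bit word and moving literals into a deduplicated constant pool. The field layout and opcode numbers below are assumptions in the style of other register VMs, not the format mach.c actually produces. The assumed layout uses an 8-bit opcode with 8-bit A/B/C register fields, or a 16-bit Bx pool index in place of B and C.

```python
import struct
from typing import Dict, List, Tuple

# Hypothetical layout: op in bits 0-7, A in 8-15, B in 16-23, C in 24-31,
# or a 16-bit Bx operand in bits 16-31 for constant-pool loads.
OPCODES = {"move": 0, "add": 1, "load_const": 2, "return": 3}

def encode_abc(op: str, a: int, b: int = 0, c: int = 0) -> int:
    for operand in (a, b, c):
        if not 0 <= operand < 256:
            raise ValueError("register operand does not fit in 8 bits")
    return OPCODES[op] | a << 8 | b << 16 | c << 24

def encode_abx(op: str, a: int, bx: int) -> int:
    if not (0 <= a < 256 and 0 <= bx < 65536):
        raise ValueError("operand out of range")
    return OPCODES[op] | a << 8 | bx << 16

def serialize(instrs: List[list]) -> Tuple[bytes, list]:
    """Turn JSON-style instructions into little-endian words plus a constant pool."""
    pool: list = []
    index: Dict[tuple, int] = {}
    words: List[int] = []
    for ins in instrs:
        if ins[0] == "const":  # ["const", dest, value]
            value = ins[2]
            key = (type(value).__name__, value)  # keep 1 and 1.0 distinct
            if key not in index:
                index[key] = len(pool)
                pool.append(value)
            words.append(encode_abx("load_const", ins[1], index[key]))
        else:
            words.append(encode_abc(ins[0], *ins[1:]))
    return struct.pack(f"<{len(words)}I", *words), pool

def decode(word: int) -> Tuple[int, int, int, int]:
    return word & 0xFF, (word >> 8) & 0xFF, (word >> 16) & 0xFF, word >> 24
```

Fixed-width words let a VM fetch and decode an instruction with a handful of shifts and masks. Literals too large for an operand field live in the pool and are referenced by index.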

Mcode Interpreter

JSON-based interpreter that executes the streamlined Mcode IR directly. Used for debugging the compilation pipeline.

./cell --mcode script.ce
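Interpreting the JSON directly means each executed instruction can be inspected in the same form the compiler produced it. Here is a minimal Python sketch. It assumes a hypothetical subset of opcodes and operand order; Mcode IR describes the real set. The optional `trace` list is an illustrative debugging hook, not a documented cell feature.

```python
import json
from typing import List, Optional

def run(function_json: str, args: list, trace: Optional[List[str]] = None):
    """Execute one function's instruction list directly from JSON."""
    code = json.loads(function_json)
    labels = {ins[1]: pc for pc, ins in enumerate(code) if ins[0] == "label"}
    slots = dict(enumerate(args))  # parameters occupy the first slots
    pc = 0
    while pc < len(code):
        op, *ops = code[pc]
        if trace is not None:
            trace.append(op)  # record every executed opcode
        pc += 1
        if op == "int":
            slots[ops[0]] = ops[1]
        elif op == "add":
            slots[ops[0]] = slots[ops[1]] + slots[ops[2]]
        elif op == "lt":
            slots[ops[0]] = slots[ops[1]] < slots[ops[2]]
        elif op == "jump":
            pc = labels[ops[0]]
        elif op == "jump_false":
            if not slots[ops[0]]:
                pc = labels[ops[1]]
        elif op == "return":
            return slots[ops[0]]
        elif op != "label":
            raise ValueError(f"unknown opcode {op!r}")
    return None

# Sum of 0 .. n-1: slot 0 = n, 1 = i, 2 = acc, 3 = constant 1, 4 = loop condition.
SUM_BELOW = json.dumps([
    ["int", 1, 0], ["int", 2, 0], ["int", 3, 1],
    ["label", "loop"],
    ["lt", 4, 1, 0], ["jump_false", 4, "end"],
    ["add", 2, 2, 1], ["add", 1, 1, 3],
    ["jump", "loop"],
    ["label", "end"],
    ["return", 2],
])
print(run(SUM_BELOW, [5]))
```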

QBE Native (experimental)

Generates QBE IL that can be compiled to native code.

./cell --emit-qbe script.ce > output.ssa

Files

| File | Role |
|---|---|
| tokenize.cm | Lexer |
| parse.cm | Parser + semantic analysis |
| fold.cm | Constant folding + type analysis |
| mcode.cm | AST → Mcode IR lowering |
| streamline.cm | Mcode IR optimizer |
| qbe_emit.cm | Mcode IR → QBE IL emitter |
| qbe.cm | QBE IL operation templates |
| internal/bootstrap.cm | Pipeline orchestrator |

Debug Tools

| File | Purpose |
|---|---|
| dump_mcode.cm | Print raw Mcode IR before streamlining |
| dump_stream.cm | Print IR after streamlining with before/after stats |
| dump_types.cm | Print streamlined IR with type annotations |

Test Files

| File | Tests |
|---|---|
| parse_test.ce | Type tags, access_kind, intrinsic resolution |
| fold_test.ce | Type propagation, purity, intrinsic hints |
| mcode_test.ce | Typed load/store, decomposed calls |
| streamline_test.ce | Optimization counts, IR before/after |
| qbe_test.ce | End-to-end QBE IL generation |
| test_intrinsics.cm | Inlined intrinsic opcodes (is_array, length, push, etc.) |
| test_backward.cm | Backward type propagation for parameters |