| title | description |
|---|---|
| Compilation Pipeline | Overview of the compilation stages and optimizations |
Overview
The compilation pipeline transforms source code through several stages, each adding information or lowering the representation toward execution. There are three execution backends: the Mach register VM (default), the Mcode interpreter (debug), and native code via QBE (experimental).
Source → Tokenize → Parse → Fold → Mach VM (default)
                                 → Mcode → Streamline → Mcode Interpreter
                                                      → QBE → Native
Stages
Tokenize (tokenize.cm)
Splits source text into tokens. Handles string interpolation by re-tokenizing template literal contents. Produces a token array with position information (line, column).
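As a rough illustration of a token array with position information, here is a minimal Python sketch. The token grammar, the dictionary field names (`kind`, `value`, `line`, `column`), and the function name are assumptions made for the example, not the actual `tokenize.cm` implementation, and string interpolation is not handled.

```python
import re

# Hypothetical token grammar; the real tokenize.cm rules are not shown here.
TOKEN_RE = re.compile(r"""
    (?P<number>\d+(?:\.\d+)?)
  | (?P<name>[A-Za-z_]\w*)
  | (?P<text>"[^"\n]*")
  | (?P<punct>[-+*/=()\[\]{},.:;])
  | (?P<newline>\n)
  | (?P<space>[ \t]+)
""", re.VERBOSE)


def tokenize(source):
    """Split source into tokens, each carrying a 1-based line and column."""
    tokens = []
    line = 1
    line_start = 0  # offset of the first character of the current line
    pos = 0
    while pos < len(source):
        match = TOKEN_RE.match(source, pos)
        if match is None:
            column = pos - line_start + 1
            raise SyntaxError(f"unexpected {source[pos]!r} at {line}:{column}")
        kind = match.lastgroup
        if kind == "newline":
            line += 1
            line_start = match.end()
        elif kind != "space":
            tokens.append({
                "kind": kind,
                "value": match.group(),
                "line": line,
                "column": pos - line_start + 1,
            })
        pos = match.end()
    return tokens
```

Calling `tokenize('def x = 5\nx + 10')` yields seven tokens; whitespace and newlines only advance the position counters and are not emitted.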
Parse (parse.cm)
Converts tokens into an AST. Also performs semantic analysis:
- Scope records: For each scope (global, function), builds a record mapping variable names to their metadata: `make` (var/def/function/input), `function_nr`, `nr_uses`, `closure` flag, and `level`.
- Type tags: When the right-hand side of a `def` is a syntactically obvious type, stamps `type_tag` on the scope record entry. Derivable types: `"integer"`, `"number"`, `"text"`, `"array"`, `"record"`, `"function"`, `"logical"`, `"null"`.
- Intrinsic resolution: Names used but not locally bound are recorded in `ast.intrinsics`. Name nodes referencing intrinsics get `intrinsic: true`.
- Access kind: Subscript (`[`) nodes get `access_kind`: `"index"` for numeric subscripts, `"field"` for string subscripts, omitted otherwise.
- Tail position: Return statements where the expression is a call get `tail: true`.
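The type-tag and access-kind rules above can be sketched in Python as follows. The AST node shapes (`{"kind": ..., "value": ...}`, a subscript with an `"index"` child) and the helper names are hypothetical; only the tag names, `type_tag`, `make`, `nr_uses`, and the `"index"`/`"field"` values come from the description.

```python
def type_tag_for(node):
    """Return a type tag for a syntactically obvious right-hand side, else None."""
    kind = node["kind"]
    if kind == "number":
        # Integer literals get the narrower tag; other numeric literals get "number".
        return "integer" if isinstance(node["value"], int) else "number"
    if kind in ("text", "array", "record", "function", "logical", "null"):
        return kind
    return None  # not syntactically obvious (e.g. a call or a name)


def access_kind_for(subscript):
    """Classify a subscript node; None means the field is omitted."""
    index_kind = subscript["index"]["kind"]
    if index_kind == "number":
        return "index"
    if index_kind == "text":
        return "field"
    return None


def record_def(scope, name, rhs):
    """Add a scope record entry for `def name = rhs` (assumed entry layout)."""
    entry = {"make": "def", "nr_uses": 0}
    tag = type_tag_for(rhs)
    if tag is not None:
        entry["type_tag"] = tag
    scope[name] = entry
    return entry
```

For example, `record_def({}, "xs", {"kind": "array", "items": []})` produces an entry tagged `"array"`, while a right-hand side that is a call leaves `type_tag` unset.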
Fold (fold.cm)
Operates on the AST. Performs constant folding and type analysis:
- Constant folding: Evaluates arithmetic on known constants at compile time (e.g., `5 + 10` becomes `15`).
- Constant propagation: Tracks `def` bindings whose values are known constants.
- Type propagation: Extends `type_tag` through operations. When both operands of an arithmetic op have known types, the result type is known. Propagates type tags to reference sites.
- Intrinsic specialization: When an intrinsic call's argument types are known, stamps a `hint` on the call node. For example, `length(x)` where `x` is a known array gets `hint: "array_length"`. Type checks like `is_array(known_array)` are folded to `true`.
- Purity marking: Stamps `pure: true` on expressions with no side effects (literals, name references, arithmetic on pure operands).
- Dead code elimination: Removes unreachable branches when conditions are known constants.
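A minimal Python sketch of three of these passes (constant folding, purity marking, and dead-branch removal) over a dictionary-based AST. The node shapes (`binary`, `if`, `logical`, and their fields) are assumptions for illustration; only the `pure` stamp and the `5 + 10` example come from the text. Division is left out to avoid guessing at the language's divide-by-zero semantics.

```python
import operator

# Operators folded in this sketch.
ARITH = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def fold(node):
    """Return a folded copy of an AST node (hypothetical node shapes)."""
    kind = node["kind"]
    if kind in ("number", "logical", "name"):
        # Literals and name references have no side effects.
        return {**node, "pure": True}
    if kind == "binary":
        left = fold(node["left"])
        right = fold(node["right"])
        op = node["op"]
        if op in ARITH and left["kind"] == "number" and right["kind"] == "number":
            value = ARITH[op](left["value"], right["value"])
            return {"kind": "number", "value": value, "pure": True}
        folded = {**node, "left": left, "right": right}
        if op in ARITH and left.get("pure") and right.get("pure"):
            folded["pure"] = True  # arithmetic on pure operands
        return folded
    if kind == "if":
        cond = fold(node["cond"])
        if cond["kind"] == "logical":
            # Condition is a known constant: keep only the reachable branch.
            return fold(node["then"] if cond["value"] else node["else"])
        return {**node, "cond": cond,
                "then": fold(node["then"]), "else": fold(node["else"])}
    return node  # anything else is left untouched
```

With this sketch, a `binary` node for `5 + 10` folds to `{"kind": "number", "value": 15, "pure": True}`, and an `if` whose condition is a constant `false` collapses to its folded `else` branch.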
Mcode (mcode.cm)
Lowers the AST to a JSON-based intermediate representation with explicit operations. Key design principle: every type check is an explicit instruction so downstream optimizers can see and eliminate them.
- Typed load/store: Emits `load_index` (array by integer), `load_field` (record by string), or `load_dynamic` (unknown) based on type information from fold.
- Decomposed calls: Function calls are split into `frame` (create call frame) + `setarg` (set arguments) + `invoke` (execute call).
- Intrinsic access: Intrinsic functions are loaded via `access` with an intrinsic marker rather than global lookup.
See Mcode IR for instruction format details.
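To make the load selection and call decomposition concrete, here is a Python sketch that emits instructions as plain lists. The opcode names are taken from the list above, but the operand order, the slot names such as `"r1"`, and both function names are assumptions; the actual instruction format is defined in the Mcode IR documentation.

```python
def select_load(access_kind):
    """Pick a load opcode from the access_kind stamped on a subscript node."""
    if access_kind == "index":
        return "load_index"    # array by integer
    if access_kind == "field":
        return "load_field"    # record by string
    return "load_dynamic"      # type unknown at compile time


def lower_call(frame_slot, callee_slot, arg_slots, result_slot):
    """Split one call into frame + setarg* + invoke (assumed operand layout)."""
    instrs = [["frame", frame_slot, callee_slot, len(arg_slots)]]
    for position, slot in enumerate(arg_slots):
        instrs.append(["setarg", frame_slot, position, slot])
    instrs.append(["invoke", frame_slot, result_slot])
    return instrs
```

A two-argument call therefore lowers to four instructions: one `frame`, two `setarg`, and one `invoke`. Keeping each step as its own instruction is what lets a later pass inspect or remove them individually.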
Streamline (streamline.cm)
Optimizes the Mcode IR. Operates per-function:
- Redundant instruction elimination: Removes no-op patterns and redundant moves.
- Dead code removal: Eliminates instructions whose results are never used.
- Type-based narrowing: When type information is available, narrows `load_dynamic`/`store_dynamic` to typed variants.
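Dead code removal of this kind is commonly done with a backward liveness scan. The Python sketch below shows that general technique on straight-line code; it is not taken from `streamline.cm`. The instruction layout `[op, dest, *sources]`, the `int`, `move`, and `return` opcodes, and the set of side-effecting opcodes are all assumptions for the example.

```python
# Opcodes that must be kept even if their result is unused (assumed set).
SIDE_EFFECTS = {"invoke", "store_dynamic", "return"}


def remove_dead(instrs, live_out=()):
    """Drop instructions whose destination is never read afterwards.

    Each instruction is [op, dest, *sources]; slots are strings such as "r1",
    and non-string sources are treated as immediate constants.
    """
    live = set(live_out)
    kept = []
    for instr in reversed(instrs):
        op, dest, *sources = instr
        if op in SIDE_EFFECTS or dest in live:
            live.discard(dest)  # this instruction defines dest
            live.update(s for s in sources if isinstance(s, str))
            kept.append(instr)
    kept.reverse()
    return kept
```

Given `int r1 5; int r2 7; move r3 r1; return r3`, the scan keeps everything except the `int r2 7` instruction, since `r2` is never read.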
QBE Emit (qbe_emit.cm)
Lowers optimized Mcode IR to QBE intermediate language for native code compilation. Each Mcode function becomes a QBE function that calls into the cell runtime (cell_rt_* functions) for operations that require the runtime (allocation, intrinsic dispatch, etc.).
String constants are interned in a data section. Integer constants are NaN-boxed inline.
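For readers unfamiliar with NaN-boxing, the Python sketch below shows the general idea: a 32-bit integer is stored in the payload bits of a quiet NaN so that it fits in the same 64-bit word as a double. The specific bit layout (`QNAN`, `TAG_INT`) is an assumption chosen for illustration and is not necessarily the encoding the cell runtime uses.

```python
import math
import struct

QNAN = 0x7FF8_0000_0000_0000     # exponent all ones + quiet bit
TAG_INT = 0x0001_0000_0000_0000  # hypothetical tag bit marking "integer"
INT_MASK = QNAN | TAG_INT


def box_int(n):
    """Pack a signed 32-bit integer into a NaN payload (assumed layout)."""
    if not -2**31 <= n < 2**31:
        raise OverflowError("value does not fit in 32 bits")
    return INT_MASK | (n & 0xFFFF_FFFF)


def is_boxed_int(bits):
    return (bits & INT_MASK) == INT_MASK


def unbox_int(bits):
    """Recover the signed integer from the low 32 bits."""
    low = bits & 0xFFFF_FFFF
    return low - 2**32 if low >= 2**31 else low


def as_double(bits):
    """Reinterpret the 64-bit pattern as an IEEE 754 double."""
    return struct.unpack("<d", struct.pack("<Q", bits))[0]
```

Under this layout, `unbox_int(box_int(-7))` returns `-7`, and reinterpreting any boxed integer as a double gives NaN, which is why ordinary doubles and boxed values can share one word.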
QBE Macros (qbe.cm)
Provides operation implementations as QBE IL templates. Each arithmetic, comparison, and type operation is defined as a function that emits the corresponding QBE instructions, handling type dispatch (integer, float, text paths) with proper guard checks.
Execution Backends
Mach VM (default)
Binary 32-bit register VM. Used for production execution and bootstrapping.
./cell script.ce
Mcode Interpreter
JSON-based interpreter. Used for debugging the compilation pipeline.
./cell --mcode script.ce
QBE Native (experimental)
Generates QBE IL that can be compiled to native code.
./cell --emit-qbe script.ce > output.ssa
Files
| File | Role |
|---|---|
| `tokenize.cm` | Lexer |
| `parse.cm` | Parser + semantic analysis |
| `fold.cm` | Constant folding + type analysis |
| `mcode.cm` | AST → Mcode IR lowering |
| `streamline.cm` | Mcode IR optimizer |
| `qbe_emit.cm` | Mcode IR → QBE IL emitter |
| `qbe.cm` | QBE IL operation templates |
| `internal/bootstrap.cm` | Pipeline orchestrator |
Test Files
| File | Tests |
|---|---|
| `parse_test.ce` | Type tags, access_kind, intrinsic resolution |
| `fold_test.ce` | Type propagation, purity, intrinsic hints |
| `mcode_test.ce` | Typed load/store, decomposed calls |
| `streamline_test.ce` | Optimization counts, IR before/after |
| `qbe_test.ce` | End-to-end QBE IL generation |