| title | description |
|---|---|
| Compilation Pipeline | Overview of the compilation stages and optimizations |
Overview
The compilation pipeline transforms source code through several stages, each adding information or lowering the representation toward execution. All backends share the same path through mcode and streamline. There are three execution backends: the Mach register VM (default), the Mcode interpreter (debug), and native code via QBE (experimental).
Source → Tokenize → Parse → Fold → Mcode → Streamline → Mach VM (default)
                                                      → Mcode Interpreter
                                                      → QBE → Native
Stages
Tokenize (tokenize.cm)
Splits source text into tokens. Handles string interpolation by re-tokenizing template literal contents. Produces a token array with position information (line, column).
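As a rough illustration of a token array that carries position information, here is a minimal Python sketch. The token classes, the `Token` record, and the regular expression are assumptions made for this example and are not taken from tokenize.cm. String interpolation and re-tokenizing of template literals are left out.

```python
import re
from typing import NamedTuple


class Token(NamedTuple):
    kind: str
    text: str
    line: int    # 1-based line number
    column: int  # 1-based column number


# Hypothetical token classes; the real lexer will recognize a different set.
TOKEN_RE = re.compile(r"""
    (?P<number>\d+(?:\.\d+)?)
  | (?P<name>[A-Za-z_][A-Za-z0-9_]*)
  | (?P<string>"(?:[^"\\\n]|\\.)*")
  | (?P<punct>[-+*/=(){}\[\],.:;<>!])
  | (?P<newline>\n)
  | (?P<space>[ \t]+)
""", re.VERBOSE)


def tokenize(source):
    """Split source text into tokens, tracking line and column."""
    tokens = []
    line, line_start, pos = 1, 0, 0
    while pos < len(source):
        match = TOKEN_RE.match(source, pos)
        if match is None:
            column = pos - line_start + 1
            raise SyntaxError(f"unexpected character {source[pos]!r} at {line}:{column}")
        kind = match.lastgroup
        if kind == "newline":
            line += 1
            line_start = match.end()
        elif kind != "space":
            tokens.append(Token(kind, match.group(), line, pos - line_start + 1))
        pos = match.end()
    return tokens


for token in tokenize("def x = 5\nx + 10"):
    print(token)
```

Positions are computed from the offset of the last newline, so each token gets its coordinates without a second pass over the text.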
Parse (parse.cm)
Converts tokens into an AST. Also performs semantic analysis:
- Scope records: For each scope (global, function), builds a record mapping variable names to their metadata: make (var/def/function/input), function_nr, nr_uses, closure flag, and level.
- Type tags: When the right-hand side of a def is a syntactically obvious type, stamps type_tag on the scope record entry. Derivable types: "integer", "number", "text", "array", "record", "function", "logical", "null".
- Intrinsic resolution: Names used but not locally bound are recorded in ast.intrinsics. Name nodes referencing intrinsics get intrinsic: true.
- Access kind: Subscript ([) nodes get access_kind: "index" for numeric subscripts, "field" for string subscripts, omitted otherwise.
- Tail position: Return statements where the expression is a call get tail: true.
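The annotations listed above can be sketched on a toy AST. The following Python is only an illustration: the dict-based node shape, the `analyze` function, and the single flat scope are assumptions for this example, and the real parse.cm layout is not shown in this document. Field names such as make, nr_uses, type_tag, intrinsic, access_kind, and tail follow the text.

```python
# Hypothetical AST shape: plain dicts with a "kind" key.
LITERAL_TYPE_TAGS = {"text": "text", "array": "array", "record": "record",
                     "function": "function", "logical": "logical", "null": "null"}


def literal_type_tag(node):
    """Return a type tag for a syntactically obvious literal, else None."""
    if node["kind"] == "number":
        return "integer" if isinstance(node["value"], int) else "number"
    return LITERAL_TYPE_TAGS.get(node["kind"])


def analyze(statements):
    """Annotate a flat statement list; return (scope record, intrinsics)."""
    scope = {}       # variable name -> metadata record
    intrinsics = []  # names used but not locally bound

    def visit(node):
        kind = node["kind"]
        if kind == "name":
            entry = scope.get(node["name"])
            if entry is not None:
                entry["nr_uses"] += 1
            else:
                node["intrinsic"] = True
                if node["name"] not in intrinsics:
                    intrinsics.append(node["name"])
        elif kind == "subscript":
            visit(node["object"])
            visit(node["index"])
            index_kind = node["index"]["kind"]
            if index_kind == "number":
                node["access_kind"] = "index"
            elif index_kind == "text":
                node["access_kind"] = "field"
            # otherwise access_kind is omitted
        elif kind == "call":
            visit(node["callee"])
            for arg in node["args"]:
                visit(arg)
        elif kind == "array":
            for item in node["items"]:
                visit(item)

    for stmt in statements:
        if stmt["kind"] in ("var", "def"):
            visit(stmt["value"])
            entry = {"make": stmt["kind"], "nr_uses": 0}
            tag = literal_type_tag(stmt["value"]) if stmt["kind"] == "def" else None
            if tag is not None:
                entry["type_tag"] = tag
            scope[stmt["name"]] = entry
        elif stmt["kind"] == "return":
            visit(stmt["value"])
            if stmt["value"]["kind"] == "call":
                stmt["tail"] = True
    return scope, intrinsics
```

This sketch resolves names in a single pass, so a name used before its declaration would be treated as an intrinsic; a real parser with nested scopes would also need function_nr, closure, and level bookkeeping.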
Fold (fold.cm)
Operates on the AST. Performs constant folding and type analysis:
- Constant folding: Evaluates arithmetic on known constants at compile time (e.g., 5 + 10 becomes 15).
- Constant propagation: Tracks def bindings whose values are known constants.
- Type propagation: Extends type_tag through operations. When both operands of an arithmetic op have known types, the result type is known. Propagates type tags to reference sites.
- Intrinsic specialization: When an intrinsic call's argument types are known, stamps a hint on the call node. For example, length(x) where x is a known array gets hint: "array_length". Type checks like is_array(known_array) are folded to true.
- Purity marking: Stamps pure: true on expressions with no side effects (literals, name references, arithmetic on pure operands).
- Dead code elimination: Removes unreachable branches when conditions are known constants.
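Constant folding, constant propagation, and purity marking can be illustrated together with a small recursive rewrite. This is a sketch under assumptions: the dict-based nodes, the `fold` signature, and the `constants` map of known def bindings are invented for the example and do not reflect fold.cm's actual data structures.

```python
import operator

# Division is left out to avoid guessing the language's division semantics.
ARITH = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def number(value):
    """Build a folded numeric literal node (hypothetical node shape)."""
    return {"kind": "number", "value": value, "pure": True}


def fold(node, constants):
    """Return a folded copy of node; constants maps def names to known values."""
    kind = node["kind"]
    if kind == "number":
        return {**node, "pure": True}
    if kind == "name":
        if node["name"] in constants:
            # Constant propagation: replace the reference with its known value.
            return number(constants[node["name"]])
        return {**node, "pure": True}
    if kind == "binary":
        left = fold(node["left"], constants)
        right = fold(node["right"], constants)
        if (left["kind"] == "number" and right["kind"] == "number"
                and node["op"] in ARITH):
            # Constant folding: both operands are known at compile time.
            return number(ARITH[node["op"]](left["value"], right["value"]))
        pure = left.get("pure", False) and right.get("pure", False)
        return {**node, "left": left, "right": right, "pure": pure}
    return node  # anything else is left untouched and not marked pure


expr = {"kind": "binary", "op": "+",
        "left": {"kind": "number", "value": 5},
        "right": {"kind": "number", "value": 10}}
print(fold(expr, {}))  # {'kind': 'number', 'value': 15, 'pure': True}
```

Folding bottom-up means a propagated constant can enable a fold one level higher, for example `x * 2` once `x` is known to be 5.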
Mcode (mcode.cm)
Lowers the AST to a JSON-based intermediate representation with explicit operations. Key design principle: every type check is an explicit instruction so downstream optimizers can see and eliminate them.
- Typed load/store: Emits load_index (array by integer), load_field (record by string), or load_dynamic (unknown) based on type information from fold.
- Decomposed calls: Function calls are split into frame (create call frame) + setarg (set arguments) + invoke (execute call).
- Intrinsic access: Intrinsic functions are loaded via access with an intrinsic marker rather than global lookup.
- Intrinsic inlining: Type-check intrinsics (is_array, is_text, is_number, is_integer, is_logical, is_null, is_function, is_object, is_stone), length, and push are emitted as direct opcodes instead of frame/setarg/invoke call sequences.
See Mcode IR for instruction format details.
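To make the typed load selection and the decomposed call sequence concrete, here is a small Python sketch. The tuple-based instruction shape, the operand order, and the helper names `lower_load` and `lower_call` are assumptions for illustration only. The opcode names come from the text above, but the real instruction format is defined in the Mcode IR documentation.

```python
def lower_load(dest, obj, key, access_kind):
    """Pick a typed load opcode from the access_kind stamped earlier."""
    opcode = {"index": "load_index", "field": "load_field"}.get(
        access_kind, "load_dynamic")
    return (opcode, dest, obj, key)


def lower_call(callee_slot, arg_slots, result_slot, new_slot):
    """Split one call into frame + setarg... + invoke instructions."""
    frame = new_slot()  # slot that holds the new call frame
    code = [("frame", frame, callee_slot, len(arg_slots))]
    for position, slot in enumerate(arg_slots):
        code.append(("setarg", frame, position, slot))
    code.append(("invoke", frame, result_slot))
    return code


# Usage: lower f(a, b) with f in slot 1, a in slot 2, b in slot 3.
slots = iter(range(10, 100))  # hypothetical allocator for fresh slots
for instruction in lower_call(1, [2, 3], 4, lambda: next(slots)):
    print(instruction)
```

Keeping each step as its own instruction is what lets a later pass inspect or rewrite argument setup without re-parsing a compound call operation.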
Streamline (streamline.cm)
Optimizes the Mcode IR through a series of independent passes. Operates per-function:
- Backward type inference: Infers parameter types from how they are used in typed operators. Immutable def parameters keep their inferred type across label join points.
- Type-check elimination: When a slot's type is known, eliminates is_<type> + conditional jump pairs. Narrows load_dynamic/store_dynamic to typed variants.
- Algebraic simplification: Rewrites identity operations (add 0, multiply 1, divide 1) and folds same-slot comparisons.
- Boolean simplification: Fuses not + conditional jump into a single jump with inverted condition.
- Move elimination: Removes self-moves (move a, a).
- Dead jump elimination: Removes jumps to the immediately following label.
See Streamline Optimizer for detailed pass descriptions.
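Two of the simpler passes, move elimination and dead jump elimination, can be sketched as independent list rewrites. The tuple instruction shape and the `("label", name)` convention are assumptions for this example, not the actual Mcode IR encoding.

```python
def eliminate_self_moves(code):
    """Drop instructions of the form ("move", a, a)."""
    return [ins for ins in code
            if not (ins[0] == "move" and ins[1] == ins[2])]


def eliminate_dead_jumps(code):
    """Drop an unconditional jump whose target is the very next label."""
    result = []
    for i, ins in enumerate(code):
        following = code[i + 1] if i + 1 < len(code) else None
        if (ins[0] == "jump" and following is not None
                and following[0] == "label" and following[1] == ins[1]):
            continue
        result.append(ins)
    return result


def streamline(code):
    """Run each pass in turn over one function's instruction list."""
    for optimization in (eliminate_self_moves, eliminate_dead_jumps):
        code = optimization(code)
    return code


function_body = [
    ("move", 1, 1),   # self-move, removed
    ("move", 2, 1),
    ("jump", "L1"),   # jumps to the next instruction, removed
    ("label", "L1"),
    ("return", 2),
]
print(streamline(function_body))
# [('move', 2, 1), ('label', 'L1'), ('return', 2)]
```

Because each pass takes and returns a plain instruction list, passes can be reordered or tested in isolation, which matches the per-function, independent-pass structure described above.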
QBE Emit (qbe_emit.cm)
Lowers optimized Mcode IR to QBE intermediate language for native code compilation. Each Mcode function becomes a QBE function that calls into the cell runtime (cell_rt_* functions) for operations that require the runtime (allocation, intrinsic dispatch, etc.).
String constants are interned in a data section. Integer constants are NaN-boxed inline.
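For readers unfamiliar with NaN-boxing, the following Python sketch shows the general idea of packing a 32-bit integer into the payload of a quiet NaN. The bit layout here (tag position, tag value, payload width) is entirely an assumption chosen for illustration; this document does not specify the layout the cell runtime actually uses.

```python
import struct

QNAN = 0x7FF8_0000_0000_0000      # exponent all ones plus the quiet bit
TAG_MASK = 0x0007_0000_0000_0000  # hypothetical 3-bit tag field (bits 48-50)
TAG_INT = 0x0001_0000_0000_0000   # hypothetical tag value for integers
PAYLOAD_MASK = 0xFFFF_FFFF        # low 32 bits carry the integer


def box_int(value):
    """Encode a signed 32-bit integer as a 64-bit NaN-boxed word."""
    if not -2**31 <= value < 2**31:
        raise OverflowError("value does not fit in a 32-bit payload")
    return QNAN | TAG_INT | (value & PAYLOAD_MASK)


def is_boxed_int(bits):
    """True if the word is a quiet NaN carrying the integer tag."""
    return (bits & (QNAN | TAG_MASK)) == (QNAN | TAG_INT)


def unbox_int(bits):
    """Recover the signed integer from the low 32 bits."""
    payload = bits & PAYLOAD_MASK
    return payload - 2**32 if payload & 0x8000_0000 else payload


def float_bits(x):
    """Raw 64-bit pattern of an ordinary double, for comparison."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]


print(hex(box_int(42)))          # 0x7ff900000000002a
print(unbox_int(box_int(-7)))    # -7
print(is_boxed_int(float_bits(1.5)))  # False
```

The appeal of this kind of encoding for an emitter is that an integer constant becomes a single 64-bit immediate, with no data-section entry or runtime allocation.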
QBE Macros (qbe.cm)
Provides operation implementations as QBE IL templates. Each arithmetic, comparison, and type operation is defined as a function that emits the corresponding QBE instructions, handling type dispatch (integer, float, text paths) with proper guard checks.
Execution Backends
Mach VM (default)
Binary 32-bit register VM. The Mach serializer (mach.c) converts streamlined mcode JSON into compact 32-bit bytecode with a constant pool. Used for production execution and bootstrapping.
./cell script.ce
Debug the mach bytecode output:
./cell --core . --dump-mach script.ce
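The idea of compact 32-bit bytecode with a constant pool can be sketched as follows. The opcode numbers, the 8-bit opcode plus three 8-bit operands layout, and the `loadk` instruction are hypothetical and chosen for simplicity; the actual format produced by mach.c is not described in this document.

```python
import struct

# Hypothetical opcode numbering.
OPCODES = {"loadk": 0, "add": 1, "return": 2}


def encode(op, a=0, b=0, c=0):
    """Pack an opcode and three 8-bit operands into one 32-bit word."""
    for operand in (a, b, c):
        if not 0 <= operand <= 0xFF:
            raise ValueError("operand does not fit in 8 bits")
    return OPCODES[op] | (a << 8) | (b << 16) | (c << 24)


def decode(word):
    """Unpack a 32-bit word into (opcode, a, b, c)."""
    return (word & 0xFF, (word >> 8) & 0xFF,
            (word >> 16) & 0xFF, (word >> 24) & 0xFF)


def serialize(instructions):
    """Return (little-endian bytecode, constant pool) for an instruction list."""
    pool = {}   # (type name, value) -> pool index, so 1 and 1.0 stay distinct
    words = []
    for ins in instructions:
        if ins[0] == "loadk":
            _, dest, value = ins
            index = pool.setdefault((type(value).__name__, value), len(pool))
            words.append(encode("loadk", dest, index))
        else:
            words.append(encode(*ins))
    constants = [value for (_, value) in pool]
    return struct.pack(f"<{len(words)}I", *words), constants


bytecode, constants = serialize([
    ("loadk", 0, 5),
    ("loadk", 1, 10),
    ("add", 2, 0, 1),
    ("return", 2),
])
print(len(bytecode), constants)  # 16 [5, 10]
```

Moving literal values into a pool keeps every instruction at a fixed width, which is what makes a word-at-a-time dispatch loop straightforward.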
Mcode Interpreter
JSON-based interpreter. Used for debugging the compilation pipeline.
./cell --mcode script.ce
QBE Native (experimental)
Generates QBE IL that can be compiled to native code.
./cell --emit-qbe script.ce > output.ssa
Files
| File | Role |
|---|---|
| tokenize.cm | Lexer |
| parse.cm | Parser + semantic analysis |
| fold.cm | Constant folding + type analysis |
| mcode.cm | AST → Mcode IR lowering |
| streamline.cm | Mcode IR optimizer |
| qbe_emit.cm | Mcode IR → QBE IL emitter |
| qbe.cm | QBE IL operation templates |
| internal/bootstrap.cm | Pipeline orchestrator |
Debug Tools
| File | Purpose |
|---|---|
| dump_mcode.cm | Print raw Mcode IR before streamlining |
| dump_stream.cm | Print IR after streamlining with before/after stats |
| dump_types.cm | Print streamlined IR with type annotations |
Test Files
| File | Tests |
|---|---|
| parse_test.ce | Type tags, access_kind, intrinsic resolution |
| fold_test.ce | Type propagation, purity, intrinsic hints |
| mcode_test.ce | Typed load/store, decomposed calls |
| streamline_test.ce | Optimization counts, IR before/after |
| qbe_test.ce | End-to-end QBE IL generation |
| test_intrinsics.cm | Inlined intrinsic opcodes (is_array, length, push, etc.) |
| test_backward.cm | Backward type propagation for parameters |