---
title: "Compilation Pipeline"
description: "Overview of the compilation stages and optimizations"
---

## Overview

The compilation pipeline transforms source code through several stages, each adding information or lowering the representation toward execution. There are three execution backends: the Mach register VM (default), the Mcode interpreter (debug), and native code via QBE (experimental).

```
Source → Tokenize → Parse → Fold → Mach VM (default)
                                 → Mcode → Streamline → Mcode Interpreter
                                                      → QBE → Native
```

## Stages

### Tokenize (`tokenize.cm`)

Splits source text into tokens. Handles string interpolation by re-tokenizing template literal contents. Produces a token array with position information (line, column).

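To make the shape of a position-tagged token array concrete, here is a minimal sketch in Python (not the language of `tokenize.cm`). The `Token` fields, the token kinds, and the regular expression are assumptions made for illustration, and string interpolation is left out entirely.

```python
import re
from dataclasses import dataclass

@dataclass
class Token:
    kind: str
    text: str
    line: int    # 1-based line number
    column: int  # 1-based column number

# Hypothetical token grammar; the real lexer's rules are not shown in this document.
TOKEN_RE = re.compile(r"""
    (?P<newline>\n)
  | (?P<space>[ \t]+)
  | (?P<number>\d+(?:\.\d+)?)
  | (?P<name>[A-Za-z_]\w*)
  | (?P<text>"[^"\n]*")
  | (?P<punct>[-+*/=()\[\]{},.:<>!])
""", re.VERBOSE)

def tokenize(source):
    """Split source into tokens, recording where each one starts."""
    tokens = []
    line, line_start, pos = 1, 0, 0
    while pos < len(source):
        m = TOKEN_RE.match(source, pos)
        if m is None:
            column = pos - line_start + 1
            raise SyntaxError(f"unexpected character {source[pos]!r} at {line}:{column}")
        kind = m.lastgroup
        if kind == "newline":
            # Start a new line; columns are counted from the character after "\n".
            line += 1
            line_start = m.end()
        elif kind != "space":
            tokens.append(Token(kind, m.group(), line, pos - line_start + 1))
        pos = m.end()
    return tokens
```

Calling `tokenize('def x = 10\ny = x + 2')` yields nine tokens, each carrying the line and column where it begins, which is the information later stages would need for error reporting.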
### Parse (`parse.cm`)

Converts tokens into an AST. Also performs semantic analysis:

- **Scope records**: For each scope (global, function), builds a record mapping variable names to their metadata: `make` (var/def/function/input), `function_nr`, `nr_uses`, `closure` flag, and `level`.
- **Type tags**: When the right-hand side of a `def` is a syntactically obvious type, stamps `type_tag` on the scope record entry. Derivable types: `"integer"`, `"number"`, `"text"`, `"array"`, `"record"`, `"function"`, `"logical"`, `"null"`.
- **Intrinsic resolution**: Names used but not locally bound are recorded in `ast.intrinsics`. Name nodes referencing intrinsics get `intrinsic: true`.
- **Access kind**: Subscript (`[`) nodes get `access_kind`: `"index"` for numeric subscripts, `"field"` for string subscripts, omitted otherwise.
- **Tail position**: Return statements where the expression is a call get `tail: true`.

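The type-tag and access-kind rules above can be sketched as two small classifiers. This is an illustrative Python sketch, not the code in `parse.cm`: the AST node shape (dicts with a `kind` key), the node kind names, and the rule used to separate `"integer"` from `"number"` are all assumptions.

```python
# Hypothetical AST node shape: dicts with a "kind" key. The real node
# layout produced by parse.cm is not shown in this document.
LITERAL_TAGS = {
    "array": "array",
    "record": "record",
    "function": "function",
    "text": "text",
    "null": "null",
}

def type_tag(rhs):
    """Return a type tag for a syntactically obvious right-hand side, else None."""
    kind = rhs["kind"]
    if kind == "number":
        # Assumed rule: a literal with no fractional part is tagged "integer".
        return "integer" if float(rhs["value"]).is_integer() else "number"
    if kind in ("true", "false"):
        return "logical"
    return LITERAL_TAGS.get(kind)

def access_kind(subscript):
    """Classify a subscript node by the literal kind of its index expression."""
    index = subscript["index"]
    if index["kind"] == "number":
        return "index"
    if index["kind"] == "text":
        return "field"
    return None  # left unset when the subscript is not a literal

print(type_tag({"kind": "number", "value": 3}))                    # integer
print(access_kind({"index": {"kind": "text", "value": "name"}}))   # field
```

Both functions return `None` for anything that is not syntactically obvious, which mirrors the "omitted otherwise" behavior described above: later stages are expected to treat a missing tag as "unknown" rather than guess.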
### Fold (`fold.cm`)

Operates on the AST. Performs constant folding and type analysis:

- **Constant folding**: Evaluates arithmetic on known constants at compile time (e.g., `5 + 10` becomes `15`).
- **Constant propagation**: Tracks `def` bindings whose values are known constants.
- **Type propagation**: Extends `type_tag` through operations. When both operands of an arithmetic op have known types, the result type is known. Propagates type tags to reference sites.
- **Intrinsic specialization**: When an intrinsic call's argument types are known, stamps a `hint` on the call node. For example, `length(x)` where `x` is a known array gets `hint: "array_length"`. Type checks like `is_array(known_array)` are folded to `true`.
- **Purity marking**: Stamps `pure: true` on expressions with no side effects (literals, name references, arithmetic on pure operands).
- **Dead code elimination**: Removes unreachable branches when conditions are known constants.

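As a rough illustration of how constant folding, purity marking, and dead branch removal interact, here is a small bottom-up folder in Python. It is a sketch under assumptions, not the algorithm in `fold.cm`: the node kinds (`number`, `name`, `binary`, `if`, `true`, `false`) and field names are hypothetical, and only `+`, `-`, and `*` are folded.

```python
import operator

# Only a few operators are folded in this sketch.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def fold(node):
    """Fold constant arithmetic bottom-up, mark pure nodes, drop dead branches."""
    kind = node["kind"]
    if kind in ("number", "name"):
        # Literals and name references have no side effects.
        return {**node, "pure": True}
    if kind == "binary":
        left, right = fold(node["left"]), fold(node["right"])
        both_constant = left["kind"] == "number" and right["kind"] == "number"
        if both_constant and node["op"] in OPS:
            value = OPS[node["op"]](left["value"], right["value"])
            return {"kind": "number", "value": value, "pure": True}
        # Arithmetic on pure operands is itself pure.
        pure = left.get("pure", False) and right.get("pure", False)
        return {**node, "left": left, "right": right, "pure": pure}
    if kind == "if":
        cond = fold(node["cond"])
        if cond["kind"] in ("true", "false"):
            # The condition is a known constant: keep only the live branch.
            live = node["then"] if cond["kind"] == "true" else node["else"]
            return fold(live)
        return {**node, "cond": cond,
                "then": fold(node["then"]), "else": fold(node["else"])}
    return node

five_plus_ten = {"kind": "binary", "op": "+",
                 "left": {"kind": "number", "value": 5},
                 "right": {"kind": "number", "value": 10}}
print(fold(five_plus_ten))  # {'kind': 'number', 'value': 15, 'pure': True}
```

Folding children before parents is what lets `(5 + 10) * x` shrink to `15 * x` in a single traversal: by the time the multiplication is visited, its left operand is already a literal.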
### Mcode (`mcode.cm`)

Lowers the AST to a JSON-based intermediate representation with explicit operations. Key design principle: **every type check is an explicit instruction** so downstream optimizers can see and eliminate them.

- **Typed load/store**: Emits `load_index` (array by integer), `load_field` (record by string), or `load_dynamic` (unknown) based on type information from fold.
- **Decomposed calls**: Function calls are split into `frame` (create call frame) + `setarg` (set arguments) + `invoke` (execute call).
- **Intrinsic access**: Intrinsic functions are loaded via `access` with an intrinsic marker rather than global lookup.
- **Intrinsic inlining**: Type-check intrinsics (`is_array`, `is_text`, `is_number`, `is_integer`, `is_logical`, `is_null`, `is_function`, `is_object`, `is_stone`), `length`, and `push` are emitted as direct opcodes instead of frame/setarg/invoke call sequences.

See [Mcode IR](mcode.md) for instruction format details.

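The call decomposition and typed-load selection described above might look roughly like the following Python sketch. Only the opcode names (`frame`, `setarg`, `invoke`, `load_index`, `load_field`, `load_dynamic`) come from this document; the operand order, the slot numbering, and the helper names `emit_call` and `emit_load` are invented for illustration and should not be read as the real instruction format.

```python
def emit_call(callee_slot, arg_slots, result_slot):
    """Lower one call into frame + setarg* + invoke (operand layout is hypothetical)."""
    frame_slot = result_slot + 1  # assumed scratch slot holding the call frame
    code = [["frame", frame_slot, callee_slot, len(arg_slots)]]
    for position, slot in enumerate(arg_slots):
        code.append(["setarg", frame_slot, position, slot])
    code.append(["invoke", frame_slot, result_slot])
    return code

def emit_load(dest, obj, key, access_kind):
    """Pick a typed load opcode when the access kind is known, else load_dynamic."""
    typed = {"index": "load_index", "field": "load_field"}
    return [typed.get(access_kind, "load_dynamic"), dest, obj, key]

for instruction in emit_call(callee_slot=1, arg_slots=[2, 3], result_slot=4):
    print(instruction)
# ['frame', 5, 1, 2]
# ['setarg', 5, 0, 2]
# ['setarg', 5, 1, 3]
# ['invoke', 5, 4]
```

Splitting a call into separate instructions means each step is visible to later passes as an ordinary instruction, which is consistent with the stated principle that nothing the optimizer might want to remove should be hidden inside a compound operation.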
### Streamline (`streamline.cm`)

Optimizes the Mcode IR through a series of independent passes. Operates per-function:

1. **Backward type inference**: Infers parameter types from how they are used in typed operators. Immutable `def` parameters keep their inferred type across label join points.
2. **Type-check elimination**: When a slot's type is known, eliminates `is_<type>` + conditional jump pairs. Narrows `load_dynamic`/`store_dynamic` to typed variants.
3. **Algebraic simplification**: Rewrites identity operations (add 0, multiply 1, divide 1) and folds same-slot comparisons.
4. **Boolean simplification**: Fuses `not` + conditional jump into a single jump with inverted condition.
5. **Move elimination**: Removes self-moves (`move a, a`).
6. **Dead jump elimination**: Removes jumps to the immediately following label.

See [Streamline Optimizer](streamline.md) for detailed pass descriptions.

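The two simplest passes, move elimination and dead jump elimination, can be sketched over a toy instruction list. This Python sketch assumes instructions are lists whose first element is the opcode and that labels appear as `["label", name]` entries; the real Mcode encoding may differ, and the `jump` and `label` spellings are assumptions.

```python
def eliminate_self_moves(code):
    """Drop `move a, a`, which copies a slot onto itself."""
    return [ins for ins in code if not (ins[0] == "move" and ins[1] == ins[2])]

def eliminate_dead_jumps(code):
    """Drop an unconditional jump whose target label is the very next instruction."""
    out = []
    for i, ins in enumerate(code):
        following = code[i + 1] if i + 1 < len(code) else None
        jumps_to_next = (
            ins[0] == "jump"
            and following is not None
            and following[0] == "label"
            and following[1] == ins[1]
        )
        if not jumps_to_next:
            out.append(ins)
    return out

def streamline(code):
    # Each pass works on its own; running move elimination first can bring
    # a jump directly next to its target label.
    return eliminate_dead_jumps(eliminate_self_moves(code))

function_body = [
    ["move", 1, 1],
    ["jump", "L1"],
    ["move", 2, 2],
    ["label", "L1"],
    ["add", 3, 1, 2],
]
print(streamline(function_body))  # [['label', 'L1'], ['add', 3, 1, 2]]
```

In this example the jump only becomes removable after the self-move between it and the label is gone, which shows why pass ordering can matter even when the passes themselves are independent.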
### QBE Emit (`qbe_emit.cm`)

Lowers optimized Mcode IR to QBE intermediate language for native code compilation. Each Mcode function becomes a QBE function that calls into the cell runtime (`cell_rt_*` functions) for operations that require the runtime (allocation, intrinsic dispatch, etc.).

String constants are interned in a data section. Integer constants are NaN-boxed inline.

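For readers unfamiliar with NaN-boxing, the general idea is to store a small value in the unused payload bits of a 64-bit floating-point NaN, so every value fits in one machine word. The Python sketch below shows the technique in its generic form; the tag bit, the 32-bit payload width, and the helper names are assumptions and are not the cell runtime's actual layout.

```python
import struct

QNAN = 0x7FF8_0000_0000_0000      # exponent all ones plus the quiet bit
TAG_INT = 0x0001_0000_0000_0000   # hypothetical tag bit meaning "integer payload"
PAYLOAD_MASK = 0xFFFF_FFFF        # low 32 bits carry the integer

def box_int(value):
    """Pack a signed 32-bit integer into the payload of a quiet NaN."""
    if not -(2**31) <= value < 2**31:
        raise OverflowError("payload holds a 32-bit integer")
    return QNAN | TAG_INT | (value & PAYLOAD_MASK)

def is_boxed_int(bits):
    """True when the word carries both the quiet-NaN pattern and the integer tag."""
    return (bits & (QNAN | TAG_INT)) == (QNAN | TAG_INT)

def unbox_int(bits):
    """Recover the signed integer from the low 32 bits."""
    payload = bits & PAYLOAD_MASK
    return payload - 2**32 if payload & 0x8000_0000 else payload

def float_bits(value):
    """Raw 64-bit pattern of an ordinary double, which is stored unboxed."""
    return struct.unpack("<Q", struct.pack("<d", value))[0]

boxed = box_int(-1)
print(hex(boxed))                      # 0x7ff90000ffffffff
print(unbox_int(boxed))                # -1
print(is_boxed_int(float_bits(1.5)))   # False
```

Because the boxed word is a plain 64-bit constant, an emitter can write it directly into the generated code as an immediate, which is one plausible reading of "inline" in the paragraph above.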
### QBE Macros (`qbe.cm`)

Provides operation implementations as QBE IL templates. Each arithmetic, comparison, and type operation is defined as a function that emits the corresponding QBE instructions, handling type dispatch (integer, float, text paths) with proper guard checks.

## Execution Backends

### Mach VM (default)

Binary 32-bit register VM. Used for production execution and bootstrapping.

```
./cell script.ce
```

### Mcode Interpreter

JSON-based interpreter. Used for debugging the compilation pipeline.

```
./cell --mcode script.ce
```

### QBE Native (experimental)

Generates QBE IL that can be compiled to native code.

```
./cell --emit-qbe script.ce > output.ssa
```

## Files

| File | Role |
|------|------|
| `tokenize.cm` | Lexer |
| `parse.cm` | Parser + semantic analysis |
| `fold.cm` | Constant folding + type analysis |
| `mcode.cm` | AST → Mcode IR lowering |
| `streamline.cm` | Mcode IR optimizer |
| `qbe_emit.cm` | Mcode IR → QBE IL emitter |
| `qbe.cm` | QBE IL operation templates |
| `internal/bootstrap.cm` | Pipeline orchestrator |

## Debug Tools

| File | Purpose |
|------|---------|
| `dump_mcode.cm` | Print raw Mcode IR before streamlining |
| `dump_stream.cm` | Print IR after streamlining with before/after stats |
| `dump_types.cm` | Print streamlined IR with type annotations |

## Test Files

| File | Tests |
|------|-------|
| `parse_test.ce` | Type tags, access_kind, intrinsic resolution |
| `fold_test.ce` | Type propagation, purity, intrinsic hints |
| `mcode_test.ce` | Typed load/store, decomposed calls |
| `streamline_test.ce` | Optimization counts, IR before/after |
| `qbe_test.ce` | End-to-end QBE IL generation |
| `test_intrinsics.cm` | Inlined intrinsic opcodes (is_array, length, push, etc.) |
| `test_backward.cm` | Backward type propagation for parameters |