diff --git a/docs/compiler-tools.md b/docs/compiler-tools.md
index f6217e8b..e91eb605 100644
--- a/docs/compiler-tools.md
+++ b/docs/compiler-tools.md
@@ -15,65 +15,51 @@ The compiler runs in stages:
 ```
 source → tokenize → parse → fold → mcode → streamline → output
 ```

-Each stage has a corresponding dump tool that lets you see its output.
+Each stage has a corresponding CLI tool that lets you see its output.

-| Stage       | Tool              | What it shows                          |
-|-------------|-------------------|----------------------------------------|
-| fold        | `dump_ast.cm`     | Folded AST as JSON                     |
-| mcode       | `dump_mcode.cm`   | Raw mcode IR before optimization       |
-| streamline  | `dump_stream.cm`  | Before/after instruction counts + IR   |
-| streamline  | `dump_types.cm`   | Optimized IR with type annotations     |
-| streamline  | `streamline.ce`   | Full optimized IR as JSON              |
-| all         | `ir_report.ce`    | Structured optimizer flight recorder   |
+| Stage       | Tool                      | What it shows                          |
+|-------------|---------------------------|----------------------------------------|
+| tokenize    | `tokenize.ce`             | Token stream as JSON                   |
+| parse       | `parse.ce`                | Unfolded AST as JSON                   |
+| fold        | `fold.ce`                 | Folded AST as JSON                     |
+| mcode       | `mcode.ce`                | Raw mcode IR as JSON                   |
+| mcode       | `mcode.ce --pretty`       | Human-readable mcode IR                |
+| streamline  | `streamline.ce`           | Full optimized IR as JSON              |
+| streamline  | `streamline.ce --types`   | Optimized IR with type annotations     |
+| streamline  | `streamline.ce --stats`   | Per-function summary stats             |
+| streamline  | `streamline.ce --ir`      | Human-readable canonical IR            |
+| all         | `ir_report.ce`            | Structured optimizer flight recorder   |

 All tools take a source file as input and run the pipeline up to the relevant stage.

 ## Quick Start

 ```bash
-# see raw mcode IR
-./cell --core . dump_mcode.cm myfile.ce
+# see raw mcode IR (pretty-printed)
+cell mcode --pretty myfile.ce

-# see what the optimizer changed
-./cell --core . dump_stream.cm myfile.ce
+# see optimized IR with type annotations
+cell streamline --types myfile.ce

 # full optimizer report with events
-./cell --core . ir_report.ce --full myfile.ce
+cell ir_report --full myfile.ce
 ```

-## dump_ast.cm
+## fold.ce

 Prints the folded AST as JSON. This is the output of the parser and constant folder, before mcode generation.

 ```bash
-./cell --core . dump_ast.cm
+cell fold
 ```

-## dump_mcode.cm
+## mcode.ce

-Prints the raw mcode IR before any optimization. Shows the instruction array as formatted text with opcode, operands, and program counter.
+Prints mcode IR. Default output is JSON; use `--pretty` for a human-readable format with opcodes, operands, and program counter.

 ```bash
-./cell --core . dump_mcode.cm
-```
-
-## dump_stream.cm
-
-Shows a before/after comparison of the optimizer. For each function, prints:
-- Instruction count before and after
-- Number of eliminated instructions
-- The streamlined IR (nops hidden by default)
-
-```bash
-./cell --core . dump_stream.cm
-```
-
-## dump_types.cm
-
-Shows the optimized IR with type annotations. Each instruction is followed by the known types of its slot operands, inferred by walking the instruction stream.
-
-```bash
-./cell --core . dump_types.cm
+cell mcode             # JSON (default)
+cell mcode --pretty    # human-readable IR
 ```

 ## streamline.ce

@@ -81,10 +67,11 @@ Shows the optimized IR with type annotations. Each instruction is followed by th
 Runs the full pipeline (tokenize, parse, fold, mcode, streamline) and outputs the optimized IR as JSON. Useful for piping to `jq` or saving for comparison.

 ```bash
-./cell --core . streamline.ce           # full JSON (default)
-./cell --core . streamline.ce --stats   # summary stats per function
-./cell --core . streamline.ce --ir      # human-readable IR
-./cell --core . streamline.ce --check   # warnings only
+cell streamline           # full JSON (default)
+cell streamline --stats   # summary stats per function
+cell streamline --ir      # human-readable IR
+cell streamline --check   # warnings only
+cell streamline --types   # IR with type annotations
 ```

 | Flag | Description |
@@ -93,6 +80,7 @@ Runs the full pipeline (tokenize, parse, fold, mcode, streamline) and outputs th
 | `--stats` | Per-function summary: args, slots, instruction counts by category, nops eliminated |
 | `--ir`    | Human-readable canonical IR (same format as `ir_report.ce`) |
 | `--check` | Warnings only (e.g. `nr_slots > 200` approaching 255 limit) |
+| `--types` | Optimized IR with inferred type annotations per slot |

 Flags can be combined.

@@ -101,8 +89,8 @@ Flags can be combined.
 Regenerates the boot seed files in `boot/`. These are pre-compiled mcode IR (JSON) files that bootstrap the compilation pipeline on cold start.

 ```bash
-./cell --core . seed.ce          # regenerate all boot seeds
-./cell --core . seed.ce --clean  # also clear the build cache after
+cell seed          # regenerate all boot seeds
+cell seed --clean  # also clear the build cache after
 ```

 The script compiles each pipeline module (tokenize, parse, fold, mcode, streamline) and `internal/bootstrap.cm` through the current pipeline, encodes the output as JSON, and writes it to `boot/.cm.mcode`.

@@ -117,7 +105,7 @@ The script compiles each pipeline module (tokenize, parse, fold, mcode, streamli
 The optimizer flight recorder. Runs the full pipeline with structured logging and outputs machine-readable, diff-friendly JSON. This is the most detailed tool for understanding what the optimizer did and why.

 ```bash
-./cell --core . ir_report.ce [options]
+cell ir_report [options]
 ```

 ### Options

@@ -246,16 +234,16 @@ Properties:
 ```bash
 # what passes changed something?
-./cell --core . ir_report.ce --summary myfile.ce | jq 'select(.changed)'
+cell ir_report --summary myfile.ce | jq 'select(.changed)'

 # list all rewrite rules that fired
-./cell --core . ir_report.ce --events myfile.ce | jq 'select(.type == "event") | .rule'
+cell ir_report --events myfile.ce | jq 'select(.type == "event") | .rule'

 # diff IR before and after optimization
-./cell --core . ir_report.ce --ir-all myfile.ce | jq -r 'select(.type == "ir") | .text'
+cell ir_report --ir-all myfile.ce | jq -r 'select(.type == "ir") | .text'

 # full report for analysis
-./cell --core . ir_report.ce --full myfile.ce > report.json
+cell ir_report --full myfile.ce > report.json
 ```

 ## ir_stats.cm
diff --git a/docs/spec/pipeline.md b/docs/spec/pipeline.md
index 4cd2bfca..c4daa276 100644
--- a/docs/spec/pipeline.md
+++ b/docs/spec/pipeline.md
@@ -130,9 +130,9 @@ Seeds are used during cold start (empty cache) to compile the pipeline modules f
 | File | Purpose |
 |------|---------|
-| `dump_mcode.cm` | Print raw Mcode IR before streamlining |
-| `dump_stream.cm` | Print IR after streamlining with before/after stats |
-| `dump_types.cm` | Print streamlined IR with type annotations |
+| `mcode.ce --pretty` | Print raw Mcode IR before streamlining |
+| `streamline.ce --types` | Print streamlined IR with type annotations |
+| `streamline.ce --stats` | Print per-function before/after instruction stats |

 ## Test Files
diff --git a/docs/spec/streamline.md b/docs/spec/streamline.md
index debcd99e..52baf878 100644
--- a/docs/spec/streamline.md
+++ b/docs/spec/streamline.md
@@ -257,17 +257,17 @@ The `+` operator is excluded from target slot propagation when it would use the
 ## Debugging Tools

-Three dump tools inspect the IR at different stages:
+CLI tools inspect the IR at different stages:

-- **`dump_mcode.cm`** — prints the raw Mcode IR after `mcode.cm`, before streamlining
-- **`dump_stream.cm`** — prints the IR after streamlining, with before/after instruction counts
-- **`dump_types.cm`** — prints the streamlined IR with type annotations on each instruction
+- **`cell mcode --pretty`** — prints the raw Mcode IR after `mcode.cm`, before streamlining
+- **`cell streamline --stats`** — prints per-function instruction counts before and after streamlining
+- **`cell streamline --types`** — prints the streamlined IR with type annotations on each instruction

 Usage:

 ```
-./cell --core . dump_mcode.cm
-./cell --core . dump_stream.cm
-./cell --core . dump_types.cm
+cell mcode --pretty
+cell streamline --stats
+cell streamline --types
 ```

 ## Tail Call Marking
diff --git a/dump_ast.cm b/dump_ast.cm
deleted file mode 100644
index 3075132c..00000000
--- a/dump_ast.cm
+++ /dev/null
@@ -1,16 +0,0 @@
-// dump_ast.cm — pretty-print the folded AST as JSON
-//
-// Usage: ./cell --core . dump_ast.cm
-
-var fd = use("fd")
-var json = use("json")
-var tokenize = use("tokenize")
-var parse = use("parse")
-var fold = use("fold")
-
-var filename = args[0]
-var src = text(fd.slurp(filename))
-var tok = tokenize(src, filename)
-var ast = parse(tok.tokens, src, filename, tokenize)
-var folded = fold(ast)
-print(json.encode(folded))
diff --git a/dump_mcode.cm b/dump_mcode.cm
deleted file mode 100644
index 52395280..00000000
--- a/dump_mcode.cm
+++ /dev/null
@@ -1,117 +0,0 @@
-// dump_mcode.cm — pretty-print mcode IR (before streamlining)
-//
-// Usage: ./cell --core . dump_mcode.cm
-
-var fd = use("fd")
-var json = use("json")
-var tokenize = use("tokenize")
-var parse = use("parse")
-var fold = use("fold")
-var mcode = use("mcode")
-
-if (length(args) < 1) {
-    print("usage: cell --core . dump_mcode.cm ")
-    return
-}
-
-var filename = args[0]
-var src = text(fd.slurp(filename))
-var tok = tokenize(src, filename)
-var ast = parse(tok.tokens, src, filename, tokenize)
-var folded = fold(ast)
-var compiled = mcode(folded)
-
-var pad_right = function(s, w) {
-    var r = s
-    while (length(r) < w) {
-        r = r + " "
-    }
-    return r
-}
-
-var fmt_val = function(v) {
-    if (is_null(v)) {
-        return "null"
-    }
-    if (is_number(v)) {
-        return text(v)
-    }
-    if (is_text(v)) {
-        return `"${v}"`
-    }
-    if (is_object(v)) {
-        return json.encode(v)
-    }
-    if (is_logical(v)) {
-        return v ? "true" : "false"
-    }
-    return text(v)
-}
-
-var dump_function = function(func, name) {
-    var nr_args = func.nr_args != null ? func.nr_args : 0
-    var nr_slots = func.nr_slots != null ? func.nr_slots : 0
-    var nr_close = func.nr_close_slots != null ? func.nr_close_slots : 0
-    var instrs = func.instructions
-    var i = 0
-    var pc = 0
-    var instr = null
-    var op = null
-    var n = 0
-    var parts = null
-    var j = 0
-    var operands = null
-    var pc_str = null
-    var op_str = null
-    print(`\n=== ${name} (args=${text(nr_args)}, slots=${text(nr_slots)}, closures=${text(nr_close)}) ===`)
-    if (instrs == null || length(instrs) == 0) {
-        print(" (empty)")
-        return null
-    }
-    while (i < length(instrs)) {
-        instr = instrs[i]
-        if (is_text(instr)) {
-            if (!starts_with(instr, "_nop_")) {
-                print(`${instr}:`)
-            }
-        } else if (is_array(instr)) {
-            op = instr[0]
-            n = length(instr)
-            parts = []
-            j = 1
-            while (j < n - 2) {
-                push(parts, fmt_val(instr[j]))
-                j = j + 1
-            }
-            operands = text(parts, ", ")
-            pc_str = pad_right(text(pc), 5)
-            op_str = pad_right(op, 14)
-            print(` ${pc_str} ${op_str} ${operands}`)
-            pc = pc + 1
-        }
-        i = i + 1
-    }
-    return null
-}
-
-var main_name = null
-var fi = 0
-var func = null
-var fname = null
-
-// Dump main
-if (compiled.main != null) {
-    main_name = compiled.name != null ? compiled.name : ""
-    dump_function(compiled.main, main_name)
-}
-
-// Dump sub-functions
-if (compiled.functions != null) {
-    fi = 0
-    while (fi < length(compiled.functions)) {
-        func = compiled.functions[fi]
-        fname = func.name != null ? func.name : ``
-        dump_function(func, `[${text(fi)}] ${fname}`)
-        fi = fi + 1
-    }
-}
diff --git a/dump_types.cm b/dump_types.cm
deleted file mode 100644
index d359f966..00000000
--- a/dump_types.cm
+++ /dev/null
@@ -1,237 +0,0 @@
-// dump_types.cm — show streamlined IR with type annotations
-//
-// Usage: ./cell --core . dump_types.cm
-
-var fd = use("fd")
-var json = use("json")
-var tokenize = use("tokenize")
-var parse = use("parse")
-var fold = use("fold")
-var mcode = use("mcode")
-var streamline = use("streamline")
-
-if (length(args) < 1) {
-    print("usage: cell --core . dump_types.cm ")
-    return
-}
-
-var filename = args[0]
-var src = text(fd.slurp(filename))
-var tok = tokenize(src, filename)
-var ast = parse(tok.tokens, src, filename, tokenize)
-var folded = fold(ast)
-var compiled = mcode(folded)
-var optimized = streamline(compiled)
-
-// Type constants
-def T_UNKNOWN = "unknown"
-def T_INT = "int"
-def T_FLOAT = "float"
-def T_NUM = "num"
-def T_TEXT = "text"
-def T_BOOL = "bool"
-def T_NULL = "null"
-def T_ARRAY = "array"
-def T_RECORD = "record"
-def T_FUNCTION = "function"
-
-def int_result_ops = {
-    bitnot: true, bitand: true, bitor: true,
-    bitxor: true, shl: true, shr: true, ushr: true
-}
-def bool_result_ops = {
-    eq_int: true, ne_int: true, lt_int: true, gt_int: true,
-    le_int: true, ge_int: true,
-    eq_float: true, ne_float: true, lt_float: true, gt_float: true,
-    le_float: true, ge_float: true,
-    eq_text: true, ne_text: true, lt_text: true, gt_text: true,
-    le_text: true, ge_text: true,
-    eq_bool: true, ne_bool: true,
-    not: true, and: true, or: true,
-    is_int: true, is_text: true, is_num: true,
-    is_bool: true, is_null: true, is_identical: true,
-    is_array: true, is_func: true, is_record: true, is_stone: true
-}
-
-var access_value_type = function(val) {
-    if (is_number(val)) {
-        return is_integer(val) ? T_INT : T_FLOAT
-    }
-    if (is_text(val)) {
-        return T_TEXT
-    }
-    return T_UNKNOWN
-}
-
-var track_types = function(slot_types, instr) {
-    var op = instr[0]
-    var src_type = null
-    if (op == "access") {
-        slot_types[text(instr[1])] = access_value_type(instr[2])
-    } else if (op == "int") {
-        slot_types[text(instr[1])] = T_INT
-    } else if (op == "true" || op == "false") {
-        slot_types[text(instr[1])] = T_BOOL
-    } else if (op == "null") {
-        slot_types[text(instr[1])] = T_NULL
-    } else if (op == "move") {
-        src_type = slot_types[text(instr[2])]
-        slot_types[text(instr[1])] = src_type != null ? src_type : T_UNKNOWN
-    } else if (int_result_ops[op] == true) {
-        slot_types[text(instr[1])] = T_INT
-    } else if (op == "concat") {
-        slot_types[text(instr[1])] = T_TEXT
-    } else if (bool_result_ops[op] == true) {
-        slot_types[text(instr[1])] = T_BOOL
-    } else if (op == "typeof") {
-        slot_types[text(instr[1])] = T_TEXT
-    } else if (op == "array") {
-        slot_types[text(instr[1])] = T_ARRAY
-    } else if (op == "record") {
-        slot_types[text(instr[1])] = T_RECORD
-    } else if (op == "function") {
-        slot_types[text(instr[1])] = T_FUNCTION
-    } else if (op == "invoke" || op == "tail_invoke") {
-        slot_types[text(instr[2])] = T_UNKNOWN
-    } else if (op == "load_field" || op == "load_index" || op == "load_dynamic") {
-        slot_types[text(instr[1])] = T_UNKNOWN
-    } else if (op == "pop" || op == "get") {
-        slot_types[text(instr[1])] = T_UNKNOWN
-    } else if (op == "length") {
-        slot_types[text(instr[1])] = T_INT
-    } else if (op == "add" || op == "subtract" || op == "multiply" ||
-               op == "divide" || op == "modulo" || op == "pow" || op == "negate") {
-        slot_types[text(instr[1])] = T_UNKNOWN
-    }
-    return null
-}
-
-var pad_right = function(s, w) {
-    var r = s
-    while (length(r) < w) {
-        r = r + " "
-    }
-    return r
-}
-
-var fmt_val = function(v) {
-    if (is_null(v)) {
-        return "null"
-    }
-    if (is_number(v)) {
-        return text(v)
-    }
-    if (is_text(v)) {
-        return `"${v}"`
-    }
-    if (is_object(v)) {
-        return json.encode(v)
-    }
-    if (is_logical(v)) {
-        return v ? "true" : "false"
-    }
-    return text(v)
-}
-
-// Build type annotation string for an instruction
-var type_annotation = function(slot_types, instr) {
-    var n = length(instr)
-    var parts = []
-    var j = 1
-    var v = null
-    var t = null
-    while (j < n - 2) {
-        v = instr[j]
-        if (is_number(v)) {
-            t = slot_types[text(v)]
-            if (t != null && t != T_UNKNOWN) {
-                push(parts, `s${text(v)}:${t}`)
-            }
-        }
-        j = j + 1
-    }
-    if (length(parts) == 0) {
-        return ""
-    }
-    return text(parts, " ")
-}
-
-var dump_function_typed = function(func, name) {
-    var nr_args = func.nr_args != null ? func.nr_args : 0
-    var nr_slots = func.nr_slots != null ? func.nr_slots : 0
-    var instrs = func.instructions
-    var slot_types = {}
-    var i = 0
-    var pc = 0
-    var instr = null
-    var op = null
-    var n = 0
-    var annotation = null
-    var operand_parts = null
-    var j = 0
-    var operands = null
-    var pc_str = null
-    var op_str = null
-    var line = null
-    print(`\n=== ${name} (args=${text(nr_args)}, slots=${text(nr_slots)}) ===`)
-    if (instrs == null || length(instrs) == 0) {
-        print(" (empty)")
-        return null
-    }
-    while (i < length(instrs)) {
-        instr = instrs[i]
-        if (is_text(instr)) {
-            if (starts_with(instr, "_nop_")) {
-                i = i + 1
-                continue
-            }
-            slot_types = {}
-            print(`${instr}:`)
-        } else if (is_array(instr)) {
-            op = instr[0]
-            n = length(instr)
-            annotation = type_annotation(slot_types, instr)
-            operand_parts = []
-            j = 1
-            while (j < n - 2) {
-                push(operand_parts, fmt_val(instr[j]))
-                j = j + 1
-            }
-            operands = text(operand_parts, ", ")
-            pc_str = pad_right(text(pc), 5)
-            op_str = pad_right(op, 14)
-            line = pad_right(` ${pc_str} ${op_str} ${operands}`, 50)
-            if (length(annotation) > 0) {
-                print(`${line} ; ${annotation}`)
-            } else {
-                print(line)
-            }
-            track_types(slot_types, instr)
-            pc = pc + 1
-        }
-        i = i + 1
-    }
-    return null
-}
-
-var main_name = null
-var fi = 0
-var func = null
-var fname = null
-
-// Dump main
-if (optimized.main != null) {
-    main_name = optimized.name != null ? optimized.name : ""
-    dump_function_typed(optimized.main, main_name)
-}
-
-// Dump sub-functions
-if (optimized.functions != null) {
-    fi = 0
-    while (fi < length(optimized.functions)) {
-        func = optimized.functions[fi]
-        fname = func.name != null ? func.name : ``
-        dump_function_typed(func, `[${text(fi)}] ${fname}`)
-        fi = fi + 1
-    }
-}
diff --git a/explain.ce b/explain.ce
index d066e249..b794b71f 100644
--- a/explain.ce
+++ b/explain.ce
@@ -8,36 +8,9 @@
 var fd = use('fd')
 var json = use('json')
-var tokenize_mod = use('tokenize')
-var parse_mod = use('parse')
-var fold_mod = use('fold')
-var index_mod = use('index')
 var explain_mod = use('explain')
 var shop = use('internal/shop')

-// Resolve import paths on an index in-place.
-var resolve_imports = function(idx_obj, fname) {
-    var fi = shop.file_info(fd.realpath(fname))
-    var ctx = fi.package
-    var ri = 0
-    var rp = null
-    var lp = null
-    while (ri < length(idx_obj.imports)) {
-        rp = shop.resolve_use_path(idx_obj.imports[ri].module_path, ctx)
-        // Fallback: check sibling files in the same directory.
-        if (rp == null) {
-            lp = fd.dirname(fd.realpath(fname)) + '/' + idx_obj.imports[ri].module_path + '.cm'
-            if (fd.is_file(lp)) {
-                rp = lp
-            }
-        }
-        if (rp != null) {
-            idx_obj.imports[ri].resolved_path = rp
-        }
-        ri = ri + 1
-    }
-}
-
 var mode = null
 var span_arg = null
 var symbol_name = null
@@ -47,12 +20,10 @@ var parts = null
 var filename = null
 var line = null
 var col = null
-var src = null
 var idx = null
 var indexes = []
 var explain = null
 var result = null
-var pipeline = {tokenize: tokenize_mod, parse: parse_mod, fold: fold_mod}

 for (i = 0; i < length(args); i++) {
     if (args[i] == '--span') {
@@ -108,9 +79,7 @@ if (mode == "span") {
         $stop()
     }

-    src = text(fd.slurp(filename))
-    idx = index_mod.index_file(src, filename, pipeline)
-    resolve_imports(idx, filename)
+    idx = shop.index_file(filename)
     explain = explain_mod.make(idx)

     result = explain.at_span(line, col)
@@ -139,11 +108,8 @@ if (mode == "symbol") {
     }

     if (length(files) == 1) {
-        // Single file: use by_symbol for a focused result.
         filename = files[0]
-        src = text(fd.slurp(filename))
-        idx = index_mod.index_file(src, filename, pipeline)
-        resolve_imports(idx, filename)
+        idx = shop.index_file(filename)
         explain = explain_mod.make(idx)

         result = explain.by_symbol(symbol_name)
@@ -154,13 +120,10 @@ if (mode == "symbol") {
             print("\n")
         }
     } else if (length(files) > 1) {
-        // Multiple files: index each and search across all.
         indexes = []
         i = 0
         while (i < length(files)) {
-            src = text(fd.slurp(files[i]))
-            idx = index_mod.index_file(src, files[i], pipeline)
-            resolve_imports(idx, files[i])
+            idx = shop.index_file(files[i])
             indexes[] = idx
             i = i + 1
         }
diff --git a/fold.ce b/fold.ce
index 0881f8b8..5839883c 100644
--- a/fold.ce
+++ b/fold.ce
@@ -1,13 +1,5 @@
-var fd = use("fd")
 var json = use("json")
-
+var shop = use("internal/shop")
 var filename = args[0]
-var src = text(fd.slurp(filename))
-var tokenize = use("tokenize")
-var parse = use("parse")
-var fold = use("fold")
-
-var tok_result = tokenize(src, filename)
-var ast = parse(tok_result.tokens, src, filename, tokenize)
-var folded = fold(ast)
+var folded = shop.analyze_file(filename)
 print(json.encode(folded))
diff --git a/index.ce b/index.ce
index ab5359a0..4b3a8fb9 100644
--- a/index.ce
+++ b/index.ce
@@ -7,19 +7,11 @@
 var fd = use('fd')
 var json = use('json')
-var tokenize_mod = use('tokenize')
-var parse_mod = use('parse')
-var fold_mod = use('fold')
-var index_mod = use('index')
 var shop = use('internal/shop')

 var filename = null
 var output_path = null
 var i = 0
-var file_info = null
-var pkg_ctx = null
-var resolved = null
-var local_path = null

 for (i = 0; i < length(args); i++) {
     if (args[i] == '-o' || args[i] == '--output') {
@@ -53,29 +45,7 @@ if (!fd.is_file(filename)) {
     $stop()
 }

-var src = text(fd.slurp(filename))
-var pipeline = {tokenize: tokenize_mod, parse: parse_mod, fold: fold_mod}
-var idx = index_mod.index_file(src, filename, pipeline)
-
-// Resolve import paths to filesystem locations.
-file_info = shop.file_info(fd.realpath(filename))
-pkg_ctx = file_info.package
-i = 0
-while (i < length(idx.imports)) {
-    resolved = shop.resolve_use_path(idx.imports[i].module_path, pkg_ctx)
-    // Fallback: check sibling files in the same directory.
-    if (resolved == null) {
-        local_path = fd.dirname(fd.realpath(filename)) + '/' + idx.imports[i].module_path + '.cm'
-        if (fd.is_file(local_path)) {
-            resolved = local_path
-        }
-    }
-    if (resolved != null) {
-        idx.imports[i].resolved_path = resolved
-    }
-    i = i + 1
-}
-
+var idx = shop.index_file(filename)
 var out = json.encode(idx, true)

 if (output_path != null) {
diff --git a/index.cm b/index.cm
index d76b8b43..4a5ed9f9 100644
--- a/index.cm
+++ b/index.cm
@@ -22,6 +22,7 @@ var index_ast = function(ast, tokens, filename) {
     var references = []
     var call_sites = []
     var exports_list = []
+    var intrinsic_refs = []
     var node_counter = 0
     var fn_map = {}
     var _i = 0
@@ -147,6 +148,29 @@ var index_ast = function(ast, tokens, filename) {

         nid = next_id()

+        // this keyword
+        if (node.kind == "this") {
+            references[] = {
+                node_id: nid,
+                name: "this",
+                symbol_id: null,
+                span: make_span(node),
+                enclosing: enclosing,
+                ref_kind: "read"
+            }
+            return
+        }
+
+        // Capture intrinsic refs with positions (intrinsics lack function_nr).
+        if (node.kind == "name" && node.name != null && node.intrinsic == true) {
+            intrinsic_refs[] = {
+                node_id: nid,
+                name: node.name,
+                span: make_span(node),
+                enclosing: enclosing
+            }
+        }
+
         // Name reference — has function_nr when it's a true variable reference.
         if (node.kind == "name" && node.name != null && node.function_nr != null) {
             if (node.intrinsic != true) {
@@ -208,6 +232,17 @@ var index_ast = function(ast, tokens, filename) {
             }
         }

+        // Capture intrinsic callee refs (e.g., print, length).
+        if (node.expression != null && node.expression.kind == "name" &&
+            node.expression.intrinsic == true && node.expression.name != null) {
+            intrinsic_refs[] = {
+                node_id: nid,
+                name: node.expression.name,
+                span: make_span(node.expression),
+                enclosing: enclosing
+            }
+        }
+
         // Walk callee expression (skip name — already recorded above).
         if (node.expression != null && node.expression.kind != "name") {
             walk_expr(node.expression, enclosing, false)
@@ -596,6 +631,7 @@ var index_ast = function(ast, tokens, filename) {
         imports: imports,
         symbols: symbols,
         references: references,
+        intrinsic_refs: intrinsic_refs,
         call_sites: call_sites,
         exports: exports_list,
         reverse_refs: reverse
diff --git a/internal/shop.cm b/internal/shop.cm
index 66aa02f1..f78e46e4 100644
--- a/internal/shop.cm
+++ b/internal/shop.cm
@@ -510,6 +510,139 @@ function inject_env(inject) {
     return env
 }

+// --- Pipeline API ---
+// Lazy-loaded pipeline modules from use_cache (no re-entrancy risk).
+var _tokenize_mod = null
+var _parse_mod = null
+var _fold_mod = null
+var _index_mod = null
+
+var _token_cache = {}
+var _ast_cache = {}
+var _analyze_cache = {}
+var _index_cache = {}
+
+var get_tokenize = function() {
+    if (!_tokenize_mod) _tokenize_mod = use_cache['core/tokenize'] || use_cache['tokenize']
+    return _tokenize_mod
+}
+var get_parse = function() {
+    if (!_parse_mod) _parse_mod = use_cache['core/parse'] || use_cache['parse']
+    return _parse_mod
+}
+var get_fold = function() {
+    if (!_fold_mod) _fold_mod = use_cache['core/fold'] || use_cache['fold']
+    return _fold_mod
+}
+var get_index = function() {
+    if (!_index_mod) {
+        _index_mod = use_cache['core/index'] || use_cache['index']
+        if (!_index_mod) _index_mod = Shop.use('index', 'core')
+    }
+    return _index_mod
+}
+
+Shop.tokenize_file = function(path) {
+    var src = text(fd.slurp(path))
+    var key = content_hash(stone(blob(src)))
+    if (_token_cache[key]) return _token_cache[key]
+    var result = get_tokenize()(src, path)
+    _token_cache[key] = result
+    return result
+}
+
+Shop.parse_file = function(path) {
+    var src = text(fd.slurp(path))
+    var key = content_hash(stone(blob(src)))
+    if (_ast_cache[key]) return _ast_cache[key]
+    var tok = Shop.tokenize_file(path)
+    var ast = get_parse()(tok.tokens, src, path, get_tokenize())
+    _ast_cache[key] = ast
+    return ast
+}
+
+Shop.analyze_file = function(path) {
+    var src = text(fd.slurp(path))
+    var key = content_hash(stone(blob(src)))
+    if (_analyze_cache[key]) return _analyze_cache[key]
+    var ast = Shop.parse_file(path)
+    var folded = get_fold()(ast)
+    _analyze_cache[key] = folded
+    return folded
+}
+
+// Resolve import paths on an index in-place.
+Shop.resolve_imports = function(idx_obj, fname) {
+    var fi = Shop.file_info(fd.realpath(fname))
+    var ctx = fi.package
+    var ri = 0
+    var rp = null
+    var lp = null
+    while (ri < length(idx_obj.imports)) {
+        rp = Shop.resolve_use_path(idx_obj.imports[ri].module_path, ctx)
+        if (rp == null) {
+            lp = fd.dirname(fd.realpath(fname)) + '/' + idx_obj.imports[ri].module_path + '.cm'
+            if (fd.is_file(lp)) {
+                rp = lp
+            }
+        }
+        if (rp != null) {
+            idx_obj.imports[ri].resolved_path = rp
+        }
+        ri = ri + 1
+    }
+}
+
+Shop.index_file = function(path) {
+    var src = text(fd.slurp(path))
+    var key = content_hash(stone(blob(src)))
+    if (_index_cache[key]) return _index_cache[key]
+    var tok = Shop.tokenize_file(path)
+    var pipeline = {tokenize: get_tokenize(), parse: get_parse(), fold: get_fold()}
+    var idx = get_index().index_file(src, path, pipeline)
+    Shop.resolve_imports(idx, path)
+    _index_cache[key] = idx
+    return idx
+}
+
+Shop.pipeline = function() {
+    return {
+        tokenize: get_tokenize(),
+        parse: get_parse(),
+        fold: get_fold(),
+        mcode: use_cache['core/mcode'] || use_cache['mcode'],
+        streamline: use_cache['core/streamline'] || use_cache['streamline']
+    }
+}
+
+Shop.all_script_paths = function() {
+    var packages = Shop.list_packages()
+    var result = []
+    var i = 0
+    var j = 0
+    var scripts = null
+    var pkg_dir = null
+    var has_core = false
+    for (i = 0; i < length(packages); i++) {
+        if (packages[i] == 'core') has_core = true
+    }
+    if (!has_core) {
+        packages = array(packages, ['core'])
+    }
+    for (i = 0; i < length(packages); i++) {
+        pkg_dir = starts_with(packages[i], '/') ? packages[i] : get_packages_dir() + '/' + safe_package_path(packages[i])
+        scripts = get_package_scripts(packages[i])
+        for (j = 0; j < length(scripts); j++) {
+            result[] = {
+                package: packages[i],
+                rel_path: scripts[j],
+                full_path: pkg_dir + '/' + scripts[j]
+            }
+        }
+    }
+    return result
+}
+
 // Lazy-loaded compiler modules for on-the-fly compilation
 var _mcode_mod = null
 var _streamline_mod = null
diff --git a/ls.ce b/ls.ce
index acbab34b..256a8bc0 100644
--- a/ls.ce
+++ b/ls.ce
@@ -1,35 +1,131 @@
-// list modules and actors in a package
-// if args[0] is a package alias, list that one
-// otherwise, list the local one
+// list modules and actors in packages
+//
+// Usage:
+//   cell ls []                List modules and programs
+//   cell ls --all             List across all packages
+//   cell ls --modules|-m []   Modules only
+//   cell ls --programs|-p []  Programs only
+//   cell ls --paths []        One absolute path per line

 var shop = use('internal/shop')
 var package = use('package')

-var ctx = null
-var pkg = args[0] || package.find_package_dir('.')
-var modules = package.list_modules(pkg)
-var programs = package.list_programs(pkg)
+var show_all = false
+var show_modules = true
+var show_programs = true
+var show_paths = false
+var filter_modules = false
+var filter_programs = false
+var pkg_arg = null
+var show_help = false
 var i = 0

-log.console("Modules in " + pkg + ":")
-modules = sort(modules)
-if (length(modules) == 0) {
-    log.console(" (none)")
-} else {
-    for (i = 0; i < length(modules); i++) {
-        log.console(" " + modules[i])
+for (i = 0; i < length(args); i++) {
+    if (args[i] == '--all' || args[i] == '-a') {
+        show_all = true
+    } else if (args[i] == '--modules' || args[i] == '-m') {
+        filter_modules = true
+    } else if (args[i] == '--programs' || args[i] == '-p') {
+        filter_programs = true
+    } else if (args[i] == '--paths') {
+        show_paths = true
+    } else if (args[i] == '--help' || args[i] == '-h') {
+        show_help = true
+    } else if (!starts_with(args[i], '-')) {
+        pkg_arg = args[i]
     }
 }

-log.console("")
-log.console("Programs in " + pkg + ":")
-programs = sort(programs)
-if (length(programs) == 0) {
-    log.console(" (none)")
-} else {
-    for (i = 0; i < length(programs); i++) {
-        log.console(" " + programs[i])
+if (filter_modules || filter_programs) {
+    show_modules = filter_modules
+    show_programs = filter_programs
+}
+
+var list_one_package = function(pkg) {
+    var pkg_dir = null
+    var modules = null
+    var programs = null
+    var j = 0
+
+    if (starts_with(pkg, '/')) {
+        pkg_dir = pkg
+    } else {
+        pkg_dir = shop.get_package_dir(pkg)
     }
+
+    if (show_modules) {
+        modules = sort(package.list_modules(pkg))
+        if (show_paths) {
+            for (j = 0; j < length(modules); j++) {
+                log.console(pkg_dir + '/' + modules[j] + '.cm')
+            }
+        } else {
+            if (!filter_modules || show_all) {
+                log.console("Modules in " + pkg + ":")
+            }
+            if (length(modules) == 0) {
+                log.console(" (none)")
+            } else {
+                for (j = 0; j < length(modules); j++) {
+                    log.console(" " + modules[j])
+                }
+            }
+        }
+    }
+
+    if (show_programs) {
+        programs = sort(package.list_programs(pkg))
+        if (show_paths) {
+            for (j = 0; j < length(programs); j++) {
+                log.console(pkg_dir + '/' + programs[j] + '.ce')
+            }
+        } else {
+            if (!show_paths && show_modules && !filter_programs) {
+                log.console("")
+            }
+            if (!filter_programs || show_all) {
+                log.console("Programs in " + pkg + ":")
+            }
+            if (length(programs) == 0) {
+                log.console(" (none)")
+            } else {
+                for (j = 0; j < length(programs); j++) {
+                    log.console(" " + programs[j])
+                }
+            }
+        }
+    }
+}
+
+var packages = null
+var pkg = null
+
+if (show_help) {
+    log.console("Usage: cell ls [options] []")
+    log.console("")
+    log.console("Options:")
+    log.console("  --all, -a       List across all installed packages")
+    log.console("  --modules, -m   Show modules only")
+    log.console("  --programs, -p  Show programs only")
+    log.console("  --paths         Output one absolute path per line")
+} else if (show_all) {
+    packages = shop.list_packages()
+    if (find(packages, function(p) { return p == 'core' }) == null) {
+        packages[] = 'core'
+    }
+    packages = sort(packages)
+    for (i = 0; i < length(packages); i++) {
+        if (!show_paths && i > 0) {
+            log.console("")
+        }
+        if (!show_paths) {
+            log.console("--- " + packages[i] + " ---")
+        }
+        list_one_package(packages[i])
+    }
+} else {
+    pkg = pkg_arg || package.find_package_dir('.')
+    list_one_package(pkg)
 }

 $stop()
diff --git a/mcode.ce b/mcode.ce
index eb0708d4..44d3e240 100644
--- a/mcode.ce
+++ b/mcode.ce
@@ -1,13 +1,124 @@
+// mcode.ce — compile to mcode IR
+//
+// Usage:
+//   cell mcode             Full mcode IR as JSON (default)
+//   cell mcode --pretty    Pretty-printed human-readable IR
+
 var fd = use("fd")
 var json = use("json")
-var tokenize = use("tokenize")
-var parse = use("parse")
-var fold = use("fold")
-var mcode = use("mcode")
-var filename = args[0]
-var src = text(fd.slurp(filename))
-var result = tokenize(src, filename)
-var ast = parse(result.tokens, src, filename, tokenize)
-var folded = fold(ast)
-var compiled = mcode(folded)
-print(json.encode(compiled))
+var shop = use("internal/shop")
+
+var show_pretty = false
+var filename = null
+var i = 0
+
+for (i = 0; i < length(args); i++) {
+    if (args[i] == '--pretty') {
+        show_pretty = true
+    } else if (args[i] == '--help' || args[i] == '-h') {
+        log.console("Usage: cell mcode [--pretty] ")
+        $stop()
+    } else if (!starts_with(args[i], '-')) {
+        filename = args[i]
+    }
+}
+
+if (!filename) {
+    log.console("usage: cell mcode [--pretty] ")
+    $stop()
+}
+
+var folded = shop.analyze_file(filename)
+var pl = shop.pipeline()
+var compiled = pl.mcode(folded)
+
+if (!show_pretty) {
+    print(json.encode(compiled))
+    $stop()
+}
+
+// Pretty-print helpers (from dump_mcode.cm)
+var pad_right = function(s, w) {
+    var r = s
+    while (length(r) < w) {
+        r = r + " "
+    }
+    return r
+}
+
+var fmt_val = function(v) {
+    if (is_null(v)) return "null"
+    if (is_number(v)) return text(v)
+    if (is_text(v)) return `"${v}"`
+    if (is_object(v)) return json.encode(v)
+    if (is_logical(v)) return v ? "true" : "false"
+    return text(v)
+}
+
+var dump_function = function(func, name) {
+    var nr_args = func.nr_args != null ? func.nr_args : 0
+    var nr_slots = func.nr_slots != null ? func.nr_slots : 0
+    var nr_close = func.nr_close_slots != null ? func.nr_close_slots : 0
+    var instrs = func.instructions
+    var i = 0
+    var pc = 0
+    var instr = null
+    var op = null
+    var n = 0
+    var parts = null
+    var j = 0
+    var operands = null
+    var pc_str = null
+    var op_str = null
+    print(`\n=== ${name} (args=${text(nr_args)}, slots=${text(nr_slots)}, closures=${text(nr_close)}) ===`)
+    if (instrs == null || length(instrs) == 0) {
+        print(" (empty)")
+        return null
+    }
+    while (i < length(instrs)) {
+        instr = instrs[i]
+        if (is_text(instr)) {
+            if (!starts_with(instr, "_nop_")) {
+                print(`${instr}:`)
+            }
+        } else if (is_array(instr)) {
+            op = instr[0]
+            n = length(instr)
+            parts = []
+            j = 1
+            while (j < n - 2) {
+                push(parts, fmt_val(instr[j]))
+                j = j + 1
+            }
+            operands = text(parts, ", ")
+            pc_str = pad_right(text(pc), 5)
+            op_str = pad_right(op, 14)
+            print(` ${pc_str} ${op_str} ${operands}`)
+            pc = pc + 1
+        }
+        i = i + 1
+    }
+    return null
+}
+
+var main_name = null
+var fi = 0
+var func = null
+var fname = null
+
+if (compiled.main != null) {
+    main_name = compiled.name != null ? compiled.name : ""
+    dump_function(compiled.main, main_name)
+}
+
+if (compiled.functions != null) {
+    fi = 0
+    while (fi < length(compiled.functions)) {
+        func = compiled.functions[fi]
+        fname = func.name != null ? func.name : ``
+        dump_function(func, `[${text(fi)}] ${fname}`)
+        fi = fi + 1
+    }
+}
+
+$stop()
diff --git a/parse.ce b/parse.ce
index c0187a9e..242b7b2c 100644
--- a/parse.ce
+++ b/parse.ce
@@ -1,9 +1,5 @@
-var fd = use("fd")
 var json = use("json")
-var tokenize = use("tokenize")
-var parse = use("parse")
+var shop = use("internal/shop")
 var filename = args[0]
-var src = text(fd.slurp(filename))
-var result = tokenize(src, filename)
-var ast = parse(result.tokens, src, filename, tokenize)
+var ast = shop.parse_file(filename)
 print(json.encode(ast, true))
diff --git a/streamline.ce b/streamline.ce
index ae011d32..72ca5044 100644
--- a/streamline.ce
+++ b/streamline.ce
@@ -1,22 +1,20 @@
 // streamline.ce — run the full compile + optimize pipeline
 //
 // Usage:
-//   pit streamline            Full optimized IR as JSON (default)
-//   pit streamline --stats    Summary stats per function
-//   pit streamline --ir       Human-readable IR
-//   pit streamline --check    Warnings only (e.g. high slot count)
+//   cell streamline           Full optimized IR as JSON (default)
+//   cell streamline --stats   Summary stats per function
+//   cell streamline --ir      Human-readable IR
+//   cell streamline --check   Warnings only (e.g.
high slot count)
+// cell streamline --types Optimized IR with type annotations

 var fd = use("fd")
 var json = use("json")
-var tokenize = use("tokenize")
-var parse = use("parse")
-var fold = use("fold")
-var mcode = use("mcode")
-var streamline = use("streamline")
+var shop = use("internal/shop")

 var show_stats = false
 var show_ir = false
 var show_check = false
+var show_types = false
 var filename = null
 var i = 0

@@ -27,21 +25,24 @@ for (i = 0; i < length(args); i++) {
 show_ir = true
 } else if (args[i] == '--check') {
 show_check = true
+ } else if (args[i] == '--types') {
+ show_types = true
+ } else if (args[i] == '--help' || args[i] == '-h') {
+ log.console("Usage: cell streamline [--stats] [--ir] [--check] [--types] <file>")
+ $stop()
 } else if (!starts_with(args[i], '-')) {
 filename = args[i]
 }
 }

 if (!filename) {
- print("usage: pit streamline [--stats] [--ir] [--check] <file>")
+ print("usage: cell streamline [--stats] [--ir] [--check] [--types] <file>")
 $stop()
 }

-var src = text(fd.slurp(filename))
-var result = tokenize(src, filename)
-var ast = parse(result.tokens, src, filename, tokenize)
-var folded = fold(ast)
-var compiled = mcode(folded)
+var folded = shop.analyze_file(filename)
+var pl = shop.pipeline()
+var compiled = pl.mcode(folded)

 // Deep copy for before snapshot (needed by --stats)
 var before = null
@@ -49,18 +50,16 @@ if (show_stats) {
 before = json.decode(json.encode(compiled))
 }

-var optimized = streamline(compiled)
+var optimized = pl.streamline(compiled)

 // If no flags, default to full JSON output
-if (!show_stats && !show_ir && !show_check) {
+if (!show_stats && !show_ir && !show_check && !show_types) {
 print(json.encode(optimized, true))
 $stop()
 }

 // --- Helpers ---

-var ir_stats = use("ir_stats")
-
 var pad_right = function(s, w) {
 var r = s
 while (length(r) < w) {
@@ -69,6 +68,15 @@ var pad_right = function(s, w) {
 return r
 }

+var fmt_val = function(v) {
+ if (is_null(v)) return "null"
+ if (is_number(v)) return text(v)
+ if (is_text(v)) return `"${v}"` 
+ if (is_object(v)) return json.encode(v) + if (is_logical(v)) return v ? "true" : "false" + return text(v) +} + var count_nops = function(func) { var instrs = func.instructions var nops = 0 @@ -83,6 +91,13 @@ var count_nops = function(func) { return nops } +// --- Stats mode --- + +var ir_stats = null +if (show_stats || show_ir) { + ir_stats = use("ir_stats") +} + var print_func_stats = function(func, before_func, name) { var nr_args = func.nr_args != null ? func.nr_args : 0 var nr_slots = func.nr_slots != null ? func.nr_slots : 0 @@ -118,6 +133,164 @@ var check_func = function(func, name) { } } +// --- Types mode (from dump_types.cm) --- + +def T_UNKNOWN = "unknown" +def T_INT = "int" +def T_FLOAT = "float" +def T_NUM = "num" +def T_TEXT = "text" +def T_BOOL = "bool" +def T_NULL = "null" +def T_ARRAY = "array" +def T_RECORD = "record" +def T_FUNCTION = "function" + +def int_result_ops = { + bitnot: true, bitand: true, bitor: true, + bitxor: true, shl: true, shr: true, ushr: true +} +def bool_result_ops = { + eq_int: true, ne_int: true, lt_int: true, gt_int: true, + le_int: true, ge_int: true, + eq_float: true, ne_float: true, lt_float: true, gt_float: true, + le_float: true, ge_float: true, + eq_text: true, ne_text: true, lt_text: true, gt_text: true, + le_text: true, ge_text: true, + eq_bool: true, ne_bool: true, + not: true, and: true, or: true, + is_int: true, is_text: true, is_num: true, + is_bool: true, is_null: true, is_identical: true, + is_array: true, is_func: true, is_record: true, is_stone: true +} + +var access_value_type = function(val) { + if (is_number(val)) return is_integer(val) ? 
T_INT : T_FLOAT + if (is_text(val)) return T_TEXT + return T_UNKNOWN +} + +var track_types = function(slot_types, instr) { + var op = instr[0] + var src_type = null + if (op == "access") { + slot_types[text(instr[1])] = access_value_type(instr[2]) + } else if (op == "int") { + slot_types[text(instr[1])] = T_INT + } else if (op == "true" || op == "false") { + slot_types[text(instr[1])] = T_BOOL + } else if (op == "null") { + slot_types[text(instr[1])] = T_NULL + } else if (op == "move") { + src_type = slot_types[text(instr[2])] + slot_types[text(instr[1])] = src_type != null ? src_type : T_UNKNOWN + } else if (int_result_ops[op] == true) { + slot_types[text(instr[1])] = T_INT + } else if (op == "concat") { + slot_types[text(instr[1])] = T_TEXT + } else if (bool_result_ops[op] == true) { + slot_types[text(instr[1])] = T_BOOL + } else if (op == "typeof") { + slot_types[text(instr[1])] = T_TEXT + } else if (op == "array") { + slot_types[text(instr[1])] = T_ARRAY + } else if (op == "record") { + slot_types[text(instr[1])] = T_RECORD + } else if (op == "function") { + slot_types[text(instr[1])] = T_FUNCTION + } else if (op == "invoke" || op == "tail_invoke") { + slot_types[text(instr[2])] = T_UNKNOWN + } else if (op == "load_field" || op == "load_index" || op == "load_dynamic") { + slot_types[text(instr[1])] = T_UNKNOWN + } else if (op == "pop" || op == "get") { + slot_types[text(instr[1])] = T_UNKNOWN + } else if (op == "length") { + slot_types[text(instr[1])] = T_INT + } else if (op == "add" || op == "subtract" || op == "multiply" || + op == "divide" || op == "modulo" || op == "pow" || op == "negate") { + slot_types[text(instr[1])] = T_UNKNOWN + } + return null +} + +var type_annotation = function(slot_types, instr) { + var n = length(instr) + var parts = [] + var j = 1 + var v = null + var t = null + while (j < n - 2) { + v = instr[j] + if (is_number(v)) { + t = slot_types[text(v)] + if (t != null && t != T_UNKNOWN) { + push(parts, `s${text(v)}:${t}`) + } + } + j = j 
+ 1 + } + if (length(parts) == 0) return "" + return text(parts, " ") +} + +var dump_function_typed = function(func, name) { + var nr_args = func.nr_args != null ? func.nr_args : 0 + var nr_slots = func.nr_slots != null ? func.nr_slots : 0 + var instrs = func.instructions + var slot_types = {} + var i = 0 + var pc = 0 + var instr = null + var op = null + var n = 0 + var annotation = null + var operand_parts = null + var j = 0 + var operands = null + var pc_str = null + var op_str = null + var line = null + print(`\n=== ${name} (args=${text(nr_args)}, slots=${text(nr_slots)}) ===`) + if (instrs == null || length(instrs) == 0) { + print(" (empty)") + return null + } + while (i < length(instrs)) { + instr = instrs[i] + if (is_text(instr)) { + if (starts_with(instr, "_nop_")) { + i = i + 1 + continue + } + slot_types = {} + print(`${instr}:`) + } else if (is_array(instr)) { + op = instr[0] + n = length(instr) + annotation = type_annotation(slot_types, instr) + operand_parts = [] + j = 1 + while (j < n - 2) { + push(operand_parts, fmt_val(instr[j])) + j = j + 1 + } + operands = text(operand_parts, ", ") + pc_str = pad_right(text(pc), 5) + op_str = pad_right(op, 14) + line = pad_right(` ${pc_str} ${op_str} ${operands}`, 50) + if (length(annotation) > 0) { + print(`${line} ; ${annotation}`) + } else { + print(line) + } + track_types(slot_types, instr) + pc = pc + 1 + } + i = i + 1 + } + return null +} + // --- Process functions --- var main_name = optimized.name != null ? optimized.name : "
" @@ -141,6 +314,9 @@ if (optimized.main != null) { if (show_check) { check_func(optimized.main, main_name) } + if (show_types) { + dump_function_typed(optimized.main, main_name) + } } // Sub-functions @@ -160,6 +336,9 @@ if (optimized.functions != null) { if (show_check) { check_func(func, fname) } + if (show_types) { + dump_function_typed(func, fname) + } fi = fi + 1 } } diff --git a/tokenize.ce b/tokenize.ce index f7d4fd06..8ac001e4 100644 --- a/tokenize.ce +++ b/tokenize.ce @@ -1,7 +1,5 @@ -var fd = use("fd") var json = use("json") -var tokenize = use("tokenize") +var shop = use("internal/shop") var filename = args[0] -var src = text(fd.slurp(filename)) -var result = tokenize(src, filename) +var result = shop.tokenize_file(filename) print(json.encode({filename: result.filename, tokens: result.tokens}))