---
title: "Semantic Index"
description: "Index and query symbols, references, and call sites in source files"
weight: 55
type: "docs"
---

ƿit includes a semantic indexer that extracts symbols, references, call sites, and imports from source files. The index powers the LSP (find references, rename) and is available as a CLI tool for scripting and debugging.

## Overview

The indexer walks the parsed AST without modifying it. It produces a JSON structure that maps every declaration, every reference to that declaration, and every call site in a file.

```
source → tokenize → parse → fold → index
                                     ↓
                      symbols, references, call sites,
                      imports, exports, reverse refs
```

Two CLI commands expose this:

| Command | Purpose |
|---------|---------|
| `pit index <file>` | Produce the full semantic index as JSON |
| `pit explain` | Query the index for a specific symbol or position |

## pit index

Index a source file and print the result as JSON.

```bash
pit index <file>
pit index <file> -o output.json
```

### Output

The index contains these sections:

| Section | Description |
|---------|-------------|
| `imports` | All `use()` calls with local name, module path, and span |
| `symbols` | Every declaration: vars, defs, functions, params |
| `references` | Every use of a name, classified as read, write, or call |
| `call_sites` | Every function call with callee, argument count, and enclosing function |
| `exports` | For `.cm` modules, the keys of the top-level `return` record |
| `reverse_refs` | Inverted index: name to list of reference spans |

### Example

Given a file `graph.ce` with functions `make_node`, `connect`, and `build_graph`:

```bash
pit index graph.ce
```

```json
{
  "version": 1,
  "path": "graph.ce",
  "is_actor": true,
  "imports": [
    {"local_name": "json", "module_path": "json", "span": {"from_row": 2, "from_col": 0, "to_row": 2, "to_col": 22}}
  ],
  "symbols": [
    {
      "symbol_id": "graph.ce:make_node:fn",
      "name": "make_node",
      "kind": "fn",
      "params": ["name", "kind"],
      "doc_comment": "// A node in the graph.",
      "decl_span": {"from_row": 6, "from_col": 0, "to_row": 8, "to_col": 1},
      "scope_fn_nr": 0
    }
  ],
  "references": [
    {"node_id": 20, "name": "make_node", "ref_kind": "call", "span": {"from_row": 17, "from_col": 13, "to_row": 17, "to_col": 22}}
  ],
  "call_sites": [
    {"node_id": 20, "callee": "make_node", "args_count": 2, "span": {"from_row": 17, "from_col": 22, "to_row": 17, "to_col": 40}}
  ],
  "exports": [],
  "reverse_refs": {
    "make_node": [
      {"node_id": 20, "ref_kind": "call", "span": {"from_row": 17, "from_col": 13, "to_row": 17, "to_col": 22}}
    ]
  }
}
```

### Symbol Kinds

| Kind | Description |
|------|-------------|
| `fn` | Function (var or def with function value) |
| `var` | Mutable variable |
| `def` | Constant |
| `param` | Function parameter |

Each symbol has a `symbol_id` in the format `filename:name:kind` and a `decl_span` with `from_row`, `from_col`, `to_row`, `to_col` (0-based).

### Reference Kinds

| Kind | Description |
|------|-------------|
| `read` | Value is read |
| `write` | Value is assigned |
| `call` | Used as a function call target |

### Module Exports

For `.cm` files, the indexer detects the top-level `return` statement. If it returns a record literal, each key becomes an export linked to its symbol:

```javascript
// math_utils.cm
var add = function(a, b) { return a + b }
var sub = function(a, b) { return a - b }
return {add: add, sub: sub}
```

```bash
pit index math_utils.cm
```

The `exports` section will contain:

```json
[
  {"name": "add", "symbol_id": "math_utils.cm:add:fn"},
  {"name": "sub", "symbol_id": "math_utils.cm:sub:fn"}
]
```

## pit explain

Query the semantic index for a specific symbol or cursor position. This is the targeted query interface: instead of dumping the full index, it answers a specific question.

```bash
pit explain --span <file>:<line>:<col>
pit explain --symbol <name> <file>
```

### --span: What is at this position?

Point at a line and column (0-based) to find out what symbol or reference is there.

```bash
pit explain --span demo.ce:6:4
```

If the position lands on a declaration, that symbol is returned along with all its references and call sites. If it lands on a reference, the indexer traces back to the declaration and returns the same information.

The result includes:

| Field | Description |
|-------|-------------|
| `symbol` | The resolved declaration (name, kind, params, doc comment, span) |
| `reference` | The reference at the cursor, if the cursor was on a reference |
| `references` | All references to this symbol across the file |
| `call_sites` | All call sites for this symbol |
| `imports` | The file's imports (for context) |

```json
{
  "symbol": {
    "name": "build_graph",
    "symbol_id": "demo.ce:build_graph:fn",
    "kind": "fn",
    "params": [],
    "doc_comment": "// Build a sample graph and return it."
  },
  "references": [
    {"node_id": 71, "ref_kind": "call", "span": {"from_row": 39, "from_col": 12, "to_row": 39, "to_col": 23}}
  ],
  "call_sites": []
}
```

### --symbol: Find a symbol by name

Look up a symbol by name, returning all matching declarations and every reference.

```bash
pit explain --symbol connect demo.ce
```

```json
{
  "symbols": [
    {
      "name": "connect",
      "symbol_id": "demo.ce:connect:fn",
      "kind": "fn",
      "params": ["from", "to", "label"],
      "doc_comment": "// Connect two nodes with a labeled edge."
    }
  ],
  "references": [
    {"node_id": 29, "ref_kind": "call", "span": {"from_row": 21, "from_col": 2, "to_row": 21, "to_col": 9}},
    {"node_id": 33, "ref_kind": "call", "span": {"from_row": 22, "from_col": 2, "to_row": 22, "to_col": 9}},
    {"node_id": 37, "ref_kind": "call", "span": {"from_row": 23, "from_col": 2, "to_row": 23, "to_col": 9}}
  ],
  "call_sites": [
    {"callee": "connect", "args_count": 3, "span": {"from_row": 21, "from_col": 9, "to_row": 21, "to_col": 29}},
    {"callee": "connect", "args_count": 3, "span": {"from_row": 22, "from_col": 9, "to_row": 22, "to_col": 31}},
    {"callee": "connect", "args_count": 3, "span": {"from_row": 23, "from_col": 9, "to_row": 23, "to_col": 29}}
  ]
}
```

This tells you: `connect` is a function taking `(from, to, label)`, declared on line 11, and called 3 times inside `build_graph`.

## Programmatic Use

The index and explain modules can be used directly from ƿit scripts:

### index.cm

```javascript
var tokenize_mod = use('tokenize')
var parse_mod = use('parse')
var fold_mod = use('fold')
var index_mod = use('index')

var pipeline = {tokenize: tokenize_mod, parse: parse_mod, fold: fold_mod}
var idx = index_mod.index_file(src, filename, pipeline)
```

`index_file` runs the full pipeline (tokenize, parse, fold) and returns the index. If you already have a parsed AST and tokens, use `index_ast` instead:

```javascript
var idx = index_mod.index_ast(ast, tokens, filename)
```

### explain.cm

```javascript
var explain_mod = use('explain')
var expl = explain_mod.make(idx)

// What is at line 10, column 5?
var result = expl.at_span(10, 5)

// Find all symbols named "connect"
var result = expl.by_symbol("connect")

// Get callers and callees of a symbol
var chain = expl.call_chain("demo.ce:connect:fn", 2)
```

For cross-file queries:

```javascript
var result = explain_mod.explain_across([idx1, idx2, idx3], "connect")
```

## LSP Integration

The semantic index powers these LSP features:

| Feature | LSP Method | Description |
|---------|------------|-------------|
| Find References | `textDocument/references` | All references to the symbol under the cursor |
| Rename | `textDocument/rename` | Rename a symbol and all its references |
| Prepare Rename | `textDocument/prepareRename` | Validate that the cursor is on a renameable symbol |
| Go to Definition | `textDocument/definition` | Jump to a symbol's declaration (index-backed with AST fallback) |

These work automatically in any editor with ƿit LSP support. The index is rebuilt on every file change.
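
To make the link between the index and a rename request concrete, here is a minimal Python sketch that turns the `reverse_refs` entries from the `pit index` example above into LSP `TextEdit` objects. It is not the ƿit LSP implementation, only an illustration under stated assumptions: the helper names `span_to_range` and `rename_edits` are hypothetical, the end column is treated as exclusive (the example span runs from column 13 to 22 for the nine-character name `make_node`), and the file path stands in for the document URI that a real `WorkspaceEdit` would use as its key.

```python
import json

# Abridged index in the shape shown in the `pit index` example above.
INDEX_JSON = """
{
  "path": "graph.ce",
  "reverse_refs": {
    "make_node": [
      {"node_id": 20, "ref_kind": "call",
       "span": {"from_row": 17, "from_col": 13, "to_row": 17, "to_col": 22}}
    ]
  }
}
"""


def span_to_range(span):
    """Map an index span to an LSP Range (both use 0-based positions)."""
    return {
        "start": {"line": span["from_row"], "character": span["from_col"]},
        "end": {"line": span["to_row"], "character": span["to_col"]},
    }


def rename_edits(index, old_name, new_name):
    """Build one LSP TextEdit per indexed reference to old_name.

    Hypothetical helper: it covers references only. The declaration's own
    name is not listed in reverse_refs, so a real rename would also need
    to edit the declaration site.
    """
    refs = index.get("reverse_refs", {}).get(old_name, [])
    return [{"range": span_to_range(r["span"]), "newText": new_name} for r in refs]


index = json.loads(INDEX_JSON)
edits = rename_edits(index, "make_node", "create_node")

# Assumption: the path is used as a placeholder for the document URI.
workspace_edit = {"changes": {index["path"]: edits}}
print(json.dumps(workspace_edit, indent=2))
```

Because the index spans and LSP positions are both 0-based, the mapping is a field-for-field copy. A name with no entry in `reverse_refs` yields an empty edit list instead of an error.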