Files
cell/docs/semantic-index.md
2026-02-16 21:50:39 -06:00

271 lines
8.0 KiB
Markdown

---
title: "Semantic Index"
description: "Index and query symbols, references, and call sites in source files"
weight: 55
type: "docs"
---
ƿit includes a semantic indexer that extracts symbols, references, call sites, and imports from source files. The index powers the LSP (find references, rename) and is available as a CLI tool for scripting and debugging.
## Overview
The indexer walks the parsed AST without modifying it. It produces a JSON structure that maps every declaration, every reference to that declaration, and every call site in a file.
```
source → tokenize → parse → fold → index
symbols, references,
call sites, imports,
exports, reverse refs
```
Two CLI commands expose this:
| Command | Purpose |
|---------|---------|
| `pit index <file>` | Produce the full semantic index as JSON |
| `pit explain` | Query the index for a specific symbol or position |
## pit index
Index a source file and print the result as JSON.
```bash
pit index <file.ce|file.cm>
pit index <file> -o output.json
```
### Output
The index contains these sections:
| Section | Description |
|---------|-------------|
| `imports` | All `use()` calls with local name, module path, and span |
| `symbols` | Every declaration: vars, defs, functions, params |
| `references` | Every use of a name, classified as read, write, or call |
| `call_sites` | Every function call with callee, args count, and enclosing function |
| `exports` | For `.cm` modules, the keys of the top-level `return` record |
| `reverse_refs` | Inverted index: name to list of reference spans |
### Example
Given a file `graph.ce` with functions `make_node`, `connect`, and `build_graph`:
```bash
pit index graph.ce
```
```json
{
"version": 1,
"path": "graph.ce",
"is_actor": true,
"imports": [
{"local_name": "json", "module_path": "json", "span": {"from_row": 2, "from_col": 0, "to_row": 2, "to_col": 22}}
],
"symbols": [
{
"symbol_id": "graph.ce:make_node:fn",
"name": "make_node",
"kind": "fn",
"params": ["name", "kind"],
"doc_comment": "// A node in the graph.",
"decl_span": {"from_row": 6, "from_col": 0, "to_row": 8, "to_col": 1},
"scope_fn_nr": 0
}
],
"references": [
{"node_id": 20, "name": "make_node", "ref_kind": "call", "span": {"from_row": 17, "from_col": 13, "to_row": 17, "to_col": 22}}
],
"call_sites": [
{"node_id": 20, "callee": "make_node", "args_count": 2, "span": {"from_row": 17, "from_col": 22, "to_row": 17, "to_col": 40}}
],
"exports": [],
"reverse_refs": {
"make_node": [
{"node_id": 20, "ref_kind": "call", "span": {"from_row": 17, "from_col": 13, "to_row": 17, "to_col": 22}}
]
}
}
```
### Symbol Kinds
| Kind | Description |
|------|-------------|
| `fn` | Function (var or def with function value) |
| `var` | Mutable variable |
| `def` | Constant |
| `param` | Function parameter |
Each symbol has a `symbol_id` in the format `filename:name:kind` and a `decl_span` with `from_row`, `from_col`, `to_row`, `to_col` (0-based).
### Reference Kinds
| Kind | Description |
|------|-------------|
| `read` | Value is read |
| `write` | Value is assigned |
| `call` | Used as a function call target |
### Module Exports
For `.cm` files, the indexer detects the top-level `return` statement. If it returns a record literal, each key becomes an export linked to its symbol:
```javascript
// math_utils.cm
var add = function(a, b) { return a + b }
var sub = function(a, b) { return a - b }
return {add: add, sub: sub}
```
```bash
pit index math_utils.cm
```
The `exports` section will contain:
```json
[
{"name": "add", "symbol_id": "math_utils.cm:add:fn"},
{"name": "sub", "symbol_id": "math_utils.cm:sub:fn"}
]
```
## pit explain
Query the semantic index for a specific symbol or cursor position. This is the targeted query interface — instead of dumping the full index, it answers a specific question.
```bash
pit explain --span <file>:<line>:<col>
pit explain --symbol <name> <file>
```
### --span: What is at this position?
Point at a line and column (0-based) to find out what symbol or reference is there.
```bash
pit explain --span demo.ce:6:4
```
If the position lands on a declaration, that symbol is returned along with all its references and call sites. If it lands on a reference, the indexer traces back to the declaration and returns the same information.
The result includes:
| Field | Description |
|-------|-------------|
| `symbol` | The resolved declaration (name, kind, params, doc comment, span) |
| `reference` | The reference at the cursor, if the cursor was on a reference |
| `references` | All references to this symbol across the file |
| `call_sites` | All call sites for this symbol |
| `imports` | The file's imports (for context) |
```json
{
"symbol": {
"name": "build_graph",
"symbol_id": "demo.ce:build_graph:fn",
"kind": "fn",
"params": [],
"doc_comment": "// Build a sample graph and return it."
},
"references": [
{"node_id": 71, "ref_kind": "call", "span": {"from_row": 39, "from_col": 12, "to_row": 39, "to_col": 23}}
],
"call_sites": []
}
```
### --symbol: Find a symbol by name
Look up a symbol by name, returning all matching declarations and every reference.
```bash
pit explain --symbol connect demo.ce
```
```json
{
"symbols": [
{
"name": "connect",
"symbol_id": "demo.ce:connect:fn",
"kind": "fn",
"params": ["from", "to", "label"],
"doc_comment": "// Connect two nodes with a labeled edge."
}
],
"references": [
{"node_id": 29, "ref_kind": "call", "span": {"from_row": 21, "from_col": 2, "to_row": 21, "to_col": 9}},
{"node_id": 33, "ref_kind": "call", "span": {"from_row": 22, "from_col": 2, "to_row": 22, "to_col": 9}},
{"node_id": 37, "ref_kind": "call", "span": {"from_row": 23, "from_col": 2, "to_row": 23, "to_col": 9}}
],
"call_sites": [
{"callee": "connect", "args_count": 3, "span": {"from_row": 21, "from_col": 9, "to_row": 21, "to_col": 29}},
{"callee": "connect", "args_count": 3, "span": {"from_row": 22, "from_col": 9, "to_row": 22, "to_col": 31}},
{"callee": "connect", "args_count": 3, "span": {"from_row": 23, "from_col": 9, "to_row": 23, "to_col": 29}}
]
}
```
This tells you: `connect` is a function taking `(from, to, label)`, declared on line 11, and called 3 times inside `build_graph`.
## Programmatic Use
The index and explain modules can be used directly from ƿit scripts:
### index.cm
```javascript
var tokenize_mod = use('tokenize')
var parse_mod = use('parse')
var fold_mod = use('fold')
var index_mod = use('index')
var pipeline = {tokenize: tokenize_mod, parse: parse_mod, fold: fold_mod}
var idx = index_mod.index_file(src, filename, pipeline)
```
`index_file` runs the full pipeline (tokenize, parse, fold) and returns the index. If you already have a parsed AST and tokens, use `index_ast` instead:
```javascript
var idx = index_mod.index_ast(ast, tokens, filename)
```
### explain.cm
```javascript
var explain_mod = use('explain')
var expl = explain_mod.make(idx)
// What is at line 10, column 5?
var result = expl.at_span(10, 5)
// Find all symbols named "connect"
var result = expl.by_symbol("connect")
// Get callers and callees of a symbol
var chain = expl.call_chain("demo.ce:connect:fn", 2)
```
For cross-file queries:
```javascript
var result = explain_mod.explain_across([idx1, idx2, idx3], "connect")
```
## LSP Integration
The semantic index powers these LSP features:
| Feature | LSP Method | Description |
|---------|------------|-------------|
| Find References | `textDocument/references` | All references to the symbol under the cursor |
| Rename | `textDocument/rename` | Rename a symbol and all its references |
| Prepare Rename | `textDocument/prepareRename` | Validate that the cursor is on a renameable symbol |
| Go to Definition | `textDocument/definition` | Jump to a symbol's declaration (index-backed with AST fallback) |
These work automatically in any editor with ƿit LSP support. The index is rebuilt on every file change.