---
title: "Semantic Index"
description: "Index and query symbols, references, and call sites in source files"
weight: 55
type: "docs"
---

ƿit includes a semantic indexer that extracts symbols, references, call sites, and imports from source files. The index powers the LSP (find references, rename) and is available as a CLI tool for scripting and debugging.

## Overview

The indexer walks the parsed AST without modifying it. It produces a JSON structure that maps every declaration, every reference to that declaration, and every call site in a file.

```
source → tokenize → parse → fold → index
                                     ↓
                           symbols, references,
                           call sites, imports,
                           exports, reverse refs
```

Two CLI commands expose this:

| Command | Purpose |
|---------|---------|
| `pit index <file>` | Produce the full semantic index as JSON |
| `pit explain` | Query the index for a specific symbol or position |

## pit index

Index a source file and print the result as JSON.

```bash
pit index <file.ce|file.cm>
pit index <file> -o output.json
```

### Output

The index contains these sections:

| Section | Description |
|---------|-------------|
| `imports` | All `use()` calls with local name, module path, and span |
| `symbols` | Every declaration: vars, defs, functions, params |
| `references` | Every use of a name, classified as read, write, or call |
| `call_sites` | Every function call with callee, args count, and enclosing function |
| `exports` | For `.cm` modules, the keys of the top-level `return` record |
| `reverse_refs` | Inverted index: name to list of reference spans |

### Example

Given a file `graph.ce` with functions `make_node`, `connect`, and `build_graph`:

```bash
pit index graph.ce
```

```json
{
  "version": 1,
  "path": "graph.ce",
  "is_actor": true,
  "imports": [
    {"local_name": "json", "module_path": "json", "span": {"from_row": 2, "from_col": 0, "to_row": 2, "to_col": 22}}
  ],
  "symbols": [
    {
      "symbol_id": "graph.ce:make_node:fn",
      "name": "make_node",
      "kind": "fn",
      "params": ["name", "kind"],
      "doc_comment": "// A node in the graph.",
      "decl_span": {"from_row": 6, "from_col": 0, "to_row": 8, "to_col": 1},
      "scope_fn_nr": 0
    }
  ],
  "references": [
    {"node_id": 20, "name": "make_node", "ref_kind": "call", "span": {"from_row": 17, "from_col": 13, "to_row": 17, "to_col": 22}}
  ],
  "call_sites": [
    {"node_id": 20, "callee": "make_node", "args_count": 2, "span": {"from_row": 17, "from_col": 22, "to_row": 17, "to_col": 40}}
  ],
  "exports": [],
  "reverse_refs": {
    "make_node": [
      {"node_id": 20, "ref_kind": "call", "span": {"from_row": 17, "from_col": 13, "to_row": 17, "to_col": 22}}
    ]
  }
}
```
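
In the example output, `reverse_refs` carries the same entries as `references`, regrouped by name. The inversion can be sketched in Python for scripts that consume the JSON. This is an illustration only: `build_reverse_refs` is a hypothetical helper, not part of ƿit, and a trimmed copy of the example is inlined so the snippet runs on its own. In practice the data would come from the file written by `pit index graph.ce -o output.json`.

```python
import json
from collections import defaultdict

# Trimmed copy of the example index; normally loaded with json.load() from
# the file produced by `pit index graph.ce -o output.json`.
INDEX_JSON = """
{
  "references": [
    {"node_id": 20, "name": "make_node", "ref_kind": "call",
     "span": {"from_row": 17, "from_col": 13, "to_row": 17, "to_col": 22}}
  ]
}
"""


def build_reverse_refs(index):
    """Group references by name, dropping the now-redundant `name` field."""
    reverse = defaultdict(list)
    for ref in index.get("references", []):
        entry = {key: value for key, value in ref.items() if key != "name"}
        reverse[ref["name"]].append(entry)
    return dict(reverse)


index = json.loads(INDEX_JSON)
reverse_refs = build_reverse_refs(index)
print(json.dumps(reverse_refs, indent=2))
```

For the example data this reproduces the `reverse_refs` section shown in the output.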

### Symbol Kinds

| Kind | Description |
|------|-------------|
| `fn` | Function (var or def with function value) |
| `var` | Mutable variable |
| `def` | Constant |
| `param` | Function parameter |

Each symbol has a `symbol_id` in the format `filename:name:kind` and a `decl_span` with `from_row`, `from_col`, `to_row`, `to_col` (0-based).
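
A script that consumes the index may need to take a `symbol_id` apart again. A minimal Python sketch, assuming that names and kinds never contain a colon, so the string can be split from the right and a colon in the file path survives. `parse_symbol_id` is a hypothetical helper, not something ƿit provides.

```python
def parse_symbol_id(symbol_id):
    """Split `filename:name:kind` from the right so colons in the path survive."""
    filename, name, kind = symbol_id.rsplit(":", 2)
    return {"filename": filename, "name": name, "kind": kind}


parsed = parse_symbol_id("graph.ce:make_node:fn")
print(parsed["filename"], parsed["name"], parsed["kind"])
```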

### Reference Kinds

| Kind | Description |
|------|-------------|
| `read` | Value is read |
| `write` | Value is assigned |
| `call` | Used as a function call target |
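
The `ref_kind` field makes simple audits possible, such as counting references by kind or listing every name that is assigned somewhere. A short Python sketch follows. The reference entries are hypothetical and only mirror the shape of the example index, so their names, node ids, and spans are made up.

```python
from collections import Counter

# Hypothetical reference entries in the shape shown in the example index;
# the names, node ids, and spans are made up for illustration.
references = [
    {"node_id": 3, "name": "total", "ref_kind": "write",
     "span": {"from_row": 4, "from_col": 2, "to_row": 4, "to_col": 7}},
    {"node_id": 5, "name": "total", "ref_kind": "read",
     "span": {"from_row": 4, "from_col": 10, "to_row": 4, "to_col": 15}},
    {"node_id": 9, "name": "make_node", "ref_kind": "call",
     "span": {"from_row": 6, "from_col": 8, "to_row": 6, "to_col": 17}},
]

# How many references of each kind does the file contain?
kind_counts = Counter(ref["ref_kind"] for ref in references)

# Which names are assigned to at least once?
written_names = sorted({ref["name"] for ref in references if ref["ref_kind"] == "write"})

print(dict(kind_counts))
print(written_names)
```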

### Module Exports

For `.cm` files, the indexer detects the top-level `return` statement. If it returns a record literal, each key becomes an export linked to its symbol:

```javascript
// math_utils.cm
var add = function(a, b) { return a + b }
var sub = function(a, b) { return a - b }
return {add: add, sub: sub}
```

```bash
pit index math_utils.cm
```

The `exports` section will contain:

```json
[
  {"name": "add", "symbol_id": "math_utils.cm:add:fn"},
  {"name": "sub", "symbol_id": "math_utils.cm:sub:fn"}
]
```
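
Because each export carries a `symbol_id`, a script can join `exports` against `symbols` to list a module's public signatures. The Python sketch below assumes a hand-written slice of the index: the `exports` entries are copied from the output above, while the `symbols` entries are a guess at what the indexer would emit for `math_utils.cm`. `exported_signatures` is a hypothetical helper.

```python
# Hand-written slice of an index for math_utils.cm. The `exports` entries are
# copied from the documented output; the `symbols` entries are an assumption
# about what the indexer would emit for that file.
index = {
    "symbols": [
        {"symbol_id": "math_utils.cm:add:fn", "name": "add", "kind": "fn", "params": ["a", "b"]},
        {"symbol_id": "math_utils.cm:sub:fn", "name": "sub", "kind": "fn", "params": ["a", "b"]},
    ],
    "exports": [
        {"name": "add", "symbol_id": "math_utils.cm:add:fn"},
        {"name": "sub", "symbol_id": "math_utils.cm:sub:fn"},
    ],
}


def exported_signatures(index):
    """Return one `name(params)` string per export, resolved through symbol_id."""
    by_id = {symbol["symbol_id"]: symbol for symbol in index["symbols"]}
    signatures = []
    for export in index["exports"]:
        symbol = by_id.get(export["symbol_id"])
        # An export without a matching symbol is shown with unknown params.
        params = ", ".join(symbol.get("params", [])) if symbol else "?"
        signatures.append(f'{export["name"]}({params})')
    return signatures


signatures = exported_signatures(index)
print(signatures)
```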

## pit explain

Query the semantic index for a specific symbol or cursor position. This is the targeted query interface: instead of dumping the full index, it answers a specific question.

```bash
pit explain --span <file>:<line>:<col>
pit explain --symbol <name> <file>
```

### --span: What is at this position?

Point at a line and column (0-based) to find out what symbol or reference is there.

```bash
pit explain --span demo.ce:6:4
```

If the position lands on a declaration, that symbol is returned along with all its references and call sites. If it lands on a reference, the indexer traces back to the declaration and returns the same information.

The result includes:

| Field | Description |
|-------|-------------|
| `symbol` | The resolved declaration (name, kind, params, doc comment, span) |
| `reference` | The reference at the cursor, if the cursor was on a reference |
| `references` | All references to this symbol across the file |
| `call_sites` | All call sites for this symbol |
| `imports` | The file's imports (for context) |

```json
{
  "symbol": {
    "name": "build_graph",
    "symbol_id": "demo.ce:build_graph:fn",
    "kind": "fn",
    "params": [],
    "doc_comment": "// Build a sample graph and return it."
  },
  "references": [
    {"node_id": 71, "ref_kind": "call", "span": {"from_row": 39, "from_col": 12, "to_row": 39, "to_col": 23}}
  ],
  "call_sites": []
}
```
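
The position lookup described above can be approximated over a saved index. The Python sketch below is not the ƿit implementation. It assumes the end of a span is exclusive, which matches the example spans (`to_col - from_col` equals the name length) but is not stated explicitly, and it resolves a reference to its declaration by name alone, ignoring scoping. `contains` and `explain_at` are hypothetical helpers, and the data is taken from the `graph.ce` example.

```python
def contains(span, row, col):
    """True if (row, col) lies inside span; the end position is treated as exclusive."""
    start = (span["from_row"], span["from_col"])
    end = (span["to_row"], span["to_col"])
    return start <= (row, col) < end


def explain_at(index, row, col):
    """Resolve a 0-based position to a symbol, checking references first."""
    for ref in index["references"]:
        if contains(ref["span"], row, col):
            # Name-based resolution: a simplification that ignores shadowing.
            symbol = next((s for s in index["symbols"] if s["name"] == ref["name"]), None)
            return {"symbol": symbol, "reference": ref}
    for symbol in index["symbols"]:
        if contains(symbol["decl_span"], row, col):
            return {"symbol": symbol, "reference": None}
    return None


# Slice of the graph.ce example index.
index = {
    "symbols": [
        {"symbol_id": "graph.ce:make_node:fn", "name": "make_node", "kind": "fn",
         "decl_span": {"from_row": 6, "from_col": 0, "to_row": 8, "to_col": 1}}
    ],
    "references": [
        {"node_id": 20, "name": "make_node", "ref_kind": "call",
         "span": {"from_row": 17, "from_col": 13, "to_row": 17, "to_col": 22}}
    ],
}

on_reference = explain_at(index, 17, 15)    # cursor inside the call to make_node
on_declaration = explain_at(index, 6, 4)    # cursor inside the declaration
nothing = explain_at(index, 0, 0)           # cursor on neither

print(on_reference["symbol"]["symbol_id"], on_declaration["symbol"]["symbol_id"], nothing)
```

References are checked before declarations because a reference span is the narrower match when both could apply.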

### --symbol: Find a symbol by name

Look up a symbol by name, returning all matching declarations and every reference.

```bash
pit explain --symbol connect demo.ce
```

```json
{
  "symbols": [
    {
      "name": "connect",
      "symbol_id": "demo.ce:connect:fn",
      "kind": "fn",
      "params": ["from", "to", "label"],
      "doc_comment": "// Connect two nodes with a labeled edge."
    }
  ],
  "references": [
    {"node_id": 29, "ref_kind": "call", "span": {"from_row": 21, "from_col": 2, "to_row": 21, "to_col": 9}},
    {"node_id": 33, "ref_kind": "call", "span": {"from_row": 22, "from_col": 2, "to_row": 22, "to_col": 9}},
    {"node_id": 37, "ref_kind": "call", "span": {"from_row": 23, "from_col": 2, "to_row": 23, "to_col": 9}}
  ],
  "call_sites": [
    {"callee": "connect", "args_count": 3, "span": {"from_row": 21, "from_col": 9, "to_row": 21, "to_col": 29}},
    {"callee": "connect", "args_count": 3, "span": {"from_row": 22, "from_col": 9, "to_row": 22, "to_col": 31}},
    {"callee": "connect", "args_count": 3, "span": {"from_row": 23, "from_col": 9, "to_row": 23, "to_col": 29}}
  ]
}
```

This tells you: `connect` is a function taking `(from, to, label)`, declared on line 11, and called 3 times inside `build_graph`.
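
One use of this output in a script is to compare each call site's `args_count` with the number of declared `params`. The Python sketch below copies its data from the result above. `arity_mismatches` is a hypothetical helper, and whether a mismatch is actually an error depends on ƿit's calling rules, which this page does not cover, so the helper only reports the difference.

```python
# Values copied from the `pit explain --symbol connect demo.ce` output.
result = {
    "symbols": [
        {"name": "connect", "symbol_id": "demo.ce:connect:fn", "kind": "fn",
         "params": ["from", "to", "label"]}
    ],
    "call_sites": [
        {"callee": "connect", "args_count": 3,
         "span": {"from_row": 21, "from_col": 9, "to_row": 21, "to_col": 29}},
        {"callee": "connect", "args_count": 3,
         "span": {"from_row": 22, "from_col": 9, "to_row": 22, "to_col": 31}},
        {"callee": "connect", "args_count": 3,
         "span": {"from_row": 23, "from_col": 9, "to_row": 23, "to_col": 29}},
    ],
}


def arity_mismatches(result):
    """List call sites whose args_count differs from the declared param count."""
    expected = {
        symbol["name"]: len(symbol.get("params", []))
        for symbol in result["symbols"]
        if symbol["kind"] == "fn"
    }
    return [
        site for site in result["call_sites"]
        if site["callee"] in expected and site["args_count"] != expected[site["callee"]]
    ]


mismatches = arity_mismatches(result)
print(len(result["call_sites"]), "call sites,", len(mismatches), "arity mismatches")
```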

## Programmatic Use

The index and explain modules can be used directly from ƿit scripts:

### index.cm

```javascript
var tokenize_mod = use('tokenize')
var parse_mod = use('parse')
var fold_mod = use('fold')
var index_mod = use('index')

var pipeline = {tokenize: tokenize_mod, parse: parse_mod, fold: fold_mod}
var idx = index_mod.index_file(src, filename, pipeline)
```

`index_file` runs the full pipeline (tokenize, parse, fold) and returns the index. If you already have a parsed AST and tokens, use `index_ast` instead:

```javascript
var idx = index_mod.index_ast(ast, tokens, filename)
```

### explain.cm

```javascript
var explain_mod = use('explain')
var expl = explain_mod.make(idx)

// What is at line 10, column 5?
var result = expl.at_span(10, 5)

// Find all symbols named "connect"
var result = expl.by_symbol("connect")

// Get callers and callees of a symbol
var chain = expl.call_chain("demo.ce:connect:fn", 2)
```

For cross-file queries:

```javascript
var result = explain_mod.explain_across([idx1, idx2, idx3], "connect")
```
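
The idea behind a cross-file query can also be approximated outside ƿit, over several index files saved with `pit index <file> -o ...`. The Python sketch below does not reproduce `explain_across`, whose return shape is not documented here. `find_across` is a hypothetical helper, and the two indexes are made-up slices that only follow the field names of the example index.

```python
def find_across(indexes, name):
    """Collect declarations and references named `name` from several indexes."""
    found = {"symbols": [], "references": []}
    for index in indexes:
        path = index.get("path")
        for symbol in index.get("symbols", []):
            if symbol["name"] == name:
                found["symbols"].append(symbol)
        for ref in index.get("reverse_refs", {}).get(name, []):
            # Spans are per-file, so tag each reference with its file path.
            found["references"].append({"path": path, **ref})
    return found


# Made-up index slices for two files; only the field names follow the docs.
graph_idx = {
    "path": "graph.cm",
    "symbols": [{"symbol_id": "graph.cm:connect:fn", "name": "connect", "kind": "fn"}],
    "reverse_refs": {},
}
main_idx = {
    "path": "main.ce",
    "symbols": [],
    "reverse_refs": {
        "connect": [
            {"node_id": 12, "ref_kind": "call",
             "span": {"from_row": 5, "from_col": 2, "to_row": 5, "to_col": 9}}
        ]
    },
}

found = find_across([graph_idx, main_idx], "connect")
print(len(found["symbols"]), "declaration(s),", len(found["references"]), "reference(s)")
```

Matching by bare name across files is a deliberate simplification: it does not check that the reference in one file actually resolves to the import of the other.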

## LSP Integration

The semantic index powers these LSP features:

| Feature | LSP Method | Description |
|---------|------------|-------------|
| Find References | `textDocument/references` | All references to the symbol under the cursor |
| Rename | `textDocument/rename` | Rename a symbol and all its references |
| Prepare Rename | `textDocument/prepareRename` | Validate that the cursor is on a renameable symbol |
| Go to Definition | `textDocument/definition` | Jump to a symbol's declaration (index-backed with AST fallback) |

These work automatically in any editor with ƿit LSP support. The index is rebuilt on every file change.
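
To illustrate how reference spans can drive a rename, here is a minimal Python sketch that rewrites a source string from a list of spans. It is not the LSP server's code. It assumes 0-based rows and columns with an exclusive end column, and that every identifier span stays on one line. `apply_rename`, the sample source, and its spans are all hypothetical.

```python
def apply_rename(source, spans, new_name):
    """Replace every span with new_name, editing from the end of the file backwards."""
    lines = source.split("\n")
    # Later edits first, so earlier columns on the same line stay valid.
    ordered = sorted(spans, key=lambda s: (s["from_row"], s["from_col"]), reverse=True)
    for span in ordered:
        if span["from_row"] != span["to_row"]:
            raise ValueError("identifier spans are expected to stay on one line")
        row = span["from_row"]
        line = lines[row]
        lines[row] = line[:span["from_col"]] + new_name + line[span["to_col"]:]
    return "\n".join(lines)


# Hypothetical source and the spans of every occurrence of `total`
# (one declaration, one write, one read).
source = "var total = 0\ntotal = total + 1\n"
spans = [
    {"from_row": 0, "from_col": 4, "to_row": 0, "to_col": 9},
    {"from_row": 1, "from_col": 0, "to_row": 1, "to_col": 5},
    {"from_row": 1, "from_col": 8, "to_row": 1, "to_col": 13},
]

renamed = apply_rename(source, spans, "sum")
print(renamed)
```

Applying the edits in reverse order is the design choice that matters here: replacing a name with one of a different length shifts every column after it on the same line.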