---
title: "Semantic Index"
description: "Index and query symbols, references, and call sites in source files"
weight: 55
type: "docs"
---

ƿit includes a semantic indexer that extracts symbols, references, call sites, and imports from source files. The index powers the LSP (find references, rename) and is available as a CLI tool for scripting and debugging.

## Overview

The indexer walks the parsed AST without modifying it. It produces a JSON structure that maps every declaration, every reference to that declaration, and every call site in a file.

```
source → tokenize → parse → fold → index
                                     ↓
                               symbols, references,
                               call sites, imports,
                               exports, reverse refs
```

Two CLI commands expose this:

| Command | Purpose |
|---------|---------|
| `pit index <file>` | Produce the full semantic index as JSON |
| `pit explain` | Query the index for a specific symbol or position |

## pit index

Index a source file and print the result as JSON.

```bash
pit index <file.ce|file.cm>
pit index <file> -o output.json
```

### Output

The index contains these sections:

| Section | Description |
|---------|-------------|
| `imports` | All `use()` calls with local name, module path, resolved filesystem path, and span |
| `symbols` | Every declaration: vars, defs, functions, params |
| `references` | Every use of a name, classified as read, write, or call |
| `call_sites` | Every function call with callee, argument count, and enclosing function |
| `exports` | For `.cm` modules, the keys of the top-level `return` record |
| `reverse_refs` | Inverted index: name to list of reference spans |

### Example

Given a file `graph.ce` with functions `make_node`, `connect`, and `build_graph`:

```bash
pit index graph.ce
```

```json
{
  "version": 1,
  "path": "graph.ce",
  "is_actor": true,
  "imports": [
    {"local_name": "json", "module_path": "json", "resolved_path": ".cell/packages/core/json.cm", "span": {"from_row": 2, "from_col": 0, "to_row": 2, "to_col": 22}}
  ],
  "symbols": [
    {
      "symbol_id": "graph.ce:make_node:fn",
      "name": "make_node",
      "kind": "fn",
      "params": ["name", "kind"],
      "doc_comment": "// A node in the graph.",
      "decl_span": {"from_row": 6, "from_col": 0, "to_row": 8, "to_col": 1},
      "scope_fn_nr": 0
    }
  ],
  "references": [
    {"node_id": 20, "name": "make_node", "ref_kind": "call", "span": {"from_row": 17, "from_col": 13, "to_row": 17, "to_col": 22}}
  ],
  "call_sites": [
    {"node_id": 20, "callee": "make_node", "args_count": 2, "span": {"from_row": 17, "from_col": 22, "to_row": 17, "to_col": 40}}
  ],
  "exports": [],
  "reverse_refs": {
    "make_node": [
      {"node_id": 20, "ref_kind": "call", "span": {"from_row": 17, "from_col": 13, "to_row": 17, "to_col": 22}}
    ]
  }
}
```
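
To make the relationship between `references` and `reverse_refs` concrete, here is a minimal Python sketch that rebuilds the inverted index from the flat reference list. It only mirrors the shape of the example output; how the indexer itself builds `reverse_refs` is not described here, and the helper name `build_reverse_refs` is hypothetical. The JSON is inlined so the snippet runs on its own; in practice it would be loaded from a file written with `pit index graph.ce -o graph.json`.

```python
import json
from collections import defaultdict

# Trimmed copy of the example output; only the two sections being compared.
INDEX_JSON = """
{
  "references": [
    {"node_id": 20, "name": "make_node", "ref_kind": "call",
     "span": {"from_row": 17, "from_col": 13, "to_row": 17, "to_col": 22}}
  ],
  "reverse_refs": {
    "make_node": [
      {"node_id": 20, "ref_kind": "call",
       "span": {"from_row": 17, "from_col": 13, "to_row": 17, "to_col": 22}}
    ]
  }
}
"""


def build_reverse_refs(references):
    """Group references by name, dropping the now-redundant "name" key.

    Hypothetical helper: it reproduces the example's shape, not the
    indexer's actual implementation.
    """
    grouped = defaultdict(list)
    for ref in references:
        entry = {key: value for key, value in ref.items() if key != "name"}
        grouped[ref["name"]].append(entry)
    return dict(grouped)


index = json.loads(INDEX_JSON)
rebuilt = build_reverse_refs(index["references"])

# The rebuilt mapping matches the "reverse_refs" section of the example.
print(rebuilt == index["reverse_refs"])  # prints True
```

A script that only needs "where is this name used?" can read `reverse_refs` directly and skip the scan over `references`.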

### Symbol Kinds

| Kind | Description |
|------|-------------|
| `fn` | Function (var or def with function value) |
| `var` | Mutable variable |
| `def` | Constant |
| `param` | Function parameter |

Each symbol has a `symbol_id` in the format `filename:name:kind` and a `decl_span` with `from_row`, `from_col`, `to_row`, `to_col` (0-based).
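
A script that consumes the index may want to split a `symbol_id` back into its parts. The following Python sketch does that under two assumptions that the format description does not state: symbol names and kinds never contain a colon, and the filename part may (for example a Windows drive letter), so the split is done from the right. `SymbolId` and `parse_symbol_id` are hypothetical names.

```python
from typing import NamedTuple

# The four kinds listed in the Symbol Kinds table.
KNOWN_KINDS = {"fn", "var", "def", "param"}


class SymbolId(NamedTuple):
    filename: str
    name: str
    kind: str


def parse_symbol_id(symbol_id: str) -> SymbolId:
    """Split "filename:name:kind" from the right (assumed colon-free name/kind)."""
    parts = symbol_id.rsplit(":", 2)
    if len(parts) != 3:
        raise ValueError(f"not a filename:name:kind id: {symbol_id!r}")
    filename, name, kind = parts
    if kind not in KNOWN_KINDS:
        raise ValueError(f"unknown symbol kind {kind!r} in {symbol_id!r}")
    return SymbolId(filename, name, kind)


print(parse_symbol_id("graph.ce:make_node:fn"))
# prints SymbolId(filename='graph.ce', name='make_node', kind='fn')
```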

### Reference Kinds

| Kind | Description |
|------|-------------|
| `read` | Value is read |
| `write` | Value is assigned |
| `call` | Used as a function call target |

### Module Exports

For `.cm` files, the indexer detects the top-level `return` statement. If it returns a record literal, each key becomes an export linked to its symbol:

```javascript
// math_utils.cm
var add = function(a, b) { return a + b }
var sub = function(a, b) { return a - b }
return {add: add, sub: sub}
```

```bash
pit index math_utils.cm
```

The `exports` section will contain:

```json
[
  {"name": "add", "symbol_id": "math_utils.cm:add:fn"},
  {"name": "sub", "symbol_id": "math_utils.cm:sub:fn"}
]
```
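
Because each export carries a `symbol_id`, a consumer can join `exports` against `symbols` to list a module's public API with parameter lists. The Python sketch below shows one way to do that. The `exports` data is copied from the example; the `symbols` entries are written by hand to follow the documented symbol format (they are not real `pit index` output), and `public_api` is a hypothetical helper.

```python
# Hand-written sample: "exports" is taken from the example, "symbols" is an
# assumed match for math_utils.cm in the documented symbol format.
index = {
    "symbols": [
        {"symbol_id": "math_utils.cm:add:fn", "name": "add",
         "kind": "fn", "params": ["a", "b"]},
        {"symbol_id": "math_utils.cm:sub:fn", "name": "sub",
         "kind": "fn", "params": ["a", "b"]},
    ],
    "exports": [
        {"name": "add", "symbol_id": "math_utils.cm:add:fn"},
        {"name": "sub", "symbol_id": "math_utils.cm:sub:fn"},
    ],
}


def public_api(index):
    """Return one display line per export, resolved through its symbol_id."""
    by_id = {symbol["symbol_id"]: symbol for symbol in index["symbols"]}
    lines = []
    for export in index["exports"]:
        symbol = by_id.get(export["symbol_id"])
        if symbol is None:
            # Export whose symbol is not in this index; report it, don't guess.
            lines.append(export["name"] + " (no matching symbol)")
        elif symbol["kind"] == "fn":
            params = ", ".join(symbol.get("params", []))
            lines.append(export["name"] + "(" + params + ")")
        else:
            lines.append(export["name"] + ": " + symbol["kind"])
    return lines


for line in public_api(index):
    print(line)
# prints:
# add(a, b)
# sub(a, b)
```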

## pit explain

Query the semantic index for a specific symbol or cursor position. This is the targeted query interface: instead of dumping the full index, it answers one specific question.

```bash
pit explain --span <file>:<line>:<col>
pit explain --symbol <name> <file>...
```

### --span: What is at this position?

Point at a line and column (both 0-based) to find out what symbol or reference is there.

```bash
pit explain --span demo.ce:6:4
```

If the position lands on a declaration, that symbol is returned along with all its references and call sites. If it lands on a reference, the indexer traces back to the declaration and returns the same information.

The result includes:

| Field | Description |
|-------|-------------|
| `symbol` | The resolved declaration (name, kind, params, doc comment, span) |
| `reference` | The reference at the cursor, if the cursor was on a reference |
| `references` | All references to this symbol across the file |
| `call_sites` | All call sites for this symbol |
| `imports` | The file's imports (for context) |

```json
{
  "symbol": {
    "name": "build_graph",
    "symbol_id": "demo.ce:build_graph:fn",
    "kind": "fn",
    "params": [],
    "doc_comment": "// Build a sample graph and return it."
  },
  "references": [
    {"node_id": 71, "ref_kind": "call", "span": {"from_row": 39, "from_col": 12, "to_row": 39, "to_col": 23}}
  ],
  "call_sites": []
}
```
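
The position lookup can be pictured as a span containment test over the index. The Python sketch below is a simplified model, not the actual `explain` implementation: it assumes spans are end-exclusive (the example spans suggest this, since `make_node` is 9 characters and its reference runs from column 13 to 22), and it matches references to declarations by name only, ignoring scopes. `span_contains` and `explain_at` are hypothetical names, and the sample data is hand-copied from the `graph.ce` example.

```python
# Hand-copied subset of the graph.ce example; rows and columns are 0-based.
INDEX = {
    "symbols": [
        {"name": "make_node", "kind": "fn",
         "decl_span": {"from_row": 6, "from_col": 0, "to_row": 8, "to_col": 1}},
    ],
    "references": [
        {"name": "make_node", "ref_kind": "call",
         "span": {"from_row": 17, "from_col": 13, "to_row": 17, "to_col": 22}},
    ],
}


def span_contains(span, row, col):
    """True if (row, col) lies in the span; the end is assumed exclusive."""
    start = (span["from_row"], span["from_col"])
    end = (span["to_row"], span["to_col"])
    return start <= (row, col) < end


def explain_at(index, row, col):
    """Resolve a position to a symbol: references first, then declarations."""
    name, reference = None, None
    for ref in index["references"]:
        if span_contains(ref["span"], row, col):
            name, reference = ref["name"], ref
            break
    if name is None:
        for symbol in index["symbols"]:
            if span_contains(symbol["decl_span"], row, col):
                name = symbol["name"]
                break
    if name is None:
        return None
    # Name-only matching is a simplification; a real resolver would use scopes.
    symbol = next((s for s in index["symbols"] if s["name"] == name), None)
    return {
        "symbol": symbol,
        "reference": reference,
        "references": [r for r in index["references"] if r["name"] == name],
    }


result = explain_at(INDEX, 17, 15)
print(result["symbol"]["name"])  # prints make_node
```

Comparing `(row, col)` tuples handles multi-line spans such as `decl_span` without special cases, because Python orders tuples lexicographically.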

### --symbol: Find a symbol by name

Look up a symbol by name. Pass one file for a focused result, or multiple files (including shell globs) to search across them all:

```bash
pit explain --symbol connect demo.ce
pit explain --symbol connect *.ce *.cm
```

```json
{
  "symbols": [
    {
      "name": "connect",
      "symbol_id": "demo.ce:connect:fn",
      "kind": "fn",
      "params": ["from", "to", "label"],
      "doc_comment": "// Connect two nodes with a labeled edge."
    }
  ],
  "references": [
    {"node_id": 29, "ref_kind": "call", "span": {"from_row": 21, "from_col": 2, "to_row": 21, "to_col": 9}},
    {"node_id": 33, "ref_kind": "call", "span": {"from_row": 22, "from_col": 2, "to_row": 22, "to_col": 9}},
    {"node_id": 37, "ref_kind": "call", "span": {"from_row": 23, "from_col": 2, "to_row": 23, "to_col": 9}}
  ],
  "call_sites": [
    {"callee": "connect", "args_count": 3, "span": {"from_row": 21, "from_col": 9, "to_row": 21, "to_col": 29}},
    {"callee": "connect", "args_count": 3, "span": {"from_row": 22, "from_col": 9, "to_row": 22, "to_col": 31}},
    {"callee": "connect", "args_count": 3, "span": {"from_row": 23, "from_col": 9, "to_row": 23, "to_col": 29}}
  ]
}
```

This tells you: `connect` is a function taking `(from, to, label)`, declared on line 11, and called 3 times inside `build_graph`.

## Programmatic Use

The index and explain modules can be used directly from ƿit scripts:

### index.cm

```javascript
var tokenize_mod = use('tokenize')
var parse_mod = use('parse')
var fold_mod = use('fold')
var index_mod = use('index')

// src and filename are assumed to be defined already:
// the source text to index and the name of the file it came from.
var pipeline = {tokenize: tokenize_mod, parse: parse_mod, fold: fold_mod}
var idx = index_mod.index_file(src, filename, pipeline)
```

`index_file` runs the full pipeline (tokenize, parse, fold) and returns the index. If you already have a parsed AST and tokens, use `index_ast` instead:

```javascript
var idx = index_mod.index_ast(ast, tokens, filename)
```

### explain.cm

```javascript
var explain_mod = use('explain')
var expl = explain_mod.make(idx)

// What is at line 10, column 5?
var at_cursor = expl.at_span(10, 5)

// Find all symbols named "connect"
var matches = expl.by_symbol("connect")

// Get callers and callees of a symbol
var chain = expl.call_chain("demo.ce:connect:fn", 2)
```

For cross-file queries:

```javascript
var result = explain_mod.explain_across([idx1, idx2, idx3], "connect")
```

## LSP Integration

The semantic index powers these LSP features:

| Feature | LSP Method | Description |
|---------|------------|-------------|
| Find References | `textDocument/references` | All references to the symbol under the cursor |
| Rename | `textDocument/rename` | Rename a symbol and all its references |
| Prepare Rename | `textDocument/prepareRename` | Validate that the cursor is on a renameable symbol |
| Go to Definition | `textDocument/definition` | Jump to a symbol's declaration (index-backed with AST fallback) |

These work automatically in any editor with ƿit LSP support. The index is rebuilt on every file change.

## LLM / AI Assistance

The semantic index is designed to give LLMs the context they need to read and edit ƿit code accurately. ƿit is unlikely to be well represented in any model's training data, so an LLM cannot rely on memorized patterns; it needs structured information about names, scopes, and call relationships. The commands below are the recommended way to provide that.

### Understand a file before editing

Before modifying a file, index it to see its structure:

```bash
pit index file.ce
```

This gives the LLM every declaration, every reference, every call site, and the import list with resolved paths. Key things to extract:

- **`symbols`**: what functions exist, their parameters, and their doc comments. This is enough to understand the file's API without reading every line.
- **`imports`** with `resolved_path`: which modules are used, and where they live on disk. The LLM can follow these paths to read dependency source when it needs to understand a called function. Imports without a `resolved_path` are C built-ins (like `json`) with no script source to read.
- **`exports`**: for `.cm` modules, what the public API is. This tells the LLM what names other files can access.

### Investigate a specific symbol

When the LLM needs to rename, refactor, or understand a specific function:

```bash
pit explain --symbol update analysis.cm
```

This returns the declaration (with doc comment and parameter list), every reference, and every call site. The LLM can use this to:

- **Rename safely**: the references list has exact spans for every use of the name.
- **Understand callers**: `call_sites` shows where and how the function is called, including argument counts.
- **Read the doc comment**: often enough to understand intent without reading the function body.
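
As an illustration of the rename use case, the Python sketch below applies a new name to a list of reference spans. It is a sketch under stated assumptions rather than the LSP's rename logic: spans are taken to be single-line, end-exclusive, and measured in characters, and the sample source and spans are made up for the demonstration. Only reference spans are rewritten; the declaration's own name is not covered, because the `decl_span` values in the examples span the whole declaration. `apply_rename` is a hypothetical helper.

```python
def apply_rename(source, spans, new_name):
    """Replace the text under each span with new_name.

    Edits are applied from the last position to the first, so replacing an
    earlier span never shifts the columns of one that is still pending.
    """
    lines = source.split("\n")
    ordered = sorted(
        spans,
        key=lambda span: (span["from_row"], span["from_col"]),
        reverse=True,
    )
    for span in ordered:
        if span["from_row"] != span["to_row"]:
            raise ValueError("multi-line spans are not handled in this sketch")
        row = span["from_row"]
        line = lines[row]
        lines[row] = line[:span["from_col"]] + new_name + line[span["to_col"]:]
    return "\n".join(lines)


# Made-up two-line source with three uses of `add`; spans are 0-based.
SOURCE = "var x = add(1, 2)\nvar y = add(x, add(3, 4))"
SPANS = [
    {"from_row": 0, "from_col": 8, "to_row": 0, "to_col": 11},
    {"from_row": 1, "from_col": 8, "to_row": 1, "to_col": 11},
    {"from_row": 1, "from_col": 15, "to_row": 1, "to_col": 18},
]

print(apply_rename(SOURCE, SPANS, "plus"))
# prints:
# var x = plus(1, 2)
# var y = plus(x, plus(3, 4))
```

The reverse ordering matters whenever the new name differs in length from the old one and a line contains more than one reference, as the second line here does.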

### Investigate a cursor position

When the LLM is looking at a specific line and column (e.g., from an error message or a user selection):

```bash
pit explain --span file.ce:17:4
```

This resolves whatever is at that position, whether declaration or reference, back to the underlying symbol, then returns all references and call sites. Useful for "what is this name?" queries.

### Search across files

To find a symbol across multiple files, pass them all:

```bash
pit explain --symbol connect *.ce *.cm
pit explain --symbol send server.ce client.ce protocol.cm
```

This indexes each file and searches across all of them. The result merges all matching declarations, references, and call sites. Use this when the LLM needs to understand cross-file usage before making a change that touches multiple files.
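
The merge can be pictured with a short Python sketch that combines matches from several per-file indexes. This is an illustration of the idea only, not the behavior of `pit explain` or `explain_across`: the helper name `merge_matches`, the added `path` key on each hit, and the two sample indexes are all assumptions made for the example.

```python
def merge_matches(indexes, name):
    """Collect symbols, references, and call sites for `name` across indexes.

    Each hit is tagged with the path of the index it came from (an
    assumption of this sketch) so that spans stay unambiguous after merging.
    """
    merged = {"symbols": [], "references": [], "call_sites": []}
    for index in indexes:
        path = index["path"]
        for symbol in index.get("symbols", []):
            if symbol["name"] == name:
                merged["symbols"].append({**symbol, "path": path})
        for ref in index.get("references", []):
            if ref["name"] == name:
                merged["references"].append({**ref, "path": path})
        for call in index.get("call_sites", []):
            if call["callee"] == name:
                merged["call_sites"].append({**call, "path": path})
    return merged


# Made-up indexes in the documented shape; only the fields used here.
demo = {
    "path": "demo.ce",
    "symbols": [{"name": "connect", "kind": "fn",
                 "symbol_id": "demo.ce:connect:fn"}],
    "references": [{"name": "connect", "ref_kind": "call"}],
    "call_sites": [{"callee": "connect", "args_count": 3}],
}
other = {
    "path": "other.cm",
    "symbols": [],
    "references": [{"name": "connect", "ref_kind": "read"},
                   {"name": "send", "ref_kind": "call"}],
    "call_sites": [],
}

merged = merge_matches([demo, other], "connect")
print(len(merged["symbols"]), len(merged["references"]), len(merged["call_sites"]))
# prints 1 2 1
```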

### Import resolution

Every import in the index includes the original `module_path` (the string passed to `use()`). For script modules, it also includes `resolved_path`, the filesystem path the module resolves to. This lets the LLM follow dependency chains:

```json
{"local_name": "fd", "module_path": "fd", "resolved_path": ".cell/packages/core/fd.cm"}
{"local_name": "json", "module_path": "json"}
```

An import without `resolved_path` is a C built-in, with no script source to read.
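
A tool that follows dependency chains can use this rule to decide which imports are worth opening. The Python sketch below partitions an import list accordingly; the two entries are copied from the example, and `split_imports` is a hypothetical helper name.

```python
# The two import entries from the example.
IMPORTS = [
    {"local_name": "fd", "module_path": "fd",
     "resolved_path": ".cell/packages/core/fd.cm"},
    {"local_name": "json", "module_path": "json"},
]


def split_imports(imports):
    """Return (script_modules, builtins) based on the presence of resolved_path."""
    script_modules = [imp for imp in imports if imp.get("resolved_path")]
    builtins = [imp for imp in imports if not imp.get("resolved_path")]
    return script_modules, builtins


script_modules, builtins = split_imports(IMPORTS)

# Script modules have source that can be read or indexed in turn.
for imp in script_modules:
    print("follow:", imp["resolved_path"])
# Built-ins have no script source, so there is nothing to open.
for imp in builtins:
    print("built-in:", imp["module_path"])
# prints:
# follow: .cell/packages/core/fd.cm
# built-in: json
```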

### Recommended workflow

1. **Start with `pit index`** on the file to edit. Scan imports and symbols for an overview.
2. **Use `pit explain --symbol`** to drill into any function the LLM needs to understand or modify. The doc comment and parameter list are usually sufficient.
3. **Follow `resolved_path`** on imports when the LLM needs to understand a dependency: index or read the resolved file.
4. **Before renaming**, use `pit explain --symbol` (or `--span`) to get all reference spans, then apply edits to each span.
5. **For cross-file changes**, pass all affected files to `pit explain --symbol` to see the full picture before editing.