| title | description | weight | type |
|---|---|---|---|
| Semantic Index | Index and query symbols, references, and call sites in source files | 55 | docs |
ƿit includes a semantic indexer that extracts symbols, references, call sites, and imports from source files. The index powers the LSP (find references, rename) and is available as a CLI tool for scripting and debugging.
Overview
The indexer walks the parsed AST without modifying it. It produces a JSON structure that records every declaration, every reference to a declaration, and every call site in a file.
source → tokenize → parse → fold → index
↓
symbols, references,
call sites, imports,
exports, reverse refs
Two CLI commands expose this:
| Command | Purpose |
|---|---|
| pit index <file> | Produce the full semantic index as JSON |
| pit explain | Query the index for a specific symbol or position |
pit index
Index a source file and print the result as JSON.
pit index <file.ce|file.cm>
pit index <file> -o output.json
Output
The index contains these sections:
| Section | Description |
|---|---|
| imports | All use() calls with local name, module path, resolved filesystem path, and span |
| symbols | Every declaration: vars, defs, functions, params |
| references | Every use of a name, classified as read, write, or call |
| call_sites | Every function call with callee, args count, and enclosing function |
| exports | For .cm modules, the keys of the top-level return record |
| reverse_refs | Inverted index: name to list of reference spans |
Example
Given a file graph.ce with functions make_node, connect, and build_graph:
pit index graph.ce
{
"version": 1,
"path": "graph.ce",
"is_actor": true,
"imports": [
{"local_name": "json", "module_path": "json", "resolved_path": ".cell/packages/core/json.cm", "span": {"from_row": 2, "from_col": 0, "to_row": 2, "to_col": 22}}
],
"symbols": [
{
"symbol_id": "graph.ce:make_node:fn",
"name": "make_node",
"kind": "fn",
"params": ["name", "kind"],
"doc_comment": "// A node in the graph.",
"decl_span": {"from_row": 6, "from_col": 0, "to_row": 8, "to_col": 1},
"scope_fn_nr": 0
}
],
"references": [
{"node_id": 20, "name": "make_node", "ref_kind": "call", "span": {"from_row": 17, "from_col": 13, "to_row": 17, "to_col": 22}}
],
"call_sites": [
{"node_id": 20, "callee": "make_node", "args_count": 2, "span": {"from_row": 17, "from_col": 22, "to_row": 17, "to_col": 40}}
],
"exports": [],
"reverse_refs": {
"make_node": [
{"node_id": 20, "ref_kind": "call", "span": {"from_row": 17, "from_col": 13, "to_row": 17, "to_col": 22}}
]
}
}
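Because the index is plain JSON, any scripting language can consume it. The sketch below uses Python rather than ƿit and works on a trimmed copy of the example above, embedded as a string; in practice the text would come from the file written by pit index graph.ce -o output.json. The helper name function_signatures is hypothetical and not part of ƿit.

```python
import json

# Trimmed copy of the example index above. A real index would be read from
# the file produced by `pit index graph.ce -o output.json`.
INDEX_JSON = """
{
  "version": 1,
  "path": "graph.ce",
  "symbols": [
    {"symbol_id": "graph.ce:make_node:fn", "name": "make_node", "kind": "fn",
     "params": ["name", "kind"], "doc_comment": "// A node in the graph."}
  ],
  "call_sites": [
    {"node_id": 20, "callee": "make_node", "args_count": 2}
  ]
}
"""

def function_signatures(index):
    """Return 'name(params)' strings for every symbol of kind 'fn'."""
    return [
        f"{s['name']}({', '.join(s.get('params', []))})"
        for s in index["symbols"]
        if s["kind"] == "fn"
    ]

index = json.loads(INDEX_JSON)
print(function_signatures(index))
```

This gives a quick overview of a file's functions without reading its source.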
Symbol Kinds
| Kind | Description |
|---|---|
| fn | Function (var or def with function value) |
| var | Mutable variable |
| def | Constant |
| param | Function parameter |
Each symbol has a symbol_id in the format filename:name:kind and a decl_span with from_row, from_col, to_row, to_col (0-based).
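As an illustration of these two conventions, the Python sketch below splits a symbol_id into its parts and converts a 0-based span start into the 1-based line:column form most editors display. Both helper names are hypothetical. Splitting from the right assumes that names and kinds never contain a colon, which the format suggests but the text above does not state.

```python
def parse_symbol_id(symbol_id):
    """Split 'filename:name:kind' from the right, so a colon in the
    filename (an assumption worth guarding against) is tolerated."""
    filename, name, kind = symbol_id.rsplit(":", 2)
    return {"file": filename, "name": name, "kind": kind}

def span_to_editor_position(span):
    """Convert the 0-based start of a span to a 1-based 'line:column' string."""
    return f"{span['from_row'] + 1}:{span['from_col'] + 1}"

decl_span = {"from_row": 6, "from_col": 0, "to_row": 8, "to_col": 1}
print(parse_symbol_id("graph.ce:make_node:fn"))
print(span_to_editor_position(decl_span))
```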
Reference Kinds
| Kind | Description |
|---|---|
| read | Value is read |
| write | Value is assigned |
| call | Used as a function call target |
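The reverse_refs section can be read as the references list grouped by name. The Python sketch below rebuilds that grouping from a references list and counts references per kind. It mirrors the shape of the example output above, is not the indexer's actual implementation, and both function names are hypothetical.

```python
from collections import Counter, defaultdict

def build_reverse_refs(references):
    """Group references by name, dropping the name from each entry."""
    grouped = defaultdict(list)
    for ref in references:
        entry = {key: value for key, value in ref.items() if key != "name"}
        grouped[ref["name"]].append(entry)
    return dict(grouped)

def count_by_kind(references):
    """Count how many references are reads, writes, and calls."""
    return Counter(ref["ref_kind"] for ref in references)

references = [
    {"node_id": 20, "name": "make_node", "ref_kind": "call",
     "span": {"from_row": 17, "from_col": 13, "to_row": 17, "to_col": 22}}
]
print(build_reverse_refs(references))
print(count_by_kind(references))
```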
Module Exports
For .cm files, the indexer detects the top-level return statement. If it returns a record literal, each key becomes an export linked to its symbol:
// math_utils.cm
var add = function(a, b) { return a + b }
var sub = function(a, b) { return a - b }
return {add: add, sub: sub}
pit index math_utils.cm
The exports section will contain:
[
{"name": "add", "symbol_id": "math_utils.cm:add:fn"},
{"name": "sub", "symbol_id": "math_utils.cm:sub:fn"}
]
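Since each export carries a symbol_id, a consumer can join exports back to the symbols section to recover a module's public signatures. The Python sketch below does this for math_utils.cm. The symbols entries are assumed to follow the shape of the earlier graph.ce example (the real index would also contain param symbols), and exported_api is a hypothetical helper.

```python
# Assumed, trimmed index for math_utils.cm; only the fields used here.
index = {
    "symbols": [
        {"symbol_id": "math_utils.cm:add:fn", "name": "add", "kind": "fn", "params": ["a", "b"]},
        {"symbol_id": "math_utils.cm:sub:fn", "name": "sub", "kind": "fn", "params": ["a", "b"]},
    ],
    "exports": [
        {"name": "add", "symbol_id": "math_utils.cm:add:fn"},
        {"name": "sub", "symbol_id": "math_utils.cm:sub:fn"},
    ],
}

def exported_api(index):
    """Map each exported name to the parameter list of its symbol.

    An export whose symbol_id has no matching symbol maps to None.
    """
    by_id = {s["symbol_id"]: s for s in index["symbols"]}
    api = {}
    for export in index["exports"]:
        symbol = by_id.get(export["symbol_id"])
        api[export["name"]] = symbol.get("params", []) if symbol else None
    return api

print(exported_api(index))
```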
pit explain
Query the semantic index for a specific symbol or cursor position. This is the targeted query interface — instead of dumping the full index, it answers a specific question.
pit explain --span <file>:<line>:<col>
pit explain --symbol <name> <file>...
--span: What is at this position?
Point at a line and column (0-based) to find out what symbol or reference is there.
pit explain --span demo.ce:6:4
If the position lands on a declaration, that symbol is returned along with all its references and call sites. If it lands on a reference, the indexer traces back to the declaration and returns the same information.
The result includes:
| Field | Description |
|---|---|
| symbol | The resolved declaration (name, kind, params, doc comment, span) |
| reference | The reference at the cursor, if the cursor was on a reference |
| references | All references to this symbol across the file |
| call_sites | All call sites for this symbol |
| imports | The file's imports (for context) |
{
"symbol": {
"name": "build_graph",
"symbol_id": "demo.ce:build_graph:fn",
"kind": "fn",
"params": [],
"doc_comment": "// Build a sample graph and return it."
},
"references": [
{"node_id": 71, "ref_kind": "call", "span": {"from_row": 39, "from_col": 12, "to_row": 39, "to_col": 23}}
],
"call_sites": []
}
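To make the position lookup concrete, here is a minimal Python sketch of matching a (row, column) pair against the spans in an index. It is not the implementation behind pit explain. It assumes span ends are exclusive, which is inferred from the earlier example where the nine-character name make_node spans columns 13 to 22. All function names are hypothetical.

```python
def span_contains(span, row, col):
    """True if (row, col) lies in the span, treating the end as exclusive."""
    start = (span["from_row"], span["from_col"])
    end = (span["to_row"], span["to_col"])
    return start <= (row, col) < end

def span_size(span):
    """Sort key that orders nested spans from narrowest to widest."""
    return (span["to_row"] - span["from_row"], span["to_col"] - span["from_col"])

def find_at(index, row, col):
    """Return ('reference' or 'symbol', name) for the position, or None."""
    # References cover only the name, so they are checked first.
    for ref in index["references"]:
        if span_contains(ref["span"], row, col):
            return ("reference", ref["name"])
    # Declarations can nest (a param inside a function), so pick the narrowest.
    hits = [s for s in index["symbols"] if span_contains(s["decl_span"], row, col)]
    if hits:
        narrowest = min(hits, key=lambda s: span_size(s["decl_span"]))
        return ("symbol", narrowest["name"])
    return None

index = {
    "symbols": [{"name": "make_node",
                 "decl_span": {"from_row": 6, "from_col": 0, "to_row": 8, "to_col": 1}}],
    "references": [{"name": "make_node",
                    "span": {"from_row": 17, "from_col": 13, "to_row": 17, "to_col": 22}}],
}
print(find_at(index, 17, 15))
print(find_at(index, 7, 2))
```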
--symbol: Find a symbol by name
Look up a symbol by name. Pass one file for a focused result, or multiple files (including shell globs) to search across them all:
pit explain --symbol connect demo.ce
pit explain --symbol connect *.ce *.cm
{
"symbols": [
{
"name": "connect",
"symbol_id": "demo.ce:connect:fn",
"kind": "fn",
"params": ["from", "to", "label"],
"doc_comment": "// Connect two nodes with a labeled edge."
}
],
"references": [
{"node_id": 29, "ref_kind": "call", "span": {"from_row": 21, "from_col": 2, "to_row": 21, "to_col": 9}},
{"node_id": 33, "ref_kind": "call", "span": {"from_row": 22, "from_col": 2, "to_row": 22, "to_col": 9}},
{"node_id": 37, "ref_kind": "call", "span": {"from_row": 23, "from_col": 2, "to_row": 23, "to_col": 9}}
],
"call_sites": [
{"callee": "connect", "args_count": 3, "span": {"from_row": 21, "from_col": 9, "to_row": 21, "to_col": 29}},
{"callee": "connect", "args_count": 3, "span": {"from_row": 22, "from_col": 9, "to_row": 22, "to_col": 31}},
{"callee": "connect", "args_count": 3, "span": {"from_row": 23, "from_col": 9, "to_row": 23, "to_col": 29}}
]
}
This tells you: connect is a function taking (from, to, label), declared on line 11, and called 3 times inside build_graph.
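The same reading can be automated. The Python sketch below turns a trimmed copy of the result above into a one-line summary per symbol and lists the 1-based lines of each call. The summarize and call_lines helpers are hypothetical.

```python
# Trimmed copy of the `pit explain --symbol connect demo.ce` result above.
result = {
    "symbols": [
        {"name": "connect", "symbol_id": "demo.ce:connect:fn", "kind": "fn",
         "params": ["from", "to", "label"]}
    ],
    "call_sites": [
        {"callee": "connect", "args_count": 3, "span": {"from_row": 21, "from_col": 9, "to_row": 21, "to_col": 29}},
        {"callee": "connect", "args_count": 3, "span": {"from_row": 22, "from_col": 9, "to_row": 22, "to_col": 31}},
        {"callee": "connect", "args_count": 3, "span": {"from_row": 23, "from_col": 9, "to_row": 23, "to_col": 29}},
    ],
}

def summarize(result):
    """Return one summary line per matching symbol."""
    lines = []
    for symbol in result["symbols"]:
        name = symbol["name"]
        calls = [c for c in result["call_sites"] if c["callee"] == name]
        params = ", ".join(symbol.get("params", []))
        lines.append(f"{name}({params}): {symbol['kind']}, called {len(calls)} times")
    return lines

def call_lines(result, name):
    """Return the 1-based line numbers of every call to `name`."""
    return [c["span"]["from_row"] + 1 for c in result["call_sites"] if c["callee"] == name]

print(summarize(result))
print(call_lines(result, "connect"))
```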
Programmatic Use
The index and explain modules can be used directly from ƿit scripts:
Via shop (recommended)
var shop = use('internal/shop')
var idx = shop.index_file(path)
shop.index_file runs the full pipeline (tokenize, parse, index, resolve imports) and caches the result.
index.cm (direct)
If you already have a parsed AST and tokens, use index_ast directly:
var index_mod = use('index')
var idx = index_mod.index_ast(ast, tokens, filename)
explain.cm
var explain_mod = use('explain')
var expl = explain_mod.make(idx)
// What is at line 10, column 5?
var result = expl.at_span(10, 5)
// Find all symbols named "connect"
var result = expl.by_symbol("connect")
// Get callers and callees of a symbol
var chain = expl.call_chain("demo.ce:connect:fn", 2)
For cross-file queries:
var result = explain_mod.explain_across([idx1, idx2, idx3], "connect")
LSP Integration
The semantic index powers these LSP features:
| Feature | LSP Method | Description |
|---|---|---|
| Find References | textDocument/references | All references to the symbol under the cursor |
| Rename | textDocument/rename | Rename a symbol and all its references |
| Prepare Rename | textDocument/prepareRename | Validate that the cursor is on a renameable symbol |
| Go to Definition | textDocument/definition | Jump to a symbol's declaration (index-backed with AST fallback) |
These work automatically in any editor with ƿit LSP support. The index is rebuilt on every file change.
LLM / AI Assistance
The semantic index is designed to give LLMs the context they need to read and edit ƿit code accurately. ƿit is not in any training set, so an LLM cannot rely on memorized patterns — it needs structured information about names, scopes, and call relationships. The commands below are the recommended way to provide that.
Understand a file before editing
Before modifying a file, index it to see its structure:
pit index file.ce
This gives the LLM every declaration, every reference, every call site, and the import list with resolved paths. Key things to extract:
- symbols: what functions exist, their parameters, and their doc comments. This is enough to understand the file's API without reading every line.
- imports with resolved_path: which modules are used, and where they live on disk. The LLM can follow these paths to read dependency source when it needs to understand a called function. Imports without a resolved_path are C built-ins (like json) with no script source to read.
- exports: for .cm modules, what the public API is. This tells the LLM what names other files can access.
Investigate a specific symbol
When the LLM needs to rename, refactor, or understand a specific function:
pit explain --symbol update analysis.cm
This returns the declaration (with doc comment and parameter list), every reference, and every call site. The LLM can use this to:
- Rename safely: the references list has exact spans for every use of the name.
- Understand callers: call_sites shows where and how the function is called, including argument counts.
- Read the doc comment: often enough to understand intent without reading the function body.
Investigate a cursor position
When the LLM is looking at a specific line and column (e.g., from an error message or a user selection):
pit explain --span file.ce:17:4
This resolves whatever is at that position — declaration or reference — back to the underlying symbol, then returns all references and call sites. Useful for "what is this name?" queries.
Search across files
To find a symbol across multiple files, pass them all:
pit explain --symbol connect *.ce *.cm
pit explain --symbol send server.ce client.ce protocol.cm
This indexes each file and searches across all of them. The result merges all matching declarations, references, and call sites. Use this when the LLM needs to understand cross-file usage before making a change that touches multiple files.
Import resolution
Every import in the index includes the original module_path (the string passed to use()). For script modules, it also includes resolved_path — the filesystem path the module resolves to. This lets the LLM follow dependency chains:
{"local_name": "fd", "module_path": "fd", "resolved_path": ".cell/packages/core/fd.cm"}
{"local_name": "json", "module_path": "json"}
An import without resolved_path is a C built-in — no script source to read.
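A tool that follows dependencies can use this rule to decide which imports have source worth reading. The Python sketch below partitions the two example imports above; split_imports is a hypothetical helper.

```python
imports = [
    {"local_name": "fd", "module_path": "fd", "resolved_path": ".cell/packages/core/fd.cm"},
    {"local_name": "json", "module_path": "json"},
]

def split_imports(imports):
    """Separate script modules (with a path to read) from C built-ins."""
    script = {i["local_name"]: i["resolved_path"] for i in imports if "resolved_path" in i}
    builtin = [i["local_name"] for i in imports if "resolved_path" not in i]
    return script, builtin

script_modules, builtins = split_imports(imports)
print(script_modules)
print(builtins)
```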
Recommended workflow
- Start with pit index on the file to edit. Scan imports and symbols for an overview.
- Use pit explain --symbol to drill into any function the LLM needs to understand or modify. The doc comment and parameter list are usually sufficient.
- Follow resolved_path on imports when the LLM needs to understand a dependency: index or read the resolved file.
- Before renaming, use pit explain --symbol (or --span) to get all reference spans, then apply edits to each span.
- For cross-file changes, pass all affected files to pit explain --symbol to see the full picture before editing.
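The rename step can be sketched in Python as follows. The sketch assumes that reference spans cover only the name, that columns are 0-based character offsets, and that span ends are exclusive (all consistent with the examples above, none stated as guarantees). A decl_span covers the whole declaration rather than just the name, so the declaration itself needs separate handling. apply_rename is a hypothetical helper, not part of ƿit.

```python
def apply_rename(source, spans, new_name):
    """Replace the text under each single-line span with new_name."""
    lines = source.split("\n")
    # Edit from the last span to the first so that earlier columns on the
    # same line are not shifted by a replacement of a different length.
    ordered = sorted(spans, key=lambda s: (s["from_row"], s["from_col"]), reverse=True)
    for span in ordered:
        if span["from_row"] != span["to_row"]:
            raise ValueError("expected a name span that stays on one line")
        row = span["from_row"]
        line = lines[row]
        lines[row] = line[:span["from_col"]] + new_name + line[span["to_col"]:]
    return "\n".join(lines)

source = "connect(a, b, x)\nconnect(c, d, y)"
spans = [
    {"from_row": 0, "from_col": 0, "to_row": 0, "to_col": 7},
    {"from_row": 1, "from_col": 0, "to_row": 1, "to_col": 7},
]
print(apply_rename(source, spans, "link"))
```

Sorting in reverse order matters when two references share a line and the new name differs in length from the old one.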