cell/docs/semantic-index.md
2026-02-16 21:50:39 -06:00

title: Semantic Index
description: Index and query symbols, references, and call sites in source files
weight: 55
type: docs

ƿit includes a semantic indexer that extracts symbols, references, call sites, and imports from source files. The index powers the LSP (find references, rename) and is available as a CLI tool for scripting and debugging.

Overview

The indexer walks the parsed AST without modifying it. It produces a JSON structure that records every declaration, every reference to a declaration, and every call site in a file.

source → tokenize → parse → fold → index
                                      ↓
                              symbols, references,
                              call sites, imports,
                              exports, reverse refs

Two CLI commands expose this:

| Command | Purpose |
| --- | --- |
| `pit index <file>` | Produce the full semantic index as JSON |
| `pit explain` | Query the index for a specific symbol or position |

pit index

Index a source file and print the result as JSON.

pit index <file.ce|file.cm>
pit index <file> -o output.json

Output

The index contains these sections:

| Section | Description |
| --- | --- |
| `imports` | All use() calls with local name, module path, and span |
| `symbols` | Every declaration: vars, defs, functions, params |
| `references` | Every use of a name, classified as read, write, or call |
| `call_sites` | Every function call with callee, args count, and enclosing function |
| `exports` | For .cm modules, the keys of the top-level return record |
| `reverse_refs` | Inverted index: name to list of reference spans |

Example

Given a file graph.ce with functions make_node, connect, and build_graph:

pit index graph.ce

{
  "version": 1,
  "path": "graph.ce",
  "is_actor": true,
  "imports": [
    {"local_name": "json", "module_path": "json", "span": {"from_row": 2, "from_col": 0, "to_row": 2, "to_col": 22}}
  ],
  "symbols": [
    {
      "symbol_id": "graph.ce:make_node:fn",
      "name": "make_node",
      "kind": "fn",
      "params": ["name", "kind"],
      "doc_comment": "// A node in the graph.",
      "decl_span": {"from_row": 6, "from_col": 0, "to_row": 8, "to_col": 1},
      "scope_fn_nr": 0
    }
  ],
  "references": [
    {"node_id": 20, "name": "make_node", "ref_kind": "call", "span": {"from_row": 17, "from_col": 13, "to_row": 17, "to_col": 22}}
  ],
  "call_sites": [
    {"node_id": 20, "callee": "make_node", "args_count": 2, "span": {"from_row": 17, "from_col": 22, "to_row": 17, "to_col": 40}}
  ],
  "exports": [],
  "reverse_refs": {
    "make_node": [
      {"node_id": 20, "ref_kind": "call", "span": {"from_row": 17, "from_col": 13, "to_row": 17, "to_col": 22}}
    ]
  }
}
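
Because the index is plain JSON, it can be post-processed outside ƿit. The following is a minimal Python sketch, assuming the output shape shown above; the helper name `summarize_index` is hypothetical, and the inline sample is a trimmed copy of the example rather than real tool output.

```python
import json

# Trimmed copy of the example output above. In practice this would be loaded
# from the file written by `pit index <file> -o output.json`.
SAMPLE = json.loads("""
{
  "version": 1,
  "path": "graph.ce",
  "symbols": [
    {"symbol_id": "graph.ce:make_node:fn", "name": "make_node", "kind": "fn",
     "params": ["name", "kind"]}
  ],
  "call_sites": [
    {"node_id": 20, "callee": "make_node", "args_count": 2}
  ],
  "reverse_refs": {
    "make_node": [{"node_id": 20, "ref_kind": "call"}]
  }
}
""")


def summarize_index(index):
    """Return one line per symbol: kind, name, and reference count."""
    lines = []
    for sym in index.get("symbols", []):
        # reverse_refs is keyed by name, so the lookup is a plain dict access.
        refs = index.get("reverse_refs", {}).get(sym["name"], [])
        lines.append(f'{sym["kind"]} {sym["name"]}: {len(refs)} reference(s)')
    return lines


for line in summarize_index(SAMPLE):
    print(line)
```

Using `reverse_refs` here avoids rescanning the `references` list once per symbol.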

Symbol Kinds

| Kind | Description |
| --- | --- |
| `fn` | Function (var or def with function value) |
| `var` | Mutable variable |
| `def` | Constant |
| `param` | Function parameter |

Each symbol has a symbol_id in the format filename:name:kind and a decl_span with from_row, from_col, to_row, to_col (0-based).
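
Since the symbol_id format is fixed, a consumer of the JSON can split it back into its parts. This is a small Python sketch; `SymbolId` and `parse_symbol_id` are hypothetical names, and splitting from the right is a defensive assumption in case the filename part ever contains a colon (the format description does not say).

```python
from typing import NamedTuple


class SymbolId(NamedTuple):
    filename: str
    name: str
    kind: str


def parse_symbol_id(symbol_id: str) -> SymbolId:
    """Split 'filename:name:kind' into its three parts."""
    # rsplit keeps any extra colons inside the filename part (assumption).
    filename, name, kind = symbol_id.rsplit(":", 2)
    return SymbolId(filename, name, kind)


parsed = parse_symbol_id("graph.ce:make_node:fn")
print(parsed.filename, parsed.name, parsed.kind)
```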

Reference Kinds

| Kind | Description |
| --- | --- |
| `read` | Value is read |
| `write` | Value is assigned |
| `call` | Used as a function call target |
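
The read/write/call classification makes simple checks possible over the `references` section. As an illustration only, the Python sketch below tallies kinds per name and lists names that are assigned but never read or called. The sample entries and both function names are hypothetical, and only the `name` and `ref_kind` fields are assumed.

```python
from collections import Counter, defaultdict

# Hypothetical entries in the shape of the index's "references" section.
REFERENCES = [
    {"name": "count", "ref_kind": "write"},
    {"name": "count", "ref_kind": "read"},
    {"name": "scratch", "ref_kind": "write"},
    {"name": "make_node", "ref_kind": "call"},
]


def kinds_by_name(references):
    """Count read/write/call references for each name."""
    counts = defaultdict(Counter)
    for ref in references:
        counts[ref["name"]][ref["ref_kind"]] += 1
    return counts


def written_never_read(references):
    """Names with at least one write and no read or call reference."""
    return sorted(
        name
        for name, kinds in kinds_by_name(references).items()
        if kinds["write"] > 0 and kinds["read"] == 0 and kinds["call"] == 0
    )


print(written_never_read(REFERENCES))
```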

Module Exports

For .cm files, the indexer detects the top-level return statement. If it returns a record literal, each key becomes an export linked to its symbol:

// math_utils.cm
var add = function(a, b) { return a + b }
var sub = function(a, b) { return a - b }
return {add: add, sub: sub}

pit index math_utils.cm

The exports section will contain:

[
  {"name": "add", "symbol_id": "math_utils.cm:add:fn"},
  {"name": "sub", "symbol_id": "math_utils.cm:sub:fn"}
]
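
Each export carries a symbol_id, so it can be joined back to its declaration in the `symbols` section. This is a Python sketch under the assumption that both sections appear in the same index object; `resolve_exports` is a hypothetical helper, and the sample symbols are written by hand to match math_utils.cm above.

```python
# Hand-written sample matching math_utils.cm; not captured tool output.
SAMPLE = {
    "path": "math_utils.cm",
    "symbols": [
        {"symbol_id": "math_utils.cm:add:fn", "name": "add", "kind": "fn",
         "params": ["a", "b"]},
        {"symbol_id": "math_utils.cm:sub:fn", "name": "sub", "kind": "fn",
         "params": ["a", "b"]},
    ],
    "exports": [
        {"name": "add", "symbol_id": "math_utils.cm:add:fn"},
        {"name": "sub", "symbol_id": "math_utils.cm:sub:fn"},
    ],
}


def resolve_exports(index):
    """Map each exported name to its symbol record (None if not found)."""
    by_id = {sym["symbol_id"]: sym for sym in index.get("symbols", [])}
    return {
        exp["name"]: by_id.get(exp["symbol_id"])
        for exp in index.get("exports", [])
    }


for name, sym in resolve_exports(SAMPLE).items():
    print(name, sym["params"] if sym else "unresolved")
```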

pit explain

Query the semantic index for a specific symbol or cursor position. This is the targeted query interface: instead of dumping the full index, it answers a specific question.

pit explain --span <file>:<line>:<col>
pit explain --symbol <name> <file>

--span: What is at this position?

Point at a line and column (0-based) to find out what symbol or reference is there.

pit explain --span demo.ce:6:4

If the position lands on a declaration, that symbol is returned along with all its references and call sites. If it lands on a reference, the indexer traces back to the declaration and returns the same information.

The result includes:

| Field | Description |
| --- | --- |
| `symbol` | The resolved declaration (name, kind, params, doc comment, span) |
| `reference` | The reference at the cursor, if the cursor was on a reference |
| `references` | All references to this symbol across the file |
| `call_sites` | All call sites for this symbol |
| `imports` | The file's imports (for context) |

{
  "symbol": {
    "name": "build_graph",
    "symbol_id": "demo.ce:build_graph:fn",
    "kind": "fn",
    "params": [],
    "doc_comment": "// Build a sample graph and return it."
  },
  "references": [
    {"node_id": 71, "ref_kind": "call", "span": {"from_row": 39, "from_col": 12, "to_row": 39, "to_col": 23}}
  ],
  "call_sites": []
}
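
To make the position lookup concrete, here is a Python sketch of span containment over the `references` section of a full index. It illustrates the idea and is not the indexer's actual implementation. It assumes 0-based rows and columns with an exclusive `to_col`, which matches the sample spans (columns 13 to 22 for the nine-character name `make_node`); the function names are hypothetical.

```python
def span_contains(span, row, col):
    """True if (row, col) lies inside span; the end is treated as exclusive."""
    start = (span["from_row"], span["from_col"])
    end = (span["to_row"], span["to_col"])
    # Tuple comparison orders positions by row first, then by column.
    return start <= (row, col) < end


def reference_at(references, row, col):
    """Return the first reference whose span covers the position, else None."""
    for ref in references:
        if span_contains(ref["span"], row, col):
            return ref
    return None


# The reference from the `pit index graph.ce` example earlier on this page.
REFERENCES = [
    {"node_id": 20, "name": "make_node", "ref_kind": "call",
     "span": {"from_row": 17, "from_col": 13, "to_row": 17, "to_col": 22}},
]

hit = reference_at(REFERENCES, 17, 15)
print(hit["name"] if hit else "nothing here")
```

Declarations would need extra care, because a decl_span covers the whole declaration and nested declarations overlap; picking the innermost matching span is one option.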

--symbol: Find a symbol by name

Look up a symbol by name, returning all matching declarations and every reference.

pit explain --symbol connect demo.ce

{
  "symbols": [
    {
      "name": "connect",
      "symbol_id": "demo.ce:connect:fn",
      "kind": "fn",
      "params": ["from", "to", "label"],
      "doc_comment": "// Connect two nodes with a labeled edge."
    }
  ],
  "references": [
    {"node_id": 29, "ref_kind": "call", "span": {"from_row": 21, "from_col": 2, "to_row": 21, "to_col": 9}},
    {"node_id": 33, "ref_kind": "call", "span": {"from_row": 22, "from_col": 2, "to_row": 22, "to_col": 9}},
    {"node_id": 37, "ref_kind": "call", "span": {"from_row": 23, "from_col": 2, "to_row": 23, "to_col": 9}}
  ],
  "call_sites": [
    {"callee": "connect", "args_count": 3, "span": {"from_row": 21, "from_col": 9, "to_row": 21, "to_col": 29}},
    {"callee": "connect", "args_count": 3, "span": {"from_row": 22, "from_col": 9, "to_row": 22, "to_col": 31}},
    {"callee": "connect", "args_count": 3, "span": {"from_row": 23, "from_col": 9, "to_row": 23, "to_col": 29}}
  ]
}

Reading the result: connect is a function taking (from, to, label), declared on line 11, and called 3 times inside build_graph.
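
For scripting, the command can be wrapped from another language. The Python sketch below assumes that `pit` is on PATH and that it writes the JSON result to stdout (neither is confirmed here); the function names are hypothetical. Only the argument list and the result handling are exercised in the example, so it runs without ƿit installed.

```python
import json
import subprocess


def explain_symbol_argv(name, path):
    """Build the argument list for `pit explain --symbol <name> <file>`."""
    return ["pit", "explain", "--symbol", name, path]


def explain_symbol(name, path):
    """Run the query and parse its output (assumes JSON on stdout)."""
    proc = subprocess.run(
        explain_symbol_argv(name, path),
        capture_output=True, text=True, check=True,
    )
    return json.loads(proc.stdout)


def call_rows(result):
    """Rows (0-based) on which the symbol is called, in ascending order."""
    return sorted(site["span"]["from_row"] for site in result.get("call_sites", []))


# Call sites copied from the example result above, reduced to the row fields.
SAMPLE_RESULT = {
    "call_sites": [
        {"callee": "connect", "args_count": 3, "span": {"from_row": 21}},
        {"callee": "connect", "args_count": 3, "span": {"from_row": 22}},
        {"callee": "connect", "args_count": 3, "span": {"from_row": 23}},
    ]
}

print(explain_symbol_argv("connect", "demo.ce"))
print(call_rows(SAMPLE_RESULT))
```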

Programmatic Use

The index and explain modules can be used directly from ƿit scripts:

index.cm

var tokenize_mod = use('tokenize')
var parse_mod = use('parse')
var fold_mod = use('fold')
var index_mod = use('index')

var pipeline = {tokenize: tokenize_mod, parse: parse_mod, fold: fold_mod}
// src (the source text) and filename are assumed to be defined by the caller
var idx = index_mod.index_file(src, filename, pipeline)

index_file runs the full pipeline (tokenize, parse, fold) and returns the index. If you already have a parsed AST and tokens, use index_ast instead:

var idx = index_mod.index_ast(ast, tokens, filename)

explain.cm

var explain_mod = use('explain')
var expl = explain_mod.make(idx)

// What is at line 10, column 5?
var at_cursor = expl.at_span(10, 5)

// Find all symbols named "connect"
var matches = expl.by_symbol("connect")

// Get callers and callees of a symbol
var chain = expl.call_chain("demo.ce:connect:fn", 2)

For cross-file queries:

var result = explain_mod.explain_across([idx1, idx2, idx3], "connect")
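
The same kind of cross-file question can be approximated outside ƿit by combining several index JSON objects. The Python sketch below is an illustration only and makes no claim about how explain_across works internally. `refs_across` is a hypothetical helper, and the two sample indexes are invented for the example.

```python
def refs_across(indexes, name):
    """Collect (path, row, col) for every reference to `name` in all indexes."""
    hits = []
    for index in indexes:
        for ref in index.get("reverse_refs", {}).get(name, []):
            span = ref["span"]
            hits.append((index["path"], span["from_row"], span["from_col"]))
    return sorted(hits)


# Two invented indexes in the shape produced by `pit index`.
IDX_A = {
    "path": "demo.ce",
    "reverse_refs": {
        "connect": [{"ref_kind": "call", "span": {"from_row": 21, "from_col": 2}}],
    },
}
IDX_B = {
    "path": "other.ce",
    "reverse_refs": {
        "connect": [{"ref_kind": "call", "span": {"from_row": 4, "from_col": 0}}],
    },
}

for path, row, col in refs_across([IDX_A, IDX_B], "connect"):
    print(f"{path}:{row}:{col}")
```

Matching purely by name ignores scoping and imports, so two unrelated symbols that share a name would be merged by this sketch.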

LSP Integration

The semantic index powers these LSP features:

| Feature | LSP Method | Description |
| --- | --- | --- |
| Find References | `textDocument/references` | All references to the symbol under the cursor |
| Rename | `textDocument/rename` | Rename a symbol and all its references |
| Prepare Rename | `textDocument/prepareRename` | Validate that the cursor is on a renameable symbol |
| Go to Definition | `textDocument/definition` | Jump to a symbol's declaration (index-backed with AST fallback) |

These work automatically in any editor with ƿit LSP support. The index is rebuilt on every file change.
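
To show how index spans line up with LSP data, here is a Python sketch that turns the reference spans for one name into an LSP WorkspaceEdit for a rename. It is not the ƿit language server's code. It assumes that index rows and columns map directly onto LSP `line` and `character` values (LSP counts characters in UTF-16 code units by default, while the index's column unit is not stated here), and it covers references only, since a decl_span spans the whole declaration rather than just the name. The function names and the file URI are hypothetical.

```python
def span_to_range(span):
    """Convert an index span to an LSP Range (0-based, end exclusive)."""
    return {
        "start": {"line": span["from_row"], "character": span["from_col"]},
        "end": {"line": span["to_row"], "character": span["to_col"]},
    }


def rename_edit(uri, index, old_name, new_name):
    """Build a WorkspaceEdit that replaces every reference to old_name."""
    edits = [
        {"range": span_to_range(ref["span"]), "newText": new_name}
        for ref in index.get("reverse_refs", {}).get(old_name, [])
    ]
    return {"changes": {uri: edits}}


# reverse_refs entry taken from the `pit index graph.ce` example on this page.
SAMPLE = {
    "reverse_refs": {
        "make_node": [
            {"node_id": 20, "ref_kind": "call",
             "span": {"from_row": 17, "from_col": 13, "to_row": 17, "to_col": 22}},
        ]
    }
}

print(rename_edit("file:///graph.ce", SAMPLE, "make_node", "new_node"))
```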