cell/semantic-index.md at a1b41d5ecf00a943fe6ee45559b3f05de8bc846c

John Alanbrook bd7f9f34ec simplify compilation requestors

2026-02-18 10:46:47 -06:00

title	description	weight	type
Semantic Index	Index and query symbols, references, and call sites in source files	55	docs

ƿit includes a semantic indexer that extracts symbols, references, call sites, and imports from source files. The index powers the LSP (find references, rename) and is available as a CLI tool for scripting and debugging.

Overview

The indexer walks the parsed AST without modifying it. It produces a JSON structure that records every declaration, every reference to each declaration, and every call site in a file.

source → tokenize → parse → fold → index
                                      ↓
                              symbols, references,
                              call sites, imports,
                              exports, reverse refs

Two CLI commands expose this:

Command	Purpose
`pit index <file>`	Produce the full semantic index as JSON
`pit explain`	Query the index for a specific symbol or position

pit index

Index a source file and print the result as JSON.

pit index <file.ce|file.cm>
pit index <file> -o output.json
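When the index is consumed from a script in another language, the two invocations above can be wrapped in a small helper. The following is a minimal Python sketch, assuming `pit` is on the PATH and prints the index JSON to standard output when `-o` is omitted. The helper names `pit_index_argv` and `run_pit_index` are hypothetical and not part of ƿit.

```python
import json
import subprocess

def pit_index_argv(source_path, output_path=None):
    """Build the argv for `pit index`, optionally with `-o <output_path>`."""
    argv = ["pit", "index", source_path]
    if output_path is not None:
        argv += ["-o", output_path]
    return argv

def run_pit_index(source_path):
    """Run `pit index` and parse its stdout as JSON (assumes pit is on PATH)."""
    done = subprocess.run(pit_index_argv(source_path),
                          capture_output=True, text=True, check=True)
    return json.loads(done.stdout)
```

Keeping the argv construction separate from the process call makes the command line easy to inspect or log before anything is executed.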

Output

The index contains these sections:

Section	Description
`imports`	All `use()` calls with local name, module path, resolved filesystem path, and span
`symbols`	Every declaration: vars, defs, functions, params
`references`	Every use of a name, classified as read, write, or call
`call_sites`	Every function call with callee, argument count, and enclosing function
`exports`	For `.cm` modules, the keys of the top-level `return` record
`reverse_refs`	Inverted index: name to list of reference spans

Example

Given a file graph.ce with functions make_node, connect, and build_graph:

pit index graph.ce

{
  "version": 1,
  "path": "graph.ce",
  "is_actor": true,
  "imports": [
    {"local_name": "json", "module_path": "json", "resolved_path": ".cell/packages/core/json.cm", "span": {"from_row": 2, "from_col": 0, "to_row": 2, "to_col": 22}}
  ],
  "symbols": [
    {
      "symbol_id": "graph.ce:make_node:fn",
      "name": "make_node",
      "kind": "fn",
      "params": ["name", "kind"],
      "doc_comment": "// A node in the graph.",
      "decl_span": {"from_row": 6, "from_col": 0, "to_row": 8, "to_col": 1},
      "scope_fn_nr": 0
    }
  ],
  "references": [
    {"node_id": 20, "name": "make_node", "ref_kind": "call", "span": {"from_row": 17, "from_col": 13, "to_row": 17, "to_col": 22}}
  ],
  "call_sites": [
    {"node_id": 20, "callee": "make_node", "args_count": 2, "span": {"from_row": 17, "from_col": 22, "to_row": 17, "to_col": 40}}
  ],
  "exports": [],
  "reverse_refs": {
    "make_node": [
      {"node_id": 20, "ref_kind": "call", "span": {"from_row": 17, "from_col": 13, "to_row": 17, "to_col": 22}}
    ]
  }
}
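The relationship between `references` and `reverse_refs` in this output can be illustrated with a short Python sketch. It rebuilds the inverted index from the reference list, assuming (as the sample suggests) that each `reverse_refs` entry is the reference record minus its `name` key. This illustrates the data shape and is not the indexer's actual implementation; `build_reverse_refs` is a hypothetical name.

```python
from collections import defaultdict

# The "references" entries from the sample output above.
references = [
    {"node_id": 20, "name": "make_node", "ref_kind": "call",
     "span": {"from_row": 17, "from_col": 13, "to_row": 17, "to_col": 22}},
]

def build_reverse_refs(refs):
    """Group references by name, dropping the now-redundant "name" key."""
    out = defaultdict(list)
    for ref in refs:
        entry = {k: v for k, v in ref.items() if k != "name"}
        out[ref["name"]].append(entry)
    return dict(out)

reverse_refs = build_reverse_refs(references)
```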

Symbol Kinds

Kind	Description
`fn`	Function (var or def with function value)
`var`	Mutable variable
`def`	Constant
`param`	Function parameter

Each symbol has a symbol_id in the format filename:name:kind and a decl_span with from_row, from_col, to_row, to_col (0-based).
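A consumer of the index often needs to take a `symbol_id` apart or show a span in an editor that counts from 1. The Python sketch below shows one way to do both, assuming that neither the symbol name nor the kind contains a colon. Both function names are hypothetical.

```python
def parse_symbol_id(symbol_id):
    """Split "filename:name:kind" from the right so a colon in the filename survives."""
    filename, name, kind = symbol_id.rsplit(":", 2)
    return {"filename": filename, "name": name, "kind": kind}

def span_start_one_based(span):
    """Convert a 0-based span start to the 1-based (line, column) most editors show."""
    return (span["from_row"] + 1, span["from_col"] + 1)

parsed = parse_symbol_id("graph.ce:make_node:fn")
start = span_start_one_based({"from_row": 6, "from_col": 0, "to_row": 8, "to_col": 1})
```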

Reference Kinds

Kind	Description
`read`	Value is read
`write`	Value is assigned
`call`	Used as a function call target

Module Exports

For .cm files, the indexer detects the top-level return statement. If it returns a record literal, each key becomes an export linked to its symbol:

// math_utils.cm
var add = function(a, b) { return a + b }
var sub = function(a, b) { return a - b }
return {add: add, sub: sub}

pit index math_utils.cm

The exports section will contain:

[
  {"name": "add", "symbol_id": "math_utils.cm:add:fn"},
  {"name": "sub", "symbol_id": "math_utils.cm:sub:fn"}
]
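Because each export carries a `symbol_id`, a module's public API can be read straight from the index. The following Python sketch joins `exports` to `symbols`. The `symbols` entries shown for math_utils.cm are an assumption modelled on the graph.ce sample, and `public_api` is a hypothetical helper.

```python
# Assumed shape of the symbols section for math_utils.cm (not shown in the output above).
symbols = [
    {"symbol_id": "math_utils.cm:add:fn", "name": "add", "kind": "fn", "params": ["a", "b"]},
    {"symbol_id": "math_utils.cm:sub:fn", "name": "sub", "kind": "fn", "params": ["a", "b"]},
]
exports = [
    {"name": "add", "symbol_id": "math_utils.cm:add:fn"},
    {"name": "sub", "symbol_id": "math_utils.cm:sub:fn"},
]

def public_api(symbols, exports):
    """Map each exported name to its parameter list, or None if the symbol is unknown."""
    by_id = {s["symbol_id"]: s for s in symbols}
    api = {}
    for export in exports:
        sym = by_id.get(export["symbol_id"])
        api[export["name"]] = sym.get("params") if sym else None
    return api

api = public_api(symbols, exports)
```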

pit explain

Query the semantic index for a specific symbol or cursor position. This is the targeted query interface — instead of dumping the full index, it answers a specific question.

pit explain --span <file>:<line>:<col>
pit explain --symbol <name> <file>...

--span: What is at this position?

Point at a line and column (0-based) to find out what symbol or reference is there.

pit explain --span demo.ce:6:4

If the position lands on a declaration, that symbol is returned along with all its references and call sites. If it lands on a reference, the indexer traces back to the declaration and returns the same information.

The result includes:

Field	Description
`symbol`	The resolved declaration (name, kind, params, doc comment, span)
`reference`	The reference at the cursor, if the cursor was on a reference
`references`	All references to this symbol across the file
`call_sites`	All call sites for this symbol
`imports`	The file's imports (for context)

{
  "symbol": {
    "name": "build_graph",
    "symbol_id": "demo.ce:build_graph:fn",
    "kind": "fn",
    "params": [],
    "doc_comment": "// Build a sample graph and return it."
  },
  "references": [
    {"node_id": 71, "ref_kind": "call", "span": {"from_row": 39, "from_col": 12, "to_row": 39, "to_col": 23}}
  ],
  "call_sites": []
}
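The position lookup described above can be approximated over a full index in a few lines. The Python sketch below is a simplified illustration, not the real explain module. It assumes that span ends are exclusive (consistent with the 9-character `make_node` reference spanning columns 13 to 22), that any position inside a `decl_span` counts as being on that declaration, and that declarations can be matched by name alone, which ignores scoping.

```python
def contains(span, row, col):
    """True if (row, col) falls in the span; the end is treated as exclusive."""
    start = (span["from_row"], span["from_col"])
    end = (span["to_row"], span["to_col"])
    return start <= (row, col) < end

def resolve_at(index, row, col):
    """Return the symbol (and the reference, if any) at a 0-based position."""
    # References first: a function's decl_span covers its whole body,
    # so a reference inside that body must win over the enclosing declaration.
    for ref in index["references"]:
        if contains(ref["span"], row, col):
            # Simplification: match the declaration by name only, ignoring scope.
            decl = next((s for s in index["symbols"] if s["name"] == ref["name"]), None)
            return {"symbol": decl, "reference": ref}
    hits = [s for s in index["symbols"] if contains(s["decl_span"], row, col)]
    if not hits:
        return None
    # Prefer the innermost declaration: the one that starts latest.
    innermost = max(hits, key=lambda s: (s["decl_span"]["from_row"],
                                         s["decl_span"]["from_col"]))
    return {"symbol": innermost, "reference": None}

# Trimmed copy of the graph.ce sample index.
index = {
    "symbols": [{"name": "make_node", "kind": "fn",
                 "decl_span": {"from_row": 6, "from_col": 0, "to_row": 8, "to_col": 1}}],
    "references": [{"node_id": 20, "name": "make_node", "ref_kind": "call",
                    "span": {"from_row": 17, "from_col": 13, "to_row": 17, "to_col": 22}}],
}
on_call = resolve_at(index, 17, 15)  # cursor on the call to make_node
on_decl = resolve_at(index, 6, 4)    # cursor inside the declaration
```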

--symbol: Find a symbol by name

Look up a symbol by name. Pass one file for a focused result, or multiple files (including shell globs) to search across them all:

pit explain --symbol connect demo.ce
pit explain --symbol connect *.ce *.cm

{
  "symbols": [
    {
      "name": "connect",
      "symbol_id": "demo.ce:connect:fn",
      "kind": "fn",
      "params": ["from", "to", "label"],
      "doc_comment": "// Connect two nodes with a labeled edge."
    }
  ],
  "references": [
    {"node_id": 29, "ref_kind": "call", "span": {"from_row": 21, "from_col": 2, "to_row": 21, "to_col": 9}},
    {"node_id": 33, "ref_kind": "call", "span": {"from_row": 22, "from_col": 2, "to_row": 22, "to_col": 9}},
    {"node_id": 37, "ref_kind": "call", "span": {"from_row": 23, "from_col": 2, "to_row": 23, "to_col": 9}}
  ],
  "call_sites": [
    {"callee": "connect", "args_count": 3, "span": {"from_row": 21, "from_col": 9, "to_row": 21, "to_col": 29}},
    {"callee": "connect", "args_count": 3, "span": {"from_row": 22, "from_col": 9, "to_row": 22, "to_col": 31}},
    {"callee": "connect", "args_count": 3, "span": {"from_row": 23, "from_col": 9, "to_row": 23, "to_col": 29}}
  ]
}

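This tells you that connect is a function taking (from, to, label) and that it is called 3 times, on rows 21 through 23 (0-based). The sample output is abbreviated: the symbol's declaration span and each call site's enclosing function, omitted here, are what show that connect is declared on line 11 and called inside build_graph.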
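A result like this is easy to condense into a one-line signature and a call count. The Python sketch below does so over a trimmed copy of the output above; `summarize` is a hypothetical helper and the field names are taken from the sample.

```python
# Trimmed copy of the `pit explain --symbol connect demo.ce` output above.
result = {
    "symbols": [{"name": "connect", "kind": "fn", "params": ["from", "to", "label"]}],
    "call_sites": [
        {"callee": "connect", "args_count": 3, "span": {"from_row": 21, "from_col": 9}},
        {"callee": "connect", "args_count": 3, "span": {"from_row": 22, "from_col": 9}},
        {"callee": "connect", "args_count": 3, "span": {"from_row": 23, "from_col": 9}},
    ],
}

def summarize(result):
    """Return one signature line per symbol plus a line listing call-site rows."""
    lines = []
    for sym in result["symbols"]:
        lines.append(f'{sym["name"]}({", ".join(sym["params"])}) [{sym["kind"]}]')
    rows = sorted(c["span"]["from_row"] for c in result["call_sites"])
    lines.append(f"{len(rows)} call sites on rows {rows}")
    return lines

summary = summarize(result)
```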

Programmatic Use

The index and explain modules can be used directly from ƿit scripts:

Via shop (recommended)

var shop = use('internal/shop')
var idx = shop.index_file(path)

shop.index_file runs the full pipeline (tokenize, parse, fold, index, resolve imports) and caches the result.

index.cm (direct)

If you already have a parsed AST and tokens, use index_ast directly:

var index_mod = use('index')
var idx = index_mod.index_ast(ast, tokens, filename)

explain.cm

var explain_mod = use('explain')
var expl = explain_mod.make(idx)

// What is at line 10, column 5?
var at_cursor = expl.at_span(10, 5)

// Find all symbols named "connect"
var matches = expl.by_symbol("connect")

// Get callers and callees of a symbol
var chain = expl.call_chain("demo.ce:connect:fn", 2)

For cross-file queries:

var result = explain_mod.explain_across([idx1, idx2, idx3], "connect")
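Conceptually, a cross-file query filters several indexes by name and concatenates the matches. The Python sketch below shows that idea over index dictionaries shaped like the `pit index` output. It is not the implementation of explain_across: `merge_matches` is a hypothetical function, and tagging each reference and call site with a `path` key is an added assumption so that spans can be attributed to a file.

```python
def merge_matches(indexes, name):
    """Collect symbols, references, and call sites for `name` across several indexes."""
    merged = {"symbols": [], "references": [], "call_sites": []}
    for idx in indexes:
        merged["symbols"] += [s for s in idx.get("symbols", []) if s["name"] == name]
        for ref in idx.get("references", []):
            if ref["name"] == name:
                # Assumption: record the owning file alongside each span.
                merged["references"].append({**ref, "path": idx["path"]})
        for call in idx.get("call_sites", []):
            if call["callee"] == name:
                merged["call_sites"].append({**call, "path": idx["path"]})
    return merged

# Two tiny hand-written indexes standing in for real `pit index` output.
server = {"path": "server.ce",
          "symbols": [{"name": "send", "kind": "fn", "symbol_id": "server.ce:send:fn"}],
          "references": [], "call_sites": []}
client = {"path": "client.ce", "symbols": [],
          "references": [{"name": "send", "ref_kind": "call",
                          "span": {"from_row": 4, "from_col": 2, "to_row": 4, "to_col": 6}}],
          "call_sites": [{"callee": "send", "args_count": 1,
                          "span": {"from_row": 4, "from_col": 6, "to_row": 4, "to_col": 11}}]}

merged = merge_matches([server, client], "send")
```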

LSP Integration

The semantic index powers these LSP features:

Feature	LSP Method	Description
Find References	`textDocument/references`	All references to the symbol under the cursor
Rename	`textDocument/rename`	Rename a symbol and all its references
Prepare Rename	`textDocument/prepareRename`	Validate that the cursor is on a renameable symbol
Go to Definition	`textDocument/definition`	Jump to a symbol's declaration (index-backed with AST fallback)

These work automatically in any editor with ƿit LSP support. The index is rebuilt on every file change.

LLM / AI Assistance

The semantic index is designed to give LLMs the context they need to read and edit ƿit code accurately. ƿit is unlikely to be well represented in an LLM's training data, so the model cannot rely on memorized patterns; it needs structured information about names, scopes, and call relationships. The commands below are the recommended way to provide that.

Understand a file before editing

Before modifying a file, index it to see its structure:

pit index file.ce

This gives the LLM every declaration, every reference, every call site, and the import list with resolved paths. Key things to extract:

symbols — what functions exist, their parameters, and their doc comments. This is enough to understand the file's API without reading every line.
imports with resolved_path — which modules are used, and where they live on disk. The LLM can follow these paths to read dependency source when it needs to understand a called function. Imports without a resolved_path are C built-ins (like json) with no script source to read.
exports — for .cm modules, what the public API is. This tells the LLM what names other files can access.

Investigate a specific symbol

When the LLM needs to rename, refactor, or understand a specific function:

pit explain --symbol update analysis.cm

This returns the declaration (with doc comment and parameter list), every reference, and every call site. The LLM can use this to:

Rename safely — the references list has exact spans for every use of the name.
Understand callers — call_sites shows where and how the function is called, including argument counts.
Read the doc comment — often enough to understand intent without reading the function body.
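Applying a rename from a list of spans can be sketched as follows. This Python example assumes columns are character offsets, span ends are exclusive, and every span sits on a single line. It also assumes the caller supplies the span of the name inside the declaration, since `decl_span` in the index covers the whole declaration rather than just the identifier. `apply_rename` and the two-line sample source are hypothetical.

```python
def apply_rename(source, spans, new_name):
    """Replace the text under each 0-based span with new_name."""
    lines = source.split("\n")
    # Work bottom-to-top and right-to-left so earlier edits never shift later spans.
    ordered = sorted(spans, key=lambda s: (s["from_row"], s["from_col"]), reverse=True)
    for span in ordered:
        if span["from_row"] != span["to_row"]:
            raise ValueError("identifier spans are expected to stay on one line")
        row = span["from_row"]
        line = lines[row]
        lines[row] = line[:span["from_col"]] + new_name + line[span["to_col"]:]
    return "\n".join(lines)

source = "var add = function(a, b) { return a + b }\nvar x = add(1, add(2, 3))"
spans = [
    {"from_row": 0, "from_col": 4, "to_row": 0, "to_col": 7},    # name in the declaration
    {"from_row": 1, "from_col": 8, "to_row": 1, "to_col": 11},   # first call
    {"from_row": 1, "from_col": 15, "to_row": 1, "to_col": 18},  # nested call
]
renamed = apply_rename(source, spans, "sum")
```

Processing the spans in reverse order matters whenever the new name differs in length from the old one, because an edit earlier on a line would otherwise invalidate the columns of the edits after it.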

Investigate a cursor position

When the LLM is looking at a specific line and column (e.g., from an error message or a user selection):

pit explain --span file.ce:17:4

This resolves whatever is at that position — declaration or reference — back to the underlying symbol, then returns all references and call sites. Useful for "what is this name?" queries.

Search across files

To find a symbol across multiple files, pass them all:

pit explain --symbol connect *.ce *.cm
pit explain --symbol send server.ce client.ce protocol.cm

This indexes each file and searches across all of them. The result merges all matching declarations, references, and call sites. Use this when the LLM needs to understand cross-file usage before making a change that touches multiple files.

Import resolution

Every import in the index includes the original module_path (the string passed to use()). For script modules, it also includes resolved_path — the filesystem path the module resolves to. This lets the LLM follow dependency chains:

{"local_name": "fd", "module_path": "fd", "resolved_path": ".cell/packages/core/fd.cm"}
{"local_name": "json", "module_path": "json"}

An import without resolved_path is a C built-in — no script source to read.
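The distinction between script modules and C built-ins can be applied mechanically to the `imports` section. The Python sketch below splits the two sample imports above by the presence of `resolved_path`; `split_imports` is a hypothetical helper.

```python
imports = [
    {"local_name": "fd", "module_path": "fd", "resolved_path": ".cell/packages/core/fd.cm"},
    {"local_name": "json", "module_path": "json"},
]

def split_imports(imports):
    """Return (script modules by local name -> path, list of built-in local names)."""
    scripts = {i["local_name"]: i["resolved_path"]
               for i in imports if i.get("resolved_path")}
    builtins = [i["local_name"] for i in imports if not i.get("resolved_path")]
    return scripts, builtins

scripts, builtins = split_imports(imports)
```

The paths in `scripts` are the files worth indexing or reading next; the names in `builtins` have no script source to follow.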

Recommended workflow

1. Start with pit index on the file to edit. Scan imports and symbols for an overview.
2. Use pit explain --symbol to drill into any function the LLM needs to understand or modify. The doc comment and parameter list are usually sufficient.
3. Follow resolved_path on imports when the LLM needs to understand a dependency: index or read the resolved file.
4. Before renaming, use pit explain --symbol (or --span) to get all reference spans, then apply edits to each span.
5. For cross-file changes, pass all affected files to pit explain --symbol to see the full picture before editing.
