---
title: "Semantic Index"
description: "Index and query symbols, references, and call sites in source files"
weight: 55
type: "docs"
---

ƿit includes a semantic indexer that extracts symbols, references, call sites, and imports from source files. The index powers the LSP (find references, rename) and is available as a CLI tool for scripting and debugging.

## Overview

The indexer walks the parsed AST without modifying it. It produces a JSON structure that records every declaration, every reference to a declaration, and every call site in a file.

```
source → tokenize → parse → fold → index
                                      ↓
                              symbols, references,
                              call sites, imports,
                              exports, reverse refs
```

Two CLI commands expose this:

| Command | Purpose |
|---------|---------|
| `pit index <file>` | Produce the full semantic index as JSON |
| `pit explain` | Query the index for a specific symbol or position |

## pit index

Index a source file and print the result as JSON.

```bash
pit index <file.ce|file.cm>
pit index <file> -o output.json
```

### Output

The index contains these sections:

| Section | Description |
|---------|-------------|
| `imports` | All `use()` calls with local name, module path, and span |
| `symbols` | Every declaration: vars, defs, functions, params |
| `references` | Every use of a name, classified as read, write, or call |
| `call_sites` | Every function call with callee, argument count, and enclosing function |
| `exports` | For `.cm` modules, the keys of the top-level `return` record |
| `reverse_refs` | Inverted index: each name mapped to the list of its references |

### Example

Given a file `graph.ce` with functions `make_node`, `connect`, and `build_graph`:

```bash
pit index graph.ce
```

```json
{
  "version": 1,
  "path": "graph.ce",
  "is_actor": true,
  "imports": [
    {"local_name": "json", "module_path": "json", "span": {"from_row": 2, "from_col": 0, "to_row": 2, "to_col": 22}}
  ],
  "symbols": [
    {
      "symbol_id": "graph.ce:make_node:fn",
      "name": "make_node",
      "kind": "fn",
      "params": ["name", "kind"],
      "doc_comment": "// A node in the graph.",
      "decl_span": {"from_row": 6, "from_col": 0, "to_row": 8, "to_col": 1},
      "scope_fn_nr": 0
    }
  ],
  "references": [
    {"node_id": 20, "name": "make_node", "ref_kind": "call", "span": {"from_row": 17, "from_col": 13, "to_row": 17, "to_col": 22}}
  ],
  "call_sites": [
    {"node_id": 20, "callee": "make_node", "args_count": 2, "span": {"from_row": 17, "from_col": 22, "to_row": 17, "to_col": 40}}
  ],
  "exports": [],
  "reverse_refs": {
    "make_node": [
      {"node_id": 20, "ref_kind": "call", "span": {"from_row": 17, "from_col": 13, "to_row": 17, "to_col": 22}}
    ]
  }
}
```
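
To make the relationship between `references` and `reverse_refs` concrete, here is a minimal Python sketch that rebuilds the inverted index from the flat reference list. It works on a trimmed copy of the output above. The helper name `build_reverse_refs` is hypothetical, and the grouping rule (drop `name`, keep everything else) is inferred from the example rather than taken from the indexer's source.

```python
import json

# Trimmed copy of the `pit index graph.ce` output shown above.
INDEX_JSON = """
{
  "references": [
    {"node_id": 20, "name": "make_node", "ref_kind": "call",
     "span": {"from_row": 17, "from_col": 13, "to_row": 17, "to_col": 22}}
  ],
  "reverse_refs": {
    "make_node": [
      {"node_id": 20, "ref_kind": "call",
       "span": {"from_row": 17, "from_col": 13, "to_row": 17, "to_col": 22}}
    ]
  }
}
"""


def build_reverse_refs(references):
    """Group references by name, dropping the now-redundant `name` field."""
    inverted = {}
    for ref in references:
        entry = {key: value for key, value in ref.items() if key != "name"}
        inverted.setdefault(ref["name"], []).append(entry)
    return inverted


index = json.loads(INDEX_JSON)
rebuilt = build_reverse_refs(index["references"])
# Compare the rebuilt mapping with the one the indexer emitted.
print(rebuilt == index["reverse_refs"])
```

In a script you would typically load the file written by `pit index <file> -o output.json` instead of an inline string.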

### Symbol Kinds

| Kind | Description |
|------|-------------|
| `fn` | Function (a `var` or `def` whose value is a function) |
| `var` | Mutable variable |
| `def` | Constant |
| `param` | Function parameter |

Each symbol has a `symbol_id` in the format `filename:name:kind` and a `decl_span` with `from_row`, `from_col`, `to_row`, `to_col` (0-based).
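
When consuming the index from a script, two small conversions come up often: taking a `symbol_id` apart and turning a 0-based span into the 1-based `line:column` that most editors display. The sketch below shows one way to do both. The helper names are hypothetical, and splitting from the right assumes that symbol names and kinds never contain a colon, which the source does not state.

```python
def parse_symbol_id(symbol_id):
    """Split `filename:name:kind` from the right so a colon in the path survives."""
    filename, name, kind = symbol_id.rsplit(":", 2)
    return {"filename": filename, "name": name, "kind": kind}


def editor_position(span):
    """Convert the 0-based start of a span to a 1-based (line, column) pair."""
    return (span["from_row"] + 1, span["from_col"] + 1)


# Fields copied from the `make_node` symbol in the example index.
symbol = {
    "symbol_id": "graph.ce:make_node:fn",
    "decl_span": {"from_row": 6, "from_col": 0, "to_row": 8, "to_col": 1},
}

parts = parse_symbol_id(symbol["symbol_id"])
line, column = editor_position(symbol["decl_span"])
print(f"{parts['name']} ({parts['kind']}) declared at {parts['filename']}:{line}:{column}")
```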

### Reference Kinds

| Kind | Description |
|------|-------------|
| `read` | Value is read |
| `write` | Value is assigned |
| `call` | Used as a function call target |
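
The reference kinds make simple lint-style checks possible from a script. As an illustration, this Python sketch tallies kinds per name and reports names that are written but never read or called. The reference list is made up for the example (only `name` and `ref_kind` are shown), and whether a declaration with an initializer is itself recorded as a `write` is an assumption the source does not confirm.

```python
from collections import Counter

# Hypothetical references; field names follow the `references` section.
references = [
    {"name": "count", "ref_kind": "write"},
    {"name": "count", "ref_kind": "read"},
    {"name": "tmp", "ref_kind": "write"},
    {"name": "make_node", "ref_kind": "call"},
]


def kinds_by_name(refs):
    """Count how often each name is read, written, or called."""
    tally = {}
    for ref in refs:
        tally.setdefault(ref["name"], Counter())[ref["ref_kind"]] += 1
    return tally


def written_but_never_used(refs):
    """Return names with at least one write and no read or call."""
    return sorted(
        name
        for name, kinds in kinds_by_name(refs).items()
        if kinds["write"] > 0 and kinds["read"] == 0 and kinds["call"] == 0
    )


print(written_but_never_used(references))
```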

### Module Exports

For `.cm` files, the indexer detects the top-level `return` statement. If it returns a record literal, each key becomes an export linked to its symbol:

```javascript
// math_utils.cm
var add = function(a, b) { return a + b }
var sub = function(a, b) { return a - b }
return {add: add, sub: sub}
```

```bash
pit index math_utils.cm
```

The `exports` section will contain:

```json
[
  {"name": "add", "symbol_id": "math_utils.cm:add:fn"},
  {"name": "sub", "symbol_id": "math_utils.cm:sub:fn"}
]
```
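
Because each export carries a `symbol_id`, a script can join `exports` against `symbols` to print a module's public surface. The following Python sketch does that for `math_utils.cm`. The `symbols` entries are an assumption: they are what the indexer would plausibly emit for the two functions above, written in the shape of the earlier `graph.ce` example. The helper name `export_signatures` is hypothetical.

```python
# Copied from the `exports` section above.
exports = [
    {"name": "add", "symbol_id": "math_utils.cm:add:fn"},
    {"name": "sub", "symbol_id": "math_utils.cm:sub:fn"},
]

# Assumed `symbols` entries for math_utils.cm (not shown in the original output).
symbols = [
    {"symbol_id": "math_utils.cm:add:fn", "name": "add", "kind": "fn", "params": ["a", "b"]},
    {"symbol_id": "math_utils.cm:sub:fn", "name": "sub", "kind": "fn", "params": ["a", "b"]},
]


def export_signatures(exports, symbols):
    """Describe each export by looking up its symbol through symbol_id."""
    by_id = {symbol["symbol_id"]: symbol for symbol in symbols}
    lines = []
    for export in exports:
        symbol = by_id.get(export["symbol_id"])
        if symbol is None:
            lines.append(f"{export['name']}: <unresolved>")
        elif symbol["kind"] == "fn":
            params = ", ".join(symbol.get("params", []))
            lines.append(f"{export['name']}({params})")
        else:
            lines.append(f"{export['name']}: {symbol['kind']}")
    return lines


for line in export_signatures(exports, symbols):
    print(line)
```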

## pit explain

Query the semantic index for a specific symbol or cursor position. This is the targeted query interface: instead of dumping the full index, it answers one specific question.

```bash
pit explain --span <file>:<line>:<col>
pit explain --symbol <name> <file>
```

### --span: What is at this position?

Point at a line and column (0-based) to find out what symbol or reference is there.

```bash
pit explain --span demo.ce:6:4
```

If the position lands on a declaration, that symbol is returned along with all its references and call sites. If it lands on a reference, the indexer traces back to the declaration and returns the same information.

The result includes:

| Field | Description |
|-------|-------------|
| `symbol` | The resolved declaration (name, kind, params, doc comment, span) |
| `reference` | The reference at the cursor, if the cursor was on a reference |
| `references` | All references to this symbol across the file |
| `call_sites` | All call sites for this symbol |
| `imports` | The file's imports (for context) |

```json
{
  "symbol": {
    "name": "build_graph",
    "symbol_id": "demo.ce:build_graph:fn",
    "kind": "fn",
    "params": [],
    "doc_comment": "// Build a sample graph and return it."
  },
  "references": [
    {"node_id": 71, "ref_kind": "call", "span": {"from_row": 39, "from_col": 12, "to_row": 39, "to_col": 23}}
  ],
  "call_sites": []
}
```
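
A position lookup of this kind can be approximated over the raw index. The Python sketch below finds the reference whose span contains a given row and column, then resolves it to a declaration by name. It is an illustration only, not the implementation behind `pit explain`: it treats `to_col` as exclusive (consistent with the example, where `make_node` spans columns 13 to 22), it handles references but not declarations, and it ignores scoping and shadowing. All function names are hypothetical.

```python
# Trimmed copy of the example index for graph.ce.
index = {
    "symbols": [
        {"symbol_id": "graph.ce:make_node:fn", "name": "make_node", "kind": "fn"}
    ],
    "references": [
        {"node_id": 20, "name": "make_node", "ref_kind": "call",
         "span": {"from_row": 17, "from_col": 13, "to_row": 17, "to_col": 22}}
    ],
}


def span_contains(span, row, col):
    """True if (row, col) lies inside the span; the end position is exclusive."""
    start = (span["from_row"], span["from_col"])
    end = (span["to_row"], span["to_col"])
    return start <= (row, col) < end


def reference_at(index, row, col):
    """Return the first reference covering the position, or None."""
    for ref in index["references"]:
        if span_contains(ref["span"], row, col):
            return ref
    return None


def resolve(index, row, col):
    """Trace a reference under the cursor back to a declaration with the same name."""
    ref = reference_at(index, row, col)
    if ref is None:
        return None
    symbol = next((s for s in index["symbols"] if s["name"] == ref["name"]), None)
    return {"symbol": symbol, "reference": ref}


hit = resolve(index, 17, 15)
print(hit["symbol"]["symbol_id"] if hit else "nothing at this position")
```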

### --symbol: Find a symbol by name

Look up a symbol by name, returning all matching declarations and every reference.

```bash
pit explain --symbol connect demo.ce
```

```json
{
  "symbols": [
    {
      "name": "connect",
      "symbol_id": "demo.ce:connect:fn",
      "kind": "fn",
      "params": ["from", "to", "label"],
      "doc_comment": "// Connect two nodes with a labeled edge."
    }
  ],
  "references": [
    {"node_id": 29, "ref_kind": "call", "span": {"from_row": 21, "from_col": 2, "to_row": 21, "to_col": 9}},
    {"node_id": 33, "ref_kind": "call", "span": {"from_row": 22, "from_col": 2, "to_row": 22, "to_col": 9}},
    {"node_id": 37, "ref_kind": "call", "span": {"from_row": 23, "from_col": 2, "to_row": 23, "to_col": 9}}
  ],
  "call_sites": [
    {"callee": "connect", "args_count": 3, "span": {"from_row": 21, "from_col": 9, "to_row": 21, "to_col": 29}},
    {"callee": "connect", "args_count": 3, "span": {"from_row": 22, "from_col": 9, "to_row": 22, "to_col": 31}},
    {"callee": "connect", "args_count": 3, "span": {"from_row": 23, "from_col": 9, "to_row": 23, "to_col": 29}}
  ]
}
```

This tells you: `connect` is a function taking `(from, to, label)`, declared on line 11, and called 3 times inside `build_graph`.
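
The same kind of summary can be produced mechanically from the JSON. This Python sketch condenses a `--symbol` result into one line per declaration, using an abbreviated copy of the output above. The `summarize` helper is hypothetical, and the sketch relies only on fields that appear in the example.

```python
import json

# Abbreviated copy of the `pit explain --symbol connect demo.ce` output.
EXPLAIN_JSON = """
{
  "symbols": [
    {"name": "connect", "symbol_id": "demo.ce:connect:fn", "kind": "fn",
     "params": ["from", "to", "label"]}
  ],
  "call_sites": [
    {"callee": "connect", "args_count": 3, "span": {"from_row": 21, "from_col": 9, "to_row": 21, "to_col": 29}},
    {"callee": "connect", "args_count": 3, "span": {"from_row": 22, "from_col": 9, "to_row": 22, "to_col": 31}},
    {"callee": "connect", "args_count": 3, "span": {"from_row": 23, "from_col": 9, "to_row": 23, "to_col": 29}}
  ]
}
"""


def summarize(result):
    """Build one summary line per declaration: signature, call count, call rows."""
    lines = []
    for symbol in result["symbols"]:
        calls = [site for site in result["call_sites"] if site["callee"] == symbol["name"]]
        rows = sorted({site["span"]["from_row"] for site in calls})
        params = ", ".join(symbol.get("params", []))
        lines.append(f"{symbol['name']}({params}): {len(calls)} call(s) on rows {rows}")
    return lines


result = json.loads(EXPLAIN_JSON)
for line in summarize(result):
    print(line)
```

The rows in the summary are 0-based, like every span in the index.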

## Programmatic Use

The index and explain modules can be used directly from ƿit scripts.

### index.cm

```javascript
var tokenize_mod = use('tokenize')
var parse_mod = use('parse')
var fold_mod = use('fold')
var index_mod = use('index')

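// Bundle the pipeline stages that index_file will run.
var pipeline = {tokenize: tokenize_mod, parse: parse_mod, fold: fold_mod}
// src and filename are supplied by the caller.
var idx = index_mod.index_file(src, filename, pipeline)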
```

`index_file` runs the full pipeline (tokenize, parse, fold) and returns the index. If you already have a parsed AST and tokens, use `index_ast` instead:

```javascript
var idx = index_mod.index_ast(ast, tokens, filename)
```

### explain.cm

```javascript
var explain_mod = use('explain')
var expl = explain_mod.make(idx)

// What is at line 10, column 5?
var at_cursor = expl.at_span(10, 5)

// Find all symbols named "connect"
var matches = expl.by_symbol("connect")

// Get callers and callees of a symbol
var chain = expl.call_chain("demo.ce:connect:fn", 2)
```

For cross-file queries:

```javascript
var result = explain_mod.explain_across([idx1, idx2, idx3], "connect")
```
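
One way to picture a cross-file lookup is as the same name query run over several indexes, with the results grouped by file. The Python sketch below does this over index JSON objects. It is not how `explain_across` is implemented, and its result shape is invented for illustration since the source does not document what `explain_across` returns. The two sample indexes are hypothetical.

```python
# Two hypothetical, heavily trimmed indexes.
graph_index = {
    "path": "graph.ce",
    "symbols": [{"symbol_id": "graph.ce:connect:fn", "name": "connect", "kind": "fn"}],
    "reverse_refs": {"connect": [{"node_id": 29, "ref_kind": "call"}]},
}
main_index = {
    "path": "main.ce",
    "symbols": [],
    "reverse_refs": {"connect": [{"node_id": 4, "ref_kind": "call"},
                                 {"node_id": 9, "ref_kind": "call"}]},
}


def find_across(indexes, name):
    """Collect declarations and reference counts for `name`, one entry per file."""
    hits = []
    for index in indexes:
        declared = [s["symbol_id"] for s in index["symbols"] if s["name"] == name]
        refs = index["reverse_refs"].get(name, [])
        if declared or refs:
            hits.append({
                "path": index["path"],
                "declared": declared,
                "reference_count": len(refs),
            })
    return hits


for hit in find_across([graph_index, main_index], "connect"):
    print(hit)
```

Matching purely by name will also pair up unrelated symbols that happen to share a name in different files, so treat the result as a candidate list.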

## LSP Integration

The semantic index powers these LSP features:

| Feature | LSP Method | Description |
|---------|------------|-------------|
| Find References | `textDocument/references` | All references to the symbol under the cursor |
| Rename | `textDocument/rename` | Rename a symbol and all its references |
| Prepare Rename | `textDocument/prepareRename` | Validate that the cursor is on a renameable symbol |
| Go to Definition | `textDocument/definition` | Jump to a symbol's declaration (index-backed with AST fallback) |

These work automatically in any editor with ƿit LSP support. The index is rebuilt on every file change.
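
To show how index data lines up with the protocol, here is a rough Python sketch that turns the reference spans for a name into an LSP `WorkspaceEdit` for a rename. It is not taken from the ƿit language server. It assumes that index rows and columns can be used directly as LSP `line` and `character` values (both are 0-based, but LSP counts characters in UTF-16 code units by default), and it covers references only, because `decl_span` covers the whole declaration rather than just the name. The function names and the URI are hypothetical.

```python
# Trimmed copy of the example index for graph.ce.
index = {
    "reverse_refs": {
        "make_node": [
            {"node_id": 20, "ref_kind": "call",
             "span": {"from_row": 17, "from_col": 13, "to_row": 17, "to_col": 22}}
        ]
    }
}


def span_to_range(span):
    """Map an index span onto an LSP Range (0-based, end position exclusive)."""
    return {
        "start": {"line": span["from_row"], "character": span["from_col"]},
        "end": {"line": span["to_row"], "character": span["to_col"]},
    }


def rename_edits(index, uri, old_name, new_name):
    """Build a WorkspaceEdit that replaces every reference to old_name."""
    edits = [
        {"range": span_to_range(ref["span"]), "newText": new_name}
        for ref in index["reverse_refs"].get(old_name, [])
    ]
    return {"changes": {uri: edits}}


workspace_edit = rename_edits(index, "file:///graph.ce", "make_node", "new_node")
print(workspace_edit)
```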