improved semantic indexing

2026-02-17 01:08:10 -06:00
parent 0ac575db85
commit 2633fb986f
5 changed files with 217 additions and 65 deletions
--- a/docs/semantic-index.md
+++ b/docs/semantic-index.md
@@ -41,7 +41,7 @@ The index contains these sections:

 | Section | Description |
 |---------|-------------|
-| `imports` | All `use()` calls with local name, module path, and span |
+| `imports` | All `use()` calls with local name, module path, resolved filesystem path (script modules only), and span |
 | `symbols` | Every declaration: vars, defs, functions, params |
 | `references` | Every use of a name, classified as read, write, or call |
 | `call_sites` | Every function call with callee, args count, and enclosing function |
@@ -62,7 +62,7 @@ pit index graph.ce
  "path": "graph.ce",
  "is_actor": true,
  "imports": [
-    {"local_name": "json", "module_path": "json", "span": {"from_row": 2, "from_col": 0, "to_row": 2, "to_col": 22}}
+    {"local_name": "json", "module_path": "json", "resolved_path": ".cell/packages/core/json.cm", "span": {"from_row": 2, "from_col": 0, "to_row": 2, "to_col": 22}}
  ],
  "symbols": [
    {
@@ -139,7 +139,7 @@ Query the semantic index for a specific symbol or cursor position. This is the t

 ```bash
 pit explain --span <file>:<line>:<col>
-pit explain --symbol <name> <file>
+pit explain --symbol <name> <file>...
 ```

 ### --span: What is at this position?
@@ -180,10 +180,11 @@ The result includes:

 ### --symbol: Find a symbol by name

-Look up a symbol by name, returning all matching declarations and every reference.
+Look up a symbol by name, returning all matching declarations and every reference. Pass one file for a focused result, or several files (shell globs work) to search across all of them:

 ```bash
 pit explain --symbol connect demo.ce
+pit explain --symbol connect *.ce *.cm
 ```

 ```json
@@ -268,3 +269,75 @@ The semantic index powers these LSP features:
 | Go to Definition | `textDocument/definition` | Jump to a symbol's declaration (index-backed with AST fallback) |

 These work automatically in any editor with ƿit LSP support. The index is rebuilt on every file change.
+
+## LLM / AI Assistance
+
+The semantic index is designed to give LLMs the context they need to read and edit ƿit code accurately. Because ƿit is unlikely to appear in an LLM's training data, the model cannot rely on memorized patterns; it needs structured information about names, scopes, and call relationships. The commands below are the recommended way to provide it.
+
+### Understand a file before editing
+
+Before modifying a file, index it to see its structure:
+
+```bash
+pit index file.ce
+```
+
+This gives the LLM every declaration, every reference, every call site, and the import list, including resolved paths for script modules. Key things to extract:
+
+- **`symbols`**: what functions exist, their parameters, and their doc comments. This is often enough to understand the file's API without reading every line.
+- **`imports`** with `resolved_path`: which modules are used and where they live on disk. The LLM can follow these paths to read dependency source when it needs to understand a called function. Imports without a `resolved_path` are C built-ins with no script source to read.
+- **`exports`**: for `.cm` modules, the public API, meaning the names other files can access.
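+
+A minimal Python sketch of that first pass, condensing `pit index` output into a short overview for a prompt. It assumes the JSON has already been captured as a string. The `path` and `is_actor` fields follow the index example above, but the `name` and `kind` keys on each symbol entry are a guess, not taken from the index schema, so adjust them to the real output. `symbol_overview` is a hypothetical helper, not part of `pit`.
+
+```python
+import json
+from collections import defaultdict
+
+def symbol_overview(index_text):
+    """Group symbol names by kind from `pit index` JSON.
+
+    Assumption: each symbol entry has "name" and "kind" keys; the real
+    symbol schema may use different field names.
+    """
+    index = json.loads(index_text)
+    by_kind = defaultdict(list)
+    for sym in index.get("symbols", []):
+        by_kind[sym.get("kind", "unknown")].append(sym.get("name", "?"))
+    # First line: file path and whether it is an actor.
+    lines = [f"{index['path']} (actor: {index.get('is_actor', False)})"]
+    # One line per kind, names sorted for stable output.
+    for kind in sorted(by_kind):
+        lines.append(f"  {kind}: {', '.join(sorted(by_kind[kind]))}")
+    return "\n".join(lines)
+```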
+
+### Investigate a specific symbol
+
+When the LLM needs to rename, refactor, or understand a specific function:
+
+```bash
+pit explain --symbol update analysis.cm
+```
+
+This returns the declaration (with doc comment and parameter list), every reference, and every call site. The LLM can use this to:
+
+- **Rename safely** — the references list has exact spans for every use of the name.
+- **Understand callers** — `call_sites` shows where and how the function is called, including argument counts.
+- **Read the doc comment** — often enough to understand intent without reading the function body.
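+
+The rename step can be sketched in Python. The sketch assumes each reference carries a `span` shaped like the one on imports (`from_row`, `from_col`, `to_row`, `to_col`), that columns are 0-based with `to_col` exclusive, and that the first row is numbered `row_base`; these conventions are not spelled out here, so check them against real output. `apply_rename` is a hypothetical helper.
+
+```python
+def apply_rename(source, spans, new_name, row_base=1):
+    """Replace the text covered by each single-line span with new_name.
+
+    Assumptions: from_col/to_col are 0-based with to_col exclusive, and
+    rows are numbered starting at row_base.
+    """
+    lines = source.split("\n")
+    # Apply back to front so an edit never shifts a span not yet applied.
+    ordered = sorted(spans, key=lambda s: (s["from_row"], s["from_col"]), reverse=True)
+    for span in ordered:
+        if span["from_row"] != span["to_row"]:
+            raise ValueError("a name reference should not span multiple lines")
+        i = span["from_row"] - row_base
+        line = lines[i]
+        lines[i] = line[:span["from_col"]] + new_name + line[span["to_col"]:]
+    return "\n".join(lines)
+```
+
+Sorting in reverse matters when a name appears twice on one line: replacing the right-hand occurrence first leaves the left-hand span's columns untouched.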
+
+### Investigate a cursor position
+
+When the LLM is looking at a specific line and column (e.g., from an error message or a user selection):
+
+```bash
+pit explain --span file.ce:17:4
+```
+
+This resolves whatever is at that position, whether a declaration or a reference, back to the underlying symbol, then returns all of its references and call sites. It is the quickest way to answer "what is this name?"
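+
+As an illustration of the idea, not of how `pit explain` is implemented, resolving a position can be modeled as a containment test: keep every entry whose span covers the cursor and pick the narrowest. This Python sketch assumes spans shaped like the import spans above, an exclusive end column, and a cursor expressed in the same row and column numbering as the spans; `contains` and `entry_at` are hypothetical helpers.
+
+```python
+def contains(span, row, col):
+    """True if (row, col) lies inside span (assumption: to_col is exclusive)."""
+    start = (span["from_row"], span["from_col"])
+    end = (span["to_row"], span["to_col"])
+    return start <= (row, col) < end
+
+def entry_at(entries, row, col):
+    """Return the entry with the narrowest span covering (row, col), or None."""
+    hits = [e for e in entries if contains(e["span"], row, col)]
+    if not hits:
+        return None
+    # Rank by row count, then by column difference (exact for single-line spans).
+    return min(hits, key=lambda e: (e["span"]["to_row"] - e["span"]["from_row"],
+                                    e["span"]["to_col"] - e["span"]["from_col"]))
+```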
+
+### Search across files
+
+To find a symbol across multiple files, pass them all:
+
+```bash
+pit explain --symbol connect *.ce *.cm
+pit explain --symbol send server.ce client.ce protocol.cm
+```
+
+This indexes each file and searches across all of them. The result merges all matching declarations, references, and call sites. Use this when the LLM needs to understand cross-file usage before making a change that touches multiple files.
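+
+Globs like `*.ce` in these examples are expanded by the shell before `pit` runs, so a tool that launches `pit` without a shell (for example Python's `subprocess.run` with an argument list) passes the pattern through literally and should expand it itself. A minimal sketch, assuming `pit` is on `PATH` and prints its JSON result to stdout as in the examples above; `explain_argv` and `explain_symbol` are hypothetical helpers.
+
+```python
+import glob
+import json
+import subprocess
+
+def explain_argv(symbol, patterns):
+    """Build argv for `pit explain --symbol` over every file matching patterns.
+
+    Globs are expanded here because no shell is involved. Matches are
+    sorted per pattern, duplicates are dropped, and a pattern that matches
+    nothing contributes no files.
+    """
+    files = []
+    for pattern in patterns:
+        for path in sorted(glob.glob(pattern)):
+            if path not in files:
+                files.append(path)
+    if not files:
+        raise FileNotFoundError(f"no files match {patterns}")
+    return ["pit", "explain", "--symbol", symbol, *files]
+
+def explain_symbol(symbol, patterns):
+    """Run pit and parse its output.
+
+    Assumption: pit is on PATH and prints the JSON result to stdout.
+    """
+    result = subprocess.run(explain_argv(symbol, patterns),
+                            capture_output=True, text=True, check=True)
+    return json.loads(result.stdout)
+```
+
+For example, `explain_symbol("connect", ["*.ce", "*.cm"])` mirrors the first shell command above.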
+
+### Import resolution
+
+Every import in the index includes the original `module_path` (the string passed to `use()`). For script modules, it also includes `resolved_path` — the filesystem path the module resolves to. This lets the LLM follow dependency chains:
+
+```json
+{"local_name": "fd", "module_path": "fd", "resolved_path": ".cell/packages/core/fd.cm"}
+```
+
+An import without `resolved_path` is a C built-in — no script source to read.
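+
+Following that chain can be sketched in Python by splitting the `imports` array into modules with source to read and C built-ins, using only the `local_name` and `resolved_path` fields shown above. It assumes `resolved_path` is relative to the directory `pit` was run from, which the examples suggest but do not state; `import_targets` is a hypothetical helper.
+
+```python
+import json
+from pathlib import Path
+
+def import_targets(index_text, root="."):
+    """Split a `pit index` result into script dependencies and C built-ins.
+
+    Returns (scripts, builtins): scripts maps local_name to a Path under
+    root (assumption: resolved_path is relative to where pit ran), and
+    builtins lists local names that have no resolved_path.
+    """
+    scripts, builtins = {}, []
+    for imp in json.loads(index_text).get("imports", []):
+        resolved = imp.get("resolved_path")
+        if resolved is None:
+            builtins.append(imp["local_name"])
+        else:
+            scripts[imp["local_name"]] = Path(root) / resolved
+    return scripts, builtins
+```
+
+Each path in `scripts` is a candidate for another `pit index` run; names in `builtins` can be skipped.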
+
+### Recommended workflow
+
+1. **Start with `pit index`** on the file to edit. Scan imports and symbols for an overview.
+2. **Use `pit explain --symbol`** to drill into any function the LLM needs to understand or modify. The doc comment and parameter list are usually sufficient.
+3. **Follow `resolved_path`** on imports when the LLM needs to understand a dependency — index or read the resolved file.
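+4. **Before renaming**, use `pit explain --symbol` (or `--span`) to get all reference spans, then apply an edit at each span, working from the last span in the file to the first so that earlier edits don't shift the positions of spans not yet edited.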
+5. **For cross-file changes**, pass all affected files to `pit explain --symbol` to see the full picture before editing.