proper shop caching

2026-02-13 09:04:25 -06:00
parent d26a96bc62
commit f2556c5622
8 changed files with 7215 additions and 6438 deletions
--- a/docs/_index.md
+++ b/docs/_index.md
@@ -34,6 +34,7 @@ pit hello
 - [**Actors and Modules**](/docs/actors/) — the execution model
 - [**Requestors**](/docs/requestors/) — asynchronous composition
 - [**Packages**](/docs/packages/) — code organization and sharing
+- [**Shop Architecture**](/docs/shop/) — module resolution, compilation, and caching

 ## Reference

--- a/docs/shop.md
+++ b/docs/shop.md
@@ -0,0 +1,169 @@
+---
+title: "Shop Architecture"
+description: "How the shop resolves, compiles, caches, and loads modules"
+weight: 35
+type: "docs"
+---
+
+The shop is the module resolution and loading engine behind `use()`. It handles finding modules, compiling them, caching the results, and loading C extensions. The shop lives in `internal/shop.cm`.
+
+## Startup Pipeline
+
+When `pit` runs a program, three layers bootstrap in sequence before the user program starts:
+
+```
+bootstrap.cm → engine.cm → shop.cm → user program
+```
+
+**bootstrap.cm** loads the compiler toolchain (tokenize, parse, fold, mcode, streamline) from pre-compiled bytecode. It defines `analyze()` (source to AST) and `compile_to_blob()` (AST to binary blob). It then loads engine.cm.
+
+**engine.cm** creates the actor runtime (`$_`), defines `use_core()` for loading core modules, and populates the environment that shop receives. It then loads shop.cm via `use_core('internal/shop')`.
+
+**shop.cm** receives its dependencies through the module environment — `analyze`, `run_ast_fn`, `use_cache`, `shop_path`, `runtime_env`, `content_hash`, `cache_path`, and others. It defines `Shop.use()`, which is the function behind every `use()` call in user code.
+
+## Module Resolution
+
+When `use('path')` is called from a package context, the shop resolves the module through a multi-layer search. The `.cm` script file and the C symbol are resolved independently, and whichever is found in the narrowest scope wins.
+
+### Resolution Order
+
+For a call like `use('sprite')` from package `myapp`:
+
+1. **Own package** — `~/.pit/packages/myapp/sprite.cm` and C symbol `js_myapp_sprite_use`
+2. **Aliased dependencies** — if `myapp/pit.toml` has `renderer = "gitea.pockle.world/john/renderer"`, checks `renderer/sprite.cm` and its C symbols
+3. **Core** — built-in core modules and internal C symbols
+
+For calls without a package context (from core modules), only core is searched.
+
+### Private Modules
+
+Paths starting with `internal/` are private to their package:
+
+```javascript
+use('internal/helpers')  // OK from within the same package
+// Cannot be accessed from other packages
+```
+
+### Explicit Package Imports
+
+Paths containing a dot in the first component are treated as explicit package references:
+
+```javascript
+use('gitea.pockle.world/john/renderer/sprite')
+// Resolves directly to the renderer package's sprite.cm
+```
+
+## Compilation and Caching
+
+Every module goes through a content-addressed caching pipeline. The cache key is the BLAKE2 hash of the source content, so changing the source automatically invalidates the cache.
+
+### Cache Hierarchy
+
+When loading a module, the shop checks (in order):
+
+1. **In-memory cache** — `use_cache[key]`, checked first on every `use()` call
+2. **Native dylib** — pre-compiled platform-specific `.dylib` in the content-addressed store
+3. **Cached .mach blob** — binary bytecode in `~/.pit/build/<hash>.mach`
+4. **Cached .mcode IR** — JSON IR in `~/.pit/build/<hash>.mcode`
+5. **Adjacent .mach/.mcode** — files alongside the source (e.g., `sprite.mach`)
+6. **Source compilation** — full pipeline: analyze, mcode, streamline, serialize
+
+Results from steps 4-6 are cached back to the content-addressed store for future loads.
+
+### Content-Addressed Store
+
+All cached artifacts live in `~/.pit/build/`, named by the BLAKE2 hash of their source content:
+
+```
+~/.pit/build/
+├── a1b2c3d4...mach       # compiled bytecode blob
+├── e5f6a7b8...mach       # another compiled module
+├── c9d0e1f2...mcode      # cached JSON IR
+└── f3a4b5c6...macos_arm64.dylib  # native compiled module
+```
+
+This scheme provides automatic cache invalidation: when the source changes, its hash changes, and the old cache entry is never looked up again.
+
+### Core Module Caching
+
+Core modules loaded via `use_core()` in engine.cm follow the same pattern. On first startup after a fresh install, core modules are compiled from `.cm.mcode` JSON IR and cached as `.mach` blobs. Subsequent startups load from cache, skipping the JSON parse and compile steps entirely.
+
+User scripts (`.ce` files) are also cached. The first run compiles and caches; subsequent runs with unchanged source load from cache.
+
+## C Extension Resolution
+
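+C extensions are resolved alongside script modules. A C module is identified by a symbol name derived from the package and file name. In the example below, dots and slashes in the package name become underscores, and the result is wrapped in a `js_` prefix and a `_use` suffix: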
+
+```
+package: gitea.pockle.world/john/prosperon
+file:    sprite.c
+symbol:  js_gitea_pockle_world_john_prosperon_sprite_use
+```
+
+### C Resolution Sources
+
+1. **Internal symbols** — statically linked into the `pit` binary (core modules)
+2. **Per-module dylibs** — loaded from `~/.pit/lib/` via a manifest file
+
+### Manifest Files
+
+Each package with C extensions has a manifest at `~/.pit/lib/<package>.manifest.json` mapping symbol names to dylib paths:
+
+```json
+{
+  "js_mypackage_render_use": "/Users/john/.pit/lib/mypackage_render.dylib",
+  "js_mypackage_audio_use": "/Users/john/.pit/lib/mypackage_audio.dylib"
+}
+```
+
+The shop loads manifests lazily on first access and caches them.
+
+### Combined Resolution
+
+When both a `.cm` script and a C symbol exist for the same module name, both are resolved. The C module is loaded first as the base, and the `.cm` script can then extend it:
+
+```javascript
+// render.cm — extends the C render module
+var c_render = use('internal/render_c')
+// Add ƿit-level helpers on top of C functions
+return record(c_render, {
+  draw_circle: function(x, y, r) { /* ... */ }
+})
+```
+
+## Environment Injection
+
+When a module is loaded, the shop builds an `env` object that becomes the module's set of free variables. The environment includes:
+
+- **Runtime functions** — `logical`, `some`, `every`, `starts_with`, `ends_with`, `is_actor`, `log`, `send`, `fallback`, `parallel`, `race`, `sequence`
+- **Capability injections** — actor intrinsics like `$self`, `$delay`, `$start`, `$receiver`, `$fd`, etc.
+- **`use` function** — scoped to the module's package context
+
+The set of injected capabilities is controlled by `script_inject_for()`, which can be tuned per package or file.
+
+## Shop Directory Layout
+
+```
+~/.pit/
+├── packages/         # installed packages (directories and symlinks)
+│   └── core -> ...   # symlink to the ƿit core
+├── lib/              # compiled C extension dylibs + manifests
+├── build/            # content-addressed compilation cache
+│   ├── <hash>.mach   # cached bytecode blobs
+│   ├── <hash>.mcode  # cached JSON IR
+│   └── <hash>.<target>.dylib  # native compiled modules
+├── cache/            # downloaded package zip archives
+├── lock.toml         # installed package versions and commit hashes
+└── link.toml         # local development link overrides
+```
+
+## Key Files
+
+| File | Role |
+|------|------|
+| `internal/bootstrap.cm` | Loads compiler, defines `analyze()` and `compile_to_blob()` |
+| `internal/engine.cm` | Actor runtime, `use_core()`, environment setup |
+| `internal/shop.cm` | Module resolution, compilation, caching, C extension loading |
+| `internal/os.c` | OS intrinsics: dylib ops, internal symbol lookup, embedded modules |
+| `package.cm` | Package directory detection, alias resolution, file listing |
+| `link.cm` | Development link management (link.toml read/write) |