Files
cell/gc_plan.md
2026-02-01 20:58:42 -06:00

404 lines
13 KiB
Markdown

# Plan: Complete Copying GC Implementation
## Overview
Remove reference counting (DupValue/FreeValue) entirely and complete the Cheney copying garbage collector. Each JSContext will use bump allocation from a heap block, and when out of memory, request a new heap from JSRuntime's buddy allocator and copy live objects to the new heap.
## Target Architecture (from docs/memory.md)
### Object Types (simplified from current):
**Type 0 - Array**: `{ header, length, elements[] }`
**Type 1 - Blob**: `{ header, length, bits[] }`
**Type 2 - Text**: `{ header, length_or_hash, packed_chars[] }`
**Type 3 - Record**: `{ header, prototype, length, key_value_pairs[] }`
**Type 4 - Function**: `{ header, code_ptr, outer_frame_ptr }` - 3 words only, always stone
**Type 5 - Frame**: `{ header, function_ptr, caller_ptr, ret_addr, args[], closure_vars[], local_vars[], temps[] }`
**Type 6 - Code**: Lives in immutable memory only, never copied
**Type 7 - Forward**: Object has moved; cap56 contains new address
### Key Design Points:
- **JSFunction** is just a pointer to code and a pointer to the frame that created it (3 words)
- **Closure variables live in frames** - when a function returns, its frame is "reduced" to just the closure variables
- **Code objects are immutable** - stored in stone memory, never copied during GC
- **Frame reduction**: When a function returns, `caller` is set to zero, signaling the frame can be shrunk
## Current State (needs refactoring)
1. **Partial Cheney GC exists** at `source/quickjs.c:1844-2030`: `ctx_gc`, `gc_copy_value`, `gc_scan_object`
2. **744 calls to JS_DupValue/JS_FreeValue** scattered throughout (currently undefined, causing compilation errors)
3. **Current JSFunction** is bloated (has kind, name, union of cfunc/bytecode/bound) - needs simplification
4. **Current JSVarRef** is a separate object - should be eliminated, closures live in frames
5. **Bump allocator** in `js_malloc` (line 1495) with `heap_base`/`heap_free`/`heap_end`
6. **Buddy allocator** for memory blocks (lines 1727-1837)
7. **Header offset inconsistency** - some structs have header at offset 0, some at offset 8
## Implementation Steps
### Phase 1: Define No-Op DupValue/FreeValue (To Enable Compilation)
Add these near line 100 in `source/quickjs.c`:
```c
/* Copying GC - no reference counting needed */
#define JS_DupValue(ctx, v) (v)
#define JS_FreeValue(ctx, v) ((void)0)
#define JS_DupValueRT(rt, v) (v)
#define JS_FreeValueRT(rt, v) ((void)0)
```
This makes the code compile while keeping existing call sites (they become no-ops).
### Phase 2: Standardize Object Headers (offset 0)
Remove `JSGCObjectHeader` (ref counting remnant) and put `objhdr_t` at offset 0:
```c
typedef struct JSArray {
objhdr_t hdr; // offset 0
word_t length;
JSValue values[];
} JSArray;
typedef struct JSRecord {
objhdr_t hdr; // offset 0
JSRecord *proto;
word_t length;
slot slots[];
} JSRecord;
typedef struct JSText {
objhdr_t hdr; // offset 0
word_t length; // pretext: length, text: hash
word_t packed[];
} JSText;
typedef struct JSBlob {
objhdr_t hdr; // offset 0
word_t length;
uint8_t bits[];
} JSBlob;
/* Simplified JSFunction per memory.md - 3 words */
typedef struct JSFunction {
objhdr_t hdr; // offset 0, always stone
JSCode *code; // pointer to immutable code object
struct JSFrame *outer; // frame that created this function
} JSFunction;
/* JSFrame per memory.md */
typedef struct JSFrame {
objhdr_t hdr; // offset 0
JSFunction *function; // function being executed
struct JSFrame *caller; // calling frame (NULL = reduced/returned)
word_t ret_addr; // return instruction address
JSValue slots[]; // args, closure vars, locals, temps
} JSFrame;
/* JSCode - always in immutable (stone) memory */
typedef struct JSCode {
objhdr_t hdr; // offset 0, always stone
word_t arity; // max number of inputs
word_t frame_size; // capacity of activation frame
word_t closure_size; // reduced capacity for returned frames
word_t entry_point; // address to begin execution
word_t disruption_point;// address of disruption clause
uint8_t bytecode[]; // actual bytecode
} JSCode;
```
### Phase 3: Complete gc_object_size for All Types
Update `gc_object_size` (line 1850) to read header at offset 0:
```c
static size_t gc_object_size(void *ptr) {
objhdr_t hdr = *(objhdr_t*)ptr; // Header at offset 0
uint8_t type = objhdr_type(hdr);
uint64_t cap = objhdr_cap56(hdr);
switch (type) {
case OBJ_ARRAY:
return sizeof(JSArray) + cap * sizeof(JSValue);
case OBJ_BLOB:
return sizeof(JSBlob) + (cap + 7) / 8; // cap is bits
case OBJ_TEXT:
return sizeof(JSText) + ((cap + 1) / 2) * sizeof(uint64_t);
case OBJ_RECORD:
return sizeof(JSRecord) + (cap + 1) * sizeof(slot); // cap is mask
case OBJ_FUNCTION:
return sizeof(JSFunction); // 3 words
case OBJ_FRAME:
return sizeof(JSFrame) + cap * sizeof(JSValue); // cap is slot count
case OBJ_CODE:
return 0; // Code is never copied (immutable)
default:
return 64; // Conservative fallback
}
}
```
### Phase 4: Complete gc_scan_object for All Types
Update `gc_scan_object` (line 1924):
```c
static void gc_scan_object(JSContext *ctx, void *ptr, uint8_t **to_free, uint8_t *to_end) {
objhdr_t hdr = *(objhdr_t*)ptr;
uint8_t type = objhdr_type(hdr);
switch (type) {
case OBJ_ARRAY: {
JSArray *arr = (JSArray*)ptr;
for (uint32_t i = 0; i < arr->length; i++) {
arr->values[i] = gc_copy_value(ctx, arr->values[i], to_free, to_end);
}
break;
}
case OBJ_RECORD: {
JSRecord *rec = (JSRecord*)ptr;
// Copy prototype
if (rec->proto) {
JSValue proto_val = JS_MKPTR(rec->proto);
proto_val = gc_copy_value(ctx, proto_val, to_free, to_end);
rec->proto = (JSRecord*)JS_VALUE_GET_PTR(proto_val);
}
// Copy table entries
uint32_t mask = objhdr_cap56(rec->hdr);
for (uint32_t i = 1; i <= mask; i++) { // Skip slot 0
JSValue k = rec->slots[i].key;
if (!rec_key_is_empty(k) && !rec_key_is_tomb(k)) {
rec->slots[i].key = gc_copy_value(ctx, k, to_free, to_end);
rec->slots[i].value = gc_copy_value(ctx, rec->slots[i].value, to_free, to_end);
}
}
break;
}
case OBJ_FUNCTION: {
JSFunction *func = (JSFunction*)ptr;
// Code is immutable, don't copy - but outer frame needs copying
if (func->outer) {
JSValue outer_val = JS_MKPTR(func->outer);
outer_val = gc_copy_value(ctx, outer_val, to_free, to_end);
func->outer = (JSFrame*)JS_VALUE_GET_PTR(outer_val);
}
break;
}
case OBJ_FRAME: {
JSFrame *frame = (JSFrame*)ptr;
// Copy function pointer
if (frame->function) {
JSValue func_val = JS_MKPTR(frame->function);
func_val = gc_copy_value(ctx, func_val, to_free, to_end);
frame->function = (JSFunction*)JS_VALUE_GET_PTR(func_val);
}
// Copy caller (unless NULL = reduced frame)
if (frame->caller) {
JSValue caller_val = JS_MKPTR(frame->caller);
caller_val = gc_copy_value(ctx, caller_val, to_free, to_end);
frame->caller = (JSFrame*)JS_VALUE_GET_PTR(caller_val);
}
// Copy all slots (args, closure vars, locals, temps)
uint32_t slot_count = objhdr_cap56(frame->hdr);
for (uint32_t i = 0; i < slot_count; i++) {
frame->slots[i] = gc_copy_value(ctx, frame->slots[i], to_free, to_end);
}
break;
}
case OBJ_TEXT:
case OBJ_BLOB:
case OBJ_CODE:
// No internal references to scan
break;
}
}
```
### Phase 5: Fix gc_copy_value Forwarding
Update `gc_copy_value` (line 1883) for offset 0 headers:
```c
static JSValue gc_copy_value(JSContext *ctx, JSValue v, uint8_t **to_free, uint8_t *to_end) {
if (!JS_IsPtr(v)) return v; // Immediate value
void *ptr = JS_VALUE_GET_PTR(v);
// Stone memory - don't copy (includes Code objects)
objhdr_t hdr = *(objhdr_t*)ptr;
if (objhdr_s(hdr)) return v;
// Check if in current heap
if ((uint8_t*)ptr < ctx->heap_base || (uint8_t*)ptr >= ctx->heap_end)
return v; // External allocation
// Already forwarded?
if (objhdr_type(hdr) == OBJ_FORWARD) {
void *new_ptr = (void*)(uintptr_t)objhdr_cap56(hdr);
return JS_MKPTR(new_ptr);
}
// Copy object to new space
size_t size = gc_object_size(ptr);
void *new_ptr = *to_free;
*to_free += size;
memcpy(new_ptr, ptr, size);
// Leave forwarding pointer in old location
*(objhdr_t*)ptr = objhdr_make((uint64_t)(uintptr_t)new_ptr, OBJ_FORWARD, 0, 0, 0, 0);
return JS_MKPTR(new_ptr);
}
```
### Phase 6: Complete GC Root Tracing
Update `ctx_gc` (line 1966) to trace all roots including JSGCRef:
```c
static int ctx_gc(JSContext *ctx) {
// ... existing setup code ...
// Copy roots: global object, class prototypes, etc. (existing)
ctx->global_obj = gc_copy_value(ctx, ctx->global_obj, &to_free, to_end);
ctx->global_var_obj = gc_copy_value(ctx, ctx->global_var_obj, &to_free, to_end);
// ... other existing root copying ...
// Copy GC root stack (JS_PUSH_VALUE/JS_POP_VALUE)
for (JSGCRef *ref = ctx->top_gc_ref; ref; ref = ref->prev) {
ref->val = gc_copy_value(ctx, ref->val, &to_free, to_end);
}
// Copy GC root list (JS_AddGCRef/JS_DeleteGCRef)
for (JSGCRef *ref = ctx->last_gc_ref; ref; ref = ref->prev) {
ref->val = gc_copy_value(ctx, ref->val, &to_free, to_end);
}
// Copy current exception
ctx->current_exception = gc_copy_value(ctx, ctx->current_exception, &to_free, to_end);
// Cheney scan (existing)
// ...
}
```
### Phase 7: Trigger GC on Allocation Failure
Update `js_malloc` (line 1495):
```c
void *js_malloc(JSContext *ctx, size_t size) {
size = (size + 7) & ~7; // Align to 8 bytes
if ((uint8_t*)ctx->heap_free + size > (uint8_t*)ctx->heap_end) {
if (ctx_gc(ctx) < 0) {
JS_ThrowOutOfMemory(ctx);
return NULL;
}
// Retry after GC
if ((uint8_t*)ctx->heap_free + size > (uint8_t*)ctx->heap_end) {
JS_ThrowOutOfMemory(ctx);
return NULL;
}
}
void *ptr = ctx->heap_free;
ctx->heap_free = (uint8_t*)ctx->heap_free + size;
return ptr;
}
```
### Phase 8: Frame Reduction (for closures)
When a function returns, "reduce" its frame to just closure variables:
```c
static void reduce_frame(JSContext *ctx, JSFrame *frame) {
if (frame->caller == NULL) return; // Already reduced
JSCode *code = frame->function->code;
uint32_t closure_size = code->closure_size;
// Shrink capacity to just closure variables
frame->hdr = objhdr_make(closure_size, OBJ_FRAME, 0, 0, 0, 0);
frame->caller = NULL; // Signal: frame is reduced
}
```
### Phase 9: Remove Unused Reference Counting Code
Delete:
- `gc_decref`, `gc_decref_child` functions
- `gc_scan_incref_child`, `gc_scan_incref_child2` functions
- `JS_GCPhaseEnum`, `gc_phase` fields
- `JSGCObjectHeader` struct (merge into objhdr_t)
- `ref_count` fields from any remaining structs
- `mark_function_children_decref` function
- All `free_*` functions that rely on ref counting
## Files to Modify
1. **source/quickjs.c** - Main implementation:
- Add DupValue/FreeValue no-op macros (~line 100)
- Restructure JSArray, JSBlob, JSText, JSRecord (lines 468-499)
- Simplify JSFunction to 3-word struct (line 1205)
- Add JSFrame as heap object (new)
- Restructure JSCode/JSFunctionBytecode (line 1293)
- Fix gc_object_size (line 1850)
- Fix gc_copy_value (line 1883)
- Complete gc_scan_object (line 1924)
- Update ctx_gc for all roots (line 1966)
- Update js_malloc to trigger GC (line 1495)
- Delete ref counting code throughout
2. **source/quickjs.h** - Public API:
- Remove JSGCObjectHeader
- Update JSValue type checks if needed
- Ensure JS_IsStone works with offset 0 headers
## Execution Order
1. **First**: Add DupValue/FreeValue macros (enables compilation)
2. **Second**: Standardize struct layouts (header at offset 0)
3. **Third**: Fix gc_object_size and gc_copy_value
4. **Fourth**: Complete gc_scan_object for all types
5. **Fifth**: Update ctx_gc with complete root tracing
6. **Sixth**: Wire js_malloc to trigger GC
7. **Seventh**: Add frame reduction for closures
8. **Finally**: Remove ref counting dead code
## Verification
1. **Compile test**: `make` should succeed without errors
2. **Basic test**: Run simple scripts:
```js
var a = [1, 2, 3]
log.console(a[1])
```
3. **Stress test**: Allocate many objects to trigger GC:
```js
for (var i = 0; i < 100000; i++) {
var x = { value: i }
}
log.console("done")
```
4. **Closure test**: Test functions with closures survive GC:
```js
fn make_counter() {
var count = 0
fn inc() { count = count + 1; return count }
return inc
}
var c = make_counter()
log.console(c()) // 1
log.console(c()) // 2
```
5. **GC stress with closures**: Create many closures, trigger GC, verify they still work
## Key Design Decisions (Resolved)
1. **JSCode storage**: Lives in stone (immutable) memory, never copied during GC ✓
2. **Header offset**: Standardized to offset 0 for all heap objects ✓
3. **Closure variables**: Live in JSFrame objects; frames are "reduced" when functions return ✓
4. **JSVarRef**: Eliminated - closures reference their outer frame directly ✓