Files
cell/plan.md
2026-01-31 14:43:21 -06:00

339 lines
14 KiB
Markdown

# Cell/QuickJS Refactoring Plan: Remove Atoms, Shapes, and Dual-Encoding
## Overview
Refactor `source/quickjs.c` to match `docs/memory.md` specification:
- Remove JSAtom system (171 references → ~41 remaining)
- Remove JSShape system (94 references) ✓
- Remove IC caches (shape-based inline caches) ✓
- Remove `is_wide_char` dual-encoding (18 locations) ✓
- Use JSValue texts directly as property keys
- Reference: `mquickjs.c` shows the target pattern
## Completed Phases
### Phase 1: Remove is_wide_char Remnants ✓
### Phase 2: Remove IC Caches ✓
### Phase 3: Remove JSShape System ✓
### Phase 4: Complete Property Access with JSValue Keys ✓
Completed:
- Removed JS_GC_OBJ_TYPE_JS_OBJECT fallbacks from OP_get_field
- Removed JS_GC_OBJ_TYPE_JS_OBJECT fallbacks from OP_put_field
- Removed JS_GC_OBJ_TYPE_JS_OBJECT fallbacks from OP_define_field
- Created emit_key() function that adds JSValue to cpool and emits index
---
## Phase 5: Convert JSAtom to JSValue Text (IN PROGRESS)
This is the core transformation. All identifier handling moves from atoms to JSValue.
### Completed Items
**Token and Parser Infrastructure:**
- [x] Change JSToken.u.ident.atom to JSToken.u.ident.str (JSValue)
- [x] Change parse_ident() to return JSValue
- [x] Create emit_key() function (cpool-based)
- [x] Create JS_KEY_* macros for common names (lines ~279-335 in quickjs.c)
- [x] Update all token.u.ident.atom references to .str
- [x] Create keyword lookup table (js_keywords[]) with string comparison
- [x] Rewrite update_token_ident() to use js_keyword_lookup()
- [x] Rewrite is_strict_future_keyword() to use JSValue
- [x] Update token_is_pseudo_keyword() to use JSValue and js_key_equal()
**Function Declaration Parsing:**
- [x] Update js_parse_function_decl() signature to use JSValue func_name
- [x] Update js_parse_function_decl2() to use JSValue func_name throughout
- [x] Update js_parse_function_check_names() to use JSValue
- [x] Convert JS_DupAtom/JS_FreeAtom to JS_DupValue/JS_FreeValue in function parsing
**Variable Definition and Lookup:**
- [x] Update find_global_var() to use JSValue and js_key_equal()
- [x] Update find_lexical_global_var() to use JSValue
- [x] Update find_lexical_decl() to use JSValue and js_key_equal()
- [x] Update js_define_var() to use JSValue
- [x] Update js_parse_check_duplicate_parameter() to use JSValue and js_key_equal()
- [x] Update js_parse_destructuring_var() to return JSValue
- [x] Update js_parse_var() to use JSValue for variable names
**Comparison Helpers:**
- [x] Create js_key_equal_str() for comparing JSValue with C string literals
- [x] Update is_var_in_arg_scope() to use js_key_equal/js_key_equal_str
- [x] Update has_with_scope() to use js_key_equal_str
- [x] Update closure variable comparisons (cv->var_name) to use js_key_equal_str
**Property Access:**
- [x] Fix JS_GetPropertyStr to create proper JSValue keys
- [x] Fix JS_SetPropertyInternal callers to use JS_KEY_* instead of JS_ATOM_*
### JS_KEY_* Macros Added
Compile-time immediate ASCII string constants (≤7 chars):
```c
JS_KEY_empty, JS_KEY_name, JS_KEY_message, JS_KEY_stack,
JS_KEY_errors, JS_KEY_Error, JS_KEY_cause, JS_KEY_length,
JS_KEY_value, JS_KEY_get, JS_KEY_set, JS_KEY_raw,
JS_KEY_flags, JS_KEY_source, JS_KEY_exec, JS_KEY_toJSON,
JS_KEY_eval, JS_KEY_this, JS_KEY_true, JS_KEY_false,
JS_KEY_null, JS_KEY_NaN, JS_KEY_default, JS_KEY_index,
JS_KEY_input, JS_KEY_groups, JS_KEY_indices, JS_KEY_let,
JS_KEY_var, JS_KEY_new, JS_KEY_of, JS_KEY_yield,
JS_KEY_async, JS_KEY_target, JS_KEY_from, JS_KEY_meta,
JS_KEY_as, JS_KEY_with
```
Runtime macro for strings >7 chars:
```c
#define JS_KEY_STR(ctx, str) JS_NewStringLen((ctx), (str), sizeof(str) - 1)
```
Helper function for comparing JSValue with C string literals:
```c
static JS_BOOL js_key_equal_str(JSValue a, const char *str);
```
### Remaining Work
#### 5.3 Update js_parse_property_name() ✓
- [x] Change return type from JSAtom* to JSValue*
- [x] Update all callers (js_parse_object_literal, etc.)
- [x] Updated get_lvalue(), put_lvalue(), js_parse_destructuring_element()
#### 5.4 Replace remaining emit_atom() calls with emit_key() ✓
- [x] Removed emit_atom wrapper function
- [x] Changed last emit_atom(JS_ATOM_this) to emit_key(JS_KEY_this)
#### 5.5 Update Variable Opcode Format in quickjs-opcode.h
- [ ] Change `atom` format opcodes to `key` format
- [ ] Change `atom_u8` and `atom_u16` to `key_u8` and `key_u16`
#### 5.6 Update VM Opcode Handlers ✓
These now read cpool indices and look up JSValue:
- [x] OP_check_var, OP_get_var_undef, OP_get_var
- [x] OP_put_var, OP_put_var_init, OP_put_var_strict
- [x] OP_set_name, OP_make_var_ref, OP_delete_var
- [x] OP_define_var, OP_define_func, OP_throw_error
- [x] OP_make_loc_ref, OP_make_arg_ref
- [x] OP_define_method, OP_define_method_computed
#### 5.7 Update resolve_scope_var() ✓
- [x] Changed signature to use JSValue var_name
- [x] Updated all comparisons to use js_key_equal()/js_key_equal_str()
- [x] Updated var_object_test() to use JSValue
- [x] Updated optimize_scope_make_global_ref() to use JSValue
- [x] Updated resolve_variables() callers to read from cpool
#### 5.8 Convert Remaining JS_ATOM_* Usages
Categories remaining:
- Some debug/print functions still use JSAtom
- Some function signatures not yet converted
- Will be addressed in Phase 7 cleanup
---
## Phase 6: Update Bytecode Serialization ✓
### 6.1 JS_WriteObjectTag Changes ✓
- [x] Changed JS_WriteObjectTag to use bc_put_key() directly for property keys
- [x] Removed JS_ValueToAtom/bc_put_atom path (was broken anyway)
- [x] cpool values serialized via JS_WriteObjectRec()
### 6.2 JS_ReadObject Changes ✓
- [x] Changed JS_ReadObjectTag to use bc_get_key() for property keys
- [x] Uses JS_SetPropertyInternal with JSValue keys
### 6.3 Opcode Format Updates ✓
- [x] Added OP_FMT_key_u8, OP_FMT_key_u16, OP_FMT_key_label_u16 formats
- [x] Updated variable opcodes to use key formats instead of atom formats
- [x] Updated bc_byte_swap() to handle new key formats
- [x] Updated JS_WriteFunctionBytecode() to skip key format opcodes
- [x] Updated JS_ReadFunctionBytecode() to skip key format opcodes
### 6.4 Version Bump ✓
- [x] Incremented BC_VERSION from 5 to 6
---
## Phase 7: Final Cleanup ✓
### 7.1 Additional Parser/Compiler Fixes ✓
- [x] Fixed TOK_IDENT case to use JSValue name, JS_DupValue, emit_key
- [x] Fixed TOK_TRY catch clause to use JSValue name
- [x] Fixed js_parse_statement_or_decl label_name to use JSValue
- [x] Fixed OP_scope_get_var "eval" check to use js_key_equal_str
- [x] Fixed js_parse_delete to use JSValue for name comparison
- [x] Fixed JSON parsing to use js_key_from_string for property names
- [x] Added js_key_from_string() helper function
- [x] Added JS_KEY__ret_, JS_KEY__eval_, JS_KEY__var_ for internal names
- [x] Updated add_closure_var, get_closure_var2, get_closure_var to use JSValue var_name
- [x] Updated set_closure_from_var to use JS_DupValue
- [x] Updated add_closure_variables to use JS_DupValue
- [x] Updated instantiate_hoisted_definitions to use fd_cpool_add for keys
- [x] Updated resolve_variables to use js_key_equal and fd_cpool_add
### 7.1.1 Property Access and Runtime Fixes ✓
- [x] Fixed JS_GetPropertyValue to use js_key_from_string instead of JS_ValueToAtom
- [x] Fixed JS_GetPropertyKey to use js_key_from_string for string keys
- [x] Fixed JS_SetPropertyKey to use js_key_from_string for string keys
- [x] Fixed JS_HasPropertyKey to use js_key_from_string for string keys
- [x] Fixed JS_DeletePropertyKey to use js_key_from_string for string keys
- [x] Updated JS_HasProperty signature to take JSValue prop
- [x] Fixed OP_get_ref_value handler to use JSValue key
- [x] Fixed OP_put_ref_value handler to use JSValue key
- [x] Updated free_func_def to use JS_FreeValue for JSValue fields
### 7.2 Remove JSAtom Type and Functions ✓
- [x] Removed most JS_ATOM_* constants (kept JS_ATOM_NULL, JS_ATOM_END for BC compat)
- [x] JS_NewAtomString now returns JSValue using js_key_new
- [x] JS_FreeAtom, JS_DupAtom are stubs (no-op for backward compat)
- [x] JS_AtomToValue, JS_ValueToAtom are stubs (minimal BC compat)
- [x] Replaced JS_ATOM_* usages with JS_KEY_* or JS_GetPropertyStr
### 7.3 Additional Runtime Fixes ✓
- [x] Fixed free_function_bytecode to use JS_FreeValueRT for JSValue fields
- [x] Fixed JS_SetPropertyFunctionList to use JSValue keys via find_key()
- [x] Fixed JS_InstantiateFunctionListItem to use JSValue keys
- [x] Fixed internalize_json_property to use JSValue names
- [x] Fixed emit_break and push_break_entry to use JSValue label_name
- [x] Implemented JS_Invoke to use JSValue method key
### 7.4 Remaining Stubs (kept for bytecode backward compatibility)
- JSAtom typedef (uint32_t) - used in BC serialization
- JS_ATOM_NULL, JS_ATOM_END - bytecode format markers
- JS_FreeAtom, JS_DupAtom - no-op stubs
- JS_FreeAtomRT, JS_DupAtomRT - no-op stubs
- Legacy BC reader (idx_to_atom array) - for reading old bytecode
---
## Current Build Status
**Build: SUCCEEDS** with warnings (unused variables, labels)
**Statistics:**
- JS_ATOM_* usages: Minimal (only BC serialization compat)
- Property access uses JS_KEY_* macros or JS_GetPropertyStr
- BC_VERSION: 6 (updated for new key-based format)
**What Works:**
- All property access via JSValue keys
- Keyword detection via string comparison
- Function declaration parsing with JSValue names
- Variable definition with JSValue names
- Closure variable tracking with JSValue names
- VM opcode handlers read cpool indices and look up JSValue
- resolve_scope_var() uses JSValue throughout
- js_parse_property_name() returns JSValue
- Bytecode serialization uses bc_put_key/bc_get_key for property keys
- Variable opcodes use key format (cpool indices)
- JSON parsing uses JSValue keys
- Internal variable names use JS_KEY__ret_, JS_KEY__eval_, JS_KEY__var_
- JS_SetPropertyFunctionList uses JSValue keys
- JS_Invoke uses JSValue method keys
- break/continue labels use JSValue
---
## Phase 8: Migrate to New Tagging System (IN PROGRESS)
**Problem**: `JS_VALUE_GET_TAG` returns `JS_TAG_PTR` for all pointers, but ~200 places check for obsolete tags like `JS_TAG_OBJECT`, `JS_TAG_STRING`, `JS_TAG_FUNCTION`, etc., which are never returned. This causes crashes.
**Target Design** (from memory.md):
- JSValue tags: Only `JS_TAG_INT`, `JS_TAG_PTR`, `JS_TAG_SHORT_FLOAT`, `JS_TAG_SPECIAL`
- Pointer types determined by `objhdr_t` header at offset 8 in heap objects
- mist_obj_type: `OBJ_ARRAY(0)`, `OBJ_BLOB(1)`, `OBJ_TEXT(2)`, `OBJ_RECORD(3)`, `OBJ_FUNCTION(4)`, etc.
### 8.1 Unified Heap Object Layout ✓
- [x] Updated mist_text structure to have objhdr_t at offset 8:
```c
typedef struct mist_text {
JSRefCountHeader _dummy_header; /* unused, for offset alignment */
uint32_t _pad; /* padding to align objhdr_t to offset 8 */
objhdr_t hdr; /* NOW at offset 8, like JSString */
word_t length;
word_t packed[];
} mist_text;
```
- [x] JSString already has objhdr_t at offset 8
### 8.2 Type-Checking Helper Functions ✓
Added lowercase internal helpers (to avoid conflict with quickjs.h declarations):
```c
static inline JS_BOOL js_is_gc_object(JSValue v) { return JS_IsPtr(v); }
static inline JSGCObjectTypeEnum js_get_gc_type(JSValue v) {
return ((JSGCObjectHeader *)JS_VALUE_GET_PTR(v))->gc_obj_type;
}
static inline JS_BOOL js_is_record(JSValue v) {
if (!JS_IsPtr(v)) return FALSE;
return js_get_gc_type(v) == JS_GC_OBJ_TYPE_RECORD;
}
static inline JS_BOOL js_is_array(JSValue v) {
if (!JS_IsPtr(v)) return FALSE;
return js_get_gc_type(v) == JS_GC_OBJ_TYPE_ARRAY;
}
static inline JS_BOOL js_is_function(JSValue v) {
if (!JS_IsPtr(v)) return FALSE;
return js_get_gc_type(v) == JS_GC_OBJ_TYPE_FUNCTION;
}
static inline JS_BOOL js_is_object(JSValue v) {
if (!JS_IsPtr(v)) return FALSE;
JSGCObjectTypeEnum t = js_get_gc_type(v);
return t == JS_GC_OBJ_TYPE_RECORD || t == JS_GC_OBJ_TYPE_ARRAY;
}
```
### 8.3 Updated Core Functions ✓
- [x] Updated JS_IsString to read objhdr_t from offset 8
- [x] Updated js_key_hash to read objhdr_t from offset 8
- [x] Updated js_key_equal to read objhdr_t from offset 8
- [x] Updated __JS_FreeValueRT to use objhdr_type for type dispatch
- [x] Updated JS_MarkValue, JS_MarkValueEdgeEx for GC
- [x] Added JS_SetPropertyValue function
- [x] Changed quickjs.h JS_IsFunction/JS_IsObject from inline to extern declarations
### 8.4 Tag Check Migration (PARTIAL)
Updated some critical tag checks:
- [x] Some JS_TAG_OBJECT checks → js_is_object() or js_is_record()
- [ ] Many more JS_TAG_OBJECT checks remain (~200 total)
- [ ] JS_TAG_FUNCTION checks → js_is_function()
- [ ] JS_TAG_STRING checks (some already use JS_IsString)
### 8.5 Remaining Work
- [ ] Fix ASAN memory corruption error (attempting free on address not malloc'd)
- Crash occurs in js_def_realloc during js_realloc_array
- Address is 112 bytes inside a JSFunctionDef allocation
- [ ] Complete remaining ~200 tag check migrations
- [ ] Add mist_hdr to JSFunction (optional, gc_obj_type already works)
- [ ] Remove obsolete tag definitions from quickjs.h:
- JS_TAG_STRING = -8
- JS_TAG_ARRAY = -6
- JS_TAG_FUNCTION = -5
- JS_TAG_FUNCTION_BYTECODE = -2
- JS_TAG_OBJECT = -1
### Current Status
**Build: SUCCEEDS** with warnings
**Runtime: CRASHES** with ASAN error:
```
==16122==ERROR: AddressSanitizer: attempting free on address which was not malloc()-ed
```
The crash occurs during test execution in `js_def_realloc` called from `js_realloc_array`.
Root cause not yet identified - likely a pointer being passed to realloc that wasn't allocated with malloc.
---
## Notes
- JSVarDef.var_name is JSValue
- JSClosureVar.var_name is JSValue
- JSGlobalVar.var_name is JSValue
- JSFunctionDef.func_name is JSValue
- BlockEnv.label_name is JSValue
- OP_get_field/put_field/define_field already use cpool index format
- JSRecord with open addressing is fully implemented
- js_key_hash and js_key_equal work with both immediate and heap text
- js_key_equal_str enables comparison with C string literals for internal names