# Cell/QuickJS Refactoring Plan: Remove Atoms, Shapes, and Dual-Encoding ## Overview Refactor `source/quickjs.c` to match `docs/memory.md` specification: - Remove JSAtom system (171 references → ~41 remaining) - Remove JSShape system (94 references) ✓ - Remove IC caches (shape-based inline caches) ✓ - Remove `is_wide_char` dual-encoding (18 locations) ✓ - Use JSValue texts directly as property keys - Reference: `mquickjs.c` shows the target pattern ## Completed Phases ### Phase 1: Remove is_wide_char Remnants ✓ ### Phase 2: Remove IC Caches ✓ ### Phase 3: Remove JSShape System ✓ ### Phase 4: Complete Property Access with JSValue Keys ✓ Completed: - Removed JS_GC_OBJ_TYPE_JS_OBJECT fallbacks from OP_get_field - Removed JS_GC_OBJ_TYPE_JS_OBJECT fallbacks from OP_put_field - Removed JS_GC_OBJ_TYPE_JS_OBJECT fallbacks from OP_define_field - Created emit_key() function that adds JSValue to cpool and emits index --- ## Phase 5: Convert JSAtom to JSValue Text (IN PROGRESS) This is the core transformation. All identifier handling moves from atoms to JSValue. ### Completed Items **Token and Parser Infrastructure:** - [x] Change JSToken.u.ident.atom to JSToken.u.ident.str (JSValue) - [x] Change parse_ident() to return JSValue - [x] Create emit_key() function (cpool-based) - [x] Create JS_KEY_* macros for common names (lines ~279-335 in quickjs.c) - [x] Update all token.u.ident.atom references to .str - [x] Create keyword lookup table (js_keywords[]) with string comparison - [x] Rewrite update_token_ident() to use js_keyword_lookup() - [x] Rewrite is_strict_future_keyword() to use JSValue - [x] Update token_is_pseudo_keyword() to use JSValue and js_key_equal() **Function Declaration Parsing:** - [x] Update js_parse_function_decl() signature to use JSValue func_name - [x] Update js_parse_function_decl2() to use JSValue func_name throughout - [x] Update js_parse_function_check_names() to use JSValue - [x] Convert JS_DupAtom/JS_FreeAtom to JS_DupValue/JS_FreeValue in function parsing **Variable Definition and Lookup:** - [x] Update find_global_var() to use JSValue and js_key_equal() - [x] Update find_lexical_global_var() to use JSValue - [x] Update find_lexical_decl() to use JSValue and js_key_equal() - [x] Update js_define_var() to use JSValue - [x] Update js_parse_check_duplicate_parameter() to use JSValue and js_key_equal() - [x] Update js_parse_destructuring_var() to return JSValue - [x] Update js_parse_var() to use JSValue for variable names **Comparison Helpers:** - [x] Create js_key_equal_str() for comparing JSValue with C string literals - [x] Update is_var_in_arg_scope() to use js_key_equal/js_key_equal_str - [x] Update has_with_scope() to use js_key_equal_str - [x] Update closure variable comparisons (cv->var_name) to use js_key_equal_str **Property Access:** - [x] Fix JS_GetPropertyStr to create proper JSValue keys - [x] Fix JS_SetPropertyInternal callers to use JS_KEY_* instead of JS_ATOM_* ### JS_KEY_* Macros Added Compile-time immediate ASCII string constants (≤7 chars): ```c JS_KEY_empty, JS_KEY_name, JS_KEY_message, JS_KEY_stack, JS_KEY_errors, JS_KEY_Error, JS_KEY_cause, JS_KEY_length, JS_KEY_value, JS_KEY_get, JS_KEY_set, JS_KEY_raw, JS_KEY_flags, JS_KEY_source, JS_KEY_exec, JS_KEY_toJSON, JS_KEY_eval, JS_KEY_this, JS_KEY_true, JS_KEY_false, JS_KEY_null, JS_KEY_NaN, JS_KEY_default, JS_KEY_index, JS_KEY_input, JS_KEY_groups, JS_KEY_indices, JS_KEY_let, JS_KEY_var, JS_KEY_new, JS_KEY_of, JS_KEY_yield, JS_KEY_async, JS_KEY_target, JS_KEY_from, JS_KEY_meta, JS_KEY_as, JS_KEY_with ``` Runtime macro for strings >7 chars: ```c #define JS_KEY_STR(ctx, str) JS_NewStringLen((ctx), (str), sizeof(str) - 1) ``` Helper function for comparing JSValue with C string literals: ```c static JS_BOOL js_key_equal_str(JSValue a, const char *str); ``` ### Remaining Work #### 5.3 Update js_parse_property_name() ✓ - [x] Change return type from JSAtom* to JSValue* - [x] Update all callers (js_parse_object_literal, etc.) - [x] Updated get_lvalue(), put_lvalue(), js_parse_destructuring_element() #### 5.4 Replace remaining emit_atom() calls with emit_key() ✓ - [x] Removed emit_atom wrapper function - [x] Changed last emit_atom(JS_ATOM_this) to emit_key(JS_KEY_this) #### 5.5 Update Variable Opcode Format in quickjs-opcode.h - [ ] Change `atom` format opcodes to `key` format - [ ] Change `atom_u8` and `atom_u16` to `key_u8` and `key_u16` #### 5.6 Update VM Opcode Handlers ✓ These now read cpool indices and look up JSValue: - [x] OP_check_var, OP_get_var_undef, OP_get_var - [x] OP_put_var, OP_put_var_init, OP_put_var_strict - [x] OP_set_name, OP_make_var_ref, OP_delete_var - [x] OP_define_var, OP_define_func, OP_throw_error - [x] OP_make_loc_ref, OP_make_arg_ref - [x] OP_define_method, OP_define_method_computed #### 5.7 Update resolve_scope_var() ✓ - [x] Changed signature to use JSValue var_name - [x] Updated all comparisons to use js_key_equal()/js_key_equal_str() - [x] Updated var_object_test() to use JSValue - [x] Updated optimize_scope_make_global_ref() to use JSValue - [x] Updated resolve_variables() callers to read from cpool #### 5.8 Convert Remaining JS_ATOM_* Usages Categories remaining: - Some debug/print functions still use JSAtom - Some function signatures not yet converted - Will be addressed in Phase 7 cleanup --- ## Phase 6: Update Bytecode Serialization ✓ ### 6.1 JS_WriteObjectTag Changes ✓ - [x] Changed JS_WriteObjectTag to use bc_put_key() directly for property keys - [x] Removed JS_ValueToAtom/bc_put_atom path (was broken anyway) - [x] cpool values serialized via JS_WriteObjectRec() ### 6.2 JS_ReadObject Changes ✓ - [x] Changed JS_ReadObjectTag to use bc_get_key() for property keys - [x] Uses JS_SetPropertyInternal with JSValue keys ### 6.3 Opcode Format Updates ✓ - [x] Added OP_FMT_key_u8, OP_FMT_key_u16, OP_FMT_key_label_u16 formats - [x] Updated variable opcodes to use key formats instead of atom formats - [x] Updated bc_byte_swap() to handle new key formats - [x] Updated JS_WriteFunctionBytecode() to skip key format opcodes - [x] Updated JS_ReadFunctionBytecode() to skip key format opcodes ### 6.4 Version Bump ✓ - [x] Incremented BC_VERSION from 5 to 6 --- ## Phase 7: Final Cleanup ✓ ### 7.1 Additional Parser/Compiler Fixes ✓ - [x] Fixed TOK_IDENT case to use JSValue name, JS_DupValue, emit_key - [x] Fixed TOK_TRY catch clause to use JSValue name - [x] Fixed js_parse_statement_or_decl label_name to use JSValue - [x] Fixed OP_scope_get_var "eval" check to use js_key_equal_str - [x] Fixed js_parse_delete to use JSValue for name comparison - [x] Fixed JSON parsing to use js_key_from_string for property names - [x] Added js_key_from_string() helper function - [x] Added JS_KEY__ret_, JS_KEY__eval_, JS_KEY__var_ for internal names - [x] Updated add_closure_var, get_closure_var2, get_closure_var to use JSValue var_name - [x] Updated set_closure_from_var to use JS_DupValue - [x] Updated add_closure_variables to use JS_DupValue - [x] Updated instantiate_hoisted_definitions to use fd_cpool_add for keys - [x] Updated resolve_variables to use js_key_equal and fd_cpool_add ### 7.1.1 Property Access and Runtime Fixes ✓ - [x] Fixed JS_GetPropertyValue to use js_key_from_string instead of JS_ValueToAtom - [x] Fixed JS_GetPropertyKey to use js_key_from_string for string keys - [x] Fixed JS_SetPropertyKey to use js_key_from_string for string keys - [x] Fixed JS_HasPropertyKey to use js_key_from_string for string keys - [x] Fixed JS_DeletePropertyKey to use js_key_from_string for string keys - [x] Updated JS_HasProperty signature to take JSValue prop - [x] Fixed OP_get_ref_value handler to use JSValue key - [x] Fixed OP_put_ref_value handler to use JSValue key - [x] Updated free_func_def to use JS_FreeValue for JSValue fields ### 7.2 Remove JSAtom Type and Functions ✓ - [x] Removed most JS_ATOM_* constants (kept JS_ATOM_NULL, JS_ATOM_END for BC compat) - [x] JS_NewAtomString now returns JSValue using js_key_new - [x] JS_FreeAtom, JS_DupAtom are stubs (no-op for backward compat) - [x] JS_AtomToValue, JS_ValueToAtom are stubs (minimal BC compat) - [x] Replaced JS_ATOM_* usages with JS_KEY_* or JS_GetPropertyStr ### 7.3 Additional Runtime Fixes ✓ - [x] Fixed free_function_bytecode to use JS_FreeValueRT for JSValue fields - [x] Fixed JS_SetPropertyFunctionList to use JSValue keys via find_key() - [x] Fixed JS_InstantiateFunctionListItem to use JSValue keys - [x] Fixed internalize_json_property to use JSValue names - [x] Fixed emit_break and push_break_entry to use JSValue label_name - [x] Implemented JS_Invoke to use JSValue method key ### 7.4 Remaining Stubs (kept for bytecode backward compatibility) - JSAtom typedef (uint32_t) - used in BC serialization - JS_ATOM_NULL, JS_ATOM_END - bytecode format markers - JS_FreeAtom, JS_DupAtom - no-op stubs - JS_FreeAtomRT, JS_DupAtomRT - no-op stubs - Legacy BC reader (idx_to_atom array) - for reading old bytecode --- ## Current Build Status **Build: SUCCEEDS** with warnings (unused variables, labels) **Statistics:** - JS_ATOM_* usages: Minimal (only BC serialization compat) - Property access uses JS_KEY_* macros or JS_GetPropertyStr - BC_VERSION: 6 (updated for new key-based format) **What Works:** - All property access via JSValue keys - Keyword detection via string comparison - Function declaration parsing with JSValue names - Variable definition with JSValue names - Closure variable tracking with JSValue names - VM opcode handlers read cpool indices and look up JSValue - resolve_scope_var() uses JSValue throughout - js_parse_property_name() returns JSValue - Bytecode serialization uses bc_put_key/bc_get_key for property keys - Variable opcodes use key format (cpool indices) - JSON parsing uses JSValue keys - Internal variable names use JS_KEY__ret_, JS_KEY__eval_, JS_KEY__var_ - JS_SetPropertyFunctionList uses JSValue keys - JS_Invoke uses JSValue method keys - break/continue labels use JSValue --- ## Phase 8: Migrate to New Tagging System (IN PROGRESS) **Problem**: `JS_VALUE_GET_TAG` returns `JS_TAG_PTR` for all pointers, but ~200 places check for obsolete tags like `JS_TAG_OBJECT`, `JS_TAG_STRING`, `JS_TAG_FUNCTION`, etc., which are never returned. This causes crashes. **Target Design** (from memory.md): - JSValue tags: Only `JS_TAG_INT`, `JS_TAG_PTR`, `JS_TAG_SHORT_FLOAT`, `JS_TAG_SPECIAL` - Pointer types determined by `objhdr_t` header at offset 8 in heap objects - mist_obj_type: `OBJ_ARRAY(0)`, `OBJ_BLOB(1)`, `OBJ_TEXT(2)`, `OBJ_RECORD(3)`, `OBJ_FUNCTION(4)`, etc. ### 8.1 Unified Heap Object Layout ✓ - [x] Updated mist_text structure to have objhdr_t at offset 8: ```c typedef struct mist_text { JSRefCountHeader _dummy_header; /* unused, for offset alignment */ uint32_t _pad; /* padding to align objhdr_t to offset 8 */ objhdr_t hdr; /* NOW at offset 8, like JSString */ word_t length; word_t packed[]; } mist_text; ``` - [x] JSString already has objhdr_t at offset 8 ### 8.2 Type-Checking Helper Functions ✓ Added lowercase internal helpers (to avoid conflict with quickjs.h declarations): ```c static inline JS_BOOL js_is_gc_object(JSValue v) { return JS_IsPtr(v); } static inline JSGCObjectTypeEnum js_get_gc_type(JSValue v) { return ((JSGCObjectHeader *)JS_VALUE_GET_PTR(v))->gc_obj_type; } static inline JS_BOOL js_is_record(JSValue v) { if (!JS_IsPtr(v)) return FALSE; return js_get_gc_type(v) == JS_GC_OBJ_TYPE_RECORD; } static inline JS_BOOL js_is_array(JSValue v) { if (!JS_IsPtr(v)) return FALSE; return js_get_gc_type(v) == JS_GC_OBJ_TYPE_ARRAY; } static inline JS_BOOL js_is_function(JSValue v) { if (!JS_IsPtr(v)) return FALSE; return js_get_gc_type(v) == JS_GC_OBJ_TYPE_FUNCTION; } static inline JS_BOOL js_is_object(JSValue v) { if (!JS_IsPtr(v)) return FALSE; JSGCObjectTypeEnum t = js_get_gc_type(v); return t == JS_GC_OBJ_TYPE_RECORD || t == JS_GC_OBJ_TYPE_ARRAY; } ``` ### 8.3 Updated Core Functions ✓ - [x] Updated JS_IsString to read objhdr_t from offset 8 - [x] Updated js_key_hash to read objhdr_t from offset 8 - [x] Updated js_key_equal to read objhdr_t from offset 8 - [x] Updated __JS_FreeValueRT to use objhdr_type for type dispatch - [x] Updated JS_MarkValue, JS_MarkValueEdgeEx for GC - [x] Added JS_SetPropertyValue function - [x] Changed quickjs.h JS_IsFunction/JS_IsObject from inline to extern declarations ### 8.4 Tag Check Migration (PARTIAL) Updated some critical tag checks: - [x] Some JS_TAG_OBJECT checks → js_is_object() or js_is_record() - [ ] Many more JS_TAG_OBJECT checks remain (~200 total) - [ ] JS_TAG_FUNCTION checks → js_is_function() - [ ] JS_TAG_STRING checks (some already use JS_IsString) ### 8.5 Remaining Work - [ ] Fix ASAN memory corruption error (attempting free on address not malloc'd) - Crash occurs in js_def_realloc during js_realloc_array - Address is 112 bytes inside a JSFunctionDef allocation - [ ] Complete remaining ~200 tag check migrations - [ ] Add mist_hdr to JSFunction (optional, gc_obj_type already works) - [ ] Remove obsolete tag definitions from quickjs.h: - JS_TAG_STRING = -8 - JS_TAG_ARRAY = -6 - JS_TAG_FUNCTION = -5 - JS_TAG_FUNCTION_BYTECODE = -2 - JS_TAG_OBJECT = -1 ### Current Status **Build: SUCCEEDS** with warnings **Runtime: CRASHES** with ASAN error: ``` ==16122==ERROR: AddressSanitizer: attempting free on address which was not malloc()-ed ``` The crash occurs during test execution in `js_def_realloc` called from `js_realloc_array`. Root cause not yet identified - likely a pointer being passed to realloc that wasn't allocated with malloc. --- ## Notes - JSVarDef.var_name is JSValue - JSClosureVar.var_name is JSValue - JSGlobalVar.var_name is JSValue - JSFunctionDef.func_name is JSValue - BlockEnv.label_name is JSValue - OP_get_field/put_field/define_field already use cpool index format - JSRecord with open addressing is fully implemented - js_key_hash and js_key_equal work with both immediate and heap text - js_key_equal_str enables comparison with C string literals for internal names