14 KiB
Cell/QuickJS Refactoring Plan: Remove Atoms, Shapes, and Dual-Encoding
Overview
Refactor source/quickjs.c to match docs/memory.md specification:
- Remove JSAtom system (171 references → ~41 remaining)
- Remove JSShape system (94 references) ✓
- Remove IC caches (shape-based inline caches) ✓
- Remove
is_wide_chardual-encoding (18 locations) ✓ - Use JSValue texts directly as property keys
- Reference:
mquickjs.cshows the target pattern
Completed Phases
Phase 1: Remove is_wide_char Remnants ✓
Phase 2: Remove IC Caches ✓
Phase 3: Remove JSShape System ✓
Phase 4: Complete Property Access with JSValue Keys ✓
Completed:
- Removed JS_GC_OBJ_TYPE_JS_OBJECT fallbacks from OP_get_field
- Removed JS_GC_OBJ_TYPE_JS_OBJECT fallbacks from OP_put_field
- Removed JS_GC_OBJ_TYPE_JS_OBJECT fallbacks from OP_define_field
- Created emit_key() function that adds JSValue to cpool and emits index
Phase 5: Convert JSAtom to JSValue Text (IN PROGRESS)
This is the core transformation. All identifier handling moves from atoms to JSValue.
Completed Items
Token and Parser Infrastructure:
- Change JSToken.u.ident.atom to JSToken.u.ident.str (JSValue)
- Change parse_ident() to return JSValue
- Create emit_key() function (cpool-based)
- Create JS_KEY_* macros for common names (lines ~279-335 in quickjs.c)
- Update all token.u.ident.atom references to .str
- Create keyword lookup table (js_keywords[]) with string comparison
- Rewrite update_token_ident() to use js_keyword_lookup()
- Rewrite is_strict_future_keyword() to use JSValue
- Update token_is_pseudo_keyword() to use JSValue and js_key_equal()
Function Declaration Parsing:
- Update js_parse_function_decl() signature to use JSValue func_name
- Update js_parse_function_decl2() to use JSValue func_name throughout
- Update js_parse_function_check_names() to use JSValue
- Convert JS_DupAtom/JS_FreeAtom to JS_DupValue/JS_FreeValue in function parsing
Variable Definition and Lookup:
- Update find_global_var() to use JSValue and js_key_equal()
- Update find_lexical_global_var() to use JSValue
- Update find_lexical_decl() to use JSValue and js_key_equal()
- Update js_define_var() to use JSValue
- Update js_parse_check_duplicate_parameter() to use JSValue and js_key_equal()
- Update js_parse_destructuring_var() to return JSValue
- Update js_parse_var() to use JSValue for variable names
Comparison Helpers:
- Create js_key_equal_str() for comparing JSValue with C string literals
- Update is_var_in_arg_scope() to use js_key_equal/js_key_equal_str
- Update has_with_scope() to use js_key_equal_str
- Update closure variable comparisons (cv->var_name) to use js_key_equal_str
Property Access:
- Fix JS_GetPropertyStr to create proper JSValue keys
- Fix JS_SetPropertyInternal callers to use JS_KEY_* instead of JS_ATOM_*
JS_KEY_* Macros Added
Compile-time immediate ASCII string constants (≤7 chars):
JS_KEY_empty, JS_KEY_name, JS_KEY_message, JS_KEY_stack,
JS_KEY_errors, JS_KEY_Error, JS_KEY_cause, JS_KEY_length,
JS_KEY_value, JS_KEY_get, JS_KEY_set, JS_KEY_raw,
JS_KEY_flags, JS_KEY_source, JS_KEY_exec, JS_KEY_toJSON,
JS_KEY_eval, JS_KEY_this, JS_KEY_true, JS_KEY_false,
JS_KEY_null, JS_KEY_NaN, JS_KEY_default, JS_KEY_index,
JS_KEY_input, JS_KEY_groups, JS_KEY_indices, JS_KEY_let,
JS_KEY_var, JS_KEY_new, JS_KEY_of, JS_KEY_yield,
JS_KEY_async, JS_KEY_target, JS_KEY_from, JS_KEY_meta,
JS_KEY_as, JS_KEY_with
Runtime macro for strings >7 chars:
#define JS_KEY_STR(ctx, str) JS_NewStringLen((ctx), (str), sizeof(str) - 1)
Helper function for comparing JSValue with C string literals:
static JS_BOOL js_key_equal_str(JSValue a, const char *str);
Remaining Work
5.3 Update js_parse_property_name() ✓
- Change return type from JSAtom* to JSValue*
- Update all callers (js_parse_object_literal, etc.)
- Updated get_lvalue(), put_lvalue(), js_parse_destructuring_element()
5.4 Replace remaining emit_atom() calls with emit_key() ✓
- Removed emit_atom wrapper function
- Changed last emit_atom(JS_ATOM_this) to emit_key(JS_KEY_this)
5.5 Update Variable Opcode Format in quickjs-opcode.h
- Change
atomformat opcodes tokeyformat - Change
atom_u8andatom_u16tokey_u8andkey_u16
5.6 Update VM Opcode Handlers ✓
These now read cpool indices and look up JSValue:
- OP_check_var, OP_get_var_undef, OP_get_var
- OP_put_var, OP_put_var_init, OP_put_var_strict
- OP_set_name, OP_make_var_ref, OP_delete_var
- OP_define_var, OP_define_func, OP_throw_error
- OP_make_loc_ref, OP_make_arg_ref
- OP_define_method, OP_define_method_computed
5.7 Update resolve_scope_var() ✓
- Changed signature to use JSValue var_name
- Updated all comparisons to use js_key_equal()/js_key_equal_str()
- Updated var_object_test() to use JSValue
- Updated optimize_scope_make_global_ref() to use JSValue
- Updated resolve_variables() callers to read from cpool
5.8 Convert Remaining JS_ATOM_* Usages
Categories remaining:
- Some debug/print functions still use JSAtom
- Some function signatures not yet converted
- Will be addressed in Phase 7 cleanup
Phase 6: Update Bytecode Serialization ✓
6.1 JS_WriteObjectTag Changes ✓
- Changed JS_WriteObjectTag to use bc_put_key() directly for property keys
- Removed JS_ValueToAtom/bc_put_atom path (was broken anyway)
- cpool values serialized via JS_WriteObjectRec()
6.2 JS_ReadObject Changes ✓
- Changed JS_ReadObjectTag to use bc_get_key() for property keys
- Uses JS_SetPropertyInternal with JSValue keys
6.3 Opcode Format Updates ✓
- Added OP_FMT_key_u8, OP_FMT_key_u16, OP_FMT_key_label_u16 formats
- Updated variable opcodes to use key formats instead of atom formats
- Updated bc_byte_swap() to handle new key formats
- Updated JS_WriteFunctionBytecode() to skip key format opcodes
- Updated JS_ReadFunctionBytecode() to skip key format opcodes
6.4 Version Bump ✓
- Incremented BC_VERSION from 5 to 6
Phase 7: Final Cleanup ✓
7.1 Additional Parser/Compiler Fixes ✓
- Fixed TOK_IDENT case to use JSValue name, JS_DupValue, emit_key
- Fixed TOK_TRY catch clause to use JSValue name
- Fixed js_parse_statement_or_decl label_name to use JSValue
- Fixed OP_scope_get_var "eval" check to use js_key_equal_str
- Fixed js_parse_delete to use JSValue for name comparison
- Fixed JSON parsing to use js_key_from_string for property names
- Added js_key_from_string() helper function
- Added JS_KEY__ret_, JS_KEY__eval_, JS_KEY__var_ for internal names
- Updated add_closure_var, get_closure_var2, get_closure_var to use JSValue var_name
- Updated set_closure_from_var to use JS_DupValue
- Updated add_closure_variables to use JS_DupValue
- Updated instantiate_hoisted_definitions to use fd_cpool_add for keys
- Updated resolve_variables to use js_key_equal and fd_cpool_add
7.1.1 Property Access and Runtime Fixes ✓
- Fixed JS_GetPropertyValue to use js_key_from_string instead of JS_ValueToAtom
- Fixed JS_GetPropertyKey to use js_key_from_string for string keys
- Fixed JS_SetPropertyKey to use js_key_from_string for string keys
- Fixed JS_HasPropertyKey to use js_key_from_string for string keys
- Fixed JS_DeletePropertyKey to use js_key_from_string for string keys
- Updated JS_HasProperty signature to take JSValue prop
- Fixed OP_get_ref_value handler to use JSValue key
- Fixed OP_put_ref_value handler to use JSValue key
- Updated free_func_def to use JS_FreeValue for JSValue fields
7.2 Remove JSAtom Type and Functions ✓
- Removed most JS_ATOM_* constants (kept JS_ATOM_NULL, JS_ATOM_END for BC compat)
- JS_NewAtomString now returns JSValue using js_key_new
- JS_FreeAtom, JS_DupAtom are stubs (no-op for backward compat)
- JS_AtomToValue, JS_ValueToAtom are stubs (minimal BC compat)
- Replaced JS_ATOM_* usages with JS_KEY_* or JS_GetPropertyStr
7.3 Additional Runtime Fixes ✓
- Fixed free_function_bytecode to use JS_FreeValueRT for JSValue fields
- Fixed JS_SetPropertyFunctionList to use JSValue keys via find_key()
- Fixed JS_InstantiateFunctionListItem to use JSValue keys
- Fixed internalize_json_property to use JSValue names
- Fixed emit_break and push_break_entry to use JSValue label_name
- Implemented JS_Invoke to use JSValue method key
7.4 Remaining Stubs (kept for bytecode backward compatibility)
- JSAtom typedef (uint32_t) - used in BC serialization
- JS_ATOM_NULL, JS_ATOM_END - bytecode format markers
- JS_FreeAtom, JS_DupAtom - no-op stubs
- JS_FreeAtomRT, JS_DupAtomRT - no-op stubs
- Legacy BC reader (idx_to_atom array) - for reading old bytecode
Current Build Status
Build: SUCCEEDS with warnings (unused variables, labels)
Statistics:
- JS_ATOM_* usages: Minimal (only BC serialization compat)
- Property access uses JS_KEY_* macros or JS_GetPropertyStr
- BC_VERSION: 6 (updated for new key-based format)
What Works:
- All property access via JSValue keys
- Keyword detection via string comparison
- Function declaration parsing with JSValue names
- Variable definition with JSValue names
- Closure variable tracking with JSValue names
- VM opcode handlers read cpool indices and look up JSValue
- resolve_scope_var() uses JSValue throughout
- js_parse_property_name() returns JSValue
- Bytecode serialization uses bc_put_key/bc_get_key for property keys
- Variable opcodes use key format (cpool indices)
- JSON parsing uses JSValue keys
- Internal variable names use JS_KEY__ret_, JS_KEY__eval_, JS_KEY__var_
- JS_SetPropertyFunctionList uses JSValue keys
- JS_Invoke uses JSValue method keys
- break/continue labels use JSValue
Phase 8: Migrate to New Tagging System (IN PROGRESS)
Problem: JS_VALUE_GET_TAG returns JS_TAG_PTR for all pointers, but ~200 places check for obsolete tags like JS_TAG_OBJECT, JS_TAG_STRING, JS_TAG_FUNCTION, etc., which are never returned. This causes crashes.
Target Design (from memory.md):
- JSValue tags: Only
JS_TAG_INT,JS_TAG_PTR,JS_TAG_SHORT_FLOAT,JS_TAG_SPECIAL - Pointer types determined by
objhdr_theader at offset 8 in heap objects - mist_obj_type:
OBJ_ARRAY(0),OBJ_BLOB(1),OBJ_TEXT(2),OBJ_RECORD(3),OBJ_FUNCTION(4), etc.
8.1 Unified Heap Object Layout ✓
- Updated mist_text structure to have objhdr_t at offset 8:
typedef struct mist_text { JSRefCountHeader _dummy_header; /* unused, for offset alignment */ uint32_t _pad; /* padding to align objhdr_t to offset 8 */ objhdr_t hdr; /* NOW at offset 8, like JSString */ word_t length; word_t packed[]; } mist_text; - JSString already has objhdr_t at offset 8
8.2 Type-Checking Helper Functions ✓
Added lowercase internal helpers (to avoid conflict with quickjs.h declarations):
static inline JS_BOOL js_is_gc_object(JSValue v) { return JS_IsPtr(v); }
static inline JSGCObjectTypeEnum js_get_gc_type(JSValue v) {
return ((JSGCObjectHeader *)JS_VALUE_GET_PTR(v))->gc_obj_type;
}
static inline JS_BOOL js_is_record(JSValue v) {
if (!JS_IsPtr(v)) return FALSE;
return js_get_gc_type(v) == JS_GC_OBJ_TYPE_RECORD;
}
static inline JS_BOOL js_is_array(JSValue v) {
if (!JS_IsPtr(v)) return FALSE;
return js_get_gc_type(v) == JS_GC_OBJ_TYPE_ARRAY;
}
static inline JS_BOOL js_is_function(JSValue v) {
if (!JS_IsPtr(v)) return FALSE;
return js_get_gc_type(v) == JS_GC_OBJ_TYPE_FUNCTION;
}
static inline JS_BOOL js_is_object(JSValue v) {
if (!JS_IsPtr(v)) return FALSE;
JSGCObjectTypeEnum t = js_get_gc_type(v);
return t == JS_GC_OBJ_TYPE_RECORD || t == JS_GC_OBJ_TYPE_ARRAY;
}
8.3 Updated Core Functions ✓
- Updated JS_IsString to read objhdr_t from offset 8
- Updated js_key_hash to read objhdr_t from offset 8
- Updated js_key_equal to read objhdr_t from offset 8
- Updated __JS_FreeValueRT to use objhdr_type for type dispatch
- Updated JS_MarkValue, JS_MarkValueEdgeEx for GC
- Added JS_SetPropertyValue function
- Changed quickjs.h JS_IsFunction/JS_IsObject from inline to extern declarations
8.4 Tag Check Migration (PARTIAL)
Updated some critical tag checks:
- Some JS_TAG_OBJECT checks → js_is_object() or js_is_record()
- Many more JS_TAG_OBJECT checks remain (~200 total)
- JS_TAG_FUNCTION checks → js_is_function()
- JS_TAG_STRING checks (some already use JS_IsString)
8.5 Remaining Work
- Fix ASAN memory corruption error (attempting free on address not malloc'd)
- Crash occurs in js_def_realloc during js_realloc_array
- Address is 112 bytes inside a JSFunctionDef allocation
- Complete remaining ~200 tag check migrations
- Add mist_hdr to JSFunction (optional, gc_obj_type already works)
- Remove obsolete tag definitions from quickjs.h:
- JS_TAG_STRING = -8
- JS_TAG_ARRAY = -6
- JS_TAG_FUNCTION = -5
- JS_TAG_FUNCTION_BYTECODE = -2
- JS_TAG_OBJECT = -1
Current Status
Build: SUCCEEDS with warnings
Runtime: CRASHES with ASAN error:
==16122==ERROR: AddressSanitizer: attempting free on address which was not malloc()-ed
The crash occurs during test execution in js_def_realloc called from js_realloc_array.
Root cause not yet identified - likely a pointer being passed to realloc that wasn't allocated with malloc.
Notes
- JSVarDef.var_name is JSValue
- JSClosureVar.var_name is JSValue
- JSGlobalVar.var_name is JSValue
- JSFunctionDef.func_name is JSValue
- BlockEnv.label_name is JSValue
- OP_get_field/put_field/define_field already use cpool index format
- JSRecord with open addressing is fully implemented
- js_key_hash and js_key_equal work with both immediate and heap text
- js_key_equal_str enables comparison with C string literals for internal names