Files
cell/plan.md
2026-01-31 14:43:21 -06:00

14 KiB

Cell/QuickJS Refactoring Plan: Remove Atoms, Shapes, and Dual-Encoding

Overview

Refactor source/quickjs.c to match docs/memory.md specification:

  • Remove JSAtom system (171 references → ~41 remaining)
  • Remove JSShape system (94 references) ✓
  • Remove IC caches (shape-based inline caches) ✓
  • Remove is_wide_char dual-encoding (18 locations) ✓
  • Use JSValue texts directly as property keys
  • Reference: mquickjs.c shows the target pattern

Completed Phases

Phase 1: Remove is_wide_char Remnants ✓

Phase 2: Remove IC Caches ✓

Phase 3: Remove JSShape System ✓

Phase 4: Complete Property Access with JSValue Keys ✓

Completed:

  • Removed JS_GC_OBJ_TYPE_JS_OBJECT fallbacks from OP_get_field
  • Removed JS_GC_OBJ_TYPE_JS_OBJECT fallbacks from OP_put_field
  • Removed JS_GC_OBJ_TYPE_JS_OBJECT fallbacks from OP_define_field
  • Created emit_key() function that adds JSValue to cpool and emits index

Phase 5: Convert JSAtom to JSValue Text (IN PROGRESS)

This is the core transformation. All identifier handling moves from atoms to JSValue.

Completed Items

Token and Parser Infrastructure:

  • Change JSToken.u.ident.atom to JSToken.u.ident.str (JSValue)
  • Change parse_ident() to return JSValue
  • Create emit_key() function (cpool-based)
  • Create JS_KEY_* macros for common names (lines ~279-335 in quickjs.c)
  • Update all token.u.ident.atom references to .str
  • Create keyword lookup table (js_keywords[]) with string comparison
  • Rewrite update_token_ident() to use js_keyword_lookup()
  • Rewrite is_strict_future_keyword() to use JSValue
  • Update token_is_pseudo_keyword() to use JSValue and js_key_equal()

Function Declaration Parsing:

  • Update js_parse_function_decl() signature to use JSValue func_name
  • Update js_parse_function_decl2() to use JSValue func_name throughout
  • Update js_parse_function_check_names() to use JSValue
  • Convert JS_DupAtom/JS_FreeAtom to JS_DupValue/JS_FreeValue in function parsing

Variable Definition and Lookup:

  • Update find_global_var() to use JSValue and js_key_equal()
  • Update find_lexical_global_var() to use JSValue
  • Update find_lexical_decl() to use JSValue and js_key_equal()
  • Update js_define_var() to use JSValue
  • Update js_parse_check_duplicate_parameter() to use JSValue and js_key_equal()
  • Update js_parse_destructuring_var() to return JSValue
  • Update js_parse_var() to use JSValue for variable names

Comparison Helpers:

  • Create js_key_equal_str() for comparing JSValue with C string literals
  • Update is_var_in_arg_scope() to use js_key_equal/js_key_equal_str
  • Update has_with_scope() to use js_key_equal_str
  • Update closure variable comparisons (cv->var_name) to use js_key_equal_str

Property Access:

  • Fix JS_GetPropertyStr to create proper JSValue keys
  • Fix JS_SetPropertyInternal callers to use JS_KEY_* instead of JS_ATOM_*

JS_KEY_* Macros Added

Compile-time immediate ASCII string constants (≤7 chars):

JS_KEY_empty, JS_KEY_name, JS_KEY_message, JS_KEY_stack,
JS_KEY_errors, JS_KEY_Error, JS_KEY_cause, JS_KEY_length,
JS_KEY_value, JS_KEY_get, JS_KEY_set, JS_KEY_raw,
JS_KEY_flags, JS_KEY_source, JS_KEY_exec, JS_KEY_toJSON,
JS_KEY_eval, JS_KEY_this, JS_KEY_true, JS_KEY_false,
JS_KEY_null, JS_KEY_NaN, JS_KEY_default, JS_KEY_index,
JS_KEY_input, JS_KEY_groups, JS_KEY_indices, JS_KEY_let,
JS_KEY_var, JS_KEY_new, JS_KEY_of, JS_KEY_yield,
JS_KEY_async, JS_KEY_target, JS_KEY_from, JS_KEY_meta,
JS_KEY_as, JS_KEY_with

Runtime macro for strings >7 chars:

#define JS_KEY_STR(ctx, str) JS_NewStringLen((ctx), (str), sizeof(str) - 1)

Helper function for comparing JSValue with C string literals:

static JS_BOOL js_key_equal_str(JSValue a, const char *str);

Remaining Work

5.3 Update js_parse_property_name() ✓

  • Change return type from JSAtom* to JSValue*
  • Update all callers (js_parse_object_literal, etc.)
  • Updated get_lvalue(), put_lvalue(), js_parse_destructuring_element()

5.4 Replace remaining emit_atom() calls with emit_key() ✓

  • Removed emit_atom wrapper function
  • Changed last emit_atom(JS_ATOM_this) to emit_key(JS_KEY_this)

5.5 Update Variable Opcode Format in quickjs-opcode.h

  • Change atom format opcodes to key format
  • Change atom_u8 and atom_u16 to key_u8 and key_u16

5.6 Update VM Opcode Handlers ✓

These now read cpool indices and look up JSValue:

  • OP_check_var, OP_get_var_undef, OP_get_var
  • OP_put_var, OP_put_var_init, OP_put_var_strict
  • OP_set_name, OP_make_var_ref, OP_delete_var
  • OP_define_var, OP_define_func, OP_throw_error
  • OP_make_loc_ref, OP_make_arg_ref
  • OP_define_method, OP_define_method_computed

5.7 Update resolve_scope_var() ✓

  • Changed signature to use JSValue var_name
  • Updated all comparisons to use js_key_equal()/js_key_equal_str()
  • Updated var_object_test() to use JSValue
  • Updated optimize_scope_make_global_ref() to use JSValue
  • Updated resolve_variables() callers to read from cpool

5.8 Convert Remaining JS_ATOM_* Usages

Categories remaining:

  • Some debug/print functions still use JSAtom
  • Some function signatures not yet converted
  • Will be addressed in Phase 7 cleanup

Phase 6: Update Bytecode Serialization ✓

6.1 JS_WriteObjectTag Changes ✓

  • Changed JS_WriteObjectTag to use bc_put_key() directly for property keys
  • Removed JS_ValueToAtom/bc_put_atom path (was broken anyway)
  • cpool values serialized via JS_WriteObjectRec()

6.2 JS_ReadObject Changes ✓

  • Changed JS_ReadObjectTag to use bc_get_key() for property keys
  • Uses JS_SetPropertyInternal with JSValue keys

6.3 Opcode Format Updates ✓

  • Added OP_FMT_key_u8, OP_FMT_key_u16, OP_FMT_key_label_u16 formats
  • Updated variable opcodes to use key formats instead of atom formats
  • Updated bc_byte_swap() to handle new key formats
  • Updated JS_WriteFunctionBytecode() to skip key format opcodes
  • Updated JS_ReadFunctionBytecode() to skip key format opcodes

6.4 Version Bump ✓

  • Incremented BC_VERSION from 5 to 6

Phase 7: Final Cleanup ✓

7.1 Additional Parser/Compiler Fixes ✓

  • Fixed TOK_IDENT case to use JSValue name, JS_DupValue, emit_key
  • Fixed TOK_TRY catch clause to use JSValue name
  • Fixed js_parse_statement_or_decl label_name to use JSValue
  • Fixed OP_scope_get_var "eval" check to use js_key_equal_str
  • Fixed js_parse_delete to use JSValue for name comparison
  • Fixed JSON parsing to use js_key_from_string for property names
  • Added js_key_from_string() helper function
  • Added JS_KEY__ret_, JS_KEY__eval_, JS_KEY__var_ for internal names
  • Updated add_closure_var, get_closure_var2, get_closure_var to use JSValue var_name
  • Updated set_closure_from_var to use JS_DupValue
  • Updated add_closure_variables to use JS_DupValue
  • Updated instantiate_hoisted_definitions to use fd_cpool_add for keys
  • Updated resolve_variables to use js_key_equal and fd_cpool_add

7.1.1 Property Access and Runtime Fixes ✓

  • Fixed JS_GetPropertyValue to use js_key_from_string instead of JS_ValueToAtom
  • Fixed JS_GetPropertyKey to use js_key_from_string for string keys
  • Fixed JS_SetPropertyKey to use js_key_from_string for string keys
  • Fixed JS_HasPropertyKey to use js_key_from_string for string keys
  • Fixed JS_DeletePropertyKey to use js_key_from_string for string keys
  • Updated JS_HasProperty signature to take JSValue prop
  • Fixed OP_get_ref_value handler to use JSValue key
  • Fixed OP_put_ref_value handler to use JSValue key
  • Updated free_func_def to use JS_FreeValue for JSValue fields

7.2 Remove JSAtom Type and Functions ✓

  • Removed most JS_ATOM_* constants (kept JS_ATOM_NULL, JS_ATOM_END for BC compat)
  • JS_NewAtomString now returns JSValue using js_key_new
  • JS_FreeAtom, JS_DupAtom are stubs (no-op for backward compat)
  • JS_AtomToValue, JS_ValueToAtom are stubs (minimal BC compat)
  • Replaced JS_ATOM_* usages with JS_KEY_* or JS_GetPropertyStr

7.3 Additional Runtime Fixes ✓

  • Fixed free_function_bytecode to use JS_FreeValueRT for JSValue fields
  • Fixed JS_SetPropertyFunctionList to use JSValue keys via find_key()
  • Fixed JS_InstantiateFunctionListItem to use JSValue keys
  • Fixed internalize_json_property to use JSValue names
  • Fixed emit_break and push_break_entry to use JSValue label_name
  • Implemented JS_Invoke to use JSValue method key

7.4 Remaining Stubs (kept for bytecode backward compatibility)

  • JSAtom typedef (uint32_t) - used in BC serialization
  • JS_ATOM_NULL, JS_ATOM_END - bytecode format markers
  • JS_FreeAtom, JS_DupAtom - no-op stubs
  • JS_FreeAtomRT, JS_DupAtomRT - no-op stubs
  • Legacy BC reader (idx_to_atom array) - for reading old bytecode

Current Build Status

Build: SUCCEEDS with warnings (unused variables, labels)

Statistics:

  • JS_ATOM_* usages: Minimal (only BC serialization compat)
  • Property access uses JS_KEY_* macros or JS_GetPropertyStr
  • BC_VERSION: 6 (updated for new key-based format)

What Works:

  • All property access via JSValue keys
  • Keyword detection via string comparison
  • Function declaration parsing with JSValue names
  • Variable definition with JSValue names
  • Closure variable tracking with JSValue names
  • VM opcode handlers read cpool indices and look up JSValue
  • resolve_scope_var() uses JSValue throughout
  • js_parse_property_name() returns JSValue
  • Bytecode serialization uses bc_put_key/bc_get_key for property keys
  • Variable opcodes use key format (cpool indices)
  • JSON parsing uses JSValue keys
  • Internal variable names use JS_KEY__ret_, JS_KEY__eval_, JS_KEY__var_
  • JS_SetPropertyFunctionList uses JSValue keys
  • JS_Invoke uses JSValue method keys
  • break/continue labels use JSValue

Phase 8: Migrate to New Tagging System (IN PROGRESS)

Problem: JS_VALUE_GET_TAG returns JS_TAG_PTR for all pointers, but ~200 places check for obsolete tags like JS_TAG_OBJECT, JS_TAG_STRING, JS_TAG_FUNCTION, etc., which are never returned. This causes crashes.

Target Design (from memory.md):

  • JSValue tags: Only JS_TAG_INT, JS_TAG_PTR, JS_TAG_SHORT_FLOAT, JS_TAG_SPECIAL
  • Pointer types determined by objhdr_t header at offset 8 in heap objects
  • mist_obj_type: OBJ_ARRAY(0), OBJ_BLOB(1), OBJ_TEXT(2), OBJ_RECORD(3), OBJ_FUNCTION(4), etc.

8.1 Unified Heap Object Layout ✓

  • Updated mist_text structure to have objhdr_t at offset 8:
    typedef struct mist_text {
      JSRefCountHeader _dummy_header; /* unused, for offset alignment */
      uint32_t _pad;                  /* padding to align objhdr_t to offset 8 */
      objhdr_t hdr;                   /* NOW at offset 8, like JSString */
      word_t length;
      word_t packed[];
    } mist_text;
    
  • JSString already has objhdr_t at offset 8

8.2 Type-Checking Helper Functions ✓

Added lowercase internal helpers (to avoid conflict with quickjs.h declarations):

static inline JS_BOOL js_is_gc_object(JSValue v) { return JS_IsPtr(v); }
static inline JSGCObjectTypeEnum js_get_gc_type(JSValue v) {
  return ((JSGCObjectHeader *)JS_VALUE_GET_PTR(v))->gc_obj_type;
}
static inline JS_BOOL js_is_record(JSValue v) {
  if (!JS_IsPtr(v)) return FALSE;
  return js_get_gc_type(v) == JS_GC_OBJ_TYPE_RECORD;
}
static inline JS_BOOL js_is_array(JSValue v) {
  if (!JS_IsPtr(v)) return FALSE;
  return js_get_gc_type(v) == JS_GC_OBJ_TYPE_ARRAY;
}
static inline JS_BOOL js_is_function(JSValue v) {
  if (!JS_IsPtr(v)) return FALSE;
  return js_get_gc_type(v) == JS_GC_OBJ_TYPE_FUNCTION;
}
static inline JS_BOOL js_is_object(JSValue v) {
  if (!JS_IsPtr(v)) return FALSE;
  JSGCObjectTypeEnum t = js_get_gc_type(v);
  return t == JS_GC_OBJ_TYPE_RECORD || t == JS_GC_OBJ_TYPE_ARRAY;
}

8.3 Updated Core Functions ✓

  • Updated JS_IsString to read objhdr_t from offset 8
  • Updated js_key_hash to read objhdr_t from offset 8
  • Updated js_key_equal to read objhdr_t from offset 8
  • Updated __JS_FreeValueRT to use objhdr_type for type dispatch
  • Updated JS_MarkValue, JS_MarkValueEdgeEx for GC
  • Added JS_SetPropertyValue function
  • Changed quickjs.h JS_IsFunction/JS_IsObject from inline to extern declarations

8.4 Tag Check Migration (PARTIAL)

Updated some critical tag checks:

  • Some JS_TAG_OBJECT checks → js_is_object() or js_is_record()
  • Many more JS_TAG_OBJECT checks remain (~200 total)
  • JS_TAG_FUNCTION checks → js_is_function()
  • JS_TAG_STRING checks (some already use JS_IsString)

8.5 Remaining Work

  • Fix ASAN memory corruption error (attempting free on address not malloc'd)
    • Crash occurs in js_def_realloc during js_realloc_array
    • Address is 112 bytes inside a JSFunctionDef allocation
  • Complete remaining ~200 tag check migrations
  • Add mist_hdr to JSFunction (optional, gc_obj_type already works)
  • Remove obsolete tag definitions from quickjs.h:
    • JS_TAG_STRING = -8
    • JS_TAG_ARRAY = -6
    • JS_TAG_FUNCTION = -5
    • JS_TAG_FUNCTION_BYTECODE = -2
    • JS_TAG_OBJECT = -1

Current Status

Build: SUCCEEDS with warnings

Runtime: CRASHES with ASAN error:

==16122==ERROR: AddressSanitizer: attempting free on address which was not malloc()-ed

The crash occurs during test execution in js_def_realloc called from js_realloc_array. Root cause not yet identified - likely a pointer being passed to realloc that wasn't allocated with malloc.


Notes

  • JSVarDef.var_name is JSValue
  • JSClosureVar.var_name is JSValue
  • JSGlobalVar.var_name is JSValue
  • JSFunctionDef.func_name is JSValue
  • BlockEnv.label_name is JSValue
  • OP_get_field/put_field/define_field already use cpool index format
  • JSRecord with open addressing is fully implemented
  • js_key_hash and js_key_equal work with both immediate and heap text
  • js_key_equal_str enables comparison with C string literals for internal names