# Cell actor scripting language

Cell is a Misty [https://mistysystem.com](https://mistysystem.com) implementation.

## Memory
Values are 32 bit for 32 bit builds and 64 bit for 64 bit builds.

### 32 bit value

LSB = 0
payload is a 31 bit signed int

LSB = 01
payload is a 30 bit pointer

LSB = 11
next 3 bits = special tag. 27 bits of payload.

### 64 bit value

LSB = 0
payload is a 32 bit signed int, using high 32 bits

LSB = 001
payload is a 61 bit pointer

LSB = 101
Short float: a 61 bit double, with 3 fewer exponent bits

LSB = 11
Special tag: next 3 bits. 5 bits total. 59 bits of payload. 8 total special tags.

Special tags:
1: Bool. Payload is 0 or 1.
2: null. Payload is 0.
3: exception.
4: string.
Immediate string. Next 3 low bits = length in bytes. Rest is string data. This allows for strings of up to 7 ASCII characters. Encoded in UTF-8.

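The 64 bit tagging scheme above can be sketched in C. This is only an illustration, not Cell's actual source: the names (`cell_value`, `value_kind`, `make_immediate_text`, and so on) are hypothetical. It assumes that pointers carry the tag 001 so they cannot collide with the 101 short float tag, that the low 32 bits of an integer value are zero, and that the first byte of an immediate string sits in the lowest data bits.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef uint64_t cell_value; /* hypothetical name for a 64 bit value */

enum cell_kind { KIND_INT, KIND_POINTER, KIND_SHORT_FLOAT, KIND_SPECIAL };
enum { TAG_BOOL = 1, TAG_NULL = 2, TAG_EXCEPTION = 3, TAG_STRING = 4 };

/* Classify a value by its low tag bits. */
static enum cell_kind value_kind(cell_value v) {
    if ((v & 1) == 0) return KIND_INT;       /* ...0 */
    if ((v & 3) == 3) return KIND_SPECIAL;   /* ..11 */
    return (v & 4) ? KIND_SHORT_FLOAT        /* .101 */
                   : KIND_POINTER;           /* .001 */
}

/* Integers use the high 32 bits (assumption: the low 32 bits are zero). */
static cell_value make_int(int32_t n) { return (cell_value)(uint32_t)n << 32; }
static int32_t int_of(cell_value v) { return (int32_t)(uint32_t)(v >> 32); }

/* Special values: 2 tag bits (11), 3 special tag bits, 59 payload bits. */
static cell_value make_special(unsigned tag, uint64_t payload) {
    assert(tag < 8 && payload < (1ULL << 59));
    return (payload << 5) | ((uint64_t)tag << 2) | 3;
}
static unsigned special_tag(cell_value v) { return (unsigned)(v >> 2) & 7; }
static uint64_t special_payload(cell_value v) { return v >> 5; }

/* Immediate string: 3 length bits, then up to 7 bytes of data. */
static cell_value make_immediate_text(const char *s) {
    size_t len = strlen(s);
    assert(len <= 7);
    uint64_t data = 0;
    for (size_t i = 0; i < len; i++)
        data |= (uint64_t)(unsigned char)s[i] << (8 * i);
    return make_special(TAG_STRING, (data << 3) | len);
}

/* Copy the bytes into out (8 bytes, NUL terminated) and return the length. */
static size_t immediate_text_get(cell_value v, char out[8]) {
    assert(value_kind(v) == KIND_SPECIAL && special_tag(v) == TAG_STRING);
    uint64_t payload = special_payload(v);
    size_t len = (size_t)(payload & 7);
    uint64_t data = payload >> 3;
    for (size_t i = 0; i < len; i++)
        out[i] = (char)((data >> (8 * i)) & 0xFF);
    out[len] = '\0';
    return len;
}
```

Because the whole string lives in the value itself, two immediate strings with the same contents compare equal as plain 64 bit integers.
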
## Numbers and math
Cell can be compiled with different levels of exactness for numeracy. Any number which cannot be represented exactly becomes "null". Any numeric operation which includes "null" results in "null".

Using short floats in a 64 bit system means you have doubles in the range of ±10^38, not the full range of double. If you create a number outside that range, it becomes null.

You can also compile a 64 bit system with full precision doubles, but this will use more memory and may be slower.

You can also compile a 64 bit system with 32 bit floats, stored the same way a 32 bit int is. Again, a number outside the 32 bit float range becomes null.

You can compile without floating point support at all; 32 bit ints are then used for fixed point calculations.

Or, you can compile using Dec64, which is a 64 bit decimal floating point format, for exact precision.

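As a rough illustration of the short float option, the sketch below packs a C `double` into a 61 bit payload by narrowing the exponent from 11 bits to 8 and keeping all 52 fraction bits. The exact bit layout is an assumption (sign, then exponent, then fraction, with the tag 101 in the low 3 bits), the function names are hypothetical, and treating subnormals, infinities, and NaN as unrepresentable is a choice made for this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Pack a double as a tagged short float. Returns false when the number
   does not fit, in which case the caller would produce null instead. */
static bool short_float_pack(double d, uint64_t *out) {
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);
    uint64_t sign = bits >> 63;
    int64_t exp11 = (int64_t)((bits >> 52) & 0x7FF);
    uint64_t frac = bits & ((1ULL << 52) - 1);
    uint64_t exp8 = 0;                        /* zero keeps exponent 0 */
    if (exp11 != 0 || frac != 0) {
        int64_t e = exp11 - 1023 + 127;       /* rebias to 8 bits */
        /* reject subnormals, infinities, NaN, and out of range exponents */
        if (exp11 == 0 || exp11 == 0x7FF || e < 1 || e > 254) return false;
        exp8 = (uint64_t)e;
    }
    uint64_t payload = (sign << 60) | (exp8 << 52) | frac;
    *out = (payload << 3) | 5;                /* tag bits 101 */
    return true;
}

/* Expand a tagged short float back into a double. */
static double short_float_unpack(uint64_t v) {
    assert((v & 7) == 5);
    uint64_t payload = v >> 3;
    uint64_t sign = payload >> 60;
    uint64_t exp8 = (payload >> 52) & 0xFF;
    uint64_t frac = payload & ((1ULL << 52) - 1);
    uint64_t exp11 = (exp8 == 0) ? 0 : exp8 + (1023 - 127);
    uint64_t bits = (sign << 63) | (exp11 << 52) | frac;
    double d;
    memcpy(&d, &bits, sizeof d);
    return d;
}
```

With an 8 bit exponent the largest finite magnitude is just under 2^128, about 3.4 × 10^38, which is consistent with the range quoted above. Precision is unchanged because no fraction bits are dropped.
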
## Objects
Objects are heap allocated and referenced by a pointer value. Each object is preceded by an object header that is one word long.

### 64 bit build
56 bits capacity
1 bit memory reclamation flag: notes that this object has already been moved
2 bit reserved (per object)
1 bit stone: notes that this object is immutable
3 bit type: notes the type of the object
1 bit fwd: notes that this object is a forward linkage

Last bit ..1:
The forward type indicates that the object (an array, blob, pretext, or record) has grown beyond its capacity and is now residing at a new address. The remaining 63 bits contain the address of the enlarged object. Forward linkages are cleaned up by the memory reclaimer.

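A minimal sketch of decoding the 64 bit header follows. It assumes the fields are listed from most significant to least significant, so capacity occupies the top 56 bits and fwd is bit 0, and that objects are 8 byte aligned so a forward linkage can store the new address with only its lowest bit replaced. The names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t cell_header; /* hypothetical name */

enum cell_type {
    TYPE_ARRAY = 0, TYPE_BLOB = 1, TYPE_TEXT = 2, TYPE_RECORD = 3,
    TYPE_FUNCTION = 4, TYPE_FRAME = 5, TYPE_CODE = 6, TYPE_C_OBJECT = 7
};

/* Assumed layout: capacity:56 | moved:1 | reserved:2 | stone:1 | type:3 | fwd:1 */
static bool     header_is_forward(cell_header h) { return (h & 1) != 0; }
static unsigned header_type(cell_header h)       { return (unsigned)(h >> 1) & 7; }
static bool     header_is_stone(cell_header h)   { return ((h >> 4) & 1) != 0; }
static bool     header_was_moved(cell_header h)  { return ((h >> 7) & 1) != 0; }
static uint64_t header_capacity(cell_header h)   { return h >> 8; }

/* Build the header of a fresh object that has not been moved. */
static cell_header header_make(unsigned type, bool stone, uint64_t capacity) {
    assert(type < 8 && capacity < (1ULL << 56));
    return (capacity << 8) | ((uint64_t)stone << 4) | ((uint64_t)type << 1);
}

/* Forward linkage: the address of the enlarged object plus the fwd bit. */
static cell_header header_make_forward(uintptr_t address) {
    assert((address & 7) == 0);  /* assumption: 8 byte aligned objects */
    return (cell_header)address | 1;
}
static uintptr_t header_forward_address(cell_header h) {
    assert(header_is_forward(h));
    return (uintptr_t)(h & ~(cell_header)1);
}
```
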
Type 7: C light C object

Header
Pointer

Capacity is an ID of a registered C type.
Pointer is a pointer to the opaque C object.

Type 0: Array
Header
Length
Element[]

Capacity is the number of elements the array can hold. Length is the number of elements in use. The number of words used by an array is capacity + 2.

Type 1: Blob
Header
Length
Bit[]
Capacity is the number of bits the blob can hold. Length is the number of bits in use. Bits follow, from [0] to [capacity - 1], with bit [0] in the most significant position of word 2, and bit [63] in the least significant position of word 2. The last word is zero filled, if necessary.

The number of words used is (capacity + 63) // 64 + 2

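The bit addressing described for blobs can be sketched as below, for a 64 bit build. The function names are hypothetical, and the caller is assumed to pass the capacity it decoded from the header.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Words occupied by a blob: header, length, then the packed bits. */
static uint64_t blob_words(uint64_t capacity) {
    return (capacity + 63) / 64 + 2;
}

/* `blob` points at the header word, so bit data starts at word 2.
   Bit 0 is the most significant bit of that word. */
static bool blob_get(const uint64_t *blob, uint64_t capacity, uint64_t i) {
    assert(i < capacity);
    return ((blob[2 + i / 64] >> (63 - i % 64)) & 1) != 0;
}

static void blob_set(uint64_t *blob, uint64_t capacity, uint64_t i, bool bit) {
    assert(i < capacity);
    uint64_t mask = 1ULL << (63 - i % 64);
    if (bit)
        blob[2 + i / 64] |= mask;
    else
        blob[2 + i / 64] &= ~mask;
}
```

Storing bit 0 in the most significant position means the bits read left to right in the same order as a hexadecimal dump of the words.
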
Type 2: Text
Text has two forms, depending on whether it is stone or not, which changes the meaning of its length word.

Header
Length(pretext) or Hash(text)
Character[0] and character[1]

The capacity of a pretext is the number of characters it can hold. During stoning and reclamation, capacity is set to the length.

The capacity of a text is its length.

The length of a pretext is the number of characters it contains; it is not greater than the capacity.

The hash of a text is used for organizing records. If the hash is zero, it has not been computed yet. All texts in the immutable memory have hashes.

A text object contains UTF32 characters, packed two per word. If the number of characters is odd, the least significant half of the last word is zero filled.

The number of words used by a text is (capacity + 1) // 2 + 2

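The two-per-word packing can be illustrated as follows for a 64 bit build. Placing even-indexed characters in the most significant half is inferred from the rule that the least significant half of the last word is zero filled for odd lengths; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Words occupied by a text: header, length or hash, then the characters. */
static uint64_t text_words(uint64_t capacity) {
    return (capacity + 1) / 2 + 2;
}

/* `text` points at the header word; characters start at word 2. */
static uint32_t text_char_at(const uint64_t *text, uint64_t length, uint64_t i) {
    assert(i < length);
    uint64_t word = text[2 + i / 2];
    return (i % 2 == 0) ? (uint32_t)(word >> 32) : (uint32_t)word;
}

/* Only meaningful for a pretext, since a stone text is immutable. */
static void text_char_put(uint64_t *text, uint64_t capacity, uint64_t i, uint32_t c) {
    assert(i < capacity);
    uint64_t *word = &text[2 + i / 2];
    if (i % 2 == 0)
        *word = (*word & 0xFFFFFFFFULL) | ((uint64_t)c << 32);
    else
        *word = (*word & ~0xFFFFFFFFULL) | c;
}
```
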
Type 3: Record

A record is an array of fields represented as key/value pairs. Fields are located by hashes of texts, using open addressing with linear probing and lazy deletion. The load factor is less than 0.5.

Header
Prototype
Length
Key[0]
Value[0]
Key[1]
Value[1]
...

The capacity is the number of fields the record can hold. It is a power of two minus one. It is at least twice the length.

The length is the number of fields that the record currently contains.

A field candidate number is identified by and(key.hash, capacity). In case of hash collision, advance to the next field. If this goes past the end, continue with field 1. Field 0 is reserved.

The "exception" special tag is used to mark deleted entries in the object map.

The number of words used by a record is (capacity + 1) * 2.

If a property cannot be found on the record itself, its prototype is searched. Prototypes can have prototypes.

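The probing rules above can be sketched on a simplified model in which keys and values are plain 64 bit words. Several details are assumptions made for the sketch: an unused slot holds key 0, a candidate number of 0 is bumped to field 1 because field 0 is reserved, and the deleted marker is the "exception" special tag with a zero payload. The names are hypothetical, and length bookkeeping and prototype lookup are left out.

```c
#include <assert.h>
#include <stdint.h>

#define KEY_EMPTY   ((uint64_t)0)                /* assumption: unused slot */
#define KEY_DELETED ((uint64_t)((3u << 2) | 3u)) /* exception tag, payload 0 */

struct field { uint64_t key, value; };

static uint64_t next_field(uint64_t i, uint64_t capacity) {
    return (i == capacity) ? 1 : i + 1;          /* wrap to field 1, not 0 */
}

/* Return the field number holding `key`, or 0 (the reserved field) if absent.
   Deleted slots match neither test below, so probing continues past them. */
static uint64_t record_find(const struct field *f, uint64_t capacity,
                            uint64_t key, uint64_t hash) {
    uint64_t i = hash & capacity;
    if (i == 0) i = 1;
    for (uint64_t n = 0; n < capacity; n++, i = next_field(i, capacity)) {
        if (f[i].key == key) return i;
        if (f[i].key == KEY_EMPTY) return 0;     /* end of the probe chain */
    }
    return 0;
}

/* Insert or update; reuses the first deleted slot seen along the chain. */
static uint64_t record_insert(struct field *f, uint64_t capacity,
                              uint64_t key, uint64_t hash, uint64_t value) {
    assert(key != KEY_EMPTY && key != KEY_DELETED);
    uint64_t i = hash & capacity, free_slot = 0;
    if (i == 0) i = 1;
    for (uint64_t n = 0; n < capacity; n++, i = next_field(i, capacity)) {
        if (f[i].key == key) { f[i].value = value; return i; }
        if (f[i].key == KEY_DELETED && free_slot == 0) free_slot = i;
        if (f[i].key == KEY_EMPTY) {
            if (free_slot == 0) free_slot = i;
            break;
        }
    }
    assert(free_slot != 0);                      /* load factor keeps room */
    f[free_slot].key = key;
    f[free_slot].value = value;
    return free_slot;
}

/* Lazy deletion: leave a marker so later keys in the chain stay reachable. */
static void record_delete(struct field *f, uint64_t capacity,
                          uint64_t key, uint64_t hash) {
    uint64_t i = record_find(f, capacity, key, hash);
    if (i != 0) f[i].key = KEY_DELETED;
}
```

Because the capacity is a power of two minus one, `hash & capacity` acts as a cheap modulo, which is why the capacity is constrained to that form.
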
#### key[0] and value[0]
These are reserved for internal use, and skipped over during key probing.

The first 32 bits of key[0] are used as a 32 bit integer key, if this object has ever been used as a key itself.

The last 32 bits are used as an opaque C class key. C types can be registered with the system, and each is assigned a monotonically increasing number. If this object has a C type, the bottom 32 bits of key[0] are not 0, and a pointer to its C object is stored in value[0].

#### Valid keys & Hashing
Keys are stored directly in object maps. There are three possibilities for a valid key: an object text, an object record, or an immediate text.

In the case of an immediate text, the hash is computed on the fly using the fash64_hash_one function, before being used to look up the key in the object map. Direct value comparison is used to confirm the key.

For object texts (texts longer than 7 ASCII characters), the hash is stored in the text object itself. When an object text is used as a key, a stone version is created and interned. Any program static texts reference this stoned, interned text. When looking up a heap text as a key, the interned table is checked first. If the text is not there, the key is not in the object (since all keys are interned). If it is, the interned version is returned to check against the object map. The hash of the interned text is used to look up the key in the object map, and then direct pointer comparison is used to confirm the key.

Record keys are unique: once a record is used as a key, it is assigned a monotonically increasing 32 bit integer, stored in key[0]. When checking it in an object map, the integer is used directly as the key. If key[0] is 0, the record has not been used as a key yet. If it's not 0, fash64_hash_one is used to compute a hash of its ID, and then direct pointer comparison is used to confirm.

### Text interning
Texts that cannot fit in an immediate, and which are used as an object key, create a stoned and interned version (the pointer which is used as the key). Any text literals are also stoned and interned.

The interning table is an open addressed hash table with a load factor of 0.8, using Robin Hood hashing. Probing is done using the text hash; confirmation is done using the length, and then a memcmp of the text.

When the GC runs, a new interned text table is created. Each text literal, and each text used as a key, is added to the new table as the live objects are copied. This keeps the interning table from becoming a graveyard. Interned values are never deleted until a GC.

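A compact sketch of such a table is shown below, assuming the common form of Robin Hood hashing in which an entry being inserted displaces any resident entry that sits closer to its home slot. The structure and function names are hypothetical, the table size is assumed to be a power of two, and growth, hashing, and the GC rebuild are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One slot; chars == NULL marks an empty slot. */
struct interned { uint64_t hash; size_t length; const uint32_t *chars; };
struct intern_table { struct interned *slots; size_t size; size_t count; };

/* How far the entry in `slot` sits from the slot its hash maps to. */
static size_t probe_distance(const struct intern_table *t, uint64_t hash, size_t slot) {
    return (slot - (size_t)(hash & (t->size - 1))) & (t->size - 1);
}

/* Probe by hash, then confirm by length and memcmp. Returns the interned
   characters, or NULL if the text has not been interned. */
static const uint32_t *intern_find(const struct intern_table *t, uint64_t hash,
                                   const uint32_t *chars, size_t length) {
    size_t mask = t->size - 1, i = (size_t)hash & mask;
    for (size_t dist = 0; dist < t->size; dist++, i = (i + 1) & mask) {
        const struct interned *e = &t->slots[i];
        if (e->chars == NULL) return NULL;
        if (probe_distance(t, e->hash, i) < dist) return NULL; /* would have displaced it */
        if (e->hash == hash && e->length == length &&
            memcmp(e->chars, chars, length * sizeof *chars) == 0)
            return e->chars;
    }
    return NULL;
}

/* Insert a text that is known not to be in the table yet. */
static void intern_add(struct intern_table *t, uint64_t hash,
                       const uint32_t *chars, size_t length) {
    assert((t->count + 1) * 5 <= t->size * 4);   /* stay at or below 0.8 load */
    struct interned cur = { hash, length, chars };
    size_t mask = t->size - 1, i = (size_t)hash & mask;
    for (size_t dist = 0;; dist++, i = (i + 1) & mask) {
        struct interned *e = &t->slots[i];
        if (e->chars == NULL) { *e = cur; t->count++; return; }
        size_t theirs = probe_distance(t, e->hash, i);
        if (theirs < dist) {                     /* swap with the closer entry */
            struct interned tmp = *e;
            *e = cur;
            cur = tmp;
            dist = theirs;
        }
    }
}
```

The early exit in `intern_find` is what the Robin Hood ordering buys: a miss can stop as soon as it meets an entry that is closer to home than the probe has travelled.
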
Type 4: Function

Header
Code
Outer
A function object has zero capacity and is always stone.

Code is a pointer to the code object that the function executes.

Outer is a pointer to the frame that created this function object.

Size is 3 words.

Type 5: Frame

Header
Function
Caller
Return address

The activation frame is created when a function is invoked to hold its linkages and state.

The capacity is the number of slots, including the inputs, variables, temporaries, and the four words of overhead. A frame, unlike the other types, is never stone.

The function is the address of the function object being called.

The caller is the address of the frame that is invoking the function.

The return address is the address of the instruction in the code that should be executed upon return.

Next come the input arguments, if any.

Then the variables closed over by the inner functions.

Then the variables that are not closed over, followed by the temporaries.

When a function returns, the caller is set to zero. This is a signal to the memory reclaimer that the frame can be reduced.

Type 6: Code

Header
Arity
Size
Closure size
Entry point
Disruption point

A code object exists in the actor's immutable memory. A code object never exists in mutable memory.

A code object has a zero capacity and is always stone.

The arity is the maximum number of inputs.

The size is the capacity of an activation frame that will execute this code.

The closure size is a reduced capacity for returned frames that survive memory reclamation.

The entry point is the address at which to begin execution.

The disruption point is the address of the disruption clause.

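The relationship between function, frame, and code objects can be sketched with C structures. The struct and function names are hypothetical, the fields simply mirror the word layouts listed above for a 64 bit build, and the slot words that follow a frame's four overhead words are omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t word;
struct frame;

struct code {
    word header;            /* zero capacity, always stone */
    word arity;             /* maximum number of inputs */
    word size;              /* capacity of a frame running this code */
    word closure_size;      /* reduced capacity after the function returns */
    word entry_point;
    word disruption_point;
};

struct function {
    word header;
    const struct code *code;
    struct frame *outer;    /* frame that created this function */
};

struct frame {
    word header;
    struct function *function;
    struct frame *caller;   /* NULL once the function has returned */
    word return_address;
    /* inputs, closed-over variables, other variables, temporaries follow */
};

enum { FRAME_OVERHEAD_WORDS = 4 };

/* Slots available for inputs, variables, and temporaries. */
static word frame_slot_count(const struct code *c) {
    assert(c->size >= FRAME_OVERHEAD_WORDS);
    return c->size - FRAME_OVERHEAD_WORDS;
}

/* Returning clears the caller, which marks the frame as reducible. */
static void frame_return(struct frame *f) { f->caller = NULL; }

/* Capacity the reclaimer would give the frame when it copies it. */
static word frame_capacity_after_reclaim(const struct frame *f, const struct code *c) {
    return (f->caller == NULL) ? c->closure_size : c->size;
}
```

Placing the closed-over variables directly after the inputs is what makes the reduction cheap: a returned frame can be truncated to its closure size without moving any slot that an inner function still needs.
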
### Opaque C objects
Records can have opaque C data attached to them.

A C class can register a GC clean up function and a GC trace function. The trace function is called when the record is encountered in the live object graph, and it should mark any values it wants to keep alive.

The system maintains an array of live opaque C objects. When the GC encounters such an object, it marks it as alive in the array. When the GC completes, it iterates this array and calls the GC clean up function for each C object in the array with alive=0. Alive is then cleared for the next GC cycle.

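The alive-flag protocol can be sketched as below. All names are hypothetical and no real registration API is implied. Compacting the array so that only survivors remain is an assumption added for the sketch, and `counting_class` is a stand-in class whose clean up function only counts calls.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*trace_fn)(void *data);    /* mark values the object keeps alive */
typedef void (*cleanup_fn)(void *data);  /* release the C object */

struct c_class  { trace_fn trace; cleanup_fn cleanup; };
struct c_object { const struct c_class *cls; void *data; int alive; };

/* Called when the collector reaches a record that carries C data. */
static void gc_visit(struct c_object *o) {
    assert(o != NULL && o->cls != NULL);
    o->alive = 1;
    if (o->cls->trace) o->cls->trace(o->data);
}

/* Called when the collection completes. Cleans up every object that was
   not visited, resets the flag on the rest, and returns the new count. */
static size_t gc_finish(struct c_object *objects, size_t count) {
    size_t kept = 0;
    for (size_t i = 0; i < count; i++) {
        if (objects[i].alive) {
            objects[i].alive = 0;            /* ready for the next cycle */
            objects[kept++] = objects[i];
        } else if (objects[i].cls->cleanup) {
            objects[i].cls->cleanup(objects[i].data);
        }
    }
    return kept;
}

/* Example class: clean up only counts how many objects were released. */
static int released;
static void counting_cleanup(void *data) { (void)data; released++; }
static const struct c_class counting_class = { NULL, counting_cleanup };
```
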
### 32 bit build
~3 bit type
1 bit stone
1 bit memory reclamation flag
27 bit capacity

Key differences here are:

blob max capacity is 2**27 bits = 2**24 bytes = 16 MB [this likely needs to be addressed]

fwd is type ...0, and the pointer is 31 bits
other types are
111 array
101 object
011 blob
001

## Memory
Cell uses a single block of memory that it doles out as needed to the actors in its system.

Actors are given blocks of memory in standard sizes by a doubling buddy memory manager. An actor is given an immutable data section on birth, as well as a mutable data section. When its mutable data becomes full, it requests a new one. Actors use their mutable memory with simple bump allocation. If there is not sufficient memory available, the actor suspends and its status changes to exhausted.

The smallest block size is determined per platform, but it can be as small as 4KB on 64 bit systems.

The actor is then given a new block of memory of the same size, and it runs a garbage collector to reclaim memory, using the Cheney copying algorithm. If a disappointing amount of memory was reclaimed, it is noted, and the actor is given a larger block of memory on the next request.

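Bump allocation and the block sizing policy can be sketched as follows. The names are hypothetical, and the threshold for a "disappointing" collection (more than half of the block still live) is an assumption chosen only to make the example concrete; doubling follows from the buddy manager's standard sizes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* An actor's mutable block, measured in words. */
struct heap { uint64_t *base; size_t capacity; size_t used; };

/* Hand out the next `words` words, or NULL when the block is full.
   On NULL the actor would collect into a new block, or become exhausted. */
static uint64_t *bump_alloc(struct heap *h, size_t words) {
    assert(h->used <= h->capacity);
    if (words > h->capacity - h->used) return NULL;
    uint64_t *p = h->base + h->used;
    h->used += words;
    return p;
}

/* Size of the block to request for the next collection. */
static size_t next_block_words(size_t current, size_t live_after_gc) {
    /* assumption: reclaiming less than half counts as disappointing */
    return (live_after_gc > current / 2) ? current * 2 : current;
}
```

Allocation is a bounds check and an addition, which is the main attraction of pairing a bump allocator with a copying collector: the collector compacts live objects into the new block, so free space is always one contiguous run.
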