m (→Design: fix typos)
(One intermediate revision by one other user not shown)
Line 12:
# Direct strings, where the character data immediately follows the instance data. Memory is allocated via low-level calls to GC-Alloc() with an in-place constructor call.
# Static strings, where the character data is kept elsewhere. This data must be guaranteed to exist longer than the String instance itself. C character constants are good candidates. ABC data is also a good candidate, as long as unloading the ABC data does not invalidate the data in these strings.
# Dependent strings, where a DRC'ed pointer keeps a reference to the master string, and the string contains a pointer to the start of the character data and a length count.
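The three storage kinds above can be illustrated with a minimal sketch. All names here (<tt>Str</tt>, <tt>Kind</tt>, <tt>makeDirect</tt>, <tt>makeStatic</tt>, <tt>makeDependent</tt>) are hypothetical and are not the actual String class. <tt>std::malloc</tt> stands in for the GC allocator, and a plain pointer stands in for the DRC'ed reference to the master string.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <new>

// Hypothetical names throughout; this is not the real String class.
enum class Kind { Direct, Static, Dependent };

struct Str {
    Kind        kind;
    std::size_t length;
    const char* chars;   // start of the character data
    const Str*  master;  // set only for dependent strings (DRC'ed in the real design)
};

// Direct: a single allocation holds the instance and, right behind it,
// the character data. std::malloc is an assumption standing in for the
// GC allocator; placement new plays the role of the in-place constructor.
Str* makeDirect(const char* src, std::size_t len) {
    void* mem = std::malloc(sizeof(Str) + len);
    if (mem == nullptr) return nullptr;  // allocation failed: hand back NULL
    char* data = static_cast<char*>(mem) + sizeof(Str);
    std::memcpy(data, src, len);
    return new (mem) Str{Kind::Direct, len, data, nullptr};
}

// Static: the characters are kept elsewhere and must outlive the instance,
// which is why a C string constant is a good candidate.
Str makeStatic(const char* literal) {
    return Str{Kind::Static, std::strlen(literal), literal, nullptr};
}

// Dependent: a pointer into the master's character data plus a length count.
Str makeDependent(const Str& master, std::size_t start, std::size_t len) {
    assert(start + len <= master.length);
    return Str{Kind::Dependent, len, master.chars + start, &master};
}
```

In this sketch a dependent string copies no characters, so it stays valid only while its master does. That is the reason the design keeps a counted reference to the master.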
Out-of-memory conditions will be handled by the allocator in a future version. The String class checks for NULL and returns NULL for new strings whose allocation failed.
=== UTF-8, UTF-16 and UTF-32 ===
Line 224:
* JS_GetStringChars() returns a pointer to UTF-16 characters, and JS_GetStringBytes() returns a pointer to UTF-8 characters. Both buffers are guaranteed to live as long as the string instance lives. SM maintains a separate cache for this purpose, where string buffers are garbage-collected. Other encodings may be requested as well.
=== StUTF8String ===
This TT helper class was used to wrap a String instance (which contained UTF-8 data) into a class providing direct access to the string buffer. The new String code offers a stack-based <tt>StUTF8String</tt> that contains UTF-8 data and provides access to that data. The pcre code needs this class and another class, <tt>StIndexableUTF8String</tt>, since pcre is UTF-8 based. This leads to a performance slowdown that could be avoided by using a regular expression parser that works with UTF-16 data.
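To make the cost of such a helper concrete, here is a minimal sketch of a stack-based UTF-8 wrapper. The class name <tt>Utf8View</tt> and its methods are hypothetical and are not the actual <tt>StUTF8String</tt> interface. It also assumes the source string holds UTF-16 data, represented here by <tt>std::u16string</tt>. Every construction transcodes the whole string, which is the overhead the paragraph above attributes to a UTF-8 based regular expression engine.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for a stack-based UTF-8 helper; not the real StUTF8String.
class Utf8View {
public:
    // Transcodes the UTF-16 source once, at construction time.
    explicit Utf8View(const std::u16string& src) {
        for (std::size_t i = 0; i < src.size(); ++i) {
            char32_t cp = src[i];
            if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < src.size() &&
                src[i + 1] >= 0xDC00 && src[i + 1] <= 0xDFFF) {
                // Valid surrogate pair: combine into one code point.
                cp = 0x10000 + ((cp - 0xD800) << 10) + (src[i + 1] - 0xDC00);
                ++i;
            } else if (cp >= 0xD800 && cp <= 0xDFFF) {
                cp = 0xFFFD;  // lone surrogate: substitute U+FFFD
            }
            append(cp);
        }
    }

    // The buffer lives exactly as long as this stack object.
    const char* c_str() const { return buf_.c_str(); }
    std::size_t length() const { return buf_.size(); }

private:
    // Encodes one code point as 1 to 4 UTF-8 bytes.
    void append(char32_t cp) {
        assert(cp <= 0x10FFFF);
        if (cp < 0x80) {
            buf_ += static_cast<char>(cp);
        } else if (cp < 0x800) {
            buf_ += static_cast<char>(0xC0 | (cp >> 6));
            buf_ += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            buf_ += static_cast<char>(0xE0 | (cp >> 12));
            buf_ += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            buf_ += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            buf_ += static_cast<char>(0xF0 | (cp >> 18));
            buf_ += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            buf_ += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            buf_ += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }

    std::string buf_;
};
```

A caller would declare a <tt>Utf8View</tt> as a local variable, pass <tt>c_str()</tt> to the UTF-8 consumer, and let the buffer be released when the scope ends.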