Tamarin:String implementation

# Direct strings, where the character data immediately follows the instance data. Memory is allocated via low-level calls to GC-Alloc() with an in-place constructor call.
# Static strings, where the character data is kept elsewhere. This data must be guaranteed to exist longer than the String instance itself. C character constants are good candidates. ABC data is also a good candidate, as long as unloading the ABC data does not invalidate the character data of these strings.
# Dependent strings, where a DRC'ed pointer keeps a reference to the master string, and the string contains a pointer to the start of the character data, and a length count.
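
The three storage kinds can be illustrated with a minimal C++ sketch. It is not the Tamarin code. <tt>malloc</tt>/<tt>free</tt> stand in for GC-Alloc(), a plain intrusive reference count stands in for DRC, and the class layout and method names (<tt>createDirect</tt>, <tt>createStatic</tt>, <tt>createDependent</tt>, <tt>release</tt>) are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <new>
#include <string>

// Hypothetical sketch, not the Tamarin String class: malloc/free stand in for
// the GC allocator and an intrusive reference count stands in for DRC.
class String {
public:
    enum class Kind { Direct, Static, Dependent };

    // Direct: the character data immediately follows the instance data,
    // so one allocation holds both.
    static String* createDirect(const char* src, std::size_t len) {
        void* mem = std::malloc(sizeof(String) + len);
        if (mem == nullptr) return nullptr;  // allocation failed: return NULL
        char* chars = static_cast<char*>(mem) + sizeof(String);
        if (len > 0) std::memcpy(chars, src, len);
        return new (mem) String(Kind::Direct, chars, len, nullptr);  // in-place constructor
    }

    // Static: the caller guarantees that `chars` outlives the String,
    // e.g. a C character constant.
    static String* createStatic(const char* chars, std::size_t len) {
        void* mem = std::malloc(sizeof(String));
        if (mem == nullptr) return nullptr;
        return new (mem) String(Kind::Static, chars, len, nullptr);
    }

    // Dependent: holds a counted reference to the master string plus a
    // pointer to the first character and a length.
    static String* createDependent(String* master, std::size_t start, std::size_t len) {
        assert(master != nullptr);
        if (start > master->length_ || len > master->length_ - start) return nullptr;
        void* mem = std::malloc(sizeof(String));
        if (mem == nullptr) return nullptr;
        master->addRef();  // keep the master's characters alive
        return new (mem) String(Kind::Dependent, master->chars_ + start, len, master);
    }

    void addRef() { ++refCount_; }

    // Drops one reference. The last release destroys the instance and, for a
    // dependent string, releases its master as well.
    void release() {
        if (--refCount_ > 0) return;
        String* master = master_;
        this->~String();
        std::free(this);
        if (master != nullptr) master->release();
    }

    Kind kind() const { return kind_; }
    std::size_t length() const { return length_; }
    std::string toStdString() const { return std::string(chars_, length_); }

private:
    String(Kind kind, const char* chars, std::size_t len, String* master)
        : kind_(kind), chars_(chars), length_(len), master_(master) {}

    Kind kind_;
    const char* chars_;   // start of the character data, wherever it lives
    std::size_t length_;  // number of characters
    String* master_;      // non-null only for dependent strings
    int refCount_ = 1;    // the creator holds the first reference
};
```

A dependent string such as <tt>createDependent(s, 6, 5)</tt> on "hello world" shares the master's characters and keeps the master alive even after the caller releases its own reference to it.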


Out-of-memory conditions will be handled by the allocator in a future version. Until then, the String class checks for NULL and returns NULL for any new string whose allocation failed.
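
The NULL-returning convention can be sketched as follows, assuming a hypothetical allocator hook <tt>allocOrNull</tt> that can be told to fail so the out-of-memory path can be exercised. The <tt>Str</tt> type, <tt>newStr</tt> and <tt>concat</tt> are illustrative names, not part of the String API. Because every constructing operation also treats a NULL input as a failure, an allocation failure anywhere in a chain of calls surfaces as a single NULL result.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <new>

// Hypothetical test hook: when set, the next allocation fails, simulating OOM.
static bool g_failNextAlloc = false;

static void* allocOrNull(std::size_t bytes) {
    if (g_failNextAlloc) {
        g_failNextAlloc = false;
        return nullptr;
    }
    return std::malloc(bytes);
}

// Minimal direct string: the length header is followed by the characters.
struct Str {
    std::size_t length;
    char* chars() { return reinterpret_cast<char*>(this + 1); }
};

// Returns NULL instead of throwing when the allocation fails.
Str* newStr(const char* src, std::size_t len) {
    assert(src != nullptr || len == 0);
    void* mem = allocOrNull(sizeof(Str) + len);
    if (mem == nullptr) return nullptr;
    Str* s = new (mem) Str{len};
    if (len > 0) std::memcpy(s->chars(), src, len);
    return s;
}

// Propagates an earlier failure (NULL input) and reports its own failure as NULL.
Str* concat(Str* a, Str* b) {
    if (a == nullptr || b == nullptr) return nullptr;
    void* mem = allocOrNull(sizeof(Str) + a->length + b->length);
    if (mem == nullptr) return nullptr;
    Str* r = new (mem) Str{a->length + b->length};
    std::memcpy(r->chars(), a->chars(), a->length);
    std::memcpy(r->chars() + a->length, b->chars(), b->length);
    return r;
}

void freeStr(Str* s) { std::free(s); }  // free(nullptr) is a no-op
```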


=== UTF-8, UTF-16 and UTF-32 ===
* JS_GetStringChars() returns a pointer to UTF-16 characters, and JS_GetStringBytes() returns a pointer to UTF-8 characters. Both buffers are guaranteed to live as long as the string instance lives. SM maintains a separate cache for this purpose, where string buffers are garbage-collected. Other encodings may be requested as well.
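
The lifetime guarantee can be sketched in C++ as a side cache keyed by string instance, filled on the first request and emptied when the string is destroyed. This is not SpiderMonkey's implementation. The class <tt>Utf16String</tt>, the global <tt>g_utf8Cache</tt> and destructor-driven eviction (instead of garbage collection) are assumptions made to keep the example self-contained. Only the UTF-8 buffer is cached, because the UTF-16 characters are the string's own storage. Unpaired surrogates are encoded as U+FFFD.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <utility>

class Utf16String;

// Hypothetical side cache from a string instance to its UTF-8 encoding.
// unordered_map nodes are stable, so a returned pointer stays valid until
// its entry is erased.
static std::unordered_map<const Utf16String*, std::string> g_utf8Cache;

class Utf16String {
public:
    explicit Utf16String(std::u16string chars) : chars_(std::move(chars)) {}
    ~Utf16String() { g_utf8Cache.erase(this); }  // the UTF-8 buffer dies with the string

    Utf16String(const Utf16String&) = delete;
    Utf16String& operator=(const Utf16String&) = delete;

    // The UTF-16 data is the string's own storage.
    const char16_t* chars() const { return chars_.c_str(); }

    // Encodes to UTF-8 on the first call and returns the cached buffer afterwards.
    const char* bytes() const {
        auto it = g_utf8Cache.find(this);
        if (it == g_utf8Cache.end()) {
            it = g_utf8Cache.emplace(this, encodeUtf8(chars_)).first;
        }
        return it->second.c_str();
    }

private:
    static std::string encodeUtf8(const std::u16string& s) {
        std::string out;
        for (std::size_t i = 0; i < s.size(); ++i) {
            std::uint32_t cp = s[i];
            if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < s.size() &&
                s[i + 1] >= 0xDC00 && s[i + 1] <= 0xDFFF) {
                // Combine a surrogate pair into one code point.
                cp = 0x10000 + ((cp - 0xD800) << 10) + (s[i + 1] - 0xDC00);
                ++i;
            } else if (cp >= 0xD800 && cp <= 0xDFFF) {
                cp = 0xFFFD;  // unpaired surrogate
            }
            if (cp < 0x80) {
                out += static_cast<char>(cp);
            } else if (cp < 0x800) {
                out += static_cast<char>(0xC0 | (cp >> 6));
                out += static_cast<char>(0x80 | (cp & 0x3F));
            } else if (cp < 0x10000) {
                out += static_cast<char>(0xE0 | (cp >> 12));
                out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
                out += static_cast<char>(0x80 | (cp & 0x3F));
            } else {
                assert(cp <= 0x10FFFF);
                out += static_cast<char>(0xF0 | (cp >> 18));
                out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
                out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
                out += static_cast<char>(0x80 | (cp & 0x3F));
            }
        }
        return out;
    }

    std::u16string chars_;
};
```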


=== StUTF8String ===


The TT helper class <tt>StringDataUTF8</tt> wrapped a String instance (which contained UTF-8 data) in a class that provided direct access to the string buffer. The new String code instead offers a stack-based <tt>StUTF8String</tt>, which contains UTF-8 data and provides access to that data. Because pcre is UTF-8 based, the pcre code needs this class as well as the <tt>StIndexableUTF8String</tt> class. The conversion to UTF-8 costs performance; a regular expression engine that worked directly on UTF-16 data would avoid it.
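
Matching UTF-16 text with a UTF-8 engine also means translating match offsets back: in UTF-8 mode pcre reports byte offsets into the UTF-8 buffer, while String indices count UTF-16 code units. The sketch below assumes that this translation is what an indexable UTF-8 string provides; the class <tt>IndexableUtf8</tt> and its methods are hypothetical. The UTF-16 to UTF-8 conversion itself is omitted, so the constructor takes bytes that are assumed to be well-formed UTF-8. Like <tt>StUTF8String</tt>, the object is meant to live on the stack for the duration of one match.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stack-based helper: owns UTF-8 text (as a UTF-8 regex engine
// would consume it) and maps UTF-8 byte offsets, such as match offsets,
// back to UTF-16 indices.
class IndexableUtf8 {
public:
    // Assumes `utf8` is well-formed UTF-8.
    explicit IndexableUtf8(std::string utf8)
        : bytes_(std::move(utf8)), utf16Index_(bytes_.size() + 1, 0) {
        std::size_t units = 0;  // UTF-16 code units seen so far
        for (std::size_t i = 0; i < bytes_.size();) {
            unsigned char lead = static_cast<unsigned char>(bytes_[i]);
            assert(lead < 0x80 || lead >= 0xC0);  // must not start on a continuation byte
            std::size_t seqLen = lead < 0x80 ? 1 : lead < 0xE0 ? 2 : lead < 0xF0 ? 3 : 4;
            if (seqLen > bytes_.size() - i) seqLen = bytes_.size() - i;  // guard a truncated tail
            for (std::size_t k = 0; k < seqLen; ++k) utf16Index_[i + k] = units;
            units += (seqLen == 4) ? 2 : 1;  // 4-byte sequences become surrogate pairs
            i += seqLen;
        }
        utf16Index_[bytes_.size()] = units;  // offset one past the end
    }

    IndexableUtf8(const IndexableUtf8&) = delete;
    IndexableUtf8& operator=(const IndexableUtf8&) = delete;

    const char* data() const { return bytes_.c_str(); }
    std::size_t size() const { return bytes_.size(); }

    // Converts a byte offset in [0, size()] to the UTF-16 index of the
    // character that starts at (or contains) that byte.
    std::size_t utf16IndexOf(std::size_t byteOffset) const {
        assert(byteOffset < utf16Index_.size());
        return utf16Index_[byteOffset];
    }

private:
    std::string bytes_;
    std::vector<std::size_t> utf16Index_;  // one entry per byte, plus the end
};
```

For the bytes of "aé😀b", a match starting at byte 7 maps to UTF-16 index 4: "é" takes two bytes but one code unit, and "😀" takes four bytes but two code units. Building the table adds one entry per byte to the cost of the conversion.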