#16 bits: If the source data contains 32-bit values, surrogate pairs are created. If a source data value is > 0x10FFFF, null is returned.
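The 16-bit case can be sketched as follows. This is only an illustration of standard UTF-16 surrogate-pair encoding; the function name <tt>encodeUtf16</tt> and its return convention (0 standing in for "null is returned") are assumptions, not part of the actual design.

```cpp
#include <cstdint>

// Hypothetical helper: encodes one 32-bit source value as UTF-16.
// Returns the number of 16-bit units written to out (1 or 2),
// or 0 if the value is above 0x10FFFF and cannot be represented.
// Note: values inside the surrogate range 0xD800..0xDFFF are passed
// through unchanged here; the text does not say how they are treated.
inline int encodeUtf16(std::uint32_t value, std::uint16_t out[2]) {
    if (value > 0x10FFFF) {
        return 0;  // out of range: the caller would return null
    }
    if (value < 0x10000) {
        out[0] = static_cast<std::uint16_t>(value);  // fits in one unit
        return 1;
    }
    value -= 0x10000;  // 20 bits remain, split into two 10-bit halves
    out[0] = static_cast<std::uint16_t>(0xD800 + (value >> 10));    // high surrogate
    out[1] = static_cast<std::uint16_t>(0xDC00 + (value & 0x3FF));  // low surrogate
    return 2;
}
```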
=== Concatenation, substrings, and flattening ===
It would not be a good idea to create a new, flat string every time two strings are concatenated. Consider this loop:
 var s = "";
The above example would create a deep tree, which is also undesirable. Therefore, a String instance contains a <tt>treeDepth</tt> field that holds the greater depth of its two subtrees plus one. The concat operation applies a threshold: a string that is too deep is flattened before it is used for concatenation. The threshold value should be determined by benchmarking for the best memory/performance ratio. Also, the field is limited in size (10 bits?), so at some point automatic flattening is forced.
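A minimal sketch of depth-limited concatenation, under stated assumptions: the type <tt>Str</tt>, the helpers <tt>makeFlat</tt>, <tt>collect</tt>, <tt>flattenStr</tt> and <tt>concat</tt>, and the threshold value 8 are all hypothetical, and <tt>std::string</tt> stands in for the real multi-width buffer.

```cpp
#include <algorithm>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>

struct Str;
using StrPtr = std::shared_ptr<Str>;

struct Str {
    std::string flat;        // character data, used when treeDepth == 0
    StrPtr left, right;      // subtrees, used when treeDepth > 0
    int treeDepth = 0;       // greater depth of both subtrees plus one
    std::size_t length = 0;  // total length in characters
};

// Assumed threshold; the text says the real value needs benchmarking.
constexpr int kFlattenThreshold = 8;

inline StrPtr makeFlat(std::string s) {
    auto p = std::make_shared<Str>();
    p->length = s.size();
    p->flat = std::move(s);
    return p;
}

// Appends the characters of s to out, left subtree first.
inline void collect(const Str& s, std::string& out) {
    if (s.treeDepth == 0) { out += s.flat; return; }
    collect(*s.left, out);
    collect(*s.right, out);
}

// Replaces the two subtree pointers with one flat buffer.
inline void flattenStr(Str& s) {
    if (s.treeDepth == 0) return;
    std::string buf;
    buf.reserve(s.length);
    collect(s, buf);
    s.flat = std::move(buf);
    s.left.reset();
    s.right.reset();
    s.treeDepth = 0;
}

// Builds a tree node; operands at the threshold are flattened first,
// so the depth of any tree never exceeds kFlattenThreshold.
inline StrPtr concat(const StrPtr& a, const StrPtr& b) {
    if (a->treeDepth >= kFlattenThreshold) flattenStr(*a);
    if (b->treeDepth >= kFlattenThreshold) flattenStr(*b);
    auto p = std::make_shared<Str>();
    p->left = a;
    p->right = b;
    p->length = a->length + b->length;
    p->treeDepth = std::max(a->treeDepth, b->treeDepth) + 1;
    return p;
}
```

With this sketch, appending one character at a time in a loop allocates only a small node per step and copies character data only when the threshold is reached.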
Getting a substring also flattens the source string. The substring is an instance that contains a pointer to the source string and a pointer to where the substring starts in the source string's buffer. The length field contains the substring's length. Such a string is already flat, although it contains a reference to another string. It may be desirable to have a separate flattening function for this case.
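One possible shape for such a substring instance is sketched below. The names <tt>FlatStr</tt>, <tt>SubStr</tt> and <tt>substring</tt> are assumptions made for illustration, with <tt>std::shared_ptr</tt> standing in for whatever reference mechanism the real String type uses, and 8-bit data only.

```cpp
#include <algorithm>
#include <cstddef>
#include <memory>
#include <string>

// Hypothetical flat source string.
struct FlatStr {
    std::string data;
};

// Hypothetical substring: no character data is copied.
struct SubStr {
    std::shared_ptr<const FlatStr> source;  // keeps the source buffer alive
    const char* start;                      // points into source->data
    std::size_t length;                     // length of the substring
};

// Creates a view of len characters starting at pos; both are clamped
// to the bounds of the (already flat) source string.
inline SubStr substring(const std::shared_ptr<const FlatStr>& src,
                        std::size_t pos, std::size_t len) {
    pos = std::min(pos, src->data.size());
    len = std::min(len, src->data.size() - pos);
    return SubStr{src, src->data.data() + pos, len};
}
```

The trade-off is that a short substring keeps the whole source buffer alive, which is one reason a separate flattening function that copies the characters out might be wanted.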
When a string is flattened, its two String pointers are replaced with a flat data buffer. The resulting width of the string is determined by the widths of the strings in the tree. Usually, the resulting string width is the widest of all substrings found. If desired (with an #ifdef), substrings could also be analyzed to check whether they are wider than their data requires, e.g. if a 16-bit string contains only 8-bit characters. This is, of course, a performance hit, but may be desired if memory footprint is important. A global setting for the maximum string width may be necessary if strings were created using UTF-8 data, because all widths may exist.
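The width selection can be sketched as follows, assuming three possible widths (8, 16 and 32 bits). The enum <tt>Width</tt> and the functions <tt>widest</tt> and <tt>narrowestWidth</tt> are hypothetical names; the second function illustrates the optional analysis step and treats every character as a plain numeric value.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical width tags; the values are bytes per character.
enum class Width : std::uint8_t { W8 = 1, W16 = 2, W32 = 4 };

// Default rule: the flattened string is as wide as its widest part.
inline Width widest(Width a, Width b) {
    return a > b ? a : b;
}

// Optional analysis: scan the characters and return the narrowest
// width that can still hold all of them. This is the extra pass that
// costs time but can reduce the memory footprint.
inline Width narrowestWidth(const std::vector<std::uint32_t>& chars) {
    std::uint32_t maxValue = 0;
    for (std::uint32_t c : chars) {
        maxValue = std::max(maxValue, c);
    }
    if (maxValue <= 0xFF) return Width::W8;
    if (maxValue <= 0xFFFF) return Width::W16;
    return Width::W32;
}
```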
=== Thread safety ===
Since strings are immutable, they are by definition thread safe. The only unsafe operation is flattening. Therefore, <tt>flatten()</tt> should look like this:
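 void String::flatten() {
     // Fast path without the lock: a flat string needs no work.
     if (0 != this->treeLevel) {
         ENTER_CRITICAL_SECTION;
         // Check again under the lock: another thread may have
         // flattened the string in the meantime.
         if (0 != this->treeLevel) {
             // do the magic
         }
         LEAVE_CRITICAL_SECTION;
     }
 }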
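For illustration, the same check-lock-check pattern can be written with standard C++11 primitives. This is a sketch under assumptions, not the actual implementation: the class <tt>LazyString</tt> and its members are hypothetical, a <tt>std::mutex</tt> stands in for the critical-section macros, and the level field is made a <tt>std::atomic</tt> because reading a plain field outside the lock while another thread writes it is a data race under the C++ memory model.

```cpp
#include <atomic>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical two-part string that is flattened lazily.
class LazyString {
public:
    LazyString(std::string left, std::string right)
        : left_(std::move(left)), right_(std::move(right)), treeLevel_(1) {}

    // Flattens on first use and returns the flat buffer.
    const std::string& flatten() {
        // First check without the lock (acquire pairs with the release below).
        if (treeLevel_.load(std::memory_order_acquire) != 0) {
            std::lock_guard<std::mutex> lock(mutex_);
            // Second check under the lock: only one thread does the work.
            if (treeLevel_.load(std::memory_order_relaxed) != 0) {
                flat_ = left_ + right_;
                left_.clear();
                right_.clear();
                // Publish the flat buffer before announcing level 0.
                treeLevel_.store(0, std::memory_order_release);
            }
        }
        return flat_;
    }

    bool isFlat() const {
        return treeLevel_.load(std::memory_order_acquire) == 0;
    }

private:
    std::string left_, right_;   // the two halves before flattening
    std::string flat_;           // valid once treeLevel_ is 0
    std::atomic<int> treeLevel_; // 0 means flat
    std::mutex mutex_;           // guards the flattening step
};
```

A per-string mutex is used here only to keep the sketch short; a single global lock, as the macros above suggest, would save memory per instance at the cost of more contention.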