|
|
| (39 intermediate revisions by 3 users not shown) |
| Line 1: |
Line 1: |
| === Rationale ===
| | This page has moved to [[Tamarin:String_implementation]]. |
| | |
| Currently, Tamarin-Tracing strings are UTF-8. As much as these are desirable in terms of memory consumption, index access and other functions are unbearably slow.
| |
| | |
| === Design Goals ===
| |
| | |
| # Fixed-width strings with fixed widths of 8, 16 and 32 bits. Widths are either automatic (based on UTF-8 input), or manual (as created).
| |
| #* 8-bit strings contain the first Unicode 256 characters. UTF-8 is not supported in strings.
| |
| #* 16-bit strings are UTF-16, including surrogate pairs.
| |
| # String contents may be based upon static data.
| |
| # String concatenation creates a tree structure, which is flattened when needed.
| |
| | |
| | |
| === String types ===
| |
| | |
| Strings are immutable, but their contents may be different. String data can be 8, 16, or 32 bits wide; A string may contain a pointer to a string buffer, which may need to be deleted or not; or the string data may immediately follow the String instance in memory (by doing a raw allocate followed by an in-place constructor call); or it can hold two String references if is the result of a concat operation. The contents may change dynamically during a flatten operation. Therefore, a String instance contains a <tt>union</tt> with a tag.
| |
| | |
| === String concatenation ====
| |
| It would not be a good idea to create a new, flat string every time two strings are concatenated. Consider this loop:
| |
| var s = "";
| |
| for (var i = 32; i <= 1024; i++)
| |
| s += String.fromCharCode (i);
| |
| If a new, flat string would be created on every iteration, this would lead to a almost 1000 copy operations, with a growing string buffer. Instead, the resulting string contains two String pointers that point to the two source strings.
| |
| | |
| The above example would create a deep tree, which is also undesirable. Therefore, a String instance contains a <pre>treeDepth</pre> field that contains the deepest depth of both subtrees plus one. The concat operation will contain a threshold where a string will be flattened before it is used for concatenation. This value should be determined using various benchmarks for optimal memory/performance ration. Also, the field is limited in size (10 bits?), so at some point automatic flattening is forced.
| |