Fixed-width strings: Difference between revisions

=== String creation ===


Strings may be created from 8-, 16-, or 32-bit data. In addition, strings may be created from UTF-8 data, either at the smallest width that can hold the data or at a desired width. With a desired width, the creation method returns NULL if the UTF-8 string contains characters that cannot be represented in that width. This can happen for 8-bit strings, and for 16-bit strings if a character exceeds the value 0x10FFFF.
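
The "smallest width that can hold the data" rule can be illustrated with a minimal sketch. It works on already decoded code points rather than raw UTF-8, and the function name <code>bestFitWidth</code> is hypothetical, not part of the actual implementation. It also assumes that an 8-bit string stores values up to 0xFF and a 16-bit string values up to 0xFFFF without surrogate pairs.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper, not taken from the implementation.
// Returns the smallest storage width (8, 16 or 32 bits) that can hold every
// code point directly, i.e. without creating UTF-16 surrogate pairs.
// Assumption: 8-bit strings store 0x00..0xFF, 16-bit strings 0x0000..0xFFFF.
int bestFitWidth(const std::vector<std::uint32_t>& codePoints) {
    int width = 8;
    for (std::uint32_t cp : codePoints) {
        if (cp > 0xFFFF) {
            return 32;          // widest case, no need to look further
        }
        if (cp > 0xFF) {
            width = 16;         // at least 16 bits are needed
        }
    }
    return width;
}
```

An empty input yields 8 here, which is simply the narrowest choice; a real implementation might treat empty strings differently.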


Strings are never zero-terminated.
 
Strings are created using static creator functions. This allows the implementation to use raw memory allocation and in-place constructor calls, which avoids two separate memory allocations, one for the instance and one for the data. Strings created this way store the data directly behind the instance data.
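
A minimal sketch of such a static creator follows, assuming an 8-bit string. The class name <code>String8</code> and its members are hypothetical and only illustrate the single-allocation layout; they are not the names used by the implementation.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>
#include <new>

// Hypothetical 8-bit string class; all names are illustrative only.
class String8 {
public:
    // Static creator: a single allocation holds the instance and its data.
    // Returns nullptr if the allocation fails.
    static String8* create(const std::uint8_t* src, std::size_t length) {
        void* raw = std::malloc(sizeof(String8) + length);
        if (raw == nullptr) {
            return nullptr;
        }
        String8* s = new (raw) String8(length);   // in-place constructor call
        if (length != 0) {
            std::memcpy(s->data(), src, length);  // no terminating zero is stored
        }
        return s;
    }

    // Counterpart of create(): runs the destructor and frees the block.
    static void destroy(String8* s) {
        if (s == nullptr) {
            return;
        }
        s->~String8();
        std::free(s);
    }

    std::size_t length() const { return length_; }

    // The character data starts right behind the instance data.
    std::uint8_t* data() {
        return reinterpret_cast<std::uint8_t*>(this + 1);
    }
    const std::uint8_t* data() const {
        return reinterpret_cast<const std::uint8_t*>(this + 1);
    }

private:
    explicit String8(std::size_t length) : length_(length) {}
    ~String8() = default;

    std::size_t length_;
};
```

Because the constructor and destructor are private, instances can only be made through <code>create</code> and released through <code>destroy</code>. A 16- or 32-bit variant would follow the same pattern but would also have to make sure the trailing data is suitably aligned.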


The maximum string width determines the way strings are created. It is an optional argument to the string constructors.
# 8 bits: If the source data contains 16- or 32-bit values, the return value is NULL.
# 16 bits: If the source data contains 32-bit values, surrogate pairs are created. If a character is > 0x10FFFF, NULL is returned.
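
The two rules above can be sketched as follows. The functions <code>encode8</code> and <code>encode16</code> are hypothetical. They take decoded code points and report failure with <code>false</code>, which stands in for the NULL return of the real creation methods. The sketch also assumes that "16 or 32 bit data" means any value above 0xFF.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helpers; a false result stands in for a NULL return.

// 8-bit target: fails if any value does not fit into 8 bits
// (assumption: the limit is 0xFF).
bool encode8(const std::vector<std::uint32_t>& codePoints,
             std::vector<std::uint8_t>& out) {
    out.clear();
    for (std::uint32_t cp : codePoints) {
        if (cp > 0xFF) {
            out.clear();
            return false;
        }
        out.push_back(static_cast<std::uint8_t>(cp));
    }
    return true;
}

// 16-bit target: values above 0xFFFF become UTF-16 surrogate pairs,
// values above 0x10FFFF cannot be represented at all.
bool encode16(const std::vector<std::uint32_t>& codePoints,
              std::vector<std::uint16_t>& out) {
    out.clear();
    for (std::uint32_t cp : codePoints) {
        if (cp > 0x10FFFF) {
            out.clear();
            return false;
        }
        if (cp > 0xFFFF) {
            std::uint32_t v = cp - 0x10000;  // 20 bits, split 10 + 10
            out.push_back(static_cast<std::uint16_t>(0xD800 + (v >> 10)));
            out.push_back(static_cast<std::uint16_t>(0xDC00 + (v & 0x3FF)));
        } else {
            out.push_back(static_cast<std::uint16_t>(cp));
        }
    }
    return true;
}
```

For example, U+1F600 is stored in a 16-bit string as the pair 0xD83D, 0xDE00, while the same input makes <code>encode8</code> fail.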


This allows implementers to define the maximum width of strings: they can use 8, 16, or 32 bits only, or they can use whatever width fits best. With best-fit widths, string creation methods do not create UTF-16 surrogate pairs. Surrogate pairs that a script creates remain in strings, although a flattening operation could detect them and widen the flattened string to 32 bits.
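
The widening step that a flattening operation could perform might look like the sketch below. The function <code>widenTo32</code> is hypothetical. It assumes that unpaired surrogates are copied unchanged, which the text above does not specify.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helpers for detecting UTF-16 surrogates.
bool isHighSurrogate(std::uint16_t u) { return u >= 0xD800 && u <= 0xDBFF; }
bool isLowSurrogate(std::uint16_t u) { return u >= 0xDC00 && u <= 0xDFFF; }

// Widens 16-bit data to 32 bits, combining each valid surrogate pair into
// one value. Assumption: unpaired surrogates are copied unchanged.
std::vector<std::uint32_t> widenTo32(const std::vector<std::uint16_t>& units) {
    std::vector<std::uint32_t> out;
    out.reserve(units.size());
    for (std::size_t i = 0; i < units.size(); ++i) {
        if (isHighSurrogate(units[i]) && i + 1 < units.size() &&
            isLowSurrogate(units[i + 1])) {
            std::uint32_t hi = units[i] - 0xD800u;
            std::uint32_t lo = units[i + 1] - 0xDC00u;
            out.push_back(0x10000u + (hi << 10) + lo);
            ++i;  // the low surrogate has been consumed as well
        } else {
            out.push_back(units[i]);
        }
    }
    return out;
}
```

A flattening operation would only need to take this path if it found at least one surrogate pair; otherwise the 16-bit data could be kept as is.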


''Question: How are out-of-memory conditions handled? The current implementation often just assumes success. There should be some sort of exception, and the same mechanism should be used to report strings that cannot be created.''