| Line 7: | Line 7: | ||
=== General === | === General === | ||
The new String class contains strings of variable width. A string can either be 8, 16, or even 32 bits (if 32-bit support is enabled). 8-bit strings contain the first 256 characters of the Unicode alphabet, often referred to as Latin-1. | The new String class contains strings of variable width. A string can either be 8, 16, or even 32 bits (if 32-bit support is enabled). 8-bit strings contain the first 256 characters of the Unicode alphabet, often referred to as Latin-1. A special constructor accepts a null-terminated UTF-8 string. | ||
Support for string widths of 32 bits is disabled ; a special constant enables this support. | |||
There is no ''new'' operator; instead, there are a number of static creation methods that create and return a string: <tt>createUTF8(), createLatin1(), createUTF16(), createUTF32()</tt>. All of these creators accept a width constant, so strings are created with widths of 8, 16, or 32 bits. The value ''kAuto'' lets the creators determine the width that fits best. If, for example, <tt>createUTF8()</tt> is invoked with a string that decodes to the Latin-1 character set, the resulting string width is 8 bits. All creators accept a Boolean value that, if true, declares the character data to be static, meaning that the String instance can use the buffer directly without having to copy the character data. Of course, the character data must be guaranteed to live longer than the string and any derivatives of that string. | |||
Very important: Strings are never NUL-terminated, because they may contain | Usually, the character data is copied into a data buffer that the String instance points to. A substring contains a reference to the master string; its data pointer points into the master string, and its length fits within the master string. Strings containing static data have a pointer to that data. | ||
'''Very important: Strings are never NUL-terminated, because they may contain NUL characters as valid characters. ''' | |||
=== Creation === | === Creation === | ||
The only way to create a string is to use one of the | The only way to create a string is to use one of the static creator methods: | ||
static Stringp String:: | static Stringp String::createLatin1(const AvmCore* core, const char* buffer, | ||
int32_t len, Width desiredWidth = kAuto, bool staticBuf = false); | int32_t len = -1, Width desiredWidth = kAuto, bool staticBuf = false); | ||
There is a | There is a <tt>createUTF16()</tt> method for ''const wchar*'' data, a method <tt>createUTF8()</tt> for UTF-8 character data, and <tt>createUTF32()</tt> (the latter only if 32-bit support is enabled). | ||
The default argument for the desired string | The default argument for the desired string width is ''kAuto''. In that case, the method checks the string and creates a String instance that best fits the string data. If the source data is 32 bits and the desired width is 16 bits, surrogate pairs will be created. If the source data is 16 bits and the destination data is 32 bits, surrogate pairs will be combined into a single UTF-32 character. If the requested width is too small to fit the string, NULL is returned. | ||
If the ''staticBuf'' argument is ''true'', the buffer is considered to live as long as the supplied <tt>AvmCore</tt> instance, and the string data is not copied if it matches the criteria set by the requested width. For UTF-8 data, the data must be ASCII to meet these criteria. | If the ''staticBuf'' argument is ''true'', the buffer is considered to live as long as the supplied <tt>AvmCore</tt> instance, and the string data is not copied if it matches the criteria set by the requested width. For UTF-8 data, the data must be ASCII to meet these criteria. | ||
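The surrogate pair handling described above can be illustrated with a minimal, self-contained sketch. It does not use the Tamarin String class; the function names <tt>splitToUTF16()</tt> and <tt>combineToUTF32()</tt> are hypothetical, and passing unpaired surrogates through unchanged is an assumption of this sketch, not documented behavior of the creators.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of Tamarin): convert 32-bit code points to
// 16-bit units, creating a surrogate pair for every code point above U+FFFF.
std::vector<uint16_t> splitToUTF16(const std::vector<uint32_t>& src) {
    std::vector<uint16_t> out;
    for (uint32_t cp : src) {
        assert(cp <= 0x10FFFF);  // outside the Unicode range is not handled here
        if (cp < 0x10000) {
            out.push_back(static_cast<uint16_t>(cp));
        } else {
            uint32_t v = cp - 0x10000;
            out.push_back(static_cast<uint16_t>(0xD800 + (v >> 10)));    // high surrogate
            out.push_back(static_cast<uint16_t>(0xDC00 + (v & 0x3FF)));  // low surrogate
        }
    }
    return out;
}

// Hypothetical helper (not part of Tamarin): convert 16-bit units to 32-bit
// code points, combining each valid surrogate pair into a single character.
// Assumption: an unpaired surrogate is copied through unchanged.
std::vector<uint32_t> combineToUTF32(const std::vector<uint16_t>& src) {
    std::vector<uint32_t> out;
    for (std::size_t i = 0; i < src.size(); ++i) {
        uint32_t unit = src[i];
        bool isHigh = unit >= 0xD800 && unit <= 0xDBFF;
        if (isHigh && i + 1 < src.size() &&
            src[i + 1] >= 0xDC00 && src[i + 1] <= 0xDFFF) {
            uint32_t low = src[i + 1];
            out.push_back(0x10000 + ((unit - 0xD800) << 10) + (low - 0xDC00));
            ++i;  // the low surrogate has been consumed
        } else {
            out.push_back(unit);
        }
    }
    return out;
}
```

For example, the code point U+1F600 splits into the units 0xD83D and 0xDE00, and combining those two units yields U+1F600 again.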
| Line 31: | Line 33: | ||
=== Data access === | === Data access === | ||
Direct access to the data buffer is no longer possible, since it is not guaranteed that the string data is unique, or even writable. Therefore, the <tt>c_str()</tt> method is gone. It is still possible to access single characters via the <tt>charAt()</tt> method or the <tt>StringIndexer</tt> class. The latter class is a fast way to iterate through the string data. | |||
Example: | |||
// Create a string | |||
Stringp s = String::createLatin1(core, "Hello world"); | |||
// Iterate through the string | |||
StringIndexer indexer(s); | |||
for(int i = 0; i < indexer.length(); i++) | |||
process (indexer[i]); | |||
To retrieve a character string that contains UTF-8 or UTF-16 data, use the classes <tt>StUTF8String()</tt> or <tt>StUTF16String()</tt>. These classes are stack-creatable only, and they contain a NUL-terminated string that can be accessed via its <tt>c_str()</tt> method. Note that creating such an instance on the stack causes a copy of the string to be created. Another class, <tt>StIndexableUTF8String</tt>, adds the mapping between UTF-8 code points and byte offsets. All of these classes are data buffers only; they are not "real" String instances. | |||
Example: | Example: | ||
// | // Create a string | ||
Stringp s = String::createLatin1(core, "Hello world"); | |||
// Access | // Access that string as UTF-16 data | ||
StUTF16String s16(s); | |||
wchar* | const wchar* p = s16.c_str(); | ||
To get a string of a known fixed width, use the <tt>getFixedWidthString()</tt> method. The method returns ''this'' if the string already has the requested width; otherwise, it returns a copy of the string with the given width. Note that if the requested width is too narrow, for example when an 8-bit string is requested but a 16-bit string contains characters >= 0x0100, the return value is ''NULL''. | |||
Example: | |||
Stringp s = getSomeString(); | Stringp s = getSomeString(); | ||
| Line 49: | Line 62: | ||
if (!s16) | if (!s16) | ||
return error; | return error; | ||
=== Appending data === | |||
String | The String class offers several <tt>appendXXX()</tt> methods to append strings to a string. These methods return either a new String instance, or the String instance itself, if in-place concatenation was possible (see [[Tamarin:String_implementation]] for details). | ||
Stringp | Stringp append(const String str); // append a String instance | ||
// | Stringp appendLatin1(const char* data); // append characters | ||
wchar | Stringp append16(const wchar* data); // append UTF-16 data | ||
Stringp append32(const utf32_t* data); // append UTF-32 data | |||
for | If the appended data is too wide for the string, the string is widened. The latter three methods have overloads that take the length of the data to be appended. | ||
The old static <tt>concatStrings()</tt> is still available. | |||
Example: | |||
// Create an XML attribute with namespace | // Create an XML attribute with namespace | ||
| Line 72: | Line 83: | ||
if (ns) { | if (ns) { | ||
name = ns; | name = ns; | ||
name = name-> | name = name->appendLatin1(":"); | ||
if (xml->isAttr) | if (xml->isAttr) | ||
name = name-> | name = name->appendLatin1("@"); | ||
name = name-> | name = name->appendLatin1(xml->getName()); | ||
} else { | } else { | ||
name = xml->getName(); | name = xml->getName(); | ||
| Line 82: | Line 93: | ||
=== Additional String methods === | === Additional String methods === | ||
The String class contains most of the usual JavaScript String methods like <tt>indexOf()</tt> etc. These are highly optimized and accept integer arguments, so it is OK to use them freely. There is a special version of <tt>indexOf</tt> that accepts a ''char*'' for a quick compare with a character constant, as well as | The String class contains most of the usual JavaScript String methods like <tt>indexOf()</tt> etc. These are highly optimized and accept integer arguments, so it is OK to use them freely. There is a special version of <tt>indexOf</tt> that accepts a ''char*'' for a quick compare with a character constant, as well as <tt>matchesXXX()</tt> methods that match the string at a given position against an argument. Finally, there are <tt>containsXXX()</tt> methods to check for the existence of a substring. | ||
Example: | |||
Stringp s = ...; | Stringp s = ...; | ||
if (s-> | if (s->matchesLatin1("<?xml", 5)) ... | ||
else if (s-> | else if (s->matchesLatin1("<![CDATA[", 9)) ... | ||
if (s-> | if (s->indexOfLatin1(":") < 0) ... | ||
The <tt>getIndependentString()</tt> method converts a substring to a normal string. This is handy if the string needs to live for a long time, but you do not want the master of a dependent substring to live for that long. | |||
Example: | |||
Stringp xml = parseVeryLargeXMLFile(); | |||
Stringp start = xml->substr (0, 3); | |||
// if start were stored anywhere, the entire, huge XML string would remain alive | |||
// unless makeDynamic was called | |||
start = start->getIndependentString(); | |||
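Why a dependent substring keeps its master alive can be shown with a small model. This is only a sketch: it uses <tt>std::shared_ptr</tt> reference counts as a stand-in for the garbage collector, and the type <tt>Str</tt> and the functions <tt>makeString()</tt>, <tt>substr()</tt> and <tt>independentCopy()</tt> are hypothetical names, not Tamarin APIs.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>

// Hypothetical model of a string that may depend on a master buffer.
struct Str {
    std::shared_ptr<const std::string> master;  // buffer shared with other strings
    std::size_t offset;                         // start of this string in the buffer
    std::size_t length;                         // number of characters

    std::string text() const { return master->substr(offset, length); }
};

// Create a string that owns a fresh buffer.
Str makeString(const std::string& s) {
    return Str{std::make_shared<std::string>(s), 0, s.size()};
}

// Create a dependent substring: no characters are copied, the result
// references the master buffer and therefore keeps it alive.
Str substr(const Str& s, std::size_t off, std::size_t len) {
    assert(off + len <= s.length);
    return Str{s.master, s.offset + off, len};
}

// Copy only the characters in use into a new buffer, so the result no
// longer references the master buffer.
Str independentCopy(const Str& s) {
    return makeString(s.text());
}
```

In this model, storing the result of <tt>substr()</tt> keeps the whole master buffer referenced, while storing the result of <tt>independentCopy()</tt> keeps only the few copied characters.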
The <tt>makeDynamic()</tt> method converts a string with a static buffer, or a string that is a substring, to a string with a dynamic buffer. | |||