Tamarin:Strings: Difference between revisions

Line 7: Line 7:
=== General ===


The new String class contains strings of variable width. The characters of a string can be 8, 16, or even 32 bits wide (the latter only if 32-bit support is enabled). 8-bit strings contain the first 256 characters of the Unicode alphabet, often referred to as Latin-1. UTF-8 is not supported directly, but there is a string creation method that accepts a UTF-8 string, and a <tt>toUTF8String()</tt> method that returns a NUL-terminated UTF-8 string.
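
As a rough illustration of variable-width storage, the sketch below picks the narrowest character width that can hold a sequence of code points. The <tt>Width</tt> enum and the <tt>narrowestWidth()</tt> helper are hypothetical stand-ins written for this example; they are not the Tamarin API.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the string widths described above
// (not Tamarin's own enum); the values are bytes per character.
enum class Width { k8 = 1, k16 = 2, k32 = 4 };

// Return the narrowest width whose range covers every code point.
// Code points below 0x100 are Latin-1 and fit into 8 bits.
inline Width narrowestWidth(const std::vector<uint32_t>& codePoints) {
    Width w = Width::k8;
    for (uint32_t cp : codePoints) {
        if (cp > 0xFFFF) return Width::k32;  // widest case, stop early
        if (cp > 0xFF) w = Width::k16;
    }
    return w;
}
```

An empty sequence yields <tt>Width::k8</tt> in this sketch, since nothing forces a wider representation.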


Strings do not have a data buffer of their own; instead, strings are created with the data directly following the instance data. Therefore, there is no ''new'' operator; instead, there are a number of static <tt>create()</tt> methods that create and return a string.
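
The layout described above can be sketched roughly as follows: a factory function allocates the header and the character data in a single block and constructs the header in place. <tt>InlineString</tt> is a hypothetical miniature for illustration only; it uses <tt>calloc</tt>/<tt>free</tt> as an assumption, whereas the real class is managed by the VM's memory manager.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>
#include <new>

// Hypothetical miniature of a string whose characters live in the same
// allocation as the header; this is not Tamarin's String class.
class InlineString {
public:
    // Allocate header and character data in one block, then construct the
    // header in place. Passing NULL for src leaves the data zero-filled.
    static InlineString* create(const char* src, int32_t len) {
        assert(len >= 0);
        void* mem = std::calloc(1, sizeof(InlineString) + static_cast<std::size_t>(len));
        if (!mem) return nullptr;
        InlineString* s = new (mem) InlineString(len);
        if (src) std::memcpy(s->data(), src, static_cast<std::size_t>(len));
        return s;
    }
    // Counterpart to create(); there is deliberately no public destructor.
    static void destroy(InlineString* s) {
        if (s) { s->~InlineString(); std::free(s); }
    }
    int32_t length() const { return m_length; }
    // The character data starts right behind the instance data.
    char* data() { return reinterpret_cast<char*>(this + 1); }
private:
    explicit InlineString(int32_t len) : m_length(len) {}
    ~InlineString() = default;
    int32_t m_length;
};
```

Because the constructor is private, callers cannot use ''new'' and have to go through <tt>create()</tt>, which is the design choice the paragraph above describes.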
Line 13: Line 13:
The data inside a string can be stored in different ways. A substring contains a reference to the master string; its data pointer points into the master string's data, and its length fits within the master string. Strings containing static data have a pointer to that data.
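
A minimal sketch of the substring case, assuming a master string held in a <tt>std::string</tt>: the view keeps a reference to the master, a pointer into its data, and a length that is checked against the master's bounds. <tt>SubstringView</tt> and <tt>makeSubstring()</tt> are hypothetical names used only for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical illustration of a dependent (sub)string; not Tamarin's layout.
struct SubstringView {
    const std::string* master;  // keeps a reference to the master string
    const char* data;           // points into the master's character data
    int32_t length;             // must stay within the master's bounds
};

// Build a view of len characters starting at index start of master.
// No characters are copied; the master must outlive the view.
inline SubstringView makeSubstring(const std::string& master, int32_t start, int32_t len) {
    assert(start >= 0 && len >= 0);
    assert(static_cast<std::size_t>(start) + static_cast<std::size_t>(len) <= master.size());
    return SubstringView{ &master, master.data() + start, len };
}
```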


Very important: Strings are never NUL-terminated, because they may contain a 0 as a valid character. For example, the memory behind a string may appear to hold "abcd" while the length is just 2, so the actual contents are "ab".
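
The practical consequence is that string data must always be read together with its length. The helper below is a hedged sketch using only the C++ standard library; <tt>copyContents()</tt> is a hypothetical name, and <tt>buffer</tt> and <tt>len</tt> stand in for a string's data pointer and length.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Copy exactly len characters out of a buffer that is not NUL-terminated.
// Functions such as strlen() must not be used on such a buffer.
inline std::string copyContents(const char* buffer, std::size_t len) {
    assert(buffer != nullptr);
    return std::string(buffer, len);  // length-delimited; embedded 0s are kept
}
```

With a buffer holding the four characters <tt>a b c d</tt> and a length of 2, this returns "ab"; a buffer with an embedded 0 keeps that 0 as an ordinary character.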


=== Creation ===
Line 19: Line 19:
The only way to create a string is to use one of the <tt>create()</tt> methods:


   static Stringp String::create(const AvmCore* core, const char* buffer,
     int32_t len, Width desiredWidth = kAuto, bool staticBuf = false);


Line 38: Line 38:


   // create a 16-bit string with 1024 characters
   String* myString = String::create(core, NULL, 1024, String::k16);
   // Access the string data; this string is new and referenced nowhere,
   // so this is a safe operation.
   wchar* myChars = (wchar*) myString->getData();


To get a string of a known fixed width, use the <tt>getFixedWidthString()</tt> method. The method returns ''this'' if the string already has the requested width; otherwise, it returns a copy of the string with the given width. Note that if the requested width is too narrow, e.g. because a 16-bit string contains characters >= 0x0100 and an 8-bit string is requested, the return value is ''NULL''. Also, the returned string is not NUL-terminated.


   Stringp s = getSomeString();
Line 72: Line 72:
   if (ns) {
     name = ns;
     name = name->append(":");
     if (xml->isAttr)
       name = name->append("@");
     name = name->append(xml->getName());
   } else {
     name = xml->getName();
Line 85: Line 85:


   Stringp s = ...;
   if (s->matches("<?xml", 5)) ...
   else if (s->matches("<![CDATA[", 9)) ...
   if (s->indexOf(":") < 0) ...