Mozilla 2/Strings/Static Analysis

From MozillaWiki
< Mozilla 2‎ | Strings
Revision as of 03:01, 28 February 2008 by Dmandelin (talk | contribs) (gc strings backgroudn)
Jump to navigation Jump to search

How many strings exist *only* for conversion?

Procedure:

  • find calls to NS_ConvertUTF8toUTF16 and CopyUTF8toUTF16
  • check to see whether that string is modified after the conversion takes place

TODO: define "modification" [dmandelin] Can modification be defined as any of (1) calling a non-const method, (2) passing as a non-const argument, (3) being a parameter, or (4) escaping (by having a pointer stored or a pointer or reference returned)?

repeat for utf16->ut8

If "AString" were immutable, where would we fail?

Imagine that all nsAStrings currently allocated on the stack became a different type (nsAStringBuilder or std::wstring or something). But when we pass strings around, they are immutable. Classify any cases where this wouldn't work:

Take the following methods:

nsresult GetAString(nsAString &result)
{
  result.Assign("foo"); // this is ok, it can be converted to return a new
                        // immutable string
}

nsresult AppendToAString(nsAString &result)
{
  result.Append("foo"); // this won't work... it modifies the inout param so
                        // we would have to rewrite "result" to be
                        // nsAStringBuilder&, or split it into two separate
                        // params, one in, one out.
}

Finalizer-less strings

[dmandelin] I'm writing up my thoughts on how to make strings work without requiring any MMGC finalizers, which is the first application of this analysis.

String Finalizers. Currently, nsTSubstring has a finalizer, which can (1) decrement the reference count on the nsStringBuffer (if the string is F_SHARED), (2) free the character data buffer (if the string is F_OWNED), or (3) do nothing (if the string is F_FIXED).

To get rid of (1), we need to (a) enable GC for the nsStringBuffer member (a local change) and (b) live without the reference count. The RC is used in mutation methods to know when copy-on-write must occur, so without the RC we can't do copy-on-write. Thus, F_SHARED strings need to be immutable.

To get rid of (2), we can (a) require the data buffer of an F_OWNED string to be allocated with MMGC, so that it can be GC'd, or (b) require the programmer to ensure the destructor is invoked on F_OWNED strings (e.g., by allocating the string as an automatic stack variable). Whether to support (a), (b), or both will depend on the use cases in the existing code base. (q.v. calls to Adopt.)

String Classes. Here are some tentative thoughts on what string classes can be used to support finalizer removal. Please note this is really on a conceptual level, so it wouldn't necessarily be implemented as this set of C++ classes, and I'm also leaving out discussion of dependent strings, etc., for now. This is just about nsSubstring/nsString/nsAutoString.

As explained above, we need an immutable string type. Call it ImmutableString. Its API could be just like nsSubstring except without any mutator methods. All ImmutableStrings have an immutable, sharable MMGC-controlled nsStringBuffer.

The mutable string type, MutableString, adds back the mutator methods. Because it offers a superset of behavior of ImmutableString, MutableString is a subclass of ImmutableString.

There are several kinds of MutableString, with different storage for the string data. An AutoStackString stores character data on the stack and is meant to be used for local variables (just like nsAutoString). An AutoHeapString stores character data on the heap and cleans up the heap in its destructor; this would replace stack-local F_OWNED strings. A HeapString stores character data in an MMGC-controlled heap buffer; this can replace heap-stored F_OWNED strings iff the buffer was originally allocated by MMGC.

Another interesting application of HeapString is as a builder for string values that are to be stored in ImmutableStrings: the buffer can be passed (instead of copied) from a HeapString to a new ImmutableString as long as the HeapString's buffer pointer is set to null at the same time.

Simplest Rewriting. TBD