Performance:Footprint Reduction Techniques

From MozillaWiki
Jump to: navigation, search

This document will attempt to describe some patterns I've seen in the mozilla codebase that contributes to unnecessarily large code. It describes different patterns that one can use to reduce code footprint, while trying to maintain performance and minimal heap usage.

Note: take everything here with a grain of salt. There are times when the "bad" patterns shown here are very appropriate and are faster or smaller than the "good" pattern. Choosing the correct pattern requires proper understanding of how the code is used, including how often code is called, the quantity of objects used at any given time, the current object size, and so forth.

Overuse of virtual methods to indicate class attributes

A common pattern is to use virtual methods to indicate if a specific class (as opposed to a class instance) has a given attribute. Implementations will override this virtual method to override some default value. This often results in multiple implementations of some method when a simple member variable in a base class would have worked fine. (in some sense what you really want is a static virtual method. Sadly, this is not possible in C++)

For example in the class below, GetOpaque is a virtual method used simply to decide if a given class should be filled. The value of GetOpaque is an attribute of a particular class, and does not need to be defined on a per-instance basis:

// interface that all shapes must implement
class nsIDrawable : nsISupports {
    NS_IMETHOD GetOpaque(PRBool* aResult) = 0;
    NS_IMETHOD Fill() = 0;
}

// base class, does not override GetOpaque()
class baseShape : public nsIDrawable
{
public:
    baseImplementation() {}

    NS_IMETHODIMP Draw() {
        PRBool opaque;
        GetOpaque(&opaque);
        if (opaque)
            Fill();
    }

    // more implementation below
    NS_IMETHODIMP Fill()
    {
        // ... implementation ...
    }
};

class filledCircle : public baseShape {
public:
    filledCircle() { }

    // a filled circle is always opaque
    NS_IMETHODIMP GetOpaque(PRBool* aResult) {
        *aResult = PR_TRUE;
        return NS_OK;
    }
};

class emptySquare : public baseShape {
public:
    emptySquare() {}

    // emptySquare is never opaque
    NS_IMETHODIMP GetOpaque(PRBool* aResult) {
        *aResult = PR_FALSE;
        return NS_OK;
    }
};

What's wrong with this example? From a code bloat perspective there is a lot of code just to store a flag that is shared across all implementations. In addition, there is the overhead of calling a virtual method just to get a boolean value.

Here's a better implementation that generates less compiled code, and is potentially faster:

// base class, does not override GetOpaque();
class baseShape : public nsIDrawable
{
public:
    baseImplementation(PRBool aOpaque) : mOpaque(aOpaque) {}

    NS_IMETHODIMP Draw() {
        if (mOpaque)
            Fill();
    }

    NS_IMETHODIMP GetOpaque(PRBool *aResult) {
        *aResult = mOpaque;
    }

    // more implementation below
    NS_IMETHODIMP Fill()
    {
        // ... implementation ...
    }
protected:
    PRPackedBool mOpaque;
};

class filledCircle : public baseShape {
public:
    // a filled circle is always opaque
    filledCircle() : baseShape(PR_TRUE) {}

};

class emptySquare : public baseShape {
public:
    // emptySquare is never opaque
    emptySquare() : baseShape(PR_FALSE) {}

};

Notice that there is no per-class method to determine the opaqueness of an object. The new subclasses are much simpler, and they declare their attributes during construction, rather than every time some virtual method is called.

But there are tradeoffs! The base class now has a new member variable, mOpaque. You've just increased the size of this object by at least 1 byte, and perhaps you've increased it even more depending on the layout of the base class. For this example, the tradeoff is questionable, and the value of this tradeoff depends heavily on the use of these classes.

For example, say there are only 5 subclasses of baseShape. The compiled implementations of the old virtual GetOpaque() might amount to 150 bytes of code. If there are typically 10,000 shapes created, then you've just increased the heap footprint by 10,000 bytes. This is clearly a loss (10,000 vs. 150 bytes) On the other hand, if GetOpaque() is called 3 times for each object, then you've just saved yourself 30,000 virtual method calls, which can certainly be expensive.

In an example that used to exist in the tree, we had 150 subclasses, each of which overrode a virtual method to return an integer. But typically only 2-4 were instantiated at a given time. The win of over 4500 bytes of compiled code clearly outweighed the extra heap impact of 4-16 bytes.

Finally, imagine there are a many more attributes beyond the simple opaqueness of an object. A simple static structure can store per-class attributes, and the base class needs merely to store a pointer to that structure.

// common per-class attributes
struct shapeAttributes {
  PRPackedBool mOpaque;
  PRPackedBool mHasCorners;
  PRPackedBool mSupports3D;
  PRPackedBool mSupportsPatternFill;
  PRInt32 mNumberOfSides;
  PRInt32 mMaximumColors;
  PRInt32 mAlphaMask;
};

// base class knows how to use shapeAttributes
class baseShape : public nsIDrawable
{
public:
    baseImplementation(const shapeAttributes* aAttributes) : mAttributes(aAttributes) {}

    NS_IMETHODIMP Draw() {
        if (mAttributes.mOpaque)
            Fill();
    }

    NS_IMETHODIMP GetOpaque(PRBool *aResult) {
        *aResult = mAttributes.mOpaque;
    }

    // more implementation below
    // ...
protected:
    const shapeAttributes* mAttributes;
};

// shared among all filledCircle instances
static const shapeAttributes filledCircleAttributes {
  PR_TRUE,  // opaque
  PR_FALSE, // corners
  PR_TRUE,  // 3d
  PR_TRUE,  // pattern fill
  0,        // no sides
  64,       // max colors
  RGB(0,1,2), // alpha mask
};

class filledCircle : public baseShape {
public:
    // a filled circle is always opaque
    filledCircle() : baseShape(&circleAttributes) {}

};

// shared among all emptySquare instances
static const shapeAttributes squareAttributes {
// ...
};

class emptySquare : public baseShape {
public:
    // emptySquare is never opaque
    emptySquare() : baseShape(&squareAttributes) {}

};

In this more complex example, the base class only maintains a single 4 byte pointer to a class structure that defines unchanging attributes that are shared among all instances of a given base class. The base class can access all the attributes without resorting to a virtual method call into the subclass.

Too much class specialization

Huge switch statements

It is common to write a large switch statement to account for a number of common actions. However, a large switch statement can also result in slow, bloaty code as the compiler generates a series of test-and-jumps. This is equivalent to a large loop, and can result in blowing out the instruction cache.

For instance, this switch statement, taken from nsHTMLContentSink is really just a massive loop, trying to to find a tag name among 75 candidates:

switch (aNodeType) {
case eHTMLTag_a:
    rv = NS_NewHTMLAnchorElement(..);
    break;
case eHTMLTag_applet:
    rv = NS_NewHTMLAppletElement(..);
    break;
// etc for 73 more tags!
}

The problem here is that each "case" statement will result in a seperate set of compare, jump and call statements. This means lots of code bloat, in addition to saturating the instruction cache with code that won't be executed until the next time through this loop.

The solution here is to declare a data structure that maps tags to callbacks.

// callback which will create the content node
typedef nsresult (*contentCreatorCallback)(nsIHTMLContent**, nsINodeInfo*);

struct tagToCreator {
    nsHTMLTag mTag;
    contentCreatorCallback mCreateObject;
};

// this table is small! 8 bytes per entry, only 600 bytes
tagToCreator tagConstructors[] = {
    { eHTMLTag_a,         NS_NewHTMLAnchorElement },
    { eHTMLTag_applet,    NS_NewHTMLAppletElement },
    {...},
};

#define tagCreators (sizeof(tagToCreator)/sizeof(tagToCreator[0]))

// look how tight this loop is!
for (nsHTMLTag tag = 0; tag<tagCreators; tag++) {
    if (aNodeType == tagConstructors[tag].mTag) {
         rv = tagConstructors[tag].mCreateObject(...);
         break;
    }
}

Heap-based readonly tables

In trying to create a lookup table, it is common to make a static list of name/value pairs and insert them into a hashtable. The hashtable ends up taking up space on the heap and may contain duplicated values from the original static list.

For instance, this code taken from nsHTMLTags.h declares a static HTML tag table. Then, a hashtable is created in nsHTMLTags.cpp resulting in the hashtable containing pointers to each entry in the table, as well as seperate smaller objects for each entry.

This means there simply unnecessary initialization of a hashtable that will look exactly the same during each execution.

A better mechanism would be to generate the hashtable at compile time such that it is entirely declared as static data. This is a work in progress.

Using pointers to member classes

When a class contains other classes as member variables, it is often unnecessary to include those members as pointers to the member classes. Instead, include the member variable directly. This will save 4 bytes (one pointer) in the total size of your class.

For example, it is common to see nsHashtable* members in a class, when in fact the class could just include the hashtable directly as a member variable.

class nsFoo  {

public:
    // look at all this extra work we have to do!
    nsFoo() {
      mTags = new nsHashtable(32);
    }
    ~nsFoo() {
      delete mTags;
    }

private:
    nsHashtable* mTags;

};

Now, sizeof(nsFoo) may tell you that it is 4 bytes, but that doesn't include the size of the hashtable so the total size is sizeof(nsFoo) + sizeof(nsHashtable) == 16 bytes.

If instead you make mTags be a non-pointer member, you'll see that the code is simpler, and the object size is smaller:

class nsFoo  {

public:
    // very little work, and no need to remember call delete!
    nsFoo() : mTags(32) {
    }
    ~nsFoo() {
    }

private:
    nsHashtable mTags;

};

Here, sizeof(nsFoo) includes the size of the hashtable, and so size(nsFoo) == 12 bytes.

Also note that the constructors for the objects all happen as one atomic event, giving the compiler more flexibility when optimizing this initialization.