Linux/Compiler Options
Overview
Proper use of compiler options
Don't set a default optimization level for the entire browser build.
Different parts of Mozilla run fastest at different optimization levels. For example, cairo, pixman and sqlite are compiled at -O2 because they are fastest at that level, while the JS engine is fastest at -Os. [1] If you use --enable-optimize, don't add extra optimization flags to it: it is a global setting that sets the optimization level throughout the source tree. Instead, pass any non-optimization flags you care about via CFLAGS and CXXFLAGS during the build.
Please use the right optimization flags for your compiler.
The flags you should use depend on which compiler you have. Please see the chart below for details.
Compilers
The notes below are from dwitte and compare gcc 4.3 with gcc 4.1.2. [2] Also see the original post about possible ways to make gcc 4.1.2 faster by using -Os together with -finline-limit.
gcc 4.1.2 notes
it turns out that gcc 4.1.2 on linux, at our default optimization setting "-Os -freorder-blocks -fno-reorder-functions", avoids inlining even trivial functions (where the cost of doing so is less than even the fncall overhead). this is bad news for things like nsTArray, nsCOMPtr etc, which can result in many layers of wrapper calls if not inlined sensibly.

gcc has an option to control inlining, "-finline-limit=n", which will (roughly) inline functions up to length n pseudo-instructions. to give some sense for numbers, the default value of n at -O2 is 600.

i ran some tests and found that with our current settings and -finline-limit=50 on a 32-bit linux build, which is enough to inline trivial (one or two line) wrapper methods but no more, we can get a codesize saving of 225kb (2%), a Ts win of 3%, a Txul win of 18%, and a Tp2 win of about 25% (!).

i also compared this to plain -O2: Txul is unchanged, Ts improves 3%, and Tp2 improves about 4%. however, codesize jumps 2,414kb (19%). maybe we can increase the inline limit at -Os to get back a bit of this perf, without exploding codesize.

(we originally moved from -O2 to -Os on gcc 3.x, because it gave a huge codesize win and also a perf win of a few percent on Ts, Txul, and Tp. so, it seems gcc4.x behaves quite differently.)
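To make the "layers of wrapper calls" concrete, here is a minimal sketch of the pattern. `SimpleArray`, `Counter` and `BumpAll` are hypothetical stand-ins invented for illustration; they are not the real nsTArray or nsCOMPtr. The point is only that one element access passes through several one-line methods, each of which becomes a real function call if the compiler declines to inline it.

```cpp
#include <cstddef>

// Hypothetical stand-in for a wrapper class such as nsTArray.
// This is NOT the real Mozilla class, only an illustration of the pattern.
template <typename T, std::size_t N>
class SimpleArray {
 public:
  // Layer 1: trivial accessor around the raw storage.
  T& ElementAt(std::size_t i) { return mData[i]; }
  // Layer 2: one-line wrapper that only forwards to ElementAt().
  T& operator[](std::size_t i) { return ElementAt(i); }
  std::size_t Length() const { return N; }

 private:
  T mData[N]{};
};

// Hypothetical element type with its own trivial methods.
struct Counter {
  int value = 0;
  void Bump() { ++value; }
  int Value() const { return value; }
};

// Each loop iteration goes through operator[] -> ElementAt() twice, plus
// Bump(), Value() and Length(). If none of these one-liners is inlined,
// that is six calls per iteration for a handful of instructions of work.
int BumpAll(SimpleArray<Counter, 4>& counters) {
  int total = 0;
  for (std::size_t i = 0; i < counters.Length(); ++i) {
    counters[i].Bump();
    total += counters[i].Value();
  }
  return total;
}
```

One way to see the effect described above would be to compile a file like this with the stock flags, with and without -finline-limit=50, and compare the generated assembly for `BumpAll`. The outcome depends on the gcc version, as the notes on this page show.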
gcc 4.3 notes
i've tested gcc 4.3 a bit. to summarize, it looks like this pathological -Os behavior is specific to 4.1 branch, and possibly just 4.1.2. also, there are some substantial perf and codesize wins to be had with gcc 4.3.

gory details: tested with gcc 4.3 (20080104 pull). "stock configuration" is "-Os -freorder-blocks -fno-reorder-functions".

some Tp2 numbers (baseline: gcc 4.3, stock; the gcc 4.1.2 rows are for comparison with previous results, comment 0):

Compiler | Configuration | Tp2 |
---|---|---|
gcc 4.3 | stock | 142.78 ms |
gcc 4.3 | stock, with -finline-limit=50 | 146.89 ms (+2.9%) |
gcc 4.3 | -O2 | 131.56 ms (-7.9%) |
gcc 4.1.2 | stock | 199 ms (+39%) |
gcc 4.1.2 | stock, with -finline-limit=50 | 149.33 ms (+4.6%) |
gcc 4.1.2 | -O2 | 142.67 ms (even) |

size of libxul.so:

Compiler | Configuration | Size |
---|---|---|
gcc 4.3 | stock | 12,387kb |
gcc 4.3 | stock, with -finline-limit=50 | 12,325kb (-62kb) |
gcc 4.3 | -O2 | 15,061kb (+2,674kb) |
gcc 4.1.2 | stock | 13,249kb (+862kb) |
gcc 4.1.2 | stock, with -finline-limit=50 | 13,025kb (+638kb) |
gcc 4.1.2 | -O2 | 15,440kb (+3,053kb) |

a few points from this data:

1) -Os is very sane on 4.3 by default.
2) on 4.3, relative to -Os, -O2 has improved a lot (8% Tp win, although at a 2.7Mb codesize cost).
3) 4.3 is 5 - 8% faster on Tp2 than 4.1.2, depending on -Os/-O2.
4) 4.3 gives an 400-800k codesize saving over 4.1.2.

3 & 4) are probably the same thing - a result of the hidden visibility propagation improvements introduced in gcc 4.2. these are a major win for us.
Distributions
Name | GCC Version | Last Build |
---|---|---|
Ubuntu 7.10 | gcc version 4.1.3 20070929 (prerelease) (Ubuntu 4.1.2-16ubuntu2) | 2.0.0.11+2nobinonly-0ubuntu0.7.10 (2008-01-07) |
gcc flags | | |
Fedora 8 | gcc version 4.1.2 20070925 (Red Hat 4.1.2-33) | firefox-2.0.0.10-3.fc8 (2008-01-04) |
gcc flags | | |
CentOS 5.1 | gcc version 4.1.2 20070626 (Red Hat 4.1.2-14) | firefox-1.5.0.12-7.el5.centos (2008-01-07) |
gcc flags | | |