Firefox/Projects/Startup Time Improvements/joelr notes

From MozillaWiki
Jump to navigation Jump to search

Intro

I'm trying to figure out where Firefox startup time goes, up to the return from BrowserStartup (Javascript function). I'm also manipulating DTrace into telling me where time is going, without making any assumptions.

Current status

August 27, 2009

The dyld shared cache lives in /var/db/dyld/. The two files of interest are dyld_shared_cache_i386.map (for x86-32) and shared_region_roots/Applications.paths. Both are regular text files. The former shows the contents of the shared cache for the i386 architecture and the latter is what update_dyld_shared_cache inspects.

There's no prebinding on newer versions of Mac OSX and the dyld shared cache is automatically updated as needed. Tracing Safari disk activity during startup reveals that basically all its dynamic libraries are pulled from the dyld shared cache.

It's possible to add Firefox (.../Firefox.app/Contents/MacOS/firefox-bin) to Applications.paths and the change will persist across reboots. Unfortunately, only a handful of libraries that Firefox uses are pulled into the cache by update_dyld_shared_cache. I'm speculating that this may have something to do with @executable_path/XUL and friends (otool -L .../firefox-bin).

Safari uses absolute paths to frameworks in /System/Library/Frameworks so I speculate that relative paths are what is preventing XUL and others from going into the cache. We may want to investigate /System/Library/Frameworks/Firefox.framework or similar.

Bug 513076.

It's possible to fix relative library paths in a given library, e.g. fix.sh XUL where fix.sh looks like this

#!/bin/bash 

function dylibs () {
  otool -L $1 |grep executable_path|awk '{print $1}'|cut -d"/" -f2
}

for i in `dylibs $1`
do
        install_name_tool -change @executable_path/$i `pwd`/$i $1
done

install_name_tool -id `pwd`/$1 $1

Firefox has to be recompiled with LDFLAGS=-header-pad_max_install_names in MOZCONFIG to make this happen since new library paths are greater than the space allocated in the Mach-O binary. See the man page for install_name_tool for details.

It's possible to force dynamic libraries into the cache by putting dynamic library paths into shared_region_roots/Applications.paths instead of executables. I wasn't successful in caching XUL, though, regardless of what I did.

In the end it doesn't seem to matter since there's a baffling lack of difference between Firefox and Safari cold stats, despite Safari pulling everything from the cache and Firefox using a large number of non-cached dylibs.

Here's the cold startup stats for Safari

sync && purge && DYLD_PRINT_STATISTICS=1 /Applications/Safari.app/Contents/MacOS/Safari
total time: 696.9 milliseconds (100.0%)
total images loaded:  116 (114 from dyld shared cache, 114 needed no fixups)
total segments mapped: 5, into 30 pages with 10 pages pre-fetched
total images loading time: 204.9 milliseconds (29.4%)
total rebase fixups:  1,298
total rebase fixups time: 0.1 milliseconds (0.0%)
total binding fixups: 2,476
total binding symbol lookups: 234, average images searched per symbol: 1.6
total binding fixups time: 80.5 milliseconds (11.5%)
total bindings lazily fixed up: 3 of 901
total init time time: 411.3 milliseconds (59.0%)
total images with weak exports:  1

and Firefox

total time: 731.2 milliseconds (100.0%)
total images loaded:  106 (93 from dyld shared cache, 56 needed no fixups)
total segments mapped: 49, into 5903 pages with 684 pages pre-fetched
total images loading time: 235.3 milliseconds (32.1%)
total rebase fixups:  149,011
total rebase fixups time: 3.7 milliseconds (0.5%)
total binding fixups: 24,932
total binding symbol lookups: 797, average images searched per symbol: 2.3
total binding fixups time: 149.9 milliseconds (20.5%)
total bindings lazily fixed up: 45 of 19,109
total init time time: 342.2 milliseconds (46.8%)
total images with weak exports:  3

Notice a large and significant difference? Me neither.

The other thing that I cannot explain at the moment is where the rest of the startup time goes, e.g.

./cold.sh startup.d
Total: 10001.723521ms

So it took less than 1 second to dynamically link Firefox but where did the other 9 seconds of startup go?

cold.sh is rather simple

#!/bin/bash

cmd="./Minefield.app/Contents/MacOS/firefox-bin -no-remote -foreground -P 2"

sync && purge && dtrace -x dynvarsize=64m -x evaltime=exec -c "$cmd" -wZs $1

and the startup.d script doesn't do much either

#pragma D option quiet

BEGIN
{
 start = timestamp;
}

/* stop tracing here */

mozilla$target:::main-entry
{
 exit(0);
}

END
{
 this->total = timestamp - start;
 printf("Total: %u.%06ums\n", this->total / 1000000, this->total % 1000000);
}

I don't have an explanation yet but 9 seconds is a very large difference.

Previous statuses

August 26, 2009

Suppose we sample at about 1Khz on a dual-core CPU and run an app for 10 seconds. An app that hogs the CPU should give us 20k samples, or 10k samples if it's pegged to a single core.

I coded this up in a cpu.d DTrace script and sampled Firefox. Apparently, Firefox only gets ~1700 samples on the CPU or 1.7s so its first 10 seconds of life are spent doing something else, e.g. disk IO.

We already know that Firefox is slow to start up but this type of sampling neatly points the finger in the right direction. Props to Brendan Gregg for teaching me how to fish!

What kind of IO does Firefox do upon startup, though? I find the output from iosnoop a bit confusing so I whipped up a version of my own that prints just disk access times, disk operation deltas, block size and file name accessed.

The script samples Firefox during the first 20 seconds of its life and the output from ./cold.sh io.d looks like this:

DELTA      DTIME      D    SIZE PATHNAME
1399       1407       R    4096 ??/dtrace/libdtrace_dyld.dylib
1141       1145       R    8192 ??/dtrace/libdtrace_dyld.dylib
1301       1305       R    4096 ??/MacOS/XUL
1701       1705       R    4096 ??/MacOS/XUL
11081      685        R    4096 ??/MacOS/libmozjs.dylib
1989       2253       R   36864 ??/MacOS/libmozjs.dylib
1768       1894       R   24576 ??/MacOS/libmozjs.dylib
27111      2135       R  200704 ??/MacOS/XUL
1411       1134       R    8192 ??/MacOS/libxpcom.dylib
1784       1968       R    4096 ??/MacOS/libplds4.dylib
2383       464        R    4096 ??/MacOS/libxpcom.dylib
2886       456        R    4096 ??/MacOS/libxpcom.dylib
1431       732        R    4096 ??/MacOS/libplds4.dylib
2640       706        R    4096 ??/MacOS/libnspr4.dylib
3092       484        R    4096 ??/MacOS/libnspr4.dylib
...

I also have a version that rolls disk operations by file name and read/write op. It also samples during 20 seconds and outputs something like this:

DELTA      DTIME         SIZE D PATHNAME
307        311           4096 R ??/en_US.ISO8859-1/LC_TIME
437        441           4096 R ??/pref/firefox-branding.js
439        443           4096 R ??/su3x40q6.2/extensions.cache
441        445           4096 R ??/en_US.ISO8859-1/LC_NUMERIC
640        644           4096 R ??/en_US.ISO8859-1/LC_MONETARY
643        648           4096 R ??/chrome/browser.manifest
644        650           4096 R ??/la_LN.US-ASCII/LC_COLLATE
645        650           4096 R ??/Firefox/profiles.ini
646        650           4096 R ??/chrome/en-US.manifest
...

All timings are in microseconds and specific to my Apple 256Gb SSD.

There's something else that I discovered during my snooping around with io.d. Compare the first 20 seconds of the life of Safari and the life of Firefox.

Basically, all the shared libraries used by Safari are pulled in from the dynamic library shared cache. Safari uses mostly system frameworks, e.g.

DYLD_PRINT_LIBRARIES_POST_LAUNCH=1 /Applications/Safari.app/Contents/MacOS/Safari 

dyld: loaded: /System/Library/CoreServices/RawCamera.bundle/Contents/MacOS/RawCamera
dyld: loaded: /System/Library/PrivateFrameworks/Shortcut.framework/Versions/A/Shortcut
dyld: loaded: /System/Library/PrivateFrameworks/HelpData.framework/Versions/A/HelpData
dyld: loaded: /System/Library/Frameworks/ApplicationServices.framework/Versions/A/Frameworks/CoreGraphics.framework/Versions/A/Resources/libCSync.A.dylib
dyld: loaded: /System/Library/Frameworks/ApplicationServices.framework/Versions/A/Frameworks/CoreGraphics.framework/Versions/A/Resources/libCGATS.A.dylib
dyld: loaded: /System/Library/Frameworks/ApplicationServices.framework/Versions/A/Frameworks/CoreGraphics.framework/Versions/A/Resources/libRIP.A.dylib
dyld: loaded: /System/Library/Frameworks/ApplicationServices.framework/Versions/A/Frameworks/CoreGraphics.framework/Versions/A/Resources/libCGXCoreImage.A.dylib

whereas Firefox uses a whole lot more custom shared libraries, e.g.

DYLD_PRINT_LIBRARIES_POST_LAUNCH=1 ./Minefield.app/Contents/MacOS/firefox-bin -no-remote -foreground -P 2

dyld: loaded: /System/Library/CoreServices/RawCamera.bundle/Contents/MacOS/RawCamera
dyld: loaded: /System/Library/PrivateFrameworks/Shortcut.framework/Versions/A/Shortcut
dyld: loaded: /System/Library/PrivateFrameworks/HelpData.framework/Versions/A/HelpData
dyld: loaded: /System/Library/Frameworks/ApplicationServices.framework/Versions/A/Frameworks/CoreGraphics.framework/Versions/A/Resources/libCSync.A.dylib
dyld: loaded: 
/System/Library/Frameworks/ApplicationServices.framework/Versions/A/Frameworks/CoreGraphics.framework/Versions/A/Resources/libCGATS.A.dylib
dyld: loaded: /System/Library/Frameworks/ApplicationServices.framework/Versions/A/Frameworks/CoreGraphics.framework/Versions/A/Resources/libRIP.A.dylib

dyld: loaded: /System/Library/Frameworks/ApplicationServices.framework/Versions/A/Frameworks/ATS.framework/Versions/A/Resources/ATSHI.dylib
dyld: loaded: /Library/Internet Plug-Ins/QuickTime Plugin.plugin/Contents/MacOS/QuickTime Plugin
dyld: loaded: /System/Library/QuickTime/QuickTimeComponents.component/Contents/MacOS/QuickTimeComponents
dyld: loaded: /System/Library/Frameworks/AGL.framework/Versions/A/AGL
dyld: loaded: /System/Library/PrivateFrameworks/CoreAUC.framework/Versions/A/CoreAUC

dyld: loaded: /Users/joelr/Work/mozilla/startup/Minefield.app/Contents/MacOS/components/libbrowserdirprovider.dylib
dyld: loaded: /Users/joelr/Work/mozilla/startup/Minefield.app/Contents/MacOS/components/libbrowsercomps.dylib
dyld: loaded: /Users/joelr/Work/mozilla/startup/./Minefield.app/Contents/MacOS/libnssdbm3.dylib
dyld: loaded: /Users/joelr/Work/mozilla/startup/./Minefield.app/Contents/MacOS/libfreebl3.dylib
dyld: loaded: /Users/joelr/Work/mozilla/startup/Minefield.app/Contents/MacOS/libnssckbi.dylib
dyld: unloaded: /Users/joelr/Work/mozilla/startup/Minefield.app/Contents/MacOS/libnssckbi.dylib

I separated the list of libraries loaded by Firefox into three groups: same libraries as Safari, other system frameworks, custom libraries.

What needs to be done to make Firefox use the dynamic library shared cache as aggressively as Safari?

Another puzzling bit involves the difference between Safari libraries loaded by dyld (above) and the list of libraries that Safari is linked against, e.g.

otool -L  /Applications/Safari.app/Contents/MacOS/Safari 

	/System/Library/PrivateFrameworks/CrashReporterSupport.framework/Versions/A/CrashReporterSupport
	/System/Library/Frameworks/Cocoa.framework/Versions/A/Cocoa
	/System/Library/Frameworks/Carbon.framework/Versions/A/Carbon
	/System/Library/Frameworks/WebKit.framework/Versions/A/WebKit
	/System/Library/Frameworks/AddressBook.framework/Versions/A/AddressBook
	/System/Library/Frameworks/Security.framework/Versions/A/Security
	/System/Library/Frameworks/SystemConfiguration.framework/Versions/A/SystemConfiguration
	/System/Library/Frameworks/SecurityFoundation.framework/Versions/A/SecurityFoundation
	/System/Library/Frameworks/SecurityInterface.framework/Versions/A/SecurityInterface
	/System/Library/PrivateFrameworks/SyndicationUI.framework/Versions/A/SyndicationUI
	/System/Library/Frameworks/JavaScriptCore.framework/Versions/A/JavaScriptCore
	/System/Library/Frameworks/IOKit.framework/Versions/A/IOKit
	/System/Library/Frameworks/Quartz.framework/Versions/A/Quartz
	/System/Library/Frameworks/OpenGL.framework/Versions/A/OpenGL
	/usr/lib/libicucore.A.dylib
	/usr/lib/libstdc++.6.dylib
	/usr/lib/libgcc_s.1.dylib
	/usr/lib/libSystem.B.dylib
	/System/Library/Frameworks/CoreServices.framework/Versions/A/CoreServices
	/usr/lib/libobjc.A.dylib
	/System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation
	/System/Library/Frameworks/AppKit.framework/Versions/C/AppKit
	/System/Library/Frameworks/ApplicationServices.framework/Versions/A/ApplicationServices
	/System/Library/Frameworks/Foundation.framework/Versions/C/Foundation
	/System/Library/Frameworks/QuartzCore.framework/Versions/A/QuartzCore

Does Safari lazily loading some frameworks? Why do I see OpenGL initializers run when I set DYLD_PRINT_INITIALIZERS then? The OpenGL framework does not appear to be loaded when Safari starts (dyld output above).

For the sake of comparing, here's the list of libraries that Firefox is linked against

otool -L ./Minefield.app/Contents/MacOS/firefox-bin./Minefield.app/Contents/MacOS/firefox-bin:

	/System/Library/Frameworks/Cocoa.framework/Versions/A/Cocoa
	@executable_path/XUL
	@executable_path/libmozjs.dylib
	@executable_path/libxpcom.dylib
	@executable_path/libplds4.dylib
	@executable_path/libplc4.dylib
	@executable_path/libnspr4.dylib
	/System/Library/Frameworks/Carbon.framework/Versions/A/Carbon
	/System/Library/Frameworks/CoreAudio.framework/Versions/A/CoreAudio
	/System/Library/Frameworks/AudioToolbox.framework/Versions/A/AudioToolbox
	/System/Library/Frameworks/AudioUnit.framework/Versions/A/AudioUnit
	/System/Library/Frameworks/AddressBook.framework/Versions/A/AddressBook
	/usr/lib/libstdc++.6.dylib 
	/usr/lib/libgcc_s.1.dylib
	/usr/lib/libSystem.B.dylib

as well as the list of libraries for XUL

otool -L ./Minefield.app/Contents/MacOS/XUL

	@executable_path/XUL 
	/System/Library/Frameworks/Cocoa.framework/Versions/A/Cocoa
	@executable_path/libsqlite3.dylib
	@executable_path/libmozjs.dylib
	@executable_path/libsmime3.dylib
	@executable_path/libssl3.dylib
	@executable_path/libnss3.dylib
	@executable_path/libnssutil3.dylib
	@executable_path/libsoftokn3.dylib 
	/usr/lib/libcups.2.dylib
	@executable_path/libplds4.dylib
	@executable_path/libplc4.dylib
	@executable_path/libnspr4.dylib
	/System/Library/Frameworks/SystemConfiguration.framework/Versions/A/SystemConfiguration
	/System/Library/Frameworks/QuickTime.framework/Versions/A/QuickTime
	/System/Library/Frameworks/IOKit.framework/Versions/A/IOKit
	/usr/lib/libcrypto.0.9.7.dylib
	/System/Library/Frameworks/Carbon.framework/Versions/A/Carbon 
	/System/Library/Frameworks/CoreAudio.framework/Versions/A/CoreAudio
	/System/Library/Frameworks/AudioToolbox.framework/Versions/A/AudioToolbox
	/System/Library/Frameworks/AudioUnit.framework/Versions/A/AudioUnit
	/System/Library/Frameworks/AddressBook.framework/Versions/A/AddressBook
	/usr/lib/libstdc++.6.dylib
	/usr/lib/libgcc_s.1.dylib
	/usr/lib/libSystem.B.dylib
	/System/Library/Frameworks/CoreServices.framework/Versions/A/CoreServices
	/usr/lib/libobjc.A.dylib
	/System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation
	/System/Library/Frameworks/AppKit.framework/Versions/C/AppKit
	/System/Library/Frameworks/ApplicationServices.framework/Versions/A/ApplicationServices
	/System/Library/Frameworks/Foundation.framework/Versions/C/Foundation

It turns out that DYLD_PRINT_LIBRARIES=1 is the way to show libraries loaded by dyld before a program's entry point (main) whereas DYLD_PRINT_LIBRARIES_POST_LAUNCH=1 will show libraries loaded afterwards, e.g. using dlopen or NSBundle.

August 25, 2009

Created a static probe that fires first thing in XRE_main and updated my DTrace scripts to use it.

pid$target::function:entry probes are very slow since DTrace may have to search thousands of functions. All that search time skews elapsed time reported by timestamp. USDT (static) probes are just a few NOP instructions in the code that get fixed up by DTrace as needed so they work much faster.

August 24, 2009

Startup time is measured from the entry to XRE_main to the return of the BrowserStartup JS function. It takes a good bit of time but nothing compared to the time elapsed from the start of Firefox to the call to XRE_main.

According to my static-init.d script, the static initialization time can be ignored. I'm recording the library name and then timing the following call to ImageLoader::runInitializers in dyld. The cumulative time is too small to be of essence, though.

0.000053548s for /System/Library/Frameworks/Carbon.framework/Frameworks/HIToolbox.framework/Versions/A/HIToolbox
0.000053732s for /Users/joelr/Work/mozilla/startup/./Minefield.app/Contents/MacOS/libsoftokn3.dylib
0.000069234s for /System/Library/PrivateFrameworks/Shortcut.framework/Versions/A/Shortcut
0.000070455s for /Users/joelr/Work/mozilla/startup/MinefieldRelease.app/Contents/MacOS/libnssckbi.dylib
0.000072754s for /Users/joelr/Work/mozilla/startup/MinefieldRelease.app/Contents/MacOS/components/libbrowsercomps.dylib
0.000073443s for /System/Library/Frameworks/ApplicationServices.framework/Versions/A/Frameworks/ATS.framework/Versions/A/Resources/ATSHI.dylib
0.000074363s for /System/Library/Frameworks/ApplicationServices.framework/Versions/A/Frameworks/CoreGraphics.framework/Versions/A/Resources/libCSync.A.dylib
0.000075845s for /System/Library/Frameworks/ApplicationServices.framework/Versions/A/Frameworks/CoreGraphics.framework/Versions/A/Resources/libCGATS.A.dylib
0.000076892s for /System/Library/Frameworks/ApplicationServices.framework/Versions/A/Frameworks/CoreGraphics.framework/Versions/A/Resources/libRIP.A.dylib
0.000089767s for /Users/joelr/Work/mozilla/startup/./Minefield.app/Contents/MacOS/libnssdbm3.dylib
0.000094390s for /Users/joelr/Work/mozilla/startup/./Minefield.app/Contents/MacOS/libfreebl3.dylib
0.000100462s for /System/Library/Frameworks/CoreServices.framework/Versions/A/Frameworks/CarbonCore.framework/Versions/A/CarbonCore
0.000115375s for /System/Library/Frameworks/Carbon.framework/Versions/A/Frameworks/HIToolbox.framework/Versions/A/HIToolbox
0.001161267s for /Users/joelr/Work/mozilla/startup/MinefieldRelease.app/Contents/MacOS/components/libbrowserdirprovider.dylib
---------------
= 0.002181527s

Initialization: 42.986413480s
Startup       : 7.015292701s
---------------
= 50.001706181s

Digging deeper...

August 21, 2009

Blogged.

My DTrace scripts live here. Use like this

sudo ./cold.sh static-init.d

DTrace tips and tricks

Combining scripts

I find it convenient to have separate DTrace scripts for various exit probes. This way I can avoid editing the main script whenever I want to stop sampling once 10 seconds go by or XRE_main is entered or the BrowserReturn JavaScript function returns.

It can be done like this

#!/bin/bash 

cmd="./Minefield.app/Contents/MacOS/firefox-bin -no-remote -foreground -P 2"
scripts=""

for i in $*; do scripts="$scripts -s $i"; done

sync && purge && dtrace -x dynvarsize=64m -x evaltime=exec -c "$cmd" -wZ $scripts

and then

./cold.sh startup.d stop-main.d 
Total: 11001.481323ms

or

./cold.sh startup.d stop-js.d 
Total: 7004.799264ms

My scripts can be found on GitHub.

No sudo

sudo chmod u+s `which dtrace`

You can now run dtrace as yourself. Enjoy!

timestamp vs vtimestamp

vtimestamp measures CPU time of the current thread, excluding IO and DTrace overhead. timestamp can still be used for deltas but the goal is to use as few pid$target probes as possible as they affect timestamp when dtrace has to switch between kernel and userland. io and syscall providers are fast and run in the kernel.

Invalid address

Have you seen this kind of error before?

dtrace: error on enabled probe ID 27 (ID 22130: pid34547:libSystem.B.dylib:dlopen:entry): invalid address (0x2ac204) in action #1 at DIF offset 28
dtrace: error on enabled probe ID 2 (ID 22782: pid34547:dyld:dlopen:entry): invalid address (0x2ac204) in action #1 at DIF offset 28

More likely than not, you are using copyinstr on memory that hasn't been paged in yet. Try saving the pointer on entry and doing the copying on return or later. Italic text