Firefox/Projects/Startup Time Improvements/joelr notes
Intro
I'm trying to figure out where Firefox startup time goes, up to the return from BrowserStartup (Javascript function). I'm also manipulating DTrace into telling me where time is going, without making any assumptions.
Current status
August 26, 2009
Suppose we sample at about 1Khz on a dual-core CPU and run an app for 10 seconds. An app that hogs the CPU should give us 20k samples, or 10k samples if it's pegged to a single core.
I coded this up in a cpu.d DTrace script and sampled Firefox. Apparently, Firefox only gets ~1700 samples on the CPU or 1.7s so its first 10 seconds of life are spent doing something else, e.g. disk IO.
We already know that Firefox is slow to start up but this type of sampling neatly points the finger in the right direction. Props to Brendan Gregg for teaching me how to fish!
What kind of IO does Firefox do upon startup, though? I find the output from iosnoop a bit confusing so I whipped up a version of my own that prints just disk access times, disk operation deltas, block size and file name accessed.
The script samples Firefox during the first 20 seconds of its life and the output from ./cold.sh io.d looks like this:
DELTA DTIME D SIZE PATHNAME 1399 1407 R 4096 ??/dtrace/libdtrace_dyld.dylib 1141 1145 R 8192 ??/dtrace/libdtrace_dyld.dylib 1301 1305 R 4096 ??/MacOS/XUL 1701 1705 R 4096 ??/MacOS/XUL 11081 685 R 4096 ??/MacOS/libmozjs.dylib 1989 2253 R 36864 ??/MacOS/libmozjs.dylib 1768 1894 R 24576 ??/MacOS/libmozjs.dylib 27111 2135 R 200704 ??/MacOS/XUL 1411 1134 R 8192 ??/MacOS/libxpcom.dylib 1784 1968 R 4096 ??/MacOS/libplds4.dylib 2383 464 R 4096 ??/MacOS/libxpcom.dylib 2886 456 R 4096 ??/MacOS/libxpcom.dylib 1431 732 R 4096 ??/MacOS/libplds4.dylib 2640 706 R 4096 ??/MacOS/libnspr4.dylib 3092 484 R 4096 ??/MacOS/libnspr4.dylib ...
I also have a version that rolls disk operations by file name and read/write op. It also samples during 20 seconds and outputs something like this:
DELTA DTIME SIZE D PATHNAME 307 311 4096 R ??/en_US.ISO8859-1/LC_TIME 437 441 4096 R ??/pref/firefox-branding.js 439 443 4096 R ??/su3x40q6.2/extensions.cache 441 445 4096 R ??/en_US.ISO8859-1/LC_NUMERIC 640 644 4096 R ??/en_US.ISO8859-1/LC_MONETARY 643 648 4096 R ??/chrome/browser.manifest 644 650 4096 R ??/la_LN.US-ASCII/LC_COLLATE 645 650 4096 R ??/Firefox/profiles.ini 646 650 4096 R ??/chrome/en-US.manifest ...
All timings are in microseconds and specific to my Apple 256Gb SSD.
There's something else that I discovered during my snooping around with io.d. Compare the first 20 seconds of the life of Safari and the life of Firefox.
Basically, all the shared libraries used by Safari are pulled in from the dynamic library shared cache. Safari uses mostly system frameworks, e.g.
DYLD_PRINT_LIBRARIES_POST_LAUNCH=1 /Applications/Safari.app/Contents/MacOS/Safari dyld: loaded: /System/Library/CoreServices/RawCamera.bundle/Contents/MacOS/RawCamera dyld: loaded: /System/Library/PrivateFrameworks/Shortcut.framework/Versions/A/Shortcut dyld: loaded: /System/Library/PrivateFrameworks/HelpData.framework/Versions/A/HelpData dyld: loaded: /System/Library/Frameworks/ApplicationServices.framework/Versions/A/Frameworks/CoreGraphics.framework/Versions/A/Resources/libCSync.A.dylib dyld: loaded: /System/Library/Frameworks/ApplicationServices.framework/Versions/A/Frameworks/CoreGraphics.framework/Versions/A/Resources/libCGATS.A.dylib dyld: loaded: /System/Library/Frameworks/ApplicationServices.framework/Versions/A/Frameworks/CoreGraphics.framework/Versions/A/Resources/libRIP.A.dylib dyld: loaded: /System/Library/Frameworks/ApplicationServices.framework/Versions/A/Frameworks/CoreGraphics.framework/Versions/A/Resources/libCGXCoreImage.A.dylib
whereas Firefox uses a whole lot more custom shared libraries, e.g.
DYLD_PRINT_LIBRARIES_POST_LAUNCH=1 ./Minefield.app/Contents/MacOS/firefox-bin -no-remote -foreground -P 2 dyld: loaded: /Users/joelr/Work/mozilla/startup/Minefield.app/Contents/MacOS/components/libbrowserdirprovider.dylib dyld: loaded: /System/Library/Frameworks/ApplicationServices.framework/Versions/A/Frameworks/ATS.framework/Versions/A/Resources/ATSHI.dylib dyld: loaded: /Users/joelr/Work/mozilla/startup/Minefield.app/Contents/MacOS/components/libbrowsercomps.dylib dyld: loaded: /System/Library/Frameworks/ApplicationServices.framework/Versions/A/Frameworks/CoreGraphics.framework/Versions/A/Resources/libCGATS.A.dylib dyld: loaded: /System/Library/Frameworks/ApplicationServices.framework/Versions/A/Frameworks/CoreGraphics.framework/Versions/A/Resources/libRIP.A.dylib dyld: loaded: /System/Library/Frameworks/ApplicationServices.framework/Versions/A/Frameworks/CoreGraphics.framework/Versions/A/Resources/libCSync.A.dylib dyld: loaded: /System/Library/PrivateFrameworks/Shortcut.framework/Versions/A/Shortcut dyld: loaded: /System/Library/PrivateFrameworks/HelpData.framework/Versions/A/HelpData dyld: loaded: /Users/joelr/Work/mozilla/startup/./Minefield.app/Contents/MacOS/libnssdbm3.dylib dyld: loaded: /Users/joelr/Work/mozilla/startup/./Minefield.app/Contents/MacOS/libfreebl3.dylib dyld: loaded: /Users/joelr/Work/mozilla/startup/Minefield.app/Contents/MacOS/libnssckbi.dylib dyld: loaded: /System/Library/CoreServices/RawCamera.bundle/Contents/MacOS/RawCamera dyld: loaded: /Library/Internet Plug-Ins/QuickTime Plugin.plugin/Contents/MacOS/QuickTime Plugin dyld: loaded: /System/Library/QuickTime/QuickTimeComponents.component/Contents/MacOS/QuickTimeComponents dyld: loaded: /System/Library/Frameworks/AGL.framework/Versions/A/AGL dyld: loaded: /System/Library/PrivateFrameworks/CoreAUC.framework/Versions/A/CoreAUC dyld: unloaded: /Users/joelr/Work/mozilla/startup/Minefield.app/Contents/MacOS/libnssckbi.dylib
What needs to be done to make Firefox use the dynamic library shared cache as aggressively as Safari?
Previous statuses
August 25, 2009
Created a static probe that fires first thing in XRE_main and updated my DTrace scripts to use it.
pid$target::function:entry probes are very slow since DTrace may have to search thousands of functions. All that search time skews elapsed time reported by timestamp. USDT (static) probes are just a few NOP instructions in the code that get fixed up by DTrace as needed so they work much faster.
August 24, 2009
Startup time is measured from the entry to XRE_main to the return of the BrowserStartup JS function. It takes a good bit of time but nothing compared to the time elapsed from the start of Firefox to the call to XRE_main.
According to my static-init.d script, the static initialization time can be ignored. I'm recording the library name and then timing the following call to ImageLoader::runInitializers in dyld. The cumulative time is too small to be of essence, though.
0.000053548s for /System/Library/Frameworks/Carbon.framework/Frameworks/HIToolbox.framework/Versions/A/HIToolbox 0.000053732s for /Users/joelr/Work/mozilla/startup/./Minefield.app/Contents/MacOS/libsoftokn3.dylib 0.000069234s for /System/Library/PrivateFrameworks/Shortcut.framework/Versions/A/Shortcut 0.000070455s for /Users/joelr/Work/mozilla/startup/MinefieldRelease.app/Contents/MacOS/libnssckbi.dylib 0.000072754s for /Users/joelr/Work/mozilla/startup/MinefieldRelease.app/Contents/MacOS/components/libbrowsercomps.dylib 0.000073443s for /System/Library/Frameworks/ApplicationServices.framework/Versions/A/Frameworks/ATS.framework/Versions/A/Resources/ATSHI.dylib 0.000074363s for /System/Library/Frameworks/ApplicationServices.framework/Versions/A/Frameworks/CoreGraphics.framework/Versions/A/Resources/libCSync.A.dylib 0.000075845s for /System/Library/Frameworks/ApplicationServices.framework/Versions/A/Frameworks/CoreGraphics.framework/Versions/A/Resources/libCGATS.A.dylib 0.000076892s for /System/Library/Frameworks/ApplicationServices.framework/Versions/A/Frameworks/CoreGraphics.framework/Versions/A/Resources/libRIP.A.dylib 0.000089767s for /Users/joelr/Work/mozilla/startup/./Minefield.app/Contents/MacOS/libnssdbm3.dylib 0.000094390s for /Users/joelr/Work/mozilla/startup/./Minefield.app/Contents/MacOS/libfreebl3.dylib 0.000100462s for /System/Library/Frameworks/CoreServices.framework/Versions/A/Frameworks/CarbonCore.framework/Versions/A/CarbonCore 0.000115375s for /System/Library/Frameworks/Carbon.framework/Versions/A/Frameworks/HIToolbox.framework/Versions/A/HIToolbox 0.001161267s for /Users/joelr/Work/mozilla/startup/MinefieldRelease.app/Contents/MacOS/components/libbrowserdirprovider.dylib --------------- = 0.002181527s Initialization: 42.986413480s Startup : 7.015292701s --------------- = 50.001706181s
Digging deeper...
August 21, 2009
My DTrace scripts live here. Use like this
sudo ./cold.sh static-init.d
DTrace tips and tricks
timestamp vs vtimestamp
vtimestamp measures CPU time of the current thread, excluding IO and DTrace overhead. timestamp can still be used for deltas but the goal is to use as few pid$target probes as possible as they affect timestamp when dtrace has to switch between kernel and userland. io and syscall providers are fast and run in the kernel.
Invalid address
Have you seen this kind of error before?
dtrace: error on enabled probe ID 27 (ID 22130: pid34547:libSystem.B.dylib:dlopen:entry): invalid address (0x2ac204) in action #1 at DIF offset 28 dtrace: error on enabled probe ID 2 (ID 22782: pid34547:dyld:dlopen:entry): invalid address (0x2ac204) in action #1 at DIF offset 28
More likely than not, you are using copyinstr on memory that hasn't been paged in yet. Try saving the pointer on entry and doing the copying on return or later.