Firefox/Projects/Startup Time Improvements/joelr notes
Intro
I'm trying to figure out where Firefox startup time goes, up to the return from BrowserStartup (Javascript function). I'm also manipulating DTrace into telling me where time is going, without making any assumptions.
Current status
August 27, 2009
The dyld shared cache lives in /var/db/dyld/. The two files of interest are dyld_shared_cache_i386.map (for x86-32) and shared_region_roots/Applications.paths. Both are regular text files. The former shows the contents of the shared cache for the i386 architecture and the latter is what update_dyld_shared_cache inspects.
There's no prebinding on newer versions of Mac OSX and the dyld shared cache is automatically updated as needed. Tracing Safari disk activity during startup reveals that basically all its dynamic libraries are pulled from the dyld shared cache.
It's possible to add Firefox (.../Firefox.app/Contents/MacOS/firefox-bin) to Applications.paths and the change will persist across reboots. Unfortunately, only a handful of libraries that Firefox uses are pulled into the cache by update_dyld_shared_cache. I'm speculating that this may have something to do with @executable_path/XUL and friends (otool -L .../firefox-bin).
Safari uses absolute paths to frameworks in /System/Library/Frameworks so I speculate that relative paths are what is preventing XUL and others from going into the cache. We may want to investigate /System/Library/Frameworks/Firefox.framework or similar.
It's possible to fix relative library paths in a given library, e.g. fix.sh XUL where fix.sh looks like this
#!/bin/bash function dylibs () { otool -L $1 |grep executable_path|awk '{print $1}'|cut -d"/" -f2 } for i in `dylibs $1` do install_name_tool -change @executable_path/$i `pwd`/$i $1 done install_name_tool -id `pwd`/$1 $1
Firefox has to be recompiled with LDFLAGS=-header-pad_max_install_names in MOZCONFIG to make this happen since new library paths are greater than the space allocated in the Mach-O binary. See the man page for install_name_tool for details.
It's possible to force dynamic libraries into the cache by putting dynamic library paths into shared_region_roots/Applications.paths instead of executables. I wasn't successful in caching XUL, though, regardless of what I did.
In the end it doesn't seem to matter since there's a baffling lack of difference between Firefox and Safari cold stats, despite Safari pulling everything from the cache and Firefox using a large number of non-cached dylibs.
Here's the cold startup stats for Safari
sync && purge && DYLD_PRINT_STATISTICS=1 /Applications/Safari.app/Contents/MacOS/Safari total time: 696.9 milliseconds (100.0%) total images loaded: 116 (114 from dyld shared cache, 114 needed no fixups) total segments mapped: 5, into 30 pages with 10 pages pre-fetched total images loading time: 204.9 milliseconds (29.4%) total rebase fixups: 1,298 total rebase fixups time: 0.1 milliseconds (0.0%) total binding fixups: 2,476 total binding symbol lookups: 234, average images searched per symbol: 1.6 total binding fixups time: 80.5 milliseconds (11.5%) total bindings lazily fixed up: 3 of 901 total init time time: 411.3 milliseconds (59.0%) total images with weak exports: 1
and Firefox
total time: 731.2 milliseconds (100.0%) total images loaded: 106 (93 from dyld shared cache, 56 needed no fixups) total segments mapped: 49, into 5903 pages with 684 pages pre-fetched total images loading time: 235.3 milliseconds (32.1%) total rebase fixups: 149,011 total rebase fixups time: 3.7 milliseconds (0.5%) total binding fixups: 24,932 total binding symbol lookups: 797, average images searched per symbol: 2.3 total binding fixups time: 149.9 milliseconds (20.5%) total bindings lazily fixed up: 45 of 19,109 total init time time: 342.2 milliseconds (46.8%) total images with weak exports: 3
Notice a large and significant difference? Me neither.
The other thing that I cannot explain at the moment is where the rest of the startup time goes, e.g.
./cold.sh startup.d Total: 10001.723521ms
So it took less than 1 second to dynamically link Firefox but where did the other 9 seconds of startup go?
cold.sh is rather simple
#!/bin/bash cmd="./Minefield.app/Contents/MacOS/firefox-bin -no-remote -foreground -P 2" sync && purge && dtrace -x dynvarsize=64m -x evaltime=exec -c "$cmd" -wZs $1
and the startup.d script doesn't do much either
#pragma D option quiet BEGIN { start = timestamp; } /* stop tracing here */ mozilla$target:::main-entry { exit(0); } END { this->total = timestamp - start; printf("Total: %u.%06ums\n", this->total / 1000000, this->total % 1000000); }
I don't have an explanation yet but 9 seconds is a very large difference.
Previous statuses
August 26, 2009
Suppose we sample at about 1Khz on a dual-core CPU and run an app for 10 seconds. An app that hogs the CPU should give us 20k samples, or 10k samples if it's pegged to a single core.
I coded this up in a cpu.d DTrace script and sampled Firefox. Apparently, Firefox only gets ~1700 samples on the CPU or 1.7s so its first 10 seconds of life are spent doing something else, e.g. disk IO.
We already know that Firefox is slow to start up but this type of sampling neatly points the finger in the right direction. Props to Brendan Gregg for teaching me how to fish!
What kind of IO does Firefox do upon startup, though? I find the output from iosnoop a bit confusing so I whipped up a version of my own that prints just disk access times, disk operation deltas, block size and file name accessed.
The script samples Firefox during the first 20 seconds of its life and the output from ./cold.sh io.d looks like this:
DELTA DTIME D SIZE PATHNAME 1399 1407 R 4096 ??/dtrace/libdtrace_dyld.dylib 1141 1145 R 8192 ??/dtrace/libdtrace_dyld.dylib 1301 1305 R 4096 ??/MacOS/XUL 1701 1705 R 4096 ??/MacOS/XUL 11081 685 R 4096 ??/MacOS/libmozjs.dylib 1989 2253 R 36864 ??/MacOS/libmozjs.dylib 1768 1894 R 24576 ??/MacOS/libmozjs.dylib 27111 2135 R 200704 ??/MacOS/XUL 1411 1134 R 8192 ??/MacOS/libxpcom.dylib 1784 1968 R 4096 ??/MacOS/libplds4.dylib 2383 464 R 4096 ??/MacOS/libxpcom.dylib 2886 456 R 4096 ??/MacOS/libxpcom.dylib 1431 732 R 4096 ??/MacOS/libplds4.dylib 2640 706 R 4096 ??/MacOS/libnspr4.dylib 3092 484 R 4096 ??/MacOS/libnspr4.dylib ...
I also have a version that rolls disk operations by file name and read/write op. It also samples during 20 seconds and outputs something like this:
DELTA DTIME SIZE D PATHNAME 307 311 4096 R ??/en_US.ISO8859-1/LC_TIME 437 441 4096 R ??/pref/firefox-branding.js 439 443 4096 R ??/su3x40q6.2/extensions.cache 441 445 4096 R ??/en_US.ISO8859-1/LC_NUMERIC 640 644 4096 R ??/en_US.ISO8859-1/LC_MONETARY 643 648 4096 R ??/chrome/browser.manifest 644 650 4096 R ??/la_LN.US-ASCII/LC_COLLATE 645 650 4096 R ??/Firefox/profiles.ini 646 650 4096 R ??/chrome/en-US.manifest ...
All timings are in microseconds and specific to my Apple 256Gb SSD.
There's something else that I discovered during my snooping around with io.d. Compare the first 20 seconds of the life of Safari and the life of Firefox.
Basically, all the shared libraries used by Safari are pulled in from the dynamic library shared cache. Safari uses mostly system frameworks, e.g.
DYLD_PRINT_LIBRARIES_POST_LAUNCH=1 /Applications/Safari.app/Contents/MacOS/Safari dyld: loaded: /System/Library/CoreServices/RawCamera.bundle/Contents/MacOS/RawCamera dyld: loaded: /System/Library/PrivateFrameworks/Shortcut.framework/Versions/A/Shortcut dyld: loaded: /System/Library/PrivateFrameworks/HelpData.framework/Versions/A/HelpData dyld: loaded: /System/Library/Frameworks/ApplicationServices.framework/Versions/A/Frameworks/CoreGraphics.framework/Versions/A/Resources/libCSync.A.dylib dyld: loaded: /System/Library/Frameworks/ApplicationServices.framework/Versions/A/Frameworks/CoreGraphics.framework/Versions/A/Resources/libCGATS.A.dylib dyld: loaded: /System/Library/Frameworks/ApplicationServices.framework/Versions/A/Frameworks/CoreGraphics.framework/Versions/A/Resources/libRIP.A.dylib dyld: loaded: /System/Library/Frameworks/ApplicationServices.framework/Versions/A/Frameworks/CoreGraphics.framework/Versions/A/Resources/libCGXCoreImage.A.dylib
whereas Firefox uses a whole lot more custom shared libraries, e.g.
DYLD_PRINT_LIBRARIES_POST_LAUNCH=1 ./Minefield.app/Contents/MacOS/firefox-bin -no-remote -foreground -P 2 dyld: loaded: /System/Library/CoreServices/RawCamera.bundle/Contents/MacOS/RawCamera dyld: loaded: /System/Library/PrivateFrameworks/Shortcut.framework/Versions/A/Shortcut dyld: loaded: /System/Library/PrivateFrameworks/HelpData.framework/Versions/A/HelpData dyld: loaded: /System/Library/Frameworks/ApplicationServices.framework/Versions/A/Frameworks/CoreGraphics.framework/Versions/A/Resources/libCSync.A.dylib dyld: loaded: /System/Library/Frameworks/ApplicationServices.framework/Versions/A/Frameworks/CoreGraphics.framework/Versions/A/Resources/libCGATS.A.dylib dyld: loaded: /System/Library/Frameworks/ApplicationServices.framework/Versions/A/Frameworks/CoreGraphics.framework/Versions/A/Resources/libRIP.A.dylib dyld: loaded: /System/Library/Frameworks/ApplicationServices.framework/Versions/A/Frameworks/ATS.framework/Versions/A/Resources/ATSHI.dylib dyld: loaded: /Library/Internet Plug-Ins/QuickTime Plugin.plugin/Contents/MacOS/QuickTime Plugin dyld: loaded: /System/Library/QuickTime/QuickTimeComponents.component/Contents/MacOS/QuickTimeComponents dyld: loaded: /System/Library/Frameworks/AGL.framework/Versions/A/AGL dyld: loaded: /System/Library/PrivateFrameworks/CoreAUC.framework/Versions/A/CoreAUC dyld: loaded: /Users/joelr/Work/mozilla/startup/Minefield.app/Contents/MacOS/components/libbrowserdirprovider.dylib dyld: loaded: /Users/joelr/Work/mozilla/startup/Minefield.app/Contents/MacOS/components/libbrowsercomps.dylib dyld: loaded: /Users/joelr/Work/mozilla/startup/./Minefield.app/Contents/MacOS/libnssdbm3.dylib dyld: loaded: /Users/joelr/Work/mozilla/startup/./Minefield.app/Contents/MacOS/libfreebl3.dylib dyld: loaded: /Users/joelr/Work/mozilla/startup/Minefield.app/Contents/MacOS/libnssckbi.dylib dyld: unloaded: /Users/joelr/Work/mozilla/startup/Minefield.app/Contents/MacOS/libnssckbi.dylib
I separated the list of libraries loaded by Firefox into three groups: same libraries as Safari, other system frameworks, custom libraries.
What needs to be done to make Firefox use the dynamic library shared cache as aggressively as Safari?
Another puzzling bit involves the difference between Safari libraries loaded by dyld (above) and the list of libraries that Safari is linked against, e.g.
otool -L /Applications/Safari.app/Contents/MacOS/Safari /System/Library/PrivateFrameworks/CrashReporterSupport.framework/Versions/A/CrashReporterSupport /System/Library/Frameworks/Cocoa.framework/Versions/A/Cocoa /System/Library/Frameworks/Carbon.framework/Versions/A/Carbon /System/Library/Frameworks/WebKit.framework/Versions/A/WebKit /System/Library/Frameworks/AddressBook.framework/Versions/A/AddressBook /System/Library/Frameworks/Security.framework/Versions/A/Security /System/Library/Frameworks/SystemConfiguration.framework/Versions/A/SystemConfiguration /System/Library/Frameworks/SecurityFoundation.framework/Versions/A/SecurityFoundation /System/Library/Frameworks/SecurityInterface.framework/Versions/A/SecurityInterface /System/Library/PrivateFrameworks/SyndicationUI.framework/Versions/A/SyndicationUI /System/Library/Frameworks/JavaScriptCore.framework/Versions/A/JavaScriptCore /System/Library/Frameworks/IOKit.framework/Versions/A/IOKit /System/Library/Frameworks/Quartz.framework/Versions/A/Quartz /System/Library/Frameworks/OpenGL.framework/Versions/A/OpenGL /usr/lib/libicucore.A.dylib /usr/lib/libstdc++.6.dylib /usr/lib/libgcc_s.1.dylib /usr/lib/libSystem.B.dylib /System/Library/Frameworks/CoreServices.framework/Versions/A/CoreServices /usr/lib/libobjc.A.dylib /System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation /System/Library/Frameworks/AppKit.framework/Versions/C/AppKit /System/Library/Frameworks/ApplicationServices.framework/Versions/A/ApplicationServices /System/Library/Frameworks/Foundation.framework/Versions/C/Foundation /System/Library/Frameworks/QuartzCore.framework/Versions/A/QuartzCore
Does Safari lazily loading some frameworks? Why do I see OpenGL initializers run when I set DYLD_PRINT_INITIALIZERS then? The OpenGL framework does not appear to be loaded when Safari starts (dyld output above).
For the sake of comparing, here's the list of libraries that Firefox is linked against
otool -L ./Minefield.app/Contents/MacOS/firefox-bin./Minefield.app/Contents/MacOS/firefox-bin: /System/Library/Frameworks/Cocoa.framework/Versions/A/Cocoa @executable_path/XUL @executable_path/libmozjs.dylib @executable_path/libxpcom.dylib @executable_path/libplds4.dylib @executable_path/libplc4.dylib @executable_path/libnspr4.dylib /System/Library/Frameworks/Carbon.framework/Versions/A/Carbon /System/Library/Frameworks/CoreAudio.framework/Versions/A/CoreAudio /System/Library/Frameworks/AudioToolbox.framework/Versions/A/AudioToolbox /System/Library/Frameworks/AudioUnit.framework/Versions/A/AudioUnit /System/Library/Frameworks/AddressBook.framework/Versions/A/AddressBook /usr/lib/libstdc++.6.dylib /usr/lib/libgcc_s.1.dylib /usr/lib/libSystem.B.dylib
as well as the list of libraries for XUL
otool -L ./Minefield.app/Contents/MacOS/XUL @executable_path/XUL /System/Library/Frameworks/Cocoa.framework/Versions/A/Cocoa @executable_path/libsqlite3.dylib @executable_path/libmozjs.dylib @executable_path/libsmime3.dylib @executable_path/libssl3.dylib @executable_path/libnss3.dylib @executable_path/libnssutil3.dylib @executable_path/libsoftokn3.dylib /usr/lib/libcups.2.dylib @executable_path/libplds4.dylib @executable_path/libplc4.dylib @executable_path/libnspr4.dylib /System/Library/Frameworks/SystemConfiguration.framework/Versions/A/SystemConfiguration /System/Library/Frameworks/QuickTime.framework/Versions/A/QuickTime /System/Library/Frameworks/IOKit.framework/Versions/A/IOKit /usr/lib/libcrypto.0.9.7.dylib /System/Library/Frameworks/Carbon.framework/Versions/A/Carbon /System/Library/Frameworks/CoreAudio.framework/Versions/A/CoreAudio /System/Library/Frameworks/AudioToolbox.framework/Versions/A/AudioToolbox /System/Library/Frameworks/AudioUnit.framework/Versions/A/AudioUnit /System/Library/Frameworks/AddressBook.framework/Versions/A/AddressBook /usr/lib/libstdc++.6.dylib /usr/lib/libgcc_s.1.dylib /usr/lib/libSystem.B.dylib /System/Library/Frameworks/CoreServices.framework/Versions/A/CoreServices /usr/lib/libobjc.A.dylib /System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation /System/Library/Frameworks/AppKit.framework/Versions/C/AppKit /System/Library/Frameworks/ApplicationServices.framework/Versions/A/ApplicationServices /System/Library/Frameworks/Foundation.framework/Versions/C/Foundation
It turns out that DYLD_PRINT_LIBRARIES=1 is the way to show libraries loaded by dyld before a program's entry point (main) whereas DYLD_PRINT_LIBRARIES_POST_LAUNCH=1 will show libraries loaded afterwards, e.g. using dlopen or NSBundle.
August 25, 2009
Created a static probe that fires first thing in XRE_main and updated my DTrace scripts to use it.
pid$target::function:entry probes are very slow since DTrace may have to search thousands of functions. All that search time skews elapsed time reported by timestamp. USDT (static) probes are just a few NOP instructions in the code that get fixed up by DTrace as needed so they work much faster.
August 24, 2009
Startup time is measured from the entry to XRE_main to the return of the BrowserStartup JS function. It takes a good bit of time but nothing compared to the time elapsed from the start of Firefox to the call to XRE_main.
According to my static-init.d script, the static initialization time can be ignored. I'm recording the library name and then timing the following call to ImageLoader::runInitializers in dyld. The cumulative time is too small to be of essence, though.
0.000053548s for /System/Library/Frameworks/Carbon.framework/Frameworks/HIToolbox.framework/Versions/A/HIToolbox 0.000053732s for /Users/joelr/Work/mozilla/startup/./Minefield.app/Contents/MacOS/libsoftokn3.dylib 0.000069234s for /System/Library/PrivateFrameworks/Shortcut.framework/Versions/A/Shortcut 0.000070455s for /Users/joelr/Work/mozilla/startup/MinefieldRelease.app/Contents/MacOS/libnssckbi.dylib 0.000072754s for /Users/joelr/Work/mozilla/startup/MinefieldRelease.app/Contents/MacOS/components/libbrowsercomps.dylib 0.000073443s for /System/Library/Frameworks/ApplicationServices.framework/Versions/A/Frameworks/ATS.framework/Versions/A/Resources/ATSHI.dylib 0.000074363s for /System/Library/Frameworks/ApplicationServices.framework/Versions/A/Frameworks/CoreGraphics.framework/Versions/A/Resources/libCSync.A.dylib 0.000075845s for /System/Library/Frameworks/ApplicationServices.framework/Versions/A/Frameworks/CoreGraphics.framework/Versions/A/Resources/libCGATS.A.dylib 0.000076892s for /System/Library/Frameworks/ApplicationServices.framework/Versions/A/Frameworks/CoreGraphics.framework/Versions/A/Resources/libRIP.A.dylib 0.000089767s for /Users/joelr/Work/mozilla/startup/./Minefield.app/Contents/MacOS/libnssdbm3.dylib 0.000094390s for /Users/joelr/Work/mozilla/startup/./Minefield.app/Contents/MacOS/libfreebl3.dylib 0.000100462s for /System/Library/Frameworks/CoreServices.framework/Versions/A/Frameworks/CarbonCore.framework/Versions/A/CarbonCore 0.000115375s for /System/Library/Frameworks/Carbon.framework/Versions/A/Frameworks/HIToolbox.framework/Versions/A/HIToolbox 0.001161267s for /Users/joelr/Work/mozilla/startup/MinefieldRelease.app/Contents/MacOS/components/libbrowserdirprovider.dylib --------------- = 0.002181527s Initialization: 42.986413480s Startup : 7.015292701s --------------- = 50.001706181s
Digging deeper...
August 21, 2009
My DTrace scripts live here. Use like this
sudo ./cold.sh static-init.d
DTrace tips and tricks
Combining scripts
I find it convenient to have separate DTrace scripts for various exit probes. This way I can avoid editing the main script whenever I want to stop sampling once 10 seconds go by or XRE_main is entered or the BrowserReturn JavaScript function returns.
It can be done like this
#!/bin/bash cmd="./Minefield.app/Contents/MacOS/firefox-bin -no-remote -foreground -P 2" scripts="" for i in $*; do scripts="$scripts -s $i"; done sync && purge && dtrace -x dynvarsize=64m -x evaltime=exec -c "$cmd" -wZ $scripts
and then
./cold.sh startup.d stop-main.d Total: 11001.481323ms
or
./cold.sh startup.d stop-js.d Total: 7004.799264ms
My scripts can be found on GitHub.
No sudo
sudo chmod u+s `which dtrace`
You can now run dtrace as yourself. Enjoy!
timestamp vs vtimestamp
vtimestamp measures CPU time of the current thread, excluding IO and DTrace overhead. timestamp can still be used for deltas but the goal is to use as few pid$target probes as possible as they affect timestamp when dtrace has to switch between kernel and userland. io and syscall providers are fast and run in the kernel.
Invalid address
Have you seen this kind of error before?
dtrace: error on enabled probe ID 27 (ID 22130: pid34547:libSystem.B.dylib:dlopen:entry): invalid address (0x2ac204) in action #1 at DIF offset 28 dtrace: error on enabled probe ID 2 (ID 22782: pid34547:dyld:dlopen:entry): invalid address (0x2ac204) in action #1 at DIF offset 28
More likely than not, you are using copyinstr on memory that hasn't been paged in yet. Try saving the pointer on entry and doing the copying on return or later. Italic text