Firefox/Projects/Startup Time Improvements/joelr notes

From MozillaWiki
Jump to navigation Jump to search

Intro

I'm trying to figure out where Firefox startup time goes, up to the return from BrowserStartup (Javascript function). I'm also manipulating DTrace into telling me where time is going, without making any assumptions.

Current status

August 26, 2009

Suppose we sample at about 1Khz on a dual-core CPU and run an app for 10 seconds. An app that hogs the CPU should give us 20k samples, or 10k samples if it's pegged to a single core.

I coded this up in a cpu.d DTrace script and sampled Firefox. Apparently, Firefox only gets ~1700 samples on the CPU or 1.7s so its first 10 seconds of life are spent doing something else, e.g. disk IO.

We already know that Firefox is slow to start up but this type of sampling neatly points the finger in the right direction. Props to Brendan Gregg for teaching me how to fish!

What kind of IO does Firefox do upon startup, though? I find the output from iosnoop a bit confusing so I whipped up a version of my own that prints just disk access times, disk operation deltas, block size and file name accessed.

The script samples Firefox during the first 20 seconds of its life and the output from ./cold.sh io.d looks like this:

DELTA      DTIME      D    SIZE PATHNAME
1399       1407       R    4096 ??/dtrace/libdtrace_dyld.dylib
1141       1145       R    8192 ??/dtrace/libdtrace_dyld.dylib
1301       1305       R    4096 ??/MacOS/XUL
1701       1705       R    4096 ??/MacOS/XUL
11081      685        R    4096 ??/MacOS/libmozjs.dylib
1989       2253       R   36864 ??/MacOS/libmozjs.dylib
1768       1894       R   24576 ??/MacOS/libmozjs.dylib
27111      2135       R  200704 ??/MacOS/XUL
1411       1134       R    8192 ??/MacOS/libxpcom.dylib
1784       1968       R    4096 ??/MacOS/libplds4.dylib
2383       464        R    4096 ??/MacOS/libxpcom.dylib
2886       456        R    4096 ??/MacOS/libxpcom.dylib
1431       732        R    4096 ??/MacOS/libplds4.dylib
2640       706        R    4096 ??/MacOS/libnspr4.dylib
3092       484        R    4096 ??/MacOS/libnspr4.dylib
...

I also have a version that rolls disk operations by file name and read/write op. It also samples during 20 seconds and outputs something like this:

DELTA      DTIME         SIZE D PATHNAME
307        311           4096 R ??/en_US.ISO8859-1/LC_TIME
437        441           4096 R ??/pref/firefox-branding.js
439        443           4096 R ??/su3x40q6.2/extensions.cache
441        445           4096 R ??/en_US.ISO8859-1/LC_NUMERIC
640        644           4096 R ??/en_US.ISO8859-1/LC_MONETARY
643        648           4096 R ??/chrome/browser.manifest
644        650           4096 R ??/la_LN.US-ASCII/LC_COLLATE
645        650           4096 R ??/Firefox/profiles.ini
646        650           4096 R ??/chrome/en-US.manifest
...

All timings are in microseconds and specific to my Apple 256Gb SSD.

There's something else that I discovered during my snooping around with io.d. Compare the first 20 seconds of the life of Safari and the life of Firefox.

Basically, all the shared libraries used by Safari are pulled in from the dynamic library shared cache. Safari uses mostly system frameworks, e.g.

DYLD_PRINT_LIBRARIES_POST_LAUNCH=1 /Applications/Safari.app/Contents/MacOS/Safari 

dyld: loaded: /System/Library/CoreServices/RawCamera.bundle/Contents/MacOS/RawCamera
dyld: loaded: /System/Library/PrivateFrameworks/Shortcut.framework/Versions/A/Shortcut
dyld: loaded: /System/Library/PrivateFrameworks/HelpData.framework/Versions/A/HelpData
dyld: loaded: /System/Library/Frameworks/ApplicationServices.framework/Versions/A/Frameworks/CoreGraphics.framework/Versions/A/Resources/libCSync.A.dylib
dyld: loaded: /System/Library/Frameworks/ApplicationServices.framework/Versions/A/Frameworks/CoreGraphics.framework/Versions/A/Resources/libCGATS.A.dylib
dyld: loaded: /System/Library/Frameworks/ApplicationServices.framework/Versions/A/Frameworks/CoreGraphics.framework/Versions/A/Resources/libRIP.A.dylib
dyld: loaded: /System/Library/Frameworks/ApplicationServices.framework/Versions/A/Frameworks/CoreGraphics.framework/Versions/A/Resources/libCGXCoreImage.A.dylib

whereas Firefox uses a whole lot more custom shared libraries, e.g.

DYLD_PRINT_LIBRARIES_POST_LAUNCH=1 ./Minefield.app/Contents/MacOS/firefox-bin -no-remote -foreground -P 2
dyld: loaded: /Users/joelr/Work/mozilla/startup/Minefield.app/Contents/MacOS/components/libbrowserdirprovider.dylib
dyld: loaded: /System/Library/Frameworks/ApplicationServices.framework/Versions/A/Frameworks/ATS.framework/Versions/A/Resources/ATSHI.dylib
dyld: loaded: /Users/joelr/Work/mozilla/startup/Minefield.app/Contents/MacOS/components/libbrowsercomps.dylib
dyld: loaded: /System/Library/Frameworks/ApplicationServices.framework/Versions/A/Frameworks/CoreGraphics.framework/Versions/A/Resources/libCGATS.A.dylib
dyld: loaded: /System/Library/Frameworks/ApplicationServices.framework/Versions/A/Frameworks/CoreGraphics.framework/Versions/A/Resources/libRIP.A.dylib
dyld: loaded: /System/Library/Frameworks/ApplicationServices.framework/Versions/A/Frameworks/CoreGraphics.framework/Versions/A/Resources/libCSync.A.dylib
dyld: loaded: /System/Library/PrivateFrameworks/Shortcut.framework/Versions/A/Shortcut
dyld: loaded: /System/Library/PrivateFrameworks/HelpData.framework/Versions/A/HelpData
dyld: loaded: /Users/joelr/Work/mozilla/startup/./Minefield.app/Contents/MacOS/libnssdbm3.dylib
dyld: loaded: /Users/joelr/Work/mozilla/startup/./Minefield.app/Contents/MacOS/libfreebl3.dylib
dyld: loaded: /Users/joelr/Work/mozilla/startup/Minefield.app/Contents/MacOS/libnssckbi.dylib
dyld: loaded: /System/Library/CoreServices/RawCamera.bundle/Contents/MacOS/RawCamera
dyld: loaded: /Library/Internet Plug-Ins/QuickTime Plugin.plugin/Contents/MacOS/QuickTime Plugin
dyld: loaded: /System/Library/QuickTime/QuickTimeComponents.component/Contents/MacOS/QuickTimeComponents
dyld: loaded: /System/Library/Frameworks/AGL.framework/Versions/A/AGL
dyld: loaded: /System/Library/PrivateFrameworks/CoreAUC.framework/Versions/A/CoreAUC
dyld: unloaded: /Users/joelr/Work/mozilla/startup/Minefield.app/Contents/MacOS/libnssckbi.dylib

What needs to be done to make Firefox use the dynamic library shared cache as aggressively as Safari?

Previous statuses

August 25, 2009

Created a static probe that fires first thing in XRE_main and updated my DTrace scripts to use it.

pid$target::function:entry probes are very slow since DTrace may have to search thousands of functions. All that search time skews elapsed time reported by timestamp. USDT (static) probes are just a few NOP instructions in the code that get fixed up by DTrace as needed so they work much faster.

August 24, 2009

Startup time is measured from the entry to XRE_main to the return of the BrowserStartup JS function. It takes a good bit of time but nothing compared to the time elapsed from the start of Firefox to the call to XRE_main.

According to my static-init.d script, the static initialization time can be ignored. I'm recording the library name and then timing the following call to ImageLoader::runInitializers in dyld. The cumulative time is too small to be of essence, though.

0.000053548s for /System/Library/Frameworks/Carbon.framework/Frameworks/HIToolbox.framework/Versions/A/HIToolbox
0.000053732s for /Users/joelr/Work/mozilla/startup/./Minefield.app/Contents/MacOS/libsoftokn3.dylib
0.000069234s for /System/Library/PrivateFrameworks/Shortcut.framework/Versions/A/Shortcut
0.000070455s for /Users/joelr/Work/mozilla/startup/MinefieldRelease.app/Contents/MacOS/libnssckbi.dylib
0.000072754s for /Users/joelr/Work/mozilla/startup/MinefieldRelease.app/Contents/MacOS/components/libbrowsercomps.dylib
0.000073443s for /System/Library/Frameworks/ApplicationServices.framework/Versions/A/Frameworks/ATS.framework/Versions/A/Resources/ATSHI.dylib
0.000074363s for /System/Library/Frameworks/ApplicationServices.framework/Versions/A/Frameworks/CoreGraphics.framework/Versions/A/Resources/libCSync.A.dylib
0.000075845s for /System/Library/Frameworks/ApplicationServices.framework/Versions/A/Frameworks/CoreGraphics.framework/Versions/A/Resources/libCGATS.A.dylib
0.000076892s for /System/Library/Frameworks/ApplicationServices.framework/Versions/A/Frameworks/CoreGraphics.framework/Versions/A/Resources/libRIP.A.dylib
0.000089767s for /Users/joelr/Work/mozilla/startup/./Minefield.app/Contents/MacOS/libnssdbm3.dylib
0.000094390s for /Users/joelr/Work/mozilla/startup/./Minefield.app/Contents/MacOS/libfreebl3.dylib
0.000100462s for /System/Library/Frameworks/CoreServices.framework/Versions/A/Frameworks/CarbonCore.framework/Versions/A/CarbonCore
0.000115375s for /System/Library/Frameworks/Carbon.framework/Versions/A/Frameworks/HIToolbox.framework/Versions/A/HIToolbox
0.001161267s for /Users/joelr/Work/mozilla/startup/MinefieldRelease.app/Contents/MacOS/components/libbrowserdirprovider.dylib
---------------
= 0.002181527s

Initialization: 42.986413480s
Startup       : 7.015292701s
---------------
= 50.001706181s

Digging deeper...

August 21, 2009

Blogged.

My DTrace scripts live here. Use like this

sudo ./cold.sh static-init.d

DTrace tips and tricks

timestamp vs vtimestamp

vtimestamp measures CPU time of the current thread, excluding IO and DTrace overhead. timestamp can still be used for deltas but the goal is to use as few pid$target probes as possible as they affect timestamp when dtrace has to switch between kernel and userland. io and syscall providers are fast and run in the kernel.

Invalid address

Have you seen this kind of error before?

dtrace: error on enabled probe ID 27 (ID 22130: pid34547:libSystem.B.dylib:dlopen:entry): invalid address (0x2ac204) in action #1 at DIF offset 28
dtrace: error on enabled probe ID 2 (ID 22782: pid34547:dyld:dlopen:entry): invalid address (0x2ac204) in action #1 at DIF offset 28

More likely than not, you are using copyinstr on memory that hasn't been paged in yet. Try saving the pointer on entry and doing the copying on return or later.