Firefox/Projects/Startup Time Improvements/joelr notes: Difference between revisions

From MozillaWiki

Revision as of 16:30, 26 August 2009

Intro

I'm trying to figure out where Firefox startup time goes, up to the return from BrowserStartup (a JavaScript function). I'm also coaxing DTrace into telling me where the time is going, without making any assumptions.

Current status

August 26, 2009

Suppose we sample at about 1 kHz on a dual-core CPU and run an app for 10 seconds. An app that hogs the CPU should give us 20k samples, or 10k samples if it's pegged to a single core.

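I coded this up in a cpu.d DTrace script (http://github.com/wagerlabs/firefox-startup/blob/0daf8a950b91e8d9c24187d54b04fb51f2764490/cpu.d) and sampled Firefox. Apparently, Firefox only gets ~1700 samples on the CPU, or about 1.7s of CPU time, so most of its first 10 seconds of life are spent doing something else, e.g. disk IO.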
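
The arithmetic behind this estimate can be sketched in a few lines of Python. This is only an illustration, not part of cpu.d: the function names are made up here, and the rate is rounded to exactly 1000 Hz as an assumption. The 10 second window, the two cores and the ~1700 samples come from the notes above.

```python
def expected_samples(rate_hz, seconds, cores):
    """Upper bound on samples for a process that keeps `cores` cores busy."""
    return rate_hz * seconds * cores


def on_cpu_seconds(samples, rate_hz):
    """Each sample stands for roughly 1/rate_hz seconds spent on a CPU."""
    return samples / rate_hz


RATE_HZ = 1000   # assumption: "about 1 kHz" rounded to exactly 1000 Hz
WINDOW_S = 10    # length of the sampling window in seconds

print(expected_samples(RATE_HZ, WINDOW_S, cores=2))  # CPU hog on both cores
print(expected_samples(RATE_HZ, WINDOW_S, cores=1))  # pegged to a single core

cpu_s = on_cpu_seconds(1700, RATE_HZ)                # observed for Firefox
print(cpu_s, "seconds on CPU out of", WINDOW_S)
```

Comparing the observed count against the single-core bound is what shows that Firefox is mostly waiting rather than computing during the window.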

We already know that Firefox is slow to start up but this type of sampling neatly points the finger in the right direction. Props to Brendan Gregg for teaching me to fish!

Previous statuses

August 25, 2009

Created a static probe that fires first thing in XRE_main and updated my DTrace scripts to use it.

pid$target::function:entry probes are very slow since DTrace may have to search thousands of functions. All that search time skews elapsed time reported by timestamp. USDT (static) probes are just a few NOP instructions in the code that get fixed up by DTrace as needed so they work much faster.

August 24, 2009

Startup time is measured from the entry to XRE_main to the return of the BrowserStartup JS function. It takes a good bit of time but nothing compared to the time elapsed from the start of Firefox to the call to XRE_main.

According to my static-init.d script, the static initialization time can be ignored. I'm recording the library name and then timing the following call to ImageLoader::runInitializers in dyld. The cumulative time is too small to matter, though.

0.000053548s for /System/Library/Frameworks/Carbon.framework/Frameworks/HIToolbox.framework/Versions/A/HIToolbox
0.000053732s for /Users/joelr/Work/mozilla/startup/./Minefield.app/Contents/MacOS/libsoftokn3.dylib
0.000069234s for /System/Library/PrivateFrameworks/Shortcut.framework/Versions/A/Shortcut
0.000070455s for /Users/joelr/Work/mozilla/startup/MinefieldRelease.app/Contents/MacOS/libnssckbi.dylib
0.000072754s for /Users/joelr/Work/mozilla/startup/MinefieldRelease.app/Contents/MacOS/components/libbrowsercomps.dylib
0.000073443s for /System/Library/Frameworks/ApplicationServices.framework/Versions/A/Frameworks/ATS.framework/Versions/A/Resources/ATSHI.dylib
0.000074363s for /System/Library/Frameworks/ApplicationServices.framework/Versions/A/Frameworks/CoreGraphics.framework/Versions/A/Resources/libCSync.A.dylib
0.000075845s for /System/Library/Frameworks/ApplicationServices.framework/Versions/A/Frameworks/CoreGraphics.framework/Versions/A/Resources/libCGATS.A.dylib
0.000076892s for /System/Library/Frameworks/ApplicationServices.framework/Versions/A/Frameworks/CoreGraphics.framework/Versions/A/Resources/libRIP.A.dylib
0.000089767s for /Users/joelr/Work/mozilla/startup/./Minefield.app/Contents/MacOS/libnssdbm3.dylib
0.000094390s for /Users/joelr/Work/mozilla/startup/./Minefield.app/Contents/MacOS/libfreebl3.dylib
0.000100462s for /System/Library/Frameworks/CoreServices.framework/Versions/A/Frameworks/CarbonCore.framework/Versions/A/CarbonCore
0.000115375s for /System/Library/Frameworks/Carbon.framework/Versions/A/Frameworks/HIToolbox.framework/Versions/A/HIToolbox
0.001161267s for /Users/joelr/Work/mozilla/startup/MinefieldRelease.app/Contents/MacOS/components/libbrowserdirprovider.dylib
---------------
= 0.002181527s

Initialization: 42.986413480s
Startup       : 7.015292701s
---------------
= 50.001706181s
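
For anyone who wants to re-check the totals, here is a small Python sketch that sums lines in the "<seconds>s for <path>" format shown above. It is not part of static-init.d; the function name is hypothetical, and it assumes every line containing "s for " starts with a plain decimal number. The two sample lines and the two totals are copied from the output above.

```python
from decimal import Decimal


def total_seconds(lines):
    """Sum the leading '<seconds>s for <path>' values, skipping other lines."""
    total = Decimal(0)
    for line in lines:
        head, sep, _path = line.partition("s for ")
        if sep:  # only lines that match the timing format
            total += Decimal(head.strip())
    return total


sample = [
    "0.000053732s for /Users/joelr/Work/mozilla/startup/./Minefield.app/Contents/MacOS/libsoftokn3.dylib",
    "0.000069234s for /System/Library/PrivateFrameworks/Shortcut.framework/Versions/A/Shortcut",
    "---------------",
]
print(total_seconds(sample))

# Share of static initialization in the overall time, using the totals above.
static_init = Decimal("0.002181527")
overall = Decimal("50.001706181")
print(static_init / overall * 100, "percent")
```

Decimal is used instead of float so that the nanosecond digits add up exactly. The share comes out well under 0.01 percent, which is why static initialization can be ignored.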

Digging deeper...

August 21, 2009

Blogged.

My DTrace scripts live here. Use them like this:

sudo ./cold.sh static-init.d

DTrace tips and tricks

timestamp vs vtimestamp

vtimestamp measures the CPU time of the current thread, excluding IO and DTrace overhead. timestamp is wall-clock time and can still be used for deltas, but the goal is to use as few pid$target probes as possible, since they inflate timestamp deltas whenever DTrace has to switch between kernel and userland. The io and syscall providers are fast and run in the kernel.

Invalid address

Have you seen this kind of error before?

dtrace: error on enabled probe ID 27 (ID 22130: pid34547:libSystem.B.dylib:dlopen:entry): invalid address (0x2ac204) in action #1 at DIF offset 28
dtrace: error on enabled probe ID 2 (ID 22782: pid34547:dyld:dlopen:entry): invalid address (0x2ac204) in action #1 at DIF offset 28

More likely than not, you are using copyinstr on memory that hasn't been paged in yet. Try saving the pointer on entry and doing the copying on return or later.