L20n/Firefox/Performance
* See if we can understand if we pay too much for Promises (stas/axel)


==Previous Research==


At the end of [[L20n/Firefox#Sprint_5:_August_16_.E2.80.94_August_21|Sprint 5]] we [http://hg.mozilla.org/projects/larch/rev/4138eb4850bd merged mozilla-central into larch] and re-ran the Talos tests.  The [https://treeherder.mozilla.org/perf.html#/compare?originalProject=mozilla-central&originalRevision=63988dbb2532&newProject=try&newRevision=bdaa76536166&framework=1 results] show the following tests regressing in performance:
* ts_paint - ~8% hit


===Hypothesis: IO Is Slow===
Are we IO-bound?  We start fetching resources almost 60 ms after l20n.js starts; we could start earlier.  We also seem to be blocked on the IO front: when issued at that point, the request takes around 150 ms to come back, while under no load the same IO takes only 3-4 ms.  It may be because we're async or running within Promises.  Let's try making it sync.
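One way to pin down where the time goes is user-timing marks around the IO window.  A minimal sketch; the mark names mirror the timeline discussed here and are illustrative, not the exact names larch uses:

```javascript
// Hypothetical mark names modeled on the timeline above; the real larch
// instrumentation may label these differently.
performance.mark('l20n-start');
// ... l20n.js bootstraps, registers observers, etc. ...
performance.mark('io-start');
// ... fetch the localization resources ...
performance.mark('io-end');

// How long after startup did we begin fetching, and how long did the
// IO itself take?
performance.measure('startup-to-io', 'l20n-start', 'io-start');
performance.measure('io', 'io-start', 'io-end');
```

Comparing the <code>startup-to-io</code> and <code>io</code> measures across triggers is what distinguishes "we start too late" from "the IO itself is slow".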


In (B) we still need to guard the initial translation of the document with documentReady().  The reason for this is that we can't rely on the unguarded MutationObserver in XUL documents: {{bug|1294197}}.


===<s>Hypothesis: NodeList Iteration Is Very Slow</s>===
Are we DOM-bound?  Iterating over the <code>[data-l10n-id]</code> NodeList is slow: taking 25-30 ms on a 114-element-long NodeList (on 3rd+ launch).  It doesn't matter if the iteration is done via a regular <code>for</code> loop, <code>for…of</code> or <code>Array.from</code>.  See [https://github.com/l20n/l20n.js/blob/56045d87d496c7e7cc47f573b3086030025b7eb9/src/lib/observer/base.js#L221-L230 <code>getTranslatables</code>].
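For reference, the three iteration styles that were compared all walk the same elements; a sketch over a stand-in array-like (in the browser this would be the NodeList returned by <code>document.querySelectorAll('[data-l10n-id]')</code>):

```javascript
// Stand-in for a NodeList: an array-like with a borrowed array iterator.
// The l10nId values are made up for illustration.
const nodeList = {
  0: { l10nId: 'menu-file' },
  1: { l10nId: 'menu-edit' },
  2: { l10nId: 'menu-view' },
  length: 3,
  [Symbol.iterator]: Array.prototype[Symbol.iterator],
};

// Regular for loop.
const a = [];
for (let i = 0; i < nodeList.length; i++) {
  a.push(nodeList[i].l10nId);
}

// for…of over the iterator.
const b = [];
for (const node of nodeList) {
  b.push(node.l10nId);
}

// Array.from with a mapping function.
const c = Array.from(nodeList, node => node.l10nId);
// a, b and c are all ['menu-file', 'menu-edit', 'menu-view'].
```

Since all three produce the same traversal, the 25-30 ms cost had to come from the DOM side, not the iteration style — which is what the readyState fix below confirmed.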


It doesn't matter if we're using <code>querySelectorAll</code> or <code>getElementsByClassName</code> (with a special class for this purpose).  Hypothesis: we're the first code that requests these DOM nodes, and that's slow.  Subsequent iterations are very fast.


After fixing {{bug|1298974}}, the NodeList iteration isn't a bottleneck anymore.  It turned out XUL documents have different readyState values than HTML, and l20n's logic was being run too early, when the DOM wasn't ready yet.  With the fix, the first iteration takes less than 5 ms.  This is still much more than subsequent calls to querySelectorAll(); see ''Lazy-to-actual XULElements conversion'' for more discussion.


===<s>Hypothesis: Formatting Translations Is Slow</s>===
Are we CPU-bound?  [https://github.com/l20n/l20n.js/blob/56045d87d496c7e7cc47f573b3086030025b7eb9/src/lib/dom/base.js#L24-L47 <code>formatWithFallback</code>] is taking 20-25 ms.  It seems that much of that is being spent handling errors (concatenating them with previously reported ones).


* returning an empty errors array from each format() call: {{bug|1301198}}


===<s>Hypothesis: Async Is Slow</s>===
Do we yield to other code in the event loop queue when we call then()?  Unlikely.  In the scenario where we trigger document translation on DOMContentLoaded, the IO has finished much earlier and the l10n.interactive promise is resolved.  then() callbacks are executed in microtasks on the same turn of the event loop, without yielding to other code in the queue.
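The microtask ordering can be demonstrated in isolation; a small sketch of why then() on an already-resolved promise does not yield to queued tasks:

```javascript
// then() on an already-resolved promise runs as a microtask: after the
// currently executing script finishes, but before any task queued with
// setTimeout — i.e. without yielding to other code in the event loop queue.
const order = [];

setTimeout(() => order.push('task'), 0);               // macrotask
Promise.resolve().then(() => order.push('microtask')); // microtask
order.push('script');                                  // synchronous

// Once the event loop turns: order is ['script', 'microtask', 'task'].
```

So a resolved l10n.interactive promise adds microtask latency only, not a full trip through the event loop.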


* <code>Promise.resolve().then(() => performance.mark('…'))</code> instead of <code>translateRoot()</code>


===Hypothesis: Lazy-to-actual XULElements conversion===
Are we working against XUL layout optimizations?  XUL elements are first created as XULPrototypeElements, which carry just enough information to be useful for layout.  When they're accessed from JS code, they are converted to actual XULElements.  The hypothesis is that there are at least two things when creating a new browser.xul window that trigger this conversion before tpaint: our code and something else, let's call it X.  The first piece of code to make the call pays the lazy-to-actual tax.  If it's us (because we start very early, like when we trigger on MozBeforeLayout), we take a lot of time (see ''NodeList Iteration''), but still finish pretty early; then X runs faster, and tpaint happens in 270 ms (on stas's machine).  If it's not us (like when we trigger on DOMContentLoaded), then X takes more time; we run later but finish faster (we take 20 ms to translate everything), and... tpaint happens in 270 ms.  For tpaint it only matters whether the tax has been paid.  Below are the rough numbers in milliseconds:


   Trigger                l20n-start  io-start  io-end  trigger/translateRoot-start  formatting-start (async)  formatting-end  translateRoot-end  tpaint
   MozBeforeLayout                  8        10      35                          40                        65            135                140    270


===Hypothesis: DOM translation is slow===
This is similar to ''Formatting Translations'', but larger in scope.  Are we CPU-bound during document translation?  translateRoot takes 20 ms from start to finish on stas's machine, very consistently (in the DOMContentLoaded scenario).  This lines up with the following tpaint numbers:


                         DOMContentLoaded  translateRoot-end  tpaint
** this looks like it helps: try shows [https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=47cb962768ac&newProject=try&newRevision=2f1fd92838c1&framework=1&showOnlyImportant=0 -15ms on tpaint] vs. larch


===Hypothesis: Returning errors from MessageContext is slow===
I noticed on Cleopatra that <code>entitiesFromContext</code> takes ~6 ms but it only spends 1 ms in <code>format</code>.  <s>Are we paying for transforming the data structure that we get from the parser into a { value, attrs } object that the XUL bindings consume?</s>  It looks like this is MessageContext.format's fault after all.  Creating an empty errors array costs us 2-3 ms.  If I pass the array in as an argument to append errors to, the whole thing takes 10-11 ms (13-14 ms before).  I filed {{bug|1301198}}.
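The shape of that change can be sketched as follows; the function body and message data are hypothetical stand-ins, not the real MessageContext internals:

```javascript
// Hypothetical shape of the fix filed as bug 1301198: instead of every
// format() call allocating and returning its own errors array (which the
// caller then concatenates), the caller owns one array and format()
// appends to it.
function format(messages, id, errors) {
  const message = messages.get(id);
  if (message === undefined) {
    errors.push(new Error(`Unknown message: ${id}`));
    return id; // fall back to the identifier
  }
  return message;
}

const messages = new Map([['menu-file', 'File'], ['menu-edit', 'Edit']]);
const errors = [];
const values = ['menu-file', 'menu-missing'].map(
  id => format(messages, id, errors));
// values is ['File', 'menu-missing']; errors holds a single Error.
```

With one array shared across the whole translation pass, the per-message allocation and concat disappear, which is where the 2-3 ms went.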


===Hypothesis: I/O cache impacts tpaint===
Because tpaint launches 20 windows, 19 of them use cached I/O.  We disabled the cache to see how it impacts tpaint results.
[https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=65cba468defc&newProject=try&newRevision=1d23b2721e83&framework=1&showOnlyImportant=0 The results] indicate that the I/O cache has no impact on tpaint: disabling it doesn't change the results.


===Hypothesis: Caching parsed entries instead of strings will impact tpaint===
On larch we cache the strings we load (see ''I/O cache impacts tpaint''), but we still parse the string every time.  Our measurements indicate that parsing costs around 5 ms, but we also have to copy a large string from the L10nRegistry JSM to the bindings, then to the IntlMessageContext JSM where we parse it, and then carry the result back to the bindings.


[https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=65cba468defc&newProject=try&newRevision=f328389107a3&framework=1&showOnlyImportant=0 The results] indicate that moving parsing into L10nRegistry and caching the parsed entries shaves off ~9 ms on tpaint and 6 ms on tpaint(e10s).
This suggests we shave off more than just parsing time.  We also store more in memory (JSON objects instead of strings), and the change mostly impacts the tpaint test (opening the same window 20 times) rather than real-life user experience (launching the UI once).
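The caching scheme can be sketched as follows; <code>parseFTL</code> and <code>fetchSource</code> are simplified stand-ins, not the real larch L10nRegistry APIs:

```javascript
// Cache parsed entries keyed by resource URL, so only the first window
// pays the parse cost; later windows reuse the parsed entries and skip
// both the parse and the cross-JSM string copy.
const parsedCache = new Map();

function parseFTL(source) {
  // Stand-in parser: one `key = value` entry per line.
  const entries = new Map();
  for (const line of source.split('\n')) {
    const [key, value] = line.split(' = ');
    if (key && value) entries.set(key, value);
  }
  return entries;
}

let parseCount = 0;
function fetchParsedResource(url, fetchSource) {
  if (!parsedCache.has(url)) {
    parseCount++;
    parsedCache.set(url, parseFTL(fetchSource(url)));
  }
  return parsedCache.get(url);
}

const fetchSource = () => 'menu-file = File\nmenu-edit = Edit';
const first = fetchParsedResource('browser.ftl', fetchSource);
const second = fetchParsedResource('browser.ftl', fetchSource);
// Both calls return the same Map; the source was parsed only once.
```

The trade-off noted above is visible in the sketch: the cache holds structured objects rather than strings, so memory use grows while repeat-open time shrinks.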