User:Philikon/SyncInstrumentation

From MozillaWiki
Jump to: navigation, search

Account

Using an account with roughly the following collection counts:

  • clients: 50
  • forms: 500
  • history: 16000 (mobile only fetches 5000)
  • bookmarks: 175
  • passwords: 60
  • tabs: 20

Instrumentation

  • To preserve its synchronous APIs, Sync spins the main thread event loop while waiting for an async callback. Icky but actually rather helpful for instrumentation.
  • Logging calls whenever Sync yields to the main thread and when it returns into Sync (including stack trace so we know which engine it's coming from)
  • Log formatter that computes deltas between log messages belonging to the same logger
  • Log appender that logs to memory and flushes to a disk file at the end of a sync (avoids disk I/O during logging, dumps to downloads dir to make accessible via USB storage)

Analysis

  • Grab the delta that's produced for the log message right before we spin the main event loop. That should be the duration we blocked the main thread in that instant
  • Bin results into 50ms sized bins per engine. Ignore anything that's in the 0-50ms bin (it's huge and uninteresting because it's fast).

Results

Below are results for two Sync runs using the 2011-02-10 nightly for both Minefield and Fennec (but with an instrument Sync add-on).

Desktop

alt text
alt text

Mobile (Android 2.3 on Nexus S)

alt text
alt text

Mobile (Android 2.2 on Galaxy S)

alt text
alt text

Mobile (Android 2.2 on Droid 2)

alt text
alt text
  • On desktop we stay under 200ms for the most part (we're not even showing the massive number of times when we're under 50ms), on mobile the upper limit for most events is ~400ms.
  • History (mobile): There are ~250 blocking events >50ms. Comparing that to the ~320 batches we do for the 16k records (50 per batch), it seems that most batches need more than 50ms to apply. The only thing we do synchronously still is pretty much arranging the incoming data into a form that the async history API can handle. We should verify that with more specific instrumentation, but I'm guessing we chew up some processing power there.
  • Passwords seem to be the outlier still. It may be that bug 630720 may not really have had the desired effect (yet). We should investigate tuning the batch size here and perhaps yielding back to the main thread even during a batch.
  • There are some huge spikes (like the one showing for bookmarks in the first desktop graph, others not showing) that are hard to believe. In fact, they're not observed in terms of blocking UI. It may very well be that the profiling data has systematic errors.


Conclusions

  • Batch sizes for passwords and forms can probably be tuned better, bug 636312
  • Much of the churn is due to GC. Lots of individual issues, tracking bug is bug 636304


Preliminary results

  • Big offenders knocked down already, more to come.

Galaxy S

alt text

Droid 2

alt text