1. JSON vs SQLite storage The SQLite schema used in FHR is overly complex, appears over-engineered, and contains some redundancy. We could simplify FHR storage by storing data in JSON files instead, which would cut down on the amount of IO and the number of fsyncs. We should flush the current session's data to disk on shutdown (without an fsync, so that shutdown is not delayed).
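A minimal sketch of the JSON approach, in Python for brevity rather than the chrome JS that FHR is written in. The function name `save_json`, the `durable` flag, and the file layout are all hypothetical. The sketch writes to a temporary file and renames it over the target, and lets the caller skip the fsync on the shutdown path.

```python
import json
import os
import tempfile


def save_json(path, data, durable=True):
    """Write `data` as JSON to `path` via a temp file and a rename.

    `durable=False` skips the fsync; this is the assumed shutdown path,
    where a lost final write is preferred over a delayed shutdown.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(data, f)
            f.flush()
            if durable:
                os.fsync(f.fileno())
        # Readers see either the old file or the new one, never a partial write.
        os.replace(tmp_path, path)
    except BaseException:
        # Do not leave a stray temp file behind if anything failed.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

Under these assumptions, periodic saves would call `save_json(path, data)` and the shutdown handler would call `save_json(path, data, durable=False)`.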
2. Flushing prefs FHR currently uses browser prefs to store data such as lastPingTime, lastSubmitID, last submission success, etc. For this information to survive a crash, FHR will need to explicitly flush the prefs store to disk. Currently, prefs can only be flushed on the main thread, which is likely to be a source of jank because typical pref files are sizeable.
3. User-activity tracking The FHR user-activity handler should be implemented natively. The overhead of one JS event running every 5 seconds (while the user is interacting with the browser) is not likely to cause any user-visible jank, but this functionality really belongs in platform code. Once this data provider is moved to native code, FHR will have 1) "pull type" data providers that are only polled at shutdown or upload time, and 2) "push type" providers that record specific user actions (e.g. Google searches). If we could move the "push type" providers out of FHR, FHR would only have to be in memory at shutdown or submission time.
4. FHR event polling Currently, FHR wakes up every minute to check if it needs to submit a report, expire any data, etc. It would be neater for FHR to simply schedule a wake-up for the specific time in the future at which the next action is due.
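The scheduling idea can be sketched as follows, in Python for brevity. The function `next_wakeup_delay` and the task names are hypothetical. Given the time at which each pending action becomes due, the sketch computes a single delay to hand to a one-shot timer, replacing the one-minute poll.

```python
def next_wakeup_delay(now, due_times):
    """Return the seconds until the earliest due action, or None if none.

    `due_times` maps a (hypothetical) action name to the absolute time, in
    seconds, at which it becomes due. Overdue actions yield a delay of 0.
    """
    if not due_times:
        return None
    earliest = min(due_times.values())
    return max(0.0, earliest - now)


# Example: a submission due in a day and a data expiry due in an hour.
now = 1000.0
due = {"submit": now + 86400.0, "expire": now + 3600.0}
delay = next_wakeup_delay(now, due)  # one timer, armed for the expiry
```

After the timer fires and the due action runs, the caller would update `due_times` and arm the timer again with the new delay.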
5. Polling for startup timestamps If one of the startup milestones (e.g. session-restore) hasn't completed by the time FHR is initialized on startup, FHR polls every 5 seconds for up to 5 minutes to try to obtain the missing timestamp. Couldn't this timestamp be obtained at the end of the session or by registering a listener for the last startup event of interest?
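The listener alternative might look like the sketch below, in Python for brevity. The `StartupMilestones` class is a hypothetical stand-in for whatever notification mechanism the platform provides. A consumer registers once and is called back when the milestone is recorded, or immediately if it has already happened, so no polling is needed.

```python
import time


class StartupMilestones:
    """Hypothetical registry of startup milestone timestamps."""

    def __init__(self):
        self._timestamps = {}
        self._listeners = {}

    def on(self, name, callback):
        """Call `callback(name, timestamp)` once the milestone is known."""
        if name in self._timestamps:
            # Milestone already reached: deliver it right away.
            callback(name, self._timestamps[name])
        else:
            self._listeners.setdefault(name, []).append(callback)

    def record(self, name, timestamp=None):
        """Record a milestone and notify the listeners waiting on it."""
        ts = time.time() if timestamp is None else timestamp
        self._timestamps[name] = ts
        for callback in self._listeners.pop(name, []):
            callback(name, ts)


# Example: FHR registers before session restore has completed.
seen = []
milestones = StartupMilestones()
milestones.on("session-restore", lambda name, ts: seen.append((name, ts)))
milestones.record("session-restore", timestamp=12.5)
```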
6. Chrome workers Any disk or network operations should be moved entirely to a chrome worker to avoid unintentionally slowing down the main thread. For example, serializing six months' worth of FHR data on the main thread before passing the string to Necko's thread-pool could cause jank (we've seen this problem with session restore). FHR can then use XHR from a worker to do uploads instead of re-implementing the functionality on the main thread and using XPCOM to open network channels.
7. Spinning of event loop at shutdown We shouldn't be spinning the event loop at shutdown. It seems event-loop spinning was added to allow an FHR upload to finish before Firefox shuts down, but we could achieve the same effect by cancelling any in-progress upload at shutdown and updating FHR's state atomically only at the end of a completed upload.
8. Telemetry probes FHR should track its first-run initialization time in Telemetry separately from subsequent initialization times. We should also change the existing Telemetry probes to accurately track the amount of time spent on FHR operations, instead of tracking the wall-clock time elapsed from issuing an asynchronous FHR operation to the operation's completion handler running. The current approach causes the probes to over-estimate the amount of time required by FHR operations.
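The difference between the two measurements can be illustrated with a small sketch, in Python for brevity. The `BusyTimer` class is hypothetical and is not part of Telemetry. It accumulates only the time spent inside the operation itself, whereas the issue-to-completion interval also includes time spent waiting in a queue.

```python
import time


class BusyTimer:
    """Accumulates only the time spent inside `with` blocks."""

    def __init__(self):
        self.busy = 0.0
        self._start = None

    def __enter__(self):
        self._start = time.perf_counter()
        return self

    def __exit__(self, exc_type, exc, tb):
        self.busy += time.perf_counter() - self._start
        self._start = None
        return False  # never swallow exceptions


timer = BusyTimer()
issued = time.perf_counter()   # the operation is "issued" here
time.sleep(0.05)               # stand-in for waiting in an event queue
with timer:
    sum(range(100_000))        # stand-in for the actual FHR work
wall_clock = time.perf_counter() - issued
```

Here `wall_clock` includes the simulated queueing delay while `timer.busy` does not, which is the over-estimate described above.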
9. Other comments The current uses of Task.jsm are hard to read; they should be rewritten with better naming conventions or task closures.