Loop/Telemetry
Revision as of 16:08, 12 June 2014
To monitor the performance and other aspects of the Loop service, we will be gathering information independent of the platform metrics and uploading it to the telemetry infrastructure. The list of collected metrics will be expanded in later phases of the project.
ICE Failures
Tracked in bug 998989
One of the first and most important pieces of information we need to gather is ICE failure data. When we initially launch the MLP, we will quite likely see a non-trivial rate of connection failures, and we will want to be able to rapidly diagnose (and, where possible, fix) their causes. To that end, we need access to the ICE statistics. If a call fails to set up (i.e., the ICE connection state transitions to "failed" or "disconnected"), we also need to upload the ICE logging information.
Details
- When a PeerConnection is instantiated, begin listening for the iceconnectionstatechange event.
  - This may require adding code to PeerConnection.js that allows retrieval of all of the PCs for a given WindowID, using the _globalPCList.
- If the state changes to "failed" or "disconnected", retrieve the ICE logging information and upload it to the telemetry server:
  - Instantiate a WebrtcGlobalInformation object (see http://dxr.mozilla.org/mozilla-central/source/dom/webidl/WebrtcGlobalInformation.webidl).
  - Call "getLogging" on the WGI object.
  - In the callback from getLogging (which takes an array of log lines as its argument):
    - Consolidate the log lines into a single string.
    - Create a report object as detailed below.
    - Create a new UUID-based report ID.
    - Compress the report using gzip.
    - Send an HTTP POST request containing the compressed report to LOOP_TELEMETRY_URL/REPORT_ID/loop/OPERATING_SYSTEM/VERSION/UPDATE_CHANNEL/BUILD_ID.
    - By default, LOOP_TELEMETRY_URL is https://loop.telemetry.mozilla.org/submit/telemetry; it is configurable via the loop.telemetryURL pref.
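The upload steps above can be sketched in Python. This is a minimal, non-authoritative sketch and not the Firefox implementation (which would live in the browser's own JavaScript): it assumes the log lines have already been delivered by the getLogging callback, and the helper names (`should_upload`, `consolidate_log`, `build_submit_url`, `build_upload_request`) are hypothetical. The `Content-Type` and `Content-Encoding` headers are also assumptions; the steps above only say the report is gzip-compressed. The sketch builds the POST request but does not send it.

```python
import gzip
import json
import urllib.request
import uuid
from typing import Sequence, Tuple

# Default value of LOOP_TELEMETRY_URL (overridable via the loop.telemetryURL pref).
DEFAULT_TELEMETRY_URL = "https://loop.telemetry.mozilla.org/submit/telemetry"

# ICE connection states that should trigger a log upload.
FAILURE_STATES = frozenset({"failed", "disconnected"})


def should_upload(ice_connection_state: str) -> bool:
    """True if an iceconnectionstatechange to this state warrants an upload."""
    return ice_connection_state in FAILURE_STATES


def consolidate_log(log_lines: Sequence[str]) -> str:
    """Join the array of log lines from the getLogging callback into one string."""
    return "\n".join(log_lines)


def build_submit_url(base_url: str, report_id: str, os_name: str,
                     version: str, channel: str, build_id: str) -> str:
    """Build LOOP_TELEMETRY_URL/REPORT_ID/loop/OPERATING_SYSTEM/VERSION/UPDATE_CHANNEL/BUILD_ID."""
    return "/".join([base_url.rstrip("/"), report_id, "loop",
                     os_name, version, channel, build_id])


def build_upload_request(report: dict, os_name: str, version: str,
                         channel: str, build_id: str,
                         base_url: str = DEFAULT_TELEMETRY_URL
                         ) -> Tuple[str, urllib.request.Request]:
    """Assign a UUID report ID, gzip the JSON report, and build the POST request."""
    report_id = str(uuid.uuid4())
    body = gzip.compress(json.dumps(report).encode("utf-8"))
    url = build_submit_url(base_url, report_id, os_name, version, channel, build_id)
    request = urllib.request.Request(
        url,
        data=body,
        method="POST",
        # Assumed headers: the page does not specify any.
        headers={"Content-Type": "application/json",
                 "Content-Encoding": "gzip"},
    )
    return report_id, request
```

For example, `build_upload_request(report, "Darwin", "32.0", "default", "20140421104955")` yields a request whose URL ends in `/loop/Darwin/32.0/default/20140421104955`. Passing the application version as VERSION is itself an assumption, since the URL template does not say whether VERSION means the application or the OS version. Actually sending the request would be a call to `urllib.request.urlopen(request)`.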
Example URL and payload
{
  "ver": 1,
  "info": {
    "appUpdateChannel": "default",
    "appBuildID": "20140421104955",
    "appName": "Firefox",
    "appVersion": "32.0",
    "reason": "loop",
    "OS": "Darwin",
    "version": "12.5.0"
  },
  "report": "ice failure",
  "connectionstate": "failed",
  "stats": <...JSONified version of RTCStatsReport...>,
  "localSdp": <...SDP body...>,
  "remoteSdp": <...SDP body...>,
  "log": <...~50-1000 kB of log file...>
}
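One way a client might assemble this payload is sketched below in Python, mirroring the field names in the example. It is illustrative only: the function and parameter names are hypothetical, `stats` is assumed to be an already-JSONified RTCStatsReport (represented here as a plain dict), and reading `version` as the OS version (Darwin 12.5.0 in the example) is an inference from the sample values rather than something the page states.

```python
from typing import Any, Dict


def make_ice_failure_report(connection_state: str, stats: Dict[str, Any],
                            local_sdp: str, remote_sdp: str, log: str, *,
                            app_update_channel: str, app_build_id: str,
                            app_version: str, os_name: str, os_version: str,
                            app_name: str = "Firefox") -> Dict[str, Any]:
    """Build a report dict with the same shape as the example payload."""
    return {
        "ver": 1,
        "info": {
            "appUpdateChannel": app_update_channel,
            "appBuildID": app_build_id,
            "appName": app_name,
            "appVersion": app_version,
            "reason": "loop",
            "OS": os_name,
            "version": os_version,  # assumed to be the OS version
        },
        "report": "ice failure",
        "connectionstate": connection_state,  # "failed" or "disconnected"
        "stats": stats,
        "localSdp": local_sdp,
        "remoteSdp": remote_sdp,
        "log": log,  # the consolidated log string
    }
```

Keeping the metadata in a nested "info" object, as the example does, lets the same fields that appear in the submission URL also travel with the report itself.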
Nature of Data
- Uncompressed, these logs are on the order of 50-1000 kB, but they contain significant redundancy: when run through gzip, they shrink to roughly 3-60 kB.
- We anticipate that these logs will be generated for something on the order of 5% to 20% of total call volume initially, shrinking as we troubleshoot and eliminate the issues they help us discover.
- When the feature lands in nightly, we are optimistically hoping for on the order of 10,000 users, each probably making no more than 1 or 2 calls a day. At 2 calls per user per day and the worst-case 20% failure rate above, that gives an initial worst-case load of approximately 4,000 logs per day (10,000 × 2 × 0.20). After we get some experience with the feature, we should be able to refine this estimate.
- The logs will necessarily contain users' IP addresses (and will be nearly useless without them), so access control will be important.
- Eventually, we will want to develop fingerprinting heuristics for grouping logs together that are likely to share common root causes, but this is not required for the initial deployment.
- For initial analysis, we could probably make do with something as simple as a report that says "on <date>, there were <x> failures, broken down as follows: failed: <failed count>, disconnected: <disconnected count>", and that then lets us list all the failures for a given date/reason pair, ultimately allowing us to download the log for analysis. As we gain experience with how things tend to break, we might want to refine this, but it's a good start.
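The volume estimates and the initial per-date report described above can be sketched in Python. The volume figures use only the numbers given here (10,000 users, up to 2 calls per user per day, a worst-case 20% failure rate, 3-60 kB per compressed log, with 1 MB taken as 1,000 kB). The summary functions are hypothetical and assume each stored report has been reduced to a (date, connection state, report ID) tuple; they are not a description of any existing telemetry dashboard.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple


def worst_case_logs_per_day(users: int, calls_per_user: int, failure_percent: int) -> int:
    """Upper-bound logs/day: users x calls per user x failure rate (integer math)."""
    return users * calls_per_user * failure_percent // 100


def compressed_mb_per_day(logs: int, min_kb: int, max_kb: int) -> Tuple[float, float]:
    """Range of compressed upload volume per day, in MB (1 MB = 1,000 kB)."""
    return logs * min_kb / 1000, logs * max_kb / 1000


def summarize(reports: Iterable[Tuple[str, str, str]]):
    """Count failures per date by state, and index report IDs by (date, state)."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    index: Dict[Tuple[str, str], List[str]] = defaultdict(list)
    for date, state, report_id in reports:
        counts[date][state] += 1
        index[(date, state)].append(report_id)
    return counts, index


def describe_day(date: str, counts: Counter) -> str:
    """Render the one-line daily summary; missing states count as 0."""
    return (f"On {date}, there were {sum(counts.values())} failures, "
            f"broken down as follows: failed: {counts['failed']}, "
            f"disconnected: {counts['disconnected']}")
```

With the figures above, `worst_case_logs_per_day(10_000, 2, 20)` returns 4000 and `compressed_mb_per_day(4000, 3, 60)` returns `(12.0, 240.0)`, i.e. roughly 12-240 MB of compressed logs per day at the worst-case rate. The `index` mapping covers the "list all the failures for a given date/reason pair" step; each report ID would then locate the log to download.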
User Satisfaction
Tracked in bug 1003108
Call Setup Latency
Tracked in bug 1003163