Socorro/SocorroUI/Data Assumptions and Gotchas

From MozillaWiki

There are some non-obvious things that you need to know to understand Socorro.

Data processing lag

Crashes take between 10 seconds and 2 minutes to get into all databases after hitting the Collectors, depending on current system load. In some cases, the IT department or the Socorro team may halt processing for up to one hour.

Several pieces of summary data deliberately lag behind crash insertion. This is done to cope with delays in the system and with crashes that arrive out of order.

  • Topcrashers: updated hourly, covering the period two hours ago
  • Duplicate flagging: updated hourly, covering the period two hours ago
  • First Report of Signature: updated hourly, covering the period one hour ago
  • ADUs: updated daily, for the previous day
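To make these lags concrete, here is a minimal sketch that computes which window each summary would cover at a given moment. It assumes each hourly run summarizes one full hour starting `lag` hours before the current hour; the job names, `hourly_window`, and `adu_day` are hypothetical and not part of Socorro.

```python
from datetime import date, datetime, timedelta
from typing import Tuple

# Hours of lag for each hourly summary job, taken from the list above.
# The job names are hypothetical labels, not Socorro identifiers.
HOURLY_LAG_HOURS = {
    "topcrashers": 2,
    "duplicate_flagging": 2,
    "first_report_of_signature": 1,
}


def hourly_window(job: str, now: datetime) -> Tuple[datetime, datetime]:
    """Return the [start, end) hour that `job` would summarize when it runs
    at `now`, assuming each run covers one full hour starting `lag` hours
    before the current hour."""
    lag = HOURLY_LAG_HOURS[job]
    current_hour = now.replace(minute=0, second=0, microsecond=0)
    start = current_hour - timedelta(hours=lag)
    return start, start + timedelta(hours=1)


def adu_day(now: datetime) -> date:
    """ADUs are loaded once a day, for the previous calendar day."""
    return (now - timedelta(days=1)).date()
```

Under these assumptions, a run at 14:05 would summarize 12:00 to 13:00 for Topcrashers and 13:00 to 14:00 for First Report of Signature. So a crash that arrived a few minutes ago will not appear in either summary yet.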

Throttling

In general, crashes from release versions of Firefox are "throttled" so that only 10% of these crashes are processed. This throttling starts sometime after each version is released.
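The effect of throttling can be pictured as sampling. The sketch below assumes each release-version crash is kept independently with probability 0.10 once throttling is on. The real collector's throttling rules are not described on this page, so `RELEASE_SAMPLE_RATE` and `should_process` are hypothetical.

```python
import random

# Hypothetical sampling rate: roughly 10% of release crashes are processed
# once throttling is on.
RELEASE_SAMPLE_RATE = 0.10


def should_process(is_release: bool, throttling_on: bool,
                   rng: random.Random) -> bool:
    """Decide whether an incoming crash is processed (hypothetical helper).

    Non-release builds, and release builds before throttling starts, are
    always processed. Otherwise each crash is kept independently with
    probability RELEASE_SAMPLE_RATE.
    """
    if not (is_release and throttling_on):
        return True
    return rng.random() < RELEASE_SAMPLE_RATE
```

Feeding 100,000 throttled release crashes through `should_process` with a seeded `random.Random` keeps roughly 10,000 of them. This is why raw processed counts for release versions understate the true crash volume.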

Currently, the charts in the webapp are unable to cope with changes to throttling, so you may see what look like huge jumps or dips in crashes when throttling is turned on or off.
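When reading these charts, you can correct for a throttling change by dividing each processed count by the sampling rate in effect at the time. This is a minimal sketch assuming uniform sampling. The daily counts below are hypothetical, and `estimated_total` is not a webapp function.

```python
def estimated_total(processed: int, sample_rate: float) -> float:
    """Scale a processed-crash count back up to an estimate of all submitted
    crashes, assuming uniform sampling at `sample_rate` (1.0 = unthrottled)."""
    if not 0 < sample_rate <= 1:
        raise ValueError("sample_rate must be in (0, 1]")
    return processed / sample_rate


# Hypothetical daily counts: throttling switches on between day 1 and day 2.
raw = [(5000, 1.0), (520, 0.1)]
normalized = [estimated_total(n, rate) for n, rate in raw]
```

In the raw counts, crashes appear to drop by about 90% overnight. After normalization, the estimates are 5000 and about 5200, which shows the apparent dip came from throttling rather than from a change in stability.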

Average Daily Users (ADUs)

All ADU ratios are calculated per 100 ADUs, not per single ADU: "3.5 crashes/ADU" means 3.5 crashes per 100 users, not per user. The ratio also does not distinguish between one user who crashes many times and many users who each crash once.
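Put as arithmetic, the displayed ratio is 100 × crashes / ADUs. The helper name below is hypothetical; the formula follows the convention described above.

```python
def crashes_per_100_adu(crashes: int, adu: int) -> float:
    """Crash ratio under the convention above: crashes per 100 ADUs."""
    if adu <= 0:
        raise ValueError("adu must be positive")
    return 100 * crashes / adu
```

For example, 35,000 crashes against 1,000,000 ADUs gives 3.5, which is shown as "3.5 crashes/ADU". The ratio is the same whether 35,000 users crash once each or 7,000 users crash five times each.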

ADUs come from blocklist pings.

Duplicate Flags

We are currently attempting to flag crashes that appear to be "duplicates". A duplicate is one of several reports submitted from the same user machine for the same crash. We suspect that a bug in Breakpad or other code causes these multiple submissions.

Because of privacy requirements for user data, duplicates cannot be identified precisely. Instead, we take an analytic approach: crashes are flagged as duplicates when their timing and data match a duplicate profile. This algorithm is in its first iteration and will be improved incrementally.
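As an illustration of this kind of heuristic, the sketch below groups crashes with identical data and flags any crash submitted within a short window of the first crash in its group. It is not Socorro's actual algorithm. The compared fields, the 60-second `DUPLICATE_WINDOW`, `Crash`, and `flag_duplicates` are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, List, Tuple

# Hypothetical timing threshold; the real value is not documented here.
DUPLICATE_WINDOW = timedelta(seconds=60)


@dataclass(frozen=True)
class Crash:
    uuid: str
    submitted: datetime
    # Hypothetical fields compared for "identical data".
    signature: str
    build_id: str
    os_version: str
    uptime_seconds: int


def _fingerprint(c: Crash) -> Tuple[str, str, str, int]:
    return (c.signature, c.build_id, c.os_version, c.uptime_seconds)


def flag_duplicates(crashes: List[Crash]) -> Dict[str, str]:
    """Map each suspected duplicate's uuid to the uuid of the first crash in
    its group, provided it was submitted within DUPLICATE_WINDOW of it."""
    duplicates: Dict[str, str] = {}
    ordered = sorted(crashes, key=lambda c: (_fingerprint(c), c.submitted))
    original = None
    for crash in ordered:
        if (original is not None
                and _fingerprint(crash) == _fingerprint(original)
                and crash.submitted - original.submitted <= DUPLICATE_WINDOW):
            duplicates[crash.uuid] = original.uuid
        else:
            original = crash  # start a new group of candidate duplicates
    return duplicates
```

The trade-off is visible in the two parameters. A wider window or fewer compared fields catches more resubmissions but produces more false positives. A narrower window or more fields does the opposite.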

If you see cases where duplicate flagging is obviously wrong (false positives or missed duplicates), please report the UUIDs of the crashes involved through Bugzilla.

Crashes per ADU spike shortly after a release

Crashes are recorded and graphed in Pacific time, while ADUs are reported in UTC. ADU data is updated once per day, for the previous day.

On the first day of a release, we have only a partial day of ADUs but a full day of crashes, so the crashes-per-ADU ratio spikes.
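The mismatch between the two clocks can be made concrete by computing how much of a Pacific-time crash day falls inside the UTC day with the same date. This sketch assumes fixed offsets (UTC-7 for Pacific Daylight Time, UTC-8 for Pacific Standard Time) and ignores the DST changeover days. The function names are hypothetical.

```python
from datetime import date, datetime, time, timedelta, timezone
from typing import Tuple


def pacific_day_in_utc(day: date, utc_offset_hours: int) -> Tuple[datetime, datetime]:
    """Return the UTC [start, end) of a Pacific calendar day, assuming a
    fixed offset (-7 for PDT, -8 for PST)."""
    tz = timezone(timedelta(hours=utc_offset_hours))
    start = datetime.combine(day, time(0), tzinfo=tz)
    end = start + timedelta(days=1)
    return start.astimezone(timezone.utc), end.astimezone(timezone.utc)


def overlap_with_utc_day(day: date, utc_offset_hours: int) -> float:
    """Hours of the Pacific day `day` that fall within the UTC day `day`."""
    start, end = pacific_day_in_utc(day, utc_offset_hours)
    utc_start = datetime.combine(day, time(0), tzinfo=timezone.utc)
    utc_end = utc_start + timedelta(days=1)
    overlap = min(end, utc_end) - max(start, utc_start)
    return max(overlap, timedelta(0)).total_seconds() / 3600
```

A Pacific day shares only 17 hours (PDT) or 16 hours (PST) with the same-dated UTC day. Crash and ADU figures for "the same day" therefore never describe exactly the same span of time, and the gap matters most on days when volume changes sharply, such as release day.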