Bugzilla Anthropology/2013-01-29

From MozillaWiki

Bugzilla Anthropology January 29th 2013

ElasticSearch

ElasticSearch (ES) is a fast and scalable document store. Each Bugzilla bug is extracted, its comments and title are removed, and it is inserted into ES as a series of JSON documents, each representing a point in the bug's history. This is done for all bugs, including security bugs.
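As a rough illustration of the "series of JSON documents" idea, the sketch below turns one bug's change log into one snapshot per point in its history, dropping the title and comments along the way. It is only a sketch under stated assumptions: the field names (short_desc, comments, created_ts, valid_from, valid_to) and the shape of the change log are hypothetical, since the actual ES schema and extraction code are not described in these notes.

```python
# Hypothetical names for the fields that are stripped before indexing.
PRIVATE_FIELDS = {"short_desc", "comments"}


def bug_snapshots(initial, changes):
    """Return one document per point in a bug's history.

    initial: dict of field values at creation; must contain "created_ts".
    changes: list of {"ts": ..., "field": ..., "new": ...}, sorted by "ts".
    """
    # Start from the creation-time state, minus the removed fields.
    state = {k: v for k, v in initial.items() if k not in PRIVATE_FIELDS}
    valid_from = state.pop("created_ts")
    docs = []
    for change in changes:
        if change["field"] in PRIVATE_FIELDS:
            continue  # title and comment edits never reach the document store
        if change["ts"] != valid_from:
            # Close the snapshot that was valid until this change.
            docs.append({**state, "valid_from": valid_from, "valid_to": change["ts"]})
            valid_from = change["ts"]
        state[change["field"]] = change["new"]
    # The last snapshot is still current, so it has no end time.
    docs.append({**state, "valid_from": valid_from, "valid_to": None})
    return docs
```

Each returned dict can be serialized with json.dumps and indexed as its own document; changes that share a timestamp are folded into a single snapshot.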

Publicizing ES Data

Currently we are going through a security review to identify the changes required to publicize the ES data.

The academic community has an interest in Mozilla's rich bug repository:

  • Olga Baysal – University of Waterloo - Specifically looking at review times and how the rapid release cycle has changed them: [1]
  • Sean O'Riordain – PhD research into the statistics of bugs in software at Trinity College, Dublin, Ireland
  • David Eaves – Has interest in what motivates/discourages volunteer contributions at Mozilla: [2]
  • Working Conference on Mining Software Repositories (MSR) – May focus on the Mozilla codebase in 2014 [3]

Increasing Mindshare

We assume the positive effect the BZ REST interface had on the number of tools can be amplified further, given the speed of ES.

Current Work

Parallel Efforts

Despite what exists, there is a distinct need for more tools to better manage the large number of issues BZ deals with:

  • B-Team – Working directly on BZ to improve its dashboards
  • David Bosewell – Interested from the community perspective: needs to measure the effectiveness of the community programs
  • Liz Henry – Looking into bug triage practices
  • Marco Mucci – Currently using Scrumbugs to track Metro, but needs better tools
  • UX Team – Has settled on BZ for tracking bugs, but needs tools to manage their work
  • Release Engineering – Has hired an intern to produce operational dashboards: to sort tracking bugs by component/priority and assignee.



ElasticSearch (Technical Summary)

Despite ES's technical limitations, the current JavaScript libraries give us both fast and expressive dashboarding capability: scanning 7 million documents in sub-second time.

  • ES Highlights
    • Fast - Automatically indexed, and in memory.
    • Scalable - Sharding assumes every document can stand alone
    • Extensible - MVEL scripting language allows arbitrary code to be run on the server-side
    • Limited Filtering - ES was designed for document search. BI queries require complicated filtering rules across multiple relations. ES' nested filters do not compare.
    • Limited Grouping - ES is designed only for simple grouping and document counting
  • Enhancements To Date
    • JavaScript library to convert SQL-like queries to ES/MVEL queries
    • JavaScript DB implementation to perform the joins and sophisticated calculations on the client
  • Further Issues
    • Poor Stability - Need more human resources to identify problems with, and tune, the existing ES cluster
    • Cluster Too Small - The hardware is 5 years old, and was never set up to run production queries
    • Not Centralized - Other projects are using their own ES instances; spreading the work over one large cluster will tame the relative usage peaks.
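
To give a feel for what converting SQL-like queries to ES queries involves, here is a minimal sketch in Python (the library mentioned above is JavaScript, and its actual interface is not shown in these notes). The input format, the function name to_es_query, and the example field names are all assumptions made for illustration. The output follows the bool/term/range/terms-aggregation request shape of later Elasticsearch releases, which is not necessarily what the cluster discussed here accepted.

```python
# Map SQL comparison operators onto ES range-query keys.
RANGE_OPS = {">": "gt", ">=": "gte", "<": "lt", "<=": "lte"}


def to_es_query(query):
    """Translate a small SQL-like query description into an ES request body.

    query: {"where": [(field, op, value), ...], "groupby": field (optional)}
    All "where" clauses are ANDed together.
    """
    filters = []
    for field, op, value in query.get("where", []):
        if op == "=":
            filters.append({"term": {field: value}})
        elif op in RANGE_OPS:
            filters.append({"range": {field: {RANGE_OPS[op]: value}}})
        else:
            raise ValueError(f"unsupported operator: {op}")

    body = {"query": {"bool": {"filter": filters}}}
    if "groupby" in query:
        # Only the per-group counts are wanted, so skip the matching documents.
        body["size"] = 0
        body["aggs"] = {"groups": {"terms": {"field": query["groupby"]}}}
    return body


# Example: count Firefox bugs per component, modified since some timestamp.
example = to_es_query({
    "where": [("product", "=", "Firefox"), ("modified_ts", ">=", 1356998400000)],
    "groupby": "component",
})
```

Anything beyond single-field grouping and counting (joins, multi-relation filters) is exactly what the notes say ES handles poorly, which is why that work is pushed to the client.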