Bugzilla Anthropology/2013-01-29
ElasticSearch Summary
ElasticSearch (ES) is a fast and scalable document store. Each Bugzilla bug is extracted, its comments and titles are removed, and it is inserted into ES as a series of JSON documents, each representing a point in the bug's history. This is done for all bugs, including security bugs.
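As a rough illustration of "a series of JSON documents, each representing a point in the bug's history", the sketch below replays a list of changes and emits one snapshot per interval. The field names (`valid_from`, `valid_to`, `status`, `assigned_to`) and the input shapes are assumptions made for this example, not the actual ES schema or extraction code.

```python
import json

def bug_snapshots(initial, changes):
    """Yield one document per point in a bug's history.

    `initial` is the bug's state at creation; `changes` is a list of
    (timestamp, {field: new_value}) tuples in chronological order.
    Both shapes are hypothetical, chosen for this sketch.
    """
    state = dict(initial)
    valid_from = state.pop("created")
    for changed_at, delta in changes:
        # Close the interval during which the previous state was current.
        yield {**state, "valid_from": valid_from, "valid_to": changed_at}
        state.update(delta)
        valid_from = changed_at
    # The latest snapshot is still current, so it has no end time.
    yield {**state, "valid_from": valid_from, "valid_to": None}

# Comments and titles are deliberately absent from the record.
bug = {"bug_id": 1, "created": "2013-01-02", "status": "NEW", "assigned_to": None}
history = [
    ("2013-01-05", {"status": "ASSIGNED", "assigned_to": "dev@example.com"}),
    ("2013-01-20", {"status": "RESOLVED"}),
]
docs = list(bug_snapshots(bug, history))
for doc in docs:
    print(json.dumps(doc, sort_keys=True))
```

Storing every interval as a stand-alone document means a query for "the state of all bugs on a given date" needs only a range filter on the two timestamps, with no join against a separate history table.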
Publicizing ES Data
Currently we are going through a security review to identify the changes required to publicize the ES data.
- The academic community has an interest in Mozilla's rich bug repository
- Olga Baysal – University of Waterloo - Specifically looking at review times and how the rapid release cycle has changed them: [1]
- Sean O'Riordain – PhD into the statistics of bugs in software at Trinity College, Dublin, Ireland
- David Eaves – Has interest in what motivates/discourages volunteer contributions at Mozilla: [2]
- Working Conference on Mining Software Repositories (MSR) – May focus on the Mozilla codebase in 2014 [3]
- Increasing Mindshare - We assume the positive effect the BZ REST interface had on the number of dashboard tools can be amplified further given the speed of ES:
- David Bosewell maintains a [page of dashboards]
- [B2G Dashboard]
- [Firefox OS]
Current Work
Review Queues
Initial work focused on summarizing the review queues. The majority of the work was overcoming the technical limitations of ES.
Open Bug Counts
Counting open bugs by program/product/component and team. Again, significant work was required to keep it fast.
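A minimal sketch of the counting step, assuming each bug is stored as history snapshots with `valid_from`/`valid_to` timestamps (hypothetical field names). It runs in plain Python over a small in-memory list, so it shows the logic only and says nothing about how the ES queries were kept fast.

```python
from collections import Counter

# Bugzilla statuses treated as "open" in this sketch.
OPEN_STATUSES = {"UNCONFIRMED", "NEW", "ASSIGNED", "REOPENED"}

def open_bug_counts(snapshots, as_of):
    """Count bugs open on `as_of`, grouped by (product, component).

    Dates are ISO-8601 strings, which compare correctly as text.
    """
    counts = Counter()
    for s in snapshots:
        is_current = s["valid_from"] <= as_of and (
            s["valid_to"] is None or as_of < s["valid_to"]
        )
        if is_current and s["status"] in OPEN_STATUSES:
            counts[(s["product"], s["component"])] += 1
    return counts

snapshots = [
    {"bug_id": 1, "product": "Core", "component": "DOM", "status": "NEW",
     "valid_from": "2013-01-01", "valid_to": "2013-01-10"},
    {"bug_id": 1, "product": "Core", "component": "DOM", "status": "RESOLVED",
     "valid_from": "2013-01-10", "valid_to": None},
    {"bug_id": 2, "product": "Core", "component": "DOM", "status": "ASSIGNED",
     "valid_from": "2013-01-03", "valid_to": None},
    {"bug_id": 3, "product": "Firefox", "component": "General", "status": "NEW",
     "valid_from": "2013-01-04", "valid_to": None},
]

early = open_bug_counts(snapshots, "2013-01-05")
late = open_bug_counts(snapshots, "2013-01-15")
print(dict(early))
print(dict(late))
```

Because exactly one snapshot per bug is current on any given date, counting snapshots is the same as counting bugs.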
Percentile Ages
Long-term programs can benefit from looking at the percentiles of bug ages over time. We can see the positive effects of focusing on security, and the negative effects of de-emphasizing Snappy.
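To make "percentiles of bug ages over time" concrete, the sketch below computes the median and 90th-percentile age of the bugs open on a given date. The nearest-rank percentile method and the `(opened, closed)` tuple layout are assumptions for illustration; the dashboards may compute this differently.

```python
import math
from datetime import date

def percentile(sorted_values, p):
    """Nearest-rank percentile (p in 0..100) of a non-empty sorted list."""
    rank = max(1, math.ceil(p / 100 * len(sorted_values)))
    return sorted_values[rank - 1]

def age_percentiles(bugs, as_of, ps=(50, 90)):
    """Age in days, at each requested percentile, of bugs open on `as_of`.

    `bugs` is a list of (opened, closed) dates; closed is None if still open.
    """
    ages = sorted(
        (as_of - opened).days
        for opened, closed in bugs
        if opened <= as_of and (closed is None or as_of < closed)
    )
    return {p: percentile(ages, p) for p in ps} if ages else {}

bugs = [
    (date(2012, 1, 1), None),
    (date(2012, 6, 1), date(2012, 12, 1)),
    (date(2012, 10, 1), None),
    (date(2012, 12, 15), None),
]

# Evaluating at several dates gives one point per date on the trend line.
for as_of in (date(2012, 7, 1), date(2013, 1, 1)):
    print(as_of, age_percentiles(bugs, as_of))
```

A rising 90th percentile with a flat median suggests a growing tail of neglected bugs, which is the kind of signal a program-level chart is meant to surface.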
Operational Dashboards
Simpler, current dashboards have been made to focus on particular issues:
- []
Parallel Efforts
Despite what exists, there is a distinct need for more tools to better manage the large number of issues BZ deals with:
- B-Team – Working directly on BZ to improve its dashboards
- David Bosewell – interest from the community perspective: needs to measure the effectiveness of the community programs
- Liz Henry – Looking into bug triage practices
- Marco Mucci – Currently using Scrumbugs to track Metro, but needs better tools
- UX Team – Has settled on BZ for tracking bugs, but needs tools to manage its work
- Release Engineering – Has hired an intern to produce operational dashboards: to sort tracking bugs by component/priority and assignee.
ElasticSearch (Technical Summary)
Despite ES's technical limitations, the current JavaScript libraries give us both fast and expressive dashboarding capability: scanning 7 million documents in sub-second time.
- ES Highlights
- Fast - Automatically indexed, and in memory.
- Scalable - Sharding assumes every document can stand alone
- Extensible - MVEL scripting language allows arbitrary code to be run on the server-side
- Limited Filtering - ES was designed for document search. BI queries require complicated filtering rules across multiple relations. ES' nested filters do not compare.
- Limited Grouping - ES is designed only for simple grouping and document counting
- Enhancements To Date
- JavaScript library to convert SQL-like queries to ES/MVEL queries
- JavaScript DB implementation to perform the joins and sophisticated calculations on the client
- Further Issues
- Poor Stability - Need more human resources to identify and tune the existing ES cluster
- Cluster Too Small - The hardware is 5 years old, and was never set up to run production queries
- Not centralized - Other projects are running their own ES instances; spreading the work over one large cluster would smooth out the relative usage peaks.
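The idea of converting SQL-like queries to ES queries, listed under the enhancements above, can be sketched as a small translation function. This is not the JavaScript library itself: the input keys (`where`, `groupby`, `limit`) are hypothetical, only equality filters and single-field grouping are handled, and the request body uses the `bool`/`aggs` syntax of later ES releases, whereas the cluster described here may have relied on the older facet API and MVEL scripts.

```python
import json

def to_es_query(query):
    """Translate a tiny SQL-like description into an ES request body.

    Roughly: SELECT COUNT(*) FROM bugs WHERE <field = value ...> GROUP BY <field>.
    The input format is an assumption made for this sketch.
    """
    filters = [
        {"term": {field: value}}
        for field, value in sorted(query.get("where", {}).items())
    ]
    body = {
        "size": 0,  # only the counts are wanted, not the matching documents
        "query": {"bool": {"filter": filters}} if filters else {"match_all": {}},
    }
    if "groupby" in query:
        field = query["groupby"]
        body["aggs"] = {
            field: {"terms": {"field": field, "size": query.get("limit", 10)}}
        }
    return body

q = to_es_query({
    "where": {"bug_status": "NEW", "product": "Core"},
    "groupby": "component",
    "limit": 50,
})
print(json.dumps(q, indent=2))
```

Anything the server cannot express this way, such as joins across several result sets, would still have to be done on the client once the grouped counts come back.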