BMO/performance

THIS PAGE IS A WORKING DRAFT
The page may be difficult to navigate, and some information on its subject might be incomplete and/or evolving rapidly.
If you have any questions or ideas, please add them as a new topic on the discussion page.

bugzilla.mozilla.org and Performance

The Problem

Bugzilla has a long history of performance issues, especially once an installation grows to the sizes seen on the largest sites. Compared with modern web applications, loading or updating a bug can be sluggish.

Why?

Bugzilla's core architectural design originates in the early days of web development, when applications were simpler. By far the largest issue holding back Bugzilla's responsiveness is the near-complete lack of cross-request caching. Unlike most modern web applications, Bugzilla doesn't use an external caching system such as memcached or Redis to avoid regenerating the same content, and retrofitting one to Bugzilla without substantial rewriting is proving to be difficult.

Until fairly recently, Bugzilla also did little same-request caching. For example, creating an object from its ID always required a round trip to the database, even if another method had already created the same object while processing that request.
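To illustrate the idea of same-request caching, here is a minimal sketch in Python (Bugzilla itself is written in Perl). The names `RequestCache`, `get_or_load` and `load_user_from_db` are hypothetical and are not taken from Bugzilla's code; the point is only that a second lookup of the same object within one request should not reach the database.

```python
class RequestCache:
    """Hypothetical per-request cache; it is thrown away when the request ends."""

    def __init__(self):
        self._objects = {}

    def get_or_load(self, class_name, obj_id, loader):
        # Key on (class, id) so objects of different classes cannot collide.
        key = (class_name, obj_id)
        if key not in self._objects:
            self._objects[key] = loader(obj_id)
        return self._objects[key]


db_round_trips = 0


def load_user_from_db(user_id):
    """Stand-in for a real database query (an assumption for this sketch)."""
    global db_round_trips
    db_round_trips += 1
    return {"id": user_id, "name": f"user{user_id}"}


cache = RequestCache()
first = cache.get_or_load("User", 42, load_user_from_db)
second = cache.get_or_load("User", 42, load_user_from_db)  # no second query
```

Because the cache lives only as long as one request, it never has to deal with invalidation across requests, which is what makes this kind of caching much easier to retrofit than cross-request caching.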

The templating engine Bugzilla uses, Template-Toolkit, is excellent; however, some operations are slower than in other comparable engines. One feature which is lacking is the ability to easily flush the generated content to the client, which means the entire page must be generated before it can be sent to the client.
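The difference between buffering a whole page and flushing it as it is generated can be sketched as follows. This is a generic Python illustration, not a description of how Template-Toolkit works; `render_bug_page`, `send_buffered` and `send_streamed` are hypothetical names, and `write` stands in for whatever sends bytes to the client.

```python
def render_bug_page(comments):
    """Yield the page in pieces instead of returning one large string."""
    yield "<html><body><h1>Bug</h1>\n"
    for comment in comments:
        yield f"<p>{comment}</p>\n"
    yield "</body></html>\n"


def send_buffered(chunks, write):
    # The whole page is assembled first, so the client sees nothing
    # until the last chunk has been generated.
    write("".join(chunks))


def send_streamed(chunks, write):
    # Each chunk is handed to the client as soon as it exists.
    for chunk in chunks:
        write(chunk)


buffered_writes = []
streamed_writes = []
comments = ["first comment", "second comment"]

send_buffered(render_bug_page(comments), buffered_writes.append)
send_streamed(render_bug_page(comments), streamed_writes.append)
```

Both approaches send the same bytes in total; the streamed version simply lets the browser start receiving and rendering the top of the page while the rest is still being generated.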

Another factor is the schema design, specifically BMO's use of Bugzilla's built-in custom fields for status and tracking flags. Bugzilla's custom-fields implementation is geared around a small number of fields and is implemented by adding columns to the main bugs table. Since the rapid release trains started, we've been adding 6 new fields every 6 weeks. This has resulted in a very wide bugs table (over 160 columns), which has a performance impact across both Bugzilla's business logic layer and within the database itself.
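As a rough illustration of why one column per flag widens the bugs table, the sketch below stores flags as rows in a separate table instead. This is one possible alternative layout, shown with Python's `sqlite3` module for convenience; the table and flag names (`bug_flags`, `tracking_release_1`, and so on) are hypothetical, and this is not a description of the schema BMO uses or plans to use.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE bugs (bug_id INTEGER PRIMARY KEY, summary TEXT);
    -- One row per (bug, flag) rather than one column per flag.
    CREATE TABLE bug_flags (
        bug_id    INTEGER REFERENCES bugs(bug_id),
        flag_name TEXT,
        value     TEXT,
        PRIMARY KEY (bug_id, flag_name)
    );
    """
)
conn.execute("INSERT INTO bugs VALUES (1, 'Example bug')")
conn.executemany(
    "INSERT INTO bug_flags VALUES (?, ?, ?)",
    [(1, "tracking_release_1", "+"), (1, "status_release_1", "fixed")],
)

# Adding a flag for a new release is a data change, not a schema change.
conn.execute("INSERT INTO bug_flags VALUES (?, ?, ?)", (1, "tracking_release_2", "?"))

bug_columns = [row[1] for row in conn.execute("PRAGMA table_info(bugs)")]
flags = dict(conn.execute("SELECT flag_name, value FROM bug_flags WHERE bug_id = 1"))
```

With this layout the bugs table keeps the same number of columns no matter how many release flags exist, at the cost of a join or second query when flag values are needed.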

We've also seen a large increase in the number of requests Bugzilla is required to serve, some from real users but mostly from other tools built on Bugzilla's APIs.

What have we done already?

There has been significant analysis of the performance bottlenecks within Bugzilla (e.g. using NYTProf) by many participants in Bugzilla development (most notably Bugzilla, Mozilla, and Red Hat developers). Quite a few changes have already been committed upstream to improve performance by optimising code or increasing the use of the same-request cache.

To address the "every object request is loaded from the database" issue, bug 811280 implemented simple object caching (within the same request). This was backported from upstream's trunk and has been running on BMO for some time. We are working on extending this mechanism so more classes make use of it.

In March 2013, BMO was moved from a cluster in the PHX datacentre to SCL3, with improvements to the infrastructure it runs on. This includes, but is not limited to, more and faster webheads, moving email processing off the webheads, faster database servers, and a more recent version of MySQL.

What will we be doing?

Work is ongoing on the best way to add caching mechanisms to Bugzilla, and has included investigations into reverse proxies and the use of memcached in selected areas. This effort will continue.

Analysis of bug updates has shown that email generation takes a significant amount of time when a bug is updated, because of the number of emails generated. Primarily because of component watching (a BMO-specific customisation), Bugzilla notifies an average of 40 recipients per change, and it isn't unusual for a single change to trigger over 100 emails. Currently the content of these emails is generated synchronously, as part of the update; only the encryption and delivery happen in a background daemon. Bug 877078 moves the generation of bugmail to the daemon as well and has shown significant time savings during testing.
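The general shape of moving bugmail generation out of the request can be sketched as a job queue with a background worker. This is a simplified Python illustration under our own assumptions, not the implementation in bug 877078: `update_bug`, `generate_email`, `worker` and the in-memory `job_queue` are all hypothetical, and a real daemon would use a persistent queue rather than a thread.

```python
import queue
import threading

job_queue = queue.Queue()
outbox = []


def generate_email(bug_id, recipient):
    """Stand-in for the expensive per-recipient rendering step."""
    return f"To: {recipient}\nSubject: [Bug {bug_id}] changed\n"


def update_bug(bug_id, recipients):
    # The request only records what needs to be sent; no email is rendered here.
    job_queue.put((bug_id, list(recipients)))
    return "updated"


def worker():
    # Background worker: renders one email per recipient, outside the request.
    while True:
        job = job_queue.get()
        if job is None:  # sentinel used to stop the worker
            job_queue.task_done()
            break
        bug_id, recipients = job
        for recipient in recipients:
            outbox.append(generate_email(bug_id, recipient))
        job_queue.task_done()


thread = threading.Thread(target=worker, daemon=True)
thread.start()

recipients = [f"user{i}@example.com" for i in range(40)]
status = update_bug(12345, recipients)  # returns before any email is generated

job_queue.join()     # wait for the worker, only so the sketch can be inspected
job_queue.put(None)
thread.join()
```

The time the user waits is then independent of the number of recipients, since the per-recipient work happens after `update_bug` has returned.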

Work on refactoring how we store tracking and status flags is nearing completion (bug 750742). This should not only help overall performance but also give us the opportunity to change how we present the tracking/status flag information without needing to heavily customise the Bugzilla core.

The BzAPI (REST) API proxy, which is currently a service completely external to Bugzilla, is nearing the end of its long journey to being included as an alternative native Bugzilla webservice endpoint, alongside XML-RPC, JSON-RPC and JSON-P. Follow bug 866927 to see the progress there. This is expected to greatly improve the responsiveness of the REST API, and it enables us to perform smart caching of requests.

While BzAPI proxy consumers should themselves consider the impact their calls have on BMO and implement caching mechanisms, it's understandable why this doesn't always happen (I've seen many a quick-and-dirty solution quickly turn into production-level usage without refactoring to lessen its impact). Once the REST services are integrated with Bugzilla, we plan on caching identical requests within a set time frame to guard against accidental BzAPI-driven DDoSing of BMO.
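Caching identical requests within a set time frame could look roughly like the following. This is a minimal Python sketch under our own assumptions: `TTLCache`, `fetch` and the 30-second window are hypothetical and do not describe the planned Bugzilla implementation. The clock is injectable so the expiry behaviour can be shown without sleeping.

```python
import time


class TTLCache:
    """Hypothetical cache that reuses a response for identical requests
    arriving within ttl_seconds of the one that computed it."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._entries = {}  # request key -> (expiry time, response)

    def fetch(self, request_key, compute):
        now = self.clock()
        entry = self._entries.get(request_key)
        if entry is not None and entry[0] > now:
            return entry[1]  # still fresh: skip the expensive work
        response = compute()
        self._entries[request_key] = (now + self.ttl, response)
        return response


fake_now = [0.0]
backend_calls = []


def expensive_search():
    """Stand-in for a costly query against the bugs database."""
    backend_calls.append(1)
    return {"bugs": [1, 2, 3]}


cache = TTLCache(ttl_seconds=30, clock=lambda: fake_now[0])
key = ("GET", "/bug", "product=Firefox")

cache.fetch(key, expensive_search)  # computed
cache.fetch(key, expensive_search)  # identical request, served from the cache
fake_now[0] = 31.0
cache.fetch(key, expensive_search)  # window has passed, computed again
```

A tool that polls the same query in a tight loop would then cost one backend query per time window rather than one per call, at the price of results that can be up to one window out of date.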

One change in the pipeline which should net client-side benefits is the upgrade of our core JavaScript library, YUI, from version 2 to version 3. YUI3's footprint in the browser is significantly smaller than YUI2's, with a strong focus on lazy-loading of libraries. Yahoo are performing the work for us here, in bug 453268.

What isn't the problem?

  • Static assets. We set cache headers correctly; if you reload a bug, only the generated HTML is sent to the client. You can verify this with the network view in Firefox's Web Developer Tools or in Firebug.
  • HTML layout. While old-school, the time to render the page in the browser is a very small fraction of the overall load time, much less than server-side generation and network transmission times. Again, this is verifiable with Firefox's Web Developer Tools.
  • Hardware infrastructure. The webheads have ample processing power and memory.
  • Perl. Perl isn't a slow language by nature (it is compiled to opcodes before execution, akin to Java and Python), and there are modern high-performance sites built using Perl, such as DuckDuckGo.