charts.mozilla.org
Overview
The charts.mozilla.org application is a pure JavaScript client running in the browser. It accesses the BMO/ES cluster for data.
Objective
The objective of the Charts project is specifically to provide management dashboards, but also to demonstrate the benefits of our BMO/ES backend.
Hacking/Contributors
Contributing to charts.mozilla.org has some distinct benefits over other projects:
- Your work is highly visible to people working at Mozilla
- The pages are simple static HTML and JavaScript, with a small learning curve
- Because of this static nature, setting up a development environment is very simple
Contact
Feel free to contact me, Kyle Lahnakoski, if you have questions.
- IRC: ekyle@irc.mozilla.org
- email: klahnakoski@mozilla.com
Architecture
Web Server
The application itself is served as a set of static HTML and JavaScript files from the Mozilla PaaS Stackato servers. There are two versions, Production and Staging; the latter will usually have more features, while being slightly more buggy. Please use the staging server as much as possible: like Firefox Nightly, using staging will help find bugs sooner.
Code for each version is in a separate branch:
| Server | URL | Source Code |
|---|---|---|
| production | http://charts.mozilla.org/ | https://github.com/mozilla/charts |
| staging | http://charts.paas.allizom.org/ | https://github.com/mozilla/charts/tree/allizom |
Since the app is entirely client-side, it does not matter where it is served from (I have various development versions on my people page). If both of these servers are down, a simple git clone will let you "serve" the app directly from your local filesystem.
esFrontline
esFrontline is a simple Python-based proxy that limits ES requests to search requests and limits the indexes exposed. Apart from these restrictions, the proxy is invisible to the client application. See https://wiki.mozilla.org/BMO/ElasticSearch for more details.
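As a sketch of what passes through the proxy, a dashboard query might look like the following. The index path, endpoint URL, and field names here are illustrative assumptions (the real ones are documented on the BMO/ElasticSearch wiki page); only search requests like this are allowed through esFrontline.

```javascript
// Build an ElasticSearch "filtered" query counting bugs in a given status.
// Field and index names are assumptions for illustration only.
function buildStatusQuery(status) {
  return {
    query: {
      filtered: {
        query: {match_all: {}},
        filter: {term: {bug_status: status}}
      }
    },
    size: 0  // only the hit count is needed, not the documents
  };
}

// Hypothetical endpoint; esFrontline exposes only the _search API
// on a restricted set of indexes.
const ES_URL = "https://esfrontline.bugzilla.mozilla.org/public_bugs/_search";

function countBugsInStatus(status) {
  return fetch(ES_URL, {
    method: "POST",
    headers: {"Content-Type": "application/json"},
    body: JSON.stringify(buildStatusQuery(status))
  }).then((response) => response.json())
    .then((result) => result.hits.total);
}
```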
ElasticSearch Clusters
Once the application is downloaded, it attempts to contact both the private and public clusters simultaneously; whichever responds will be chosen for all future connections, with preference given to the private cluster. Dashboard queries are then sent to the chosen cluster as the app requires.
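The selection logic described above can be sketched as follows. The probe arguments are placeholders for real requests to the two cluster URLs; this is not the app's actual code, just a minimal illustration of "probe both, prefer private":

```javascript
// Probe both clusters at once and pick one for all future connections,
// preferring the private cluster whenever it responds at all.
async function pickCluster(probePrivate, probePublic) {
  const [privateResult, publicResult] = await Promise.allSettled([
    probePrivate(),
    probePublic()
  ]);
  if (privateResult.status === "fulfilled") return privateResult.value;
  if (publicResult.status === "fulfilled") return publicResult.value;
  throw new Error("no ElasticSearch cluster responded");
}
```

A real implementation would also want a timeout on each probe, so one hung cluster does not block startup.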
The clusters are configured to accept requests from any client. Hopefully this will promote development of alternative dashboards and charts.
Development
The Development server is responsible for secondary indexes built from the main bug_version index. Currently it maintains the hierarchy index used to determine recursive dependencies between bugs.
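The expansion that the hierarchy index precomputes can be sketched as a simple graph traversal. The data shape here (a map from bug id to the ids it directly depends on) is an assumption for illustration, not the index's actual format:

```javascript
// Collect every bug transitively reachable from a starting bug, given a
// map of direct dependencies. Iterative depth-first traversal with a
// "seen" set so dependency cycles cannot loop forever.
function recursiveDependencies(dependsOn, bugId) {
  const seen = new Set();
  const stack = [bugId];
  while (stack.length > 0) {
    const current = stack.pop();
    for (const dep of dependsOn[current] || []) {
      if (!seen.has(dep)) {
        seen.add(dep);
        stack.push(dep);
      }
    }
  }
  return seen;
}
```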
Slowness
There are some sources of slowness:
- esFrontline - being simple, and written in Python, it may add about a quarter second of latency to every ES search
- Virtual Machines - The nodes of the ES clusters are hosted on VMs, and may be contributing to some slowness.
- Code - Nothing is minified, and some pages even pause to load JavaScript dynamically.
Production Support
Is there a problem with production? Read on...
Points of Contact for Technical Issues
- For problems with the application itself, e.g. 404 errors, see cturra or others in #paas on IRC.
- To restart PaaS, use the Stackato web console
- To upload a new image:
  - git clone https://github.com/mozilla/charts.git
  - Use the stackato.yml image to deploy:
    - stackato target api.paas.allizom.org
    - stackato login
    - stackato update
- For problems with the database, e.g. the basic framework loads but no data appears, see jakem, cyliang, or fubar in #it on IRC.
Past Problems
CORS
The charts application makes cross-origin requests: the app is served from charts.mozilla.org while data is requested from esfrontline.bugzilla.mozilla.org. This requires that the various proxy servers (not shown in the architecture) ensure the Access-Control-Allow-Origin HTTP response header is set appropriately. In the past, this header has been stripped from the esFrontline response and re-set according to Operations' guidelines.
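For reference, a healthy esFrontline response should reach the browser carrying a header along these lines (the exact origin value is set according to Operations' guidelines; this is only an illustrative sketch):

```
Access-Control-Allow-Origin: http://charts.mozilla.org
```

If that header is missing or has been stripped by an intermediate proxy, the browser will silently refuse to hand the response to the app.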
Clusters Down or Not Responding
ElasticSearch is still prone to OutOfMemoryExceptions. Occasionally, this will bring down nodes in the cluster. The best solution has always been to reboot the problem nodes (or all of them).
The chance of data loss is very low. First, the data is replicated 2 times (for a total of three copies). Second, the ETL daemon (responsible for filling the cluster) performs some simple data checks and writes the full history of every changed bug, effectively overwriting any corruption that does occur. Corruption on an inactive bug can linger, but only if the corruption exists, a change happened while the cluster was misbehaving, and the ETL did not detect the misbehavior. Third, there are consistency checks built into MoDevMetrics that monitor consistency between bug versions; corruption is sometimes detected there, and is inevitably fixed by the next ETL run.