Changes

Jump to: navigation, search

Outreachy

1,604 bytes added, 00:03, 1 March 2017
Upgrade ActiveData to use Elasticsearch v5+: added the rest of the round 14 projects
'''Project Description:'''
''About ActiveData'' <br />
ActiveData is a publicly accessible data warehouse holding many billions of records, for some dozen+ datasets concerning Mozilla's testing infrastructure: This includes test results, job results, code coverage, and extracts from other systems. The ActiveData code itself is really only a stateless query translation layer; leaving the hard work of high speed filtering and aggregation to Elasticsearch.
''Background''<br />
Elasticsearch is designed for text search, but can also serve as an extremely fast data warehouse. The speed comes from using [inverted indices](https://www.elastic.co/guide/en/elasticsearch/guide/current/inverted-index.html) to provide high performance data filtering and aggregation. Elasticsearch can index almost any JSON document, perform schema merging, and index all its properties, with almost no human intervention. By letting the machine manage the schema, we can query the JSON without transforming it
Elasticsearch does have a drawback: Its query language is designed for text search and is painful to use in a data warehouse context. Hence the need for ActiveData.
''Problem''<br />
Elasticsearch 1.7.x was the last version that did a reasonable job of schema merging. Newer versions (2.0+) have disallowed schema merging, preventing ingestion of JSON documents that have a schema that conflicts with previous documents. We would like to use newer, faster, and more stable versions of Elasticsearch, but they can not handle varied data.
''Solution'' <br />
Build a translator will convert a variety of JSON formats into a single, strictly-typed, schema. The translator will use schema merging and property-renaming to perform a translation on documents before they go Elasticsearch.
''Benefits'' <br />
ElasticSearch's schema merging is great, but has always been incomplete: 1. It could not merge inner objects `{"a":{"b":1}}` with nested objects `{"a":[{"b":0}]}`, and 2. Merging numbers `{"a": 1}` with strings `{"b": "1"}` did not cause failure, but did leave the schema ambiguous, and made the queries clumsy. This upgrade will make ActiveData more flexible, improve service stability, and provide a step towards promoting this project to production.
''Required Skills''<br />
Some particular experience will make this task easier (most important first):
* Python
* Denormalization and data warehousing
''References'' <br />1. Similar project for smaller data: [Mapping JSON to strict DB schema](https://github.com/klahnakoski/JSONQueryExpressionTests/blob/master/docs/GSOC%20Proposal.md)<br />2. [ELT](https://en.wikipedia.org/wiki/Extract,_transform,_load) Links: [A](http://hexanika.com/why-shift-from-etl-to-elt/), [B](https://www.ironsidegroup.com/2015/03/01/etl-vs-elt-whats-the-big-difference/)<br />3. [data cubes](https://en.wikipedia.org/wiki/OLAP_cube) <br />4. [fact tables](https://en.wikipedia.org/wiki/Fact_table) in a [data warehouse](https://en.wikipedia.org/wiki/Data_warehouse). <br />5. [MDX](https://en.wikipedia.org/wiki/MultiDimensional_eXpressions).<br/> ====Upgrade Lightbeam===='''Mentor:''' [Paul Theriault https://mozillians.org/en-US/u/pauljt/]'''IRC:''' pauljt '''Project Description:''' The Mozilla Lightbeam extension is a key tool for Mozilla to educate the public about privacy and it’s due for an upgrade. It needs to be rewritten as a Web Extension and we will take this opportunity to investigate potential new features or integrations with Firefox. We are seeking a candidate with web development skills to work with the security engineering team to make this happen. We are looking for a front-end web developer who is passionate about privacy on the web. They should have an interest in design and visualization as the key part of this upgrade would be exploring how we can convey complex privacy & security concepts to the all Firefox users. Experience working with Web Extensions is a bonus but not required. ====Improve DevTools Network Monitor===='''Mentor:''' [https://mozillians.org/en-US/u/odvarko/ Jan Odvarko] <br />'''IRC:''' Honza '''Project Description:'''"Firefox Developer tools can be used to intercept all HTTP requests and responses executed by a page. This feature is represented by a Network panel that shows list of all HTTP requests and related details (like e.g. headers, formatted response body, timings, etc.) In order to improve value of the panel we'd like to connect the existing UI to other backends performing HTTP activity (for example, Chrome browser or NodeJS) and adapt our current remote protocol (RDP) to a new environment."
==Outreachy Program Cohort: Round 13 (Dec 2016-March 2017)==
Confirm
514
edits

Navigation menu