Telemetry/Custom analysis with spark

== Scheduled Jobs ==
The notebook is set up to work with Spark. See the "Using Spark" section below for more information.
 
=== Setting Up a Dashboard ===
 
Scheduled Spark jobs re-run a Jupyter notebook automatically on a regular basis, which makes for a simple, easy-to-maintain dashboard.
 
To schedule a Spark job:
 
# Visit the analysis provisioning dashboard at telemetry-dash.mozilla.org and sign in using Persona with an @mozilla.com email address.
# Click “Schedule a Spark Job”.
# Enter some details:
## The “Job Name” field should be a short descriptive name, like “chromehangs analysis”.
## Upload your IPython notebook containing the analysis.
## Set the number of workers of the cluster in the “Cluster Size” field.
## Set a schedule frequency using the remaining fields.
 
Now, the notebook will be updated automatically, and the results can be easily shared.
 
For reference, see [https://robertovitillo.com/2015/03/13/simple-dashboards-with-scheduled-spark-jobs-and-plotly Simple Dashboard with Scheduled Spark Jobs and Plotly].
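As an illustration of the kind of analysis a scheduled notebook might contain, here is a minimal, hypothetical sketch: it uses the notebook's SparkContext (<code>sc</code>, described below) to compute a summary and writes the result to a JSON file that a dashboard page could read. The records, field names, and output filename are made up for the example; a real job would load actual Telemetry data instead.

<pre>
import json

# Hypothetical example data; a real scheduled job would load
# Telemetry data rather than a hard-coded list.
records = sc.parallelize([
    {"channel": "release", "hangs": 3},
    {"channel": "beta",    "hangs": 7},
    {"channel": "release", "hangs": 1},
])

# Count hangs per channel with a map/reduce pattern.
hangs_per_channel = (records
    .map(lambda r: (r["channel"], r["hangs"]))
    .reduceByKey(lambda a, b: a + b)
    .collectAsMap())

# Each scheduled run overwrites the file with fresh results,
# so whatever reads it always sees the latest numbers.
with open("hangs_per_channel.json", "w") as f:
    json.dump(hangs_per_channel, f)
</pre>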
== Using Spark ==
Spark is a general-purpose cluster computing system that runs general execution graphs. APIs are available in Python, Scala, and Java; the Jupyter notebook uses the Python API. In a nutshell, it provides a way to run functional code (map, filter, reduce, and so on) on large, distributed data sets.
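 
For example, a minimal sketch of this functional style in the Python API, assuming the notebook's SparkContext is available as <code>sc</code> (see below):

<pre>
# Distribute a small data set across the cluster.
numbers = sc.parallelize(range(1, 101))

# map: transform each element; filter: keep a subset; reduce: combine.
sum_of_even_squares = (numbers
    .filter(lambda n: n % 2 == 0)
    .map(lambda n: n * n)
    .reduce(lambda a, b: a + b))

print(sum_of_even_squares)  # 171700
</pre>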
 
Check out [https://robertovitillo.com/2015/06/30/spark-best-practices/ Spark Best Practices] for tips on using Spark to its full potential.
=== SparkContext (sc) ===