Telemetry/Custom analysis with spark

== Scheduled Jobs ==
The notebook is set up to work with Spark. See the "Using Spark" section below for more information.
 
=== Setting Up a Dashboard ===
 
Scheduled Spark jobs re-run a Jupyter notebook automatically on a regular basis, which makes for a simple, easy-to-maintain dashboard.
 
To schedule a Spark job:
 
# Visit the analysis provisioning dashboard at telemetry-dash.mozilla.org and sign in using Persona with an @mozilla.com email address.
# Click “Schedule a Spark Job”.
# Enter some details:
## The “Job Name” field should be a short descriptive name, like “chromehangs analysis”.
## Upload your IPython notebook containing the analysis.
## Set the number of workers of the cluster in the “Cluster Size” field.
## Set a schedule frequency using the remaining fields.
 
Now, the notebook will be updated automatically, and the results can be easily shared.
 
For reference, see [https://robertovitillo.com/2015/03/13/simple-dashboards-with-scheduled-spark-jobs-and-plotly Simple Dashboard with Scheduled Spark Jobs and Plotly].
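As an illustration of the kind of analysis a scheduled notebook might contain, here is a minimal, hypothetical sketch: it uses the notebook's SparkContext (<code>sc</code>, described below) to compute a summary and writes the result to a JSON file that a dashboard page could read. The records, field names, and output filename are made up for the example; a real job would load actual Telemetry data instead.

<pre>
import json

# Hypothetical example data; a real scheduled job would load
# Telemetry data rather than a hard-coded list.
records = sc.parallelize([
    {"channel": "release", "hangs": 3},
    {"channel": "beta",    "hangs": 7},
    {"channel": "release", "hangs": 1},
])

# Count hangs per channel with a map/reduce pattern.
hangs_per_channel = (records
    .map(lambda r: (r["channel"], r["hangs"]))
    .reduceByKey(lambda a, b: a + b)
    .collectAsMap())

# Each scheduled run overwrites the file with fresh results,
# so whatever reads it always sees the latest numbers.
with open("hangs_per_channel.json", "w") as f:
    json.dump(hangs_per_channel, f)
</pre>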
== Using Spark ==
Spark is a general-purpose cluster computing system that runs general execution graphs. APIs are available in Python, Scala, and Java; the Jupyter notebook uses the Python API. In a nutshell, it provides a way to run functional code (map, filter, reduce, and so on) on large, distributed data sets.
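 
For example, a minimal sketch of this functional style in the Python API, assuming the notebook's SparkContext is available as <code>sc</code> (see below):

<pre>
# Distribute a small data set across the cluster.
numbers = sc.parallelize(range(1, 101))

# map: transform each element; filter: keep a subset; reduce: combine.
sum_of_even_squares = (numbers
    .filter(lambda n: n % 2 == 0)
    .map(lambda n: n * n)
    .reduce(lambda a, b: a + b))

print(sum_of_even_squares)  # 171700
</pre>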
 
Check out [https://robertovitillo.com/2015/06/30/spark-best-practices/ Spark Best Practices] for tips on using Spark to its full potential.
=== SparkContext (sc) ===