Telemetry/Custom analysis with spark: Difference between revisions

→‎How do I load an external library into the cluster?: Update external library loading instructions to include ipython context + alternate egg downloading method


=== How do I load an external library into the cluster? ===
Assuming you have a URL for the repo, you can build an egg from it and distribute it to the cluster this way (the ! line runs a shell command from the IPython notebook):


  import os
  import sys
  # Clone the repo and build an egg; the cd only affects this one shell command
  !git clone <repo url> && cd <repo-name> && python setup.py bdist_egg
  # Ship the egg to the executors (use the actual file name bdist_egg wrote to dist/)
  sc.addPyFile('<repo-name>/dist/my-egg-file.egg')
  # Make the egg importable on the driver (the notebook kernel) as well
  sys.path.append(os.path.join(os.getcwd(), '<repo-name>/dist/my-egg-file.egg'))
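The file that bdist_egg writes to dist/ is named after the package name, version and Python version, so my-egg-file.egg is only a placeholder. A minimal sketch for looking the file up instead of hard-coding it; find_egg is a hypothetical helper, not part of Spark or setuptools, and it assumes the newest .egg in the directory is the one you want:

```python
import glob
import os


def find_egg(dist_dir):
    """Return the path of the most recently modified .egg file in dist_dir.

    Hypothetical helper: bdist_egg names the egg after the package name,
    version and Python version, so we glob for *.egg instead of
    hard-coding the file name.
    """
    eggs = glob.glob(os.path.join(dist_dir, '*.egg'))
    if not eggs:
        # IOError works on both Python 2 and 3 (it is OSError on Python 3)
        raise IOError('no .egg file found in %s' % dist_dir)
    # If several builds are lying around, pick the newest one.
    return max(eggs, key=os.path.getmtime)
```

In the notebook you could then pass find_egg('<repo-name>/dist') to sc.addPyFile, and append os.path.abspath of the same path to sys.path, in place of the hard-coded file name.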
Alternatively, you can build the egg locally, upload it to a web server, and then download it and add it to the cluster:
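  import os
  import sys
  import requests
  # Download the prebuilt egg from your web server
  r = requests.get('<url-to-my-egg-file>')
  r.raise_for_status()  # stop on an HTTP error instead of saving the error page
  with open('mylibrary.egg', 'wb') as f:
      f.write(r.content)
  # Ship the egg to the executors and make it importable on the driver
  sc.addPyFile('mylibrary.egg')
  sys.path.append(os.path.join(os.getcwd(), 'mylibrary.egg'))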
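A server or proxy can sometimes return something other than an egg (an HTML error or login page, for instance), which would then be saved as mylibrary.egg and only fail later at import time. An egg built by bdist_egg is a zip archive containing an EGG-INFO/PKG-INFO entry, so a quick check before calling sc.addPyFile can catch this early. A rough sketch; looks_like_egg is a hypothetical helper and only checks the archive layout, not whether the code inside is usable:

```python
import zipfile


def looks_like_egg(path):
    """Rough check that path is a built egg rather than, say, an HTML page.

    Hypothetical helper: an egg produced by bdist_egg is a zip archive
    that contains EGG-INFO/PKG-INFO, so we test for exactly that.
    """
    if not zipfile.is_zipfile(path):
        return False
    with zipfile.ZipFile(path) as zf:
        return 'EGG-INFO/PKG-INFO' in zf.namelist()
```

For example, assert looks_like_egg('mylibrary.egg') right after writing the file and before calling sc.addPyFile.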
Do this '''before''' you import the library. If it has already been imported, restart the IPython notebook kernel and run these steps first.
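Python caches imported modules in sys.modules and a later import returns the cached copy, which is why adding an egg after the library is loaded has no effect until the kernel is restarted. A small sketch for checking this before running the steps above; already_imported is a hypothetical helper name:

```python
import sys


def already_imported(module_name):
    """Return True if module_name or any of its submodules is in sys.modules.

    Hypothetical helper: if this returns True, a newly added egg will not
    replace the copy already loaded in this kernel.
    """
    prefix = module_name + '.'
    # Copy the keys so the check is not affected by concurrent imports
    return any(name == module_name or name.startswith(prefix)
               for name in list(sys.modules))
```

For example, if already_imported('mylibrary') returns True, restart the kernel before adding the egg.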