

Telemetry/Custom analysis with spark

193 bytes removed, 15:41, 29 March 2017
How do I load an external library into the cluster?: remove redundant steps
Assuming you have a URL for the repo, you can build an egg from it in the notebook like this:
import sys
import os
!git clone <repo url> && cd <repo-name> && python setup.py bdist_egg
sc.addPyFile('<repo-name>/dist/my-egg-file.egg')
sys.path.append(os.path.join(os.getcwd(), '<repo-name>/dist/my-egg-file.egg'))
Alternatively, you can build the egg locally, upload it to a web server, then download and install it:
import requests
import sys
import os
r = requests.get('<url-to-my-egg-file>')
with open('mylibrary.egg', 'wb') as f:
    f.write(r.content)
sc.addPyFile('mylibrary.egg')
sys.path.append(os.path.join(os.getcwd(), 'mylibrary.egg'))
Do this '''before''' you import the library. If the library has already been imported, restart the kernel in the IPython notebook and run these steps again.
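Both recipes end with the same two steps: pass the egg to sc.addPyFile and append it to sys.path. The sketch below wraps those steps in one helper and shows why the sys.path step works: an egg is a zip archive, and Python can import directly from a zip archive on sys.path. It assumes Python 3; the names register_egg and demo_lib are hypothetical and used only for illustration. The demo builds a throwaway egg locally so it runs without a cluster; on the cluster you would pass sc as the second argument.

```python
import importlib
import os
import sys
import tempfile
import zipfile


def register_egg(egg_path, spark_context=None):
    """Make an egg importable on the driver and, if given, on the executors."""
    egg_path = os.path.abspath(egg_path)
    if spark_context is not None:
        # Ship the egg to the executors, as sc.addPyFile does in the steps above.
        spark_context.addPyFile(egg_path)
    if egg_path not in sys.path:
        # Avoid duplicate entries when a notebook cell is re-run.
        sys.path.append(egg_path)
    # Make sure the import system notices the new path entry.
    importlib.invalidate_caches()
    return egg_path


# Build a throwaway egg (a zip archive) holding one hypothetical module.
workdir = tempfile.mkdtemp()
demo_egg = os.path.join(workdir, "demo_lib.egg")
with zipfile.ZipFile(demo_egg, "w") as archive:
    archive.writestr("demo_lib.py", "ANSWER = 42\n")

registered_path = register_egg(demo_egg)  # on the cluster: register_egg(demo_egg, sc)

import demo_lib  # resolved from inside the egg

print(demo_lib.ANSWER)
```

Because the helper checks sys.path before appending, re-running the cell does not add duplicate entries. It does not reload a module that has already been imported, so the kernel restart described above is still needed in that case.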