Socorro/Pig Python InstallGuide
From MozillaWiki
This is documentation that bsmedberg wrote primarily for rhelmer and xstevens. It covers what I had to do to write a python UDF which imports the socorro signature generation mechanism.
Buyer beware.
Trying to use python UDFs with pig is... painful. Here are the known issues:
- pig 0.9.x does not support shipping recursively imported modules. So if you have a single python script which only imports standard modules, pig 0.9.x will probably work fine, but if it imports any other modules then it will probably fail. This is https://issues.apache.org/jira/browse/PIG-1824
- The jython.jar shipped with pig 0.10.0 release is not the standalone version with the python libraries. You can replace it in-place with a jython-2.5.2 jar and that will improve things: https://issues.apache.org/jira/browse/PIG-2665
- pig 0.10.0 also broke simply-imported modules in hadoop mode (what we care about). This is https://issues.apache.org/jira/browse/PIG-2761 which is fixed on the 0.9.x and 0.10.x branches (no release available yet)
- jython 2.5.2 supports the python 2.5 standard library. This does not include the following modules that bsmedberg needs for socorro work, both of which arrived in python 2.6: json (workaround pretty simple), and the collections ABCs (MutableMapping etc) which are required for configman (no easy workaround). This is fixed on jython trunk.
- With a little hacking and very careful classpath/environment setup it's possible to "remove" jython from the pig build so that you can build pig against a custom version of jython. BUT the jython API changed for 2.6 so there is one code change that is required in pig.
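The json and collections gaps listed above can be probed from inside whichever interpreter pig ends up using. The following is a minimal sketch, not what bsmedberg actually ran: the helper names (has_attr, missing_features, REQUIRED) are made up for illustration, and the json fallback assumes the third-party simplejson package is importable.

```python
# Sketch: cope with a Python 2.5-level standard library (as in jython 2.5.2).
try:
    import json  # in the standard library from Python 2.6 onward
except ImportError:
    # Assumption: the third-party simplejson package is on the path.
    import simplejson as json


def has_attr(module_name, attr):
    """Return True if a top-level module imports and exposes attr."""
    try:
        module = __import__(module_name)
    except ImportError:
        return False
    return hasattr(module, attr)


def missing_features(required):
    """List the (module, attribute) pairs this interpreter cannot provide."""
    return [(m, a) for (m, a) in required if not has_attr(m, a)]


# The two features named in the list above; configman needs the second one.
REQUIRED = [("json", "loads"), ("collections", "MutableMapping")]
```

Calling missing_features(REQUIRED) under the interpreter in question returns whatever is absent. Note that recent CPython 3 releases keep the ABCs in collections.abc only, so the second probe reports a miss there as well; the check is meant for the python 2.x-level interpreters discussed here.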
With all of this, then, I think I have a working pig/jython setup:
- self-build jython trunk: ant && ant jar-standalone
- self-build pig using this branch: https://github.com/bsmedberg/pig "jython-trunk-support"
- cp jython/dist/jython-standalone.jar pig/lib/jython-standalone.jar
You should now have a pig which does useful things with python UDFs. But the work isn't done: parts of configman don't run under jython because... well, just because. So you'll have to use this version of configman: https://github.com/bsmedberg/configman/tree/formappingexception
The UDF, .pig script, and postprocessor script are now at https://github.com/bsmedberg/socorro-toolbox/tree/improveskiplist in:
- src/main/pig/improveskiplist.pig (the .pig script)
- src/main/python/pig_socorro.py (the UDF)
- src/main/python/checkimprovedskiplist.py (postprocessor)
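For orientation, a jython UDF module for pig has roughly the shape below. This is a hedged sketch and not the contents of pig_socorro.py: the function top_frame and its '|'-separated input are hypothetical stand-ins for the real signature generation. As far as I know, pig makes the outputSchema decorator available when it loads a jython UDF; the fallback is only there so the file can be imported and tested outside pig.

```python
# Sketch of a jython UDF module for pig; every name here is hypothetical.

try:
    outputSchema  # expected to be provided by pig when it loads the script
except NameError:
    # Outside pig (plain-python testing), fall back to a no-op decorator.
    def outputSchema(schema):
        def decorate(func):
            return func
        return decorate


@outputSchema("signature:chararray")
def top_frame(frames):
    """Return the first non-empty frame of a '|'-separated stack, or None."""
    if frames is None:
        return None
    for frame in frames.split("|"):
        frame = frame.strip()
        if frame:
            return frame
    return None
```

A pig script would then pull it in with something like REGISTER 'my_udf.py' USING jython AS myudf; and call myudf.top_frame(...) in a FOREACH, where my_udf.py and the alias myudf are placeholder names.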