Socorro/Pig Python InstallGuide

From MozillaWiki

This is documentation that bsmedberg wrote primarily for rhelmer and xstevens. It records what I had to do to write a python UDF that imports the socorro signature generation mechanism.

Buyer beware.

Trying to use python UDFs with pig is... painful. Here are the known issues:

  • pig 0.9.x does not support shipping recursively imported modules. So if your UDF is a single python script which only imports standard library modules, pig 0.9.x will probably work fine, but if it imports any other modules then it will probably fail. This is https://issues.apache.org/jira/browse/PIG-1824
  • The jython.jar shipped with the pig 0.10.0 release is not the standalone version with the python libraries. You can replace it in-place with a standalone jython-2.5.2 jar and that will improve things: https://issues.apache.org/jira/browse/PIG-2665
  • jython 2.5.2 supports the python 2.5 standard library. This does not include the following modules that bsmedberg needs for socorro work: json (workaround pretty simple), and the collections ABCs (MutableMapping etc.) which are required for configman (no easy workaround). Both were added to the standard library in python 2.6. This is fixed on jython trunk.
  • With a little hacking and very careful classpath/environment setup it's possible to "remove" jython from the pig build so that you can build pig against a custom version of jython. BUT the jython API changed for 2.6 so there is one code change that is required in pig.
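
To make the standard-library gap above concrete, here is a minimal sketch of how a UDF module might cope with it. The simplejson fallback is an assumption (it is one common workaround, not necessarily the one bsmedberg used), and find_mutable_mapping is a hypothetical helper name used only for illustration.

```python
# Sketch only: probe for the modules that a python 2.5 standard library lacks.

try:
    import json  # in the standard library from python 2.6 onward
except ImportError:
    # Assumption: a pure-python simplejson is shipped alongside the UDF.
    import simplejson as json


def find_mutable_mapping():
    """Return the MutableMapping ABC, or None if this runtime lacks it."""
    try:
        from collections.abc import MutableMapping  # python 3.3+
    except ImportError:
        try:
            from collections import MutableMapping  # python 2.6 / 2.7
        except ImportError:
            return None  # e.g. the 2.5 library bundled with jython 2.5.2
    return MutableMapping


if __name__ == "__main__":
    print("json round-trip ok: %s" % (json.loads(json.dumps([1, 2])) == [1, 2]))
    print("MutableMapping available: %s" % (find_mutable_mapping() is not None))
```

Running this under the jython you intend to ship is a quick way to see whether configman has any chance of importing before you involve pig at all.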

With all of this, then, I think I have a working pig/jython setup:

  • self-build jython trunk
    • ant && ant jar-standalone
  • self-build pig using this branch: https://github.com/bsmedberg/pig "jython-trunk-support"
  • cp jython/dist/jython-standalone.jar pig/lib/jython-standalone.jar

You should now have a pig which does useful things with python UDFs. But the work isn't done: parts of configman don't run under jython because... well, just because. So you'll have to use this version of configman: https://github.com/bsmedberg/configman/tree/formappingexception

The UDF, .pig script, and postprocessor script are now at https://github.com/bsmedberg/socorro-toolbox/tree/improveskiplist in

 src/main/pig/improveskiplist.pig
 src/main/python/pig_socorro.py (the UDF)
 src/main/python/checkimprovedskiplist.py (postprocessor)
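
For readers who have not written a jython UDF for pig before, the general shape is sketched below. This is not the contents of pig_socorro.py and it does not call socorro: join_top_frames and MAX_FRAMES are hypothetical, and the joining logic is a toy stand-in for real signature generation. The one pig-specific piece is the outputSchema decorator, which pig's jython script engine makes available to UDF scripts; the fallback definition exists only so the module can also be imported and tested outside pig.

```python
# Toy jython UDF sketch for pig; NOT the real socorro signature logic.

MAX_FRAMES = 3  # hypothetical cutoff, chosen only for this example

try:
    outputSchema  # provided by pig when the script is registered as a jython UDF
except NameError:
    def outputSchema(schema):
        """Stand-in decorator so the module also imports outside pig."""
        def decorate(func):
            func.outputSchema = schema
            return func
        return decorate


@outputSchema("signature:chararray")
def join_top_frames(frames_text):
    """Join the first MAX_FRAMES non-empty lines of frames_text with ' | '."""
    if frames_text is None:
        return None  # pig passes nulls through as None
    frames = [line.strip() for line in frames_text.splitlines() if line.strip()]
    return " | ".join(frames[:MAX_FRAMES])
```

A script like this would be registered from the .pig file with pig's REGISTER ... USING jython AS ... statement and then called through the chosen namespace. A real UDF that imports socorro and configman additionally depends on the module-shipping and jython fixes described above.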