Auto-tools/Projects/PublicES: Difference between revisions

This project took more effort than expected.  Here are some of the complications that slowed down development.  Please keep in mind that I had only a couple of months of Python experience before doing this conversion, so feel free to take pleasure in my ignorance:


* Python and JavaScript property access differ enough to cause a multitude of bugs under naive conversion.  For example, converting JavaScript <tt>if (!a.b){ ... }</tt> to Python <tt>if not a["b"]:  ....</tt> can raise a <tt>KeyError</tt> when the key is missing, or silently take the wrong path when dealing with empty sets (an empty collection is falsy in Python but truthy in JavaScript).
* Python is slow.  Python's speed comes from the C libraries it uses; spending time in the Python interpreter is a bad idea.  For example, going through the characters in all strings to check for invalid Unicode turned a slow program into an unusable one.  The solution there was to find a built-in library that did the work for me (or would raise an exception if the conditions were false).  However, this ETL program has significant data structure transformations that can only be done in Python.  The solution for those is to move to the PyPy interpreter.
* PyPy does not work well with C libraries.  The C libraries had to be removed in favour of pure Python versions of the same.  This was not too hard, except when it came to JSON libraries.
* JSON generation is slow: The built-in JSON emitter used generators to convert data structures to a JSON string, but the PyPy optimizer is terrible at analyzing generator code.  Furthermore, the JSON libraries available to CPython are incredibly fast (Ujson is almost 2 orders of magnitude faster!).  This made the PyPy version appear inferior despite the speed-up in the ETL portion of the code.  Part of the solution was to use PyPy's own JSON emitter, and also to realize that PyPy's default JSON emitter (no pretty printing, no sub-classing to deal with special types) has Ujson speeds.  The fastest solution, so far, is to copy the data structure (with sets, Decimal, and other special types) to one with simple dicts, lists and floats and pass that to the default PyPy JSON emitter.
* Python has old, non-intuitive routine names (strftime, mktime, randrange, etc.).  These take time to find, and then more time to confirm that there isn't a later library that should be used instead.  I opted to add a facade to all of them to re-envowel their names, and to isolate myself from the risk of using the wrong lib (or having it behave in unexpected ways).
* Python 2.7 strings are confusing: str() can be either ASCII or UTF-8 encoded, but without any typing to indicate which encoding is used.  There are also unicode() strings, which look like strings until you try to compare them: <tt>"é"!=u"é"</tt>
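
To illustrate the property-access point in the list above, here is a minimal sketch of how a naive port of <tt>!a.b</tt> diverges from JavaScript. The helper name <tt>js_falsy</tt> is hypothetical (it is not from the project), and it only approximates JavaScript truthiness for JSON-like values. It is written for Python 3.

```python
def js_falsy(value):
    """Approximate JavaScript's `!value` for JSON-like data (hypothetical helper)."""
    if value is None or value is False:
        return True  # null/undefined and false are falsy
    if isinstance(value, (int, float)) and not isinstance(value, bool):
        return value == 0 or value != value  # 0 and NaN are falsy
    if isinstance(value, str):
        return value == ""  # only the empty string is falsy
    return False  # objects and arrays are truthy in JavaScript, even when empty


a = {"b": []}

naive = not a["b"]               # Python treats the empty list as falsy
ported = js_falsy(a.get("b"))    # JavaScript would treat [] as truthy
missing = js_falsy({}.get("b"))  # .get() avoids the KeyError; a missing key acts like undefined
```

Using <tt>.get()</tt> instead of <tt>a["b"]</tt> removes the exception, and the explicit helper keeps the branch taken the same as in the JavaScript original.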
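
The "let a built-in raise the exception" idea from the speed point above could look like the following sketch. It assumes that "invalid Unicode" means bytes that are not valid UTF-8, which the notes do not state, and the function name <tt>is_valid_utf8</tt> is made up for illustration. It is written for Python 3.

```python
def is_valid_utf8(raw: bytes) -> bool:
    """Check validity by letting the C-implemented decoder do the scan."""
    try:
        raw.decode("utf-8")  # raises UnicodeDecodeError on malformed input
        return True
    except UnicodeDecodeError:
        return False


good = is_valid_utf8(b"caf\xc3\xa9")  # "café" encoded as UTF-8
bad = is_valid_utf8(b"caf\xe9")       # Latin-1 byte, not valid UTF-8
```

The loop over characters happens inside the decoder rather than in interpreted Python code, which is the point of the technique.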
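
The "copy to simple types, then use the default emitter" approach from the JSON point above might be sketched like this. The function name <tt>to_plain</tt> and the sample record are assumptions for illustration; the project's actual conversion rules are not given in these notes.

```python
import json
from decimal import Decimal


def to_plain(value):
    """Recursively copy a structure into plain dicts, lists and floats."""
    if isinstance(value, dict):
        return {key: to_plain(item) for key, item in value.items()}
    if isinstance(value, (list, tuple, set, frozenset)):
        return [to_plain(item) for item in value]
    if isinstance(value, Decimal):
        return float(value)  # may lose precision; acceptable only if floats suffice
    return value


record = {"ids": {3}, "price": Decimal("1.5"), "tags": ("a", "b")}

# No custom encoder subclass is needed once the data is plain.
text = json.dumps(to_plain(record))
```

The copy costs one extra pass over the data, but it lets the emitter run without any subclass hooks for special types.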
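
A facade of the kind described in the routine-names point above could be as small as the sketch below. The re-envoweled names (<tt>format_time</tt>, <tt>make_timestamp</tt>, <tt>random_int</tt>) are hypothetical; the notes do not say which names were actually chosen.

```python
import random
import time


def format_time(struct_time, pattern):
    """Readable wrapper around time.strftime."""
    return time.strftime(pattern, struct_time)


def make_timestamp(struct_time):
    """Readable wrapper around time.mktime (expects local time)."""
    return time.mktime(struct_time)


def random_int(start, stop):
    """Readable wrapper around random.randrange; stop is exclusive."""
    return random.randrange(start, stop)


epoch_day = format_time(time.gmtime(0), "%Y-%m-%d")
```

Because callers only see the facade, the underlying library can be swapped later by changing one module.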
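
The string comparison pitfall in the last point can be reproduced in Python 3 terms, where the Python 2.7 <tt>str</tt> corresponds roughly to <tt>bytes</tt> and <tt>unicode</tt> to <tt>str</tt>. This is an analogy rather than the original Python 2.7 code, and it assumes the byte string was UTF-8 encoded.

```python
text = "é"                  # Unicode text (Python 2.7: u"é")
raw = text.encode("utf-8")  # encoded bytes (what a Python 2.7 str would hold)

same_before = (raw == text)                 # bytes never equal text
same_after = (raw.decode("utf-8") == text)  # decode first, then compare
```

Decoding once at the input boundary, and working with Unicode text everywhere else, avoids having to guess an encoding at each comparison.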