Auto-tools/Projects/PublicES: Difference between revisions

* JSON generation is slow: The built-in JSON emitter used generators to convert data structures to a JSON string, but the PyPy optimizer is poor at analyzing generator code.  Furthermore, the JSON libraries available to CPython are very fast (ujson is almost two orders of magnitude faster!).  This made the PyPy version appear inferior despite the speedup in the ETL portion of the code.  Part of the solution was to use PyPy's own JSON emitter, and to realize that it has ujson speeds when used with its defaults (no pretty printing, no subclassing, etc.).  The fastest solution I have found so far is to copy the data structure (with sets, Decimal, and other special types) to one with only simple dicts, lists and floats, and pass that to the default PyPy JSON emitter[https://github.com/klahnakoski/pyLibrary/blob/61928e3c9b01b823d666bafcc68b90ab2e4199e3/tests/util/test_json_speed.py].
* Python has old, non-intuitive routine names (strftime, mktime, randrange, etc.).  These take time to find, and more time to confirm that there isn't a later library that should be used instead.  I opted to add a facade over all of them to re-envowel their names, and to isolate myself from the risk of using the wrong library (or having it behave in unexpected ways).
* Python 2.7 strings are confusing: a str() can hold either ASCII or UTF-8 encoded bytes, with no typing to indicate which encoding is used.  There are also unicode() strings, which look like strings until you try to compare them: <tt>"é" != u"é"</tt>
* Multithreading was necessary so we could handle multiple network requests at once while keeping the code easy to read.  Python's threading library is still immature in that it has no higher-level threading constructs to deal with common use cases in an environment that raises exceptions.
* Python 2.7 has no exception chaining, so I added it.
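
The copy-then-emit approach from the JSON item above can be sketched roughly as follows.  This is a minimal illustration, not the code from the linked test: the names <tt>scrub</tt> and <tt>to_json</tt> are hypothetical, and it assumes the standard <tt>json</tt> module called with default arguments stands in for the default emitter.

```python
import json
from decimal import Decimal


def scrub(value):
    """Copy `value` into a structure made only of dicts, lists and JSON primitives.

    Hypothetical helper: sets and tuples become lists, Decimal becomes float.
    """
    if isinstance(value, dict):
        return {k: scrub(v) for k, v in value.items()}
    if isinstance(value, (list, tuple, set, frozenset)):
        return [scrub(v) for v in value]
    if isinstance(value, Decimal):
        return float(value)
    return value


def to_json(value):
    # Default arguments only: no indent, no custom encoder subclass,
    # so the emitter can stay on its fast path.
    return json.dumps(scrub(value))
```

The cost of the extra copy is paid in plain Python loops, which is the kind of code the JIT handles well, while the emitter itself only ever sees simple types.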
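
One example of the kind of higher-level construct the threading item finds missing is a thread that hands its exception back to whoever joins it.  The following is only a sketch under that assumption; <tt>CheckedThread</tt> is a hypothetical name and is not taken from the project's code.

```python
import threading


class CheckedThread(threading.Thread):
    """Thread that remembers the target's result or exception (hypothetical sketch)."""

    def __init__(self, target, *args, **kwargs):
        threading.Thread.__init__(self)
        self._call = (target, args, kwargs)
        self.result = None
        self.error = None

    def run(self):
        target, args, kwargs = self._call
        try:
            self.result = target(*args, **kwargs)
        except Exception as e:
            # Keep the exception instead of letting it die with the thread.
            self.error = e

    def join(self, timeout=None):
        threading.Thread.join(self, timeout)
        if self.error is not None:
            # Re-raise in the joining thread, where it can be handled.
            raise self.error
        return self.result
```

A caller starts the thread as usual and treats <tt>join()</tt> like a function call: it either returns the target's result or raises what the target raised.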