* PyPy does not work well with C libraries. The C libraries had to be removed in favor of pure-Python versions of the same libraries. This was not too hard, except when it came to the JSON libraries.
* JSON generation is slow: The built-in JSON emitter used generators to convert data structures to a JSON string, but the PyPy optimizer is terrible at analyzing generator code. Furthermore, the JSON libraries available to CPython are incredibly fast (Ujson is faster by almost two orders of magnitude!), which made the PyPy version appear inferior despite the speedup in the ETL portion of the code. Part of the solution was to use PyPy's own JSON emitter, and to realize that PyPy's default JSON emitter (no pretty printing, no sub-classing, etc.) matches Ujson speeds. The fastest solution I have found so far is to copy the data structure (with sets, Decimals, and other special types) into one made only of simple dicts, lists, and floats, and pass that to the default PyPy JSON emitter (see the first sketch after this list)[https://github.com/klahnakoski/pyLibrary/blob/61928e3c9b01b823d666bafcc68b90ab2e4199e3/tests/util/test_json_speed.py].
* Python has old-school, unintuitive routine names (strftime, mktime, randrange, etc.); these take time to find, and more time to confirm there isn't a better library that should be used instead. I opted to add a facade over most of them to re-envowel their names and isolate myself from the risk of using the wrong library (or having it behave in unexpected ways); a sketch of such a facade follows this list.
* Python 2.7 strings are confusing: str() can be either ASCII or UTF-8 encoded, but carries no type information to indicate which encoding is used. There are also unicode() strings, which look like strings until you try to compare them: <tt>"é" != u"é"</tt> (demonstrated in the snippet below).
* Multithreading was necessary so we could handle multiple network requests at a time while keeping the code easy to read. Python's threading library is still immature: it has no high-level threading constructs to deal with common use cases in an environment that raises exceptions (see the worker sketch at the end of this list).
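To make the JSON point concrete, here is a minimal sketch of the "scrub, then serialize" approach: copy the structure into plain dicts, lists, and floats, then hand it to the default emitter. The <tt>scrub()</tt> and <tt>encode()</tt> names are illustrative only; the real code and timings are in the test linked above.

<pre>
import json
from decimal import Decimal

def scrub(value):
    # Recursively copy sets, Decimals, and other special types into
    # plain dicts, lists, and floats so the default JSON emitter can
    # serialize on its fast path (illustrative sketch, not pyLibrary code).
    if isinstance(value, dict):
        return {k: scrub(v) for k, v in value.items()}
    if isinstance(value, (list, tuple, set)):
        return [scrub(v) for v in value]
    if isinstance(value, Decimal):
        return float(value)
    return value  # str/unicode, int, float, bool, None pass through

def encode(value):
    # json.dumps() with default options (no pretty printing, no custom
    # encoder) is the fast path on PyPy
    return json.dumps(scrub(value))
</pre>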
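The facade over the old-school names looks roughly like this; the wrapper names below are hypothetical, chosen only to show the idea of putting readable names over the standard-library calls.

<pre>
import random
import time

def format_date(value, pattern="%Y-%m-%d %H:%M:%S"):
    # readable wrapper around datetime.strftime
    return value.strftime(pattern)

def datetime_to_unix(value):
    # readable wrapper around time.mktime
    return time.mktime(value.timetuple())

def random_int(max_value):
    # readable wrapper around random.randrange
    return random.randrange(max_value)
</pre>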
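The string comparison surprise, as a short Python 2.7 snippet (it assumes the source file is saved as UTF-8; Python 3 makes the bytes/text split explicit, so this only applies to 2.x):

<pre>
# -*- coding: utf8 -*-
s = "é"                        # byte string: '\xc3\xa9' when the file is UTF-8
u = u"é"                       # unicode string: u'\xe9'

print s == u                   # False (and a UnicodeWarning)
print s.decode("utf8") == u    # True, once the encoding is stated
</pre>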
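As an example of the kind of high-level construct that is missing: a worker thread that captures any exception and re-raises it on join(), so failures are not silently printed to stderr. The <tt>Worker</tt> class below is only an illustrative sketch, not a standard-library facility.

<pre>
import threading

class Worker(threading.Thread):
    # Run target(*args) on a thread; keep the result or the exception
    # so the caller sees either one when it calls join().
    def __init__(self, target, *args):
        threading.Thread.__init__(self)
        self.target = target
        self.args = args
        self.result = None
        self.error = None

    def run(self):
        try:
            self.result = self.target(*self.args)
        except Exception as e:
            self.error = e

    def join(self, timeout=None):
        threading.Thread.join(self, timeout)
        if self.error is not None:
            raise self.error
        return self.result
</pre>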