Javascript:SpiderMonkey:OdinMonkey

From MozillaWiki

Goal

Provide an optimized implementation of the (co-evolving) asm.js spec that achieves near-native performance (within 2x of the same C/C++ compiled with -O2) on JavaScript generated from C/C++, with Emscripten as the pilot code generator.
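For orientation, here is a hand-written sketch of the general shape of asm.js code. A module function receives `stdlib`, `foreign`, and `heap` arguments and starts with a `"use asm"` directive. Inside it, the `|0` and unary `+` coercions declare int and double types. The module and function names are hypothetical, and real Emscripten output is far larger. An engine that does not validate the module simply runs it as ordinary JavaScript.

```javascript
// Hypothetical asm.js module illustrating the subset's conventions.
function ExampleModule(stdlib, foreign, heap) {
  "use asm";

  // Typed-array view over the heap ArrayBuffer, plus a stdlib import.
  var HEAP32 = new stdlib.Int32Array(heap);
  var sqrt = stdlib.Math.sqrt;

  // Parameter types are declared by coercion: |0 means int.
  function add(a, b) {
    a = a | 0;
    b = b | 0;
    return (a + b) | 0;
  }

  // Sums n consecutive int32 values starting at byte offset p.
  function sum(p, n) {
    p = p | 0;
    n = n | 0;
    var acc = 0;
    var i = 0;
    for (i = 0; (i | 0) < (n | 0); i = (i + 1) | 0) {
      acc = (acc + (HEAP32[(p + (i << 2)) >> 2] | 0)) | 0;
    }
    return acc | 0;
  }

  // Unary + means double.
  function hypot(x, y) {
    x = +x;
    y = +y;
    return +sqrt(x * x + y * y);
  }

  return { add: add, sum: sum, hypot: hypot };
}
```

You can link it with `ExampleModule(globalThis, {}, new ArrayBuffer(0x10000))` to get the exported functions. `add(2147483647, 1)` then returns -2147483648, because the `|0` coercion gives 32-bit two's-complement wraparound.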

Status

It looks like we're getting the expected performance, even without the optimizations listed below. We're seeing 2x to 6x speedups over trunk and V8 on large Emscripten apps. It's time to talk to other engines and work toward something landable (see the tasks below).

The code is currently in https://hg.mozilla.org/users/lwagner_mozilla.com/odinmonkey and works on x64-unix; other platforms are in progress.

Tasks before initial landing

  • Tier 1 platform support (remaining):
    • x64 (Windows)
    • x86 (Windows, Unix)
    • ARM
  • Fuzz (using decoder's mutation-based fuzzer, seeded with existing asm.js programs)
  • Make sure SPS doesn't crash; summarize all asm.js calls as a single frame.
  • Don't crash if someone neuters the ArrayBuffer
  • Audit Odin code for OOM-safety.
  • Final polish on error messages
  • Unbreak IonSpew
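Neutering (the spec now calls it detaching) happens when an ArrayBuffer's contents are transferred elsewhere. The classic case is a `postMessage` transfer list sent to a worker. The sketch below shows what that looks like from script. It assumes an engine that provides `structuredClone` with a `transfer` option (modern browsers, Node 17+), which postdates this page. After the transfer, every existing view over the buffer reports length 0. Compiled code that baked in the heap's base and length therefore has to handle this case. The function name is hypothetical.

```javascript
// Sketch: detaching ("neutering") an ArrayBuffer by transferring it.
function detachDemo() {
  const buffer = new ArrayBuffer(16);
  const heap = new Int32Array(buffer);
  heap[0] = 42;

  const before = { byteLength: buffer.byteLength, length: heap.length, first: heap[0] };

  // Transfer ownership: `moved` now owns the 16 bytes, `buffer` is detached.
  const moved = structuredClone(buffer, { transfer: [buffer] });

  // Views over a detached buffer have length 0 and read as undefined.
  const after = { byteLength: buffer.byteLength, length: heap.length, first: heap[0] };
  return { before, after, movedLength: moved.byteLength };
}
```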

Further work (after initial landing)

Optimizations, roughly in priority order:

  1. Optimize asm.js-to-Ion transition with custom-generated exit stub
  2. Investigate why we spend so much time (15%) in ion::ThunkToInterpreter and ion::Bail under FFI calls; these should be trivial functions.
  3. Add DataView to asm.js to allow us to avoid masking on loads/stores
    • Measure how much it'll win us (so far, 20% on zlib)
    • Signal-handler support
    • Investigate this 32-bit/64-bit register slicing issue
  4. Avoid slow-script and stack overflow checks with signal handler support
  5. Add a float32 type to asm.js to allow 32-bit float arithmetic
  6. Optimize idiv (using signal handler for FP exception)
  7. Optimize double-to-int conversion (using signal handler for FP exception)
  8. Full GVN/range analysis support for all the new asm.js IM MIR nodes
  9. ToInt shouldn't generate so much code
  10. Ensure we're emitting the smallest ModR/M encoding for lea/loads/stores on x86 (specifically, when the displacement is 0).
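Items 6, 7, and 9 arise because JavaScript's integer semantics differ from what the raw x86 instructions do. Signed division in asm.js defines a result for a zero divisor and for INT32_MIN / -1, but `idiv` faults in both cases (#DE, delivered as SIGFPE on Unix). ToInt32 wraps modulo 2^32. `cvttsd2si`, by contrast, returns the "integer indefinite" value 0x80000000 for NaN and out-of-range inputs. The plain-JavaScript sketch below models the mismatch. The function names are hypothetical, and `cvttsd2si32` only simulates the hardware. The fast-path-plus-fallback shape is one common approach and is not necessarily what Odin emits.

```javascript
const INT32_MIN = -2147483648;
const TWO_32 = 4294967296;

// ECMAScript ToInt32 written out: truncate toward zero, then wrap mod 2^32.
function toInt32(x) {
  if (!Number.isFinite(x)) return 0; // NaN and +/-Infinity map to 0
  const wrapped = ((Math.trunc(x) % TWO_32) + TWO_32) % TWO_32; // in [0, 2^32)
  return wrapped >= 2147483648 ? wrapped - TWO_32 : wrapped;
}

// Simulated x86 cvttsd2si with a 32-bit destination: NaN and out-of-range
// inputs yield the indefinite value 0x80000000 instead of wrapping.
function cvttsd2si32(x) {
  const t = Math.trunc(x);
  if (Number.isNaN(t) || t < INT32_MIN || t > 2147483647) return INT32_MIN;
  return t + 0; // normalize -0 to +0, as an integer register would
}

// Trust the hardware result unless it is the indefinite value; only then
// redo the conversion the slow, exact way.
function truncateDoubleToInt32(x) {
  const r = cvttsd2si32(x);
  return r !== INT32_MIN ? r : toInt32(x);
}

// asm.js signed division. JS defines 1/0 -> 0 and INT32_MIN / -1 -> INT32_MIN
// here, both of which would trap in a bare idiv.
function asmSignedDiv(a, b) {
  a = a | 0;
  b = b | 0;
  return ((a | 0) / (b | 0)) | 0;
}
```

Because the special cases are rare, the fallback branch is cold. That leaves the signal-handler ideas in items 6 and 7 as a way to drop the explicit checks entirely.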
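Item 10 concerns the addressing forms of the x86 ModR/M byte. A `[base + disp]` operand can be encoded with no displacement, an 8-bit displacement, or a 32-bit one, and picking the smallest legal form saves bytes. The helper below is a hypothetical sketch of that choice; its name and return shape are invented for illustration. It follows the standard encoding rules, including two special cases. A base of rbp/r13 cannot use the no-displacement form. A base of rsp/r12 always needs a SIB byte.

```javascript
// Hypothetical helper: choose the ModR/M form for a [base + disp] operand.
// Register numbers: rax=0, rcx=1, rdx=2, rbx=3, rsp=4, rbp=5, rsi=6, rdi=7,
// r8..r15 = 8..15 (only the low 3 bits go in the r/m field).
function memOperandEncoding(baseReg, disp) {
  const low3 = baseReg & 7;
  // r/m = 100 (rsp/r12) means "a SIB byte follows".
  const needsSib = low3 === 4;
  let mod;
  let dispBytes;
  if (disp === 0 && low3 !== 5) {
    // mod=00: no displacement. Unavailable for rbp/r13 (r/m = 101),
    // because mod=00 with r/m=101 means disp32 / RIP-relative instead.
    mod = 0;
    dispBytes = 0;
  } else if (disp >= -128 && disp <= 127) {
    mod = 1; // mod=01: sign-extended 8-bit displacement
    dispBytes = 1;
  } else {
    mod = 2; // mod=10: 32-bit displacement
    dispBytes = 4;
  }
  // Size of ModR/M + optional SIB + displacement, excluding opcode/prefixes.
  return { mod, needsSib, size: 1 + (needsSib ? 1 : 0) + dispBytes };
}
```

For example, `[rax]` needs only the ModR/M byte. `[rbp]` has to be emitted as `[rbp + 0]` with a one-byte displacement.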

Other work items:

  • FunctionBlob (TODO: link to proposal)
    • Efficient transfer between workers
    • Efficient IndexedDB serialization/deserialization
  • Add a mapping from return pc -> function information
    • Use to provide profile information to SPS without dynamic instrumentation.
    • Make asm.js calls show up in backtraces (Debugger, StackIter)
  • Create automatic instrumentation so we can compare, at the basic-block level, how many instructions are executed in both GCC/LLVM-compiled C++ and Odin-compiled asm.js. This should point us directly to our worst codegen pain points.
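One way to build the return pc -> function information mapping is a table of code ranges sorted by start address and searched with binary search. The sketch below uses hypothetical names (`CodeRangeTable`, `begin`/`end`/`name` fields) and plain numbers in place of machine addresses. It is not an existing SpiderMonkey API.

```javascript
// Hypothetical sketch: map a return address to the asm.js function whose
// compiled code contains it.
class CodeRangeTable {
  constructor(ranges) {
    // Each range: { begin, end, name }, begin inclusive and end exclusive.
    // Ranges are assumed not to overlap.
    this.ranges = [...ranges].sort((a, b) => a.begin - b.begin);
  }

  lookup(pc) {
    let lo = 0;
    let hi = this.ranges.length - 1;
    while (lo <= hi) {
      const mid = (lo + hi) >> 1;
      const r = this.ranges[mid];
      if (pc < r.begin) hi = mid - 1;
      else if (pc >= r.end) lo = mid + 1;
      else return r;
    }
    return null; // pc is not inside any known asm.js function
  }
}
```

A sampling profiler could walk the return addresses on the stack and call `lookup` on each. With such a table, profiling needs no instrumentation of the compiled code.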

General IonMonkey optimizations

  • Optimize control flow to minimize jumping (we tend to do a lot worse than GCC here since we don't even try to optimize this)
    • Rearrange loops to put the condition at the end
    • Fold jump-to-jump chains into a single jump
    • Reorder blocks to replace jumps with fall-through.
  • Align loop headers on natural boundaries (GCC does this, and it's a well-known optimization guideline)
  • Align ExecutableAllocator allocations to a 16-byte boundary.
  • Use 32-bit register encodings on x64 when the MIRType is int32
  • Use pc-relative constant double loads instead of trying to use immediates (GCC does this; we need to measure the performance impact)
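Folding jump-to-jump chains (jump threading) is a small CFG rewrite. Below is a minimal sketch over a hypothetical block representation, not IonMonkey's MIR/LIR classes. `body` holds non-terminator instructions, and `succs` holds jump targets: one for an unconditional jump, two for a conditional branch. A block with an empty body and one successor is just a trampoline, so any branch into it can be pointed at wherever the chain ends. A later dead-block pass, omitted here, would remove the unreferenced trampolines.

```javascript
// Hypothetical sketch: retarget branches past empty "jump-only" blocks.
function foldJumpChains(blocks) {
  const byId = new Map(blocks.map((b) => [b.id, b]));
  const isTrampoline = (b) => b.body.length === 0 && b.succs.length === 1;

  // Follow a chain of trampolines to the first block that does real work.
  // `seen` stops an infinite walk around a cycle of empty blocks.
  function finalTarget(id) {
    const seen = new Set();
    let cur = byId.get(id);
    while (isTrampoline(cur) && !seen.has(cur.id)) {
      seen.add(cur.id);
      cur = byId.get(cur.succs[0]);
    }
    return cur.id;
  }

  for (const b of blocks) {
    b.succs = b.succs.map(finalTarget);
  }
  return blocks;
}
```

In a chain such as `entry -> t1 -> t2 -> loop`, where `t1` and `t2` are empty, `entry` ends up branching directly to `loop`. This removes two jumps from that path.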
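Both alignment items reduce to rounding an offset up to a power-of-two boundary. Here is a minimal sketch with hypothetical helper names. How the padding is filled (for example, with multi-byte NOPs) is an assumption left to the assembler. Whether 16 bytes is the right boundary for loop headers is also left to measurement.

```javascript
// Round `n` up to the next multiple of `alignment`, which must be a power
// of two. Uses 32-bit bitwise ops, so offsets must stay below 2^31.
function alignUp(n, alignment) {
  if (alignment <= 0 || (alignment & (alignment - 1)) !== 0) {
    throw new RangeError("alignment must be a power of two");
  }
  return (n + alignment - 1) & ~(alignment - 1);
}

// Padding bytes to emit before a loop header at `offset` so that the
// header starts on an `alignment`-byte boundary.
function loopHeaderPadding(offset, alignment = 16) {
  return alignUp(offset, alignment) - offset;
}
```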