Javascript:SpiderMonkey:OdinMonkey

Revision as of 00:54, 1 February 2013

== Goal ==

Provide an optimized implementation of the (co-evolving) asm.js spec which achieves near-native performance (within 2x of -O2) on JS generated from C/C++ (with the pilot code generator being Emscripten).
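To make the target concrete, here is a minimal hand-written asm.js module in the shape Emscripten emits (the module and function names are illustrative, not from the codebase): a "use asm" prologue plus `|0` coercions that let a validator type-check the code ahead of time, while the same source still runs as ordinary JS in any engine.

```javascript
// Minimal sketch of an asm.js module (names are illustrative).
function AsmModule(stdlib, foreign, heap) {
  "use asm";
  function add(x, y) {
    x = x | 0;           // parameter annotation: int
    y = y | 0;
    return (x + y) | 0;  // return annotation: int
  }
  return { add: add };
}

// Link and call it as plain JS; the heap must be a power-of-two size.
var m = AsmModule(globalThis, {}, new ArrayBuffer(0x10000));
console.log(m.add(2, 3)); // 5
```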

== Status ==

It looks like we're getting the expected performance, even without the optimizations in the section below: we're seeing anywhere from 2x to 6x improvements against trunk/v8 on large Emscripten apps. It's time to talk to other engines and work toward something landable (tasks below).

The code is currently in https://hg.mozilla.org/users/lwagner_mozilla.com/odinmonkey and works on x64-unix; other platforms are in progress.

== Tasks before initial landing ==

  • Tier 1 platform support (remaining):
    • x64 (Windows)
    • x86 (Windows, Unix)
    • ARM
  • Safety
    • Add slow-script and stack-overflow checks (to be removed later).
    • Fix that pesky x64 call-out-of-range TODO by using a single linear allocation.
    • Don't crash if someone neuters the ArrayBuffer
  • Quality
    • Fuzz (with decoder's mutation-based fuzzer based on existing asm.js programs)
  • Engine integration:
    • Error.stack should include asm.js frames
    • Get initial SPS profiler support (summarizing all asm.js code as a single frame)
  • Audit Odin code for OOM-safety.
  • Final polish on error messages
  • Unbreak IonSpew
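The ArrayBuffer-neutering bullet above can be illustrated in plain JS: transferring a buffer detaches it, so an asm.js heap can drop to zero length while compiled code still holds pointers into it. A small sketch (variable names are made up; structuredClone's transfer list is used here only to force detachment):

```javascript
// Sketch of the "neutered ArrayBuffer" hazard: once the buffer is
// transferred, its backing store is gone and all views on it go empty.
// The engine must fail safely here rather than crash.
const heapBuf = new ArrayBuffer(0x10000);
const HEAP32 = new Int32Array(heapBuf);
HEAP32[0] = 7;

structuredClone(heapBuf, { transfer: [heapBuf] }); // detaches heapBuf

console.log(heapBuf.byteLength); // 0: backing store is gone
console.log(HEAP32.length);      // 0: views on it are now empty
```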

== Further work (after initial landing) ==

Optimizations, roughly in priority order:

  1. Optimize asm.js-to-Ion transition with custom-generated exit stub
  2. Investigate why we spend so much time (15%) in ion::ThunkToInterpreter and ion::Bail under FFI calls; these should be trivial functions.
  3. Add DataView to asm.js to allow us to avoid masking on loads/stores
    • Measure how much it'll win us (so far, 20% on zlib)
    • Signal-handler support
    • Investigate this 32-bit/64-bit register slicing issue
  4. Avoid slow-script and stack overflow checks with signal handler support
  5. Add a float32 type to asm.js to allow 32-bit float arithmetic
  6. Optimize idiv (using signal handler for FP exception)
  7. Optimize double-to-int conversion (using signal handler for FP exception)
  8. Full GVN/range analysis support for all the new asm.js IM MIR nodes
  9. ToInt shouldn't generate so much code
  10. Ensure we're emitting the smallest mod/rm encoding for lea/loads/stores on x86 (viz., if displacement is 0).
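For item 3, the masking in question can be sketched in plain JS: with a power-of-two heap, every byte offset can be clamped with a single AND instead of a branchy bounds check. The helper below is illustrative only, not OdinMonkey's actual codegen:

```javascript
// Sketch: a load like HEAP32[p >> 2] must not read outside the heap,
// so without hardware (signal-handler) protection the engine masks or
// bounds-checks the index. Here the mask keeps the byte offset inside
// a 64 KiB (power-of-two) heap.
var heap = new Int32Array(new ArrayBuffer(0x10000));
function load(p) {
  return heap[((p & 0xFFFF) >> 2) | 0] | 0;
}

heap[1] = 42;
console.log(load(4));        // 42: byte offset 4 -> index 1
console.log(load(0x10004));  // 42 too: out-of-range offset wraps via the mask
```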

Other work items:

  • FunctionBlob (TODO: link to proposal)
    • Efficient transfer between workers
    • Efficient IndexedDB serialization/deserialization
  • Get per-function SPS profile information.
  • Create automatic instrumentation so we can compare, at the basic-block level, how many instructions are executed in both GCC/LLVM-compiled C++ and Odin-compiled asm.js. This should point us directly to our worst codegen pain points.

== General IonMonkey optimizations ==

  • Optimize control flow to minimize jumping (we tend to do a lot worse than GCC here since we don't even try to optimize this)
    • Rearrange loops to put the condition at the end
    • Fold jump-to-jump into single jump
    • Reorder blocks to replace jumps with fall-through.
  • Use 32-bit register encoding on x64 for when the MIRType is int32
  • Use pc-relative constant double loads instead of trying to use immediates (GCC does, need to measure perf)
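The loop-rotation idea in the first bullet ("put the condition at the end") can be seen at the source level: the two functions below compute the same sum, but the rotated form tests the condition once up front and then only at the bottom of the loop, which maps to a single backward branch per iteration instead of a top-of-loop test plus a jump. Both functions are illustrative sketches:

```javascript
// Naive form: condition tested at the top of every iteration.
function sumWhile(n) {
  n = n | 0;
  var i = 0, s = 0;
  while ((i | 0) < (n | 0)) { s = (s + i) | 0; i = (i + 1) | 0; }
  return s | 0;
}

// Rotated form: one guard, then a do-while with the test at the bottom.
function sumRotated(n) {
  n = n | 0;
  var i = 0, s = 0;
  if ((i | 0) < (n | 0)) {
    do { s = (s + i) | 0; i = (i + 1) | 0; } while ((i | 0) < (n | 0));
  }
  return s | 0;
}

console.log(sumWhile(5), sumRotated(5)); // 10 10
```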