Javascript:SpiderMonkey:OdinMonkey

From MozillaWiki
Revision as of 03:37, 26 February 2013

Goal

Provide an optimized implementation of the (co-evolving) asm.js spec that achieves near-native performance (within 2x of the same C/C++ compiled natively with -O2) on JS generated from C/C++, with Emscripten as the pilot code generator.

Tasks before initial landing

The code is currently on https://hg.mozilla.org/users/lwagner_mozilla.com/odinmonkey.

  • See bug 840282.
  • Also:
    • Unbreak IonSpew
    • Final polish on error messages (name the types and numbers involved in the failure)

Further work (after initial landing)

Optimizations, roughly in priority order:

  1. Optimize asm.js-to-Ion transition with custom-generated exit stub
  2. Investigate why we spend so much time (15%) in ion::ThunkToInterpreter and ion::Bail under FFI calls; these should be trivial functions.
  3. Avoid stack overflow checks with signal handler support
  4. Add float32 and uint64 types (see the value objects proposal: http://wiki.ecmascript.org/doku.php?id=strawman:value_objects)
  5. Optimize idiv (using signal handler for FP exception)
  6. Optimize double-to-int conversion (using signal handler for FP exception)
  7. Full GVN/range analysis support for all the new asm.js IM MIR nodes
  8. ToInt shouldn't generate so much code
  9. Ensure we're emitting the smallest mod/rm encoding for lea/loads/stores on x86 (viz., if displacement is 0).
  10. ARM: use hardfp internally
  11. Consider re-enabling effective-address folding on x64.
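
The idiv and double-to-int items both depend on reproducing asm.js results in the cases where x86 hardware faults or returns a sentinel: idiv raises a divide error for division by zero and for INT32_MIN / -1, and cvttsd2si returns 0x80000000 for NaN or out-of-range inputs. Below is a minimal C++ sketch of the results the generated code must produce, assuming the asm.js coercions ((a|0)/(b|0))|0, ((a|0)%(b|0))|0 and double-to-int all reduce to ECMAScript ToInt32. The function names are hypothetical, not SpiderMonkey's. A signal-handler-based fast path that drops the explicit checks would still have to yield these values.

```cpp
#include <cmath>
#include <cstdint>
#include <limits>

// ECMAScript ToInt32: NaN/Infinity -> 0; otherwise truncate toward zero and
// wrap modulo 2^32 into [-2^31, 2^31).
int32_t ToInt32(double d) {
    if (!std::isfinite(d)) return 0;
    const double kTwo32 = 4294967296.0;
    double m = std::fmod(std::trunc(d), kTwo32);  // exact; sign follows d
    if (m < 0) m += kTwo32;                        // now in [0, 2^32)
    uint32_t u = static_cast<uint32_t>(m);
    // Reinterpret as two's complement without an implementation-defined cast.
    return u >= 0x80000000u
               ? static_cast<int32_t>(static_cast<int64_t>(u) - 4294967296LL)
               : static_cast<int32_t>(u);
}

// Cheap path for the common case: if the value truncates into int32 range, a
// single truncating conversion (cvttsd2si on x86) is exact. NaN fails both
// comparisons and takes the slow path.
int32_t ToInt32Fast(double d) {
    if (d > -2147483649.0 && d < 2147483648.0) return static_cast<int32_t>(d);
    return ToInt32(d);
}

// Result of ((a|0) / (b|0)) | 0: the double quotient passed through ToInt32.
// The two early returns are exactly the inputs on which x86 idiv faults.
int32_t AsmSignedDiv(int32_t a, int32_t b) {
    if (b == 0) return 0;  // a/0 is +/-Infinity or NaN; ToInt32 maps both to 0
    if (a == std::numeric_limits<int32_t>::min() && b == -1)
        return a;          // 2^31 wraps around to INT32_MIN
    return a / b;          // C++ integer division also truncates toward zero
}

// Result of ((a|0) % (b|0)) | 0.
int32_t AsmSignedMod(int32_t a, int32_t b) {
    if (b == 0) return 0;   // a % 0 is NaN -> 0
    if (b == -1) return 0;  // always 0 or -0; also sidesteps INT32_MIN % -1
    return a % b;           // sign follows the dividend, as in JS
}
```

ToInt32Fast shows the usual split between a one-instruction in-range path and a rare out-of-line slow path; how much code the slow path costs is what the ToInt item above is about.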

Other work items:

  • FunctionBlob (TODO: link to proposal)
    • Efficient transfer between workers
    • Efficient IndexedDB serialization/deserialization
  • Add a mapping from return pc -> function information
    • Use it to provide profile information to SPS without dynamic instrumentation.
    • Make asm.js calls show up in backtraces (Debugger, StackIter)
  • Create automatic instrumentation so we can compare, at the basic-block level, how many instructions are executed in both GCC/LLVM-compiled C++ and Odin-compiled asm.js. This should point us directly to our worst codegen pain points.
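
The return-pc mapping item amounts to finding the function whose machine code contains a given address. Below is a minimal C++ sketch, assuming each compiled function occupies one contiguous, non-overlapping code range. CodeRange, CodeRangeTable and their members are hypothetical names, not OdinMonkey types. Sorting once and binary-searching keeps each lookup O(log n), which would matter if a sampling profiler queried it on every sample.

```cpp
#include <algorithm>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical description of one compiled function's machine code.
struct CodeRange {
    uintptr_t begin;   // address of the first instruction
    uintptr_t end;     // one past the last byte
    std::string name;  // what a profiler or backtrace would display
};

// Hypothetical pc -> function table. Assumes ranges do not overlap.
class CodeRangeTable {
  public:
    explicit CodeRangeTable(std::vector<CodeRange> ranges) : ranges_(std::move(ranges)) {
        std::sort(ranges_.begin(), ranges_.end(),
                  [](const CodeRange& a, const CodeRange& b) { return a.begin < b.begin; });
    }

    // Function whose code contains pc, or nullptr if pc is not in any range.
    const CodeRange* lookup(uintptr_t pc) const {
        // First range starting after pc; only its predecessor can contain pc.
        auto it = std::upper_bound(
            ranges_.begin(), ranges_.end(), pc,
            [](uintptr_t p, const CodeRange& r) { return p < r.begin; });
        if (it == ranges_.begin()) return nullptr;
        --it;
        return pc < it->end ? &*it : nullptr;
    }

    // A return address points just past the call instruction, which can equal
    // `end` when the call is the caller's last instruction; look up ret - 1.
    const CodeRange* lookupReturnAddress(uintptr_t ret) const {
        return ret == 0 ? nullptr : lookup(ret - 1);
    }

  private:
    std::vector<CodeRange> ranges_;  // sorted by begin
};
```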

General IonMonkey optimizations

  • Optimize control flow to minimize jumping (we tend to do a lot worse than GCC here since we don't even try to optimize this)
    • Rearrange loops to put the condition at the end
    • Fold jump-to-jump into single jump
    • Reorder blocks to replace jumps with fall-through.
  • Align loop headers on natural boundaries (GCC does this, and it's also a well-known suggestion)
  • Align ExecutableAllocator allocations to a 16-byte boundary.
  • Use the 32-bit register encoding on x64 when the MIRType is int32
  • Use pc-relative constant double loads instead of trying to use immediates (GCC does this; the perf impact still needs to be measured)
  • Soup up EffectiveAddressAnalysis to handle (see TODO)
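
The "fold jump-to-jump into single jump" item is a standard CFG cleanup. Below is a minimal C++ sketch on a hypothetical, simplified block representation (Block, jumpOnly and successors are illustrative names, not IonMonkey's MIR/LIR classes): every edge that targets a block consisting only of an unconditional jump is retargeted at the end of that chain.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical, simplified basic block.
struct Block {
    bool jumpOnly;                   // body is a single unconditional jump
    std::vector<size_t> successors;  // indices into the block array
};

// Follows a chain of jump-only blocks starting at `target` and returns where
// it ends. The step bound guarantees termination when jump-only blocks form a
// cycle (e.g. an empty infinite loop).
size_t FinalTarget(const std::vector<Block>& blocks, size_t target) {
    for (size_t steps = 0; steps < blocks.size(); ++steps) {
        const Block& b = blocks[target];
        if (!b.jumpOnly || b.successors.size() != 1) break;
        target = b.successors[0];
    }
    return target;
}

// Retargets every edge past intermediate jump-only blocks, so a jump to a
// jump becomes a single jump.
void FoldJumpToJump(std::vector<Block>& blocks) {
    for (Block& b : blocks)
        for (size_t& succ : b.successors)
            succ = FinalTarget(blocks, succ);
}
```

After folding, the jump-only blocks may become unreachable and could be deleted; reordering blocks so that the remaining jumps become fall-throughs would be a separate pass.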