Javascript:SpiderMonkey:OdinMonkey: Difference between revisions

Revision as of 21:35, 27 February 2013

Goal

Provide an optimized implementation of the (co-evolving) asm.js spec that achieves near-native performance (within 2x of native code compiled with -O2) on JS generated from C/C++, with Emscripten as the pilot code generator.

Tasks before initial landing

The code is currently on https://hg.mozilla.org/users/lwagner_mozilla.com/odinmonkey.

See bug 840282.
Also:
- Unbreak IonSpew
- Final polish on error messages (name the types/numbers involved in failure)

Further work (after initial landing)

Fit in better with the browser:

Per-function profile information
Make asm.js calls show up in backtraces (Debugger, StackIter)
Better about:memory reporting
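Per-function profiling and backtraces both need a way to map a machine pc (for example, a return address found on the stack) back to the asm.js function that contains it. An earlier revision of this page listed exactly that item ("Add a mapping from return pc -> function information"). Below is a minimal C++ sketch that assumes each compiled function occupies one contiguous, non-overlapping code range. `FunctionCodeRange` and `CodeRangeTable` are hypothetical names, not existing SpiderMonkey classes.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical per-function record produced at compile time: the half-open
// code range [begin, end) of one compiled asm.js function plus its name.
struct FunctionCodeRange {
    uintptr_t begin;
    uintptr_t end;
    std::string name;
};

// Hypothetical lookup table. After finish(), ranges are sorted by 'begin' and
// (by assumption) do not overlap, so binary search finds the owner of a pc.
class CodeRangeTable {
  public:
    void add(uintptr_t begin, uintptr_t end, std::string name) {
        ranges_.push_back({begin, end, std::move(name)});
    }

    void finish() {
        std::sort(ranges_.begin(), ranges_.end(),
                  [](const FunctionCodeRange& a, const FunctionCodeRange& b) {
                      return a.begin < b.begin;
                  });
    }

    // Returns the function whose code contains 'pc', or nullptr if 'pc' lies
    // outside every recorded range.
    const FunctionCodeRange* lookup(uintptr_t pc) const {
        // First range that starts strictly after pc...
        auto it = std::upper_bound(ranges_.begin(), ranges_.end(), pc,
                                   [](uintptr_t p, const FunctionCodeRange& r) {
                                       return p < r.begin;
                                   });
        if (it == ranges_.begin())
            return nullptr;
        --it;  // ...so the range just before it is the only candidate.
        return pc < it->end ? &*it : nullptr;
    }

  private:
    std::vector<FunctionCodeRange> ranges_;
};
```

A sorted array with binary search gives O(log n) lookups without any per-call instrumentation. That property is what makes it attractive for a sampling profiler such as SPS.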

Optimizations:

Optimize asm.js-to-Ion transition with custom-generated exit stub
Investigate why we spend so much time (15%) in ion::ThunkToInterpreter and ion::Bail under FFI calls; these should be trivial functions.
Avoid stack overflow checks with signal handler support
Optimize idiv (using signal handler for FP exception)
Optimize double-to-int conversion (using signal handler for FP exception)
Full GVN/range analysis support for all the new asm.js IM MIR nodes
ToInt shouldn't generate so much code
Ensure we're emitting the smallest mod/rm encoding for lea/loads/stores on x86 (viz., if displacement is 0).
ARM: use hardfp internally
Consider re-enabling effective-address folding on x64.
Don't spill non-volatile registers at calls out to C++
Create automatic instrumentation so we can compare, at the basic-block level, how many instructions are executed in both GCC/LLVM-compiled C++ and Odin-compiled asm.js. This should point us directly to our worst codegen pain points.
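The idiv, double-to-int and ToInt items share a root cause. JS semantics define a result for inputs where the x86 instruction either faults or returns a sentinel. `idiv` raises a divide error on a zero divisor and on INT32_MIN / -1. `cvttsd2si` returns 0x80000000 for NaN and out-of-range inputs. As a result, the generated code needs explicit guards. The C++ sketch below spells out the results that asm.js expressions such as `(x|0)/(y|0)|0`, `(x|0)%(y|0)|0` and `~~d` must produce. It illustrates the semantics that any guard or signal-handler scheme has to preserve, not Odin's actual code generator. The function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <limits>

// ECMAScript ToInt32: truncate toward zero, reduce modulo 2^32, and
// reinterpret as signed 32-bit. NaN and +/-Infinity map to 0.
int32_t ToInt32(double d) {
    // Fast path: in range, plain truncation is exact. This is the case a
    // single cvttsd2si handles; NaN fails both comparisons.
    if (d >= -2147483648.0 && d < 2147483648.0)
        return static_cast<int32_t>(d);
    if (!std::isfinite(d))
        return 0;
    // Slow path: the rare case the TODO items want out of the hot code.
    double m = std::fmod(std::trunc(d), 4294967296.0);  // sign follows d
    if (m < 0)
        m += 4294967296.0;                               // now in [0, 2^32)
    uint32_t u = static_cast<uint32_t>(m);
    // Reinterpret as signed without relying on implementation-defined casts.
    return u <= 2147483647u ? static_cast<int32_t>(u)
                            : static_cast<int32_t>(static_cast<int64_t>(u) - 4294967296LL);
}

// Signed asm.js division: the JS quotient is a double that ToInt32 wraps,
// so the two inputs that make idiv fault need explicit answers.
int32_t AsmJSDivI32(int32_t x, int32_t y) {
    if (y == 0)
        return 0;  // x/0 is +/-Infinity or NaN, and ToInt32 maps both to 0
    if (x == std::numeric_limits<int32_t>::min() && y == -1)
        return x;  // 2^31 wraps around to -2^31
    return x / y;  // C++ also truncates toward zero
}

// Signed asm.js remainder.
int32_t AsmJSModI32(int32_t x, int32_t y) {
    if (y == 0)
        return 0;  // x%0 is NaN, which maps to 0
    if (x == std::numeric_limits<int32_t>::min() && y == -1)
        return 0;  // JS gives -0, which maps to 0
    return x % y;  // sign follows the dividend, as in JS
}
```

The branches on `y == 0` and on INT32_MIN / -1 are what a signal-handler approach would try to remove from the common path.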

Extensions to DOM/JavaScript that would help asm.js:

FunctionBlob (TODO: link to proposal)
- Efficient transfer between workers
- Efficient IndexedDB serialization/deserialization
- Browser-wide code caching via postMessage
Add float32 and uint64 (see the value objects proposal)
Add SIMD support using BinaryData objects with value semantics
Add ArrayBuffer.swap to allow a single linked asm.js module to work with many hunks of data over time.
Add ArrayBuffer.resize to allow growable heap (sbrk).
To allow asm.js generation from pthread code, allow a single ArrayBuffer to be shared by two or more Workers and add necessary synchronization primitives. Because threads+locks are such a great paradigm for concurrency.
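The ArrayBuffer.resize item would let a compiled program grow its heap on demand instead of reserving its maximum size up front. Below is a minimal C++ model of the sbrk contract that such a resize would support. It assumes a hypothetical `GrowableHeap` class whose break starts at 0 and whose storage grows geometrically up to a fixed limit. None of this is an existing ArrayBuffer API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model of a growable asm.js heap. ArrayBuffer.resize is only a
// proposal here, so this just illustrates the sbrk contract it would back.
class GrowableHeap {
  public:
    GrowableHeap(size_t initialBytes, size_t maxBytes)
        : bytes_(initialBytes), brk_(0), max_(maxBytes) {}

    // sbrk-style: move the break by 'increment' bytes and return the old
    // break, or -1 if the request cannot be satisfied.
    intptr_t sbrk(intptr_t increment) {
        intptr_t oldBrk = static_cast<intptr_t>(brk_);
        intptr_t newBrk = oldBrk + increment;
        if (newBrk < 0 || static_cast<size_t>(newBrk) > max_)
            return -1;
        if (static_cast<size_t>(newBrk) > bytes_.size()) {
            // Grow geometrically (capped at max_) so that many small sbrk
            // calls do not each trigger a resize.
            size_t newSize = bytes_.size() ? bytes_.size() : 4096;
            while (newSize < static_cast<size_t>(newBrk))
                newSize *= 2;
            if (newSize > max_)
                newSize = max_;
            bytes_.resize(newSize);  // may move storage: re-derive pointers
        }
        brk_ = static_cast<size_t>(newBrk);
        return oldBrk;
    }

    size_t capacity() const { return bytes_.size(); }
    uint8_t* data() { return bytes_.data(); }

  private:
    std::vector<uint8_t> bytes_;
    size_t brk_;
    size_t max_;
};
```

The sketch also surfaces a consequence of resizing: growth may move the storage, so any pointers cached into the old storage must be re-derived afterwards.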

General IonMonkey optimizations

Optimize control flow to minimize jumping (we tend to do a lot worse than GCC here since we don't even try to optimize this)
- Rearrange loops to put the condition at the end
- Fold jump-to-jump into single jump
- Reorder blocks to replace jumps with fall-through.
Align loop headers on natural boundaries (GCC does this, and it's also a well-known suggestion)
Align ExecutableAllocator allocations to a 16-byte boundary.
Use 32-bit register encoding on x64 when the MIRType is int32
Use pc-relative constant double loads instead of trying to use immediates (GCC does this; need to measure the performance impact)
Soup up EffectiveAddressAnalysis to handle (see TODO)
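Folding jump-to-jump is the simplest of the control-flow items to illustrate. The minimal C++ sketch below works over a hypothetical, simplified CFG, not IonMonkey's MIR/LIR classes. Every edge that targets an empty block ending in an unconditional jump is retargeted to the first non-empty block in the chain. A guard leaves cycles of empty blocks intact.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified CFG node. One successor means an unconditional
// jump, two mean a conditional branch, zero means a return. 'empty' means the
// block has no instructions besides its terminator.
struct Block {
    bool empty = false;
    std::vector<size_t> succs;
};

// Follows a chain of empty jump-only blocks starting at 'b' to its final
// destination. The step limit detects cycles of empty blocks (for example an
// empty infinite loop); in that case the original target is kept.
size_t FinalTarget(const std::vector<Block>& blocks, size_t b) {
    size_t cur = b;
    for (size_t steps = 0; steps < blocks.size(); steps++) {
        const Block& blk = blocks[cur];
        if (!blk.empty || blk.succs.size() != 1)
            return cur;
        cur = blk.succs[0];
    }
    return b;
}

// Fold jump-to-jump: retarget every edge past empty trampoline blocks.
void FoldJumpToJump(std::vector<Block>& blocks) {
    for (Block& blk : blocks)
        for (size_t& s : blk.succs)
            s = FinalTarget(blocks, s);
}
```

After folding, the trampoline blocks typically become unreachable and can be deleted. Reordering blocks for fall-through would then operate on the simplified graph.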

@@ Line 14: / Line 14: @@
 == Further work (after initial landing) ==
-Optimizations, roughly in priority order:
+Fit in better with the browser:
-# Optimize asm.js-to-Ion transition with custom-generated exit stub
+* Per-function profile information
-# Investigate why we spend so much time (15%) in ion::ThunkToInterpreter and ion::Bail under FFI calls; these should be trivial functions.
+* Make asm.js calls show up in backtraces (Debugger, StackIter)
-# Avoid stack overflow checks with signal handler support
+* Better about:memory reporting
-# Add float32 and uint64 [http://wiki.ecmascript.org/doku.php?id=strawman:value_objects see value objects proposal]
-# Optimize idiv (using signal handler for FP exception)
-# Optimize double-to-int conversion (using signal handler for FP exception)
-# Full GVN/range analysis support for all the new asm.js IM MIR nodes
-# ToInt shouldn't generate so much code
-# Ensure we're emitting the smallest mod/rm encoding for lea/loads/stores on x86 (viz., if displacement is 0).
-# ARM: use hardfp internally
-# Consider re-enabling effective-address folding on x64.
-# Don't spill non-volatile registers at calls out to C++
-Other work items:
+Optimizations:
+* Optimize asm.js-to-Ion transition with custom-generated exit stub
+* Investigate why we spend so much time (15%) in ion::ThunkToInterpreter and ion::Bail under FFI calls; these should be trivial functions.
+* Avoid stack overflow checks with signal handler support
+* Optimize idiv (using signal handler for FP exception)
+* Optimize double-to-int conversion (using signal handler for FP exception)
+* Full GVN/range analysis support for all the new asm.js IM MIR nodes
+* ToInt shouldn't generate so much code
+* Ensure we're emitting the smallest mod/rm encoding for lea/loads/stores on x86 (viz., if displacement is 0).
+* ARM: use hardfp internally
+* Consider re-enabling effective-address folding on x64.
+* Don't spill non-volatile registers at calls out to C++
+* Create automatic instrumentation so we can compare, at the basic-block level, how many instructions are executed in both GCC/LLVM-compiled C++ and Odin-compiled asm.js. This should point us directly to our worst codegen pain points.
+Extensions to DOM/JavaScript that would help asm.js:
 * FunctionBlob (TODO: link to proposal)
 ** Efficient transfer between workers
 ** Efficient IndexedDB serialization/deserialization
-* Add a mapping from return pc -> function information
+** Browser-wide code caching via postMessage
-** Use to provide profile information to SPS without dynamic instrumentation.
+* Add float32 and uint64 [http://wiki.ecmascript.org/doku.php?id=strawman:value_objects see value objects proposal]
-** Make asm.js calls show up in backtraces (Debugger, StackIter)
+* Add SIMD support using [http://wiki.ecmascript.org/doku.php?id=harmony:binary_data BinaryData] objects with value semantics
-* Create automatic instrumentation so we can compare, at the basic-block level, how many instructions are executed in both GCC/LLVM-compiled C++ and Odin-compiled asm.js. This should point us directly to our worst codegen pain points.
+* Add ArrayBuffer.swap to allow a single linked asm.js module to work with many hunks of data over time.
+* Add ArrayBuffer.resize to allow growable heap (sbrk).
+* To allow asm.js generation from pthread code, allow a single ArrayBuffer to be shared by two or more Workers and add necessary synchronization primitives.  Because threads+locks are such a great paradigm for concurrency.
 == General IonMonkey optimizations ==
