JaegerMonkey: Difference between revisions

JaegerMonkey (view source)

Revision as of 01:52, 20 April 2010

611 bytes removed, 20 April 2010

→‎Major Optimizations

Dmandelin (313 edits)

@@ Line 29: / Line 29: @@
 |-
 ! scope="col" | Name<br>
-! scope="col" colspan="2" | Est. SS Benefit (ms)<br>
+! colspan="2" scope="col" | Est. SS Benefit (ms)<br>
 ! scope="col" | Est. V8 Benefit<br>
 ! scope="col" | Size (wks)<br>
@@ Line 35: / Line 35: @@
 |-
 | PIC<br>
-| align="right" colspan="2" | 50<br>
+| align="right" colspan="2" | 100<br>
 | align="right" | 3500<br>
 | align="center" | 1
@@ Line 41: / Line 41: @@
 |-
 | Compiler value handling<br>
-| align="right" colspan="2" | 200<br>
+| align="right" colspan="2" | 100<br>
-| align="right" | 2000<br>
+| align="right" | 1000<br>
 | align="center" | 1<br>
 | dvander<br>
@@ Line 57: / Line 57: @@
 | align="center" | 4<br>
 | intern<br>
-|-
-| Trace monitoring<br>
-| align="right" colspan="2" | 200<br>
-| align="right" | 2000<br>
-| align="center" | &lt;1<br>
-| dvander<br>
 |-
 | Regexps<br>
@@ Line 71: / Line 65: @@
 |-
 | New jsvals<br>
-| align="right" colspan="2" | 200<br>
+| align="right" colspan="2" | 300<br>
-| align="right" | 2000<br>
+| align="right" | 3500<br>
 | align="center" | 8<br>
 | lw (+others)<br>
 |-
 | Compiler fast paths<br>
-| align="right" colspan="2" | 200<br>
+| align="right" colspan="2" | 350<br>
-| align="right" | 2000<br>
+| align="right" | 3500<br>
 | align="center" | 4<br>
 | all<br>
@@ Line 90: / Line 84: @@
 *Compiler value handling. Currently, each jsop is compiled to work exactly the way it does in the interpreter: load values from the stack or local slots; do stuff; then store values back to the stack. We are improving the compiler so that it can hold values in machine registers across jsops. It will also avoid moves and stores entirely when possible. "Register allocation" would be a reasonable name for this optimization, but doesn't capture all of it. We found a 10-20% perf improvement in early tests, which is reflected in the table estimates. This is partially done--the 1 week size is to finish.
 *Globals. Globals work as they do in the interpreter right now. It should be possible to get them down in most cases to 1-2 loads per global, plus a shape guard if the global was undeclared. Some initial thoughts on how to do this were posted in the js newsgroup.
-*Scope chain. AKA "closure variable access". This is similar to globals. We don't know how much of this is present in the benchmarks, but it's clearly important for general web perf. It seems to require a major overhaul of the scope chain.
+*Scope chain. AKA "closure variable access". This is similar to globals. We don't know how much of this is present in the benchmarks, but it's clearly important for general web perf. It seems to require a major overhaul of the scope chain.<br>
-*Trace monitoring. Currently, we call out to the trace monitoring function on every loop edge. This optimization means that when we blacklist, we would patch the method so that it doesn't call out any more. We should also not call out when tracing is not enabled. We also might want to do loop edge counts without calling out at the beginning. The 200 ms benefit for SunSpider is based on the fact that our pure JM&nbsp;score went up 200 ms when tracing was combined with JM.
 *Regexps. I&nbsp;believe we don't compile all the regexps in v8. This item means getting a new regexp compiler, or upgrading our current one, so we can compile them all.
 *New jsvals. We are going to a new jsval format. Currently, we are working on a 128-bit format, with a 64-bit value payload (that can hold any int, double, or pointer without masking or compression on any 32- or 64-bit platform) and 64 bits for alignment and type tags. <br>We know that we need a new format to be fast, but there is some risk about exactly what format. The pluses for the 128-bit idea are that it performed well in a pilot study and that extracting the unboxed value is simply taking the right bits, with no masking, shifting, or arithmetic. A minor risk is that it will increase memory usage too much, but measurements there suggest we will be OK. A bigger risk is that it will require more register pressure, or more memory traffic when copying values, decreasing performance. There is no way to know which format is best without implementing it and testing it for real. <br>The specific benefits of a new format are (a) doubles don't have to be on the heap, making allocation much faster and reducing indirection, (b) integers can be 32 bits, allowing a larger range to be stored in an integer format and making boxing much cheaper following bit operations, and (c) potentially reducing the number of operations it takes to box and unbox, depending on the format. Benefits (a) and (b) can be achieved with any reasonable alternate boxing format (32-bit NaN-boxing, 64-bit NaN-boxing, or "fat" values like our 128-bit values). Benefit (c) is maximized if the boxing format contains the unboxed value unmodified--only fat value formats like our current 128-bit values achieve that.
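To make the "fat value" idea in the New jsvals item concrete, here is a minimal sketch of a 128-bit boxed value: a 64-bit tag word next to a 64-bit payload that stores the int, double, or pointer unmodified. The names `FatValue`, `Tag`, `boxInt32`, and the other helpers are hypothetical and chosen only for illustration; they are not SpiderMonkey's identifiers, and the real layout under development may differ.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical type tags; the actual tag encoding is an assumption.
enum class Tag : uint64_t { Int32 = 1, Double = 2, Object = 3 };

// Sketch of a 128-bit value: 64 bits of tag/alignment, 64 bits of payload.
struct FatValue {
    Tag tag;
    union Payload {
        int32_t  i32;  // full 32-bit integer range
        double   dbl;  // stored inline, so no heap allocation for doubles
        void*    ptr;  // fits on both 32- and 64-bit platforms
        uint64_t raw;
    } payload;
};

static_assert(sizeof(FatValue) == 16, "value should occupy 128 bits");

// Boxing writes the tag and stores the payload unchanged.
inline FatValue boxInt32(int32_t i) {
    FatValue v;
    v.tag = Tag::Int32;
    v.payload.raw = 0;   // clear the unused upper payload bits
    v.payload.i32 = i;
    return v;
}

inline FatValue boxDouble(double d) {
    FatValue v;
    v.tag = Tag::Double;
    v.payload.dbl = d;
    return v;
}

inline FatValue boxObject(void* p) {
    FatValue v;
    v.tag = Tag::Object;
    v.payload.raw = 0;
    v.payload.ptr = p;
    return v;
}

inline bool isInt32(const FatValue& v)  { return v.tag == Tag::Int32; }
inline bool isDouble(const FatValue& v) { return v.tag == Tag::Double; }

// Unboxing is a plain load of the payload: no masking, shifting, or arithmetic.
inline int32_t unboxInt32(const FatValue& v) {
    assert(v.tag == Tag::Int32);
    return v.payload.i32;
}

inline double unboxDouble(const FatValue& v) {
    assert(v.tag == Tag::Double);
    return v.payload.dbl;
}

inline void* unboxObject(const FatValue& v) {
    assert(v.tag == Tag::Object);
    return v.payload.ptr;
}
```

The sketch also shows the cost side mentioned above: each value is two machine words on a 64-bit platform, so copying a value moves twice as much data as a single-word format would.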
