JaegerMonkey

== Major Optimizations ==
First, how far do we have to go? The rough numbers on our reference machine (arewefastyet.com):

{| width="50%" cellspacing="1" cellpadding="1" border="0"
|-
| 
| align="right" | '''SunSpider (ms)'''
| align="right" | '''v8-v4 (ms)'''
|-
| Current score
| align="right" | 1150
| align="right" | 9100
|-
| Target score
| align="right" | 400
| align="right" | 2300
|-
| Needed improvement
| align="right" | 750
| align="right" | 6800
|}

Now, how do we get there? Here is a chart of the major optimizations we think we need in order to make JM fast. For each optimization, the table shows an estimate of how many ms it will cut from our JM-only SunSpider and v8-v4 scores, the size of the task in person-weeks, and a possible assignee. Keep in mind that benefits are not really additive; for the purposes of this table, interaction benefits and penalties are shared out among the individual items. [Updated 4 May 2010]
{| width="8095%" cellspacing="1" cellpadding="1" border="0"
|-
! scope="col" | Name<br> '''Optimization'''! colspan="2" scope| align="colright" | Est. SS Benefit '''SunSpider improvement (ms)<br> '''! scope| align="colright" | Est. V8 Benefit<br> '''v8-v4 improvement (ms)'''! scope| align="colright" | '''Size (wks)<br> '''! scope="col" | '''Candidate Assignee<br>'''
|-
| Globals
| align="right" | 100
| align="right" | 1000
| align="right" | 2
| dvander
|-
| Regexes
| align="right" | 25
| align="right" | 300
| align="right" | 2-8
| cdleary, intern
|-
| Strings
| align="right" | ???
| align="right" | ???
| align="right" | 
| 
|-
| Dates
| align="right" | ???
| align="right" | ???
| align="right" | 
| 
|-
| Math
| align="right" | ???
| align="right" | ???
| align="right" | 
| 
|-
| New jsvals
| align="right" | *300
| align="right" | *2500
| align="right" | 8
| lw (+others)
|-
| Compiler fast paths
| align="right" | *325
| align="right" | *3000
| align="right" | 8
| dvander (+others)
|}
A * means the improvement value is just a guess. Values without stars are based on some kind of measurement.
Description and comments for each item:
*[https://bugzilla.mozilla.org/show_bug.cgi?id=561218 Globals]. This means optimizing global variable access with fast paths. dvander has already started this and has wins of 50/500 ms (SS/v8) so far. (A shape-guard sketch appears after this list.)
*Regexes. There are some regexes we don't compile in SunSpider and v8. There is only one in SunSpider, and it would not be hard to extend our current compiler to handle that case. v8 has more, and we don't know yet which ones count or what features are needed. The regex project could be completed either by cdleary improving our regex compiler or by just taking yarr from JSC. The latter should be preferred if possible; the licensing on yarr is OK, but it uses vector and Unicode classes that are GPL and would need to be replaced.
*Strings, Dates, Math. These are key runtime functions. They may be very fast already, or we might have some functions that are slower than they could be. We are doing measurements now.
*[https://bugzilla.mozilla.org/show_bug.cgi?id=549143 New jsvals]. This is the 128-bit jsval format: a 64-bit payload that can hold any int, double, or pointer without masking or compression on any 32- or 64-bit platform, plus 64 bits for alignment and type tags. It may not turn out to be a huge speedup all on its own, but together with the compiler fast paths it will be very important. (A layout sketch appears after this list.)
*Compiler fast paths. This means making the JIT inline all commonly run ops instead of calling stubs; there are probably about 50 or so of them. This is partially blocked on the new jsvals, because the code generation depends somewhat on the jsval format, but we could start these before finishing the new jsvals and patch things up as needed.
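The two sketches below are illustrations only, not JaegerMonkey or SpiderMonkey code; all names in them (GlobalObject, getGlobalFast, getGlobalStub, FatVal, Tag) are invented for the example.

First, a minimal sketch of the shape-guarded fast path idea behind the Globals item (and the inline-fast-path-with-stub-fallback pattern of the compiler fast paths item generally): the compiler bakes the expected shape and slot number into the inline code, so the common case is one compare and one load, and only a shape mismatch falls back to a slow stub call.

<pre>
// Illustration only -- invented names, not the real JM fast path.
#include <cstdint>
#include <cstdio>
#include <string>
#include <unordered_map>

struct GlobalObject {
    uint32_t shape;                                       // changes when properties are added/removed
    double slots[16];                                     // property storage, indexed by slot number
    std::unordered_map<std::string, uint32_t> slotIndex;  // slow name-to-slot lookup
};

// Slow path: the generic stub the JIT would call when the guard fails.
static double getGlobalStub(GlobalObject &g, const std::string &name) {
    return g.slots[g.slotIndex.at(name)];
}

// Fast path: what the inline code would do. The expected shape and slot
// number would be baked into the generated code; here they are arguments.
static double getGlobalFast(GlobalObject &g, uint32_t expectedShape,
                            uint32_t slot, const std::string &name) {
    if (g.shape == expectedShape)      // one compare (the "shape guard")
        return g.slots[slot];          // one direct load
    return getGlobalStub(g, name);     // guard failed: take the stub
}

int main() {
    GlobalObject g{1234, {}, {{"x", 3}}};
    g.slots[3] = 7.5;
    std::printf("%f\n", getGlobalFast(g, 1234, 3, "x"));  // hits the fast path
    std::printf("%f\n", getGlobalFast(g, 9999, 3, "x"));  // shape changed: stub path
    return 0;
}
</pre>

Second, a rough sketch of the 128-bit "fat" jsval layout described above, assuming a 64-bit tag word plus a 64-bit payload: doubles and pointers live directly in the value, and unboxing is just reading the payload bits with no masking, shifting, or arithmetic.

<pre>
// Illustration only -- not SpiderMonkey's actual jsval definition.
#include <cstdint>
#include <cstdio>

enum class Tag : uint64_t { Int32, Double, Object, Boolean, Undefined };

struct FatVal {
    Tag tag;            // 64 bits of type information (also provides alignment)
    union {
        int32_t i32;    // small integers stored directly
        double dbl;     // doubles stored inline, not on the heap
        void *obj;      // object/string pointers stored unmodified
    } payload;          // 64-bit payload
};

static FatVal boxInt32(int32_t i)  { FatVal v; v.tag = Tag::Int32;  v.payload.i32 = i; return v; }
static FatVal boxDouble(double d)  { FatVal v; v.tag = Tag::Double; v.payload.dbl = d; return v; }

// Unboxing is a plain load of the payload -- the point of the fat format.
static int32_t unboxInt32(const FatVal &v)  { return v.payload.i32; }
static double  unboxDouble(const FatVal &v) { return v.payload.dbl; }

int main() {
    FatVal a = boxInt32(42);
    FatVal b = boxDouble(3.5);
    std::printf("sizeof(FatVal) = %zu bytes\n", sizeof(FatVal));  // 16 on typical platforms
    std::printf("%d %f\n", unboxInt32(a), unboxDouble(b));
    return 0;
}
</pre>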
== Ongoing Work ==