JaegerMonkey

73 bytes removed, 10:14, 26 January 2010
m
fixed typo, removed unnecessary br tags
<blockquote>
This is the coder's badge of glory,<br>
That he protect and tend his monkey,<br>
Code with honor, as is due,<br>
And through the bits to God is true.<br>
--damons, IRC
</blockquote>
JaegerMonkey (or JägerMonkey) is '''inline threading''' for SpiderMonkey. The goal is to get reliable baseline performance on the order of other JS JIT systems. "Inline threading" really just means a baseline whole-method JIT that doesn't necessarily do many traditional compiler optimizations. Instead, it does dynamic-language-JIT-oriented optimizations like PICs and specialization of constant operands.
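As a rough illustration of the PIC (polymorphic inline cache) idea mentioned above, here is a minimal C sketch. All names and layouts here are illustrative assumptions, not SpiderMonkey's actual structures: the point is only that after the first lookup, a property access on a same-shaped object becomes one compare and one load.

```c
/* Hypothetical object layout: each object points to a shared "shape"
   describing where its properties live. */
typedef struct Shape { int prop_offset; } Shape;
typedef struct Object { Shape *shape; double slots[4]; } Object;

/* One cache entry: filled in after the first lookup so later lookups
   on same-shaped objects skip the full property search. */
typedef struct PIC { Shape *cached_shape; int cached_offset; } PIC;

/* Stand-in for the real (slow) property search; also refills the cache. */
static double slow_lookup(Object *obj, PIC *pic) {
    pic->cached_shape = obj->shape;
    pic->cached_offset = obj->shape->prop_offset;
    return obj->slots[pic->cached_offset];
}

static double get_prop(Object *obj, PIC *pic) {
    if (obj->shape == pic->cached_shape)        /* fast path: one compare */
        return obj->slots[pic->cached_offset];  /* ...and one load */
    return slow_lookup(obj, pic);               /* miss: fall back, refill */
}
```

In a JIT, the compare and load would be emitted inline at the call site, with the cached shape patched directly into the generated code rather than read from a struct.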
The rest of this wiki page presents our initial development plan.
1. Do everything per-thread, just like TM does with traces.
2. For the native code generator, take Nitro's cross-platform "assembly blatter".
3. To make trace transitions fast, change the interpreter and trace stack layouts so they closely match.
Discussion on point 2 (code gen):
We considered and rejected these alternatives for the code generator:
2x1. Generate code blocks ahead of time and memcpy blocks together to create native code. I tried this at the beginning of my first prototype, and it didn't work very well. One problem is that relative jump displacements need patching, so this isn't as simple as it first seems. Also, in order to get good perf, you need to bake in constants and do other specialization, which requires increasingly complicated patching.
Adobe is doing an interesting research variant on this idea, where they compile the interpreter C code to LIR, compile that, and then memcpy (and presumably patch) those chunks. But this sounds too complicated and risky for us.
2x2. Generate LIR and compile with nanojit. Sully did this. The main problem is that there is not enough control over the results to get the best code. In particular, there are tricks for calling "stub functions" (functions that implement JS ops that are not inlined) very efficiently that nanojit doesn't currently support. We think there will be other tricks with manual register allocation and such that are also not currently supported. We don't want to gate this work on nanojit development or junk nanojit up with features that will be non-useful for its current applications. Also, the compilation time is much longer for LIR than for using an assembler.
2x3. Roll our own assembler. This just sounds like extra unnecessary work if we can just use Nitro's.
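To see why memcpy'ing code blocks (option 2x1) isn't as simple as it seems, consider an x86 `JMP rel32`: the 32-bit displacement is relative to the address of the *next* instruction, so a block copied to a new address needs every such displacement recomputed. A minimal sketch of that patching step (the function name is ours, and this assumes x86 encoding):

```c
#include <stdint.h>
#include <string.h>

/* Write an x86 JMP rel32 (opcode 0xE9) at jmp_at, targeting `target`.
   The displacement is relative to the end of the 5-byte instruction,
   which is exactly why a memcpy'd block can't keep its old jumps. */
static void patch_jmp_rel32(uint8_t *jmp_at, uint8_t *target) {
    int32_t disp = (int32_t)(target - (jmp_at + 5)); /* rel. to next insn */
    jmp_at[0] = 0xE9;                                /* JMP rel32 opcode */
    memcpy(jmp_at + 1, &disp, sizeof disp);          /* little-endian imm */
}
```

Every cross-block jump in the copied code needs this treatment, and baking in constants multiplies the number of patch sites, which is the "increasingly complicated patching" referred to above.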
More detail on point 3 (stack layouts):
Ideally, the interpreter stack layout would be identical to the on-trace stack layout, so that no importation or conversions are necessary. Of course, the interpreter requires type tagging but tracing must not have type tagging, so we have to compromise a little bit.
Luke's current idea is to have the interpreter use two chunks of stack memory. One will have unboxed values. The other will have type tags, and any other metadata the tracer doesn't care about. Allocating stack slots or frames will be just two pointer bumps and a bounds check. In inline-threaded code, 2 registers can be reserved to point to a known position (e.g., start of active frame), so that stack accesses are just a machine load or two (for the tag). Values will be boxed in the current SM style when they are stored to object slots.
The layout of the unboxed stack will be the same in the interpreter or on trace. To get this, we mostly have to delete or move out of band the extra fields in JSStackFrame. We will need to reorder a bit too. Once we have that, to enter trace, we do no work, and to leave trace, we just memcpy typemaps into the interpreter type tags stack.
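The split-stack scheme above can be sketched in a few lines of C. Everything here is an illustrative assumption (names, types, and sizes are not SpiderMonkey's); it only shows the two properties claimed above: allocation is two pointer bumps plus a bounds check, and leaving trace is a single memcpy of the typemap.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint64_t Value;  /* unboxed payload slot (value chunk) */
typedef uint8_t  Tag;    /* type tag / metadata the tracer ignores */

typedef struct Stack {
    Value *val, *val_end;  /* bump pointer + limit for the value chunk */
    Tag   *tag;            /* parallel bump pointer for the tag chunk */
} Stack;

/* Allocating n slots: two pointer bumps and one bounds check. */
static int alloc_slots(Stack *s, size_t n, Value **vals, Tag **tags) {
    if (s->val + n > s->val_end)
        return 0;            /* would overflow: caller must grow the stack */
    *vals = s->val; s->val += n;
    *tags = s->tag; s->tag += n;
    return 1;
}

/* Leaving trace: values are already in place in the shared layout;
   only the trace's typemap needs copying into the tag chunk. */
static void leave_trace(Tag *frame_tags, const Tag *typemap, size_t n) {
    memcpy(frame_tags, typemap, n);
}
```

Because the value chunk's layout is identical in the interpreter and on trace, entering trace really is "no work": on-trace code reads and writes the same value slots the interpreter would.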
= Planned First Steps =
The first two chunks of work are to get the stack frame layouts to match, and to import the Nitro assembler. We should be able to do these in parallel, but they block most further work.
Luke is already starting the stack frame layout work. We hope to be able to mostly complete that in a week-long "sprint" in early/mid January. By "sprint", I mean focusing as much as possible on that task for the week, and closely collaborating with each other.
After that, the next step is to stand up a basic call-threaded system that doesn't necessarily inline much or optimize anything. The main pieces here are to figure out how to track and manage the compiled code, and do the easy thing to get control flow and calling stub functions working. We hope to be able to do a lot of this in one or two further sprints.
At this point, we can start adding optimizations, and this should parallelize well.
= Planned Optimizations =
#Fast calls to stub functions. This is based on a trick that Nitro uses. The idea is that stub functions logically have an array parameter or several parameters, which include input jsvals and also interpreter stuff like the sp, fp, cx, etc. Much of this is constant so the call can be made fast by setting up an area in the C stack with all the arguments filled in. To make a call, we just have to store the input jsvals and do a call instruction.
#Eliminate SP update. Inside basic blocks of JSOPs, we shouldn't need to keep a proper stack. Instead, we can teach the compiler to track which logical stack element is in which register and generate faster code.
#Fast closures. This is important for advanced web apps as well as Dromaeo and the V8 benchmarks. See [https://bugzilla.mozilla.org/show_bug.cgi?id=517164 bug 517164].
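The stub-call trick in item 1 can be sketched as follows. All names here are hypothetical stand-ins, not SpiderMonkey's or Nitro's actual API; the sketch just shows the shape of the optimization: the constant arguments (cx, fp, sp) are written into a reserved argument area once, so the code emitted at each call site is only a couple of stores plus a call.

```c
#include <stdint.h>

/* Reserved argument area on the C stack. The interpreter-state fields
   are constant for the duration of the compiled method, so they are
   filled in once, not per call. */
typedef struct StubArgs {
    void    *cx, *fp, *sp;   /* constant interpreter state */
    uint64_t in[2];          /* per-call input jsvals (payload only here) */
} StubArgs;

typedef uint64_t (*StubFn)(StubArgs *);

/* Done once when entering the compiled method. */
static void init_stub_args(StubArgs *a, void *cx, void *fp, void *sp) {
    a->cx = cx; a->fp = fp; a->sp = sp;
}

/* Models what the JIT emits at each call site: store the inputs,
   then a single call instruction. */
static uint64_t call_stub(StubArgs *a, StubFn fn,
                          uint64_t lhs, uint64_t rhs) {
    a->in[0] = lhs;
    a->in[1] = rhs;
    return fn(a);
}
```

In real generated code, `call_stub` wouldn't exist as a function: its two stores and the call would be emitted inline, with the address of the argument area baked into the instructions as a constant.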