From MozillaWiki
THIS PAGE IS OBSOLETE
This article is partly or entirely outdated, so the information presented here may be incorrect and should be treated with due caution.

This is a list of specific aspects of the current TraceMonkey JIT that could be improved to reduce overall JIT complexity and allow better optimization. This list (in its current form) does not determine a single specific new design but does have high-level design implications.

  1. Do not interleave compilation and execution
    1. It is a significant challenge to keep the actual VM state coherent with the compiler's view of the VM state.
    2. When compilation straddles execution of an op, we are forced into the gross "pending X" pattern, which makes logic non-local (e.g., bug 622318).
  2. Trace in terms of micro-ops instead of (current) fat-ops
    1. Macro-ops cause code duplication or complex factored-out helper functions
      1. (njn) Can you give an example? Nanojit's CSE pass removes a lot of code duplication.
        1. (luke) I mean in the compiler, not in the generated code.
    2. Conflict: the interp/JM compiler want fat ops for performance/optimization
  3. Remove the "deep bail" possibility
    1. Deep bailing requires the programmer to keep subtle invariants in mind at all times, especially in the presence of macro-ops.
    2. (dvander) the problem isn't that deep bails can happen, but that they're a bad solution with a complex implementation. deep bails happen for two reasons:
      1. the engine has broken an invariant, and now we want to recompile, despecialize, and replace on-stack. type inference already has to do both of these and maintain the tricky sanity invariants that deep bailing did, like catching writes to properties.
      2. the engine wants to know something about the interpreter state. well, that's just a silly reason to deep bail :) we have to be able to reify state anyway.
  4. No nested trace trees: compile multiple loops
    1. The process of recording the call from an inner tree to an outer one is very delicate. Inner/outer tree reasoning is complex.
  5. Compile with knowledge of multiple iterations and selectively despecialize to avoid trace explosion
    1. Specifically, support despecializing:
      1. callee (e.g., for polymorphic raytrace);
      2. type (e.g., for untyped data shuffling);
      3. shape (want ICs); and
      4. control flow (want non-linear traces).
    2. These are hard to simulate with always-linear traces and knowledge only of what is happening right now.
    3. (dvander) despecializing types would seem to defeat the point of tracing. for data shuffling, we have to box values anyway. it might make more sense to just have a system that can hold onto boxes, and only guard on their type when the value is used by some type-dependent operation.
      1. The intention of the phrase "selectively despecialize" was that despecialization only occurs sometimes, so most slots/ops remain typed. "just have a system that can hold onto boxes" is exactly what is being called for (just not said so clearly :).
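As a rough illustration of the "hold onto boxes, guard only at a type-dependent use" idea from the exchange above, here is a small Python model. Every name in it (Box, Trace, guard_type, and so on) is hypothetical; this is a sketch of the technique, not SpiderMonkey code.

```python
# Illustrative model of selective despecialization: values flow through
# the trace as untyped boxes, and a type guard is emitted only when a
# type-dependent op consumes a box (and only once per box).
# All names here are invented for illustration.

class Box:
    """A boxed (untyped) value flowing through a trace."""
    def __init__(self, value):
        self.value = value

class Trace:
    def __init__(self):
        self.ops = []          # recorded trace operations
        self.guarded = set()   # ids of boxes already type-guarded

    def load(self, value):
        # Loads stay despecialized: no guard here, so untyped data
        # shuffling (copies, stores) never pays for guards.
        box = Box(value)
        self.ops.append(("load", box))
        return box

    def move(self, box):
        # A type-independent op: the box flows through unguarded.
        self.ops.append(("move", box))
        return box

    def add(self, a, b):
        # A type-dependent op: guard each operand's type at first use.
        for operand in (a, b):
            if id(operand) not in self.guarded:
                self.ops.append(("guard_type", type(operand.value).__name__))
                self.guarded.add(id(operand))
        result = Box(a.value + b.value)
        self.ops.append(("add", result))
        return result

trace = Trace()
x = trace.move(trace.load(1))   # shuffled while untyped: no guard yet
y = trace.load(2)
z = trace.add(x, y)             # first type-dependent use: guards emitted here
guards = [op for op in trace.ops if op[0] == "guard_type"]
```

Note how the move op records no guard at all; only the add forces the two operands to commit to a type, which is the "only guard on their type when the value is used by some type-dependent operation" behavior described above.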

The list below is a more speculative collection of potential design decisions that address the above concerns:

  • Split recording into a profiling phase followed by a more traditional compilation phase. The compilation phase consumes a data structure built by the profiling phase.
    • Addresses 1 and 5
    • If we want to take advantage of multi-core, this allows a clean shared-nothing hand-off of profile data from the main thread to a compilation thread.
  • Keep the existing bytecode/IR; define a new micro-op IR and a decomposition function that maps a single fat-op to a set of micro-ops.
    • Resolves conflict 2.2
    • Don't have to rewrite parser, decompiler, mjit, type inference, etc.
    • May be slow if profiling was done from an interpreter; perhaps let the mjit compiler or a special profiling compiler compile functions with profiling instrumentation.
      • (dmandelin) I don't understand the above point. I think the sentence wants to say "let the mjit compiler or a special profiling compiler compile functions with profiling instrumentation", but I'm not sure. Anyway, I'm not sure about that. We know that we can do 100-400 iters in an interpreter in the time it takes to compile, so if profiling requires fewer runs than that, we should just do it in the interpreter, or a special profiling interpreter.
        • (luke) Good point; this was written before the 100-400 figure which certainly seems to indicate that we can profile for a while.
    • Can define simple "definitional interpreter" for micro-ops that would (1) help new hackers (2) help isolate bugs in other execution modes.
    • (dmandelin) I think we should think about long-term goals too. Ideally, I would shoot for a "primary" bytecode that helps simplify the compilers as much as possible. I think that would be a thin-op bytecode with an unlimited number of virtual registers. (That may be too far, though--the interpreter would certainly require a register-assignment lowering pass on top of such a thing and we need to keep startup cheap.)
      • Now, we can't take the time to overhaul that right now, so for next year I'm with you.
      • But we should at least have a viable path there, and make what steps we can in that direction.
  • To avoid deep bails:
    • Have the tracer execute in-place instead of in TraceNativeStorage (bug 590871)
      • (dmandelin) I think this means "do it like the mjit does", i.e., hold stuff in registers and so on, but know how to save out to the canonical representation at safe points.
        • Not exactly; more like "do it like the tjit, just on the VM stack, using the VM layout instead of on some separate hunk o' memory (TraceNativeStorage) with a custom layout (VisitFrameSlots)"
    • Assume every native called from jit code clobbers globals and visible upvars except those explicitly annotated not to do so in their JSNativeTraceInfo.
      • Thus, if the VM reenters or VM state is modified during a call from trace, no assumptions are broken.
      • Rely on (1) type inference results and (2) redundant-guard elimination to keep computational kernels fast and unfettered.
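To make the micro-op ideas above concrete, here is a hedged Python sketch of a decomposition function that maps a fat-op to a set of micro-ops, fed into the kind of small "definitional interpreter" suggested earlier. The op names (GETPROP, guard_shape, load_slot) and the object model are invented for illustration and do not correspond to actual SpiderMonkey IR.

```python
# Hypothetical sketch: decompose fat-ops into micro-ops, then give the
# micro-ops an authoritative meaning via a tiny definitional interpreter.
# All op names and the heap/stack model are illustrative only.

def decompose(fat_op):
    """Map a single fat-op to a list of micro-ops."""
    name, *args = fat_op
    if name == "GETPROP":                 # obj.prop as one fat-op...
        obj, prop = args
        return [("guard_shape", obj),     # ...becomes explicit micro-steps
                ("load_slot", obj, prop)]
    if name == "CONST":
        return [("push", args[0])]
    if name == "ADD":
        return [("add",)]
    raise ValueError("unknown fat-op: %s" % name)

def interpret(micro_ops, stack, heap):
    """Definitional interpreter: the reference meaning of each micro-op."""
    for op in micro_ops:
        kind = op[0]
        if kind == "guard_shape":
            # A real engine would deoptimize on failure; here, just check.
            assert op[1] in heap
        elif kind == "push":
            stack.append(op[1])
        elif kind == "load_slot":
            stack.append(heap[op[1]][op[2]])
        elif kind == "add":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
    return stack

heap = {"obj": {"x": 40}}
stack = []
program = [("GETPROP", "obj", "x"), ("CONST", 2), ("ADD",)]
for fat_op in program:
    interpret(decompose(fat_op), stack, heap)
```

Because decompose is total over the fat-op set, the interpreter only ever has to define the handful of micro-ops, which is what would let it serve both as documentation for new hackers and as an oracle for isolating bugs in the other execution modes.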