Kraken Info


This info is for Kraken 1.0.


Description. A toy benchmark: a path-finding program that uses A* search.

Key features. The benchmark spends almost all of its time in the loop in the following function.

Array.prototype.findGraphNode = function(obj) {
    for(var i=0;i<this.length;i++) {
        if(this[i].pos == obj.pos) { return this[i]; }
    }
    return false;
}
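A minimal standalone sketch of that hot loop (the node shape and the pos values here are illustrative stand-ins, not taken from the benchmark) shows why it dominates: every lookup is a linear scan over the whole array.

```javascript
// Hypothetical graph nodes; the benchmark's nodes carry a 'pos' property.
Array.prototype.findGraphNode = function(obj) {
    // Linear scan: O(n) per lookup, run once per neighbour expansion.
    for (var i = 0; i < this.length; i++) {
        if (this[i].pos == obj.pos) { return this[i]; }
    }
    return false;
};

var nodes = [{pos: "0,0"}, {pos: "1,0"}, {pos: "1,1"}];
var found = nodes.findGraphNode({pos: "1,0"});   // returns nodes[1]
var missing = nodes.findGraphNode({pos: "9,9"}); // returns false
```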

Mozilla-specific things.

Removing unnecessary function guards gains ~1.15x (Bug 606892).

LICM also helps by another ~1.2x (Bug 545406).


This is identical to audio-fft, except it has an insignificant amount of extra stuff at the end. Bug 603837 is open for this.


Description. A kernel. Computes a Discrete Fourier Transform.

Key features. Most of run-time is in this loop:

for ( var n = 0; n < buffer.length; n++ ) {
   real += this.cosTable[k*n] * buffer[n];
   imag += this.sinTable[k*n] * buffer[n];
}

buffer.length is 2048. Property and array gets dominate.

Mozilla-specific things. Execution is dominated by a single trace fragment. LICM will probably help a lot, esp. due to hoisting 'this.cosTable' and 'this.sinTable'.
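A hedged sketch of what that hoisting amounts to at the source level (the constructor, table sizing, and the DFT formulation here are simplified assumptions, not the benchmark's exact code): the 'this.cosTable' and 'this.sinTable' property gets are loop-invariant, so caching them in locals is the same transformation LICM would perform.

```javascript
// Simplified DFT with precomputed trig tables (illustrative sizing).
function DFT(bufferSize) {
    this.cosTable = new Array(bufferSize * bufferSize);
    this.sinTable = new Array(bufferSize * bufferSize);
    for (var x = 0; x < bufferSize * bufferSize; x++) {
        this.cosTable[x] = Math.cos(2 * Math.PI * x / bufferSize);
        this.sinTable[x] = -Math.sin(2 * Math.PI * x / bufferSize);
    }
}

// Compute one output bin k. The two property gets are hoisted out of the
// loop into locals, mirroring what LICM would do automatically.
DFT.prototype.bin = function(k, buffer) {
    var cosTable = this.cosTable;   // hoisted property get
    var sinTable = this.sinTable;   // hoisted property get
    var real = 0, imag = 0;
    for (var n = 0; n < buffer.length; n++) {
        real += cosTable[k * n] * buffer[n];
        imag += sinTable[k * n] * buffer[n];
    }
    return [real, imag];
};

var dft = new DFT(8);
var dc = dft.bin(0, [1, 1, 1, 1, 1, 1, 1, 1]);   // DC bin of a constant signal
```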


Description. A kernel. Computes a Fast Fourier Transform.

Key features. A big chunk of run-time is in this loop:

 while ( i < bufferSize ) {
   off = i + halfSize;
   tr = (currentPhaseShiftReal * real[off]) - (currentPhaseShiftImag * imag[off]);
   ti = (currentPhaseShiftReal * imag[off]) + (currentPhaseShiftImag * real[off]);
   real[off] = real[i] - tr;
   imag[off] = imag[i] - ti;
   real[i] += tr;
   imag[i] += ti;
   i += halfSize << 1;
 }

Array gets and sets dominate.

Mozilla-specific things. In the above loop, 'i' and 'off' are loaded as doubles and have to be converted to integers using a combination of inline code and calls to js_DoubleToInt32(). This should be avoided (bug 606879).
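A runnable sketch of the butterfly step illustrates the point (the sizes, phase values, and input data are made up, and the '|0' coercions are an illustrative source-level workaround, not what bug 606879 implements in the engine): keeping 'i' and 'off' integral means the array indexing never needs a double-to-int conversion.

```javascript
// Illustrative single butterfly pass over an 8-element buffer.
var bufferSize = 8, halfSize = 4;
var real = [1, 2, 3, 4, 5, 6, 7, 8];
var imag = [0, 0, 0, 0, 0, 0, 0, 0];
var currentPhaseShiftReal = 1, currentPhaseShiftImag = 0;

var i = 0;
while (i < bufferSize) {
    var off = (i + halfSize) | 0;   // force an integer array index
    var tr = (currentPhaseShiftReal * real[off]) - (currentPhaseShiftImag * imag[off]);
    var ti = (currentPhaseShiftReal * imag[off]) + (currentPhaseShiftImag * real[off]);
    real[off] = real[i] - tr;
    imag[off] = imag[i] - ti;
    real[i] += tr;
    imag[i] += ti;
    i = (i + (halfSize << 1)) | 0;  // stride stays integral too
}
// With these inputs: real[0] becomes 1 + 5 = 6, real[4] becomes 1 - 5 = -4.
```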



Description. A kernel. The image (and representation thereof) from imaging-desaturate.js is reused. The program transforms the contents of the array.

Key features. The benchmark features a quadruply-nested loop. The inner loop is dominated by array gets and some highly repetitive array index computations, including numerous calls to Math.abs(). Array sets aren't very important as they only occur in the second-outermost loop.

Similar to imaging-desaturate, this benchmark starts with the main array holding integers, but they quickly become doubles.

Mozilla-specific things. Nanojit's CSE pass commons up the repetitive index computations beautifully.

Using an integer-only version of Math.abs() is easy and makes things much faster (bug 606441).
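The idea can be sketched at the source level (the name 'iabs32' is hypothetical; bug 606441 concerns an engine-internal fast path, not a change to the benchmark): when the argument is known to be a 32-bit integer, the general Math.abs() machinery for doubles, -0, NaN, and Infinity can all be skipped.

```javascript
// Hypothetical integer-only absolute value, branchless on 32-bit ints.
// (The one input it cannot handle is -2^31, which has no positive
// 32-bit counterpart.)
function iabs32(x) {
    var mask = x >> 31;        // 0 for non-negative x, -1 for negative x
    return (x ^ mask) - mask;  // flips the bits and adds 1 when negative
}
```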

Again, similar to imaging-desaturate, because the array-of-ints becomes an array-of-doubles, the tracer generates code assuming it's always an array-of-ints, and so the fragment for the inner loop frequently (about 50% of the time) side-exits to a secondary fragment.

All the integer overflow checks hurt. Removing them (unsafe in general, ok for this benchmark) makes it 1.27x faster.

The first three capacity checks on squidImageData[] are redundant w.r.t. the fourth. Removing those would help a little.



Description. A kernel. A 400 x 267 image is represented as an array where each pixel gets four consecutive elements, representing its R/G/B/Alpha values. All values are in the range 0..255. The program transforms the contents of the array.

Key features. Almost all the benchmark's time is spent in this loop:

while (p--)
    data[pix-=4] = data[pix1=pix+1] = data[pix2=pix+2] = (data[pix]*0.3 + data[pix1]*0.59 + data[pix2]*0.11);

It iterates through each pixel, setting the R/G/B values all to the same value, which is a function of the prior R/G/B values.

This desaturation loop is repeated 200 times. The first time the desaturation loop runs, the (R, G, B) values are converted from integers to doubles. The remaining 199 times, the values are doubles to begin with, and so the fragment always side-exits to a secondary fragment.
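A self-contained version of the loop on a tiny input makes the int-to-double transition concrete (the two-pixel array here is a stand-in for the real 400 x 267 image): the 0.3/0.59/0.11 weighting produces non-integral values, so after the first pass the array holds doubles.

```javascript
// Two RGBA pixels (values illustrative); alpha slots are the 4th of each group.
var data = [100, 200, 50, 255,  10, 20, 30, 255];
var p = data.length / 4;       // pixel count
var pix = data.length, pix1, pix2;

// Same shape as the benchmark's loop: the RHS reads the old R/G/B values
// (all gets happen before any set), then the three stores run right-to-left.
while (p--)
    data[pix -= 4] = data[pix1 = pix + 1] = data[pix2 = pix + 2] =
        (data[pix] * 0.3 + data[pix1] * 0.59 + data[pix2] * 0.11);
// Afterwards each pixel's R/G/B slots hold the same double (~153.5 and
// ~18.1 here); the alpha slots are untouched.
```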

The program is kind of stupid for (a) not truncating the desaturated R/G/B values and (b) desaturating the image 200 times in a row. Indeed, the benchmark admits as much for (b):

//XXX improve dataset rather than loop
for (var pixcounter = 0; pixcounter < 200; pixcounter++)

Bug 603841 is open to change this behaviour.

Mozilla-specific things. The trace code generated assumes the array elements are integers, because that's what is seen at record-time, but in the end 99.5% of the time it's not, so we end up taking a particular side-exit every time to a second trace fragment. This really hurts, because (a) the GETELEM we abort on has to be repeated, and (b) the code in the secondary fragment is worse because CSE does less because it has less context to work with. Some experiments show this costs us something like 10--20%.

If bug 603841 isn't fixed, it would be good if we could somehow (using the oracle?) detect that the side-exit is hot and recompile the fragment. Or, maybe type inference could identify that it's really an array-of-doubles and then we'd only side-exit 0.5% of the time.

After that's fixed, better alias analysis of the slot stores in the SETELEMs would help CSE a lot. More specifically, if you have a GETELEM followed by a SETELEM and they both access the same element, the SETELEM can be optimized quite a bit. But here we have GETELEM(a), GETELEM(b), GETELEM(c), SETELEM(c), SETELEM(b), SETELEM(a), and the intervening SETELEMs prevent the latter two SETELEMs from being optimized in this way. However, this would require a big extension of the current simple alias analysis, which would be difficult.
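De-sugaring the chained assignment makes the GETELEM/SETELEM ordering explicit (the index values here are illustrative): JavaScript fully evaluates the right-hand side before any store, so all three gets precede all three sets, and the later sets cannot be paired with their matching gets without proving the stores don't alias.

```javascript
// One pixel's worth of data; indices as in the desaturate loop.
var data = [100, 200, 50, 255];
var pix = 0, pix1 = 1, pix2 = 2;

// GETELEM(a), GETELEM(b), GETELEM(c): the RHS is computed first...
var v = data[pix] * 0.3 + data[pix1] * 0.59 + data[pix2] * 0.11;
// ...then SETELEM(c), SETELEM(b), SETELEM(a) run right-to-left. Each store
// might alias the others, which is what blocks the GETELEM/SETELEM pairing.
data[pix2] = v;
data[pix1] = v;
data[pix] = v;
```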



Description. A microbenchmark. Calls JSON.stringify() 1000 times on an object that takes up over 450,000 chars to express in a file.

Key features. The result of the call to JSON.stringify() is never used, so this benchmark is susceptible to gaming -- an implementation could detect that the result isn't used and call a faster version of JSON.stringify() that doesn't build the result (i.e. an empty function). Bug 603838 is open for this.
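The shape of the benchmark can be sketched as follows (the object here is a tiny stand-in for the ~450,000-character tinderbox data): the return value is discarded, which is exactly what a dead-code-eliminating implementation could exploit.

```javascript
// Tiny stand-in object; the real input is far larger.
var obj = { tree: "mozilla-central", builds: [{ id: 1, status: "success" }] };

for (var i = 0; i < 1000; i++) {
    JSON.stringify(obj);   // result never stored or used
}
```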