Javascript Profiling

From MozillaWiki
Jump to: navigation, search

Note: the API described here has been set aside; it will not become part of the Mozilla platform. We're going to use the SPS profiler as our profiling back end.

Profiling API

The goal of the profiling API is to allow javascript access to profiling information and enable building tools on top of it to display the information in a readable and useful fashion. It should also be flexible and generalized to support different forms of profiling and even custom profiling techniques.

As an idea of what should be possible to build on top of this API, see Unity's profiler,

The Profiler object

There will be a global Profiler object available to instantiate to begin profiling the runtime in which the profiler was created in. The Profiler object has the following methods:

new Profiler()
The constructor takes no arguments
start([sample_rate, [max_samples]])
Begins profiling. All arguments are optional. The sample rate is the time between samples of the call stack specified in microseconds (defaults to 1000 = 1ms). The sample rate is mostly a guideline as the platform being run might not guarantee the granularity of timing. Samples will not happen more frequently than the specified rate, but may occur more infrequently. A maximum number of samples can be specified as well, but the default is unlimited (0). If the maximum number of samples is reached, then the sample storage is treated as a ring buffer, discarding the oldest samples. If some other Profiler object is profiling the runtime, then the other profiler is forcibly stopped. Eventually a call to stop() must be paired with this call to cease data collection.
isProfiling()
Returns whether profiling is active for this Profiler instance
frame()
Fetches an object to represent the current stack trace to be later returned via results(). This object can have any properties attached to it and will persist across different invocations of frame() so long as the same backtrace is present each time. All information specified here will later be available via results() with the full backtrace listed. By default this returns an empty object with no properties. If the profiler is not active or has been forcibly stopped, then an error is thrown.
results()
Returns all profile information gathered. The data returned is all that is collected between the last invocation of reset() and all the sample data.
samples()
Returns all profiling samples gathered. The data returned is all that is collected between the last invocation of reset() and now. If profiling is turned on, it is disabled for the duration of this function.
reset()
Resets all information that is stored. This includes samples and also frame() information.
stop()
Ceases profiling on the runtime. If this profiler is not active or has been forcibly stopped, an error is thrown.

The return of .results()

The return value is an object which is a hash containing all the information. The hash can loosely be viewed as a trie. Here's some example code:

// input.js
var p = new Profiler();                
function foo(n) {                      
    var sum = 0;                       
    for (var i = 0; i < n; i++) {      
        sum += i;                      
    }                                  
    var fac = 1;                       
    for (var i = 1; i < 2 * n; i++) {  
        fac *= i;                      
    }                                  
    return sum + fac;                  
}                                      
                                       
function bar(n) {                      
    p.frame().one = 1;                 
    var info = p.frame();              
    info.two = 2;                      
    info.n   = n;                      
}                                      
                                       
p.start();                   
foo(4000000);                          
bar(200);                              
p.stop();
                                       
print(JSON.stringify(p.results()))       

And the following hash would be printed:

{                                                                   
  samples: 624,                                                       
  gcSamples: 1,                                                       
  micros: 12421,                                                       
  children: [                                                       
    {                                                               
      function: { file: 'input.js', line: 1, name: '<top level>' },                      
      site: { file: 'input.js', line: 25 },                         
      samples: 624,                                                   
      children: [                                                   
        {                                                           
          function: { file: 'input.js', line: 1, name: 'foo' },     
          site: { file: 'input.js', line: 7 },                      
          samples: 195,                                               
          selfCount: 195                                                 
        }, {                                                        
          function: { file: 'input.js', line: 1, name: 'foo' },     
          site: { file: 'input.js', line: 11 },                     
          samples: 429,                                               
          selfCount: 429                                                 
        }                                                           
      ]                                                             
    }, {                                                            
      function: { file: 'input.js', line: 1 },                       
      site: { file: 'input.js', line: 26 },                          
      children: [                                                   
        {                                                           
          function: { file: 'input.js', line: 18, name: 'bar' },    
          site: { file: 'input.js', line: 18 },                     
          data: { one: 1 }                                          
        }, {                                                        
          function: { file: 'input.js', line: 18, name: 'bar' },    
          site: { file: 'input.js', line: 19 },                     
          data: { two: 2, n: 200 }                                  
        }                                                           
      ]                                                             
    }                                                               
  ]                                                                 
}                                                                   

With this structure, each "stack trace" has its own object which is persisted across invocations. Each object also has information about whatever children were invoked from it.

Profiled Statistics

At the root level of the results is the following information:

samples
The total number of samples that were taken while the profiler was running.
gcSamples
This is the total number of samples which were taken where GC activity was happening, but no other JS code was running.
micros
This is the number of microseconds for which the profiler was running. If the results were generated when the profiler was stopped, then this is the amount of time between the start/stop. If the profiler is still running, then this is the amount of time from the start of the profiler to when the results() method was called.

Each child then has the following profile statistics:

function
This is the function in which the sample was taken. It contains information about the function's name/file/line definition
site
This is the actual site of execution at which the sample was taken. For example if function a calls function b and a sample were then taken, the site of b would be the line at which the sample was taken inside b, and the site of a would be the line of the function call to b.
data
If the frame() method was invoked, then this represents the corresponding data which was attached to the frame information.
samples
This is an integer value of the number of ticks while this function's frame was on the stack. This is useful when counting total time spent in a function. It should be true that: samples = selfCount + children.sum('samples')
selfCount
This is an integer value like "samples," but instead only counts the time spent in the function itself. When a sample is taken and a function is currently running (it's the top of the stack), it's selfCount is increased.


The return of .samples()

The samples() method of a Profiler returns a list of sample objects which describe a sample taken at a particular time. Each object has the following structure:

{                                                     
    micros: 100,                                      
    weight: 2,                                        
    stack: [                                          
        {                                             
            function: { line: 20, file: '...' },      
            site: { line: 30, file: '...' }           
        },                                            
        ...                                           
    ]                                                 
}                                                     

Each sample's field are as follows:

micros
This is the time at which the sample was taken (in microseconds).
weight
Due to the way sampling is currently implemented, time spent in native C++ functions prevent samples from being taken. For this reason, the time is attributed to the next sample. Hence this weight indicates how many samples were deferred to this stack trace.
stack
This will be a non-empty array with the stack trace at the time of the sample. The first function is the top of the stack and each subsequent function called the previous function. The 'function' field describes where the function was defined, and the 'site' field describes where the function is currently executing.

Use Cases

Here's some example use cases for when profiling is wanted and how they would use the API provided above.

Test Cases

The test suite would initially create a Profiler object and then start/stop profiling the current page on each call to start/stop and the information returned from results would be what is displayed. Between each test, the reset method would be used to prevent pollution between runs.

Extension

Opened on a page and creates its own Profiler object. It then profiles the current page via start (probably a button to be hit). Some time later results is used to get information about the run of profiling (again probably a button being hit) and then all of the information is sorted through and displayed in some fancy fashion.

The page being profiled is unaware that it's being profiled, and there's nothing that it needs to do to help it along.

Implementation

The goals of the implementation of this profiling API are:

  • Works on all platforms
  • Works with IonMonkey, JaegerMonkey, and the interpreter
  • Virtually no overhead when profiling is turned off
  • Very little overhead when profiling is turned on

The concept of a "global" is the same as that of a compartment in the current engine's implementation. The method of profiling is outlined below:

Time Profiling

Timing information is crucial to gather, and also the most difficult. All other events can most likely be lumped into the category below. The two methods of getting timing information are sampling and instrumentation:

Instrumentation

Using instrumentation it is very easy to walk the stack because corruption is not a problem, and there is existing methods to do this. There is also no problems with locks, allocations, or whatnot. Some decisions which would need to be made:

  • Only sample every Nth function call? (technique similar to https://bugzilla.mozilla.org/show_bug.cgi?id=652535)
  • Worry about the stack trace? Is it useful to know that you spent X% of time in in function foo, or Y% of time in foo called from bar and Z% of time in foo called from baz?
  • Where is the timing information actually stored? (probably part of question above)

Pros

  • Easy idea, simple (no signal handling/validation business)
  • Can achieve fine-tuned results of both timing and lots of other information if necessary.

Cons

  • Can bias timing data due to overhead
  • Starting/stopping profiling affects code generation, which can be expensive. Might not be so great for the test profiling use case described above.
  • More overhead than sampling (instrumentation is per-function)
  • Where to store timing information could get tricky and having a very quick recorder could get fun depending on what degree of context is desired.

Sampling

Sampling is possible in two flavors. One is via just a raw signal handler, and the other is as an "operation callback." The major drawback of using a raw signal handler is that the state of the interrupted world could be corrupt, and everything inspected must be heavily validated. Additionally, the current context is unknown and difficult to get, so stack traversal also gets fairly hairy. On the other hand, an operation callback looks as if it will solve these problems because it is passed the current context and is only called at a "good time." This way it won't require extra validation.

Pros

  • Statistically accurate (not biased against frequently run functions)
  • starting and stopping profiling is as easy as turning on/off the callback.
  • Stack trace context is free because it has to be inspected anyway and a relatively expensive operation isn't run that often

Cons

  • Biased, operational callbacks are only invoked at particular points in the code, so true line-by-line sampling is impossible. It is possible with signal-based sampling, but that has a large number of other issues with it.
  • Might still require some wonky code maybe? Not quite sure about this...

Choice

Ideally a hybrid solution of both sampling and instrumentation would be used, but for a first iteration only one should be implemented. So long as an operation callback is cheap and easy to implement, then sampling looks like the way to go.

Generalized Events

There will be a global "stack list" which keeps track of all stack traces which have had events on them. When an event fires, the current stack is looked up in this list, and if found, the event is added to the stack. If not found, the stack trace is serialized and added to the list.

There will be a way to instrument events from C++ as well to track things like GC triggers, GC allocations, type inference failures, etc.

Extensions

  • Create a generic Profiler.Statistics which is an aggregator for stats, kinda like http://sourceware.org/systemtap/langref/Statistics_aggregates.html where you call stats.insert(4) and it would automatically track min, max, average, count, etc.
  • Throw away all data more than N seconds old because it's probably not interesting and wasting memory now
  • Possibly have attributes or methods to enable/disable profiling specific attributes? (memory/time/network/etc.)