Script Origin Tracking


This draft is being discussed in bug 637572. The interface it describes is not stable, and perhaps not even implemented.

A debugger should be able to explain how any given piece of JavaScript code running in a web page got there.

JavaScript code can initially enter a browsing context in several ways:

  • A script may appear in an HTML <script> element (or be cited by its src attribute).
  • A script may appear in HTML as an event handler content attribute.
  • The browser could retrieve a javascript: URL.

Once loaded, JavaScript code can then itself introduce more code:

  • It can call eval, the Function constructor, and similar functions.
  • It can create web workers; and web workers can call importScripts.
  • It can assign new scripts to a DOM element's event handler IDL attributes.
  • It can use DOM manipulation (assignments to innerHTML, calls to appendChild, and so on) to introduce new <script> elements and event handler content attributes.

Given a particular piece of JavaScript, script origin tracking provides a complete trail showing how that JavaScript was loaded as a consequence of navigating to a resource.

Location values

A location is a value that describes a particular point in markup text or a script. A location has the form:

 { origin:origin, line:line, column:column }

where origin is an origin value (described below), line is a zero-based line number, and column is a zero-based column number. The column property is optional. The line property may also be omitted if it is not available; simple consumers could treat a missing line as referring to the beginning of the text.
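As a minimal sketch of the shape described above, the following builds location values with the optional properties left out when they are unknown. The helper names makeLocation and effectiveLine are hypothetical and not part of the draft, and the origin used here is only a placeholder.

```javascript
// Hypothetical helper (not part of the draft): build a location value,
// omitting the optional properties when they are unavailable.
function makeLocation(origin, line, column) {
  const loc = { origin };
  if (line !== undefined) loc.line = line;       // zero-based line number
  if (column !== undefined) loc.column = column; // zero-based column number
  return loc;
}

// A simple consumer may treat a missing line as the beginning of the text.
function effectiveLine(loc) {
  return loc.line === undefined ? 0 : loc.line;
}

const exampleOrigin = { javascriptURL: "javascript:void(0)" }; // placeholder
const full = makeLocation(exampleOrigin, 3, 7);
const bare = makeLocation(exampleOrigin);

console.log(effectiveLine(full), effectiveLine(bare)); // prints "3 0"
```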

A markup location is a location in markup text: a location whose origin is a markup origin. A script location is a location in a script: a location whose origin is a script origin.

Note that line is always relative to the start of the text represented by origin, not relative to any document that may contain origin. For example, if document is an origin describing a document with a <script> element at line 10, then the location

 { line: 5,
   origin: { scriptElement: element,
             markupLocation: { line: 10, origin: document }
           }
 }

represents zero-based line 15 of document, which is the sixth line of the <script> element's text.
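The arithmetic in this example can be sketched as a small function. The helper lineInContainingMarkup is hypothetical, the objects standing in for the DOM element and window are plain placeholders, and the sketch assumes the script text begins on the line recorded in markupLocation.

```javascript
// Hypothetical helper (not part of the draft): map a location inside an
// inline script back to a line in the markup text that contains it.
// Assumes the script text begins on the line recorded in markupLocation.
function lineInContainingMarkup(loc) {
  const origin = loc.origin;
  const line = loc.line === undefined ? 0 : loc.line;
  // Only inline scripts are embedded in markup text. Scripts loaded from a
  // url, or lacking a markupLocation, are left relative to their own text.
  if (!origin.markupLocation || origin.url !== undefined) {
    return { origin, line };
  }
  const outer = origin.markupLocation;
  const outerLine = outer.line === undefined ? 0 : outer.line;
  return { origin: outer.origin, line: outerLine + line };
}

const scriptElement = { tagName: "SCRIPT" };    // stand-in for a DOM element
const documentOrigin = { browsingContext: {} }; // stand-in for a DOM window
const scriptLoc = {
  line: 5,
  origin: {
    scriptElement: scriptElement,
    markupLocation: { line: 10, origin: documentOrigin }
  }
};

const resolved = lineInContainingMarkup(scriptLoc);
console.log(resolved.line); // prints 15
```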

Origin values

An origin value is a value that describes where a particular markup text or script text came from: a URL, for example. A script origin is where a script came from; a markup origin is where some markup text (HTML or XML) came from. We describe the forms origin values can take and their meanings below.

Script origin values

A script origin value describes the origin of a particular piece of JavaScript code. It has one of the following forms:

{ scriptElement:element, markupLocation:location }
This script belongs to the <script> element element whose content appears inline at location. Element is a DOM element object; location is a markup location value.
{ scriptElement:element, markupLocation:location, url:url }
As above, but for script elements whose src attribute refers to an external script resource. Url is the absolute form of the URL given by the src attribute.
{ scriptElement:element, scriptLocation:location }
This script belongs to the dynamically constructed <script> element element, whose contents were assigned to it at location. Script elements created by createElement or similar functions use this form. Element is a DOM element object, and location is a script location value.
{ eventHandler:element, attribute:attribute, markupLocation:location }
This script is the event handler content attribute attribute of element, appearing in markup at location. By 'event handler content attribute', we mean a bit of JavaScript code appearing in markup as the value of an element attribute. Element is a DOM element; attribute is the name of the event handler attribute, a string; and location is the location of the element's attribute, a markup location.
{ eventHandler:element, attribute:attribute, scriptLocation:location }
As above, except that the handler script was assigned to element's event handler IDL attribute attribute by JavaScript code at location, a script location. This covers both JavaScript assignments to element properties (like element.property = script) and calls to DOM methods that manipulate element attributes (like element.setAttribute("attribute", script)).
{ evaluated:function, scriptLocation:location }
The call at location to function produced this script. Location is a script location. Function is a string, naming the function called to evaluate or compile the script. Common values for function might be:
  • "eval", referring to the global object's eval property
  • "Function", referring to the Function constructor
  • "setTimeout", referring to the HTML5 setTimeout function

{ evaluated:function, scriptLocation:location, url:url }
As above, where function loaded this script from url. This is used for functions like the Web Workers API's importScripts. Url is the absolute form of the URL from which the script was loaded, a string.

(Ideally, we would provide a way for cooperative custom content module loaders (the sort implemented using XMLHttpRequest and eval) to construct their own script origin values like this for the scripts they pass to eval, whose location values referred to the point at which they were called. Of course, telling a function the point from which it was called has security repercussions, so this would need to be handled carefully.)

{ javascriptURL:url }
Retrieving the javascript: URL url created this script. Url is a string.

Usually, the code in 'javascript:' URLs is so ephemeral that debuggers won't come across it, but it is possible for such code to live longer. For example, the effect of visiting a URL like:

 javascript:g=function(){return"look%20on%20my%20works,%20ye%20mighty,%20and%20despair";};void(0)

is to create a function g on window (the page's global object) whose source can only reasonably be attributed to the javascript: URL.

Note that the result of evaluating a javascript: URL may itself be taken to be markup or JavaScript source code. This origin refers to the code in the javascript: URL itself, not code produced by dereferencing such a URL.

Any script origin value may also have a property named source, whose value is the original source code of the script.
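A consumer might tell the script origin forms listed above apart by checking which properties are present. The following is only a sketch: the function name describeScriptOrigin and the strings it returns are hypothetical, and plain objects stand in for DOM elements.

```javascript
// Hypothetical classifier (not part of the draft): name the form of a
// script origin value by looking at which properties it carries.
function describeScriptOrigin(origin) {
  if ("javascriptURL" in origin) return "javascript: URL";
  if ("evaluated" in origin) {
    return "url" in origin
      ? "script loaded by " + origin.evaluated
      : "script evaluated by " + origin.evaluated;
  }
  if ("scriptElement" in origin) {
    if ("scriptLocation" in origin) return "dynamically constructed <script>";
    return "url" in origin ? "external <script>" : "inline <script>";
  }
  if ("eventHandler" in origin) {
    return "scriptLocation" in origin
      ? "event handler assigned by script"
      : "event handler in markup";
  }
  throw new TypeError("not a script origin value");
}

const page = { browsingContext: {} }; // stand-in for a DOM window
const inline = { scriptElement: {}, markupLocation: { line: 10, origin: page } };
const evaled = { evaluated: "eval", scriptLocation: { line: 2, origin: inline } };

console.log(describeScriptOrigin(inline)); // prints "inline <script>"
console.log(describeScriptOrigin(evaled)); // prints "script evaluated by eval"
```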

Markup origin values

A markup origin value describes the origin of a particular piece of markup text (HTML; XHTML; and so on). A markup origin value has one of the following forms:

{ browsingContext:window }
This describes a top-level browsing context whose window is window, a DOM window object. From the window you can find the URL being visited, and the parent browsing context, if any.
{ dynamicMarkup:node, method:method, scriptLocation:location }
The call at location to node's method named method inserted this markup. This form is used for calls to document.write and similar functions. Node may be a DOM document or element; method is a string; and location is a script location.
{ dynamicMarkup:node, attribute:attribute, scriptLocation:location }
The assignment at location to node's attribute named attribute inserted this markup. This form is used for assignments to properties like innerHTML. Node may be a DOM document or element; attribute is a string; and location is a script location.

Any markup origin value may have a property named source, whose value is the markup text.
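Because each script or markup origin refers to a location, and each location refers to another origin, a consumer can follow the chain back to where loading began. The sketch below assumes the property names given in the forms above. The function originTrail is hypothetical, and plain objects stand in for DOM nodes and windows.

```javascript
// Hypothetical helper (not part of the draft): collect the chain of origins
// from a given origin back to the one that has no enclosing location
// (a browsingContext or javascriptURL origin in the forms described above).
function originTrail(origin) {
  const trail = [];
  while (origin) {
    trail.push(origin);
    // Each form carries at most one of markupLocation or scriptLocation.
    const next = origin.markupLocation || origin.scriptLocation;
    origin = next ? next.origin : undefined;
  }
  return trail;
}

const pageOrigin = { browsingContext: {} }; // stand-in for a DOM window
const inlineScript = {
  scriptElement: {},
  markupLocation: { line: 10, origin: pageOrigin }
};
const writtenMarkup = {
  dynamicMarkup: {}, method: "write",
  scriptLocation: { line: 2, origin: inlineScript }
};
const handler = {
  eventHandler: {}, attribute: "onclick",
  markupLocation: { line: 0, origin: writtenMarkup }
};
const evaluatedCode = {
  evaluated: "eval",
  scriptLocation: { line: 0, origin: handler }
};

const trail = originTrail(evaluatedCode);
console.log(trail.length); // prints 5
```

Here code passed to eval by an event handler is traced through the markup that document.write inserted, the inline script that made the call, and finally the page itself.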

Examples

distinction between javascript: URL content and retrieved resource content

The origin value prototype

The prototypes of origin values and location values hold the following methods:

toString()
Format the location or origin value as a human-readable string. In English. [Author is crushed by gigantic Monty Python-esque weight labeled "localization"]
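The draft does not say what these strings look like. Purely as an illustration, one could imagine shared prototypes along the following lines; the names originProto, locationProto, and withProto and the output format are all assumptions, and only a few origin forms are handled.

```javascript
// Hypothetical prototypes (the draft does not define the string format).
const originProto = {
  toString() {
    if (this.javascriptURL !== undefined) return this.javascriptURL;
    if (this.url !== undefined) return this.url;
    if (this.evaluated !== undefined) {
      // String concatenation invokes the location's own toString.
      return this.evaluated + " code from " + this.scriptLocation;
    }
    return "[origin]"; // remaining forms are left out of this sketch
  }
};

const locationProto = {
  toString() {
    let text = String(this.origin);
    if (this.line !== undefined) {
      text += ":" + this.line;
      if (this.column !== undefined) text += ":" + this.column;
    }
    return text;
  }
};

// Create a value that inherits from the given prototype.
function withProto(proto, props) {
  return Object.assign(Object.create(proto), props);
}

const externalScript = withProto(originProto, {
  scriptElement: {}, // stand-in for a DOM element
  markupLocation: withProto(locationProto, {
    line: 10,
    origin: withProto(originProto, { browsingContext: {} })
  }),
  url: "http://example.com/a.js"
});
const callSite = withProto(locationProto, {
  origin: externalScript, line: 4, column: 2
});
const evalOrigin = withProto(originProto, {
  evaluated: "eval", scriptLocation: callSite
});

console.log(String(callSite));   // prints "http://example.com/a.js:4:2"
console.log(String(evalOrigin)); // prints "eval code from http://example.com/a.js:4:2"
```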

The Debug.Script.prototype.origin accessor

The origin property of a Debug.Script is an origin value describing how the given script was loaded into its browsing context.

Open items

  • Might be nice to have a lazy script location object that initially just holds the JSScript and PC (cheap to construct, because it avoids consulting the source map to get the line number), but can look up (and memoize) the origin/line on demand. These could hold a weak reference to the JSScript, such that, just before the JSScript goes away, we do the (JSScript, PC) -> (origin/line) computation. This saves a source map lookup when the location is never actually used and the JSScript outlives the lazy script location object.
  • XDR serialization should just preserve this information; it will need to throw away DOM element references.
  • Need to be able to pass an origin to eval explicitly (and thus, need the prototype to be public)
  • It's possible to provide more details about javascript: URLs: where does the URL appear? Who put it there? But it doesn't seem like it should be a high priority.
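The lazy script location idea in the first open item could be sketched as follows. This models only the memoization, not the weak reference to the JSScript. The class LazyScriptLocation and its lookup callback are hypothetical, and the script and pc values are placeholders for engine internals.

```javascript
// Hypothetical sketch: hold (script, pc) cheaply and compute the
// origin/line pair only on first use, then remember the result.
class LazyScriptLocation {
  constructor(script, pc, lookup) {
    this._script = script;
    this._pc = pc;
    this._lookup = lookup; // stands in for the source map consultation
    this._resolved = null;
  }

  _resolve() {
    if (this._resolved === null) {
      this._resolved = this._lookup(this._script, this._pc);
      // The script and lookup function are no longer needed once resolved.
      this._script = null;
      this._lookup = null;
    }
    return this._resolved;
  }

  get origin() { return this._resolve().origin; }
  get line() { return this._resolve().line; }
}

let lookups = 0;
const fakeScript = { name: "placeholder for a JSScript" };
const fakeOrigin = { javascriptURL: "javascript:void(0)" };
const lazy = new LazyScriptLocation(fakeScript, 42, (script, pc) => {
  lookups++;
  return { origin: fakeOrigin, line: 7 };
});

const before = lookups;   // nothing has been looked up yet
const line = lazy.line;   // the first access performs the lookup
const origin = lazy.origin; // later accesses reuse the stored result
console.log(before, lookups, line); // prints "0 1 7"
```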

jorendorff says:

  • This means every JSScript keeps the DOM elements relevant to its origin gc-alive, right? It seems like the kind of awful web site where everything is on a single page no matter what the user does, and all content is dynamically loaded, might produce really long origin chains basically documenting the user's path through the site and perhaps preventing GC from collecting anything. But maybe not. We can just try it and see how it goes.
  • It looks like anyone using this information would be getting important details from the DOM. Are we sure that navigation or DOM mutation won't cause the resulting information to be wrong or horribly misleading? Are the browsingContext examples immune to this? What about script elements being adoptNoded into other documents?
  • If it's no pain for a javascript: URL origin to say how we got there (for example, if it's as simple as calling getCurrentLocation() at the point where we compile that little script), we should do it.
  • It seems like the sensible way to hook this up to the Debug object is via something like Debug.Frame.prototype.location, but then we would want to provide Debug.Objects (or at least x-ray wrappers), not actual DOM nodes.
  • Also w.r.t. debuggers, is it possible for a debugger to want to know where something happened, and for the answer to contain non-debuggee information, such as chrome: URLs and line numbers?