DOM/XPath Generator

From MozillaWiki

This page proposes an XPathGenerator global constructor function for returning an XPath string leading from the context node to the target node. Each instance would be an XPCOM component, providing features for modifying the desired XPath value, and a namespace resolver which corresponds to the XPath.

A consumer might use the code in the following manner:

var generator = new XPathGenerator();
generator.searchFlags |= generator.GO_TO_DOCUMENT_ROOT;
generator.includeAttributeNS("http://www.w3.org/1999/xlink", "href", true);
var xpath = generator.generateXPath(targetNode, contextNode);


Potential Use Cases

Currently, mozilla.org code uses similar functionality in at least one place:

http://lxr.mozilla.org/seamonkey/source/content/base/src/nsContentUtils.cpp#1567 GenerateStateKey

Other sources:

https://bugzilla.mozilla.org/show_bug.cgi?id=208500 Copy XPath of node in DOM Inspector (not implemented yet)

(From WeirdAl's weblog) http://xpath.alephzarro.com/index XPather, adds XPath support to Firefox and DOM Inspector

(From WeirdAl's weblog) http://annozilla.mozdev.org/ "My annozilla extension is a client for the W3C's Annotea server, which uses XPointers. I guess that XPointer generation/resolution would be easier with build-in XPath support." -- Matthew Wilson

(From the implementation bug) "This kind of thing would be useful for http://www.melez.com/mykzilla/2006/01/son-of-live-bookmarks.html, too. (Oops, ok, pornzilla, too.) Both of which only make sense if we serialize both the expression and the namespace resolver (assuming that we need to handle XML to be future proof)." -- Alex Hecht

Verbosio would be able to translate an XPath for a document it's editing into an XPath for a XUL tree (a la DOM Inspector) representing the document's structure. This would be done through simple text replacements. Thus, a node found in the master document would easily translate to a XUL tree item.

Shane Caraveo at ActiveState is working to develop a XUL unit test extension that would work in a similar fashion to Selenium and Selenium Recorder. This consists of recording DOM events and saving an XPath to each event target, so that the target can be found and used to replay the event. Exploratory code is in bug 323938.

Developers would no longer have to guess a working XPath to a node for future reference. This would reduce development time.

Proposed IDL

#include "domstubs.idl"
#include "nsIDOMXPathNSResolver.idl"

[scriptable, uuid(341d8cbe-bcb3-4d9a-a5a6-dd4ef72a402d)]
interface nsIXPathGenerator : nsISupports {
  /* A collection of bitwise flags which modify behavior.
   * 0x01000000 and flags greater are reserved for custom implementations.
   */

  // Ignore ID-type attribute nodes on elements and continue to the document or context node.
  const unsigned long GO_TO_DOCUMENT_ROOT           = 0x00000001;

  // Return expression containing a single step that uses the descendant axis.
  const unsigned long USE_DESCENDANT_AXIS           = 0x00000002;

  /**
   * Flags which modify the parameters used to generate the xpath string.
   */
  attribute unsigned long searchFlags;

  /**
   * If an attribute is present at a particular element's step, include it.
   *
   * @param namespaceURI     The namespace URI of the attribute.
   * @param localName        The local name of the attribute.
   * @param includeAttrValue Include the attribute value as well.
   */
  void includeAttributeNS(in DOMString namespaceURI,
                          in DOMString localName,
                          in boolean includeAttrValue);

  /**
   * Stop including any attributes with this namespace URI and local name.
   *
   * @param namespaceURI     The namespace URI of the attribute.
   * @param localName        The local name of the attribute.
   */
  void excludeAttributeNS(in DOMString namespaceURI,
                          in DOMString localName);

  /**
   * Clear list of attributes included in returned xpath.
   */
  void clearAttributes();

  /**
   * Namespace resolver corresponding to all generated xpaths.
   */
  readonly attribute nsIDOMXPathNSResolver resolver;

  /**
   * Add a namespace URI and prefix to the namespace resolver.
   *
   * @param namespaceURI     The namespace URI of the namespace.
   * @param prefix           The prefix of the namespace.
   */
  void addNamespace(in DOMString namespaceURI,
                    in DOMString prefix);

  /**
   * Generate an XPath as a string.
   *
   * @param targetNode  The node our xpath ends at.
   * @param contextNode The node our xpath starts from.  If null, use targetNode's owner document.
   *
   * @return DOMString XPath from the context node to the target node.
   */
  DOMString generateXPath(in nsIDOMNode targetNode,
                          in nsIDOMNode contextNode);

  /**
   * Generate an XPointer as a string.
   *
   * @param targetNode  The node our XPointer ends at.
   * @param contextNode The node our XPointer starts from.  If null, use targetNode's owner document.
   *
   * @return DOMString XPointer from the context node to the target node.
   */
  DOMString generateXPointer(in nsIDOMNode targetNode,
                             in nsIDOMNode contextNode);
};
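
As a rough illustration of what generateXPath() might return in the simplest case (no flags set, no attributes included, no namespaces), the sketch below walks from the target node up to the context node and emits one positional step per element. The names simpleXPath and positionAmongSiblings are hypothetical and are not part of the proposed interface or of the patch in the implementation bug. The sketch assumes the context node, when given, is an ancestor of the target.

```javascript
// Hypothetical helper: 1-based position of an element among the preceding
// sibling elements that share its node name.
function positionAmongSiblings(node) {
  var position = 1;
  for (var sib = node.previousSibling; sib; sib = sib.previousSibling) {
    if (sib.nodeType === 1 && sib.nodeName === node.nodeName) {
      position++;
    }
  }
  return position;
}

// Hypothetical helper: build a purely positional path such as
// "/html[1]/body[1]/p[2]".  Only element nodes (nodeType 1) are handled.
function simpleXPath(targetNode, contextNode) {
  var steps = [];
  for (var node = targetNode;
       node && node !== contextNode && node.nodeType === 1;
       node = node.parentNode) {
    steps.unshift(node.nodeName + "[" + positionAmongSiblings(node) + "]");
  }
  if (contextNode) {
    // Relative to the context node; "." means target and context coincide.
    return steps.length ? steps.join("/") : ".";
  }
  // No context node: the path is absolute, starting at the document.
  return "/" + steps.join("/");
}
```

A real implementation would also have to deal with ID-type attributes, namespaces, and non-element targets, all of which this sketch leaves out.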

API Notes

The resolver of each XPathGenerator object would not be a standard document.createNSResolver(node) object. Instead, the XPathGenerator object will create and maintain the resolver independently. This will allow the generator to add additional namespaces transparently and harmlessly as needed.
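
To make the idea of an independently maintained resolver concrete, here is a minimal sketch of one. GeneratorResolver, prefixFor, and the "ns0", "ns1" naming scheme are assumptions made for illustration; only lookupNamespaceURI (the method on nsIDOMXPathNSResolver) and addNamespace (from the proposed IDL) come from existing or proposed interfaces.

```javascript
// Hypothetical stand-in for the resolver a generator would own.
function GeneratorResolver() {
  this.prefixToURI = Object.create(null); // prefix -> namespace URI
  this.nextAutoPrefix = 0;                // counter for invented prefixes
}

// Mirrors the proposed addNamespace(namespaceURI, prefix).
GeneratorResolver.prototype.addNamespace = function (namespaceURI, prefix) {
  this.prefixToURI[prefix] = namespaceURI;
};

// Same method name as on nsIDOMXPathNSResolver; null for unknown prefixes.
GeneratorResolver.prototype.lookupNamespaceURI = function (prefix) {
  return prefix in this.prefixToURI ? this.prefixToURI[prefix] : null;
};

// Hypothetical: return a prefix for the URI, inventing an unused one
// ("ns0", "ns1", ...) when the URI has not been registered yet.
GeneratorResolver.prototype.prefixFor = function (namespaceURI) {
  for (var prefix in this.prefixToURI) {
    if (this.prefixToURI[prefix] === namespaceURI) {
      return prefix;
    }
  }
  var invented;
  do {
    invented = "ns" + this.nextAutoPrefix++;
  } while (this.lookupNamespaceURI(invented) !== null);
  this.addNamespace(namespaceURI, invented);
  return invented;
};
```

Because the generator controls the map, it can invent a prefix whenever it meets a namespace the consumer never registered, without disturbing prefixes the consumer added explicitly.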

Certain consumers may wish to have attributes included in the XPath (for example, /a[@href='foo.html'] or /a[@href][2] instead of /a[5]). The includeAttributeNS() method exists for this purpose. Also, as consumers are generally expected to reuse the same generator, excludeAttributeNS() allows consumers to exclude an attribute previously included, and clearAttributes() excludes all attributes currently included.
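
The following sketch shows one way a single step could be assembled from the list of included attributes. buildStep, quoteXPathString, and the shape of the "included" entries are hypothetical; note that XPath tests attributes with the @ abbreviation, and that XPath 1.0 string literals have no escape syntax, so a value containing both quote characters has to be assembled with concat().

```javascript
// Hypothetical helper: turn a string into an XPath 1.0 string literal.
function quoteXPathString(value) {
  if (value.indexOf("'") === -1) {
    return "'" + value + "'";
  }
  if (value.indexOf('"') === -1) {
    return '"' + value + '"';
  }
  // Both quote characters are present: split on ' and glue with concat().
  return "concat('" + value.split("'").join("', \"'\", '") + "')";
}

// Hypothetical helper: build one step.
//   qname      - element name as it should appear in the path
//   attributes - plain object mapping attribute names to values
//   included   - array of { qname, includeValue } entries, standing in for
//                what includeAttributeNS() would have recorded
function buildStep(qname, attributes, included) {
  var step = qname;
  included.forEach(function (entry) {
    if (!Object.prototype.hasOwnProperty.call(attributes, entry.qname)) {
      return; // attribute absent on this element: add no predicate
    }
    step += "[@" + entry.qname;
    if (entry.includeValue) {
      step += "=" + quoteXPathString(attributes[entry.qname]);
    }
    step += "]";
  });
  return step;
}
```

For example, buildStep("a", { href: "foo.html" }, [{ qname: "href", includeValue: true }]) yields a[@href='foo.html'], and the same call with includeValue set to false yields a[@href]. A positional predicate would still be needed whenever several siblings match.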

The bitwise flags are currently largely undefined, with the intent being for developers to add flags as needed. For instance, if a developer wants to ignore anonymous-to-non-anonymous content boundaries, they can add a flag for it, implement it, and offer a patch. While a patch awaits review or if it has been rejected, the upper 8 bits of the searchFlags number may be used privately for the same feature. The official XPathGenerator implementation must not use any of these eight flags for any purpose.
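
A short sketch of how the public and private flag ranges could be told apart. The two low flags are copied from the proposed IDL; PRIVATE_FLAGS_MASK, MY_IGNORE_ANONYMOUS_BOUNDARIES, hasFlag, and privateFlags are hypothetical names used only for illustration.

```javascript
// Values from the proposed IDL.
var GO_TO_DOCUMENT_ROOT = 0x00000001;
var USE_DESCENDANT_AXIS = 0x00000002;

// Hypothetical: the upper 8 bits, reserved for custom implementations.
var PRIVATE_FLAGS_MASK = 0xFF000000;

// Hypothetical private flag, usable while a patch awaits review.
var MY_IGNORE_ANONYMOUS_BOUNDARIES = 0x01000000;

// True when the given flag bit is set in searchFlags.
function hasFlag(searchFlags, flag) {
  return (searchFlags & flag) !== 0;
}

// Extract only the private bits.  JavaScript bitwise operators work on
// signed 32-bit integers, so ">>> 0" converts the result back to unsigned.
function privateFlags(searchFlags) {
  return (searchFlags & PRIVATE_FLAGS_MASK) >>> 0;
}
```

An official implementation could use privateFlags(searchFlags) === 0 as a check that a caller relies only on flags it knows about.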

Implementation: Bug 319768


I think we should add a generateXPointer function too, since it's trivial to implement once we have implemented generateXPath and the resolver. You just need to serialize the namespace mappings from the resolver following the XPointer xmlns() scheme and use the XPointer xpointer() scheme for the result of generateXPath. -- Peter Van der Beken
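
A sketch of the serialization described above, assuming the namespace mappings are available as a plain prefix-to-URI object. buildXPointer and escapeSchemeData are hypothetical names. Only the circumflex is escaped here (as ^^); an unbalanced parenthesis inside an XPath string literal would also need circumflex escaping, which this sketch does not attempt.

```javascript
// Hypothetical helper: escape the XPointer escape character itself.
// Unbalanced parentheses are NOT handled in this sketch.
function escapeSchemeData(data) {
  return data.replace(/\^/g, "^^");
}

// Hypothetical helper: one xmlns() pointer part per namespace mapping,
// followed by a single xpointer() part holding the generated XPath.
function buildXPointer(xpath, namespaces) {
  var parts = Object.keys(namespaces).map(function (prefix) {
    return "xmlns(" + prefix + "=" + escapeSchemeData(namespaces[prefix]) + ")";
  });
  parts.push("xpointer(" + escapeSchemeData(xpath) + ")");
  return parts.join(" ");
}
```

For instance, an XPath of /svg:svg/svg:a[@xlink:href] with svg and xlink mappings would serialize to two xmlns() parts followed by xpointer(/svg:svg/svg:a[@xlink:href]).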

It would probably be a good idea to include an addNamespace(namespaceURI, prefix) method in the IDL, to point into the resolver's namespace map. -- Alex Vincent

I'd like the ability for a caller to tell the generator that it should identify the starting node by the content it contains when calling generateXPath. We could do that by adding a searchFlag called something like IDENTIFY_START_BY_CONTENT which triggers the behavior.

When I say "identify the starting node by the content it contains", I mean use the contains() method or the like to filter nodes with the same tag name. For example, for an XPath expression starting at a DIV node whose content is "foo", the beginning of the XPath expression should be DIV[contains(.,"foo")] or some other XPath construct which identifies the node by the content it contains. -- Myk Melez
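
A sketch of how the first step might be built under such a flag. IDENTIFY_START_BY_CONTENT is only the name suggested above, and startStepByContent and quoteLiteral are hypothetical helpers. When the text contains both kinds of quote character, this sketch simply falls back to the bare tag name rather than building a concat() expression.

```javascript
// Hypothetical helper: quote text as an XPath 1.0 string literal, preferring
// double quotes.  Returns null when the text contains both quote characters.
function quoteLiteral(text) {
  if (text.indexOf('"') === -1) {
    return '"' + text + '"';
  }
  if (text.indexOf("'") === -1) {
    return "'" + text + "'";
  }
  return null;
}

// Hypothetical helper: build the starting step, filtered by contained text.
function startStepByContent(tagName, textContent) {
  var literal = quoteLiteral(textContent);
  if (literal === null) {
    return tagName; // cannot quote safely: fall back to the unfiltered step
  }
  return tagName + "[contains(.," + literal + ")]";
}
```

Since contains() matches substrings, the step can still select more than one node; a generator would presumably need to verify uniqueness or append a positional predicate.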

XPath: Content Boundaries

(non-normative)

Mozilla's XPath implementation doesn't currently support crossing boundaries between framed content and the container frame element, or between real content and its anonymous children. Our XPath and XPathGenerator can be reasonably expected to support this in the future, so now's the best time to come up with a function syntax for doing this.

Suggested new XPath functions:

  • From element in framed document to the container iframe:
    • XPath 1: "frameElement(.)" (Alex Vincent)
    • XPath 2: "./frameElement()" (Alex Vincent)
  • From container iframe to element in the framed document:
    • XPath 1: "contentDocument(.)" (Alex Vincent)
    • XPath 2: "./contentDocument()" (Alex Vincent)
  • From anonymous content to the XBL-bound parent:
    • XPath 1: "bindingParent(.)" (Alex Vincent)
    • XPath 2: "./bindingParent()" (Alex Vincent)
  • From XBL-bound parent to first anonymous child:
    • XPath 1: "anonymousChild(.)[1]" (Alex Vincent)
    • XPath 2: "./anonymousChild()[1]" (Alex Vincent)
<sicking>	we do not want a different syntax in xpath2
<sicking>	at least i think not. at the very least we do not want to commit to it
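
For orientation, the sketch below lists the DOM lookups that each suggested function would presumably mirror. This mapping is an assumption made for illustration, not something the proposal specifies. getBindingParent() and getAnonymousNodes() are the XBL-era Mozilla document methods, and the boundaryCrossings object itself is hypothetical.

```javascript
// Hypothetical mapping from the suggested XPath functions to DOM lookups.
var boundaryCrossings = {
  // frameElement(.): from a node in a framed document to its container.
  frameElement: function (node) {
    var doc = node.nodeType === 9 ? node : node.ownerDocument;
    var win = doc.defaultView;
    return win ? win.frameElement : null;
  },

  // contentDocument(.): from a container frame to the framed document.
  contentDocument: function (frame) {
    return frame.contentDocument || null;
  },

  // bindingParent(.): from anonymous content to the XBL-bound parent.
  bindingParent: function (node) {
    return node.ownerDocument.getBindingParent(node);
  },

  // anonymousChild(.): all anonymous children of an XBL-bound parent, so
  // that a positional predicate such as [1] can pick one of them.
  anonymousChild: function (node) {
    var list = node.ownerDocument.getAnonymousNodes(node);
    return list ? Array.prototype.slice.call(list) : [];
  }
};
```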

Whatever you think is a better syntax, please suggest it here. All suggestions are welcome.

Implementation: bug 326745