Javascript:SpiderMonkey:PropertyElementStorage

THIS PAGE IS OBSOLETE
This article is, in part or in its entirety, outdated. The information presented on this page may therefore be incorrect and should be treated with due caution. Visit SpiderMonkey.dev for more up-to-date information.

Objective

Summary: Store and represent properties of objects differently, depending on whether the property name is a string containing an unsigned 32-bit number (indexed: an "element") or is not such a string (non-indexed: a "property").

Currently we represent all properties of an object through a single mechanism. Every property is represented either through the shape_ field of the object, or through class/objectops hooks that are passed a jsid. There's no API separation between properties that are indexed by a uint32_t ("0", "17", "4294967295") and properties that are named ("baz", "-1", "-0", "4294967296"). And for objects which represent properties using shape_, all properties of any kind (excepting low-valued indexes, in certain circumstances -- but not uniformly) are intermingled.
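To make the indexed/named distinction concrete, here is a minimal sketch of the classification, using the example strings above. The helper name toIndex is hypothetical and is not a SpiderMonkey API. The sketch assumes that only the canonical decimal spelling counts as an index, so "01" is treated as a name.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical helper (not a SpiderMonkey API): returns true and stores the
// index in *out if `s` is the canonical decimal spelling of a uint32_t.
bool toIndex(const std::string& s, uint32_t* out) {
    if (s.empty() || s.size() > 10)      // UINT32_MAX has 10 digits
        return false;
    if (s.size() > 1 && s[0] == '0')     // assumption: "01" is a name, not index 1
        return false;
    uint64_t n = 0;
    for (std::string::size_type i = 0; i < s.size(); i++) {
        char c = s[i];
        if (c < '0' || c > '9')          // rejects "baz", "-1", "-0"
            return false;
        n = n * 10 + static_cast<uint64_t>(c - '0');
    }
    if (n > UINT32_MAX)                  // rejects "4294967296"
        return false;
    *out = static_cast<uint32_t>(n);
    return true;
}

// Checks the examples given in the text.
void checkExamples() {
    uint32_t i = 0;
    assert(toIndex("0", &i) && i == 0);
    assert(toIndex("17", &i) && i == 17);
    assert(toIndex("4294967295", &i) && i == UINT32_MAX);
    assert(!toIndex("baz", &i));
    assert(!toIndex("-1", &i));
    assert(!toIndex("-0", &i));
    assert(!toIndex("4294967296", &i));
}
```

With a single mechanism, a check like this has to happen (or be cached in the id) wherever the two kinds of property need different treatment.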

But the distinction between indexed and non-indexed properties exists in various APIs implemented in JavaScript. It's present in WebIDL, implicitly in Array itself (if one excludes UINT32_MAX), implicitly in typed arrays, and elsewhere. (Not to mention that most people write code that respects this distinction -- object literals use non-indexed properties, dotted property access is inherently non-indexed, most objects get used non-indexed-ly unless they're array-likes [in which case accesses are pretty much always obviously indexed or obviously non-indexed], and so on.)

One result of this lack of separation is increased overhead for accessing all properties. Any object whose properties split this way, that uses class/objectops hooks, incurs extra overhead to differentiate the two cases. A jsid might be either an index or not, and that distinction must be checked before indexed or non-indexed behavior can occur. (Unless certain hacks are done, but those are pretty tricky to do, and they're one-off each time.)

Another result is that for objects where we've manually worked around this extra overhead -- for example, typed arrays -- we usually have to give up on representing all properties. (This is currently the case for typed arrays, which cannot have extra properties added to them. It was historically the case for dense arrays, too, before recent changes to make an initial sequence of indexed properties be stored more compactly, in some cases.)

A last nicety is that this all makes implementing a non-writable length property on arrays much easier.
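As a rough illustration of why the split helps here: once element definitions go through their own path, the non-writable length check is a single comparison in one place. The sketch below is hypothetical and heavily simplified (dense int elements only, all names invented for illustration). It follows the ES5 rule that defining an element at or beyond the current length fails when length is non-writable.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, heavily simplified array: dense int elements only.
struct ArraySketch {
    std::vector<int> elements;   // always holds exactly `length` entries
    uint32_t length;
    bool lengthWritable;

    ArraySketch() : length(0), lengthWritable(true) {}

    // Element-only define path. It only ever sees indexes, so the
    // non-writable length check lives here and nowhere else.
    bool defineElement(uint32_t index, int value) {
        if (index == UINT32_MAX)
            return false;                // Array excludes UINT32_MAX as an index
        if (index >= length) {
            if (!lengthWritable)
                return false;            // growing would have to change length
            length = index + 1;
            elements.resize(length, 0);  // holes become 0 in this sketch
        }
        elements[index] = value;
        return true;
    }
};

// Usage: freezing length still allows in-range writes, but not growth.
void demoNonWritableLength() {
    ArraySketch a;
    assert(a.defineElement(2, 7));       // length becomes 3
    a.lengthWritable = false;
    assert(a.defineElement(1, 5));       // in range: allowed
    assert(!a.defineElement(3, 9));      // would grow: refused
    assert(a.length == 3);
}
```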

People

Accountable: Naveed
Responsible: Waldo
Consulted:
Informed: Product Marketing

Steps

Property/element splitting

Remove resolve flags

Resolve flags stand in the way of making our property-access APIs close enough to the ECMAScript internal operations to cleanly implement the split. The only remaining flag -- many have been removed since mid-2012 -- is JSRESOLVE_ASSIGNING.

Time: M weeks

  1. Remove all tests of flags & JSRESOLVE_ASSIGNING in all files - ???
    • dom/base/nsDOMClassInfo.cpp
      • Fix the test in resolving readonly-replaceable properties - Waldo, 1 day?
        • A moderate hackaround might be easy. Not sure.
      • Fix the test to optimize fast expandos on window - bug 823227, bz, 1 day
      • Fix the test in nsNamedArraySH::NewResolve - a day?
        • Suss out exactly how this code gets called, write a test for new behavior if there's a change, done. Likely simple.
      • Fix the test in nsHTMLDocumentSH::NewResolve - Waldo, 1 day?
        • This might just be totally removable -- unclear.
      • Fix the test in nsHTMLFormElementSH::NewResolve - Waldo, 1 day?
        • This might just be totally removable. Unclear.
    • dom/bindings/Codegen.py
      • This probably requires work from DOM bindings people, to implement set hooks rather than relying on getPropertyDescriptor to implement assignish behavior. Not sure how much time it'll take.
    • js/src/shell/js.cpp - Waldo, ~no time
    • js/src/jswrapper.cpp - bug 836301, bholley - DONE
    • js/xpconnect/wrappers/XrayWrapper.cpp - bug 836301, bholley - DONE
  2. Remove JSRESOLVE_ASSIGNING completely - Waldo, ~no time

When all flags have been removed, we can either pass 0 everywhere or remove all flags arguments. We should certainly do the latter at some point (and remove all the code that existed only so that flags could be used), but it's irrelevant to forward progress on property/element work. Probably a good mind-is-mush-need-a-break task.

Implement the property key stuff in ES6

ES6 property names aren't just strings: they're property keys produced by the ToPropertyKey spec operation, and are either strings or ES6 symbols. It would be a very clean split to implement property keys with indexes as a third kind of property key. This provides nice typing at the underlying API boundary, and it enables a high-level type for undistinguished property accesses that straightforwardly decomposes into the more specific types.
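A minimal sketch of what such a three-way key type could look like follows. All names here (PropertyKeySketch, fromIndex, describe, and so on) are hypothetical. The real class is planned to wrap a Value, whereas this sketch uses plain fields and a numeric id as a stand-in for a symbol.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical sketch of a key that is exactly one of: index, name, symbol.
class PropertyKeySketch {
  public:
    static PropertyKeySketch fromIndex(uint32_t i) { return PropertyKeySketch(Index, i, ""); }
    static PropertyKeySketch fromName(const std::string& n) { return PropertyKeySketch(Name, 0, n); }
    static PropertyKeySketch fromSymbol(uint32_t id) { return PropertyKeySketch(Symbol, id, ""); }

    bool isIndex() const { return kind_ == Index; }
    bool isName() const { return kind_ == Name; }
    bool isSymbol() const { return kind_ == Symbol; }

    // The as* accessors assert that the key has the requested kind.
    uint32_t asIndex() const { assert(isIndex()); return number_; }
    const std::string& asName() const { assert(isName()); return name_; }
    uint32_t asSymbol() const { assert(isSymbol()); return number_; }

  private:
    enum Kind { Index, Name, Symbol };
    PropertyKeySketch(Kind k, uint32_t n, const std::string& s)
      : kind_(k), number_(n), name_(s) {}

    Kind kind_;
    uint32_t number_;   // index value, or stand-in symbol id
    std::string name_;  // only meaningful for names
};

// An undistinguished access decomposes once, at the boundary, into the
// more specific cases; code below this point never re-checks the kind.
std::string describe(const PropertyKeySketch& key) {
    if (key.isIndex())
        return "element " + std::to_string(key.asIndex());
    if (key.isName())
        return "property " + key.asName();
    return "symbol #" + std::to_string(key.asSymbol());
}
```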

ES6 symbols aren't specified yet. We can carve out the API space for them underneath property keys, as currently-dead code -- maybe add a super-stupid symbol-like creator function to the shell if we want to exercise them.

Fully implementing property keys requires removing E4X SpecialIds. Thus landing of some parts of this work depends on E4X removal.

Time: M weeks

  1. Implement PropertyKey as a class containing a Value, with is* and as*, with index/name/symbol subclasses and accessors for PropertyName*/uint32_t/symbol - bug 837773, 2 days - mostly DONE
    • This should be pretty simple to do in terms of Value's existing interface.
  2. Implement JS::ToPropertyKey and JSAPI entry points that take PropertyKey - bug 837773 and ???, 2 days
    • This provides a clean, long-lived way (modulo ES6 changes, but we'll roll with them) for embedders to access spec functionality.
  3. Switch the ObjectOps method signatures to take handles to the relevant PropertyKey subclasses, rather than what they take now - 2 days
    • This is straightforward enough, though it depends on E4X removal, and there are a lot of implementations of these methods spread across many files (not easily searched for).
  4. Make shapes use PropertyKey instead of jsid - 1 week?
    • This involves changing the underlying field types, the methods used to expose the id, and so on.
    • Fallout from this in other code may sweep fairly wide.
  5. Make the baseops::* methods take PropertyKey handles - 1 week?
    • Perhaps the ugliest part of all this, because of the significant complexity amongst all these methods in their current forms.

Meta-object protocol changes

Our internal meta-object protocol, as represented in ObjectOps, is quite dissimilar to the ECMAScript one. That one is formulated in terms of own properties throughout, and in terms of property descriptor objects. Ours is formulated in terms of property lookups, property values, and attributes accessed through attribute-accessing methods, and it lacks descriptors entirely. Our MOP also requires reimplementation of the property lookup process (in the start object, along the prototype chain, etc.) in several places.

Changing the underlying structure, and doing it in an obviously correct way, requires converting our MOP to one more like ES6's. Ours will almost certainly be a superset in specific areas -- property descriptors must be able to represent PropertyOp and StrictPropertyOp, for now -- but the idioms should be obviously parallel.
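The shape of an own-property-based MOP can be sketched as follows. This is a simplified illustration, not the proposed API: every name is hypothetical, values are ints, and accessors, extensibility, and PropertyOp/StrictPropertyOp are ignored. The point is that objects override only own-property hooks, and the prototype-chain walk is written once.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical data-only descriptor.
struct DescriptorSketch {
    int value;
    bool writable;
};

struct ObjectSketch {
    std::map<std::string, DescriptorSketch> own;
    ObjectSketch* proto;

    ObjectSketch() : proto(nullptr) {}
    virtual ~ObjectSketch() {}

    // Own-property hooks: the only things an unusual object overrides.
    virtual bool getOwnProperty(const std::string& key, DescriptorSketch* desc) const {
        std::map<std::string, DescriptorSketch>::const_iterator it = own.find(key);
        if (it == own.end())
            return false;
        *desc = it->second;
        return true;
    }
    virtual bool defineOwnProperty(const std::string& key, const DescriptorSketch& desc) {
        own[key] = desc;
        return true;
    }
};

// An object with a synthetic, non-writable own property.
struct AnswerObject : ObjectSketch {
    bool getOwnProperty(const std::string& key, DescriptorSketch* desc) const override {
        if (key == "answer") {
            desc->value = 42;
            desc->writable = false;
            return true;
        }
        return ObjectSketch::getOwnProperty(key, desc);
    }
};

// Generic get: the prototype walk lives here, once, for every object kind.
bool getProperty(const ObjectSketch* obj, const std::string& key, int* out) {
    for (; obj; obj = obj->proto) {
        DescriptorSketch d;
        if (obj->getOwnProperty(key, &d)) {
            *out = d.value;
            return true;
        }
    }
    return false;
}

// Generic set: refuse if the nearest property on the chain is non-writable,
// otherwise define the property on the receiver.
bool setProperty(ObjectSketch* receiver, const std::string& key, int value) {
    for (const ObjectSketch* o = receiver; o; o = o->proto) {
        DescriptorSketch d;
        if (o->getOwnProperty(key, &d)) {
            if (!d.writable)
                return false;
            break;
        }
    }
    DescriptorSketch fresh = { value, true };
    return receiver->defineOwnProperty(key, fresh);
}
```

Here AnswerObject never reimplements lookup along the prototype chain, which is the duplication the current MOP forces.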

This also has benefits for the DOM bindings people, who have implemented WebIDL bindings using our current setup and have ignored the issues our current MOP doesn't let them address.

Time: ???

  • Remove lookup*
  • define* meta-op
  • get* meta-op
  • Remove getElementIfPresent
  • set* meta-op
  • Remove get*Attributes and set*Attributes
  • delete* meta-op
  • Adding more ops if necessary

Sparse elements

Properties already have a storage representation. Elements, once split out, will have one when they're dense, but they still need one for when they're sparse.

V8 uses the exact same representation for sparse elements as for properties -- just a difference in a template parameter. Possibly we could do this too. Unfortunately our shape representation is quite complex, and its internals are intricately tied to the rest of the object representation, to type inference, and to other code. Possibly this could be disentangled, but I'm not sure how long that would take. If we didn't disentangle, and instead just used (say) a HashMap, it would still take a bit of time, though perhaps less.
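For a feel of the simpler option, here is a minimal sketch of dense storage backed by a hash map for sparse elements. It uses std::unordered_map as a stand-in for a HashMap, stores ints instead of Values, and ignores attributes and holes. The class name and methods are hypothetical, and the policy of absorbing sparse entries once they become contiguous is just one assumption about how the two stores might interact.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical element storage: a dense prefix plus a sparse hash map.
class ElementsSketch {
  public:
    void set(uint32_t index, int value) {
        if (index < dense_.size()) {
            dense_[index] = value;           // overwrite in the dense prefix
            return;
        }
        if (index == dense_.size()) {
            dense_.push_back(value);         // extends the dense prefix
            absorbContiguous();
            return;
        }
        sparse_[index] = value;              // far from the prefix: keep sparse
    }

    bool get(uint32_t index, int* out) const {
        if (index < dense_.size()) {
            *out = dense_[index];
            return true;
        }
        std::unordered_map<uint32_t, int>::const_iterator it = sparse_.find(index);
        if (it == sparse_.end())
            return false;
        *out = it->second;
        return true;
    }

    std::size_t denseCount() const { return dense_.size(); }
    std::size_t sparseCount() const { return sparse_.size(); }

  private:
    // Move sparse entries into the dense prefix while they are adjacent to it.
    void absorbContiguous() {
        for (;;) {
            std::unordered_map<uint32_t, int>::iterator it =
                sparse_.find(static_cast<uint32_t>(dense_.size()));
            if (it == sparse_.end())
                break;
            dense_.push_back(it->second);
            sparse_.erase(it);
        }
    }

    std::vector<int> dense_;                    // indexes [0, dense_.size())
    std::unordered_map<uint32_t, int> sparse_;  // everything else
};
```

For example, setting indexes 0, 1000000, and 2 leaves one dense and two sparse entries. Setting index 1 afterwards pulls index 2 into the dense prefix.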

I don't have any good answers here, nor do I have much idea how long this should take, either way.

Time: unknown

  1. step 1 - time
    • ...
  2. step 2 - time
    • ...

Split all baseops into property/symbol and element variants

Basically this propagates the property/symbol and element distinction further downward, so that the element methods are clearly distinguished and ready to be rewritten. This has been ongoing for a while, but the lack of PropertyKey and the mismatch with jsid have hindered it. So this depends on the PropertyKey work being complete.

There is some overlap in the code touched by this and by the meta-object protocol changes, but the two are separate enough to proceed somewhat in parallel, with some merging/rebasing pain for the parties involved.

Time: unknown

  1. Split the baseops methods (which, given the PropertyKey work, now take a key) to take either property/symbol or element - ???
    • define meta-ops
    • get meta-ops
    • set meta-ops
    • delete meta-ops
    • other meta-ops

Type inference changes

Type inference currently associates type information with things through jsid, and it does so for (almost) all properties. It attempts to perform its own property/element splitting already: non-negative number properties (this is not the same thing as an index as referred to in this document!) are grouped together under JSID_VOID. This distinction admits more than just unsigned 32-bit integers.

The existing type inference algorithm must be changed so that it doesn't track information for elements. Tracking for elements needs to be moved into a separate location, consulted only for element access. It's also possible it'll need to be updated for whatever structure is used to represent sparse elements. There may be some applicability of the current code to sparse elements, if the property tree stuff is used to represent them, but it seems likely to be an awkward fit. Whatever happens here will depend heavily on the representation chosen for sparse elements.

Time: unknown

  1. ...
  2. ...

Issues

  • thing to consider
  • other thing to consider

Risks

  • risk 1
    • mitigating idea 1
    • mitigating idea 2
  • risk 2
    • mitigating idea 1