SVG:Data Storage: Difference between revisions

From MozillaWiki
Latest revision as of 13:08, 14 May 2006

Feel free to edit this document as heavily or lightly as you want, but please try to keep comments explicitly denoted as having been made by you to a minimum. People are reluctant to remove such comments, and this prevents documents from evolving into a spec/final documentation because they just become an unmanageable mass of comments no one will touch. Instead try to integrate your comments into the flow of the document where possible. Do a significant rewrite if necessary.

Introduction

One of the ways in which the SVG specification stands out from other W3C XML specifications is the extent to which it crams lots of data into attributes. For example, it's not uncommon to see an SVG file containing a <path> tag with a 'd' attribute (for the path data points) similar to the following:

<path d="M600,350 l 50,-25 
         a25,25 -30 0,1 50,-25 l 50,-25 
         a25,50 -30 0,1 50,-25 l 50,-25 
         a25,75 -30 0,1 50,-25 l 50,-25 
         a25,100 -30 0,1 50,-25 l 50,-25">


Having so much data crammed into attributes makes it difficult to programmatically access and change much of the data in an SVG graphic using get/setAttribute. To make that easier, almost all SVG attributes (data-crammed or otherwise) are mirrored by multi-level trees of objects in the SVG DOM which contain typed data corresponding to the attributes' values. This eliminates the need for scripters and others to write their own parsing/serializing code for attribute values. However, it presents challenges to implementers, who must provide these object-heavy interfaces while minimizing memory use and maximizing rendering speed.

The problem this document aims to address is how and where to store the typed data.

Current Implementation Strategy

The strategy currently employed in Mozilla's SVG implementation is to implement the object-heavy SVG DOM interfaces as full objects and store the typed data used by the internal code on those objects. For each attribute that has meaning on a particular SVG element there is always a corresponding DOM object tree in memory, even when the attribute isn't set on the element in the SVG markup. Because these trees always exist, the internal code can obtain typed data (often default values) from them whenever it is required.

The strategy currently used is the most obvious and straightforward one (and it has several non-obvious advantages), but it is inherently too memory intensive, for three reasons: there is a profusion of objects (e.g., three objects for each SVGAnimatedLength), the objects in our implementation are XPCOM objects implementing multiple interfaces, and objects exist even for non-existent attributes.

We need to come up with a new implementation strategy to drastically reduce our excessive memory consumption while allowing for fast rendering and declarative animation. Of course each strategy has significant implementation problems of its own. The rest of this document describes possible strategies and the problems we need to solve.

Alternative Implementation Strategies

The following strategies assume that parsing data from attribute values every time it's needed would be too expensive. Therefore we will continue to store typed data mirroring SVG attributes in some way.

The strategies below also assume we will use tearoffs to implement the object-heavy SVG DOM element interfaces.

There are two different strategies: one for attributes which are usually present on a particular kind of SVG element, and one for attributes which are usually not present.

Strategy A: Frequently Present Attributes

Store the typed data directly in the content element object as a field member (NOT a pointer or reference) that is not reference counted. For example, animated lengths can be stored in about 8 bytes (a float, plus some units and other metadata) in the common case where animation is not being used.

Pros.

  • The typed data is always there for the internal code to access. Using flags (including null pointers), the internal code may not need to have so much knowledge about default values or have much branching code to handle whether the attribute is present or not.
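
As a rough illustration of Strategy A, the sketch below stores a length by value inside a content element. All names here (InlineLength, LengthUnit, RectElement) are hypothetical and are not taken from the Mozilla source; the 8-byte size check only holds on typical platforms where float is 4 bytes.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical unit codes; the real SVG unit list is longer.
enum class LengthUnit : std::uint8_t { Number, Percentage, Px, Em };

// Compact typed value stored by value in the element:
// no reference count, no vtable, no separate heap allocation.
struct InlineLength {
  float value = 0.0f;                    // default value when the attribute is absent
  LengthUnit unit = LengthUnit::Number;
  bool isSet = false;                    // flag: attribute present in the markup
  bool isAnimated = false;               // flag: animation would need extra storage
};

// Assumption: 4-byte float with 4-byte alignment, as on common platforms.
static_assert(sizeof(InlineLength) <= 8, "length should fit in about 8 bytes");

// Hypothetical element whose length attributes are usually present.
class RectElement {
 public:
  void SetWidth(float v, LengthUnit u) {
    mWidth.value = v;
    mWidth.unit = u;
    mWidth.isSet = true;
  }
  // Removing the attribute resets the field to its default; nothing is freed.
  void UnsetWidth() { mWidth = InlineLength{}; }
  // Internal code reads the field directly, set or not.
  const InlineLength& Width() const { return mWidth; }

 private:
  InlineLength mX, mY, mWidth, mHeight;  // field members, not pointers
};
```

Because the field always exists, rendering code can read Width() without branching on whether the attribute was set; the isSet flag is there only for callers that need to distinguish a default from an explicit value.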

Strategy B: Infrequently Present Attributes

Store the typed data as a "property object" attached to the content element via SetProperty/GetProperty. Create the property object whenever data is needed by DOM access or by SVG rendering. Property objects can be XPCOM objects so we don't need tearoffs for them. The content element holds a strong reference to the object, and releases that reference when the content element dies. Use state bits in the content element to record whether the property is present.

Pros.

  • No storage required when the attribute is not present. Reasonably fast access to the data when required.
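
Strategy B might look something like the following sketch. It uses a std::map as a stand-in for the SetProperty/GetProperty mechanism, so the class and method names (PropElement, EnsureTransform, PeekTransform, TransformData) are assumptions made for illustration and do not reflect the real property API or its signatures.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <memory>
#include <string>

// Stand-in typed data for an infrequently present attribute.
struct TransformData {
  float a = 1, b = 0, c = 0, d = 1, e = 0, f = 0;  // identity matrix by default
};

// Hypothetical content element with a lazily populated property table.
class PropElement {
 public:
  enum StateBit : std::uint32_t { kHasTransform = 1u << 0 };

  // State bit check: avoids a table lookup when the property is absent.
  bool HasTransform() const { return (mState & kHasTransform) != 0; }

  // Create the property object on first demand (DOM access or rendering).
  TransformData& EnsureTransform() {
    if (!HasTransform()) {
      mProperties["transform"] = std::make_shared<TransformData>();
      mState |= kHasTransform;
    }
    return *mProperties["transform"];
  }

  // Read-only access that never allocates; returns nullptr when absent.
  const TransformData* PeekTransform() const {
    if (!HasTransform()) return nullptr;
    return mProperties.at("transform").get();
  }

 private:
  std::uint32_t mState = 0;
  // The element holds the strong references; they are released when it dies.
  std::map<std::string, std::shared_ptr<TransformData>> mProperties;
};
```

An element that never touches the attribute pays only for the state bit and an empty table, which is the memory saving this strategy is after.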

Tearoffs

Tearoff objects will hold strong references to content element objects, because their underlying data resides in the content element object. Tearoffs cannot copy their data because that would break consistency between DOM properties and attribute values (and besides, it's wasteful).

We may want to store tearoffs as properties of content element objects, to be sure we reuse a tearoff if the same getter is used many times.

Tearoffs retrieve/set their data by calling methods on content elements. We would like to share tearoff classes as much as possible; this may be aided by defining common value getter/setter methods in nsSVGElement. For example, we may want a common method GetBaseValue() which takes a tag parameter specifying which base value is being retrieved (e.g., tag_X). Then we can have a single tearoff class for "base values" which contains an mTag field and can be used to retrieve base lengths from all kinds of SVG elements.
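
The shared tearoff idea could be sketched as below. Only GetBaseValue() and mTag come from the text above; SetBaseValue(), LengthTag, SvgElementBase, RectLike and BaseValueTearoff are hypothetical names, and std::shared_ptr stands in for XPCOM reference counting.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical tag identifying which base value is wanted (cf. tag_X).
enum class LengthTag { X, Y, Width, Height };

// Stand-in for the common getter/setter methods on the element base class.
class SvgElementBase {
 public:
  virtual ~SvgElementBase() = default;
  virtual float GetBaseValue(LengthTag tag) const = 0;
  virtual void SetBaseValue(LengthTag tag, float v) = 0;
};

// Hypothetical element that stores its lengths inline.
class RectLike : public SvgElementBase {
 public:
  float GetBaseValue(LengthTag tag) const override { return mValues[Index(tag)]; }
  void SetBaseValue(LengthTag tag, float v) override { mValues[Index(tag)] = v; }

 private:
  static int Index(LengthTag tag) { return static_cast<int>(tag); }
  float mValues[4] = {0, 0, 0, 0};
};

// One tearoff class serves every element type and every base length.
class BaseValueTearoff {
 public:
  BaseValueTearoff(std::shared_ptr<SvgElementBase> owner, LengthTag tag)
      : mOwner(std::move(owner)), mTag(tag) {
    assert(mOwner);
  }
  // The tearoff holds no copy of the data; it always asks the element.
  float Value() const { return mOwner->GetBaseValue(mTag); }
  void SetValue(float v) { mOwner->SetBaseValue(mTag, v); }

 private:
  std::shared_ptr<SvgElementBase> mOwner;  // strong reference keeps the element alive
  LengthTag mTag;
};
```

Since the tearoff reads through to the element on every call, a change made through the element and a change made through the tearoff are always visible to each other, which is the consistency requirement described above.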

Issues to Solve

  • Notifications. Attribute changes notify the content element, so it can update inline data (A) or any present property objects (B). For updates via DOM API calls, the API implementation (in a tearoff, content element object, or freestanding XPCOM object) will have to route notifications itself. Style change notifications go through frames' DidSetStyleContext which needs to route notifications.
  • Shape-changing DOM object trees. Animated lengths are a simple example because the shape of the DOM object tree never changes; there's always an animated length object, which always has baseVal and animVal children. Path data, for example, is more complicated. What happens if JS retrieves a PathSeg and then replaces that segment in the underlying path? It depends on the SVG semantics. Perhaps the semantics are that pathsegs just have to be copied. (Otherwise, regardless of implementation, it's unclear what should happen when someone grabs a reference to a PathSeg and then changes the attribute ... how can you know "which" path segment has changed?) Exactly how this works is going to be resolved on a case-by-case basis.
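
To make the "pathsegs just have to be copied" possibility concrete, here is a minimal sketch of copy semantics. It is only one reading of the open question above, not a statement of what the SVG specification requires, and PathSeg and PathData here are simplified hypothetical types rather than the SVG DOM interfaces.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Simplified segment: a command letter plus one coordinate pair.
struct PathSeg {
  char type;
  float x;
  float y;
};

// Hypothetical typed storage for a 'd' attribute.
class PathData {
 public:
  void Append(const PathSeg& seg) { mSegs.push_back(seg); }

  // Returns a copy, so the caller never aliases the stored segment.
  PathSeg GetItem(std::size_t i) const { return mSegs.at(i); }

  // Replaces the stored segment; previously returned copies are unaffected.
  void ReplaceItem(std::size_t i, const PathSeg& seg) { mSegs.at(i) = seg; }

  std::size_t Length() const { return mSegs.size(); }

 private:
  std::vector<PathSeg> mSegs;
};
```

Under this reading, a script that holds a segment and then replaces it in the path simply keeps a stale copy, so the question of "which" segment changed never arises. The cost is that edits to a retrieved segment do not reach the path unless it is written back.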