SVG:Data Storage
Feel free to edit this document as heavily or lightly as you want. Please try to keep comments explicitly marked as your own to a minimum, though. (People are reluctant to remove such comments, which prevents documents from evolving into a spec or final documentation because they become an unmanageable mass of comments no one will touch.) Instead, try to integrate your comments into the flow of the document where possible. Do a significant rewrite if necessary.
Introduction
One of the ways the SVG specification differs from other W3C XML specifications is in the way it crams a lot of data into attributes. For example, it's not uncommon to see an SVG file containing a <path> tag with a 'd' attribute (for the path data points) similar to the following:
<path d="M600,350 l 50,-25
a25,25 -30 0,1 50,-25 l 50,-25
a25,50 -30 0,1 50,-25 l 50,-25
a25,75 -30 0,1 50,-25 l 50,-25
a25,100 -30 0,1 50,-25 l 50,-25"/>
Having so much data crammed into attributes could make it difficult to programmatically access and change much of the data in an SVG graphic. To make that easier, almost all SVG element attributes (data crammed or otherwise) have a corresponding tree of objects full of typed data in the SVG DOM. This saves script authors from having to write their own parsing code. However, it presents many challenges to implementers, who must provide these object-heavy interfaces while minimising memory use and maximising rendering speed.
This page contains thoughts regarding how we might redesign the way we store the typed values associated with the attributes of SVG elements. E.g., the SVG 'transform' attribute is mirrored by an SVGAnimatedTransformList object, which contains baseVal and animVal lists of SVGTransform objects, where each SVGTransform owns an SVGMatrix. (Many other attribute values are also represented by different types of object trees such as SVGLength[List], SVGPointList, SVGAngle, SVGColor, SVGPreserveAspectRatio, SVGPathSegList, etc.)
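To make the shape of such a tree concrete, here is a minimal sketch of the 'transform' case. The struct names and the MakeTranslate helper are hypothetical simplifications for illustration, not the real SVG DOM interfaces or any existing Mozilla classes.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified stand-ins for the SVG DOM types named above.
struct Matrix {
    // Affine matrix [a c e; b d f]; the defaults give the identity matrix.
    double a = 1, b = 0, c = 0, d = 1, e = 0, f = 0;
};

struct Transform {
    Matrix matrix;  // each transform owns its matrix
};

struct AnimatedTransformList {
    std::vector<Transform> baseVal;  // value parsed from the attribute
    std::vector<Transform> animVal;  // value currently in effect under animation
};

// Builds the tree that transform="translate(tx, ty)" would correspond to.
inline AnimatedTransformList MakeTranslate(double tx, double ty) {
    Transform t;
    t.matrix.e = tx;
    t.matrix.f = ty;
    AnimatedTransformList list;
    list.baseVal.push_back(t);
    list.animVal = list.baseVal;  // no animation running: animVal mirrors baseVal
    return list;
}
```

Even this single-item list involves four objects (the animated list, two lists, and a transform with its matrix), which is why the storage question matters.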
We don't want to parse the values out of the attributes every time we (re)render the graphic, so we store these typed value trees and keep them in sync with the attribute values.
The problem this document aims to address is how and where to store the typed data. There are significant problems to be overcome.
We plan to use tearoffs for the object-heavy SVG DOM. To avoid using excessive memory by storing typed data in multiple places, the tearoffs and the objects in the trees they create will need to go to their content object for this information. (Duplicating the typed data in SVG DOM objects would also cause nightmarish notification scenarios. There would then be three sets of potential data to keep in sync, and care would have to be taken not to create a notification loop.)
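As a rough sketch of that idea, a tearoff might hold only a pointer to its content object plus an index, and forward every read and write there. ContentElement, TransformTearoff, and the notifications counter are hypothetical names invented for this illustration, not existing classes.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Matrix {
    double a = 1, b = 0, c = 0, d = 1, e = 0, f = 0;
};

// Hypothetical content object: the single owner of the typed transform data.
struct ContentElement {
    std::vector<Matrix> transforms;
    int notifications = 0;  // counts change notifications that were sent

    void SetTranslate(std::size_t index, double tx, double ty) {
        Matrix& m = transforms.at(index);
        m.e = tx;
        m.f = ty;
        ++notifications;  // the only place a notification is raised
    }
};

// Hypothetical tearoff: stores no typed data of its own.
class TransformTearoff {
public:
    TransformTearoff(ContentElement* element, std::size_t index)
        : mElement(element), mIndex(index) {}

    // Reads go straight to the content object.
    const Matrix& GetMatrix() const { return mElement->transforms.at(mIndex); }

    // Writes are forwarded, so there is no second copy to keep in sync.
    void SetTranslate(double tx, double ty) {
        mElement->SetTranslate(mIndex, tx, ty);
    }

private:
    ContentElement* mElement;  // not owned
    std::size_t mIndex;        // which list item this tearoff represents
};
```

Identifying the item by index is only one option and breaks down if items are inserted or removed, which is one of the open questions listed below.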
Issues that are Independent of Strategy
- How are the tearoffs to access their corresponding typed object trees on the content object? First, how will they get to the right tree? Then how do they identify which item in a tree and/or list corresponds to them?
- How do changes to values of the objects in the typed object trees result in the correct notifications? Does every object in the tree have to keep a pointer to its corresponding nsAttr? Or to its owning content object and corresponding attribute? Or to its owning tree object (notifications to go up the tree first)?
- If the typed data corresponding to attributes is stored on the content objects or their nsAttr objects, what should the objects implementing the SVG DOM interfaces do if the content object is removed from the content tree and consequently deleted? Should an SVGMatrix, say, be left with the values it had? Or should it become the identity matrix, as it would if the corresponding attribute were removed? We probably want the former, but how do we support this if the values are stored as a separate tree on the content object?
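One possible answer to the last issue, sketched under the assumption that a content object can reach its live tearoffs, is for the tearoff to copy the values just before the content object goes away. The Detach method and the single back pointer are hypothetical; a real implementation would need to track many tearoffs per element.

```cpp
#include <cassert>

struct Matrix {
    double a = 1, b = 0, c = 0, d = 1, e = 0, f = 0;
};

class MatrixTearoff;

// Hypothetical content object that knows about (at most) one tearoff.
struct ContentElement {
    Matrix transform;
    MatrixTearoff* tearoff = nullptr;  // not owned
    ~ContentElement();
};

class MatrixTearoff {
public:
    explicit MatrixTearoff(ContentElement* element) : mElement(element) {
        element->tearoff = this;
    }
    MatrixTearoff(const MatrixTearoff&) = delete;
    MatrixTearoff& operator=(const MatrixTearoff&) = delete;
    ~MatrixTearoff() {
        if (mElement) mElement->tearoff = nullptr;
    }

    // While attached, read from the content object; afterwards, from the snapshot.
    const Matrix& Get() const {
        return mElement ? mElement->transform : mSnapshot;
    }

    // Called by the content object just before it is deleted.
    void Detach() {
        mSnapshot = mElement->transform;  // keep the values it had
        mElement = nullptr;
    }

private:
    ContentElement* mElement;
    Matrix mSnapshot;  // only meaningful once detached
};

inline ContentElement::~ContentElement() {
    if (tearoff) tearoff->Detach();
}
```

The cost is that every tearoff carries space for a snapshot even though it is used only after detachment; allocating the snapshot lazily would be one way to reduce that.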
Strategies
- Strategy A: store the typed value tree directly on the content object as mTransform etc. exactly as we do now.
- Strategy B: store the typed value tree as a member of the nsAttr as we do for properties in a 'style' attribute on HTML content.
Strategy A
Pros.:
- Faster, since no GetAttr call is required in the frames code to get to the typed values. Is this significant?
Strategy B
Pros.:
- Absolutely no memory is taken up for typed data unless the attribute has been set. There aren't even nulled out pointers on the content objects for absent attributes.
Issues:
- What should tearoffs do if they are accessed when there is no corresponding attribute (or after RemoveAttribute has been called)? They could have knowledge of default values, but frames (and probably other code?) will also need to have this knowledge since often attributes and thus their typed object tree won't exist. It seems undesirable to duplicate this knowledge around the source.
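One way to avoid duplicating default-value knowledge would be a single lookup that both tearoffs and frames call, falling back to one shared table when the attribute is absent. The sketch below assumes length-valued attributes only; Element, GetLength, DefaultLength, and the table contents are hypothetical and purely illustrative.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical element under Strategy B: typed values exist only for
// attributes that have actually been set.
struct Element {
    std::map<std::string, double> typedAttrs;
};

// The one place that knows defaults. Entries are illustrative only.
inline double DefaultLength(const std::string& name) {
    static const std::map<std::string, double> kDefaults = {
        {"x", 0.0},
        {"markerWidth", 3.0},
    };
    auto it = kDefaults.find(name);
    return it != kDefaults.end() ? it->second : 0.0;
}

// Shared accessor for tearoffs and frames alike.
inline double GetLength(const Element& element, const std::string& name) {
    auto it = element.typedAttrs.find(name);
    return it != element.typedAttrs.end() ? it->second : DefaultLength(name);
}
```

With this shape, removing the attribute simply erases the map entry, and any tearoff that is accessed afterwards sees the default again without holding its own copy of it.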