SVG:Data Storage

From MozillaWiki
Revision as of 13:38, 2 March 2006 by Jonathan Watt (talk | contribs)

Feel free to edit this document as heavily or lightly as you want. Please try to keep comments that are explicitly marked as yours to a minimum, though. (People are reluctant to remove such comments, and this prevents documents from evolving into a spec or final documentation because they just become an unmanageable mass of comments no one will touch.) Instead, try to integrate your comments into the flow of the document where possible. Do a significant rewrite if necessary.

Introduction

One of the ways in which the SVG specification stands out from other W3C XML specifications is the extent to which it crams a lot of data into attributes. For example, it's not uncommon to see an SVG file containing a <path> tag with a 'd' attribute (for the path data points) similar to the following:

<path d="M600,350 l 50,-25 
         a25,25 -30 0,1 50,-25 l 50,-25 
         a25,50 -30 0,1 50,-25 l 50,-25 
         a25,75 -30 0,1 50,-25 l 50,-25 
         a25,100 -30 0,1 50,-25 l 50,-25" />


Having so much data crammed into attributes could make it difficult to programmatically access and change much of the data in an SVG graphic. To make that easier, almost all SVG attributes (data-crammed or otherwise) are mirrored by multi-level trees of objects in the SVG DOM, which contain typed data corresponding to the attributes' values. This eliminates the need for scripters and others to write their own parsing code. However, it presents challenges to implementers, who must provide these object-heavy interfaces while minimising memory use and maximising rendering speed.

The problem this document aims to address is how and where to store the typed data.

Current Implementation Strategy

The strategy currently employed in Mozilla's SVG implementation is to implement the object-heavy SVG DOM interfaces as full objects and store the typed data used by the internal code on those objects. For each attribute that has meaning on a particular SVG element there is always a corresponding DOM object tree in memory, even when the attribute isn't set on the element in the SVG markup. The internal code can then obtain typed data (often default values) as required from these trees, since they always exist.

The current strategy is the most obvious and straightforward one (and it has several non-obvious advantages), but it is inherently too memory intensive. The cost comes not from maintaining trees of objects and typed data as such, but from the fact that the objects in those trees are XPCOM objects implementing multiple interfaces, and that the trees exist even for nonexistent attributes.
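To give a rough feel for the cost described above, here is a minimal sketch that counts how many objects exist for an element with no attributes set. It is not Mozilla code: Counted, Length, AnimatedLength, EagerRect and LiveObjectsForEmptyRect are hypothetical names, and plain structs stand in for the far heavier XPCOM objects. The only facts borrowed from SVG are that a rect has six length attributes (x, y, width, height, rx, ry) and that an animated length exposes a base value and an animated value.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

// Number of DOM-style objects currently alive (illustration only).
static std::size_t gLiveObjects = 0;

// Hypothetical helper: bumps the counter for every object created.
struct Counted {
  Counted() { ++gLiveObjects; }
  ~Counted() {
    assert(gLiveObjects > 0);
    --gLiveObjects;
  }
  Counted(const Counted&) = delete;
  Counted& operator=(const Counted&) = delete;
};

// Hypothetical stand-in for a DOM length object.
struct Length : Counted {
  float value = 0.0f;
};

// Hypothetical stand-in for an animated length: it always owns a
// base value object and an animated value object.
struct AnimatedLength : Counted {
  std::unique_ptr<Length> baseVal{new Length};
  std::unique_ptr<Length> animVal{new Length};
};

// Models the eager strategy: every attribute that has meaning on the
// element gets its object tree up front, whether or not it is set.
struct EagerRect {
  AnimatedLength x, y, width, height, rx, ry;
};

// Returns how many objects exist for a rect with no attributes set.
std::size_t LiveObjectsForEmptyRect() {
  const std::size_t before = gLiveObjects;
  EagerRect rect;  // models <rect/> with nothing specified
  (void)rect;
  return gLiveObjects - before;
}
```

Under these assumptions an empty rect already costs six trees of three objects each, and in the real implementation each of those objects would also carry XPCOM overhead.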

We need to come up with a new implementation strategy that drastically reduces our excessive memory consumption while still allowing for fast rendering. Of course, each alternative strategy has significant implementation problems of its own. The rest of this document describes possible strategies and the problems we need to solve before we can use them.

Alternative Implementation Strategies

The following strategies assume that parsing data from attribute values every time it's needed would be too expensive. Therefore we will continue to store typed data mirroring SVG attributes in some way.

The strategies below also assume we will use tearoffs to implement the object-heavy SVG DOM element interfaces, so that their member object trees (the DOM trees that mirror attributes) are only created "on demand".
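The "on demand" idea can be sketched roughly as follows. This is a simplified illustration rather than the XPCOM tearoff mechanism itself: Element and AnimatedLengthTearoff are hypothetical names, there is no reference counting, and a single width value stands in for a whole typed data tree. The internal code reads the compact value directly, and the DOM-facing wrapper is only allocated the first time something asks for it.

```cpp
#include <cassert>
#include <memory>

class Element;

// Hypothetical DOM-facing wrapper. It holds no data of its own and
// forwards every access to the element that owns the typed value.
class AnimatedLengthTearoff {
 public:
  explicit AnimatedLengthTearoff(Element& owner) : mOwner(owner) {}
  float BaseVal() const;
  void SetBaseVal(float v);

 private:
  Element& mOwner;
};

// Hypothetical content object storing the typed value compactly.
class Element {
 public:
  // Internal (rendering) code reads the typed value directly.
  float Width() const { return mWidth; }

  // DOM access: the wrapper is created on first use and then cached.
  AnimatedLengthTearoff& GetWidthTearoff() {
    if (!mWidthTearoff) {
      mWidthTearoff.reset(new AnimatedLengthTearoff(*this));
    }
    assert(mWidthTearoff);
    return *mWidthTearoff;
  }

  bool HasWidthTearoff() const { return mWidthTearoff != nullptr; }

 private:
  friend class AnimatedLengthTearoff;
  float mWidth = 0.0f;
  std::unique_ptr<AnimatedLengthTearoff> mWidthTearoff;
};

inline float AnimatedLengthTearoff::BaseVal() const { return mOwner.mWidth; }
inline void AnimatedLengthTearoff::SetBaseVal(float v) { mOwner.mWidth = v; }
```

An element that is never touched from script pays only for the float and one null pointer. Whether the element should cache the wrapper at all, or hand out a fresh one each time, is a separate design choice that this sketch does not settle.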

Strategy A

Store the typed data tree directly on the content object as mTransform etc., similar to the way we store the DOM trees now. We could still have nulled out pointers for data trees when an attribute isn't set.

Pros.

  • The typed data is always there (along with default values) for the internal code to access. Depending on the extent to which we null out pointers, the internal code may not need so much knowledge about default values, or so much branching code to handle whether typed data is available or not.
  • No GetAttr call required to get to the typed values (unlike strategy B below). Is this a significant saving/issue?
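A minimal sketch of Strategy A, assuming a single transform attribute, might look like this. Matrix and SvgElementA are hypothetical names and not existing Mozilla classes; the member name mTransform follows the text above. The pointer is null while the attribute is absent, and one accessor supplies the default so that callers never branch on it.

```cpp
#include <cassert>
#include <memory>

// Hypothetical plain typed value; the defaults form the identity matrix.
struct Matrix {
  float a = 1, b = 0, c = 0, d = 1, e = 0, f = 0;
};

// Hypothetical content object for Strategy A: typed data lives directly
// on the element, with a nulled out pointer while the attribute is unset.
class SvgElementA {
 public:
  void SetTransform(const Matrix& m) { mTransform.reset(new Matrix(m)); }
  void UnsetTransform() { mTransform.reset(); }
  bool HasTransform() const { return mTransform != nullptr; }

  // Single place that knows the default, so the rest of the internal
  // code does not need to care whether the attribute is set.
  const Matrix& Transform() const {
    static const Matrix kIdentity{};
    return mTransform ? *mTransform : kIdentity;
  }

 private:
  std::unique_ptr<Matrix> mTransform;  // null when the attribute is absent
};
```

The cost for an unset attribute is one pointer per attribute on every element, which is the trade-off against Strategy B.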

Strategy B

Store each attribute's typed data on its nsAttr similar to the way we do things for the 'style' attribute on HTML content. Every time we want an attribute's typed data we'd have to go via GetAttr.

Pros.

  • Absolutely no memory is taken up for typed data unless the attribute has been set. There aren't even nulled out pointers on the content objects for absent attributes.

Issues to Solve

  • Will the number of GetAttr calls be prohibitive? Every time we (re)render we'd be calling it a lot.
  • If we have tearoff objects fetch their values from the attribute's internal typed data tree (see below), what should the tearoffs do if they are accessed when there is no corresponding attribute (or after RemoveAttribute has been called)? Should they have knowledge of default values?
  • Our current code assumes that there is always a typed data tree (which will provide default values as necessary) for it to access. Getting rid of this assumption could be tricky. We'd also need to code in knowledge of default values all over our code wherever the typed data is accessed.

It seems undesirable to duplicate knowledge of default values around the source. It raises the potential for defaults being right in some instances and wrong in others.
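For comparison, Strategy B can be sketched as below. Again this is only an illustration under stated assumptions: AttrValue, SvgElementB and TransformOrIdentity are hypothetical names, a std::map stands in for the element's attribute storage, and the caller passes in the already parsed value so that no parser is needed here. The GetAttr in the sketch is a stand-in and does not have the signature of the real method.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

// Hypothetical plain typed value; the defaults form the identity matrix.
struct Matrix {
  float a = 1, b = 0, c = 0, d = 1, e = 0, f = 0;
};

// Hypothetical attribute value carrying both the string form and the
// typed form parsed from it.
struct AttrValue {
  std::string text;
  std::unique_ptr<Matrix> typed;
};

// Hypothetical content object for Strategy B: nothing at all is stored
// for an attribute until it is set.
class SvgElementB {
 public:
  void SetAttr(const std::string& name, const std::string& text,
               const Matrix& parsed) {
    AttrValue& value = mAttrs[name];
    value.text = text;
    value.typed.reset(new Matrix(parsed));
  }

  void RemoveAttr(const std::string& name) { mAttrs.erase(name); }

  // Returns null when the attribute is not set.
  const AttrValue* GetAttr(const std::string& name) const {
    std::map<std::string, AttrValue>::const_iterator it = mAttrs.find(name);
    return it == mAttrs.end() ? nullptr : &it->second;
  }

 private:
  std::map<std::string, AttrValue> mAttrs;
};

// Keeping the default in one helper avoids scattering knowledge of
// default values around every call site.
Matrix TransformOrIdentity(const SvgElementB& el) {
  const AttrValue* value = el.GetAttr("transform");
  return (value && value->typed) ? *value->typed : Matrix();
}
```

Every read goes through a lookup, which is the GetAttr cost questioned above. Funnelling reads through a helper such as TransformOrIdentity is one possible way to keep default values in a single place.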

Issues to Solve that are Independent of Strategy

  • How do changes to values of the objects in the typed object trees result in the correct notifications? Does every object in the tree have to keep a pointer to its corresponding nsAttr? Or to its owning content object and corresponding nsAttr? Or to the object above it in the typed data tree (so notifications go up the tree)?
  • Tearoffs. We have a choice. Do we (a) store typed data on the objects created by tearoffs as well as in the typed object trees maintained by the internal code; or do we (b) have the tearoff objects fetch their typed data from the internal typed object tree that corresponds to them?
  • If we choose (a) in the above point, this seems a waste of memory. It would also mean we'd have three copies of essentially the same data to keep in sync (ugh! Nightmarish notification scenario with careful design required to make sure we don't have notification loops). How would we do that?
  • If we choose (a) in the above point, how are the tearoffs to access their corresponding typed object trees on the content object? First, how will they get to the right tree? Then how do they identify which item in a tree and/or list corresponds to them?
  • If we choose (b) in the above point (that is, if the typed data corresponding to attributes is stored on the content objects or their nsAttr objects), what should the objects implementing the SVG DOM interfaces do if the content object is removed from the content tree and consequently deleted? Should an SVGMatrix, say, be left with the values it had? Probably. Or can we just allow it to become the identity matrix, as it would if the corresponding attribute was removed? We probably want the former, but how do we support this if the values are stored as a separate tree on the content object?
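One possible answer to the last question, given option (b), is for the content object to hand its final values to any live tearoff just before it is destroyed. The sketch below shows that idea only. ContentElement, MatrixTearoff and OwnerGoingAway are hypothetical names, std::shared_ptr and std::weak_ptr stand in for XPCOM reference counting, and notifications are left out entirely.

```cpp
#include <cassert>
#include <memory>

// Hypothetical plain typed value; the defaults form the identity matrix.
struct Matrix {
  float a = 1, b = 0, c = 0, d = 1, e = 0, f = 0;
};

class ContentElement;

// Hypothetical tearoff: a live view of the owner's typed data while the
// owner exists, and a frozen copy of the last values afterwards.
class MatrixTearoff {
 public:
  explicit MatrixTearoff(ContentElement* owner) : mOwner(owner) {}

  Matrix Get() const;

  // Called by the owner from its destructor: keep the last values.
  void OwnerGoingAway(const Matrix& last) {
    assert(mOwner && "tearoff is already detached");
    mSnapshot = last;
    mOwner = nullptr;
  }

  bool IsDetached() const { return mOwner == nullptr; }

 private:
  ContentElement* mOwner;  // non-owning; null once detached
  Matrix mSnapshot;        // only meaningful once detached
};

// Hypothetical content object that stores the typed data itself.
class ContentElement {
 public:
  ~ContentElement() {
    // Script may still hold the tearoff; give it our final values.
    if (std::shared_ptr<MatrixTearoff> t = mTearoff.lock()) {
      t->OwnerGoingAway(mTransform);
    }
  }

  void SetTransform(const Matrix& m) { mTransform = m; }
  const Matrix& Transform() const { return mTransform; }

  // Creates the tearoff on demand; the caller holds the strong reference.
  std::shared_ptr<MatrixTearoff> GetTearoff() {
    std::shared_ptr<MatrixTearoff> t = mTearoff.lock();
    if (!t) {
      t = std::make_shared<MatrixTearoff>(this);
      mTearoff = t;
    }
    return t;
  }

 private:
  Matrix mTransform;
  std::weak_ptr<MatrixTearoff> mTearoff;
};

inline Matrix MatrixTearoff::Get() const {
  return mOwner ? mOwner->Transform() : mSnapshot;
}
```

The tearoff pays for a spare Matrix even while attached. Allocating the snapshot only at detach time would trade that memory for an allocation in the destructor path.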