From MozillaWiki


There are a number of areas where we have a growing need to store data about data:

  • History of saved and unsaved changes to a file since it left VCS
  • Status messages
  • Mobwrite diff records

This proposal provides a generic mechanism for all these uses.


The meta-data system should:

  • Be accessed via an API, so the disk layout can be changed in the future
  • Have zero risk of data and meta-data files colliding
  • Allow the storage of large amounts of data (e.g. the current edit version of a file)
  • Allow a fast append-only mode that doesn't require rewriting large amounts of data
  • Count storage towards a user's quota (TODO: Are there any cases where this should not be the case?)
  • Ensure that the meta-data for a file is deleted/moved/renamed along with the file
  • Not waste space by leaving orphaned flotsam files or directories behind
  • Allow efficient serialization of Python objects (pickling?)
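To make the quota requirement concrete, here is a minimal Python sketch that charges a user for the bytes in both the project tree and its meta-data tree. It assumes the sibling SomeProjectMeta/ layout described under Data Storage; tree_size and project_usage are hypothetical helpers, not existing Bespin functions.

```python
import os


def tree_size(path):
    """Total size in bytes of the regular files under `path` (0 if missing)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(path):
        for name in filenames:
            full = os.path.join(dirpath, name)
            if not os.path.islink(full):  # don't charge for symlink targets
                total += os.path.getsize(full)
    return total


def project_usage(project_dir):
    """Hypothetical: bytes charged for one project, data plus meta-data.

    Assumes meta-data lives in a sibling '<project>Meta' directory.
    """
    meta_dir = os.path.normpath(project_dir) + "Meta"
    return tree_size(project_dir) + tree_size(meta_dir)
```

Counting the Meta tree separately like this also makes it easy to exclude it later, if the TODO above turns up cases where meta-data should not count.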

Proposed Solution


# Get a File object
project = get_project(user, owner, 'MyProject')
file_obj = project.get_file_object("example.js")

# Read the 'live-edit' meta-data
current = file_obj.metadata['live-edit']
# Reads from MyProjectMeta/example.js/live-edit

# Write to the 'status-messages' meta-data
file_obj.metadata['status-messages'] = new_msg
# Writes to MyProjectMeta/example.js/status-messages

TODO: Missing from this API are examples of pickling and appending.
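One possible shape for the metadata object, covering the pickling half of that TODO, is a dict-like class in which each key is a separate pickled file. This is a sketch under assumptions: FileMetadata is a hypothetical name, pickle is assumed as the serializer, and writes go through a temporary file plus os.replace so a reader never sees a half-written value.

```python
import os
import pickle
from collections.abc import MutableMapping


class FileMetadata(MutableMapping):
    """Hypothetical dict-like view of the meta-data for one project file.

    Each key is stored as its own file under <meta_root>/<file_path>/, so a
    large value such as 'live-edit' can be read or rewritten without touching
    the other keys. Values are pickled, so any picklable object can be stored.
    """

    def __init__(self, meta_root, file_path):
        # e.g. meta_root="MyProjectMeta", file_path="some-dir/example2.js"
        self._dir = os.path.join(meta_root, file_path)

    def _key_path(self, key):
        # Reject keys that could escape this directory or clash with the
        # hidden temporary files used by __setitem__.
        if not key or key.startswith(".") or "/" in key or os.sep in key:
            raise KeyError(key)
        return os.path.join(self._dir, key)

    def __getitem__(self, key):
        try:
            with open(self._key_path(key), "rb") as f:
                return pickle.load(f)
        except FileNotFoundError:
            raise KeyError(key) from None

    def __setitem__(self, key, value):
        path = self._key_path(key)
        os.makedirs(self._dir, exist_ok=True)
        tmp = os.path.join(self._dir, "." + key + ".tmp")
        with open(tmp, "wb") as f:
            pickle.dump(value, f, protocol=pickle.HIGHEST_PROTOCOL)
        # Atomic rename: readers see either the old value or the new one.
        os.replace(tmp, path)

    def __delitem__(self, key):
        try:
            os.remove(self._key_path(key))
        except FileNotFoundError:
            raise KeyError(key) from None
        # Don't leave an empty per-file directory behind (no flotsam).
        if not os.listdir(self._dir):
            os.rmdir(self._dir)

    def __iter__(self):
        if not os.path.isdir(self._dir):
            return
        for name in sorted(os.listdir(self._dir)):
            full = os.path.join(self._dir, name)
            if not name.startswith(".") and os.path.isfile(full):
                yield name

    def __len__(self):
        return sum(1 for _ in self)
```

A File object could then expose FileMetadata(meta_root, path) as its metadata attribute, which gives the dictionary-style reads and writes shown above. Appending is not covered here, since every assignment rewrites the whole value.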

Data Storage

Inside the directory that holds a user's projects, we should have something like:

- SomeProject/
  - example1.js
  - some-dir/
    - example2.js
  - ...
- SomeProjectMeta/
  - example1.js/
    - status-messages
    - live-edit
    - chat-log
    - ...
  - some-dir/
    - example2.js/
      - status-messages
      - live-edit
      - chat-log
      - ...

I think this layout is extensible, and because all meta-data lives in the separate SomeProjectMeta/ tree rather than inside SomeProject/, there isn't any danger that data will collide with meta-data.
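Keeping meta-data beside the project rather than inside it means every file operation has to be mirrored in the Meta tree. A minimal sketch of that mirroring, assuming the <project>Meta naming shown above; meta_root, meta_path, rename_file and delete_file are hypothetical helpers. Directories left empty in the Meta tree are pruned so no flotsam accumulates.

```python
import os
import shutil


def meta_root(project_dir):
    """'SomeProject' -> 'SomeProjectMeta' (the sibling meta-data tree)."""
    return os.path.normpath(project_dir) + "Meta"


def meta_path(project_dir, file_path):
    """Directory holding all meta-data for one project file."""
    return os.path.join(meta_root(project_dir), file_path)


def _prune_empty_dirs(removed, stop):
    """Remove now-empty parents of `removed`, up to but not including `stop`."""
    stop = os.path.normpath(stop)
    parent = os.path.dirname(os.path.normpath(removed))
    while parent != stop and parent.startswith(stop + os.sep) and not os.listdir(parent):
        os.rmdir(parent)
        parent = os.path.dirname(parent)


def delete_file(project_dir, file_path):
    """Delete a project file together with all of its meta-data."""
    os.remove(os.path.join(project_dir, file_path))
    meta = meta_path(project_dir, file_path)
    if os.path.isdir(meta):
        shutil.rmtree(meta)
        _prune_empty_dirs(meta, meta_root(project_dir))


def rename_file(project_dir, old_path, new_path):
    """Move or rename a project file, carrying its meta-data with it."""
    new_full = os.path.join(project_dir, new_path)
    os.makedirs(os.path.dirname(new_full) or ".", exist_ok=True)
    os.replace(os.path.join(project_dir, old_path), new_full)

    new_meta = meta_path(project_dir, new_path)
    if os.path.isdir(new_meta):
        # Meta-data of a file we just overwrote is now stale.
        shutil.rmtree(new_meta)
        _prune_empty_dirs(new_meta, meta_root(project_dir))
    old_meta = meta_path(project_dir, old_path)
    if os.path.isdir(old_meta):
        os.makedirs(os.path.dirname(new_meta), exist_ok=True)
        os.rename(old_meta, new_meta)
        _prune_empty_dirs(old_meta, meta_root(project_dir))
```

Routing all deletes and renames through helpers like these is one way to satisfy the "moved/renamed with the file" requirement; any code path that touches SomeProject/ directly would bypass it.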

Potential Uses

This is an annotated JSON-ish dump of the meta-data that we might record against a file (parts lost from the original dump are marked "..."):

{
  // We add to this list on each save when there is a status message,
  // and clear the list on a commit, having offered the list
  status-messages:[
    "fixing bug #42",
    "frobbing the foo setting to see whats up"
  ],

  // We need to store the current version of the file separately from
  // the saved version. This could be large, and should probably be stored
  // in a separate file to avoid unnecessary IO
  live-edit:"The full text of the file\nincluding new lines\n",

  // We need a set of diffs for time machine to take us from the saved
  // version to the live version. This example is raw from mobwrite, but
  // I suspect we will need a more compact, more coalesced version.
  // Also, while the individual changes may not be large, this could have
  // a high write frequency
  diffs-saved-to-live:[
    {
      ...
    }, {
      ...
      status:"Bug #42"
    }
  ],

  // We should record each save made since the last commit.
  // This allows time machine to work properly. The changes will be
  // larger than in the diffs-saved-to-live case but will be much more
  // coalesced. We should certainly use an external diff format rather
  // than mobwrite for this
  ...:[
    {
      ...
      diff:"24,25d23\n<  \n< alert('hello');\n27c25\n< alert('world');\n---\n",
      status:"Bug #42"
    }
  ],

  // If discussion of features can be tied to a file then we could have a
  // big head-start in writing documentation
  chat-log:[
    { timestamp:2009-04-06-13-09-00, sender:jwalker, message:"Hello" },
    { timestamp:2009-04-06-13-09-05, sender:kdangoor, message:"Hi Joe" }
  ]
}
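For the history of saves since the last commit, Python's standard difflib can generate an external diff format. A hedged sketch: commit_diff_record is a hypothetical helper, and difflib.unified_diff produces unified diffs rather than the "normal" diff format ("24,25d23") in the example above. difflib only generates diffs; replaying them for time machine would need a separate patch step, which is not shown.

```python
import difflib
import time


def commit_diff_record(committed_text, saved_text, status, when=None):
    """Build one hypothetical record for the saved-since-last-commit history.

    The diff is in unified format, not the 'normal' diff format above.
    """
    diff_lines = difflib.unified_diff(
        committed_text.splitlines(),
        saved_text.splitlines(),
        fromfile="committed",
        tofile="saved",
        lineterm="",  # we join with "\n" ourselves below
    )
    diff = "\n".join(diff_lines)
    # Same timestamp style as the chat-log example: 2009-04-06-13-09-00
    # (when=None means "now").
    timestamp = time.strftime("%Y-%m-%d-%H-%M-%S", time.localtime(when))
    return {"timestamp": timestamp, "diff": diff, "status": status}
```

Splitting without line endings and passing lineterm="" keeps every diff line well-formed even when the file has no trailing newline, at the cost of not recording that final-newline difference.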

TODO: Work out how fast-append might work with pickling
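One way fast append could work with pickling: pickle streams can be concatenated, so each record is pickle.dump-ed onto the end of a file opened in append mode, and readers call pickle.load repeatedly until EOFError. This is a sketch, not a settled design: append_record and read_records are hypothetical names, and a crash mid-write would leave a truncated final record that a real reader would need to detect and discard. Since unpickling can execute code, this only suits meta-data the server writes itself.

```python
import os
import pickle


def append_record(path, record):
    """Append one pickled record without rewriting the earlier ones."""
    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    with open(path, "ab") as f:  # append mode: existing bytes are untouched
        pickle.dump(record, f, protocol=pickle.HIGHEST_PROTOCOL)


def read_records(path):
    """Yield every record appended to `path`, oldest first."""
    try:
        f = open(path, "rb")
    except FileNotFoundError:
        return  # no records yet
    with f:
        while True:
            try:
                yield pickle.load(f)  # reads exactly one pickled record
            except EOFError:
                return
```

For a chat-log or status-messages key, each new message then costs one small write at the end of the file, however long the history grows.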