Labs/Bespin/DesignDocs/MetaData
Examples
There are a number of areas where we have a growing need to store data about data:
- History of saved and unsaved changes to a file since it left VCS
- Status messages
- Mobwrite diff records
This proposal provides a generic mechanism for all these uses.
Requirements
The meta-data system should:
- Be accessed via an API so the disk layout can be changed in the future
- Have zero risk of data and meta-data files colliding
- Allow the storage of large amounts of data (e.g. the current edit version of a file)
- Allow a fast append-only mode that doesn't require rewriting large amounts of data
- Count towards a user's quota (TODO: Are there any cases where this should not be the case?)
- Ensure that the meta-data for a file is deleted/moved/renamed with the file
- Not waste space by leaving unowned flotsam files or directories behind
- Allow efficient serialization of Python objects (pickling?)
Proposed Solution
API
# Get a File object
project = get_project(user, owner, 'MyProject')
file = project.get_file_object("example.js")
# Read the 'live-edit' meta-data
current = file.metadata['live-edit']
# Reads from MyProjectMeta/example.js/live-edit
# Write to the 'status-messages' meta-data
file.metadata['status-messages'] = new_msg
# Writes to MyProjectMeta/example.js/status-messages
TODO: Missing from this API are examples of pickling and appending.
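As a starting point for the pickling half of this TODO, here is a minimal sketch of how file.metadata could behave as a dict-like object that pickles each value into its own file. The class name FileMetadata, its constructor argument and the key check are assumptions made for illustration; none of them exist in Bespin today.

```python
import os
import pickle
import tempfile


class FileMetadata:
    """Hypothetical dict-like view over one file's meta-data directory."""

    def __init__(self, meta_dir):
        # meta_dir would be something like MyProjectMeta/example.js
        self.meta_dir = meta_dir

    def _path(self, key):
        # Assumption: keys are plain names, never paths.
        if os.sep in key or key in ("", ".", ".."):
            raise ValueError("invalid meta-data key: %r" % key)
        return os.path.join(self.meta_dir, key)

    def __getitem__(self, key):
        try:
            with open(self._path(key), "rb") as f:
                return pickle.load(f)
        except FileNotFoundError:
            raise KeyError(key) from None

    def __setitem__(self, key, value):
        # The directory is created lazily so files without meta-data
        # cost nothing on disk.
        os.makedirs(self.meta_dir, exist_ok=True)
        with open(self._path(key), "wb") as f:
            pickle.dump(value, f, protocol=pickle.HIGHEST_PROTOCOL)


# Usage, with a temporary directory standing in for the project store.
meta_dir = os.path.join(tempfile.mkdtemp(), "MyProjectMeta", "example.js")
metadata = FileMetadata(meta_dir)
metadata["status-messages"] = ["fixing bug #42"]
loaded = metadata["status-messages"]
```

Each key maps to one file, so reading 'status-messages' never touches a large 'live-edit' value. Note that every write here rewrites the whole value, so this sketch does not address the append case.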
Data Storage
Inside a user's project directory we should have something like:
- SomeProject/
  - example1.js
  - some-dir/
    - example2.js
  - ...
- SomeProjectMeta/
  - example1.js/
    - status-messages
    - live-edit
    - chat-log
    - ...
  - some-dir/
    - example2.js/
      - status-messages
      - live-edit
      - chat-log
      - ...
I think this layout is extensible, and because the meta-data lives in its own SomeProjectMeta tree there isn't any danger that the data will collide with the meta-data.
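The mapping from a file to its meta-data directory, and the requirement that meta-data follows the file on a rename without leaving empty directories behind, could be sketched roughly as follows. The function names (meta_root, metadata_dir, move_metadata, prune_empty_dirs) are hypothetical, and the sketch assumes the destination has no meta-data of its own yet.

```python
import shutil
import tempfile
from pathlib import Path

META_SUFFIX = "Meta"  # follows the SomeProject / SomeProjectMeta naming


def meta_root(project_dir):
    """Return the sibling directory that holds a project's meta-data."""
    project_dir = Path(project_dir)
    return project_dir.with_name(project_dir.name + META_SUFFIX)


def metadata_dir(project_dir, relative_file):
    """Return the meta-data directory for one file in the project."""
    return meta_root(project_dir) / relative_file


def prune_empty_dirs(start, stop_at):
    """Remove empty directories from start upwards, never stop_at itself."""
    current = Path(start)
    stop_at = Path(stop_at)
    while current != stop_at and stop_at in current.parents:
        if any(current.iterdir()):
            break  # something else still lives here
        current.rmdir()
        current = current.parent


def move_metadata(project_dir, old_file, new_file):
    """Move a file's meta-data when the file itself is moved or renamed."""
    old_dir = metadata_dir(project_dir, old_file)
    if not old_dir.exists():
        return  # the file never had any meta-data
    new_dir = metadata_dir(project_dir, new_file)
    new_dir.parent.mkdir(parents=True, exist_ok=True)
    shutil.move(str(old_dir), str(new_dir))
    prune_empty_dirs(old_dir.parent, meta_root(project_dir))


# Usage, with a temporary directory standing in for the user's directory.
root = Path(tempfile.mkdtemp())
project = root / "SomeProject"
old = metadata_dir(project, "some-dir/example2.js")
old.mkdir(parents=True)
(old / "live-edit").write_text("draft")
move_metadata(project, "some-dir/example2.js", "example3.js")
```

After the move, SomeProjectMeta/some-dir is gone because it became empty, while SomeProjectMeta itself is kept. A delete would follow the same pattern with shutil.rmtree in place of shutil.move.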
Potential Uses
This is an annotated JSON (ish) dump of the potential meta-data that we might record against a file:
{
  // We add to this list on each save when there is a status message
  // And clear the list on a commit, having offered the list
  status-messages:[
    "fixing bug #42",
    "frobbing the foo setting to see whats up"
  ],
  // We need to store the current version of the file separately from
  // the saved version. This could be large, and should probably be stored
  // in a separate file to avoid unnecessary IO
  live-edit:"The full text of the file\nincluding new lines\n",
  // We need a set of diffs for time machine to take us from the saved
  // version to the live version. This example is raw from mobwrite but
  // I suspect we will need a more compact, more coalesced version
  // Also while the individual changes may not be large, this could have
  // a high write frequency
  diffs-saved-to-live:[{
      timestamp:2009-04-06-12-10-00,
      creator:kdangoor,
      diff:"u:ycxraw:-\nF:11:sharer/sharer_project/test.txt\nd:11:=7+inserted",
      status:null
    }, {
      timestamp:2009-04-06-12-10-30,
      creator:jwalker,
      diff:"u:ycxraw:-\nF:12:sharer/sharer_project/test.txt\nd:12:",
      status:"Bug #42"
    }
  ],
  // We should record each time the file is saved back to the last commit
  // This allows time machine to work properly. The changes will be
  // larger than with the diffs-saved-to-live case but will be much more
  // coalesced. We should certainly use an external diff format rather
  // than mobwrite for this
  diffs-tip-to-saved:[{
      timestamp:2009-04-06-01-12-00,
      creator:dalmaer,
      diff:"24,25d23\n< \n< alert('hello');\n27c25\n< alert('world');\n---\n",
      status:"Bug #42"
    }
  ],
  // If discussion of features can be tied to a file then we could have a
  // big head-start in writing documentation
  chat-log:[
    { timestamp:2009-04-06-13-09-00, sender:jwalker, message:"Hello" },
    { timestamp:2009-04-06-13-09-05, sender:kdangoor, message:"Hi Joe" }
  ]
}
TODO: Work out how fast-append might work with pickling
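One possible answer, offered as a sketch rather than a decision: treat each appendable item (for example diffs-saved-to-live) as a log of independently pickled records. Python's pickle.load reads exactly one pickled object from a file and raises EOFError at the end, so records can be appended with the file opened in 'ab' mode and read back in a loop. The helper names append_record and read_records are hypothetical.

```python
import os
import pickle
import tempfile


def append_record(path, record):
    """Append one pickled record without rewriting the earlier ones."""
    with open(path, "ab") as f:
        pickle.dump(record, f, protocol=pickle.HIGHEST_PROTOCOL)


def read_records(path):
    """Yield every record in the order it was appended."""
    try:
        f = open(path, "rb")
    except FileNotFoundError:
        return  # nothing has been appended yet
    with f:
        while True:
            try:
                yield pickle.load(f)
            except EOFError:
                break  # clean end of the log


# Usage, with a temporary file standing in for
# SomeProjectMeta/example1.js/diffs-saved-to-live.
log_path = os.path.join(tempfile.mkdtemp(), "diffs-saved-to-live")
append_record(log_path, {"creator": "kdangoor", "status": None})
append_record(log_path, {"creator": "jwalker", "status": "Bug #42"})
records = list(read_records(log_path))
```

The cost of an append is proportional to the new record only, which suits the high write frequency expected for mobwrite diffs. Open questions this sketch leaves aside: how to recover if a crash leaves a truncated final record, and how to coalesce or clear the log on save or commit.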