Labs/Bespin/DesignDocs/VCSIntegration

From MozillaWiki

Version Control System (VCS) Integration

Kevin Dangoor, Mozilla Labs Developer Tools

Feature details

What's this all about?

One of the first steps in making Bespin useful for day-to-day work is integrating it with the systems we use already. For software developers, the version control system (VCS) is the authoritative place where source code lives, so it makes sense for Bespin to integrate with the VCSes that people already use.

This document describes the *short term* plan for integrating current version control systems into Bespin.

There are three top level tasks for the VCS integration:

  1. getting information into Bespin from a VCS
  2. working with the code once it's in Bespin
  3. getting information back out of Bespin

Supported Version Control Systems

Our primary initial target is Mercurial, so that we can begin using Bespin to work on Bespin itself. However, the plan is for it to be very easy to support different systems. Subversion, Git and Bazaar-NG are other first-class targets that we want to support as soon as possible.

Getting projects into Bespin

Currently, all Bespin data is stored directly in an SQL database. The VCS release of Bespin will move all of the files (and likely file metadata) into the filesystem where the native VCS tools can manage them.

In the simple case, a user wants to create a new project from a public repository.

hg clone http://hg.mozilla.org/labs/bespin bespin

This will create a clone of the repository at the given location, with the result going into the new project 'bespin'. There should be a confirmation dialog if the project already exists, because this would overwrite the existing project.

Bespin needs to know which version control system is in use, so this command is necessarily version control system specific. Other commands in this set will be:

git clone git://github.com/tlrobinson/jack.git
svn checkout http://paver.googlecode.com/svn/trunk/ paver
bzr branch https://launchpad.net/drizzle

After you have your project set up, you can use the VCS-independent commands.
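
As a rough illustration of how Bespin might recognize which VCS a checkout command refers to, here is a minimal sketch. The names `CHECKOUT_COMMANDS` and `parse_checkout` are hypothetical, and defaulting the project name to the last URL segment is an assumption, not something this plan specifies.

```python
# Hypothetical mapping from (tool, subcommand) to the VCS type of the project.
CHECKOUT_COMMANDS = {
    ("hg", "clone"): "hg",
    ("git", "clone"): "git",
    ("svn", "checkout"): "svn",
    ("bzr", "branch"): "bzr",
}


def parse_checkout(command_line):
    """Return (vcs_type, url, project_name) for a checkout command line."""
    words = command_line.split()
    if len(words) < 3:
        raise ValueError("expected: <vcs> <subcommand> <url> [project]")
    vcs_type = CHECKOUT_COMMANDS.get((words[0], words[1]))
    if vcs_type is None:
        raise ValueError("unsupported checkout command: %s %s" % (words[0], words[1]))
    url = words[2]
    if len(words) > 3:
        project = words[3]
    else:
        # Assumption: fall back to the last URL path segment, minus ".git".
        last = url.rstrip("/").rsplit("/", 1)[-1]
        project = last[:-4] if last.endswith(".git") else last
    return vcs_type, url, project
```

With this sketch, `parse_checkout("hg clone http://hg.mozilla.org/labs/bespin bespin")` yields the type `hg` and the project name `bespin`.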

Authenticated Repository Access

There are bound to be people who need to access repositories that are not publicly readable. There will certainly be people who want to push code changes from Bespin back up to a server that requires authentication for commit access.

This is potentially a very dangerous capability to provide. For many, there is a username/password that they use to commit code. For others, we would need their SSH *private* key to perform a commit.

Someone running Bespin behind a firewall (or on their desktop) may not care. On the other hand, a publicly accessible Bespin needs to keep that information as secure as possible.

For this reason, remote repository credentials can be kept encrypted with a password that the user keeps to themselves. bespin.mozilla.com will require this, and that password will need to be typed for each authenticated interaction with the remote server.

A dialog will be displayed when authentication needs to be gathered (either at checkout or push time). This dialog will prompt the user for their remote credentials (username/password or ssh key) plus their repository access password.

After that initial dialog, any future remote access attempts will pop up a dialog requesting the repository access password.
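
This plan does not specify how the credentials would be encrypted. One possible approach, shown here only as a sketch, is to derive an encryption key from the repository access password with PBKDF2 and store only the salt alongside the encrypted credentials. The function name `derive_key`, the salt size and the iteration count are all assumptions. The encryption step itself would need an authenticated cipher from a cryptography library and is not shown.

```python
import hashlib
import os


def derive_key(password, salt=None, iterations=200_000):
    """Derive a 32-byte key from the repository access password.

    Returns (key, salt). The salt is not secret and would be stored with
    the encrypted credentials so that the same key can be derived again.
    """
    if salt is None:
        salt = os.urandom(16)  # fresh random salt for a new user
    key = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return key, salt
```

Because the key exists only while the password is in hand, the server cannot decrypt the stored credentials between requests, which matches the requirement that the password be typed for each authenticated interaction.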

General Command Concepts

In all commonly used version control systems, there is a small collection of commands that are used all the time. These common commands will be implemented as generic commands that work regardless of the version control system in use. This is an important convenience, because users can have projects pulled from many different systems.
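
To make the idea concrete, here is a minimal sketch of translating two generic commands (commit and push, as described later in this document) into a VCS-specific command line. The function `translate` and its return shape are hypothetical. The sketch ignores any extra arguments to push.

```python
SVN_PUSH_MESSAGE = "Use the commit command to send changes to the repository."

SUPPORTED = ("hg", "git", "bzr", "svn")


def translate(vcs_type, generic_args):
    """Translate a generic command into (argv, message).

    Exactly one of the two is not None: argv is the command line to run,
    message is text to show the user instead of running anything.
    """
    if vcs_type not in SUPPORTED:
        raise ValueError("unknown VCS type: %s" % vcs_type)
    command = generic_args[0]
    rest = list(generic_args[1:])
    if command == "commit":
        # DVCSes commit locally; Subversion commits to the svn server.
        return [vcs_type, "commit"] + rest, None
    if command == "push":
        if vcs_type == "svn":
            return None, SVN_PUSH_MESSAGE
        return [vcs_type, "push"], None
    raise ValueError("unsupported generic command: %s" % command)
```

For example, `translate("hg", ["commit", "-m", "fix typo"])` produces the argument list for `hg commit -m "fix typo"`, while `translate("svn", ["push"])` returns only the explanatory message.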

Because the VCS software is running on the shared Bespin infrastructure, the security implications of allowing unfettered access to all of the commands are unclear. Consequently, we will not provide complete access to all of the commands.

However, we *can* basically whitelist commands and parameters to those commands where we believe we can ensure their safe use within our environment. We can also offer an option for people running their own private Bespins to not include those limitations.

As an example, we may not offer a general "bundle" command, because that's not as common a need and not all VCSes offer that. But, we may make an "hg bundle" command.

Working with a checked out project

The common set of commands will be provided. We will start off with a limited set of commands and options and then grow them as required.

log [filename or directory]
display the history of the project/file/directory. Initially, at least, the format will not be consistent between VCSes. It will be whatever the VCS reports.
diff [filename or directory]
display the differences between the working copy and the checked out code. Ideally, the diff should be displayed in an editor window with syntax highlighting.
createpatch
creates a patch file for all outgoing changes (your browser will prompt you to download the file)
update
requires REMOTE access. This will update the project with the latest code from the repository that Bespin pulled from. Conflicting changes will be marked up in the files.
status
display the status of the files. The server API for this command will return {'filename' : 'flag'} for the output, so that the UI can potentially display this information in the dashboard. For example {'README.txt' : 'M'} means that README.txt was modified.
resolve [filename or directory]
mark conflicts as resolved
commit [-m "message"] [filename or directory]
For the DVCSes, this will commit to the local repository. For Subversion, this command will commit to the svn server.
push [filename or directory]
REMOTE COMMIT access required for DVCSes, which will push the changes up to the same repository that was originally cloned. For svn, this will just display the message "Use the commit command to send changes to the repository."
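
As an illustration of the status result format described above, here is a minimal sketch that converts the text printed by `hg status` (one line per file: a one-character flag, a space, then the path) into the {'filename' : 'flag'} object. The function name `parse_hg_status` is hypothetical, and other VCSes would need their own parsers.

```python
def parse_hg_status(output):
    """Convert `hg status` text output into a {filename: flag} dict."""
    result = {}
    for line in output.splitlines():
        if len(line) < 3:
            continue  # skip blank or malformed lines
        flag, filename = line[0], line[2:]
        result[filename] = flag
    return result
```

Given the output `M README.txt`, this returns `{'README.txt': 'M'}`, matching the example in the status description.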

VCS-specific commands

When working with a given project, only the VCS-specific commands for that project should be available. For example, if you're working on Bespin, the "hg" command will be there, but the "bzr" command will not.

These commands will be *exactly* the commands used on the command line, with the difference that the commands/subcommands and options will all go through a whitelist and rigorous validation of parameters.
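
A whitelist check of this kind might look like the following sketch. The contents of `WHITELIST`, the helper names and the path rule (no absolute paths, no ".." components) are assumptions for illustration only; the real list of allowed subcommands and options is yet to be decided.

```python
# Hypothetical whitelist: tool -> subcommand -> {option: takes_a_value}.
WHITELIST = {
    "hg": {
        "status": {},
        "diff": {},
        "log": {"-l": True},
        "commit": {"-m": True},
    },
}


def is_safe_path(arg):
    """Reject absolute paths and anything that climbs out of the project."""
    parts = arg.replace("\\", "/").split("/")
    return not arg.startswith("/") and ".." not in parts


def validate(argv):
    """Return argv unchanged if it passes the whitelist, else raise ValueError."""
    if len(argv) < 2:
        raise ValueError("expected a tool and a subcommand")
    tool, subcommand = argv[0], argv[1]
    options = WHITELIST.get(tool, {}).get(subcommand)
    if options is None:
        raise ValueError("command not allowed: %s %s" % (tool, subcommand))
    i = 2
    while i < len(argv):
        arg = argv[i]
        if arg.startswith("-"):
            if arg not in options:
                raise ValueError("option not allowed: %s" % arg)
            if options[arg]:
                if i + 1 >= len(argv):
                    raise ValueError("option %s needs a value" % arg)
                i += 1  # skip over the option's value
        elif not is_safe_path(arg):
            raise ValueError("path not allowed: %s" % arg)
        i += 1
    return argv
```

A private Bespin that does not want these limitations could simply skip the call to `validate`.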

Getting projects out of Bespin

Distributed version control systems offer a convenient way to get code out of Bespin. If we can make Bespin's repositories available directly via the VCS's normal protocol, then the user can simply pull changes from Bespin's repository.

For Subversion, and for the instances where it's likely more convenient in a DVCS, we will provide authenticated push access. This may be necessary for DVCSes anyway, if there prove to be technical hurdles to exposing Bespin's repositories natively.

Finally, users can use the current export feature. That is the least convenient for normal use, but it is working today.

Server API changes

Because the set of VCS commands is open-ended, I'm envisioning adding a single small new server API:

POST /vcs/[project]/ 
Send a JSON message {'command' : ['command', 'other', 'arguments']} as the body. A 400 response with error text in the response body will result from a bad command line. A 200 response will have a JSON result. For many commands, there will just be 'output' on the object with string output that can be displayed to the user.
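
The request handling could be sketched roughly as follows. This is not the actual server code: `handle_vcs_post` and the `run_command` callable are hypothetical, and the sketch assumes the runner signals a bad command line by raising ValueError.

```python
import json


def handle_vcs_post(project, body, run_command):
    """Handle POST /vcs/[project]/ and return (status_code, response_body)."""
    try:
        message = json.loads(body)
    except ValueError:
        return 400, "request body is not valid JSON"
    command = message.get("command") if isinstance(message, dict) else None
    if (not isinstance(command, list) or not command
            or not all(isinstance(word, str) for word in command)):
        return 400, "'command' must be a non-empty list of strings"
    try:
        # run_command is assumed to validate and execute the command line.
        output = run_command(project, command)
    except ValueError as error:
        return 400, str(error)  # bad command line: error text in the body
    return 200, json.dumps({"output": output})
```

Keeping the handler independent of any web framework makes it easy to test with a fake `run_command` before the VCS tools are wired in.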

Data model changes

User changes

Users will remain in the database. Each user will be given a UUID (uuid4 - random) that will be used for on-disk identification of the user.

The user's directory will also include a .bespin-status.json file which provides user-level information about the files in their project. (At this point, it is a JSON object with an "open" attribute that lists all of the user's open files).

Filesystem layout

The general layout in the filesystem will be like this:

datadir/
  userUUID1/
    .bespin-status.json
    .BespinSettings.json
    BespinSettings/
      (checked out files/repository)
    .OtherProject.json
    OtherProject/
      (checked out files/repository)
  userUUID2/
    BespinSettings/

Many filesystems do not like large directories. There will be a setting "fslevels" that determines the amount of nesting that should occur. In development, that setting will be 0, so there is no nesting. Basically, the characters of the UUID (excluding dashes) will make up the nesting levels. Because the UUIDs are random, we will just assume a fairly even distribution. With enough levels, the directories will not be overpopulated.
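
One way the nesting could work is sketched below. It assumes one UUID character per nesting level and that the final directory is still named with the full UUID; neither detail is settled here, and `user_directory` is a hypothetical name.

```python
import os
import uuid


def user_directory(datadir, user_uuid, fslevels=0):
    """Return the on-disk directory for a user, nested fslevels deep."""
    # .hex is the 32-character form of the UUID without dashes.
    hex_chars = uuid.UUID(str(user_uuid)).hex
    levels = list(hex_chars[:fslevels])
    return os.path.join(datadir, *(levels + [str(user_uuid)]))
```

With fslevels set to 2, a user whose UUID starts with "0f" would live under datadir/0/f/, so each level holds at most 16 subdirectories.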

Project changes

The projects table will go away, and projects appear as above in the file system.

Project metadata will be stored in a file called .ProjectName.json. It is a JSON file that will look like:

{"type": "hg", "open": {"path/to/file": {"mode": "rw"}}}
type
kind of repository (files, hg, svn, bzr, git)
open
open files object
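
Reading and updating this file could be sketched as follows. The helper names are hypothetical, and treating a project with no metadata file as a plain "files" project with nothing open is an assumption.

```python
import json
import os


def metadata_path(user_dir, project):
    """Path of the .ProjectName.json file that sits next to the project."""
    return os.path.join(user_dir, ".%s.json" % project)


def load_metadata(user_dir, project):
    """Load the project metadata, with a default for projects that have none."""
    path = metadata_path(user_dir, project)
    if not os.path.exists(path):
        return {"type": "files", "open": {}}  # assumed default
    with open(path) as f:
        return json.load(f)


def mark_open(user_dir, project, filename, mode="rw"):
    """Record that a file is open and write the metadata back to disk."""
    metadata = load_metadata(user_dir, project)
    metadata.setdefault("open", {})[filename] = {"mode": mode}
    with open(metadata_path(user_dir, project), "w") as f:
        json.dump(metadata, f)
    return metadata
```

Because the metadata file lives beside the project directory rather than inside it, it stays out of the way of the native VCS tools.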

Directory changes

The directories table will go away. (Of course, the directories will all be converted over!)

File changes

The files table will go away. (Files will be converted over to the file system format as well.)

Edits are being punted on for the moment, until it is clear how they intersect with the collaboration work.

File Status changes

The file status table will go away, and its information will move into the project metadata file.