Community:SummerOfCode09:WebPagesOverRsync

This page summarises what is known about the proposed Summer of Code 2009 project: "Web pages over rsync".

Abstract

Many web pages today are dynamic and therefore uncacheable, even though large parts of them stay the same between requests. The rsync protocol is a great way of sending the delta between two similar files in a small number of bytes. tridge has done a proof of concept, using proxies and librsync, that caches everything and uses rsync to send deltas instead of complete pages when things change a bit. This has the potential to transform the web experience for users on slow connections. The project would be to fix up his proxy and build a matching Firefox extension, which together would form a proof of concept.
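To make the core idea concrete, here is a toy Python sketch of rsync-style delta encoding (not the real librsync wire format): the receiver describes its old copy as per-block checksums, and the sender answers with "copy block N" instructions for unchanged regions and literal bytes for the rest. Real rsync uses a cheap rolling checksum plus a strong hash; this sketch just hashes every window.

```python
# Toy rsync-style delta encoding. Block size and the use of a strong
# hash at every offset are simplifications for illustration.
import hashlib

BLOCK = 16  # block size; real implementations tune this


def block_sigs(old: bytes) -> dict:
    """Map the checksum of each fixed-size block of the old data to its index."""
    return {hashlib.md5(old[i:i + BLOCK]).digest(): i // BLOCK
            for i in range(0, len(old), BLOCK)}


def make_delta(new: bytes, sigs: dict) -> list:
    """Scan the new data; emit ('copy', block_index) or ('lit', bytes) ops."""
    ops, i, lit = [], 0, b""
    while i < len(new):
        if len(new) - i >= BLOCK:
            d = hashlib.md5(new[i:i + BLOCK]).digest()
            if d in sigs:
                if lit:
                    ops.append(('lit', lit))
                    lit = b""
                ops.append(('copy', sigs[d]))
                i += BLOCK
                continue
        lit += new[i:i + 1]
        i += 1
    if lit:
        ops.append(('lit', lit))
    return ops


def apply_delta(old: bytes, ops: list) -> bytes:
    """Reconstruct the new data from the old data plus the delta ops."""
    out = b""
    for kind, val in ops:
        out += old[val * BLOCK:(val + 1) * BLOCK] if kind == 'copy' else val
    return out
```

For a mostly-unchanged dynamic page, the delta is a handful of copy instructions plus a few literal bytes, which is exactly the saving the project is after.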

tridge has agreed to co-mentor from the rsync/compression protocol side.

Possible Modifications To The Above

Some Apache developers are currently working on an Apache module using a different algorithm called "crcsync", so this looks like an idea whose time has come. Subject to discussions with that project, we may end up going with a different algorithm; the implementation should be flexible enough to support several. It is anticipated that the bulk of the work here will involve modifying Mozilla to support this type of use, rather than working with rsync or crcsync directly.

The Apache group have a mailing list for discussing this. Potential students are encouraged to join and to read the archives.

Emails from Tridge

On 21/03/09 at 01:39 GMT, Tridge wrote:

I would not concentrate on the delta-compression library at first. I think the more important piece is to create the infrastructure in mozilla to support the flow of data you need for delta-compression. The librsync library is a bit of a mess, and some more recent code written by Rusty Russell may be a better choice.

I think the features we need in mozilla are:

  1. the ability for a plugin to say "cache all pages, regardless of cache tags or normal http cache semantics". You'll need to somehow ensure that these extra cached pages are not used if delta-compression is not in use. I'm guessing this will require a fair bit of surgery in the mozilla page cache code.
  2. the ability for a plugin to add a new supported encoding type, along with an additional header (or possibly an etag?) to give the server additional information on how to do the encoding.
  3. the ability for that plugin to then intercept the page as it comes back, check the encoding type, and decode the resulting page.

For testing purposes, the encoding could be as simple as an XOR with a random string. That would allow you to test that the idea works, while not worrying too much about the details of the delta-compression scheme. You could test this against a simple perl/python CGI script under apache to make sure it works right.
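The test Tridge describes could be sketched as below: the "encoding" is just XOR with a fixed key, and the encoding token `x-xor` is invented for illustration. It exercises the same plumbing as the real thing (advertise an encoding, have the server apply it only when advertised, decode on the way back) without any delta algorithm.

```python
# XOR stand-in for a delta encoding. The "x-xor" token is hypothetical;
# it only exists to exercise the advertise/encode/decode round trip.
KEY = b"not-so-random-key"


def xor_encode(body: bytes, key: bytes = KEY) -> bytes:
    """Server side: pseudo-encode the response body."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(body))


def xor_decode(body: bytes, key: bytes = KEY) -> bytes:
    """Client plugin side: XOR is its own inverse."""
    return xor_encode(body, key)


def serve(page: bytes, accept_encoding: str):
    """Toy CGI-style handler: encode only if the client advertised support.
    Returns (response headers, body)."""
    if "x-xor" in accept_encoding:
        return {"Content-Encoding": "x-xor"}, xor_encode(page)
    return {}, page
```

If the decoded page matches what an unmodified browser would have received, the plugin, header, and interception machinery all work, and a real delta scheme can be dropped in later.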

Only after that works nicely would you start plugging in a real delta-compression scheme. Then you can either try to resurrect the librsync code, or you could look at the newer crc based delta compression that Rusty has been working on in ccan (Rusty is CCd on this email).

You also might like to look at this early (but working) version of rproxy:

 http://samba.org/tridge/rproxy-99.tgz

It is from a demo I gave in 1999. To run it, do something like this:

 rproxy localhost:8081 8080

then in another window (or on another machine) do this:

 rproxy yourproxy:3128 8081

then set your browser to go via a proxy at localhost:8080. The data will be delta-compressed between the two instances of rproxy.

The code is pretty horrible, but it may be a useful working example to play with.

Cheers, Tridge


On 21/03/09 at 05:31 GMT Tridge wrote:

yes - for this to be useful we need to get it into 3 main types of programs:

  1. web browsers
  2. web servers (probably starting with Apache)
  3. web proxies (eg. squid)

Getting it into any two of these will make it useful; getting it into all three will benefit the most.

The existing prototype of rproxy enables the protocol extension when two or more entities in the chain between the client and the server support the extension. The delta-compression then applies to all the data going between those entities, and it is transparent to those on either side.
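A toy model of that behaviour (not rproxy's actual mechanism) is sketched below: the delta-compressed leg is the span between the outermost pair of hops that advertise the extension, and every link outside that span carries plain data, so unaware hops are unaffected.

```python
# Toy model of extension negotiation along a client->server chain.
# Each hop is a dict with a "delta" flag saying whether it supports
# the (hypothetical) delta-compression extension.

def delta_span(chain: list):
    """Indices of the first and last delta-capable hops, or None if
    fewer than two hops support the extension."""
    capable = [i for i, hop in enumerate(chain) if hop.get("delta")]
    if len(capable) < 2:
        return None
    return capable[0], capable[-1]


def wire_format(chain: list, link: int) -> str:
    """What travels on the link between hop `link` and hop `link + 1`."""
    span = delta_span(chain)
    if span and span[0] <= link < span[1]:
        return "delta"
    return "plain"
```

With a browser, a squid, an rproxy, and an apache in the chain, only the browser-to-rproxy leg would be delta-compressed if those are the two capable entities; the origin server sees ordinary HTTP.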

Cheers, Tridge


On 23/03/09 at 05:10 GMT, Tridge wrote:

I would hope you'd hook into the existing cache code in firefox, but extend it to allow the plugin to ask for pages that are normally not cached (such as dynamic pages) to be cached. So the cache size limits that are already controllable by firefox users would work.

One interesting question is whether we can have a way to prevent caching when the site doesn't support the extensions needed to take advantage of the cached data. Should the plugin not trigger unless we've visited the site previously and received an indication that the server supports the extension?
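That gating question could be sketched as a policy like the one below. The `X-Delta-Support` header name and the policy itself are hypothetical: remember which origins have advertised delta support, and only override normal cache semantics for those.

```python
# Hypothetical cache-gating policy: force-cache normally-uncacheable
# pages only for origins known to support the delta extension, so cache
# space isn't spent on sites that can never use the cached copy.

class DeltaCachePolicy:
    def __init__(self):
        self.known_supporters = set()

    def note_response(self, origin: str, headers: dict):
        """Record origins that advertise the (made-up) support header."""
        if headers.get("X-Delta-Support") == "1":
            self.known_supporters.add(origin)

    def should_force_cache(self, origin: str, normally_cacheable: bool) -> bool:
        # Normal HTTP cache semantics still apply; the override only
        # kicks in for uncacheable pages from known-supporting origins.
        return normally_cacheable or origin in self.known_supporters
```

On first contact with an unknown site the plugin stays out of the way; once a supporting response has been seen, dynamic pages from that origin become worth keeping.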

> Again, would you please share your prototype with me? 

sure, it is here:

 http://samba.org/tridge/rproxy-99.tgz

It is from a demo I gave in 1999. To run it, do something like this:

 rproxy localhost:8081 8080

then in another window (or on another machine) do this:

 rproxy yourproxy:3128 8081

then set your browser to go via a proxy at localhost:8080. The data will be delta-compressed between the two instances of rproxy.

The code is pretty horrible, but it may be a useful working example to play with.

Cheers, Tridge


Resources