Community:SummerOfCode09:WebPagesOverRsync
This page summarises the known information regarding the possible Summer of Code 2009 project: "Web pages over rsync".
Emails from Tridge
On 21/03/09 at 01:39 GMT, Tridge wrote:
I would not concentrate on the delta-compression library at first. I think the more important piece is to create the infrastructure in mozilla to support the flow of data you need for delta-compression. The librsync library is a bit of a mess, and it may be that some more recent code written by Rusty Russell may be a better choice.
I think the features we need in mozilla are:
- the ability for a plugin to say "cache all pages, regardless of cache tags or normal http cache semantics". You'll need to somehow ensure that, if delta-compression is not used, these extra cached pages are not used. I'm guessing this will require a fair bit of surgery in the mozilla page cache code.
- the ability for a plugin to add a new supported encoding type, along with an additional header (or possibly an etag?) to give the server additional information on how to do the encoding.
- the ability for that plugin to then intercept the page as it comes back, check the encoding type, and decode the resulting page.
For testing purposes, the encoding could be as simple as an XOR with a random string. That would allow you to test that the idea works, while not worrying too much about the details of the delta-compression scheme. You could test this against a simple perl/python CGI script under apache to make sure it works right.
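As an illustration that is not part of the email, the XOR test encoding described above might be sketched in Python as follows. The encoding token `x-xor-test`, the `X-Xor-Key` request header, and all function names are hypothetical and chosen only for this sketch. It assumes the client picks the random string and sends it to the server in the extra header.

```python
import os
from itertools import cycle

# Hypothetical names, invented for this sketch only.
TEST_ENCODING = "x-xor-test"
KEY_HEADER = "X-Xor-Key"


def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR each byte of data with the key, repeating the key as needed."""
    if not key:
        raise ValueError("key must not be empty")
    return bytes(b ^ k for b, k in zip(data, cycle(key)))


def build_request_headers(key: bytes) -> dict:
    """Client side: advertise the test encoding and send the key."""
    return {"Accept-Encoding": f"gzip, {TEST_ENCODING}", KEY_HEADER: key.hex()}


def encode_response(body: bytes, request_headers: dict):
    """Server side: encode only if the client asked for the test encoding."""
    offered = [t.split(";")[0].strip()
               for t in request_headers.get("Accept-Encoding", "").split(",")]
    key_hex = request_headers.get(KEY_HEADER)
    if TEST_ENCODING not in offered or not key_hex:
        return body, {}
    encoded = xor_bytes(body, bytes.fromhex(key_hex))
    return encoded, {"Content-Encoding": TEST_ENCODING}


def decode_response(body: bytes, response_headers: dict, key: bytes) -> bytes:
    """Client side: undo the encoding if the response is marked with it."""
    if response_headers.get("Content-Encoding") != TEST_ENCODING:
        return body
    return xor_bytes(body, key)


if __name__ == "__main__":
    key = os.urandom(16)  # the "random string"
    page = b"<html><body>dynamic page</body></html>"
    encoded, headers = encode_response(page, build_request_headers(key))
    assert decode_response(encoded, headers, key) == page
```

Because XOR is its own inverse, the same function serves for encoding and decoding, which keeps the test focused on the header negotiation and interception path rather than on the compression scheme.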
Only after that works nicely would you start plugging in a real delta-compression scheme. Then you can either try to resurrect the librsync code, or you could look at the newer crc based delta compression that Rusty has been working on in ccan (Rusty is CCd on this email).
You also might like to look at this early (but working) version of rproxy:
http://samba.org/tridge/rproxy-99.tgz
It is from a demo I gave in 1999. To run it, do something like this:
rproxy localhost:8081 8080
then in another window (or on another machine) do this:
rproxy yourproxy:3128 8081
then set your browser to go via a proxy at localhost:8080. The data will be delta-compressed between the two instances of rproxy.
The code is pretty horrible, but it may be a useful way of you playing with a working example.
Cheers, Tridge
On 21/03/09 at 05:31 GMT, Tridge wrote:
yes - for this to be useful we need to get it into 3 main types of programs:
- web browsers
- web servers (probably starting with Apache)
- web proxies (eg. squid)
Getting it into any two of these will make it useful. Getting it into all 3 will give the most benefit.
The existing prototype of rproxy enables the protocol extension when two or more entities in the chain between the client and the server support the extension. The delta-compression then applies to all the data going between those entities, and it is transparent to those on either side.
Cheers, Tridge
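The rule in the email above (the extension is enabled when two or more entities in the chain support it) can be illustrated with a small Python sketch. This is not taken from rproxy. The function name and the chain representation are hypothetical, and the sketch assumes that entities lying between two supporting ones simply pass the encoded data through.

```python
def delta_compressed_links(chain):
    """Return the links that would carry delta-compressed data.

    chain is a list of (name, supports_extension) pairs ordered from
    client to server. Assumption for this sketch: compression covers
    every link between the first and the last supporting entity.
    """
    supporting = [i for i, (_, ok) in enumerate(chain) if ok]
    if len(supporting) < 2:
        return []  # fewer than two supporters: extension stays off
    first, last = supporting[0], supporting[-1]
    return [(chain[i][0], chain[i + 1][0]) for i in range(first, last)]


if __name__ == "__main__":
    # Mirrors the two-rproxy demo: only the middle link is compressed.
    demo = [("browser", False), ("rproxy-a", True),
            ("rproxy-b", True), ("squid", False), ("server", False)]
    print(delta_compressed_links(demo))
```

In this model the browser and the origin server on either side see ordinary HTTP, which matches the description of the extension as transparent to them.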
On 23/03/09 at 05:10 GMT, Tridge wrote:
I would hope you'd hook into the existing cache code in firefox, but extend it to allow the plugin to ask for pages that are normally not cached (such as dynamic pages) to be cached. So the cache size limits that are already controllable by firefox users would work.
One interesting question is whether we can have a way to prevent caching when the site doesn't support the extensions to take advantage of the cached data. Should the plugin not trigger unless we've visited the site previously and received an indication that the server supports the extension?
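One possible answer to that question, sketched here in Python and not part of the email, is for the plugin to remember which hosts have already replied using the extension and to cache normally uncacheable pages only for those hosts. The class, its method names, and the `x-delta-test` encoding token are hypothetical.

```python
# Hypothetical token; the emails do not name the real encoding.
DELTA_ENCODING = "x-delta-test"


class ExtensionSupportRegistry:
    """Remembers which hosts have answered with the delta encoding."""

    def __init__(self):
        self._hosts = set()

    def record_response(self, host, response_headers):
        """Mark the host as supporting the extension if it used the encoding."""
        encoding = response_headers.get("Content-Encoding", "")
        if encoding.strip().lower() == DELTA_ENCODING:
            self._hosts.add(host.lower())

    def should_cache_everything(self, host):
        """Override normal cache rules only for hosts known to support it."""
        return host.lower() in self._hosts


if __name__ == "__main__":
    registry = ExtensionSupportRegistry()
    registry.record_response("example.org", {"Content-Encoding": DELTA_ENCODING})
    print(registry.should_cache_everything("example.org"))    # known supporter
    print(registry.should_cache_everything("other.example"))  # never seen
```

This sketch assumes the client still advertises the encoding on every request, so that a supporting server has a chance to reveal itself on the first visit. The cost is that the first page from each site is never available as a delta base.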
> Again, would you please share your prototype with me?
sure, it is here:
http://samba.org/tridge/rproxy-99.tgz
It is from a demo I gave in 1999. To run it, do something like this:
rproxy localhost:8081 8080
then in another window (or on another machine) do this:
rproxy yourproxy:3128 8081
then set your browser to go via a proxy at localhost:8080. The data will be delta-compressed between the two instances of rproxy.
The code is pretty horrible, but it may be a useful way of you playing with a working example.
Cheers, Tridge
RFC 3229
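RFC 3229, "Delta encoding in HTTP", also specifies a delta-encoding standard for HTTP: the client identifies a copy of the page it already holds, and the server may reply with only the differences from that copy.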
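As a rough illustration of the RFC 3229 exchange, the Python sketch below builds the request headers a client might send and checks whether a response is a delta. The header names (`A-IM`, `IM`, `If-None-Match`), the `226 IM Used` status code, and the `vcdiff`, `diffe` and `gdiff` tokens come from the RFC. The function names are hypothetical, and the sketch assumes headers are held in a plain dict with exactly these key spellings.

```python
# Delta-producing instance manipulations named in RFC 3229.
DELTA_MANIPULATIONS = ("vcdiff", "diffe", "gdiff")


def build_delta_request_headers(cached_etag, manipulations=("vcdiff",)):
    """Headers asking the server for a delta against the cached entity."""
    return {
        "If-None-Match": cached_etag,       # identifies the copy we hold
        "A-IM": ", ".join(manipulations),   # delta formats we can decode
    }


def is_delta_response(status, headers):
    """True if the server answered 226 IM Used with a delta manipulation."""
    if status != 226:
        return False
    applied = [t.strip().lower() for t in headers.get("IM", "").split(",")]
    return any(t in DELTA_MANIPULATIONS for t in applied)


if __name__ == "__main__":
    print(build_delta_request_headers('"123xyz"', ("vcdiff", "diffe")))
    print(is_delta_response(226, {"IM": "vcdiff", "ETag": '"489uhw"'}))
```

A server that does not implement the RFC ignores `A-IM` and replies with an ordinary 200 or 304, so a client can send these headers without first knowing whether the server supports them.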