LocalStorage

Problem

The web is an application platform. There are millions of people who use the web as their primary tool to read email, store and edit documents and search for information. However, the web is only usable as an interface when the network is available.

When people want to take an interface that they have designed for the web and make it work when the network is not available, they only have a couple of options. They can write an application that the user installs on their hard drive and runs as a full native application. This could be written using native tools (e.g. a Windows application) or as an interpreted application using an OS-in-a-box (e.g. against Sun's proprietary Java implementation or something .NET based). This nearly always involves creating a second interface in addition to the "main" web interface, which is an expensive proposition for software developers. The inability of most solutions in this area to leverage the existing front-end web code should be considered a failure.

A second problem is that installing software on your computer often involves an all-or-nothing choice. Either you install the software and give it complete access to your system, or you don't install it and lose the ability to do your work offline. Victims of spyware on Windows understand this pain. It would be nice if we could offer a system that sandboxes data for a web application so that it doesn't have to be given access to local resources beyond what is normally available to a standard web application. Our extensions system also suffers from this problem.

The last problem that we believe needs to be solved is how difficult it is to install software. We believe that for most users installing software is hard and scary, and that this is one of the reasons why IE has such a huge advantage over us in the market. It would be nice if we could offer a solution that involves no obvious software installation. A good measure would be that adding a new offline "application" is as easy as making a bookmark. Most users know how to make bookmarks and consider it a safe operation. This is in contrast to extensions, which solve most of the above problems with the exception of trust and ease of interface.

Goals

These three problems lead us to a few specific goals:

1. The system should leverage the web technologies that exist today. This means that JavaScript, CSS and the DOM are the main technologies used.

2. The system should use an incremental approach that allows web developers to add this to their sites with very little cost or development time.

3. The system should operate within the security principal of the original web site that provides the application; except for an additional set of APIs that the application can use, the app gets no extra permissions.

4. The system should be so easy to use and safe-looking that using it does not make users uncomfortable. This means no installation dialogs, no preferences and no progress bars. In fact, users shouldn't even know they are using it.

Development strategy

Deployment

The first thing that we need to describe is what makes up a web application. This generally consists of a set of files: HTML or XML to describe a basic document hierarchy, JavaScript to manipulate it and CSS to describe how to render it. A manifest must be included that describes all of the components that make up an application.
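
As a purely illustrative sketch (no format has been decided here), such a manifest could be as simple as a text file listing the application's resources, with each page pointing at it through something like a <link rel="offline-manifest"> element; every name below is made up:

  # hypothetical manifest for mail.example.com -- format is illustrative only
  start: /mail/index.html
  /mail/index.html
  /mail/compose.html
  /style/mail.css
  /script/mail.js
  /images/logo.png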

Another problem with deployment is versioning. You have to know that a particular browser version supports a particular set of APIs. With multiple versions of various browser implementations out there, this creates a large matrix of support and testing. It would be nice if the API that a browser supports also included a capability-based versioning scheme. This also has the advantage that some browsers (like a small handheld browser with limited storage) might only have to implement part of the API.
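
For example, capability-based detection might look something like the sketch below; the hasCapability call and the capability names are invented purely for illustration:

  // Hypothetical capability-based feature detection -- none of these names
  // are a defined API; they only illustrate asking for capabilities rather
  // than checking browser version numbers.
  function pickStorageStrategy(offline) {
    if (offline.hasCapability("storage.query")) {
      return "queryable";   // full query support
    } else if (offline.hasCapability("storage.keyvalue")) {
      return "keyvalue";    // simple dictionary only (e.g. a handheld browser)
    }
    return "none";          // fall back to online-only behaviour
  }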

Roc I think the right thing to do here is to make an offline application just a set of pinned pages in the cache. Then you can write a single set of pages that can function online or offline. HTTP caching instructions take care of application updating. We could automatically allow a site to pin N% of your disk space, if your free disk space is >M% of total; if they try to go over that, the user has to approve it.
Instead of providing a manifest, a more Web-friendly approach might be to crawl the page. Basically, when you select a page for offline viewing (which could be as simple as just bookmarking it), we pin it and also download and pin any of the resources we would if we were doing Save As --- images, stylesheets, scripts, IFRAMEs. In addition we could provide a new 'rel' value for <a> links that says "crawl this link for offline viewing". This would be really easy for Web developers to use and maintain and not too difficult to implement, I suspect. Application crawling and download could be done in the background.


Blizzard Pinning pages in the cache sounds like a great idea as a way to implement this, but I don't think a heuristic that eventually asks the user to add storage is the right way to go. Users often don't know how big their cache is, nor how much space is left on their hard drive. I did have a hook for the actual application to throw the space dialog, but that's a little different than the browser throwing that dialog: it requires the app to know how much space is allocated to it and how much it is using. The nice thing about that is the app can avoid ever throwing that dialog by expiring data instead of requiring the user to add more storage.
I also think that it's important that we keep the manifest separate from the pages themselves for a few reasons:
  1. It's hard to know where an offline application "starts." That is, if you're on page X does that app start on page X or on another page entirely? The manifest is the logical starting point for the "bookmark" and can also contain "start" information. i.e. which cached page should be loaded when you are offline and the bookmark is loaded?
  2. When do you expire data? If you update your offline app how do you tell when certain pages are unused? If there's no single location for all of the pages to be found upgrades can get a lot more challenging and there's a good chance that we'll end up with some unpredictable heuristic.
  3. Raw crawling misses a lot of data. URLs that are accessed via JavaScript, or built on the fly from form data, can create locations that won't show up in a crawl.
  4. It would be a lot of work to maintain all of the possible links in all of your pages. I think it would be a lot easier just to have a simple text-based file that contains a list of resources for all of the pages, and a simple <link> in each page that points to that manifest. Everything is explicit and well-understood in that case.
udayan One more set of problems arises with deployment of applications in geographically remote areas. These areas have connectivity on a very limited basis (not 24x7), connectivity is of poor quality (speed and reliability) and connectivity is expensive. The remoteness also means applications deployed in such areas need to be remotely manageable, as physical access is not always feasible. This creates a need for applications to be: (a) remotely deployable and manageable, and (b) able to work in offline mode (forms-based data entry, cached locally and submitted on detection of connectivity, with upgrades to the application and things like form templates downloaded on detection of connectivity).
For this we need the browser to be able to explicitly tag content as "offline capable", where "offline capable" implies a set of things and not just being able to "pin" content in a cache.

Storage

It's important to think about the kinds of strategies we want to use for apps to store data. Very often on the backend of most web sites there's a structured database. We believe that following that model for our data model makes it very easy to reflect that data back into the client-side store with very little effort, including encouraging the building of automated utilities to do so. However, it's not our goal to create a fully featured relational database on the client. It's important to find the correct mix between enough complexity to get things done and the simplicity that makes things easy to use.

We believe that allowing storage and querying via the two usual standard models is important. These are lookup by key (which maps to dictionaries) and iteration and lookup by offset (which maps to arrays).
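
A rough sketch of what those two models might look like from a page's point of view; the store object and all of its methods are hypothetical:

  // Hypothetical client-side store -- "store" and its methods are invented
  // for illustration; only the two access patterns matter here.
  var contacts = store.openTable("contacts");

  // dictionary-style: lookup by key
  contacts.put("bob@example.com", "Bob Smith");
  var name = contacts.get("bob@example.com");

  // array-style: iteration and lookup by offset
  var inbox = store.openTable("inbox");
  for (var i = 0; i < inbox.length; i++) {
    var message = inbox.item(i);
    // ... render the message ...
  }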

Because people will add apps while online and will sometimes remove them while offline, we believe it should be possible to mark some sets of data as "dirty" so the browser can warn a user that some data associated with that application has not been saved.

People writing apps will want to be able to download and store chunks of content and binary data such as images or Java class files. Therefore, we think it's important that you have the ability to redirect the output of a URL load to a particular place in the database. Also, the ability to execute a URL load (of JavaScript or an image, for example) from a place in storage, using a URL syntax, is important.
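
A sketch of both directions, with invented names throughout (loadIntoStore and the storage: URL scheme are illustrative only):

  // Hypothetical: fetch a resource over HTTP and redirect the response body
  // straight into a row of the local store instead of into the page.
  store.loadIntoStore("http://mail.example.com/attachments/photo.png",
                      "attachments", "photo.png");

  // Hypothetical: later, reference that stored row with a URL syntax so it
  // can be used anywhere a normal URL is used.
  var img = document.createElement("img");
  img.src = "storage://mail.example.com/attachments/photo.png";
  document.body.appendChild(img);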

Some basic techniques for querying the data stored in the rows would also have value. For example, if you have an app that's storing 50 megabytes of email and you want to search that text, loading and searching each piece of text yourself is expensive. We should add some simple and useful sugar to make this kind of thing easy. User-defined functions for sorting and comparisons in queries would also make this a lot more useful.
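
For instance, a small amount of query sugar plus an ordinary JavaScript comparison function might look like this sketch (openTable, query and contains are invented names):

  // Hypothetical query sugar: let the store do the scanning rather than
  // pulling every row into script and searching it by hand.
  var mailTable = store.openTable("mail");
  var results = mailTable.query({ contains: "quarterly report" });

  // A user-defined comparison function for ordering the results, assuming
  // the query returns an ordinary JavaScript array of row objects.
  results.sort(function (a, b) {
    return a.date - b.date;   // oldest message first
  });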

It would also be useful to app writers if some data could be automatically expired, like from a LIFO cache. This means that some apps could be written so that they keep a "smart cache" around. Imagine browsing a wiki where the wiki site kept a cache of all of the pages that were linked from the pages that you read. This would still allow you to do some research and editing while you were offline, even though you hadn't specifically downloaded all those articles. Or a maps site could keep chunks of map data near your house in your cache so that they would be available when you were offline.

Applications should be able to manage their own cache if they want. This means that we need to expose the amount of storage used by any particular application and allow the application to break down the amount of data stored in any particular table.
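
As a sketch, with bytesUsed, quota and removeOldest all hypothetical, an application could keep itself under its limit by expiring its own cached data rather than asking the user for more space:

  // Hypothetical storage accounting: the app checks its own usage and
  // expires old data instead of throwing a quota dialog at the user.
  if (store.bytesUsed() > 0.9 * store.quota()) {
    var cache = store.openTable("articleCache");
    // drop the oldest cached articles until we're back under the limit
    while (store.bytesUsed() > 0.8 * store.quota() && cache.length > 0) {
      cache.removeOldest();   // hypothetical helper
    }
  }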

Sharing data between applications is also important. We believe that we should use the cookie model as the basis for how to share data across applications. For example, if you're using a mail application to read mail offline, the address book data, stored at a different site, should also be easy to access. (Think of mail.yahoo.com and address.yahoo.com.)

Roc I think that trying to provide structured storage on the client is hopeless for the same reasons that trying to provide structured storage in a filesystem is hopeless. Whatever model we choose will never be the right model for the majority of applications. Furthermore, it diverges from today's Web programming models. I think we should just provide a simple filesystem API --- essentially, a persistent hashmap from string keys to string values --- slightly enhanced cookies. Remember that developers will want to at least do all the things they do with cookies today, including obfuscation, encryption, and on our side, quota management. People can build libraries on top of this if they want to. They can build libraries for indexing, LIFO cache management and so on.
VladVukicevic 18:00, 7 Jul 2005 (PDT): That's my thinking as well -- something like "supercookies" instead of fully queryable structured storage. At the very least, they'd be much simpler to start to use. We can also provide helpers for serializing a JS Object to/from the value format, which should take care of most consumers' needs. I wouldn't try to just use the cookies API though, since unlike cookies these local-cookies (brownies?) would never be transmitted to a remote server as part of a request. I think a simple hash map would fit well with the AJAX model as well.
Roc I agree so I deleted that part of my comment.
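
A minimal sketch of that "supercookies" hash map, assuming a hypothetical localData object handed to the page by the browser; the names are invented, and unlike cookies nothing stored here would ever be sent with a request:

  // Hypothetical per-site persistent hash map: string keys to string values.
  localData.setItem("draft.42.subject", "Meeting notes");
  localData.setItem("draft.42.body", "First draft of the notes...");

  var subject = localData.getItem("draft.42.subject");
  localData.removeItem("draft.42.body");

  // A helper library could serialize a JS object to and from the string
  // value format, as suggested above; the browser itself only sees strings.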

UI

We believe that it's important to avoid having to re-educate users about what this new model means to them. Their only experience should be that the web is suddenly much more useful than it was before, while at the same time being just as safe.

We think that the best possible place to make this change is through the bookmark user interface. It leverages the existing training and user interface to which people are accustomed. People know how to use bookmarks and already feel safe doing so. For example, if you bookmark a site and that site contains a special <link> tag, that bookmark will be added as a "smart bookmark", downloading the manifest for the application. Then, when you access the site through the bookmark while you're offline, you get the local copy described in the manifest instead of the one off the network.

Deleting a smart bookmark can delete the data stored for that application, after warning the user about any unsaved data.

We can only identify two places where new UI might be required.

1. A way to modify the amount of storage that's allocated to a particular app.

2. A warning that a user is about to remove a "bookmark" that contains unsaved data.

Since apps will know how much of their allocated storage they have used, it might be nice to allow an app to throw the dialog that changes the amount of storage allocated to it.

There's also the problem of trusted vs. untrusted computers. We should probably add a way to easily disable this functionality completely for use on untrusted computers.

In summary, it's important that we leverage the existing UI that's out there and make these new "apps" painless and transparent to users.

Roc I really like the bookmarks UI idea. When we're in offline mode bookmarks for pages that are not available should be disabled with a tooltip explaining why. Although maybe everything that gets bookmarked should be downloaded for offline access anyway.

APIs

The APIs follow a few easy rules:

1. Think about the use cases.

2. Always leverage what you have today.

3. Avoid over-complex calls and abstractions. People can (and will) add those later through their own libraries.

Known use cases

1. Storage <-> XML-RPC bridge. It's clear that people will want to build bridges between XML-RPC and this storage mechanism.

2. Binary data retrieved from URLs. People will want to download binary data (especially images) directly into a database row.

3. Storing and querying text. It should be possible to download and search text strings.

4. Storing and querying structured data. If you download an XML document into a database row, it might be nice to be able to search that data based on an attribute name or value stored in that XML document.
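
As a sketch of use case 4, the storage call is hypothetical but the XML handling uses ordinary DOM calls:

  // Hypothetical: pull a stored XML document back out of the local store...
  var xmlText = store.openTable("feeds").get("news.xml");

  // ...and query it with standard DOM methods.
  var doc = new DOMParser().parseFromString(xmlText, "text/xml");
  var items = doc.getElementsByTagName("item");
  for (var i = 0; i < items.length; i++) {
    if (items[i].getAttribute("category") == "mozilla") {
      // found an entry with the attribute value we were searching for
    }
  }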

Deployment APIs

A few different APIs have been proposed:

1. A database API that's based on a classic relational database model. That is, tables and rows.

2. A simple dictionary system. A single-level lookup based on a simple string key.

3. No API. Just the ability to cache pages locally.

Functional Coverage

Required APIs probably include:

1. A way to access local storage and read information out of the database.

2. Assuming that we want to go with a system which allows a huge amount of storage, a way to access and query (?) that data.

3. A way to handle page transitions from one page to another. This would probably be a client-side equivalent of form handling on the server side.
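
As a sketch of the third point, an offline page could handle a form submission itself, save the data locally and then move to the next cached page. The store calls and the form names are hypothetical; the DOM event handling is standard:

  // Hypothetical offline form handling: intercept the submit, save the data
  // locally, and move to the next cached page instead of hitting the server.
  var form = document.getElementById("composeForm");   // assumed form id
  form.addEventListener("submit", function (event) {
    event.preventDefault();                       // don't hit the network
    var outbox = store.openTable("outbox");       // hypothetical store call
    outbox.put(new Date().getTime().toString(),
               form.elements["body"].value);      // assumes a "body" textarea
    window.location = "sent.html";                // next page from the cache
  }, false);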