WebAPI/WebActivities/LessonsLearned

From MozillaWiki

Based on internal experience, as well as experience talking with other partners like Google and Pocket, we have learned a lot about how to design the next iteration of WebActivities.

Below I'll use the term 'app' not to refer to an installed Android/iOS app, but rather to the generic concept of a webapp.

To simplify the discussion, here are two standard flows when using WebActivities:

Using an activity that returns a value

  1. User launches facebook
  2. User clicks "attach a picture"
  3. Facebook launches the "pick" WebActivity
  4. The activity picker comes up.
  5. User chooses to use the camera app as activity handler
  6. Camera app is launched
  7. User takes a picture
  8. Camera app returns the picture as result of the activity
  9. Camera app is closed and user is back in facebook app.

Using an activity that does not return a value

  1. User goes to news website and finds an interesting article
  2. User clicks "share" button
  3. News website launches the "share" WebActivity
  4. The activity picker comes up.
  5. User chooses to use the twitter app as activity handler
  6. Twitter app is launched
  7. Twitter app prefills URL of news article as tweet content
  8. User writes short message to go along with the URL and submits the tweet

UX issues with activities that return a value

In FirefoxOS, if the user temporarily switches apps during step 7 of the "returns a value" flow, for example to look something up or to answer an incoming phone call, the activity is aborted and facebook receives an error message. We were forced to do this to cover the case where the user switches back to the facebook app: we didn't want to force facebook to implement a sensible UI for the time while the activity was still in progress. However, aborting the activity just because the user looked something up is obviously bad.

In Chrome's Web Intents, step 6 of the "returns a value" flow launches the camera app in a separate browser tab. That creates an awkward relationship between the tab with the facebook app and the tab with the camera app. In particular, if the user closes the facebook tab, then the picture is dropped on the floor once it is taken, since there is no facebook app left to receive it.

Recommended solutions

For activities that return a value, the activity handler should be overlaid on top of the initiating app. I.e. in the example above the camera app should be rendered on top of the facebook app.

In a small-screen device this means that the application switcher shouldn't display facebook and the camera as two separate apps currently running. Instead only the facebook app should be listed, though possibly the UI could somehow indicate that the facebook app is currently using the camera app or some such. Switching to the facebook app should render the camera app on top of it.

On a large-screen device with a traditional tabbed browser UI, the camera app should be rendered in the same tab as the facebook app. I.e. the camera app would be on top of the facebook app.

UX issues with opening activities in a new app/tab

In FirefoxOS a webactivity always launches a new fullscreen app. This makes activities always have a fairly heavy flow since it involves two app switches, one to the activity handler, and one to switch back to the original app.

Activity handlers that want to implement things like "save this for later" or "share on my photo stream" only need to display minimal UI that doesn't take the user out of the current app.

Recommended solutions

We should implement a "disposition: 'inline'" like what Web Intents has. This is already specified for WebActivities but was never implemented. An inline handler would be rendered like a "dialog" on top of the current app.

On large-screen devices this likely means that it sizes to content. On small-screen devices this might simply mean that it doesn't take up the full screen.
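As a rough illustration of the paragraphs above, here is a minimal sketch of how a user agent might map a handler's requested disposition to a rendering strategy. All names (`HandlerDeclaration`, `layoutFor`, the layout labels, the `/save.html` page) are hypothetical and chosen only for this example; they are not part of WebActivities or Web Intents.

```typescript
// Hypothetical sketch: names and values are illustrative, not a specified API.
type Disposition = "window" | "inline";
type ScreenClass = "small" | "large";
type Layout = "new-app-or-tab" | "partial-overlay" | "size-to-content-dialog";

interface HandlerDeclaration {
  activity: string;          // e.g. "share"
  href: string;              // page that handles the activity
  disposition: Disposition;  // how the handler asks to be rendered
}

// Maps the handler's requested disposition to a rendering strategy.
function layoutFor(decl: HandlerDeclaration, screen: ScreenClass): Layout {
  if (decl.disposition === "window") {
    return "new-app-or-tab";
  }
  // Inline handlers are rendered like a dialog on top of the current app.
  return screen === "large" ? "size-to-content-dialog" : "partial-overlay";
}

// A "save this for later" handler that only needs minimal UI.
const saveForLater: HandlerDeclaration = {
  activity: "share",
  href: "/save.html",
  disposition: "inline",
};
```

In this sketch the handler only states a preference; how the overlay is actually sized stays a user agent decision.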

Pocket also has done some really great research here.

This research proposes even allowing overlays that render directly on top of the initiating app. This provides for some pretty awesome UI, but also exposes issues like clickjacking. Ideally disposition:inline can bring most of the benefits of this proposal, while still not exposing clickjacking risks.

UX issues if the handler app is already running when the activity is launched

In FirefoxOS's implementation of WebActivities, if the twitter app is already running in the background when the "does not return a value" flow happens, we switch to the twitter app and send it a message asking it to handle the activity. However, this means that the app has to leave its current state in order to do so. So if the user was in the middle of some other task within the twitter app, that is now lost.

Recommended solutions

If we follow the recommendation above under "UX issues with activities that return a value", that already solves the problem for activities that return a value: a new page will always be opened and rendered on top of the app that initiated the activity.

Likewise disposition:inline activities will also not suffer this problem since they open a new page inside the inline UI on top of the page that initiated the activity.

For other activity handlers a good default is likely to always open a new tab to handle the activity.

But we could also enable handling an activity using an existing page by allowing a ServiceWorker to act as activity handler, and then allowing the ServiceWorker to delegate to an already open page. Possibly we could even allow the ServiceWorker to open a disposition:inline activity.
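The recommendations in this section amount to a small decision table, sketched below. The type and function names (`LaunchInfo`, `placeHandler`, `delegatedPageId`) are hypothetical, and the order in which the cases are checked is an assumption of this sketch rather than something the recommendations settle.

```typescript
// Hypothetical decision table for where a handler page is opened.
type Placement =
  | "inline-dialog"          // new page inside inline UI on top of the initiator
  | "overlay-on-initiator"   // new page rendered on top of the initiating app
  | "existing-page"          // only when a ServiceWorker delegates to an open page
  | "new-tab";               // default for everything else

interface LaunchInfo {
  returnsValue: boolean;
  disposition: "window" | "inline";
  // Set when a ServiceWorker handles the activity and picks an open page.
  delegatedPageId?: string;
}

function placeHandler(info: LaunchInfo): Placement {
  // Assumption: an inline handler stays inline even if it returns a value.
  if (info.disposition === "inline") return "inline-dialog";
  if (info.returnsValue) return "overlay-on-initiator";
  if (info.delegatedPageId !== undefined) return "existing-page";
  return "new-tab";
}
```

Only the "existing-page" outcome can disturb what the user was doing in the handler app, and in this sketch it happens only when the handler's own ServiceWorker opts into it.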

Lack of ability to save intermediate results

Consider a "Google Drive" app that uses an "edit" activity to launch a "photoshop" app in order to edit a picture file.

In the current WebActivities and Web Intents implementations, the only way to accomplish this would be to have the photoshop "edit" activity handler return the edited image once the user was done editing, and then have the Google Drive app save the resulting file.

There are a couple of issues with this flow though. First off, given that the activity would be one that returns a value, the photoshop app would have to be opened on top of Google Drive. This isn't always desirable.

A bigger problem is that photoshop would not be able to save intermediate drafts to the Google Drive app. It would either not save them at all, which risks more data loss in case of a crash or accidentally closing the app, or it would have to save them somewhere in photoshop's storage area. In case of a crash it would be awkward to get the edited data back into Google Drive. The user would have to relaunch the edit activity and select photoshop again, and then photoshop would have to detect that the same file was being edited and offer to reuse the previously saved draft.

A more desirable flow is to let Google Drive launch the "edit" activity such that Photoshop opens in a new window, while also giving Photoshop an open communication channel back to Google Drive. The user should be able to close the Google Drive tab, and Photoshop should still be able to send data to Google Drive in order to save intermediate drafts.

Recommended solutions

When launching an activity, it should be possible to also provide "back channel" information. If provided, the activity handler would be able to postMessage arbitrary information back to the app that initiated the activity.

These messages would likely need to be sent not to the execution context that initiated the activity, but rather to its Service Worker. This way messages can be sent even if the execution context that initiated the activity has been closed.

The app initiating the activity would also need to provide some arbitrary data which is passed back any time messages are sent from the activity handler and which can't be altered by the activity handler. In the example above the Google Drive app could provide the filename of the file being edited.
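A minimal sketch of the back channel idea, assuming the user agent holds on to the initiator-provided data and stamps it onto every message. All names here (`BackChannelMessage`, `openBackChannel`, `ServiceWorkerInbox`, the `holiday.psd` filename) are hypothetical; a plain callback stands in for the initiating app's Service Worker.

```typescript
// Hypothetical sketch of a back channel; none of these names are a real API.
interface BackChannelMessage {
  context: string;   // set by the initiating app, e.g. the filename being edited
  payload: string;   // set by the activity handler, e.g. an intermediate draft
}

// Stand-in for the initiating app's Service Worker, which can receive messages
// even after the page that launched the activity has been closed.
type ServiceWorkerInbox = (message: BackChannelMessage) => void;

// The user agent keeps `context` itself and gives the handler a function that
// accepts only a payload, so the handler has no way to alter the context.
function openBackChannel(
  context: string,
  inbox: ServiceWorkerInbox,
): (payload: string) => void {
  return (payload: string) => inbox({ context, payload });
}

// Usage: Google Drive launches "edit" for one file; Photoshop saves two drafts.
const received: BackChannelMessage[] = [];
const postDraft = openBackChannel("holiday.psd", (message) => {
  received.push(message);
});
postDraft("draft 1");
postDraft("draft 2");
```

Because the handler only ever sees `postDraft`, each saved draft arrives tagged with the file it belongs to without the handler being trusted to report that correctly.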

Ability to switch from inline to full-window handler

A disposition:inline handler might need to defer to a more complex UI depending on user actions. For example a facebook "share" handler might start as an inline handler, but need to switch to a more complex UI if the user wants to configure security settings or add complex data to the post.

This gets especially tricky for activity handlers that return a value. In this case the full-window handler would need to be rendered on top of the current app, i.e. it couldn't be handled like a normal window.open with target=_blank. Additionally the new page needs to take over the responsibility of returning a value.

Recommended solutions

Sadly I don't have any recommendations here. Possibly simply not supporting this scenario is the right solution for now. Instead we can allow a disposition:inline handler to resize itself to handle the more complex UI.

What might help is to force this scenario to be handled by allowing a ServiceWorker to handle the activity and then allowing the ServiceWorker to open inline or full-window handlers on top of the initiating app.

Activity handlers that don't return a value can always open new pages using target=_blank links or window.open calls.

Should activity launcher have a say in the disposition of the handler?

This is mostly based on notes from a WebActivities/Web Intents discussion. I sadly don't remember all the details here.

For activities that do not return a value, the initiating app might want to treat launching the activity handler as either a "navigation" or as an "open in new tab".

In the Google Drive/Photoshop example above, it seems like it should be the decision of the Google Drive app if Photoshop should replace the Google Drive app, or if opening Photoshop should be treated like opening a <a target="_blank"> link.

Recommended solutions

We could support a target attribute when initiating an activity. The target would be ignored for activities that return a value, and possibly also for disposition:inline activities. However I don't know of use cases which involve targeting named windows, so possibly a target attribute is too generic.
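One way to read this proposal is as a function from the launch options to an effective target, sketched below. The names (`TargetOptions`, `effectiveTarget`) are hypothetical, only `_self` and `_blank` are modeled, and ignoring the target for disposition:inline is the "possibly" case from the paragraph above, treated here as an assumption.

```typescript
// Hypothetical sketch; only "_self" and "_blank" are modeled, no named windows.
type ActivityTarget = "_self" | "_blank";

interface TargetOptions {
  returnsValue: boolean;
  disposition: "window" | "inline";
  target?: ActivityTarget;   // requested by the initiating app
}

// Returns the target that is honored, or "ignored" when the user agent
// decides the placement on its own.
function effectiveTarget(opts: TargetOptions): ActivityTarget | "ignored" {
  // Handlers that return a value are always rendered on top of the initiator.
  if (opts.returnsValue) return "ignored";
  // Assumption: inline handlers also ignore the target.
  if (opts.disposition === "inline") return "ignored";
  // Assumption: without an explicit target, open a new tab.
  return opts.target === undefined ? "_blank" : opts.target;
}
```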

Another question is if it should be possible to target _self if the current page is open in an iframe. I.e. should activity handlers need to worry about possibly being opened in a subframe?

Finding activity handlers that match a given activity request

When an activity is initiated we want to fairly quickly bring up a list of apps that are able to handle the activity. In WebActivities we allow activity handlers to provide an object which describes a filter which the activity data is matched against.

However it's proven pretty hard to find a format to express these filters. For example, for the "pick" activity, an application asking for a picture could specify any of type: 'image/*', type: 'image/gif' or type: 'image/jpg'. So any activity handler for the "pick" activity that can provide a picture has to enumerate at least all of those types. And it also has to make sure that its filter matches if no specific type is requested.

Another problem is the opposite case. A contact-manager app may provide a handler for the "pick" activity so that the user can choose a contact when a contact is explicitly requested, but that handler should not come up when the "pick" activity is used to select an arbitrary file, i.e. when no type is specified.

Recommended solutions

It's possible that WebActivities' current filter mechanism actually can handle most use cases. It currently supports features like "match any of the values in this array", regexps and optional vs. required attributes. However mimetype matching has been awkward and handler applications have forgotten to enumerate wildcard types like "image/*" which means that they don't always show up when they should.

Possibly the situation can be improved by providing explicit features for mimetype matching.
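A minimal sketch of what explicit mimetype matching could look like. This is not how WebActivities filters work today; `PickHandler`, `mimeMatches`, `handlerMatches` and the `acceptsUntypedRequests` flag are hypothetical names used only to illustrate the idea.

```typescript
// Hypothetical helpers; WebActivities filters do not currently provide this.
interface PickHandler {
  types: string[];                  // concrete types the handler can provide
  acceptsUntypedRequests: boolean;  // show up when no type is requested?
}

// True when a requested type (possibly a wildcard such as "image/*")
// is compatible with one type the handler supports.
function mimeMatches(requested: string, supported: string): boolean {
  const [reqType, reqSub] = requested.toLowerCase().split("/");
  const [supType, supSub] = supported.toLowerCase().split("/");
  if (reqType !== supType && reqType !== "*" && supType !== "*") return false;
  // A wildcard subtype on either side matches any subtype.
  return reqSub === "*" || supSub === "*" || reqSub === supSub;
}

// The handler lists its types once; wildcard requests are resolved here
// instead of being enumerated by every handler.
function handlerMatches(requested: string | undefined, handler: PickHandler): boolean {
  if (requested === undefined) return handler.acceptsUntypedRequests;
  return handler.types.some((supported) => mimeMatches(requested, supported));
}

const gallery: PickHandler = { types: ["image/jpeg", "image/gif"], acceptsUntypedRequests: true };
const contacts: PickHandler = { types: ["text/vcard"], acceptsUntypedRequests: false };
```

With this shape the gallery matches a request for 'image/*' without listing it, and the contact manager stays out of the picker when no type is given.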

Another thing that we might ultimately want to do is to allow passing a javascript expression which is evaluated on the activity data and then returns true/false. However this is not something we'd be able to implement in FirefoxOS right now since we're trying to keep process separation between code from different apps.

UX Problems with activity picker

Again, Pocket has done great research on problems with activity pickers, in particular in the Android Intent implementation. One of the main points is that in situations where there's a long list of candidates for handling an activity, making it easy to access the two or three most commonly used handlers is important. Not all handlers should be treated equally.
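As an illustration of that point, here is a minimal sketch of a picker that surfaces the most frequently chosen handlers first. The names (`Candidate`, `rankCandidates`) and the idea that the user agent tracks a per-handler usage count are assumptions made for this example; they are not taken from Pocket's research.

```typescript
// Hypothetical ranking for an activity picker; usage counts are assumed
// to be tracked by the user agent.
interface Candidate {
  app: string;
  timesChosen: number;
}

// Splits candidates into a short list of favorites and the remaining handlers.
function rankCandidates(
  candidates: Candidate[],
  favoritesCount: number = 3,
): { favorites: string[]; others: string[] } {
  // Most used first; ties are broken alphabetically to keep the order stable.
  const sorted = candidates.slice().sort(
    (a, b) => b.timesChosen - a.timesChosen || a.app.localeCompare(b.app),
  );
  const names = sorted.map((candidate) => candidate.app);
  return {
    favorites: names.slice(0, favoritesCount),
    others: names.slice(favoritesCount),
  };
}
```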

Pocket recommendations.

Differences between "data sources" and activities/intents

One of the things we looked at solving with WebActivities in FirefoxOS was access to "data sources". For example the list of contacts, the list of calendar events, the music library or the camera photo stream.

However due to various complexities we found that "data sources" tend to be different enough from other WebActivity use cases that we'll likely need significantly different primitives. Though it might be possible to tie in with WebActivities somehow.

When granting an app access to, for example, contacts, you likely want that app to have access to all your contacts, not just the contacts from a specific other app. So you'd want to grant access to your gmail contacts, your facebook friends, your outlook address book and any "built in" contact address book.

Getting contacts from many apps at the same time can be prohibitively expensive if you have to launch all those apps, even if they don't need to render UI. Each app runs in a separate process and starting multiple processes takes a lot of CPU and memory on mobile.

Displaying a merged contact list from all your contacts sources will take too long if you have to get the full contact list from each source, then merge those lists, then sort them, then reformat to get just the data you want to display in HTML form.

This is even more true for a photo stream where you also need to generate thumbnails.

It needs to be possible to do some of that work ahead of time and then react to any changes in those data sources before the user launches the app. Potentially it could help if the application cached the previously generated list, then asked each data source provider for changes since last time, and then updated once it has received those changes. However we still want to get those updates fairly quickly to avoid the risk of the user looking at old data for too long and then having it "snap" to show the new data.

When a user grants access to a data source we need to indicate if permanent access is granted, or just a one-shot access. That indicates a somewhat different UI than we want for "share" and "pick".

Similarly, in the UI that comes up when an application tries to access "contacts" data sources, the user needs to be able to choose multiple applications that can provide that type of data. Normally for a WebActivity the user just picks a single application to handle the activity.

Recommended solutions

In FirefoxOS we have implemented a "shared indexeddb-like database" API which allows one application to write a database to disk which another application can then read without the need of starting the application that owns the data source.

This enabled us to allow reading "contacts" databases from multiple apps without worrying about launching multiple processes. When access to "contacts" is requested, we plan to render a "data source" specific UI which enables the user to pick multiple applications as well as choose if read vs. readwrite access is granted.

It also enables knowing when the data in a data source changes, and what changes were made. This way we can wake up applications that want to precompute thumbnails or do other heavy data operations. We can also enable the UA to provide applications with the list of changes that were made since last time they updated, without requiring that each data source needs to implement this algorithm.
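The "list of changes since last time" idea can be sketched with a revision-numbered change log. This is only an illustration: `DataSourceLog`, `Change` and `applyChanges` are hypothetical names and do not describe the actual FirefoxOS shared database API.

```typescript
// Hypothetical model of a data source whose change log is kept by the user agent.
interface Change {
  revision: number;          // increases by one with every write
  kind: "put" | "delete";
  id: string;
  value?: string;            // present for "put" changes
}

class DataSourceLog {
  private changes: Change[] = [];

  put(id: string, value: string): void {
    this.changes.push({ revision: this.changes.length + 1, kind: "put", id, value });
  }

  remove(id: string): void {
    this.changes.push({ revision: this.changes.length + 1, kind: "delete", id });
  }

  // Everything written after the revision the reader last saw.
  changesSince(revision: number): Change[] {
    return this.changes.filter((change) => change.revision > revision);
  }
}

// A reader updates its cached view without re-reading the whole source and
// returns the newest revision it has now seen.
function applyChanges(
  cache: Record<string, string>,
  changes: Change[],
  lastSeen: number,
): number {
  let latest = lastSeen;
  for (const change of changes) {
    if (change.kind === "put" && change.value !== undefined) {
      cache[change.id] = change.value;
    } else if (change.kind === "delete") {
      delete cache[change.id];
    }
    latest = Math.max(latest, change.revision);
  }
  return latest;
}
```

A reading app would remember the returned revision and pass it to `changesSince` the next time it is woken up, so the data source itself never has to compute the delta.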

This solution is still very experimental, and we still haven't figured out all the aspects of this.

We haven't figured out the security aspects of exposing write access to the "contacts" data sources without worrying that a rogue app could simply delete or corrupt all the user's contacts. A simple yes/no security dialog doesn't feel like enough to protect the user. So for now we are sadly relying on mechanisms similar to the ones we use to protect TCPSocket and SD-card access, i.e. signatures from a trusted party.

We also haven't figured out how to ensure that an app that has write access to contacts follows whatever format a contact should have.

For v1 my recommendation is to punt on the "data sources" use case.