Service Workers & Offline MDN - HackOnMDN 2015

This is a project from HackOnMDN 2015. It focuses on the Service Worker documentation on MDN and on putting this new technology to good use by caching sections of MDN and making them available offline.

Goals in expanding the Service Worker documentation:

Expand on the documentation
Clean up the existing documentation and add samples to it
Add an introductory page maybe? (something like http://www.html5rocks.com/en/tutorials/service-worker/introduction/ the html5rocks tutorial)
Best practices page? (based on http://jakearchibald.com/2014/offline-cookbook/#putting-it-together the Service Worker cookbook)

Offline MDN is explained in detail below.

About Service Workers

Service Workers are a new W3C web standard that lets web developers create great offline experiences for their web pages (scripted offline caching) in a modern & highly customizable way. They are also useful for improving page load speed (if well configured) even when there is a connection. For more info check out the explainer document or these slides.

Service Workers are currently available in Chrome Stable (as of 40+) and are coming to Firefox ( https://blog.wanderview.com/blog/2015/03/24/service-workers-in-firefox-nightly/ first available in Nightly & Dev Edition in April/May, with a possible stable release projected for Firefox 40+). All of the above should be read as "parts of the API becoming available": the spec/API is still very much in flux, and the implementations in Chrome and Firefox still lack key parts that need to be polyfilled or worked around (see the notes further below).

Service Worker docs on MDN

Docs on the Cache API, the Fetch API and Service Workers in general are going into MDN at a steady pace, but could use some more love. Example documentation links:

Samples in articles are scarce and mostly reference (stale) Chrome-related external samples.

Offline MDN

The idea is to have two-level caching (static+on-demand) on MDN pages, gated on Service Worker browsers (as a progressive-enhancement feature).

Core Service Worker caching support

Once Service Worker support is detected, the browser installs the SW script, which caches the key, core parts of the MDN experience (such as assets, images and the main page; it might also cache the current page, so that all visited pages are cached preemptively). Once the browser window is reloaded or a navigation occurs, the Service Worker activates and starts serving these parts from the browser cache, which also makes the core experience available offline.
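The install-then-serve flow described above could look roughly like the sketch below. It is not the code from the `offline-mdn` branch: the cache name, the asset list and the helper names `precache` and `cacheFirst` are assumptions made for illustration, and `fetchFn` is injected only so the logic can be exercised outside a browser.

```javascript
// Hypothetical cache name and asset list; the real list would be
// generated on the server and is not part of these notes.
const CORE_CACHE = "mdn-core-v1";
const CORE_ASSETS = ["/", "/media/css/main.css", "/media/js/main.js"];

// Install step: fetch every core asset and store it in the given cache.
async function precache(cache, urls, fetchFn) {
  for (const url of urls) {
    const response = await fetchFn(url);
    if (!response.ok) throw new Error("Failed to precache " + url);
    await cache.put(url, response);
  }
}

// Fetch step: answer from the cache when possible, else use the network.
async function cacheFirst(cache, request, fetchFn) {
  const cached = await cache.match(request);
  return cached || fetchFn(request);
}

// Wiring inside a real Service Worker (skipped in any other environment).
if (typeof ServiceWorkerGlobalScope !== "undefined" &&
    self instanceof ServiceWorkerGlobalScope) {
  self.addEventListener("install", (event) => {
    event.waitUntil(
      caches.open(CORE_CACHE).then((cache) => precache(cache, CORE_ASSETS, fetch))
    );
  });
  self.addEventListener("fetch", (event) => {
    event.respondWith(
      caches.open(CORE_CACHE).then((cache) => cacheFirst(cache, event.request, fetch))
    );
  });
}
```

Using `fetch()` plus `cache.put()` instead of `cache.addAll()` keeps the sketch independent of a polyfill in browsers that lack `addAll()`.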

On-demand cacheable MDN segments

After the Service Worker has activated, the second level of caching becomes available: by placing a "Save this section offline" button on pages, users could cache whole sections of MDN (JavaScript documentation, DOM, Web APIs etc.). The pages of a saved section would then be downloaded and kept up to date by the Service Worker and made available offline. Later, a management interface could be created for more fine-grained control & selection of cached offline content.

MDN Service Worker implementation

Generate a Service Worker script (currently: main.sw.js)
Service Worker script must include and maintain a list of static assets/pages for the core caching functionality
Include the Service Worker script in the document, use feature detection to install the SW and enable offline functionality (currently: save-for-offline.js)
Once the SW has activated, show a button "Make this section available offline"
When the above button is clicked, we request the list of pages and assets contained in the section the currently open page belongs to
The above request should be served by an API (currently hardcoded)
When the list of URLs has been fetched from the API, the Service Worker caches them and maintains the cache afterwards (dynamics for maintaining the cache, such as versioning etc. TBD)
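The page-side part of the steps above might be sketched as follows. This is not the actual `save-for-offline.js`: the function names, the `{ scope: "/" }` option, the message shape and the `listUrls` callback (standing in for the currently hardcoded URL list API) are all assumptions.

```javascript
// Registers the Service Worker if the browser supports it.
// `nav` is the navigator object; passing it in keeps the function testable.
async function installServiceWorker(nav, scriptUrl) {
  // Feature detection: browsers without Service Workers simply skip this.
  if (!nav || !("serviceWorker" in nav)) return null;
  await nav.serviceWorker.register(scriptUrl, { scope: "/" });
  // `ready` resolves with the registration once a worker is active.
  return nav.serviceWorker.ready;
}

// Called when the "Make this section available offline" button is clicked.
// `listUrls(section)` is a hypothetical stand-in for the URL list API.
async function cacheSection(registration, section, listUrls) {
  const urls = await listUrls(section);
  // Hand the list over to the active Service Worker, which does the caching.
  registration.active.postMessage({ type: "cache-section", section, urls });
  return urls.length;
}
```

In a page this could be used as `installServiceWorker(navigator, "/media/js/main.sw.js")`, showing the button only when the returned registration is not `null`.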

In the long run

Service Workers are capable of much more than just static caching. Once caching of pages is possible, a natural next step would be to make editing pages available offline. Service Workers could also save dynamic requests (POSTs) while offline and replay them once the browser is back online. By building an infrastructure that supports this (we need client-side generation of previews, as well as a way to handle conflicts when replaying edits on content that changed during extensive offline editing etc.), offline MDN editing could be implemented.
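The save-and-replay idea can be illustrated with a small in-memory queue. This is only a sketch under stated assumptions: `ReplayQueue` and `sendFn` are hypothetical names, a real implementation would have to persist the queue (for example in IndexedDB), and conflict handling is left out entirely.

```javascript
class ReplayQueue {
  // `sendFn(request)` performs the actual request and rejects when offline.
  constructor(sendFn) {
    this.sendFn = sendFn;
    this.pending = [];
  }

  // Try to send right away; on failure keep the request for later.
  async submit(request) {
    try {
      return await this.sendFn(request);
    } catch (err) {
      this.pending.push(request);
      return null;
    }
  }

  // Replay queued requests in their original order.
  // Stops at the first failure so the ordering of edits is preserved.
  async replay() {
    let sent = 0;
    while (this.pending.length > 0) {
      try {
        await this.sendFn(this.pending[0]);
      } catch (err) {
        break;
      }
      this.pending.shift();
      sent++;
    }
    return sent;
  }
}
```

Stopping at the first failed replay is a deliberate choice here: later edits may depend on earlier ones, so they are not sent out of order.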

For this to work we need to

Generate a (possibly multi-level) tree of "MDN segments"
Collect URLs of pages that belong to those segments, serve these up as an API
Extract resource URLs from pages so they themselves (e.g. images) could be cached
Implement versioning for segments, so Service Workers could keep the offline caches up-to-date
Define guidelines for external/dynamic content caching & replacement (embedded videos, iframes, jsfiddles/jsbins etc)
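As one possible reading of the versioning item above, the Service Worker could compare the segment versions it has cached against a version list served by the API and refresh only what changed. The function name and the shape of the version maps are assumptions; no such API exists yet.

```javascript
// Returns the names of cached segments whose version on the server
// differs from the cached one. Both arguments map segment name -> version.
function staleSegments(cachedVersions, serverVersions) {
  return Object.keys(cachedVersions).filter(
    (name) =>
      name in serverVersions && serverVersions[name] !== cachedVersions[name]
  );
}

// Example: only "JavaScript" changed, and "CSS" was never cached.
const stale = staleSegments(
  { JavaScript: 3, DOM: 7 },
  { JavaScript: 4, DOM: 7, CSS: 1 }
);
console.log(stale); // ["JavaScript"]
```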

Plans for HackOnMDN 2015 weekend

Prototype a proof-of-concept page, either live on a staging area or just a static demo
Demonstrate the proof-of-concept (possibly both on desktop AND mobile)

Ongoing work

There is an old bug for this: bug 665750

Current work is tracked at: https://github.com/flaki/kuma/tree/offline-mdn

Current Status: The above branch should be a working proof-of-concept in the latest Chrome, when started with the `--ignore-certificate-errors` command line parameter.

Set up an MDN development environment via Vagrant
With the branch `offline-mdn` checked out and a few demo pages created (list is here), the Service Worker should install and cache static assets
Reload the page and a caching button should show up
Click the caching button, you should see in the log that your sections are cached
Halt the vagrant virtual machine
Reload the page - it should load from cache.

Note: for this early demo you may have to update the timestamps in the static url list for your `main.sw.js` for caching to work properly.

Once preliminary API work is done, a preview of the functionality should be hidden behind a waffle-flag and deployed on the staging server. Due to the staging server having a valid SSL certificate, above command line flags would be unnecessary, and testing would be available on all standard Chrome installs (desktop & mobile). For expected Firefox & FirefoxOS support see notes below.

Implementation timeline/proposed stages

First stage: basic functionality

First and foremost, get Service Workers up & running. Have a basic service worker script that caches core page assets on install, plus all visited pages. This should result in performance improvements and basic offline capabilities.

Main goal: experiment with Service Workers, gather data, wait for the API & implementations to stabilize.

TODO:

Generalize current proof-of-concept service worker code.
Generate the service worker code on the server (include proper timestamps/cachebusting query parameters in the asset URL list).
Crawl static assets' CSS files for referenced assets and include their URLs in the URL list.
Crawl current page for referenced assets and cache both the page & its linked assets (this could happen on the client side in JS)
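Crawling a stylesheet for referenced assets could be approximated with a regular expression over `url(...)` references, as in the sketch below. It is not a full CSS parser (escaped characters and `@import "file.css"` without `url()` are not handled), and the function name and the example stylesheet path are assumptions.

```javascript
// Collects the absolute URLs of assets referenced via url(...) in CSS text.
// Relative references are resolved against the stylesheet's own URL.
function extractCssAssetUrls(cssText, stylesheetUrl) {
  const pattern = /url\(\s*(['"]?)([^'")]+)\1\s*\)/g;
  const found = new Set();
  let match;
  while ((match = pattern.exec(cssText)) !== null) {
    const ref = match[2].trim();
    if (ref.startsWith("data:")) continue; // inline data, nothing to fetch
    found.add(new URL(ref, stylesheetUrl).href);
  }
  return [...found];
}

const css = 'a { background: url("img/a.png") } b { background: url(../img/b.png) }';
console.log(extractCssAssetUrls(css, "https://developer-local.allizom.org/media/css/main.css"));
```

The resulting list could then be merged into the static URL list that the Service Worker caches on install.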

Second stage: on-demand caching

After basic implementation, implement on-demand caching: user could choose any number of "MDN segments" and cache the pages/assets for those segments.

Main goal: real useful offline capability for MDN.

TODO:

Figure out how to split MDN content into "segments"
- tags? (simple, linear taxonomy)
- path? (multi-level, treelike structure)
- other?
Add a button to cache the "whole current segment" for the currently visited page (e.g. the JavaScript reference segment on the console.log() page)
Add an API to query page/asset URL list for the current segment (this is to avoid needlessly including the list on all the pages)
When the cache button is clicked, load the segment url-list asynchronously and pass it to the Service Worker
The Service Worker caches the URLs of the segment.
- Note segment cache "versions" - keep track of changes to pages contained in a cacheable segment
- Figure out update mechanism for cached segments (automatic? manual (update button)?)
- Optionally implement an interface for managing cached segments (i.e. a checkboxed list of segments, to download and cache various segments of MDN at once)
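If segments were split by path (one of the options listed above), the segment of a page could be derived from its URL. The sketch below assumes the `/<locale>/docs/<slug>` layout seen in MDN URLs and treats the first `depth` slug parts as the segment; both the function name and the default depth are assumptions, not a decision made by the project.

```javascript
// Derives a path-based segment name from an MDN page path.
// Returns null for paths that are not documentation pages.
function segmentForPath(pathname, depth = 2) {
  const parts = pathname.split("/").filter(Boolean);
  const docsIndex = parts.indexOf("docs");
  if (docsIndex === -1) return null;
  const slug = parts.slice(docsIndex + 1);
  return slug.length > 0 ? slug.slice(0, depth).join("/") : null;
}

console.log(segmentForPath("/en-US/docs/Web/API/ServiceWorker_API/Using_Service_Workers"));
// "Web/API"
```

A larger `depth` yields finer-grained segments, which maps naturally onto the multi-level, treelike structure mentioned for the path option.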

Third stage: advanced functionality

Offline editing, offline search and other functionality: anything that would be "nice-to-have" and that Service Workers could help accomplish.

Main goal: push Service Workers to the limit - have useful functionality that is also making good use of the power of the SW tech.

Possibly implement:

Offline editing
- Make offline editing possible, cache assets needed for the editor interface.
- Make previewing possible (I am not familiar with Kumascript, but this could require reimplementing functionality in JS)
- Store edits on the client side, replay them once connectivity is restored (+figure out how to deal with conflicts)
Offline search
- Service Worker could take over search functionality, generating results by crawling documents in the cache
Any other useful feature (ideas welcome)
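Offline search over cached documents could start out as a plain text scan, as sketched below. The function name and the `{ url, text }` document shape are assumptions; in a Service Worker the texts would have to be read from the cached responses first, which is omitted here.

```javascript
// Returns the documents that contain every query term, best match first.
// The score is the total number of term occurrences in the document text.
function searchDocuments(documents, query) {
  const terms = query.toLowerCase().split(/\s+/).filter(Boolean);
  if (terms.length === 0) return [];
  return documents
    .map((doc) => {
      const text = doc.text.toLowerCase();
      let score = 0;
      for (const term of terms) {
        const count = text.split(term).length - 1;
        if (count === 0) return null; // every term must appear
        score += count;
      }
      return { url: doc.url, score };
    })
    .filter(Boolean)
    .sort((a, b) => b.score - a.score);
}
```

Scanning every cached page on each query will not scale to large segments; a prebuilt index would be the obvious next step.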

Notes, limitations, experiences from the weekend

Below are the experiences of the HackOnMDN Service Worker work, portraying some of the obstacles faced during the implementation and explaining the solutions (if any) to these problems.

A Service Worker script must either be served from a URL at or above the top-level (root) path it wants to control, or be served with the special "Service-Worker-Allowed" header, which widens the scope it is allowed to control - https://github.com/slightlyoff/ServiceWorker/issues/468#issuecomment-60276779
- SOLUTION: use the special header method, add the special header to all files with the .sw.js extension in /media/js using a custom .htaccess directive.
- NOTE: apparently this is still unimplemented at least in Firefox: bug 1130101
- NOTE: Fixed in Chrome as of M-42 https://code.google.com/p/chromium/issues/detail?id=436747
- TEST: check using `curl -k -s -D - https://developer-local.allizom.org/media/js/main.sw.js -o /dev/null` for the correct Service-Worker-Allowed header, test in Chrome-dev M-42+
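The scope restriction behind this can be illustrated with a small check. It is a simplified sketch of the rule, not browser code: by default the widest allowed scope is the directory of the script, and a Service-Worker-Allowed value replaces that limit. The function name is an assumption.

```javascript
// Decides whether a script at `scriptUrl` may control `scopeUrl`.
// `serviceWorkerAllowed` is the value of the Service-Worker-Allowed
// response header, or undefined when the header is absent.
function isScopeAllowed(scriptUrl, scopeUrl, serviceWorkerAllowed) {
  const script = new URL(scriptUrl);
  const scope = new URL(scopeUrl, script);
  // Without the header, the limit is the directory containing the script.
  const maxScope = serviceWorkerAllowed
    ? new URL(serviceWorkerAllowed, script)
    : new URL("./", script);
  return (
    scope.origin === maxScope.origin &&
    scope.pathname.startsWith(maxScope.pathname)
  );
}

const script = "https://developer-local.allizom.org/media/js/main.sw.js";
console.log(isScopeAllowed(script, "/"));       // false
console.log(isScopeAllowed(script, "/", "/"));  // true
```

This is why a script living in /media/js cannot control the whole site unless the server sends the header.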

Service Workers require HTTPS connections - but the self-signed certificate for the local vagrant/virtualbox based setup fails the security check in Chrome which in turn makes testing on a local dev setup cumbersome/impossible - https://github.com/slightlyoff/ServiceWorker/issues/274
- SOLUTION: restrict development to local machine, use certificates in /puppet/files/etc/apache2/ssl to install a local CA certificate on the machine, use the staging server to test on mobile (i.e. Flames flashed with MC/Nightly & bug 1125961#c35 set)
- Firefox apparently also has a setting (dom.serviceWorkers.testing.enabled -> true in about:config) to disable HTTPS security checks for development purposes - https://developer.mozilla.org/en-US/docs/Web/API/ServiceWorker_API/Using_Service_Workers#Browser_support - requested info on generalizing this developer support in https://github.com/slightlyoff/ServiceWorker/issues/658#issuecomment-87283965
- On Chrome you could start Chrome with the --ignore-certificate-errors command-line parameter which will disable SSL certificate checks for the session.
Firefox Nightly (as of 03-28) seems to fail to register the SW - https://jakearchibald.github.io/trained-to-thrill/ works in Chrome as intended, while navigator.serviceWorker.controller returns null on Nightly even after install. Use the Maple builds instead: http://blog.wanderview.com/sw-builds/ (download/install & use `firefox -P -no-remote` to create a new profile and run it next to standard Firefox)
- Later Nightly builds (04-10 and later) seem to work, mostly removing the need for SW-specific builds.

NOTE that Chrome (as of V43.0.2342.2 dev (64-bit)) does not support the add/addAll methods out-of-the-box on opened cache objects - you will need a polyfill (https://github.com/coonsta/cache-polyfill) to use them. Chrome bug for native addAll() support in blink-dev: https://code.google.com/p/chromium/issues/detail?id=440298

NOTE Chrome's Service Worker communication samples (https://github.com/GoogleChrome/samples/tree/gh-pages/service-worker/post-message) recommend using the MessagePort API for passing messages between the SW/page.
- ~~Firefox does not really implement the API bug 952139 - further info is required on implementation status or on how could this be overcome.~~
  MessagePort API has landed in Firefox 41
- At the time of writing, even Chrome does not implement the latest spec in this regard. More info on this on GitHub and in the linked StackOverflow post.
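Message passing over a MessageChannel, as recommended by the samples mentioned above, could be sketched like this. The helper name `sendToWorker` and the reply shape are assumptions; given the implementation gaps noted above, it should be tested in each target browser.

```javascript
// Page side: send a message to the worker and wait for a single reply.
function sendToWorker(worker, message) {
  return new Promise((resolve) => {
    const channel = new MessageChannel();
    channel.port1.onmessage = (event) => {
      channel.port1.close(); // one reply is enough for this sketch
      resolve(event.data);
    };
    // port2 is transferred to the worker, which replies on it.
    worker.postMessage(message, [channel.port2]);
  });
}

// Worker side (only runs inside a real Service Worker).
if (typeof ServiceWorkerGlobalScope !== "undefined" &&
    self instanceof ServiceWorkerGlobalScope) {
  self.addEventListener("message", (event) => {
    // Reply on the transferred port, if the page provided one.
    if (event.ports && event.ports[0]) {
      event.ports[0].postMessage({ received: event.data && event.data.type });
    }
  });
}
```

On the page, `worker` would typically be `navigator.serviceWorker.controller`.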

Docs changes: in https://developer.mozilla.org/en-US/docs/Web/API/ServiceWorker_API/Using_Service_Workers#The_premise_of_Service_Workers
- Missing: https://developer.mozilla.org/en-US/docs/Web/API/ServiceWorker/register - not even mentioned on the https://developer.mozilla.org/en-US/docs/Web/API/ServiceWorker page

ServiceWorkers - Developer QuickStart Reference

Articles on Service Workers

Advanced topics

Debugging Service Workers
- In Chrome:
  - Chrome Service Worker FAQ
  - Use the --ignore-certificate-errors command line parameter to disable HTTPS cert. checks
  - Ongoing discussion on lifting HTTPS requirement for developers. Voiced our concerns in comment #19.
- In Firefox:
  - Worker debugging is coming soon: bug 1003097
  - Use about:config → devtools.serviceWorkers.testing.enabled=true to skip HTTPS cert. checks

Track browser implementations:

Service Workers in Firefox OS (GAIA rearchitecture/3.0)
