Services/Sync/WEP/106

From MozillaWiki
THIS PAGE IS A WORKING DRAFT
The page may be difficult to navigate, and some information on its subject might be incomplete and/or evolving rapidly.
If you have any questions or ideas, please add them as a new topic on the discussion page.

WEP 106 - Backoff Specification

  • Champions: Mike Connor
  • Status: Draft
  • Type: ?
  • Created: 2009 Sep 1
  • Reference Implementation: TBD
  • WEP Index

Introduction and Rationale

Intelligent backoff protocols are an important factor in keeping large services reliable when clients continuously attempt to reconnect. We need to design and build a client solution that recognizes when servers are under heavy load and backs off appropriately.

Proposal

  • Error Handling
    • Treat a 503 with a Retry-After header as an explicit instruction to stop syncing and retry after the time given by the server.
    • Treat all other HTTP 5xx errors as a signal to back off immediately.
    • Treat all other errors as network issues: retry once on the normal schedule, then back off.
  • Backoff intervals
    • If we receive a 503 with Retry-After, we will retry after that time plus a random amount of extra delay (fuzzing), so that clients don't bunch up at the end of a service outage.
      • If this is received during a sync, we will halt all subsequent engine syncs.
    • For all other issues, we will follow a progressive series of intervals with a significant amount of randomness, to guard against traffic spikes (as in the existing implementation).
  • X-Weave-Backoff header
    • The header value is an integer number of milliseconds. If the header is present, we will delay the next sync for at least that long. This lets the server keep serving users under high load while still asking clients to back off.
  • UI presentation during normal sync
    • Initial backoff phase (first few attempts)
      • Show a friendly notification that makes clear we will retry automatically and that the problem is probably temporary.
      • Except in the 503+Retry-After case, users may retry manually _once_; if that sync fails due to server issues, the manual sync UI will be disabled.
    • Secondary backoff phase (after three retry attempts)
      • No option to sync manually; at this point the server clearly has major problems.
      • Stronger warning, so users are aware data is not syncing. (Key principle: if we can't propagate the user's data, we should inform them.)
  • about:weave behaviours
    • If we are in backoff, about:weave should say so and show when the next sync will happen.
  • New user backoff
    • Must handle 503+Retry-After in the user-creation code and tell users what is happening.
    • X-Weave-Backoff may be returned during registration; if so, we will not allow the initial sync until that timeout has elapsed.
  • Open questions
    • In high-load situations where the service is simply overloaded rather than dead, we probably want a way to tell the server that we are already in backoff mode (e.g. retryAttempt=2), so that instead of an all-or-nothing backoff, the server can selectively serve the users who are furthest behind. The big question is how to ensure that the next sync attempt is still delayed rather than happening in 5 minutes.
    • What needs to be done on the server for this to work?
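The three Error Handling rules in the proposal amount to a small classification step. The sketch below is one possible shape for it, assuming the client passes in the HTTP status (or None when no response arrived) and the raw Retry-After header; `Action` and `classify_error` are hypothetical names, not part of any existing Weave code.

```python
from enum import Enum
from typing import Optional


class Action(Enum):
    """Hypothetical names for the three proposed handling rules."""
    STOP_UNTIL_RETRY_AFTER = "stop_until_retry_after"    # 503 + Retry-After
    BACKOFF_NOW = "backoff_now"                          # any other 5xx
    RETRY_ONCE_THEN_BACKOFF = "retry_once_then_backoff"  # network/other errors


def classify_error(status: Optional[int], retry_after: Optional[str]) -> Action:
    """Map a failed request onto one of the proposed handling rules.

    `status` is None when no HTTP response was received (e.g. a network error).
    `retry_after` is the raw Retry-After header value, or None if absent.
    """
    if status == 503 and retry_after is not None:
        # The server told us explicitly when to come back.
        return Action.STOP_UNTIL_RETRY_AFTER
    if status is not None and 500 <= status <= 599:
        # Any other server-side failure: back off right away.
        return Action.BACKOFF_NOW
    # Everything else is treated as a network issue, as the proposal states.
    return Action.RETRY_ONCE_THEN_BACKOFF
```

Note that a 503 without Retry-After falls into the generic 5xx rule, which matches the wording "all other HTTP 5xx errors".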
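The Backoff intervals bullets describe two delays: the server's Retry-After plus fuzz, and a progressive, randomized schedule for everything else. The sketch below assumes Retry-After is given in seconds and uses a doubling schedule randomized to 50-100% of the nominal interval; the constants and function names are hypothetical, since the existing implementation's actual values are not given here.

```python
import random

# Hypothetical tuning constants, not taken from the existing implementation.
BASE_INTERVAL_S = 5 * 60        # first backoff step: 5 minutes
MAX_INTERVAL_S = 4 * 60 * 60    # never wait longer than 4 hours between attempts
RETRY_AFTER_FUZZ_S = 15 * 60    # spread clients over up to 15 extra minutes


def retry_after_delay(retry_after_s: int, rng: random.Random) -> float:
    """Delay after a 503 + Retry-After: the server's value plus random fuzz,
    so clients don't all reconnect the moment an outage ends."""
    return retry_after_s + rng.uniform(0, RETRY_AFTER_FUZZ_S)


def progressive_delay(attempt: int, rng: random.Random) -> float:
    """Delay for a given failed attempt (0-based) on a capped doubling schedule,
    randomized to between 50% and 100% of the nominal interval."""
    nominal = min(MAX_INTERVAL_S, BASE_INTERVAL_S * 2 ** attempt)
    return rng.uniform(nominal / 2, nominal)
```

Usage: `progressive_delay(2, random.Random())` yields somewhere between 10 and 20 minutes. Randomizing each step, rather than using fixed intervals, is what keeps clients that failed at the same moment from retrying in lockstep.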
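The X-Weave-Backoff rule only ever pushes the next sync later, never earlier. A minimal sketch, taking the header value as milliseconds as this draft specifies and assuming times are plain seconds; `parse_weave_backoff_ms` and `next_sync_time` are hypothetical helpers.

```python
from typing import Mapping, Optional


def parse_weave_backoff_ms(headers: Mapping[str, str]) -> Optional[int]:
    """Return the X-Weave-Backoff value in milliseconds (per this draft),
    or None if the header is absent or not a non-negative integer."""
    # HTTP header names are case-insensitive, so compare lowercased names.
    for name, value in headers.items():
        if name.lower() == "x-weave-backoff":
            try:
                ms = int(value.strip())
            except ValueError:
                return None
            return ms if ms >= 0 else None
    return None


def next_sync_time(now_s: float, scheduled_s: float, headers: Mapping[str, str]) -> float:
    """Delay the next sync to at least now + backoff, never earlier than already scheduled."""
    backoff_ms = parse_weave_backoff_ms(headers)
    if backoff_ms is None:
        return scheduled_s
    return max(scheduled_s, now_s + backoff_ms / 1000.0)
```

Taking the maximum means a small backoff value never shortens a longer wait the client had already chosen.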
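The UI presentation rules (an initial phase, a secondary phase after three retry attempts, and a single manual retry that is never offered during a 503+Retry-After) can be captured as a small state check. This is a sketch under the assumption that the client tracks the fields below; `BackoffState`, `ui_phase` and `manual_sync_allowed` are hypothetical names.

```python
from dataclasses import dataclass

SECONDARY_PHASE_AFTER = 3  # "Secondary backoff phase (after three retry attempts)"


@dataclass
class BackoffState:
    in_backoff: bool = False          # a sync has failed due to server issues
    retry_attempts: int = 0           # failed retries since entering backoff
    retry_after_active: bool = False  # honoring an explicit 503 + Retry-After
    manual_retry_used: bool = False   # set when the one manual try fails due to server issues


def ui_phase(state: BackoffState) -> str:
    """Return "normal", "initial" or "secondary" for choosing the notification."""
    if not state.in_backoff:
        return "normal"
    if state.retry_attempts >= SECONDARY_PHASE_AFTER:
        return "secondary"
    return "initial"


def manual_sync_allowed(state: BackoffState) -> bool:
    """Whether the manual sync UI should be enabled."""
    if not state.in_backoff:
        return True
    if state.retry_after_active:
        return False  # the server explicitly told us to stop
    if ui_phase(state) == "secondary":
        return False  # major server issues: no manual sync
    return not state.manual_retry_used
```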
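For New user backoff, registration needs two pieces: a message for a 503+Retry-After during account creation, and a gate that holds the initial sync until any X-Weave-Backoff has elapsed. A minimal sketch, assuming both headers have already been parsed (Retry-After in seconds, X-Weave-Backoff in milliseconds as this draft specifies); `RegistrationResult`, the helper names and the message wording are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RegistrationResult:
    status: int
    retry_after_s: Optional[int] = None     # parsed Retry-After, if any
    weave_backoff_ms: Optional[int] = None  # parsed X-Weave-Backoff, if any


def registration_message(result: RegistrationResult) -> Optional[str]:
    """User-facing text for a 503 + Retry-After during account creation."""
    if result.status == 503 and result.retry_after_s is not None:
        minutes = max(1, -(-result.retry_after_s // 60))  # round up to whole minutes
        return f"The Weave server is busy. Please try again in about {minutes} minute(s)."
    return None


def earliest_initial_sync(registered_at_s: float, result: RegistrationResult) -> float:
    """The first sync may not start before the registration backoff has elapsed."""
    if result.weave_backoff_ms is None:
        return registered_at_s
    return registered_at_s + result.weave_backoff_ms / 1000.0
```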

Pre-Requirements

None.

New Opportunities

  • Survive the server melting down during upgrades

Other Proposals