Services/NodeAssignment

From MozillaWiki

Node Assignment

Goal

The Node Assignment product provides a central server that allocates users of Mozilla Services products to a node associated with that product. It is a standalone service, though it is of minimal use outside of the company or when there are no other products to support.

There is generally a 1-to-1 mapping between a node entry and an HTTP entry point. Some node entries are explicit; others may carry additional details that are hidden from the user but stored locally in the user's account and can be retrieved during authentication.

As a secondary effect, the node assignment service serves as the central coordinating point for downing nodes or backing users off from nodes in the system. The data is kept centrally, but each project is responsible for gathering the data for its nodes from the internal API and propagating it so that its servers behave correctly.


APIs

This service incorporates two APIs:

Internal

An internal API for node management: addition of nodes, adjustment of weighting factors, and downing and backing off nodes.

There is no authentication attached to the internal API. It is assumed to be accessible only locally, subject to any protections provided by the server.

All API responses return JSON.

GET

/{product} - Returns a list of all clusters in the product.
/{product}/{cluster} - Returns a hash keyed by the nodes in the cluster. Values are hashes of the data for each node.
/{product}/{cluster}/{node} - Returns the hash of the data for a single node.

PUT

/{product}/{cluster}/{node}/{key}
/{product}/{cluster}/{key}
/{product}/{key}

Sets the {key} field to the value specified in the PUT body. The scope of this change depends on whether {cluster} (and {node}) are specified in the URL.

Valid keys are: weight, current_in_period, down, backoff. An attempt to set any other key will get a 400 error; other fields should be set using a script or directly in the DB.

A successful PUT will receive a 0 as the response.
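As a rough illustration of the scoping rule above, the following sketch builds the PUT path for a key at product, cluster, or node scope. The function name `build_put_path` and the product, cluster, and node names in the usage note are hypothetical; only the three URL shapes and the list of valid keys are taken from the description above.

```python
# Keys the internal API accepts via PUT, per the list above.
VALID_KEYS = {"weight", "current_in_period", "down", "backoff"}


def build_put_path(product, key, cluster=None, node=None):
    """Return the internal-API PUT path for `key` at the requested scope.

    Hypothetical helper: the scope narrows from product to cluster to node
    depending on which optional arguments are supplied.
    """
    if key not in VALID_KEYS:
        # The service itself would answer such a request with a 400.
        raise ValueError("invalid key: %s" % key)
    if node is not None and cluster is None:
        raise ValueError("a node can only be addressed within a cluster")
    parts = [product]
    if cluster is not None:
        parts.append(cluster)
    if node is not None:
        parts.append(node)
    parts.append(key)
    return "/" + "/".join(parts)
```

For example, `build_put_path("sync", "down", cluster="phx", node="node1")` yields `/sync/phx/node1/down`, while `build_put_path("sync", "backoff")` yields `/sync/backoff` and would apply to the whole product.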

External

The external API has one function call, and is password-protected by central auth:

/<version>/<username>/<product>

(It is assumed that the DNS entry point will be used solely for this service. If that turns out not to be the case, a /prefix will be needed at the beginning of the path.)

This API call will return one of the following:

503: internal error
401: <username> fails to auth
404: <product> does not exist (or the URL does not exist at all). Clients should generally interpret this to mean that they should use the same server to which this query was issued.
200 ('null'): no node for the product is available for assignment
200 (other text): the name of the node, including protocol, that has been assigned to the user
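A client might handle these responses along the following lines. This is only a sketch: the function name `interpret_response`, its parameters, and the choice of exceptions are assumptions, and it assumes the "no node available" body is the literal text null.

```python
def interpret_response(status, body, queried_server):
    """Map an external-API response to the node the client should use.

    Returns a node URL string, or None when no node can be assigned.
    Hypothetical helper; `queried_server` is the server the request was sent to.
    """
    if status == 200:
        text = body.strip()
        # A body of 'null' means no node is currently available.
        return None if text == "null" else text
    if status == 404:
        # Unknown product or URL: keep using the server that was queried.
        return queried_server
    if status == 401:
        raise PermissionError("authentication failed")
    if status == 503:
        raise RuntimeError("node assignment service reported an internal error")
    raise RuntimeError("unexpected status: %d" % status)
```

Treating 404 as "stay on the current server" follows the note in the list above; how a client retries after a 503 is left open here.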

Design

The user's nodes will be stored in the authentication LDAP, in an array under the primaryNode attribute. The storage format will be node<internal information. Internal information consists of arbitrary data, as defined by the product.

If the user already has an assignment for the product, authenticating the user is sufficient, since the data returned by authentication is enough to fulfill the response.

If there is no record for the product in the primaryNode field, the application will request an assignment from the assignment table. This is a MySQL table; the lookup evaluates the available nodes for the product and produces the one with the best current availability, returning the data to be written into LDAP.

DB Fields

Product - Name of the associated service
Cluster - Cluster for the service. This will generally correspond to the colo, but we may end up with multiple clusters in the same colo.
Node - The specific node in the cluster that a user is assigned to
LDAP - The value to be written into the LDAP if this node is selected
Available - A count of remaining assignments available in this period. The period is not defined by nodeassignment; the service simply resets this value at whatever interval it wants.
Current Load - Although it is not a direct reflection of the number of users, it will be incremented by 1 for each user added. Services should set their weights according to how much impact they want each additional user to have on the algorithm.
Capacity - The theoretical maximum weight to be associated with the node
Down - A binary value indicating whether a node is down. The node service makes this available via the API, but it is expected that the service itself will be regularly checking the value. Users will not be assigned to a node that has been marked as down.
Backoff - A value in seconds that should be added to responses from services that support backoff. As with the down flag, services are responsible for accessing the data here and pushing it to their machines.

Assignment Weighting

When a request comes in for a new node, the API will choose the available server with the lowest Weight/Capacity ratio, then increase that server's weight by 1.
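The selection rule can be sketched as follows. The dictionary keys, the function name `assign_node`, and the in-memory list standing in for the assignment table are illustrative assumptions, not the real schema. The sketch also assumes that "available" means not marked down and with assignments remaining in the period, and that each assignment uses up one of those remaining slots.

```python
def assign_node(nodes):
    """Pick the least-loaded eligible node and record the new assignment.

    `nodes` is a list of dicts with the keys "node", "weight", "capacity",
    "available" and "down" (illustrative names, not the real schema).
    Returns the chosen node dict, or None if nothing is eligible.
    """
    eligible = [
        n for n in nodes
        if not n["down"] and n["available"] > 0 and n["capacity"] > 0
    ]
    if not eligible:
        return None
    # Lowest weight/capacity ratio wins; ties go to the first node listed.
    chosen = min(eligible, key=lambda n: n["weight"] / n["capacity"])
    chosen["weight"] += 1
    chosen["available"] -= 1  # assumption: each assignment uses one slot
    return chosen
```

With two live nodes at weights 5 and 2 and equal capacity, the node at weight 2 is chosen and its weight becomes 3. A real implementation would perform the read and the increment atomically in the database so that concurrent requests do not pick the same slot.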


What This Isn't

Config Management: We should have central config management for the various node configurations for the various services. The first attempt at a node-assignment service included that, but this version does not. There are several reasons for that:

  • It should be handled by operations, who may not want to do it as a service, and almost certainly don't want it to depend on another service. If a service is needed to make this work, it can be written separately.
  • Not including this functionality here means we don't need to expose our config database to an external service, even if it has layers of indirection.
  • The different services are likely to have a wide variety of config requirements, which would complicate an otherwise straightforward DB.