
CloudServices/Sagrada/Metlog

1,226 bytes added, 22:37, 27 September 2011
= Overview =
The '''Metrics''' project is part of [[Services/Sagrada|Project Sagrada]], providing a service for applications to capture and inject arbitrary data into a back end storage system suitable for out-of-band analytics and processing.
= Project =
== User Requirements ==
=== Phase 1 ===
The first version of the Metrics system will focus on providing an easy mechanism for the [[Services/Sync| Sync]] and [https://browserid.org/ BrowserID] projects (and any other internal Mozilla services) to efficiently send profiling data, and any other arbitrary metrics information that may be desired, into one or more back end storage locations. Once the data has reached its final destination, those with appropriate access should be able to run analytics queries and generate reports on the accumulated data.
Requirements:
* Service apps will be provided with a simple mechanism for inserting arbitrary data points into the metrics system.
* All inserted data will be processed transparently (from the service app's point of view) and passed into the appropriate back end storage and analytics destination.
* Service app owners have access to an interface (or interfaces) where they can access a predefined set of reports and/or graphs & charts displaying useful information about certain predefined captured data points (e.g. unique daily users).
* Service app owners have access to an interface (or interfaces) where they can perform arbitrary queries against the data points that have been captured.
 
 
=== Phase 2 ===
 
The second phase will focus on improving the back end reporting infrastructure. Once data has started flowing and there is an opportunity to assess which reports and graphs would be most generally useful, the goal will be to make it as easy as possible for existing and new service app owners to get to their information.
 
Requirements:
 
* Service app owners have access to an interface (or interfaces) where they can access a set of predefined reports and/or graphs & charts displaying useful information based on captured data points (e.g. unique daily users, average time elapsed handling requests).
 
The full Services Metrics infrastructure will consist of two APIs. The first will be a mechanism for sending performance- and ops-related data, limited to counter increments and timers (i.e. the time elapsed to complete a given operation), into a [https://github.com/etsy/statsd statsd] setup, which will ultimately feed into a [http://graphite.wikidot.com/ graphite] installation. The second will provide a way to capture arbitrary text data, analogous to syslog-style log entries, with each record accompanied by a set of string tokens that identify the type of payload the record contains, along with any other metadata that may be useful for analytics and/or processing.
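For reference, statsd clients speak a small plain-text protocol over UDP: a counter increment is sent as `name:value|c` and a timer as `name:milliseconds|ms`. The sketch below hand-rolls those two packet types with only the Python standard library, purely to illustrate what a client such as pystatsd or node-statsd puts on the wire. The metric names and the `127.0.0.1:8125` address (statsd's conventional default port) are assumptions for illustration, not settings taken from the Services deployment.

```python
import socket

# Assumed address: statsd conventionally listens on UDP port 8125.
STATSD_ADDR = ("127.0.0.1", 8125)


def format_counter(name: str, value: int = 1) -> str:
    """Build a statsd counter packet, e.g. 'sync.login.success:1|c'."""
    return f"{name}:{value}|c"


def format_timer(name: str, millis: float) -> str:
    """Build a statsd timer packet, e.g. 'sync.request.time:320|ms'."""
    return f"{name}:{millis:g}|ms"


def send_stat(packet: str, addr=STATSD_ADDR) -> None:
    """Send one packet over UDP; statsd never replies, so nothing is awaited."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(packet.encode("utf-8"), addr)
```

A call such as `send_stat(format_timer("sync.request.time", 320))` (hypothetical metric name) would report a 320 ms request. Because UDP is connectionless, the sender pays only the cost of one `sendto` per stat.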
The statsd API will be provided by existing statsd client libraries: [https://github.com/jsocol/pystatsd pystatsd] for the Python services and [https://github.com/sivy/node-statsd node-statsd] for node.js-based services. The core Python service app platform that Services provides will already contain pystatsd calls that capture basic information such as successful login counts, total time elapsed handling HTTP requests, etc. Including a 'statsd = false' setting in the app configuration will prevent this data from being collected.
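One way the 'statsd = false' switch could be honored is sketched below: read the setting from an INI-style app config and, when collection is disabled, hand the application a stand-in client whose methods do nothing. The `[app]` section name, the `NullStatsClient` class and the `make_stats_client` helper are hypothetical; only the 'statsd = false' setting itself comes from the plan above.

```python
import configparser


class NullStatsClient:
    """Hypothetical stand-in that accepts counter/timer calls and records nothing."""

    def incr(self, stat: str, count: int = 1) -> None:
        pass  # collection disabled: drop the increment

    def timing(self, stat: str, millis: float) -> None:
        pass  # collection disabled: drop the timing


def statsd_enabled(config_text: str, section: str = "app") -> bool:
    """Return False only when the config explicitly sets 'statsd = false'."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    # getboolean understands true/false, yes/no, on/off and 1/0; a missing
    # section or option falls back to True, so stats are collected by default.
    return parser.getboolean(section, "statsd", fallback=True)


def make_stats_client(config_text: str, real_client_factory):
    """Build the real client only when stats collection is enabled."""
    if statsd_enabled(config_text):
        return real_client_factory()
    return NullStatsClient()
```

Returning a do-nothing object keeps the instrumentation calls in the platform code unconditional; only the construction point needs to know about the setting.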
The API for general metrics data collection will be minimal. For Python apps we will provide a `metlog` library exposing the following functions:
Sends a single log message to the previously specified metlog listener.
''tokens'' should be a sequence (e.g. a tuple) of string tokens containing any metadata
required to identify and classify the message, while ''msg'' should contain
the main data payload. The message will be serialized into a simple format and
sent via UDP to the listener in a "fire and forget" style, for minimal performance impact on the calling application.
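The following is a minimal sketch of the behavior described above, assuming a hypothetical `metlog_send(tokens, msg, addr)` function, JSON as the "simple format", and an arbitrary listener address; none of these names come from the actual `metlog` library. It serializes the tokens and payload and sends them in a single UDP datagram, swallowing socket errors so that a missing listener never affects the caller.

```python
import json
import socket
from typing import Sequence

# Hypothetical listener address, for illustration only.
METLOG_ADDR = ("127.0.0.1", 5565)


def serialize(tokens: Sequence[str], msg: str) -> bytes:
    """Encode tokens and payload; JSON is an assumed stand-in for the 'simple format'."""
    if isinstance(tokens, str):
        # A bare string is also a sequence; reject it so 'login' isn't split into characters.
        raise TypeError("tokens must be a sequence of strings, not a single string")
    return json.dumps({"tokens": list(tokens), "msg": msg}).encode("utf-8")


def metlog_send(tokens: Sequence[str], msg: str, addr=METLOG_ADDR) -> None:
    """Fire and forget: send one UDP datagram and never wait for a reply."""
    payload = serialize(tokens, msg)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        try:
            sock.sendto(payload, addr)
        except OSError:
            # Metrics must never break the calling app, so delivery errors are dropped.
            pass
```

One consequence of this design is that a payload too large for a single datagram is silently dropped as well, which is the usual trade-off of fire-and-forget UDP logging.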
A similar library can be constructed in JavaScript for use in node.js applications.
The first iteration of this solution will not require a great deal of engineering, as it will leverage a lot of infrastructure that is already in place. The statsd client will be configured to talk to the statsd service that is already planned to run on every Services host. The stats gathered from the various hosts will then be forwarded to the Services Ops graphite installation, which will aggregate related stats and graph them for developer consumption.
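Between flushes, a statsd daemon accumulates the packets it receives and periodically forwards aggregated values to graphite, whose plaintext protocol takes one `path value timestamp` line per data point. The sketch below shows that aggregation step in simplified form: it sums counters and reports the mean and maximum of each timer. The metric paths are only loosely modeled on statsd's defaults, and sample rates, per-second rates and percentiles, which real statsd handles, are deliberately left out.

```python
from collections import defaultdict
from typing import Iterable, List


def aggregate(packets: Iterable[str], timestamp: int) -> List[str]:
    """Fold statsd packets into graphite plaintext lines ('path value timestamp')."""
    counters = defaultdict(int)
    timers = defaultdict(list)
    for packet in packets:
        name, sep, rest = packet.partition(":")
        fields = rest.split("|")
        if not sep or len(fields) < 2:
            continue  # skip malformed packets
        value, kind = fields[0], fields[1]  # any '@rate' field is ignored here
        if kind == "c":
            counters[name] += int(value)
        elif kind == "ms":
            timers[name].append(float(value))
        # other metric types are ignored in this simplified sketch
    lines = []
    for name in sorted(counters):
        lines.append(f"stats_counts.{name} {counters[name]} {timestamp}")
    for name in sorted(timers):
        values = timers[name]
        mean = sum(values) / len(values)
        lines.append(f"stats.timers.{name}.mean {mean:g} {timestamp}")
        lines.append(f"stats.timers.{name}.upper {max(values):g} {timestamp}")
    return lines
```

Each resulting line would then be written, newline-terminated, to graphite's plaintext listener (TCP port 2003 by default), which is where stats of the same name from different hosts end up side by side.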
The metlog portion will similarly make use of existing infrastructure. Services Ops will have instances of the [http://logstash.net/ logstash] service in place to process the log output from our various processes. We will write a UDP listener input module for logstash that will act as the metlog listener. Logstash will then batch these messages and construct HTTP requests that deliver collections of messages to a [https://github.com/mozilla-metrics/bagheera Bagheera] instance provided to us by the Metrics team. The messages will ultimately end up in a Hadoop data store, and a [https://hive.apache.org/ Hive] interface will be available for constructing arbitrary queries against any of the data that has landed.
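The batching step of the planned logstash listener can be illustrated with a small buffer that collects serialized messages and hands each full batch to a posting function. Logstash plugins are actually written in Ruby, so this Python `Batcher` class is a hypothetical stand-in for the real input module, and its injected `post` callable stands in for whatever HTTP request the module would make to Bagheera.

```python
from typing import Callable, List


class Batcher:
    """Hypothetical buffer that groups messages into fixed-size batches."""

    def __init__(self, post: Callable[[List[bytes]], None], max_batch: int = 100):
        if max_batch < 1:
            raise ValueError("max_batch must be at least 1")
        self.post = post            # e.g. a function that POSTs one batch over HTTP
        self.max_batch = max_batch
        self.pending: List[bytes] = []

    def add(self, message: bytes) -> None:
        """Queue one message received from UDP; post as soon as a batch is full."""
        self.pending.append(message)
        if len(self.pending) >= self.max_batch:
            self.flush()

    def flush(self) -> None:
        """Hand off whatever is queued; does nothing when the buffer is empty."""
        if self.pending:
            batch, self.pending = self.pending, []
            self.post(batch)
```

A production listener would presumably also flush on a timer, so that a partially filled batch does not wait indefinitely during quiet periods.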
 
== Use Cases ==
=== BrowserID ===
 
The BrowserID team has started specifying its metrics-gathering requirements, described in some detail in [https://bugzilla.mozilla.org/show_bug.cgi?id=679139 Bug 679139]. The discussion on that bug focuses primarily on specific information that can be extracted from captured log files. While the logs do contain useful information, it is already evident that some of it will have to be inferred, and that certain information will need to be explicitly excluded from processing to ensure sufficient levels of user privacy. The ability to capture and store arbitrary data points from within the code itself will simplify collecting certain data, and statsd timers will provide application performance metrics that could not be obtained from log files alone.