The content of this page is a work in progress intended for review.
Socorro Client API
Currently, Socorro has Python collectors, processors, and middleware that communicate with HBase via the Thrift protocol. A module named hbaseClient.py contains several pieces of the business logic, such as creating HBase row keys from an ooid, putting and retrieving data in HBase, and manipulating queues and metrics counters.
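To make the row-key responsibility concrete, here is a hypothetical sketch of a salted row-key scheme; it illustrates the idea of deriving HBase row keys from an ooid but is not the actual algorithm in hbaseClient.py:

```python
# Hypothetical sketch, not Socorro's real scheme: prefix each ooid with
# its own first character as a salt byte so that sequential ooids are
# spread across HBase regions instead of hot-spotting one region.
def ooid_to_row_key(ooid):
    """Build a salted HBase row key from an ooid."""
    return ooid[0] + ooid

def row_key_to_ooid(row_key):
    """Strip the salt byte to recover the original ooid."""
    return row_key[1:]
```

The round trip is lossless, which matters because the middleware must be able to map row keys back to the public ooid.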
One of the biggest limitations of the current architecture is that it is very sensitive to latency or outages on the HBase side. If the collectors cannot store an item in HBase, they store it on local disk, and it is not accessible to the processors or middleware layer until a cron job has picked it up and successfully stored it in HBase. The same goes for processors: if they cannot retrieve the raw data and store the processed data, then that data will not be available.
This page gives a high level overview of a potential solution to this problem that involves replacing the Thrift communication layer with a higher level API that lives outside of HBase and provides an intermediate queuing and temporary storage layer for crash report data. It also provides separation from the underlying permanent data storage which will allow us to develop multiple implementations such as NFS, single machine local disk, HBase, and ElasticSearch.
Central to this proposal is a project called Hazelcast. This is an In-Memory Data Grid that supports a variety of data collections such as queues and maps. It is written in Java. Hazelcast can run effectively on one machine or dozens of boxes. The data grid allows storage and retrieval of objects that are distributed throughout the memory of all the machines. Maps can be configured with a TTL and a persistence driver that will load or save objects from memory into any arbitrary permanent storage.
The easiest way to allow the Python clients to interact with Hazelcast is to run them inside Jython, a Python interpreter that runs on the JVM. With Jython, the API is nearly transparent: clients simply get a reference to the desired map and call put() or get(). It would also be possible to support a REST API or some other cross-language transport, but the simplicity of running in Jython makes it my preferred approach.
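The client-side call pattern would look like ordinary Python. The sketch below models it with a plain in-process class standing in for a real Hazelcast distributed map; the map contents and the name crash_map are illustrative, not from the source:

```python
# Stand-in for a Hazelcast distributed map, to show the intended client
# call shape. Under Jython, the client would obtain a real distributed
# map from the Hazelcast runtime instead of constructing this class.
class FakeDistributedMap:
    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)

# Illustrative usage: store a crash report object and read it back.
crash_map = FakeDistributedMap()
crash_map.put('some-ooid', {'signature': 'example'})
report = crash_map.get('some-ooid')
```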
Transparent to the clients, when data is to be written, it will be placed in a distributed map that will perform a write-through to HBase. The map will have size and age configuration parameters such that items that have been written to HBase can be expired. If items which haven't been durably stored yet need to be flushed out of the map due to memory constraints, they can be persisted to an alternate storage such as local disk instead.
If an item that is requested from the map is currently stored in it (i.e. not expired), it will be returned immediately. If the item does not exist in the map, Hazelcast will attempt to load it from HBase. This provides both a hot cache for the most recent data and end-to-end accessibility of items even in the face of temporary unavailability of HBase.
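The write-through and read-through behavior described above can be sketched in pure Python; here a plain dict stands in for HBase, and the TTL value is an arbitrary placeholder:

```python
import time

class WriteThroughMap:
    """Sketch of the described map behavior: writes go to memory and are
    written through to a backing store; reads fall back to the store on
    a cache miss. A dict stands in for HBase; the TTL is illustrative."""

    def __init__(self, store, ttl_seconds=3600):
        self.store = store          # stands in for HBase
        self.ttl = ttl_seconds
        self._cache = {}            # key -> (value, stored_at)

    def put(self, key, value):
        self._cache[key] = (value, time.time())
        self.store[key] = value     # write-through to permanent storage

    def get(self, key):
        entry = self._cache.get(key)
        if entry is not None and time.time() - entry[1] < self.ttl:
            return entry[0]         # hot-cache hit
        value = self.store.get(key) # read-through load on miss/expiry
        if value is not None:
            self._cache[key] = (value, time.time())
        return value
```

In the real system, Hazelcast's persistence driver would play the role of the backing store, and eviction of not-yet-durable items to local disk would be layered on top.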
Queue and processing management
When crash reports are stored via Hazelcast, they will be inspected for flags that will indicate any secondary actions to be taken. New legacy or priority crashes will be pushed into the appropriate queues so downstream workers can poll those queues and perform processing or indexing.
High Availability and Redundancy
The distributed nature of the queues and maps means that the in-memory data is stored with a configurable number of replica copies on other nodes in the pool. If one server dies or is inaccessible, the data remains immediately available through the replica copies. Adding more nodes automatically redistributes the existing data and also provides more storage. The pool will have a metrics interface allowing introspection of how much data each part of the pool holds, as well as its health levels.
As a proof of concept, the metrics team will be working on delivering a prototype that meets the following requirements:
- Mock Jython Client - submission application
- This application should be able to either receive via HTTP POST or randomly generate an object with a GUID as the key, containing the attributes below. The object should be stored in a Hazelcast distributed map and its key pushed into a queue for subsequent processing. The object should contain:
- a JSON document with a median size of 1 KB, minimum size of 256 bytes and maximum size of 20 KB. It should look similar to this crash report meta_data JSON File:Example.json.gz
- a binary data blob with a median size of 500 KB; min 200 KB; max 20 MB; 75th percentile 5 MB. Reference dump file: File:Example.binary.dump.gz
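A mock generator meeting roughly these size constraints could look like the sketch below; the JSON field names are placeholders rather than the real meta_data schema, and the blob's upper bound is reduced from the 20 MB maximum to keep the sketch cheap to run:

```python
import json
import os
import random
import uuid

def generate_mock_crash():
    """Return a (guid, meta_json, dump_blob) triple. Field names are
    placeholders; real meta_data would resemble File:Example.json.gz."""
    guid = str(uuid.uuid4())
    # Pad a small JSON document to a target size between 256 bytes
    # and 20 KB, matching the stated min/max.
    target = random.randint(256, 20 * 1024)
    meta = {'ProductName': 'MockProduct', 'padding': ''}
    meta['padding'] = 'x' * max(0, target - len(json.dumps(meta)))
    # Random binary blob of at least 200 KB; the spec allows up to
    # 20 MB, but the upper bound is capped here for a quick sketch.
    dump = os.urandom(random.randint(200 * 1024, 600 * 1024))
    return guid, json.dumps(meta), dump
```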
- Mock Jython Client - processing application
- This application should be able to read the processing queue to get a list of item GUID keys and use those keys to retrieve the JSON and dump data from the map. It will then generate a JSON document with a median size of 50 KB, min of 5 KB, max of 500 KB and store that new attribute in the crash report object File:Example.processed.json.gz.
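The processing loop could be sketched as follows, with a deque and a dict standing in for the Hazelcast queue and map; the attribute names 'processed_json' and 'signature' are placeholders, not the real schema:

```python
import json
from collections import deque

def process_next(queue, crash_map):
    """Pop one GUID from the processing queue, load its crash object
    from the map, attach a mock processed-JSON attribute, and store it
    back. Returns the GUID processed, or None if the queue is empty."""
    if not queue:
        return None
    guid = queue.popleft()
    crash = crash_map[guid]
    # Placeholder for the generated 5 KB - 500 KB processed document.
    crash['processed_json'] = json.dumps({'guid': guid,
                                          'signature': 'mock-signature'})
    crash_map[guid] = crash
    return guid
```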
- Mock Jython Client - simple web application
- This application should be able to be passed a GUID and retrieve any of the three data elements from the object for display or download (in the case of the dump file). The application should also be able to show various statistics about the health of the system such as the number of items in the map, memory used/free, oldest item age, etc. These statistics might be provided by an iframe to the Hazelcast monitor webapp.
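If the web application computed the statistics itself rather than embedding the Hazelcast monitor, the calculation might look like this sketch, which assumes a hypothetical map of key -> (value, stored_at) entries:

```python
import time

def map_stats(crash_map):
    """Compute item count and oldest-item age from a dict whose values
    are (value, stored_at) pairs. This layout is an assumption; in
    practice Hazelcast's monitoring interface would supply these."""
    now = time.time()
    ages = [now - stored_at for _, stored_at in crash_map.values()]
    return {
        'item_count': len(crash_map),
        'oldest_item_age_seconds': max(ages) if ages else 0.0,
    }
```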
- First persistence layer - NFS-style shared storage
- Items should have a write-behind that causes the data to be written to a shared file-based storage system. This should be the easiest and quickest persistence system to implement.
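A filesystem write-behind could be as simple as the sketch below; the two-character fan-out directory layout and the .json suffix are assumptions, not part of the proposal:

```python
import json
import os
import tempfile

def write_behind(key, value, root):
    """Persist one map entry under the shared storage root. Fans out
    into subdirectories by the first two key characters (an assumed
    layout) to avoid one huge flat directory."""
    subdir = os.path.join(root, key[:2])
    os.makedirs(subdir, exist_ok=True)
    path = os.path.join(subdir, key + '.json')
    with open(path, 'w') as f:
        json.dump(value, f)
    return path

# Illustrative usage against a temporary directory.
example_root = tempfile.mkdtemp()
example_path = write_behind('abcd1234', {'product': 'Mock'}, example_root)
```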
- Second persistence layer - HBase storage
- Using the HBase Java client API, the items should be written into a table with three column families, one for each of the attributes.
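The intended row layout can be modeled as plain data; the family and qualifier names below are assumptions to illustrate the three-column-family split, not the actual schema:

```python
# Model of the proposed HBase row: one column family per attribute.
# Family:qualifier names here are hypothetical.
def to_hbase_row(guid, meta_json, dump, processed_json):
    """Map one crash report object onto a row with three column
    families for the raw JSON, the binary dump, and the processed JSON."""
    return {
        'row_key': guid,
        'meta_data:json': meta_json,
        'raw_data:dump': dump,
        'processed_data:json': processed_json,
    }
```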
- Optional persistence layer - ElasticSearch storage
- ElasticSearch will eventually be used for indexing this data. It would be worthwhile to evaluate whether we can piggyback on it at small scale as the actual data persistence layer as well.