Socorro:PyServe

From MozillaWiki

HBase Thrift integration

The Socorro:HBase cluster runs Thrift servers that let remote clients communicate with HBase from any of several languages. The Socorro PyServe middleware talks to this Thrift service through a client wrapper we have created, named hbaseClient.py.

Thrift client API Refactoring/Cleanup

I propose the following changes associated with bug 565962:

  • Split hbaseClient.py into two modules. The second module would be called socorroHBaseClient.py and would contain all the additional Socorro-specific methods.
  • Move the generic hbaseClient.py into third-party.
  • Review HBase API endpoints
    • Delete any methods that are not useful.
    • Add any missing methods that would be useful.
    • Rename current methods as the Socorro devs see fit.
    • Change descriptions where Socorro devs need clarification (the descriptions below are the current Python method documentation strings).
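
The proposed split might look roughly like the sketch below. It is only an illustration of the layering, not the existing code: the class names HBaseClient and SocorroHBaseClient, the put_json_meta helper, the table and column names, and the in-memory dict that stands in for the Thrift connection are all assumptions made so the example runs without a cluster.

```python
import json


class HBaseClient:
    """Stand-in for the generic module (hbaseClient.py in the proposal).

    A dict replaces the real Thrift connection here; the real wrapper
    would forward these calls to the HBase Thrift service.
    """

    def __init__(self):
        self._tables = {}  # {table: {row_key: {column: value}}}

    def put(self, table, row_key, columns):
        row = self._tables.setdefault(table, {}).setdefault(row_key, {})
        row.update(columns)

    def get(self, table, row_key, column):
        # Mirror the documented convention: missing rows give an empty string.
        return self._tables.get(table, {}).get(row_key, {}).get(column, "")


class SocorroHBaseClient(HBaseClient):
    """Stand-in for socorroHBaseClient.py: Socorro-specific methods only."""

    TABLE = "crash_reports"    # hypothetical table name
    META_COLUMN = "meta:json"  # hypothetical column name

    def put_json_meta(self, ooid, json_data):  # hypothetical helper
        self.put(self.TABLE, ooid, {self.META_COLUMN: json.dumps(json_data)})

    def get_json_meta_as_string(self, ooid):
        return self.get(self.TABLE, ooid, self.META_COLUMN)

    def get_json_meta(self, ooid):
        return json.loads(self.get_json_meta_as_string(ooid))
```

With this layering, the generic class has no knowledge of ooids or crash reports, which is what would allow it to move into third-party.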

Key HBase API endpoints

  • put_json_dump(self, ooid, json_data, dump, add_to_unprocessed_queue = True)
   Create a crash report record in hbase from serialized json and
   bytes of the minidump
  • put_processed_json(self,ooid,processed_json)
   Create a crash report from the cooked json output of the processor
  • get_json_meta_as_string(self,ooid) 
   Return the json metadata for a given ooid as an unexpanded string
   If the ooid doesn't exist, return an empty string.
  • get_json_meta(self,ooid) 
   Return the json metadata for a given ooid as a json data object
  • get_dump(self,ooid) 
   Return the minidump for a given ooid as a string of bytes
   If the ooid doesn't exist, return an empty string. XXX: Do we want a different return?
  • get_processed_json_as_string(self,ooid) 
   Return the cooked json (jsonz) for a given ooid as a string
   If the ooid doesn't exist, return an empty string.
    • previously known as jsonz, but that name should be deprecated since it is no longer stored as a gzip file
  • get_processed_json(self,ooid)
   Return the cooked json (jsonz) for a given ooid as a json object
   If the ooid doesn't exist, return an empty string.
  • get_raw_report(self,ooid) 
   Return the json and dump for a given ooid
   If the ooid doesn't exist, return an empty array
    • saves a separate request to the cluster
    • Is this useful? Might be a candidate for deletion
  • get_report_processing_state(self,ooid)
   Return the current state of processing for this report and the submitted_timestamp needed
   for processing queue manipulation.
   If the ooid doesn't exist, return an empty array
  • union_scan_with_prefix(self,table,prefix,columns) 
   A lazy chain of iterators that yields unordered rows starting with a given prefix.
   The implementation opens up 16 scanners (one for each leading hex character of the salt)
   one at a time and returns all of the rows matching
  • merge_scan_with_prefix(self,table,prefix,columns) 
   A generator based iterator that yields totally ordered rows starting with a given prefix.
   The implementation opens up 16 scanners (one for each leading hex character of the salt)
   simultaneously and then yields the next row in order from the pool on each iteration.
  • limited_iteration(self,iterable,limit=10**6)
   No description
  • iterator_for_all_legacy_to_be_processed(self)
   No description
    • This is the special iterator used by the monitor to gather ooids to be processed and remove them from the HBase unprocessed queue
  • acknowledge_ooid_as_legacy_priority_job(self,ooid)
   No description
    • If the ooid exists in the unprocessed queue, remove it because it will be processed as a priority job.
  • delete_from_legacy_processing_index(self,index_row_key)
   No description
    • Deletes from the unprocessed queue and decrements the current queue size
  • put_crash_report_indices(self,ooid,timestamp,indices)
   No description
    • Adds an ooid to the given set of index tables, prefixed with the timestamp for time-range-based iteration
  • put_crash_report_hang_indices(self,ooid,hang_id,process_type,timestamp)
   No description
    • Adds a hangID to the hang-specific index tables to allow lookup of hang pairs
  • update_metrics_counters_for_submit(self,submitted_timestamp,legacy_processing,process_type,is_hang,add_to_unprocessed_queue)
   Increments a series of counters in the 'metrics' table related to crash report (CR) submission
  • put_json_dump_from_files(self,ooid,json_path,dump_path,openFn=open)
   Convenience method for creating an ooid from disk
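
The difference between union_scan_with_prefix and merge_scan_with_prefix can be sketched with plain Python iterators. This is a simplified illustration under stated assumptions, not the code in hbaseClient.py: the signatures are reduced, open_scanner is a hypothetical stand-in for an HBase scanner that reads from a dict, the salt is assumed to be a single leading hex character, and "ordered" is assumed to mean ordered by the key with the salt removed. limited_iteration has no docstring, so the version here is a guess based on its name and signature.

```python
import heapq
import itertools

HEX_SALTS = "0123456789abcdef"  # one scanner per leading hex character


def open_scanner(rows, salted_prefix):
    """Hypothetical scanner: yield (row_key, columns) in key order for
    every row whose key starts with salted_prefix."""
    for key in sorted(rows):
        if key.startswith(salted_prefix):
            yield key, rows[key]


def union_scan_with_prefix(rows, prefix):
    """Drain the 16 scanners one at a time; rows are not globally ordered."""
    scanners = (open_scanner(rows, salt + prefix) for salt in HEX_SALTS)
    return itertools.chain.from_iterable(scanners)


def merge_scan_with_prefix(rows, prefix):
    """Keep all 16 scanners open and always yield the smallest next row,
    comparing keys with the salt character stripped."""
    scanners = [open_scanner(rows, salt + prefix) for salt in HEX_SALTS]
    return heapq.merge(*scanners, key=lambda row: row[0][1:])


def limited_iteration(iterable, limit=10**6):
    """Guess at the intent: stop after at most `limit` items."""
    return itertools.islice(iterable, limit)


if __name__ == "__main__":
    table = {"a100501": {}, "0100502": {}, "f100503": {}, "3100504": {}}
    print([key for key, _ in union_scan_with_prefix(table, "1005")])
    print([key for key, _ in merge_scan_with_prefix(table, "1005")])
```

The union variant is cheaper because only one scanner is active at a time, while the merge variant pays for 16 concurrent scanners to get a total order.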