Socorro:SOLR API

From MozillaWiki
Revision as of 19:57, 17 August 2010 by DEinspanjer

Socorro SOLR API

This is a list of data accessed by the webapp that seems better suited to retrieval through a SOLR query than through a SQL query.

Bugzilla Associations

bug.php

bugsForSignatures
Since bugs change constantly and Socorro needs to stay up to date with them, it would be easy for us to keep a table, keyed on bug_id, that contains the bug data relevant to Socorro along with a link to the associated signature(s). Once that table is indexed, a SOLR query that specifies a list of signature strings can return the list of bugs associated with those signatures.
 +signature:(Hello_world OR Fubar)
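Once the matching bug records come back, they still have to be grouped per signature for the webapp. A minimal sketch of that step, assuming each indexed bug document carries a `bug_id` and a `signatures` list (both field names are hypothetical, as is the helper name):

```python
from collections import defaultdict


def bugs_by_signature(bug_docs, wanted):
    """Map each requested signature to the sorted bug ids linked to it."""
    wanted = set(wanted)
    result = defaultdict(list)
    for doc in bug_docs:
        # "signatures" is an assumed multi-valued field on the bug document.
        for sig in doc.get("signatures", []):
            if sig in wanted:
                result[sig].append(doc["bug_id"])
    return {sig: sorted(ids) for sig, ids in result.items()}


# Made-up documents, standing in for the "docs" list of a SOLR response.
bug_docs = [
    {"bug_id": 1001, "signatures": ["Hello_world"]},
    {"bug_id": 1002, "signatures": ["Hello_world", "Fubar"]},
    {"bug_id": 1003, "signatures": ["Other"]},
]
print(bugs_by_signature(bug_docs, ["Hello_world", "Fubar"]))
```

Signatures with no associated bug are simply absent from the result, so the caller can treat a missing key as "no bugs filed".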

common.php

getCommentsByParams
Comments are a field contained in the crash report record, so given a list of crash ids or a signature or any other criteria that can retrieve crash reports, this data can easily be returned through a SOLR query. Further, it would be possible to do SOLR searches for specific comment terms.
 comment:suck~
queryTopSignatures
I believe this query can be serviced by the Correlation API that Xavier has been working on. In the worst case, a SOLR query that filters on the appropriate conditions (i.e. platform, version etc.) can return the signature field for every report matching those conditions. We can then count the occurrences of each signature and return the top N.
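The worst-case fallback, counting signatures on the client, could look roughly like this. It is only a sketch: it assumes the query returns a list of documents that each have a `signature` field, and the function name is made up.

```python
from collections import Counter


def top_signatures(docs, n=10):
    """Count signature occurrences across matching reports; return the top n."""
    # Reports without a signature value are skipped rather than counted.
    counts = Counter(doc["signature"] for doc in docs if doc.get("signature"))
    return counts.most_common(n)


# Made-up result documents for illustration.
docs = [
    {"signature": "Hello_world"},
    {"signature": "Fubar"},
    {"signature": "Hello_world"},
]
print(top_signatures(docs, n=2))
```

Solr's faceting (`facet=true&facet.field=signature`) may be able to do this counting on the server instead, which would avoid shipping every matching report to the client.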
totalNumberReports
This is the size of the result set for the desired criteria.
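Solr reports the result set size as `numFound` in the response, so the count can be requested without fetching any documents by setting `rows=0`. A small sketch of building the request parameters and reading the count from a JSON response; the `product` and `version` field names and the sample response body are assumptions for illustration:

```python
import json
from urllib.parse import urlencode


def count_params(query):
    """Build select parameters that ask only for the match count."""
    return urlencode({"q": query, "rows": 0, "wt": "json"})


def total_reports(response_text):
    """Read the total number of matching documents from a Solr JSON response."""
    return json.loads(response_text)["response"]["numFound"]


# Hand-written sample response, not real data.
sample = (
    '{"responseHeader": {"status": 0}, '
    '"response": {"numFound": 1234, "start": 0, "docs": []}}'
)
print(count_params("product:Firefox AND version:3.6.8"))
print(total_reports(sample))
```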
queryReports
The building block query. I can give plenty of examples of SOLR usage, but here is the link to the syntax of Lucene (which SOLR is built on): Lucene Query Syntax
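As one possible shape for that building block, the sketch below turns a dictionary of criteria into a Lucene query string, with one required clause per field and list values expanded into OR groups. The helper names and the example fields are hypothetical, and whether quoted values match depends on how each field is analyzed in the index.

```python
def quote(value):
    """Wrap a value in double quotes, escaping backslashes and quotes."""
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def build_query(criteria):
    """AND together one required clause per field; list values become OR groups."""
    clauses = []
    for field in sorted(criteria):
        value = criteria[field]
        values = value if isinstance(value, (list, tuple)) else [value]
        group = " OR ".join(quote(v) for v in values)
        # The leading "+" marks the clause as required in Lucene syntax.
        clauses.append(f"+{field}:({group})")
    return " ".join(clauses)


print(build_query({"product": "Firefox", "version": ["3.6.8", "4.0b3"]}))
# prints: +product:("Firefox") +version:("3.6.8" OR "4.0b3")
```

Quoting every value keeps signatures that contain Lucene syntax characters (parentheses, colons and so on) from being parsed as operators.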
queryFrequency
I don't have a clear enough understanding of the purpose of this query. Need examples to determine API.

extension.php

getExtensionsForReport
This is just a simple request for the extensions field of the report in HBase. Middleware layer only, no SOLR needed.

job.php

getByUUID
Don't know purpose.

mtbf.php

getMtbfOf
Looks like this is just math on the uptime field of reports matching particular criteria. If that is the case, this should be fairly simple?
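If that reading is right, the math could be as small as the sketch below. It assumes that "MTBF" here means the mean of the `uptime` field over the matching reports; that definition and the function name are assumptions, not something confirmed by the existing code.

```python
def mean_uptime(reports):
    """Average the uptime field over reports that have one; None if there are none."""
    uptimes = [r["uptime"] for r in reports if r.get("uptime") is not None]
    if not uptimes:
        return None
    return sum(uptimes) / len(uptimes)


# Made-up reports; the last one has no uptime and is ignored.
print(mean_uptime([{"uptime": 100}, {"uptime": 300}, {}]))
```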
listReports
Seems out of place. Purpose?

priorityjobs.php

I believe most of this functionality is now deprecated?

report.php

getByUUID
Simple middleware layer (mwl) retrieval
sig_exists
Simple mwl retrieval
getPairedUUID
Use mwl to retrieve hang record via hang id and filter for desired uuid
getAllPairedUUIDByUUid
Same as above but don't filter.

server_status.php

Working on this in a bug.

topcrashersbyurl.php

getTopCrashersByUrl
Not sure about this. It uses data from Alexa? We can search for reports containing a particular URL pattern. It just depends on what we need to do with the results after that.
getTopCrashersByDomain
Same as above
getTopCrashersByTopsiteRank
Same as above
getUrlsByDomain
Same as above
getSignaturesByUrl
Same as above
listReports
Does this query belong here?

topcrashers.php

lastUpdatedByBranch
SOLR query to filter on branch and signature, sorted by max(window_end?)
lastUpdatedByVersion
SOLR query to filter on product+version and signature, sorted by max(window_end?)
getTotalCrashesByBranch
SOLR query to filter on branch
getTotalCrashesByVersion
SOLR query to filter on product+version
getTopCrashersByVersion
Need explanation on this one. Seems to be the same as above, but totaling the number of crashes by OS and signature?
getTopCrashersByBranch
Totaling the number of crashes by OS and branch?
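If the top crasher queries really are totals by OS and signature, the aggregation over the filtered reports might look like this sketch. It assumes each returned document has `signature` and `os_name` fields; both names, and the function name, are hypothetical.

```python
from collections import Counter, defaultdict


def top_crashers(docs, n=10):
    """Rank signatures by total crashes, with a per-OS breakdown for each."""
    per_signature = defaultdict(Counter)
    for doc in docs:
        per_signature[doc["signature"]][doc["os_name"]] += 1
    ranked = sorted(
        per_signature.items(),
        key=lambda item: sum(item[1].values()),
        reverse=True,
    )
    return [(sig, sum(by_os.values()), dict(by_os)) for sig, by_os in ranked[:n]]


# Made-up reports, already filtered by branch or product+version.
docs = [
    {"signature": "Hello_world", "os_name": "Windows"},
    {"signature": "Hello_world", "os_name": "Mac"},
    {"signature": "Fubar", "os_name": "Windows"},
    {"signature": "Hello_world", "os_name": "Windows"},
]
for row in top_crashers(docs):
    print(row)
```

The branch or product+version filter would be applied in the SOLR query itself, so this step only ever sees reports that already match.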
ooppForSignatures
Need explanation.