Socorro:SOLR API

From MozillaWiki

Writing Solr Queries

Solr Admin page available at: http://cm-hadoop24.mozilla.org:8983/solr/admin

Solr Admin Schema page available at: http://cm-hadoop24.mozilla.org:8983/solr/admin/schema.jsp

Rules

  • Strings must be URL-encoded according to RFC 1738
  • Dates and timestamps must adhere to ISO 8601
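
As a minimal sketch of the two rules above, the Python standard library can handle both the percent-encoding and the timestamp format. The helper names solr_timestamp and encode_term are illustrative only and are not part of Socorro.

```python
from datetime import datetime, timezone
from urllib.parse import quote


def solr_timestamp(dt):
    """Format a timezone-aware datetime as an ISO 8601 UTC timestamp."""
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def encode_term(value):
    """Percent-encode a value so it is safe inside a URL query string."""
    # safe="" also encodes "/" and ":", which matter when the value is a URL.
    return quote(value, safe="")


start = solr_timestamp(datetime(2010, 9, 13, 9, 33, tzinfo=timezone.utc))
encoded_url = encode_term("http://www.gmail.com")
```

Here start has the same shape as the timestamps used in the client_crash_date examples, and encoded_url matches the encoding used in the url example.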

Values

  • branches - n/a ... expected to work ... q=branch:1.9.2
  • build_id - q=build:20100722155716
  • date_end - q=client_crash_date:[2010-09-13T09:33:00Z+TO+2010-09-13T10:33:00Z]
  • date_start - q=client_crash_date:[2010-09-13T09:33:00Z+TO+2010-09-13T10:33:00Z]
  • domain - n/a ... expected to work ... q=url:*gmail*
  • limit - rows=100
  • offset - start=0
  • ooid - q=ooid:010081800002baa-2526-4545-b575-3d3b12100818
  • os_names - q=os_name:windows
  • os_versions - q=os_version:5.1.2600
  • plugin_filename - n/a
  • plugin_name - n/a
  • report_process - n/a
  • report_type - n/a
  • signature - q=signature:flash
  • products - q=product:thunderbird
  • versions - q=version:3.6.8
  • url - n/a ... expected to work ... q=url:http%3A%2F%2Fwww.gmail.com%
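
The mapping above could be applied in the middleware roughly as follows. This is a sketch under assumptions: the /solr/select endpoint is assumed (this page only lists the admin URLs), the function build_select_url is hypothetical, and joining filters with AND is one possible choice.

```python
from urllib.parse import urlencode

# Assumed select endpoint; only the admin URLs are documented on this page.
SOLR_SELECT_URL = "http://cm-hadoop24.mozilla.org:8983/solr/select"

# Socorro parameter name -> Solr field name, taken from the Values list.
FIELD_FOR_PARAM = {
    "branches": "branch",
    "build_id": "build",
    "ooid": "ooid",
    "os_names": "os_name",
    "os_versions": "os_version",
    "signature": "signature",
    "products": "product",
    "versions": "version",
}


def build_select_url(params, limit=100, offset=0):
    """Translate Socorro search parameters into a Solr select URL."""
    clauses = [
        f"{FIELD_FOR_PARAM[name]}:{value}"
        for name, value in sorted(params.items())
    ]
    query = {"q": " AND ".join(clauses), "rows": limit, "start": offset}
    # urlencode percent-encodes the query and turns spaces into "+".
    return SOLR_SELECT_URL + "?" + urlencode(query)


url = build_select_url({"products": "thunderbird", "versions": "3.6.8"})
```

Parameters marked n/a in the list are deliberately left out of the mapping, so passing one raises a KeyError rather than silently producing a query against an unknown field.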

Notes

Use facet.field=os_name to get a count for each OS

Use facet.field=os_version to get a count for each OS version

Use &wt=json to return the results as JSON; XML is returned by default
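
A small sketch of how the facet and JSON notes fit together. Solr only computes facets when facet=true is also sent, and in its default JSON layout each facet field comes back as a flat list alternating value and count. The function names and the sample response text are hand-written for illustration and are not real output from the server.

```python
import json
from urllib.parse import urlencode


def facet_params(query, field):
    """Build the query string for a facet-only request returned as JSON."""
    return urlencode({
        "q": query,
        "rows": 0,            # only the counts are needed, not the documents
        "facet": "true",      # faceting must be switched on explicitly
        "facet.field": field,
        "wt": "json",
    })


def facet_counts(response_text, field):
    """Turn Solr's flat [value, count, value, count, ...] list into a dict."""
    data = json.loads(response_text)
    flat = data["facet_counts"]["facet_fields"][field]
    return dict(zip(flat[0::2], flat[1::2]))


# Hand-written sample in the shape Solr uses by default.
sample = '{"facet_counts": {"facet_fields": {"os_name": ["windows", 3, "mac", 2]}}}'
params = facet_params("product:thunderbird", "os_name")
counts = facet_counts(sample, "os_name")
```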

Use AND/OR to query for more than 1 value in a specific field:

Use NOT to remove 1 value from a specific field:

Use parentheses to query for more than 1 value in more than 1 field:

Use * to query using a like statement:

Use brackets to prepare date ranges:
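
The operator notes above can be sketched as small Python helpers that build Lucene query strings. The helper names are hypothetical, and values such as firefox and linux are illustrative; the field names come from the Values list.

```python
def any_of(field, values):
    """field:(a OR b) matches documents whose field has any of the values."""
    return f"{field}:({' OR '.join(values)})"


def none_of(field, value):
    """NOT field:value removes documents whose field has that value."""
    return f"NOT {field}:{value}"


def like(field, fragment):
    """field:*fragment* behaves like a SQL LIKE '%fragment%' match."""
    return f"{field}:*{fragment}*"


def between(field, start, end):
    """field:[start TO end] is an inclusive range, used here for dates."""
    return f"{field}:[{start} TO {end}]"


query = " AND ".join([
    any_of("product", ["firefox", "thunderbird"]),
    none_of("os_name", "linux"),
    like("url", "gmail"),
    between("client_crash_date", "2010-09-13T09:33:00Z", "2010-09-13T10:33:00Z"),
])
```

The resulting string still has to be URL-encoded before it is placed in the q parameter.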

Python APIs

The following APIs and calls will need to be provided by the Pythonic middleware layer. The new names for these calls should be representative of their functionality.

Bugzilla

  • bug.php - getBugsForSignatures()


Crash

  • common.php - getCommentsByParams()
  • common.php - queryReports()
    • Combine with common.php - totalNumberReports()
  • extension.php - getExtensionsForReport()
  • report.php - getPairedUUID()
  • report.php - getAllPairedUUIDByUUid()

Server Status

Top Crashers

  • common.php - queryTopSignatures()
    • Combine with common.php - queryFrequency()
  • topcrashersbyurl.php - getTopCrashersByUrl()
  • topcrashersbyurl.php - getTopCrashersByDomain()
  • topcrashersbyurl.php - getTopCrashersByTopsiteRank()
  • topcrashersbyurl.php - getUrlsByDomain()
  • topcrashersbyurl.php - getSignaturesByUrl()
  • topcrashers.php - getTopCrashersByBranch()
  • topcrashers.php - getTopCrashersByVersion()
  • topcrashers.php - ooppForSignatures()
  • topcrashers.php - formatTopcrasherVersions()

Socorro UI Methods

This is a list of data accessed by the webapp that seems better suited to retrieval with a SOLR query than with a SQL query.

Bugzilla Associations

bug.php

bugsForSignatures
Since bugs are constantly being changed and Socorro needs to keep up to date with them, it would be easy for us to have a table using bug_id as the key that contains the bug data relevant to Socorro with a link to the signature(s). When we index that table, we could have a SOLR query that specifies a list of signature strings and it returns a list of bugs that are associated with that signature.
 signature:+(Hello_world OR Fubar)
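
A sketch of how the middleware might build that query from a list of signatures. The function name signatures_query is hypothetical. Wrapping each signature in double quotes is an assumption made here so that characters with special meaning in Lucene syntax do not break the query; whether a quoted match is right depends on how the signature field is indexed.

```python
def signatures_query(signatures):
    """Build signature:("a" OR "b") from a list of signature strings."""
    quoted = []
    for signature in signatures:
        # Inside a quoted phrase only backslash and double quote need escaping.
        escaped = signature.replace("\\", "\\\\").replace('"', '\\"')
        quoted.append(f'"{escaped}"')
    return "signature:(" + " OR ".join(quoted) + ")"


q = signatures_query(["Hello_world", "Fubar"])
```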

common.php

getCommentsByParams
Comments are a field contained in the crash report record, so given a list of crash ids or a signature or any other criteria that can retrieve crash reports, this data can easily be returned through a SOLR query. Further, it would be possible to do SOLR searches for specific comment terms.
 comment:suck~
queryTopSignatures
I believe this query can be serviced by the Correlation API that Xavier has been working on. In the worst case, if we have a SOLR query that filters for the appropriate conditions (e.g. platform, version), it can return the signature field for every report matching those conditions. We can then count the occurrences of every signature and return the top N.
totalNumberReports
This is the result set size of the desired criteria.
queryReports
The building-block query. Plenty of examples of SOLR usage could be given, but the main reference is the Lucene syntax (which SOLR is based on): Lucene Query Syntax
queryFrequency

Get a list of crash signatures by any number of search parameters including:

  • 1 or more products
  • 1 or more product versions
  • 1 or more operating systems
  • 1 or more branches
  • start timestamp
  • end timestamp
  • stack signature
  • build id
  • report process (any, browser only, plugin only)
  • report type (any, crash, hang)
  • plugin name
  • plugin filename

Order by number of crashes per signature. Include in the results for each crash:

  • number of crashes per signature
  • signature
  • plugin filename
  • number of crashes per each O/S platform (All, Win, Mac, Linux)
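
The worst-case approach described under queryTopSignatures, combined with the per-platform breakdown requested for queryFrequency, could be sketched as below. It assumes the matching reports have already been fetched as dictionaries with signature and os_name keys; the function name and the sample documents are illustrative.

```python
from collections import Counter, defaultdict


def crashes_by_signature(docs):
    """Group crash documents by signature, counting overall and per OS."""
    per_os = defaultdict(Counter)
    for doc in docs:
        per_os[doc["signature"]][doc.get("os_name", "unknown")] += 1
    rows = [
        {"signature": sig, "count": sum(counts.values()), "by_os": dict(counts)}
        for sig, counts in per_os.items()
    ]
    # Most frequent signature first; ties are broken by name for stable output.
    return sorted(rows, key=lambda row: (-row["count"], row["signature"]))


# Illustrative documents, not real crash data.
docs = [
    {"signature": "Hello_world", "os_name": "windows"},
    {"signature": "Fubar", "os_name": "mac"},
    {"signature": "Hello_world", "os_name": "mac"},
]
rows = crashes_by_signature(docs)
top_n = rows[:1]
```

Counting in the middleware keeps the Solr query simple, at the cost of transferring every matching document; faceting on the signature field would move the counting to the server.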

extension.php

getExtensionsForReport
This is just a simple request for the extensions field of the report in HBase. Middleware layer only, no SOLR needed.

report.php

getPairedUUID
Use the middleware layer (mwl) to retrieve the hang record via the hang id and filter for the desired uuid.

Lorentz crashes come in pairs. They are matched via OOIDs. This query is used to find the OOID for a crash report that is paired with the provided OOID.

getAllPairedUUIDByUUid
Same as above, but without the filter.

Lorentz crashes come in pairs. They are matched via OOIDs. If a crash report is resubmitted, it's possible to have more than 2 crash reports per OOID. This variation of the prior query will retrieve all of the OOIDs for crash reports that are paired with the provided OOID.
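
A rough sketch of the two lookups above, once the middleware has fetched the hang record. The layout of a hang record is not described on this page, so the "ooids" key, the function names, and the placeholder ids are purely assumptions for illustration.

```python
def get_all_paired_uuids(hang_record, ooid):
    """Return every OOID in the hang record other than the one provided."""
    return [other for other in hang_record["ooids"] if other != ooid]


def get_paired_uuid(hang_record, ooid):
    """Return one paired OOID, or None when the report has no partner."""
    paired = get_all_paired_uuids(hang_record, ooid)
    return paired[0] if paired else None


# Hypothetical record: three reports share one hang id after a resubmission.
record = {"ooids": ["ooid-1", "ooid-2", "ooid-3"]}
partner = get_paired_uuid(record, "ooid-1")
partners = get_all_paired_uuids(record, "ooid-1")
```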

server_status.php

Working on this in bug 579575

topcrashersbyurl.php

getTopCrashersByUrl

Get all of the top crashing signatures that are associated with a particular URL.

getTopCrashersByDomain

Get the domains that are associated with the highest number of crashes, ordered by the number of crashes.
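
One way to derive the per-domain counts, assuming the url field of each matching report has already been retrieved. The function name and the sample URLs are illustrative.

```python
from collections import Counter
from urllib.parse import urlsplit


def crashes_by_domain(urls):
    """Count crash reports per host name, most frequent first."""
    counts = Counter()
    for url in urls:
        host = urlsplit(url).hostname
        if host:  # skip values without a host, such as about:blank
            counts[host] += 1
    return counts.most_common()


ranked = crashes_by_domain([
    "http://www.gmail.com/inbox",
    "http://www.gmail.com/",
    "https://example.org/a",
    "about:blank",
])
```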

getTopCrashersByTopsiteRank

Get the domains that are associated with the highest number of crashes, ordered by the number of crashes. Only display the domains that are found within the top 1000 sites of Alexa's topsite rankings.

The Alexa Topsite rankings are currently pulled once per week and placed in the alexa_topsites table.
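
The top-site filter could be sketched as follows, assuming per-domain crash counts are already available and the contents of alexa_topsites have been loaded into a mapping of domain to rank. The function name, both input shapes, and the sample numbers are assumptions for illustration.

```python
def top_crashers_by_topsite_rank(domain_counts, topsite_ranks, max_rank=1000):
    """Keep only domains ranked within max_rank, ordered by crash count."""
    kept = [
        (domain, count)
        for domain, count in domain_counts.items()
        # Domains missing from the ranking are treated as outside the cutoff.
        if topsite_ranks.get(domain, max_rank + 1) <= max_rank
    ]
    return sorted(kept, key=lambda item: (-item[1], item[0]))


# Made-up sample values, not real rankings or crash counts.
domain_counts = {"www.gmail.com": 12, "example.org": 30, "example.net": 12}
topsite_ranks = {"www.gmail.com": 10, "example.net": 950, "example.org": 5000}
result = top_crashers_by_topsite_rank(domain_counts, topsite_ranks)
```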

getUrlsByDomain

Get the urls that are associated with the highest number of crashes, grouped by domain name, and ordered by the number of crashes.

getSignaturesByUrl

Get all of the signatures that are associated with a particular URL.

topcrashers.php

lastUpdatedByBranch

Get the time (window_end) when the top_crashes_by_signature table was last updated for a specific branch.

lastUpdatedByVersion

Get the time (window_end) when the top_crashes_by_signature table was last updated for a specific product and product version.

getTopCrashersByBranch

Get the top crashing signatures from the top_crashes_by_signature table for a specific branch, between a start timestamp and an end timestamp. Order the results by signatures that are associated with the most crashes.

getTopCrashersByVersion

Get the top crashing signatures from the top_crashes_by_signature table for a specific product and product_version, between a start timestamp and an end timestamp. Order the results by signatures that are associated with the most crashes.

ooppForSignatures

Get meta information for the crash reports for a specific product and product version between a start timestamp and end timestamp. The meta information obtained is the type of crash (hang, or not a hang) and type of process (plugin or browser).

DEPRECATED

job.php

getByUUID
Don't know purpose.

mtbf.php

DEPRECATED - no need to implement this

getMtbfOf
Looks like this is just math on the uptime field of reports matching particular criteria. If that is the case, this should be fairly simple?
listReports
Seems out of place. Purpose?

priorityjobs.php

I believe most of this functionality is now deprecated? [laura: done for 1.8 by ryan in bug 584136]

report.php

getByUUID
Simple middleware layer (mwl) retrieval
sig_exists
Simple mwl retrieval

topcrashers

listReports
Does this query belong here?
getTotalCrashesByVersion
SOLR query to filter on product+version
getTotalCrashesByBranch
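
Both total-crash lookups above reduce to reading the size of the result set, which Solr reports as numFound in its response. A minimal sketch, assuming the product, version, and branch fields from the Values list; the function names and the sample response text are hand-written for illustration.

```python
import json
from urllib.parse import urlencode


def count_params(filters):
    """Build a query string that asks only for the number of matches."""
    q = " AND ".join(f"{field}:{value}" for field, value in filters.items())
    # rows=0 skips the documents; numFound is still reported.
    return urlencode({"q": q, "rows": 0, "wt": "json"})


def total_from_response(response_text):
    """Read the result-set size from a Solr JSON response."""
    return json.loads(response_text)["response"]["numFound"]


by_version = count_params({"product": "thunderbird", "version": "3.6.8"})
by_branch = count_params({"branch": "1.9.2"})
total = total_from_response('{"response": {"numFound": 42, "start": 0, "docs": []}}')
```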