Writing Solr Queries
Solr Admin page available at: http://cm-hadoop24.mozilla.org:8983/solr/admin
Solr Admin Schema page available at: http://cm-hadoop24.mozilla.org:8983/solr/admin/schema.jsp
- Strings must be URL-encoded according to RFC 1738
- Dates and timestamps must adhere to ISO 8601
- branches - n/a ... expected to work ... q=branch:1.9.2
- build_id - q=build:20100722155716
- date_end - q=client_crash_date:[2010-09-13T09:33:00Z+TO+2010-09-13T10:33:00Z]
- date_start - q=client_crash_date:[2010-09-13T09:33:00Z+TO+2010-09-13T10:33:00Z]
- domain - n/a ... expected to work ... q=url:*gmail*
- limit - rows=100
- offset - start=0
- ooid - q=ooid:010081800002baa-2526-4545-b575-3d3b12100818
- os_names - q=os_name:windows
- os_versions - q=os_version:5.1.2600
- plugin_filename - n/a
- plugin_name - n/a
- report_process - n/a
- report_type - n/a
- signature - q=signature:flash
- products - q=product:thunderbird
- versions - q=version:3.6.8
- url - n/a ... expected to work ... q=url:http%3A%2F%2Fwww.gmail.com%
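A minimal sketch of how the parameter mapping above could be turned into a request URL from Python. The /solr/select path is an assumption (the usual Solr select handler on the same host as the admin page), and build_select_url and FIELD_FOR_PARAM are hypothetical names used only for illustration. Only single-valued parameters are handled here; urlencode takes care of the URL-encoding requirement.

```python
from urllib.parse import urlencode

# Assumption: the standard select handler sits next to the admin page.
SOLR_SELECT_URL = "http://cm-hadoop24.mozilla.org:8983/solr/select"

# Search parameter -> Solr field, taken from the list above.
FIELD_FOR_PARAM = {
    "branches": "branch",
    "build_id": "build",
    "ooid": "ooid",
    "os_names": "os_name",
    "os_versions": "os_version",
    "signature": "signature",
    "products": "product",
    "versions": "version",
}


def build_select_url(params, date_start=None, date_end=None, limit=100, offset=0):
    """Return a select URL for a dict of single-valued search parameters."""
    clauses = []
    for name, value in sorted(params.items()):
        clauses.append("%s:%s" % (FIELD_FOR_PARAM[name], value))
    if date_start and date_end:
        # date_start and date_end together form one range on client_crash_date.
        clauses.append("client_crash_date:[%s TO %s]" % (date_start, date_end))
    query = " AND ".join(clauses) if clauses else "*:*"
    # urlencode percent-encodes reserved characters and turns spaces into "+".
    return SOLR_SELECT_URL + "?" + urlencode(
        {"q": query, "rows": limit, "start": offset}
    )


url = build_select_url(
    {"products": "thunderbird", "versions": "3.6.8"},
    date_start="2010-09-13T09:33:00Z",
    date_end="2010-09-13T10:33:00Z",
    limit=10,
)
print(url)
```

Sorting the parameter names only keeps the generated URL stable between runs; Solr does not care about clause order.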
Use facet.field=os_name to get a count for each operating system
Use facet.field=os_version to get a count for each OS version
Use &wt=json to return the results as JSON; Solr returns XML by default
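As a rough illustration of consuming a faceted wt=json response in the middleware: faceting also has to be switched on with facet=true, and by default Solr returns each facet field as a flat list that alternates value and count. The helper name facet_counts and the sample response below are made up for the example; they are not real Socorro data.

```python
import json


def facet_counts(response_text, field):
    """Return {value: count} for one facet field of a wt=json response."""
    body = json.loads(response_text)
    flat = body["facet_counts"]["facet_fields"][field]
    # Default JSON layout is a flat list: [value, count, value, count, ...]
    return dict(zip(flat[0::2], flat[1::2]))


# Hand-written sample in the shape Solr uses; the numbers are invented.
sample = '{"facet_counts": {"facet_fields": {"os_name": ["windows", 42, "mac", 7]}}}'
counts = facet_counts(sample, "os_name")
print(counts)
```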
Use AND/OR to query for more than one value in a specific field.
Use NOT to exclude a value from a specific field.
Use parentheses to query for more than one value in more than one field.
Use * as a wildcard, similar to a SQL LIKE clause.
Use brackets to specify date ranges.
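The operators listed above can be combined as in the following sketch. The helper names any_of and date_range are hypothetical, and "firefox" and "mac" are illustrative values only. The resulting string still has to be URL-encoded before it is sent as the q parameter.

```python
def any_of(field, values):
    """field:(a OR b) matches documents whose field holds any listed value."""
    return "%s:(%s)" % (field, " OR ".join(values))


def date_range(field, start, end):
    """field:[start TO end] is an inclusive range of ISO 8601 timestamps."""
    return "%s:[%s TO %s]" % (field, start, end)


clauses = [
    any_of("product", ["firefox", "thunderbird"]),  # OR within one field
    any_of("os_name", ["windows", "mac"]),          # parentheses, second field
    "NOT version:3.6.8",                            # NOT removes one value
    "signature:flash*",                             # * works like SQL LIKE
    date_range("client_crash_date",
               "2010-09-13T09:33:00Z", "2010-09-13T10:33:00Z"),
]
query = " AND ".join(clauses)
print(query)
```

The NOT clause is deliberately left at the top level next to positive clauses, since a parenthesized group that contains only a negation matches nothing in Lucene.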
Python APIs
The following APIs and calls will need to be provided by the Python middleware layer. The new names for these calls should reflect their functionality.
- bug.php - getBugsForSignatures()
- common.php - getCommentsByParams()
- common.php - queryReports()
- Combine with common.php - totalNumberReports()
- extension.php - getExtensionsForReport()
- report.php - getPairedUUID()
- report.php - getAllPairedUUIDByUUid()
- common.php - queryTopSignatures()
- Combine with common.php - queryFrequency()
- topcrashersbyurl.php - getTopCrashersByUrl()
- topcrashersbyurl.php - getTopCrashersByDomain()
- topcrashersbyurl.php - getTopCrashersByTopsiteRank()
- topcrashersbyurl.php - getUrlsByDomain()
- topcrashersbyurl.php - getSignaturesByUrl()
- topcrashers.php - getTopCrashersByBranch()
- topcrashers.php - getTopCrashersByVersion()
- topcrashers.php - ooppForSignatures()
- topcrashers.php - formatTopcrasherVersions()
Socorro UI Methods
This is a list of data accessed by the webapp that appears better suited to retrieval through a SOLR query than through a SQL query.
- Since bugs change constantly and Socorro needs to stay up to date with them, it would be easy to keep a table keyed on bug_id that contains the bug data relevant to Socorro, with a link to the associated signature(s). Once that table is indexed, a SOLR query that specifies a list of signature strings could return the list of bugs associated with those signatures.
signature:+(Hello_world OR Fubar)
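Real signatures often contain characters that the Lucene syntax treats as operators (colons, parentheses, spaces). One way to build the signature list safely, sketched here under the assumption that the signature field is matched as an exact phrase, is to quote each value and escape only backslashes and double quotes. The function names are hypothetical, and the output uses the plain signature:(...) grouping form.

```python
def quote_phrase(value):
    """Wrap a value in double quotes, escaping backslashes and quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return '"%s"' % escaped


def signatures_query(signatures):
    """Build one clause that matches any of the given signatures."""
    return "signature:(%s)" % " OR ".join(quote_phrase(s) for s in signatures)


print(signatures_query(["Hello_world", "Fubar"]))
print(signatures_query(["nsFoo::Bar(int, char)"]))
```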
- Comments are a field contained in the crash report record, so given a list of crash ids or a signature or any other criteria that can retrieve crash reports, this data can easily be returned through a SOLR query. Further, it would be possible to do SOLR searches for specific comment terms.
- I believe this query can be serviced by the Correlation API that Xavier has been working on. In the worst case, if we have a SOLR query that filters for the appropriate conditions (i.e. platform, version etc.), it can return the signature field for every report matching those conditions. We can then count the occurrences of every signature and return the top N.
- This is the result set size of the desired criteria.
- The building block query. Plenty of examples of SOLR usage could be given; for reference, see the Lucene Query Syntax documentation (SOLR is built on Lucene).
Get a list of crash signatures by any number of search parameters including:
- 1 or more products
- 1 or more product versions
- 1 or more operating systems
- 1 or more branches
- start timestamp
- end timestamp
- stack signature
- build id
- report process (any, browser only, plugin only)
- report type (any, crash, hang)
- plugin name
- plugin filename
Order by number of crashes per signature. Include in the results for each crash:
- number of crashes per signature
- plugin filename
- number of crashes per each O/S platform (All, Win, Mac, Linux)
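A possible middleware-side aggregation for the result described above, assuming the SOLR query has already returned one document per matching report with signature and os_name fields (the field names come from the query list earlier on this page; the function name and the sample documents are invented). The same counting approach covers the "count every signature and return the top N" fallback.

```python
from collections import Counter, defaultdict


def summarize_by_signature(docs, top_n=None):
    """Group report documents by signature, most frequent first."""
    totals = Counter()
    per_os = defaultdict(Counter)
    for doc in docs:
        signature = doc["signature"]
        totals[signature] += 1
        per_os[signature][doc.get("os_name", "unknown")] += 1
    # most_common(None) returns every signature, ordered by crash count.
    return [
        {"signature": sig, "count": count, "by_os": dict(per_os[sig])}
        for sig, count in totals.most_common(top_n)
    ]


docs = [
    {"signature": "Fubar", "os_name": "windows"},
    {"signature": "Hello_world", "os_name": "windows"},
    {"signature": "Fubar", "os_name": "mac"},
]
summary = summarize_by_signature(docs)
print(summary)
```

For large result sets it would likely be cheaper to let SOLR do the counting with a facet on the signature field, at the cost of a second query for the per-platform breakdown.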
- This is just a simple request for the extensions field of the report in HBase. Middleware layer only, no SOLR needed.
- Use the middleware layer (mwl) to retrieve the hang record via hang id and filter for the desired UUID
Lorentz crashes come in pairs. They are matched via OOIDs. This query is used to find the OOID for a crash report that is paired with the provided OOID.
- Same as above but don't filter.
Lorentz crashes come in pairs. They are matched via OOIDs. If a crash report is resubmitted, it's possible to have more than 2 crash reports per OOID. This variation of the prior query will retrieve all of the OOIDs for crash reports that are paired with the provided OOID.
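The two paired-OOID lookups could look roughly like this once the hang record has been fetched. The shape of the hang record (a plain list of OOIDs stored under one hang id) is an assumption, and both function names are hypothetical.

```python
def all_paired_ooids(hang_record_ooids, ooid):
    """Return every OOID in the hang record except the one provided."""
    return [other for other in hang_record_ooids if other != ooid]


def paired_ooid(hang_record_ooids, ooid):
    """Return the first paired OOID, or None when the report has no pair."""
    others = all_paired_ooids(hang_record_ooids, ooid)
    return others[0] if others else None


# Placeholder identifiers, not real OOIDs.
record = ["ooid-browser", "ooid-plugin", "ooid-plugin-resubmitted"]
print(paired_ooid(record, "ooid-browser"))
print(all_paired_ooids(record, "ooid-browser"))
```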
Working on this in bug 579575
Get all of the top crashing signatures that are associated with a particular URL.
Get the domains that are associated with the highest number of crashes, ordered by the number of crashes.
Get the domains that are associated with the highest number of crashes, ordered by the number of crashes. Only display the domains that are found within the top 1000 sites of Alexa's topsite rankings.
The Alexa Topsite rankings are currently pulled once per week and placed in the alexa_topsites table.
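A sketch of the domain ranking, with the optional topsite filter, assuming the crash URLs have already been retrieved and the alexa_topsites table has been loaded into a set of host names. The function name and sample data are hypothetical. For simplicity the full host name stands in for the domain, so www.gmail.com and gmail.com are counted separately.

```python
from collections import Counter
from urllib.parse import urlparse


def top_crash_domains(urls, topsites=None, n=10):
    """Count crashes per host, optionally keeping only hosts in topsites."""
    counts = Counter()
    for url in urls:
        host = urlparse(url).hostname
        if host is None:
            continue  # e.g. about:blank has no host
        if topsites is not None and host not in topsites:
            continue
        counts[host] += 1
    return counts.most_common(n)


urls = [
    "http://www.gmail.com/a",
    "http://www.gmail.com/b",
    "http://example.org/",
    "about:blank",
]
print(top_crash_domains(urls))
print(top_crash_domains(urls, topsites={"example.org"}))
```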
Get the urls that are associated with the highest number of crashes, grouped by domain name, and ordered by the number of crashes.
Get all of the signatures that are associated with a particular URL.
Get the time (window_end) when the top_crashes_by_signature table was last updated for a specific branch.
Get the time (window_end) when the top_crashes_by_signature table was last updated for a specific product and product version.
Get the top crashing signatures from the top_crashes_by_signature table for a specific branch, between a start timestamp and an end timestamp. Order the results by signatures that are associated with the most crashes.
Get the top crashing signatures from the top_crashes_by_signature table for a specific product and product_version, between a start timestamp and an end timestamp. Order the results by signatures that are associated with the most crashes.
Get meta information for the crash reports for a specific product and product version between a start timestamp and end timestamp. The meta information obtained is the type of crash (hang, or not a hang) and type of process (plugin or browser).
- Don't know purpose.
DEPRECATED - no need to implement this
- Looks like this is just math on the uptime field of reports matching particular criteria. If that is the case, this should be fairly simple?
- Seems out of place. Purpose?
I believe most of this functionality is now deprecated? [laura: done for 1.8 by ryan in bug 584136]
- Simple middleware layer (mwl) retrieval
- Simple mwl retrieval
- Does this query belong here?
- SOLR query to filter on product+version