From MozillaWiki
Jump to: navigation, search
Warning signWarning: This page is obsolete.

How Search.pm Works

This page will be a work in progress, probably for a while. One of the most difficult chunks of code to comprehend in Bugzilla is the Bugzilla::Search.pm module, which contains the code that builds the SQL queries to run searches generated by report.cgi and query.cgi.

In my opinion, the code is pretty ingenious, and one of the little bits of Bugzilla that Terry Weissman (who originally wrote it) should be quite proud of. I've often mentioned on IRC that the code is one of those things that you stare at it for months and have no clue what it's really doing, and then something clicks and it all makes perfect sense. The problem is getting from the staring to the clicking. :) Hopefully by writing up this document it will make life easier on new folks coming in to help out that are trying to "make it click", and also make everyone less dependent on myself and Joel Peshkin to do reviews on it (because we seem to be the only people around right now that actually understand it at the moment).

I'm going to start off by copy/pasting a bunch of stuff in here out of the comments that are in the code. Eventually it'll get cleaned up, and I'll add a bunch to it. Once it's done, we can reformat the page and get it into the developer docs at bugzilla.org. --Justdave 01:40, 19 December 2006 (PST)

If all this sounds very confusing and you just want to see how Bugzilla does it, simply perform a search in Bugzilla, and then add &debug=1 to the URL of the query result page. Then it will output a bunch of stuff, including the SQL statement that it uses to search for the bugs. You can take that and edit it to suit your needs. --Newacct 19:38, 16 July 2010 (UTC)

Query Data Structure

The data structure at the heart of Search.pm is the boolean chart. At the top end of Search.pm, the first thing we do is take all the UI fields from query.cgi that aren't part of the boolean chart, and coerce them into fitting into the chart. Internally in Search.pm, everything is the chart.

A boolean chart is a way of representing the terms in a logical expression. Bugzilla builds SQL queries depending on how you enter terms into the boolean chart. Boolean charts are represented in urls as tree-tuples of (chart id, row, column). The query form (query.cgi) may contain an arbitrary number of boolean charts where each chart represents a clause in a SQL query.

The query form starts out with one boolean chart containing one row and one column. Extra rows can be created by pressing the AND button at the bottom of the chart. Extra columns are created by pressing the OR button at the right end of the chart. Extra charts are created by pressing "Add another boolean chart".

Each chart consists of an arbitrary number of rows and columns. The terms within a row are ORed together. The expressions represented by each row are ANDed together. The expressions represented by each chart are ANDed together.

col1 col2 col3
row1 a1 a2
row2 b1 b2 b3
row3 c1

=> ((a1 OR a2) AND (b1 OR b2 OR b3) AND (c1))

row1 d1

=> (d1)

Together, these two charts represent a SQL expression like this SELECT blah FROM blah WHERE ( (a1 OR a2)AND(b1 OR b2 OR b3)AND(c1)) AND (d1)

The terms within a single row of a boolean chart are all constraints on a single piece of data. If you're looking for a bug that has two different people cc'd on it, then you need to use two boolean charts. This will find bugs with one CC matching 'foo@blah.org' and and another CC matching 'bar@blah.org'.

CC equal to foo@blah.org

CC equal to bar@blah.org

If you try to do this query by pressing the AND button in the original boolean chart then what you'll get is an expression that looks for a single CC where the login name is both "foo@blah.org", and "bar@blah.org". This is impossible.

CC equal to foo@blah.org
CC equal to bar@blah.org

Once we have our boolean chart defined, it's time to start processing it and building our query.

Building the SQL Query

The second key to Search.pm is the mini-functions which construct the SQL query.

The @funcdefs array contains key-value pairs of regexps and anonymous functions. The regexps in the keys are passed a comma-separated list of the field name being operated on, the operation being performed, and the argument to the operator (the value field from the chart). The regexp must contain at least one comma. In most cases, the value field is ignored, but it's there in case it's needed.

Although it looks like a hash definition, it's important that we define it as an array, because they're in a specific order. Immediately after the array is defined, we iterate through it and also stuff it into a hash, making an array of the key names, in the defined order, as we go. This gets us two things:

  1. An array containing the regexps to look at, in the order we want them tested
  2. A hash containing pointers to the anonymous functions to be called when a regexp matches.

When it comes time to process a row of the chart, we start at the top of the list of regexp keys, and compare the row of the chart we're testing to each one. Every time a key matches, the function referenced by it is called. There can be several matches that do different parts of the setup (for example, there's one that detects "changed*" fields and joins the activity table, letting other functions take care of how to deal with the operator that's involved in the test). When a function returns a value in the $term variable, we stop looking for more matches and don't go any further in the list of regexps.

The defined functions use several variables which are scoped outside of the definition hash, and are updated as we go through the chart.

$chartid is the number of the current chart whose SQL we're constructing

$row is the current row of the current chart

names for table aliases are constructed using $chartid and $row:

SELECT blah  FROM $table "$table_$chartid_$row" WHERE ....

These variables are reset between each row:

$f  = field of table in bug db (e.g. bug_id, reporter, etc)
$ff = qualified field name (field name prefixed by table)
      e.g. bugs_activity.bug_id
$t  = type of query. e.g. "equal to", "changed after", case sensitive substr"
$v  = value - value the user typed in to the form
$q  = sanitized version of user input = trick_taint(($dbh->quote($v)))
$term = the final SQL fragment to be added to the query.  regexp searches for
      the current chart row stop when one of the functions sets this.

These variables accumulate data as we go:

@supptables = Tables and/or table aliases used in query
%suppseen   = A hash used to store all the tables in supptables to weed
              out duplicates.
@supplist   = A list used to accumulate all the JOIN clauses for each
              chart to merge the ON sections of each.
$suppstring = String which is pasted into query containing all table names
@orlist     = SQL fragments to be ORed together at the end of a chart row
@chartandlist = SQL fragments to be ANDed together at the end of a chart (each
              item is a string generated from flattening @orlist)
@andlist    = SQL fragments generated from each chart, to be ANDed together
              at the end (each item is a string generated from flattening
@wherepart  = additional SQL fragments to go in the WHERE part of the query
              that are added by regexp-keyed functions