User:Andrew Sutherland/MailNews/GlobalDatabase/Review

From MozillaWiki
Jump to: navigation, search

bug 450494 is tracking the process of getting gloda onto the trunk.

This page reflects the status of the feedback from the bug as of comment 39. The following comment chains have not been included because, although making for interesting discussion, aren't particularly actionable, but could form the basis of an action-y thing:

  • c21 clarkbw's responsiveness desires. followups: asuth c23


Not yet/currently being worked, no notable progress, perhaps some discussion.


gloda suffixtree confusing comments

(bienvenu c26 q2) Some of this code is commented out - does that make it inconsistent with the comment about 'infinity'?

   // However, if this state is a leaf node (end == 'infinity'), then 'end'
   //  isn't describing an edge at all and we want to avoid accounting for it.
   let delta;
   if (state.end != this._infinity)
     //delta = index - end + 1;
     delta = end - (index - state.length); 
   else */
   delta = index - state.length - end + 1;


gloda.js :: initMyIdentities

(dmose) (Possibly premature) optimization:

  for (let iIdentity=0; iIdentity < numIdentities; iIdentity++) {
    let msgIdentity = msgAccountManager.allIdentities.GetElementAt(iIdentity)

If you cache msgAccountManager.allIdentities in a local variable before starting the for loop, you'll avoid an XPConnect crossing every iteration.

(asuth) XPConnect turns out to be oddly expensive, so this sounds reasonable and easy to do. I could also change-up to using fixIterator perhaps.

Half-Finished Comments

(dmose) (gloda.js) the kAttrDerived doxygen comment is incomplete

Noun Definition


(dmose) defineNoun: the doxygen commentary needs a bit of cleanup: I think the nounMeta property names want to be documented semi-separately in their own doxygen block, to give us a place to document the actual defineNoun function signature itself. Also make sure that the property list is complete (eg includes cacheBudget, queryClass, etc). Also, the parameter ordering in the function signature itself seems backwards from an intuitiveness point of view.

(asuth)Yes, the doxygen could use some love and its own block. Noun ID comes after noun meta because it is optional, but it's probably cleaner to just let the caller put 'id' in the meta object in the first place.

(dmose) similarly, would "instanceOf" be a clearer property name than "class"?

(asuth) This is also a good idea. I feel a pang of guilt about using something so close to a keyword, but even if we used "instanceof", spidermonkey would apparently be okay with it. I shall do so.

(dmose) defineAttribute: it's not really clear to me why it's worth having "bindName". Wouldn't it be simpler to just always use the attribute name?

(asuth) This was intentional to allow the database name to differ from the binding name. I think the idea was that the database names would make more sense standalone, whereas the binding name might be less intuitive in some ways. But I think in retrospect, the ability for them to deviate just adds potential confusion... so yes.

Being Addressed

Things being addressed. Next action is asuth's.


Need Input

asuth needs more info from peoples, grouped by peoples.




* With the new indexer, I'm seeing a moderate number of error messages of the form: 

2008-09-21 12:17:29	gloda.fundattr	ERROR	Message with subject
                        'Mark Banner' somehow lacks a valid author.  Bailing.

where in every case, the subject is purported to be the author name.
Which makes me think the code is getting confused somewhere.

I think this was a transient problem involving IMAP. Are you still seeing this? I do occasionally see things like "Message with subject somehow lacks a valid author", but I think that is a different problem either involving messages with odd MIME things happening or perhaps an event train hiccup.


  • in getIdentitiesForFullMailAddresses, are we actually guaranteed that the email addresses aren't relays?

I don't think the methods claim that they do this. I think the goal is that the attribute providers pierce the relay nonsense so it doesn't happen... since we basically depend on the attribute providers to mark relays as such, perhaps we could require them to do so? The result would be that either we don't know something is a relay, or we know it is and you shouldn't really hear about it. Could you clarify what your desired action would be here? (I'm sorry if my delay in getting to this point has made you forget...)

Agreed Addressed




Andrew is processing this set of comments. Do not touch!


gloda.js spelling typos

* Inspired by reasonable uses of triple-stores, I have tried to leverage
*  existing model and terminology rather than rolling out own for everything.


* I kinda suspect introducing the concept of predicates into the explanation
of attributes is likely to make it harder for many people to understand rather
than easier.

* for all modules, all exported symbols should have associated documentation
comments (these could just be pointers) 

* API points like getters on exported classes need doxygen commentary.

* "allowing the use of attribute-parameter as an attribute" seems
  unclear on first reading

* given that most of the rest of code-base uses the word "thread", it would
probably be good to include in the documentation of NOUN_CONVERSATION a
sentence or two about how this is (and/or isn't) different from that

* parameterized identity:
** s/constrainted/constrained/
** do we want to introduce some JS assertions about the constraints
   here?  I think Fx has a concept of such things...
** The intent of the "if (typeof [...]" is not obvious on first reading; a
   comment here would help, I think.
** shortNames need to come from .properties files, I suspect

** both bindAttribute and defineAttribute seem long enough that it's a
   little hard to keep all the necessary state in one's head at once.
   Maybe split them up?

* _bindAttribute
** please either add doxygen comments for the params or at least a
   @seealso pointing to defineAttribute
** Defining functions completely anonymously can make it hard to
   understand stacks in (e.g.) Venkman; prefer named functions.  

To Be Processed

asuth's exciting holding pen for things....


everything below here is done in the ugly-but-easy style!

dmose puts his new comments here...

* doxygen comment on cacheLookupOneByUniqueValue appears to have been
copied from cacheLookupOne but not modified.  When fixing, please be
sure to mention that the caller is expected to guarantee the
"uniqueness" invariant.  Backing up a step, though, is there any
particular reason not to collapse these two functions into one by
parameterizing the array to search?  (If it turns out that makes it
hard to read, I can live with keeping it as is).

* getFolderForFolder is not a very intuitive name.  How about

* it would be good for cacheLoadUnify* to have @return and @param dox

* Please add a comment explaining the reasoning behind GlodaCollection
maintaining both an array (items) and an object/hash (_idMap) that
point to the same data.


* the call to this._collectionsByNoun[nounID].filter in
  registerCollection would be more readable if it used an
  expression closure.

> if (msgHdr && !(msgHder.flags&MSG_FLAG_EXPUNGED))

* The lack of spaces around the & operator goes against prevailing style
and makes it hard to read


> // (_lruNext cannot be null)

* I suspect that making the above comment a bit more explicit about
why it's true will help prevent future code-maintainers from
inadvertently breaking this invariant.

* Do the constrainer functions in _bindAttribute really want to live
in query.js?  Perhaps the thing to do is to the solve the
_bindAttribute naming issue by splitting _bindAttribute itself
into _bindAttribute (which could stay in gloda.js) and
_setAttributeConstrainers (which could live in query.js).  If it's
helpful, a _setupAttribute convenience method that calls both could be

* As far as I can tell, usesParameter in nounMeta is not set by any
  callers.  Should it be discarded, or should (at least)
  the parameterized-identity definition set it?

* The use of "explicit" in naming GlodaExplicitQueryClass is a little
non-obvious to newbies; it'd be nice if we could think of something
better.  So far, I haven't been able to, however.  :-/

* Looks like neither or()/unions/owner are actually used by any callers.  Unless you know of a specific use case for them, I'm somewhat inclined to suggest axing them in the interest of simplicity.  This would also allow the newQuery documentation comment in gloda.js to be made more understandable as well.  On the other hand, this isn't a huge win.

* The purity-of-essence part of me kinda wonders if there shouldn't be
  a separate array for all these constrainer functions, but maybe it's
  not worth it.


* nsAutoCompleteGlodaResult._problem doesn't appear to ever actually
  be set anywhere.  Can it be removed?

(Yes, this is historical.)

* Given that the nsAutoCompleteGlodaResult constructor sets
  this._results to [] by default, it looks to me as though the test
  for null in the matchCount getter can never be true, so it's
  probably not necessary.

(Sounds reasonable.)

* the references to "rich" in the comments a bit non-obvious; maybe be
  a bit more explicit?

* the various uses of thing.{somegetter} mostly appear to be
  abstraction violations.  :-)

more comments

* given the current game plan, removing glautocomp.js from the tree
  seems like the right thing. alternately; we could leave it in the tree
  with a NOT_YET_REVIEWED comment if anticipate future enabling, however.

* please add a doxygen comment describing CallbackStreamListener

* would wrappedJSObject solve (or at least be somewhat less of a hack than)

 * Custom nsIMimeEmitter to build a sub-optimal javascript representation of a
 *  MIME message.  The intent is that a better mechanism than is evolved to
 *  provide a javascript-accessible representation of the message.

the last sentence is missing a word

* seems like it would be trivial to factor out common code from the constructor
  and mime_emitter_complete and that would leave the code less susceptible to
  future errors of omission

* Same question about the stuff in gloda/content as already asked about
 glautocomp.js: should all of this stuff live in comm-central going forward?
 If so but we're not using it now, maybe mark it as unreviewed somehow.

 * Starts retrieval of a MimeMessage instance for the given message header.
 *  Your callback will be called with the message header you provide and the

Trails off mid-sentence.

* Having MimeMessage.coerceBodyToPlaintext call bodies.join("") seems like it
 means that the two tokens at the body boundary will be smashed together and
 therefore not indexed correctly.  How about joining with some whitespace?

* Since it looks like the prettyString properties are purely for logging /
 debugging, how about being explicit about that in the commentary so that
 future patchers realize that they don't need to worry about l10n or about
 changing the format of the output. 

 * Custom nsIMimeEmitter to build a sub-optimal javascript representation of a
 *  MIME message.  The intent is that a better mechanism than is evolved to
 *  provide a javascript-accessible representation of the message.

The second sentence there is unclear; please rephrase.

* _partRE looks fragile in the sense that it I'd be pretty afraid of breaking
 something if I ever needed to touch it.  I suspect some unit tests would go a
 long way towards helping that.

* please file a bug about uncommenting and fixing the commented out code in

* It would be helpful if GlodaDatabind and its methods had at least minimal
 documentation comments.

* Since iteratorUtils.jsm is in the tree now, the custom copy of fixIterator
 can presumably go away.

* The comments in this entire file are extra super awesome; nice work!  

* Should the various listener issues described there be spun off into bugs?

* I would suggest removing this from the tree until such time as you are
 actually committed to supporting it (presumably whenever the first front-end
 feature that uses it wants to land).  This will keep folks from stumbling on
 it and expecting it to work/be supported for whatever it is they want.  I'm
 betting "function examplar()" can go away too.  :-)

* If _mapFolder and _mapFolderId were named in a way that more explicitly
 described what they did, reading calling code would be easier.

* I suspect pictureURL should probably just default to null.

* The commentary in MessagesByMessageIdCallback implies, but doesn't actually
  say, that if someone other than the indexer calls getMessagesByMessageID,
  things could get squirrelly.  If that's actually the case, I'd suggest
  renaming to _getMessagesByMessageID as a hint to extension authors and
  explicitly documenting it as part of the function Doxygen commentary as well.

* queryFromQuery is a fairly non-intuitive name.  In the interest of making
  calling code easier to read, maybe performQueryAsync?

* presumably, the database needs to have some sort of vacuuming strategy,
  whether that's simply using autovacuuming or periodically issuing vacuum
  commands.  This seems like it's probably a blocker for Tb3.  In a recent
  post, sdwilsh said:

  "There is almost never a good time to VACUUM, except for maybe when the
  user is idle.  In 1.9.1, we've added an asynchronous API for statement
  execution, and if you use that all the time, it shouldn't be
  noticeable to the user that you are doing a VACUUM."

* Assuming the comment "(Namely, we fail to purge items from the
  cache as they are purged from the database.)" is still correct, it's probably
  worth filing a spin-off bug about.

* The @namespace at the bottom of the big comment before GlodaDatastore appears
  to be stray, since it doesn't have an argument.

* The orphaned data problem probably wants a spin-off bug.

* Having doxygen comments documenting kConstraint* somewhere would be good.

* A comment explaining why the service is passed in to gloda_ds_init rather than
  just created locally on-demand would be nice.

* The _beginTransaction doxygen comment seems to trail off mid-sentence.  It 
  would be helpful to have a more complete description of the overall nested 
  transactional model.  As it stands, it's not very clear to the reader what 
  assumptions should be made / depended on when dealing with existing
  transactions (eg when it does and doesn't make sense to do the intermediate 
  commits that are seen in some parts of the code).

* The comment for trackAsync could stand to be fleshed out.  As it stands, it's
  not obvious from reading calling code or even the doxygen comment when or why
  someone would want to use it.  Seems like a more obvious function name could

* A quick grep suggests that there are no callers for getMessageByID or
  getMessageIDsByFolderID.  Perhaps these are something extensions might want,
  however...  If we do want to keep them, it seems like there's a high
  potential for unnecessary confusion of Message-ID headers and Gloda IDs for
  messages; perhaps a little name surgery is order.

* There are a few remaining references to APV stuff both here in and gloda.js,
  some (all?) of them would appear to be obsolete.  One example is that
  getMessageAttributes doesn't have any in-tree callers.

* comments for adjustAttributes, _convert*ByAttributeID, loadNounItem, and 
  _queryFromSQLString would improve code maintainability.

All done! You've done a great job keeping code that deals with a lot of complexity as simple and maintainable as it can be.